To HEC with syslog – All grown up

A few years ago flying across the Atlantic, unable to sleep, I had an idea to integrate common syslog aggregation servers using Splunk’s new HTTP event Collector rather than file and the tired and true Universal Forwarder. This little idea implemented in python started solving real problems of throughput and latency while reducing the complexity of configuring syslog aggregation servers. I’m very pleased to say the python script I created is now obsolete. Leading syslog server products syslog-ng and rsyslog upstreams have implemented maintained modules. Even more exciting Mark Bonsack has invested considerable time to further develop modular configuration for both to make getting data though syslog and into Splunk even easier!

Native syslog –> http event collector modules

Modular Configuration Repositories

Quick Public Service Notice

While all linux distributions include a syslog server this should NOT be used as the production syslog aggregation solution. Linux distros are often many point releases or worse selectively back port patches based only on their own customer reported issues. Before attempting to build a syslog aggregation solution for production it is critical you source current upstream binaries or build your own.

Is your LDAP Slow? It might make your Splunk Slow

I’ve had this crop up enough times, I think its worth a short post. Most Splunk deployments use local and/or LDAP authentication. LDAP configuration is something of a black art and often the minimal configuration that works is the first and last time this is considered. It is worth your time as an administrator to optimize your LDAP configuration or better yet move to the more secure and reliable SAML standard.

Things to consider

  • Stop using LDAP and use SAML
  • LDAP authentication should never be enabled on indexers. If you have enabled LDAP authentication remove this. Indexers rarely require authentication, when required only applicable to admins and then under very strict conditions.
  • Ensure the Group BIND and Group Filters are both in use and limit the groups to only those required for Splunk access management
  • Ensure the User BIND and User Filters are appropriate to limit the potential users to only those users who may login to Splunk.
  • Validate the number of users returned by the LDAP query used is under 1000 or increase the number of precached users via limits.conf to an appropriate number.
  • Ensure DNSMASQ or an alternative DNS cache client is installed on Linux Search Heads and Indexers.

Stages to LDAP Auth

  • Interactive User Login (ad-hoc search) or scheduled report execution
  • Check User Cache if not cached or cache expired
  • DNS lookup to resolve the LDAP host to IP (This is the reason DNS Cache on linux is important)
  • TCP Connection to LDAP Server
  • Post query for specific user to LDAP
  • Wait for response
  • Process Referrals if applicable by repeating the above sequence

The time taken for each DNS query and LDAP query is added to the time taken too login to the Splunk UI, execute an ad-hoc search OR scheduled report. Its important to ensure the DNS and LDAP infrastructure is highly available and able to service potentially thousands of requests per second. Proper use of caching ensures resources on the Splunk Server including TCP client sessions are not exhausted causing users to wait in line for their turn at authentication or worst case time outs and authorization failures.