MaxMind Databases and Splunk Enterprise

I’ve finally been able to take a couple of days to update and refresh my MaxMind Add-on for Splunk Enterprise and Enterprise Cloud. The latest version of the add-on updates the GeoIP2 library, allowing for additional fields from the licensed anonymous IP database. It was also built and tested using the new Addonfactory CI/CD infrastructure at Splunk (see my .conf talk). This is a major version, as it introduces a requirement for Python 3, and thus Splunk Enterprise 8.0+, because GeoIP2 is now Python 3 only. Older versions should still work for now if you cannot upgrade. Head over to Splunkbase to get it now.

Your cloud vendor wants to send syslog cloud to cloud

I get asked about this from time to time: what’s wrong with sending syslog over the internet? It’s a standard, right?

IETF syslog, meaning RFC 5424 over TLS (RFC 5425), seems like a good idea until you think through the consequences, and just what might those consequences be?

How do you plan to authenticate that?

Certificates? Well, maybe, but this opens your SIEM up to a nasty, low-cost denial-of-service problem. Client certificate auth is trivial to abuse for DoS: any invalid certificate triggers expensive validation. And if this were happening, how would you know? Neither syslog-ng nor rsyslog will log it in an obvious way.

Secret SDATA? Now we allow any client to connect, and to find out whether the sender is allowed we must accept and parse its data first. Surely that can’t be abused.

IP restrictions? I have some beachfront property for you.

All of the above

How will you scale that? Please see my prior posts on load balancing syslog.

Next time you hear the suggestion of RFC 5424 syslog over the internet, just laugh at the joke and ask what options are really being proposed.

When I say syslog what I really mean is

Syslog is an ambiguous term, so I thought I would clarify what I am talking about.

syslog is a daemon to which Linux/UNIX systems sent logs back in the day. In most cases this results in an entry in a file in /var/log that may or may not have any particular structure. This is normally not what I am talking about.

Syslog was not a standard in the beginning. RFC 3164 is not a standards document; it is a memorialization of some common practices. Do you want a 1988 Honda Civic? If your vendor’s syslog looks like this, you should look at it like a used car:

<111> July 01 12:13:11 My old car's logs

Syslog is not just text over TCP/UDP. A syslog message must have the PRI, such as <111>, and it must have a structure something like this:

<34>1 2003-10-11T22:14:15.003Z mymachine myapplication 1234 ID47 [example@0 class="high"] BOMmyapplication is started
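To make the structure concrete, here is a minimal Python sketch that pulls the header fields out of the example above. The regex is illustrative only, not a complete RFC 5424 parser (it ignores NILVALUE handling and structured-data edge cases).

```python
import re

# Minimal, illustrative RFC 5424 header pattern (not a full implementation).
# Fields: PRI, VERSION, TIMESTAMP, HOSTNAME, APP-NAME, PROCID, MSGID, rest.
RFC5424 = re.compile(
    r"<(?P<pri>\d{1,3})>(?P<version>\d) "
    r"(?P<timestamp>\S+) (?P<host>\S+) (?P<app>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) (?P<rest>.*)"
)

def parse(msg: str) -> dict:
    m = RFC5424.match(msg)
    if not m:
        raise ValueError("not an RFC 5424 message")
    d = m.groupdict()
    pri = int(d["pri"])
    # PRI encodes facility * 8 + severity.
    d["facility"], d["severity"] = pri // 8, pri % 8
    return d

example = ('<34>1 2003-10-11T22:14:15.003Z mymachine myapplication '
           '1234 ID47 [example@0 class="high"] BOMmyapplication is started')
fields = parse(example)
print(fields["facility"], fields["severity"], fields["host"])  # 4 2 mymachine
```

Note how the PRI of 34 decomposes into facility 4 (auth) and severity 2 (critical), which is exactly the kind of metadata free-form "used car" syslog makes you guess at.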

Syslog is now a set of standards:

  • RFC 5424 is the transport-neutral message format
  • RFC 5425 describes how to use TLS as the transport: best practice when network security matters, worst practice when performance matters
  • RFC 5426 describes how to use UDP as the transport: best practice for performance
  • RFC 6587 describes how to use TCP as the transport: worst practice for performance, best practice for large messages over unreliable networks

A message should not be considered “standard syslog” if it is not in the RFC 5424 format using RFC 5425, 5426, or 6587 as the transport. Standards compliance matters; let’s start making vendors feel bad. They have had 12 years to get it right.

Devices that think you know their name

What exactly is that talker’s name? This is one of the most frustrating problems in syslog eventing and the most frustrating in analytics. For far too long the choices have been to use the device’s name OR use reverse DNS, but never both. Today SC4S 1.20.0 solves this problem by doing what you would do:

  1. If the device has a host name in the event, use that.
  2. Else, if our management/CMDB solution knows the right name, use that instead.
  3. Else, maybe someone updated DNS; try that instead.
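The precedence above can be sketched as a simple function. The function name and the lookup tables here are hypothetical stand-ins; SC4S implements this logic internally in its syslog-ng configuration.

```python
def resolve_hostname(event_host, cmdb_lookup, reverse_dns, source_ip):
    # 1. Trust the hostname the device put in the event, if it sent one
    #    that isn't just its own IP.
    if event_host and event_host != source_ip:
        return event_host
    # 2. Fall back to an authoritative management/CMDB lookup.
    name = cmdb_lookup.get(source_ip)
    if name:
        return name
    # 3. Finally, try reverse DNS; keep the raw IP if nothing resolves.
    return reverse_dns.get(source_ip, source_ip)

# Hypothetical lookup tables for illustration:
cmdb = {"10.0.0.5": "fw-core-01"}
rdns = {"10.0.0.9": "sw-edge-02.example.com"}
print(resolve_hostname("10.0.0.5", cmdb, rdns, "10.0.0.5"))  # fw-core-01
print(resolve_hostname(None, {}, rdns, "10.0.0.9"))          # sw-edge-02.example.com
```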

Simple, logical, easy to understand, and available now in Splunk Connect for Syslog. No more of this:

Event with IP as a host

Plenty more like this

IP translated to host using CMDB sourced lookup

Performant AND Reliable Syslog: UDP is best

The faces I’ve seen made at this statement say a lot. I hope you read past the statement for my reasons, and for when other requirements may prompt another choice.

Wait, you say, TCP uses ACKs so data won’t be lost. Yes, that’s true, but there are buts:

  • But when the TCP session is closed, events published while the system is creating a new session will be lost (closed-window case).
  • But when the remote side is busy and cannot ACK fast enough, events are lost because the local buffer fills.
  • But when a single ACK is lost by the network and the client closes the connection (local and remote buffers lost).
  • But when the remote server restarts for any reason (local buffer lost).
  • But when the remote server restarts without closing the connection (local buffer plus the timeout window lost).
  • But when the client side restarts without closing the connection.

That’s a lot of buts, and it’s why TCP is not my first choice when my requirement is for mostly available syslog (there is no such thing as HA syslog) with minimized data loss.

Wait, you say, when should I use TCP syslog? To be honest, there is only one case: when the syslog event is larger than the maximum size of a UDP packet on your network. This is typically limited to web proxy, DLP, and IDS type sources, that is, messages that are very large but not very fast compared to, say, firewalls. So we jump to TCP when the network can’t handle the length of our events.
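That decision can be sketched in a few lines. The 1472-byte figure assumes a typical 1500-byte Ethernet MTU minus IPv4 and UDP headers; your path MTU may differ, and this is an illustration rather than a rule any shipper actually hardcodes.

```python
# Illustrative transport choice: UDP unless the event exceeds what one
# datagram can carry without IP fragmentation on a 1500-byte MTU path.
MTU = 1500
UDP_PAYLOAD = MTU - 20 - 8  # minus IPv4 header (20) and UDP header (8)

def pick_transport(message: bytes) -> str:
    return "udp" if len(message) <= UDP_PAYLOAD else "tcp"

print(pick_transport(b"<34>1 2021-01-01T00:00:00Z fw app - - - short event"))  # udp
print(pick_transport(b"x" * 8192))  # tcp, e.g. a large proxy/DLP event
```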

There is a third option, TLS: a subset of devices can forward logs using TLS over TCP, which provides some advantages with proper implementation.

  • TLS can continue a session over a broken TCP connection, reducing buffer-loss conditions
  • TLS will fill packets, making more efficient use of the wire
  • TLS will compress in most cases

While I am here, I want to say a word about load balancers as a means of high availability. This is snake oil.

  • TCP over an NLB doubles the opportunity for a network error to cause data loss and almost always increases the size of the buffer lost. I have seen over 25% loss on multiple occasions.
  • TCP over an NLB can lead to imbalanced resource use due to long-lived sessions. The NLB is not designed to balance by connection throughput; it is designed to balance connection counts, and in TCP all connections are not equal, leading to out-of-disk-space conditions.
  • UDP cannot be probed, so UDP over an NLB can lead to sending logs to long-dead servers.
  • Load balancers break message reassembly. Common examples of “1 of 3” type messages, like Cisco ACS, Cisco ISE, and Symantec Mail Gateway, cannot be properly processed when sprayed across multiple servers.

Wait, you ask, how do I mitigate downtime for syslog?

  • Use VMware or Hyper-V with a cluster of hosts, which will reduce your outage to only host reboots, which in this day and time are rare.
  • Use a clustered IP solution (i.e. keepalived) so you can drain the server to a partner before restart.
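As an illustration of the clustered IP approach, a minimal keepalived VRRP configuration might look like the sketch below. The interface name, VIP, and priorities are assumptions you would adapt to your environment.

```
# /etc/keepalived/keepalived.conf (illustrative; adjust interface/VIP/priority)
vrrp_instance SYSLOG_VIP {
    state MASTER            # BACKUP on the partner node
    interface eth0          # assumed interface name
    virtual_router_id 51
    priority 150            # lower (e.g. 100) on the partner
    advert_int 1
    virtual_ipaddress {
        192.0.2.10/24       # the floating syslog target IP (example address)
    }
}
```

Point your devices at the VIP; to drain a node before a restart, stop keepalived on it and the partner takes over the address within a few advertisement intervals.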

A few other ideas you may have to bring “HA” to syslog that will be counterproductive:

  • DNS
    • Most known syslog sources will use only one IP, typically the first or one random IP from the list of A records, for a very long period of time, ignoring the TTL. Using DNS to change the target is unlikely to take effect in a short enough period of time, in some cases taking hours.
    • A DNS global load balancer has a similar problem: clients often hold cached results for far longer than the TTL. In addition, the actual device configuration often does not use the correct DNS servers for the GLB to properly measure distance, so it will route incorrectly.
  • AnyCast
    • UDP anycast can work in the exceptional case where a single clustered pair of syslog servers cannot provide the capacity (greater than 10 TB per day). However, because of the probing issues described for NLBs above, my experience with anycast has been high data loss and project failure: over a dozen projects with well-known logos over the last 10 years, names you would know.
    • TLS/TCP anycast is an oxymoron; don’t try it.
  • Sending the message multiple times to multiple servers so it can be “de-duplicated” by “someone’s software”. Deduplication requires globally unique keys, which don’t exist here, so this isn’t possible. More than once is worse than sometimes never: if we are counting errors or attacks, we see more than is real, resulting in false positives and a lack of operational trust in the data, making your project effectively useless. A missed event will more likely than not occur again and be captured in short order.

A syslog time zone is a terrible thing to get wrong

Splunk released 1.2.0 of Splunk Connect for Syslog today. This release focused on time zone management. We all wish time was standardized on UTC; many of us have managed to get that written into approved standards but did not live to see it implemented. SC4S 1.2.0 enables the syslog-ng feature “guess-timezone”, allowing dynamic resolution of the time zone of poorly behaving devices relative to UTC. As a fallback, or to deal with devices that batch or stream with high latency, the device TZ can be managed at the host/IP/subnet level. Ready to upgrade? If you are running the container version, just restart SC4S; this feature is auto-magic.
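As a rough illustration (not syslog-ng’s actual code), guessing a time zone amounts to comparing the device’s stamped time to the receive time and rounding the skew to a plausible UTC offset:

```python
from datetime import datetime, timedelta

def guess_tz_offset(device_time: datetime, received_utc: datetime) -> timedelta:
    # Round the skew to the nearest 30 minutes, since real UTC offsets
    # fall on 30-minute boundaries (a simplification of the real logic,
    # which also has to tolerate clock drift).
    skew = (device_time - received_utc).total_seconds()
    half_hours = round(skew / 1800)
    return timedelta(minutes=30 * half_hours)

# A device stamping local time (UTC-5) with a few seconds of clock drift:
received = datetime(2020, 3, 1, 17, 0, 4)   # wall-clock UTC at the collector
stamped = datetime(2020, 3, 1, 12, 0, 0)    # naive timestamp in the event
offset = guess_tz_offset(stamped, received)
print(offset.total_seconds() / 3600)  # -5.0
```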

Want to know more about SC4S? Check out these blog posts.

Syslog server you say

I’ve had quite a bit to say about syslog as a component of a streaming data architecture primarily feeding Splunk Enterprise (or Enterprise Cloud). In seven days I will be presenting the culmination of small developments that have taken shape as the brand-new Splunk Connect for Syslog (SC4S).

You don’t have to wait; swing over via Splunkbase.

SC4S is designed to:

  • Do the heavy lifting of deploying a functioning, current build of the awesome syslog-ng OSE (3.24.1 as of this posting).
  • Support many popular syslog vendor products out of the box with zero configuration, or with as little configuration as a host glob or IP address.
  • Scale Splunk vertically by distributing events very evenly across indexers by the second.
  • Scale your syslog-ng servers by reducing constraints on CPU and disk.
  • Reduce your exposure to data loss by minimizing the amount of data at rest on the syslog-ng instance.
  • Promote great practices and collaboration. SC4S is a liberally licensed open source solution. We will be able to collaborate directly with end users on filters and usage to promote great big data deployments.

Personal thanks to many, but especially Mark Bonsack and Balazs Scheidler (the creator of syslog-ng).

Bias in ML

One day, perhaps, we can teach machines to avoid bias, but maybe, just maybe, we need to understand how to teach humans the same first.

It shouldn’t be a news flash that biased people “train” bias into computers, just like we train bias into our children. We will one day realize we have no choice but hard, continuous work to eliminate bias.

Phishing from someone else’s container ship.

This is a theoretical attack abusing a compromised kubectl certificate pair and an exposed K8s API to deploy a phishing site transparently on your target’s infrastructure. This is a difficult attack to pull off, and it requires existing compromised administrative access to the k8s cluster: a privileged insider, or a compromised certificate-based authentication credential, can be used.

  • Target: one of my test domains.
  • Desired outcome: detect an attempt to intercept the admin login for a WordPress site. We will utilize a fake email alert informing the administrator that a critical update must be applied.
  • We will deploy the site hidden behind the target’s existing ingress controller. This allows us to utilize the customer’s own domain and certificates, eliminating detection by domain name monitoring (typo squatting, etc.) and certificate transparency reporting.

Phase one: Recon

Using kubectl, identify the namespaces and find the ingress controller used for the site you intend to compromise. For the purposes of my PoC, my target used a very obvious “wordpress” namespace.

kubectl -n wordpress get ing


site 8020h

Phase two: deploy gophish

I’m not going to go into details on deploying gophish or setting up and sending the phishing emails. That’s beyond the scope of this blog post; I’m here to help the blue team, so let’s get on to detection.

The following manifest hides the gophish instance on a path under the main site URL. Of note, in this case /wplogin.cgi is the real site, while /wplogin is where we are credential harvesting.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: site
  namespace: wordpress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/tags: Environment=dev,Team=test
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:174701313045:certificate/15d484c8-ca0c-4194-a4ef-f38a43b7b977
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
    alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
spec:
  rules:
    - host:
      http:
        paths:
          - path: /wplogin
            backend:
              serviceName: gophish
              servicePort: 80
          - path: /
            backend:
              serviceName: wordpress
              servicePort: 80

Phase three: Detecting what we did.

Using the K8s events and metadata onboarded using Splunk Connect for K8s, we have some solid places we can monitor for abuse. Side note: don’t get hung up on “gophish”; this is a hammer-type tool, and your opponent may be much more subtle.

  • Modification to an ingress, in my case an AWS ALB. Ingress records for most environments will not change often; when they are changed, an associated approved deployment should also exist.

"" sourcetype="kube:container:alb-ingress-controller" modifying

  • New image, new image repository, new registry: maintain a list of approved images and registries, and alert when an image is used that is not on the pre-defined list. This may be noisy in dev clusters; for non-prod clusters, reporting may be better than alerting.

index=em_meta "spec.containers{}.image"="*"


“Safely” Exposing Splunk S2S to the internet

Splunk has a great token-based auth solution for its S2S protocol; it was added several versions back. Inputs have both just worked and remained unchanged for so long that many administrators have not noticed the feature. It allows you to safely expose indexers or heavy forwarders so that UFs on the internet can forward data back in without a VPN. This is super critical for Splunk endpoints that don’t maintain a constant VPN connection to the corporate network.

When a Splunk input is exposed to the internet, there is a risk of a resource-exhaustion DoS: a simple type of attack where junk data, or “well formed but misleading” data, is fed to Splunk until all CPU/memory/disk is consumed.

Once this feature is enabled, all UF/HF clients must supply a token to be allowed to connect. If you are adding this feature to a running Splunk deployment, be ready to push the outputs.conf and inputs.conf updates in close succession to prevent a break in data flow.

Update your inputs.conf as follows. Note that you can use multiple tokens, just like HEC, so you can limit the number of tokens that need to be replaced if a stolen token is used in a DoS attack.

# Access control settings.
[splunktcptoken://<token name>]
* Use this stanza to specify forwarders from which to accept data.
* You must configure a token on the receiver, then configure the same
  token on forwarders.
* The receiver discards data from forwarders that do not have the
  token configured.
* This setting is enabled for all receiving ports.
* This setting is optional.

token = <string>
* token should match the regex [A-Za-z0-9\-]+ with a minimum length of 12

Update outputs.conf using the token value of choice from inputs.conf:

token = <string>
* The access token for receiving data.
* If you configured an access token for receiving data from a forwarder,
  Splunk software populates that token here.
* If you configured a receiver with an access token and that token is not
  specified here, the receiver rejects all data sent to it.
* This setting is optional.
* No default.
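Putting the two spec excerpts together, a concrete pair might look like this. The token value, stanza names, port, and hostname are illustrative examples, not defaults:

```
# inputs.conf on the receiving indexer / heavy forwarder
[splunktcp://9997]
disabled = 0

[splunktcptoken://internet_uf_token]
token = zQ1dRe7xM2pL9kT3aB5c

# outputs.conf on the internet-facing universal forwarder
[tcpout:primary]
server = inputs.example.com:9997
token = zQ1dRe7xM2pL9kT3aB5c
```

Roll out additional [splunktcptoken://...] stanzas per population of forwarders so that revoking one abused token does not break every client at once.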