Dev Life: Splunk Add-ons like a developer

As a lifelong (or so it seems) software developer who has come to Splunk, I would like to have some of the properties of an Integrated Development Environment (IDE). This blog post walks you through setting up and experiencing my approach to development for Splunk. I wrote a second post in this series that creates an actual add-on for Splunk using this toolchain: https://www.rfaircloth.com/2019/01/07/building-a-cef-source-add-on-for-splunk-enterprise/

  • I can edit “code”, i.e. Splunk conf files, in my editor and reload the code without restarting
  • Every time I build/debug I have a clean environment
  • I can run unit tests manually or automatically in a consistent way.
  • I can participate in VCS (i.e. git) if desired
  • I can consistently reproduce build and packaging including integration into a CI/CD process
  • I can leverage dependencies from other developed products.
  • I have ready access to common tools like Add-on Builder and Eventgen

Setting up the environment (Mac OS X)

  • Install Brew
  • Install LibMagic “brew install libmagic”
  • Install python “brew install python”
  • Install pandoc “brew install pandoc”
  • Install moreutils “brew install moreutils”
  • Install jq “brew install jq”
  • Install lxml support “xcode-select --install”
  • Install git “brew install git”
  • Install git flow “brew install git-flow”
  • Install gitversion “brew install gitversion”
  • Install virtual env for python “sudo pip install virtualenv”
  • Install docker
  • Create the virtual env “virtualenv ~/venv/splservices”
  • Activate the new env “source ~/venv/splservices/bin/activate”
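  • Make sure pip is current inside the virtual env “python -m pip install --upgrade pip” (virtualenv already provides pip, so no sudo is needed here)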
  • Install our specific requirements “pip install -r https://bitbucket.org/SPLServices/addonbuildimage/raw/master/requirements.txt”
  • I personally prefer the atom editor
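Before moving on, it can help to confirm the tools above are actually on the PATH. The following is a minimal sketch, not part of the original toolchain: the command names in REQUIRED_TOOLS are my assumption of what each install step provides and may differ on your machine.

```python
import shutil

# Assumed command names for the tools installed above; adjust as needed.
REQUIRED_TOOLS = [
    "brew", "python", "pandoc", "jq", "git",
    "git-flow", "gitversion", "virtualenv", "docker",
]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the subset of tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All tools found")
```

Run it once after activating the virtual env; anything it lists needs another look before you try to build.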

Setup the local project

For demonstration purposes we are going to work with one of my recent add-ons for Splunk. A full tutorial on git is beyond the scope of this article; we will simply clone the repo and start a feature branch.

  • Clone the repo “git clone https://bitbucket.org/SPLServices/ta-cef-for-splunk.git”
  • Cd into the repo “cd ta-cef-for-splunk”
  • Initialize and fetch the git submodules “git submodule update --init”
  • Setup git flow “git flow init -d”
  • Start a new feature “git flow feature start myfeature”

Package and Test

Before we change anything we should verify we can recreate a successful build.

  • Build a package “make package”
  • Verify the package builds. The last line will report something like this; path and version will vary.
slim package: [NOTE] Source package exported to "/Users/user/Downloads/ta-cef-for-splunk/out/packages/splunkbase/TA-cef-for-splunk-0.2.0-myfeature.1+17.tar.gz"
  • Test the package using Splunk’s appinspect “make package_test”
  • Verify the test report shows one failure. While developing, this one failure is expected: the version number does not conform to the release rules for Splunkbase. Note: per semver.org the feature branch version clearly indicates this is a development build, which is helpful in preventing accidental “escapes” to production
splunk-appinspect inspect out/packages/splunkbase/TA-cef-for-splunk-0.2.0-myfeature.1+17.tar.gz --data-format junitxml --output-file test-reports/TA-cef-for-splunk.xml --excluded-tags manual
Validating: TA-cef-for-splunk Version: 0.2.0-myfeature.1+17
.......F.....SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
SSSSSSSSSSSSSSSSSSSSSSSSSSSSS

A default value of 25 for max-messages will be used.
Splunk app packaging standards These checks validate that a Splunk app has been 
correctly packaged, and can be provided safely for package validation. 
    Check that the extracted Splunk App contains a default/app.conf file 
    that contains an [id] or [launcher] stanza with a version property that is 
    formatted as Major.Minor.Revision. 
        FAILURE: `Major.Minor.Revision` version numbering is required. 
            File: default/app.conf Line Number: 20 


TA-cef-for-splunk Report Summary:

       skipped: 176
       success:  9
  manual_check:  0
       failure:  1
       warning:  0
         error:  0
not_applicable:  3
-------------------
         Total: 189

Please note that more issues could be found out later during the optional manual review process.
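If you want a script to make the same judgment you just made by eye, the summary block is easy to parse. This is a minimal sketch that assumes the plain-text summary format shown above; the function names and the "at most one failure, no errors" rule are my own illustration, and it cannot tell which check failed.

```python
import re


def parse_summary(report_text):
    """Collect the lowercase 'name: count' lines of an appinspect summary."""
    counts = {}
    for line in report_text.splitlines():
        match = re.match(r"^\s*([a-z_]+):\s+(\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def within_dev_tolerance(counts):
    """True when there is at most one failure and no errors."""
    return counts.get("failure", 0) <= 1 and counts.get("error", 0) == 0
```

On a release branch you would tighten the rule to zero failures, since the version check is expected to pass there.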

Interactive Development

Now for the good stuff: how can we develop interactively? Let's fire up a Splunk Docker container with the latest version of Splunk and our local copy of the add-on. Run “make docker_dev” and wait for the text “Ansible playbook complete” to appear on the terminal, indicating Splunk is ready to work. Visit “http://127.0.0.1:8000” and log in to a fresh copy of Splunk with the add-on ready to go. The password will be “Changed!11”. Let's prove life by making a simple change to our add-on.

  • Open atom or the editor of your choice
  • Navigate to <project>/src/TA-cef-for-splunk/default/props.conf
  • Add the line “EVAL-alive = "yes"” to the [cef] stanza
  • Return to the running copy of Splunk and visit http://127.0.0.1:8000/debug/refresh/ (click refresh)
  • Turn on the event gen: “Settings –> Data Inputs –> SA Event-Gen”, then click enable
  • Wait about a minute and click disable
  • Go back to search and check for the alive field “index=* sourcetype=cef | head | table sourcetype,alive”


Upgrading Splunk Add Ons

This topic comes up every now and then when working with customers and partners. Deploying and upgrading add-ons for Splunk does not have to be hard; there are a few rules to live by. I’m going to use Splunk_TA_Windows 5.0.1 in this walkthrough. This upgrade has some specific guidance in addition to the usual steps. As is often the case with software, correcting issues can require additional work for compatibility.

Upgrading: things to do first

Be proactive: read the docs

In the release notes Splunk advises that sourcetypes will change. Two new sourcetypes, “EventLog” and “XMLEventLog”, will replace all previous event log specific sourcetypes. The source will indicate the specific log used, which is consistent with how most other source/sourcetype pairs are used in Splunk. Sourcetype is a structure and source is an instance of the structure for a specific host. As instructed, review custom searches and eventtypes and update them to use source rather than sourcetype.

Review the additional changes in the docs and determine if any apply to your environment.

Review local changes

Identify any search time local changes made to the sourcetypes managed by the add-on in question. In most cases these will be located in the $SPLUNK_HOME/etc/apps/Splunk_TA_windows/local folder; however, in some cases you may find them in $SPLUNK_HOME/etc/apps/Splunk_TA_windows/search. Review and compare them to the latest version of the add-on, and confirm they can or should remain at upgrade time.

Identify any index time local changes made to the sourcetypes managed by the add-on. In most cases these can be found on the cluster master in $SPLUNK_HOME/etc/master-apps/Splunk_TA_windows/local; however, in some organizations customizations are made in another custom app. If you have inherited this environment, be sure to consider how others may have made customizations before you.

While this post is specific to Splunk_TA_windows most steps do apply to any add-on deployment.

Installing the upgrade

The first step, when possible, is to test in a non-production environment. In many cases the only complete environment is production, so care should be taken and changes should be made in off hours.

Search Heads (non clustered)

Repeat on each search head

  1. Back up the current app by copying Splunk_TA_windows to a safe location
  2. Install the app using the CLI or app browser “install from file”
  3. Restart the search head
  4. Verify any custom dashboards or alert searches continue as expected

Search Heads (clustered)

Repeat on each search head cluster

  1. Back up the current app by copying Splunk_TA_windows from a search head cluster member to a safe location
  2. (ES only) For an ES search head cluster, remove Splunk_TA_Windows from shcluster/apps on the deployer and apply the new bundle to the cluster using the preserve-lookups option as documented in the Enterprise Security documentation
  3. Verify Splunk_TA_Windows is removed from the peers
  4. Expand Splunk_TA_Windows into shcluster/apps
  5. Apply the new bundle using preserve-lookups if ES

Indexers

  1. Expand Splunk_TA_Windows in a temporary location, remove the following files
    1. <app>/bin
    2. <app>/default/eventgen.conf
    3. <app>/default/inputs.conf
    4. <app>/default/wmi.conf
    5. <app>/default/indexes.conf
  2. Splunk has removed all index definitions from this add-on in accordance with best practices and app verification requirements. Review the indexes in use and ensure the indexes have been re-defined according to your environment's requirements.
  3. Verify the organizations indexes.conf contains all required indexes
  4. Deploy the updated add-on via master-apps for clustered indexers (automatic rolling restart) or to apps on all non clustered indexers and restart.
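The file removal in step 1 is easy to get wrong by hand, especially when it is repeated for the heavy forwarders. The following is a minimal sketch of that one step, assuming the add-on has already been expanded into a temporary directory; strip_addon and the constant name are hypothetical helpers of mine, not part of any Splunk tooling.

```python
import shutil
from pathlib import Path

# Paths relative to the expanded add-on, taken from the list above.
REMOVE_BEFORE_DEPLOY = [
    "bin",
    "default/eventgen.conf",
    "default/inputs.conf",
    "default/wmi.conf",
    "default/indexes.conf",
]


def strip_addon(app_dir, relative_paths=REMOVE_BEFORE_DEPLOY):
    """Delete the listed files/directories and return what was removed."""
    removed = []
    root = Path(app_dir)
    for rel in relative_paths:
        target = root / rel
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(rel)
        elif target.exists():
            target.unlink()
            removed.append(rel)
    return removed
```

Returning the list of removed paths gives you something to record in the change ticket, and an empty list is a hint that you pointed it at the wrong directory.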

Intermediate Heavy Forwarders

  1. Expand Splunk_TA_Windows in a temporary location, remove the following files
    1. <app>/bin
    2. <app>/default/eventgen.conf
    3. <app>/default/inputs.conf
    4. <app>/default/wmi.conf
    5. <app>/default/indexes.conf
  2. Deploy to apps on all instances and restart

Collecting Forwarders using the deployment server

  1. Review all deployment-apps/*/local/inputs.conf applied to Windows systems as follows.
    1. Ensure index is specified on each utilized input
    2. Ensure disabled=false is specified on each utilized input
    3. If no inputs.conf is found, “demo defaults” have been utilized up to this point. Copy Splunk_TA_windows/default/inputs.conf to Splunk_TA_windows/local and review the stanzas to determine which should remain enabled
    4. Back up deployment-apps/Splunk_TA_windows to a safe location and remove it
    5. Expand Splunk_TA_windows to deployment-apps.
    6. Reload the deployment server
  2. Verify no “missing index” messages appear in the cluster. If they do, identify the incorrectly configured input and redeploy
  3. Verify there is no new use of the last chance or main index. If there is, identify the incorrectly configured input and redeploy.
  4. Repeat verification of searches and alerts as above.
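The inputs.conf review in step 1 can be partly automated. This is a rough sketch that leans on Python's configparser, which handles simple conf files but is not a full Splunk conf parser (for example, it does not understand backslash line continuations); audit_inputs is a hypothetical helper and the stanza and index names in the usage example are made up.

```python
import configparser


def audit_inputs(conf_text):
    """Map each input stanza to the problems found by the review rules above."""
    parser = configparser.ConfigParser(
        interpolation=None,      # values may contain % characters
        strict=False,            # tolerate repeated keys
        allow_no_value=True,
        delimiters=("=",),       # stanza names and values may contain ':'
    )
    parser.optionxform = str     # keep key case as written
    parser.read_string(conf_text)

    problems = {}
    for stanza in parser.sections():
        if stanza == "default":
            continue
        issues = []
        if not parser.has_option(stanza, "index"):
            issues.append("no index")
        disabled = (parser.get(stanza, "disabled", fallback=None) or "").strip().lower()
        if disabled not in ("0", "false"):
            issues.append("disabled is not explicitly false")
        if issues:
            problems[stanza] = issues
    return problems


if __name__ == "__main__":
    example = "[WinEventLog://Security]\nindex = oswinsec\n"
    print(audit_inputs(example))
```

Treat the output as a list of stanzas to look at, not as a verdict; an input that is intentionally disabled will also show up here.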

 

Identifying obvious sourcetype problems in Splunk

This is a short one. Onboarding data into any system is great; making it identifiable and usable by the end users is even more important. In Splunk, source, sourcetype, and index are the most basic bits of metadata available to users, and often they work with only these three because it's just so easy. When our upstream sources don’t set these values correctly it can stress the environment, because we are doing unnecessary work like “line merging” and our users can’t find data. Using Splunk's own logs we can see where this may be happening and start to fix it. This search will identify suspect sourcetypes. Review the onboarding of each one identified to make it better.

index=_internal source="*metrics.log" sourcetype=splunkd group=per_sourcetype_thruput
| eval sourcetype_error=if(match(series,"^[\$\%\#]"),"__Invalid_char",sourcetype_error)
| eval sourcetype_error=if(isnull(series) OR series="","__Invalid_null",sourcetype_error)
| eval sourcetype_error=if(match(series,"^\/"),"__Invalid_usedpath",sourcetype_error)
| eval sourcetype_error=if(match(series,"^\d+\.\d+\.\d+\.\d+"),"__Invalid_used_IP",sourcetype_error)
| eval sourcetype_error=if(match(series,"\s"),"__Invalid_space",sourcetype_error)
| eval sourcetype_error=if(like(series,"%small"),"__Invalid_too_small",sourcetype_error)
| eval sourcetype_error=if(match(series,"^\d+$"),"__Invalid_numeric",sourcetype_error)
| eval sourcetype_error=if(match(series,"\-\d"),"__Invalid_learnednum",sourcetype_error)
| eval sourcetype_error=if(match(series,"\-error"),"__Invalid_learnederror",sourcetype_error)
| eval sourcetype_error=if(match(series,"\*"),"__Invalid_asterisk",sourcetype_error)
| eval sourcetype_error=if(match(series,"\.\w{1,4}$"),"__Invalid_filename",sourcetype_error)
| eval sourcetype_error=if(match(series,"[\.\-]log$"),"__Invalid_autousinglogfilename",sourcetype_error)
| eval sourcetype_error=if(isnull(sourcetype_error) AND NOT match(series,"^[\w\-\:]+$"),"__Invalid_nonstandardform",sourcetype_error)
| search sourcetype_error=*
| stats sum(kb) as kb avg(kbps) as kbps_avg avg(eps) as eps_avg sum(ev) as ev values(sourcetype_error) by series
| eval mb=round(kb/1024,2)
| fields - kb
| sort limit=0 -mb

Code as snippet https://bitbucket.org/snippets/rfaircloth-splunk/Benb45

Protecting ATMs from the two arm bandits

Jackpot ATM style

According to Krebs, two-armed bandits are about to hit the jackpot on American ATMs, also known as ABMs outside of the US. Like most security issues, it's an arms race. Did you know ATMs have holes in the bottom so crooks can’t fill them with water and blow up the door without damaging the cash? They started out solid; someone noticed that flaw and exploited it, and we learned and got better.

Just before Y2K and in the years after, banking systems moved from proprietary operating systems and applications, custom interfaces and hardware to Windows-based “open” systems with vendor-agnostic drivers and tools, allowing for innovation and cost reduction. This change swapped out custom controller cards for “USB” devices, and Bisync serial for TCP over Ethernet, wifi, 4G, and PPP. The builders of these new networks didn’t have much experience in network security and left many, many doors open. The physical design protects your card number and PIN but leaves the cash open. To keep service costs low, the PC components are in a section of the machine called the “hood” and can be serviced without opening the safe and exposing the cash. This is a great design from the perspective of PC service. It also ensures the safety of the repair tech: as they cannot access the bulk cash, there is no reason to rob them at gunpoint. Great, but we still have a problem. The USB and network interfaces are now protected by a basic 4-6 pin lock, and all the keys in a region are the same because keeping track of keys is hard. Protecting against a physical attacker is something the design precludes, so we could die on this hill but we can’t take it. What can we do?

You have Splunk! You also have a remote CCTV system (NVR) or physical alarm. What if we pull this data together, build a threat model, and respond faster?

  • Monitor “motion” events from the NVR system
    • Identify cameras indicating motion front and back of the ATM
      • ATM ID
      • Front/Back
      • Duration of Motion
    • Motion in back of more than n seconds and motion in front of more than x seconds without y duration alert
  • Monitor the network switch/wifi
    • map switch/ap events where the port/connection disconnects to the ATM ID
  • Monitor the _internal data from the installed UF and alert on silence of more than n seconds
  • Use the UF to monitor for XFS events via ETW or windows events
    • Hood open
    • Dispenser disconnect
    • New Device
  • Install Splunk Stream to monitor TLS/HTTPS, aggregating by certificate ID every 5 min. Map src to ATM ID and alert if the certificate presented for the Authorization Server changes
  • Using XYGate to monitor your switch (base24/efunds), or SyncSort (z/OS based custom), watch for dispenser totals mismatches for the ATM ID

Summarize each of the alerts above using | collect, normalizing on ATM ID. Use Splunk's built-in alerting to notify ATM OPS and physical security on any occurrence of 3 or more in 15 min, and tune for false positives.
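To make the “3 or more in 15 min” rule concrete, here is a minimal sketch of the same correlation outside of Splunk. It assumes each summarized alert has been reduced to an (epoch seconds, ATM ID) pair and counts every alert equally; the function name and the sample ATM IDs are hypothetical, and in practice this logic would live in a scheduled search over the summary index.

```python
from collections import defaultdict, deque


def atms_to_alert(alerts, threshold=3, window_seconds=15 * 60):
    """Return ATM IDs with at least `threshold` alerts inside any window.

    alerts: iterable of (epoch_seconds, atm_id) tuples in any order.
    """
    recent = defaultdict(deque)   # atm_id -> timestamps still inside the window
    flagged = set()
    for timestamp, atm_id in sorted(alerts):
        window = recent[atm_id]
        window.append(timestamp)
        # Drop alerts that have aged out of the window ending at `timestamp`.
        while window and timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            flagged.add(atm_id)
    return flagged


if __name__ == "__main__":
    sample = [(0, "ATM-001"), (300, "ATM-001"), (800, "ATM-001"), (0, "ATM-002")]
    print(atms_to_alert(sample))
```

Both the threshold and the window are parameters because, as noted above, you will need to tune them against false positives.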

Let’s Encrypt and get an A for a great Splunk TLS config

Setting up SSL/TLS on Splunk doesn’t have to be super hard or costly. While running Splunk in cloud providers has many benefits, there are some hassles, like provisioning certificates, that we can manage better using Let’s Encrypt. This method of installing browser-trusted certificates can help keep your administrative costs down in large Splunk deployments such as MSSP services.

Expanding on prior work https://www.splunk.com/blog/2016/08/12/secure-splunk-web-in-five-minutes-using-lets-encrypt.html

NGINX

First we are going to install NGINX, which we will use as a front-end reverse proxy. Why? We can renew our certs with minimal downtime in the future, and we get OCSP stapling (improved page load times) and other things (future posts).

#centos

yum install nginx

#ubuntu

apt-get install nginx

Second setup a new vhost for the splunk reverse proxy. Any request to http will be redirected to https except for requests related to certificate management.

# Redirect everything to https except certificate management (ACME) requests
map $uri $redirect_https {
    ~^/\.well-known/                   0;
    default                            1;
}

server {
    listen       80;
    server_name  hf-scan.splunk.example.com;
    root /usr/share/nginx/html;

    if ($redirect_https = 1) {
       return 301 https://$server_name$request_uri;
    }
#    return       301 $scheme://hf-scan.splunk.example.com$request_uri;
}

server {
    listen 443 ssl http2;
    server_name hf-scan.splunk.example.com;
    root /usr/share/nginx/html;
    index index.html index.htm;

    # Pass everything through to Splunk Web
    location / {
        proxy_pass_request_headers on;
        proxy_set_header x-real-IP $remote_addr;
        proxy_set_header x-forwarded-for $proxy_add_x_forwarded_for;
        proxy_set_header host $host;
        proxy_pass https://127.0.0.1:8000;
        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    }

    ssl_certificate     /etc/letsencrypt/live/hf-scan.splunk.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/hf-scan.splunk.example.com/privkey.pem;
    ssl_protocols       TLSv1.2;
    ssl_ciphers         HIGH:!aNULL:!MD5;
    ssl_dhparam /etc/nginx/ssl/dhparam.pem;
    ssl_session_cache shared:SSL:50m;
    ssl_session_timeout 1d;
    ssl_session_tickets off;
    ssl_prefer_server_ciphers on;
    ssl_stapling on;
    ssl_stapling_verify on;
    resolver 8.8.8.8 8.8.4.4 valid=300s;
    resolver_timeout 5s;
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
}

Set up a deploy hook script. This will prepare the cert files the way Splunk needs them and will also be used on renewal. Save this script as /etc/letsencrypt/renewal-hooks/deploy/splunk.sh

#!/bin/bash
# Deploy to /etc/letsencrypt/renewal-hooks/deploy/splunk.sh
# When requesting a cert add "--deploy-hook /etc/letsencrypt/renewal-hooks/deploy/splunk.sh" to the command
# certbot sets RENEWED_LINEAGE to the live directory of the renewed certificate
dir=/opt/splunk/etc/auth/ssl
if [[ ! -e "$dir" ]]; then
    mkdir -p "$dir"
elif [[ ! -d "$dir" ]]; then
    echo "$dir already exists but is not a directory" 1>&2
    exit 1
fi
# Encrypt the private key; the pass phrase must match sslPassword in server.conf
openssl rsa -aes256 -in "$RENEWED_LINEAGE/privkey.pem" -out "$dir/protected.pem" -passout pass:password
if [[ ! -f "$dir/protected.pem" ]]; then
    exit 1
fi
# server.pem holds the encrypted key followed by the full certificate chain
cat "$dir/protected.pem" "$RENEWED_LINEAGE/fullchain.pem" > "$dir/server.pem"
cp "$RENEWED_LINEAGE/fullchain.pem" "$dir/"
cp "$RENEWED_LINEAGE/privkey.pem" "$dir/"
chown splunk:splunk "$dir"/*
systemctl restart splunk

Request the certificate. Note: correct the webroot folder for your platform and replace the certificate name with the FQDN of your server.

certbot certonly --webroot -w /var/www/html --hsts -d hf-scan.splunk.example.com --noninteractive --agree-tos --email your@example.com --deploy-hook /etc/letsencrypt/renewal-hooks/deploy/splunk.sh

Setup Splunk

Update /opt/splunk/etc/system/local/web.conf

[settings]
enableSplunkWebSSL = true
#sendStrictTransportSecurityHeader = true
sslVersions = tls1.2
cipherSuite = TLSv1.2:!NULL-SHA256:!AES128-SHA256:!ADH-AES128-SHA256:!ADH-AES256-SHA256:!ADH-AES128-GCM-SHA256:!ADH-AES256-GCM-SHA384
privKeyPath = /opt/splunk/etc/auth/ssl/privkey.pem
caCertPath = /opt/splunk/etc/auth/ssl/fullchain.pem

Update /opt/splunk/etc/system/local/server.conf

[general]
serverName = hf-scan.splunk.example.com

[sslConfig]
sslVersions = tls1.2
sslVersionsForClient = tls1.2
serverCert = $SPLUNK_HOME/etc/auth/ssl/server.pem
sslRootCAPath = $SPLUNK_HOME/etc/auth/ssl/fullchain.pem
dhFile = /opt/splunk/etc/auth/ssl/dhparam.pem
sendStrictTransportSecurityHeader = true
allowSslCompression = false
cipherSuite = TLSv1.2:!NULL-SHA256:!AES128-SHA256:!ADH-AES128-SHA256:!ADH-AES256-SHA256:!ADH-AES128-GCM-SHA256:!ADH-AES256-GCM-SHA384
useClientSSLCompression = false
useSplunkdClientSSLCompression = false

Test

  • Option 1: SSL Labs, limited to port 443 (don’t forget about 8089)
  • Option 2: testssl.sh, CLI based, doesn’t share data, no letter grade (management likes letters)
  • Option 3: High-Tech Bridge https://www.htbridge.com/ssl allows testing multiple ports, similar coverage to SSL Labs, less well known
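For a quick local sanity check before reaching for the scanners above, Python's standard ssl module can report which protocol version a port actually negotiates. This is only a sketch and no substitute for a real scan; the function names and the ACCEPTABLE set are my own assumptions (the configs in this post only enable TLS 1.2).

```python
import socket
import ssl

# Assumption: anything older than TLS 1.2 counts as a failed check.
ACCEPTABLE = {"TLSv1.2", "TLSv1.3"}


def negotiated_tls_version(host, port, timeout=5.0):
    """Connect, verify the certificate chain, and return e.g. 'TLSv1.2'."""
    context = ssl.create_default_context()  # verifies chain and hostname
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()


def is_acceptable(version):
    """True when the negotiated protocol version is in the allowed set."""
    return version in ACCEPTABLE
```

Call it as negotiated_tls_version("hf-scan.splunk.example.com", 443) and again with port 8089. Because the default context verifies the chain and hostname, a handshake error here is itself a useful signal that a port is not presenting the Let's Encrypt certificate.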

Renew certs

Set up a cron job to run the following command at least once per week in your scheduled change window. If a certificate renewal is required, Splunk will be restarted.

certbot renew --webroot -w /usr/share/nginx/html

Can we even patch this Spectre/Meltdown oh and AV also

Isn’t it great when things are in meltdown and you can’t patch yet because you're waiting on another patch?

Microsoft has stated you can’t patch until AV goes first

http://www.zdnet.com/article/windows-meltdown-spectre-fix-how-to-check-if-your-av-is-blocking-microsoft-patch/

https://support.microsoft.com/en-us/help/4072699/january-3-2018-windows-security-updates-and-antivirus-software

Bottom line: if your AV vendor hasn’t updated its product to set this registry value, which gives the update permission to install, or you don’t use AV and instead rely on application whitelisting for security, the patch won’t apply. You can use Splunk to track down hosts that will refuse to apply the patch by adding this monitor to Splunk and, well, Splunking the results.

Key="HKEY_LOCAL_MACHINE" Subkey="SOFTWARE\Microsoft\Windows\CurrentVersion\QualityCompat" Value="cadca5fe-87d3-4b96-b7fb-a231484277cc" Type="REG_DWORD"
Data="0x00000000"

Add the following to the inputs.conf applied to all Windows systems, ensure the server class is set to restart the UF, and happy Splunking.

 

[WinRegMon://HKLMSoftwareMSWindowsQualityCompat]
index = epintel
baseline = 1
disabled = 0
hive = \\REGISTRY\\MACHINE\\Software\\Microsoft\\Windows\\CurrentVersion\\QualityCompat\\.*
proc = .*
type = delete|create|set|rename

Tuning Splunk when max concurrent searches are reached

Your searches are queued but you have cores, memory and IO to spare? Tuning your limits can allow Splunk to utilize “more” of your hardware when scaled up instances are in use.

Warnings

This approach is NOT useful when searches run LONG. If regular searches such as datamodel acceleration, summary and reporting searches are not completing inside the expected/required time constraints, this information could make the symptoms worse.

This approach is useful when searches consistently execute faster than the required times for datamodel acceleration, summary and reporting and additional searches are queued while the utilization of cpu, memory, storage IOPS, storage bandwidth are well below the validated capacity of the infrastructure.

 

Details

First, in certain versions of Splunk, apply the following setting to disable a feature that can slow search initialization.

$SPLUNK_HOME/etc/system/local/limits.conf

$SPLUNK_HOME/etc/master-apps/_cluster/local/limits.conf

[search]
#Splunk version >=6.5.0 <6.5.6
#Splunk version >=6.6.0 <6.6.3
#Not required >7.0.0
#SPL-136845 Review future release notes to determine if this can be reverted to auto
max_searches_per_process = 1

On the search head only, where DMA is utilized (ES), update the following

$SPLUNK_HOME/etc/system/local/limits.conf

#this is useful when you have ad-hoc to spare but are skipping searches (ES I'm looking at you) or other 
# home grown or similar things
[scheduler]
max_searches_perc = 75
auto_summary_perc = 100

Evaluate the load on the search heads and indexers, including CPU and memory utilization. We can increase the value of base_max_searches in increments of 10 to allow more concurrent searches per SH until one of the following occurs:

  • CPU or memory utilization is 60% on IDX or SH
  • IOPS or storage throughput hits a ceiling and no longer increases. At that point the system is fully utilized; to prevent failure due to unexpected load, decrement the base_max_searches value by 10 and confirm IOPS is no longer constant.
  • Skipping/queuing no longer occurs (increase by 1-3 additional units from this point to provide some “head room”)
#limits.conf set SH only
[search]
#base value is 6 increase by 10 until utilization on IDX or SH is at 60% CPU/memory starting with 20
#base_max_searches = TBD
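To see what each increment of base_max_searches buys you, it helps to work the arithmetic. As documented in limits.conf, the historical search limit is max_searches_per_cpu x number of CPUs + base_max_searches, the scheduler gets max_searches_perc percent of that, and auto summarization gets auto_summary_perc percent of the scheduler's share. The sketch below applies those formulas; the function name is mine, the defaults are the shipped defaults as I understand them, and rounding down is an assumption.

```python
def search_limits(cpu_count, base_max_searches=6, max_searches_per_cpu=1,
                  max_searches_perc=50, auto_summary_perc=50):
    """Estimate concurrency limits from limits.conf settings."""
    total = max_searches_per_cpu * cpu_count + base_max_searches
    scheduled = int(total * max_searches_perc / 100)      # scheduler share
    summaries = int(scheduled * auto_summary_perc / 100)  # DMA / summary share
    return {"historical": total, "scheduled": scheduled, "auto_summary": summaries}


if __name__ == "__main__":
    # 16-core search head with default settings
    print(search_limits(16))
    # Same host with base_max_searches=26 and the [scheduler] values above
    print(search_limits(16, base_max_searches=26,
                        max_searches_perc=75, auto_summary_perc=100))
```

On a 16-core search head the defaults allow 22 concurrent historical searches, 11 of them scheduled. With the scheduler settings above and base_max_searches at 26 that becomes 42 and 31, which is why the change should be made in steps while watching utilization.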

Outage due to DDOS

The site's been down for a few days. BlueHost has been suffering from a DDoS on at least one of the sites they host, and my site shared that infrastructure. For $3.95 a month I don’t expect too much, but having some ability to move sites to new hosts would be nice. Anyway, I’m up on Azure now until I decide if I want to be my own webmaster or revert to paying someone else to pretend to worry about things like that. On the plus side, the outage forced me to update the site infrastructure, which now uses certificates from Let’s Encrypt. If you have CLI access to your Apache-hosted site, it is super easy and free to enable good encryption.

sudo certbot --apache -d www.rfaircloth.com -d rfaircloth.com -d rfaircloth.westus.cloudapp.azure.com --must-staple --redirect --hsts --uir --rsa-key-size 4096

What’s in a URL now you can Splunk that

While hunting, we find interesting URLs in logs, both email and proxy, all the time. What will that URL return? If it redirects, where is it going, and what kind of content will it serve? These are questions you might be asking. If you are not asking them, now is the time to start. I’ve released a new add-on to Splunkbase, a little adaptive response action that can be used with just Splunk Enterprise OR Splunk Enterprise Security to collect and index information about those URLs.

https://splunkbase.splunk.com/app/3630/