Finding signal in the noise of DNS data using Splunk

DNS is a fundamental component of our computing infrastructure. Before we can easily identify bad actions, we should first remove what we can easily identify as good. All of our queries will rely on Common Information Model (CIM) fields and extractions. For most customers I deploy the Splunk App for Stream to collect query information from their DNS servers in a reliable way, regardless of the logging capabilities of their chosen server product.

Note: Be sure to install Cedric's URL Toolbox add-on; we will make use of its power here.

Let's start by looking at the data everyone spends the most time talking about: queries for A (IPv4) and AAAA (IPv6) records. While we are working, let's search no more than the last 60 minutes to be kind to our indexers; for real analysis you will use bigger windows.

tag=dns tag=resolution tag=dns index=* NOT source="stream:Splunk_*" (query_type=A OR query_type=AAAA)

My sample environment is small, very small: 5 users and 10 Windows servers. In the last 24 hours this query gave me 24,000+ results, way more than I can examine, so let's start to cut that down. We also need to remember what we hope to learn from our data: which domains require investigation on suspicion of involvement in malicious activity.

Reduction #1: Let's remove all domains owned by our organization for email or web hosting.

  • Update the following files to include the domains used for email or web hosting (a sample layout follows this item).
    1. Splunk_SA_CIM/lookups/cim_corporate_email_domains.csv
    2. Splunk_SA_CIM/lookups/cim_corporate_web_domains.csv
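    Both lookups are keyed by a domain column, which is the same field name the lookup commands below expect. A minimal sketch of the file contents, using placeholder domains; include wildcard entries if your copy of the lookup is defined with WILDCARD matching:

      domain
      example.com
      *.example.com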
  • Update our search to extract the domain and TLD for later use. This is more complicated than it looks, so we will make up a URI and let URL Toolbox do the work for us.
  • The new base search will look like this:
    tag=dns tag=resolution tag=dns NOT source="stream:Splunk_*" index=* (query_type=A OR query_type=AAAA)
    | eval uri="dnsquery://"+query
    | `ut_parse(uri)` | fields - ut_fragment ut_netloc ut_params ut_path ut_port ut_query ut_scheme
  • Now we can use our email and web domain lookups to reduce the data set we are working with. This removed about 13% of my results. Notice I use fields - to avoid moving data I don't need from my indexers back to my search head.
  • tag=dns tag=resolution tag=dns NOT source="stream:Splunk_*" index=* (query_type=A OR query_type=AAAA)
    | eval uri="dnsquery://"+query
    | `ut_parse(uri)`
    | fields - ut_fragment ut_netloc ut_params ut_path ut_port ut_query ut_scheme
    | lookup cim_corporate_email_domain_lookup domain as ut_domain OUTPUT domain as cim_email_domain
    | lookup cim_corporate_web_domain_lookup domain as ut_domain OUTPUT domain as cim_web_domain
    | where isnull(cim_email_domain) AND isnull(cim_web_domain)
    | fields - cim_email_domain cim_web_domain
  • The next easy win is to remove all queries for our own assets. We do that by kicking out all queries where the DNS name matches one of our assets, or where the resulting IP is one of our assets.
  • tag=dns tag=resolution tag=dns NOT source="stream:Splunk_*" index=* (query_type=A OR query_type=AAAA)
    | eval uri="dnsquery://"+query
    | `ut_parse(uri)`
    | fields - ut_fragment ut_netloc ut_params ut_path ut_port ut_query ut_scheme
    | lookup cim_corporate_email_domain_lookup domain as ut_domain OUTPUT domain as cim_email_domain
    | lookup cim_corporate_web_domain_lookup domain as ut_domain OUTPUT domain as cim_web_domain
    | where isnull(cim_email_domain) AND isnull(cim_web_domain)
    | fields - cim_email_domain cim_web_domain
    | lookup asset_lookup_by_str dns as query OUTPUTNEW asset_id as query_asset_id
    | lookup asset_lookup_by_cidr ip as host_addr OUTPUTNEW asset_id as host_addr_asset_id
    | where isnull(query_asset_id) AND isnull(host_addr_asset_id)
    | fields - query_asset_id host_addr_asset_id
  • Next up is to remove all queries for Alexa Top 1M domains. Why? In the Top 1M we will probably not find any new domains, or any domains being used for C2 over a DNS channel. That's not to say an XML file on Dropbox or FeedBurner can't be used, but we won't find that threat here. This further reduced my data set by 92%.
  • tag=dns tag=resolution tag=dns NOT source="stream:Splunk_*" index=* (query_type=A OR query_type=AAAA)
    | eval uri="dnsquery://"+query
    | `ut_parse(uri)`
    | fields - ut_fragment ut_netloc ut_params ut_path ut_port ut_query ut_scheme
    | lookup cim_corporate_email_domain_lookup domain as ut_domain OUTPUT domain as cim_email_domain
    | lookup cim_corporate_web_domain_lookup domain as ut_domain OUTPUT domain as cim_web_domain
    | where isnull(cim_email_domain) AND isnull(cim_web_domain)
    | fields - cim_email_domain cim_web_domain
    | lookup asset_lookup_by_str dns as query OUTPUTNEW asset_id as query_asset_id
    | lookup asset_lookup_by_cidr ip as host_addr OUTPUTNEW asset_id as host_addr_asset_id
    | where isnull(query_asset_id) AND isnull(host_addr_asset_id)
    | fields - query_asset_id host_addr_asset_id
    | lookup alexa_lookup_by_str domain as ut_domain OUTPUTNEW rank as alexa_rank
    | where isnull(alexa_rank)
  • Down from 24K to under 1,700, but that's still a lot. At this point I noticed a couple of things: queries for .local domains that I can't explain but know are not malicious, bare host names (no period), and a couple of devices servicing DNS from the guest wifi. Identify those points and update the search to remove them (a sketch of the exclusions follows this item). This leaves me with 216 domains to investigate. But we can tune this even further, so let's keep going.
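    A minimal sketch of those exclusions: the query="*.*" and NOT query="*.local" terms (which also appear in the final search below) drop bare host names and .local lookups, while the src_ip exclusion is an assumption with placeholder addresses standing in for the guest-wifi DNS servers; substitute whatever you identified in your own data. The rest of the pipeline stays unchanged.

      tag=dns tag=resolution tag=dns NOT source="stream:Splunk_*" index=* (query_type=A OR query_type=AAAA)
        query="*.*" NOT query="*.local"
        NOT (src_ip=10.99.0.20 OR src_ip=10.99.0.21)
      | eval uri="dnsquery://"+query
      ...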
  • CDN networks can host malicious content; however, DNS analysis is again not the way to find such threats. Excluding them takes me down to 173 domains.
    • Create a new lookup, Splunk_SA_CIM/lookups/custom_cim_cdn_domains.csv (a sample layout is shown after the search below). You may find new CDN domains and need to update this list over time.
    • Upload the file as custom_cim_cdn_domains.csv.
    • Add a new lookup definition via Splunk_SA_CIM/local/transforms.conf:

      [custom_cim_cdn_domain_lookup]
      filename    = custom_cim_cdn_domains.csv
      match_type  = WILDCARD(domain)
      max_matches = 1
    • Update with a new search to exclude known CDN domains
    • tag=dns tag=resolution tag=dns NOT source="stream:Splunk_*" index=* (query_type=A OR query_type=AAAA)
      query="*.*" NOT query="*.local"
      | eval uri="dnsquery://"+query
      | `ut_parse(uri)`
      | fields - ut_fragment ut_netloc ut_params ut_path ut_port ut_query ut_scheme
      | lookup cim_corporate_email_domain_lookup domain as ut_domain OUTPUT domain as cim_email_domain
      | lookup cim_corporate_web_domain_lookup domain as ut_domain OUTPUT domain as cim_web_domain
      | where isnull(cim_email_domain) AND isnull(cim_web_domain)
      | fields - cim_email_domain cim_web_domain
      | lookup asset_lookup_by_str dns as query OUTPUTNEW asset_id as query_asset_id
      | lookup asset_lookup_by_cidr ip as host_addr OUTPUTNEW asset_id as host_addr_asset_id
      | where isnull(query_asset_id) AND isnull(host_addr_asset_id)
      | fields - query_asset_id host_addr_asset_id
      | lookup alexa_lookup_by_str domain as ut_domain OUTPUTNEW rank as alexa_rank
      | where isnull(alexa_rank)
      | lookup custom_cim_cdn_domain_lookup domain as query OUTPUTNEW is_cdn
      | where isnull(is_cdn)
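    The CDN lookup file only needs the domain column the transforms stanza matches on, plus an is_cdn flag for the final where clause to test. A minimal sketch with a few illustrative entries (nowhere near a complete CDN list):

      domain,is_cdn
      *.akamaiedge.net,true
      *.cloudfront.net,true
      *.edgesuite.net,true
      *.fastly.net,true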

       

  • Optional step: if you have the DomainTools integration enabled (whois), the following lines added to your search will show when the domain was first seen by you and when it was registered.
  • | rename ut_domain as domain
    | `get_whois`
    | eval “Age (days)”=ceil((now()-newly_seen)/86400)

  • Many people have written on what to do with this data. Now, go hunting!
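    As one starting point (a sketch, assuming URL Toolbox's ut_shannon macro is available, just as ut_parse was above, and that src_ip carries the client address in your stream:dns events), you can rank the remaining domains by query volume, client spread, and name entropy; long, high-entropy names queried by only a few hosts deserve a closer look. Append something like this to the filtered search above:

      | `ut_shannon(ut_domain)`
      | stats count dc(src_ip) as clients avg(ut_shannon) as entropy by ut_domain
      | sort - entropy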

 

Get started with the Splunk App for Stream 6.4 for DNS Analysis

Passive DNS analysis is all the rage right now, and the detection opportunities it presents have been well discussed for some time. If your organization is like most, you are now being asked how you can implement these detection strategies. Leveraging your existing Splunk investment, you can get started very quickly with less change to your organization than one might think. Here is what we will use; older versions will work fine, however the screenshots will be a bit off:

  •  Splunk Enterprise 6.3.1
  • Splunk App for Stream 6.4

We will assume Splunk Enterprise 6.3.1 has already been installed.

Decide where to install your Stream app. Typically this will be the Enterprise Security (ES) search head. However, if your ES search head is part of a search head cluster, you will need to use an ad-hoc search head, a dedicated search head, or a deployment server.

Note: If using the deployment server (DS), you must configure that server to search the indexer or indexer cluster containing your stream data.

  1. Install the Splunk App for Stream using the standard procedures in the Splunk documentation.
  2. If you installed on a search head, copy the deployment TA to your deployment server: /opt/splunk/etc/deployment-apps/Splunk_TA_stream
  3. On your deployment server, create a new folder to contain the configuration for your stream DNS server group.
    • mkdir -p Splunk_TA_stream_infra_dns/local
  4. Copy the inputs.conf from the default TA to the new TA for group management
    • cp Splunk_TA_stream/local/inputs.conf Splunk_TA_stream_infra_dns/local/
  5. Update the inputs.conf to include your forwarder group id
    • vi Splunk_TA_stream_infra_dns/local/inputs.conf
    • Alter "stream_forwarder_id =" to "stream_forwarder_id = infra_dns"
  6. Create a new server class "infra_stream_dns" that includes both of the following apps, and deploy it to all DNS servers (Windows DNS or BIND); a sample serverclass.conf sketch follows this list.
    • Splunk_TA_stream
    • Splunk_TA_stream_infra_dns
  7. Reload your deployment server
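If you manage the deployment server through configuration files rather than the Forwarder Management UI, a minimal serverclass.conf sketch for steps 6 and 7 might look like the following; the whitelist entries are placeholders for your own DNS server host names:

  [serverClass:infra_stream_dns]
  whitelist.0 = dns01*
  whitelist.1 = dns02*

  [serverClass:infra_stream_dns:app:Splunk_TA_stream]
  restartSplunkd = true

  [serverClass:infra_stream_dns:app:Splunk_TA_stream_infra_dns]
  restartSplunkd = true

The reload in step 7 can then be done with ./splunk reload deploy-server.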

Excellent! At this point the Splunk Stream TA will be deployed to all of your DNS servers and sit idle. The next few steps will prepare the environment to start collection.

  • Create a new index. I typically create stream_dns and set up retention for 30 days (an indexes.conf sketch follows).
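A minimal indexes.conf sketch for the indexers, assuming the index name stream_dns and 30-day retention (2,592,000 seconds); the paths shown are the usual defaults and should be adjusted for your environment:

  [stream_dns]
  homePath   = $SPLUNK_DB/stream_dns/db
  coldPath   = $SPLUNK_DB/stream_dns/colddb
  thawedPath = $SPLUNK_DB/stream_dns/thaweddb
  frozenTimePeriodInSecs = 2592000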

Configure your deployment group

  1. Login to the search head with the Splunk App for Stream
  2. Navigate to Splunk App for Stream
  3. If this is your first time, you may find you need to complete the welcome wizard.
  4. Click on Configuration, then "Distributed Forwarder Management"
    • (screenshot: stream_configure_dfm)
  5. Click Create New Group, fill it in as follows, then click Next:
    1. Name: Infra_DNS
    2. Description: Applied to All DNS servers
    3. Include Ephemeral Streams?: No
  6. Enter "infra_dns" as the group ID; this ensures all clients deployed above will pick up this configuration from the Stream app.
  7. Search for "Splunk_DNS", select each match, then click Finish
    • (screenshot: stream_dns_aggs)
  8. Click on Configuration, then "Configure Streams"
    • (screenshot: stream_configure)
  1. Click on New Stream
  2. Set up the basic info as follows, then click Next:
    1. Protocol: DNS
    2. Name: "Infra_DNS"
    3. Description: "Capture DNS on internal DNS servers"
    4. (screenshot: stream_configure_dns)
  3. We will not use aggregation, so leave this as "No" and click Next
  4. The default fields will meet our needs so go ahead and click Next
  5. Optional step: create filters. In most cases, requests from the DNS server to the outside are not interesting, as they are generated from client requests that cannot be answered from the cache. Creating filters will reduce the total volume of data by approximately 50%.
    1. Click create filter
    2. Select src_ip as the field
    3. Select "Not Regular Expression" as the type
    4. Provide a regex that will match all of your DNS server IPs; for example, "(172\.16\.0\.(19|20|21))" matches the servers in my lab network.
      • (screenshot: stream_filter)
    5. Click next
    6. Select only the Infra_DNS group and click Create Stream

At this point Stream will deploy and begin collection; however, index selection is not permitted in this workflow, so we need to go back and set it now.

  1. Find Infra_DNS and click edit
  2. Select the index appropriate for your environment
  3. Click save

Ready to check your work? Run this search, replacing index=* with your index:

index=* sourcetype=stream:dns | stats count by query | sort - count

 

Getting all the logs – Avoiding the WEC

I get asked about this one often, and I happen to have a bit of experience with it, which is rare; there is scant documentation on the technology from Microsoft or anyone else. I do know of some success being had with very specific, low-volume use cases, but that's not what I do. I'm a specialist of sorts: I walk off a Delta plane, drop my bag at a Marriott, then walk in to change someone's world with data. Actual facts about their environment, from their environment, and I need and use data my customers don't know they had. Which brings me to Windows Event Collection (WEC).

Customers ask me about it because it seems so easy. Let's talk about the parts:

  • Group Policy, used to make changes to all systems in an environment
  • Remote PowerShell
  • COM/DCOM/COM+ and all of the RPC that goes with it
  • Kerberos authentication

How does it work?

  1. Group Policy instructs the computer to connect to a collector and gather a policy
  2. Reading the policy causes a COM+ server to read the event log (yes, this is code you have not been running before; it can and will impact your endpoints)
  3. A local filter determines what to do with each event (XML parsing with XPath and XSLT)
  4. An RPC call is made to the collector using the computer account
  5. Denial (authentication required)
  6. Authentication (event log write on the DC and on the collector)
  7. Serial write, with sync and blocking, to a round-robin database on the server; so if 300 events come in, they have to queue to go to disk
  8. Close the connection
  9. At the next poll period, go back to step 3

Lots of steps? Let's ask about the failure modes:

  • What happens if my collector is down?
    • Answer: the client goes to sleep and retries; hope your logs don't wrap
  • What happens if my collector won't come back up?
    • Answer: build a new one, open a change record, wait for approval, and explain to audit why you don't have logs
  • What happens to the format of the logs?
    • Answer: good question; I can't explain what Microsoft is doing to these logs. If you know, please share
  • What about log rotation and archival?
    • Answer: not possible; you need another tool to read them back and store them someplace (Splunk)
  • My collector isn't keeping up, what do I do now?
    • Answer: hopefully the OU structure of your domain will support creating an assignment policy at the OU level. You might be able to use the same policy/collector pair at multiple OU points, but you might also need to break up the OUs to manage the policy
  • Cross domain?
    • Answer: one or more collectors per domain
  • Wait, I only want events XX and ZZYY from certain servers for compliance.
    • Answer: you get another collection policy
  • I can't make this work on server2134
    • Answer: call Microsoft Support, explain what event collection is, and hopefully convince that person it is supported
  • My sensitive "application/service log" doesn't use the event log
    • Answer: a log file? This is Windows; who would do that?

Let's compare to universal forwarders with Splunk:

  • What happens if my "indexer" is down?
    • Answer: the client connects to another indexer; in a production system the data itself is replicated and you retain access to all of it
  • What happens if my collector won't come back up?
    • Answer: data is replicated and still available
  • What happens to the format of the logs?
    • Answer: we capture the original text of all logs
  • What about log rotation and archival?
    • Answer: built in
  • My collector isn't keeping up, what do I do now?
    • Answer: horizontal scaling; Splunk will help you plan for this with experience and performance data from real-world implementations
  • Cross domain?
    • Answer: certainly. WAN? No issue. Cloud? Not a problem. VPN? Sure, why not
  • Wait, I only want events XX and ZZYY from certain servers for compliance.
    • Answer: the deployment server will push a configuration based on the server names you select
  • I can't make this work on server2134
    • Answer: call Splunk Support (paid); we have real people with real knowledge, and a great community that has probably solved that problem before
  • My sensitive system doesn't use the event log; it writes to a file
    • Answer: probably not a problem. Files, databases, and network captures can all be data sources; we do this all the time