No Regrets Using Autoregress

This blog post is part 26 of the “Hunting with Splunk: The Basics” series, which takes a single Splunk search command or hunting concept and breaks it down to its basic parts.

If you’re like me, you’ve occasionally found yourself staring at the Splunk search bar trying to decide how best to analyze a series of data, iterating against one or more fields.

If your brain gravitates towards traditional programming syntax, the first thing that pops into your mind may be a for or while loop (neither of which exists in its traditional form in SPL). With commands like stats, streamstats, eventstats, or foreach at your disposal, which one should a hunter use?

Well, it depends on the data and the required outcome. For example, let’s say we want to calculate the total distance travelled by a salesperson or an escaped toad. The data may contain waypoint information that requires iterative calculation, such as latitude and longitude (or, in some cases, this enrichment may be derived from the source data, such as with the iplocation command).

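Enter autoregress. Sounds fancy. But here’s the thing: the autoregress command doesn’t do the math for you. It prepares your events for calculating an autoregression, or moving average, by copying one or more previous values of a field into each event. See the Splunk docs description of the autoregress command. Go ahead and check it out; we’ll wait.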

Finished? Awesome. Let’s talk about practical applications.

Because the autoregress command is a centralized streaming command, it applies a transformation to each event returned by a search, and it runs only on the search head.
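To make that concrete, here is a minimal sketch you could paste into a search bar. It makes no assumptions about your data: makeresults and streamstats generate five throwaway events with a hypothetical `value` field, and autoregress then copies the previous two values into each event. When you give p a range and no AS clause, the copies land in fields named after the source field, here `value_p1` and `value_p2`. The `moving_avg` field is just an illustrative name.

```spl
| makeresults count=5
| streamstats count AS n
| eval value = n * 10
| autoregress value p=1-2
| eval moving_avg = (value + value_p1 + value_p2) / 3
| table n value value_p1 value_p2 moving_avg
```

The first two events have no earlier values to copy, so their `moving_avg` should come back empty. That is the main thing to remember: autoregress only hands you the previous values, and the averaging (or any other math) is up to a later eval.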

You might be saying to yourself, “Self, I’ve never heard of this command before.” Well, you’re not alone. It’s not new, but not particularly well known. Kyle Smith of Aplura, LLC, included autoregress in his .conf2016 talk, “Lesser Known Search Commands”. Unlike iterative commands, such as map or foreach, the autoregress command is a statistical command (in the same family as the widely used stats and tstats commands).

Kyle expands on the definition as “a Moving Average is a succession of averages calculated from successive events (typically of constant size and overlapping) of a series of values” and notes the following:

Let’s say we’re planning a road trip to visit some of the top craft breweries in the Mid-Atlantic United States, and we’ve fed the waypoint data into Splunk. We want to compute the distance between waypoints and the total distance we’re traveling (so we know how much fuel to put into our personal jetpack). We apply autoregress to both latitude and longitude so that each event carries the coordinates of the previous waypoint, then perform any further applicable calculations, such as `globedistance()` or streamstats.

Once you’ve pulled the relevant fields, your command may look something like this:

… | autoregress lat AS prev_lat
  | autoregress lon AS prev_lon
  | `globedistance(lat,lon,prev_lat,prev_lon,units)`
  | streamstats sum(distance) AS totaldistance
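The `globedistance` macro isn’t defined in this post, so if you don’t have one handy, the sketch below swaps in a plain eval version of the haversine formula. Treat it as one possible stand-in, not the macro’s actual definition. The assumptions: three made-up waypoints (rough coordinates for Washington DC, Baltimore, and Philadelphia), a hypothetical `stop` field to number them, an Earth radius of 6371 so that `distance` comes out in kilometers, and the intermediate field names `dlat`, `dlon`, and `a`.

```spl
| makeresults count=3
| streamstats count AS stop
| eval lat = case(stop=1, 38.91, stop=2, 39.29, stop=3, 39.95)
| eval lon = case(stop=1, -77.04, stop=2, -76.61, stop=3, -75.17)
| autoregress lat AS prev_lat
| autoregress lon AS prev_lon
| eval dlat = (lat - prev_lat) * pi() / 180, dlon = (lon - prev_lon) * pi() / 180
| eval a = pow(sin(dlat / 2), 2) + cos(prev_lat * pi() / 180) * cos(lat * pi() / 180) * pow(sin(dlon / 2), 2)
| eval distance = 2 * 6371 * asin(sqrt(a))
| streamstats sum(distance) AS totaldistance
| table stop lat lon prev_lat prev_lon distance totaldistance
```

The first waypoint has no previous coordinates, so it gets no `distance`; every later event carries the leg distance from the stop before it, and streamstats keeps the running total.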

Here’s an example:

As shown above, the autoregress command may help you gather the information where commands like stats, streamstats, eventstats, or foreach alone aren’t necessarily suitable. If you’re like me, you should have no regrets adding the autoregress command to your SPL utility belt.

We invite you to join us for the Sixth Annual Boss of the SOC premiering at .conf21, where you’ll have the chance to buckle up and flex your Splunk super powers.

Happy hunting!
