Standard Deviation and Simple Time Series Outlier Detection Using Table Calculations

I recently used this technique as a quick and dirty way to find anomalies in time series data and thought I would share.

Let’s say you have simple time series data, event counts grouped by day:

SELECT 
	TO_CHAR(DATE_TRUNC('week', updates.date),'YYYY-MM-DD') AS "updates.date_week",
	COUNT(*) AS "updates.count"
FROM public.updates AS updates

To find big ‘jumps’ in this data, you could use the 3 sigma rule. Basically any observations that fall outside of three standard deviations from the mean is considered an outlier.

mu  = mean of the data
std = standard deviation of the data
IF abs(x-mu) > 3*std  THEN  x is outlier

To model this in a Look, I used table calculations. The first thing we need is the Standard Deviation of the count field. This is calculated by taking the square root of the average of the squared deviations of the values from their mean value.

This is pretty easy to define as a table calculation:

round(sqrt(mean(power(${updates.count} - mean(${updates.count}), 2))),2)

Once we have that, we can pretty easily define our 3-sigma rule:

if(abs(${updates.count} - mean(${updates.count})) >  ${standard_deviation} * 3, "Yes", "No")  

And that’s all you need to detect outliers!

6 3 13.4K
3 REPLIES 3
Top Labels in this Space