The SOC Metrics that Matter…or Do They?


Measuring the SOC: What Counts and What Doesn't in 2025?

We all know security operations is notoriously difficult. Many organizations struggle to achieve their expected security outcomes, especially across the threat detection, investigation, and response workflow. It is one thing to want a SOC, but it is very different to have a well-running, operationally effective Security Operations Center.

But why is that? We've got the standard answers, the ones we all know and love, such as the lack of data and context, limited attacker insight, and manual, complex processes. But there's an additional challenge: the ability to use metrics to demonstrate the value that security teams bring to the larger organization.

In our upcoming webinar, The SOC Metrics that Matter…or Do They?, guest speaker Allie Mellen, principal analyst at Forrester, and Anton Chuvakin, security advisor in the Office of the CISO at Google Cloud, dig into what to consider when it comes to SOC metrics in 2025.

The Paradox of Metrics

Why, after all these years, do we still struggle with this? It's time to cut through the noise and find the metrics that illuminate, not just inundate.

Looking back at the archives of SOC metrics, it's safe to say that some of the things people wanted to measure circa 2005 are suspiciously similar to what they want to measure today: number of events, various false positive ratios, and so on. Yet the issue remains contentious. So there's a bit of a paradox here. Why haven't we arrived at accepted, standard SOC metrics in almost 20 years?

In theory, if you build a SOC following the traditional SOC blueprint, you should know what to measure. But that is often not the case, and there is a lot of debate on the topic. So what's going on? Why can't we just get it done?

It's a surprisingly difficult problem to solve. There are many different metrics you can track and many different ways to track them. And there are a few reasons why it's so difficult. There was once a SOC that heavily optimized alert handling speed, yet later proved helpless in the face of a real attack. The metrics they focused on made them better … but only until the attacker showed up.

First, every organization is different. As a result, you run into many situations where an organization doesn't want to, or can't, track particular metrics. A SOC built to withstand top-tier threats at a major defense contractor may care about (and thus measure) very different things than a SOC for a chain of rural hospitals. This means cookie-cutter metrics don't work.

Second, tool-centric metrics won't do it. Many of the dashboards pre-built into tools are not designed to show the value of the security operations function; they are designed to show the value of the tool that security operations is using. That is useful for the vendor, but not very useful for the team using the tool and trying to prove its own value. So use the tools, but step away from metrics that showcase the tool's performance rather than the SOC's.

Third, workflow is chaotic at many SOCs. What the security operations team is responsible for, and how effectively it can deliver, has changed a lot. A big part of the issue is that tracking metrics appropriately requires standardized workflows, and the truth is, many organizations have neither standardized workflows nor standardized processes. This is one of the reasons SOAR has been so difficult for so many teams: you can't automate what you don't do consistently. We've seen SOCs where almost every analyst had their own way of documenting activities, leading to inconsistent data and near-impossible metrics. A sketch of what standardization can look like follows below.
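To illustrate the point, here is a minimal sketch in Python of a standardized triage record; the field names and outcome vocabulary are illustrative assumptions, not a prescribed schema. The idea is simply that every analyst fills in the same fields for every alert, which is what makes the metrics computable later.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

class Disposition(Enum):
    TRUE_POSITIVE = "true_positive"
    FALSE_POSITIVE = "false_positive"
    BENIGN_EXPECTED = "benign_expected"

@dataclass
class TriageRecord:
    """One record per alert, with the same fields no matter who handled it."""
    alert_id: str
    detected_at: datetime     # when the detection fired
    triaged_at: datetime      # when an analyst picked it up
    closed_at: datetime       # when triage concluded
    disposition: Disposition  # outcome from a shared vocabulary
    escalated: bool           # did this become an incident?

def time_to_triage_minutes(r: TriageRecord) -> float:
    """A metric you can only trust if every record is filled in consistently."""
    return (r.triaged_at - r.detected_at).total_seconds() / 60
```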

In another example, you could focus on getting things done quickly, like a push to decrease MTTD (mean time to detect). But what does that mean? Clicking OK on every alert is very fast, which is one obviously unproductive way to do it. Writing a script that clicks the button automatically in sub-millisecond time is even faster. But that's not the point. So sometimes time metrics have their own paradox: they're good, unless you use them as goals, and then they're really bad. Site Reliability Engineers (SREs) talk about Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and then remind us that not every SLI needs to become an SLO.
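To make the SLI/SLO distinction concrete, here is a minimal sketch; the timestamps and the 15-minute objective are illustrative assumptions, not recommendations.

```python
from datetime import datetime

# Illustrative (detected_at, occurred_at) pairs; in practice these would
# come from your SIEM or case management system.
incidents = [
    (datetime(2025, 1, 1, 10, 0), datetime(2025, 1, 1, 9, 52)),
    (datetime(2025, 1, 2, 14, 30), datetime(2025, 1, 2, 14, 1)),
    (datetime(2025, 1, 3, 8, 15), datetime(2025, 1, 3, 8, 10)),
]

# SLI: the measurement itself -- mean time to detect, in minutes.
detect_minutes = [(d - o).total_seconds() / 60 for d, o in incidents]
mttd = sum(detect_minutes) / len(detect_minutes)

# SLO: a target set on top of the SLI. Not every SLI needs one; a target
# optimized blindly (e.g., "click OK faster") distorts behavior.
SLO_MINUTES = 15  # hypothetical objective
print(f"MTTD (SLI): {mttd:.1f} min; SLO met: {mttd <= SLO_MINUTES}")
```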

The Easy Metrics vs the Metrics You Really Want

There are easy metrics, like logs collected per day or alerts delivered to each analyst. They are easy to collect, but what security outcome do they measure? Even MTTD can be easy if done naively.

For example, let's say that you have reduced mean time to detect, and now your mean time to detect is under a minute. That sounds great, until you notice that your false positive rate has climbed to 95%. That's a problem, right? It would be more practical to accept a five-minute mean time to detect with a false positive rate of 10-20% than a one-minute mean time to detect with a false positive rate of 95%. It's about effective detection, not just fast detection.
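Some back-of-the-envelope arithmetic shows why; the alert volume and per-alert triage time below are assumptions for illustration only.

```python
# Illustrative arithmetic, assuming 1,000 alerts per day and 20 minutes
# of analyst time per triaged alert; both numbers are assumptions.
ALERTS_PER_DAY = 1_000
MINUTES_PER_TRIAGE = 20

def wasted_hours(false_positive_rate: float) -> float:
    """Analyst hours per day spent triaging alerts that go nowhere."""
    return ALERTS_PER_DAY * false_positive_rate * MINUTES_PER_TRIAGE / 60

fast_noisy = wasted_hours(0.95)    # 1-minute MTTD, 95% false positives
slower_clean = wasted_hours(0.15)  # 5-minute MTTD, ~15% false positives

print(f"Fast but noisy:   {fast_noisy:.0f} analyst-hours/day wasted")
print(f"Slower but clean: {slower_clean:.0f} analyst-hours/day wasted")
# Roughly 317 wasted hours per day versus 50: the "slower" SOC wins.
```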

One question to ask is: what are you incentivizing with the metrics you're establishing? If the metrics are oversimplified, you're going to incentivize people to respond faster without necessarily focusing on completeness of response, for example. If you only reward speed, you'll get rushed, sloppy work. You must incentivize thoroughness and accuracy as well.

Another question to ask is: what am I going to do with these metrics? If the answer is that you're just reporting them up to the CISO or the board once a month or once every six months, odds are you're not going to put much work into tracking them. Metrics should drive action, process improvement, and resource allocation; everybody knows this, but few practice it.

One consideration is to leverage detection engineering. One of its benefits is that it enables you to identify and iteratively improve specific metrics far more effectively than in the past, when teams weren't focused on agile workflows and such improvement was difficult to achieve. These metrics can then be used to improve both the process and the operations the team has built.

Specifically, shift the mindset away from the tickets you're closing and toward how you're building better detections and better automation workflows. Ultimately, the better the detection and the better the automation workflow, the faster you'll be able to close those tickets.

Detection engineering isn't just about writing rules; it's about creating a feedback loop that continuously uses your metrics outputs to improve your detection posture, along with a relentless drive to automate detection and triage activities.
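As a rough illustration of that feedback loop, here is a minimal sketch that flags noisy rules for tuning; the function name, data shape, and thresholds are hypothetical, not a prescribed implementation.

```python
# A minimal sketch of a detection-engineering feedback loop, assuming
# per-rule triage outcomes are available; thresholds are illustrative.
from collections import Counter

def rules_needing_tuning(triage_outcomes, fp_threshold=0.5, min_alerts=20):
    """Yield rules whose false positive rate suggests they need work.

    triage_outcomes: iterable of (rule_id, is_false_positive) pairs.
    """
    totals, false_positives = Counter(), Counter()
    for rule_id, is_fp in triage_outcomes:
        totals[rule_id] += 1
        false_positives[rule_id] += int(is_fp)

    for rule_id, total in totals.items():
        if total < min_alerts:
            continue  # not enough signal to judge this rule yet
        fp_rate = false_positives[rule_id] / total
        if fp_rate > fp_threshold:
            yield rule_id, fp_rate  # candidate for tuning or retirement

# Usage: feed the flagged rules back into the detection backlog.
outcomes = ([("rule_7", True)] * 18 + [("rule_7", False)] * 6
            + [("rule_2", False)] * 25)
for rule, rate in rules_needing_tuning(outcomes):
    print(f"{rule}: {rate:.0%} false positives -- queue for review")
```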

Another consideration is the maturity of the team, and how to make sure the team gets metrics that support what it is doing and the system it is building. We can throw out a list of metrics that people should track, but that doesn't mean you have the resources or the capabilities to track them, or to act on them.

Mixing Them Up

At the end of the day, a key way to make metrics work is to create a mesh of interconnected metrics. A metric that isn't connected to other metrics is often meaningless, and we see this problem come up in many different areas. A $100M market cap, for example, can seem like a lot if you don't have the context of all the other businesses with much larger market caps. Context makes metrics work for you!

One way to determine what to track is to structure metrics as tactical, operational, and strategic. These are interconnected: tactical metrics roll up into operational metrics, which in turn roll up into strategic metrics. Tactical metrics focus on process improvement; operational metrics focus on things such as analyst experience; strategic metrics align to business needs, form the highest level going to the board and the CISO, and typically involve other teams, not just security operations.

What's the benefit? You can dig down and see what the strategic metrics actually mean in practice and what factors contributed to them, drilling down to the operational or tactical level. And similarly, you can roll up to the strategic level. As an example, that might show you the connection between something like revenue loss due to IP theft and incident growth relative to company growth.
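Here is a minimal sketch of such a pyramid and a drill-down over it; the metric names and their rollup relationships are illustrative assumptions, not a recommended set.

```python
# A sketch of a tactical -> operational -> strategic metrics pyramid;
# names and rollup links are illustrative assumptions.
pyramid = {
    "strategic": {
        "incident_growth_vs_company_growth": [
            "operational.mttr_hours",
            "operational.escalation_rate",
        ],
    },
    "operational": {
        "mttr_hours": ["tactical.triage_minutes", "tactical.containment_minutes"],
        "escalation_rate": ["tactical.alerts_escalated", "tactical.alerts_closed"],
    },
}

def contributors(metric: str, level: str = "strategic") -> list[str]:
    """Drill down: list the lower-level metrics feeding a given metric."""
    result = []
    for child in pyramid.get(level, {}).get(metric, []):
        child_level, child_name = child.split(".")
        result.append(child)
        result.extend(contributors(child_name, child_level))
    return result

print(contributors("incident_growth_vs_company_growth"))
# Shows which operational and tactical metrics explain a strategic number.
```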

Why We Can't Agree on Recommendations

We should have this figured out by now. But SOCs are complex beasts. Workflow dependencies, varying maturity levels, and the fundamental question of "what are we really trying to measure?" all contribute to the ongoing debate.

Differences between environments and maturity levels are, at the end of the day, the main reason there isn't a canonical list of metrics we all copy and use. There needs to be a maturity journey for metrics: your metrics will look very different depending on where you are in that journey. If you're not there yet, don't assume that a mature SOC's metrics are the ones you should be tracking, because it will be really hard to do so effectively.

Key Takeaways

  1. There's no perfect answer when it comes to which metrics to use. A metric without context is just a number.
  2. Consider the maturity level when choosing the metrics to track. Your metrics should reflect your SOC's maturity level.
  3. Build a metrics pyramid: tactical, operational, and strategic metrics must be interconnected.
  4. Shift the focus: it's not about closing tickets; it's about building better detections and using automation to drive speed, which in turn improves the metrics.

Interested in learning more? Register for the webinar, The SOC Metrics that Matter…or Do They?
