Imagine that you’ve received an alert from a detection rule that triggered on a suspicious user-agent value. Within the user-agent you see an injected JSON command and what appears to be an unfamiliar actor signature.
Digging further, you learn that the signature belongs to AcmeSec Labs, an external red team service provider engaged to perform a penetration testing assessment against your organization. You immediately stumble into a tricky situation: the injected string is a sign of malicious activity, but was the intent actually malicious? In other words, could you confidently classify this alert within the confines of the True/False Positive model?
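To make the scenario concrete, here is a minimal, purely hypothetical sketch in Python: the user-agent contents, the injected JSON command, and the AcmeSec-Labs signature format are all invented for illustration, and the check only mimics the kind of logic such a detection rule might apply.

```python
import json
import re

# Purely hypothetical user-agent from the alert: a normal-looking browser
# string with an injected JSON command and an actor signature appended.
user_agent = (
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
    '{"cmd": "whoami", "stage": "recon"} '
    'X-Signature: AcmeSec-Labs'
)

# Toy version of the detection logic: flag any user-agent that carries an
# embedded JSON object and pull the injected command out for the analyst.
match = re.search(r'\{.*\}', user_agent)
if match:
    injected = json.loads(match.group(0))
    print("Suspicious user-agent, injected command:", injected)
```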
We have come across many such scenarios in our investigations. As security teams strive for accurate metrics, that pursuit runs into ambiguity in reporting. We would like to share our annotation philosophy and show how Google Security Operations (SecOps) users can immediately put such annotations to use in their investigations.
First, we would like to establish a definitional foundation for the annotations used to triage the alert in the scenario above.
Decades of research have gone into establishing standard annotations and conventions for the security industry. For example, we use the term True Positive to confirm that a detection fired on exactly the activity it was designed to catch, and False Positive to denote that a detection did not match what the detector was expected to find. More formal definitions and types of annotations can be found here.
Furthermore, the term “threat” carries the same significance in the digital space as it does in the physical world, and properly annotating threats helps organizations protect their assets. The labels malicious and benign fall under this threat dimension, denoting an adversary’s intention to impact the organization.
Our intent is to show the usefulness of combining these annotations - true and false positive, malicious and benign - in a layered fashion to provide a deeper contextual understanding of an incident.
Google SecOps provides an interface where users can view, triage, and resolve security threats to their organization. The interface is intuitive and offers a one-stop solution for security issues. The following page provides a high-level overview of the alerts users can view for their organization -
Once an alert is selected, it can be triaged and the corresponding case can be resolved by clicking the top-right icon circled in the view below -
The user is presented with a few options, such as annotating the alert as malicious or not malicious, selecting the appropriate root cause, and then closing the case (as shown below).
As highlighted in the opening scenario, the terms used to annotate alerts become tricky depending on context. This is especially relevant when dealing with security tools that detect malicious activity from adversaries. As the number of tools scales up, so do the false positives; in most scenarios, tools flag benign activity as malicious. To avoid these false positives, security practitioners often use known names and notations (such as a red team’s signature) as filters for these tools.
So, what is the problem?
The issue is that malicious actors could be reusing these known notations and naming conventions to disguise themselves.
How can we differentiate a benign user from a malicious one in those instances? Context certainly helps. However, automated tooling has no insight into that context or the actor’s intent. This is where the value of custom annotations shines: the analyst can review previously finalized cases carrying the same tags and use a metrics-based approach to triage the active alert with more confidence, as in the sketch below.
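As a minimal sketch of that metrics-based approach (the case data, tag names, and verdict labels are hypothetical, and this is not a SecOps API), an analyst could summarize how previously finalized cases with the same tag were resolved:

```python
from collections import Counter

# Hypothetical finalized cases, each carrying custom tags and a final verdict.
closed_cases = [
    {"tags": ["acmesec-labs", "user-agent-injection"], "verdict": "TP-Testing"},
    {"tags": ["acmesec-labs"], "verdict": "TP-Testing"},
    {"tags": ["user-agent-injection"], "verdict": "TP-Malicious"},
    {"tags": ["acmesec-labs"], "verdict": "FP-Benign"},
]

def verdict_history(tag, cases):
    """Count how previously closed cases carrying this tag were resolved."""
    return Counter(case["verdict"] for case in cases if tag in case["tags"])

# For an active alert tagged "acmesec-labs", prior outcomes lean strongly
# toward authorized testing rather than a genuine intrusion.
print(verdict_history("acmesec-labs", closed_cases))
# Counter({'TP-Testing': 2, 'FP-Benign': 1})
```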
Without custom annotations, skewed metrics can arise. For instance, consider 100 alerts, of which 60 are true positives (TPs). Of the 40 false positives (FPs), 25 actually represent intended detection behavior that turned out to be non-malicious (for example, authorized testing).
Consequently, depending on whether those 25 are counted as correct detections, tool effectiveness and security posture metrics toggle between 60% (60/100) and 85% ((60 + 25)/100).
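A short calculation makes the toggle explicit, using the hypothetical counts above:

```python
total_alerts = 100
true_positives = 60          # detections confirmed as malicious
false_positives = 40         # everything else under the binary model
intended_non_malicious = 25  # of those FPs: intended detection behavior

# Binary TP/FP view: only confirmed-malicious detections count as correct.
binary_effectiveness = true_positives / total_alerts

# Layered view: intended-but-non-malicious detections also count as correct.
layered_effectiveness = (true_positives + intended_non_malicious) / total_alerts

print(f"Binary view:  {binary_effectiveness:.0%}")   # 60%
print(f"Layered view: {layered_effectiveness:.0%}")  # 85%
```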
In the example given above, one can imagine that the activity is not a malicious threat. But is it still important to detect, even at the cost of security engineer cycles? Binary answers to such questions ultimately come down to the organization’s risk tolerance. The outcome can be severe if these alerts are silenced, either by marking them as false positives or by using known criteria (such as names or traffic origins) to filter them out.
Rather than relying only on standard notations like True Positive (TP), consider adopting a notation like TP-Testing. This approach confirms the efficacy of security tools, ensures that engineers still receive the alerts, and can potentially expose malicious insiders (if any).
Example annotations highlighting intent versus outcome
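As one possible way to express such layered annotations in a tool or report (a sketch; only TP-Testing comes from the discussion above, and the remaining outcome/intent combinations are illustrative assumptions):

```python
from dataclasses import dataclass

OUTCOMES = {"TP", "FP"}                       # did the detector behave as designed?
INTENTS = {"Malicious", "Testing", "Benign"}  # what was the actor's intent?

@dataclass(frozen=True)
class Annotation:
    outcome: str  # e.g. "TP"
    intent: str   # e.g. "Testing"

    def __post_init__(self):
        assert self.outcome in OUTCOMES and self.intent in INTENTS

    def label(self) -> str:
        return f"{self.outcome}-{self.intent}"

# The alert from the opening scenario: the rule fired as designed (TP),
# but the actor was an authorized red team (Testing).
red_team_alert = Annotation(outcome="TP", intent="Testing")
print(red_team_alert.label())  # TP-Testing
```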
The Google SecOps interface provides the ability for users to define and apply custom tags that can be used to triage alerts and annotate the cases accordingly. The following views provide a high-level workflow that can be adopted -
Once appropriate tags and notations are defined and applied, users can extract the corresponding metrics and take appropriate action. For example, a large volume of benign/internal alerts could indicate malicious insider activity.
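As a minimal sketch of that kind of per-tag metric extraction (the annotation labels, counts, and threshold are hypothetical and not drawn from the SecOps product):

```python
from collections import Counter

# Hypothetical annotations applied at case closure during one reporting period.
resolved_this_period = (
    ["TP-Malicious"] * 10 +
    ["TP-Testing"] * 15 +
    ["FP-Benign-Internal"] * 40 +
    ["FP-Noise"] * 5
)

counts = Counter(resolved_this_period)
for label, count in counts.most_common():
    print(f"{label:20s} {count}")

# A sudden jump in benign/internal annotations versus a prior baseline is
# worth a closer look, since insiders may be hiding behind trusted labels.
baseline_internal = 12  # hypothetical count from the previous period
if counts["FP-Benign-Internal"] > 2 * baseline_internal:
    print("Unusual volume of benign/internal alerts this period; review them.")
```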
Based on the discussion above and our observations, the following recommendations highlight best practices for security analysts triaging an alert -