Over the past month or so, we’ve slowly broadened our use of the Google Security Operations (SecOps) rule engine to build composite rules. Today’s blog applies the concepts we’ve covered to date, but now we are going to leverage risk calculations from the underlying rules and their detections to generate an alert focused on a user or asset’s activities.
Let’s start with the risk score. Long-time readers of this blog series may recall that we have discussed risk scores in the past. A risk score is set in the outcome section of a YARA-L rule and can be combined with conditional statements to generate a value. We then extended the use of risk scores to Risk Analytics within Google SecOps.
What you may not know about risk scores is that because rules can be created without one, detections don’t always carry a risk value, which makes risk difficult to quantify. So, to make risk scoring usable across detections and alerts, we’ve added a default risk score. This default is overridden by whatever risk value the detection engineer writing the rule provides. But if no risk value exists in the rule, we assign detections a value of 15 and alerts a value of 40.
This means that the risk score can reside in two different places in the detections dataset: as an integer field in the collections endpoint with the field name detection.detection.risk_score, and as a key/value pair in detection.detection.outcomes["risk_score"]. It will always be populated in the former, and if the detection engineer overrides the default value by setting it in the outcome section of the rule, it will also appear in the latter.
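To make that concrete, here is a minimal, hypothetical producer rule that sets $risk_score explicitly in its outcome section. The rule name, event criteria, and values are all invented for illustration; the point is that because $risk_score is set, the value appears both in detection.detection.risk_score and in detection.detection.outcomes["risk_score"], overriding the defaults described above.
rule producer_example_failed_login_burst {
  meta:
    author = "example"
    description = "Hypothetical producer rule that sets an explicit risk score"
  events:
    $e.metadata.event_type = "USER_LOGIN"
    $e.security_result.action = "BLOCK"
    $e.principal.user.userid = $user
  match:
    $user over 1h
  outcome:
    // Explicit risk score: overrides the defaults of 15 (detection) / 40 (alert)
    // and also populates detection.detection.outcomes["risk_score"]
    $risk_score = max(if($e.principal.user.userid = "admin", 80, 50))
    // Outcome variable that a composite rule can reference later
    $principal_user_userid = array_distinct($e.principal.user.userid)
  condition:
    #e > 5
}
If the $risk_score line above were removed, detection.detection.outcomes["risk_score"] would be null and the detection would instead fall back to the default of 15 (or 40 if alerting were enabled).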
Because the score can live in both places, we have some decisions to make when writing our composite rules, but we will come back to that in a bit. Let’s start by taking a look at a query, executed in dashboards, to see what information we have in the detections dataset as it pertains to risk scores.
$rule_name = detection.detection.rule_name
detection.detection.rule_name = /(^composite|^producer)_/
$detection_time = timestamp.get_timestamp(detection.created_time.seconds)
match:
  $detection_time, $rule_name, detection.detection.rule_id
outcome:
  $key_value_risk_score = array_distinct(detection.detection.outcomes["risk_score"]) // As a key/value pair string
  $risk_score = max(detection.detection.risk_score) // As a field represented as an integer
limit: 10
To narrow my detections down a bit, I’m going to filter on the rule name. The rules I am returning start with either the string composite_ or producer_. I’ve added a date and the rule ID as well, but the two columns on the far right are the ones I’d like to call your attention to. These are the two methods that return a risk score in the detections dataset. Notice how some rows have the same value for both variables, while others only have a value in the $risk_score variable, the one represented by the integer. That one will always have a value: the default of 15 for detections, 40 for alerts, or the value the detection engineer specifies in the outcome variable.
Let’s validate this by right-clicking on the first entry in the list, producer_recon_environment_enumeration_active_directory_cisa_report, and viewing the detections. Here we can add the same two columns to this view, but let’s actually look at the underlying rule by clicking Rule Options - Edit Rule.
In the rule editor, we can see the entire rule. Notice in the outcome section that there is no variable named $risk_score. Because of this, the value in detection.detection.outcomes["risk_score"] is null. Also note that the Alerting option is off for this rule, which means the rule creates a detection rather than an alert. Therefore, detection.detection.risk_score defaults to 15.
With that background in place, let’s apply this to a rule. For this use case, we are going to take into account all of our rules to calculate the risk for a specific user over an entire 24-hour window. Because we are taking into account all rules, we will have a number of detections with various risk scores. And because we are adding up risk scores and want to include every detection, even the ones the detection engineer didn’t assign a risk score to, we are going to use the field detection.detection.risk_score. Additionally, because we are assessing risk by user, we are going to use a match variable that aligns to user values.
In the events section, we have a single event variable because all the detections share the same criteria. If we wanted to refine which detections feed this risk calculation, we could add additional criteria. Some of the techniques we’ve discussed in previous blogs appear in the events section but are commented out.
Notice in the events section that we are referencing the outcome variables of our detections to get the value stored in the variable principal_user_userid. We store that value in the placeholder variable $user and then aggregate over the 24-hour window by using this variable in the match section. We also excluded the user SYSTEM to remove some extraneous noise.
rule composite_cumulative_risk_score_threshold_exceeded_user {
  meta:
    author = "Google Cloud Security"
    description = "Detects a userid that exceeds a risk score threshold based on detections/alerts from all rules"
    severity = "High"
    priority = "High"
    type = "composite"
  events:
    $detect_prod.detection.detection.outcomes["principal_user_userid"] = $user
    $detect_prod.detection.detection.outcomes["principal_user_userid"] != "SYSTEM"
    //$detect_prod.detection.detection.rule_name != /^producer_/
    //$detect_prod.detection.detection.alert_state = "NOT_ALERTING"
    //$detect_prod.detection.detection.rule_labels["type"] != "composite"
  match:
    $user over 24h
  outcome:
    $risk_score = 60
    $uniq_detection_count = count_distinct($detect_prod.detection.detection.rule_id)
    $total_detection_count = count($detect_prod.detection.detection.rule_id)
    $rules_triggered = array_distinct($detect_prod.detection.detection.rule_name)
    $cumulative_risk_score = sum($detect_prod.detection.detection.risk_score) // sum of the integer risk score field, measured against the threshold
    $key_value_risk_score = sum(cast.as_float($detect_prod.detection.detection.outcomes["risk_score"])) // sum of the key/value risk score, included for comparison
  condition:
    $detect_prod and $cumulative_risk_score >= 60 and $total_detection_count > 3
}
The outcome section of our rule has a few variables to note. The first is $uniq_detection_count, which counts the number of distinct rules that produced detections within the window. The next is $total_detection_count, which counts the total number of detections. Both can be useful for tuning your rule. The other outcome variable of note is $cumulative_risk_score, which sums the risk scores we referenced earlier.
Finally, the condition section uses $cumulative_risk_score and $total_detection_count to set the threshold at which this rule triggers and returns what we are defining as high-risk users.
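As one illustration of that tuning, if you also wanted to require that the risk come from more than one distinct rule, rather than the same rule firing repeatedly, a variation on the condition section might look like this (the thresholds here are illustrative, not a recommendation):
  condition:
    $detect_prod and $cumulative_risk_score >= 60 and $uniq_detection_count >= 2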
When we test the rule, we get two detections; one of them is related to the user account tim.smith_admin. Expanding the test detection, we can see that both alerts and detections make up this broader composite detection. We can also see the different risk and detection calculations from the rule above.
The two variables that I want to look at closely are $cumulative_risk_score and $key_value_risk_score. The $cumulative_risk_score is a sum of the integer field detection.detection.risk_score and comes to 460. The $key_value_risk_score uses the outcome variable $risk_score from the underlying detections. Because this is a key/value pair stored as a string, we need to convert it to a numeric value with the cast.as_float function before applying the sum aggregation. The other important thing to note is that this sum is 415. The offset of 45 between the two values is likely because a few of the underlying detections have a null $risk_score outcome: they contribute nothing to the key/value sum but still add their default of 15 to the integer-based sum.
I’ve spent a lot of time reinforcing where the risk score resides within the detections dataset and how to calculate it. I hope that showing the similarities and differences between these fields will be helpful as you create your own rules using risk scores. The actual composite detection we developed was very straightforward, and similar rules for hostnames can be just as simple, as the sketch below illustrates. Once you have a working rule, it comes down to tuning it with calculations like unique detections, total detections, and risk score. That risk score, however, can vary considerably based on your approach to risk scores in the underlying detections, which is why it has been such a point of emphasis today!
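To close, here is a minimal sketch of that hostname-based variant. It mirrors the user-based rule above, swapping the match variable to a hostname, and assumes your underlying rules write a principal_hostname outcome variable; the thresholds are carried over from the user rule purely for illustration.
rule composite_cumulative_risk_score_threshold_exceeded_host {
  meta:
    description = "Sketch: detects a hostname that exceeds a cumulative risk score threshold based on detections/alerts from all rules"
    type = "composite"
  events:
    // Assumes the underlying rules expose a principal_hostname outcome variable
    $detect_prod.detection.detection.outcomes["principal_hostname"] = $hostname
    $detect_prod.detection.detection.outcomes["principal_hostname"] != ""
  match:
    $hostname over 24h
  outcome:
    $risk_score = 60
    $total_detection_count = count($detect_prod.detection.detection.rule_id)
    $rules_triggered = array_distinct($detect_prod.detection.detection.rule_name)
    $cumulative_risk_score = sum($detect_prod.detection.detection.risk_score)
  condition:
    $detect_prod and $cumulative_risk_score >= 60 and $total_detection_count > 3
}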