New to Google SecOps: Building a Rule Using Match and Outcome Variables

jstoner

As we continue our mini-series on building composite rules, we are going to look at how outcome and match variables from our producer rules can be used. If you have been following this series, you have seen us use these variables lightly in our examples, mainly because we needed a method to join disparate detections together. Today, we are going to take a closer look at these values, so let’s dive in!

The collections endpoint, where all detection data is written, is the place we’ll start today. Let’s begin with this query, executed in dashboards, and see what information the detections dataset holds about match and outcome variables and their associated values.

This search is returning all detections for the time window in the chart or dashboard. We are aggregating by the rule name and outputting four outcome variables called $match_variable, $match_value, $outcome_variable and $outcome_value.

$rule_name = detection.detection.rule_name
match:
  $rule_name
outcome:
  $match_variable = array_distinct(detection.detection.detection_fields.key)
  $match_value = array_distinct(detection.detection.detection_fields.value)
  $outcome_variable = array_distinct(detection.detection.outcomes.key)
  $outcome_value = array_distinct(detection.detection.outcomes.value)

These fields contain all of the variables and values found in the match and outcome sections of a rule. In the results, we can see that the match_variable column contains variable names like hostname, ip and session while the outcome_variable column contains names like principal_process_file_full_path, vendor_name, historical_threshold and more. Their associated value fields contain names of hosts like win-server, users like tim.smith_admin and product names like Crowdstrike.
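To make the aggregation concrete, here is a minimal Python sketch of what the query above does conceptually: group detections by rule name and collect the distinct keys and values from the match and outcome sections. The record layout, the rule names producer_example_one and producer_example_two, and the function name summarize_by_rule are assumptions made for illustration; this is not how Google SecOps stores or evaluates detections.

```python
from collections import defaultdict

# Hypothetical detection records; the dictionary layout is an assumption made
# for illustration and is not the actual detections dataset schema.
SAMPLE_DETECTIONS = [
    {
        "rule_name": "producer_example_one",
        "detection_fields": {"hostname": "win-server"},
        "outcomes": {"vendor_name": "Crowdstrike"},
    },
    {
        "rule_name": "producer_example_one",
        "detection_fields": {"hostname": "win-adfs"},
        "outcomes": {"vendor_name": "Crowdstrike"},
    },
    {
        "rule_name": "producer_example_two",
        "detection_fields": {"hostname": "win-server"},
        "outcomes": {"principal_user_userid": "tim.smith_admin"},
    },
]

COLUMNS = ("match_variable", "match_value", "outcome_variable", "outcome_value")


def summarize_by_rule(detections):
    """Group by rule name and collect distinct keys and values per column."""
    summary = defaultdict(lambda: {column: set() for column in COLUMNS})
    for detection in detections:
        row = summary[detection["rule_name"]]
        row["match_variable"].update(detection["detection_fields"].keys())
        row["match_value"].update(detection["detection_fields"].values())
        row["outcome_variable"].update(detection["outcomes"].keys())
        row["outcome_value"].update(detection["outcomes"].values())
    # Sort each set so the output is stable and easy to read.
    return {
        rule: {column: sorted(values) for column, values in row.items()}
        for rule, row in summary.items()
    }


if __name__ == "__main__":
    for rule, row in summarize_by_rule(SAMPLE_DETECTIONS).items():
        print(rule, row)
```

Each rule ends up with one row of four columns, which is roughly the shape of the table the dashboard query returns.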

Last time, we discussed the format that Google Security Operations (SecOps) uses when presented with key/value pairs, which looks like this:

UDM_field_name["key"] = "value"

This same concept applies not just to key/value pairs in the meta section, but also to key/value pairs from the match and outcome sections of a rule. With that in mind, let’s return to our query and tune it to output the match variables and three outcome variables from our rules. Why three outcome variables? No specific reason aside from wanting to keep it readable. We could build a query that captured many more outcome variables if we wanted.

That said, much like with meta labels, two things become critically important if you are going to use these capabilities when building composite detections: 1) using outcome variables in your rules and 2) naming them consistently.

In our query, we are going to return all detections whose rule name starts with the string producer. I’ve started using this prefix to pair my producer rules with my composite detections. You certainly don’t have to, but as I transition some standalone rules into smaller, more bite-sized detections, I found that changing their names to fit this producer/composite flow made sense.

$rule_name = detection.detection.rule_name
detection.detection.rule_name = /^producer/
detection.detection.alert_state = "NOT_ALERTING"
match:
  $rule_name
outcome:
  $match_hostname = array_distinct(detection.detection.detection_fields["hostname"])
  $outcome_userid = array_distinct(detection.detection.outcomes["principal_user_userid"])
  $outcome_cmdline = array_distinct(detection.detection.outcomes["target_process_command_line"])
  $outcome_hash = array_distinct(detection.detection.outcomes["target_process_file_sha256"])

In the outcome section, we have key/value pairs representing the match variable of hostname and the outcome variables of principal_user_userid, target_process_command_line and target_process_file_sha256. Because we are grouping the query by rule name, we need to apply an aggregation function to each of the fields in the outcome section. To prevent duplicate values, we are using array_distinct.
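The filter, key lookup and de-duplication described above can be sketched in Python as follows. This is only a model of the query's logic under assumed data shapes: the dictionary layout, the WANTED mapping and the function name summarize_producers are hypothetical, and the real query runs inside Google SecOps rather than in Python.

```python
import re
from collections import defaultdict

# Output column -> (section of the detection, key looked up in that section).
# Column and key names are taken from the query; the mapping is illustrative.
WANTED = {
    "match_hostname": ("detection_fields", "hostname"),
    "outcome_userid": ("outcomes", "principal_user_userid"),
    "outcome_cmdline": ("outcomes", "target_process_command_line"),
    "outcome_hash": ("outcomes", "target_process_file_sha256"),
}


def summarize_producers(detections):
    """Keep non-alerting producer detections and list distinct values per key."""
    summary = defaultdict(lambda: {column: [] for column in WANTED})
    for detection in detections:
        # Mirrors: detection.detection.rule_name = /^producer/
        if not re.search(r"^producer", detection["rule_name"]):
            continue
        # Mirrors: detection.detection.alert_state = "NOT_ALERTING"
        if detection["alert_state"] != "NOT_ALERTING":
            continue
        row = summary[detection["rule_name"]]
        for column, (section, key) in WANTED.items():
            value = detection[section].get(key)
            # Skip missing keys and values already seen (array_distinct).
            if value is not None and value not in row[column]:
                row[column].append(value)
    return dict(summary)
```

A rule whose detections carry none of the three outcome keys still gets a row, but its outcome columns stay empty.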

In the results, we can see our rule name and the values associated with each of our match and outcome variables. For instance, the first rule had a match variable value of win-server in one detection and win-adfs in another. Interestingly, none of the three outcome variables have associated values for that rule. Based on that, either I need to refine the rule to add some outcome variables to it, or the rule’s coverage does not lend itself to user and endpoint values. In the latter case, I need to either plan to join it with other detections on values like the hostname, or find other outcome variables, like an IP address, that could link this detection with other detections.

Many of the other detections in this listing contain a hostname, username and file hash that can readily be used. The command line output looks robust as well, but string matching on command lines is always dicey, so using regular expressions with this value may be something to consider.
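As a rough illustration of why exact string matching on command lines is fragile, the Python sketch below compares an equality check with a regular expression. The command lines and the pattern are invented for this example and do not come from the detections discussed here.

```python
import re

# Hypothetical command lines, invented for illustration only.
OBSERVED = [
    "powershell.exe -enc SQBFAFgA",
    "PowerShell.exe  -EncodedCommand SQBFAFgA",
    "powershell -e SQBFAFgA",
]

# An exact comparison only matches one spelling of the command.
EXACT = "powershell.exe -enc SQBFAFgA"

# A case-insensitive pattern tolerates the optional .exe suffix, extra
# whitespace and the different spellings of the same switch.
PATTERN = re.compile(
    r"powershell(\.exe)?\s+-(e|enc|encodedcommand)\s", re.IGNORECASE
)

exact_hits = [cmd for cmd in OBSERVED if cmd == EXACT]
regex_hits = [cmd for cmd in OBSERVED if PATTERN.search(cmd)]

if __name__ == "__main__":
    print(len(exact_hits), "exact match(es)")
    print(len(regex_hits), "regex match(es)")
```

The same trade-off applies when joining detections on a command line value: a pattern is more forgiving than equality, at the cost of needing more care to avoid matching too much.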

Based on the query we just reviewed, it appears we have a set of producer rules that have been written with a match condition that aggregates by a hostname value. Suppose we wanted to generate a composite detection but wanted the aggregation to be based on the userid rather than the hostname. We could handle this by using the outcome variable from the producer rule within the composite rule.

Here’s the general flow. We have a set of producer rules and these rules are standardized on a match variable of hostname. So far so good. The composite rule that I want to build is going to be based on rules that start with the string producer_, are not alerting and are single event rules. However, I want this composite rule to be user centric; that is, it should aggregate the detections by the user rather than by the host.

Below is an example of how this rule could be written. In the events section, we have the detection descriptors for the rule type and alert state and their associated enumerated values. The next line of criteria is the rule name as a regular expression. We are looking for rules that start with the string producer_. Keep in mind that we could also do this with functions like re.regex or even strings.starts_with. Any way we do it, we have our detections.

The last line in the events section looks for the outcome variable of principal_user_userid in the detections and writes the value to the placeholder variable $userid. In the match section, that new placeholder variable is used to aggregate the detections over two hours. In the outcome section, we are calculating a risk score and counting both distinct detections and total detections. Those counts are then used as thresholds in the condition section.

rule composite_all_producer_by_userid {

 meta:
   author = "Google Cloud Security"
   description = "Aggregating producer rules by the outcome variable userid"
   severity = "High"
   type = "composite"

 events:
   $detect_prod.detection.detection.alert_state = "NOT_ALERTING"
   $detect_prod.detection.detection.rule_type = "SINGLE_EVENT"
   $detect_prod.detection.detection.rule_name = /^producer/

   $userid = $detect_prod.detection.detection.outcomes["principal_user_userid"]

 match:
   $userid over 2h

 outcome:
   $risk_score = 60
   $total_detection_count = count($detect_prod.detection.detection.rule_id)
   $uniq_detection_count = count_distinct($detect_prod.detection.detection.rule_id)
   $rules_triggered = array_distinct($detect_prod.detection.detection.rule_name)

 condition:
   $detect_prod and $uniq_detection_count > 2 and $total_detection_count > 5
}
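To reason about what this rule will and will not fire on, here is a minimal Python sketch of its logic. Several things are assumptions of the sketch rather than a description of the rule engine: the dictionary layout of a detection, the function name composite_by_userid, the use of fixed, non-overlapping two-hour buckets as a stand-in for the match window, and the choice to skip detections that have no userid.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW_SECONDS = int(timedelta(hours=2).total_seconds())


def composite_by_userid(detections):
    """Model of the composite rule: group producer detections by userid."""
    groups = defaultdict(list)
    for detection in detections:
        # events section: alert state, rule type and rule name criteria.
        if detection["alert_state"] != "NOT_ALERTING":
            continue
        if detection["rule_type"] != "SINGLE_EVENT":
            continue
        if not detection["rule_name"].startswith("producer"):
            continue
        # events section: read the producer's outcome variable into $userid.
        userid = detection["outcomes"].get("principal_user_userid")
        if not userid:
            continue  # Assumption: detections without a userid are skipped.
        # match section: simplified here to fixed two-hour buckets.
        bucket = int(detection["time"].timestamp()) // WINDOW_SECONDS
        groups[(userid, bucket)].append(detection)

    results = []
    for (userid, _bucket), group in groups.items():
        # outcome section: total and distinct counts of rule IDs.
        total_detection_count = len(group)
        uniq_detection_count = len({d["rule_id"] for d in group})
        # condition section: both thresholds must be exceeded.
        if uniq_detection_count > 2 and total_detection_count > 5:
            results.append({
                "userid": userid,
                "risk_score": 60,
                "total_detection_count": total_detection_count,
                "uniq_detection_count": uniq_detection_count,
                "rules_triggered": sorted({d["rule_name"] for d in group}),
            })
    return results
```

Note that both thresholds are strict: a user needs at least three distinct producer rules and at least six detections in total within the window before the composite fires.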

When we test the rule, we get a detection where the user tim.smith_admin has generated detections on multiple systems, both win-server and win-adfs. Notice that at the top of the image we have the detection field key/value pair that was set in the composite detection, while below, in the detections section, the key/value pair for the producer detections is the hostname and the two servers. Additionally, we can view the source field that this value was stored in, in this case principal.hostname.

In this blog, we used match and outcome variables and their associated values in our composite rule. This builds on concepts from earlier blogs in the series to highlight some of the different use cases you can address with composite rules. As mentioned before, it is crucial to use match and outcome variables in your rules and to name them consistently, because that is what makes it possible to join on these variables in composite detections.

The use of key/value pairs in the match and outcome sections, like the meta section we previously covered, provides a powerful method to build robust use cases in Google SecOps.