New to Google SecOps: What's in a String?

jstoner · 09-18-2024 08:30 AM

I’d like to introduce you to a few more string functions today. Each of these three functions examines a portion of a string and, depending on the function, returns either a boolean value or a portion of the string. We will go into each one, but understand that these functions provide greater flexibility in both search and rules as you develop content for your hunts, investigations and detections.

The three string functions we are going to cover are strings.contains, strings.starts_with and strings.substr. Let’s take a look at each.

strings.contains

In an earlier blog, I covered how re.regex returns a boolean value based on whether it finds a match for a regular expression within a given string. The strings.contains function is similar, but it matches a literal string rather than a regular expression. Let’s take a look at how we could apply this function.

When reviewing process launch events, we may come across the string cmd.exe /c in the target.process.command_line field. This signifies that a command is being executed within a new instance of the command prompt, which terminates as soon as the command completes. While this can be used for legitimate purposes, a number of command and control platforms leverage this method when executing commands on a victim system.

The other thing to consider is that cmd.exe /c may not be at the beginning of the command line, so we need to have some flexibility to find that substring anywhere in the field. With that in mind, here’s a search that will allow us to find instances of cmd.exe /c being executed in our environment.

metadata.event_type = "PROCESS_LAUNCH"
strings.contains(strings.to_lower(target.process.command_line), "cmd.exe /c")
metadata.product_name = "Microsoft-Windows-Sysmon"
principal.hostname = $hostname
match:
 $hostname
outcome:
 $event_count = count_distinct(metadata.id)
 $command_line = array_distinct(re.capture(strings.to_lower(target.process.command_line), (`.*cmd\.exe /c.*`)))
order:
 $event_count desc
limit: 10

Notice that we nested the strings.to_lower function around the command line so that the field is always compared in lowercase against our lowercase pattern. Although you can choose to have case sensitivity set on or off in search, having this function in place is a nice way to ensure that you get the same results with either setting.
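To make the case handling concrete, here is a small Python sketch (not YARA-L) that mimics combining strings.to_lower with strings.contains. The helper name contains_ci and the sample command lines are made up for illustration and are not part of Google SecOps.

```python
def contains_ci(value: str, needle: str) -> bool:
    """Lowercase the field first, then test for a literal substring."""
    return needle in value.lower()


# Hypothetical command lines, not taken from real telemetry.
samples = [
    r"C:\Windows\System32\CMD.EXE /C whoami",
    r"powershell.exe -nop -c Get-Process",
    r'"C:\Windows\system32\cmd.exe" /c dir',
]

for line in samples:
    print(contains_ci(line, "cmd.exe /c"), line)
```

Only the first sample matches. The third does not, because the closing quote sits between cmd.exe and /c, which is a reminder that a literal substring match finds exactly what you type and nothing else.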

Our search results show a few systems that are seeing this kind of behavior. We could drill into these systems and assess if the behavior is expected or anomalous and if it is anomalous, continue to investigate further.

We can apply the strings.contains function to rules as well. In this example, I converted the previous search into a rule that groups events with a common principal.hostname value over a one-hour match window.

rule strings_contains_example {
 meta:
   author = "Google Cloud Security"
   description = "Identify substring of cmd.exe /c being issued"
   severity = "Low"
 events:
   $process.metadata.event_type = "PROCESS_LAUNCH"
   $process.target.process.command_line != ""
   $process.metadata.product_name = "Microsoft-Windows-Sysmon"
   $process.principal.hostname = $hostname
   strings.contains(strings.to_lower($process.target.process.command_line), "cmd.exe /c")
 match:
   $hostname over 1h
 outcome:
   $event_count = count($process.metadata.event_type)
   $command_line = array_distinct(re.capture(strings.to_lower($process.target.process.command_line), (`.*cmd\.exe /c.*`)))
 condition:
   $process
}

When I tested the rule, systems that had one or more command lines that contained the substring of cmd.exe /c were detected.

To recap, strings.contains searches for a string pattern within a field and returns a boolean value. In search, testing for the presence or absence of that substring can be used to narrow the result set.

strings.starts_with

If you feel comfortable with strings.contains, strings.starts_with will be a snap. The only difference between these two functions is that strings.starts_with matches from, you guessed it, the start of the string, whereas strings.contains attempts to match anywhere within the string. Both return a boolean value, so their application is very similar.
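The difference is easy to see side by side. The following Python sketch (again, not YARA-L) uses two stand-in helpers, contains and starts_with, whose names are chosen only to mirror the functions discussed here. The command line is a made-up example.

```python
def contains(value: str, needle: str) -> bool:
    """True when needle appears anywhere in value."""
    return needle in value


def starts_with(value: str, prefix: str) -> bool:
    """True only when value begins with prefix."""
    return value.startswith(prefix)


# Hypothetical command line: the users folder appears, but not at the start.
cmd = r"c:\windows\system32\cmd.exe /c c:\users\public\run.bat"

print(contains(cmd, "c:\\users\\"))     # True
print(starts_with(cmd, "c:\\users\\"))  # False
```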

In fact, because it is so familiar, this time we will build a search using our function in the outcome section. In this example, we want to identify executables that are launched from the user folder. Perhaps a user has downloaded an executable and run it from their desktop or download folder. We want to learn more about the executable and pinpoint its launch location.

metadata.event_type = "PROCESS_LAUNCH"
target.process.command_line != ""
metadata.product_name = "Microsoft-Windows-Sysmon"
match:
 target.process.command_line
outcome:
 $exec_from_users_folder = array_distinct(if(strings.starts_with(strings.to_lower(target.process.command_line), "c:\\users\\"),"TRUE","FALSE"))
order: 
 $exec_from_users_folder desc

The filtering statement of our search is pretty straightforward and this time we aggregate our search by the target.process.command_line value. If we wanted to group by the hostname and command line, we could easily do that as well.

In the outcome section, we are going to use our strings.starts_with function and look for command line values that start with c:\users\. Again, we use the strings.to_lower function to insulate ourselves against case sensitivity issues.

Now, pay close attention to this. The strings.starts_with function outputs a boolean value. Because we are aggregating our results by command line, our outcome needs to be wrapped in an aggregation function (e.g. array_distinct). Unfortunately, array_distinct accepts just about anything except a boolean value, so we add a conditional statement (if/then/else) that outputs the string “TRUE” or “FALSE”, giving the array_distinct aggregation function string values that it can consume.
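As a rough analogy for what this outcome is doing, the Python sketch below groups a handful of made-up command lines, converts the boolean test into a "TRUE"/"FALSE" label, and keeps the distinct labels per group. It illustrates the idea only; the names label and grouped are hypothetical, and it does not reproduce how the search engine evaluates the query.

```python
from collections import defaultdict

# Hypothetical events: each one carries only a command line.
events = [
    r"C:\Users\alice\Downloads\setup.exe",
    r"C:\Windows\System32\notepad.exe",
    r"c:\users\alice\downloads\setup.exe /quiet",
    r"C:\Users\alice\Downloads\setup.exe",
]


def label(command_line: str) -> str:
    """Turn the boolean into a string, as the if(...) wrapper does."""
    return "TRUE" if command_line.lower().startswith("c:\\users\\") else "FALSE"


# Group by the raw command line (the match value) and collect distinct labels.
grouped = defaultdict(set)
for command_line in events:
    grouped[command_line].add(label(command_line))

for command_line, labels in grouped.items():
    print(sorted(labels), command_line)
```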

Here we can see the command lines that were issued during our search window with the values marked TRUE originating in the users folder. We could take these results and hunt or investigate further on these command line values.

If we wanted to apply this same concept to a rule, that is to detect when process launches are occurring within the user folder, we could do that. While we could keep our boolean logic in the outcome section, like we did in our search, this isn’t the most efficient way to write the rule, because we’d then have to add a condition that stated the outcome variable equals TRUE.

 condition:
   $process and $exec_from_users_folder = "TRUE"

Instead, by moving the strings.starts_with function to the events section of the rule, we can use its boolean result to isolate the events that meet these criteria and then aggregate those events by hostname over a one-hour window.

rule strings_starts_with_example {
 meta:
   author = "Google Cloud Security"
   description = "Identify process execution in user folders"
   severity = "Low"
 events:
   $process.metadata.event_type = "PROCESS_LAUNCH"
   $process.target.process.command_line != ""
   $process.metadata.product_name = "Microsoft-Windows-Sysmon"
   strings.starts_with(strings.to_lower($process.target.process.command_line), "c:\\users\\")
   $process.principal.hostname = $hostname
 match:
   $hostname over 1h
 outcome:
   $event_count = count($process.metadata.event_type)
   $user_folder = array_distinct(re.capture(strings.to_lower($process.target.process.command_line), `c:\\users\\(.+?)\\`))
   $user = array_distinct($process.principal.user.userid)
 condition:
   $process
}

In the outcome section, I have also added a re.capture function that extracts the name of the user folder from which the command was executed.
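If the capture group is unfamiliar, this Python sketch applies the same pattern to a few made-up command lines. The helper name user_folder is hypothetical, and returning an empty string when nothing matches is an assumption made for the sketch.

```python
import re

# Same pattern as in the rule: a lazy group between the users folder and the next backslash.
USER_FOLDER = re.compile(r"c:\\users\\(.+?)\\")


def user_folder(command_line: str) -> str:
    """Return the folder name that follows the users directory, or an empty string."""
    match = USER_FOLDER.search(command_line.lower())
    return match.group(1) if match else ""


print(user_folder(r"C:\Users\Public\payload.exe"))                # public
print(user_folder(r"C:\Users\jdoe\AppData\Local\Temp\a.exe -x"))  # jdoe
```

The lazy quantifier (.+?) matters here: it stops at the first backslash after the folder name, so deeper paths such as AppData\Local are not swallowed into the capture.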

In the results, we have a few systems where processes were launched from different user folders including the public folder (C:\Users\Public).

strings.substr

The final function we are going to discuss today is the strings.substr function. This function extracts a substring from a field or variable using one integer to define the start position, followed by a second integer for the length. If the start position is set to 1 or any value less than 1, the substring will start at the first character. The length argument is optional; if it is not specified, the function returns everything from the start position to the end of the string.
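Here is a minimal Python sketch of that behavior, taking only the description above at face value. The function below is an emulation written for this post, not the product's implementation, and treating start positions above 1 as 1-based is an assumption that the description implies but does not state outright.

```python
from typing import Optional


def substr(value: str, start: int, length: Optional[int] = None) -> str:
    """Emulate the described behavior: any start below 1 is clamped to the first character."""
    begin = max(start, 1) - 1  # assumption: positions are 1-based
    if length is None:
        return value[begin:]  # no length: run to the end of the string
    return value[begin:begin + max(length, 0)]


print(substr("hello world", 1, 5))   # hello
print(substr("hello world", 0, 5))   # hello
print(substr("hello world", -3, 5))  # hello
print(substr("hello world", 7))      # world
```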

Depending on your comfort with using regular expressions, this function may serve as a nice way to avoid having to build regular expressions. On the other hand, you might be thinking we should just use a regular expression instead. Our goal here is to raise awareness that these different functions exist and there are many approaches to solving a problem, so think of this function as another tool in the tool box.

To illustrate one way we could use this function, we are going to turn to a set of network HTTP events. These events happen to be Microsoft Graph API activity logs. In our search, we want to identify the applications that an enumerated service principal has had access to and how many times this has occurred.

Our search focuses on successful HTTP GET requests and, using a regular expression, narrows the results to the URL that would return a listing of applications that the service principal has been granted access to.

With that basic search criteria, we have a number of URLs with the role GUID within the broader URL that look like this:

https://graph.microsoft.com/v1.0/servicePrincipals(appId='5b344a8d-5a01-4a2a-9e50-05b0bedd9b15')/appRoleAssignedTo

I’m interested in isolating that GUID from the rest of the URL and this is where we can use different functions to complete the task at hand. We could use re.capture and provide a regular expression to extract the GUID from the string. Another option would be to use the strings.substr function and count the number of characters from the start of the URL and start our capture from there.

I chose a different path, not a better or worse one, just a different one, because I’m not keen on counting that far into a string. I used the re.replace function to replace the leading portion of the URL with an empty string. This effectively moved the GUID to the start of my variable, where a start value of 0 (like any value less than 1) begins at the first character. From there, I know that a GUID has 36 characters in it, including the dashes, so I specify a length of 36.

metadata.event_type = "NETWORK_HTTP"
re.regex(target.url, `^https://graph\.microsoft\.com/v1\.0/servicePrincipals\(appId='.*'\)/appRoleAssignedTo$`)
network.http.response_code = 200
network.http.method = "GET"
$guid = strings.substr(re.replace(target.url, `^https://graph\.microsoft\.com/v1\.0/servicePrincipals\(appId='`, ""), 0, 36)
match:
   $guid
outcome:
   $event_count = count(metadata.event_type)
order: $event_count desc

With the GUID written to its own placeholder variable ($guid), we can use it in the match section as our aggregate and we can count the number of times each GUID was called in our search window.
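For readers who want to compare the two approaches mentioned earlier, this Python sketch extracts the GUID both ways (strip the prefix and slice 36 characters, or capture it with a regular expression) and then counts occurrences per GUID. The function names are hypothetical, and the second GUID in the sample list is invented for illustration; only the first URL comes from the example above.

```python
import re
from collections import Counter

PREFIX = re.compile(r"^https://graph\.microsoft\.com/v1\.0/servicePrincipals\(appId='")
GUID = re.compile(r"appId='([0-9a-fA-F-]{36})'")


def guid_by_strip_and_slice(url: str) -> str:
    """Remove the leading part of the URL, then keep the first 36 characters."""
    return PREFIX.sub("", url)[:36]


def guid_by_capture(url: str) -> str:
    """Alternative: capture the 36-character GUID directly with a group."""
    match = GUID.search(url)
    return match.group(1) if match else ""


base = "https://graph.microsoft.com/v1.0/servicePrincipals(appId='{}')/appRoleAssignedTo"
urls = [
    base.format("5b344a8d-5a01-4a2a-9e50-05b0bedd9b15"),
    base.format("00000000-0000-0000-0000-000000000001"),  # made-up GUID
    base.format("5b344a8d-5a01-4a2a-9e50-05b0bedd9b15"),
]

# Count how often each GUID appears, most frequent first.
counts = Counter(guid_by_strip_and_slice(u) for u in urls)
for guid, event_count in counts.most_common():
    print(event_count, guid)
```

Both helpers return the same value for these URLs. Which one reads better is largely a matter of taste, which is the point made above about having more than one tool available.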

Today we covered two functions that look for a string at the beginning of, or anywhere within, a value and return a boolean, and a third that lets users specify a range within a string to extract. These additional tools provide users more flexibility as they conduct threat hunts, run investigations and build detections. I hope you find these functions useful and apply them to your own use cases with Google SecOps!
