New to Google SecOps: UDM Lookup

jstoner
Staff

I’d like you to take a moment and think about the first time you worked in security operations. Perhaps that role was triaging alerts, writing detections or hunting for threats. Depending upon the role, you were likely digging through parsers and documentation, or asking colleagues which field might contain the values you were interested in. I’d even wager that this was one of the more challenging hurdles in the learning curve of a new security operations tool.

I see a few of you smirking. You may be thinking that tooling you’ve used could search through raw data and you could find anything you wanted. Speaking as someone who has worked extensively with one of those tools for many years, that’s not quite what I am referring to. For log management tools to be performant SIEMs, they need to leverage a data model or schema to organize data. When those log management tools wrap some sort of schema around the data, they encounter the same challenge.

ntc-udmlookup-00.jpeg

We experience this challenge at Google. When I first looked at Unified Data Model (UDM), I was thrilled to have this extensible schema providing a tremendous number of fields to describe the principal user or asset, the target process, its command line, the hash and much more. But how can I build my search when I have all these fields chock-full of data but I’m not sure which ones contain the data that I am interested in? To address this, we are going to take a look at UDM Lookup and how this capability can help users identify the fields in their dataset where their key values reside and shorten the learning curve for understanding a new data schema.

UDM Lookup can search for strings contained within unenriched fields from today back to August 10, 2023, when it was introduced. Why unenriched? Well, unenriched fields are ingested from raw logs and are parsed to become a UDM event. Enriched fields are added to events after parsing and provide additional context, such as the department and title associated with a user ID. Those enriched fields are continually being mapped to fields like principal.user.department or target.user.title.

When a user wants to search for a value, they don’t need to worry about capitalization or lack thereof impacting the lookup as UDM Lookup is case insensitive. Characters like hyphens and underscores are ignored within a text string as well. The goal is to help analysts identify the field(s) that these types of values reside in and then use the fields of interest in a UDM search to refine the data set they are interested in for the job at hand.
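To make the matching behavior concrete, here is a minimal Python sketch of that kind of normalization. It is not how UDM Lookup is implemented; the function names are hypothetical, and it assumes hyphens and underscores are dropped from both the search term and the stored value before a substring comparison.

```python
def normalize(term: str) -> str:
    """Lowercase the text and drop hyphens and underscores."""
    return term.lower().replace("-", "").replace("_", "")


def lookup_matches(term: str, value: str) -> bool:
    """Return True if the normalized term appears inside the normalized value.

    Assumption: both sides are normalized the same way. The article only
    states that case, hyphens and underscores do not affect the lookup.
    """
    return normalize(term) in normalize(value)
```

Under these assumptions, Power-Shell, power_shell and POWERSHELL all reduce to the same string, so any of them would surface the same fields.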

ntc-udmlookup-01.png

On the search screen, we can see UDM Lookup right below the search prompt. Clicking on UDM Lookup will open a pop-up. Alternatively, typing a term of interest into the prompt below UDM Lookup and pressing Enter will take you to the pop-up. Let’s start by searching for a term that we might find in a number of different fields: powershell.

ntc-udmlookup-02.png

Our results contain a list of entries on the left side of the pop-up. Notice the top two entries, metadata.log_type = POWERSHELL and metadata.product_name = PowerShell, and the time shown on each. The list is sorted in descending order by time, so the fields where our value was seen most recently are at the top. The third item in the list is the field security_result.rule_name, which contains PowerShell within the field, but not as the entire value.

Clicking on the row for security_result.rule_name, we get additional information on the right side of the screen, including a description of the security_result portion of the UDM JSON tree and a description of the security_result.rule_name field, including its attributes. From this, we can see that it is a string field and it is repeated, meaning that it can contain multiple values. We can also see that this field is expected to contain the name of the security rule pertinent to this event. At the bottom of the screen, we have the option to copy the UDM to the clipboard or append to search.

ntc-udmlookup-03.png

As we scroll further through our list of fields that contain the value powershell, we start seeing fields, in this case target.resource_name, where powershell is only a portion of the entire value. UDM Lookup returns each unique value for each field, along with the date when that field and value were last seen.
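One way to picture the result list is as a collapse of many observations into one row per field and value pair, keeping only the most recent date. The Python sketch below illustrates that idea. The function name, the rule name and the dates are made up for illustration, and this is not a description of how UDM Lookup stores or computes its results.

```python
from datetime import date


def summarize(observations):
    """Collapse (day, field, value) observations to one row per field/value pair."""
    last_seen = {}
    for day, field, value in observations:
        key = (field, value)
        # Keep only the most recent day on which this pair was observed.
        if key not in last_seen or day > last_seen[key]:
            last_seen[key] = day
    # Most recently seen pairs first, mirroring the order in the pop-up.
    return sorted(
        ((field, value, day) for (field, value), day in last_seen.items()),
        key=lambda row: row[2],
        reverse=True,
    )


# Made-up observations, for illustration only.
sample = [
    (date(2024, 1, 30), "security_result.rule_name", "Suspicious PowerShell Download"),
    (date(2024, 2, 3), "metadata.log_type", "POWERSHELL"),
    (date(2024, 2, 1), "metadata.log_type", "POWERSHELL"),
    (date(2024, 2, 2), "metadata.product_name", "PowerShell"),
]
rows = summarize(sample)
```

Here the two metadata.log_type observations collapse into a single row dated with the later of the two days, and that row sorts to the top.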

ntc-udmlookup-4a.png

As we scroll further, we reach the last field for which we have a definitive date for a value containing powershell. After that, we move into fields that are classified as possible value matches.

Possible value matches don’t return every permutation in this view. Here’s why. The following fields potentially contain blocks of text, and listing every permutation of those values would make UDM Lookup unwieldy to dig through.

  • metadata.description
  • security_result.description
  • security_result.detection_fields.value
  • security_result.summary
  • network.http.user_agent

Similarly, fields that are expected to contain a folder tree or directory structure will also return possible value matches. These are fields whose names end in the following values.

  • .command_line
  • .file.full_path
  • .labels.value
  • .registry.registry_key
  • .url
  • additional.fields.value (begins with additional)
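The two lists above can be read as a simple classification rule on the field name. Here is a hedged Python sketch of that rule. The function and constant names are hypothetical, and treating the last bullet as a prefix match on additional.fields.value is my reading of the list rather than a documented behavior.

```python
# Fields named in full in the first list.
EXACT_FIELDS = {
    "metadata.description",
    "security_result.description",
    "security_result.detection_fields.value",
    "security_result.summary",
    "network.http.user_agent",
}

# Field name endings from the second list.
SUFFIXES = (
    ".command_line",
    ".file.full_path",
    ".labels.value",
    ".registry.registry_key",
    ".url",
)

# Assumption: the last bullet describes a prefix rather than a suffix.
PREFIX = "additional.fields.value"


def is_possible_value_match_field(field: str) -> bool:
    """Return True if the field name falls into one of the listed groups."""
    name = field.lower()
    return name in EXACT_FIELDS or name.endswith(SUFFIXES) or name.startswith(PREFIX)
```

With this rule, target.process.command_line and principal.process.file.full_path are classified as possible value match fields, while metadata.log_type is not.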

None of this prevents us from clicking Append to Search and adding these strings to our search. In fact, we added three possible value matches and got the following UDM search:

network.http.user_agent = /powershell/ NOCASE OR principal.process.file.full_path = /powershell/ NOCASE OR target.process.command_line = /powershell/ NOCASE

Notice how search defaults to a regular expression match with the nocase modifier for each field selected and joins the clauses with the logical operator OR. From there, we can search and filter using UDM search and find the events of interest based on our use case.
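If you want to reproduce the shape of that generated search outside the console, for example when templating hunts, a small helper is enough. This Python sketch simply mirrors the query shown above. The function name is hypothetical, and it assumes the term is plain text with no regular expression metacharacters or forward slashes, since it does no escaping.

```python
def build_udm_search(fields, term):
    """Join one case-insensitive regex clause per field with OR.

    Assumption: `term` is plain text. Regex metacharacters and "/" are
    not escaped here and would need handling before real use.
    """
    clauses = [f"{field} = /{term}/ NOCASE" for field in fields]
    return " OR ".join(clauses)


query = build_udm_search(
    [
        "network.http.user_agent",
        "principal.process.file.full_path",
        "target.process.command_line",
    ],
    "powershell",
)
print(query)
```

The printed string matches the search that Append to Search produced in the example.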

ntc-udmlookup-05.png

It’s worth noting that there are a few fields that UDM Lookup doesn’t search. For example, certain ID-based fields, like process IDs or session IDs, are not searched. Values like process IDs reside in very predictable fields and can quickly be searched for in those specific fields.

You might be thinking, that’s great but what about something like a GUID or a SID? Here’s a quick example of how we search for a GUID in our data set. This GUID happens to be from Entra ID and is an enterprise application but could just as easily be a user or service principal or anything else for that matter. Notice how we can see this value is represented in a few different fields on February 3, as well as it being a possible match in a few additional fields like target.url. With this information, we can start searching for this value on that date, working back in time for additional fields that contain those values.

ntc-udmlookup-06.png

One last thing. I like to recommend that users keep the UDM field list handy, particularly when building content, but with UDM Lookup, we can search for a field and look up information about the field itself in the pop-up. We did this earlier, but that was in the context of a value we uncovered. Here we are using it strictly as a reference. If we want to see which field names in the UDM schema reference the string userid, we can see that there are a number of them, and by clicking on the one of interest, we can get additional information about the field itself.

ntc-udmlookup-07.png

UDM Lookup is designed to make it simpler for users to find their parsed data and the fields associated with it. Searching raw data is nice, and we can do that as well, but understanding where data resides so that searches, rules and dashboards can be created and executed in a performant manner is crucial. UDM Lookup does this. If you haven’t tried it yet, I’d suggest taking a look!
