New to Google SecOps: First and Last Seen

jstoner · 04-22-2024 06:00 AM

"New to Chronicle" is a deep-dive series by Google Cloud Principal Security Strategist John Stoner which provides practical guidance for security teams that are either new to SIEM or replacing their SIEM with Google SecOps.

A few months ago, we discussed the concept of prevalence and how Google SecOps can measure and store this metric within the entity graph on a per-customer basis for every domain (and subdomain), IP address and file hash that is used in their environment. You may recall that prevalence is what we refer to as a derived context. It is stored in the entity graph so that it can be recalled over time and used in YARA-L rules to generate detections.

Today, we are going to discuss another derived context: first and last seen. This is a simple concept, but when applied at scale it can be computationally heavy. Like prevalence, though, Google SecOps calculates first and last seen with ease.

Google SecOps not only collects and maintains first and last seen for file hashes, domains and IP addresses, but also tracks first seen for assets and users. Did I mention that a user or administrator doesn’t need to do a thing to calculate these values? Just like prevalence, this happens automatically within Google SecOps. With this collection of derived context, YARA-L rules can be written for a number of different use cases. Let’s have a look.

A common concern organizations have is understanding when one of their systems starts to communicate with a domain that has never been seen before in their environment. To detect DNS queries for domains that were first seen within the past day, we could develop a rule that looks like this.

In our rule, lines 7-9 gather our DNS queries as well as the hostname of the system. The domain name is joined to the entity graph on line 12. Lines 13-15 restrict the entity data to derived-context domain entries that have a first seen value, which also keeps our rule performant, while line 18 calculates the difference in time between when the rule runs and when the domain was first seen. If that difference is less than a day (86400 seconds), the resulting events are grouped by the hostname (line 21).

rule domain_queries_first_seen_in_past_day {
    
    meta:
        author = "Google Cloud Security"

    events:
        $dns.metadata.event_type = "NETWORK_DNS"
        $dns.principal.hostname = $hostname
        $dns.network.dns.questions.name = $domain
        
        //derived from events ingested by Google SecOps
        $entity.graph.entity.hostname = $domain
        $entity.graph.metadata.entity_type = "DOMAIN_NAME"
        $entity.graph.metadata.source_type = "DERIVED_CONTEXT"
        $entity.graph.entity.domain.first_seen_time.seconds > 0
        
        //difference between current time and the first time seen
        86400 > timestamp.current_seconds() - $entity.graph.entity.domain.first_seen_time.seconds

    match:
        $hostname over 24h

    condition:
        $dns and $entity
}
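
To make the time comparison on line 18 of the rule concrete, here is a minimal Python sketch of the same arithmetic. It is only a model for reasoning about the threshold, not something Google SecOps runs; the function name `first_seen_within` and the sample timestamps are hypothetical.

```python
SECONDS_PER_DAY = 86400  # same constant the rule compares against


def first_seen_within(first_seen_seconds: int, now_seconds: int,
                      window_seconds: int = SECONDS_PER_DAY) -> bool:
    """Return True when the entity was first seen less than window_seconds ago."""
    # Mirrors the `first_seen_time.seconds > 0` guard: ignore unset values.
    if first_seen_seconds <= 0:
        return False
    # Mirrors `86400 > timestamp.current_seconds() - first_seen_time.seconds`.
    return window_seconds > now_seconds - first_seen_seconds


now = 1_700_000_000  # hypothetical evaluation time in epoch seconds
print(first_seen_within(now - 3_600, now))    # first seen one hour ago -> True
print(first_seen_within(now - 172_800, now))  # first seen two days ago -> False
```

Because the comparison is strict, a domain first seen exactly 86400 seconds before the rule runs falls outside the window.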

Pro Tip: If we wanted to use assets, users, IP addresses or file hashes in our rule, the same template applies, but check out the documentation to find the correct field names where the data is stored for these entity types.

The last seen value can be a little trickier to use. With rules, we are detecting things that are happening now, so of course the last time we see a file hash, IP address or domain is going to be now. That doesn’t mean we can’t take that last seen value, compare it to when something was first seen or to some other time metric, and use that output in our rule.

In this next example, we are going to identify process launch or file creation events where their hashes are on the Safe Browsing list. We covered Safe Browsing previously, and if you recall, we already know that events that correlate with this list of hashes likely require remediation.

Now, let’s take this a step further and bubble up file hashes that are on the Safe Browsing list but are being seen again within our environment. Much like our first rule, lines 7-9 capture the events of interest as well as the hostname in variables that we will use in our match section.

Lines 12-15 join the events with the Safe Browsing data and lines 18-21 join our first and last seen data as well. None of that is different from our first example except that we are leveraging two different sources of entity data. Line 22 calculates the difference between the first and last time that the hash was observed in our environment to determine if it is at least seven days (604800 seconds). OK, the seven days is a somewhat arbitrary value, but the idea here is that we have seen this malicious hash in our environment before and now we are seeing it again. Because it is recurring over a time boundary that we define, we want an alert raised.

The match section on line 25 groups our events by both the hostname and the hash over a four-hour window. When we get to lines 28-34, we can see that first and last seen are being used to set the risk_score. In short, Google SecOps looks at the difference between these two values, and for each additional week the score goes up 25 points. If the difference between the first and last seen times extends to 30 days (2592000 seconds) or more, the risk_score is 100. The idea behind this is that if a malicious hash was first seen over a month ago but continues to be launched within the environment, it indicates a greater risk to the organization because it continues to happen.

rule safebrowsing_process_creation_hashes_seen_more_than_7_days {

  meta:
    author = "Google Cloud Security"

  events:
    ($execution.metadata.event_type = "PROCESS_LAUNCH" or $execution.metadata.event_type = "FILE_CREATION")
    $execution.principal.hostname = $hostname
    $execution.target.process.file.sha256 = $sha256

    // Safe Browsing file hashes provided by GCTI Feed
    $safebrowse.graph.entity.file.sha256 = $sha256
    $safebrowse.graph.metadata.entity_type = "FILE"
    $safebrowse.graph.metadata.product_name = "Google Safe Browsing"
    $safebrowse.graph.metadata.source_type = "GLOBAL_CONTEXT"

    // derived from events ingested by Google SecOps
    $seen.graph.entity.file.sha256 = $sha256
    $seen.graph.metadata.entity_type = "FILE"
    $seen.graph.metadata.source_type = "DERIVED_CONTEXT"
    $seen.graph.entity.file.last_seen_time.seconds > 0
    604800 <= $seen.graph.entity.file.last_seen_time.seconds - $seen.graph.entity.file.first_seen_time.seconds

  match:
    $hostname, $sha256 over 4h

  outcome:
    $risk_score = max( if($seen.graph.entity.file.last_seen_time.seconds - $seen.graph.entity.file.first_seen_time.seconds >= 604800 and
        $seen.graph.entity.file.last_seen_time.seconds - $seen.graph.entity.file.first_seen_time.seconds < 1209600, 25, 0) +
            if($seen.graph.entity.file.last_seen_time.seconds - $seen.graph.entity.file.first_seen_time.seconds >= 1209600 and
                $seen.graph.entity.file.last_seen_time.seconds - $seen.graph.entity.file.first_seen_time.seconds < 1814400, 50, 0) +
            if($seen.graph.entity.file.last_seen_time.seconds - $seen.graph.entity.file.first_seen_time.seconds >= 1814400 and
                $seen.graph.entity.file.last_seen_time.seconds - $seen.graph.entity.file.first_seen_time.seconds < 2592000, 75, 0) +
            if($seen.graph.entity.file.last_seen_time.seconds - $seen.graph.entity.file.first_seen_time.seconds >= 2592000, 100, 0))
    $event_count = count_distinct($execution.metadata.id)

  condition:
    $execution and $safebrowse and $seen
}
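
The tiers in the outcome section can be easier to check when written as a small function. The Python sketch below models the same thresholds; the name `risk_score` and the constants are illustrative only. One assumption to call out: the sketch treats a span of exactly seven days as part of the first tier, since the events section admits spans of 604800 seconds or more.

```python
WEEK = 604_800     # seconds in seven days
MONTH = 2_592_000  # seconds in thirty days, the rule's top threshold


def risk_score(first_seen_seconds: int, last_seen_seconds: int) -> int:
    """Map the first-to-last-seen span onto the 25/50/75/100 tiers."""
    span = last_seen_seconds - first_seen_seconds
    if span >= MONTH:
        return 100
    if span >= 3 * WEEK:  # 1814400 seconds
        return 75
    if span >= 2 * WEEK:  # 1209600 seconds
        return 50
    if span >= WEEK:      # assumption: exactly seven days counts as tier one
        return 25
    return 0  # spans under seven days never reach the outcome in the rule


print(risk_score(0, 10 * 86_400))  # ten-day span -> 25
print(risk_score(0, 45 * 86_400))  # forty-five-day span -> 100
```

Only one tier can match a given span, which is why the rule can add the four if() results together and still end up with a single value.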

When we test our rule, we can see the file hash associated with the filename executable.exe on stevemorris-pc generated a Safe Browsing match. We can also see that this hash was first seen in our environment on 12/22/2022 and it was last seen on 6/15/2023, which means this hash has been seen on assets at various times across a span of nearly six months. Because that span is well over 30 days, and because of the way we built our risk_score metric, the risk for this system is 100.
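
As a quick sanity check on that result, the span between the two dates can be worked out in a few lines of Python. This sketch assumes the dates are written month/day/year and treats each as a whole day, since the exact timestamps are not given in the example.

```python
from datetime import date

first_seen = date(2022, 12, 22)  # 12/22/2022 from the test result
last_seen = date(2023, 6, 15)    # 6/15/2023 from the test result

span_days = (last_seen - first_seen).days
span_seconds = span_days * 86_400

print(span_days)                  # 175
print(span_seconds >= 2_592_000)  # True: past the rule's top threshold
```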

There is always more tuning to be done with these kinds of rules to suit an organization’s data sets and risk appetite, but I hope this provides a solid overview of first and last seen and how this derived context can be used in your YARA-L rules.