Seeking Advice: Implementing a reliable Data Exfiltration Detection Rule in Chronicle SIEM

Hello everyone,

I am currently working on a Chronicle SIEM use case, written in YARA-L, to detect data exfiltration and large uploads, but I am struggling to find a reliable approach.

Challenges & Observations:

  • A threshold-based approach (e.g., simply alerting on high outbound data volume) is too imprecise and results in excessive false positives.
  • YARA-L metrics, such as metrics.network_bytes_outbound, seem promising, but I am unsure how to structure the rule effectively to differentiate between legitimate large uploads and actual data exfiltration attempts.
  • My benchmark is the data exfiltration detection capability in Cortex (Palo Alto Networks), which works very reliably. I would like to achieve similar detection quality in Chronicle.

Questions for the Community:

  1. Has anyone successfully implemented a data exfiltration detection rule in Chronicle SIEM?
  2. Are there best practices for leveraging YARA-L metrics to improve detection accuracy?
  3. Any recommendations on additional signals or anomaly detection techniques that could be used to reduce false positives?

I would really appreciate any insights or experiences you can share!

Thanks in advance for your help.

Best regards,
Max


You can combine several of these techniques, depending on your environment:

rule network_prevalence_uncommon_domain_ioc_match {
  events:
    $e.metadata.event_type = "NETWORK_DNS"
    $e.network.dns.questions.name = $hostname
    //only match FQDNs, e.g., exclude chrome dns access tests and other internal hosts
    $e.network.dns.questions.name = /(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]/

    //prevalence entity graph lookup
    // - prevalence does not populate vendor_name, so we use that to uniquely join against
    $p.graph.metadata.vendor_name = ""   
    $p.graph.metadata.entity_type = "DOMAIN_NAME"
    $p.graph.entity.domain.prevalence.rolling_max <= 3
    $p.graph.metadata.threat.severity = "HIGH"
    $p.graph.entity.hostname = $hostname
  match:
    $hostname over 10m

  outcome:
    $risk_score = max(
        // increment risk score based on rolling_max prevalence
        if ( $p.graph.entity.domain.prevalence.rolling_max = 3, 50) +
        if ( $p.graph.entity.domain.prevalence.rolling_max = 2, 70) +
        if ( $p.graph.entity.domain.prevalence.rolling_max = 1, 90)
    )

  condition:
    $e and $p
}

rule ueba_first_time_interactive_service_account_login_by_user_id {

  meta:
    rule_name = "First Time Interactive Service Account Login by Principal User ID and Target Service Account"
    description = "Detects the first time a user has a successful interactive service account login, tracked by principal user ID and target service account in the last 30 days. This may be indicative of behavior related to entry vectors."
    severity = "Low"
    tactic = "TA0001" // Initial Access
    technique = "T1133" // External Remote Services

  events:
    $e.metadata.event_type = "USER_LOGIN"
    ($e.security_result.action = "ALLOW" or $e.security_result.action = "ALLOW_WITH_MODIFICATION")
    $service_account = $e.target.user.userid
    $user_id = $e.principal.user.userid

    $e.target.user.attribute.labels.value = /OU=Service Accounts/ nocase
    $e.extensions.auth.auth_details = /Logon Type (2|10)/ nocase

  match:
    $user_id, $service_account over 48h

  outcome:
    $risk_score = max(35)
    $event_count = count_distinct($e.metadata.id)

    $historical_threshold = max(metrics.auth_attempts_success(
      period:1d, window:30d,
      metric:first_seen, agg:min,
      principal.user.userid:$user_id, target.user.userid:$service_account))

    $principal_user_id = array_distinct($user_id)
    $target_service_account = array_distinct($service_account)

  condition:
    $e and ($historical_threshold = 0)
}

 

rule raju_network_traffic {
   meta:
       description = "Users whose total bytes transferred exceed 150% of their 30-day daily average"
   events:
       (($byte.network.received_bytes > 0 and $byte.network.received_bytes < 1000000000000000)
          or ($byte.network.sent_bytes > 0 and $byte.network.sent_bytes < 1000000000000000))
       $byte.principal.user.userid = $userid
   match:
       $userid over 3h
   outcome:
       $cnt_bytes_transferred = sum($byte.network.received_bytes +
           $byte.network.sent_bytes)
       $avg_bytes_transferred = max(metrics.network_bytes_total(
           period:1d, window:30d, metric:value_sum, agg:avg,
           principal.user.userid:$userid ))
       $threshold = $avg_bytes_transferred * 1.5
   condition:
       $byte and ($cnt_bytes_transferred > $threshold)
}
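
Since you specifically mentioned metrics.network_bytes_outbound: the same baseline pattern as raju_network_traffic can be pointed at outbound bytes only, which is usually a stronger exfiltration signal than total bytes (downloads no longer inflate the baseline). Treat the following as a sketch rather than a tested rule; I am assuming the outbound metric accepts the same parameters as metrics.network_bytes_total, so verify the metric name and fields against what is available in your tenant:

rule outbound_bytes_exceeds_user_baseline {
  meta:
    description = "Outbound bytes per user exceed 3x their 30-day daily average (sketch)"
  events:
    $e.metadata.event_type = "NETWORK_CONNECTION"
    $e.network.sent_bytes > 0
    $e.principal.user.userid = $userid
  match:
    $userid over 1h
  outcome:
    $sum_outbound = sum($e.network.sent_bytes)
    // assumption: parameters mirror metrics.network_bytes_total above
    $avg_outbound = max(metrics.network_bytes_outbound(
        period:1d, window:30d, metric:value_sum, agg:avg,
        principal.user.userid:$userid))
    $threshold = $avg_outbound * 3
  condition:
    $e and ($sum_outbound > $threshold)
}

The higher multiplier (3x instead of 1.5x) trades recall for precision, which is usually the right trade-off for exfiltration alerts. You can also cut false positives by combining this outcome's risk score with the prevalence-based rule above instead of alerting on volume alone.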

Thank you very much @rajukg. There's a lot to consider here. I will check out those approaches. Thanks for now!