Seeking Advice: Implementing a reliable Data Exfiltration Detection Rule in Chronicle SIEM

Hello everyone,

I am currently working on a Chronicle SIEM use case, written in YARA-L, to detect data exfiltration and large uploads, but I am struggling to find a reliable approach.

Challenges & Observations:

  • A threshold-based approach (e.g., simply alerting on high outbound data volume) is too imprecise and results in excessive false positives.
  • YARA-L metrics, such as metrics.network_bytes_outbound, seem promising, but I am unsure how to structure the rule effectively to differentiate between legitimate large uploads and actual data exfiltration attempts.
  • My benchmark is the data exfiltration detection capability in Cortex (Palo Alto Networks), which works very reliably. I would like to achieve similar detection quality in Chronicle.

Questions for the Community:

  1. Has anyone successfully implemented a data exfiltration detection rule in Chronicle SIEM?
  2. Are there best practices for leveraging YARA-L metrics to improve detection accuracy?
  3. Any recommendations on additional signals or anomaly detection techniques that could be used to reduce false positives?

I would really appreciate any insights or experiences you can share!

Thanks in advance for your help.

Best regards,
Max


You can combine several of these techniques, depending on your environment:

rule network_prevalence_uncommon_domain_ioc_match {
  events:
    $e.metadata.event_type = "NETWORK_DNS"
    $e.network.dns.questions.name = $hostname
    //only match FQDNs, e.g., exclude chrome dns access tests and other internal hosts
    $e.network.dns.questions.name = /(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]/

    //prevalence entity graph lookup
    // - prevalence does not populate vendor_name, so we use that to uniquely join against
    $p.graph.metadata.vendor_name = ""   
    $p.graph.metadata.entity_type = "DOMAIN_NAME"
    $p.graph.entity.domain.prevalence.rolling_max <= 3
    $p.graph.metadata.threat.severity = "HIGH"
    $p.graph.entity.hostname = $hostname
  match:
    $hostname over 10m

  outcome:
    $risk_score = max(
        // increment risk score based on rolling_max prevalence
        if ( $p.graph.entity.domain.prevalence.rolling_max = 3, 50) +
        if ( $p.graph.entity.domain.prevalence.rolling_max = 2, 70) +
        if ( $p.graph.entity.domain.prevalence.rolling_max = 1, 90)
    )

  condition:
    $e and $p
}

rule ueba_first_time_interactive_service_account_login_by_user_id {

  meta:
    rule_name = "First Time Interactive Service Account Login by Principal User ID and Target Service Account"
    description = "Detects the first time a user has a successful interactive service account login, tracked by principal user ID and target service account in the last 30 days. This may be indicative of behavior related to entry vectors."
    severity = "Low"
    tactic = "TA0001" // Initial Access
    technique = "T1133" // External Remote Services

  events:
    $e.metadata.event_type = "USER_LOGIN"
    ($e.security_result.action = "ALLOW" or $e.security_result.action = "ALLOW_WITH_MODIFICATION")
    $service_account = $e.target.user.userid
    $user_id = $e.principal.user.userid

    $e.target.user.attribute.labels.value = /OU=Service Accounts/ nocase
    $e.extensions.auth.auth_details = /Logon Type (2|10)/ nocase

  match:
    $user_id, $service_account over 48h

  outcome:
    $risk_score = max(35)
    $event_count = count_distinct($e.metadata.id)

    $historical_threshold = max(metrics.auth_attempts_success(
      period:1d, window:30d,
      metric:first_seen, agg:min,
      principal.user.userid:$user_id, target.user.userid:$service_account))

    $principal_user_id = array_distinct($user_id)
    $target_service_account = array_distinct($service_account)

  condition:
    $e and ($historical_threshold = 0)
}

 

rule raju_network_traffic {
   meta:
       description = "Users whose total bytes transferred exceed 150% of their 30-day daily average"
   events:
       (($byte.network.received_bytes > 0 and $byte.network.received_bytes < 1000000000000000)
          or ($byte.network.sent_bytes > 0 and $byte.network.sent_bytes < 1000000000000000))
       $byte.principal.user.userid = $userid
   match:
       $userid over 3h
   outcome:
       $cnt_bytes_transferred = sum($byte.network.received_bytes +
           $byte.network.sent_bytes)
       $avg_bytes_transferred = max(metrics.network_bytes_total(
           period:1d, window:30d, metric:value_sum, agg:avg,
           principal.user.userid:$userid ))
       $threshold = $avg_bytes_transferred * 1.5
   condition:
       $byte and ($cnt_bytes_transferred > $threshold)
}
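
Since you specifically mentioned metrics.network_bytes_outbound: the same baseline pattern as raju_network_traffic can be pointed at outbound bytes only, which is usually a stronger exfiltration signal than total bytes (downloads no longer inflate the baseline). Treat the following as a sketch rather than a tested rule; I am assuming the outbound metric accepts the same parameters as metrics.network_bytes_total, so verify the metric name and fields against what is available in your tenant:

rule outbound_bytes_exceeds_user_baseline {
  meta:
    description = "Outbound bytes per user exceed 3x their 30-day daily average (sketch)"
  events:
    $e.metadata.event_type = "NETWORK_CONNECTION"
    $e.network.sent_bytes > 0
    $e.principal.user.userid = $userid
  match:
    $userid over 1h
  outcome:
    $sum_outbound = sum($e.network.sent_bytes)
    // assumption: parameters mirror metrics.network_bytes_total above
    $avg_outbound = max(metrics.network_bytes_outbound(
        period:1d, window:30d, metric:value_sum, agg:avg,
        principal.user.userid:$userid))
    $threshold = $avg_outbound * 3
  condition:
    $e and ($sum_outbound > $threshold)
}

The higher multiplier (3x instead of 1.5x) trades recall for precision, which is usually the right trade-off for exfiltration alerts. You can also cut false positives by combining this outcome's risk score with the prevalence-based rule above instead of alerting on volume alone.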

Thank you very much @rajukg. There's a lot to consider here. I will check out those approaches. Thanks for now!