Parse timestamp out of CEF event

Hello everyone,

I'm trying to parse the timestamp out of a CEF event.

For example, we have the following raw event:

<135>CEF:0|Trend Micro|Deep Discovery Inspector|6.7.1077|300102|All components are up-to-date|2|dvc=10.250.1.24 dvcmac=20:88:10:C9:96:00 dvchost=sde01ddi02 deviceExternalId=7A00A6A455A4-461C84B6-5A34-114C-25B7 rt=Oct 21 2024 17:04:06 GMT+02:00 duser=SYSTEM outcome=Success

The time the event was created is stored in the "rt" key-value pair.
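Because the rt value itself contains spaces, the extension can't simply be split on whitespace. The parser below instead puts a `^` in front of each `key=` and splits on that. As an offline illustration only, here is a rough Python re-creation of those grok, gsub, and kv steps applied to the raw event above. The names parse_cef, HEADER, and RAW are hypothetical, and Python's re stands in for the parser's regex engine:

```python
import re

# Raw event from the post above.
RAW = ("<135>CEF:0|Trend Micro|Deep Discovery Inspector|6.7.1077|300102|"
       "All components are up-to-date|2|dvc=10.250.1.24 dvcmac=20:88:10:C9:96:00 "
       "dvchost=sde01ddi02 deviceExternalId=7A00A6A455A4-461C84B6-5A34-114C-25B7 "
       "rt=Oct 21 2024 17:04:06 GMT+02:00 duser=SYSTEM outcome=Success")

# Same shape as the grok pattern: priority, CEF version, six more
# pipe-delimited header fields, then the rest as the extension.
HEADER = re.compile(
    r"<\d+>CEF:(?P<header_version>[^|]+)\|(?P<organization>[^|]+)\|"
    r"(?P<log_type>[^|]+)\|(?P<product_version>[^|]+)\|(?P<event_id>[^|]+)\|"
    r"(?P<event_name>[^|]+)\|(?P<sev>[^|]+)\|(?P<cef_event_attributes>.*)"
)

def parse_cef(message):
    m = HEADER.match(message)
    if m is None:
        raise ValueError("not a CEF message")
    fields = m.groupdict()
    ext = fields.pop("cef_event_attributes")
    # Mirror the gsub: put "^" in front of every "key=" that follows
    # whitespace, so spaces inside values (like rt) are not separators.
    ext = re.sub(r"(\s+)([0-9a-zA-Z_.-]+?)=", r"^\2=", ext)
    # Mirror the kv filter: split pairs on "^", then key/value on the first "=".
    for pair in ext.split("^"):
        key, _, value = pair.partition("=")
        fields[key] = value
    return fields

fields = parse_cef(RAW)
print(fields["rt"])  # Oct 21 2024 17:04:06 GMT+02:00
```

In this sketch rt comes out whole, which is consistent with the answer below: the problem lies in the date pattern rather than in the extraction.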

I already have the following parser, but unfortunately it doesn't extract the timestamp into the metadata.event_timestamp UDM field:

filter {
    mutate {
        replace => {
            "rt" => ""
            "header_version" => ""
            "organization" => ""
            "product_version" => ""
            "event_id" => ""
            "event_name" => ""
            "sev" => ""
            "cef_event_attributes" => ""
            "cat" => ""
            "act" => ""
            "action" => ""
            "security_result" => ""
            "shost" => ""
            "src" => ""
            "spt" => ""
            "smac" => ""
            "dvchost" => ""
            "dhost" => ""
            "dst" => ""
            "dpt" => ""
            "dmac" => ""
            "dvc" => ""
            "about" => ""
            "proto" => ""
            "suid" => ""
            "fsize" => ""
            "cs6" => ""
            "cs6Label" => ""
            "has_principal" => "false"
            "has_target" => "false"
        }
    }

    grok {
        match => {
            "message" => [
                "<%{INT}>CEF:(?P<header_version>[^|]+)\\|(?P<organization>[^\\|]+)\\|(?P<log_type>[^\\|]+)\\|(?P<product_version>[^\\|]+)\\|(?P<event_id>[^\\|]+)\\|(?P<event_name>[^\\|]+)\\|(?P<sev>[^\\|]+)\\|%{GREEDYDATA:cef_event_attributes}"
            ]
        }
        overwrite => [
            "rt",
            "header_version",
            "organization",
            "log_type",
            "product_version",
            "event_id",
            "cef_event_attributes",
            "event_name",
            "sev"
        ]
        on_error => "invalid_grok"
    }

    if [cef_event_attributes] != "" {
        mutate {
            gsub => ["cef_event_attributes","\\\\\\\\","\\"]
        }
        # Put "^" in front of each key= so values containing spaces (such as rt) stay intact.
        mutate {
            gsub => ["cef_event_attributes", "(\\\s+)([0-9a-zA-Z_.-]+?)=", "^$2="]
        }
        kv {
            source => "cef_event_attributes"
            field_split => "^"
            value_split => "="
            on_error => "invalid_kv1"
        }
    }

    if [rt] != "" {
        date {
            match => ["rt", "RFC3339", "MMM d HH:mm:ss"]
            target => "metadata.event_timestamp"
            rebase => true
            on_error => "date_error"
        }
    }
 
[...]
 
I have been able to parse almost all fields, except the timestamp.
Maybe someone can help.
 
Greetings
Jan

ACCEPTED SOLUTION

Heading into the `date` function, the value of rt is formatted like this:

Oct 21 2024 17:04:06 GMT+02:00 

That includes the year and the timezone, neither of which is accounted for in your match pattern "MMM d HH:mm:ss". (The value isn't RFC3339 either, so your first pattern doesn't match it.)

You can add yyyy to the match to capture the year:
"MMM d yyyy HH:mm:ss"
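To see the effect of the missing year outside the parser, Python's strptime can serve as a rough stand-in for the date filter. This is an assumption: its %b %d %Y %H:%M:%S codes are treated here as equivalent to MMM d yyyy HH:mm:ss for this input. The timezone is left off so that only the year is being tested (try_parse is a hypothetical helper):

```python
import warnings
from datetime import datetime

# rt with the timezone part set aside for now; only the year matters here.
value = "Oct 21 2024 17:04:06"

def try_parse(text, fmt):
    """Return a datetime, or None if fmt does not match the whole string."""
    with warnings.catch_warnings():
        # Newer Python versions warn about day-of-month formats without a year.
        warnings.simplefilter("ignore", DeprecationWarning)
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            return None

# Rough strptime equivalents of the two parser patterns:
#   "MMM d HH:mm:ss"      -> "%b %d %H:%M:%S"
#   "MMM d yyyy HH:mm:ss" -> "%b %d %Y %H:%M:%S"
without_year = try_parse(value, "%b %d %H:%M:%S")
with_year = try_parse(value, "%b %d %Y %H:%M:%S")

print(without_year)  # None
print(with_year)     # 2024-10-21 17:04:06
```

The year-less pattern fails as soon as it reaches "2024" where it expects the hour, which matches the behavior described above.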

To handle the timezone you also need to add "ZZ" to the match. That takes some extra work, though: the match engine recognizes "+02:00" as a valid timezone format but does not recognize "GMT+02:00".

Splitting that out and handling it fully, with support for different timezones, is surprisingly complex, but the default TRENDMICRO_AV parser has a good example of it if you want to investigate.

If your data consistently shows GMT+<offset> or GMT-<offset>, you can use a more limited solution that only covers that specific format: use gsub to remove "GMT" from the string. We only want to remove "GMT" when it's followed by an offset, and RE2 doesn't support lookaheads, so the gsub has to be wrapped in an if statement to get the full effect.
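The same guarded replacement can be sketched in Python, again only as an approximation of the parser: normalize_rt and parse_rt are hypothetical helpers, and strptime's %z (which accepts "+02:00" on Python 3.7+) stands in for ZZ. The sketch tests for " GMT" followed by a sign before replacing, then parses the offset and converts to UTC to check the result:

```python
import re
from datetime import datetime, timezone

def normalize_rt(rt):
    """Drop ' GMT' only when an offset follows it (test first, then replace)."""
    if re.search(r" GMT[+-]", rt):
        # Like gsub, str.replace substitutes every occurrence.
        rt = rt.replace(" GMT", " ")
    return rt

def parse_rt(rt):
    # %z accepts "+02:00" but rejects "GMT+02:00", much like ZZ in the match.
    return datetime.strptime(normalize_rt(rt), "%b %d %Y %H:%M:%S %z")

raw = "Oct 21 2024 17:04:06 GMT+02:00"
print(normalize_rt(raw))                                   # Oct 21 2024 17:04:06 +02:00
print(parse_rt(raw).astimezone(timezone.utc).isoformat())  # 2024-10-21T15:04:06+00:00
```

Python's re does support lookaheads, so there `re.sub(r" GMT(?=[+-])", " ", rt)` would do the same job in one step; RE2 has no lookarounds, which is why the parser version needs the separate if.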

  if [rt] != "" {
    if [rt] =~ " GMT[\+-]" {
        mutate { gsub => ["rt", " GMT", " "]}
    }
    date {
      match => ["rt", "MMM d yyyy HH:mm:ss ZZ"]
      on_error => "date_error"
    }
  }

In my example I also removed the target and rebase arguments. Target defaults to event_timestamp, so it doesn't need to be specified unless you are writing to a different timestamp field. Rebase is meant for timestamps that don't include a year; your data includes one, so it shouldn't be used here.


Works great, thanks a lot!