Solved: Logstash Conditional Parsing Issue in CEF Syslog Format

Sarthakd25 · 04-09-2025 12:30 AM

I am working on a Logstash parser for CEF-formatted syslogs, which come in both a short and a long format. The long format contains a status field, whereas the short format does not.
I extract the CEF event attributes using the following Logstash filter:

grok {
  match => {
    "message" => [
      "%{abc} CEF: (xxxxxxxxxxx)\\|%{GREEDYDATA:cef_event_attributes}"
    ]
  }
  overwrite => ["message"]
}

kv {
  source => "cef_event_attributes"
  field_split => "|"
  value_split => "="
  target => "cef_fields"
}

Issue: The cef_fields.status field exists only in the long log format. However, when applying conditional logic to process this field, Logstash throws the following error:
"generic::invalid_argument: pipeline failed: filter conditional (13) failed: failed to evaluate expression: generic::invalid_argument: 'cef_fields.status' not found in state data"
Current Conditional Logic:

if [cef_fields][status] != "" { # Error occurs here if 'status' does not exist
  if [cef_fields][status] in ["xxxxxx", "xxxxxxxx"] {
    mutate {
      replace => { "xxxxxx" => "xxxxxx" }
    }
  }
  else if [cef_fields][status] == "xxxxxxx" {
    mutate {
      replace => { "xxxxxxxx.xxxxxxx" => "xxxxxx" }
    }
  }
  else {
    mutate {
      replace => { "xxxxxxxx.xxxxxx" => "xxxxxxxxxxxxx" }
    }
  }
}

Expected Behaviour: If cef_fields.status exists, apply the parsing logic. If it does not exist, the pipeline should continue without errors.

How can I correctly check for the existence of cef_fields.status before applying conditions to avoid the "not found in state data" error?

chrisd2

Hello @Sarthakd25 ,

A good (perhaps the best) practice for CBN parsers is to initialize, at the very beginning of the parser and before any parsing or mapping operations, every variable you will use in the parser as an empty string. You can then test later whether a field was extracted from the raw log without causing the "not found in state data" error.
e.g.:

filter {
  # Initialize tokens
  mutate {
    replace => {
      "token1" => ""
      "token2" => ""
      "cef_fields.status" => ""
    }
  }

  # Then extract from the raw log. This assigns values to the tokens you initialized; any token that is not extracted stays an empty string.
  grok {
    [...]
  }

  # Then you can test if the value is present without errors
  if [cef_fields][status] != "" {
    [...]
  }
}

Regards
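
For comparison, here is a minimal sketch of a second way to make the same check. It relies on the on_error option, which also appears on the date block later in this thread. The idea is to copy the field into a scratch variable and let on_error record whether the copy failed. The names status_probe and status_missing are hypothetical. The sketch assumes that a failed replace sets the on_error flag rather than aborting the pipeline, so check this against the parser syntax reference before relying on it.

```
filter {
  grok {
    match => {
      "message" => [
        "%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:host} CEF: (?P<header_version>[^|]+)\\|%{GREEDYDATA:cef_event_attributes}"
      ]
    }
    overwrite => ["message"]
  }

  kv {
    source => "cef_event_attributes"
    field_split => "|"
    value_split => "="
    target => "cef_fields"
  }

  # Probe for the field. Assumption: when cef_fields.status is absent the
  # copy fails, and on_error records that in status_missing (hypothetical name).
  mutate {
    replace => { "status_probe" => "%{cef_fields.status}" }
    on_error => "status_missing"
  }

  # Only read cef_fields.status when the probe succeeded.
  if ![status_missing] {
    if [cef_fields][status] == "RESOLVED" {
      mutate {
        replace => { "security_result.threat_status" => "CLEARED" }
      }
    }
  }
}
```

Initializing every token up front, as described above, keeps all later conditionals uniform. The probe approach may suit a parser where only one or two optional fields need guarding.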

Sarthakd25

Thanks for the response @chrisd2 ,

My logic looks like this:

filter {

  grok {
    match => {
      "message" => [
        "%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:host} CEF: (?P<header_version>[^|]+)\\|%{GREEDYDATA:cef_event_attributes}"
      ]
    }
    overwrite => ["message"]
  }

  kv {
    source => "cef_event_attributes"
    field_split => "|"
    value_split => "="
    target => "cef_fields"
  }

  mutate {
    add_field => { "security_result" => "{}" }
  }

  mutate {
    add_field => { "event1" => "{}" }
  }

  mutate {
    replace => {
      "cef_fields.status" => ""
      "cef_fields.createdOn" => ""
    }
  }

  if [cef_fields][status] != "" {
    if [cef_fields][status] in ["CREATED", "IN_PROGRESS"] {
      mutate {
        replace => { "security_result.threat_status" => "ACTIVE" }
      }
    }
    else if [cef_fields][status] == "RESOLVED" {
      mutate {
        replace => { "security_result.threat_status" => "CLEARED" }
      }
    }
    else {
      mutate {
        replace => { "security_result.threat_status" => "THREAT_STATUS_UNSPECIFIED" }
      }
    }
  }

  mutate {
    merge => {
      "event1.idm.read_only_udm.security_result" => "security_result"
    }
  }

  if [cef_fields][createdOn] != "" {
    date {
      # MM is the month; lowercase mm means minutes
      match => ["cef_fields.createdOn", "yyyy-MM-dd HH:mm:ss"]
      target => "event1.idm.read_only_udm.metadata.collected_timestamp"
      on_error => "time_stamp_wrong_format"
    }
  }
}

With this, the short log format, which has no status or createdOn field, works fine. The long format, which does contain these values, is not parsed correctly: for example, security_result.threat_status is not replaced. Where am I going wrong?

chrisd2

I see that you added the initialization snippet, but you should put it at the very beginning of the parser, just below the "filter {" line.
Where you put it in your pasted code, it overwrites the values already parsed by grok / kv, which is why you see unexpected behavior.
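
Applied to the parser posted above, the advice might look like the following minimal sketch. Only the position of the initialization block changes: it now runs before grok and kv, so extracted values are no longer reset to empty strings. Everything else is taken from the posted code, except that the date pattern is written with MM for the month, since lowercase mm denotes minutes. Any parts of the parser that were not included in the post are not shown.

```
filter {
  # Initialize first, so fields missing from the short format default to ""
  mutate {
    replace => {
      "cef_fields.status" => ""
      "cef_fields.createdOn" => ""
    }
  }

  # Extraction runs after initialization and fills in whatever is present
  grok {
    match => {
      "message" => [
        "%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:host} CEF: (?P<header_version>[^|]+)\\|%{GREEDYDATA:cef_event_attributes}"
      ]
    }
    overwrite => ["message"]
  }

  kv {
    source => "cef_event_attributes"
    field_split => "|"
    value_split => "="
    target => "cef_fields"
  }

  mutate {
    add_field => { "security_result" => "{}" }
  }

  mutate {
    add_field => { "event1" => "{}" }
  }

  # Safe for both formats: status is "" when the short format omits it
  if [cef_fields][status] != "" {
    if [cef_fields][status] in ["CREATED", "IN_PROGRESS"] {
      mutate {
        replace => { "security_result.threat_status" => "ACTIVE" }
      }
    }
    else if [cef_fields][status] == "RESOLVED" {
      mutate {
        replace => { "security_result.threat_status" => "CLEARED" }
      }
    }
    else {
      mutate {
        replace => { "security_result.threat_status" => "THREAT_STATUS_UNSPECIFIED" }
      }
    }
  }

  mutate {
    merge => {
      "event1.idm.read_only_udm.security_result" => "security_result"
    }
  }

  if [cef_fields][createdOn] != "" {
    date {
      match => ["cef_fields.createdOn", "yyyy-MM-dd HH:mm:ss"]
      target => "event1.idm.read_only_udm.metadata.collected_timestamp"
      on_error => "time_stamp_wrong_format"
    }
  }
}
```

With this ordering, a short-format log leaves both fields as empty strings and skips the two conditionals, while a long-format log reaches them with the values extracted by kv.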

Sarthakd25

Hi @chrisd2
Thank you for your help! Your solution worked perfectly.

