Breaking out grouped data into distinct rows

I'm working on a search string that will eventually (hopefully) be used in API that will query our Tenable log source type and export the results directly to JIRA to be remediated by our infrastructure team. 

The problem I'm having is that the query I've built below has some columns returning multiple values. So for instance, the current search is saying vulnerability X is showing up on asset A,B,C,D but it is placing A,B,C,D in the same cell. What I need is for vulnerability X to be broken out into separate rows for each asset:

row 1. Vuln X | asset A

row 2. Vuln X | asset B etc. 

 "Plugin ID" (which is Tenable's version of the CVE) is the field I'd like to match on. 

Search String:

events:

metadata.log_type = "TENABLE_IO" AND security_result.detection_fields.value != "INFO" security_result.rule_id = $pluginID

match:

$pluginID

outcome:

$Description = array_distinct(extensions.vulns.vulnerabilities.description)

$FQDN = array_distinct(principal.hostname)

$IP_Address = array_distinct(principal.asset.ip)

$Name = array_distinct(extensions.vulns.vulnerabilities.name)

$NetBios = array_distinct(security_result.detection_fields.value)

$OS = array_distinct(principal.asset.platform_software.platform_version)

$Plugin_ID = array_distinct(security_result.rule_id)

$Plugin_Output = array_distinct(security_result.detection_fields.value)

$Risk = array_distinct(extensions.vulns.vulnerabilities.severity)

//$See_Also = array_distinct()

$Solution = array_distinct(security_result.description)

$Synopsis = array_distinct(security_result.summary)

$Vulnerability_State = array_distinct(security_result.detection_fields.value)

$Asset_UUID = array_distinct(principal.asset.product_object_id)

Solved Solved
0 3 149
1 ACCEPTED SOLUTION

The root of the issue lies in how YaraL-2 handles repeated fields and the match section. When a repeated field, like principal.asset.ip, is used in the outcome section of a rule with a match clause, it undergoes an implicit unnesting process. This means that if an event has multiple IP addresses associated with it, YaraL-2 creates separate internal rows for each IP address, duplicating the other fields from that event.

In this case, the match clause is $pluginID, which groups events by plugin ID. Because principal.asset.ip is a repeated field, and likely contains multiple assets for the same vulnerability, the aggregation in the outcome using array_distinct($IP_Address) will still combine all unique IPs for that plugin ID into a single array element.

To achieve the desired output, where each vulnerability is listed on a separate row for each associated asset, you need to modify the match section to include the asset identifier. Here's how the rule can be adjusted:

events:
  metadata.log_type = "TENABLE_IO" 
  security_result.detection_fields.value != "INFO" 
  $pluginID = security_result.rule_id
  $assetID = principal.asset.product_object_id

match:
  $pluginID, $assetID  // Group by Plugin ID AND Asset ID

outcome:
  $Description = extensions.vulns.vulnerabilities.description
  $FQDN = principal.hostname
  $IP_Address = principal.asset.ip  // No need for array_distinct as it's now a 1:1 relationship
  $Name = extensions.vulns.vulnerabilities.name
  $NetBios = security_result.detection_fields.value
  $OS = principal.asset.platform_software.platform_version
  $Plugin_ID = security_result.rule_id
  $Plugin_Output = security_result.detection_fields.value
  $Risk = extensions.vulns.vulnerabilities.severity
  //$See_Also = array_distinct() 
  $Solution = security_result.description
  $Synopsis = security_result.summary
  $Vulnerability_State = security_result.detection_fields.value
  $Asset_UUID = principal.asset.product_object_id

By including $assetID in the match clause, the grouping will now occur for each unique combination of plugin ID and asset ID. As a result, the $IP_Address variable will only contain the single IP address associated with that specific plugin ID and asset ID combination, effectively creating the desired one-to-one relationship and separate rows in the output. Note that array_distinct is no longer needed for most fields since there should now be a 1:1 relationship.

Remember that in a rule with a match section, if an outcome variable depends on a repeated field, or a field of an event variable that is part of a match, an aggregation function must be used.

View solution in original post

3 REPLIES 3

The root of the issue lies in how YaraL-2 handles repeated fields and the match section. When a repeated field, like principal.asset.ip, is used in the outcome section of a rule with a match clause, it undergoes an implicit unnesting process. This means that if an event has multiple IP addresses associated with it, YaraL-2 creates separate internal rows for each IP address, duplicating the other fields from that event.

In this case, the match clause is $pluginID, which groups events by plugin ID. Because principal.asset.ip is a repeated field, and likely contains multiple assets for the same vulnerability, the aggregation in the outcome using array_distinct($IP_Address) will still combine all unique IPs for that plugin ID into a single array element.

To achieve the desired output, where each vulnerability is listed on a separate row for each associated asset, you need to modify the match section to include the asset identifier. Here's how the rule can be adjusted:

events:
  metadata.log_type = "TENABLE_IO" 
  security_result.detection_fields.value != "INFO" 
  $pluginID = security_result.rule_id
  $assetID = principal.asset.product_object_id

match:
  $pluginID, $assetID  // Group by Plugin ID AND Asset ID

outcome:
  $Description = extensions.vulns.vulnerabilities.description
  $FQDN = principal.hostname
  $IP_Address = principal.asset.ip  // No need for array_distinct as it's now a 1:1 relationship
  $Name = extensions.vulns.vulnerabilities.name
  $NetBios = security_result.detection_fields.value
  $OS = principal.asset.platform_software.platform_version
  $Plugin_ID = security_result.rule_id
  $Plugin_Output = security_result.detection_fields.value
  $Risk = extensions.vulns.vulnerabilities.severity
  //$See_Also = array_distinct() 
  $Solution = security_result.description
  $Synopsis = security_result.summary
  $Vulnerability_State = security_result.detection_fields.value
  $Asset_UUID = principal.asset.product_object_id

By including $assetID in the match clause, the grouping will now occur for each unique combination of plugin ID and asset ID. As a result, the $IP_Address variable will only contain the single IP address associated with that specific plugin ID and asset ID combination, effectively creating the desired one-to-one relationship and separate rows in the output. Note that array_distinct is no longer needed for most fields since there should now be a 1:1 relationship.

Remember that in a rule with a match section, if an outcome variable depends on a repeated field, or a field of an event variable that is part of a match, an aggregation function must be used.

I very much appreciate your help here -- I was able to run the search successfully after aggregating some of the outcome variables per your recommendation. Now I need to start digging into how the API portion is going to work ๐Ÿ™‚ 

Good luck. If you have API questions, just post here and I'll try to help. It can be a steep learning curve.