Parser for field extraction and concatenation

Hi, I have nested JSON data from a log source. It has a URL field, but that field can't be mapped directly to "url_back_to_product". We need to extract part of the URL and concatenate it to a standard static base URL.

Sample JSON:

{
  "info": {
    "data": "https://myvendor.com:443/string/string/string-abcd1-abcd2-abcd3-abcd4-abcd5"
  }
}

Only the final part of the URL (string-abcd1-abcd2-abcd3-abcd4-abcd5) has to be extracted and appended to a standard static base URL: https://mycompany.com/strings/

Expected field extraction:

url_back_to_product - https://mycompany.com/strings/string-abcd1-abcd2-abcd3-abcd4-abcd5

Please advise on the possible configurations.

Accepted solution

Hi @Aswin_Asokan 

You'll need to use the grok function to match the value you want to capture. The captured value is assigned to a field that is stored in state, which you can then concatenate onto a static string.

Grok - https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
Supported patterns - https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns

Grok is built on top of regular expressions, so if no predefined pattern fits, you can write your own regex for the capture group.

1. Add a grok function whose regex pattern contains a named capture group that stores the required string.

2. Append that new field to the known URL string and assign the concatenated string to the UDM field.
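The two steps above can be sketched in Python to check the logic locally before writing the parser. This is only an analogy: a dict stands in for the parser's state, Python's `re` is assumed to behave like the parser's regex engine for a pattern this simple, and the custom pattern `[^/]+$` (an assumption, not taken from the answer) captures whatever follows the last `/`.

```python
import re

BASE_URL = "https://mycompany.com/strings/"

# Step 1: a hand-written pattern with a named capture group for the last path segment.
LAST_SEGMENT = re.compile(r"/(?P<url_back_to_product>[^/]+)$")


def extract(state: dict, source_field: str) -> None:
    """Mimic a grok capture: store the named group in state, or set a failure flag."""
    match = LAST_SEGMENT.search(state.get(source_field, ""))
    if match:
        state["url_back_to_product"] = match.group("url_back_to_product")
    else:
        state["no_url_back_to_product"] = True


def build(state: dict) -> dict:
    """Step 2: concatenate the captured value onto the static URL (like a mutate replace)."""
    if state.get("no_url_back_to_product"):
        return {}
    return {"metadata.url_back_to_product": BASE_URL + state["url_back_to_product"]}


state = {"info.data": "https://myvendor.com:443/string/string/string-abcd1-abcd2-abcd3-abcd4-abcd5"}
extract(state, "info.data")
print(build(state))
# {'metadata.url_back_to_product': 'https://mycompany.com/strings/string-abcd1-abcd2-abcd3-abcd4-abcd5'}
```

Anchoring on the final `/` means the capture does not depend on how many path segments precede the value.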

The example below uses a regex to capture the value into the named capture group `url_back_to_product`, then appends that value to the static URL and assigns the result to the UDM field:

grok {
  match => {
    # Capture the final part of the path into url_back_to_product.
    "info.data" => "https://myvendor\\.com:443/\\S+/\\S+/(?P<url_back_to_product>.+)"
  }
  on_error => "no_url_back_to_product"
  overwrite => ["url_back_to_product"]
}
mutate {
  replace => {
    "event.idm.read_only_udm.metadata.url_back_to_product" => "https://mycompany.com/strings/%{url_back_to_product}"
  }
  # Separate error flag, so it is not confused with the captured token.
  on_error => "url_back_to_product_replace_failed"
}
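One way to sanity-check the grok pattern above is to run the same regex in Python. The doubled backslashes (`\\S`) in the parser config become single backslashes in a Python raw string. Treating Python's `re` as equivalent to the parser's regex engine for this pattern is an assumption. The shorter URL is hypothetical and shows that the pattern needs at least three path segments after the host.

```python
import re

# Mirrors the grok pattern; the "." in the host name is escaped so it only matches a literal dot.
PATTERN = re.compile(r"https://myvendor\.com:443/\S+/\S+/(?P<url_back_to_product>.+)")

BASE_URL = "https://mycompany.com/strings/"


def url_back_to_product(data):
    """Return the concatenated URL, or None where grok would set its on_error flag."""
    match = PATTERN.search(data)
    if match is None:
        return None
    return BASE_URL + match.group("url_back_to_product")


sample = "https://myvendor.com:443/string/string/string-abcd1-abcd2-abcd3-abcd4-abcd5"
print(url_back_to_product(sample))
# https://mycompany.com/strings/string-abcd1-abcd2-abcd3-abcd4-abcd5

short = "https://myvendor.com:443/string/string-abcd1"  # hypothetical: only two path segments
print(url_back_to_product(short))
# None
```

Because `\S+` also matches `/`, the greedy first `\S+` absorbs any extra leading segments, so deeper paths still yield the final segment.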

 


Replies

Field extractors have some limitations; what you want to do can only be done with a CBN parser snippet.

You might want to take a look at the split function: https://cloud.google.com/chronicle/docs/reference/parser-syntax#split_function
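The split idea can be illustrated in Python: split the URL on `/` and keep the last non-empty element. This is a local analogy only. How the parser's split function stores its result, and whether it lets you address the last element directly, is not shown here and should be checked against the linked reference. Skipping empty elements (from `//` or a trailing `/`) is an assumption about what the data might contain.

```python
BASE_URL = "https://mycompany.com/strings/"


def last_segment(url: str) -> str:
    """Split on '/' and return the final non-empty element, or '' if there is none."""
    parts = [p for p in url.split("/") if p]  # drop empty strings, e.g. from "//" or a trailing "/"
    return parts[-1] if parts else ""


sample = "https://myvendor.com:443/string/string/string-abcd1-abcd2-abcd3-abcd4-abcd5"
print(BASE_URL + last_segment(sample))
# https://mycompany.com/strings/string-abcd1-abcd2-abcd3-abcd4-abcd5
```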


Hi @alube @cbryant, thanks for your input. I took the grok approach and used the URIPATH pattern together with a regex capture group to extract the required value.

if [info][data] != "" {
  grok {
    match => {
      "info.data" => "%{URIPATH:full_path}(?<url_back_to_product>incident-[^/]+)"
    }
    on_error => "no_url_back_to_product"
    overwrite => ["url_back_to_product"]
  }
  mutate {
    replace => {
      "event.idm.read_only_udm.metadata.url_back_to_product" => "https://mycompany.com/strings/%{url_back_to_product}"
    }
    # Separate error flag, so it is not confused with the captured token.
    on_error => "url_back_to_product_replace_failed"
  }
}

Many thanks again!
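The final pattern from this reply can also be checked locally. In the Python sketch below, URIPATH is approximated with a character class modelled on the Logstash grok-patterns definition. That is an assumption, since the parser's built-in URIPATH may differ. The input URL is hypothetical: it assumes the real IDs start with `incident-` rather than the `string-` used in the sanitized sample.

```python
import re

# Approximation of grok's URIPATH (assumption; the parser's built-in definition may differ).
URIPATH = r"(?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+"

# Same shape as the final grok pattern: URIPATH, then a capture that starts with "incident-".
PATTERN = re.compile("(?P<full_path>" + URIPATH + ")(?P<url_back_to_product>incident-[^/]+)")


def url_back_to_product(data):
    """Return the concatenated URL, or None if the field is empty or the pattern misses."""
    if not data:  # mirrors the `if [info][data] != ""` guard
        return None
    match = PATTERN.search(data)
    if match is None:
        return None  # the parser would set no_url_back_to_product instead
    return "https://mycompany.com/strings/" + match.group("url_back_to_product")


print(url_back_to_product("https://myvendor.com:443/string/string/incident-abcd1-abcd2"))
# https://mycompany.com/strings/incident-abcd1-abcd2
```

Because the capture is tied to the literal `incident-` prefix, a value such as the sanitized `string-abcd1-...` sample would not match this pattern.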