Solved: escaping backslash

mountaincode2 · 01-29-2024 04:21 AM

The Parser syntax reference documentation

https://cloud.google.com/chronicle/docs/reference/parser-syntax#gsub_function

says the following:

> Both the literal backslash and the backslash that indicates that it should be "escaped" must themselves be escaped, so if you want to refer to a literal backslash you need four backslashes (\\\\).

"fieldname3", "[\\\\?#-]", "."

What does this mean? Why do you need four backslashes to refer to a literal backslash?

My understanding:

If you want to search for a literal backslash within your text, you need to escape it to remove its "escape character" function. This can be done by using a backslash before the backslash: \\.

The same doc also says:

> Regular expression syntax for parsers uses two slashes instead of the single slash for typical regex pattern matches.

What does the sentence above mean? Is there a clearer way to say this?

mountaincode2

Can anyone please help me with this?

Thank you!

herrald

They are just trying to indicate that there are two levels of escaping going on. The first level is the parser itself. The second is the RE2 regular expression engine. Four consecutive backslashes in the parser config (\\\\) evaluate to two consecutive backslashes (\\) being sent to the RE2 regular expression engine. The regex engine in turn recognizes two consecutive backslashes as a single, literal backslash.

The second excerpt you provided is attempting to reflect the same concept. That is, if the intent is for the RE2 regex engine to receive a character class such as \s, then it needs to be represented in the parser config as \\s. The parsing engine converts that to \s, and the RE2 engine sees \s as intended. I do think it is erroneous in that it incorrectly refers to backslashes as slashes. That's a mistake I will report. Other than that, though, I think it conveys the functionality pretty well.
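To make the two levels concrete, here is a minimal Python sketch. It is only an illustration: `parser_unescape` is a hypothetical stand-in for the parser's own pass over the config string (assumed here to do nothing except collapse each escaped pair of backslashes), and Python's `re` module stands in for RE2, which treats these simple patterns the same way.

```python
import re

def parser_unescape(config_text):
    # Hypothetical stand-in for level 1 (the parser): every escaped
    # pair of backslashes in the config collapses to one backslash.
    return config_text.replace("\\\\", "\\")

# Level 1: the pattern as typed in the parser config (docs example).
config_pattern = r"[\\\\?#-]"
# Level 2: what the regex engine receives after the parser's pass.
regex_pattern = parser_unescape(config_pattern)
print(regex_pattern)   # [\\?#-]

# The regex engine reads the remaining pair as one literal backslash,
# so the class matches a backslash, "?", "#", or "-".
result = re.sub(regex_pattern, ".", r"a\b?c#d-e")
print(result)          # a.b.c.d.e

# Same idea for a character class: \\s in the config becomes \s.
whitespace_pattern = parser_unescape(r"\\s")
print(re.sub(whitespace_pattern, "_", "a b"))   # a_b
```

The real parser may process other escape sequences as well; the sketch only covers the backslash case discussed here.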

mountaincode2

Hi @herrald

Thank you for your update.

I think I am inching closer to an understanding, so I guess the key is the following:

> Regular expression syntax for parsers uses two slashes instead of the single slash for typical regex pattern matches.

But given the above, if the intent is for the RE2 engine to receive the whitespace character class \s, shouldn't it be represented in the parser config as \\\s? The parsing engine would then strip away the first two backslashes (\\) and send \s to the RE2 engine, which would interpret \s as a whitespace character.

Please let me know where my train of thought is going wrong here.

Thank you!

mountaincode2

Or is it that every `\` in the regex engine is represented as a double backslash (`\\`) in the parser engine?

herrald

Yes. Your last comment, "every `\` in regex engine is represented as double backslash (`\\`) in parser engine," is a very good practical way to look at it.
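As a rough sketch of that rule of thumb, the conversion from an ordinary regex to its parser-config form can be written as a one-line helper. The name `to_parser_config` is hypothetical and not part of any Chronicle tooling; it simply doubles every backslash.

```python
def to_parser_config(regex_pattern):
    # Rule of thumb from this thread: each backslash the regex engine
    # should see is written as two backslashes in the parser config.
    return regex_pattern.replace("\\", "\\\\")

examples = [r"\s", r"\d+", r"\\", r"[\\?#-]"]
for pattern in examples:
    print(pattern, "->", to_parser_config(pattern))
```

So a literal backslash, which is two backslashes in plain regex, ends up as four in the parser config, matching the documentation.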

mountaincode2

Thank you!
