The Parser syntax reference documentation says the following:
> Both the literal backslash and the backslash that indicates that it should be "escaped" must themselves be escaped, so if you want to refer to a literal backslash you need four backslashes ( \\\\ ).
"fieldname3", "[\\\\?#-]", "."
What does this mean? Why do you need four backslashes to refer to a literal backslash?
The same doc also says:
> Regular expression syntax for parsers uses two slashes instead of the single slash for typical regex pattern matches.
What does the sentence above mean? Is there a different way to say this that makes the point clearer?
Can anyone please help me with this?
Thank you!
They are just trying to indicate that there are two levels of escaping going on. The first level is the parser itself. The second is the RE2 regular expression engine. Four consecutive backslashes in the parser config (\\\\) evaluate to two consecutive backslashes (\\) being sent to the RE2 regular expression engine. The regex engine in turn recognizes two consecutive backslashes as a single, literal backslash.
The second excerpt you provided is attempting to reflect the same concept. That is, if the intent is for the RE2 regex engine to receive a character class such as \s, then it needs to be represented in the parser config as \\s. The parsing engine converts that to \s, and the RE2 engine sees \s as intended. I do think it is erroneous in that it incorrectly refers to backslashes as slashes. That's a mistake I will report. Other than that, though, I think it conveys the functionality pretty well.
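To make the two levels concrete, here is a minimal Python sketch. It is not the real parser: `parser_unescape` is a hypothetical stand-in that assumes the parser simply collapses each pair of backslashes into one, and Python's `re` module stands in for the RE2 engine (both treat `\\` and `\s` the same way for these patterns).

```python
import re


def parser_unescape(config_text: str) -> str:
    """Hypothetical model of level 1 (the parser).

    Assumption: each escaped pair of backslashes collapses to one backslash.
    """
    return config_text.replace("\\\\", "\\")


# Raw strings hold exactly the characters typed in the parser config.
config_class = r"[\\\\?#-]"  # four backslashes, as in the documentation example
config_space = r"\\s"        # two backslashes followed by s

# Level 1: what the regex engine receives after the parser has unescaped.
regex_class = parser_unescape(config_class)
regex_space = parser_unescape(config_space)

print(regex_class)  # [\\?#-]
print(regex_space)  # \s

# Level 2: the regex engine reads \\ as one literal backslash
# and \s as the whitespace class.
matches_backslash = re.search(regex_class, "a\\b") is not None
matches_space = re.search(regex_space, "a b") is not None
print(matches_backslash, matches_space)  # True True
```
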
Hi @herrald
Thank you for your update.
I think I am inching closer to an understanding. So I guess the key is the following:
> Regular expression syntax for parsers uses two slashes instead of the single slash for typical regex pattern matches.
But given the above, if the intent is for the RE2 engine to receive a whitespace character \s, then shouldn't it be represented in the parser config as \\\s? The parsing engine would then strip away the first two backslashes (\\) and send \s to the RE2 engine. The RE2 engine would then interpret \s as a whitespace character.
Please let me know where my train of thought is going wrong here.
Thank you!
Or is it that every `\` in regex engine is represented as double backslash (`\\`) in parser engine?
Yes. Your last comment: "every `\` in regex engine is represented as double backslash ('\\') in parser engine." is a very good practical way to look at it.
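That rule of thumb can be sketched in Python. Both helpers are hypothetical illustrations, not part of any parser API: `to_parser_config` doubles every backslash, and `parser_unescape` assumes the parser collapses each backslash pair into one. How the real parser treats an unpaired backslash is not stated in this thread; the sketch leaves it untouched, which is an assumption. Python's `re` module stands in for RE2.

```python
import re


def to_parser_config(regex: str) -> str:
    """Hypothetical helper: write each regex backslash as two in the config."""
    return regex.replace("\\", "\\\\")


def parser_unescape(config_text: str) -> str:
    """Hypothetical model of the parser: each backslash pair becomes one."""
    return config_text.replace("\\\\", "\\")


wanted = r"\s+\d"                     # the regex the engine should receive
config_form = to_parser_config(wanted)
print(config_form)                    # \\s+\\d

# Doubling and then unescaping gives back the intended regex.
round_trip = parser_unescape(config_form)
print(round_trip == wanted)           # True

# The three-backslash idea from earlier in the thread: under this model only
# the first pair collapses, so the engine receives \\s, which means a literal
# backslash followed by the letter s rather than whitespace.
three = r"\\\s"
seen_by_regex = parser_unescape(three)
print(seen_by_regex)                  # \\s
print(re.search(seen_by_regex, "a b") is None)  # True
```
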
Thank you!