I am trying to learn regex in PHP and I'm stuck here. My question may seem silly, but please do explain.
I went through a link:
Extra backslash needed in PHP regexp pattern
But I just could not understand something:
In the answer he mentions two statements:
2 backslashes are used for unescaping in a string ("\\\\" -> \\)
1 backslash is used for unescaping in the regex engine (\\ -> \)
My question:
What does the word "unescaping" actually mean? What is the purpose of unescaping? Why do we need 4 backslashes to include one in the regex?
The backslash has a special meaning in both regexen and PHP. In both cases it is used as an escape character. For example, if you want to write a literal quote character inside a PHP string literal, this won't work:
$str = ''';
PHP would get "confused" about which ' ends the string and which is part of the string. That's where \ comes in:
$str = '\'';
It escapes the special meaning of ', so instead of terminating the string literal, it is now just a normal character in the string. There are more escape sequences as well, such as \n for a newline in double-quoted strings.
This now means that \ is itself a character with a special meaning. To escape this conundrum when you want to write a literal \, you'll have to escape literal backslashes as \\:
$str = '\\'; // string literal representing one backslash
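If you want to convince yourself of this, checking the string length is a quick way to do it. A small sketch (the variable names are just for illustration):

```php
<?php
// In a single-quoted string, \\ is the escape sequence for one backslash.
$one = '\\';     // two characters in the source, one in the string
$two = '\\\\';   // four characters in the source, two in the string

var_dump(strlen($one)); // int(1)
var_dump(strlen($two)); // int(2)
echo $two, "\n";        // prints \\
```

So what you type in the source and what ends up in the string are two different things.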
This works the same in both PHP and regexen. If you want to write a literal backslash in a regex, you have to write /\\/. Now, since you're writing your regexen as PHP strings, you need to double escape them:
$regex = '/\\\\/';
Each pair of \\ is first reduced to one \ by the PHP string escaping mechanism, so the actual regex the engine receives is /\\/, which is a regex that means "one backslash".
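Both layers can be seen by printing the pattern and then using it. This is only a minimal sketch; the subject string C:\temp is made up for the example:

```php
<?php
$regex = '/\\\\/';      // PHP source: four backslashes between the delimiters
echo $regex, "\n";      // prints /\\/ (this is what the regex engine receives)

$subject = 'C:\\temp';  // the actual string is C:\temp
var_dump(preg_match($regex, $subject)); // int(1), the single backslash matched

// Replace every backslash with a forward slash.
echo preg_replace($regex, '/', $subject), "\n"; // prints C:/temp
```

PHP turns the four backslashes into two, and the regex engine then turns those two into the one literal backslash it looks for.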
I think you can use preg_quote():
This function escapes the characters that are special in a regex, so you can pass the input as it is, without escaping it yourself:
<?php
$string = "online 24/7. Only for \o/";
$escaped_string = preg_quote($string, "/"); // the 2nd parameter is optional; pass your regex delimiter so it gets escaped too
echo $escaped_string; // prints: online 24\/7\. Only for \\o\/
?>
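To actually use the escaped text, I'd put it between delimiters and hand it to preg_match(). A rough sketch, where $needle and $pattern are just names I picked for the example:

```php
<?php
$needle  = 'Only for \o/';  // literal text; single-quoted, so \o stays as it is
$pattern = '/' . preg_quote($needle, '/') . '/';

echo $pattern, "\n";        // prints /Only for \\o\//
var_dump(preg_match($pattern, 'online 24/7. Only for \o/')); // int(1)
```

preg_quote() only takes care of the regex layer. If you type the text as a PHP string literal, the PHP string escaping rules still apply to it first.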