Double-escaped hexadecimal characters, i.e. \\x80 - \\xFF

I have finally started to understand the context behind escaping hexadecimal characters such as \x80. The documentation talks about the escape sequences, but I can also see that some regular expressions use double backslashes, such as \\x80 - \\xFF.

What's the difference between \\x80 - \\xFF and \x80 - \xFF when using something like preg_replace?

When using preg_ functions, your string is parsed twice: first by the PHP compiler, and then by the PCRE engine. So if you have, for example:

preg_match("/\x80/"....)

the compiler turns it into

preg_match("/�/"....) // let � be chr(0x80)

and passes this to PCRE. When you have two slashes:

preg_match("/\\x80/"....)

the compiler turns the string into

preg_match("/\x80/"....)

and then it's the PCRE engine that converts this to the literal character �.
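As a minimal sketch of the two paths above (the variable names are hypothetical, and the pattern is assumed to run in PCRE's default byte mode, without the /u modifier), both literals end up matching the same single byte:

```php
<?php
// Both patterns match the single byte 0x80; they differ only in which
// layer (PHP or PCRE) resolves the \x80 escape.
$resolvedByPhp  = "/\x80/";   // PHP resolves the escape: '/', byte 0x80, '/'
$resolvedByPcre = "/\\x80/";  // PHP yields the text /\x80/; PCRE resolves it

$subject = chr(0x80);

var_dump(strlen($resolvedByPhp));                 // int(3)
var_dump(strlen($resolvedByPcre));                // int(6)
var_dump(preg_match($resolvedByPhp, $subject));   // int(1)
var_dump(preg_match($resolvedByPcre, $subject));  // int(1)
```

The differing string lengths show which layer did the unescaping, even though the match result is the same.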

It doesn't make a difference in this particular case, but consider:

preg_match("/\x5B/"....)

after compilation

preg_match("/[/"....)

and PCRE fails, because of the dangling metacharacter [. Now if you escape the backslash

preg_match("/\\x5B/"....)

it's compiled to

preg_match("/\x5B/"....)

which makes PCRE happy, because it understands that [ should be taken literally.
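The [ case can be checked with a short sketch like the following. The subject string "a[b" is made up for illustration, and the @ operator is used only to keep the compilation warning out of the output:

```php
<?php
// "\x5B" is turned into "[" by PHP, so PCRE sees an unterminated character
// class, emits a warning and preg_match() returns false.
$broken = @preg_match("/\x5B/", "a[b");
var_dump($broken);  // bool(false)

// "\\x5B" reaches PCRE as \x5B, which PCRE treats as a literal "[".
$ok = preg_match("/\\x5B/", "a[b");
var_dump($ok);      // int(1)
```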

How exactly PHP compiles your string depends on the quotes you use: double quotes, single quotes, heredocs or nowdocs. See the docs for details. A simple rule of thumb is to use single quotes when possible; if you have to use double quotes (for variable interpolation), escape everything twice, even if there's technically no need (e.g. "\\b$word\\b").
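A rough illustration of that rule of thumb, assuming a hypothetical search term in $word. preg_quote() is added in the single-quoted variant as a precaution and is not part of the original advice:

```php
<?php
$word = 'cat';  // hypothetical search term

// Single quotes: PHP does no interpolation and leaves \b and \x untouched.
$single = '/\b' . preg_quote($word, '/') . '\b/';

// Double quotes with interpolation: every backslash is doubled.
$double = "/\\b$word\\b/";

var_dump($single === $double);                 // bool(true)
var_dump(preg_match($double, 'the cat sat'));  // int(1)
var_dump(preg_match($double, 'concatenate'));  // int(0)
var_dump(strlen('\x80'));                      // int(4): no unescaping in single quotes
```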

To write the hex character 0x80, you use the escape character \, which gives \x80.
In a PHP string, \ escapes special characters. In the string "$var", PHP will try to insert the variable $var (because the string uses double quotes). To escape the $, you write "\$var", and the output is just the plain string $var.
To write a \ itself in a string (no matter whether it uses " or '), you use the same escape character \. So you write \\ to output \.
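These escaping rules can be seen in a few lines. This is only a sketch, and $var and its value are placeholders:

```php
<?php
$var = 'value';  // hypothetical variable

$interpolated = "$var";   // variable is inserted
$escaped      = "\$var";  // the $ is escaped, so no interpolation happens

var_dump($interpolated);  // string(5) "value"
var_dump($escaped);       // string(4) "$var"

// A backslash is written as \\ in both quote styles and is one character long.
var_dump(strlen("\\"));   // int(1)
var_dump(strlen('\\'));   // int(1)
var_dump("\\" === '\\');  // bool(true)
```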

If you write "\x80" in a double-quoted string, PHP replaces the escape sequence with the single byte 0x80, so the output contains no \ at all. Then you escape the \ with another \: "\\x80" outputs \x80.

So to summarize everything:
\x80 is a hex escape sequence; when you want it to appear literally inside a string (for example, so that the regex engine receives it), you write \\x80.

Just some fun:

PHP that outputs a JS function which alerts the text \x80 - \xFF:

echo "function alertHex(){
    alert('\\\\x80 - \\\\xFF');
}";

Why four backslashes? First the PHP string is unescaped, giving alert('\\x80 - \\xFF'); then the JS string is unescaped, giving \x80 - \xFF.
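The PHP half of this can be checked by comparing the double-quoted string with a nowdoc, which does no escape processing at all. This is a sketch; the variable names and the JS heredoc label are arbitrary:

```php
<?php
// What PHP actually emits for the alert() line of the snippet above.
$emitted = "alert('\\\\x80 - \\\\xFF');";

// A nowdoc is taken verbatim, so this is the JavaScript source as the browser sees it.
$expected = <<<'JS'
alert('\\x80 - \\xFF');
JS;

var_dump($emitted === $expected);        // bool(true)
var_dump(substr_count($emitted, '\\'));  // int(4): two backslashes per escape remain
```

JavaScript then performs the second round of unescaping when it parses the string literal.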
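The same applies to preg_replace. To match a literal \, a literal $, one letter a-z and then a literal [ and ], the regex pattern is \\\$[a-z]\[\]; written as a single-quoted PHP string (with the / delimiters that PCRE requires) it becomes preg_replace('/\\\\\$[a-z]\\[\\]/', '', $str);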
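A small runnable version of that last call, as a sketch only: the / delimiters are added here because preg_ functions require them, and the subject strings are invented for illustration:

```php
<?php
// Single-quoted PHP string; PCRE receives the pattern \\\$[a-z]\[\]
$pattern = '/\\\\\$[a-z]\\[\\]/';

// Subject containing a backslash, a dollar sign, one letter and [].
$withBackslash = 'x\$a[]y';
// Same text without the leading backslash, so the pattern should not match.
$withoutBackslash = 'x$a[]y';

$stripped  = preg_replace($pattern, '', $withBackslash);
$untouched = preg_replace($pattern, '', $withoutBackslash);

var_dump($stripped);   // string(2) "xy"
var_dump($untouched);  // string(6) "x$a[]y"
```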