I'm unsure if I should be using preg_match, preg_match_all, or preg_split with delim capture. I'm also unsure of the correct regex.
Given the following:
$string = " ok 'that\\'s cool' \"yeah that's \\\"cool\\\"\"";
I want to get an array with the following elems:
[0] = "ok"
[1] = "that\'s"
[2] = "yeah that's \"cool\""
You can not do this with a regular expression because you're trying to parse a non-context-free grammar. Write a parser.
Outline:
\
remember it."
or '
check if the previous character was \
. You now have your delimiting condition.Your desired result set seems to trim spaces, you also lost a couple of the \
s, perhaps this is a mistake but it can be important.
I would expect:
[0] = " ok " // <-- spaces here
[1] = "that\\'s cool"
[2] = " \"yeah that's \\\"cool\\\"\"" // leading space here, and \" remains
Actually, you might be surprised to find that you can do this in regex:
preg_match_all("((?|\"((?:\\\\.|[^\"])+)\"|'((?:\\\\.|[^'])+)'|(\w+)))",$string,$m);
The desired result array will be in $m[1]
.
You can do it with a regex:
$pattern = <<<'LOD'
~
(?J)
# Definitions #
(?(DEFINE)
(?<ens> (?> \\{2} )+ ) # even number of backslashes
(?<sqc> (?> [^\s'\\]++ | \s++ (?!'|$) | \g<ens> | \\ '?+ )+ ) # single quotes content
(?<dqc> (?> [^\s"\\]++ | \s++ (?!"|$) | \g<ens> | \\ "?+ )+ ) # double quotes content
(?<con> (?> [^\s"'\\]++ | \s++ (?!["']|$) | \g<ens> | \\ ["']?+ )+ ) # content
)
# Pattern #
\s*+ (?<res> \g<con>)
| ' \s*+ (?<res> \g<sqc>) \s*+ '?+
| " \s*+ (?<res> \g<dqc>) \s*+ "?+
~x
LOD;
$subject = " ok 'that\\'s cool' \"yeah that's \\\"cool\\\"\"";
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
foreach($matches as $match) {
var_dump($match['res']);
}
I made the choice to trim spaces in all results, then " abcd "
will give abcd
. This pattern allows all backslashes you want, anywhere you want. If a quoted string is not closed at the end of the string, the end of the string is considered as the closing quote (this is why i have made the closing quotes optional). So, abcd " ef'gh
will give you abcd
and ef'gh