An example probably explains it best.
a|b|c
needs to become array('a', 'b', 'c')
a|\||\}
needs to become array('a', '\|', '\}')
ab\}aaa|ae\|aa
needs to become array('ab\}aaa', 'ae\|aa')
The string to be transformed can contain any characters, but three of them are "special": |, { and }. A special character is treated as an ordinary character only if it is escaped with \. An unescaped | separates options; an escaped | must be treated as an option or part of one (like any other character). { and } are always going to be escaped at this point.
The catch is that I need to do this without using regular expressions.
I have been struggling with this one for 10 hours, and I sure hope someone has a simple answer to this.
***Edit
My plan was to search for a | and, if one is found, check whether it is escaped. If it is, continue searching for the next one. When I find an unescaped |, I take the first option off the string and continue the same way until there are no | left.
while ($positionFound != 1) {
    $intPrevPosition = $intPosition;
    $intPosition = strpos($strTemp, '|', $intPosition);
    if ($intPosition === false || (substr_count($strTemp, '|') == 1 && $strTemp{$intPosition + $intPrevPosition - 1} == '\\')) {
        $arrOptions[] = $strTemp;
        $positionFound = 1;
    }
    elseif ($strTemp{$intPosition + $intPrevPosition - 1} != '\\') {
        $intPosition = $intPrevPosition + $intPosition;
        $arrOptions[] = substr(substr($strTemp, 0, $intPosition + 1), 0, -1);
        $strTemp = substr($strTemp, $intPosition + 1);
        $intPosition = 0;
    }
}
Write a simple parser:
$input = "ab\\}aaa|ae\\|aa"; // ab\}aaa|ae\|aa
$token = "";
$last_char = "";
$len = strlen($input);
$tokens = array();
for ($i = 0; $i < $len; $i += 1) {
    $char = $input[$i];
    if ($char === "|" && $last_char !== "\\") {
        // unescaped separator: close the current token, do not keep the |
        $tokens[] = $token;
        $token = "";
    } else {
        $token .= $char;
    }
    $last_char = $char;
}
$tokens[] = $token; // capture last token
var_dump($tokens);
// array('ab\}aaa', 'ae\|aa')
Note that with this implementation the escape also triggers on ab\\|cd: the output is array("ab\\|cd") and not array("ab\\", "cd").
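If an escaped backslash should not escape the character after it, one option is to track an "escaped" flag instead of comparing against the previous character. The following is only a sketch under that assumption; the function name split_unescaped and its parameters are made up for illustration, and backslashes are kept in the output as in the question's examples.

```php
<?php
// Hypothetical helper: split on an unescaped delimiter, treating "\\" as an
// escaped backslash that does NOT escape the character following it.
function split_unescaped(string $input, string $delimiter = "|", string $escape = "\\"): array
{
    $tokens = array();
    $token = "";
    $escaped = false; // true when the previous character was an active escape
    $len = strlen($input);
    for ($i = 0; $i < $len; $i += 1) {
        $char = $input[$i];
        if ($escaped) {
            // the previous backslash makes this character literal
            $token .= $char;
            $escaped = false;
        } elseif ($char === $escape) {
            // keep the backslash in the token and escape the next character
            $token .= $char;
            $escaped = true;
        } elseif ($char === $delimiter) {
            // unescaped separator: close the current token
            $tokens[] = $token;
            $token = "";
        } else {
            $token .= $char;
        }
    }
    $tokens[] = $token; // capture last token
    return $tokens;
}

var_dump(split_unescaped("ab\\\\|cd")); // input is ab\\|cd
// array('ab\\', 'cd')
```

With this variant a|\||\} still yields array('a', '\|', '\}'), because a single backslash continues to escape the | that follows it.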
Nested parser
For ease of understanding I'm going to forget about the \ rules for now.
Assume you have a{b|c}|{d|e} and the expected output is abd, abe, acd, ace. The first thing to do is translate a{b|c}|{d|e} into:
array(
    "a",
    array("b", "c"),
    array("d", "e")
)
If the input is ab{cd|ef}|{gh|ij}, we want:
array(
    "ab",
    array("cd", "ef"),
    array("gh", "ij")
)
And of course multiple levels of nesting should also work. For a{b|{c|d}}|e we want:
array(
    "a",
    array("b", array("c", "d")),
    "e"
)
Here is the parse function. I haven't quite figured out how to combine the results back together yet.
function parse($string, $i = 0) {
    $token = "";
    $tokens = array();
    for (; $i < strlen($string); $i += 1) {
        $char = $string[$i];
        if ($char === "{") {
            // start of a nested group: flush the pending token, then recurse
            if ($token !== "") {
                $tokens[] = $token;
            }
            $token = "";
            $parse = parse($string, $i + 1);
            $tokens[] = $parse["token"];
            $i = $parse["index"]; // resume after the matching }
            continue;
        }
        if ($char === "}") {
            // end of this part
            if ($token !== "") {
                $tokens[] = $token;
            }
            return array(
                "token" => $tokens,
                "index" => $i
            );
        }
        if ($char === "|") {
            // separator: flush the pending token
            if ($token !== "") {
                $tokens[] = $token;
            }
            $token = "";
            continue;
        }
        $token .= $char;
    }
    // end of input reached (top-level call): return the tokens directly
    return $tokens;
}
var_dump(parse("ab{cd|ef}|{gh|ij}"));
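One possible way to do the combining step is to treat every top-level element as a set of alternatives and build all combinations, picking one alternative per element. This is only a sketch under that assumption: the helper names flatten_alternatives and combine are made up for illustration, and nested groups are simply flattened into more alternatives, which may not be the semantics you need.

```php
<?php
// Hypothetical helper: turn a string or a (possibly nested) group into a flat
// list of alternatives, e.g. array("b", array("c", "d")) -> array("b", "c", "d").
function flatten_alternatives($part): array
{
    if (!is_array($part)) {
        return array($part);
    }
    $flat = array();
    foreach ($part as $item) {
        foreach (flatten_alternatives($item) as $alternative) {
            $flat[] = $alternative;
        }
    }
    return $flat;
}

// Hypothetical helper: build every string obtained by picking one alternative
// from each top-level part, in order.
function combine(array $parts): array
{
    $results = array("");
    foreach ($parts as $part) {
        $next = array();
        foreach ($results as $prefix) {
            foreach (flatten_alternatives($part) as $alternative) {
                $next[] = $prefix . $alternative;
            }
        }
        $results = $next;
    }
    return $results;
}

// The tree for a{b|c}|{d|e}, written out by hand so the example stands alone.
$tree = array("a", array("b", "c"), array("d", "e"));
print_r(combine($tree)); // abd, abe, acd, ace
```

Under the same flattening assumption, the nested tree array("a", array("b", array("c", "d")), "e") gives abe, ace, ade.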