PHP: transform a string without using regular expressions

An example probably explains it best.

  • a|b|c needs to become array('a', 'b', 'c')
  • a|\||\} needs to become array('a', '\|', '\}')
  • ab\}aaa|ae\|aa needs to become array('ab\}aaa', 'ae\|aa')

The string to be transformed can contain any characters, but three of them (|, { and }) are "special": each is treated as an ordinary character only when it is escaped with \. An unescaped | separates options; an escaped \| is an ordinary character that makes up an option or part of one. At this point, { and } are always escaped.

The catch is that I need to do this without using regular expressions.

I have been struggling with this for 10 hours, and I really hope someone has a simple answer.

Edit:

My plan was to search for a |, and if one is found, check whether it is escaped. If it is, continue searching for the next one. When I find an unescaped |, I take the first option off the front of the string and continue the same way until no | is left.

while ($positionFound != 1) {
    $intPrevPosition = $intPosition;
    $intPosition = strpos($strTemp, '|', $intPosition);
    if ($intPosition === false || (substr_count($strTemp, '|') == 1 && $strTemp[$intPosition + $intPrevPosition - 1] == '\\')) {
        $arrOptions[] = $strTemp;
        $positionFound = 1;
    }
    elseif ($strTemp[$intPosition + $intPrevPosition - 1] != '\\') {
        $intPosition = $intPrevPosition + $intPosition;
        $arrOptions[] = substr(substr($strTemp, 0, $intPosition + 1), 0, -1);
        $strTemp = substr($strTemp, $intPosition + 1);
        $intPosition = 0;
    }
}
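For comparison, the strpos-based plan described above can be made to work. The key change is keeping the search offset separate from the start of the current option, so an escaped | is skipped without cutting the string (the loop above restarts the search at the same escaped | and never advances). This is a minimal sketch, not the original code: the function name split_options is hypothetical, and it assumes that a | counts as escaped whenever the character directly before it is \.

```php
<?php
// Hypothetical helper implementing the strpos plan: find a '|', skip it if it
// is escaped, otherwise cut the current option off at that position.
// Assumption: a '|' is escaped if the character right before it is '\'.
function split_options(string $str): array {
    $options = array();
    $start = 0;   // where the current option begins
    $offset = 0;  // where the next strpos() search begins
    while (($pos = strpos($str, '|', $offset)) !== false) {
        if ($pos > 0 && $str[$pos - 1] === '\\') {
            // Escaped '|': keep it inside the option and search past it.
            $offset = $pos + 1;
            continue;
        }
        // Unescaped '|': everything from $start up to it is one option.
        $options[] = substr($str, $start, $pos - $start);
        $start = $pos + 1;
        $offset = $pos + 1;
    }
    $options[] = substr($str, $start); // the last option has no trailing '|'
    return $options;
}

var_dump(split_options('ab\}aaa|ae\|aa')); // two options: 'ab\}aaa' and 'ae\|aa'
```

Single-quoted PHP strings keep \| and \} as two characters, so the literals above match the question's examples exactly.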

Write a simple parser:

$input = "ab\\}aaa|ae\\|aa"; // ab\}aaa|ae\|aa

$token = "";
$last_char = "";
$len = strlen($input);
$tokens = array();
for ($i = 0; $i < $len; $i += 1) {
    $char = $input[$i];
    if ($char === "|" && $last_char !== "\\") {
        // unescaped '|': finish the current token without adding the '|' to it
        $tokens[] = $token;
        $token = "";
    } else {
        $token .= $char;
    }
    $last_char = $char;
}
$tokens[] = $token; // capture last token
var_dump($tokens);
// array('ab\}aaa', 'ae\|aa')

Note that with this implementation, an escaped backslash followed by | also counts as an escaped |: the input ab\\|cd produces a single token, ab\\|cd, rather than the two tokens ab\\ and cd.
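That limitation can be avoided by treating \ as "escape the next character" and consuming both characters at once, so an escaped backslash never escapes the | that follows it. This sketch rests on an assumption the question does not state: that \\ may appear in the input as an escaped backslash. The function name split_options_escaped is hypothetical, and escape sequences are kept verbatim in the output, as in the question's examples.

```php
<?php
// Hypothetical variant of the simple parser: '\' escapes whatever character
// follows it, so "\\|" is an escaped backslash followed by a real separator.
// Assumption: '\\' is a valid escape in the input (the question only
// mentions '\|', '\{' and '\}'). Escape sequences are kept verbatim.
function split_options_escaped(string $input): array {
    $tokens = array();
    $token = "";
    $len = strlen($input);
    for ($i = 0; $i < $len; $i++) {
        $char = $input[$i];
        if ($char === "\\" && $i + 1 < $len) {
            // Copy the backslash and the character it escapes, then skip both.
            $token .= $char . $input[$i + 1];
            $i++;
        } elseif ($char === "|") {
            $tokens[] = $token; // an unescaped '|' ends the current option
            $token = "";
        } else {
            $token .= $char;    // ordinary character (or a trailing lone '\')
        }
    }
    $tokens[] = $token; // capture the last option
    return $tokens;
}

var_dump(split_options_escaped("ab\\\\|cd")); // input characters: ab\\|cd
```

Here the input ab\\|cd yields two tokens, ab\\ and cd, while the question's three examples split the same way as before.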


Nested parser

For ease of understanding, I'm going to ignore the \ rules for now.

Assume you have a{b|c}|{d|e}, and the expected output is abd, abe, acd, ace.

First, you have to translate a{b|c}|{d|e} into:

array(
    "a",
    array("b", "c"),
    array("d", "e")
)

If the input is ab{cd|ef}|{gh|ij}, we want:

array(
    "ab",
    array("cd", "ef"),
    array("gh", "ij")
)

And of course, multiple levels of nesting should also work: a{b|{c|d}}|e

array(
    "a",
    array("b", array("c", "d")),
    "e"
)

Here is the parse function. I haven't quite figured out how to combine the pieces back together yet.

function parse($string, $i = 0) {
    $token = "";
    $tokens = array();
    for (; $i < strlen($string); $i += 1) {
        $char = $string[$i];
        if ($char === "{") {
            if ($token !== "") {
                $tokens[] = $token;
            }
            $token = "";
            $parse = parse($string, $i + 1);
            $tokens[] = $parse["token"];
            $i = $parse["index"];
            continue;
        }
        if ($char === "}") {
            // end of this part
            if ($token !== "") {
                $tokens[] = $token;
            }
            return array(
                "token" => $tokens,
                "index" => $i
            );
        }
        if ($char === "|") {
            if ($token !== "") {
                $tokens[] = $token;
            }
            $token = "";
            continue;
        }
        $token .= $char;
    }
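    // capture a trailing token that is not followed by '|' or '}'
    // (without this, the "e" in a{b|{c|d}}|e is lost)
    if ($token !== "") {
        $tokens[] = $token;
    }
    return $tokens;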
}
var_dump(parse("ab{cd|ef}|{gh|ij}"));
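The missing combine step can be sketched as a cartesian product over the parsed structure. This relies on a reading of the structure that the answer does not spell out: the top-level array is a sequence whose parts are concatenated, and a nested array is a set of alternatives (an array nested inside a group simply contributes more alternatives). The helper names combine and expand_group are hypothetical.

```php
<?php
// Hypothetical helper: flatten one group into its list of alternatives.
// Assumption: an array inside a group is itself a group of alternatives.
function expand_group(array $group): array {
    $out = array();
    foreach ($group as $alt) {
        if (is_array($alt)) {
            foreach (expand_group($alt) as $s) {
                $out[] = $s;
            }
        } else {
            $out[] = $alt;
        }
    }
    return $out;
}

// Hypothetical helper: treat the top level as a sequence and build every
// combination (cartesian product) of the choices at each position.
function combine(array $sequence): array {
    $results = array("");
    foreach ($sequence as $part) {
        // A plain string is a single choice; an array is a group of choices.
        $choices = is_array($part) ? expand_group($part) : array($part);
        $next = array();
        foreach ($results as $prefix) {
            foreach ($choices as $choice) {
                $next[] = $prefix . $choice;
            }
        }
        $results = $next;
    }
    return $results;
}

print_r(combine(array("a", array("b", "c"), array("d", "e"))));
```

For the first example this gives abd, abe, acd, ace, and with the parse function above, combine(parse("ab{cd|ef}|{gh|ij}")) should give abcdgh, abcdij, abefgh, abefij. Because parse stores each group as a flat list, a group such as {x{c|d}} loses the fact that x and {c|d} belong to the same alternative (this sketch would produce x, c, d instead of xc, xd), so it only covers the shapes shown above.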