Getting more backreferences from a regex than there are parentheses

OK, this is really difficult to explain in English, so I'll just give an example.

I am going to have strings in the following format:

key-value;key1-value;key2-...

and I need to extract the data into an array:

array('key'=>'value','key1'=>'value1', ... )

I was planning to use a regexp to achieve (most of) this functionality, and wrote this regular expression:

/^(\w+)-([^-;]+)(?:;(\w+)-([^-;]+))*;?$/

to work with preg_match and this code:

for ($l = count($matches),$i = 1;$i<$l;$i+=2) {
    $parameters[$matches[$i]] = $matches[$i+1];
}

However, the regexp obviously returns only 4 backreferences: the first and last key-value pairs of the input string. Is there a way around this? I know I can use a regex just to test the correctness of the string and then use PHP's explode() in loops with perfect results, but I'm really curious whether it's possible with regular expressions alone.
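A minimal sketch reproducing the problem, assuming a hypothetical three-pair input: with preg_match(), a repeated group keeps only the text from its last iteration, so the middle pair never reaches the loop.

```php
<?php
// Hypothetical three-pair input; the regex and loop are the ones from the question.
$input = 'key-value;key1-value1;key2-value2';
$parameters = array();

preg_match('/^(\w+)-([^-;]+)(?:;(\w+)-([^-;]+))*;?$/', $input, $matches);

// $matches holds the full match plus only 4 groups. Groups 3 and 4 are
// overwritten on every repetition of (?:...)*, so they end up holding
// the last pair ('key2', 'value2').
for ($l = count($matches), $i = 1; $i < $l; $i += 2) {
    $parameters[$matches[$i]] = $matches[$i + 1];
}

print_r($parameters); // 'key1' => 'value1' is missing
```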

In short, I need to capture an arbitrary number of these key-value; pairs in a string by means of regular expressions.

You can use a lookahead to validate the input while you extract the matches:

/\G(?=(?:\w++-[^;-]++;?)++$)(\w++)-([^;-]++);?/

(?=(?:\w++-[^;-]++;?)++$) is the validation part. If the input is invalid, matching will fail immediately, but the lookahead still gets evaluated every time the regex is applied. In order to keep it (along with the rest of the regex) in sync with the key-value pairs, I used \G to anchor each match to the spot where the previous match ended.

This way, if the lookahead succeeds the first time, it's guaranteed to succeed every subsequent time. Obviously it's not as efficient as it could be, but that probably won't be a problem; only your testing can tell for sure.

If the lookahead fails, preg_match_all() will return 0 (a falsy value). If it succeeds, the matches will be returned in an array of arrays: one for the full key-value pairs, one for the keys, and one for the values.
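A short sketch of how the pattern above might be used, assuming you want null back for invalid input. The helper name parseWithLookahead is hypothetical; array_combine() pairs the key array with the value array from preg_match_all()'s default PREG_PATTERN_ORDER result.

```php
<?php
// Validate and extract in one pass with the \G + lookahead pattern.
// parseWithLookahead is a hypothetical helper name.
// Returns a key => value array, or null if the input is invalid.
function parseWithLookahead($input)
{
    $pattern = '/\G(?=(?:\w++-[^;-]++;?)++$)(\w++)-([^;-]++);?/';
    if (!preg_match_all($pattern, $input, $m)) {
        return null; // 0 matches: the lookahead rejected the input
    }
    // $m[1] holds the keys, $m[2] the values
    return array_combine($m[1], $m[2]);
}

print_r(parseWithLookahead('key-value;key1-value1;key2-value2;'));
var_dump(parseWithLookahead('key-value;oops;key2-value2')); // NULL
```

Because \G pins every match to where the previous one ended, an invalid pair anywhere in the string makes the very first attempt fail, so nothing is extracted at all.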

No. Newer matches overwrite older matches. Perhaps the limit argument of explode() would be helpful when exploding.
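A small sketch of what the limit argument changes, assuming a value might itself contain a "-" (the sample pair is hypothetical): with a limit of 2, explode() splits only on the first delimiter and leaves the rest intact.

```php
<?php
$pair = 'key-value-with-dashes'; // hypothetical pair whose value contains "-"

$noLimit = explode('-', $pair);    // splits on every "-"
$limited = explode('-', $pair, 2); // at most 2 pieces: the key, then the rest

print_r($noLimit); // key, value, with, dashes
print_r($limited); // key, value-with-dashes
```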

Regex is a powerful tool, but sometimes it's not the best approach.

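$string = "key-value;key1-value";
$array = array();
foreach (explode(";", $string) as $pair) {
    if ($pair === "") {
        continue; // a trailing ";" leaves an empty last piece
    }
    $e = explode("-", $pair, 2); // split on the first "-" only
    $array[$e[0]] = isset($e[1]) ? $e[1] : ""; // a pair without "-" gets an empty value
}
print_r($array);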

Use preg_match_all() instead. Maybe something like:

$matches = $parameters = array();
$input = 'key-value;key1-value1;key2-value2;key123-value123;';

preg_match_all("/(\w+)-([^-;]+)/", $input, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
   $parameters[$match[1]] = $match[2];
}

print_r($parameters);

EDIT:

To first validate that the input string conforms to the pattern, use:

if (preg_match("/^((\w+)-([^-;]+);)+$/", $input) > 0) {
    /* do the preg_match_all stuff */
}       

EDIT2: to make the final semicolon optional, use:

if (preg_match("/^(\w+-[^-;]+;)*\w+-[^-;]+;?$/", $input) > 0) {
    /* do the preg_match_all stuff */
}
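A sketch that puts the two steps of this answer together, assuming null is an acceptable result for invalid input. The helper name parseKeyValues is hypothetical. The validation pattern ends in ;?$ so that a trailing semicolon is accepted but not required.

```php
<?php
// Two-step approach: validate the whole string with an anchored
// preg_match(), then extract the pairs with preg_match_all().
// parseKeyValues is a hypothetical helper name.
function parseKeyValues($input)
{
    // Anchored check; the final semicolon is optional.
    if (!preg_match('/^(\w+-[^-;]+;)*\w+-[^-;]+;?$/', $input)) {
        return null;
    }

    preg_match_all('/(\w+)-([^-;]+)/', $input, $matches, PREG_SET_ORDER);

    $parameters = array();
    foreach ($matches as $match) {
        $parameters[$match[1]] = $match[2];
    }
    return $parameters;
}

print_r(parseKeyValues('key-value;key1-value1;key2-value2;key123-value123;'));
var_dump(parseKeyValues('key-value-value;key1-value1')); // NULL
```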

What about this solution?

$samples = array(
    "good" => "key-value;key1-value;key2-value;key5-value;key-value;",
    "bad1" => "key-value-value;key1-value;key2-value;key5-value;key-value;",
    "bad2" => "key;key1-value;key2-value;key5-value;key-value;",
    "bad3" => "k%ey;key1-value;key2-value;key5-value;key-value;"
);

foreach($samples as $name => $value) {
    if (preg_match("/^(\w+-\w+;)+$/", $value)) {
        printf("'%s' matches\n", $name);
    } else {
        printf("'%s' does not match\n", $name);
    }
}

I don't think you can do both validation and extraction of data with a single regexp, as you need anchors (^ and $) for validation and preg_match_all() for the data, but if you use anchors with preg_match_all() it will only return the last set matched.
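A minimal sketch of the anchored case described above, assuming a hypothetical three-pair input: the ^...$ pattern can match the string only once, and each capturing group keeps only its last repetition.

```php
<?php
// Hypothetical input; one capturing pair repeated inside ^...$.
$input = 'key-value;key1-value1;key2-value2';

$count = preg_match_all('/^(?:(\w+)-([^-;]+);?)+$/', $input, $m);

// A single overall match, and each group holds only its last iteration.
echo $count, "\n";                      // 1
echo $m[1][0], ' => ', $m[2][0], "\n";  // key2 => value2
```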