How do I extend a regex to find multiple matches?

This is my current regex (used in parsing an iCal file):

/(.*?)(?:;(?=(?:[^"]*"[^"]*")*[^"]*$))([\w\W]*)/

The current output using preg_match() is this:

//Output 1 - `preg_match()`
Array
(
    [0] => TZID="Greenwich Mean Time:Dublin; Edinburgh; Lisbon; London"
    [1] => VALUE=DATE;RSVP=FALSE;LANGUAGE=en-gb
)

I would like to extend my regex to output this (i.e. find multiple matches):

//Output 2
Array
(
    [0] => TZID="Greenwich Mean Time:Dublin; Edinburgh; Lisbon; London"
    [1] => VALUE=DATE
    [2] => RSVP=FALSE
    [3] => LANGUAGE=en-gb
)    

The regex should search for each semicolon not contained within a quoted substring and provide that as a match.


I cannot simply switch to preg_match_all(), because it gives this unwanted output:

//Output 3 - `preg_match_all()`
Array
(
    [0] => Array
        (
            [0] => TZID="Greenwich Mean Time:Dublin; Edinburgh; Lisbon; London";VALUE=DATE;RSVP=FALSE;LANGUAGE=en-gb
        )

    [1] => Array
        (
            [0] => TZID="Greenwich Mean Time:Dublin; Edinburgh; Lisbon; London"
        )

    [2] => Array
        (
            [0] => VALUE=DATE;RSVP=FALSE;LANGUAGE=en-gb
        )

)
(.+?)(?:;(?=(?:[^"]*"[^"]*")*[^"]*$)|$)

Try this. See the demo:

https://regex101.com/r/pG1kU1/18
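
As a rough illustration of how this pattern might be used from PHP, here is a minimal sketch. The variable names `$str` and `$params` are placeholders of mine, and the input is the sample line from the question. With `preg_match_all()`, capture group 1 should hold the individual parameters:

```php
<?php
// Sample line taken from the question.
$str = 'TZID="Greenwich Mean Time:Dublin; Edinburgh; Lisbon; London";VALUE=DATE;RSVP=FALSE;LANGUAGE=en-gb';

// Each match ends either at a semicolon that is followed by an even number
// of quotes (i.e. a semicolon outside quotes) or at the end of the string.
$pattern = '/(.+?)(?:;(?=(?:[^"]*"[^"]*")*[^"]*$)|$)/';

preg_match_all($pattern, $str, $matches);

// Group 1 holds each parameter without the separating semicolon.
$params = $matches[1];
print_r($params);
```

Reading `$matches[1]` rather than `$matches[0]` matters here, because the full match still includes the trailing semicolon.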

You can use the following to match:

(.*?(?:;|$))(?![^"]*")


or split by:

;(?![^"]*")

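
A small sketch of the split variant, assuming the sample line from the question (`$str` and `$parts` are placeholder names). Note that this lookahead treats any semicolon that has a double quote somewhere after it as being inside quotes, so it fits lines where the quoted value comes before the unquoted parameters, as in the sample; a line with a quoted parameter further to the right would likely need the even-quote-count lookahead instead.

```php
<?php
// Sample line taken from the question.
$str = 'TZID="Greenwich Mean Time:Dublin; Edinburgh; Lisbon; London";VALUE=DATE;RSVP=FALSE;LANGUAGE=en-gb';

// Split on every semicolon that has no double quote anywhere after it.
$parts = preg_split('/;(?![^"]*")/', $str);

print_r($parts);
```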

You need to use preg_match_all to get all the matches in the string.

The pattern you use isn't designed to return several results, since [\w\W]* matches everything up to the end of the string.
But that is only one of your problems: a pattern designed like this needs to check, for each semicolon, whether the number of quotes is odd or even until the end of the file!: (?=(?:[^"]*"[^"]*")*[^"]*$). Imagine for a minute how many times the whole string is parsed with this lookahead.
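
To make that cost concrete, here is a hand-written sketch of roughly what the lookahead amounts to. The function `splitOnUnquotedSemicolons` is a hypothetical helper written only for illustration, not something from the question: at every semicolon it rescans the rest of the string and counts the quotes, so the work grows with both the number of semicolons and the length of the input.

```php
<?php
// Hypothetical helper, for illustration only: it mimics the lookahead by
// rescanning the remainder of the string at every semicolon.
function splitOnUnquotedSemicolons(string $str): array
{
    $parts = [];
    $start = 0;
    $length = strlen($str);

    for ($i = 0; $i < $length; $i++) {
        if ($str[$i] !== ';') {
            continue;
        }
        // Same test as the lookahead: an even number of quotes between here
        // and the end means this semicolon is outside any quoted part.
        if (substr_count($str, '"', $i + 1) % 2 === 0) {
            $parts[] = substr($str, $start, $i - $start);
            $start = $i + 1;
        }
    }
    // Whatever follows the last unquoted semicolon is the final part.
    $parts[] = substr($str, $start);

    return $parts;
}

$str = 'TZID="Greenwich Mean Time:Dublin; Edinburgh; Lisbon; London";VALUE=DATE;RSVP=FALSE;LANGUAGE=en-gb';
$result = splitOnUnquotedSemicolons($str);
print_r($result);
```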

To avoid the problem, you can use a different approach that doesn't try to find semicolons but instead describes everything that is not a semicolon. So you are looking for every stretch of text that contains no quotes or semicolons, plus any quoted parts, whatever their content.

You can use this kind of pattern:

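// Sample line from the question
$str = 'TZID="Greenwich Mean Time:Dublin; Edinburgh; Lisbon; London";VALUE=DATE;RSVP=FALSE;LANGUAGE=en-gb';

$pattern = '~[^\n";]+(?:"[^"\\\]*(?:\\\.[^"\\\]*)*"[^\n";]*)*~';

if (preg_match_all($pattern, $str, $matches)) {
    print_r($matches[0]); // one entry per parameter
}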

pattern details:

~                      # pattern delimiter
[^\n";]+               #"# everything that is not a newline, a double quote or a semicolon
(?:                    # non-capturing group: to include possible quoted parts
    "                  #"# a literal quote
    [^"\\\]*           #"# everything that is not a quote or a backslash
    (?:\\\.[^"\\\]*)*  #"# optional group to deal with escaped characters
    "                  #"# the closing quote
    [^\n";]*           #"# unquoted text after the quoted part
)*                     # repeat zero or more times
~
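
For what it's worth, the commented layout above can be kept in real code by using the `x` (extended) modifier. The following is only a sketch under a few assumptions of mine: the pattern sits in a nowdoc, so backslashes are written as the regex engine sees them (two instead of three), and the test string with an escaped quote is invented for illustration rather than taken from the question.

```php
<?php
// Nowdoc: no PHP-level escaping, so "\\" below is a regex-escaped backslash.
$pattern = <<<'REGEX'
~
[^\n";]+              # unquoted text: no newline, double quote or semicolon
(?:
    "                 # opening quote
    [^"\\]*           # anything except a quote or a backslash
    (?:\\.[^"\\]*)*   # a backslash-escaped character, then more plain text
    "                 # closing quote
    [^\n";]*          # unquoted text after the quoted part
)*
~x
REGEX;

// Invented input: the quoted value contains an escaped quote and a semicolon.
$str = 'X="a\"b;c";Y=1';

preg_match_all($pattern, $str, $matches);
print_r($matches[0]);
```

The semicolon inside the quotes and the escaped quote should both stay inside the first match, since the quoted part is consumed as one unit.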
