RegEx syntax for exploding the elements of a recipe list

I am processing a list of recipe ingredients, an example of which looks like this:

Peanuts, Wheat Starch, Vegetable Oil, Modified Starch, Sugar, Mumbai Spice Flavour [Onion Powder, Herbs and Spices (Cumin, Curry Powder, Chilli Powder, Coriander), Garlic Powder, Potassium Chloride, Yeast Extract, Yeast Powder (contains Gluten and Barley), Citric Acid, Flavouring (contains Barley, Soya, Wheat, Celery)], Rice Flour, Salt, Colours (Concentrated Beetroot Juice, Curcumin, Paprika Extract).

I wish to explode each ingredient into an array (using PHP), separated by commas. The problem I have is that some ingredients are sub-divided. In this example, the components of 'Mumbai Spice Flavour' are delimited by square brackets, and some of those components have sub-ingredients of their own, which are in turn delimited by regular brackets.

A standard:

explode(",", $recipeStr) 

will give me a very messy result, because it also splits on the commas inside the brackets. So I'm looking for a regular expression that will split each distinct element into an array, taking into account the optional square brackets and the optional round brackets inside them. It also needs to handle round brackets that are not nested within square brackets.
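For example, with a shortened, made-up ingredient string, explode splits the bracketed group apart:

```php
<?php
// Shortened, made-up sample for illustration only
$recipeStr = "Salt, Colours (Beetroot Juice, Curcumin)";

// explode() splits on every comma, including the one inside ( ... )
$parts = explode(",", $recipeStr);
print_r($parts);
```

Here $parts ends up as "Salt", " Colours (Beetroot Juice" and " Curcumin)", so the bracketed group is broken in two and every item after the first keeps a leading space.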

The desired result would be an array list that looks like:

- Peanuts
- Wheat Starch
- Vegetable Oil
- Modified Starch
- Sugar
- Mumbai Spice Flavour [Onion Powder, Herbs and Spices (Cumin, Curry Powder, Chilli Powder, Coriander), Garlic Powder, Potassium Chloride, Yeast Extract, Yeast Powder (contains Gluten and Barley), Citric Acid, Flavouring (contains Barley, Soya, Wheat, Celery)]
- Rice Flour
- Salt
- Colours (Concentrated Beetroot Juice, Curcumin, Paprika Extract)

I am not very good at regex syntax, so if an answer could also explain the logic of the syntax, that would be greatly appreciated.

This seems to work (but maybe it's not the best solution) :)

preg_match_all('/\w[\w\s-]*(?:\[.*?\]|\(.*?\))?/', $string, $matches);

It matches a word character followed by zero or more word characters, whitespace characters or dashes (add anything else you want to capture to this character class), optionally followed by either [...] or (...). The lazy .*? stops at the first closing bracket, so brackets of the same type cannot be nested.

So you can have:

- something
- anything [...]
- something different (...)
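A quick way to check this pattern against the sample from the question is a sketch like the following (assuming PHP 7 or later; the second call uses a made-up string only to show the nesting limitation):

```php
<?php
// Sample string from the question
$string = "Peanuts, Wheat Starch, Vegetable Oil, Modified Starch, Sugar, Mumbai Spice Flavour [Onion Powder, Herbs and Spices (Cumin, Curry Powder, Chilli Powder, Coriander), Garlic Powder, Potassium Chloride, Yeast Extract, Yeast Powder (contains Gluten and Barley), Citric Acid, Flavouring (contains Barley, Soya, Wheat, Celery)], Rice Flour, Salt, Colours (Concentrated Beetroot Juice, Curcumin, Paprika Extract).";

preg_match_all('/\w[\w\s-]*(?:\[.*?\]|\(.*?\))?/', $string, $matches);
$items = $matches[0]; // full matches: one per top-level ingredient

foreach ($items as $item) {
    echo $item, "\n";
}

// Limitation: the lazy .*? stops at the first closing bracket, so a
// round bracket nested inside another round bracket is cut short.
preg_match_all('/\w[\w\s-]*(?:\[.*?\]|\(.*?\))?/', "Salt, Colours (Red (Beetroot), Yellow)", $nested);
print_r($nested[0]);
```

On the sample this yields the nine items from the desired list, without leading spaces, because each match has to start with a word character. For the made-up string the result is "Salt", "Colours (Red (Beetroot)" and "Yellow".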

Ah, parenthesis matching is not something a regular expression can easily do.

Maybe you should simply go through the string character by character:

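$array = array();   // top-level ingredients found so far
$temp = "";         // characters of the ingredient currently being read
$parenthesis = 0;   // nesting depth of ( ... )
$bracket = 0;       // nesting depth of [ ... ]

for($i = 0; $i < strlen($input); $i++)
{
    $c = $input[$i];
    if($c == '(')
        $parenthesis++;
    if($c == '[')
        $bracket++;

    if($c == ')')
        $parenthesis--;
    if($c == ']')
        $bracket--;

    // A comma only ends an ingredient when we are outside all brackets
    if($c == ',' && $parenthesis + $bracket == 0)
    {
        $array[] = trim($temp);
        $temp = "";
    }
    else
        $temp .= $c;
}
// Store the last ingredient, without surrounding spaces or the trailing full stop
$array[] = trim($temp, " .");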

I didn't test the code, but I hope it's clear what it is supposed to do.
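If you would rather stay with a regular expression, PHP's PCRE engine does support recursive subpatterns such as (?1), which can match balanced brackets. This is a different technique from the character loop above; the sketch below assumes PHP 7 or later, and the function name splitIngredients is hypothetical:

```php
<?php
// Hypothetical helper: split a recipe string on top-level commas only.
function splitIngredients(string $recipe): array
{
    // Group 1 matches ( ... ) and group 2 matches [ ... ]; (?1) and (?2)
    // re-enter those groups, so brackets of the same type may nest.
    // Anything else except commas and brackets is matched possessively.
    $pattern = '/(?:[^,()\[\]]++|(\((?:[^()]++|(?1))*\))|(\[(?:[^\[\]]++|(?2))*\]))+/';
    preg_match_all($pattern, $recipe, $matches);

    $items = [];
    foreach ($matches[0] as $item) {
        // Strip surrounding whitespace and the sentence-ending full stop
        $item = trim($item, " \t\r\n.");
        if ($item !== '') {
            $items[] = $item;
        }
    }
    return $items;
}

// Made-up input with nested brackets of the same type
print_r(splitIngredients("Salt, Colours (Red (Beetroot), Yellow), Spice [Herbs [Cumin, Coriander], Garlic]."));
```

The pattern is harder to read than the character loop, which is a fair reason to prefer the loop; its advantage is that nested groups such as the made-up "(Red (Beetroot), Yellow)" stay in one item.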

This regex seems to work on your example. You won't be able to use it with explode, but with preg_match_all it captures each item/group, which you can then loop through.

([\w+ ]+\[[^\]]+\]|[\w+ ]+\([^\)]+\)|[\w+ ]+)


To break it down:

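(                      start capture group
[\w+ ]+\[[^\]]+\]      one or more word characters, '+' signs or spaces, followed by [...]
|                      or
[\w+ ]+\([^\)]+\)      one or more word characters, '+' signs or spaces, followed by (...)
|                      or
[\w+ ]+                one or more word characters, '+' signs or spaces on their own
)                      end capture group

Inside a character class the + is a literal plus sign, and [^\]]+ and [^\)]+ stop at the first closing bracket, so brackets of the same type cannot be nested.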
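Used from PHP, the pattern goes into preg_match_all. A minimal sketch on the sample from the question (assuming PHP 7 or later), trimming the leading space that every match after the first picks up because the character class includes a space:

```php
<?php
// Sample string from the question
$string = "Peanuts, Wheat Starch, Vegetable Oil, Modified Starch, Sugar, Mumbai Spice Flavour [Onion Powder, Herbs and Spices (Cumin, Curry Powder, Chilli Powder, Coriander), Garlic Powder, Potassium Chloride, Yeast Extract, Yeast Powder (contains Gluten and Barley), Citric Acid, Flavouring (contains Barley, Soya, Wheat, Celery)], Rice Flour, Salt, Colours (Concentrated Beetroot Juice, Curcumin, Paprika Extract).";

preg_match_all('/([\w+ ]+\[[^\]]+\]|[\w+ ]+\([^\)]+\)|[\w+ ]+)/', $string, $matches);

// Group 1 wraps the whole pattern, so $matches[1] holds the same strings
// as $matches[0]; matches after the first start with the space after the comma.
$items = array_map('trim', $matches[1]);

foreach ($items as $item) {
    echo $item, "\n";
}
```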