preg_match quantifier not working in PHP 5.5

Hi, I'm trying to use the pattern /^(–*\s*2\.2\.|2\.2\.)/ to match these strings; each line is a different string. EDIT: sorry about the poor data formatting.

<?php
 $final_texts=array();
 $pattern='/^(–*\s*2\.2\.|2\.2\.)/';//this is generated automatically elsewhere btw
 $texts = array(
 "– 2.2.04 R",
 "–– 2.2.04.10 C",
 "–– 2.2.04.1 CO",
 "–– 2.2.04.2 CO",
 "–– 2.2.04.3 CO",
 "–– 2.2.04.4 CO",
 "–– 2.2.04.5 CO",
 "–– 2.2.04.6 CO",
 "–– 2.2.04.7 CO",
 "–– 2.2.04.8 CO",
 "–– 2.2.04.9 CO",
 "foooooooooooo",
 "barrrrrrrrrr",
 "-- foobar",
 "- 1123",
 );
 foreach ($texts as $key => $text) {
     if (preg_match($pattern, $text)) {
         $final_texts[] = $text;
     }
 }
 print_r($final_texts);
 ?>

This is what I'm using: preg_match($pattern, $string). As I understand it, * means zero or more of the preceding element, but I'm no expert.

But it only matches the first string and not the ones with more than one dash "–". Keep in mind that they are different strings inside an array, and I iterate over it to do something. Should I be doing something different in the pattern? I'm trying to match all strings that start with any number of dashes and spaces followed by the string 2.2.

I will have this problem with other numbers too, and I may have strings with more than two dashes in the future, so I don't see how I can solve this without regex. I've already tested it at http://preg_match.onlinephpfunctions.com/ and have the same problem. Demo (thanks to @hwnd for showing me this!).

I believe the cause of this is the Unicode dash you have placed in your regular expression. I recommend using the Unicode property \p{Pd} (any kind of hyphen or dash) to match these characters.

/^(\p{Pd}+\s*2\.2\.|2\.2\.)/mu

Note: The m (multi-line) modifier causes ^ to match at the beginning of each line; since each of your strings is a single line, it is optional here. The u modifier turns on additional PCRE functionality, and the pattern and subject strings are treated as UTF-8.

As an aside, instead of iterating over your array yourself, you could use preg_grep() here.

$final_texts = preg_grep('/^(\p{Pd}+\s*2\.2\.|2\.2\.)/mu', $texts);
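One detail worth knowing about the preg_grep() approach is that it keeps the original array keys, so the result may have gaps in its indexes. The following is a minimal sketch under some assumptions: the sample array is shortened, the entry "2.2.05 X" is a hypothetical addition to exercise the second alternative, and the en dash is written as raw UTF-8 bytes so the example does not depend on the file encoding.

```php
<?php
$dash = "\xE2\x80\x93"; // en dash (U+2013) as raw UTF-8 bytes

$texts = array(
    $dash . " 2.2.04 R",
    $dash . $dash . " 2.2.04.10 C",
    "foooooooooooo",
    "-- foobar",
    "2.2.05 X", // hypothetical entry without any leading dash
);

// preg_grep() returns the matching elements and preserves their keys.
$matched = preg_grep('/^(\p{Pd}+\s*2\.2\.|2\.2\.)/u', $texts);

// Reindex from 0 if consecutive keys are needed, as in the original loop.
$final_texts = array_values($matched);

print_r(array_keys($matched)); // keys 0, 1 and 4
print_r($final_texts);
```

If the keys do not matter to the rest of your code, the array_values() call can be dropped.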

The en dash is encoded as three bytes (E2 80 93) in UTF-8. Without the u modifier, PCRE works on bytes, so a quantifier is applied only to the last byte: –* is equivalent to \x{e2}\x{80}\x{93}*.
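The byte-level behaviour can be checked directly. This is a small sketch, assuming a subject string taken from the question and a simplified pattern without the alternation; the variable names are only illustrative.

```php
<?php
$dash  = "\xE2\x80\x93"; // en dash (U+2013) as raw UTF-8 bytes
$bytes = strlen($dash);  // strlen() counts bytes, not characters
$hex   = bin2hex($dash);

$subject = $dash . $dash . " 2.2.04.10 C";
$pattern = '/^' . $dash . '*\s*2\.2\./';

// Byte mode: the * binds to the last byte (0x93) only, so two dashes fail.
$withoutU = preg_match($pattern, $subject);

// UTF-8 mode: the * binds to the whole en dash character.
$withU = preg_match($pattern . 'u', $subject);

var_dump($bytes, $hex, $withoutU, $withU);
// int(3), string(6) "e28093", int(0), int(1)
```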

You can simply wrap the Unicode character in parentheses, (–)*, to apply the quantifier to all three bytes. Or, if you don’t want to capture it, use a non-capturing group: (?:–)*.

Character classes will also work with Unicode characters: [–].
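Both workarounds can be tried without the u modifier, as sketched below. One caveat that follows from the byte-level explanation: in byte mode a character class matches each of the three bytes individually, so it also accepts other characters built from the same bytes. The string in $other is a hypothetical input chosen to show this (U+24C0, encoded as E2 93 80).

```php
<?php
$dash    = "\xE2\x80\x93"; // en dash (U+2013) as raw UTF-8 bytes
$subject = $dash . $dash . " 2.2.04.10 C";

// Non-capturing group: the quantifier applies to all three bytes together.
$group = preg_match('/^(?:' . $dash . ')*\s*2\.2\./', $subject);

// Character class in byte mode: matches any run of the bytes E2, 80, 93.
$class = preg_match('/^[' . $dash . ']*\s*2\.2\./', $subject);

// Hypothetical input: a different character made of the same bytes.
$other       = "\xE2\x93\x80 2.2.04";
$classLoose  = preg_match('/^[' . $dash . ']*\s*2\.2\./', $other);  // byte mode
$classStrict = preg_match('/^[' . $dash . ']*\s*2\.2\./u', $other); // UTF-8 mode

var_dump($group, $class, $classLoose, $classStrict);
// int(1), int(1), int(1), int(0)
```

If inputs like $other could occur in your data, the group form or the u modifier is probably the safer choice.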
