This regex should match lists just like in Markdown:
/((?:(?:(?:^[\+\*\-] )(?:[^\r\n]+))(?:\r|\n?))+)/m
It works in JavaScript (with the g flag added), but I have problems porting it to PHP: it does not behave greedily. Here's my example code:
$string = preg_replace_callback('`((?:(?:(?:^\* )(?:[^\r\n]+))(?:\r|\n?))+)`m', array(&$this, 'bullet_list'), $string);
function bullet_list($matches) { var_dump($matches); }
When I feed it a three-line list, it displays this:
array(2) { [0]=> string(6) "* one " [1]=> string(6) "* one " } array(2) { [0]=> string(6) "* two " [1]=> string(6) "* two " } array(2) { [0]=> string(8) "* three " [1]=> string(8) "* three " }
Apparently var_dump is being called three times instead of just once, as I would expect, since the regex is greedy and must match as many lines as possible. I have tested it on regex101.com. How do I make it work properly?
This regex won't work correctly if you have \r\n newlines in your input text.
The part (?:\r|\n?) matches either an \r or an \n, but not both. (regex101 treats newlines as \n only, so it works there.)
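To make the difference concrete, here is a minimal sketch that runs the pattern from the question against the same list with LF and with CRLF line endings. The sample strings and variable names are made up for illustration, and it assumes PHP's PCRE uses LF as its newline convention (the usual default), so that ^ in multiline mode only matches after \n.

```php
<?php
// Pattern from the question, with \r and \n left for PCRE to interpret.
$pattern = '`((?:(?:(?:^\* )(?:[^\r\n]+))(?:\r|\n?))+)`m';

// Hypothetical sample input: the same list with two kinds of line endings.
$unix    = "* one\n* two\n* three\n";       // LF
$windows = "* one\r\n* two\r\n* three\r\n"; // CRLF

$unixCount    = preg_match_all($pattern, $unix, $unixMatches);
$windowsCount = preg_match_all($pattern, $windows, $windowsMatches);

// LF: each item ends in \n, ^ matches right after it, so the group repeats.
// CRLF: the group consumes only \r; the next position is before \n, where ^
// does not match, so the repetition stops after a single line.
var_dump($unixCount, $windowsCount);
var_dump($windowsMatches[0][0]); // the first item plus a trailing \r
```

With CRLF input the first match is the 6-character string "* one" followed by \r, which is consistent with the string(6) in the var_dump output from the question.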
Does the following work?
/(?:(?:(?:^[+*-] )(?:[^\r\n]+))[\r\n]*)+/m
(or, after removal of all the unnecessary non-capturing groups - thanks @M42!)
/(?:^[+*-] [^\r\n]+[\r\n]*)+/m
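As a quick check, the simplified pattern can be dropped into preg_replace_callback. This is only a sketch: the sample text and the $seen array are hypothetical, and the callback returns the match unchanged so that it only records what it was given.

```php
<?php
// Simplified pattern: [\r\n]* consumes \r, \n, or \r\n after each item.
$pattern = '/(?:^[+*-] [^\r\n]+[\r\n]*)+/m';

// Hypothetical sample text with CRLF line endings.
$string = "Intro\r\n* one\r\n* two\r\n* three\r\nOutro";

$seen = [];
$result = preg_replace_callback(
    $pattern,
    function (array $matches) use (&$seen) {
        // Record each match; a real callback would build the list markup here.
        $seen[] = $matches[0];
        return $matches[0];
    },
    $string
);

var_dump(count($seen)); // number of times the callback ran
var_dump($seen[0]);     // the full three-line list, as one match
```

Note that [\r\n]* also swallows any blank lines that directly follow the list, which may or may not be what you want.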
Your regex can be reduced to:
(?:^[+*-] [^\r\n]+\R*)+
There is no need for all these groups. \R matches any kind of line break: \n, \r, or \r\n.
Edit: \R loses its special meaning inside a character class; [\R] means a literal R. Thanks to HamZa.
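A small sketch of the \R variant follows, assuming the m flag is added as in the other patterns. The inputs are hypothetical and cover only LF and CRLF line endings.

```php
<?php
// Reduced pattern with \R, used outside of any character class.
$pattern = '/(?:^[+*-] [^\r\n]+\R*)+/m';

// Hypothetical sample lists with LF and CRLF line endings.
$inputs = [
    'LF'   => "* one\n* two\n* three\n",
    'CRLF' => "* one\r\n* two\r\n* three\r\n",
];

$counts = [];
$whole  = [];
foreach ($inputs as $label => $text) {
    $counts[$label] = preg_match_all($pattern, $text, $matches);
    $whole[$label]  = $matches[0][0]; // first (and only expected) match
}

// \R treats \r\n as a single line break, so it consumes both characters.
preg_match('/\R/', "\r\n", $lineBreak);

var_dump($counts);
var_dump($lineBreak[0] === "\r\n");
```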
This will match all bulleted lines until it gets to the first line that is not bulleted.
(?<=^|\R)\*[\s\S]+?(?=$|\R[^*])
\* matches a bullet, where
(?<=^|\R) requires that it is preceded by the start of the string or a line break.
[\s\S]+? matches any characters non-greedily, where
(?=$|\R[^*]) requires that the matched sequence is followed by the end of the string, or by a line break followed by a character that is not a *. Essentially, the match is complete when a non-bullet line is found or the end of the string is reached.
Results:
The resulting matches are shown in the RegexBuddy output below (Regex 101 can't handle it):