I am using PHP regex. Consider a text like this:
Lorem ipsum (dolor sit
amet (consectetur adipiscing
elit) sed do eiusmod) tempor
(incididunt) ut
labore.
I need to match the newlines inside the parentheses, but not the ones outside them (like the last two). My current regex looks like this: /\([^)]*([\r\n]+)[^(]*\)/s. However, it doesn't capture the newline between "sit" and "amet" because of the nested parentheses. Can I make it work with regex only, or do I have to parse the text manually?
You can match each group of nested parentheses with a recursive regex and then replace the line breaks inside each match value in a preg_replace_callback callback.
Use this regex to match nested parentheses:
'~\((?>[^()]++|(?R))*\)~'
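To see what the pattern matches on its own, here is a small sketch that writes the same pattern in extended (x) mode so each part can carry a comment, then runs it through preg_match_all. The sample string and variable names are made up for illustration.

```php
<?php
// Same pattern as above, spelled out with the x flag so it can be annotated.
$re = '~
    \(              # an opening parenthesis
    (?>             # atomic group: the engine never backtracks into it
        [^()]++     #   a run of non-parenthesis characters, possessive
      |             #   or
        (?R)        #   the whole pattern again, which consumes a nested group
    )*              # repeated zero or more times
    \)              # the matching closing parenthesis
~x';

// Illustrative input, not taken from the question.
$sample = "a (b (c) d) e (f) g";

preg_match_all($re, $sample, $matches);
print_r($matches[0]);   // the two matches are "(b (c) d)" and "(f)"
```

Because each match starts at an outermost opening parenthesis and swallows everything up to its partner, the inner "(c)" is part of the first match rather than a separate one.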
And here is a PHP demo:
$re = '~\((?>[^()]++|(?R))*\)~';
$str = "Lorem ipsum (dolor sit
amet (consectetur adipiscing
elit) sed do eiusmod) tempor
(incididunt) ut
labore.";
$output = preg_replace_callback($re, function ($m) {
    // Replace each line break (CRLF or LF) inside the match with a space
    return str_replace(array("\r\n", "\n"), " ", $m[0]);
}, $str);
echo $output;
Output:
Lorem ipsum (dolor sit amet (consectetur adipiscing elit) sed do eiusmod) tempor
(incididunt) ut
labore.
Additionally, see Recursive patterns at php.net.
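As for the manual route mentioned in the question, it is also short: walk the string once, track the nesting depth, and replace line breaks only while the depth is above zero. This is a minimal sketch, assuming the parentheses are balanced and that each line break should become a single space; the function name is hypothetical.

```php
<?php
// Hypothetical helper: joins lines that are broken inside parentheses.
function joinLinesInsideParens(string $text): string
{
    $depth = 0;   // current nesting level of parentheses
    $out = '';
    $len = strlen($text);

    for ($i = 0; $i < $len; $i++) {
        $ch = $text[$i];

        if ($ch === '(') {
            $depth++;
        } elseif ($ch === ')' && $depth > 0) {
            $depth--;
        } elseif ($depth > 0 && ($ch === "\r" || $ch === "\n")) {
            // Treat a CRLF pair as one line break.
            if ($ch === "\r" && $i + 1 < $len && $text[$i + 1] === "\n") {
                $i++;
            }
            $out .= ' ';
            continue;
        }

        $out .= $ch;
    }

    return $out;
}

echo joinLinesInsideParens("a (b\nc (d\ne) f)\ng (h)\ni");
```

One difference from the regex approach: with an unclosed "(", this loop treats the rest of the text as being inside parentheses, whereas the recursive pattern simply does not match an unbalanced group.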