I'm trying to split a string into an array of parts.
Example string:
The quick brown fox [[random text here]] and then [[a different text here]]
The text between the square brackets will change and cannot be determined ahead of time. The preg_split I have so far does split the string, but it leaves the bracket delimiters attached to the wrong elements of the resulting array.
$page_widget_split = preg_split('@(?<=\[\[)(.*?)(?=\]\])@', $page_content, -1, PREG_SPLIT_DELIM_CAPTURE);
This produces something like this...
[0] => "The quick brown fox [[",
[1] => "random text here]]",
[2] => " and then [[",
[3] => "a different text here]]"
The desired result would look like this...
[0] => "The quick brown fox",
[1] => "[[random text here]]",
[2] => " and then ",
[3] => "[[a different text here]]"
As I'm far from understanding regex, could someone please take a look and tell me what I'm missing in the pattern?
This will get you pretty close:
$page_content = 'the quick brown fox [[random text here]] and then [[a different text here]]';
print_r(preg_split('/(\[\[[^\]]+\]\])/', $page_content, -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY));
The thing to remember is that the whole capture group is the delimiter: (\[\[[^\]]+\]\])
Output:
Array
(
[0] => the quick brown fox
[1] => [[random text here]]
[2] => and then
[3] => [[a different text here]]
)
When I say pretty close, I do mean really pretty close...
The regex is pretty straightforward: match two [, then anything but a ], then two ]. That makes up our delimiter, which we then capture so that PREG_SPLIT_DELIM_CAPTURE keeps it in the output. The PREG_SPLIT_NO_EMPTY flag is nice too, since it drops empty pieces.
Enjoy!
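To make the effect of the two flags concrete, here is a minimal sketch. The input string 'a [[b]]' is made up for illustration (it is not from the question); it ends with a delimiter, which is the case where PREG_SPLIT_NO_EMPTY makes a visible difference.

```php
<?php
// Illustrative input (an assumption, not from the question): ends with a delimiter.
$text = 'a [[b]]';
$pattern = '/(\[\[[^\]]+\]\])/';

// The captured delimiter is kept, but an empty string trails the last delimiter.
$withEmpty = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

// Adding PREG_SPLIT_NO_EMPTY drops that empty piece.
$noEmpty = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

var_export($withEmpty); // three elements: 'a ', '[[b]]', ''
echo PHP_EOL;
var_export($noEmpty);   // two elements: 'a ', '[[b]]'
```

Note that neither flag trims whitespace, so the text pieces keep their leading and trailing spaces.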
UPDATE
but it fails on " here is my table [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]] and this is more text"...Note the "[]" under the 'columns'
To handle that you will need a recursive regex pattern using (?R), like this:
$page_content = 'here is my table [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]] and this is more text [someother bracket]';
print_r(preg_split('/(\[(?:[^\[\]]|(?R))*\])/', $page_content, -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY));
Output:
Array
(
[0] => here is my table
[1] => [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]]
[2] => and this is more text
[3] => [someother bracket] //single bracket capture
)
I won't pretend, this is kind of at the edge of my knowledge of regex. I should note that this matches single bracket pairs too, not specifically double ones. You could try something like this:
/(\[(\[(?:[^\[\]]|(?2))*\])\])/
Here (?2) is like (?R), but it recurses into a specific capture group (group 2) instead of the whole pattern. This matches only [[ ... ]] while keeping the inner nesting. The issue is that the inner group is captured as well, so the delimiter is duplicated and you wind up with this:
Array
(
[0] => here is my table
[1] => [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]]
[2] => [{"widget":"table","id":"1","title": "Views Table", "columns": []}]
[3] => and this is more text [someother bracket]
)
Notice how it doesn't split on [someother bracket], but it captures the other one two times. There may be a way around that, but I can't think of it.
Whether or not capturing single bracket pairs is an issue, I don't know.
I have used this before, mainly for matching paired " or ( ), but it's the same concept.
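One possible way to sidestep the duplicated capture, offered as a sketch rather than as part of the original answer, is to stop splitting and collect the pieces with preg_match_all, reading only the full matches. The pattern below is an assumption of mine: its first alternative is the double-bracket pattern with the outer capture group removed (so the recursion target becomes group 1), and its second alternative consumes any text that does not start a [[.

```php
<?php
$page_content = 'here is my table [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]] and this is more text [someother bracket]';

// First alternative: "[", then a balanced [...] group (recursing via (?1)), then "]".
// Second alternative: a run of characters, none of which begins a "[[".
$pattern = '/\[(\[(?:[^\[\]]|(?1))*\])\]|(?:(?!\[\[).)+/s';

preg_match_all($pattern, $page_content, $matches);

// $matches[0] holds only the full matches, so the inner group is not repeated.
print_r($matches[0]);
```

With this input, $matches[0] should contain three pieces: the leading text, the whole [[...]] block, and the trailing text including [someother bracket]. A [[ that is never closed is not covered by either alternative, so its first bracket would be dropped from the result.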
The only other solution would be to write a lexer/parser for it; I have some examples of how to do that on my GitHub account. Regex by itself is not well suited to nested elements, and most regex solutions will fail on nesting.
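As a rough idea of what the non-regex route could look like, here is a minimal sketch that counts bracket depth by hand. The function name split_bracket_blocks is hypothetical and this is not the code from the GitHub account mentioned above. It assumes that any balanced run opening with [[ counts as a block, and it does not treat brackets inside quoted strings specially.

```php
<?php
// Hypothetical helper: splits $text into plain text and [[...]] blocks
// by tracking bracket depth instead of using a regex.
function split_bracket_blocks(string $text): array
{
    $parts = [];
    $start = 0;               // start of the pending plain-text run
    $len = strlen($text);
    $i = 0;
    while ($i < $len) {
        if (substr($text, $i, 2) !== '[[') {
            $i++;
            continue;
        }
        // Found "[[": walk forward until the brackets balance out.
        $depth = 0;
        $end = null;
        for ($j = $i; $j < $len; $j++) {
            if ($text[$j] === '[') {
                $depth++;
            } elseif ($text[$j] === ']') {
                $depth--;
                if ($depth === 0) {
                    $end = $j;
                    break;
                }
            }
        }
        if ($end === null) {
            break;            // unbalanced: the rest stays plain text
        }
        if ($i > $start) {
            $parts[] = substr($text, $start, $i - $start);
        }
        $parts[] = substr($text, $i, $end - $i + 1);
        $i = $end + 1;
        $start = $i;
    }
    if ($start < $len) {
        $parts[] = substr($text, $start);
    }
    return $parts;
}

print_r(split_bracket_blocks('here is my table [[{"columns": []}]] and more [single] text'));
```

A real lexer would also need to decide how to treat brackets inside JSON string values; this sketch simply counts every [ and ].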
You might consider using preg_match_all instead; it'll probably make the regex's logic easier to figure out:
/\[{2}.+?\]{2}|.+?(?=\[{2}|$)/
Alternate between:
\[{2}.+?\]{2}: match [[, lazy-repeat characters, followed by matching ]], or
.+?(?=\[{2}|$): lazy-repeat characters until the lookahead matches [[ or the end of the string
https://regex101.com/r/ls6oBa/1
In PHP:
$str = "The quick brown fox [[random text here]] and then [[a different text here]] foobar";
preg_match_all('/\[{2}.+?\]{2}|.+?(?=\[{2}|$)/', $str, $result);
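For completeness, a short sketch of how the pieces can be read back out. With the default flags, preg_match_all stores the full matches in index 0 of the result array, so that is the list to use here.

```php
<?php
$str = "The quick brown fox [[random text here]] and then [[a different text here]] foobar";
preg_match_all('/\[{2}.+?\]{2}|.+?(?=\[{2}|$)/', $str, $result);

// $result[0] is the list of full matches, in order of appearance.
print_r($result[0]);
```

For this string that gives five pieces, alternating between plain text and bracketed blocks, with " foobar" last. Since . does not match newlines by default, multi-line input would likely need the s modifier.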