I want to extract a substring between two substrings. The problem is that my code extracts only the first occurrence. I want to extract every occurrence in my document.
Example:
function getBetween($content,$start,$end){
    $r = explode($start, $content);
    // only $r[1], the text after the FIRST $start, is ever examined
    if (isset($r[1])){
        $r = explode($end, $r[1]);
        return $r[0];
    }
    return '';
}
$document = '<a data-id="777777"></a><a data-id="888888"></a><a data-id="99999"></a>';
$content = $document;
$start = '<a data-id="';
$end = '"';
$data = getBetween($content,$start,$end);
echo $data;
$document2 = '<a data-A="AAAAAA"></a><a data-A="BBBBBB"></a><a data-A="CCCCCC"></a>';
$content = $document2;
$start = '<a data-A="';
$end = '"';
$data2 = getBetween($content,$start,$end);
echo $data2;
Right now it extracts only 777777 and AAAAAA. But what I want is every match: 777777, 888888 and 99999 from the first document, and AAAAAA, BBBBBB and CCCCCC from the second.
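For comparison, here is a minimal sketch of the explode() idea extended to every occurrence instead of only $r[1], assuming a plain string scan is acceptable. getAllBetween() is a hypothetical helper name, not part of the original code, and unlike the original it skips a piece in which $end never appears.

```php
<?php
// Hypothetical helper: explode() approach applied to every occurrence of $start.
function getAllBetween(string $content, string $start, string $end): array
{
    $results = [];
    // Every piece after the first one begins right after an occurrence of $start.
    foreach (array_slice(explode($start, $content), 1) as $piece) {
        // Keep the text up to the first $end; skip pieces where $end never appears.
        $pos = strpos($piece, $end);
        if ($pos !== false) {
            $results[] = substr($piece, 0, $pos);
        }
    }
    return $results;
}

print_r(getAllBetween('<a data-id="777777"></a><a data-id="888888"></a><a data-id="99999"></a>', '<a data-id="', '"'));
```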
Just use the preg_match_all() function.
Example:
<?php
$document = '<a data-id="777777"></a><a data-id="888888"></a><a data-id="99999"></a>';
$document2 = '<a data-A="AAAAAA"></a><a data-A="BBBBBB"></a><a data-A="CCCCCC"></a>';
$list1 = [];
$list2 = [];
preg_match_all('/<a data-id="([^"]+)"/', $document, $list1);
preg_match_all('/<a data-A="([^"]+)"/', $document2, $list2);
print_r([$list1, $list2]);
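With the default PREG_PATTERN_ORDER, preg_match_all() puts the full matches at index 0 of its third argument and the capture group's text at index 1, so the values themselves live in $list1[1]. A short sketch reading just those values from the first sample document:

```php
<?php
$document = '<a data-id="777777"></a><a data-id="888888"></a><a data-id="99999"></a>';

// $matches[0] = full matches, $matches[1] = what ([^"]+) captured.
preg_match_all('/<a data-id="([^"]+)"/', $document, $matches);

// Loop over only the captured IDs.
foreach ($matches[1] as $id) {
    echo $id, PHP_EOL;
}
```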
Code:
function getBetween($content,$start,$end){
    // \K drops $start from the returned match; the lookahead requires $end without consuming it
    return preg_match_all('/'.preg_quote($start,'/').'\K[^'.preg_quote($end,'/').']*(?='.preg_quote($end,'/').')/',$content,$out)?$out[0]:'';
}
$document = '<a data-id="777777"></a><a data-id="888888"></a><a data-id="99999"></a>';
$content = $document;
$start = '<a data-id="';
$end = '"';
$data = getBetween($content,$start,$end);
var_export($data);
$document2 = '<a data-A="AAAAAA"></a><a data-A="BBBBBB"></a><a data-A="CCCCCC"></a>';
$content = $document2;
$start = '<a data-A="';
$end = '"';
$data2 = getBetween($content,$start,$end);
var_export($data2);
Output:
array (
0 => '777777',
1 => '888888',
2 => '99999',
)array (
0 => 'AAAAAA',
1 => 'BBBBBB',
2 => 'CCCCCC',
)
My method effectively produces this pattern: /<a data-id="\K[^"]*(?=")/
which returns the desired substrings as full-string matches. Because the pattern has no capture group, it not only takes fewer steps but also returns a smaller result array.
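A small sketch of that difference on the first sample document: \K resets the reported match start, and the lookahead checks for the closing quote without consuming it, so the full-string matches are already the IDs. The capture-group version is included only for comparison of the result array size.

```php
<?php
$document = '<a data-id="777777"></a><a data-id="888888"></a><a data-id="99999"></a>';

// \K: '<a data-id="' must be present but is not part of the returned match.
// (?="): the closing quote is required but not consumed.
preg_match_all('/<a data-id="\K[^"]*(?=")/', $document, $out);

// Same extraction with a capture group, for comparison.
preg_match_all('/<a data-id="([^"]*)"/', $document, $withGroup);

print_r($out[0]);                          // the three IDs as full-string matches
var_dump(count($out), count($withGroup));  // int(1) int(2)
```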
preg_quote() escapes every regex metacharacter in the input so that the dynamically built pattern doesn't "break".
$end is used twice in the pattern: once in the "negated character class" [^"] and a second time in the "positive lookahead" (?=").
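To see why the escaping matters, here is a sketch with a hypothetical input whose $start contains '$', which is the end-of-subject anchor in a regex. Unescaped, the pattern silently finds nothing; escaped with preg_quote(), '$' is matched literally.

```php
<?php
// Hypothetical input: $start contains the metacharacter '$'.
$content = 'price: $10; price: $20;';
$start = 'price: $';
$end = ';';

// Unescaped: '$' acts as an anchor, so nothing can match.
$raw = '/' . $start . '\K[^' . $end . ']*(?=' . $end . ')/';
$rawCount = preg_match_all($raw, $content, $rawOut);
var_dump($rawCount); // int(0)

// Escaped: preg_quote() turns 'price: $' into 'price\: \$'.
$safe = '/' . preg_quote($start, '/') . '\K[^' . preg_quote($end, '/') . ']*(?=' . preg_quote($end, '/') . ')/';
preg_match_all($safe, $content, $safeOut);
print_r($safeOut[0]); // 10 and 20
```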
Just for the record: /"([^"]*)"/ will also work on your sample input.
*Important: my pattern is only built to handle a single-character $end. A negated character class excludes each listed character individually rather than the sequence as a whole, so if $end is more than one character, the pattern will not work as expected and will need to be modified.
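A quick sketch of that limitation, using a hypothetical HTML-like input with a four-character $end of '</b>'. The generated class [^\<\/b\>] rejects every '<', '/', 'b' and '>', so a value that contains the letter 'b' is never matched.

```php
<?php
// The single-character version of getBetween() from above.
function getBetween($content, $start, $end) {
    return preg_match_all('/'.preg_quote($start,'/').'\K[^'.preg_quote($end,'/').']*(?='.preg_quote($end,'/').')/', $content, $out) ? $out[0] : '';
}

// Hypothetical input with a multi-character $end.
// Only 'one' is returned; 'bob' is missed because the class rejects every 'b'.
var_export(getBetween('<b>one</b><b>bob</b>', '<b>', '</b>'));
```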
This is a slightly slower pattern / preg_match_all() call that allows multi-character $end strings; use it as the return line of getBetween():
return preg_match_all('/'.preg_quote($start,'/').'\K.*?(?='.preg_quote($end,'/').')/',$content,$out)?$out[0]:'';
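A sketch of getBetween() with that lazy return line, run on the same hypothetical '</b>' input and on the original sample. It assumes the values never span line breaks, since without the s modifier '.' does not match newlines.

```php
<?php
// getBetween() with the lazy .*? body, so $end may be longer than one character.
function getBetween($content, $start, $end) {
    // .*? stops at the first position where the whole $end sequence follows.
    return preg_match_all('/'.preg_quote($start,'/').'\K.*?(?='.preg_quote($end,'/').')/', $content, $out) ? $out[0] : '';
}

// Hypothetical input: both 'one' and 'bob' are returned.
var_export(getBetween('<b>one</b><b>bob</b>', '<b>', '</b>'));
var_export(getBetween('<a data-id="777777"></a><a data-id="888888"></a><a data-id="99999"></a>', '<a data-id="', '"'));
```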