How to get the substring between two substrings?

I want to extract the substring between two substrings, but my function only extracts the first occurrence. I want to extract every occurrence in my document.

Example:

function getBetween($content,$start,$end){
    $r = explode($start, $content);
    if (isset($r[1])){
        $r = explode($end, $r[1]);
        return $r[0];
    }
    return '';
}

$document = '<a data-id="777777"></a><a data-id="888888"></a><a data-id="99999"></a>';
$content = $document;
$start = '<a data-id="';
$end = '"';
$data = getBetween($content,$start,$end);
echo $data;

$document2 = '<a data-A="AAAAAA"></a><a data-A="BBBBBB"></a><a data-A="CCCCCC"></a>';
$content = $document2;
$start = '<a data-A="';
$end = '"';
$data2 = getBetween($content,$start,$end);
echo $data2;

Right now it extracts only 777777 and AAAAAA. What I want is 777777 AAAAAA 888888 BBBBBB 99999 CCCCCC.

Just use the preg_match_all() function.

Example:

<?php
$document = '<a data-id="777777"></a><a data-id="888888"></a><a data-id="99999"></a>';
$document2 = '<a data-A="AAAAAA"></a><a data-A="BBBBBB"></a><a data-A="CCCCCC"></a>';

$list1 = [];
$list2 = [];
preg_match_all('/<a data-id="([^"]+)"/', $document, $list1);
preg_match_all('/<a data-A="([^"]+)"/', $document2, $list2);
print_r([$list1, $list2]);
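print_r() above dumps the whole match structure. A minimal sketch, assuming you only want the attribute values: with the default ordering, preg_match_all() puts the full matches in index 0 and capture group 1 in index 1. The variable names $matches and $ids are illustrative.

```php
<?php
$document = '<a data-id="777777"></a><a data-id="888888"></a><a data-id="99999"></a>';

// $matches[0] holds the full matches, e.g. '<a data-id="777777"'
// $matches[1] holds only what capture group 1 caught: the attribute values
preg_match_all('/<a data-id="([^"]+)"/', $document, $matches);

$ids = $matches[1];
echo implode(',', $ids), "\n"; // prints 777777,888888,99999
```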

Code:

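function getBetween($content,$start,$end){
    // \K drops $start from the reported match; [^$end]* reads up to $end; (?=$end) requires $end to follow without consuming it
    return preg_match_all('/'.preg_quote($start,'/').'\K[^'.preg_quote($end,'/').']*(?='.preg_quote($end,'/').')/',$content,$out)?$out[0]:'';
}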


$document = '<a data-id="777777"></a><a data-id="888888"></a><a data-id="99999"></a>';
$content = $document;
$start = '<a data-id="';
$end = '"';
$data = getBetween($content,$start,$end);
var_export($data);

$document2 = '<a data-A="AAAAAA"></a><a data-A="BBBBBB"></a><a data-A="CCCCCC"></a>';
$content = $document2;
$start = '<a data-A="';
$end = '"';
$data2 = getBetween($content,$start,$end);
var_export($data2);

Output:

array (
  0 => '777777',
  1 => '888888',
  2 => '99999',
)array (
  0 => 'AAAAAA',
  1 => 'BBBBBB',
  2 => 'CCCCCC',
)

My method effectively builds the pattern /<a data-id="\K[^"]*(?=")/, which returns the desired substrings as full-string matches. Because the pattern has no capture group, it runs in fewer steps and returns a smaller result array.
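To make the "smaller result array" point concrete, here is a small sketch comparing the capture-group pattern from the other answer with the \K + lookahead pattern on the same sample input. The names $withGroup and $withK are only illustrative.

```php
<?php
$document = '<a data-id="777777"></a><a data-id="888888"></a><a data-id="99999"></a>';

// Capture-group version: index 0 = full matches, index 1 = group 1
preg_match_all('/<a data-id="([^"]+)"/', $document, $withGroup);

// \K + lookahead version: the wanted values are the full matches themselves
preg_match_all('/<a data-id="\K[^"]*(?=")/', $document, $withK);

echo count($withGroup), "\n"; // prints 2
echo count($withK), "\n";     // prints 1
echo implode(',', $withK[0]), "\n"; // prints 777777,888888,99999
```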

preg_quote() escapes any regex metacharacters in $start and $end, as well as the / delimiter passed as its second argument, so that the variable pattern doesn't "break".
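For the sample $start, a quick sketch shows what preg_quote() actually emits. The characters <, - and = are all on its escape list, while the space and the double quote are not. The $pattern variable here just mirrors the string that getBetween() builds internally.

```php
<?php
$start = '<a data-id="';
$end = '"';

// Same concatenation as inside getBetween()
$pattern = '/' . preg_quote($start, '/') . '\K[^' . preg_quote($end, '/') . ']*(?=' . preg_quote($end, '/') . ')/';
echo $pattern, "\n"; // prints /\<a data\-id\="\K[^"]*(?=")/
```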

$end is used twice in the pattern: once in the "negated character class" [^"] and again in the "positive lookahead" (?=").
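One way to see what the lookahead contributes is a hypothetical input whose last attribute value is never closed. With (?=") a value is only returned if $end really follows it. Without the lookahead, [^"]* happily runs to the end of the string.

```php
<?php
// Hypothetical input: the second attribute value has no closing quote
$doc = '<a data-id="111"></a><a data-id="222';

// With the lookahead, a value only counts if $end actually follows it
preg_match_all('/<a data-id="\K[^"]*(?=")/', $doc, $withLookahead);

// Without it, the unterminated value is returned as well
preg_match_all('/<a data-id="\K[^"]*/', $doc, $withoutLookahead);

echo implode(',', $withLookahead[0]), "\n";    // prints 111
echo implode(',', $withoutLookahead[0]), "\n"; // prints 111,222
```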

Just for the record:

  • /"([^"]*)"/ will work on your sample input.
  • When handling HTML strings, it is recommended to use an HTML parser such as DOMDocument.
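A minimal DOMDocument sketch for the data-id sample, assuming the dom extension is enabled (it is in most PHP builds). It reads the attribute through the parser instead of a regex. Note that libxml's HTML parser lowercases attribute names, so an attribute written as data-A would need to be looked up as data-a.

```php
<?php
$document = '<a data-id="777777"></a><a data-id="888888"></a><a data-id="99999"></a>';

// Collect parser warnings internally instead of printing them
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($document);

$ids = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    // Skip <a> elements that do not carry the attribute
    if ($a->hasAttribute('data-id')) {
        $ids[] = $a->getAttribute('data-id');
    }
}
echo implode(',', $ids), "\n"; // prints 777777,888888,99999
```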

*Important: my pattern is only built to handle a single-character $end. If $end is longer, the negated character class excludes each of its characters individually instead of the whole string, so the pattern will not work as expected and will need to be modified.

Here is a slightly slower pattern / preg_match_all() call that allows multi-character $end strings (it can replace the return line of getBetween()):

return preg_match_all('/'.preg_quote($start,'/').'\K.*?(?='.preg_quote($end,'/').')/',$content,$out)?$out[0]:'';
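To illustrate the single-character limitation, here is a sketch with a hypothetical multi-character $end of '</b>'. The class [^\<\/b\>] forbids <, /, b and > one by one, so the value "bob" can never match. The lazy .*? version stops at the first complete occurrence of $end.

```php
<?php
// Hypothetical input: "bob" contains 'b', which is also one of $end's characters
$content = '<b>bob</b><b>alice</b>';
$start = '<b>';
$end = '</b>';
$s = preg_quote($start, '/');
$e = preg_quote($end, '/'); // \<\/b\>

// Character-class version: only correct when $end is a single character
preg_match_all('/' . $s . '\K[^' . $e . ']*(?=' . $e . ')/', $content, $classVersion);

// Lazy version: .*? expands one character at a time until $end follows
preg_match_all('/' . $s . '\K.*?(?=' . $e . ')/', $content, $lazyVersion);

echo implode(',', $classVersion[0]), "\n"; // prints alice
echo implode(',', $lazyVersion[0]), "\n";  // prints bob,alice
```

One caveat with the lazy version: . does not match newline characters unless the s modifier is added to the pattern, so values that span lines would be skipped.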