Find and extract repeated occurrences of a string in text containing tags

Suppose I have this piece of text:

The <b>quick brown</b> fox jumps over the lazy dog
The quick brown fox jumps over the <b>lazy dog</b>
The quick brown fox <b>jumps over</b> the lazy dog

I want to find and extract all occurrences of these tagged strings from the text above:

<b>quick brown</b>
<b>lazy dog</b>
<b>jumps over</b>

Now I know I would need a while loop that checks until the end of the text and some string functions, but I'm not sure which ones.

I'd appreciate it if someone could help with this.

Do it like this:

<?php
$html='The <b>quick brown</b> fox jumps over the lazy dog
The <b>quick brown</b> fox jumps over the lazy dog
The <b>quick brown</b> fox jumps over the lazy dog';

function funcx($v)
{
    return "<b>".$v."</b>";
}

preg_match_all('~<b>(.*?)<\/b>~', $html, $matches);
$results=array_map('funcx',$matches[1]);
var_dump($results);

OUTPUT:

array (size=3)
  0 => string '<b>quick brown</b>' (length=18)
  1 => string '<b>quick brown</b>' (length=18)
  2 => string '<b>quick brown</b>' (length=18)
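
As a side note, preg_match_all already stores the full matches (tags included) in $matches[0], so re-wrapping the captured text is optional. A minimal sketch, assuming the three-line sample text from the question; the variable names $withTags and $textOnly are only illustrative:

```php
<?php
// Sample text from the question: three lines, each with a different bold phrase.
$html = "The <b>quick brown</b> fox jumps over the lazy dog\n"
      . "The quick brown fox jumps over the <b>lazy dog</b>\n"
      . "The quick brown fox <b>jumps over</b> the lazy dog";

// With the default flags, $matches[0] holds the full matches (tags included)
// and $matches[1] holds only the text captured between the tags.
preg_match_all('~<b>(.*?)</b>~', $html, $matches);

$withTags = $matches[0];
$textOnly = $matches[1];

print_r($withTags);
print_r($textOnly);
```

Which of the two arrays to use depends on whether the tags are wanted in the result.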

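If you want to use a regex, try the following:

/<b\b[^>]*>(.*?)<\/b>/

The full match includes the <b></b> tags themselves, while capture group 1 holds only the text between them. The [^>]* part allows attributes on the opening tag, and the lazy .*? keeps a match from running on to a later closing tag.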
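
One thing to watch with patterns of this kind is greediness. A minimal sketch, assuming a hypothetical single line that contains two bold phrases, comparing a greedy .* with a lazy .*? (the variable names are illustrative):

```php
<?php
// Hypothetical one-line input with two bold phrases.
$line = 'The <b>quick brown</b> fox jumps over the <b>lazy dog</b>';

// Greedy: .* runs to the last </b> on the line, so both phrases merge into one match.
preg_match_all('/<b>(.*)<\/b>/', $line, $greedy);

// Lazy: .*? stops at the first </b>, giving one match per tag pair.
preg_match_all('/<b>(.*?)<\/b>/', $line, $lazy);

print_r($greedy[0]);
print_r($lazy[0]);
```

When each line holds only one tagged phrase the two behave the same, which is why the difference is easy to miss.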


You can extend this to tags other than <b> with a simple function that takes the name of the tag you want to match:

Example:

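function getTextBetweenTags($string, $tagname)
{
    // Escape the tag name so any regex metacharacters in it are matched literally.
    $tagname = preg_quote($tagname, '/');
    // Lazy match; i = case-insensitive, s = let . match newlines too.
    $pattern = '/<'.$tagname.'>.*?<\/'.$tagname.'>/is';
    preg_match_all($pattern, $string, $matches);
    return $matches;
}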

Usage:

$string = 'The <b>quick brown</b> fox jumps over the lazy dog
           The <b>quick black</b> fox jumps over the lazy dog
           The <b>quick white</b> fox jumps over the lazy dog';
$text = getTextBetweenTags($string, "b");
print_r($text);

Output:

Array
(
    [0] => Array
        (
            [0] => <b>quick brown</b>
            [1] => <b>quick black</b>
            [2] => <b>quick white</b>
        )

)


EDIT 1:

I have extended the function above for you, so it will work with multiple tags:

Example:

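function getTextBetweenTags($string, $tagsname)
{
    $results = array();
    // Accept a comma-separated list of tag names such as "b,strong,span".
    foreach (explode(',', $tagsname) as $tagname)
    {
        // Trim stray spaces and escape regex metacharacters in the tag name.
        $tagname = preg_quote(trim($tagname), '/');
        $pattern = '/<'.$tagname.'>.*?<\/'.$tagname.'>/is';
        preg_match_all($pattern, $string, $matches);
        $results[] = $matches;
    }
    return $results;
}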

Usage:

$string = 'The <b>quick brown</b> fox jumps <strong>over</strong> the lazy dog
           The <b>quick black</b> fox jumps over the <span>lazy</span> dog
           The <b>quick white</b> fox jumps over the lazy dog';
$text = getTextBetweenTags($string, "b,strong,span"); // Single or multiple HTML tags
print_r($text);

Output:

Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => <b>quick brown</b>
                    [1] => <b>quick black</b>
                    [2] => <b>quick white</b>
                )

        )

    [1] => Array
        (
            [0] => Array
                (
                    [0] => <strong>over</strong>
                )

        )

    [2] => Array
        (
            [0] => Array
                (
                    [0] => <span>lazy</span>
                )

        )

)
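
The nested result above can be awkward to consume. A possible variant, offered only as a sketch: the helper name getMatchesByTag and its array parameter are hypothetical and not part of the original function. It returns the matches keyed by tag name:

```php
<?php
// Hypothetical variant: returns [tagName => [full matches...]].
function getMatchesByTag(string $string, array $tagNames): array
{
    $results = [];
    foreach ($tagNames as $tagName) {
        // preg_quote guards against regex metacharacters in the tag name.
        $tag = preg_quote($tagName, '/');
        preg_match_all('/<' . $tag . '>.*?<\/' . $tag . '>/is', $string, $matches);
        // $matches[0] holds the full matches, tags included.
        $results[$tagName] = $matches[0];
    }
    return $results;
}

$sample = 'The <b>quick brown</b> fox jumps <strong>over</strong> the <span>lazy</span> dog';
$byTag = getMatchesByTag($sample, ['b', 'strong', 'span']);
print_r($byTag);
```

Keying by tag name removes one level of nesting and makes it clear which matches belong to which tag.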


$text = "The <b>quick brown</b> fox jumps over the lazy dog
The <b>quick brown</b> fox jumps over the lazy dog
The <b>quick brown</b> fox jumps over the lazy dog";
$part = "<b>quick brown</b>";
$count = substr_count($text, $part);
for ($i = 0; $i < $count; $i++)
{
    echo $part."<br>";
}

OUTPUT (as rendered in a browser, where <b> is interpreted as bold):

quick brown

quick brown

quick brown

If you replace

echo $part."<br>";

with

echo htmlspecialchars($part)."<br>";

OUTPUT (tags escaped, so they are shown literally):

<b>quick brown</b>
<b>quick brown</b>
<b>quick brown</b>
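
The loop-plus-string-functions approach mentioned in the question can also be sketched without regex or a known phrase, using strpos and substr. This is a minimal sketch under assumptions: the helper name extractBold is hypothetical, the tags are assumed to be exactly <b> and </b> with no attributes, and nesting is not handled:

```php
<?php
// Hypothetical helper: collect every <b>...</b> span using only strpos/substr.
function extractBold(string $text): array
{
    $open = '<b>';
    $close = '</b>';
    $found = [];
    $offset = 0;

    // Keep searching from the end of the previous match until no opening tag is left.
    while (($start = strpos($text, $open, $offset)) !== false) {
        $end = strpos($text, $close, $start + strlen($open));
        if ($end === false) {
            break; // unclosed tag: stop rather than guess
        }
        $end += strlen($close);
        $found[] = substr($text, $start, $end - $start);
        $offset = $end;
    }
    return $found;
}

// Sample text from the question.
$text = "The <b>quick brown</b> fox jumps over the lazy dog\n"
      . "The quick brown fox jumps over the <b>lazy dog</b>\n"
      . "The quick brown fox <b>jumps over</b> the lazy dog";

print_r(extractBold($text));
```

Unlike the substr_count version, this does not need to know the phrase in advance, at the cost of a little more bookkeeping.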