Find and extract repeated occurrences of a tagged string in a text

Suppose I have this piece of text:

The <b>quick brown</b> fox jumps over the lazy dog
The quick brown fox jumps over the <b>lazy dog</b>
The quick brown fox <b>jumps over</b> the lazy dog

I want to find and extract all occurrences of strings like these from the text above:

<b>quick brown</b>
<b>lazy dog</b>
<b>jumps over</b>

Now I know I would need a while loop that checks until the end of the text and some string functions, but I'm not sure which ones.

I'd appreciate it if someone could help with this.

You can do it like this:

<?php
$html='The <b>quick brown</b> fox jumps over the lazy dog
The <b>quick brown</b> fox jumps over the lazy dog
The <b>quick brown</b> fox jumps over the lazy dog';

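// Re-wrap a captured inner text in its <b> tags.
function funcx($v)
{
    return "<b>".$v."</b>";
}

// (.*?) is lazy, so each match ends at the nearest </b>.
// $matches[1] holds the text between the tags.
preg_match_all('~<b>(.*?)</b>~', $html, $matches);
$results = array_map('funcx', $matches[1]);
var_dump($results);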

OUTPUT:

array (size=3)
  0 => string '<b>quick brown</b>' (length=18)
  1 => string '<b>quick brown</b>' (length=18)
  2 => string '<b>quick brown</b>' (length=18)
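
As a side note, here is a minimal sketch of the same idea applied to the text from the question. `preg_match_all` already stores each full match, tags included, in `$matches[0]`, so the re-wrapping step can be skipped. The variable names `$text`, `$found` and `$inner` are only illustrative.

```php
<?php
// Text from the question: each line has a different bold phrase.
$text = "The <b>quick brown</b> fox jumps over the lazy dog\n"
      . "The quick brown fox jumps over the <b>lazy dog</b>\n"
      . "The quick brown fox <b>jumps over</b> the lazy dog";

// Lazy quantifier (.*?) stops at the first closing </b>.
preg_match_all('~<b>(.*?)</b>~', $text, $matches);

$found = $matches[0];   // full matches, including the <b>...</b> tags
$inner = $matches[1];   // captured text only, without the tags

print_r($found);
```

Use `$found` if you need the tags and `$inner` if you only need the text between them.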

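If you want to use a regex, try the following:

/<b ?.*>(.*)<\/b>/

The full match covers everything inside the <b> tags including the tags themselves, and the capture group holds the text between them. Because .* is greedy, this is only reliable when each line contains a single <b> pair.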
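
A short sketch of what greedy versus lazy matching does with this pattern when one line holds two tag pairs. The sample `$line` and the lazy pattern are assumptions added for illustration, not part of the original answer.

```php
<?php
// Hypothetical sample line with two bold phrases.
$line = 'The <b>quick</b> brown <b>fox</b> jumps';

// Greedy: .* runs on to the last </b> of the line, giving one oversized match.
preg_match_all('/<b ?.*>(.*)<\/b>/', $line, $greedy);

// Lazy: [^>]* stays inside the opening tag and .*? stops at the first </b>.
preg_match_all('/<b\b[^>]*>(.*?)<\/b>/', $line, $lazy);

print_r($greedy[0]);
print_r($lazy[0]);
```

The greedy pattern returns a single match spanning both phrases, while the lazy one returns each tagged phrase separately.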

You can extend the regex above to handle any tag, not just <b>, by wrapping it in a simple function that takes the name of the tag you want to catch:

Example:

function getTextBetweenTags($string, $tagname)
{
    $pattern = '/<'.$tagname.'>.*?<\/'.$tagname.'>/is';
    preg_match_all($pattern, $string, $matches);
    return $matches;
}

Usage:

$string = 'The <b>quick brown</b> fox jumps over the lazy dog
           The <b>quick black</b> fox jumps over the lazy dog
           The <b>quick white</b> fox jumps over the lazy dog';
$text = getTextBetweenTags($string, "b");
print_r($text);

Output:

Array
(
    [0] => Array
        (
            [0] => <b>quick brown</b>
            [1] => <b>quick black</b>
            [2] => <b>quick white</b>
        )

)

EDIT 1:

I have extended the function above for you, so it will work with multiple tags:

Example:

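function getTextBetweenTags($string, $tagsname)
{
    $results = array(); // one $matches array is collected per tag
    $tagsname = explode(',', $tagsname);
    foreach ($tagsname as $tagname)
    {
        $tagname = trim($tagname); // tolerate input such as "b, strong, span"
        $pattern = '/<'.$tagname.'>.*?<\/'.$tagname.'>/is';
        preg_match_all($pattern, $string, $matches);
        $results[] = $matches;
    }
    return $results;
}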

Usage:

$string = 'The <b>quick brown</b> fox jumps <strong>over</strong> the lazy dog
           The <b>quick black</b> fox jumps over the <span>lazy</span> dog
           The <b>quick white</b> fox jumps over the lazy dog';
$text = getTextBetweenTags($string, "b,strong,span"); // Single or multiple HTML tags
print_r($text);

Output:

Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => <b>quick brown</b>
                    [1] => <b>quick black</b>
                    [2] => <b>quick white</b>
                )

        )

    [1] => Array
        (
            [0] => Array
                (
                    [0] => <strong>over</strong>
                )

        )

    [2] => Array
        (
            [0] => Array
                (
                    [0] => <span>lazy</span>
                )

        )

)
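
The nested result above can be awkward to consume. As a rough sketch, a variant could return the matches keyed by tag name instead. The function name `getTagsByName` and the array-based signature are hypothetical, and `preg_quote` is added here as a precaution against regex metacharacters in the tag name.

```php
<?php
// Hypothetical variant: returns ['b' => [...], 'strong' => [...], ...].
function getTagsByName(string $string, array $tagNames): array
{
    $results = [];
    foreach ($tagNames as $tagName) {
        $name = trim($tagName);
        // preg_quote escapes regex metacharacters, including the / delimiter.
        $tag = preg_quote($name, '/');
        $pattern = '/<' . $tag . '>.*?<\/' . $tag . '>/is';
        preg_match_all($pattern, $string, $matches);
        $results[$name] = $matches[0];   // keep the full matches only
    }
    return $results;
}

$string = 'The <b>quick brown</b> fox jumps <strong>over</strong> the lazy dog
The <b>quick black</b> fox jumps over the <span>lazy</span> dog';

$byTag = getTagsByName($string, ['b', 'strong', 'span']);
print_r($byTag);
```

With this shape, `$byTag['b']` gives the list of `<b>` matches directly.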

$text = "The <b>quick brown</b> fox jumps over the lazy dog
The <b>quick brown</b> fox jumps over the lazy dog
The <b>quick brown</b> fox jumps over the lazy dog";
$part = "<b>quick brown</b>";
// Count how many times the exact string occurs, then print it that many times.
$count = substr_count($text, $part);
for ($i = 0; $i < $count; $i++)
{
    echo $part."<br>";
}

OUTPUT (rendered in a browser, so the <b> tags show up as bold text):

quick brown
quick brown
quick brown

If you replace

echo $part."<br>";

with

echo htmlspecialchars($part)."<br>";

the tags are escaped and displayed literally. OUTPUT:

<b>quick brown</b>
<b>quick brown</b>
<b>quick brown</b>
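
For the while-loop approach mentioned in the question, a minimal sketch with plain string functions might look like the following. It assumes the tags are exactly `<b>` and `</b>` with no attributes and no nesting; the variable names are illustrative.

```php
<?php
$text = "The <b>quick brown</b> fox jumps over the lazy dog\n"
      . "The quick brown fox jumps over the <b>lazy dog</b>\n"
      . "The quick brown fox <b>jumps over</b> the lazy dog";

$open = '<b>';
$close = '</b>';
$found = [];
$offset = 0;

// Keep searching from the end of the previous match until no <b> is left.
while (($start = strpos($text, $open, $offset)) !== false) {
    $end = strpos($text, $close, $start);
    if ($end === false) {
        break;              // unclosed tag: stop rather than guess
    }
    $end += strlen($close); // include the closing tag in the slice
    $found[] = substr($text, $start, $end - $start);
    $offset = $end;
}

// Escape the tags so they are shown literally in a browser.
foreach ($found as $part) {
    echo htmlspecialchars($part), "\n";
}
```

Unlike `substr_count`, this does not need to know the tagged text in advance; it collects whatever sits between each pair of tags.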