I have a working GetBetween function that finds the value between two tags:
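function GetBetween($content,$start,$end){
    // Split on the opening tag; $r[1] is the text after its first occurrence
    $r = explode($start, $content);
    if (isset($r[1])){
        // Cut that text at the first closing tag
        $r = explode($end, $r[1]);
        return $r[0];
    }
    // Opening tag not found
    return '';
}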
$content = "<title>Hello World</title>";
echo GetBetween($content,"<title>","</title>");
However, it only finds one value. If the page contains the same tag several times, how can I get all of them?
For your example, it would look like this:
$html = "<title>Hello World</title><title>Hello StackOverflow</title><title></title>";
$start = '<title>'; // these two strings could also be built from a $tagName variable
$end = '</title>';
// Part 0 is whatever precedes the first opening tag, so it is skipped below
$splitByStartParts = explode($start, $html);
$result = [];
foreach ($splitByStartParts as $i => $part) {
    if ($i > 0) {
        // Keep only the text before the closing tag
        $r = explode($end, $part);
        $result[] = $r[0];
    }
}
var_dump($result); // "Hello World", "Hello StackOverflow", ""
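To call this the same way as the GetBetween function from the question, the loop can be wrapped in a function. This is a minimal sketch: the name GetAllBetween is hypothetical, and passing a limit of 2 to the inner explode() is just a small choice so that only the first closing tag after each opening tag matters.

```php
<?php
// Hypothetical helper mirroring GetBetween($content, $start, $end) from the question,
// but returning every value between $start and $end instead of only the first one.
function GetAllBetween(string $content, string $start, string $end): array
{
    $result = [];
    // Part 0 is whatever precedes the first $start, so it is skipped
    foreach (array_slice(explode($start, $content), 1) as $part) {
        // Only the text before the first $end belongs to this tag
        // (if $end is missing, the rest of the part is returned, as GetBetween does)
        $result[] = explode($end, $part, 2)[0];
    }
    return $result;
}

$html = "<title>Hello World</title><title>Hello StackOverflow</title>";
print_r(GetAllBetween($html, '<title>', '</title>'));
```

Since it only uses explode(), the tag strings are matched literally and need no escaping.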
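Alternatively, with a regular expression:
$html = "<title>Hello World</title><title>Hello StackOverflow</title>";
$tagName = 'title';
// The lookbehind (?<=...) and lookahead (?=...) keep the tags themselves out of the match
$regEx = "/(?<=<$tagName>).+(?=<\/$tagName>)/U";
// Note the "U" flag ("ungreedy"): .+ stops at the nearest closing tag
preg_match_all($regEx, $html, $matches);
var_dump($matches[0]);
Which outputs: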
array(2) {
[0] =>
string(11) "Hello World"
[1] =>
string(19) "Hello StackOverflow"
}
Using a regular expression gives you neater, more readable code, and keeps all the logic needed to identify a match in a single pattern string.
For example, to collect only non-empty values (as in the example above) you use .+ in the expression, whereas to collect all values, including empty ones, you just change it to .*.
Expressed in plain PHP code, the same distinction would need yet another condition, and code handling such edge cases can (and usually does) lead to non-obvious bugs as it accumulates.
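A sketch of switching between the .+ and .* variants described above, assuming the same input as the first example (including an empty <title></title>). The helper name getTagValues and its $includeEmpty parameter are hypothetical; preg_quote() is used so that the tag name is escaped for the pattern and its "/" delimiter.

```php
<?php
// Hypothetical helper: collects the text inside every <$tagName>...</$tagName> pair.
// $includeEmpty = true uses .* (keeps empty tags); false uses .+ as in the answer.
function getTagValues(string $html, string $tagName, bool $includeEmpty = true): array
{
    $tag = preg_quote($tagName, '/');   // escape regex metacharacters and the '/' delimiter
    $body = $includeEmpty ? '.*' : '.+';
    // U makes the quantifier ungreedy, so each match stops at the nearest closing tag
    $regEx = "/(?<=<$tag>)$body(?=<\/$tag>)/U";
    preg_match_all($regEx, $html, $matches);
    return $matches[0];
}

$html = "<title>Hello World</title><title>Hello StackOverflow</title><title></title>";
var_dump(getTagValues($html, 'title'));        // three values, the last one ""
var_dump(getTagValues($html, 'title', false)); // two values
```

One caveat worth checking: with .+, an empty tag followed by another tag, as in <title></title><title>X</title>, yields the single match "</title><title>X", so .* is the safer choice when empty tags can occur.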
You can achieve this using preg_replace_callback(), which passes every match to a callback and replaces the match with the callback's return value.
Example:
function myFunction($match)
{
    // $match[0] is the whole match, including the <title> tags;
    // $match[1] is the text captured between them (inspect with var_dump($match))
    return $match[1];
}
$content = "<title>Hello World</title>";
// Every <title>...</title> is replaced by whatever myFunction returns
$result = preg_replace_callback('#<title>(.+?)</title>#s', 'myFunction', $content);
echo $result; // Hello World
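Because preg_replace_callback() returns the input string with each match replaced, several <title> tags produce their values run together (for example "Hello WorldHello StackOverflow") rather than separate items. If an array of all values is the goal, a sketch using preg_match_all() with the same capture group (a swapped-in function, not the one this answer uses) could look like this:

```php
<?php
$content = "<title>Hello World</title><title>Hello StackOverflow</title>";
// Same pattern as above: group 1 captures the text between the tags,
// and the s flag lets . also match newlines inside a tag.
preg_match_all('#<title>(.+?)</title>#s', $content, $matches);
// $matches[0] holds the full matches (with tags), $matches[1] the captured text
print_r($matches[1]);
```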