How would one parse the content inside these tags, assuming the link is dynamic?
<h3 class="lvtitle">
<a href="http://www.ebay.com/itm/Chicago-Chicago-XXX-Audio-CD-/351478948979?hash=item51d5c72473"
class="vip" title="Click this link to access Chicago, Chicago XXX Audio CD">
Chicago, Chicago XXX Audio CD
</a>
</h3>
What I'm after is getting the "Chicago, Chicago XXX Audio CD" part.
Parser example:
$string = '<h3 class="lvtitle"><a href="http://www.ebay.com/itm/Chicago-Chicago-XXX-Audio-CD-/351478948979?hash=item51d5c72473" class="vip" title="Click this link to access Chicago, Chicago XXX Audio CD">Chicago, Chicago XXX Audio CD</a></h3>';
$doc = new DOMDocument(); //make a dom object
$doc->loadHTML($string); // load the string into the object
$links = $doc->getElementsByTagName('a'); //get all links
foreach ($links as $link) { //loop through all links
    echo $link->nodeValue; //output text content of links
}
Output:
Chicago, Chicago XXX Audio CD
References:
http://php.net/manual/en/domelement.getelementsbytagname.php
http://php.net/manual/en/domdocument.loadhtml.php
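If the page contains other links besides the listing title, the loop above will print all of them. A minimal sketch, assuming the title always sits in an <a> directly inside an <h3 class="lvtitle"> as in the question's markup, narrows the match with DOMXPath. The helper name extractListingTitles and the extra "Some other link" anchor are hypothetical, added only for illustration:

```php
<?php
// Hypothetical helper: returns the trimmed text of every <a> directly inside <h3 class="lvtitle">.
function extractListingTitles(string $html): array
{
    // Collect libxml warnings internally instead of printing them (real pages are rarely valid HTML).
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $titles = [];
    // Assumption: the class attribute is exactly "lvtitle", as in the question.
    foreach ($xpath->query('//h3[@class="lvtitle"]/a') as $link) {
        $titles[] = trim($link->textContent); // strip the line breaks around the text
    }
    return $titles;
}

$html = '<h3 class="lvtitle">
<a href="http://www.ebay.com/itm/Chicago-Chicago-XXX-Audio-CD-/351478948979?hash=item51d5c72473"
class="vip" title="Click this link to access Chicago, Chicago XXX Audio CD">
Chicago, Chicago XXX Audio CD
</a>
</h3>
<a href="#">Some other link</a>';

print_r(extractListingTitles($html));
```

Because the dynamic href is never inspected, this should keep working when the link changes, as long as the surrounding h3 keeps its class.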
...or if you really require a regex for some reason (I don't see why a parser wouldn't work)...
$string = '<h3 class="lvtitle"><a href="http://www.ebay.com/itm/Chicago-Chicago-XXX-Audio-CD-/351478948979?hash=item51d5c72473" class="vip" title="Click this link to access Chicago, Chicago XXX Audio CD">Chicago, Chicago XXX Audio CD</a></h3>';
preg_match_all('~<a\h.*?>(.*?)</a>~', $string, $links_content);
print_r($links_content[1]);
Output:
Array
(
[0] => Chicago, Chicago XXX Audio CD
)
~ = delimiter
<a = literally match <a
\h = a horizontal whitespace character
.*? = anything until the first occurrence of the next character
> = a literal >
(.*?) = a capture group capturing everything until the next character again
</a> = a literal </a>
~ = closing delimiter
If you prefer the regex101 write-up, see https://regex101.com/r/sT6yA9/1.
Also note the preg_match_all; that was in case your string had multiple links in it. With a single occurrence you could use preg_match.
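One caveat worth noting: the markup in the question spans several lines, and without the s modifier a . does not match a newline, so the pattern above would miss an <a> tag whose attributes wrap onto a second line. A small sketch, assuming the same pattern with the s modifier added and preg_match for the single-link case (the variable names are only illustrative):

```php
<?php
$multiline = '<h3 class="lvtitle">
<a href="http://www.ebay.com/itm/Chicago-Chicago-XXX-Audio-CD-/351478948979?hash=item51d5c72473"
class="vip" title="Click this link to access Chicago, Chicago XXX Audio CD">
Chicago, Chicago XXX Audio CD
</a>
</h3>';

// Without "s", "." stops at the line break inside the opening <a ...> tag, so nothing matches.
$withoutS = preg_match('~<a\h.*?>(.*?)</a>~', $multiline, $m1);

// With "s", "." also matches newlines; trim() removes the line breaks around the captured text.
$withS = preg_match('~<a\h.*?>(.*?)</a>~s', $multiline, $m2);
$title = $withS ? trim($m2[1]) : null;

var_dump($withoutS, $title);
```

The DOM parser handles the multi-line markup without any such adjustment, which is one more reason to prefer it.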
Regular expressions are somewhat limited in this case, since they cannot tell whether a match sits inside a commented-out area of the markup, for example.
A simple approach using regular expressions could however look like this:
.*"Click this link to access (.*?)".*
This one will extract the link's text instead:
^.*?<a.*?>(.*?)<\/a>
Here are the test results: https://regex101.com/r/xZ6kJ1/1
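For reference, this is roughly how the two patterns might be applied from PHP. It is a sketch only, assuming the single-line string from the question and / as the delimiter; the variable names are made up for the example:

```php
<?php
$string = '<h3 class="lvtitle"><a href="http://www.ebay.com/itm/Chicago-Chicago-XXX-Audio-CD-/351478948979?hash=item51d5c72473" class="vip" title="Click this link to access Chicago, Chicago XXX Audio CD">Chicago, Chicago XXX Audio CD</a></h3>';

// First pattern: reads the name out of the title attribute.
preg_match('/.*"Click this link to access (.*?)".*/', $string, $fromTitle);

// Second pattern: reads the text between <a ...> and </a>.
preg_match('/^.*?<a.*?>(.*?)<\/a>/', $string, $fromText);

echo $fromTitle[1], "\n";
echo $fromText[1], "\n";
```

Note that the first pattern depends on the exact "Click this link to access" wording in the title attribute, so it would stop matching if that wording changes.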