I have been trying to solve this problem for a couple of days now. I use cURL to fetch the content of a web page and then use preg_match_all to extract the parts I need, but I run into trouble when I try to find certain <a> tags in the document.
I want preg_match_all to find all <a> tags that are followed by a <strong> tag and then store the href values of those <a> tags in an array.
Here's what I've tried:
preg_match_all("~(<a href=\"(.*)\"><strong>\w+<\/strong>)~iU", $result, $link);
It returns:
Array ( [0] => Array ( ) [1] => Array ( ) [2] => Array ( ) )
Can somebody help me, please?
I strongly recommend you go with DOMDocument.
This code should do the trick:
<?php
/**
 * @author Jay Gilford
 * @edited KHMKShore:stackoverflow
 */
/**
 * get_links()
 *
 * @param string $url
 * @return array
 */
function get_links($url) {
    // Create a new DOM Document to hold our webpage structure
    $xml = new DOMDocument();
    // Load the url's contents into the DOM (the @ suppresses any errors from invalid markup)
    @$xml->loadHTMLFile($url);
    // Empty array to hold all links to return
    $links = array();
    // Loop through each <a> element in the DOM
    foreach ($xml->getElementsByTagName('a') as $link) {
        // If it has a <strong> tag in it, save the href link.
        // ->length is used instead of count() because DOMNodeList is only Countable from PHP 7.2.
        if ($link->getElementsByTagName('strong')->length > 0) {
            $links[] = array('url' => $link->getAttribute('href'), 'text' => $link->nodeValue);
        }
    }
    // Return the links
    return $links;
}
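Since the question already has the page body in $result from cURL, a variant that parses a string instead of fetching a URL may be handy. The following is only a minimal sketch and not part of the code above: the function name get_strong_links_from_html() and the sample markup are hypothetical, and it assumes that "an <a> with a <strong> somewhere inside it" is what is wanted, which it selects with the XPath expression //a[.//strong].

```php
<?php
/**
 * Hypothetical helper (not from the original answer): collect the href of
 * every <a> element that contains a <strong> element, given HTML as a string.
 *
 * @param string $html
 * @return array
 */
function get_strong_links_from_html($html) {
    $links = array();
    // loadHTML() rejects an empty string, so bail out early
    if (trim($html) === '') {
        return $links;
    }
    // Collect libxml warnings about malformed HTML instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // //a[.//strong] selects <a> elements that have a <strong> descendant
    foreach ($xpath->query('//a[.//strong]') as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }
    return $links;
}

// Sample markup standing in for the cURL response body ($result in the question)
$result = '<p><a href="one.php"><strong>One</strong></a> '
        . '<a href="two.php">Two</a> '
        . '<a class="x" href="three.php"><strong>Three words here</strong></a></p>';

print_r(get_strong_links_from_html($result));
```

With this sample markup the result should contain one.php and three.php: the second link has no <strong>, while the third is picked up even though it carries an extra attribute and multi-word text, two things a pattern built around \w+ would not cope with.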
First, your regex can fail easily, for example on an anchor like this:
<a alt="cow > moo" href="cow.php"><strong>moo</strong></a>
Second, your regex is slightly off. The following will match a simple anchor whose only attribute is href and whose link text is a single word:
~(<a href="(.*)"><strong>\w+</strong></a>)~
Third, and most important: if you want to extract these links reliably, then, as @KHMKShore has pointed out, DOMDocument is the best path.
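To make the first and third points concrete, here is a small sketch (an illustration only, with hypothetical variable names) that runs the pattern from the question against the tricky anchor above and then reads the same markup with DOMDocument.

```php
<?php
// The anchor from the first point: href is not the first attribute,
// and one attribute value contains a '>' character.
$html = '<a alt="cow > moo" href="cow.php"><strong>moo</strong></a>';

// Pattern from the question, unchanged apart from the quoting style
$pattern = '~(<a href="(.*)"><strong>\w+<\/strong>)~iU';
$count = preg_match_all($pattern, $html, $matches);

// Same markup through the DOM parser
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$hrefs = array();
foreach ($dom->getElementsByTagName('a') as $a) {
    // Keep only anchors that contain a <strong> element
    if ($a->getElementsByTagName('strong')->length > 0) {
        $hrefs[] = $a->getAttribute('href');
    }
}

var_dump($count); // number of regex matches
print_r($hrefs);  // hrefs found via the DOM
```

Here the regex finds no match because the pattern insists on href coming directly after "<a ", whereas the DOM version returns cow.php regardless of attribute order or the '>' inside the alt value.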