I used Simple HTML DOM Parser for parsing, but it was too slow, so I switched to cURL. I am learning from some blogs. Now I want to display the href found between two tags.
<?php
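class tagSpider
{
    // cURL handle, fetched HTML, binary-transfer flag and target URL
    public $ch;
    public $html;
    public $binary;
    public $url;

    // __construct replaces the PHP 4 style constructor named after the class
    function __construct()
    {
        $this->html = "";
        $this->binary = 0;
        $this->url = "";
    }

    // Download $url and store the response body in $this->html
    function fetchPage($url)
    {
        $this->url = $url;
        if (!empty($this->url)) {
            $this->ch = curl_init();
            curl_setopt($this->ch, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($this->ch, CURLOPT_URL, $this->url);
            curl_setopt($this->ch, CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($this->ch, CURLOPT_BINARYTRANSFER, $this->binary);
            $this->html = curl_exec($this->ch);
            curl_close($this->ch);
        }
    }

    // Return every match that starts with $beg_tag and ends with $close_tag.
    // Both arguments go into the regex as-is, so they may contain regex syntax.
    function parse_array($beg_tag, $close_tag)
    {
        preg_match_all("($beg_tag.*$close_tag)siU", $this->html, $matching_data);
        return $matching_data[0];
    }
}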
?>
<?php
$urlrun="http://m4.cricbuzz.com/";
$stag='<span>';
$etag="</span>";
$tspider = new tagSpider();
$tspider->fetchPage($urlrun);
$linkarray = $tspider->parse_array($stag, $etag);
foreach ($linkarray as $result) {
    echo strip_tags($result, '<br><div>');
    echo "<br>-<br>";
}
?>
How can I display the href using the same program?
I see you are simply copying and pasting someone else's code without actually understanding what it is doing (which is fine! I did it when I was a newbie).
Notice that the code is split into two separate sections. The second part should sit inside an HTML body tag, since it prints HTML. Simply add html and body tags around it:
<html>
<body>
<?php
$urlrun="http://www.yahoo.com/";
$stag='<span>';
$etag="</span>";
$tspider = new tagSpider();
$tspider->fetchPage($urlrun);
$linkarray = $tspider->parse_array($stag, $etag);
foreach ($linkarray as $result) {
    echo strip_tags($result, '<br><div>');
    echo "<br>-<br>";
}
?>
</body>
</html>
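Edit: if you want the links instead, it is more of a regular expression matter. parse_array() puts both strings straight into the pattern, so you can pass regex fragments instead of literal tags: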
<html>
<body>
<?php
$urlrun="http://www.google.com/";
$stag='href\=\"';
$etag="\"";
$tspider = new tagSpider();
$tspider->fetchPage($urlrun);
$linkarray = $tspider->parse_array($stag, $etag);
foreach ($linkarray as $result) {
    echo strip_tags($result, '<br><div>');
    echo "<br>-<br>";
}
?>
</body>
</html>
This will give you results in the format:
href="http://www.google.com/imghp?tab=wi"
href="http://maps.google.com/maps?tab=wl"
I am sure you can figure out the rest, such as getting rid of the href= part of the string.
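If it helps, one way to drop the href= part is to let the regex capture only the URL. This is just a sketch under a few assumptions: extract_hrefs is a made-up helper name (not part of tagSpider), the sample markup stands in for $tspider->html, and it only handles href values wrapped in double quotes.

```php
<?php
// Hypothetical helper, not part of the tagSpider class above.
// Returns only the URL inside each double-quoted href attribute.
function extract_hrefs($html)
{
    // The parentheses capture the URL; [^"]* stops at the closing quote.
    preg_match_all('/href="([^"]*)"/i', $html, $matches);

    // Group 1 holds the URLs without the surrounding href=" and ".
    return $matches[1];
}

// Sample markup standing in for $tspider->html (assumption for the demo).
$html = '<a href="http://www.google.com/imghp?tab=wi">Images</a>'
      . '<a href="http://maps.google.com/maps?tab=wl">Maps</a>';

foreach (extract_hrefs($html) as $link) {
    // Escape the value before echoing it back into an HTML page.
    echo htmlspecialchars($link), "<br>-<br>";
}
```

With the class from the question you would call extract_hrefs($tspider->html) after fetchPage(). The capture group avoids a second pass with str_replace or substr over each match.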