I'm trying to get the 'title' from websites. At the moment I'm using preg_match to get the title, but it's very slow to load.
What I have at the moment:
This passes links through to a function:
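<?php
// Render each saved link, using the page's <title> as its heading.
// Escape everything echoed into the HTML, since titles and URLs come from outside.
foreach ($savedLinks as $s)
{
    $safeUrl = htmlspecialchars($s, ENT_QUOTES);
    echo "<div class='savedLink'>";
    echo "<h5>" . htmlspecialchars(getMetaData($s), ENT_QUOTES) . "</h5>";
    echo "<a href='" . $safeUrl . "'>" . $safeUrl . "</a><br />";
    echo "</div>";
}
?>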
This function grabs the title from each website passed in:
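function getMetaData($url)
{
    // Download the page once and reuse the body for the match
    $html = @file_get_contents($url);
    if ($html === false || $html === '')
    {
        return "";
    }
    // Case-insensitive, non-greedy, and allow the title to span several lines
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $matches))
    {
        return trim(html_entity_decode($matches[1], ENT_QUOTES));
    }
    return "Not Found";
}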
Is there a faster way to get the title from each page?
I'm going to go out on a limb and guess that file_get_contents() is taking a lot longer than preg_match(), which I would expect to be pretty fast.
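To check that guess before optimizing anything, you can time the two steps separately. This is a minimal sketch: `timeFetchAndMatch` is a hypothetical helper, and it uses a case-insensitive, non-greedy title pattern rather than the question's exact regex.

```php
<?php
// Hypothetical helper: time the download and the regex separately for one URL,
// to see which step dominates.
function timeFetchAndMatch(string $url): array
{
    $start = microtime(true);
    $html = @file_get_contents($url);   // network round trip plus full download
    $fetched = microtime(true);

    $title = null;
    if ($html !== false && preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        $title = trim($m[1]);
    }
    $matched = microtime(true);

    return [
        'fetch_seconds' => $fetched - $start,
        'match_seconds' => $matched - $fetched,
        'title'         => $title,
    ];
}

// Example: print_r(timeFetchAndMatch('http://example.com/'));
```

On a remote URL you would expect `fetch_seconds` to be far larger than `match_seconds`; if so, speeding up the regex won't help much.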
If you're doing this across a lot of sites this method may not work, but you might want to look into byte-range requests. If you can predict that the <title> tag is within the first X bytes of the HTML response, you can make a partial request with a byte range and avoid transferring the whole document just to get the title tag. If the pages are dynamically generated, the code on the server would have to support this; if they're static documents, chances are good that byte-range requests are supported.
https://serverfault.com/questions/398219/how-can-i-enable-byte-range-request
As the second answer on that page suggests, also try enabling keep-alive by changing "Connection: close" to "Connection: keep-alive". Again, this only helps if you're hitting the same server multiple times and the server has keep-alive enabled. Together, those two changes could save a lot of time per request.
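In PHP, both ideas can be tried with cURL. This is a sketch under assumptions: it needs the cURL extension, `extractTitle` and `fetchTitles` are hypothetical helper names, and 4096 bytes is an assumed guess at where the title ends. Reusing one handle for every request lets libcurl keep a connection to the same host open between requests (its own way of getting keep-alive, rather than editing a raw `Connection` header), and `CURLOPT_RANGE` sends a `Range` header asking for only the first bytes of each page.

```php
<?php
// Hypothetical helper: pull the first <title> out of an HTML fragment.
function extractTitle(string $html): ?string
{
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5));
    }
    return null;
}

// Hypothetical helper: fetch titles for many URLs with a single reused cURL
// handle (so connections can be kept alive) and a byte-range request.
function fetchTitles(array $urls, int $maxBytes = 4096): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => 10,
        // Ask for bytes 0..maxBytes-1; servers without range support send the whole page
        CURLOPT_RANGE          => '0-' . ($maxBytes - 1),
    ]);

    $titles = [];
    foreach ($urls as $url) {
        curl_setopt($ch, CURLOPT_URL, $url);
        $body = curl_exec($ch);
        $titles[$url] = ($body === false) ? null : extractTitle($body);
    }
    curl_close($ch);
    return $titles;
}

// Example: print_r(fetchTitles($savedLinks));
```

A server that ignores the range simply answers with the full page, so the range only saves time where it's honored. If you'd rather stay with file_get_contents(), its fifth `$length` argument caps how many bytes PHP reads from the stream.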
You need a DOM parser to retrieve information from an HTML page quickly. I used the following DOM parser for this example:
http://simplehtmldom.sourceforge.net/
Download:
http://sourceforge.net/projects/simplehtmldom/files/
For example:
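<?php
include('simplehtmldom_1_5/simple_html_dom.php');

// Create DOM from URL or file (file_get_html() returns false if the fetch fails)
$html = file_get_html('http://joinform.com.au');
if ($html !== false)
{
    // find('title') returns every matching element; print the text of each
    foreach ($html->find('title') as $e)
        echo $e->innertext . '<br>';
}
?>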
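If you'd rather not download a third-party library, PHP's built-in DOM extension can do the same job. This sketch swaps in `DOMDocument` for simple_html_dom; `titleFromHtml` is a hypothetical helper that takes HTML you have already fetched (for example with file_get_contents()), and it assumes the DOM extension is enabled, as it is in most PHP builds.

```php
<?php
// Hypothetical helper: return the text of the first <title> element, or null.
function titleFromHtml(string $html): ?string
{
    if ($html === '') {
        return null;                      // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = $doc->getElementsByTagName('title');
    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

// Example: echo titleFromHtml(file_get_contents('http://joinform.com.au'));
```

Because `textContent` decodes entities, a title like `A &amp; B` comes back as `A & B`.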