Please help me strip the following more efficiently:
a href="/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html"
The site I visit has a few of these links, and I only need the part between the two periods:
vFIsdfuIHq4gpAnc
I would like to keep my current approach, which is built around regex. Please help me tune up the following preg_match_all line:
preg_match_all("(./(.*?).html)", $sp, $content);
Any kind of help I get on this is greatly appreciated and thank you in advance!
Here is my complete code:
$dp = "http://www.cnn.com";
$sp = @file_get_contents($dp);
if ($sp === FALSE) {
echo("<P>Error: unable to read the URL $dp. Process aborted.</P>");
exit();
}
preg_match_all("(./(.*?).html)", $sp, $content);
foreach($content[1] as $surl) {
$nctid = str_replace("mv/","",$surl);
$nctid = str_replace("/","",$nctid);
echo $nctid,'<br /><br /><br />';
}
The above is what I have been working on.
It's pretty okay, really. It's just that you don't want to match .*?, you want to match multiple characters that aren't a full stop, so you can use [^.]+ instead.
$sp = 'a href="/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html"';
preg_match_all( '/\.([^.]+)\.html/', $sp, $content );
var_dump( $content[1] );
The result that is printed:
array(1) {
[0]=>
string(16) "vFIsdfuIHq4gpAnc"
}
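Since the page has several of these links, the same idea can be run once over the whole markup. This is a minimal sketch, assuming a made-up snippet of HTML in $html (only the first link comes from the question; the other two are invented for illustration). Anchoring on /mv/ is my assumption, to keep unrelated .html links out of the result:

```php
<?php
// Hypothetical sample markup; only the first href appears in the question.
$html = '<a href="/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html">one</a>'
      . '<a href="/mv/other-5-6.Ab12Cd34Ef56Gh78.html">two</a>'
      . '<a href="/about.html">about</a>';

// /mv/ anchors the match, the first [^.]+ consumes the slug,
// \. is a literal dot, and the second [^.]+ captures the ID
// that sits right before ".html".
preg_match_all('~/mv/[^.]+\.([^.]+)\.html~', $html, $matches);

$ids = $matches[1];
print_r($ids);
```

The captured IDs end up in $matches[1], so the existing foreach over $content[1] should work unchanged with this pattern, and the str_replace calls are no longer needed.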
Here's an example of how to loop through all links:
<?php
$url = 'http://www.cnn.com';
$dom = new DomDocument( );
@$dom->loadHTMLFile( $url );
$links = $dom->getElementsByTagName( 'a' );
foreach( $links as $link ) {
$href = $link->attributes->getNamedItem( 'href' );
if( $href !== null ) {
if( preg_match( '~mv/.*?\.([^.]+)\.html~', $href->nodeValue, $matches ) ) {
echo "Link-id found: " . $matches[1] . "\n";
}
}
}
How about using explode()?
$exploded = explode('.', $sp);
$content = $exploded[1]; // string: "vFIsdfuIHq4gpAnc"
You can use explode():
$string = 'a href="/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html"';
if(stripos($string, '/mv/') !== false){ // stripos() returns 0 (falsy) for a match at the very start
$dots = explode('.', $string);
echo $dots[(count($dots)-2)];
}
Even simpler:
$sp="/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html";
$regex = '/\.(?P<value>.*)\./';
preg_match_all($regex, $sp, $content);
echo nl2br(print_r($content["value"], 1));
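One caveat with the greedy .* pattern: it captures everything from the first dot to the last one, so it only returns the bare ID when the string contains exactly two dots. A small sketch of the difference, assuming a hypothetical absolute URL (www.example.com is not from the question):

```php
<?php
// Hypothetical absolute URL: the host name adds extra dots.
$sp = 'http://www.example.com/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html';

// Greedy .* runs from the first dot all the way to the last one.
preg_match('/\.(?P<value>.*)\./', $sp, $greedy);

// [^.]+ followed by \.html can only match the last dot-free piece.
preg_match('/\.(?P<value>[^.]+)\.html/', $sp, $strict);

echo $greedy['value'], "\n";
echo $strict['value'], "\n";
```

For a relative href like the one in the question both patterns give the same ID, so this only matters if the links can be absolute.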