I want to extract the href attribute from links, specifically those that use mailto:. I want to do this for every such link on the main page, not just one.
I tried this:
<?php
$url = "https://www.omurcanozcan.com";
$html = file_get_contents( $url);
libxml_use_internal_errors( true);
$doc = new DOMDocument;
$doc->loadHTML( $html);
$xpath = new DOMXpath( $doc);
$node = $xpath->query( "//a[@href='mailto:']")->item(0);
echo $node->textContent; // This will print **GET THIS TEXT**
?>
For example, given this markup:
<a href='mailto:omurcan@omurcanozcan.com'>omurcan@omurcanozcan.com</a>
I want to echo
<p>omurcan@omurcanozcan.com</p>
The main problem is that in your XPath, you are checking for
//a[@href='mailto:']
This looks for an href attribute whose value is exactly mailto: and nothing else. What you want is an href that starts with mailto:, which you can test with starts-with():
...
$node = $xpath->query( "//a[starts-with(@href,'mailto:')]")->item(0);
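Since you asked for all links rather than just the first, you could loop over the whole result of query() instead of calling item(0). Below is a minimal sketch of that idea. The inline $html sample and the info@example.com address are made up for illustration and stand in for whatever your page really returns. Stripping a "?subject=..." suffix is also my assumption about what you would want.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = <<<HTML
<html><body>
<a href="mailto:omurcan@omurcanozcan.com">omurcan@omurcanozcan.com</a>
<a href="https://www.omurcanozcan.com/about">About</a>
<a href="mailto:info@example.com?subject=Hello">Contact</a>
</body></html>
HTML;

libxml_use_internal_errors(true);   // silence warnings from imperfect HTML
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$emails = [];
// query() returns a DOMNodeList, which can be iterated with foreach.
foreach ($xpath->query("//a[starts-with(@href,'mailto:')]") as $node) {
    // Drop the "mailto:" prefix from the attribute value.
    $address = substr($node->getAttribute('href'), strlen('mailto:'));
    // Drop any "?subject=..." style suffix (assumed to be unwanted here).
    $address = explode('?', $address, 2)[0];
    $emails[] = $address;
}

foreach ($emails as $email) {
    echo '<p>' . htmlspecialchars($email) . "</p>\n";
}
```

This reads the address from the href rather than from textContent, so it still works when the link text is something like "Contact" instead of the address itself.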
The second thing is that I don't think the page is fully loaded when you fetch its content. A common check I do is to save the HTML once I've loaded it, so I can inspect it first:
$url = "https://www.omurcanozcan.com";
$html = file_get_contents( $url);
file_put_contents("a.html", $html);
If you then look in a.html, you can see the HTML your script is actually working with. In that content I cannot see any mailto: links.
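If you would rather not open the saved file by hand each time, the same check can be done in code. This is only a rough sketch: hasMailtoLink() is a helper name I made up, and the two sample strings stand in for the downloaded page. In practice you would pass it the result of file_get_contents($url).

```php
<?php
// Hypothetical helper: reports whether raw HTML mentions "mailto:" anywhere.
function hasMailtoLink($html)
{
    // file_get_contents() returns false on failure, so guard against that.
    if ($html === false) {
        return false;
    }
    // Case-insensitive search, since "MAILTO:" is also a valid scheme spelling.
    return stripos($html, 'mailto:') !== false;
}

// Sample strings standing in for the downloaded page.
$withLink = '<a href="mailto:omurcan@omurcanozcan.com">mail</a>';
$withoutLink = '<a href="/contact">Contact</a>';

var_dump(hasMailtoLink($withLink));
var_dump(hasMailtoLink($withoutLink));
```

If this reports no match for the real page, the XPath query has nothing to find in that HTML, however it is written.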