Currently I have the following code:
//loop here
foreach ($doc['a'] as $link) {
    $href = pq($link)->attr('href');
    if (preg_match($url, $href)) {
        //delete matched string and append custom url to href attr
    } else {
        //prepend custom url to href attr
    }
}
//end loop
Basically, I've fetched an external page via cURL. I need to prepend my own custom URL to each href link in the DOM. I need to check via regex whether each href attribute already has a base URL, e.g. www.domain.com/MainPage.html/SubPage.html
If yes, then replace the www.domain.com part with my custom URL.
If not, then simply prepend my custom URL to the relative URL.
My question is: what regex syntax should I use, and which PHP function? Is preg_replace() the proper function for this?
Cheers
You should use built-in functions rather than regex whenever possible, because their authors have often already considered the edge cases (or read the really long RFC for URLs that details all of them). For your case, I would use parse_url() and then http_build_url() (note that the latter function needs PECL HTTP, which can be installed by following the docs page for the http package):
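$href = 'http://www.domain.com/MainPage.html/SubPage.html';
$parts = parse_url($href);
// parse_url() returns false on seriously malformed URLs, and 'host' is absent for relative ones
if ($parts !== false && isset($parts['host']) && $parts['host'] === 'www.domain.com') {
    $parts['host'] = 'www.yoursite.com';
    $href = http_build_url($parts);
}
echo $href; // 'http://www.yoursite.com/MainPage.html/SubPage.html'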
Example using your code:
foreach ($doc['a'] as $link) {
    $urlParts = parse_url(pq($link)->attr('href'));
    if ($urlParts === false) {
        continue; // leave malformed hrefs untouched
    }
    $urlParts['host'] = 'www.yoursite.com'; // This replaces the domain if there is one, otherwise it prepends your domain
    $newURL = http_build_url($urlParts);
    pq($link)->attr('href', $newURL);
}
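If installing PECL HTTP is not an option, the same idea can be sketched with parse_url() alone by gluing the parts back together by hand. The helper name rewrite_href(), its parameters, and the default 'http' scheme below are my own assumptions for illustration, not part of any library. It also covers two cases the loop above glosses over: hrefs written without a scheme (like the www.domain.com/MainPage.html/SubPage.html example in the question, which parse_url() treats as a plain path) and links that should not be rewritten at all (fragment-only links, mailto:, javascript:).

```php
<?php
// Hypothetical helper: point one href at $newHost using only parse_url().
// $oldHost, $newHost and $defaultScheme are illustrative assumptions.
function rewrite_href(string $href, string $oldHost, string $newHost, string $defaultScheme = 'http'): string
{
    if ($href === '') {
        return $href;
    }

    $parts = parse_url($href);

    // Leave malformed URLs, fragment-only links ("#top") and
    // non-web schemes such as mailto: or javascript: untouched.
    if ($parts === false
        || (!isset($parts['host']) && !isset($parts['path']))
        || (isset($parts['scheme']) && !in_array(strtolower($parts['scheme']), ['http', 'https'], true))
    ) {
        return $href;
    }

    $scheme = $parts['scheme'] ?? $defaultScheme;
    $path = $parts['path'] ?? '';

    // A scheme-less href such as "www.domain.com/Page.html" parses as a bare
    // path, so strip the old host by hand when the path starts with it.
    if (!isset($parts['host']) && strpos($path, $oldHost . '/') === 0) {
        $path = substr($path, strlen($oldHost));
    }

    // Relative paths such as "SubPage.html" need a leading slash after the host.
    $path = '/' . ltrim($path, '/');
    $query = isset($parts['query']) ? '?' . $parts['query'] : '';
    $fragment = isset($parts['fragment']) ? '#' . $parts['fragment'] : '';

    // Like the loop above, any existing http(s) host is replaced by $newHost;
    // port and user info from the original URL are intentionally dropped.
    return $scheme . '://' . $newHost . $path . $query . $fragment;
}

echo rewrite_href('http://www.domain.com/MainPage.html/SubPage.html', 'www.domain.com', 'www.yoursite.com'), "\n";
// http://www.yoursite.com/MainPage.html/SubPage.html
echo rewrite_href('SubPage.html', 'www.domain.com', 'www.yoursite.com'), "\n";
// http://www.yoursite.com/SubPage.html
```

Inside the phpQuery loop this would be called as pq($link)->attr('href', rewrite_href(pq($link)->attr('href'), 'www.domain.com', 'www.yoursite.com')). Note that relative paths are resolved against the site root here, not against the directory of the fetched page, so adjust that if the original page lives in a subdirectory.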