I have a big string that contains a mixture of two types of data sets. I want to get the data after the last slash in every href value (for example 168702 and 167504) and the corresponding alt value (Episode 29 and episode 20). I tried the following, but I can't get the correct data:
preg_match_all('/<a class=\"asite-thumbnail\" href="(.*?)"/s', $code2, $foo);
print_r($foo[1]);
First data set type:
<a class="asite-thumbnail" href="/season/path/12345/1/168702"><img src="http://asite.image2432424.jpg" alt="Episode 29"><div class="asite-title">Episode 29</div><div class="asite-info">starwar season 2</div></a>
Second data set type:
<a class="asite-thumbnail" title="episode 20 start war season 2" href="/season/path/12345/1/167504""><img src="http://asite.com/_thumb_dfsdfsdf.jpg" alt="episode 20">
Here's how you can do this with DOMDocument:
<?php
$input = '<a class="asite-thumbnail" href="/season/path/12345/1/168702"><img src="http://asite.image2432424.jpg" alt="Episode 29"><div class="asite-title">Episode 29</div><div class="asite-info">starwar season 2</div></a>';
$doc = new DOMDocument();
$doc->loadHTML($input);
$links = $doc->getElementsByTagName('a'); // pull all links
foreach ($links as $link) { // loop through each link
    // strip the URL down to everything after the last /
    echo 'End of Link=' . preg_replace('~^.*/~', '', $link->getAttribute('href')) . "\n";
    $images = $link->getElementsByTagName('img'); // get all images inside the link
    foreach ($images as $image) { // loop through each image
        echo 'Alt attribute = ' . $image->getAttribute('alt') . "\n"; // output the alt attribute's content
    }
}
Output:
End of Link=168702
Alt attribute = Episode 29
Demo: https://regex101.com/r/eW0zI1/1
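About the `~^.*/~` pattern: `.*` is greedy, so it first runs to the end of the href and then backtracks only as far as the last `/`; everything up to and including that slash is replaced with an empty string. Below is a minimal sketch comparing it with two non-regex alternatives, `basename()` and `strrchr()`, assuming the href never ends with a trailing slash (with a trailing slash `basename()` returns the previous segment while the regex returns an empty string). `lastSegment` is a hypothetical helper name used only for illustration.

```php
<?php
// Hypothetical helper: returns the text after the last "/" of an href.
function lastSegment(string $href): string
{
    // Greedy ".*" runs to the end, then backtracks to the final "/".
    return preg_replace('~^.*/~', '', $href);
}

$hrefs = ['/season/path/12345/1/168702', '/season/path/12345/1/167504'];

foreach ($hrefs as $href) {
    $viaRegex    = lastSegment($href);
    $viaBasename = basename($href);                // path-aware, no regex
    $viaStrrchr  = substr(strrchr($href, '/'), 1); // skip the "/" itself
    echo "$viaRegex $viaBasename $viaStrrchr\n";
}
```

For hrefs shaped like the ones in the question, all three approaches give the same result; the regex version is what the answer code uses.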
Or, with both data set types in one string:
<?php
$input = '<a class="asite-thumbnail" href="/season/path/12345/1/168702"><img src="http://asite.image2432424.jpg" alt="Episode 29"><div class="asite-title">Episode 29</div><div class="asite-info">starwar season 2</div></a><a class="asite-thumbnail" title="episode 20 start war season 2" href="/season/path/12345/1/167504""><img src="http://asite.com/_thumb_dfsdfsdf.jpg" alt="episode 20">';
libxml_use_internal_errors(true); // keep libxml from printing warnings about malformed markup such as the stray quote
$doc = new DOMDocument();
$doc->loadHTML($input);
$links = $doc->getElementsByTagName('a');
foreach ($links as $link) {
    echo 'End of Link=' . preg_replace('~^.*/~', '', $link->getAttribute('href')) . "\n";
    $images = $link->getElementsByTagName('img');
    foreach ($images as $image) {
        echo 'Alt attribute = ' . $image->getAttribute('alt') . "\n";
    }
}
Output:
End of Link=168702
Alt attribute = Episode 29
End of Link=167504
Alt attribute = episode 20
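For comparison, the regex in the question misses the second data set type because it requires `href` to come directly after `class="asite-thumbnail"` (the second type has a `title` attribute in between), and `(.*?)` captures the whole href value rather than only the part after the last slash. If you do want to stay with `preg_match_all`, a pattern along these lines handles both sample strings. It is a sketch that assumes the `<img>` always comes right after the opening `<a ...>` tag and that attributes are double-quoted, so it is more brittle than the DOM approach above.

```php
<?php
// Both data set types from the question, concatenated.
$code2 = '<a class="asite-thumbnail" href="/season/path/12345/1/168702"><img src="http://asite.image2432424.jpg" alt="Episode 29"><div class="asite-title">Episode 29</div><div class="asite-info">starwar season 2</div></a>'
       . '<a class="asite-thumbnail" title="episode 20 start war season 2" href="/season/path/12345/1/167504""><img src="http://asite.com/_thumb_dfsdfsdf.jpg" alt="episode 20">';

// [^>]* lets other attributes (such as title) appear before href.
// [^"]*/ backtracks to the last slash inside href, so group 1 is the final segment.
// Group 2 is the alt of the <img> that immediately follows the <a ...> tag.
$pattern = '~<a\b[^>]*\bhref="[^"]*/([^"/]*)"[^>]*>\s*<img\b[^>]*\balt="([^"]*)"~';

preg_match_all($pattern, $code2, $m);
print_r($m[1]); // last href segments
print_r($m[2]); // matching alt values
```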
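Update: to collect the values into arrays instead of echoing them (note that this input also drops the stray extra quote after the second href):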
<?php
$input = '<a class="asite-thumbnail" href="/season/path/12345/1/168702"><img src="http://asite.image2432424.jpg" alt="Episode 29"><div class="asite-title">Episode 29</div><div class="asite-info">starwar season 2</div></a><a class="asite-thumbnail" title="episode 20 start war season 2" href="/season/path/12345/1/167504"><img src="http://asite.com/_thumb_dfsdfsdf.jpg" alt="episode 20">';
$doc = new DOMDocument();
$doc->loadHTML($input);
$links = $doc->getElementsByTagName('a');
$linkimage = ['endlink' => [], 'alt' => []]; // initialise the result before appending to it
foreach ($links as $link) {
    $linkimage['endlink'][] = preg_replace('~^.*/~', '', $link->getAttribute('href'));
    $images = $link->getElementsByTagName('img');
    foreach ($images as $image) {
        $linkimage['alt'][] = $image->getAttribute('alt');
    }
}
print_r($linkimage);
Output:
Array
(
    [endlink] => Array
        (
            [0] => 168702
            [1] => 167504
        )

    [alt] => Array
        (
            [0] => Episode 29
            [1] => episode 20
        )

)
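If the string can contain other `<a>` tags, or a thumbnail link without an `<img>`, the two parallel arrays in the Update can drift out of step. Below is a minimal sketch using `DOMXPath` instead, assuming only links whose class list includes `asite-thumbnail` matter and that the first `<img>` inside each link carries the wanted alt. It keys each alt by the last href segment so each pair stays together; `$pairs` is just an illustrative variable name.

```php
<?php
$input = '<a class="asite-thumbnail" href="/season/path/12345/1/168702"><img src="http://asite.image2432424.jpg" alt="Episode 29"><div class="asite-title">Episode 29</div><div class="asite-info">starwar season 2</div></a><a class="asite-thumbnail" title="episode 20 start war season 2" href="/season/path/12345/1/167504"><img src="http://asite.com/_thumb_dfsdfsdf.jpg" alt="episode 20">';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($input);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Only anchors whose class attribute contains the token "asite-thumbnail".
$links = $xpath->query('//a[contains(concat(" ", normalize-space(@class), " "), " asite-thumbnail ")]');

$pairs = [];
foreach ($links as $link) {
    $id = preg_replace('~^.*/~', '', $link->getAttribute('href'));
    // First <img> under this anchor, or null if there is none.
    $img = $xpath->query('.//img', $link)->item(0);
    $pairs[$id] = $img ? $img->getAttribute('alt') : '';
}
print_r($pairs);
```

Note that PHP turns numeric-string keys such as "168702" into integer keys; use the parallel arrays from the Update if you need the ids kept as strings.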