I'm trying to extract all the links that come after a specific user, but my regex only grabs one link:
HTML:
<div class="from_name">
USERNAME
</div>
<div class="media_wrap clearfix">
<div class="media clearfix pull_left media_photo">
<div class="fill pull_left">
</div>
<div class="text">
<a href="https://google.com</a>
</div>
</div>
<div class="text">
<a href="https://yahoo.com</a>
</div>
</div>
Code:
preg_match_all('/USERNAME[\s\S]*?href="(.*?)</', $data, $matches);
print_r($matches);
// Output: it only captures google.com:
Array
(
[0] => Array
(
[0] => FullCapture
)
[1] => Array
(
[0] => https://google.com
)
)
Can you use two regexes? The first matches the entire area after USERNAME, and the second matches the URLs inside it.
preg_match('/(?<=USERNAME).*(?<=href=").*?</s', $string, $matches);
preg_match_all('/(?<=href=").*?(?=<)/', $matches[0], $newMatches);
var_dump($newMatches);
This gives you:
array(1) {
[0]=>
array(2) {
[0]=>
string(18) "https://google.com"
[1]=>
string(17) "https://yahoo.com"
}
}
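For reference, here is the same two-step idea as a self-contained sketch that can be run as-is. The helper name `linksAfterUser` and the shortened sample markup are only illustrative assumptions; the patterns are the two from above, with a guard for the case where the first match fails.

```php
<?php
// Hypothetical helper: returns every href value that appears after $username.
function linksAfterUser(string $html, string $username): array
{
    // Step 1: match from just after the username up to the "<" that
    // follows the last href=" in the input (the "s" flag lets "." span lines).
    $area = '/(?<=' . preg_quote($username, '/') . ').*(?<=href=").*?</s';
    if (preg_match($area, $html, $block) !== 1) {
        return []; // username not found, or no link after it
    }

    // Step 2: collect each URL inside that area.
    preg_match_all('/(?<=href=").*?(?=<)/', $block[0], $urls);
    return $urls[0];
}

// Sample input in the same (abbreviated) shape as the question's HTML.
$data = <<<HTML
<div class="from_name">
USERNAME
</div>
<div class="text">
<a href="https://google.com</a>
</div>
<div class="text">
<a href="https://yahoo.com</a>
</div>
HTML;

print_r(linksAfterUser($data, 'USERNAME'));
```

Note that this collects every link from USERNAME to the end of the input, so links that belong to later users would be included as well.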
Unfortunately, I am not familiar with the Telegram messenger. But I am almost sure that your problem cannot be solved (easily) with a regex. There are too many exceptions to the rule. So I will provide 2 alternatives:
1. Use a proper HTML parser, throw away what you do not need, and capture the relevant information.
2. Use a hack.
After the parsing, you will have a structure similar to:
You can do this parsing and deleting using string functions, or even regexes.
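For the first alternative, a minimal sketch using PHP's bundled DOM extension might look like the following. It assumes the anchors are well-formed (`<a href="...">text</a>`, unlike the abbreviated markup in the question), that the user name sits in a `div` whose class is exactly `from_name`, and that the name contains no double quotes. The helper name `linksAfterUserDom` and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: returns the href of every <a> that follows the
// from_name div of the given user in document order.
function linksAfterUserDom(string $html, string $username): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them;
    // real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumption: $username has no double quotes, otherwise the
    // expression below would be malformed.
    $query = sprintf(
        '//div[@class="from_name"][normalize-space(.)="%s"]/following::a[@href]',
        $username
    );

    $links = [];
    foreach ($xpath->query($query) as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }
    return $links;
}

// Assumed sample input with well-formed anchors.
$html = <<<HTML
<div class="from_name">
USERNAME
</div>
<div class="text">
<a href="https://google.com">google</a>
</div>
<div class="text">
<a href="https://yahoo.com">yahoo</a>
</div>
HTML;

print_r(linksAfterUserDom($html, 'USERNAME'));
```

Like the regex approach, this returns every link after the user's name. Limiting it to one user's messages would need a rule for where that user's block ends, which depends on the real export format.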