Regex to extract URLs from href attributes based on a username

I'm trying to extract all the links that come after a specific user, but my regex only grabs one link:

HTML:

<div class="from_name">
   USERNAME
</div>
<div class="media_wrap clearfix">
   <div class="media clearfix pull_left media_photo">
      <div class="fill pull_left">
      </div>
      <div class="text">
         <a href="https://google.com</a>
      </div>
   </div>
   <div class="text">
      <a href="https://yahoo.com</a>
   </div>
</div>

Code:

preg_match_all('/USERNAME[\s\S]*?href="(.*?)</', $data, $matches);

print_r($matches);

// Output: only google.com is captured:

Array
(
    [0] => Array
        (
            [0] => FullCapture
        )

    [1] => Array
        (
            [0] => https://google.com
        )

)

Can you use two regexes? The first matches the entire area after USERNAME, and the second extracts the URLs from it.

preg_match('/(?<=USERNAME).*(?<=href=").*?</s', $string, $matches);

preg_match_all('/(?<=href=").*?(?=<)/', $matches[0], $newMatches);

var_dump($newMatches);

This gives you:

array(1) {
  [0]=>
  array(2) {
    [0]=>
    string(18) "https://google.com"
    [1]=>
    string(17) "https://yahoo.com"
  }
}
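
If the input contains more than one user, the first regex above is greedy and will run on to the last href in the whole string. A minimal sketch of the same two-step idea, bounded to a single user's block, might look like this. The function name linksAfterUser is hypothetical, and it assumes that every new user starts with a class="from_name" div, as in the question's HTML:

```php
<?php
// Hypothetical helper, not part of any library: returns every href value
// that follows $username, stopping at the next from_name block.
// Assumption: each user's messages start with a class="from_name" div.
function linksAfterUser(string $html, string $username): array
{
    // Escape the name so characters like "." or "+" are matched literally.
    $user = preg_quote($username, '/');

    // Step 1: isolate the text between this username and the next
    // from_name block (or the end of the input if there is none).
    if (!preg_match('/' . $user . '(.*?)(?=class="from_name"|\z)/s', $html, $area)) {
        return [];
    }

    // Step 2: collect every href value inside that area. The value ends at
    // a closing quote, or at "<" for the unterminated hrefs in the question.
    preg_match_all('/(?<=href=")[^"<]*/', $area[1], $urls);

    return $urls[0];
}
```

Calling linksAfterUser($string, 'USERNAME') on the question's HTML should give the same two URLs as above, while links posted by a later user are left out.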

Unfortunately, I am not familiar with the Telegram messenger, but I am almost sure that your problem cannot be solved (easily) with a single regex. There are too many exceptions to the rule. So I will offer two alternatives:

  1. Use a proper HTML parser, throw away what you do not need, and capture the relevant information.

  2. Use a hack

    • Parse the HTML
      • throw away everything which does not bring relevant information
      • you will end up with a list of names and links
    • if a name is followed by another name, delete it, since it has no links;
    • load whatever remains in an array, with the links associated with their respective users.

After the parsing, you will have a structure similar to:

  • name
    • link
    • link
  • name <--- you will delete this, before loading the data in an array
  • name
    • link
  • ...

You can do this parsing and deleting using string functions, or even regexes.
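
For the first alternative, a rough sketch with PHP's built-in DOMDocument and DOMXPath could look like the following. The function name linksByUser is hypothetical, and the sketch assumes the real file has well-formed anchors such as <a href="https://google.com">...</a> (the snippet in the question has unterminated href values, which a parser will not read reliably). Names without links never get an entry, which corresponds to the "delete it" step above:

```php
<?php
// Hypothetical helper: groups href values under the most recent from_name div.
// Assumption: anchors are well formed, e.g. <a href="https://google.com">x</a>.
function linksByUser(string $html): array
{
    if (trim($html) === '') {
        return [];                      // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);   // silence markup warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);          // restore the old setting

    $xpath = new DOMXPath($doc);
    // One path expression, so nodes come back in document order:
    // either a div whose class list contains from_name, or a link with href.
    $nodes = $xpath->query(
        '//*[(self::div and contains(concat(" ", normalize-space(@class), " "), " from_name "))'
        . ' or (self::a and @href)]'
    );

    $result = [];
    $current = null;                    // name of the user seen most recently
    foreach ($nodes as $node) {
        if ($node->nodeName === 'div') {
            $current = trim($node->textContent);
        } elseif ($current !== null) {
            $result[$current][] = $node->getAttribute('href');
        }
    }

    return $result;
}
```

The result is an array keyed by user name, so $result['USERNAME'] would hold that user's links. If the same name appears in several blocks, their links end up merged under one key.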