Extracting an e-mail address from an HTML structure using PHP

I am trying to modify a PHP file in order to extract the e-mail address from a variable. The file belongs to the Joomla extension Community Builder 1.9.1 and is located at \components\com_comprofiler\plugin\templates\default\default.php.

For description’s sake, let’s say this variable is $html. To make sure it is the variable that contains the e-mail address I'm targeting, I inserted the following into the file:

<pre><?php print_r($html) ?></pre>

Its output is the e-mail address as a mailto link, and the corresponding HTML is something like:

<span id="cbMa47822" class="cbMailRepl"><a href="mailto:myemail@yahoo.com">myemail@yahoo.com</a></span>

So I guess I can use:

<?php $html_array = explode("\"",$html);echo $html_array[5]; ?>

to get 'mailto:myemail@yahoo.com'. But it actually only returns this notice:

undefined offset:5

So I ran print_r($html_array), and it returns something like:

Array
(
    [0] =>  cbMa14768
    [2] =>  class=
    [3] => cbMailRepl
    [4] => >... 
)

It looks like the <a> tag part of the HTML output is replaced by "...", much like what you see in the HTML inspector of Chrome’s developer tools, where an element looks like this before you expand it:

<span id="cbMa47822" class="cbMailRepl">...</span>

I looked deeper into the PHP code, trying to find out how this $html is constructed, but it is totally beyond my understanding.

For learning purposes, my questions are:

  1. Why is there no [1] in the result of print_r($html_array)?

  2. How can I inspect a variable’s value exactly, that is, as raw text with no HTML rendering? For example, if the value is "<a href="htt://foo.com">foo</a>", it should be displayed as literal HTML, not as a link (when I use print_r, the browser renders a link).

  3. Most importantly, based on the information given above, can you give me any hint on how to extract the e-mail address from a variable like this?

Finally, for those who are willing to take a deeper look into this, the variable I am talking about is $this->tableContent[$userIdx][1][6]->value in \components\com_comprofiler\plugin\templates\default\default.php. It wasn't in the code originally, but I ran some tests and confirmed that it contains the e-mail address. I inserted the following code between lines 450 and 451:

<?php $html_array = explode("\"",$this->tableContent[$userIdx][1][6]->value);echo $html_array[5]; ?>
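
  1. To display the HTML as text instead of a rendered link, escape it before printing, for example with htmlspecialchars().
  2. You can use a regular expression (preg_match) to check whether a string contains something that matches an e-mail address pattern, and print the match.
  3. PHP ships with a large library of built-in string functions, so search the manual before writing your own.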
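
As a minimal sketch of the first point, the snippet below escapes a value before printing it, so the browser shows the markup itself rather than a clickable link. The sample string is copied from the question; in the real template the input would be whatever variable is being inspected, and the variable name $escaped is only illustrative.

```php
<?php
// Sample value copied from the question; the real input would be the template variable.
$html = '<span id="cbMa47822" class="cbMailRepl"><a href="mailto:myemail@yahoo.com">myemail@yahoo.com</a></span>';

// htmlspecialchars() converts <, >, & and quotes into HTML entities,
// so the browser displays the tags as text instead of rendering them.
$escaped = htmlspecialchars($html, ENT_QUOTES, 'UTF-8');

echo '<pre>' . $escaped . '</pre>';
```

Viewing the page source, or running the script from the command line, is another way to see the raw value without any browser rendering.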

To extract an e-mail address from an HTML structure like the one you describe, use a regular expression with preg_match:

$html = '<span id="cbMa47822" class="cbMailRepl"><a href="mailto:myemail@yahoo.com">myemail@yahoo.com</a></span>';

preg_match("/mailto:(.*)\">/is", $html, $matches);

echo '<pre>';
print_r($matches);
echo '</pre>';

The output would be:

Array
(
    [0] => mailto:myemail@yahoo.com">
    [1] => myemail@yahoo.com
)

So to access that e-mail address, just do this:

echo $matches[1];

The output would be:

myemail@yahoo.com
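
A slightly more defensive variant of the same idea is sketched below. The helper name extractMailto is hypothetical, not part of PHP or Community Builder. It stops the capture at the first quote or question mark instead of using a greedy `.*`, and it checks the return value of preg_match so that a string without a mailto link (such as the "..." placeholder shown in the question) yields null rather than an undefined-offset notice.

```php
<?php
/**
 * Hypothetical helper: returns the first mailto: address found in $html,
 * or null when the string contains no mailto link.
 */
function extractMailto($html)
{
    // [^"'?]+ captures up to the closing quote or a "?subject=..." query part.
    if (preg_match('/mailto:([^"\'?]+)/i', $html, $matches) === 1) {
        return trim($matches[1]);
    }
    return null;
}

// Sample value copied from the question.
$html = '<span id="cbMa47822" class="cbMailRepl"><a href="mailto:myemail@yahoo.com">myemail@yahoo.com</a></span>';

$email = extractMailto($html);
echo $email !== null ? $email : 'no address found';
```

If the value really holds only the placeholder markup, no pattern can recover the address from it, and the null result makes that case easy to detect.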