Extracting span and href data from a string

I have some HTML strings in this format:

   <span>SpanText</span>
   <a href="link.html" title="link">Link Text</a>

I use this regexp to extract the data:

   $regexp = "<span>(.*)<\/span><a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
   preg_match_all("/$regexp/siU", $string, $matches, PREG_SET_ORDER);

This returns nothing.

What is wrong with the regexp?

I want to extract the span text and the link text.

You can use this regex:

<span>(.*)<\/span>(?:.|\n)*?<a\s[^>]*?href=\"??[^\" >]*?[^>]*>(.*)<\/a>


Problem with your code:

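Your pattern allows nothing between </span> and <a, but your input has a line break and indentation there, so it can never match.

Also, why did you use \\1? (I didn't understand that.)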
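For reference, here is a minimal sketch of how the original pattern could be repaired while still capturing the href. It assumes the intent of the backreference was to match the closing quote: once a `(.*)` group for the span is placed in front, the quote group becomes group 2, so `\\1` ends up pointing at the span text. The sketch also allows whitespace between the two tags and swaps the `U` modifier for explicit lazy quantifiers, which is my own choice and not something from the question.

```php
<?php
$string = '
   <span>SpanText</span>
   <a href="link.html" title="link">Link Text</a>
';

// Group 1: span text, group 2: optional opening quote,
// group 3: href value, group 4: link text.
// \s* permits the line break and indentation between </span> and <a.
// \2 refers back to the quote captured in group 2.
$regexp = '<span>(.*?)<\/span>\s*<a\s[^>]*href=("?)([^" >]*)\2[^>]*>(.*?)<\/a>';
preg_match_all("/$regexp/si", $string, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[1], ' | ', $m[3], ' | ', $m[4], "\n";
}
```

With the sample string this should print `SpanText | link.html | Link Text`.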

Do not use regex to parse HTML; it is not the appropriate tool for that. Use a DOM parser instead. Here's an example with PHP Simple HTML DOM Parser:

// includes Simple HTML DOM Parser
include "simple_html_dom.php";

$input = '
            <span>SpanText</span>
            <a href="link.html" title="link">Link Text</a>
        ';

// Create a DOM object
$html = new simple_html_dom();
// Load HTML from a string
$html->load($input);

// Retrieve the text from elements
$span = $html->find('span',0)->plaintext;
$anchor = $html->find('a',0)->plaintext;

echo "$span - $anchor";

// Clear DOM object
$html->clear();
unset($html);

OUTPUT

SpanText - Link Text


For more information, see the PHP Simple HTML DOM Parser manual.
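If you would rather not pull in a third-party file, the same idea can be sketched with PHP's built-in DOMDocument class. This is an alternative technique, not part of Simple HTML DOM Parser, and it assumes the DOM extension is enabled in your PHP build. The `$href` variable is an extra I added to show that the attribute is available as well.

```php
<?php
$input = '
    <span>SpanText</span>
    <a href="link.html" title="link">Link Text</a>
';

$dom = new DOMDocument();
// Collect libxml warnings internally instead of printing them,
// since HTML fragments often trigger parser notices.
libxml_use_internal_errors(true);
$dom->loadHTML($input);
libxml_clear_errors();

// Take the first <span> and the first <a>, if present.
$spanNode = $dom->getElementsByTagName('span')->item(0);
$linkNode = $dom->getElementsByTagName('a')->item(0);

$span   = $spanNode ? trim($spanNode->textContent) : null;
$anchor = $linkNode ? trim($linkNode->textContent) : null;
$href   = $linkNode ? $linkNode->getAttribute('href') : null;

echo "$span - $anchor ($href)\n";
```

With the sample input this should print `SpanText - Link Text (link.html)`.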

But if you are only working with this one piece of HTML, a regex may be enough. You can try this pattern:

/<span>([^<]+)<\/[^<]+<a[^>]+>([^<]+)/g
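To use that pattern from PHP, note that PCRE in PHP has no `g` modifier (it belongs to the online tester's syntax); `preg_match_all` already returns every match. A minimal sketch, with `$input` and `$pattern` as placeholder names of my own:

```php
<?php
$input = '
    <span>SpanText</span>
    <a href="link.html" title="link">Link Text</a>
';

// Same pattern as above, minus the "g" flag.
// Group 1: span text, group 2: link text.
$pattern = '/<span>([^<]+)<\/[^<]+<a[^>]+>([^<]+)/';
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo "{$m[1]} - {$m[2]}\n";
}
```

With the sample input this should print `SpanText - Link Text`. Keep in mind that the pattern relies on the span and link containing plain text only; nested tags inside either element would stop the match.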
