Finding content inside <td> using a regular expression

How can I extract "Areal" and "93 m²" with a regex when scraping an HTML page?

<tr><td>Areal</td><td>93 m²</td></tr>

Note that the document contains multiple <tr> rows with <td> cells, but only one row should match each label ("Areal", "Rooms", etc.).

You haven't posted what language you are using, so I'll just give the regex that matches your target text, without any code showing how to use it:

(?<=<td>).*?(?=</td>)

This uses a look-behind (a zero-width assertion) for <td>, a look-ahead for </td>, and a non-greedy match (one that won't gobble up all input up to the last </td>) for the text between these two assertions.
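To make the non-greedy point concrete, here is a small sketch that runs the pattern with and without the `?`. It assumes PHP's PCRE functions (the language used in the other answer); the variable names `$lazy` and `$greedy` are only illustrative.

```php
<?php
// Sample row from the question.
$html = "<tr><td>Areal</td><td>93 m²</td></tr>";

// Non-greedy: .*? stops at the first </td> it can reach.
preg_match_all('/(?<=<td>).*?(?=<\/td>)/', $html, $lazy);

// Greedy: .* runs on to the last </td> on the line.
preg_match_all('/(?<=<td>).*(?=<\/td>)/', $html, $greedy);

print_r($lazy[0]);   // two matches: "Areal" and "93 m²"
print_r($greedy[0]); // one match: "Areal</td><td>93 m²"
```

Because both assertions are zero-width, the surrounding tags are never part of the match, so no capture group is needed.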


In PHP:

$html = "<tr><td>Areal</td><td>93 m²</td></tr>";
preg_match_all("/(?<=<td>).*?(?=<\/td>)/", $html, $matches);
print_r($matches);

Output:

Array
(
    [0] => Array
        (
            [0] => Areal
            [1] => 93 m²
        )

)
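Since the question asks for the value that belongs to a particular label, a possible variation is to anchor the pattern on the label cell and capture the cell that follows it. This is only a sketch: `findCellValue` is a hypothetical helper name, it needs PHP 7.1+ for the nullable return type, and it assumes plain `<td>` tags without attributes, with the value cell directly after the label cell.

```php
<?php
// Hypothetical helper: returns the text of the cell that follows the
// cell containing $label, or null if no such pair of cells is found.
function findCellValue(string $html, string $label): ?string
{
    // preg_quote escapes regex metacharacters in the label ('/' is the delimiter).
    // The s modifier lets . match newlines inside the value cell.
    $pattern = '/<td>\s*' . preg_quote($label, '/') . '\s*<\/td>\s*<td>(.*?)<\/td>/s';
    if (preg_match($pattern, $html, $m) === 1) {
        return trim($m[1]);
    }
    return null;
}

$html = "<tr><td>Rooms</td><td>4</td></tr>\n"
      . "<tr><td>Areal</td><td>93 m²</td></tr>";

var_dump(findCellValue($html, 'Areal'));  // 93 m²
var_dump(findCellValue($html, 'Rooms'));  // 4
var_dump(findCellValue($html, 'Garage')); // NULL
```

If the real markup has attributes on the cells or nested tags inside them, a DOM parser is likely to be more robust than any regex.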