How to extract a heading tag's value and specific text inside a div from a string, by class, in PHP

I have to parse an HTML file with a regular expression to get the header inside a div tag. This is the HTML I am trying to parse:

<div class="descriptionArea-2" style="visibility: visible;">
<img src="(image Url Here)" />
<br />
<h2>"Product Title"</h2>
        <div class="displayDescription">"product description here."<div class="icons">icons</div></div>

</div>

I have tried many times to get "Product Title" and "product description here." out of this.

A regular expression for the title:

'/<h2>"([^"]*?)"<\/h2>/'

Use it with the function preg_match_all.
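
A minimal sketch of how that pattern could be used with preg_match_all. The sample string and the second h2 are assumptions added for illustration, and the variable names are made up here:

```php
<?php
// Sample markup based on the question; the second <h2> is added for illustration only.
$html = <<<'HTML'
<div class="descriptionArea-2" style="visibility: visible;">
<h2>"Product Title"</h2>
<h2>"Another Title"</h2>
</div>
HTML;

// Capture group 1 holds the text between the double quotes.
$count = preg_match_all('/<h2>"([^"]*?)"<\/h2>/', $html, $matches);

// $matches[0] holds the full matches, $matches[1] the captured titles.
$titles = $matches[1];
print_r($titles);
```

preg_match_all returns the number of full matches, so `$count` can be checked before reading `$titles`.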

Are you sure the title is always in double quotes?

Your HTML is not valid: there is no closing tag for the div with the description.

I don't know how generic the pages are, but these expressions could work:

Product title:

/<h2>"(.*)"<\/h2>/

Description:

/<div class="displayDescription">"(.*)"<div class="icons">/

Maybe a more generic way to get the description (it takes everything up to the next tag, quotes included):

/<div class="displayDescription">([^<]*)/

Use preg_match or preg_match_all to get the values you want:

preg_match_all('/<h2>"(.*)"<\/h2>/', $string, $matches);
$title = $matches[1][0]; // gets the first title
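
The two description patterns can be tried the same way. This is only a sketch against the one line from the question; `$quoted`, `$generic` and the other variable names are made up for the example:

```php
<?php
// The description line from the question's sample.
$string = '<div class="displayDescription">"product description here."<div class="icons">icons</div></div>';

// Pattern 1: relies on the double quotes and on the icons div following directly.
preg_match('/<div class="displayDescription">"(.*)"<div class="icons">/', $string, $quoted);

// Pattern 2: takes everything up to the next tag, so the quotes are part of the capture.
preg_match('/<div class="displayDescription">([^<]*)/', $string, $generic);

$description = $quoted[1];
// Strip the surrounding quotes from the generic capture afterwards.
$rawDescription = trim($generic[1], '"');

echo $description, "\n", $rawDescription, "\n";
```

Both variables end up with the same text here; on pages where the description is not quoted, only the second pattern would still match.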

Here is a possible way of getting both values with a single regexp:

/<div class="descriptionArea-2"[^>]*>\s*(?:<(?!h2)[^>]*>\s*)*<h2>([^<]*)<\/h2>\s*<div class="displayDescription">([^<]*)</

The above tries to match exactly the same hierarchy as the sample HTML in the question. Replace the class names as needed. If the h2 and the nested div (the one with the displayDescription class) appear in reverse order, or if there is any other tag between them, the regexp will not match.

The first capture group will hold the h2 text, and the second the text of the inner div.
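
A rough sketch of the single-pattern approach run against the sample from the question. The pattern here uses `\s*` and a `(?!h2)` lookahead so that the line breaks and the self-closing img and br tags in the sample are skipped; treat it as an illustrative variant, and the variable names as made up:

```php
<?php
// Sample markup copied from the question.
$html = <<<'HTML'
<div class="descriptionArea-2" style="visibility: visible;">
<img src="(image Url Here)" />
<br />
<h2>"Product Title"</h2>
        <div class="displayDescription">"product description here."<div class="icons">icons</div></div>

</div>
HTML;

// Outer div, then any tags that are not <h2>, then the h2 and the description div.
$pattern = '/<div class="descriptionArea-2"[^>]*>\s*'
         . '(?:<(?!h2)[^>]*>\s*)*'
         . '<h2>([^<]*)<\/h2>\s*'
         . '<div class="displayDescription">([^<]*)</';

if (preg_match($pattern, $html, $m)) {
    // The captures still contain the double quotes, so strip them.
    $title = trim($m[1], '"');
    $description = trim($m[2], '"');
    echo $title, "\n", $description, "\n";
}
```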


Another option is to use XPath, if your HTML document is well formed. Here are the XPath expressions for each string:

//div[@class="descriptionArea-2"]/h2/text()

//div[@class="descriptionArea-2"]/div[@class="displayDescription"]/text()
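
In PHP these expressions could be run with DOMDocument and DOMXPath. This is a sketch that assumes the DOM extension is available; the variable names are made up, and the trimming of quotes is only needed because the sample text is quoted:

```php
<?php
// Sample markup copied from the question.
$html = <<<'HTML'
<div class="descriptionArea-2" style="visibility: visible;">
<img src="(image Url Here)" />
<br />
<h2>"Product Title"</h2>
        <div class="displayDescription">"product description here."<div class="icons">icons</div></div>

</div>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// item(0) is the first matching text node, or null if nothing matched.
$titleNode = $xpath->query('//div[@class="descriptionArea-2"]/h2/text()')->item(0);
$descNode = $xpath->query('//div[@class="descriptionArea-2"]/div[@class="displayDescription"]/text()')->item(0);

// Strip surrounding quotes and whitespace from the node values.
$title = $titleNode ? trim($titleNode->nodeValue, "\" \n\t") : null;
$description = $descNode ? trim($descNode->nodeValue, "\" \n\t") : null;

echo $title, "\n", $description, "\n";
```

Note that `@class="descriptionArea-2"` only matches when the class attribute is exactly that string, so a div with several classes would need a different test.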