I have to parse an HTML file with a regular expression to get the heading inside a div tag. This is the HTML I am trying to parse:
<div class="descriptionArea-2" style="visibility: visible;">
<img src="(image Url Here)" />
<br />
<h2>"Product Title"</h2>
<div class="displayDescription">"product description here."<div class="icons">icons</div></div>
</div>
I have tried many times to get "Product Title" and "product description here." out of it, without success.
Regular expression for this:
'/<h2>"([^"]*?)"<\/h2>/'
Use it with the function preg_match_all.
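A minimal sketch of this answer in use. It assumes the question's markup is held in a variable named `$html` (a name chosen here for illustration) and that the title really is wrapped in double quotes:

```php
<?php
// The sample markup from the question (nowdoc, so nothing needs escaping).
$html = <<<'HTML'
<div class="descriptionArea-2" style="visibility: visible;">
<img src="(image Url Here)" />
<br />
<h2>"Product Title"</h2>
<div class="displayDescription">"product description here."<div class="icons">icons</div></div>
</div>
HTML;

// [^"]*? captures everything between the two quotes inside <h2>...</h2>.
preg_match_all('/<h2>"([^"]*?)"<\/h2>/', $html, $matches);

// $matches[0] holds the full matches, $matches[1] the captured titles.
$titles = $matches[1];
foreach ($titles as $title) {
    echo $title, PHP_EOL; // prints: Product Title
}
```

Because the quotes sit outside the capture group, they are not part of the returned title.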
Are you sure the title is always in double quotes?
Your HTML code is not valid: there is no closing tag for the div with the description.
I don't know how generic the pages are, but these expressions could work:
Product title:
/<h2>"(.*)"<\/h2>/
Description:
/<div class="displayDescription">"(.*)"<div class="icons">/
Maybe a more generic way to get the description:
/<div class="displayDescription">([^<]*)/
Use preg_match (or preg_match_all) to get the values you want:
preg_match_all('/<h2>"(.*)"<\/h2>/', $string, $matches);
$matches[1][0]; // the first title
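A sketch that applies all three expressions from this answer to the question's sample, assuming the markup is in a variable named `$html` (an illustrative name; the answer uses `$string`). Note that `.` does not match newlines unless the `s` modifier is added, which is what keeps the greedy `(.*)` from running past the end of the line here:

```php
<?php
// The sample markup from the question (nowdoc, so nothing needs escaping).
$html = <<<'HTML'
<div class="descriptionArea-2" style="visibility: visible;">
<img src="(image Url Here)" />
<br />
<h2>"Product Title"</h2>
<div class="displayDescription">"product description here."<div class="icons">icons</div></div>
</div>
HTML;

// Title: greedy (.*), but confined to one line because "." skips "\n".
preg_match_all('/<h2>"(.*)"<\/h2>/', $html, $titleMatches);
$title = $titleMatches[1][0]; // Product Title

// Description, anchored on the icons div that follows it.
preg_match('/<div class="displayDescription">"(.*)"<div class="icons">/', $html, $descMatches);
$description = $descMatches[1]; // product description here.

// More generic: all text up to the next tag, so the quotes are kept.
preg_match('/<div class="displayDescription">([^<]*)/', $html, $genericMatches);
$genericDescription = $genericMatches[1]; // "product description here."

echo $title, PHP_EOL, $description, PHP_EOL, $genericDescription, PHP_EOL;
```

If several titles can appear on the same line, the lazy `(.*?)` is the safer choice, since the greedy form would stretch from the first `<h2>"` to the last `"</h2>` on that line.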
Here is a possible way of getting what you want with regexps:
/<div class="descriptionArea-2"[^>]*>(?:\s*<[^h][^2][^>]*\/>)*\s*<h2>([^<]*)<\/h2>[^<]*<div class="displayDescription">([^<]*)</
The above tries to match exactly the same hierarchy as the sample HTML in the question. Replace the class strings as needed. If the h2 and the nested div tag (the one with the displayDescription class) are in reverse order, or if there is any other tag between them, the regexp will not work.
The first captured value will be the h2 text, and the second the inner div text.
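A sketch of this approach with `preg_match`, assuming the markup is in a variable named `$html` (an illustrative name). The pattern is split into commented pieces; it uses `\s*` between tags so that the line breaks in the sample markup are accepted, and `[^>]*\/>` to skip self-closing tags such as `<img />` and `<br />`:

```php
<?php
// The sample markup from the question (nowdoc, so nothing needs escaping).
$html = <<<'HTML'
<div class="descriptionArea-2" style="visibility: visible;">
<img src="(image Url Here)" />
<br />
<h2>"Product Title"</h2>
<div class="displayDescription">"product description here."<div class="icons">icons</div></div>
</div>
HTML;

$pattern = '/<div class="descriptionArea-2"[^>]*>'
    . '(?:\s*<[^h][^2][^>]*\/>)*'                // skip self-closing tags before the h2
    . '\s*<h2>([^<]*)<\/h2>'                     // group 1: the h2 text
    . '[^<]*<div class="displayDescription">'    // only whitespace/text may sit in between
    . '([^<]*)</';                               // group 2: inner div text, up to the next tag

$found = preg_match($pattern, $html, $m);
if ($found === 1) {
    echo $m[1], PHP_EOL; // prints "Product Title" (the quotes are part of the markup)
    echo $m[2], PHP_EOL; // prints "product description here."
}
```

The `[^h][^2]` trick rejects any tag whose first two letters are "h" and "2", but it also rejects other tags starting with "h" (such as `<hr />`), so adjust it if such tags can appear before the heading.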
Another option is to use XPath, if your HTML document is well formed. Here are the XPath expressions for each string:
//div[@class="descriptionArea-2"]/h2/text()
//div[@class="descriptionArea-2"]/div[@class="displayDescription"]/text()
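A sketch that runs these two XPath expressions with PHP's built-in `DOMDocument` and `DOMXPath` classes, assuming the DOM extension is enabled and the markup is in a variable named `$html` (an illustrative name). `loadHTML` uses libxml's forgiving HTML parser, so it also copes with markup that is not strictly well formed; parser warnings are collected rather than printed:

```php
<?php
// The sample markup from the question (nowdoc, so nothing needs escaping).
$html = <<<'HTML'
<div class="descriptionArea-2" style="visibility: visible;">
<img src="(image Url Here)" />
<br />
<h2>"Product Title"</h2>
<div class="displayDescription">"product description here."<div class="icons">icons</div></div>
</div>
HTML;

// Keep libxml warnings about sloppy real-world HTML out of the output.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html); // wraps the fragment in <html><body>; "//" still finds the div
$xpath = new DOMXPath($doc);

$titleNodes = $xpath->query('//div[@class="descriptionArea-2"]/h2/text()');
$descNodes  = $xpath->query('//div[@class="descriptionArea-2"]/div[@class="displayDescription"]/text()');

// text() returns the direct text children; item(0) is the first one.
$title       = $titleNodes->item(0)->nodeValue;
$description = $descNodes->item(0)->nodeValue;

echo $title, PHP_EOL, $description, PHP_EOL;
```

Unlike the regex versions, this does not care about line breaks, attribute order, or other tags between the h2 and the description div.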