I want to replace certain HTML tags with an empty string and retrieve only the text. Below is an example of what I want.
preg_match_all("/<span id=\"priceblock_ourprice\" class=\"a-size-medium a-color-price\">(.*)<\/span>/U", $content, $matches);
The above line retrieves something like this.
<span id="priceblock_ourprice" class="a-size-medium a-color-price">50</span>
Now I want to retrieve only the integer value (i.e. 50). I tried the following statement to remove the HTML tags.
foreach($matches[0] as $key=>$val) {
$price = preg_replace( '/<(.*)>/', '', $val);
}
But the problem is that it replaces everything, and an empty string is returned. It should return 50, not an empty string. The $price variable should end up like this:
$price = 50
Try adding a question mark to your regular expression
foreach($matches[0] as $key=>$val) {
$price = preg_replace( '/<(.*?)>/', '', $val);
}
This makes the match stop at the first > instead of the last one. Regular expression quantifiers are greedy by default and will match as much as they can.
Also, keep in mind that the way you are doing this will overwrite $price on each iteration of the loop. I am assuming you're doing something with $price before the next iteration, but if not, you should store the prices in an array.
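A minimal sketch of collecting every price into an array instead of overwriting a single variable. It assumes $content holds the page HTML; the sample markup and the $prices name are made up for illustration.

```php
<?php
// Sample input; in the question this would be the fetched page.
$content = '<span id="priceblock_ourprice" class="a-size-medium a-color-price">50</span>'
         . '<span id="priceblock_ourprice" class="a-size-medium a-color-price">40</span>';

preg_match_all(
    '/<span id="priceblock_ourprice" class="a-size-medium a-color-price">(.*)<\/span>/U',
    $content,
    $matches
);

$prices = [];
foreach ($matches[0] as $val) {
    // Non-greedy: each tag is removed on its own, leaving the text between them.
    $prices[] = preg_replace('/<(.*?)>/', '', $val);
}

print_r($prices);
```

Note that $matches[1] already contains the text captured by (.*), so for this pattern $prices = $matches[1]; should give the same values without a second regular expression.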
If it seems to match more than expected, use ? for a non-greedy match. A greedy (.*) will consume as much as possible, while making it non-greedy (.*?) will prevent this.
preg_replace('/<(.*?)>/', '', $val);
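To make the difference concrete, here is a small side-by-side comparison on the span from the question. The variable names $greedy and $lazy are only for illustration.

```php
<?php
$val = '<span id="priceblock_ourprice" class="a-size-medium a-color-price">50</span>';

// Greedy: matches from the first "<" to the last ">", which is the whole string.
$greedy = preg_replace('/<(.*)>/', '', $val);

// Non-greedy: stops at the first ">" after each "<", so each tag is matched separately.
$lazy = preg_replace('/<(.*?)>/', '', $val);

var_dump($greedy); // string(0) ""
var_dump($lazy);   // string(2) "50"
```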
I would also consider using DOM for this. Below is an example.
$content = <<<DATA
<span id="priceblock_ourprice" class="a-size-medium a-color-price">50</span>
<span id="priceblock_ourprice" class="a-size-medium a-color-price">40</span>
<span id="foo">30</span>
DATA;
$doc = new DOMDocument();
$doc->loadHTML($content); // Load your HTML content
$xpath = new DOMXPath($doc);
$vals = $xpath->query("//span[@id='priceblock_ourprice']");
foreach ($vals as $val) {
echo $val->nodeValue . "\n";
}
Output
50
40
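If you need the prices as integers in an array rather than echoed, a variation along these lines might work. This is a sketch under a few assumptions: the $prices name is made up, the text inside each span is assumed to be a plain integer, and parser warnings are suppressed because loadHTML may complain about markup it considers invalid (for example the same id appearing twice).

```php
<?php
$content = <<<DATA
<span id="priceblock_ourprice" class="a-size-medium a-color-price">50</span>
<span id="priceblock_ourprice" class="a-size-medium a-color-price">40</span>
<span id="foo">30</span>
DATA;

// Collect parser warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($content);

// Discard the collected warnings and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);

$prices = [];
foreach ($xpath->query("//span[@id='priceblock_ourprice']") as $node) {
    // Assumes the span text is a plain integer such as "50".
    $prices[] = (int) trim($node->nodeValue);
}

print_r($prices);
```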