<div id="test">some<b>bold</b> or <i>italic</i> text</div>
<div id="test">and again<b> bold text</b><i>and italic text</i></div>
1 : some bold or italic text
2 : and again bold text and italic text
string(//div)
normalize-space(//div)
These give the correctly formatted text, but only one result comes back.
id('test')//text()
This gives all the text, but splits the result into separate text nodes.
I tried to use string-join or concat, but with no luck. I want to do this in PHP.
Try this:
$dom = new \DOMDocument();
$dom->loadHTML('<!DOCTYPE HTML>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<title></title>
</head>
<body>
<div id="test1">some<b>bold</b> or <i>italic</i> text</div>
<div id="test2">and again<b> bold text</b><i>and italic text</i></div>
</body>
</html>');
$xpath = new \DOMXPath($dom);
// Print the string value (all descendant text, concatenated) of every div whose id contains "test"
foreach ($xpath->query('//div[contains(@id,"test")]') as $node) {
    echo $node->nodeValue, PHP_EOL;
}
Outputs:
somebold or italic text
and again bold textand italic text
There aren't many styling tags in HTML, so you could just write your own function to strip the unwanted ones. Something like:
function htmlToText(text) {
    // Strip every opening and closing <i>, <b>, <s> and <span> tag (with any attributes), not just the first match
    return text.replace(/<\/?(?:i|b|s|span)\b[^>]*>/gi, '');
}
You're going to need regular expressions here to extract the text from inside the HTML tags. If you're not comfortable with regex, this site is a good place to get up to speed:
http://www.regular-expressions.info/
You can then use preg_replace (http://php.net/preg_replace) with the pattern you constructed to strip the tags and keep the text.
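As a rough illustration of the preg_replace approach, here is a minimal sketch. The helper name `tagsToSpaces` is made up for this example and the pattern is deliberately naive: it assumes simple markup like the sample in the question and will not cope with comments, scripts, or a `>` inside an attribute value.

```php
<?php
// Hypothetical helper (not a PHP built-in): replaces tags with spaces.
function tagsToSpaces(string $html): string
{
    // Replace every tag with a single space so adjacent text runs stay separated.
    $text = preg_replace('/<[^>]+>/', ' ', $html);
    // Collapse runs of whitespace into one space and trim both ends.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$first  = tagsToSpaces('<div id="test">some<b>bold</b> or <i>italic</i> text</div>');
$second = tagsToSpaces('<div id="test">and again<b> bold text</b><i>and italic text</i></div>');

echo $first, PHP_EOL, $second, PHP_EOL;
```

Replacing each tag with a space instead of an empty string is what keeps "some" and "bold" from running together.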
Suppose you have this XML document:
<html>
<div id="test">some<b>bold</b> or <i>italic</i> text</div>
<div id="test">and again<b> bold text</b><i>and italic text</i></div>
</html>
Then just use:
string(/*/div[1])
The result of evaluating this XPath expression is:
somebold or italic text
Similarly:
string(/*/div[2])
when evaluated produces:
and again bold textand italic text
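In PHP, expressions like these can be evaluated with `DOMXPath::evaluate()`, which returns a string for `string(...)` instead of a node list. The following is only a sketch, assuming the sample document above is loaded with `loadXML()`; the variable names are illustrative. Evaluating `string(.)` relative to each div gives one result per div, which avoids the "only one result" problem from the question:

```php
<?php
// Sample document from this answer (assumed input).
$xml = <<<XML
<html>
<div id="test">some<b>bold</b> or <i>italic</i> text</div>
<div id="test">and again<b> bold text</b><i>and italic text</i></div>
</html>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// string(.) is evaluated with each div as the context node,
// so every div contributes its own string value.
$values = [];
foreach ($xpath->query('/*/div') as $div) {
    $values[] = $xpath->evaluate('string(.)', $div);
}

foreach ($values as $i => $value) {
    echo ($i + 1), ' : ', $value, PHP_EOL;
}
```

The strings collected in `$values` are the same two results shown above, without any space inserted between adjacent text nodes.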
In case you want to delimit each text node with a space, this cannot be achieved with a single XPath 1.0 expression (it can be done with a single XPath 2.0 expression). Instead, you will need to evaluate:
/*/div[1]//text()
This selects (in a list or array structure, depending on your programming language) all text node descendants of /*/div[1]:
"some" "bold" " or " "italic" " text".
Similarly:
/*/div[2]//text()
selects (in a list or array structure, depending on your programming language) all text node descendants of /*/div[2]:
"and again" " bold text" "and italic text".
Now, using your programming language, you have to join these with a space between them to produce the final wanted result.
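Since the question asks for PHP, where DOMXPath supports XPath 1.0 only, the joining step could look roughly like the sketch below. It assumes the sample document from this answer, and the variable names are illustrative. Each text node is trimmed before joining, because some nodes (such as " or ") already carry their own whitespace and would otherwise produce double spaces.

```php
<?php
// Sample document from this answer (assumed input).
$xml = <<<XML
<html>
<div id="test">some<b>bold</b> or <i>italic</i> text</div>
<div id="test">and again<b> bold text</b><i>and italic text</i></div>
</html>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$lines = [];
foreach ($xpath->query('/*/div') as $div) {
    $parts = [];
    // .//text() selects all text node descendants of the current div.
    foreach ($xpath->query('.//text()', $div) as $textNode) {
        $part = trim($textNode->nodeValue);
        if ($part !== '') {
            $parts[] = $part; // skip whitespace-only nodes
        }
    }
    // Join the trimmed pieces with a single space.
    $lines[] = implode(' ', $parts);
}

echo implode(PHP_EOL, $lines), PHP_EOL;
```

For the two sample divs this gives "some bold or italic text" and "and again bold text and italic text", one entry per div in `$lines`.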