Select all child text values [duplicate]

This question already has an answer here: How do you parse and process HTML/XML in PHP?

Text to parse

<div id="test">some<b>bold</b> or <i>italic</i> text</div>
<div id="test">and again<b> bold text</b><i>and italic text</i></div>

Result I'd like to have

1 : some bold or italic text
2 : and again bold text and italic text

What I tried

string(//div)
normalize-space(//div)

Both give the correctly formatted text, but return only one result.

id('test')//text()

This gives all the text, but splits it into separate text nodes.

I tried string-join and concat, but with no luck. I want to do this in PHP.

Try this:

$dom = new \DOMDocument();
$dom->loadHTML('<!DOCTYPE HTML>
<html lang="en-US">
<head>
    <meta charset="UTF-8">
    <title></title>
</head>
<body>
    <div id="test1">some<b>bold</b> or <i>italic</i> text</div>
    <div id="test2">and again<b> bold text</b><i>and italic text</i></div>
</body>
</html>');

$xpath = new \DOMXPath($dom);
foreach ($xpath->query('//div[contains(@id,"test")]') as $node) {
    echo $node->nodeValue, PHP_EOL;
}

Outputs:

somebold or italic text
and again bold textand italic text

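There are not many inline style tags in HTML, so you can write your own function to strip the unwanted ones. Something like this (in JavaScript):

function htmlToText(text) {
    // Remove opening and closing <i>, <b>, <s> and <span> tags, with or without attributes.
    // The "g" flag replaces every occurrence rather than only the first.
    return text.replace(/<\/?(?:i|b|s|span)\b[^>]*>/gi, '');
}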

You're going to need regular expressions here to extract the text from inside the HTML tags. If you're not comfortable with regex, this site will get you up to speed:

http://www.regular-expressions.info/

You then use preg_replace (http://php.net/preg_replace) with the pattern you constructed to strip the tags, leaving only the text.
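A minimal sketch of that approach, assuming the input is simple inline markup like the question's. The helper name stripTagsToText is hypothetical, and both patterns are illustrative rather than a robust HTML parser. Because it puts a space at every tag boundary, markup inside a word (for example wo<b>rd</b>) would also be split.

```php
<?php
// Hypothetical helper: strips every tag with a regex and separates
// the remaining text fragments with single spaces.
function stripTagsToText(string $html): string
{
    // Replace each opening or closing tag with a space.
    $text = preg_replace('/<[^>]+>/', ' ', $html);
    // Collapse runs of whitespace and trim both ends.
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo stripTagsToText('<div id="test">some<b>bold</b> or <i>italic</i> text</div>'), PHP_EOL;
echo stripTagsToText('<div id="test">and again<b> bold text</b><i>and italic text</i></div>'), PHP_EOL;
```

For the two sample lines this should print "some bold or italic text" and "and again bold text and italic text".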

Suppose you have this XML document:

<html>
  <div id="test">some<b>bold</b> or <i>italic</i> text</div>
  <div id="test">and again<b> bold text</b><i>and italic text</i></div>
</html>

Then just use:

string(/*/div[1])

The result of evaluating this XPath expression is:

somebold or italic text

Similarly:

string(/*/div[2])

when evaluated produces:

and again bold textand italic text
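In PHP, these expressions can be evaluated with DOMXPath::evaluate. The sketch below assumes the XML document shown above. It also shows why string(//div) returned only one result: in XPath 1.0, converting a node-set to a string uses only its first node, so each div has to be visited in turn and used as the context node. The variable names are illustrative.

```php
<?php
$xml = <<<'XML'
<html>
  <div id="test">some<b>bold</b> or <i>italic</i> text</div>
  <div id="test">and again<b> bold text</b><i>and italic text</i></div>
</html>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$values = [];
foreach ($xpath->query('/*/div') as $div) {
    // string(.) is evaluated relative to the current div (the context node).
    $values[] = $xpath->evaluate('string(.)', $div);
}

foreach ($values as $value) {
    echo $value, PHP_EOL;
}
```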

If you want to separate the text nodes with a space, this cannot be done with a single XPath 1.0 expression (in XPath 2.0 a single expression such as string-join(/*/div[1]//text(), ' ') would do it). Instead, you will need to evaluate:

 /*/div[1]//text()

This selects (in a list or array structure, depending on your programming language) all text node descendants of /*/div[1]:

"some" "bold" " or " "italic" " text".

Similarly:

 /*/div[2]//text()

selects (in a list or array structure, depending on your programming language) all text node descendants of /*/div[2]:

"and again" " bold text" "and italic text".

Now, using your programming language, you have to concatenate these with a space between them to produce the final wanted result.
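Since the question asks for PHP, here is one way that last step might look, assuming the XML document above. The helper name joinTextNodes is hypothetical. Some of the text nodes already carry their own leading or trailing spaces (" or ", " text"), so joining with a space produces doubled spaces; the sketch collapses them afterwards.

```php
<?php
// Hypothetical helper: joins all descendant text nodes of $context with a
// space, then collapses any resulting runs of whitespace.
function joinTextNodes(DOMXPath $xpath, DOMNode $context): string
{
    $parts = [];
    foreach ($xpath->query('.//text()', $context) as $textNode) {
        $parts[] = $textNode->nodeValue;
    }
    return trim(preg_replace('/\s+/', ' ', implode(' ', $parts)));
}

$dom = new DOMDocument();
$dom->loadXML(
    '<html>'
    . '<div id="test">some<b>bold</b> or <i>italic</i> text</div>'
    . '<div id="test">and again<b> bold text</b><i>and italic text</i></div>'
    . '</html>'
);
$xpath = new DOMXPath($dom);

$results = [];
foreach ($xpath->query('/*/div') as $div) {
    $results[] = joinTextNodes($xpath, $div);
}

foreach ($results as $index => $line) {
    echo ($index + 1), ' : ', $line, PHP_EOL;
}
```

With the sample document this should reproduce the two lines the question asks for.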