Get the XML content of one element and separate the result by its child elements

I am working with an XML file that looks like this:

<text>
  <paragraph/>
    First text
  <paragraph/>
    Second text
</text>
<text>
  <paragraph/>
    Third text
  <paragraph/>
    Fourth text
</text>

I need to get the value of each text element, but the result should be in 4 rows, so that every <paragraph/> element starts a new row:

1 | First text
2 | Second text
3 | Third text
4 | Fourth text

My code:

$filexml = File::get('../file.xml');

$xml = simplexml_load_string($filexml);

for ($i=1; $i < count($xml->text) + 1; $i++) {

    foreach ($xml->text as $text_item) {
        echo $i++." | ".$text_item."<br/>";
    }

}

My result:

1 | First text Second text
2 | Third text Fourth text

What should I do next? Or is there a different approach that would achieve the desired result?

You could use DOMDocument and DOMXPath. In the expression you could get the text nodes using text().

Then you could loop those and check for empty strings.

$filexml = File::get('../file.xml');
$doc = new DOMDocument();
$doc->loadXML($filexml);
$xpath = new DOMXpath($doc);
$i = 1;
$expression = "//text/text()";
foreach ($xpath->query($expression) as $text) {
    $result = trim($text->nodeValue);
    if ($result !== "") {
        echo sprintf("%d | %s<br>", $i++, $result);
    }
}


Try changing this:

<text>
  <paragraph/>
    First text
  <paragraph/>
    Second text
</text>
<text>
  <paragraph/>
    Third text
  <paragraph/>
    Fourth text
</text>

to this:

<text>
  <paragraph/>
    First text
  <paragraph/>
</text>
<text>
  <paragraph/>
    Two text
  <paragraph/>
</text>
<text>
  <paragraph/>
    Three text
  <paragraph/>
</text>
<text>
  <paragraph/>
    Four text
  <paragraph/>
</text>

Okay, this isn't particularly pretty, and I suggest you still give this a try using XPath, but here goes...

<?php

$filexml = "<root>
<text>
<paragraph/>
First text
<paragraph/>
Second text
</text>
<text>
<paragraph/>
Third text
<paragraph/>
Fourth text
</text>
</root>";

$xml = simplexml_load_string($filexml);
$i=1;

foreach($xml->text as $textNode)
{
    $textCounter = 1;
    foreach ($textNode->paragraph as $text_item) {
        echo $i++." | ".trim(explode(PHP_EOL.PHP_EOL, (string)$textNode)[$textCounter++])."<br/>";
    }
}


?>

You were basically on the right track, but your inner loop needs to iterate over the paragraph nodes, not over the text nodes again. You also need a way to split apart the text inside each text node. If the file really does have every item on its own line, you're fine, because you can split on newlines. Note that PHP_EOL depends on the platform PHP runs on, so it has to match the line endings actually used in the file. If everything is on one line, this won't work.
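For input that is not split across lines, one possible alternative is to stop relying on newlines and instead read the text node that directly follows each paragraph marker. The following is only a sketch using DOMDocument and DOMXPath; the helper name paragraphRows and the single-line sample string are assumptions made for illustration and are not part of the original code.

```php
<?php
// Assumed sample input: the question's XML wrapped in a <root> element
// and deliberately written without any line breaks.
$xmlString = '<root><text><paragraph/>First text<paragraph/>Second text</text>'
    . '<text><paragraph/>Third text<paragraph/>Fourth text</text></root>';

/**
 * Hypothetical helper: returns one trimmed string per <paragraph/> marker
 * that is directly followed by non-empty text.
 */
function paragraphRows(string $xmlString): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xmlString);
    $xpath = new DOMXPath($doc);

    $rows = [];
    foreach ($xpath->query('//text/paragraph') as $paragraph) {
        // The text belonging to a marker is the node right after it.
        $next = $paragraph->nextSibling;
        $value = ($next instanceof DOMText) ? trim($next->nodeValue) : '';
        if ($value !== '') {
            $rows[] = $value;
        }
    }
    return $rows;
}

foreach (paragraphRows($xmlString) as $index => $row) {
    echo ($index + 1), ' | ', $row, "\n";
}
```

Because the rows are found through node order rather than string splitting, the same function should behave identically for the indented, multi-line version of the file.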

SimpleXML does not work well with mixed child nodes. You will need to use DOM for that. You can use an Xpath expression to fetch the nodes (texts are nodes, too).
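To make the limitation concrete, here is a small sketch, assuming a single-line sample with the same structure as the question's XML (the variable names are illustrative only). It shows that SimpleXML hands back the text of a text element as one joined string, while its child list contains only the paragraph elements.

```php
<?php
// Assumed single-line sample with the same structure as the question's XML.
$sample = simplexml_load_string(
    '<root><text><paragraph/>First text<paragraph/>Second text</text></root>'
);

// Casting the element to a string joins its direct text nodes together,
// so the position of each <paragraph/> marker is lost.
$joined = (string) $sample->text[0];

// children() only reports element nodes, here the two <paragraph/> markers.
$elementCount = count($sample->text[0]->children());

var_dump($joined, $elementCount);
```

This is why the question's code printed both texts of a text element in a single row.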

//text/*|//text/text()[normalize-space(.) != ""] selects any child element node or any text node (this includes CDATA sections) inside a text element. It ignores text nodes that contain only whitespace.

The result is a list of nodes in document order that you can iterate with foreach. Check whether the node is a separator (a paragraph element node). If it is, output the buffer; otherwise, append the text content of the node to the buffer.

// $xml is the XML source as a string (for example, the contents of file.xml)
$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

$buffer = '';
$counter = 0;
foreach ($xpath->evaluate('//text/*|//text/text()[normalize-space(.) != ""]') as $node) {
  if ($node instanceof DOMElement && $node->localName === 'paragraph') {
    if ($buffer !== '') {
      echo ++$counter, ' | ', trim($buffer), "\n";
      $buffer = '';
    }
  } else {
    $buffer .= $node->textContent;
  }
}
if ($buffer !== '') {
  echo ++$counter, ' | ', trim($buffer), "\n";
}

Output:

1 | First text
2 | Second text
3 | Third text
4 | Fourth text
