Extracting content from a MediaWiki API call (XML, cURL)

URL:

http://en.wikipedia.org/w/api.php?action=parse&prop=text&page=Lost_(TV_series)&format=xml

This outputs something like:

<api><parse><text xml:space="preserve">text...</text></parse></api>

How do I get just the content between <text xml:space="preserve"> and </text>?

I fetched the full response from this URL with cURL, which gives me:

$html = curl_exec($curl_handle);

What's the next step?

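Use PHP's DOM extension to parse it. The response is XML, so load it with loadXML() rather than loadHTML(), which would treat it as an HTML fragment and warn about the unknown tags:

// $html already holds the cURL response; a sample value is shown here
$html = '<api><parse><text xml:space="preserve">text...</text></parse></api>';

// parse the XML response
$doc = new DOMDocument();
$doc->loadXML($html);
$nodes = $doc->getElementsByTagName('text');

// print the contents of the first <text> element
echo $nodes->item(0)->nodeValue;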

This outputs:

text...
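
For completeness, here is a minimal sketch of the whole round trip, fetching with cURL and then extracting the <text> contents. The helper names fetchUrl() and extractParsedText() and the User-Agent string are made up for illustration; they are not part of any library. Note that curl_exec() only returns the body as a string when CURLOPT_RETURNTRANSFER is set; otherwise it prints the body and returns true.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and return the body,
// or null if the request fails.
function fetchUrl(string $url): ?string
{
    $ch = curl_init($url);
    // Without this option curl_exec() echoes the body instead of returning it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Placeholder User-Agent; replace with something that identifies your script.
    curl_setopt($ch, CURLOPT_USERAGENT, 'ExampleFetcher/0.1 (placeholder)');
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Hypothetical helper: return the contents of the first <text> element,
// or null if the input is empty, malformed, or has no <text> element.
function extractParsedText(string $xml): ?string
{
    if ($xml === '') {
        return null; // loadXML() rejects an empty string
    }

    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $node = $doc->getElementsByTagName('text')->item(0);
    return $node === null ? null : $node->nodeValue;
}
```

Usage might look like this, with the URL from the question:

$xml = fetchUrl('http://en.wikipedia.org/w/api.php?action=parse&prop=text&page=Lost_(TV_series)&format=xml');
$content = $xml === null ? null : extractParsedText($xml);

Checking for null before using $content avoids a fatal error when the request fails or the response has no <text> element, which the shorter snippet above does not guard against.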