URL:
http://en.wikipedia.org/w/api.php?action=parse&prop=text&page=Lost_(TV_series)&format=xml
This outputs something like:
<api><parse><text xml:space="preserve">text...</text></parse></api>
How do I get just the content between <text xml:space="preserve"> and </text>?
I used curl to fetch all the content from this URL, which gives me:
$html = curl_exec($curl_handle);
What's the next step?
Use PHP DOM to parse it. Do it like this:
// You already have the response in $html; a sample value is used here:
$html = '<api><parse><text xml:space="preserve">text...</text></parse></api>';
// Parsing begins here. The response is XML, so loadXML() is used instead of loadHTML():
$doc = new DOMDocument();
$doc->loadXML($html);
$nodes = $doc->getElementsByTagName('text');
// Display what you need:
echo $nodes->item(0)->nodeValue;
This outputs:
text...
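If the request can fail or return something unexpected, it may be worth guarding the parse. The following is only a sketch under some assumptions: the helper name extractParsedText is made up for illustration (it is not part of PHP or the MediaWiki API), it assumes PHP 7.1 or later for the nullable return type, and it assumes the payload you want is the first <text> element.

```php
<?php
// Hypothetical helper (not from the answer above): returns the content of the
// first <text> element, or null if the input is empty, is not well-formed XML,
// or contains no <text> element.
function extractParsedText(string $xml): ?string
{
    if ($xml === '') {
        return null; // loadXML() rejects an empty string
    }

    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        return null;
    }

    $node = $doc->getElementsByTagName('text')->item(0);
    return $node === null ? null : $node->nodeValue;
}

// Sample input taken from the question.
$sample = '<api><parse><text xml:space="preserve">text...</text></parse></api>';
echo extractParsedText($sample), "\n";
```

With the curl call from the question, you would pass $html to this function and check for null before using the result. If the <text> element holds entity-escaped markup (for example &lt;p&gt;), nodeValue returns it decoded, so the string you get back is the HTML itself.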