I can successfully parse many RSS feeds with PHP SimpleXML, but there are two particular feeds I am having difficulty reading. Both of these feeds display fine in my browser.
The Bloomberg feed seems to be returning no data:
https://www.bloomberg.com/politics/feeds/site.xml
The Health Affairs topic feeds are having connection timeouts:
I've tried different URL encodings for the Health Affairs feeds, and I've tried setting different stream context options with libxml_set_streams_context. Here is the example code I am using to open these connections:
$opts = array( 'http' => array( 'timeout' => 10 ) );
$context = stream_context_create( $opts );
libxml_set_streams_context( $context );
libxml_use_internal_errors( true );
$rss = simplexml_load_file( $feed );
$error_msg = '';
if ( $rss === false ) {
    foreach ( libxml_get_errors() as $error ) {
        $error_msg .= ' [' . $error->message . ']';
    }
    libxml_clear_errors();
}
// ...feed parsing
With curl, the Bloomberg feed returns an HTML page asking whether I am a robot, and the Health Affairs feeds time out. I've tried different curl options, including checking whether the content is gzip-encoded.
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, $feed);
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1 );
curl_setopt($ch, CURLOPT_FAILONERROR, 1 );
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1 );
curl_setopt($ch, CURLOPT_TIMEOUT, 15 );
curl_setopt($ch, CURLOPT_ENCODING, "gzip" );
// $debug_output collects the feed URL followed by the response body
$debug_output .= '[RSS FEED: ' . $feed . ']' . "\n";
$debug_output .= curl_exec( $ch );
// close curl resource to free up system resources
curl_close($ch);
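To narrow down what curl actually hands back (a gzip stream, an HTML "robot" page, or a real feed), the response body could be classified before it goes anywhere near the feed parsing. This is only a minimal sketch, assuming the zlib and SimpleXML extensions are loaded. classify_feed_body and its return labels are hypothetical names for illustration and are not part of my existing code.

```php
<?php
// Hypothetical diagnostic helper: report what kind of body a request returned.
function classify_feed_body(string $body): string
{
    // gzip streams start with the magic bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = @gzdecode($body); // suppress the warning on corrupt data
        if ($decoded === false) {
            return 'gzip-invalid';
        }
        $body = $decoded;
    }

    if (trim($body) === '') {
        return 'empty';
    }

    // Parse quietly so malformed markup does not emit warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        // Not well-formed XML; an HTML challenge page usually ends up here.
        return stripos($body, '<html') !== false ? 'html' : 'unparseable';
    }

    // Root element names for RSS 2.0, Atom, and RSS 1.0 (RDF) respectively.
    $root = strtolower($xml->getName());
    if ($root === 'rss' || $root === 'feed' || $root === 'rdf') {
        return 'feed';
    }

    return $root === 'html' ? 'html' : 'other-xml';
}
```

In the curl code above this could be called as classify_feed_body( (string) curl_exec( $ch ) ), so that a failed transfer (curl_exec returning false) is reported as 'empty' rather than being appended to the debug output unnoticed.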