I tried to parse Nature magazine's feed using PHP and several different RSS/Atom readers, but I can't find a proper way of reading it.
The feed structure looks bizarre to me. It's not RSS for sure, but Atom readers didn't give me a proper result either.
Does anyone know what the feed type is and how to parse it?
According to the raw feed itself (http://feeds.nature.com/nphys/rss/current?format=xml), it's in RSS 1.0 format, with a bunch of other tags thrown in via xmlns: declarations, each of which denotes a particular namespace for those tags (e.g. rdf, prism, feedburner, etc.). So if you ignore all the declared namespaces (i.e. every tag of the form <something:something> and every attribute with a colon in its name) and just parse the remaining tags as you would per the RSS 1.0 XML specification, you should be fine...
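A minimal sketch of that approach with SimpleXML, in case it helps. The inline sample only mimics the shape of the feed: its titles, URLs and author names are made up, and the variable names are my own. Plain property access on a SimpleXML element does not return prefixed elements such as <dc:creator>, so only the core RSS 1.0 elements are read.

```php
<?php
// Hypothetical sample shaped like the feed; titles, URLs and names are made up.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/feed">
    <title>Example channel</title>
    <link>http://example.org/</link>
  </channel>
  <item rdf:about="http://example.org/a1">
    <title>First article</title>
    <link>http://example.org/a1</link>
    <dc:creator>A. Author</dc:creator>
  </item>
  <item rdf:about="http://example.org/a2">
    <title>Second article</title>
    <link>http://example.org/a2</link>
    <dc:creator>B. Author</dc:creator>
  </item>
</rdf:RDF>
XML;

$feed = simplexml_load_string($xml);
if ($feed === false) {
    throw new RuntimeException('Could not parse the feed');
}

// Un-prefixed children (channel, item, title, link) are the RSS 1.0 elements.
$channelTitle = (string) $feed->channel->title;

$items = [];
foreach ($feed->item as $item) {
    $items[] = [
        'title' => (string) $item->title,
        'link'  => (string) $item->link,
    ];
}

echo $channelTitle, PHP_EOL;
foreach ($items as $entry) {
    echo $entry['title'], ' -> ', $entry['link'], PHP_EOL;
}
```

For the live feed, simplexml_load_file() with the feed URL should work the same way, assuming URL fopen wrappers are enabled on your server.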
It uses what it says in the root element:
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:prism="http://prismstandard.org/namespaces/basic/2.0/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns="http://purl.org/rss/1.0/"
xmlns:admin="http://webns.net/mvcb/"
xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
See https://en.wikipedia.org/wiki/RDF_feed
The various additional XML namespaces extend the basic RDF document with elements from other XML applications. Elements without a namespace prefix are RSS 1.0 elements, e.g.
<title>Nature Physics - Issue - nature.com science feeds</title>
This is also indicated by xmlns="http://purl.org/rss/1.0/".
Follow the given URLs to learn more about the XML applications used within that document.
You can parse that document easily with DOM, SimpleXML, or XMLReader.
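As a rough sketch of the DOM route, here is one way to read both the RSS 1.0 elements and the namespaced extras with DOMXPath. The prefixes passed to registerNamespace() are arbitrary local names of my choosing (only the namespace URIs have to match the feed), and the sample document is hypothetical data that mimics the feed's structure.

```php
<?php
// Hypothetical sample shaped like the feed; titles, URLs and names are made up.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/feed">
    <title>Example channel</title>
  </channel>
  <item rdf:about="http://example.org/a1">
    <title>First article</title>
    <dc:creator>A. Author</dc:creator>
  </item>
  <item rdf:about="http://example.org/a2">
    <title>Second article</title>
    <dc:creator>B. Author</dc:creator>
  </item>
</rdf:RDF>
XML;

$rdfNs = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
// XPath has no notion of a default namespace, so RSS 1.0 needs a prefix too.
$xpath->registerNamespace('rss', 'http://purl.org/rss/1.0/');
$xpath->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');

$entries = [];
foreach ($xpath->query('//rss:item') as $item) {
    $entries[] = [
        'about'   => $item->getAttributeNS($rdfNs, 'about'),
        'title'   => $xpath->evaluate('string(rss:title)', $item),
        'creator' => $xpath->evaluate('string(dc:creator)', $item),
    ];
}

foreach ($entries as $entry) {
    echo $entry['title'], ' by ', $entry['creator'], PHP_EOL;
}
```

The same pattern should extend to the prism and feedburner elements by registering their namespace URIs from the root element.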