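With RSS::Parser, feeds that are not well-formed cannot be parsed.
With hpricot, I can parse the RSS feed but cannot read the link element.
I'm on the latest version, 0.8.1, and tested it like this: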
[code="ruby"]
require 'hpricot'
doc = Hpricot("test")
puts doc
[/code]
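The output is test
After switching to version 0.7, the output is test
It looks like only the link tag is affected; I haven't found a problem with any other tag so far.
Has anyone run into this, and how did you solve it? Thanks.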
Hmm, RSS is XML, so why not use Hpricot::XML("test")? The Hpricot I use is from the 0.6 series, which is fairly old by now.
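Honestly, handling RSS doesn't call for a sledgehammer. REXML + XPath is enough and very convenient, and it ships with Ruby's standard library.
Whether the RSS is well-formed or not, the tags you need to extract are always the same few, so they can all be enumerated.
A few simple examples: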
1.
[code="ruby"]
require 'rexml/document'
doc=REXML::Document.new("<a><link>test</link></a>")
link=REXML::XPath.new(doc,"//a/link").text
puts link #test
[/code]
2. A slightly more complex one
[code="ruby"]
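# Atom entry for a Douban user; the db: namespace URI below is an assumption
doc=<<-RSS
<entry xmlns:db="http://www.douban.com/xmlns/">
<id>http://api.douban.com/people/2599975</id>
<title>Hooopo</title>
<link href="http://api.douban.com/people/2599975" rel="self"/>
<link href="http://www.douban.com/people/hooopo/" rel="alternate"/>
<link href="http://otho.douban.com/icon/u2599975-3.jpg" rel="icon"/>
<link href="http://://hooopo.blogspot.com/" rel="homepage"/>
<content>My Twitter:https://twitter.com/Hooopo
My Blog:http://Hooopo.blogspot.com
My Gtalk:wxz125627771@gmail.com</content>
<db:location>湖南长沙</db:location>
<db:uid>hooopo</db:uid>
</entry>
RSS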
require 'rexml/document'
require 'pp'

class People
  class << self
    # Names of the attributes extracted from the feed
    def attr_names
      [
        :id,
        :location,
        :title,
        :link,
        :content,
        :uid
      ]
    end
  end

  attr_names.each do |attr|
    attr_accessor attr
  end

  def initialize(feed)
    doc=REXML::Document.new(feed)
    # Single-valued elements: keep the text only if the element exists
    id=REXML::XPath.first(doc,"//entry/id")
    @id=id.text if id
    content=REXML::XPath.first(doc,"//entry/content")
    @content=content.text if content
    title=REXML::XPath.first(doc,"//entry/title")
    @title=title.text if title
    location=REXML::XPath.first(doc,"//entry/db:location")
    @location=location.text if location
    uid=REXML::XPath.first(doc,"//entry/db:uid")
    @uid=uid.text if uid
    # Links repeat, so collect them into a hash keyed by their rel attribute
    REXML::XPath.each(doc,"//entry/link") do |link|
      @link||={}
      @link[link.attributes['rel']]=link.attributes['href']
    end
  end
end

people=People.new(doc)
pp people
puts people.id
puts people.location
puts people.link["self"]
[/code]
Result:
[code="ruby"]
# @content=
"My Twitter\357\274\232https://twitter.com/Hooopo\nMy Blog:http://Hooopo.blogspot.com\nMy Gtalk:wxz125627771@gmail.com",
@id="http://api.douban.com/people/2599975",
@link=
{"self"=>"http://api.douban.com/people/2599975",
"icon"=>"http://otho.douban.com/icon/u2599975-3.jpg",
"alternate"=>"http://www.douban.com/people/hooopo/",
"homepage"=>"http://://hooopo.blogspot.com/"},
@location="\346\271\226\345\215\227\351\225\277\346\262\231",
@title="Hooopo",
@uid="hooopo">
http://api.douban.com/people/2599975
湖南长沙
http://api.douban.com/people/2599975
[/code]
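Since the tags are enumerable, the repeated lookups can also be driven by a table. Below is only a rough sketch under my own assumptions: the FIELDS hash and the extract_fields helper are hypothetical names, not part of REXML, and the sample feed is a trimmed-down entry without namespaces.

```ruby
require 'rexml/document'

# Hypothetical table: attribute name => XPath of the element holding it
FIELDS = {
  id:      '//entry/id',
  title:   '//entry/title',
  content: '//entry/content'
}.freeze

# Hypothetical helper: returns a hash with only the fields found in the feed
def extract_fields(xml)
  doc = REXML::Document.new(xml)
  FIELDS.each_with_object({}) do |(name, path), result|
    node = REXML::XPath.first(doc, path)
    result[name] = node.text if node # skip tags the feed does not contain
  end
end

feed = '<entry><id>http://api.douban.com/people/2599975</id><title>Hooopo</title></entry>'
fields = extract_fields(feed)
puts fields[:title]
```

Adding another tag then only means adding one line to the table, and a feed that lacks a tag simply leaves that key out.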
A small correction:
[quote]
require'rexml/document'
doc=REXML::Document.new("<a><link>test</link></a>")
link=REXML::XPath.new(doc,"//a/link").text
puts link #test
[/quote]
================>
[code="ruby"]
require'rexml/document'
doc=REXML::Document.new("<a><link>test</link></a>")
link=REXML::XPath.first(doc,"//a/link")
puts link.text #=>test
[/code]
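One thing to watch with REXML::XPath.first: it returns nil when nothing matches, so calling text on the result directly raises NoMethodError on feeds that lack the tag. A minimal sketch of guarding against that, assuming Ruby 2.3 or later for the &. operator; the guid path is just a made-up example of a tag that is absent:

```ruby
require 'rexml/document'

doc = REXML::Document.new('<a><link>test</link></a>')

# XPath.first returns nil when nothing matches, so guard before calling #text
link    = REXML::XPath.first(doc, '//a/link')
missing = REXML::XPath.first(doc, '//a/guid') # not present in this document

link_text    = link&.text    # "test"
missing_text = missing&.text # nil instead of NoMethodError

puts link_text
puts missing_text.inspect
```

On older Rubies the same guard can be written as `link.text if link`, which is what the longer People example does.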
I suspect the OP wants Hpricot simply for convenience... and it hardly counts as a sledgehammer.
[quote="汪兆铭"][code="ruby"]require'rexml/document'
doc=REXML::Document.new("<a><link>test</link></a>")
link=REXML::XPath.first(doc,"//a/link")
puts link.text #=>test[/code][/quote]
Now look at how it goes with Hpricot...
[code="ruby"]require 'hpricot'
doc = Hpricot::XML('<a><link>test</link></a>')
link = (doc/:link).first
puts link.inner_html #=> test[/code]
Or use the faster Nokogiri:
[code="ruby"]require 'nokogiri'
doc = Nokogiri.parse '<a><link>test</link></a>'
link = (doc/:link).first
puts link.text[/code]
Writing XPath this way is a bit more convenient than with the standard library's XPath ^ ^