I want to extract 多云 from <h2>多云</h2>,
and 14℃ from <div class="w-number"> <span class="tpte">14℃</span> </div>.
What regular expression should I use? Thanks.
<div class="box-s1-l">
<div class="col"> <span class="day_s">白天</span>
<div class="w-icon"><img alt='多云' src='http://www.sinaimg.cn/dy/weather/images/yb2/45_45/duoyun_0.gif' /></div>
<h2>多云</h2>
<div class="w-number"> <span class="tpte">14℃</span> </div>
</div>
<div class="col"> <span class="day_s">夜间</span>
<div class="w-icon"><img alt='多云' src='http://www.sinaimg.cn/dy/weather/images/yb2/45_45/duoyun_1.gif' /></div>
<h2>多云</h2>
<div class="w-number"> <span class="tpte">6℃</span> </div>
</div>
</div>
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re

html = """
<div class="box-s1-l">
<div class="col"> <span class="day_s">白天</span>
<div class="w-icon"><img alt='多云' src='http://www.sinaimg.cn/dy/weather/images/yb2/45_45/duoyun_0.gif' /></div>
<h2>多云</h2>
<div class="w-number"> <span class="tpte">14℃</span> </div>
</div>
<div class="col"> <span class="day_s">夜间</span>
<div class="w-icon"><img alt='多云' src='http://www.sinaimg.cn/dy/weather/images/yb2/45_45/duoyun_1.gif' /></div>
<h2>多云</h2>
<div class="w-number"> <span class="tpte">6℃</span> </div>
</div>
</div>
"""

if __name__ == '__main__':
    # Match any HTML tag: '<', one or more characters that are not '>', then '>'.
    p = re.compile(r'<[^>]+>')
    # Replace every tag with an empty string, leaving only the text content.
    print(p.sub("", html))
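This strips all HTML tags. Take the line of HTML that holds the data you need and run it through this regex; what remains is the text that is not part of any tag.
For example:
Get the weather: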
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re

html = """
<h2>多云</h2>
"""

if __name__ == '__main__':
    # Strip the <h2> tags; only the weather text is left.
    p = re.compile(r'<[^>]+>')
    print(p.sub("", html))
Get the temperature:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re

html = """
<div class="w-number"> <span class="tpte">14℃</span> </div>
"""

if __name__ == '__main__':
    # Strip the <div> and <span> tags; only the temperature text is left.
    p = re.compile(r'<[^>]+>')
    print(p.sub("", html))
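If you would rather pull both values straight out of the full page instead of isolating each line first, capture groups are one option. The following is only a rough sketch written for Python 3, and it assumes the markup looks exactly like the sample in the question (a bare <h2> and a span with class="tpte"). The names weather_re, temp_re and extract_weather are made up for this illustration.

```python
# -*- coding: utf-8 -*-
import re

# Trimmed copy of the sample markup from the question.
html = """
<div class="col"> <span class="day_s">白天</span>
<h2>多云</h2>
<div class="w-number"> <span class="tpte">14℃</span> </div>
</div>
<div class="col"> <span class="day_s">夜间</span>
<h2>多云</h2>
<div class="w-number"> <span class="tpte">6℃</span> </div>
</div>
"""

# Hypothetical pattern names. The parentheses mark the part to capture;
# '.*?' is non-greedy so each match stops at the nearest closing tag.
weather_re = re.compile(r'<h2>(.*?)</h2>')
temp_re = re.compile(r'<span class="tpte">\s*(.*?)\s*</span>')


def extract_weather(page):
    """Return (weather, temperature) pairs in document order.

    Assumes every <h2> is followed by exactly one tpte span, as in the sample.
    """
    return list(zip(weather_re.findall(page), temp_re.findall(page)))


if __name__ == '__main__':
    for weather, temp in extract_weather(html):
        print(weather + ' ' + temp)
```

With the sample above, the first pair is the daytime forecast and the second is the night one. If the real page has other <h2> elements or different attributes on the span, the patterns would need adjusting.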