I'm trying to use the Go scrape package to scrape images from a website.
This is the HTML node I need to scrape:
<ul class="list clearfix">
  <li>
    <div>
      <a href=www.example.com/asda">
        <img src="..sadsada./ssa/3.jpg">
      </a>
    </div>
  </li>
  <li>
    <div>
      <a href=www.example.comsdsds/sds">
        <img srr="..sadsada./ssa/2.jpg">
      </a>
    </div>
  </li>
  <li>
    <div>
      <a href=www.example.com/sdds">
        <img src="..sadsada./ssa/1.jpg">
      </a>
    </div>
  </li>
  .......
</ul>
How do I get each image's src?
Here is the matcher I tried:
matcher := func(n *html.Node) bool {
	if n.DataAtom == atom.A && n.Parent != nil && n.Parent.Parent != nil && n.Parent.Parent.Parent != nil && n.Parent.Parent.Parent.Parent != nil {
		return scrape.Attr(n.Parent.Parent.Parent.Parent, "class") == "list clearfix"
	}
	return false
}
images := scrape.FindAll(root, matcher)
But it doesn't work.
Fixed code:
matcher := func(n *html.Node) bool {
	if n.DataAtom == atom.Img && // node is an <img> element
		n.Parent != nil && // parent exists
		n.Parent.DataAtom == atom.A && // parent is <a>
		n.Parent.Parent != nil && // grandparent exists (<div>)
		n.Parent.Parent.Parent != nil && // great-grandparent exists (<li>)
		n.Parent.Parent.Parent.Parent != nil { // great-great-grandparent exists (<ul>)
		// The <ul> four levels up must carry the list's class.
		return scrape.Attr(n.Parent.Parent.Parent.Parent, "class") == "list clearfix"
	}
	return false
}
images := scrape.FindAll(root, matcher)
for i, img := range images {
	src := scrape.Attr(img, "src")
	fmt.Printf("Image %d src=%s\n", i, src)
}
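I modified your matcher function to fix two problems. First, it matched the <a> elements, but the src attribute lives on the <img> elements inside them. Second, the class check was one level too high: starting from an <a>, four Parent hops (div, li, ul, then the ul's own parent) go past the <ul>. Starting from the <img> instead, the fourth ancestor (a, div, li, ul) is the <ul class="list clearfix">, so the class check lands on the right element.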
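Also note that the HTML in your question is invalid: every href attribute is missing its opening " (for example href=www.example.com/asda"), and the second image's src attribute is misspelled as srr. Because scrape.Attr returns an empty string when an attribute is missing, that image will print an empty src until the attribute name is fixed.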