I am new to Go. I'm trying to write a scraper for data from one of my websites, so I can use it inside a Go app.
I'm using goroutines and sync.WaitGroup to wait for the results, but I have a problem: if I use goroutines and then call json.Marshal on my dataset, the array that is filled inside the goroutines comes out empty in the resulting structure.
If I fill the structure without goroutines, everything works fine.
Here are my structures:
type CategoryScrapper struct {
	Name     string                `json:"name"`
	Link     string                `json:"link"`
	Products []Product.ProductData `json:"products"`
}

type ProductData struct {
	Name        string `json:"name"`
	Link        string `json:"link"`
	Thumbnail   string `json:"thumbnail"`
	OriginPrice string `json:"OriginPrice"`
	Excerpt     string `json:"Excerpt"`
}
Here is part of my app:
func main() {
	cs := Category.CategoryScrapper{
		Name: "Name",
		Link: "/link",
	}

	wg := new(sync.WaitGroup)
	go cs.GetProducts(wg)
	wg.Wait()

	res, _ := json.Marshal(cs)
	fmt.Println(string(res))
}

func (s *CategoryScrapper) GetProducts(pool *sync.WaitGroup) {
	pool.Add(1)
	defer pool.Done()

	maxPageNum := s.getMaxPageNum()
	localPool := new(sync.WaitGroup)
	s.Products = make([]Product.ProductData, 0)

	for i := 1; i <= maxPageNum; i++ {
		go s.getPage(i, localPool)
	}

	localPool.Wait()
}

func (s *CategoryScrapper) getPage(page int, waitingPool *sync.WaitGroup) {
	product := Product.ProductData{
		Name:        "Name",
		Link:        "Link",
		Thumbnail:   "Thumb",
		OriginPrice: "1111",
		Excerpt:     "Excerpt",
	}

	s.Products = append(s.Products, product)
}
The abstractions used here don't seem consistent to me. I'd make the basic call to GetProducts a blocking call, and use a wait group only where you actually need to wait. Once you simplify it, you can see that there is no synchronization at all where the data is written (in getPage), so the concurrent appends race with each other; on top of that, pool.Add(1) is called inside the goroutine, so wg.Wait() in main can return before GetProducts has even registered itself. That is why you end up with an empty (nil) Products slice.
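To illustrate just that first point, here is a minimal, self-contained sketch (not the approach I use below, and the names are made up for the example) of what synchronizing the shared slice with a sync.Mutex looks like, with Add called before the goroutine starts:

package main

import (
	"fmt"
	"sync"
)

func main() {
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		products []string
	)

	for i := 1; i <= 5; i++ {
		wg.Add(1) // register the worker before it starts, not inside it
		go func(page int) {
			defer wg.Done()
			mu.Lock() // serialize concurrent appends to the shared slice
			products = append(products, fmt.Sprintf("product from page %d", page))
			mu.Unlock()
		}(i)
	}

	wg.Wait()
	fmt.Println(len(products), "products collected")
}

In the full example below I use a channel and a single reader goroutine instead of a mutex.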
Race condition fixed
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

type CategoryScrapper struct {
	Name     string        `json:"name"`
	Link     string        `json:"link"`
	Products []ProductData `json:"products"`
}

type ProductData struct {
	Name        string `json:"name"`
	Link        string `json:"link"`
	Thumbnail   string `json:"thumbnail"`
	OriginPrice string `json:"OriginPrice"`
	Excerpt     string `json:"Excerpt"`
}

func (*CategoryScrapper) getMaxPageNum() int {
	return 1
}

func main() {
	cs := CategoryScrapper{
		Name: "Name",
		Link: "/link",
	}

	cs.GetProducts()

	res, err := json.Marshal(cs)
	if err != nil {
		fmt.Printf("ERROR: %v", err)
	}
	fmt.Println(string(res))
}
func (s *CategoryScrapper) GetProducts() {
	maxPageNum := s.getMaxPageNum()

	var wg sync.WaitGroup
	ch := make(chan ProductData)

	// Single reader goroutine: it is the only one that writes to s.Products,
	// so the append needs no extra locking.
	go func() {
		for p := range ch {
			s.Products = append(s.Products, p)
		}
		wg.Done()
	}()

	// One worker goroutine per page; each sends its result over the channel.
	for i := 1; i <= maxPageNum; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			ch <- s.getPage(i)
		}(i)
	}
	wg.Wait()

	// make sure we close the chan reader go routine
	wg.Add(1)
	close(ch)
	wg.Wait()
}
func (s *CategoryScrapper) getPage(page int) ProductData {
	return ProductData{
		Name:        "Name",
		Link:        "Link",
		Thumbnail:   "Thumb",
		OriginPrice: "1111",
		Excerpt:     "Excerpt",
	}
}
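With the stub getMaxPageNum returning 1, running this should print something like:

{"name":"Name","link":"/link","products":[{"name":"Name","link":"Link","thumbnail":"Thumb","OriginPrice":"1111","Excerpt":"Excerpt"}]}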
A few stylistic points:
- For your Products slice you don't need to initialise it to zero length; a nil slice is fine to append to (see the short sketch after these points).
- I removed a number of package prefixes like Product so the example compiles as a single file.
- It helps people answer if you can get your example to run on play.golang.org easily.
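A quick, runnable sketch of the nil-slice point (the names here are just for the demo):

package main

import "fmt"

func main() {
	var products []string // nil slice, never initialised with make

	// append allocates the backing array as needed
	products = append(products, "first", "second")

	fmt.Println(products, len(products)) // [first second] 2
}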
Hope this helps. It is only obvious when you've done it a few times, and even then, not always :-)