Go scraper: how do I scrape dynamically generated links on a website?

I am trying to scrape product video links, which are generated dynamically by another web service and appear under the product images on the left side of the page. You can check the following link: https://www.tokopedia.com/chocoapple/ready-stock-bnib-iphone-128gb-7-plus-jet-black-garansi-apple-1-tahun-10?src=topads Google Chrome's "Inspect Element" shows the div tag, but the same tag is not present in the page source. How can I scrape it? I am looking into GoQuery for the task, but I am not sure whether it will work. I am not a web developer, so please tell me if my question is not specific enough. Thank you.

If the tag is not in the source, then GoQuery will not work. GoQuery is for parsing HTML source using a jQuery-like API.

You first need to load the webpage in a headless browser, using a tool such as PhantomJS, chromeless, or Puppeteer. Each of these tools runs the JavaScript on the webpage before you read it. This way, the AJAX request that renders the video you are interested in completes and the DOM is updated. You can then save the rendered HTML, which should have the div in it.
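As a rough illustration of the "render first, then read the HTML" idea, here is a minimal sketch that uses only the Go standard library to shell out to headless Chrome and capture the DOM it prints with `--dump-dom`. This is a swapped-in technique, not one of the tools named above. The binary name `google-chrome`, the helper names `dumpDOMArgs` and `renderedHTML`, and the `thumbnail-img` substring check are assumptions for illustration. Whether the video thumbnail actually shows up depends on how quickly the page's scripts finish.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

// chromeBinary is an assumption; the executable name varies by platform
// (for example "chromium" or "chrome").
const chromeBinary = "google-chrome"

// dumpDOMArgs returns the arguments that ask headless Chrome to load
// pageURL, run its scripts, and print the resulting DOM to stdout.
func dumpDOMArgs(pageURL string) []string {
	return []string{"--headless", "--disable-gpu", "--dump-dom", pageURL}
}

// renderedHTML runs headless Chrome and returns the serialized DOM.
func renderedHTML(ctx context.Context, pageURL string) (string, error) {
	out, err := exec.CommandContext(ctx, chromeBinary, dumpDOMArgs(pageURL)...).Output()
	if err != nil {
		return "", fmt.Errorf("running %s: %w", chromeBinary, err)
	}
	return string(out), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: render <url>")
		return
	}
	// Give the browser a bounded amount of time to load the page.
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()

	html, err := renderedHTML(ctx, os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	// Check whether the dynamically inserted thumbnail made it into the DOM.
	fmt.Println("thumbnail present:", strings.Contains(html, "thumbnail-img"))
}
```

The returned HTML string can then be handed to GoQuery or any other HTML parser, since at that point it is ordinary static markup.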

Look for the tag <img class="thumbnail-img horizontal" src="//i.ytimg.com/vi/oKR2fh09Nic/mqdefault.jpg">. As you can see, the src contains the video ID "oKR2fh09Nic". The URL you need is then https://www.youtube.com/watch?v=oKR2fh09Nic
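The step from thumbnail src to video URL could be sketched in Go roughly like this. The helper names `videoIDFromThumb` and `watchURL` are hypothetical, and the regular expression assumes the thumbnail always has the `i.ytimg.com/vi/<id>/` shape seen in the tag above.

```go
package main

import (
	"fmt"
	"regexp"
)

// thumbRe matches YouTube thumbnail URLs such as
// //i.ytimg.com/vi/<id>/mqdefault.jpg and captures the video ID.
// The URL shape is an assumption based on the one example tag.
var thumbRe = regexp.MustCompile(`i\.ytimg\.com/vi/([A-Za-z0-9_-]+)/`)

// videoIDFromThumb returns the video ID embedded in a thumbnail src,
// and false if the src does not look like a YouTube thumbnail.
func videoIDFromThumb(src string) (string, bool) {
	m := thumbRe.FindStringSubmatch(src)
	if m == nil {
		return "", false
	}
	return m[1], true
}

// watchURL builds the watch-page URL for a video ID.
func watchURL(id string) string {
	return "https://www.youtube.com/watch?v=" + id
}

func main() {
	src := "//i.ytimg.com/vi/oKR2fh09Nic/mqdefault.jpg"
	if id, ok := videoIDFromThumb(src); ok {
		fmt.Println(watchURL(id)) // prints https://www.youtube.com/watch?v=oKR2fh09Nic
	}
}
```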

Also, you can use http://youtube.com/get_video_info?video_id=oKR2fh09Nic to load the video information.

Example here https://github.com/kkdai/youtube/blob/master/youtube.go

You probably need to evaluate the page the way a browser does. As schollz answered, this is possible with a so-called headless browser (a browser that does not show its GUI and is driven from the CLI or an API).

In the Go world, there is chromedp:

https://github.com/knq/chromedp

https://www.youtube.com/watch?v=_7pWCg94sKw
