I hope you are doing great.
I am trying to scrape this website using the Scrapy framework. I only need to write the result titles to a CSV file. The task seemed easy at first glance, but when I tried it in the Scrapy shell, I found that the response object was empty. Here is a screenshot:
I suspected that the website loads its results with AJAX requests. I checked the browser's developer tools and found that, indeed, the website makes AJAX requests. I reverse-engineered the call and found its request URL, which is this: https://www.tineye.com/search/get_domains/9fdeed61d697e871c74e38116d9c41276bce052e?
Here is an image showing what I did:
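One thing worth ruling out is that the endpoint only answers requests that look like the browser's own AJAX call. Whether TinEye actually checks any of these headers is an assumption, not something confirmed here. The minimal stdlib sketch below builds (but does not send) a GET request to the AJAX URL with browser-like headers; the header values and the `referer` parameter are hypothetical and should be replaced with the real values copied from the Network tab.

```python
import urllib.request

# AJAX endpoint found in the browser's developer tools (from the question).
AJAX_URL = "https://www.tineye.com/search/get_domains/9fdeed61d697e871c74e38116d9c41276bce052e?"

# Headers a browser typically attaches to an XHR call. Which of these
# (if any) the endpoint really requires is an ASSUMPTION, not verified.
BROWSER_LIKE_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "X-Requested-With": "XMLHttpRequest",
}


def build_ajax_request(url: str, referer: str) -> urllib.request.Request:
    """Build, without sending, a GET request that mimics the browser's AJAX call.

    `referer` is a hypothetical parameter: pass the results page URL the
    AJAX call was made from, as shown in the Network tab.
    """
    headers = dict(BROWSER_LIKE_HEADERS, Referer=referer)
    return urllib.request.Request(url, headers=headers, method="GET")
```

Passing the result to `urllib.request.urlopen(req, timeout=10)` and printing `resp.status` and `len(resp.read())` shows whether the endpoint returns data once these headers are present. If it does, the same headers can be given to Scrapy through `scrapy.Request(url, headers=...)` or the `DEFAULT_REQUEST_HEADERS` setting; if it still returns nothing, the results may instead depend on cookies or session state from the original search page.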
Then I modified my code accordingly, so that it requests the URL of the AJAX call rather than the page itself.
import json

import scrapy


class SearchSpider(scrapy.Spider):
    name = "search"
    allowed_domains = ["www.tineye.com"]
    # Request the AJAX endpoint directly instead of the HTML results page
    start_urls = ["https://www.tineye.com/search/get_domains/9fdeed61d697e871c74e38116d9c41276bce052e?"]

    def parse(self, response):
        # The endpoint is expected to return JSON with a "domains" list
        data = json.loads(response.text)
        for item in data["domains"]:
            # Each entry is a [domain, count] pair
            yield {
                "link": item[0],
                "times": item[1],
            }
But once again, the response object is empty. What should I do? Thanks in advance!
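Because `parse` passes `response.text` straight to `json.loads`, an empty body, an HTML page, and JSON without a `"domains"` key all fail in different but easily confused ways. A minimal sketch for telling them apart is to move the parsing into a plain function that raises a descriptive error. It assumes, as the spider above does, that a successful payload looks like `{"domains": [[domain, count], ...]}`; the function name `extract_domains` is hypothetical.

```python
import json
from typing import Any


def extract_domains(body: str) -> list[dict[str, Any]]:
    """Convert the get_domains JSON body into link/times records.

    ASSUMES the payload shape used by the spider:
    {"domains": [[domain, count], ...]}.
    Raises ValueError with a descriptive message so that an empty body,
    a non-JSON body, and an unexpected JSON shape can be told apart.
    """
    if not body.strip():
        raise ValueError("response body is empty")
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        # For example, an HTML error or login page came back instead of JSON
        raise ValueError(f"body is not JSON: {body[:80]!r}") from exc
    if not isinstance(data, dict) or "domains" not in data:
        raise ValueError(f"no 'domains' key in payload: {str(data)[:80]}")
    return [{"link": item[0], "times": item[1]} for item in data["domains"]]
```

Inside the spider, `parse` can then do `yield from extract_domains(response.text)` after logging `response.status` and `response.headers`, which shows whether the server answered with an error status, a redirect, or a genuinely empty 200 response.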