I try to scrape some websites with the help of Python and BeautifulSoup.
When a website uses an ajax query with this kind of URL:
https://techcrunch.com/wp-json/tc/v1/magazine?page=2&_embed=true,
I can get the JSON content and analyze it. But how can I detect this links to automatically execute a query to get the JSON content?
Thanks, Rata
I recommend using the requests library in addition to BeautifulSoup, if you aren't already.
Assuming you have a reliable way to scrape these urls, you can do something like:
import requests
# ...
response = requests.get('https://techcrunch.com/wp-json/tc/v1/magazine?page=2&_embed=true')
try:
json_response = response.json()
# GET request returned a JSON response
# ...
except ValueError:
# GET request did not return JSON response
# ...