I have a couple of websites which I check regularly to compare product prices. At the moment I have to log in manually and search by product ID on every website in order to get the product details (price).
This is time consuming and boring after a while.
I was thinking about making a web application in which I could store all those websites along with my login credentials. I would simply need to enter a product ID, and my web app would fetch the results from those websites and display them in a comparable way.
I wouldn't assume those websites have APIs, so I'm searching for the best way to approach this. I suspect it's not that simple, since I need to log in and then search for a product.
Any recommendations on how I could accomplish this?
Thanks!
+1 to Marc B's comment. If the TOS doesn't explicitly forbid it (and since this would also be considered a crawler), you should check whether /robots.txt disallows you from accessing the product search. If neither forbids it, I would suggest using a browser-based bot to fetch the results for you, simply because it sounds more practical and you wouldn't have to deal with cookies.
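The robots.txt check can be automated before you scrape anything; Python's standard library ships a parser for it. A minimal sketch, where the rules and paths are hypothetical stand-ins for whatever the real site serves:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; in practice you'd point the parser at the
# real file with parser.set_url("https://example.com/robots.txt")
# followed by parser.read().
robots_txt = """\
User-agent: *
Disallow: /admin/
Allow: /search
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# True if a generic crawler may request the product-search path.
print(parser.can_fetch("*", "https://example.com/search?product=123"))  # True
print(parser.can_fetch("*", "https://example.com/admin/users"))         # False
```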
If you want to make the page requests with PHP, though, I would direct you to HttpRequest. Have a page where you can log into all the sites (using a POST request aimed right at each login script), and keep the session cookies that come back handy. When you search the product pages, identify the part of the HTML that consistently wraps the product list (a regex may be helpful), and write an extraction routine (which will be different for every website you want to scrape) that returns the product's information. Then compare the results!
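The same flow (POST to the login script, keep the session cookies, pull the product rows out of the HTML) can be sketched without PHP using only the Python standard library. Everything site-specific here is an assumption: the form field names, the URLs, and the `<li class="product">` markup all have to be copied from the actual sites.

```python
import re
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

def make_session():
    # One opener per site: the CookieJar holds the session cookie the
    # login script sets, so later search requests stay authenticated.
    jar = CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

def login(opener, login_url, username, password):
    # Field names ("user", "pass") are hypothetical; take them from the
    # site's real login form.
    data = urllib.parse.urlencode({"user": username, "pass": password}).encode()
    opener.open(login_url, data=data)  # POST straight to the login script

# Example per-site extractor: each site needs its own pattern, matched
# against whatever markup consistently wraps its product list.
PRICE_ROW = re.compile(
    r'<li class="product">(?P<name>[^<]+):\s*\$(?P<price>[\d.]+)</li>'
)

def parse_products(html):
    return {m.group("name"): float(m.group("price"))
            for m in PRICE_ROW.finditer(html)}

# parse_products('<li class="product">Widget: $9.99</li>')
# -> {'Widget': 9.99}
```

For real sites a DOM parser tends to survive markup changes better than a regex, but the structure is the same: one session object, one login call, and one extractor per site, then merge the dictionaries for comparison.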