I've run into a problem that looks difficult to me, as I'm a newbie. I searched Google but couldn't find any ideas. There is a site that lets users search for doctors in their area, state, etc.
The number of doctors' addresses listed on that site can increase for any state. How can I find out that a new doctor has been added without visiting the site and searching it myself?
In other words: suppose you have a site and you change its content. How can I know that the content has changed?
I want to sync my records with theirs, either daily or weekly, because their database changes.
You can use cURL in PHP.
A basic usage example:
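<?php
// Fetch the page and write the response body straight to a local file.
$ch = curl_init("http://www.example.com/");
$fp = fopen("example_homepage.txt", "w");
curl_setopt($ch, CURLOPT_FILE, $fp);   // stream the body into $fp
curl_setopt($ch, CURLOPT_HEADER, 0);   // keep response headers out of the file
if (curl_exec($ch) === false) {
    echo "Request failed: " . curl_error($ch);
}
curl_close($ch);
fclose($fp);
?>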
You can also set a whole range of options, so it is easy to mimic a GET or POST request.
Then run the script regularly as a cron job, save the result to a new file each time, and compare it with the previous one in PHP.
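The comparison step could be sketched roughly like this. It assumes each run saves the page to its own snapshot file and that a new doctor shows up as at least one new line of text. The function name addedLines() and the file names are made up for illustration, and a real page would probably need its HTML parsed first.

```php
<?php
// Hypothetical helper: returns the lines found in the new snapshot
// but not in the old one. A missing old snapshot (first run) is
// treated as empty, so every line counts as new.
function addedLines(string $oldFile, string $newFile): array
{
    $flags = FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES;
    $old = is_file($oldFile) ? file($oldFile, $flags) : [];
    $new = file($newFile, $flags);
    if ($old === false || $new === false) {
        throw new RuntimeException("Could not read snapshot files");
    }
    // array_diff keeps the original keys, so reindex the result.
    return array_values(array_diff($new, $old));
}
```

For example, addedLines("snapshot_old.txt", "snapshot_new.txt") would give you the lines to review after each cron run. Once you have processed them, the new snapshot becomes the old one for the next run.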
If they have some incentive to share updates with you (such as a business development arrangement), you can request read-only access to their database or an API, or ask them to deploy a script on their end that pings you whenever there is an update or even sends you the updates as they come.
Most likely you just need to rescan their site at some interval and update your data accordingly. You can avoid unnecessary processing and bandwidth overhead if you:
Analyze the HTTP headers returned by the page. They may contain a "Last-Modified" or "ETag" header, which will help you identify whether the page has changed since the last time you visited it.
Use a hash function such as crc32() or md5() on the loaded contents. If the contents have changed, the hash will almost certainly be different.
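Both checks could look something like the sketch below. The function names are hypothetical, and it assumes you have already collected the raw response header lines (for example via cURL) and stored the hash from the previous run. Whether the site actually sends "ETag" or "Last-Modified" is something you would have to check.

```php
<?php
// Hypothetical helper: pull the validators out of raw response header lines.
function extractValidators(array $headerLines): array
{
    $validators = ['etag' => null, 'last_modified' => null];
    foreach ($headerLines as $line) {
        // Header names are case-insensitive, hence stripos().
        if (stripos($line, 'ETag:') === 0) {
            $validators['etag'] = trim(substr($line, 5));
        } elseif (stripos($line, 'Last-Modified:') === 0) {
            $validators['last_modified'] = trim(substr($line, 14));
        }
    }
    return $validators;
}

// Hypothetical helper: build request headers for a conditional GET
// from the validators saved on the previous visit.
function conditionalHeaders(array $validators): array
{
    $headers = [];
    if ($validators['etag'] !== null) {
        $headers[] = 'If-None-Match: ' . $validators['etag'];
    }
    if ($validators['last_modified'] !== null) {
        $headers[] = 'If-Modified-Since: ' . $validators['last_modified'];
    }
    return $headers;
}

// Fallback when the server sends no validators: compare content hashes.
// A null previous hash (first run) counts as changed.
function contentChanged(string $body, ?string $previousHash): bool
{
    return md5($body) !== $previousHash;
}
```

The array from conditionalHeaders() can be passed to curl_setopt($ch, CURLOPT_HTTPHEADER, ...). If the server honours it, an unchanged page should come back with status 304 and no body, which you can read with curl_getinfo($ch, CURLINFO_HTTP_CODE). The hash check still downloads the page, so it only saves you the processing, not the bandwidth.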