I want to screen-scrape with PHP cURL through the TOR network from a shared server where only ports 80 and 443 are open. When I run the code below, I get an "Access Denied" error from my server because ports 8118 and 9050 are closed. I contacted support and they said it is impossible. I doubt that, but I have searched at length and could not find an easy solution. Any thoughts?
<?php
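// Open a log file for cURL's verbose output ($php_errormsg was removed in PHP 8)
$fh = fopen('curldebug.txt', 'w') or die('Could not open curldebug.txt');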
// Initialize cURL
$ch = curl_init();
// Set the website you would like to scrape
curl_setopt($ch, CURLOPT_URL, "http://www.fixitts.com/whatismyip.php");
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:18.0) Gecko/20100101 Firefox/18.0');
curl_setopt($ch, CURLOPT_REFERER, 'http://www.fixitts.com');
curl_setopt($ch, CURLOPT_PROXY, '127.0.0.1:8118');
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_STDERR, $fh);
// Set cURL to return the results into a PHP variable
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// This executes the cURL request and places the results into a variable.
$curlResults = curl_exec($ch);
if (curl_errno($ch))
{
    echo 'Curl error: ' . curl_error($ch);
}
$info = curl_getinfo($ch);
print_r ($info);
// Close curl
curl_close($ch);
fclose($fh) or die('Could not close curldebug.txt');
// Echo the results to the screen
echo $curlResults;
?>
Your web host's support is probably correct.
As a side note, screen-scraping through TOR is an incredibly antisocial thing to do. It is a large part of the reason why many web sites block access from known TOR exit nodes. Please stop.
I assume you have a local proxy listening on 8118 (Polipo or Privoxy).
Port 9050 is TOR's default SOCKS port, and 8118 is the usual port for the local HTTP proxy (Polipo or Privoxy), both on localhost (127.0.0.1).
The localhost ports are NOT being blocked by a shared server - 127.0.0.1 is YOUR PC. If they are blocked, you have something on your PC (a firewall) doing that.
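To see whether anything is actually listening on those localhost ports, a quick check along these lines may help. This is only a sketch: `port_is_open` is a hypothetical helper name, and the two-second timeout is an arbitrary choice.

```php
<?php
// Hypothetical helper: returns true if a TCP connection to $host:$port succeeds.
function port_is_open(string $host, int $port, float $timeout = 2.0): bool
{
    $errno = 0;
    $errstr = '';
    // Suppress the warning fsockopen() raises on failure; we only need true/false.
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

// Check the two ports discussed above on the machine running this script.
foreach ([9050 => 'TOR SOCKS', 8118 => 'Polipo/Privoxy'] as $port => $label) {
    $state = port_is_open('127.0.0.1', $port) ? 'listening' : 'not reachable';
    echo $label . ' (127.0.0.1:' . $port . '): ' . $state . PHP_EOL;
}
```

If both ports report "not reachable", the proxy and TOR are most likely not running on the machine where the script executes, or a local firewall is in the way.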
Also, you can tell TOR and Polipo (or whatever) to use different ports in their configuration files. Change 8118 to something else in your code above, and also on Polipo/Privoxy.
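On the PHP side, the port change could look roughly like the sketch below. The helper name `local_proxy_address` and the port 8123 are illustrative assumptions, not anything from your setup; whatever port you pick has to match the one in the Polipo/Privoxy configuration file.

```php
<?php
// Hypothetical helper: builds the CURLOPT_PROXY value for a local HTTP proxy.
function local_proxy_address(int $port): string
{
    if ($port < 1 || $port > 65535) {
        throw new InvalidArgumentException("Invalid port: $port");
    }
    return '127.0.0.1:' . $port;
}

// 8123 is an arbitrary example; it must match the port set in the proxy's own config.
$proxyPort = 8123;

$ch = curl_init('http://www.fixitts.com/whatismyip.php');
curl_setopt($ch, CURLOPT_PROXY, local_proxy_address($proxyPort));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
```

Keeping the port in one variable means there is a single place to edit when the proxy configuration changes.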
It does not matter if the shared server is restricted to 80 and 443. That's all TOR needs to send your stuff out. The TOR exit server unwraps whatever it gets and sees what port it's supposed to go to (the original destination port).
It is possible that the shared server is blocking port 80 and 443 connections to known TOR servers. Open up a browser, set the proxy to SOCKS 127.0.0.1 port 9050, and see if you can browse the web. If that doesn't work, you probably have your answer. You can also check the TOR documentation; I recall it explains how to tell whether TOR is being blocked.
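The same test can be run from PHP instead of a browser by pointing cURL straight at the SOCKS port. This is a rough sketch, assuming TOR is listening on 127.0.0.1:9050; `fetch_via_socks` is a hypothetical helper and the timeouts are arbitrary.

```php
<?php
// Hypothetical helper: fetches $url through a local SOCKS5 proxy.
function fetch_via_socks(string $url, int $socksPort = 9050): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_PROXY, '127.0.0.1:' . $socksPort);
    // SOCKS5 with hostnames resolved by the proxy, so DNS lookups also go through TOR.
    curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5_HOSTNAME);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);

    $body = curl_exec($ch);
    $error = curl_error($ch);

    // 'body' is null when the request failed; 'error' then holds cURL's message.
    return ['body' => $body === false ? null : $body, 'error' => $error];
}

$result = fetch_via_socks('http://www.fixitts.com/whatismyip.php');
echo $result['body'] ?? ('Request failed: ' . $result['error']), PHP_EOL;
```

If the request fails with a connection error, nothing is accepting SOCKS connections on that port. If it connects but times out, outbound TOR traffic may be what is being blocked.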