I'm trying to scrape some recipes off a page to use as samples for a school project, but when I run my script the browser just shows a blank page.
I'm following this tutorial - here
This is my code:
<?php
function curl($url) {
    $ch = curl_init(); // Initialising cURL
    curl_setopt($ch, CURLOPT_URL, $url); // Setting cURL's URL option with the $url variable passed into the function
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
    $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
    curl_close($ch); // Closing cURL
    return $data; // Returning the data from the function
}
function scrape_between($data, $start, $end){
    $data = stristr($data, $start); // Stripping all data from before $start
    $data = substr($data, strlen($start)); // Stripping $start
    $stop = stripos($data, $end); // Getting the position of the $end of the data to scrape
    $data = substr($data, 0, $stop); // Stripping all data from after and including the $end of the data to scrape
    return $data; // Returning the scraped data from the function
}
$continue = true;
$url = curl("https://www.justapinch.com/recipes/main-course/");
while ($continue == true) {
    $results_page = curl($url);
    $results_page = scrape_between($results_page,"<div id=\"grid-normal\">","<div id=\"rightside-content\"");
    $separate_results = explode("<h3 class=\"tight-margin\"",$results_page);
    foreach ($separate_results as $separate_result) {
        if ($separate_result != "") {
            $results_urls[] = "https://www.justapinch.com" . scrape_between($separate_result,"href=\"","\" class=\"");
        }
    }
    // Commented out to test code above
    // if (strpos($results_page,"Next Page")) {
    //     $continue = true;
    //     $url = scrape_between($results_page,"<nav><div class=\"col-xs-7\">","</div><nav>");
    //     if (strpos($url,"Back</a>")) {
    //         $url = scrape_between($url,"Back</a>",">Next Page");
    //     }
    //     $url = "https://www.justapinch.com" . scrape_between($url, "href=\"", "\"");
    // } else {
    //     $continue = false;
    // }
    // sleep(rand(3,5));
    print_r($results_urls);
}
?>
I'm using Cloud9, I've installed php5 cURL, and am running apache2. I would appreciate any help.
This is where the problem lies:
$results_page = curl($url);
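You are passing curl() the HTML of a page instead of a URL. Right before the while() loop, $url is set to the return value of curl(), which is the page content, not an address. I think you should do the following:
$results_page = curl("https://www.justapinch.com/recipes/main-course/");
Also note that with the pagination block commented out, nothing ever sets $continue to false, so the while loop never terminates.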
Edit:
You should also parse the HTML with DOM instead of cutting it up with string functions.
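To illustrate the DOM suggestion, here is a minimal sketch using PHP's DOMDocument and DOMXPath. The sample markup is an assumption modeled on the strings in the question (grid-normal, tight-margin), not a copy of the real page, and extract_recipe_urls is a hypothetical helper name, so the XPath expression will probably need adjusting against the live HTML.

```php
<?php
// Sample markup: an ASSUMPTION based on the strings used in the question,
// not the real page source.
$html = <<<HTML
<div id="grid-normal">
  <h3 class="tight-margin"><a href="/recipes/main-course/beef/sample-stew.html" class="title">Sample Stew</a></h3>
  <h3 class="tight-margin"><a href="/recipes/main-course/chicken/sample-pie.html" class="title">Sample Pie</a></h3>
</div>
<div id="rightside-content"><a href="/about">About</a></div>
HTML;

// Hypothetical helper: returns absolute URLs for links found inside
// <h3 class="tight-margin"> elements within <div id="grid-normal">.
function extract_recipe_urls($html, $base) {
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = $xpath->query('//div[@id="grid-normal"]//h3[@class="tight-margin"]/a[@href]');

    $urls = array();
    foreach ($links as $link) {
        $urls[] = $base . $link->getAttribute('href');
    }
    return $urls;
}

print_r(extract_recipe_urls($html, 'https://www.justapinch.com'));
```

Compared with scrape_between(), this does not depend on attribute order or on where the next class=" happens to appear, and the link in rightside-content is skipped because it is outside the selected div.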
Why do people do this? The code is completely void of error checking, and then they go to some forum and ask why code that ignores any and all errors is not working.
I don't know, but at least you could put in some error checking and run it before asking. It's not just you; lots of people do it, and it's annoying. curl_init returns bool(false) if there was an error creating the curl handle. curl_setopt returns bool(false) if there's an error setting the option. curl_exec returns bool(false) if there was an error in the transfer. Extract the error description with curl_error, and report it with \RuntimeException. Now delete this thread, add some error checking, and if the error checking does not reveal the problem, or it does but you're not sure how to fix it, then make a new thread about it.
Here are some error-checking function wrappers to get you started:
// Note: the int and bool type declarations below require PHP 7 or later; drop them on PHP 5.
function ecurl_setopt ( /*resource*/ $ch, int $option, /*mixed*/ $value ): bool {
    $ret = curl_setopt($ch, $option, $value);
    if ($ret !== true) {
        // option should be obvious by stack trace
        throw new RuntimeException('curl_setopt() failed. curl_errno: ' . return_var_dump(curl_errno($ch)) . '. curl_error: ' . curl_error($ch));
    }
    return true;
}
// Returns whatever curl_exec() returns on success: true, or the response body
// as a string when CURLOPT_RETURNTRANSFER is set. Only false means failure.
function ecurl_exec ( /*resource*/ $ch ) {
    $ret = curl_exec($ch);
    if ($ret === false) {
        throw new RuntimeException('curl_exec() failed. curl_errno: ' . return_var_dump(curl_errno($ch)) . '. curl_error: ' . curl_error($ch));
    }
    return $ret;
}
// Captures the output of var_dump() as a string instead of printing it.
function return_var_dump( /*...*/ ) {
    $args = func_get_args();
    ob_start();
    call_user_func_array('var_dump', $args);
    return ob_get_clean();
}
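To show how checks like these turn a blank page into a readable error, here is a small self-contained sketch. fetch_page is a hypothetical name (not from the question or the wrappers above), and the deliberately unsupported URL scheme is only there to trigger a failure without touching the network. It avoids type declarations so it should also run on PHP 5.

```php
<?php
// Hypothetical helper: fetch a URL and return the response body,
// throwing a RuntimeException as soon as any cURL call reports failure.
function fetch_page($url) {
    $ch = curl_init();
    if ($ch === false) {
        throw new RuntimeException('curl_init() failed');
    }

    $options = array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // make curl_exec() return the body as a string
    );
    foreach ($options as $option => $value) {
        if (curl_setopt($ch, $option, $value) !== true) {
            throw new RuntimeException("curl_setopt($option) failed: " . curl_error($ch));
        }
    }

    $data = curl_exec($ch);
    if ($data === false) {
        throw new RuntimeException('curl_exec() failed (' . curl_errno($ch) . '): ' . curl_error($ch));
    }
    // The handle is released when $ch goes out of scope.
    return $data;
}

// Demonstration with an intentionally invalid scheme, so the failure path runs.
try {
    $html = fetch_page('nosuchscheme://example');
    echo strlen($html), " bytes received\n";
} catch (RuntimeException $e) {
    echo 'Request failed: ', $e->getMessage(), "\n";
}
```

With the question's original curl() function, the same call would quietly return false and every later step would operate on an empty value.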