Scraping book prices

I'm trying to write a scraping app, and I'm running into problems. My PHP cURL code isn't pulling up the pages with the book prices; it returns the web root of the domain instead.

I'm trying to search the site by ISBN.

I've been bashing my head against the wall for days. Any help will be most appreciated!

Code:

<form method="post" for="new-search" name="SearchTerm" class='form-validate' id="SearchTerm" action="index.php">
    <textarea rows="3" name="SearchTerm" id="SearchTerm" cols="40" class="validate-required error"></textarea><div class="error" id="SearchTerm-error"></div>
    <br>                        
    <button class="search primary" type="submit">continue</button>

</form>


<?php

/*
echo("<pre>");print_r($_GET);echo("</pre>");
echo("<pre>");print_r($_POST);echo("</pre>");
*/

$isbn = $_POST['SearchTerm'];


$userAgent = 'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16';

$fields = array(
    'url' => ("http://www.bookleberry.com/Search/SearchKeyword"),
    'qurl' => ("http://www.bookleberry.com/Search/SearchKeyword/" . $_POST['SearchTerm']),
    'SearchTerm' => ($_POST['SearchTerm']),
    'Page' => ('1'),
    'class' => ('textfield validate-required'),
    'for' => ('new-search'),
    'result-count' => ('1'),
    'status' => 'success',
);

$SearchTerm = ($fields['SearchTerm']);
$url = ($fields['url']);
$Page = ($fields['Page']);


echo("<pre>");
print_r($fields);
echo("</pre>");

if ($isbn != NULL){

    //open connection
    $ch = curl_init($url);
    //set the url, number of POST vars, POST data
    curl_setopt($ch, CURLOPT_HEADER, $userAgent);
    curl_setopt($ch, CURLOPT_URL, $url);
        echo "before curl_exec:<br>";
        echo "curl_errno=". curl_errno($ch) ."<br>";
        echo "curl_error=". curl_error($ch) ."<br>";
    curl_setopt($ch,CURLOPT_POST,count($fields));
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, "?SearchTerm=$SearchTerm");
    curl_setopt($ch, CURLOPT_HTTPGET, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 9999999);
     curl_setopt($ch,CURLOPT_HTTPHEADER,array (
        "Accept: application/json"
    ));




    $info = curl_getinfo($ch);

    //execute post
    $result = curl_exec($ch);
    print $result;


    print "<pre>\n";
    print_r(curl_getinfo($ch));  // transfer info: final URL, HTTP code, redirect count
    print "</pre>";
}

?>

Don't hurt your head, use it!

  • Install Fiddler.
  • Make the request in the browser and look in Fiddler to see exactly what is posted, including all headers, cookies and form variables.
  • Make the same request with your code and examine Fiddler again.
  • Compare the two requests and adjust your script.
  • Repeat.
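
The PHP side of that loop can be sketched roughly as follows. This is a minimal sketch, not a confirmed fix: it assumes Fiddler is listening on its default proxy address 127.0.0.1:8888 and that the search is a plain POST with a single SearchTerm field (taken from the form in the question; the real site may expect other fields, cookies or a different URL). The function names buildSearchOptions and fetchSearchPage are made up for illustration.

```php
<?php
// Build the cURL options for the search request. Kept separate from the
// transfer so the options can be inspected without touching the network.
function buildSearchOptions(string $url, string $isbn, ?string $proxy = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,
        // http_build_query() URL-encodes the value; a POST body has no leading "?".
        CURLOPT_POSTFIELDS     => http_build_query(['SearchTerm' => $isbn]),
        // The user agent goes in CURLOPT_USERAGENT, without a "User-Agent:" prefix.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => '',    // empty string: in-memory cookie handling
        CURLOPT_TIMEOUT        => 30,
        CURLINFO_HEADER_OUT    => true,  // keep a copy of the request headers sent
    ];
    if ($proxy !== null) {
        $options[CURLOPT_PROXY] = $proxy; // assumption: Fiddler at 127.0.0.1:8888
    }
    return $options;
}

// Run the request and collect what is needed for the comparison.
function fetchSearchPage(string $url, string $isbn, ?string $proxy = null): array
{
    $ch = curl_init();
    curl_setopt_array($ch, buildSearchOptions($url, $isbn, $proxy));
    $body = curl_exec($ch); // read transfer info only after this call
    return [
        'body'         => $body,
        'error'        => curl_error($ch),
        'status'       => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'final_url'    => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'request_sent' => curl_getinfo($ch, CURLINFO_HEADER_OUT),
    ];
}
```

Calling fetchSearchPage($url, $isbn, '127.0.0.1:8888') makes the request show up in Fiddler next to the browser's, and 'request_sent' and 'final_url' show what was actually sent and where any redirects ended. Note that this sketch deliberately does not set CURLOPT_HTTPGET after CURLOPT_POST, since that switches the request back to GET.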

It also helps to install Firebug. Using its Copy XPath feature and pasting the result into a PHP DOMXPath query makes scraping fun and easy!
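
The XPath step might look something like this. It is only a sketch under stated assumptions: the sample markup, the class names and the XPath expression are invented for illustration, and the real expression would come from Copy XPath on the actual results page. The helper name extractByXPath is hypothetical too.

```php
<?php
// Return the trimmed text of every node matched by an XPath expression.
function extractByXPath(string $html, string $expression): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($expression);
    if ($nodes === false) {
        return []; // the expression itself was malformed
    }

    $values = [];
    foreach ($nodes as $node) {
        $values[] = trim($node->textContent);
    }
    return $values;
}

// Invented sample markup standing in for a search results page.
$sampleHtml = '<html><body><div class="result">'
    . '<span class="title">Example Book</span>'
    . '<span class="price"> $12.99 </span>'
    . '</div></body></html>';

print_r(extractByXPath($sampleHtml, '//div[@class="result"]/span[@class="price"]'));
```

In practice the $html argument would be the body returned by curl_exec() (with CURLOPT_RETURNTRANSFER enabled). An XPath copied from the browser reflects the browser's parsed DOM, so it may need small adjustments before it matches what DOMDocument builds from the raw HTML.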