Handling large curl responses - PHP

I wrote a PHP script that makes an HTTP POST request using curl. It does the following:

  • Prepare post variables
  • Initialize curl
  • Set client cookie to use in request
  • Set POST variables as query string
  • Set other curl options
  • Execute curl

Here is the code:

    $ch = curl_init ( $url );

    curl_setopt ( $ch, CURLOPT_COOKIE, "cookie=cookie");
    curl_setopt ( $ch, CURLOPT_POST, 1);
    curl_setopt ( $ch, CURLOPT_POSTFIELDS, $post_string);
    curl_setopt ( $ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt ( $ch, CURLOPT_HEADER, 0);
    curl_setopt ( $ch, CURLOPT_RETURNTRANSFER, 1);

    $response = curl_exec( $ch );
    // this point
    extr ( $response, $param_1, $param_2);

The problem is that the response is sometimes larger than 1 GB, so the PHP code blocks until the full response has been received (marked in the code as // this point). If malformed HTML is received, PHP generates an error, so the whole process has to be started again from the beginning.

Here is the rest of the code:

    function extr($string, $a, $b)
    {
        $doc = new DOMDocument;
        // @ suppresses the warnings libxml raises on malformed HTML
        @$doc->loadHTML($string);
        $table = $doc->getElementById('myTableId');

        if (is_object($table))
        {
            foreach ($table->getElementsByTagName('tr') as $record)
            {
                $rec = array();
                foreach ($record->getElementsByTagName('td') as $data)
                {
                    $rec[] = $data->nodeValue;
                }
                if ($rec)
                {
                    put_data($rec);
                }
            }
        }
        else
        {
            echo 'Skipped: Param1:'.$a.'-- Param2: '.$b.'<br>';
        }
    }

    function put_data($one = array())
    {
        // Append one JSON-encoded record per line
        $one = json_encode($one) . "\n";
        file_put_contents("data.json", $one, FILE_APPEND);
    }

    ini_set('max_execution_time', 3000000);
    ini_set('memory_limit', '-1');

The alternative I can think of is to process the data as it is received, if that is possible with curl, or to resume the previous curl request from where it left off.

Is there any possible workaround for this?

Do I need to switch to a language other than PHP for this?

You can process the data in chunks as they arrive by using the CURLOPT_WRITEFUNCTION option with a callback:

    curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $data) {
        echo "\n\nchunk received:\n", $data; // process your chunk here
        return strlen($data); // returning any other value aborts the transfer
    });

As was already mentioned in the comments though, if your response content type is HTML that you're loading into DOMDocument, you'll need the full data first anyway.
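If the rows of the table can be handled independently, one possible way around that limitation is to buffer the incoming chunks and hand only complete rows to DOMDocument. The following is a minimal sketch, not a definitive implementation: `make_row_collector` is a hypothetical helper name, and it assumes that rows are not nested inside other rows and that each `<tr>...</tr>` fragment is small enough to hold in memory.

```php
<?php
// Hypothetical helper: returns a curl write callback that buffers chunks and
// passes the cell texts of every complete <tr>...</tr> fragment to $onRow.
// Assumption: table rows are not nested inside each other.
function make_row_collector(callable $onRow): callable
{
    $buffer = '';

    return function ($ch, string $data) use (&$buffer, $onRow): int {
        $buffer .= $data;

        // Consume every complete row currently sitting in the buffer.
        while (preg_match('~<tr\b.*?</tr>~is', $buffer, $m, PREG_OFFSET_CAPTURE)) {
            $rowHtml = $m[0][0];
            // Drop everything up to and including the matched row.
            $buffer = substr($buffer, $m[0][1] + strlen($rowHtml));

            $doc = new DOMDocument();
            // Wrap the row in a table so the HTML parser keeps the cells;
            // @ suppresses warnings caused by malformed markup.
            @$doc->loadHTML('<table>' . $rowHtml . '</table>');

            $cells = [];
            foreach ($doc->getElementsByTagName('td') as $td) {
                $cells[] = $td->nodeValue;
            }
            if ($cells) {
                $onRow($cells);
            }
        }

        // No row has started yet: keep only a short tail, in case "<tr"
        // is split across two chunks, so the buffer cannot grow unbounded.
        if (stripos($buffer, '<tr') === false) {
            $buffer = (string) substr($buffer, -3);
        }

        return strlen($data); // tell curl the whole chunk was consumed
    };
}
```

With the functions from the question, this could be wired up as `curl_setopt($ch, CURLOPT_WRITEFUNCTION, make_row_collector('put_data'));`. Note that this sketch looks at every `<tr>` in the response, not only those inside the table with the id `myTableId`.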

You can do two things:

a) Use a SAX parser. A SAX parser is like a DOM parser, but it can handle streaming input, whereas a DOM parser needs the whole document or it will throw errors. The SAX parser just feeds you events to process.

What is the difference between SAX and DOM?

b) When using the SAX parser, pass it data incrementally using CURLOPT_WRITEFUNCTION. I just saw that lafor also posted this, so I'm upvoting that answer.
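Points a) and b) can be combined roughly as follows. This is only a sketch under a strong assumption: PHP's built-in event-based parser (`xml_parser_create` / `xml_parse`) expects well-formed XML, so it will only work if the response is valid XHTML, and it stops at the first well-formedness error. `make_sax_writer` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns a curl write callback that feeds each chunk to
// PHP's event-based XML parser and calls $onRow with the cell texts of each <tr>.
// Assumption: the response is well-formed XML (for example XHTML).
function make_sax_writer(callable $onRow): callable
{
    $parser = xml_parser_create('UTF-8');
    // Keep tag names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    $row = [];    // cells collected for the row currently being read
    $cell = null; // text of the current <td>, or null when outside a cell

    xml_set_element_handler(
        $parser,
        function ($p, string $name, array $attrs) use (&$row, &$cell) {
            if ($name === 'tr') { $row = []; }
            if ($name === 'td') { $cell = ''; }
        },
        function ($p, string $name) use (&$row, &$cell, $onRow) {
            if ($name === 'td' && $cell !== null) { $row[] = $cell; $cell = null; }
            if ($name === 'tr' && $row) { $onRow($row); $row = []; }
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, string $text) use (&$cell) {
            // Text may arrive in several pieces, so append rather than assign.
            if ($cell !== null) { $cell .= $text; }
        }
    );

    return function ($ch, string $data) use ($parser): int {
        // xml_parse() returns 1 on success and 0 on a parse error; returning
        // a length other than strlen($data) makes curl abort the transfer.
        return xml_parse($parser, $data, false) === 1 ? strlen($data) : 0;
    };
}
```

Rows are emitted as soon as their closing tag has been parsed, so memory use stays roughly constant regardless of the response size. Usage would look like `curl_setopt($ch, CURLOPT_WRITEFUNCTION, make_sax_writer('put_data'));`.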