PHP multi cURL performance worse than sequential file_get_contents

I am writing an interface in which I must launch 4 HTTP requests to get some information.

I implemented the interface in 2 ways:

  1. using sequential file_get_contents.
  2. using multi curl.
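
Roughly, the sequential version is a loop like this (a simplified sketch rather than the exact code; each POST blocks until the previous one has finished):

$result_str_arr = array ();
foreach ( $call_url_arr as $key => $url )
{
    $context = stream_context_create( array (
        'http' => array (
            'method' => 'POST',
            'header' => 'Content-Type: application/x-www-form-urlencoded',
            'content' => http_build_query( $params_arr [$key] ),
        ),
    ) );
    // Each file_get_contents() call waits for the full response
    // before the next request starts.
    $result_str_arr [$key] = file_get_contents( $url , false , $context );
}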

I have benchmarked the 2 versions with JMeter. The results show that multi curl is much better than sequential file_get_contents when there is only 1 thread in JMeter making requests, but much worse with 100 threads.

The question is: what could cause the poor performance of multi curl?

My multi curl code is as below:

$curl_handle_arr = array ();
$master = curl_multi_init();
foreach ( $call_url_arr as $key => $url )
{
    $curl_handle = curl_init( $url );
    $curl_handle_arr [$key] = $curl_handle;
    curl_setopt( $curl_handle , CURLOPT_RETURNTRANSFER , true );
    curl_setopt( $curl_handle , CURLOPT_POST , true );
    curl_setopt( $curl_handle , CURLOPT_POSTFIELDS , http_build_query( $params_arr [$key] ) );
    curl_multi_add_handle( $master , $curl_handle );
}
$running = null;
$mrc = null;
do
{
    $mrc = curl_multi_exec( $master , $running );
}
while ( $mrc == CURLM_CALL_MULTI_PERFORM );
while ( $running && $mrc == CURLM_OK )
{
    if (curl_multi_select( $master ) != - 1)
    {
        do
        {
            $mrc = curl_multi_exec( $master , $running );
        }
        while ( $mrc == CURLM_CALL_MULTI_PERFORM );
    }
}
foreach ( $call_url_arr as $key => $url )
{
    $curl_handle = $curl_handle_arr [$key];
    if (curl_error( $curl_handle ) == '')
    {
        $result_str_arr [$key] = curl_multi_getcontent( $curl_handle );
    }
    curl_multi_remove_handle( $master , $curl_handle );
}
curl_multi_close( $master );

1. Simple optimization

  • You should sleep for about 2500 microseconds when curl_multi_select fails.
    It definitely does fail from time to time during each execution,
    and without sleeping your CPU gets eaten by what is effectively a while (true) { } loop.
  • If you do nothing until all of the requests have finished
    (rather than handling each one as soon as it completes),
    you should make the select timeout larger.
  • Your code is written for old libcurls. As of libcurl version 7.20.0,
    the state CURLM_CALL_MULTI_PERFORM no longer appears.

So, the following code

$running = null;
$mrc = null;
do
{
    $mrc = curl_multi_exec( $master , $running );
}
while ( $mrc == CURLM_CALL_MULTI_PERFORM );
while ( $running && $mrc == CURLM_OK )
{
    if (curl_multi_select( $master ) != - 1)
    {
        do
        {
            $mrc = curl_multi_exec( $master , $running );
        }
        while ( $mrc == CURLM_CALL_MULTI_PERFORM );
    }
}

should be

curl_multi_exec($master, $running);
do
{
    if (curl_multi_select($master, 99) === -1)
    {
        // select failed; back off briefly instead of spinning
        usleep(2500);
        continue;
    }
    curl_multi_exec($master, $running);
} while ($running);

Note

The timeout value of curl_multi_select only needs to be tuned if you want to handle each transfer as soon as it finishes, like this...

curl_multi_exec($master, $running);
do
{
    if (curl_multi_select($master, $TIMEOUT) === -1)
    {
        usleep(2500);
        continue;
    }
    curl_multi_exec($master, $running);
    while ($info = curl_multi_info_read($master))
    {
        /* Do something with $info */
    }
} while ($running);

Otherwise, the value should be extremely large.
(However, PHP_INT_MAX is too large; libcurl treats it as an invalid value.)
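
For example, the /* Do something with $info */ part could collect each response as soon as its transfer finishes (a sketch only; $curl_handle_arr and $result_str_arr are the arrays from your code):

while ($info = curl_multi_info_read($master))
{
    if ($info['msg'] === CURLMSG_DONE)
    {
        $done_handle = $info['handle'];
        // Map the finished handle back to its key and store the body right away.
        $key = array_search($done_handle, $curl_handle_arr, true);
        if ($info['result'] === CURLE_OK)
        {
            $result_str_arr[$key] = curl_multi_getcontent($done_handle);
        }
        curl_multi_remove_handle($master, $done_handle);
    }
}

If you collect the results here, the final foreach over $curl_handle_arr is no longer needed.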

2. Easy experiment in one PHP process

I tested using my parallel cURL executor library: mpyw/co

(The preposition for in the function names should really be by; sorry for my poor English xD)

<?php 

require 'vendor/autoload.php';

use mpyw\Co\Co;

function four_sequencial_requests_for_one_hundread_people()
{
    for ($i = 0; $i < 100; ++$i) {
        $tasks[] = function () use ($i) {
            $ch = curl_init();
            curl_setopt_array($ch, [
                CURLOPT_URL => 'example.com',
                CURLOPT_FORBID_REUSE => true,
                CURLOPT_RETURNTRANSFER => true,
            ]);
            for ($j = 0; $j < 4; ++$j) {
                yield $ch;
            }
        };
    }
    $start = microtime(true);
    yield $tasks;
    $end = microtime(true);
    printf("Time of %s: %.2f sec
", __FUNCTION__, $end - $start);
}

function requests_for_four_hundreds_people()
{
    for ($i = 0; $i < 400; ++$i) {
        $tasks[] = function () use ($i) {
            $ch = curl_init();
            curl_setopt_array($ch, [
                CURLOPT_URL => 'example.com',
                CURLOPT_FORBID_REUSE => true,
                CURLOPT_RETURNTRANSFER => true,
            ]);
            yield $ch;
        };
    }
    $start = microtime(true);
    yield $tasks;
    $end = microtime(true);
    printf("Time of %s: %.2f sec
", __FUNCTION__, $end - $start);
}

Co::wait(four_sequencial_requests_for_one_hundread_people(), [
    'concurrency' => 0, // Zero means unlimited
]);

Co::wait(requests_for_four_hundreds_people(), [
    'concurrency' => 0, // Zero means unlimited
]);

I ran the tests five times and got the following results:

[screenshot of the timing results]

I also tried running them in reverse order (the 3rd run got kicked xD):

[screenshot of the timing results in reverse order]

These results show that too many concurrent TCP connections actually decrease throughput.

3. Advanced optimization

3-A. For different destinations

If you want to optimize for both low and high numbers of concurrent requests, the following dirty solution may help you.

  1. Share the current number of requesters between PHP processes using apcu_add / apcu_fetch / apcu_delete.
  2. Switch methods (sequential or parallel) based on that value, as in the sketch below.
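
A rough sketch of the idea; fetch_all() and fetch_all_with_curl_multi() are hypothetical names, the threshold of 10 is arbitrary, and apcu_inc / apcu_dec are used alongside apcu_add to keep the counter atomic:

function fetch_all(array $call_url_arr, array $params_arr)
{
    apcu_add('concurrent_requesters', 0);          // initialize the counter once
    $current = apcu_inc('concurrent_requesters');  // register this worker
    try {
        if ($current > 10) {
            // Too many workers are already requesting in parallel:
            // fall back to sequential file_get_contents.
            $results = [];
            foreach ($call_url_arr as $key => $url) {
                $context = stream_context_create(['http' => [
                    'method'  => 'POST',
                    'header'  => 'Content-Type: application/x-www-form-urlencoded',
                    'content' => http_build_query($params_arr[$key]),
                ]]);
                $results[$key] = file_get_contents($url, false, $context);
            }
            return $results;
        }
        // Few enough concurrent workers: use the multi-cURL version.
        return fetch_all_with_curl_multi($call_url_arr, $params_arr);
    } finally {
        apcu_dec('concurrent_requesters');         // unregister this worker
    }
}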

3-B. For the same destinations

CURLMOPT_PIPELINING will help you. This option bundles HTTP/1.1 requests to the same destination onto a single TCP connection.

curl_multi_setopt($master, CURLMOPT_PIPELINING, 1);
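
In your code, the call belongs right after curl_multi_init(), before any handles are added; a minimal sketch:

$master = curl_multi_init();
// Pipeline requests to the same host over a single TCP connection.
curl_multi_setopt($master, CURLMOPT_PIPELINING, 1);
foreach ($call_url_arr as $key => $url)
{
    // ... create and add the handles exactly as before ...
}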