Long PHP script runs multiple times

I have a products database that synchronizes with product data every morning.

The process is straightforward:

  • Get all products from the database with a query
  • Loop through all products, and fetch an XML from the other server by product_id
  • Update the data from the XML
  • Log the changes to a file.

If I query a small number of items, limiting it to 500 random products for example, everything goes fine. But when I query all products, my script SOMETIMES goes on the fritz and starts looping multiple times. Hours later I still see my log file growing and products being added.

I checked everything I could think of, for example:

  • Whether variables are used twice without overwriting each other
  • Whether the function calls itself
  • Whether it happens with a small number of products too: no.
  • The script is called using a cronjob; are the settings OK? (Yes)

What makes it especially weird is that it sometimes goes right and sometimes it doesn't. Could this be some memory problem?

EDIT: The script is called from Webmin at a specific hour and minute:

    wget -q -O /dev/null http://example.eu/xxxxx/cron.php?operation=sync

Code is hundreds of lines long...

Thanks

I solved the problem myself. Thanks for all the replies!

My MySQL connection timed out; that was the problem. As soon as I added:

    ini_set('mysql.connect_timeout', 14400);
    ini_set('default_socket_timeout', 14400);

to my script, the problem stopped. I really hope this helps someone. I'll upvote all the locking answers, because those were very helpful!

You have:

  • max_execution_time disabled. Your script won't end until the process is complete, however long that takes.
  • memory_limit disabled. There is no limit to how much data can be stored in memory.
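
For reference, this is roughly what that configuration looks like when set at runtime (a sketch; your values may live in php.ini instead):

    set_time_limit(0);             // max_execution_time = 0: no time limit
    ini_set('memory_limit', '-1'); // -1: unlimited memory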

500 records were completed without issues. This indicates that the script completes its process before the next cronjob iteration. For example, if your cron runs every hour, the 500 records are processed in less than an hour.

If you have a cronjob that is going to process a large amount of records, consider adding a lock mechanism to the process: only allow one instance of the script to run at a time, and start a new run only when the previous one is complete.

You can create a lock as part of a shell script before executing your PHP script. Or, if you don't have shell access to your server, you can use a database lock within the PHP script, something like this:

class ProductCronJob
{
    protected $lockValue;

    public function run()
    {
        // Obtain a lock
        if ($this->obtainLock()) {
            // Run your script if you have valid lock
            $this->syncProducts();

            // Release the lock on complete
            $this->releaseLock();
        }
    }

    protected function syncProducts()
    {
        // your long running script
    }

    protected function obtainLock()
    {
        $time = new \DateTime;
        $timestamp = $time->getTimestamp();
        $this->lockValue = $timestamp . '_syncProducts';

        $db = JFactory::getDbo();

        // Try to claim the lock. lock = '0' means the cronjob is not
        // active, so this UPDATE only succeeds when nobody holds it.
        $query = $db->getQuery(true)
            ->update($db->quoteName('#__cronlock'))
            ->set($db->quoteName('lock') . ' = ' . $db->quote($this->lockValue))
            ->set($db->quoteName('timemodified') . ' = ' . $db->quote($timestamp))
            ->where($db->quoteName('name') . ' = ' . $db->quote('syncProducts'))
            ->where($db->quoteName('lock') . ' = ' . $db->quote('0'));
        $db->setQuery($query)->execute();

        // Read the row back to check whether we actually got the lock.
        $query = $db->getQuery(true)
            ->select('*')
            ->from($db->quoteName('#__cronlock'))
            ->where($db->quoteName('name') . ' = ' . $db->quote('syncProducts'));
        $lock = $db->setQuery($query)->loadAssoc();

        if ($lock === null || (string) $lock['lock'] !== (string) $this->lockValue) {
            // Currently there is an active process - can't start a new one.

            // Instead of returning immediately you could also check the
            // age of the lock - how long the other process has been running:
            //
            //     $diff = $timestamp - $lock['timemodified'];
            //     if ($diff >= 25200) {
            //         // The current script has been active for 7 hours
            //         // (change 25200 to any number of seconds you want).
            //         // Here you could notify the site administrator.
            //     }

            return false;
        }

        return true;
    }

    protected function releaseLock()
    {
        // Set lock back to '0' so the next run can claim it.
        $db = JFactory::getDbo();
        $query = $db->getQuery(true)
            ->update($db->quoteName('#__cronlock'))
            ->set($db->quoteName('lock') . ' = ' . $db->quote('0'))
            ->where($db->quoteName('name') . ' = ' . $db->quote('syncProducts'));
        $db->setQuery($query)->execute();
    }
}
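
Wiring it up from the cron entry point is then just (assuming this is the cron.php that wget calls):

    // cron.php?operation=sync entry point
    $job = new ProductCronJob;
    $job->run();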

I see two possibilities:

  • cron calls the script much more often than intended
  • the script somehow takes too long

You can try to estimate the time a single iteration of the loop takes; this can be done with time() (or microtime() for more precision). Perhaps the result is surprising, perhaps not. You can probably get the number of results too. Multiply the two and you will have an estimate of how long the whole process should take.
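
A minimal sketch of that measurement (syncOneProduct() is a made-up name for whatever your loop body does):

    $start = microtime(true);      // high-resolution timer
    syncOneProduct($product);      // hypothetical: one loop iteration
    $elapsed = microtime(true) - $start;

    // seconds per iteration times number of products = rough total
    $estimate = $elapsed * count($productsToSync);
    echo "one iteration: {$elapsed}s, estimated total: {$estimate}s\n";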

    $productsToSync = $db->loadObjectList();

and

    foreach ($productsToSync AS $product) {

It seems you load every result into an array. This won't work for huge databases, because a million rows obviously won't fit in memory. You should fetch one result at a time; with MySQL there are methods that fetch a single row from the resource at a time, and I hope your database layer allows the same.
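
With PDO, for example, you can turn off result buffering so rows are streamed from the server one at a time instead of being loaded into an array up front (a sketch; DSN, credentials and table name are placeholders):

    $pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
    // Unbuffered query: rows stay on the server until fetched
    $pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

    $stmt = $pdo->query('SELECT * FROM products');
    while ($row = $stmt->fetch(PDO::FETCH_OBJ)) {
        // handle one product at a time; memory use stays flat
    }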

I also see you execute another query on each iteration of the loop. This is something I try to avoid. Perhaps you can move this to after the first query has ended and do all of those in one big query? On the other hand, that may bite my first suggestion.
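
Something like this, if the per-iteration query looks up extra rows by product_id (table and column names are made up):

    // Collect all the ids first...
    $ids = [];
    foreach ($productsToSync as $product) {
        $ids[] = (int) $product->product_id;
    }

    // ...then fetch everything the loop needs in a single query
    $query = 'SELECT * FROM product_details'
           . ' WHERE product_id IN (' . implode(',', $ids) . ')';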

Also, if something goes wrong, be paranoid when debugging: measure as much as you can, and time as much as you can when it's a performance issue. Put the timings in your log file; usually you will find the bottleneck.

Your script runs for quite some time (~45 min) and wget thinks it's "timing out" since you don't return any data. By default wget has a 900 s read timeout and a retry count of 20. So first you should probably change your wget command to prevent this:

    wget --tries=0 --timeout=0 -q -O /dev/null http://example.eu/xxxxx/cron.php?operation=sync

Now, removing the timeout could lead to other issues, so instead you could send some data from your script (and flush, to force the webserver to send it) to make sure wget doesn't think the script "timed out", say every 1000 loops. Think of it as a progress bar, as in the sketch below...
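
A sketch of that heartbeat inside the sync loop:

    $i = 0;
    foreach ($productsToSync as $product) {
        // ... sync one product ...

        if (++$i % 1000 === 0) {
            echo '.';     // a byte of output so wget sees a live connection
            @ob_flush();  // flush PHP's output buffer, if one is active
            flush();      // push the output through the webserver
        }
    }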

Just keep in mind that you will hit an issue when the run time gets close to your cron period, as two runs will then execute in parallel. You should optimize your process and/or add a lock mechanism, maybe?