PHP array optimisation for 80k rows

I need help finding a workaround to get past the memory_limit. My limit is 128 MB; the database returns about 80k rows, and the script stops at around 66k. Thanks for any help.

Code:

$possibilities = [];
foreach ($result as $item) {
    // strip the top-level domain off the end of the address
    $domainWord = str_replace("." . $item->tld, "", $item->address);

    // count every three-character substring (triplet)
    for ($i = 0; $i + 2 < strlen($domainWord); $i++) {
        $tri = $domainWord[$i] . $domainWord[$i + 1] . $domainWord[$i + 2];

        if (array_key_exists($tri, $possibilities)) {
            $possibilities[$tri] += 1;
        } else {
            $possibilities[$tri] = 1;
        }
    }
}

Given your algorithm, your bottleneck is most probably not the database query but the $possibilities array you're building.

If I read your code correctly, you get a list of domain names from the database and first strip the top-level domain off the end of each one.

Then you walk the resulting string character by character from left to right and collect each triplet of consecutive characters, like this:

example.com => ['exa', 'xam', 'amp', 'mpl', 'ple']

You store those triplets as the keys of the array, which is a nice idea, and you also count them, which doesn't add to the memory consumption. However, my guess is that the sheer number of possible triplets hurts: for 26 letters and 10 digits that's 36^3 = 46656 possibilities, each needing at least 3 bytes just for the key, plus whatever bookkeeping PHP's hash table keeps per entry, and together that can take quite a bite out of your memory limit.
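If you want to put a rough number on that guess, a throwaway experiment like the one below will show you the real per-entry cost (the key format is just a stand-in for the actual triplets; exact figures vary by PHP version):

    $counts = [];
    $before = memory_get_usage();
    for ($i = 0; $i < 46656; $i++) {
        $counts['k' . $i] = 1; // stand-in for a triplet key like 'exa'
    }
    echo (memory_get_usage() - $before) . " bytes for 46656 entries\n";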

Maybe someone else can tell you how PHP uses memory with its database cursors; I don't know that part. But there's one trick you can use to profile your memory consumption.

Put calls to memory_get_usage():

  • before and after each iteration, so you'll know how much memory each cursor advancement costs,
  • before and after each addition to $possibilities.

And just print them right away, so you can run your code and watch in real time what uses your memory, and how badly.
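A sketch of what that instrumentation could look like (the output format is arbitrary; the loop body is your existing code):

    foreach ($result as $item) {
        $before = memory_get_usage();

        // ... your existing triplet-counting code ...

        echo 'iteration: +' . (memory_get_usage() - $before)
            . ' bytes, total: ' . memory_get_usage() . PHP_EOL;
    }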

Also, try to unset($item) at the end of each iteration. It may actually help.

Knowing which database access library you use to obtain the $result iterator would help immensely.
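To illustrate why the library matters: if it happens to be PDO against MySQL, for example, the driver buffers the entire result set into PHP memory by default, and 80k buffered rows alone can eat a serious share of a 128 MB limit. Switching to unbuffered mode is a one-liner (a sketch only; check the behaviour of your own driver, and note the table/column names are guessed from your snippet):

    // must be set before the query is executed
    $pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
    $result = $pdo->query('SELECT address, tld FROM domains'); // hypothetical table name

With unbuffered queries the rows are streamed from the server as you iterate, so you have to finish (or close) the statement before issuing another query on the same connection.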

Given the tiny (and not very informative) code snippet you've provided, I want to give you a MySQL answer, but I'm not certain you're actually using MySQL.

But regardless - optimise your table and your query:

  • Use EXPLAIN to optimise your query, and rewrite the query to put as much of the logic as possible in SQL rather than in the PHP code. Edit: if you're using MySQL, prepend EXPLAIN to the SELECT keyword and the result will show you how MySQL actually turns the query you give it into results.

  • Do not call PHP's strlen() function on every pass of the loop condition - that work is repeated for no gain. Instead you can treat the string as an array of characters and test whether the offset still exists:

    for ($i = 0; isset($domainWord[$i + 2]); $i++) {

    (isset() is safer than !empty() here, since empty() would stop early at a '0' character.)

  • In your MySQL query (if that's what you're using), add a LIMIT clause that breaks the query into 3 or 4 chunks, say 25k rows per chunk, so each chunk fits comfortably under the ~66k rows you can currently process. Burki had this good idea.

At the end of each chunk, free all the strings and restart, wrapped in a loop:

$z = 0;
while ($z < 4) {
    // grab the next chunk of rows from the database,
    // process it, and preserve only your output array
    $z++;
}
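Fleshed out with PDO, for instance, that loop might look like the sketch below (the table and column names are guesses based on your snippet, and binding LIMIT/OFFSET as integers may require disabling emulated prepares):

    $chunkSize = 25000;
    for ($z = 0; $z < 4; $z++) {
        $stmt = $pdo->prepare('SELECT address, tld FROM domains LIMIT :lim OFFSET :off');
        $stmt->bindValue(':lim', $chunkSize, PDO::PARAM_INT);
        $stmt->bindValue(':off', $z * $chunkSize, PDO::PARAM_INT);
        $stmt->execute();

        while ($item = $stmt->fetch(PDO::FETCH_OBJ)) {
            // run the existing triplet-counting body here,
            // accumulating into $possibilities
        }
        // only $possibilities is kept between chunks; the fetched rows
        // go out of scope here and can be reclaimed before the next pass
    }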

But probably more important than any of these: provide enough detail in your question!

  • What is the data you want to get?
  • What are you storing your data in?
  • What are the criteria for finding the data?

Those answers will help people far more knowledgeable than me show you how to properly optimise your database.