Generating non-repeating numbers, checked against a database

I am looking to generate random numbers that never repeat, checked against a MySQL database. How could I change the following function so that it checks the database to see whether the generated number(s) already exist and, if not, inserts them into a table? Also, how should that table be laid out? I am not the best at designing tables so that the rows are as small (and practical) as possible.

The function:

function genNonRepNum($min, $max, $quantity) {
    $numbers = range($min, $max);
    shuffle($numbers);
    return array_slice($numbers, 0, $quantity);
}


Using the function:

print_r(genNonRepNum(1000, 10000, 3));

returns:

Array ( [0] => 8586 [1] => 9666 [2] => 8169 )

Which is awesome, but I only wish for it to check a database to see whether it exists, and to insert it if it doesn't. Thanks in advance.

The solution here is to use encryption on an auto-incrementing number. As an example of what I mean, imagine that you had an encryption algorithm that took in 8 bits and a key, and spit out 8 bits of encrypted data. If you used the same key and encrypted the values 0 to 255, you'd get as output all values 0 to 255, but in a different order. You can't get any duplicates because encryption is by definition reversible: two different values can't encrypt to the same value with the same key and algorithm, because you wouldn't be able to decrypt the result. The number sequence would appear random due to cryptographic qualities like the avalanche effect. So basically, you just need to encrypt an auto-incrementing number with a secret key, using an algorithm of your choice based on your quality vs. speed needs. This is how new credit card numbers are generated, guaranteeing that a card number hasn't been issued yet. For more information, check out "format preserving encryption".
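To make the idea concrete, here is a minimal sketch of one way to "encrypt" a counter into a fixed range: a small Feistel network plus cycle-walking. It is not a vetted format-preserving encryption scheme (for real use, look at a standardized one). The function name `permuteCounter`, the 4 rounds, and the HMAC-SHA256 round function are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: maps each counter in [0, $size) to a distinct value in [0, $size).
// A balanced Feistel network is a permutation of its whole bit domain for any round
// function; cycle-walking (re-encrypting until the result is in range) narrows that
// permutation down to [0, $size).
function permuteCounter(int $counter, int $size, string $key): int
{
    if ($size < 1 || $counter < 0 || $counter >= $size) {
        throw new InvalidArgumentException('counter must be in [0, size)');
    }

    // Smallest even bit width whose domain covers $size.
    $bits = 2;
    while ((1 << $bits) < $size) {
        $bits += 2;
    }
    $half = intdiv($bits, 2);
    $mask = (1 << $half) - 1;

    $value = $counter;
    do {
        $left = $value >> $half;
        $right = $value & $mask;
        for ($round = 0; $round < 4; $round++) {
            // Round function: keyed hash of the right half, truncated to $half bits.
            $digest = hash_hmac('sha256', $round . ':' . $right, $key, true);
            $f = unpack('N', substr($digest, 0, 4))[1] & $mask;
            [$left, $right] = [$right, $left ^ $f];
        }
        $value = ($left << $half) | $right;
    } while ($value >= $size); // out of range: walk on to the next element of the cycle

    return $value;
}
```

With the range from the question, something like `1000 + permuteCounter($id - 1, 9001, $secretKey)` would turn auto-increment IDs 1 to 9001 into distinct numbers between 1000 and 10000, with no lookup table needed.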

I don't think that's the way you should go about it. Just use PHP's openssl_random_pseudo_bytes(), which gives you a string of random bytes (any cryptographically secure random generator that can create unpredictable IDs would do the job). This gives you keys that are, for practical purposes, always different (a collision is possible in theory but highly unlikely). Or, if you want a nice number, use an auto-increment column and get the last insert ID provided by your driver, e.g.:

mysqli_insert_id()
PDO::lastInsertId()

Call it after the INSERT query has executed and use the returned ID in your software.
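A rough sketch of both options follows. It uses `random_bytes()`, PHP's built-in CSPRNG (PHP 7+), in place of `openssl_random_pseudo_bytes()`. The helper names, the `items` table, and its `label` column are made up for the example.

```php
<?php
// Hypothetical helper: returns a random identifier as a hex string.
// 16 bytes = 128 bits, so an accidental collision is extremely unlikely
// (though not strictly impossible).
function randomId(int $bytes = 16): string
{
    return bin2hex(random_bytes($bytes));
}

// Hypothetical helper: lets MySQL assign the number via AUTO_INCREMENT.
// Assumes a table `items` with an auto-increment primary key and a `label` column.
function insertAndGetId(PDO $pdo, string $label): int
{
    $stmt = $pdo->prepare('INSERT INTO items (label) VALUES (?)');
    $stmt->execute([$label]);
    return (int) $pdo->lastInsertId(); // ID generated by the INSERT above
}
```

Either way the number is obtained in a single round trip (or none at all), with no check-and-retry loop.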

Manually checking whether a number was already inserted and retrying is expensive and should be avoided.

Please try to keep your trips to the database to a minimum!

First of all, you usually don't do things this way. You could instead choose a number size large enough that the probability of a collision is negligible, then use a strong random generator to generate random numbers of that size.
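To get a feel for what "large enough" means, the usual birthday-bound approximation can be used: drawing n numbers uniformly from N possible values gives a collision probability of roughly 1 - exp(-n(n-1)/(2N)). The sketch below is only that approximation; the function name is made up.

```php
<?php
// Approximate probability that at least two of $count uniformly random draws
// from $spaceSize possible values coincide (birthday bound).
function collisionProbability(float $count, float $spaceSize): float
{
    // -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -expm1(-$count * ($count - 1) / (2 * $spaceSize));
}
```

By this estimate, a million random 31-bit numbers are almost certain to contain a duplicate, while a million random 64-bit numbers collide with a probability of roughly 3 in 100 million.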

Otherwise, you would have to store the numbers anyway (unless you implement some sort of esoteric cryptographic scheme: you could, for example, store a counter and encrypt it using AES; the numbers would of course not be truly "random", but neither would they be easy to predict), and you have to define the random range at the beginning. So you can create a table with the random numbers already inserted: fill one table with sequential numbers, then insert them into another table that has a unique auto_increment key, ordering by RAND().

Now when you need a random number, you fetch the i-th element of the table and read off its value column, then increment i to ensure you do not reuse the number. You'll want to use locking and transactions in case two processes both need a random number.
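A sketch of that fetch-and-increment step with PDO might look like this. It assumes InnoDB, a pool table named `randompool` with `id` and `value` columns, and a one-row table `pool_counter(next_id)` holding i; none of these names come from a real schema.

```php
<?php
// Hypothetical helper: returns the next unused number from the pool,
// or null when no row exists at the current position.
function takeNextNumber(PDO $pdo): ?int
{
    $pdo->beginTransaction();
    try {
        // FOR UPDATE locks the counter row, so a second process blocks here
        // until this transaction commits or rolls back.
        $nextId = $pdo->query('SELECT next_id FROM pool_counter FOR UPDATE')->fetchColumn();

        $stmt = $pdo->prepare('SELECT value FROM randompool WHERE id = ?');
        $stmt->execute([$nextId]);
        $value = $stmt->fetchColumn();

        if ($value === false) {
            $pdo->rollBack();
            return null; // nothing stored at this id
        }

        $pdo->exec('UPDATE pool_counter SET next_id = next_id + 1');
        $pdo->commit();
        return (int) $value;
    } catch (Throwable $e) {
        if ($pdo->inTransaction()) {
            $pdo->rollBack();
        }
        throw $e;
    }
}
```

The sketch assumes PDO is configured to throw exceptions on errors, which is the default from PHP 8 onward.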

This also has the advantage of letting you know in advance when the number pool is going to run dry; at that point you can insert new numbers. The result will be slightly less random (the first million numbers will be random in the 0-999999 range, the second million in the 1000000-1999999 range), but maybe that's enough for your purposes.

Or you can create the random table with two UNIQUE columns (the primary key and a unique index on the value):

CREATE TABLE randompool (
    id integer not null primary key auto_increment,
    value integer
);
CREATE UNIQUE INDEX randompool_uniq ON randompool(value);

and use a secondary process to check when a global counter saved somewhere, NEXT_ID, comes within, say, 10% of SELECT COUNT(*) FROM randompool, meaning that the random pool is down to 10% capacity. If so, generate some random numbers and try to insert them:

INSERT IGNORE INTO randompool (value) VALUES (?),(?),(?),...
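From PHP, the refill step could be sketched roughly as below. Both function names are made up, and the candidate range (positive signed 32-bit) is an assumption matching the discussion here.

```php
<?php
// Builds the multi-row statement with one "(?)" placeholder per candidate number.
function buildInsertIgnoreSql(int $count): string
{
    if ($count < 1) {
        throw new InvalidArgumentException('count must be at least 1');
    }
    return 'INSERT IGNORE INTO randompool (value) VALUES '
        . implode(',', array_fill(0, $count, '(?)'));
}

// Hypothetical refill step: returns how many candidates were actually stored.
function refillPool(PDO $pdo, int $count): int
{
    $candidates = [];
    for ($i = 0; $i < $count; $i++) {
        $candidates[] = random_int(0, 2147483647); // positive signed 32-bit range
    }
    $stmt = $pdo->prepare(buildInsertIgnoreSql($count));
    $stmt->execute($candidates);
    return $stmt->rowCount(); // rows skipped by IGNORE are not counted
}
```

One thing to check on your own setup: rows rejected by IGNORE may still consume auto_increment values, which would leave gaps in `id`, so a reader that walks the table by NEXT_ID may need to skip missing ids.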

Of course, the larger randompool gets, the less efficient this operation becomes. When randompool contains 200 million numbers, a newly generated random positive signed 32-bit number has a probability of roughly 10% (200 million out of about 2.1 billion possible values) of hitting a duplicate and being rejected, so inserting 1000 new random numbers will cost proportionately more than at the beginning; the duplicate lookup in the index will also cost more. Not so much more, but more nonetheless. But if the process is independent and runs when the system is not too loaded, this could well be a non-issue.
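The arithmetic behind that estimate is simple enough to sketch. The function names are made up, and the default space size of 2^31 assumes non-negative signed 32-bit values.

```php
<?php
// Chance that one fresh uniformly random candidate is already in the pool.
function duplicateProbability(float $poolSize, float $spaceSize = 2147483648.0): float
{
    return $poolSize / $spaceSize;
}

// Average number of candidates needed per accepted number
// (mean of a geometric distribution with success probability 1 - p).
function expectedAttempts(float $poolSize, float $spaceSize = 2147483648.0): float
{
    return 1.0 / (1.0 - duplicateProbability($poolSize, $spaceSize));
}
```

For a pool of 200 million this gives a duplicate probability of about 9.3% and about 1.10 candidates per accepted number, so the extra cost grows slowly until the pool fills a large share of the range.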

Selection of random numbers would still use the NEXT_ID counter to directly fetch them from the table, so that reading off numbers would be very cheap.