通过MySQL获取最不随机的记录集

I am trying to find a proper way of getting X random records. Also I want to check usage of a specific record so I won't use the same random record as often as others.

I am testing the set with these 3 tables, one table for questions, one table for users, and one table for the served question for a specific user. I want to make this perform with around 6000 questions.

CREATE TABLE `questions` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `question` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


CREATE TABLE `served` (
  `user` int(11) NOT NULL DEFAULT '0',
  `question` int(11) NOT NULL DEFAULT '0',
  `count` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`user`,`question`),
  KEY `count` (`count`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


CREATE TABLE `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `nickname` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The query I found to work very well with getting random records from the question table is as following:

SELECT  id, question
        FROM    (
                SELECT  @cnt := COUNT(*) + 1,
                        @lim := 10
                FROM    questions
                ) vars
        STRAIGHT_JOIN
                (
                SELECT  q.*,
                        @lim := @lim - 1
                FROM    questions q 
                WHERE   (@cnt := @cnt - 1)
                        AND RAND() < @lim / @cnt
                ) i

But now I would like to incorporate the served table to make sure the random values being picked from the questions that were served the least. The query I thought of was as following:

SELECT  id, question, count
        FROM    (
                SELECT  @cnt := COUNT(*) + 1,
                        @lim := 10
                FROM    questions
                ) vars
        STRAIGHT_JOIN
                (
                SELECT  q.*,
                        s.count,
                        @lim := @lim - 1
                FROM    questions q
                LEFT JOIN served s
                ON s.question = q.id
                WHERE   (@cnt := @cnt - 1)
                        AND RAND() < @lim / @cnt
                ORDER BY count ASC) i

The problem with this query is that it never gives my limit of 10 results + it never gives the records I would want. Could anyone push me in the right direction?

As requested a SQL Fiddle with some data to test with: http://sqlfiddle.com/#!2/3e5ed/5. I would expect the results to be 10 questions where the "count" of served for user 1 is the least (or not existing offcourse).

I have ended up using a modified query, it had to be quick:

SELECT q.*, s1.count AS count_a, s2.count AS count_b
FROM questions q
LEFT JOIN served s1
ON (s1.question = q.id AND s1.user = 1)
LEFT JOIN served s2
ON (s2.question = q.id AND s2.user = 2)
WHERE q.categorie = 1
ORDER BY IFNULL(s1.count, 0) + IFNULL(s2.count, 0) + RAND()
LIMIT 10

A common way that people get random records in MySQL is like this:

To get 10 random records:

SELECT * FROM questions
ORDER BY RAND()
LIMIT 10

Of course, as may be obvious, this gets all the records in the database, then sorts them randomly in order to get 10 records. It doesn't actually just choose 10 random records from the database. However, this method does easily prevent duplicates.

Now, using the same technique, if you wanted to favor less served questions, you could do something like this:

SELECT questions.* FROM questions
LEFT JOIN served
ON served.question = questions.id
ORDER BY IFNULL(served.count, 0) + RAND()
LIMIT 10

Tweak the algorithm to alter the amount you favor the serve count.

There are more performant ways to fetch random records, such as getting the maximum primary key value (assuming auto_increment) then use RAND() on that, then pick just one record. You can use LIMIT 1 just in case RAND() returns a gap in your keys. However, then you could have duplicates if you repeat this process to return more than one record.

If you have contiguous auto_increment values, you could easily leverage PHP to choose a random set of keys then fetch each record individually. If they're not contiguous, you first fetch a list of keys.

These techniques are covered in more detail in Chapter 16 Random Selection, in the book SQL Antipatterns.