数据库设计进行优化

A couple of years ago I designed a reward system for 11-16yo students in PHP, JavaScript and MySQL.

The premise is straightforward:

Members of staff issue points to students under various categories ("Positive attitude and behaviour", "Model citizen", etc)
Students accrue these points then spend them in our online store (iTunes vouchers, etc)

Existing system

The database structure is also straightforward (probably too much so):

Transactions

239,189 rows

CREATE TABLE `transactions` (
 `Transaction_ID` int(9) NOT NULL auto_increment,
 `Datetime` date NOT NULL,
 `Giver_ID` int(9) NOT NULL,
 `Recipient_ID` int(9) NOT NULL,
 `Points` int(4) NOT NULL,
 `Category_ID` int(3) NOT NULL,
 `Reason` text NOT NULL,
 PRIMARY KEY  (`Transaction_ID`),
 KEY `Giver_ID` (`Giver_ID`),
 KEY `Datetime` (`Datetime`),
 KEY `DatetimeAndGiverID` (`Datetime`,`Giver_ID`),
 KEY `Recipient_ID` (`Recipient_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=249069 DEFAULT CHARSET=latin1

Rewards

82 rows

CREATE TABLE `rewards` (
 `Reward_ID` int(9) NOT NULL auto_increment,
 `Title` varchar(255) NOT NULL,
 `Description` text NOT NULL,
 `Image_URL` varchar(255) NOT NULL,
 `Date_Inactive` datetime NOT NULL,
 `Stock_Count` int(3) NOT NULL,
 `Cost_to_User` float NOT NULL,
 `Cost_to_System` float NOT NULL,
 PRIMARY KEY  (`Reward_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=91 DEFAULT CHARSET=latin1

Purchases

5,889 rows

CREATE TABLE `purchases` (
 `Purchase_ID` int(9) NOT NULL auto_increment,
 `Datetime` datetime NOT NULL,
 `Reward_ID` int(9) NOT NULL,
 `Quantity` int(4) NOT NULL,
 `Student_ID` int(9) NOT NULL,
 `Student_Name` varchar(255) NOT NULL,
 `Date_DealtWith` datetime default NULL,
 `Date_Collected` datetime default NULL,
 PRIMARY KEY  (`Purchase_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=6133 DEFAULT CHARSET=latin1

Problems

The system ran perfectly well for a period of time. It's now starting to slow-down massively on certain queries.

Essentially, every time I need to access a students' reward points total, the required query takes ages. Here is a few example queries and their run-times:

Top 15 students, excluding attendance categories, across whole school

SELECT CONCAT( s.Firstname,  " ", s.Surname ) AS  `Student` , s.Year_Group AS `Year Group`, SUM( t.Points ) AS  `Points` 
FROM frog_rewards.transactions t
LEFT JOIN frog_shared.student s ON t.Recipient_ID = s.id
WHERE t.Datetime >  '2013-09-01' AND t.Category_ID NOT IN ( 12, 13, 14, 26 )
GROUP BY t.Recipient_ID
ORDER BY  `Points` DESC 
LIMIT 0 , 15

EXPLAIN 1

Run-time: 44.8425 sec

SELECT Recipient_ID, SUM(points) AS Total_Points FROMtransactionsGROUP BY Recipient_ID

EXPLAIN 2

Run-time: 9.8698 sec

Now I appreciate that, especially with the second query, I shouldn't ever be running a call which would return such a vast quantity of rows but the limitations of the framework within which the system runs meant that I had no other choice if I wanted to display students' total reward points for teachers/tutors/year managers/leadership to view and analyse.

Time for a solution

Fortunately the framework we've been forced to use is changing. We'll now be using oAuth rather than a horrible, outdated JavaScript widget format.

Unfortunately - or, I guess, fortunately - it means we'll have to rewrite quite a lot of the system.

One of the main areas I intend to look at when rewriting the system is the database structure. As time goes on it will only get bigger, so I need to do a bit of future-proofing.

As such, my main question is this: what is the most efficient and effective way of storing students' point totals?

The only idea I can come up with is to have a separate table called totals with Student_ID and Points fields. Every time a member of staff gives out some points, it adds a row into the transactions table but also updates the totals table.

Is that efficient? Would it be efficient to also have a Points_Since_Monday type field? How would I update/keep on top of that?

On top of the main question, if anyone has suggestions for general improvement with regard to optimisation of the database table, please let me know.

Thanks in advance, Duncan

There is nothing particularly wrong with your design which should make it as slow as you have reported. I'm thinking there must be other factors at work, such as the server it is running on being overloaded or slow, for example. Only you will be able to find out if that is the case.

In order to test your design I recreated it on the 2008 SQL Server I have running on my desktop computer. I have a standard computer, single hard disc, not SSD, not raid etc. so on a proper database server the results should be even better. I had to make some changes to the design as you are using MySQL but none of the changes should affect performace, it's just so I can run it on my database.

Here's the table structure I used, I had to guess at what you would have in the Student and Staff tables as you do not descibe those. I also took the liberty of changing the field names in the Transaction table for Giver_ID and Receiver_ID as I assume only staff give points and students receive them.

database tables

I generated random data to fill the tables with the same number of rows as you said you have in your database

data generation

I ran the two queries you said are taking a long time, I've changed them to suit my design but I (hope) the result is the same

SELECT TOP 15
        Firstname + ' ' + Surname
       ,Year_Group
       ,SUM(Points) AS Points
FROM    points.[Transaction]
        INNER JOIN points.Student ON points.[Transaction].Student_ID = points.Student.Student_ID
WHERE   [Datetime] > '2013-09-01'
        AND Category_ID NOT IN ( 12, 13, 14, 26 )
GROUP BY Firstname + ' ' + Surname
       ,Year_Group
ORDER BY SUM(Points) DESC 


SELECT  Student_ID
       ,SUM(Points) AS Total_Points
FROM    points.[Transaction]
GROUP BY Student_ID

Both queries returned results in about 1s. I have not created any additional indexes on the tables other than the CLUSTERED indexes generated by default on the primary keys. Looking at the execution plan the query processor estimates that implementing the following index could improve the query cost by 81.0309%

CREATE NONCLUSTERED INDEX [<Name of Missing Index>]
ON [points].[Transaction] ([Datetime],[Category_ID])
INCLUDE ([Student_ID],[Points])

execution plan

As others have commented I would look elsewhere for bottlenecks before spending a lot of time redesigning your database.

Update:

I realised I never actually addressed your specific question:

what is the most efficient and effective way of storing students' point totals?
The only idea I can come up with is to have a separate table called totals with Student_ID and Points fields. Every time a member of staff gives out some points, it adds a row into the transactions table but also updates the totals table.

I would not recommend keeping a separate point total unless you have explored every other possible way to speed up the database. A separate tally can become out of sync with the transactions and then you have to reconcile everything and track down what went wrong, and what the correct total should be.

You should always focus on maintaining the correctness and consistency of the data before trying to increase speed. Most of the time a correct (normalised) data model will operate quickly enough.

In one place I worked we found the most cost effective way to speed up our database was simply to upgrade the hardware; much quicker and cheaper than spending many man-hours redesigning the database :)