I have a MySQL database that contains all the words in the standard English alphabet, which I am using to create a simple Scrabble word generator. The database is separated into 26 tables: one for each letter in the alphabet. Each table contains two columns:
In my application, the user enters in any number of letters into a textbox (indicating their tiles) and I query the database using this code:
// this is looped over 26 times, and $char is a letter between 'A' and 'Z'
// check if the user entered in character $char or a blank tile (signified by ? in app)
// this check prevents me from having to query useless tables
if (in_array($char, $lettersArray) || $blanks)
{
// if so, select all words that have a length that's possible to make
$query = 'SELECT Word FROM '.$char.'Words WHERE Length <= '.strlen($letters);
$result = $db->query($query);
$num_results = $result->num_rows;
for ($j = 0; $j < $num_results; $j++)
{
// determine if it's possible to create word based on letters input
// if so, perform appropriate code
}
}
Everything is working, but my application takes a long time compared to the competition (theoretical competition, that is; this is more of a learning project I created for myself and I doubt I'll release it on the internet), despite the fact the application is on my local computer. I tried used the automatic optimization feature of phpMyAdmin, but that provided no noticeable speed increase.
I don't think the performance problem is really the database. The structure of your data store is going to have the most significant impact on the performance of your algorithm.
One fairly easy-to-understand approach to the problem would be to handle the problem as anagrams. You could alphabetize all of the letters in each of your words, and store that as a column with an index on it.
word dorw
-------- -------
DALE ADEL
LEAD ADEL
LED DEL
HELLO EHLLO
HELP EHLP
Then, given a set of letters, you could query the database for all matching anagrams. Just alphabetize the set of letters passed in, and run a query.
SELECT word FROM dictionary WHERE dorw = 'AERT'
RATE
TARE
TEAR
Then, you could query for subsets of the letters:
SELECT word FROM dictionary WHERE dorw IN ('AER','AET','ART','ERT')
This approach would get you the longest words returned first.
This isn't the most efficient approach, but it's workable.
Handling a "blank" tile is going to be more work, you'd need to substitute a possible letter for it, and checking all 26 possibilities could be done in one query,
If they have letters ABCD and the blank tile, for example...
SELECT word FROM dictionary WHERE dorw IN ('AABCD','ABBCD', 'ABCCD'
, 'ABCDD', 'ABCDE', 'ABCDE', 'ABCDF', ..., 'ABCDZ')
That gets more painful when you start dealing with the subsets...
(In Crossword and Jumble puzzles, there aren't any blank tiles)
So this may not be the most appropriate algorithm for Scrabble.
There are other algorithms that may be more efficient, especially at returning the shorter words first.
One approach is to build a tree.
The root node is a "zero" letter word. As a child of the root node, would be nodes of all one-letter words. Each node would be marked whether it represented a valid word or not. As children of those nodes, you would have all possible three-letter words, again marked as whether it was valid or not.
That will be a lot of nodes. For words up to 12 letters in length, that's a total possible space of 1 + 26 + 26**2 + 26**3 + 26**4 + ...
But you wouldn't need to store every possible node, you'd only store those branches that result in a valid word. You wouldn't have branches below ->Z->Z or ->X->Q
However, you would have a branch under ->X->Y->L, even though XYL is not a word, it would be the beginning of a branch leading to 'XYLOPHONE'
But that's a tree traversal algorithm, which is fundamentally different.
It sounds like you need to learn about indexes. If you created indexes in the database, even if all the data was in one table, it would not be querying" useless letters".
You should provide some more information though, how long a query takes to return a result if you run it from the mysql console, how long it takes to move that result from the database to the PHP engine. You might for example be bringing back a 100 meg result set with each query you are running, if that is the case, limit the results to the first or a number of possible results.
To look at how much data is being returned, manually run one of your queries in the console and see how many records are being returned. If the number is high, the data will take longer to be passed to PHP, but it will also mean your code must iterate through a lot more results. You might want to consider dropping our of the for
loop after you find the first word that can be accepted. If at least one word is possible, don't check it again until another letter is placed.
I know this question is about optimizing your database but if I were doing this I would only read the words from the database once, initialize some data structure and search that structure instead of continually querying the database.
Sorry if this was completely irrelevant.