I have a database of 10,000+ different files and I need to grab 4 random ones out of that database every 5 seconds.
I am wondering what are you suggestions for architect. I am thinking of using a flat file or maybe a sqlite? Doing this via mysql would be very very slow?
Just a Generalized Approach Running The selectRandom()
procedure every 5seconds will be too messy. instead run selectRandomGroups()
every 120 seconds.
selectRandomGroups()
executes selectRandom()
120/5 times in every execution. and stores the resulting groups in some storage (cache).
next use grabRandomGroup()
every 5 seconds that retrieves the last Group from the cache.and remove it.
Just put the files into a directory with deterministic names (0.dat, 1.dat, etc) and read them directly.
The B-tree and caching in the file system will take care of the rest. This is EXACTLY what it was designed to do.
Coming from a very heavy relational DB background, my first reaction is not to recommend the "NoSQL" document type of database systems. But in this case, it sounds like basically you will be needing one large hash table. I would look at something like MongoDB or CouchDB. You can see the differences here (http://www.mongodb.org/display/DOCS/MongoDB,+CouchDB,+MySQL+Compare+Grid).
The other thing you might look at is storing this info in the "cloud" using Amazon S3 or something similar if you think you need to scale a lot soon and don't have the capacity inhouse.
Otherwise, the file system idea noted above would work also
Why not just store links to the images instead of the entire image files? A query that returns 4 filenames every 5 seconds shouldn't put much if any load on the mysql server. Certainly less than installing and running another DBMS presumably on the same server.
Assuming that this is some type of website you could offload it to user machines by generating the random file names with javascript and then do whatever you needed to with the images.
I guess from your comments that your files are stored in the same table. Why would something like this be too slow?
SELECT * FROM files WHERE approved = 1 AND type = image ORDER BY RAND() LIMIT 4
And why would it be faster to run a script every 24 hours that fetches 69120 files and writes them to a directory or whatever? But sure, you could do that. Just change the limit in the above query.