I have an API which is being used by around 200 websites right now. The number is expected to grow very soon. I need to store information of each visitor (IP address etc) on clients' websites. The number of daily visitors for each client ranges from 2000 to 50000. That means I am adding 400000 to 500000 rows everyday. For that right now I am making a different table for each client.
Now the problem is when I try to fetch data from all tables combined, it takes a lot of time. How should I handle this? How should I store the data?
Thanks!
I always try to keep tables to a minimum in my schemas. Perhaps you should make a client table with relevant client information and then have a visitor table with all the visitor information. Link the two with a foreign key.
Since all the tables are the same, I'd just keep the visitor information in one table, with a column to identify the client / website.
The question then is whether a large table like that will still perform... Obviously you need your indexing and so on, but here are a couple of ideas:
Partitioning: I know nothing about partitioning on Mysql (but have tried it on Postgresql). The idea is to design the physical data storage to suit your data retrieval / work needs. Might be an idea if your table gets huge.
"Live" and "archive" tables. I'm sure there's proper terminology for this. Again, depending on how you're analysing your data, you can keep today's / this week's / this month's / whatever you need's data in the "live" table where new records are added, then have housekeeping functions that move older records to a larger archive table. The idea would be to keep only the records you want to analysis frequently in the smaller live table, so query performance is fast.
Lastly, you might be pleasantly surprised by the performance of Mysql even on large tables. I've got a Postgresql table with several million records and performance is more than adequate without any playing around.
Do not store raw data in mysql. Put visitors data into queue (based on redis, rabbitmq etc) and store only aggregated data which is necessary for your business model.