I have a MySQL database that stores category-related information for approximately 200 users. The information stored and retrieved for each user follows this hierarchy:
imageCategories
> Parent Category 1
    > Child Category 1 : "45,19,3,4,8"
    > Child Category 2 : "17,1,99"
    > ... etc
> Parent Category 2
    > Child Category 1 : "83,6"
    > Child Category 2 : "19,74,26"
    > ... etc
> etc
The string value of each child category is a series of comma-separated ids which reference descriptions (on a separate table) stored under that child category. I store the whole structure for each user in a single column, as a JSON-encoded string of the form:
{"Parent Category 1":{"Child Category 1":["45,19,3,4,8"],"Child Category 2":["17,1,99"]},"Parent Category 2":{"Child Category 1":["83,6"],"Child Category 2":["19,74,26"]}}
The system works by retrieving this JSON string when a user logs in and decoding it to a session array. Whenever any changes are made to it, it's re-encoded to a JSON string, saved to the database, and the session array is updated to match. This works fine. My research way back when led me to this approach, but I was never quite sure whether storing a multi-dimensional array in MySQL is good practice. What I do know is that it keeps organising things quite stress-free, and I haven't noticed it causing a lot of overhead, which is not to say that it doesn't.
What I want to do now is add a string description to each Child Category in the database. Potentially to each Parent Category later but baby steps first.
I was initially going to start a third dimension for the overall array. Instead of:
"Child Category Key" : "id string"
I would change it to:
"Child Category Key" : ["id string", "description string"]
or:
"Child Category Key" : ["id string", id for description on another table]
I don't see an issue with either, but I'm wondering if I'm veering way off best practices. Should I be creating a new table for the entire category structure, rather than storing all of it as a JSON string in a column alongside other user settings? (It's never going to get too unwieldy in terms of character length.) The current structure is quite easy to get my head around, and I wouldn't necessarily jump to a solution with minimal overhead benefits if its structure makes managing the database unnecessarily complicated (keep in mind some of us aren't naturals at this and our brains process this kind of structure a little slower than others).
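To make the first variant concrete, here is a rough sketch of converting an existing structure so that every child value becomes [id string, description string]. It is Python for illustration only; `add_description_slot` is a hypothetical helper, and it accepts both the bare string form and the one-element list form that appears in the stored example.

```python
import json


def add_description_slot(categories, default=""):
    """Return a copy in which every child value is [id_string, description]."""
    upgraded = {}
    for parent, children in categories.items():
        upgraded[parent] = {}
        for child, value in children.items():
            if isinstance(value, list):
                # One-element list as stored today, or an already upgraded pair.
                id_string = value[0]
                description = value[1] if len(value) > 1 else default
            else:
                # Bare "id string" value.
                id_string, description = value, default
            upgraded[parent][child] = [id_string, description]
    return upgraded


old = json.loads(
    '{"Parent Category 2":{"Child Category 1":["83,6"],'
    '"Child Category 2":["19,74,26"]}}'
)
new = add_description_slot(old)
print(json.dumps(new, separators=(",", ":")))
```

The second variant would look the same, except that the second element would hold a numeric id pointing at a row in a separate table instead of the text itself.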
I may miss out some of the specifics needed, as I'm unsure which information is most pertinent. I can elaborate where needed. What seems the most important design requirement is that each user has unique category keys and values. They can only be in the form of parent > child > csv of ids, but each user will have custom key titles and a different number of each. The order of each is also essential.
I'm currently running on a server with an SSD, 1 GB of memory and a single 2 GHz core of a six-core Intel CPU. Requests to the database are primarily retrieving the categories on both the front end and the back end. Most users generate little traffic, so nothing has been too taxing apart from occasional spikes. I will upgrade when I see a bottleneck approaching. I'm just trying to use what I have as efficiently as possible at the moment and keep best practices in play.
Right now my table structure is in the form of (omitting other columns not relevant to the question):
Table usersettings:
+-----+----------------------+-----+
| id  | imageCategories      | ... |
+-----+----------------------+-----+
| 1   | {"Parent Category... | ... |
| 2   | {"Parent Category... | ... |
| 3   | {"Parent Category... | ... |
| ... |                      |     |
+-----+----------------------+-----+
Table users:
+-----+----------------------+---------+--------+
| id  | username             | cluster | server |
+-----+----------------------+---------+--------+
| 1   | johndoe              | 1       | 1      |
| 2   | katedoe              | 1       | 1      |
| 3   | ellendoe             | 1       | 1      |
| ... |                      |         |        |
+-----+----------------------+---------+--------+
Table descriptions_0001:
+-----+---------+---------------+-----+
| id  | title   | descriptions  | ... |
+-----+---------+---------------+-----+
| 11  | Title 1 | Description 1 | ... |
| 56  | Title 2 | Description 2 | ... |
| 78  | Title 3 | Description 3 | ... |
| ... |         |               |     |
+-----+---------+---------------+-----+
Every usersettings row has a matching row in users with the same id, so a username etc. can always be referenced from usersettings by its id. Currently I only have one database, but in an attempt to future-proof it to some degree I store descriptions in a table with an index in its name, and each user has a cluster number as well as a server number. Each user has, on average, about 100 description rows, so this comes to roughly 20,000 rows at the moment. When this creates a bottleneck I'll start a descriptions table 0002, and later a second server should it be needed. Perhaps I'm naive in my workflow, but it seems like it should help.
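A small sketch of how the table name could be derived from a user's cluster value. This assumes the cluster number maps directly onto the four-digit suffix (cluster 1 gives descriptions_0001), which is my working assumption rather than something fixed; `descriptions_table` is a hypothetical helper, shown in Python for illustration.

```python
def descriptions_table(cluster):
    """Build the descriptions table name for a cluster number.

    Assumes cluster N maps to the table suffix N, zero-padded to 4 digits.
    """
    # Table names cannot be bound as query parameters, so the value is
    # validated strictly before it is interpolated into any SQL string.
    if isinstance(cluster, bool) or not isinstance(cluster, int):
        raise ValueError("cluster must be an integer")
    if not 1 <= cluster <= 9999:
        raise ValueError("cluster must be between 1 and 9999")
    return f"descriptions_{cluster:04d}"


print(descriptions_table(1))
```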
So in summary, should I adapt my categories array to store a string description for child categories by:
1. Making each child category's value an array that holds the current id string plus an additional description string.
2. Like (1), but replacing the description string with an id number that references a string in a new table.
3. Not using a JSON-encoded array at all and moving the entire category structure into its own table.
4. Creating a table for parent categories, one for child categories and one for the csv contents, with a description column (per the conundrum above) and an order column (essential, per the design requirements above) in each. Or is there a better method of storing order than retrieving and updating the order column for each relevant row, when the table will contain unique category information for multiple users? It sounds like it may require a lot of overhead.
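A rough sketch of what the last option could look like, using Python's sqlite3 module as an in-memory stand-in for MySQL. Every table and column name here (parent_categories, child_categories, child_category_items, sort_order) is hypothetical. The point it illustrates is that a reorder only needs to touch the sibling rows under one parent, so other users' rows are never read or written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
conn.executescript("""
CREATE TABLE parent_categories (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL,
    title       TEXT    NOT NULL,
    description TEXT    NOT NULL DEFAULT '',
    sort_order  INTEGER NOT NULL
);
CREATE TABLE child_categories (
    id          INTEGER PRIMARY KEY,
    parent_id   INTEGER NOT NULL REFERENCES parent_categories(id),
    title       TEXT    NOT NULL,
    description TEXT    NOT NULL DEFAULT '',
    sort_order  INTEGER NOT NULL
);
CREATE TABLE child_category_items (
    child_id       INTEGER NOT NULL REFERENCES child_categories(id),
    description_id INTEGER NOT NULL,
    sort_order     INTEGER NOT NULL,
    PRIMARY KEY (child_id, description_id)
);
INSERT INTO parent_categories (id, user_id, title, sort_order)
VALUES (1, 1, 'Parent Category 1', 0);
INSERT INTO child_categories (id, parent_id, title, sort_order)
VALUES (1, 1, 'Child Category 1', 0),
       (2, 1, 'Child Category 2', 1),
       (3, 1, 'Child Category 3', 2);
""")


def reorder_children(conn, parent_id, ordered_child_ids):
    """Rewrite sort_order for the children of one parent, in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "UPDATE child_categories SET sort_order = ? "
            "WHERE id = ? AND parent_id = ?",
            [(pos, cid, parent_id) for pos, cid in enumerate(ordered_child_ids)],
        )


# Move 'Child Category 3' to the front.
reorder_children(conn, 1, [3, 1, 2])
titles = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM child_categories WHERE parent_id = ? ORDER BY sort_order",
        (1,),
    )
]
print(titles)
```

With a handful of children per parent, a reorder is a handful of single-row updates, so the overhead scales with the size of one category rather than with the size of the table.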
I ended up going for a solution somewhat similar to (4). I also better appreciate the importance of describing the design requirements now, as what led me to this decision was the realisation that working with select levels of the hierarchy at a time is (I believe) more efficient to process and simpler to comprehend.
For example, if I'm dealing with all descriptions under parent category 2, child category 1, I just fetch or insert all descriptions in a description table with a shared identifier, rather than dealing with a multidimensional array that contains all hierarchies. The latter made organising users in the db easier, but the categorisation was becoming large enough that I decided it did warrant separate tables for each level of the hierarchy. There are enough situations where I'm working with only an isolated level of the categorisation hierarchy that putting the entire categorisation into a single multidimensional array felt like the poorer choice.
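As an illustration of fetching one isolated level, here is a sketch that pulls every description under a single child category in its stored order. It uses Python's sqlite3 module as a stand-in for MySQL; the child_category_items link table, its columns and the `descriptions_for_child` helper are hypothetical, while descriptions_0001 follows the layout shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
conn.executescript("""
CREATE TABLE descriptions_0001 (
    id INTEGER PRIMARY KEY, title TEXT, descriptions TEXT
);
-- Hypothetical link table: one row per id that used to live in the CSV string.
CREATE TABLE child_category_items (
    child_id INTEGER, description_id INTEGER, sort_order INTEGER
);
INSERT INTO descriptions_0001 VALUES
    (11, 'Title 1', 'Description 1'),
    (56, 'Title 2', 'Description 2'),
    (78, 'Title 3', 'Description 3');
INSERT INTO child_category_items VALUES (5, 78, 0), (5, 11, 1), (6, 56, 0);
""")


def descriptions_for_child(conn, child_id):
    """Return (id, title, descriptions) rows for one child category, in order."""
    return conn.execute(
        """
        SELECT d.id, d.title, d.descriptions
        FROM child_category_items AS i
        JOIN descriptions_0001 AS d ON d.id = i.description_id
        WHERE i.child_id = ?
        ORDER BY i.sort_order
        """,
        (child_id,),
    ).fetchall()


rows = descriptions_for_child(conn, 5)
print(rows)
```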
In terms of overhead difference, I'm unsure for now. There's less sorting of arrays happening in PHP to isolate the data I need, but there are far more calls to the db.
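One way the number of calls might be kept down when the whole tree is needed (at login, say) is a single JOIN that returns parents and children together, already ordered. This is only a sketch, with Python's sqlite3 module standing in for MySQL and hypothetical table and column names; it also assumes titles are unique within a level for a given user, as they already are when used as JSON keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
conn.executescript("""
CREATE TABLE parent_categories (
    id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT, sort_order INTEGER
);
CREATE TABLE child_categories (
    id INTEGER PRIMARY KEY, parent_id INTEGER, title TEXT,
    description TEXT, sort_order INTEGER
);
INSERT INTO parent_categories VALUES
    (1, 1, 'Parent Category 1', 0),
    (2, 1, 'Parent Category 2', 1),
    (3, 2, 'Another user''s parent', 0);
INSERT INTO child_categories VALUES
    (1, 1, 'Child Category 2', 'second', 1),
    (2, 1, 'Child Category 1', 'first', 0),
    (3, 2, 'Child Category 1', '', 0);
""")


def load_tree(conn, user_id):
    """Build {parent title: {child title: description}} with one query."""
    rows = conn.execute(
        """
        SELECT p.title, c.title, c.description
        FROM parent_categories AS p
        LEFT JOIN child_categories AS c ON c.parent_id = p.id
        WHERE p.user_id = ?
        ORDER BY p.sort_order, c.sort_order
        """,
        (user_id,),
    )
    tree = {}
    for parent_title, child_title, description in rows:
        children = tree.setdefault(parent_title, {})
        if child_title is not None:  # parents with no children still appear
            children[child_title] = description
    return tree


tree = load_tree(conn, 1)
print(tree)
```

The result can be cached in the session in the same way as the decoded JSON is now, so the per-level queries are only needed when a single level changes.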
My hesitation in understanding the design requirements (and still not giving a thorough answer on this) is that I'm new to large user databases and am not good at forecasting the needs. I'm designing it in such a way that it feels scalable to me and so, again, a table for each level of the hierarchy feels the least cumbersome (after the cumbersome setup: I'm currently redoing tonnes of code to make functions work with the new setup) and more scalable as needs change.