MySQL管理目录视图

A friend of mine has a catalogue that currently holds about 500 rows or 500 items. We are looking at ways that we can provide reports on the catalogue inclduing the number of times an item was viewed, and dates for when its viewed.

His site is averaging around 25,000 page impressions per month and if we assumed for a minute that half of these were catalogue items then we'd assume roughly 12,000 catalogue items viewed each month.

My question is the best way to manage item views in the database.

First option is to insert the catalogue ID into a table and then increment the number of times its viewed. The advantage of this is its compact nature. There will only ever be as many rows in the table as there are catalogue items.

`catalogue_id`, `views`

The disadvantage is that no date information is being held, short of maintaining the last time an item was viewed.

The second option is to insert a new row each time an item is viewed.

`catalogue_id`, `timestamp`

If we continue with the assumed figure of 12,000 item views that means adding 12,000 rows to the table each month, or 144,000 rows each year. The advantage of this is we know the number of times the item is viewed, and also the dates for when its viewed.

The disadvantage is the size of the table. Is a table with 144,000 rows becoming too large for MySQL?

Interested to hear any thoughts or suggestions on how to achieve this.

Thanks.

As you have mentioned the first is a lot more compact but limited. However if you look at option 2 in more detail; for example if you wish to store more than just view count, for instance entry/ exit page, host ip ect. This information maybe invaluable for stats and tracking. The other question is are these 25,000 impressions unique? If not you are able to track by username, ip or some other unique identifier, this could enable you to not use as many rows. The answer to your question relies on how much detail you wish to store? and what is the importance of the data?

Update:

True, limiting the repeats on a given item due to a time interval would be a good solution. Also knowing if someone visited the same item could be useful for suggested items perdition widgets similar to what amazon does. Also knowing that someone visited an item many times says to me that this is a good item to promote to them or others in a mail-out, newsletter or popular product page. Tracking unique views will give a more honest view count, which you can choose to display or store. On the issue of limiting the value of repeat visitors, this mainly only comes into play depending on what information you display. It is all about framing the information in the way that best suits you.

Your problem statement: We want to be able to track number of views for a particular catalogue item.

Lets review you options.

First Option:

In this option you will be storing the catalogue_id and a integer value of the number of views of the items.

Advantages:

  1. Well since you really have a one to one relationship the new table is going to be small. If you have 500 items you will have 500 hundred rows. I would suggest if you choose this route not to create a new table but add another column to the catalogue table with the number of views on it.

Disadvantages:

  1. The problem here is that since you are going to be updating this table relatively frequently it is going to be a very busy little table. For example 10 users are viewing the same item. These 10 updates will have to run one after the other. Assuming you are using InnoDB the first view action would come in lock the row update the counter release the lock. The other updates would queue behind it. So while the data is small on the table it could potentially become a bottleneck later on especially if you start scaling the system.

  2. You are loosing granular data i.e. you are not keeping track of the raw data. For example lets say the website starts growing and you have a interested investor they want to see a breakdown of the views per week over the last 6 months. If you use this option you wont have the data to provide to the investor. Essentially you are keeping a summary.

Second Option:

In this option you would create a logging table with at least the following minimal fields catalogue_id and timestamp. You could expand this to add a username/ip address or some other information to make it even more granular.

Advantages:

  1. You are keeping granular data. This will allow you to summarise the data in a variety of ways. You could for example add a ip address column store the visitors IP and then do a monthly report showing you products viewed by country(you could do a IP address lookup to get a idea of which country they were from). Another example would be to see over the last quarter which products was viewed the most etc. This data is pretty essential in helping you make decisions on how to grow you business. If you want to know what is working what is not working as far as products are concerned this detail is absolutely critical.

  2. Your new table will be a logging table. It will only be insert operations. Inserts can pretty much happen in parallel. If you go with this option it will probably scale better as the site grows compared to a constantly updated table.

Disadvantages:

  1. This table will be bigger probably the biggest table in the database. However this is not a problem. I regularly deal with 500 000 000 rows+ tables. Some of my tables are over 750GB by themselves and I can still run reporting on it. You just need to understand your queries and how to optimise them. This is really not a problem as MySQL was designed to handle millions of rows with ease. Just keep in mind you could archive some information into other tables. Say you archive the data every 3 years you could move data older than 3 years into another table. You dont have to keep all the data there. Your estimate of 144 000 rows means you could probably safely keep about 15+ years worth without every worrying about the performance of the table.

My suggestion to you is to serious consider the second option. If you decide to go this route update your question with the proposed table structures and let us have a look at it. Don't be scared of big data rather be scared of BAD design it is much more difficult to deal with.

However as always the choice is yours.