寻找存储/缓存计数的推荐方法[关闭]

I'm building a website using php/mysql where there will be Posts and Comments.

Posts need to show number of comments they have. I have count_comments column in Posts table and update it every time comment is created or deleted.

Someone recently advised me that denormalazing this way is a bad idea and I should be using caching instead.

My take is: You are doing the right thing. Here is why:

See the field count_comments as not being part of your data model - this is easily provable, you can delete all contents of this field and it is trivial to recreate it.

Instead see it as a cache, the storage of which is just co-located with the post - perfectly smart, as you get it for free whenever you have to query for the post(s)

I have no idea what was meant by "Caching" and I'll be interested in some other answer that the one I have to offer:

Remove redundant information from your database is important and, in a "Believer way" (means that I didn't really test it, its merely speculative), I think that using SUM() function from your database is a better way to go for it. Assuming that all your comments has a post_id, all you need is something like:

SELECT SUM(id) FROM comments WHERE id = {post_id_variation_here}

That way, you reduce 1 constant CRUD happening just to read how much comments there are and increase performance.

I do not think this is a bad approach.

One thing i do recognize is that its very easy to introduce side effects as code base is expanded by having a more rigid approach. The nice part is at some point the amount of rows in the database will have to be calculated or kept track of, there is not really a way of getting out of this.

I would not advise against this. There are other solutions to getting comment counts. Check out Which is fastest? SELECT SQL_CALC_FOUND_ROWS FROM `table`, or SELECT COUNT(*)

The solution is slower upon selects, but requires less code to keep track of comment count.


I will say that your approach avoids LIMIT DE-optimization, which is a plus.

Unless you haven't hundreds or thousands of hits per seconds on your application there's nothing wrong about using a SQL statement like this:

select posts_field1, ..., (select count(*) from comments where comments_parent = posts_id) as commentNumber from posts

you can go with caching the html output of your page anyway. than no database query has to be done at all.

Maby you could connect the post and comment tables to each other and count the comments rows in mysql with the mysql function: mysql_num_rows. Like so:

Post table

postid*
postcontent

Comment table

commentid
postid*
comment

And then count the comments in mysql like:

$link = mysql_connect("localhost", "mysql_user", "mysql_password"); 
mysql_select_db("database", $link);

$result = mysql_query("SELECT * FROM commenttable WHERE postid = '1'", $link); 
$num_rows = mysql_num_rows($result);

This is an optimization that is almost never needed for two reasons:

1) Proper indexing will make simple counts extremely fast. Ensure that your comments.post_id column has an index.

2) By the time you need to cache this value, you will need to cache much more. If your site has so many posts, comments, users and traffic that you need to cache the comments total, then you will almost definitely need to be employing caching strategies for much of your data/output (saving built pages to static, memcache, etc.). Those strategies will, no doubt, encompass your comments total, making the table field approach moot.