I’m working on a web service for an API that provides a feed of posts. Right now the posts are organized chronologically, and I paginate with opaque before and after tokens, which are essentially timestamps. However, we want to move from a chronological feed to an algorithmic one. While I can calculate the post scores and send the first page of data, I’m not sure how to paginate relative to that. I suppose I could snapshot the ranking, bundle up something like 200 sorted post IDs, and serialize them into an HMAC-signed blob for the tokens, but that is nontrivial overhead on every request. Is there a better way to handle this kind of pagination?
If you can store the post score in the database, you can put an index on it and read it quickly. The top pages will be fast either way. For deep pagination by rating, though, the standard approach of order by rating desc limit 50 offset 10000 will be slow, because the database still has to walk past all the skipped rows. Instead, pick a second sort field, e.g. timestamp, to break ties: if several posts have the same rating, which one should be on top? Add that field to the sort index and query the DB with something like where rating < ... or (rating = ... and timestamp < ...) order by rating desc, timestamp desc, filling in the values from the last row of the previous page.
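To make the idea concrete, here is a minimal sketch of that kind of cursor-based paging using Python's built-in sqlite3 module. The table name posts, the columns id, rating and ts, and the helpers make_db and fetch_page are all made up for illustration, and the sketch assumes timestamps do not collide within a rating; if they can, a unique column such as the post id would have to be added as a third sort key.

```python
import sqlite3


def make_db():
    """Build a small in-memory table (hypothetical schema) with sample posts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, rating INTEGER, ts INTEGER)"
    )
    # Composite index that matches the sort order used for paging.
    conn.execute("CREATE INDEX posts_rating_ts ON posts (rating DESC, ts DESC)")
    rows = [(i, i % 5, 1000 + i) for i in range(1, 21)]
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", rows)
    return conn


def fetch_page(conn, page_size, cursor=None):
    """Return (rows, next_cursor).

    cursor is the (rating, ts) pair of the last row of the previous page,
    or None for the first page.
    """
    if cursor is None:
        sql = (
            "SELECT id, rating, ts FROM posts "
            "ORDER BY rating DESC, ts DESC LIMIT ?"
        )
        params = (page_size,)
    else:
        rating, ts = cursor
        # Strictly "after" the cursor in (rating DESC, ts DESC) order.
        sql = (
            "SELECT id, rating, ts FROM posts "
            "WHERE rating < ? OR (rating = ? AND ts < ?) "
            "ORDER BY rating DESC, ts DESC LIMIT ?"
        )
        params = (rating, rating, ts, page_size)
    rows = conn.execute(sql, params).fetchall()
    next_cursor = (rows[-1][1], rows[-1][2]) if rows else None
    return rows, next_cursor
```

The (rating, ts) pair returned as next_cursor is what you would serialize into the opaque token. Note that this only gives stable pages while the ratings themselves stay put between requests.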
If you recalculate the rating often, I recommend storing it in a separate table with columns like post_id, rating. Query that table for the post_ids of the page you need and then join posts on them; that should be faster than walking through the whole posts table.
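As a rough sketch of that layout, again with Python's sqlite3 module: the names post_ratings, posts, body, update_rating and top_posts below are assumptions for illustration only. The narrow ratings table is sorted and limited first, and only the selected ids are joined back to posts.

```python
import sqlite3


def make_db():
    """Posts plus a separate, narrow ratings table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT);
        CREATE TABLE post_ratings (
            post_id INTEGER PRIMARY KEY REFERENCES posts(id),
            rating REAL
        );
        CREATE INDEX post_ratings_by_rating
            ON post_ratings (rating DESC, post_id DESC);
        """
    )
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?)",
        [(i, f"post {i}") for i in range(1, 6)],
    )
    conn.executemany(
        "INSERT INTO post_ratings VALUES (?, ?)",
        [(1, 0.2), (2, 0.9), (3, 0.5), (4, 0.7), (5, 0.1)],
    )
    return conn


def update_rating(conn, post_id, rating):
    """Recalculating a score only touches the small ratings table."""
    conn.execute(
        "UPDATE post_ratings SET rating = ? WHERE post_id = ?",
        (rating, post_id),
    )


def top_posts(conn, limit):
    """Pick ids from the ratings table first, then join posts on them."""
    return conn.execute(
        "SELECT p.id, p.body, r.rating FROM "
        "(SELECT post_id, rating FROM post_ratings "
        " ORDER BY rating DESC, post_id DESC LIMIT ?) AS r "
        "JOIN posts AS p ON p.id = r.post_id "
        "ORDER BY r.rating DESC, r.post_id DESC",
        (limit,),
    ).fetchall()
```

With this split, frequent score updates rewrite only small rows and one index, while the wide posts rows are read only for the ids that actually appear on the page.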