When dealing with a table of at least 1 million rows, which is better in terms of performance:
selecting the whole result set, e.g. SELECT * FROM tbl,
then paginating it in PHP using array_chunk()
or array_slice(),
or
selecting only one page of rows per request, e.g. SELECT * FROM tbl LIMIT x?
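Roughly, this is what I mean by the two approaches (a PDO sketch; the connection details are placeholders):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

// Approach 1: fetch everything, then paginate in PHP.
$all   = $pdo->query('SELECT * FROM tbl')->fetchAll(PDO::FETCH_ASSOC);
$pages = array_chunk($all, 25);          // 25 rows per page
$page1 = $pages[0] ?? [];

// Approach 2: fetch only one page at a time with LIMIT.
$stmt = $pdo->prepare('SELECT * FROM tbl LIMIT ?, ?');
$stmt->bindValue(1, 0,  PDO::PARAM_INT); // offset
$stmt->bindValue(2, 25, PDO::PARAM_INT); // rows per page
$stmt->execute();
$page1 = $stmt->fetchAll(PDO::FETCH_ASSOC);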
It is better to use LIMIT. Think about it: the first approach fetches all 1,000,000 rows even if the user only ever looks at one page, whereas LIMIT fetches only your set number of rows each time.
You will then want to make sure your offsets are set correctly to get the next set of rows from the table, as in the sketch below.
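A minimal offset-pagination sketch with PDO (the connection details and the ORDER BY column are assumptions; an ORDER BY is needed so that pages are stable):

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

$page     = 3;                         // 1-based page number
$pageSize = 25;
$offset   = ($page - 1) * $pageSize;

$stmt = $pdo->prepare('SELECT * FROM tbl ORDER BY id LIMIT ?, ?');
$stmt->bindValue(1, $offset,   PDO::PARAM_INT);
$stmt->bindValue(2, $pageSize, PDO::PARAM_INT);
$stmt->execute();
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);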
I think it depends. You can keep the whole result set in memory using memcache if your table is not too big, which avoids the disk reads that are the time-consuming part. But since you don't know whether your user will actually look at many pages, it is usually better to limit the result with SQL.
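As an illustration of that cache idea, a sketch using the Memcached extension plus array_slice(); the key name, TTL, and connection details are my own assumptions:

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$rows = $mc->get('tbl:all');
if ($rows === false) {                   // cache miss: hit the DB once
    $pdo  = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
    $rows = $pdo->query('SELECT * FROM tbl ORDER BY id')->fetchAll(PDO::FETCH_ASSOC);
    $mc->set('tbl:all', $rows, 300);     // cache the whole result set for 5 minutes
}

$page     = 2;
$pageSize = 25;
$pageRows = array_slice($rows, ($page - 1) * $pageSize, $pageSize);

Note that this only works while the whole result set fits in a cache item, which is exactly the "if your table is not too big" caveat above.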
It depends.
Yes, you can query the DB directly.
Or use some kind of search engine like Elasticsearch: don't query the DB at all, just populate the search engine.
Or use some kind of cache like Redis/memcached.
It really depends on your needs.
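For the cache option, one possible sketch with phpredis, caching one page of results per key (the key scheme, TTL, and page size are assumptions, not a prescribed design):

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$page = 2;
$key  = "tbl:page:$page";

$json = $redis->get($key);
if ($json === false) {                   // cache miss: query the DB with LIMIT
    $pdo  = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
    $stmt = $pdo->prepare('SELECT * FROM tbl ORDER BY id LIMIT ?, ?');
    $stmt->bindValue(1, ($page - 1) * 25, PDO::PARAM_INT);
    $stmt->bindValue(2, 25, PDO::PARAM_INT);
    $stmt->execute();
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    $redis->setex($key, 60, json_encode($rows));   // cache this page for 60 s
} else {
    $rows = json_decode($json, true);
}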
The best method will depend on your context. If you choose to use the database directly, beware of this issue:
The naive LIMIT method will give you problems once you get into later pages. ORDER BY some_key LIMIT offset, page_size works like this: walk through the key in order, throw away the first offset records, then return page_size records. So offset + page_size records are examined; if offset is high, you have a problem.
Better: remember the last key value of the current page. When fetching the next page, use it like this:
SELECT * FROM tbl WHERE the_key > $last_key ORDER BY the_key ASC LIMIT $page_size
If your key is not unique, make it unique by adding an extra unique ID column at the end of the ORDER BY, and compare on the (key, id) pair in the WHERE clause.
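Here is a minimal PHP sketch of this keyset approach with PDO, assuming the_key is an indexed integer column and $lastKey holds the key value of the last row of the current page (connection details are placeholders):

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

$pageSize = 25;
$lastKey  = 1337;   // taken from the last row of the previous page

$stmt = $pdo->prepare('SELECT * FROM tbl WHERE the_key > ? ORDER BY the_key ASC LIMIT ?');
$stmt->bindValue(1, $lastKey,  PDO::PARAM_INT);
$stmt->bindValue(2, $pageSize, PDO::PARAM_INT);
$stmt->execute();
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

// Remember the boundary for the page after this one.
$lastKey = $rows ? end($rows)['the_key'] : $lastKey;

Because the WHERE clause seeks directly into the index, each page costs roughly the same no matter how deep you are, unlike the offset version.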
It REALLY depends on context.
In general you want to make heavy use of indexes to select the content you want out of a large dataset quickly. It can also be faster to paginate in the programming language than to issue a new query per page, because the database is often the bottleneck. We had to do it this way for an application that served hundreds of queries a minute: hits to the database needed to be capped, so we returned datasets that we knew would probably not need another trip to the DB, around 100 results, and then paginated by 25 in the application.
In general: index and narrow your results with those indexes, and if performance is key with lots of activity on the DB, tune your DB and your code to decrease I/O and DB hits by paginating in the application. You'll know why when your server is bleeding with a load of 12 and your I/O is showing 20% utilization. You'll need to hit the operating table, stat!
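As a sketch of that pattern (the 100 and 25 figures are from above; the filter column and connection details are my own placeholders), one indexed query fetches a 100-row window and the application serves four 25-row pages from it without touching the DB again:

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

$windowSize = 100;  // rows fetched per DB hit
$pageSize   = 25;   // rows per page shown to the user

$stmt = $pdo->prepare('SELECT * FROM tbl WHERE category_id = ? ORDER BY id LIMIT ?');
$stmt->bindValue(1, 42, PDO::PARAM_INT);          // hypothetical indexed filter
$stmt->bindValue(2, $windowSize, PDO::PARAM_INT);
$stmt->execute();
$window = $stmt->fetchAll(PDO::FETCH_ASSOC);

$pages = array_chunk($window, $pageSize);          // 4 pages per DB hit
// $pages[0] .. $pages[3] can now be served without another query.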