I have two tables... the first is a page list with (page_id, page_title) rows. The second is a list of items ON those pages each with a price (item_id, page_id, item_title, item_price).
I'd like to grab the top three items from each page (ordered by highest item_price first) with the page having the cumulatively highest price ordered first. This is quite beyond my MYSQL abilities and I'm looking for advice on how to make this the most efficient! :) Thanks!
You could do this a few different ways. What I would do is run one query that says "get me all pages ordered by the sum total of their items" then loop through them in php, and for each one, do a "get me the top 3 items for the current page".
Make sense?
Query one (untested, written on my phone):
SELECT p.page_name, (SELECT SUM(item_price) FROM items WHERE page_id = p.page_id) AS cumulative_price FROM pages p ORDER BY cumulative_price DESC;
Query two (also untested) looping through results of query one:
SELECT * FROM items WHERE page_id = '$currentPageId' ORDER BY item_price DESC LIMIT 3;
I'm just going to call the second table "table2" and the item_id I'm looking up "SOME_ITEM_ID"
SELECT * FROM `table2` WHERE `item_id` = 'SOME_ITEM_ID' ORDER BY `item_price` DESC;
In English what this is saying is:
Select everything from table2 where the item_id is this, and order the list by item_price in descending order
This SQL statement will return every hit, but you would just output the first three in your code.
My gut tells me that there is probably no faster way to do this than to do a for-loop in the application over all of the pages, doing a little select item_price from item where page_id = ? order by item_price desc limit 3
for each paqe, and possibly sticking the results in something like memcached so you aren't taxing your database too much.
But I like a challenge, so i'll attempt it anyhow.
SELECT p1.*, i1.*,
(SELECT count(*)
FROM items i2
WHERE i1.item_price < i2.item_price
AND p1.page_id = i2.page_id) price_rank,
FROM pages p1
LEFT OUTER JOIN items i1
ON p1.page_id = i1.page_id
WHERE price_rank < 3;
That odd sub-select is going to probably do an awful lot of work for every single row in items
. many other RDBMses have a feature called window functions that can do the above much more elegantly. For instance, if you were using PostgreSQL, you could have written:
SELECT p1.*, i1.*,
RANK(i1.item_price)
OVER (PARTITION BY p1.page_id
ORDER BY i1.item_price DESC) price_rank
FROM pages p1
LEFT OUTER JOIN items i1
ON p1.page_id = i1.page_id
WHERE price_rank <= 3;
And the planner would have arranged to visit rows in an order that results in the rank occurring properly.