I have a large database of users and their details, and I want to find the users who are most similar to each other. Which approach is preferable for the comparison: writing SQL queries that do the work inside the database and return only the filtered (matched) records, or first retrieving the raw data into a PHP array and then running the comparison on that array? Let's say I have 1000 users and I want to find the people with the most mutual friends.
UserA has {a,b,r,c,g,h,r,q,l}
UserB has {x,y,z}
UserC has {a,c,r,g,q}
.
.
.
UserN has {x,y,...,n}
I want to find the users who share the most mutual friends with UserA. Does that mean comparing each element of UserA's array against every element of every other user's array in the whole database? In the example above, UserC should be the user most similar to UserA. I don't know how to accomplish this.
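For comparison, the "load everything, then compare arrays" approach described in the question boils down to counting set intersections. The sketch below is written in Python rather than PHP: Python sets play the role of PHP arrays and the `&` operator plays the role of something like PHP's `array_intersect`. The function name `most_similar` is hypothetical, and the data is the sample from the question.

```python
"""Rank users by how many friends they share with one target user.

In-application sketch of the "load everything, then compare arrays" approach;
Python sets stand in for PHP arrays.
"""
from __future__ import annotations


def most_similar(friends: dict[str, set[str]], target: str) -> list[tuple[str, int]]:
    """Return (user, mutual_friend_count) pairs for every other user, best first."""
    target_friends = friends[target]
    scores = [
        (user, len(target_friends & their_friends))  # size of the set intersection
        for user, their_friends in friends.items()
        if user != target
    ]
    # Highest count first; ties broken by user id so the order is deterministic.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores


if __name__ == "__main__":
    # Sample data from the question; UserA's duplicate "r" collapses in a set.
    friends = {
        "UserA": {"a", "b", "r", "c", "g", "h", "r", "q", "l"},
        "UserB": {"x", "y", "z"},
        "UserC": {"a", "c", "r", "g", "q"},
    }
    print(most_similar(friends, "UserA"))  # → [('UserC', 5), ('UserB', 0)]
```

Note that this compares the target against every other user, so every user's friend list has to be loaded out of the database before any comparison can happen.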
Suppose you represent friendships in a table like this:
Table: friendships
from_id | to_id
----------------------------
a | a
a | b
a | r
a | c
...etc...
b | x
b | y
b | z
...etc...
Now you can write a query to answer your question (tested in MySQL):
SELECT f2.from_id AS user_id_with_most_common_friends,
       COUNT(*) AS number_of_common_friends
FROM friendships f1
JOIN friendships f2 ON (f1.to_id = f2.to_id AND f2.from_id <> 'a')
WHERE f1.from_id = 'a'
GROUP BY f2.from_id
-- Rank users by their count and keep the top one; ties are broken arbitrarily.
ORDER BY number_of_common_friends DESC
LIMIT 1;
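A minimal way to try the grouped self-join without a MySQL server is Python's built-in `sqlite3` module with an in-memory database. This is a sketch under stated assumptions: SQLite stands in for MySQL, UserA, UserB and UserC are assumed to have the ids `a`, `b` and `c`, the helper names `build_sample_db` and `most_common_friends` are hypothetical, the target id is passed as a query parameter instead of the literal `'a'`, and the per-user counts are ranked with `ORDER BY ... LIMIT`.

```python
"""Run the mutual-friends self-join on the question's sample data.

SQLite stands in for MySQL; the query uses only a self-join, GROUP BY,
ORDER BY and LIMIT, which both engines support.
"""
from __future__ import annotations

import sqlite3

# Sample data from the question, keyed by assumed ids:
# "a" = UserA (duplicate "r" dropped), "b" = UserB, "c" = UserC.
SAMPLE = {
    "a": ["a", "b", "r", "c", "g", "h", "q", "l"],
    "b": ["x", "y", "z"],
    "c": ["a", "c", "r", "g", "q"],
}


def build_sample_db() -> sqlite3.Connection:
    """Create the friendships table and load one row per (user, friend) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE friendships (from_id TEXT NOT NULL, to_id TEXT NOT NULL)")
    rows = [(user, friend) for user, friends in SAMPLE.items() for friend in friends]
    conn.executemany("INSERT INTO friendships (from_id, to_id) VALUES (?, ?)", rows)
    return conn


def most_common_friends(conn: sqlite3.Connection, user_id: str, limit: int = 1) -> list[tuple[str, int]]:
    """Return up to `limit` (other_user, common_friend_count) rows, best first."""
    sql = """
        SELECT f2.from_id, COUNT(*) AS cnt
        FROM friendships f1
        JOIN friendships f2
          ON f1.to_id = f2.to_id AND f2.from_id <> f1.from_id
        WHERE f1.from_id = ?
        GROUP BY f2.from_id
        ORDER BY cnt DESC, f2.from_id
        LIMIT ?
    """
    return conn.execute(sql, (user_id, limit)).fetchall()


if __name__ == "__main__":
    conn = build_sample_db()
    print(most_common_friends(conn, "a"))  # → [('c', 5)]
```

Because the self-join is an inner join, users who share no friends with the target (UserB here) never appear in the result rather than showing up with a count of 0. The secondary sort on `f2.from_id` only makes ties deterministic.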
As to the performance question, empirical testing in your specific situation will give the most reliable answer, but for a large database I would expect using the above SQL query to be much faster than querying every row and calculating the results in PHP, for these reasons:
Unless your PHP code has something up its sleeve to compensate for these disadvantages (like an algorithm with lower complexity that cannot be expressed in SQL, or calling out to specialized C code, e.g. for image processing), it's typically going to be faster to keep the work inside the RDBMS as much as possible.
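One rough way to do that empirical testing is to time both approaches on the same data and check that they agree. The harness below is a hypothetical sketch: SQLite and Python stand in for MySQL and PHP, the friendships are randomly generated integer ids, and the function and index names (`make_db`, `counts_in_sql`, `counts_in_app`, `idx_from`, `idx_to`) are illustrative. It prints timings rather than asserting any, since the numbers depend entirely on the machine and the data.

```python
"""Compare counting mutual friends in SQL with counting them in application code.

Hypothetical benchmark sketch: SQLite stands in for MySQL, Python for PHP,
and the friendships are random, so timings are only indicative.
"""
from __future__ import annotations

import random
import sqlite3
import time
from collections import defaultdict

# Per-user mutual-friend counts for one target user, computed by the database.
QUERY = """
    SELECT f2.from_id, COUNT(*) AS cnt
    FROM friendships f1
    JOIN friendships f2 ON f1.to_id = f2.to_id AND f2.from_id <> f1.from_id
    WHERE f1.from_id = ?
    GROUP BY f2.from_id
"""


def make_db(n_users: int, friends_per_user: int, seed: int = 0) -> sqlite3.Connection:
    """Build an in-memory friendships table with random, distinct friends per user."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE friendships (from_id INTEGER NOT NULL, to_id INTEGER NOT NULL)")
    rows = [
        (user, friend)
        for user in range(n_users)
        for friend in rng.sample(range(n_users), friends_per_user)
    ]
    conn.executemany("INSERT INTO friendships (from_id, to_id) VALUES (?, ?)", rows)
    # Composite indexes so each side of the self-join can be looked up quickly.
    conn.execute("CREATE INDEX idx_from ON friendships (from_id, to_id)")
    conn.execute("CREATE INDEX idx_to ON friendships (to_id, from_id)")
    return conn


def counts_in_sql(conn: sqlite3.Connection, user_id: int) -> dict[int, int]:
    """Let the database join, group and count."""
    return dict(conn.execute(QUERY, (user_id,)).fetchall())


def counts_in_app(conn: sqlite3.Connection, user_id: int) -> dict[int, int]:
    """Fetch every row, then intersect friend sets in application code."""
    friends: dict[int, set[int]] = defaultdict(set)
    for from_id, to_id in conn.execute("SELECT from_id, to_id FROM friendships"):
        friends[from_id].add(to_id)
    target = friends[user_id]
    counts = {}
    for other, theirs in friends.items():
        common = len(target & theirs)
        # Mirror the inner join: skip the target itself and users with no overlap.
        if other != user_id and common:
            counts[other] = common
    return counts


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    conn = make_db(n_users=1000, friends_per_user=50)
    sql_result, sql_secs = timed(counts_in_sql, conn, 0)
    app_result, app_secs = timed(counts_in_app, conn, 0)
    assert sql_result == app_result, "both approaches should agree"
    print(f"SQL: {sql_secs:.4f}s  app: {app_secs:.4f}s  matches: {len(sql_result)}")
```

Because the SQLite database runs in-process, this understates one cost of the PHP approach against a real MySQL server: every row also has to travel over the connection to the application before any comparison starts.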