I am trying to figure out retained users for our application using the login history we keep for every user.
I want to know users who registered in month 1, then came back in month 2, then also came back in month 3 and so on.....
A single query is simple:
Query 1
SELECT DISTINCT player_id
FROM login_history
WHERE timestamp BETWEEN UNIX_TIMESTAMP('2014-01-01 00:00:00')
AND (UNIX_TIMESTAMP('2014-02-01 00:00:00')-1);
The above gives me unique logins for January
, simple.
Query 2
SELECT DISTINCT player_id
FROM login_history
WHERE timestamp BETWEEN UNIX_TIMESTAMP('2014-02-01 00:00:00')
AND (UNIX_TIMESTAMP('2014-03-01 00:00:00')-1);
The above now gives me the unique logins for February
.
How can I join the two, or more, to give me the player_id's common to all results?
I have tried using IN statements but that is hugely slow..
Any suggestions would be appreciated
Thanks
Use group by
and having
instead of distinct
:
SELECT player_id
FROM login_history
GROUP BY player_id
HAVING SUM(timestamp BETWEEN UNIX_TIMESTAMP('2014-01-01 00:00:00') AND (UNIX_TIMESTAMP('2014-02-01 00:00:00')-1)) > 0 and
SUM(timestamp BETWEEN UNIX_TIMESTAMP('2014-02-01 00:00:00') AND (UNIX_TIMESTAMP('2014-03-01 00:00:00')-1)) > 0;
For efficiency, you should still include the where
:
SELECT player_id
FROM login_history
WHERE timestamp BETWEEN UNIX_TIMESTAMP('2014-01-01 00:00:00') AND (UNIX_TIMESTAMP('2014-03-01 00:00:00')-1)
GROUP BY player_id
HAVING SUM(timestamp BETWEEN UNIX_TIMESTAMP('2014-01-01 00:00:00') AND (UNIX_TIMESTAMP('2014-02-01 00:00:00')-1)) > 0 and
SUM(timestamp BETWEEN UNIX_TIMESTAMP('2014-02-01 00:00:00') AND (UNIX_TIMESTAMP('2014-03-01 00:00:00')-1)) > 0;
maybe it is not the best answer but, you just need to put another condition and it is done.
SELECT DISTINCT
player_id
FROM login_history
WHERE (timestamp BETWEEN UNIX_TIMESTAMP('2014-01-01 00:00:00')
AND (UNIX_TIMESTAMP('2014-02-01 00:00:00') - 1))
OR (timestamp BETWEEN UNIX_TIMESTAMP('2014-02-01 00:00:00')
AND (UNIX_TIMESTAMP('2014-03-01 00:00:00') - 1));
Another solution could be use the same table again with an alias
SELECT DISTINCT lh1.player_id
FROM login_history lh1 join login_history lh2 on lh1.player_id = lh2.player_id
WHERE lh1.timestamp BETWEEN UNIX_TIMESTAMP('2014-01-01 00:00:00')
AND (UNIX_TIMESTAMP('2014-02-01 00:00:00')-1)
AND lh2.timestamp BETWEEN UNIX_TIMESTAMP('2014-02-01 00:00:00')
AND (UNIX_TIMESTAMP('2014-03-01 00:00:00')-1);
If the number of rows is too high you can use subqueries
SELECT DISTINCT lh1.player_id
FROM
(Select distinct player_id from login_history
WHERE lh1.timestamp BETWEEN UNIX_TIMESTAMP('2014-01-01 00:00:00')
AND (UNIX_TIMESTAMP('2014-02-01 00:00:00')-1) )lh1
JOIN
(Select distinct player_id from login_history
WHERE lh2.timestamp BETWEEN UNIX_TIMESTAMP('2014-02-01 00:00:00')
AND (UNIX_TIMESTAMP('2014-03-01 00:00:00')-1) )lh2 on lh1.player_id = lh2.player_id;
Hopes it helps. Regards.