选择列表中值的第一个外观(DISTINCT / GROUP BY)

I have a query, which using two JOINs, returns me a list in this format:

unique_id | non_unique_id | timestamp

The full list is big (thousands of rows), the result of the query is only a few dozens of rows, as the query has WHERE timestamp >= 'some timestamp in the past'

So now, I have the list like this:

89 | 286 | 1406219705
87 | 286 | 1406219518
79 | 922 | 1406216949
78 | 228 | 1406216871
77 | 126 | 1406216748
76 | 939 | 1406216722
74 | 126 | 1406216352
64 | 939 | 1406212540
63 | 126 | 1406212522
49 | 228 | 1406205715
48 | 228 | 1406204851
37 | 228 | 1406196435
32 | 228 | 1406190209
23 | 126 | 1406182577  <- 'limiting timestamp'
18 | 871 | 1406181991
10 | 922 | 1406178816
 9 | 764 | 1406178778
 7 | 609 | 1406178699
 5 | 126 | 1406177398
 4 | 871 | 1406177379  <- 'some timestamp in the past'

So now, I only need to select rows between the 'limiting timestamp' and the end of the list ('some timestamp in the past'). I could have specified the 'limiting timestamp' in the WHERE condition for the original query, but the problem is: I need the resulting set to have no records with non_unique_id, that have already appeared in the list above the 'limiting timestamp'. This is how the result of the query should look like:

                       <- 'limiting timestamp'
18 | 871 | 1406181991

 9 | 764 | 1406178778
 7 | 609 | 1406178699

                       <- 'some timestamp in the past'

So the result will return 3 rows, which all have non_unique_id that did not appear in the results above. But if the 'non_unique_id' had already appeared in the list between 'limiting timestamp' and the 'some timestamp in the past', then only the first occurrence should be kept. Note: the last part condition is optional, as it will be pretty easy to extract the duplicate from the final list.

So far I was only able to come up with the solution of doing a JOIN between the list >= 'some timestamp in the past' and > 'limiting timestamp'. This way I'll see if there are any occurrences of the top list in the bottom list. However, it can be assumed that the query is complex and time needed to produce it's results shouldn't be doubled by running it again, but with a slightly different condition.

You can try this if SQL Server 2008+ (sqlfiddle: http://sqlfiddle.com/#!3/0bc33/3):

WITH cteOrdered
AS
(
    SELECT    ROW_NUMBER() OVER (PARTITION BY t1.Non_Unique_ID ORDER BY t1.Timestamp) AS RID,
              t1.*
    FROM      Table1 t1 LEFT JOIN
              (SELECT  Non_Unique_ID
               FROM    Table1
               WHERE   Timestamp < 1406177379 OR
                       Timestamp > 1406182577) t2
              ON t1.Non_Unique_ID = t2.Non_Unique_ID
    WHERE     t2.Non_Unique_ID IS NULL AND
              t1.Timestamp > 1406177379 AND 
              t1.Timestamp < 1406182577
 )

SELECT    Unique_ID,
          Non_Unique_ID,
          Timestamp
FROM      cteOrdered
WHERE     RID = 1;

I've added another row into the data

(18, 871, 1406181990),

to see if the query was producing you want. You said if there are duplicate non_unique_id's within the search range only the "first" occurrence should be kept. I take this is the one with the EARLIEST timestamp? If opposite, you can change this line

SELECT    ROW_NUMBER() OVER (PARTITION BY t1.Non_Unique_ID ORDER BY t1.Timestamp) AS RID,

to

SELECT    ROW_NUMBER() OVER (PARTITION BY t1.Non_Unique_ID ORDER BY t1.Timestamp DESC) AS RID,

and that will flip the order to retain the LATEST timestamp for the duplicates.

Try this:

SELECT *
FROM my_table AS t1
WHERE timestamp < @limiting_timestamp
  AND timestamp > @some_timestamp_in_the_past
  AND NOT EXISTS(SELECT 1
              FROM my_table AS t2
              WHERE timestamp > @limiting_timestamp
                AND t1.non_unique_id = t2.non_unique_id)

That will get you the records between your timestamps that don't occur after the limiting time stamp. Note that if you want records that equal the filtering timestamps you can use the between keyword instead of < and >.

To eliminate you can use another subquery:

AND t1.timestamp = (SELECT MAX(timestamp)
                FROM my_table AS t2
                WHERE timestamp < @limiting_timestamp
                  AND timestamp > @some_timestamp_in_the_past
                GROUP BY unique_id)