I want to select new distinct values and track them over time.
I have a table where each row represents a score awarded to a particular person. – timestamp (when the score was awarded) – name (which person received the score) – score (what score the person received)
I want the result to look like:
The above table should be interpreted as how many new distinct names appear in each day.
Because 6-NOV is the first day, all the names are new hence 3 new names.
On 7-NOV Michael is the only new name so the value is 1.
On 8-NOV we have 3 new names (Don, Alex, Tina)
And on 9-NOV 0 new names appear a Jimmy and Sara have both been score before.
Thanks for the help
Advertisement
Answer
Find the minimum timestamp for each name and then count how many names in each timestamp
select timestamp, count(*) as new_names from (select name, min(timestamp) as timestamp from mytable group by name) group by timestamp order by timestamp
To include all days even without any names
select t.timestamp, nvl(new_names,0) as new_names from (select timestamp, count(*) as new_names from (select name, min(timestamp) as timestamp from mytable group by name) group by timestamp) c RIGHT OUTER JOIN (select distinct timestamp from mytable) t ON c.timestamp = t.timestamp order by t.timestamp
To include dates that don’t appear in the table at all you need to have a list of dates from a calendar somewhere and then put that table instead of the subquery I have RIGHT OUTER JOINed to
You can do this
select t.timestamp, nvl(new_names,0) as new_names from (select timestamp, count(*) as new_names from (select name, min(timestamp) as timestamp from mytable group by name) group by timestamp) c RIGHT OUTER JOIN ( SELECT TRUNC (SYSDATE - ROWNUM - 1) dt FROM DUAL CONNECT BY ROWNUM < 366 ) t ON c.timestamp = t.timestamp order by t.timestamp
But you’d have to adjust the -1 and 366 to be the date range you wanted and it’s much more standard to use a calendar that already exists in your database