Skip to content
Advertisement

How to sample from different values in a column but only return records that are unique from another column?

I am struggling with a sampling issue using Teradata

Below is the format of the data

ID    Group     Rank
1     dog       1 
1     cat       1 
1     lion      1  
1     elephant  2 
2     dog       1 
2     cat       1 
2     lion      1 
2     elephant  1 
3     dog       1
3     cat       2 
3     lion      1 
3     elephant  1 
4     dog       2 
4     cat       1 
4     lion      1 
4     elephant  1 
... 

I would ideally like to return a sample number for each entry in Group but with only unique values from ID.

Below is the current query I produced but this returns duplicates for ID

SELECT ID, Group FROM Table 
WHERE rank = 1 
SAMPLE 
 WHEN group = 'dog' then 10
 WHEN group = 'cat' then 10
 WHEN group = 'elephant' then 5
 WHEN group = 'lion' then 5
END

Advertisement

Answer

with cte as
 (
   SELECT ID, Group,
      random(1,10000) as rnd -- RANDOM can't be directly used in OLAP-functions
   FROM Table 
   WHERE rank = 1 
 )
SELECT ID, Group
FROM cte
QUALIFY 
   ROW_NUMBER() -- get one random row per ID
   OVER (PARTITION BY ID 
         ORDER BY rnd) = 1
SAMPLE 
 WHEN group = 'dog' then 10
 WHEN group = 'cat' then 10
 WHEN group = 'elephant' then 5
 WHEN group = 'lion' then 5
END
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement