I am struggling with a sampling issue using Teradata
Below is the format of the data
ID Group Rank 1 dog 1 1 cat 1 1 lion 1 1 elephant 2 2 dog 1 2 cat 1 2 lion 1 2 elephant 1 3 dog 1 3 cat 2 3 lion 1 3 elephant 1 4 dog 2 4 cat 1 4 lion 1 4 elephant 1 ...
I would ideally like to return a sample number for each entry in Group but with only unique values from ID.
Below is the current query I produced but this returns duplicates for ID
SELECT ID, Group FROM Table WHERE rank = 1 SAMPLE WHEN group = 'dog' then 10 WHEN group = 'cat' then 10 WHEN group = 'elephant' then 5 WHEN group = 'lion' then 5 END
Advertisement
Answer
with cte as
(
SELECT ID, Group,
random(1,10000) as rnd -- RANDOM can't be directly used in OLAP-functions
FROM Table
WHERE rank = 1
)
SELECT ID, Group
FROM cte
QUALIFY
ROW_NUMBER() -- get one random row per ID
OVER (PARTITION BY ID
ORDER BY rnd) = 1
SAMPLE
WHEN group = 'dog' then 10
WHEN group = 'cat' then 10
WHEN group = 'elephant' then 5
WHEN group = 'lion' then 5
END