I am struggling with a sampling issue using Teradata
Below is the format of the data
x
ID Group Rank
1 dog 1
1 cat 1
1 lion 1
1 elephant 2
2 dog 1
2 cat 1
2 lion 1
2 elephant 1
3 dog 1
3 cat 2
3 lion 1
3 elephant 1
4 dog 2
4 cat 1
4 lion 1
4 elephant 1
I would ideally like to return a sample number for each entry in Group but with only unique values from ID.
Below is the current query I produced but this returns duplicates for ID
SELECT ID, Group FROM Table
WHERE rank = 1
SAMPLE
WHEN group = 'dog' then 10
WHEN group = 'cat' then 10
WHEN group = 'elephant' then 5
WHEN group = 'lion' then 5
END
Advertisement
Answer
with cte as
(
SELECT ID, Group,
random(1,10000) as rnd -- RANDOM can't be directly used in OLAP-functions
FROM Table
WHERE rank = 1
)
SELECT ID, Group
FROM cte
QUALIFY
ROW_NUMBER() -- get one random row per ID
OVER (PARTITION BY ID
ORDER BY rnd) = 1
SAMPLE
WHEN group = 'dog' then 10
WHEN group = 'cat' then 10
WHEN group = 'elephant' then 5
WHEN group = 'lion' then 5
END