Let say we have a table in this format:
x
id strata sample_size
1 s1 2
2 s1 2
3 s1 2
4 s2 1
5 s2 1
.
From this example, we see two stratas s1 and s2. What I want to do is stratified sampling and the sample size is the last column. For example, I want to randomly sample 2 instances from s1 and 1 random sample from s2. Any help is appreciated. Please keep in mind that I have many strata and hard coding is out of the picture. I
Advertisement
Answer
You can use row_number()
:
select t.*
from (select t.*,
row_number() over (partition by strata order by rand()) as seqnum
from t
) t
where seqnum <= strata_size;