Skip to content
Advertisement

sql query for stratified sampling with dynamic sample size

Let say we have a table in this format:

id   strata sample_size
1    s1     2
2    s1     2
3    s1     2
4    s2     1
5    s2     1
....

From this example, we see two stratas s1 and s2. What I want to do is stratified sampling and the sample size is the last column. For example, I want to randomly sample 2 instances from s1 and 1 random sample from s2. Any help is appreciated. Please keep in mind that I have many strata and hard coding is out of the picture. I

Advertisement

Answer

You can use row_number():

select t.*
from (select t.*,
             row_number() over (partition by strata order by rand()) as seqnum
      from t
     ) t
where seqnum <= strata_size;
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement