SQL/Snowflake Sampling with specific probability

Question

Suppose I have Table 1 below. How can I select values from Table 1 with the specified probabilities, where each probability is the chance of the respective value being selected? Table 1: Group ...

Accepted Answer

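1. Give each group a consecutive range. For example, for 15%, the range will be between 30 and 45.
2. Pick a random integer from 0 to 99, so there are 100 equally likely draws, one per percentage point.
3. Find in which range that random number falls:

    create or replace temp table probs
    as
    select 'a' id, 1 value, 20 prob
    union all select 'a', 2, 30
    union all select 'a', 3, 40
    union all select 'a', 4, 10
    union all select 'b', 1, 5
    union all select 'b', 2, 7
    union all select 'b', 3, 8
    union all select 'b', 4, 80
    ;

    with calculated_ranges as (
        -- range_prob1 is the lower bound of each range, range_prob2 the upper bound
        select *, range_prob2 - prob range_prob1
        from (
            -- running total of prob within each id;
            -- assumes prob values are distinct within an id (ties would share a running total)
            select *, sum(prob) over(partition by id order by prob) range_prob2
            from probs
        )
    )
    select id, random_draw, value, prob
    from (
        -- one draw per id; uniform() includes both endpoints,
        -- so drawing from 0..99 keeps every draw inside some range
        select id, any_value(uniform(0, 99, random())) random_draw
        from probs
        group by id
    ) a
    join calculated_ranges b
    using (id)
    where range_prob1 <= random_draw and range_prob2 > random_draw;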
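To check the range logic outside Snowflake, here is a minimal Python sketch of the same idea. It is an illustration, not part of the original answer. The names `PROBS`, `build_ranges`, `pick` and `sample` are hypothetical. The ranges are built in row order rather than ordered by `prob`; this changes which range belongs to which value but not the width of any range, so the selection probabilities are the same.

```python
import random
from itertools import accumulate

# Same rows as the probs table: (id, value, prob), with prob in percent.
PROBS = [
    ("a", 1, 20), ("a", 2, 30), ("a", 3, 40), ("a", 4, 10),
    ("b", 1, 5), ("b", 2, 7), ("b", 3, 8), ("b", 4, 80),
]


def build_ranges(rows):
    """Return {id: [(low, high, value), ...]} with half-open [low, high) ranges."""
    by_id = {}
    for group_id, value, prob in rows:
        by_id.setdefault(group_id, []).append((value, prob))

    ranges = {}
    for group_id, items in by_id.items():
        # Running total of prob gives each range's upper bound.
        highs = list(accumulate(prob for _, prob in items))
        ranges[group_id] = [
            (high - prob, high, value)
            for (value, prob), high in zip(items, highs)
        ]
    return ranges


def pick(ranges_for_id, draw):
    """Return the value whose range contains draw (low <= draw < high)."""
    for low, high, value in ranges_for_id:
        if low <= draw < high:
            return value
    raise ValueError(f"draw {draw} falls outside every range")


def sample(rows, rng=random):
    """Draw one integer in 0..99 per id and return {id: selected value}."""
    ranges = build_ranges(rows)
    return {gid: pick(r, rng.randrange(100)) for gid, r in ranges.items()}
```

Calling `sample(PROBS)` returns one selected value per id. Running `pick` over every draw from 0 to 99 selects each value exactly `prob` times, which is a quick way to confirm that the ranges cover all draws without overlapping.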

Answer