Skip to content
Advertisement

SQL/Snowflake Sampling with specific probability

Suppose I have table 1 below, how can I select the values from table 1 with the specified probabilities, where each probability is the chance of the respective value getting selected?

A possible outcome is (assuming 30 and 25 are selected simply because of their higher probabilities):

I’m trying to solve this on Snowflake and have not been able to through various methods, including partitioning the values and comparing their ranks, as well as using the Uniform function to create random probabilities. Not sure if there’s a more elegant way to do a sampling and partition by Group. The end goal is to have the Value field in Table 1 deduplicated, so that each value is given a chance of getting selected based on their probabilities.

Advertisement

Answer

  • Give each group a consecutive range. For example, for 15%, the range will be between 30 and 45.
  • Pick a random number between 0 and 100.
  • Find in which range that random number falls:

enter image description here

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement