
SQL Query: Constructing a Control Group

I have two data sets. The first contains two identifying characteristics (here ZIP and race), which together uniquely identify a row, plus a variable called count. The second contains information on individuals: ZIP, race, and some outcome variable. My goal is to construct a subset of the second data set in which the number of observations for each ZIP/race combination equals the count given in the first data set. To make it clear:

Data Set #1:

Data Set #2:

The goal is an output that returns a subset of data set #2 with 59 white individuals from ZIP 30218, 23 black individuals from ZIP 30218, etc.

Either sample SQL code to use or a general strategy would be helpful. Thank you


Answer

You can use the row_number window function to number the rows by some criteria and then join that to data set 1. Note that I renamed count to n here to avoid using a keyword:
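A minimal sketch of this approach, wrapped in Python's `sqlite3` so it runs on its own (window functions need SQLite 3.25 or later). The table names `dataset1` and `dataset2`, the `id` column, and the toy rows are assumptions for illustration, not the asker's actual schema or data:

```python
import sqlite3

# Hypothetical schema and toy rows; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dataset1 (zip TEXT, race TEXT, n INTEGER);
    CREATE TABLE dataset2 (id INTEGER PRIMARY KEY, zip TEXT, race TEXT, outcome REAL);
    INSERT INTO dataset1 VALUES ('30218', 'white', 2), ('30218', 'black', 1);
    INSERT INTO dataset2 (zip, race, outcome) VALUES
        ('30218', 'white', 0.1), ('30218', 'white', 0.2), ('30218', 'white', 0.3),
        ('30218', 'black', 0.4), ('30218', 'black', 0.5),
        ('30219', 'white', 0.6);
""")

# Number the individuals within each ZIP/race group, then keep only
# the first n of each group, where n comes from dataset1.
QUERY = """
SELECT d2.id, d2.zip, d2.race, d2.outcome
FROM (
    SELECT id, zip, race, outcome,
           ROW_NUMBER() OVER (PARTITION BY zip, race ORDER BY id) AS rn
    FROM dataset2
) AS d2
JOIN dataset1 AS d1
  ON d1.zip = d2.zip AND d1.race = d2.race
WHERE d2.rn <= d1.n
ORDER BY d2.id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Ordering by `id` inside the window makes the selection deterministic. If the control group should be a random draw, replacing `ORDER BY id` with `ORDER BY RANDOM()` in the `OVER` clause is one option in SQLite; other databases use a different random function. Groups with fewer rows than `n` simply return every row they have.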

User contributions licensed under: CC BY-SA