Skip to content
Advertisement

group by id and random sample by id from two tables in big query sql

I have a 2 tables with the same structure:

table1

table2

I want to randomly sample data from table1 and table2 based on id. so basically, sample every observation for a random sample of ids from table1 and and every observation from random sample of ids from table2 so that 50 ids are from table 1 and 50 are from table 2 . any idea on how to do this on big query SQL?

en example would be the following where I want to get 3 ids from table1 and 3 from table2

randomly ids 1,2,3 are selected from table1, ids 101, 110 and 103 are selected from table2

the resulting table is then:

so basically any observation from table1 with id 1,2,3 and any observation from table2 with id 101, 103, 110 are selected and put in the same table: so the passages are two: first randomly select a certain number of ids from table1, and a certain number of ids from table2, then I select any observation corresponding to those ids from both tables and I join them in the same table

Advertisement

Answer

If you want 50 ids from each table, then you can limit them using subqueries:

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement