I have a 2 tables with the same structure:
table1
id text var 1 "bla bla" 100 1 "blabla1" 30 2 "qweweqty" 0 2 etc... 7 3 3 1 .. 100
table2
id text var 101 "bla bla" 10 101 "bla1" 60 101 "bla" 5 103 etc... 102 103 102 110 .. 200
I want to randomly sample data from table1 and table2 based on id. so basically, sample every observation for a random sample of ids from table1 and and every observation from random sample of ids from table2 so that 50 ids are from table 1 and 50 are from table 2 . any idea on how to do this on big query SQL?
en example would be the following where I want to get 3 ids from table1 and 3 from table2
randomly ids 1,2,3 are selected from table1, ids 101, 110 and 103 are selected from table2
the resulting table is then:
id. text var 1. .. .. 1 2 2 3 3 1 101 101 101 103 103 110
so basically any observation from table1 with id 1,2,3 and any observation from table2 with id 101, 103, 110 are selected and put in the same table: so the passages are two: first randomly select a certain number of ids from table1, and a certain number of ids from table2, then I select any observation corresponding to those ids from both tables and I join them in the same table
Advertisement
Answer
If you want 50 ids from each table, then you can limit them using subqueries:
select t1.* from t1 join (select distinct id from t1 order by rand() ) ids on t1.id = ids.id union all select t2.* from t2 join (select distinct id from t2 order by rand() ) ids on t2.id = ids.id