group by id and random sample by id from two tables in big query sql

I have a 2 tables with the same structure:

table1

id text        var  
1  "bla bla"   100
1  "blabla1"    30
2  "qweweqty"    0
2    etc...
7
3
3
1
..
100

​x
 
id text        var  1  "bla bla"   1001  "blabla1"    302  "qweweqty"    02    etc...7331..100​

table2

id text        var  
101 "bla bla"   10
101  "bla1"      60
101  "bla"    5
103    etc...
102
103
102
110
..
200

 
id text        var  101 "bla bla"   10101  "bla1"      60101  "bla"    5103    etc...102103102110..200​

I want to randomly sample data from table1 and table2 based on id. so basically, sample every observation for a random sample of ids from table1 and and every observation from random sample of ids from table2 so that 50 ids are from table 1 and 50 are from table 2 . any idea on how to do this on big query SQL?

en example would be the following where I want to get 3 ids from table1 and 3 from table2

randomly ids 1,2,3 are selected from table1, ids 101, 110 and 103 are selected from table2

the resulting table is then:

id. text var
 1.   ..  ..
 1
 2
 2
 3
 3
 1
 101
 101
 101
 103
 103
 110

 
id. text var 1.   ..  .. 1 2 2 3 3 1 101 101 101 103 103 110​

so basically any observation from table1 with id 1,2,3 and any observation from table2 with id 101, 103, 110 are selected and put in the same table: so the passages are two: first randomly select a certain number of ids from table1, and a certain number of ids from table2, then I select any observation corresponding to those ids from both tables and I join them in the same table

Answer

If you want 50 ids from each table, then you can limit them using subqueries:

select t1.*
from t1 join
     (select distinct id
      from t1
      order by rand()
     ) ids
     on t1.id = ids.id
union all
select t2.* 
from t2 join
     (select distinct id
      from t2
      order by rand()
     ) ids
     on t2.id = ids.id

 
select t1.*from t1 join     (select distinct id      from t1      order by rand()     ) ids     on t1.id = ids.idunion allselect t2.* from t2 join     (select distinct id      from t2      order by rand()     ) ids     on t2.id = ids.id​

Advertisement

Answer