I have a simple table with only 2 columns:
| name | domain |
I need a SQL query to get this data:
| domain | names count | sample name 1 | ...2 | ...3 | ...4 | ...5 |
So, the first part is simple, just grouping by domain and getting names count. But the second part, where I need to take 5 samples (they can be random) of names in the group – I have no idea how to resolve it.
Advertisement
Answer
Below is for BigQuery Standard SQL
#standardSQL SELECT domain, names_count, samples[OFFSET(0)] AS sample_name_1, samples[SAFE_OFFSET(1)] AS sample_name_2, samples[SAFE_OFFSET(2)] AS sample_name_3, samples[SAFE_OFFSET(3)] AS sample_name_4, samples[SAFE_OFFSET(4)] AS sample_name_5 FROM ( SELECT domain, COUNT(name) names_count, ARRAY_AGG(name ORDER BY RAND() LIMIT 5) samples FROM `project.dataset.table` GROUP BY domain )