I have a simple table with only 2 columns:
x
| name | domain |
I need a SQL query to get this data:
| domain | names count | sample name 1 | 2 | 3 | 4 | 5 |
So, the first part is simple, just grouping by domain and getting names count. But the second part, where I need to take 5 samples (they can be random) of names in the group – I have no idea how to resolve it.
Advertisement
Answer
Below is for BigQuery Standard SQL
#standardSQL
SELECT domain, names_count,
samples[OFFSET(0)] AS sample_name_1,
samples[SAFE_OFFSET(1)] AS sample_name_2,
samples[SAFE_OFFSET(2)] AS sample_name_3,
samples[SAFE_OFFSET(3)] AS sample_name_4,
samples[SAFE_OFFSET(4)] AS sample_name_5
FROM (
SELECT domain,
COUNT(name) names_count,
ARRAY_AGG(name ORDER BY RAND() LIMIT 5) samples
FROM `project.dataset.table`
GROUP BY domain
)