I’m looking to only select one data point from each date in my report. I want to ensure each day is accounted for and has at least one row of information, as we had to do a few different things to move a large data file into our data warehouse (import one large Google Sheet for some data, use Python for daily pulls of some of the other data – want to make sure no date was left out), and this data goes from now through last summer. I could do a COUNT DISTINCT clause to just make sure the number of days between the first data point and yesterday (the latest data point), but I want to verify each day is accounted for. Should mention I am in BigQuery. Also, an example of the created_at style is: 2021-02-09 17:05:44.583 UTC
This is what I have so far:
SELECT FIRST(created_at) FROM 'large_table' ORDER BY created_at
**I know FIRST is probably not the best clause for this case, and it’s currently acting to grab the very first data point in created_at, but just as a jumping-off point.
Advertisement
Answer
You can use aggregation:
select any_value(lt).* from large_table lt group by created_at order by min(created_at);
Note: This assumes that created_at
is a date
— or at least only has one value per date. You might need to convert it to a date:
select any_value(lt).* from large_table lt group by date(created_at) order by min(created_at);