Skip to content
Advertisement

Use SQL to ensure I have data for each day of a certain time period

I’m looking to only select one data point from each date in my report. I want to ensure each day is accounted for and has at least one row of information, as we had to do a few different things to move a large data file into our data warehouse (import one large Google Sheet for some data, use Python for daily pulls of some of the other data – want to make sure no date was left out), and this data goes from now through last summer. I could do a COUNT DISTINCT clause to just make sure the number of days between the first data point and yesterday (the latest data point), but I want to verify each day is accounted for. Should mention I am in BigQuery. Also, an example of the created_at style is: 2021-02-09 17:05:44.583 UTC

This is what I have so far:

SELECT FIRST(created_at)
FROM 'large_table'
ORDER BY created_at

**I know FIRST is probably not the best clause for this case, and it’s currently acting to grab the very first data point in created_at, but just as a jumping-off point.

Advertisement

Answer

You can use aggregation:

select any_value(lt).*
from large_table lt
group by created_at
order by min(created_at);

Note: This assumes that created_at is a date — or at least only has one value per date. You might need to convert it to a date:

select any_value(lt).*
from large_table lt
group by date(created_at)
order by min(created_at);
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement