Use SQL to ensure I have data for each day of a ce…

I’m looking to only select one data point from each date in my report. I want to ensure each day is accounted for and has at least one row of information, as we had to do a few different things to move a large data file into our data warehouse (import one large Google Sheet for some data, use Python for daily pulls of some of the other data – want to make sure no date was left out), and this data goes from now through last summer. I could do a COUNT DISTINCT clause to just make sure the number of days between the first data point and yesterday (the latest data point), but I want to verify each day is accounted for. Should mention I am in BigQuery. Also, an example of the created_at style is: 2021-02-09 17:05:44.583 UTC

This is what I have so far:

SELECT FIRST(created_at)
FROM 'large_table'
ORDER BY created_at

​x
 
SELECT FIRST(created_at)FROM 'large_table'ORDER BY created_at​

**I know FIRST is probably not the best clause for this case, and it’s currently acting to grab the very first data point in created_at, but just as a jumping-off point.

Answer

You can use aggregation:

select any_value(lt).*
from large_table lt
group by created_at
order by min(created_at);

 
select any_value(lt).*from large_table ltgroup by created_atorder by min(created_at);​

Note: This assumes that created_at is a date — or at least only has one value per date. You might need to convert it to a date:

select any_value(lt).*
from large_table lt
group by date(created_at)
order by min(created_at);

 
select any_value(lt).*from large_table ltgroup by date(created_at)order by min(created_at);​

Use SQL to ensure I have data for each day of a certain time period

Advertisement

Answer