Skip to content
Advertisement

How to calculate weekly and monthly appearances of distinct string values in SQL Google Big Query?

I am new to SQL, and I have a dataset with date values and domain column. Domain column only contains values ‘personal’ and ‘business’. What I am trying to accomplish is to calculate weekly and monthly rolling count for each domain type.

What I thought of doing is creating 2 separate columns – is_personal and is_business – with values 1 for rows where domain_type is of appropriate value. For example, if domain_type is ‘personal’, then there would be value of 1 in is_personal column. Otherwise, 1 would be in the row of is_business. Then, I was going to calculate the rolling sum.

However, I wonder if I can avoid creating additional columns and perform weekly and monthly rolling count straight from the string column in Google Big Query.

What I tried so far is “grouping by” date column using DATE_TRUNC(CAST(created_at AS date), ISOWEEK) to ‘roll-up’ dates by week periods. When I try any rolling functions on the domain_type column, I get many errors. Some are related to trying functions not recognizable by Google Big Query, some related to the fact that I am working with the string column, and so on.

The ultimate goal I am trying to achieve is calculate weekly and monthly rolling count of ‘business’ and ‘personal’ domain types. Please let me know if I can provide additional information that will help. Thank you!

Current look of data:

Let’s say that during the week of 2017-10-02, a total of 10 users signed up with personal email address and a total of 20 signed up with business email address. During the week of 2017-10-09, a total of 25 signed up with personal email and 30 signed up with business email. Hence, for 2 weeks, rolling count for personal domain type is 35, and rolling count for business domain type is 50.

The output I am trying to achieve:

Advertisement

Answer

Below is for BigQuery Standard SQL

if to apply to sample data from your question – output is

What if, for one particular week, there is no data on dow=2 but there is data for the other days?

Good point, somehow I assumed that at least one entry per day will exist :o)

See version below that does not have this dependency

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement