I have these two columns in Big Query: budget_id and activity:
budget_id | activity | region | execution window |
---|---|---|---|
000507_Corporate | 507 | Corporate | 2022 |
000508_AMERICAS | 508 | AMERICAS | 2022Q2 |
NULL | NULL | c | b |
NULL | NULL | c | b |
The budget_id comes from a Google Sheet which is manually inputted by a stakeholder. I’m trying to change this that going forward, I can automate this myself. Everything from 000001 to 000508 mostly comes from a spreadsheet. I’m trying to automated this going forward. However, this has been trickier than I thought.
WITH blah AS ( select *, IF(activity IS NULL, last_value(activity ignore nulls) over (order by activity RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) + 1, activity) AS new_activity from `rax-datamart-dev.marketing.auto_budget_framework`) select budget_id, activity, region, execution_window, new_activity from blah order by activity NULLS last
budget_id | activity | region | execution window | new_activity |
---|---|---|---|---|
000507_Corporate | 507 | Corporate | 2022 | 507 |
000508_AMERICAS | 508 | AMERICAS | 2022Q2 | 508 |
NULL | NULL | c | b | 509 |
NULL | NULL | c | b | 509 |
As you can see, I’m able to get 509 so that’s promising. Unfortunately, I was hoping the next NULL value row would be 510. And then if there is another NULL value, 511, 512, 513 etc etc.
Is what I’m attempting even possible? I feel like I’m missing something but if anyone could help out, it would be greatly appreciated.
Advertisement
Answer
I’m using similar data as you posted in your question.
with data as ( SELECT 'Americas_1 ' as id,1 as activity, 'America' as region union all SELECT 'Americas_2 ' as id,2 as activity, 'America' as region union all SELECT 'Americas_3 ' as id,3 as activity, 'America' as region union all SELECT 'Americas_4 ' as id,4 as activity, 'America' as region union all SELECT null as id,null as activity, 'c' as region union all SELECT null as id,null as activity, 'a' as region )
In the subquery data
, I just have the sample data. In the second subquery data2
, I added a column number
, this column adds the row_number
when the activity column
is null, if it is not null add a 0. The column new_activity
just puts the same numbers when activity is not null.
Here you can see the complete query.
with data as ( SELECT 'Americas_1 ' as id,1 as activity, 'America' as region union all SELECT 'Americas_2 ' as id,2 as activity, 'America' as region union all SELECT 'Americas_3 ' as id,3 as activity, 'America' as region union all SELECT 'Americas_4 ' as id,4 as activity, 'America' as region union all SELECT null as id,null as activity, 'c' as region union all SELECT null as id,null as activity, 'a' as region ), data2 as ( select id,activity, region, IF (activity is null,ROW_NUMBER() OVER(ORDER BY activity),0) as number, IF(activity IS NULL, last_value(activity ignore nulls) over (order by activity RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) , activity ) as new_activity from data group by id,activity, region order by activity asc nulls last )
This query displays these columns ID, activity, region
; and in the last column new_Activity
, I sum the column number
and new_activity
from the subquery data2
select id, activity, region, (number+new_activity) as new_Activity from data2 order by activity asc nulls last
This is the output of the query.