Skip to content
Advertisement

Fill Null Values with Last Previous Value and add 1 as a continuous integer for every value going forward – Big Query

I have these two columns in Big Query: budget_id and activity:

budget_id activity region execution window
000507_Corporate 507 Corporate 2022
000508_AMERICAS 508 AMERICAS 2022Q2
NULL NULL c b
NULL NULL c b

The budget_id comes from a Google Sheet which is manually inputted by a stakeholder. I’m trying to change this that going forward, I can automate this myself. Everything from 000001 to 000508 mostly comes from a spreadsheet. I’m trying to automated this going forward. However, this has been trickier than I thought.

WITH blah AS (
select *,
IF(activity IS NULL, last_value(activity ignore nulls) over (order by activity RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) + 1, activity) AS new_activity
from `rax-datamart-dev.marketing.auto_budget_framework`)
select budget_id,
activity, 
region,
execution_window,
new_activity
from blah
order by activity NULLS last
budget_id activity region execution window new_activity
000507_Corporate 507 Corporate 2022 507
000508_AMERICAS 508 AMERICAS 2022Q2 508
NULL NULL c b 509
NULL NULL c b 509

As you can see, I’m able to get 509 so that’s promising. Unfortunately, I was hoping the next NULL value row would be 510. And then if there is another NULL value, 511, 512, 513 etc etc.

Is what I’m attempting even possible? I feel like I’m missing something but if anyone could help out, it would be greatly appreciated.

Advertisement

Answer

I’m using similar data as you posted in your question.

with data as (
 SELECT 'Americas_1 '  as id,1 as activity, 'America' as region union all
 SELECT 'Americas_2 '  as id,2 as activity, 'America' as region union all
 SELECT 'Americas_3 '  as id,3 as activity, 'America' as region union all
 SELECT 'Americas_4 '  as id,4 as activity, 'America' as region union all
 SELECT null  as id,null as activity, 'c' as region union all
 SELECT null  as id,null as activity, 'a' as region
)

In the subquery data, I just have the sample data. In the second subquery data2, I added a column number, this column adds the row_number when the activity column is null, if it is not null add a 0. The column new_activity just puts the same numbers when activity is not null.

Here you can see the complete query.

with data as (
 SELECT 'Americas_1 '  as id,1 as activity, 'America' as region union all
 SELECT 'Americas_2 '  as id,2 as activity, 'America' as region union all
 SELECT 'Americas_3 '  as id,3 as activity, 'America' as region union all
 SELECT 'Americas_4 '  as id,4 as activity, 'America' as region union all
 SELECT null  as id,null as activity, 'c' as region union all
 SELECT null  as id,null as activity, 'a' as region
), data2 as (
 
select id,activity, region, 
IF (activity is null,ROW_NUMBER() OVER(ORDER BY activity),0) as number,
IF(activity IS NULL,
last_value(activity ignore nulls) over (order by activity RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) ,
activity ) as new_activity
from data
group by id,activity, region
order by activity asc nulls last
)

This query displays these columns ID, activity, region; and in the last column new_Activity, I sum the column number and new_activity from the subquery data2

select id, activity, region, (number+new_activity) as new_Activity from data2
order by activity asc nulls last

This is the output of the query.

enter image description here

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement