Skip to content
Advertisement

How to distribute values when prior rank is zero

Not certain how to title this question. Describing is a bit of a challenge, too. I have shaped my postgres data as follows, where a location and geo is further broken out into micro geo’s based on dist_grp column. Dist_grp values range from 1 to 6 and have fixed allocation values (1=0.55, 2=0.18, 3=0.15, 4=0.09, 5=0.025, 6=0.005). So, the sum of the allocation for each geo and location combination is 1. I don’t have any problems as long as there are no gaps in data.

I have found that not all location/geo combinations have all 6 dist_grp’s filled. That is, not all have a micro-geo. I created a cross join table so that I could at least tag those without micro-geos.

So, Location A in Geo 609 does not have any micro-geo’s in dist_grps 1 and 2. Therefore, I’m trying to determine how to increase the allocation in dist_grp 3. So, dist_grp 3 in Location A in Geo 609 should have a desired_allocation of 0.88 (0.55 + 0.18 + 0.15).

    location    geo dist_grp    allocation  has_micro_geo   allocation_desired
    A   604 1   0.55    1   0.55
    A   604 2   0.18    1   0.18
    A   604 3   0.15    1   0.15
    A   604 4   0.09    1   0.09
    A   604 5   0.025   1   0.025
    A   604 6   0.005   1   0.005
    A   609 1   0.55    0   0
    A   609 2   0.18    0   0
    A   609 3   0.15    1   0.88
    A   609 4   0.09    1   0.09
    A   609 5   0.025   1   0.025
    A   609 6   0.005   1   0.005
    B   604 1   0.55    1   0.55
    B   604 2   0.18    1   0.18
    B   604 3   0.15    1   0.15
    B   604 4   0.09    1   0.09
    B   604 5   0.025   0   0
    B   604 6   0.005   1   0.03

Or, maybe easier to see as an image:

enter image description here

I have create a fiddle here: https://www.db-fiddle.com/f/i1vX8ZgBTeEQYApud1wjSA/1

I have stared at this for several hours and don’t have a reasonable bit of code to show. I’m trying to determine how to create that “allocation_desired” column. Any hints are welcome.

Advertisement

Answer

This is a harder problem to explain than to solve. Basically, it is a type of gaps-and-islands problem, when you want to sum within islands. You can define the islands based on counts of non-zero values.

Specifically, you can assign groups based on the count of non-zero has_micro_geo values on or after each row (within each location). Then, you can sum within each group.

The query is actually simpler than the explanation sounds:

select la.*,
       (case when has_micro_geo > 0
             then sum(allocation) over (partition by location_code, geo, grp)
             else 0
        end) as imputed_allocation
from (select la.*,
             count(*) filter (where has_micro_geo <> 0) over (partition by location_code, geo order by dist_grp desc) as grp
      from location_allocation la
     ) la
order by location_code, geo;

Here is a db<>fiddle.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement