Not certain how to title this question. Describing is a bit of a challenge, too. I have shaped my postgres data as follows, where a location and geo is further broken out into micro geo’s based on dist_grp column. Dist_grp values range from 1 to 6 and have fixed allocation values (1=0.55, 2=0.18, 3=0.15, 4=0.09, 5=0.025, 6=0.005). So, the sum of the allocation for each geo and location combination is 1. I don’t have any problems as long as there are no gaps in data.
I have found that not all location/geo combinations have all 6 dist_grp’s filled. That is, not all have a micro-geo. I created a cross join table so that I could at least tag those without micro-geos.
So, Location A in Geo 609 does not have any micro-geo’s in dist_grps 1 and 2. Therefore, I’m trying to determine how to increase the allocation in dist_grp 3. So, dist_grp 3 in Location A in Geo 609 should have a desired_allocation of 0.88 (0.55 + 0.18 + 0.15).
location geo dist_grp allocation has_micro_geo allocation_desired A 604 1 0.55 1 0.55 A 604 2 0.18 1 0.18 A 604 3 0.15 1 0.15 A 604 4 0.09 1 0.09 A 604 5 0.025 1 0.025 A 604 6 0.005 1 0.005 A 609 1 0.55 0 0 A 609 2 0.18 0 0 A 609 3 0.15 1 0.88 A 609 4 0.09 1 0.09 A 609 5 0.025 1 0.025 A 609 6 0.005 1 0.005 B 604 1 0.55 1 0.55 B 604 2 0.18 1 0.18 B 604 3 0.15 1 0.15 B 604 4 0.09 1 0.09 B 604 5 0.025 0 0 B 604 6 0.005 1 0.03
Or, maybe easier to see as an image:
I have create a fiddle here: https://www.db-fiddle.com/f/i1vX8ZgBTeEQYApud1wjSA/1
I have stared at this for several hours and don’t have a reasonable bit of code to show. I’m trying to determine how to create that “allocation_desired” column. Any hints are welcome.
Advertisement
Answer
This is a harder problem to explain than to solve. Basically, it is a type of gaps-and-islands problem, when you want to sum within islands. You can define the islands based on counts of non-zero values.
Specifically, you can assign groups based on the count of non-zero has_micro_geo
values on or after each row (within each location). Then, you can sum within each group.
The query is actually simpler than the explanation sounds:
select la.*, (case when has_micro_geo > 0 then sum(allocation) over (partition by location_code, geo, grp) else 0 end) as imputed_allocation from (select la.*, count(*) filter (where has_micro_geo <> 0) over (partition by location_code, geo order by dist_grp desc) as grp from location_allocation la ) la order by location_code, geo;
Here is a db<>fiddle.