I want to do a special query to evaluate team members. Each member will have a score, and those with score that is greater than 80th percentile of that team’s score will get a bonus. However, I want to add a condition so that the 80th percentile score is calculated based only on those scores > 0.
For example, if team A has
[0, 0, 0.6, 0.6, 0.8, 0.8]
then the percentile will be calculated using
[0.6, 0.6, 0.8, 0.8]
which the result will be 0.8. Else, if the team only has scores = 0, then team_80th_score will be 0.
The table looks like this, with the team_80th_score* is the desired result.
team| member | score | team_80th_score* ----+-----------+------+-------- A | Alex | 0 | 0.8 A | Abby | 0 | 0.8 A | Hunt | 0.6 | 0.8 A | Tyler | 0.6 | 0.8 A | Jack | 0.8 | 0.8 A | Mile | 0.8 | 0.8 B | John | 0 | 0 B | Amy | 0 | 0 B | Alice | 0 | 0
I use Hive SQL, and understand that building on this fundamental window function will be the way to go
select team, member, score, percentile_approx(score, 0.8) over (partition by team) as team_80th_score from table;
But I don’t know how to include the conditions to just consider score > 0 (for a case like team A), and if sum(score) group by team is 0 then 0 end as team_80th_score (for a case like team B).
What would you recommend me to do in this case?
Advertisement
Answer
Hmmmm . . . One method is to include logic in the partition by
. Note that this returns meaningless values when score = 0
:
select team, member, score, percentile_approx(score, 0.8) over (partition by team, (case when score > 0 then 1 else 0 end) as team_80th_score from table;
To fix that, use an outer case
expression:
select team, member, score, (case when score > 0 then percentile_approx(score, 0.8) over (partition by team, (case when score > 0 then 1 else 0 end)) end) as team_80th_score from table;