Skip to content
Advertisement

calculating percentiles in aws athena

result of my query is being used in aws quicksight. even though quicksight offers percentileCont() which does the job for us I want to use it in the query instead of using calculated field.

eventually what I want to do is create a point column where

depending on a column that ranges from [a, b].

Right now I find out value at each percentile and manually create a range

however I want to make it dynamic so instead of [0,25] would be something like [min(col), 25percentile(col)].

above query outputs

With NTILE() added Thanks to @Gordon Linoff

outputs

seems to mess up col calculation

Advertisement

Answer

You are pretty much describing the ntile() function:

Two caveats:

  • NTILE(<n>) returns values between 1 and n
  • NTILE() makes sure the resulting tiles are equal. That means that values on the boundaries could end up in different bins.

An alternative that puts values in separate bins (but the bins might have different sizes) is percent_rank(). In your case:

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement