result of my query is being used in aws quicksight. even though quicksight offers percentileCont() which does the job for us I want to use it in the query instead of using calculated field.
eventually what I want to do is create a point column where
under 25percentile -> 0 under 50 percentile -> 1 under 75 percentile -> 2 rest -> 3
depending on a column that ranges from [a, b].
Right now I find out value at each percentile and manually create a range
With table as ( SELECT * , cast(date_diff('day', last_transaction, current_date) as double) as col ) SELECT * , case when col between 0 and 25 then 0 when col between 26 and 66 then 1 when col between 67 and 193 then 2 when col >= 194 then 3 end as point FROM table;
however I want to make it dynamic so instead of [0,25] would be something like [min(col), 25percentile(col)].
above query outputs
col point 333 3 166 2 96 1 .
With NTILE() added Thanks to @Gordon Linoff
With table as ( SELECT * , cast(date_diff('day', last_transaction, current_date) as double) as col ) SELECT * , case when col between 0 and 25 then 0 when col between 26 and 66 then 1 when col between 67 and 193 then 2 when col >= 194 then 3 end as point , NTILE(4) over(order by col) as pt FROM table;
outputs
col point 0 1 0 1 0 1 .
seems to mess up col calculation
Advertisement
Answer
You are pretty much describing the ntile()
function:
SELECT t.*,, NTILE(4) OVER (ORDER BY col) - 1 as point FROM table;
Two caveats:
NTILE(<n>)
returns values between 1 and nNTILE()
makes sure the resulting tiles are equal. That means that values on the boundaries could end up in different bins.
An alternative that puts values in separate bins (but the bins might have different sizes) is percent_rank()
. In your case:
SELECT t.*,, CEILING(PRECENT_RANK() OVER (ORDER BY col) * 4) - 1 as point FROM table;