Skip to content
Advertisement

RANK() function with over is creating ranks dynamically for every run

I am creating ranks for partitions of my table. Partitions are performed by name column with ordered by its transaction value. While I am generating these partitions and checking count for each of the ranks, I get different number in each rank for every query run I do.

select count(*)  FROM (
                        --
                        -- Sort and ranks the element of RFM
                        --
                        SELECT
                            *,
                            RANK() OVER (PARTITION BY name ORDER BY date_since_last_trans desc) AS rfmrank_r,
                        FROM (
                            SELECT
                                name,
                                id_customer,
                                cust_age,
                                gender,
                                DATE_DIFF(entity_max_date, customer_max_date, DAY ) AS date_since_last_trans,
                                txncnt,
                                txnval,
                                txnval / txncnt AS avg_txnval
                            FROM 
                            (
                                SELECT
                                    name,
                                    id_customer,
                                    MAX(cust_age) AS cust_age,
                                 COALESCE(APPROX_TOP_COUNT(cust_gender,1)[OFFSET(0)].VALUE, MAX(cust_gender)) AS gender,
                                    MAX(date_date) AS customer_max_date,
                                    (SELECT MAX(date_date) FROM xxxxx) AS entity_max_date,
                                    COUNT(purchase_amount) AS txncnt,
                                    SUM(purchase_amount) AS txnval
                                FROM
                                    xxxxx
                                WHERE
                                date_date > (
                                    SELECT
                                        DATE_SUB(MAX(date_date), INTERVAL 24 MONTH) AS max_date
                                    FROM
                                        xxxxx)
                                AND cust_age >= 15
                                AND cust_gender IN ('M','F')
                                GROUP BY
                                    name,
                                    id_customer
                            )
                        )
                    )
          group by rfmrank_r

For 1st run I am getting

Row f0
1 3970
2 3017
3 2116
4 2118

For 2nd run I am getting

Row f0
1 4060
2 3233
3 2260
4 2145

What can be done, If I need to get same number of partitions getting ranked same for each run Edit: Sorry for the blurring of fields This is the output of field “`query to get this column““

Advertisement

Answer

The RANK window function determines the rank of a value in a group of values. Each value is ranked within its partition. Rows with equal values for the ranking criteria receive the same rank. Drill adds the number of tied rows to the tied rank to calculate the next rank and thus the ranks might not be consecutive numbers.

For example, if two rows are ranked 1, the next rank is 3.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement