Skip to content
Advertisement

SQL query for top N people as per spending in each state for top M states

I have a table with 4 columns (about 10M rows)

          time          | person_id | amount_spent | state
------------------------+-----------+--------------+-------
 2010-01-02 23:58:08.71 |     1     |      10      |   AZ
 2010-01-02 23:58:08.71 |     2     |      20      |   AR
 2010-01-02 23:58:08.72 |     1     |      10      |   AZ
 2010-01-02 23:58:08.72 |     3     |      15      |   CA
 2010-01-02 23:58:08.74 |     4     |      30      |   AZ
 2010-01-02 23:58:08.75 |     5     |      30      |   AR
 ...                    

I want to get the “Top N people as per spending in each state for top M states” using an SQL query on the table. I am using PostgreSQL.

Can someone help me out?

Note: There are multiple entries per person and per state. However, 1 person can live only in 1 state. People are represented by their person_id.

Edit:

I believe a JOIN will be required with 2 tables: A and B.

Table A: Top M states as per spending

SELECT SUM(amount_spent) AS total_amt_spent, state
FROM <table_name>
GROUP BY state ORDER BY total_amt_spent DESC LIMIT M

Table B: TOP N people as per spending per country using the solution here

Then, possibly one could join the two tables (A and B) based on states in Table A.

However there seems to be a flaw in this solution OR a better solution could exist. Even if this solution is good enough, I am not able to write the Query esp for the part where one has to JOIN the tables based on states in Table A.

Advertisement

Answer

You have two things that you need to accomplish. One is aggregating by state and getting the “top m” of those. The second is aggregating by person and getting the “top n” of those per state.

This suggests two aggregations:

select *
from (select state, sum(amount_spent) as state_spent,
             row_number() over (order by sum(amount_spent) desc) as seqnum
      from t
      group by state
     ) s join
     (select state, person_id, sum(amount_spent) as person_spent,
             row_number() over (partition by state order by sum(amount_spent) desc) as seqnum
      from t
      group by state, person_id
     ) sp
     using (state)
where sp.seqnum <= <n> and
      s.seqnum <= <m>
order by s.seqnum, sp.seqnum
     on s.state = sp.state;
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement