I have a table with 4 columns (about 10M rows)
time | person_id | amount_spent | state ------------------------+-----------+--------------+------- 2010-01-02 23:58:08.71 | 1 | 10 | AZ 2010-01-02 23:58:08.71 | 2 | 20 | AR 2010-01-02 23:58:08.72 | 1 | 10 | AZ 2010-01-02 23:58:08.72 | 3 | 15 | CA 2010-01-02 23:58:08.74 | 4 | 30 | AZ 2010-01-02 23:58:08.75 | 5 | 30 | AR ...
I want to get the “Top N people as per spending in each state for top M states” using an SQL query on the table. I am using PostgreSQL.
Can someone help me out?
Note: There are multiple entries per person and per state. However, 1 person can live only in 1 state. People are represented by their person_id.
Edit:
I believe a JOIN will be required with 2 tables: A and B.
Table A: Top M states as per spending
SELECT SUM(amount_spent) AS total_amt_spent, state FROM <table_name> GROUP BY state ORDER BY total_amt_spent DESC LIMIT M
Table B: TOP N people as per spending per country using the solution here
Then, possibly one could join the two tables (A and B) based on states in Table A.
However there seems to be a flaw in this solution OR a better solution could exist. Even if this solution is good enough, I am not able to write the Query esp for the part where one has to JOIN the tables based on states in Table A.
Advertisement
Answer
You have two things that you need to accomplish. One is aggregating by state and getting the “top m” of those. The second is aggregating by person and getting the “top n” of those per state.
This suggests two aggregations:
select * from (select state, sum(amount_spent) as state_spent, row_number() over (order by sum(amount_spent) desc) as seqnum from t group by state ) s join (select state, person_id, sum(amount_spent) as person_spent, row_number() over (partition by state order by sum(amount_spent) desc) as seqnum from t group by state, person_id ) sp using (state) where sp.seqnum <= <n> and s.seqnum <= <m> order by s.seqnum, sp.seqnum on s.state = sp.state;