I have these tables below along with the definition. I want to find top 10 products sold in a year after finding counts and without using aggregation and in an optimized way. I want to know if aggregation is still needed or I can accomplish it without using aggregation. Below is the query. Can anyone suggest a better approach.
CREATE TABLE Customer ( id int not null, first_name VARCHAR(30), last_name VARCHAR(30), Address VARCHAR(60), State VARCHAR(30), Phone text, PRIMARY KEY(id) ); CREATE TABLE Product ( ProductId int not null, name VARCHAR(30), unitprice int, BrandID int, Brandname varchar(30), color VARCHAR(30), PRIMARY KEY(ProductId) ); Create Table Sales ( SalesId int not null, Date date, Customerid int, Productid int, Purchaseamount int, PRIMARY KEY(SalesId), FOREIGN KEY (Productid) REFERENCES Product(ProductId), FOREIGN KEY (Customerid) REFERENCES Customer(id) )
Sample Data:
insert into Customer(id, first_name, last_name, address, state, phone) values (1111, 'andy', 'johnson', '123 Maryland Heights', 'MO', 3211451234), (1112, 'john', 'smith', '237 Jackson Heights', 'TX', 3671456534), (1113, 'sandy', 'fleming', '878 Jersey Heights', 'NJ', 2121456534), (1114, 'tony', 'anderson', '789 Harrison Heights', 'CA', 6101456534) insert into Product(ProductId, name, unitprice, BrandId, Brandname) values (1, 'watch',200, 100, 'apple'), (2, 'ipad', 429, 100, 'apple'), (3, 'iphone', 799, 100, 'apple'), (4, 'gear', 300, 110, 'samsung'), (5, 'phone',1000, 110, 'samsung'), (6, 'tab', 250, 110, 'samsung'), (7, 'laptop', 1300, 120, 'hp'), (8, 'mouse', 10, 120, 'hp'), (9, 'monitor', 400, 130, 'dell'), (10, 'keyboard', 40, 130, 'dell'), (11, 'dvddrive', 100, 130, 'dell'), (12, 'dvddrive', 90, 150, 'lg') insert into Sales(SalesId, Date, CustomerID, ProductID, Purchaseamount) values (30, '01-10-2019', 1111, 1, 200), (31, '02-10-2019', 1111, 3, 799), (32, '03-10-2019', 1111, 2, 429), (33, '04-10-2019', 1111, 4, 300), (34, '05-10-2019', 1111, 5, 1000), (35, '06-10-2019', 1112, 7, 1300), (36, '07-10-2019', 1112, 9, 400), (37, '08-10-2019', 1113, 5, 2000), (38, '09-10-2019', 1113, 4, 300), (39, '10-10-2019', 1113, 3, 799), (40, '11-10-2019', 1113, 2, 858), (41, '01-10-2020', 1111, 1, 400), (42, '02-10-2020', 1111, 2, 429), (43, '03-10-2020', 1112, 7, 1300), (44, '04-10-2020', 1113, 7, 2600), (45, '05-10-2020', 1114, 7, 1300), (46, '06-10-2020', 1114, 7, 1300), (47, '07-10-2020', 1114, 9, 800)
Tried this:
SELECT PCY.Name, PCY.Year, PCY.SEQNUM FROM (SELECT P.Name AS Name, Extract('Year' from S.Date) AS YEAR, COUNT(P.Productid) AS CNT, RANK() OVER (PARTITION BY Extract('Year' from S.Date) ORDER BY COUNT(P.Productid) DESC) AS RANK FROM Sales S inner JOIN Product P ON S.Productid = P.Productid ) PCY WHERE PCY.RANK <= 10;
I am seeing this error:
ERROR: column "p.name" must appear in the GROUP BY clause or be used in an aggregate function LINE 2: FROM (SELECT P.Name AS Name, Extract('Year' from S.Date) AS ... ^ SQL state: 42803 Character: 52
Advertisement
Answer
I don’t understand why you don’t want to use an aggregate function when you have to aggregate over your data. This query works fine, without any issues on the GROUP BY:
WITH stats AS ( SELECT EXTRACT ( YEAR FROM DATE ) AS y, P.productid, P.NAME, COUNT ( * ) numbers_sold, RANK ( ) OVER ( PARTITION BY EXTRACT ( YEAR FROM DATE ) ORDER BY COUNT ( * ) DESC ) r FROM product P JOIN sales S ON S.Productid = P.Productid GROUP BY 1,2 ) SELECT y , name , numbers_sold FROM stats WHERE r <= 10;
This works because the productid is the primary key that has a functional dependency to the product name.
By the way, tested on version 12, but it should work on older and newer versions as well.