SQL QUERIES
customers = pd.DataFrame({'customer_id': {0: 5386596, 1: 32676876}, 'created_at': {0: Timestamp('2017-01-27 00:00:00'), 1: Timestamp('2018-06-07 00:00:00')}, 'venture_code': {0: 'MY', 1: 'ID'}}) visits = Pd.DataFrame({'customer_id': {0: 3434886, 1: 10053}, 'date': {0: Timestamp('2016-10-02 00:00:00'), 1: Timestamp('2017-12-14 00:00:00')}}) orders = Pd.DataFrame({'order_id': {0: 112525, 1: 112525}, 'date': {0: Timestamp('2019-02-01 00:00:00'), 1: Timestamp('2019-02-01 00:00:00')}, 'sku': {0: 'SA108SH89OLAHK', 1: 'RO151AA60REHHK'}, 'customer_id': {0: 46160566, 1: 46160566}}) products = Pd.DataFrame({'sku': {0: 'SA108SH89OLAHK', 1: 'RO151AA60REHHK'}, 'brand': {0: 1, 1: 1}, 'supplier': {0: 'A', 1: 'B'}, 'category': {0: 'Mapp', 1: 'Macc'}, 'price': {0: 15, 1: 45}}) segment = Pd.DataFrame({'Age Range': {0: '<20', 1: '<20'}, 'Gender': {0: 'female', 1: 'female'}, 'Category': {0: 'Wsho', 1: 'Wapp'}, 'Discount %': {0: 0.246607432, 1: 0.174166503}, 'NMV': {0: 2509.580375, 1: 8910.447587}, '# Items': {0: 169, 1: 778}, '# Orders': {0: 15, 1: 135}}) buying = Pd.DataFrame({'Supplier Name': {0: 'A', 1: 'A'}, 'Brand Name': {0: 1, 1: 2}, '# SKU': {0: 506, 1: 267}, '# Item Before Return': {0: 5663, 1: 3256}, '# Item Returned': {0: 2776, 1: 1395}, 'Margin %': {0: 0.266922793, 1: 0.282847894}, 'GMV': {0: 191686.749171408, 1: 115560.037075292}})
Using SQL or Pandas, please tell me how to
1. Compare the monthly sales (GMV) trend in Q4 2019, across all countries (venture_code)
2. Show the top 10 brands for each product category, based on total sales (GMV)
I wrote but got the query wrong!
SELECT category, SUM(GMV) as Total_Sales FROM products INNER JOIN buying ON products.brand = buying.[Brand Name]
Advertisement
Answer
Concerning the error, you have a space in the column name.
In SQL, if the column has a space, use brackets to wrap the column name:
MyTable.[My Column]
In your code, use this SQL:
SELECT category, SUM(GMV) as Total_Sales FROM products INNER JOIN buying ON products.brand = buying.[Brand Name]
I don’t have access to your data, so I can’t test, but I think these queries are correct. You may need to tweak them some.
Part 1:
select c.venture_code, sum(b.GMV) GMVSum from customers c join orders o on c.customer_id = o.customer_id join products p on o.skuv=p.sku join buying b on p.brand = b.[Brand Name] and p.supplier = b.[Supplier Name] where o.date >= '2019-10-01' and o.date <= '2019-12-31' -- 2019 4th qtr group by c.venture_code
Part 2:
select * from (select *, RANK() over (PARTITION BY category,brand order by GMV) rk from (select p.brand, p.category, b.GMV from products p join buying b on p.brand = b.[Brand Name] and p.supplier = b.[Supplier Name]) x) xx where rk <= 10