Skip to content
Advertisement

How do I extract these SQL queries from these pandas dataframes?

SQL QUERIES

customers = pd.DataFrame({'customer_id': {0: 5386596, 1: 32676876}, 'created_at': {0: Timestamp('2017-01-27 00:00:00'), 1: Timestamp('2018-06-07 00:00:00')}, 'venture_code': {0: 'MY', 1: 'ID'}})

visits = Pd.DataFrame({'customer_id': {0: 3434886, 1: 10053}, 'date': {0: Timestamp('2016-10-02 00:00:00'), 1: Timestamp('2017-12-14 00:00:00')}})

orders = Pd.DataFrame({'order_id': {0: 112525, 1: 112525}, 'date': {0: Timestamp('2019-02-01 00:00:00'), 1: Timestamp('2019-02-01 00:00:00')}, 'sku': {0: 'SA108SH89OLAHK', 1: 'RO151AA60REHHK'}, 'customer_id': {0: 46160566, 1: 46160566}})

products = Pd.DataFrame({'sku': {0: 'SA108SH89OLAHK', 1: 'RO151AA60REHHK'}, 'brand': {0: 1, 1: 1}, 'supplier': {0: 'A', 1: 'B'}, 'category': {0: 'Mapp', 1: 'Macc'}, 'price': {0: 15, 1: 45}})

segment = Pd.DataFrame({'Age Range': {0: '<20', 1: '<20'},
 'Gender': {0: 'female', 1: 'female'},
 'Category': {0: 'Wsho', 1: 'Wapp'},
 'Discount %': {0: 0.246607432, 1: 0.174166503},
 'NMV': {0: 2509.580375, 1: 8910.447587},
 '# Items': {0: 169, 1: 778},
 '# Orders': {0: 15, 1: 135}})

buying = Pd.DataFrame({'Supplier Name': {0: 'A', 1: 'A'},
 'Brand Name': {0: 1, 1: 2},
 '# SKU': {0: 506, 1: 267},
 '# Item Before Return': {0: 5663, 1: 3256},
 '# Item Returned': {0: 2776, 1: 1395},
 'Margin %': {0: 0.266922793, 1: 0.282847894},
 'GMV': {0: 191686.749171408, 1: 115560.037075292}})

Using SQL or Pandas, please tell me how to

1. Compare the monthly sales (GMV) trend in Q4 2019, across all countries (venture_code)

2. Show the top 10 brands for each product category, based on total sales (GMV)

I wrote but got the query wrong!

 SELECT category, SUM(GMV) as Total_Sales FROM products INNER JOIN buying ON products.brand = buying.[Brand Name]

Advertisement

Answer

Concerning the error, you have a space in the column name.

In SQL, if the column has a space, use brackets to wrap the column name:

MyTable.[My Column]

In your code, use this SQL:

SELECT category, SUM(GMV) as Total_Sales FROM products INNER JOIN buying ON products.brand = buying.[Brand Name]

I don’t have access to your data, so I can’t test, but I think these queries are correct. You may need to tweak them some.

Part 1:

select c.venture_code, sum(b.GMV) GMVSum from customers c join orders o on c.customer_id = o.customer_id 
   join products p on o.skuv=p.sku
   join buying b on p.brand = b.[Brand Name] and p.supplier = b.[Supplier Name]
   where o.date >= '2019-10-01' and o.date <= '2019-12-31'  -- 2019 4th qtr
   group by c.venture_code

Part 2:

select * from 
(select *, RANK() over (PARTITION BY category,brand order by GMV) rk from
(select p.brand, p.category, b.GMV from products p join buying b on p.brand = b.[Brand Name] and p.supplier = b.[Supplier Name]) x) xx
where rk <= 10
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement