I have 4 tables that I want to link together based on the information in the sales_details table which will serve as the base table (limiting results based on the sales_details table)
1)I want to pull all columns from the sales_detail table that are mastercard orders but since the sales_detail table doesn’t have a column to identity the type of transaction it is, I have to:
- link the sales_detail table to the sales_total table by transaction_ID to get the pay_id
- use the pay_id from the sales_total table and link to payment type table in order to filter for payment type
- link product_id from sales_detail table to product_detail table to get product name
BASE TABLE : sales_detail +-------+-----------+-----------+-----------------------------------------+ | order_date | transaction_id| product_cost | product_id | country +-------+-----------+-----------+------------------------------------------+ | 10/1 | 12345 | 20 | 87956666 | usa | 10/1 | 12345 | 50 | 63333333 | usa | 10/5 | 82456 | 50 | 63333333 | usa | 10/9 | 64789 | 30 | 45665466 | canada | 10/12 | 08546 | 19 | 78979879 | usa | 10/15 | 87988 | 19 | 78979879 | usa | 10/17 | 65898 | 50 | 63333333 | canada +-------+-----------+-----------+-------------------------------------+
table : payment_type +-------+-----------+-----------+-------------------+ | pay_id | pay_type| payment | phone_number +-------+-----------+-----------+-------------------+ | 08585 | 24 | mastercard |214-444-1234 | | 07895 | 35 | visa |555-111-1234 | | 08585 | 24 | mastercard |214-444-1234 | | 08657 | 35 | visa |817-333-1234 | | 02345 | 24 | mastercard |214-555-1234 | | 02245 | 35 | visa |888-555-1234 | | 08785 | 24 | mastercard |240-555-1234 | | 06587 | 24 | mastercard |240-555-1234 | +-------+-----------+-----------+-------------------+
table : sales_total +-------+-----------+-----------+----------+ | pay_id | transaction_id| unit sold | +-------+-----------+-----------+----------+ | 08585 | 12345 | 2 | | 07895 | 82456 | 1 | | 08657 | 64789 | 1 | | 04568 | 32145 | 3 | | 02345 | 08546 | 1 | | 08785 | 87988 | 1 | | 06587 | 65898 | 1 | +-------+-----------+-----------+-----------+
table : product_detail +-------+-----------+-----------+--+ | product_name | product id +-------+-----------+-----------+--+ popcorn | 87956666 cheetos | 63333333 soda | 93333333 milk | 45665466 | water | 78979879 +-------+-----------+-----------+--+
**I want the output to look something like this: **
IDEAL OUTPUT +-------+-----------+-----------+--------------------------------------------+ | order_date | transaction_id| product_cost | product_id | product | +-------+-----------+-----------+--------------------------------------------+ | 10/1 | 12345 | 20 | 87956666 | popcorn | | 10/1 | 12345 | 50 | 63333333 | cheetos | | 10/12 | 08546 | 19 | 78979879 | water | | 10/15 | 87988 | 19 | 93333333 | soda | +-------+-----------+-----------+--------------------------------------------+
im trying to get all orders from the sales_detail table that have paid by mastercard and from the usa. I tried using left joins and not only it takes forever because of the large tables (doesnt even load) and it also gives me duplicates. What am i doing wrong?
This is the code i used but failed as it took over 20+hrs and timed out:
select t1.order_date, t1.transaction_id, t1.product_cost, t1.product_id t4.product_name from sales_detail t1 left join sales_total t2 on t1.transaction_id=t2.transaction_id left join payment_type t3 on t2.pay_id=t3.pay_id left join product_detail t4 on t1.product_id=t4.product_id where t1.order_date between '2020-10-1' and'2020-12-30' and t1.country not in ('canada') and t3.pay_type= 24;
there is no output for this code as after 20hrs it could never finish 🙁
Thanks in advance! I am a beginner so still learning the ins and outs of sql! (am using hive)
Advertisement
Answer
There is nothing seems to be wrong with your query and the reason you failed could be due to huge data present in sales table I would presume.
How ever lets give it a try by considering few points, as You only need columns from t1 and t4 , I would go for exists rather joining the tables without taking any columns to the select clause and second point , we can do a inner join between t2 and t3 considering t2 will never have a record with value null in pay_id column
select t1.order_date,t1.transaction_id,t1.product_cost,t1.product_id,t4.product_name from sales_detail t1 left join product_detail t4 on t1.product_id = t4.product_id where t1.order_date between '2020-10-1' and'2020-12-30' and t1.country not in ('canada') and exists (select 1 from sales_total t2 join payment_type t3 on t2.pay_id = t3.pay_id where t1.transaction_id = t2.transaction_id and t3.pay_type= 24);
If you still face performance issue try creating index on “transaction_id” and “product_id “.
I hope this help you taking a step forward. Thanks.