Skip to content
Advertisement

SQL Join – Limit to base table

I have 4 tables that I want to link together based on the information in the sales_details table which will serve as the base table (limiting results based on the sales_details table)

1)I want to pull all columns from the sales_detail table that are mastercard orders but since the sales_detail table doesn’t have a column to identity the type of transaction it is, I have to:

  • link the sales_detail table to the sales_total table by transaction_ID to get the pay_id
  • use the pay_id from the sales_total table and link to payment type table in order to filter for payment type
  • link product_id from sales_detail table to product_detail table to get product name
BASE TABLE : sales_detail 
+-------+-----------+-----------+-----------------------------------------+
|   order_date | transaction_id|   product_cost |  product_id  |    country
+-------+-----------+-----------+------------------------------------------+
|   10/1     |   12345         |      20       |      87956666 |    usa
|   10/1     |   12345         |      50       |      63333333 |    usa
|   10/5     |  82456          |      50       |      63333333 |    usa
|   10/9     |  64789          |      30       |      45665466 |    canada
|   10/12    |  08546          |      19       |      78979879 |    usa
|   10/15    |  87988          |      19       |      78979879 |    usa
|   10/17    |  65898         |       50       |      63333333 |    canada
+-------+-----------+-----------+-------------------------------------+
table : payment_type
+-------+-----------+-----------+-------------------+
|   pay_id | pay_type|   payment  |  phone_number
+-------+-----------+-----------+-------------------+
|   08585  |   24    | mastercard   |214-444-1234    |
|   07895  |   35    | visa         |555-111-1234    |
|   08585  |   24    | mastercard   |214-444-1234    |
|   08657  |   35    | visa         |817-333-1234    |
|   02345  |   24    |  mastercard  |214-555-1234    |
|   02245  |   35   |  visa         |888-555-1234    |
|   08785  |   24   |  mastercard   |240-555-1234    |
|   06587  |   24   |  mastercard   |240-555-1234    |
+-------+-----------+-----------+-------------------+
table : sales_total
+-------+-----------+-----------+----------+
|   pay_id | transaction_id|   unit sold |  
+-------+-----------+-----------+----------+
|   08585  |   12345      |      2       |
|   07895  |   82456      |      1       |
|   08657  |   64789      |      1       |
|   04568  |   32145      |      3       |   
|   02345  |   08546      |      1       |
|   08785  |   87988      |      1       |
|   06587  |   65898      |      1       | 
+-------+-----------+-----------+-----------+
table : product_detail
+-------+-----------+-----------+--+
|   product_name    |  product id
+-------+-----------+-----------+--+
      popcorn      |      87956666
      cheetos      |      63333333
      soda         |      93333333
      milk         |      45665466
|     water        |      78979879
+-------+-----------+-----------+--+

**I want the output to look something like this: **

IDEAL OUTPUT
+-------+-----------+-----------+--------------------------------------------+
|   order_date | transaction_id|   product_cost |  product_id    |   product |
+-------+-----------+-----------+--------------------------------------------+
|   10/1     |   12345         |      20       |      87956666   |   popcorn |
|   10/1     |   12345         |      50       |      63333333   |   cheetos |
|   10/12    |   08546         |      19       |      78979879   |   water   |
|   10/15    |  87988          |      19       |      93333333   |   soda    |
+-------+-----------+-----------+--------------------------------------------+

im trying to get all orders from the sales_detail table that have paid by mastercard and from the usa. I tried using left joins and not only it takes forever because of the large tables (doesnt even load) and it also gives me duplicates. What am i doing wrong?

This is the code i used but failed as it took over 20+hrs and timed out:

select t1.order_date, 
t1.transaction_id,
t1.product_cost,
t1.product_id
t4.product_name
from sales_detail t1
left join sales_total t2 on t1.transaction_id=t2.transaction_id
left join payment_type t3 on t2.pay_id=t3.pay_id
left join product_detail t4 on t1.product_id=t4.product_id
where t1.order_date between '2020-10-1' and'2020-12-30'
and t1.country not in ('canada')
and t3.pay_type= 24;

there is no output for this code as after 20hrs it could never finish 🙁

Thanks in advance! I am a beginner so still learning the ins and outs of sql! (am using hive)

Advertisement

Answer

There is nothing seems to be wrong with your query and the reason you failed could be due to huge data present in sales table I would presume.

How ever lets give it a try by considering few points, as You only need columns from t1 and t4 , I would go for exists rather joining the tables without taking any columns to the select clause and second point , we can do a inner join between t2 and t3 considering t2 will never have a record with value null in pay_id column

select t1.order_date,t1.transaction_id,t1.product_cost,t1.product_id,t4.product_name 
from sales_detail t1  
left join product_detail t4 on t1.product_id = t4.product_id 
where t1.order_date between '2020-10-1' and'2020-12-30'
  and t1.country not in ('canada')
  and exists (select 1 
                from sales_total t2 
               join payment_type t3 on t2.pay_id = t3.pay_id
              where t1.transaction_id = t2.transaction_id
                and t3.pay_type= 24);

If you still face performance issue try creating index on “transaction_id” and “product_id “.

I hope this help you taking a step forward. Thanks.

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement