Is it possible to remove duplicates from the result of a join?

I have the following two tables, dim_customers and fact_daily_customer_shipments:

I want to join them to get a table with the following schema:

Example results:

The SQL I came up with:

This SQL doesn’t give sensible results: customer_id values repeat in both tables, so joining the tables on the key attribute customer_id yields duplicate rows.

Any thoughts on what the correct SQL approach would be?

Answer

You are seeing duplicates because the dim_customers table has two entries with the same customer_id value (but different membership dates), so a join on customer_id alone matches every shipment to both entries. To fix this, change the JOIN condition to include the membership dates as well, so that each shipment matches only the membership entry that applies to it. If you then change the join to a LEFT JOIN, you can tell whether a customer was a member at the time of a shipment: when no membership entry matches, the customer_id value coming from dim_customers is NULL. So the query you should use is:

Output:

SQLFiddle Demo
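
As a minimal sketch of the approach described above (not the answer's exact query), the following runs the idea against an in-memory SQLite database from Python. The column names membership_start_date, membership_end_date, shipment_date and shipment_count, the is_member flag, and the sample rows are all assumptions made for illustration; substitute the real columns of your tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schemas: only the table names and customer_id come from the question.
CREATE TABLE dim_customers (
    customer_id           INTEGER,
    membership_start_date TEXT,
    membership_end_date   TEXT
);
CREATE TABLE fact_daily_customer_shipments (
    customer_id    INTEGER,
    shipment_date  TEXT,
    shipment_count INTEGER
);
-- Customer 1 appears twice: same customer_id, different membership dates.
INSERT INTO dim_customers VALUES
    (1, '2019-01-01', '2019-01-31'),
    (1, '2019-03-01', '2019-03-31');
INSERT INTO fact_daily_customer_shipments VALUES
    (1, '2019-01-15', 3),
    (1, '2019-02-15', 2),
    (1, '2019-03-15', 5);
""")

# Joining on customer_id alone pairs every shipment with every membership row.
naive_count = conn.execute("""
    SELECT COUNT(*)
    FROM fact_daily_customer_shipments AS f
    JOIN dim_customers AS c ON c.customer_id = f.customer_id
""").fetchone()[0]

# Adding the membership dates to the join condition keeps one row per shipment;
# the LEFT JOIN leaves c.customer_id NULL when no membership entry matches.
rows = conn.execute("""
    SELECT f.customer_id,
           f.shipment_date,
           f.shipment_count,
           CASE WHEN c.customer_id IS NULL THEN 'N' ELSE 'Y' END AS is_member
    FROM fact_daily_customer_shipments AS f
    LEFT JOIN dim_customers AS c
           ON c.customer_id = f.customer_id
          AND f.shipment_date BETWEEN c.membership_start_date
                                  AND c.membership_end_date
    ORDER BY f.shipment_date
""").fetchall()

print(naive_count)
for row in rows:
    print(row)
```

The dates are stored as ISO 8601 strings, so BETWEEN compares them correctly as text in SQLite. Note that if two membership entries for the same customer overlap in time, a shipment inside the overlap would still match both and produce a duplicate row.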

User contributions licensed under: CC BY-SA