I’m sure this is very basic, but I’ve hit a brick wall. First post here, long time reader.
I’ve got two tables, one is a list of customers, one is a list of their activity by month. For some unknown reason, some of the data in the activity column is duplicated and it’s causing me errors when calculating totals. Is there a way for me to only join the data in the activity table once?
E.g the customer table:
CustID | AccountNo |
---|---|
331 | AccountNo1 |
332 | AccountNo2 |
333 | AccountNo2 |
334 | AccountNo2 |
The activity table looks like this:
CustID | ActivityID | Month | Sales |
---|---|---|---|
331 | 331133 | Jan-21 | £30 |
332 | 331101 | Jan-21 | £10 |
332 | 331121 | Feb-21 | £40 |
332 | 331196 | Feb-21 | £40 |
332 | 331141 | Mar-21 | £40 |
333 | 331120 | Feb-21 | £20 |
334 | 331119 | Mar-21 | £20 |
You will see there are two rows for customer 332 for Feb-21 with the same sales amount. Total sales for customer 332 are £90 not £130. How do I tell the JOIN to ignore multiple rows for a particular month if duplicated?
At the moment I’m doing a basic:
JOIN Activity AC (NOLOCK) ON C.CUSTID = AC.CUSTID
Obviously moving forward the activity table needs fixing but I’m hoping for a quick win!
Sorry if this is basic! Mark
Advertisement
Answer
One method is:
select c.*, a.* from customer c left join (select a.*, row_number() over (partition by custid, month order by amount) as seqnum from activity a ) a on c.custid = a.custid and c.seqnum = 1;
It is unclear if the amounts need to be the same for the duplication. If so, include amount
in the partition by
.