Skip to content
Advertisement

Duplicates in pre-aggregated sub-query sql

I have two tables with many-to-many relationship. I am trying to get values from both of the table where UserId is unique (I’m joining these table on this value)

I am rying to use pre aggregated query, but I get error

Column 'clv.ProbabilityAlive' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

I understand that I should add these all values to group by clause, but then I am getting duplicates because peakClv values repeat.

If i am using simple join then it takes forever because of many to many relationship.

this is my query:

SELECT 
distinct(s.userid) as userId,
s.ProbabilityAlive AS ProbabilityAlive,
a.PeakClv as PeakClv
FROM (
SELECT [UserId], ([sb].[ProbabilityAlive]) AS ProbabilityAlive
FROM clv as  sb  
WHERE sb.[CalculationDate] = '20200311'
GROUP BY [UserId]
) s
LEFT JOIN (
SELECT [UserId], PeakClv
FROM [dbo].[AdditionalClvData] where peakClv > 1
GROUP BY [UserId]
) a ON a.[UserId] = s.[UserId]

I am a bit out of ideas could someone lend a hand?

I also tried using distinct like one answer suggested:

SELECT 
distinct (s.userid) as userId,
s.ProbabilityAlive AS ProbabilityAlive,
a.PeakClv as PeakClv
FROM (
SELECT DISTINCT ([UserId]), ([sb].[ProbabilityAlive]) AS 
ProbabilityAlive
FROM clv as  sb  
    WHERE  sb.[CalculationDate] = '10/09/2020 00:00:00' AND  sb. 
 [EstimatedNumberOfTransactionsLong] >= 0 AND  sb. 
 [EstimatedNumberOfTransactionsLong] <= 5680 AND  sb.[ClientId] = '16' 
 AND  sb.[Product] = 'Total'
 ORDER BY sb.[userId] asc OFFSET (1 - 1) * 10 ROWS FETCH NEXT 10 ROWS 
 ONLY  
 ) s
 LEFT JOIN (
 SELECT DISTINCT [UserId], PeakClv
 FROM [dbo].[AdditionalClvData]
 ) a ON a.[UserId] = s.[UserId]

but I still get duplicates: enter image description here

Advertisement

Answer

If you have not aggregation function like SUM(), MAX() .. you can’t use GROUP BY

SELECT 
distinct s.userid as userId,
s.ProbabilityAlive AS ProbabilityAlive,
a.PeakClv as PeakClv
FROM (
    SELECT DISTINCT [UserId], ([sb].[ProbabilityAlive]) AS ProbabilityAlive
    FROM clv as  sb  
    WHERE sb.[CalculationDate] = '20200311'

) s
LEFT JOIN (
SELECT DISTINCT  [UserId], PeakClv
FROM [dbo].[AdditionalClvData] where peakClv > 1

) a ON a.[UserId] = s.[UserId]

if you need distinct (not repeated rows) use distinct

but looking to you img seems you need an aggregation function on PeakClv eg max() and group by

SELECT 
 s.userid as userId,
s.ProbabilityAlive AS ProbabilityAlive,
max(a.PeakClv) as PeakClv
FROM (
    SELECT DISTINCT [UserId], ([sb].[ProbabilityAlive]) AS ProbabilityAlive
    FROM clv as  sb  
    WHERE sb.[CalculationDate] = '20200311'

) s
LEFT JOIN (
SELECT DISTINCT  [UserId], PeakClv
FROM [dbo].[AdditionalClvData] where peakClv > 1

) a ON a.[UserId] = s.[UserId]
GROUP BY s.userid, 
s.ProbabilityAlive
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement