Estimated number of rows is way off in execution plan

Question

I have a situation where the estimated number of rows in the execution plan is way off. My columns in the join are varchar(50). I have tried different indexes, but they do not reduce this problem. I ...

Accepted Answer

Regarding the code: it appears that the difference between the larger (original) version and your simpler GROUP BY version is that the original finds the minimum profilecreateddate for anyone in that household, whereas your simpler version finds the minimum profilecreateddate for the specific primarycustomerid.

For example (using simpler data):

    CREATE TABLE #TableA (householdnumber int, householdid int, primaryCustomerID int, ProfileCreatedDate datetime);

    INSERT INTO #TableA (householdnumber, householdid, primaryCustomerID, ProfileCreatedDate)
    VALUES (1, 1, 1, '20201001'),
           (1, 1, 1, '20201002'),
           (1, 1, 2, '20201003');

    -- Mirrors the original: distinct customers per household,
    -- joined back to every row in the same household
    SELECT DISTINCT householdnumber, householdid, primaryCustomerID
    INTO #Households
    FROM #TableA;

    SELECT a.*,
           MIN(b.ProfileCreatedDate) AS PROFILECREATEDDATE
    INTO #Profile
    FROM #Households AS a
        LEFT JOIN #TableA AS b
            ON a.householdnumber = b.householdnumber
           AND a.householdid = b.householdid
    GROUP BY a.householdnumber, a.householdid, a.primaryCustomerID;

    SELECT * FROM #Profile;
    /* -- Results
    householdnumber  householdid  primaryCustomerID  PROFILECREATEDDATE
    1                1            1                  2020-10-01 00:00:00.000
    1                1            2                  2020-10-01 00:00:00.000
    */

    -- Simpler version: primaryCustomerID is part of the GROUP BY,
    -- so the minimum is taken per customer rather than per household
    SELECT householdnumber, householdid, primaryCustomerID,
           MIN(ProfileCreatedDate) AS PROFILECREATEDDATE
    INTO #Profile2
    FROM #TableA
    GROUP BY householdnumber, householdid, primaryCustomerID;

    SELECT * FROM #Profile2;
    /* -- Results
    householdnumber  householdid  primaryCustomerID  PROFILECREATEDDATE
    1                1            1                  2020-10-01 00:00:00.000
    1                1            2                  2020-10-03 00:00:00.000
    */

Notice that PROFILECREATEDDATE for the second row (primaryCustomerID 2) is different: the original returns the household's earliest date (2020-10-01), while the simpler version returns that customer's own earliest date (2020-10-03).

You could therefore try the following code, which should give the same results as the original set. See how that goes for time (and confirm it matches the original results).

    -- The window MIN is computed per household, then DISTINCT
    -- collapses the result to one row per customer
    SELECT DISTINCT t1.householdnumber, t1.householdid, t1.primaryCustomerID,
           MIN(t1.ProfileCreatedDate) OVER (PARTITION BY t1.householdnumber, t1.householdid) AS PROFILECREATEDDATE
    INTO #Profile3
    FROM #TableA AS t1;

    SELECT * FROM #Profile3;
    /* -- Results
    householdnumber  householdid  primaryCustomerID  PROFILECREATEDDATE
    1                1            1                  2020-10-01 00:00:00.000
    1                1            2                  2020-10-01 00:00:00.000
    */
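For the "confirm it matches the original results" step, one option is a two-way EXCEPT between the two result tables. This is a sketch that assumes it runs in the same session as the script above, so that #Profile (the original join-and-GROUP BY version) and #Profile3 (the window-function version) both still exist; on your real data, substitute whatever tables hold the two result sets. EXCEPT compares whole rows and treats NULLs as equal, so if both EXCEPT queries return no rows, every distinct row appears in both tables. Because EXCEPT ignores duplicates, the last query also compares the raw row counts.

```sql
-- Assumes #Profile and #Profile3 were created earlier in this session.

-- Rows the original query produced that the window-function version did not
SELECT householdnumber, householdid, primaryCustomerID, PROFILECREATEDDATE FROM #Profile
EXCEPT
SELECT householdnumber, householdid, primaryCustomerID, PROFILECREATEDDATE FROM #Profile3;

-- Rows the window-function version produced that the original did not
SELECT householdnumber, householdid, primaryCustomerID, PROFILECREATEDDATE FROM #Profile3
EXCEPT
SELECT householdnumber, householdid, primaryCustomerID, PROFILECREATEDDATE FROM #Profile;

-- EXCEPT works on distinct rows, so also compare the raw row counts
SELECT (SELECT COUNT(*) FROM #Profile)  AS ProfileRows,
       (SELECT COUNT(*) FROM #Profile3) AS Profile3Rows;
```

With the sample #TableA data above, both EXCEPT queries return no rows and both counts are 2.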


Answer