Skip to content
Advertisement

Estimated number of rows is way off in execution plan

I have a situation where the estimated number of rows in the execution plan is way off

enter image description here

My columns in the join are varchar(50). I have tried different indexes but it does not reduce this problem. I have even tried with an index on the temp table. What else can I do?

PS this is the first place where the estimated number starts to drift… Also the tables are not big (48000 rows).

The code is:

SELECT DISTINCT householdnumber, householdid, primaryCustomerID
INTO #Households
FROM TableA

    
SELECT 
A.*,
MIN(B.[ProfileCreatedDate]) PROFILECREATEDDATE 
INTO #Profile
from #Households AS a 
LEFT JOIN TableA AS B
    
ON A.[HouseholdNumber]= B.[HouseholdNumber] and A.[HouseholdId]=B.[HouseholdId]
GROUP BY a.householdnumber, a.householdid, a.primaryCustomerID;

I know it seems that this can be rewritten as:

SELECT householdnumber, householdid, primaryCustomerID, MIN([ProfileCreatedDate]) AS PROFILECREATEDDATE 
INTO #Profile2
from TableA
GROUP BY householdnumber, householdid, primaryCustomerID;

But the results are not identical and I don’t want to change the results since I am not sure if the creator of this code knew what they were doing.

Some statistics on the columns: householdnumber is always equal to householdid. householdid is nvarchar(50) but householdnumber is varchar(40). The table has 48877 rows. Distinct combination of householdnumber, householdid, primaryCustomerID has 48029 rows. And distinct number of primaryCustomerID is 47152.

Advertisement

Answer

Regarding the code – it appears that the difference between the larger (original) version and your simpler GROUP BY version is that the original finds the minimum profilecreateddate for anyone in that household, whereas your simpler version finds the profilecreateddate for the specific primarycustomerid.

For example (using simpler data)

CREATE TABLE #TableA (householdnumber int, householdid int, primaryCustomerID int, ProfileCreatedDate datetime);
INSERT INTO #TableA (householdnumber, householdid, primaryCustomerID, ProfileCreatedDate) VALUES
(1, 1, 1, '20201001'),
(1, 1, 1, '20201002'),
(1, 1, 2, '20201003');

SELECT DISTINCT householdnumber, householdid, primaryCustomerID
INTO #Households
FROM #TableA;

SELECT 
    A.*,
    MIN(B.[ProfileCreatedDate]) PROFILECREATEDDATE 
INTO #Profile
from #Households AS a 
    LEFT JOIN #TableA AS B  
      ON A.[HouseholdNumber]= B.[HouseholdNumber] and A.[HouseholdId]=B.[HouseholdId]
GROUP BY a.householdnumber, a.householdid, a.primaryCustomerID;

SELECT * FROM #Profile;
/* -- Results
householdnumber  householdid  primaryCustomerID  PROFILECREATEDDATE
1                1            1                  2020-10-01 00:00:00.000
1                1            2                  2020-10-01 00:00:00.000
*/

SELECT householdnumber, householdid, primaryCustomerID, MIN([ProfileCreatedDate]) AS PROFILECREATEDDATE 
INTO #Profile2
from #TableA
GROUP BY householdnumber, householdid, primaryCustomerID;

SELECT * FROM #Profile2;
/* -- Results
householdnumber  householdid  primaryCustomerID  PROFILECREATEDDATE
1                1            1                  2020-10-01 00:00:00.000
1                1            2                  2020-10-03 00:00:00.000
*/

If you notice in the above, the PROFILECREATEDATE for row 2 is different.

You could therefore try the following code that should give the same results as the original set – see how that goes for time (and confirm it matches the original results).

SELECT DISTINCT t1.householdnumber, t1.householdid, primaryCustomerID, 
        MIN([ProfileCreatedDate]) OVER (PARTITION BY t1.householdnumber, t1.householdid) AS PROFILECREATEDDATE 
INTO #Profile3
FROM #TableA t1;

SELECT * FROM #Profile3;
/* -- Results
householdnumber  householdid  primaryCustomerID  PROFILECREATEDDATE
1                1            1                  2020-10-01 00:00:00.000
1                1            2                  2020-10-01 00:00:00.000
*/
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement