Skip to content
Advertisement

How to remove duplicates in CTE from VARIABLES and TEMPORARY TABLE

I was looking for this solution in internet and none of them worked properly. I decided to create variables, then accommodates it in the user defined function:

CREATE FUNCTION Top10CustomerByCategoryInYear
(
    @Category varchar(MAX),
    @StartYear int,
    @EndYear int
)
RETURNS @CustomerTop10TableByCategoryInYears TABLE
(
    [Customer Name]  VARCHAR(MAX),
    [Category] VARCHAR(MAX),
    Sales int,
    Year int
)
AS
BEGIN 
    INSERT INTO @CustomerTop10TableByCategoryInYears
    SELECT 
        DISTINCT TOP 10 WITH TIES
        [Customer Name], 
        [Category], 
        SUM(Sales) OVER (PARTITION BY [Customer Name]) as Sales,
        YearOfOrderDate as [Year]
    FROM [Project4].[dbo].[SuperStore]
    WHERE [Category] = @Category AND YearOfOrderDate BETWEEN @StartYear AND @EndYear
    ORDER BY Sales DESC 
    RETURN;
END
GO

It works properly, because for example I am able to return the top 10 Customer in the Furniture Category in 2011 and 2012

SELECT *
FROM Top10CustomerByCategoryInYear('Furniture',2011,2011)
SELECT * 
FROM Top10CustomerByCategoryInYear('Furniture',2012,2012)

The result The result

Then, I would like to have all records (2011-2014) in one Table. I decided to use CTE, then JOIN these Tables by Category:

WITH Furniture2011 AS (
    SELECT DISTINCT *
    FROM Top10CustomerByCategoryInYear('Furniture',2011,2011)
), Furniture2012 AS (
SELECT DISTINCT *
FROM Top10CustomerByCategoryInYear('Furniture',2012,2012)
), Furniture2013 AS (
SELECT DISTINCT *
FROM Top10CustomerByCategoryInYear('Furniture',2013,2013)
),  Furniture2014 AS  (
SELECT DISTINCT *
FROM Top10CustomerByCategoryInYear('Furniture',2014,2014)
)
SELECT  DISTINCT *
FROM Furniture2011 F1
        JOIN Furniture2012 F2
        ON F1.Category = F2.Category
        JOIN Furniture2013 F3
        ON F1.Category = F3.Category
        JOIN Furniture2014 F4
        ON F1.Category = F4.Category

Unfortunately there are a lot of duplicates:

[Duplicates3

I tried a lot of solutions from my mind and internet and unfortunately these did not work. I was wondering if it is possible to combine these Tables using UDF/Variables, TEMP TABLE and finally CTE. Could you please tell me if it is possible? If so, would you be so kind as to support me in modifying my codes in SQL to have the desired result without any duplicates?

I would appreciate it if you could advise me on it.

Advertisement

Answer

You can have an inline Table function taking a start and end year. Inline is faster than multi-statement.

Start with a virtual table of years, then CROSS APPLY the top 10 results for each year.

CREATE FUNCTION Top10CustomerByCategoryInYear
(
    @Category varchar(MAX),
    @StartYear int,
    @EndYear int
)
RETURNS TABLE
AS RETURN

    WITH L0 AS (
        SELECT @StartYear - 1 + ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS Year
            FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),
                        (1),(1),(1),(1),(1),(1),(1),(1),
                        (1),(1),(1),(1),(1),(1),(1),(1),
                        (1),(1),(1),(1),(1),(1),(1),(1)) AS D(c)
    ),
    Years AS (
        SELECT * FROM L0 WHERE Year <= @EndYear
    )
    SELECT
        s.[Customer Name], 
        @Category AS [Category], 
        s.Sales,
        y.Year
    FROM Years y
    CROSS APPLY (
        SELECT TOP 10 WITH TIES
            [Customer Name],
            SUM(Sales) AS Sales
        FROM [dbo].[SuperStore] s
        WHERE [Category] = @Category
          AND y.Year = s.YearOfOrderDate
        GROUP BY [Customer Name]
        ORDER BY Sales DESC
    ) s;

GO
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement