How can I UPDATE a table based on another table, using values from groups of rows?

Question

I have two tables: I would like to update values in PTO&#8217;s WorkTypeId column: EMP NO in Employee (the lookup table) and PTO should match. A single WorkTypeId value should be picked from only the first occurrence of the month. For example, given this sample input data: TABLE Employee: Site WorkTypeId Emp_…

Accepted Answer

Getting a value from a column different to the column used in a MIN/MAX expression in a GROUP BY query still remains a surprisingly difficult thing to do in SQL, and while modern versions of the SQL language (and SQL Server) make it easier, they&#8217;re completely non-obvious and counter-intuitive to most people as it necessarily involves more advanced topics like CTEs, derived-tables (aka inner-queries), self-joins and windowing-functions despite the conceptually simple nature of the query.Anyway, as-ever in modern SQL, there&#8217;s usually 3 or 4 different ways to accomplish the same task, with a few gotchas.Preface:As Site, Date, Year, and Month are all keywords in T-SQL, I&#8217;ve escaped them with double-quotes, which is the ISO/ANSI SQL Standards compliant way to escape reserved words.SQL Server supports this by default. If (for some ungodly reason) you have SET QUOTED IDENTIFIER OFF then change the double-quotes to square-brackets: []I assume that the Site column in both tables is just a plain&#8217; ol&#8217; data column, as such:It is not a PRIMARY KEY member column.It should not be used as a GROUP BY.It should not be used in a JOIN predicate.All of the approaches below assume this database state:CREATE TABLE "Employee" (    "Site"     int      NOT NULL,    WorkTypeId char(2)  NOT NULL,    Emp_NO     int      NOT NULL,    "Date"     date     NOT NULL  );CREATE TABLE "PTO" (    "Site"     int      NOT NULL,    WorkTypeId char(2)      NULL,    Emp_NO     int      NOT NULL,    "Date"     date     NOT NULL  );GOINSERT INTO "Employee" ( "Site", WorkTypeId, Emp_NO, "Date" )VALUES( 5015, 'MB', 1005, '2022-02-01' ),( 5015, 'MI', 1005, '2022-02-04' ),( 5015, 'PO', 1005, '2022-02-04' ),( 5015, 'ME', 2003, '2022-01-01' ),( 5015, 'TT', 2003, '2022-01-10' );INSERT INTO "PTO" ( "Site", WorkTypeId, Emp_NO, "Date" )VALUES( 5015, NULL, 1005, '2022-02-03' ),( 5015, NULL, 1005, '2022-02-14' ),( 5014, NULL, 2003, '2022-01-09' );Both approaches define CTEs e and p that extend Employee and PTO respectively to add computed "Year" and "Month" columns, which avoids having to repeatedly use YEAR( "Date" ) AS "Year" in GROUP BY and JOIN expressions.I suggest you add those as computed-columns in your base tables, if you&#8217;re able, as they&#8217;ll be useful generally anyway. Don&#8217;t forget to index them appropriately too.Approach 1: Composed CTEs with elementary aggregates, then UPDATE:WITH-- Step 1: Extend both the `Employee` and `PTO` tables with YEAR and MONTH columns (this simplifies things later on):e AS (    SELECT        Emp_No,        "Site",        WorkTypeId,        "Date",        YEAR( "Date" ) AS "Year",        MONTH( "Date" ) AS "Month"    FROM        Employee),p AS (    SELECT        Emp_No,        "Site",        WorkTypeId,        "Date",        YEAR( "Date" ) AS "Year",        MONTH( "Date" ) AS "Month"    FROM        PTO),-- Step 2: Get the MIN( "Date" ) value for each group:minDatesForEachEmployeeMonthYearGroup AS (    SELECT        e.Emp_No,        e."Year",        e."Month",        MIN( "Date" ) AS "FirstDate"    FROM        e    GROUP BY        e.Emp_No,        e."Year",        e."Month"),-- Step 3: INNER JOIN back on `e` to get the first WorkTypeId in each group:firstWorkTypeIdForEachEmployeeMonthYearGroup AS (    /* WARNING: This query will fail if multiple rows (for the same Emp_NO, Year and Month) have the same "Date" value. This can be papered-over with GROUP BY and MIN, but I don't think that's a good idea at all). */    SELECT        e.Emp_No,        e."Year",        e."Month",        e.WorkTypeId AS FirstWorkTypeId    FROM        e        INNER JOIN minDatesForEachEmployeeMonthYearGroup AS q ON            e.Emp_NO = q.Emp_NO            AND            e."Date" = q.FirstDate)-- Step 4: Do the UPDATE.-- *Yes*, you can UPDATE a CTE (provided the CTE is "simple" and has a 1:1 mapping back to source rows on-disk).UPDATE    pSET    p.WorkTypeId = f.FirstWorkTypeIdFROM    p    INNER JOIN firstWorkTypeIdForEachEmployeeMonthYearGroup AS f ON        p.Emp_No = f.Emp_No        AND        p."Year" = f."Year"        AND        p."Month" = f."Month"WHERE    p.WorkTypeId IS NULL;Here&#8217;s a screenshot of SSMS showing the contents of the PTO table from before, and after, the above query runs:Approach 2: Skip the self-JOIN with FIRST_VALUE:This approach gives a shorter, slightly simpler query, but requires SQL Server 2012 or later (and that your database is running in compatibility-level 110 or higher).Surprisingly, you cannot use FIRST_VALUE in a GROUP BY query, despite its obvious similarities with MIN, but an equivalent query can be built with SELECT DISTINCT:WITH-- Step 1: Extend the `Employee` table with YEAR and MONTH columns:e AS (    SELECT        Emp_No,        "Site",        WorkTypeId,        "Date",        YEAR( "Date" ) AS "Year",        MONTH( "Date" ) AS "Month"    FROM        Employee),firstWorkTypeIdForEachEmployeeMonthYearGroup AS (    SELECT        DISTINCT        e.Emp_No,        e."Year",        e."Month",        FIRST_VALUE( WorkTypeId ) OVER (            PARTITION BY                Emp_No,                e."Year",                e."Month"            ORDER BY                "Date" ASC        ) AS FirstWorkTypeId    FROM        e)-- Step 3: UPDATE PTO:UPDATE    pSET    p.WorkTypeId = f.FirstWorkTypeIdFROM    PTO AS p    INNER JOIN firstWorkTypeIdForEachEmployeeMonthYearGroup AS f ON        p.Emp_No = f.Emp_No        AND        YEAR( p."Date" ) = f."Year"        AND        MONTH( p."Date" ) = f."Month"WHERE    p.WorkTypeId IS NULL;Doing a SELECT * FROM PTO after this runs gives me the exact same output as Approach 2.Approach 2b, but made shorter:Just so @SOS doesn&#8217;t feel too smug about their SQL being considerably more shorter than mine ð, the Approach 2 SQL above can be compacted down to this:WITH empYrMoGroups AS (    SELECT        DISTINCT        e.Emp_No,        YEAR( e."Date" ) AS "Year",        MONTH( e."Date" ) AS "Month",        FIRST_VALUE( e.WorkTypeId ) OVER (            PARTITION BY                e.Emp_No,                YEAR( e."Date" ),                MONTH( e."Date" )            ORDER BY                e."Date" ASC        ) AS FirstWorkTypeId    FROM        Employee AS e)UPDATE    pSET    p.WorkTypeId = f.FirstWorkTypeIdFROM    PTO AS p    INNER JOIN empYrMoGroups AS f ON        p.Emp_No = f.Emp_No        AND        YEAR( p."Date" ) = f."Year"        AND        MONTH( p."Date" ) = f."Month"WHERE    p.WorkTypeId IS NULL;The execution-plans for both Approach 2 and Approach 2b are almost identical, excepting that Approach 2b has an extra Computed Scalar step for some reason.The execution plans for Approach 1 and Approach 2 are very different, however, with Approach 1 having more branches than Approach 2 despite their similar semantics.But my execution-plans won&#8217;t match yours because it&#8217;s very context-dependent, especially w.r.t. what indexes and PKs you have, and if there&#8217;s any other columns involved, etc.Approach 1&#8216;s plan looks like this:Approach 2b&#8216;s plan looks like this:@SOS&#8217;s plan, for comparison, is a lot simpler&#8230; and I honestly don&#8217;t know why, but it does show how good SQL Server&#8217;s query optimizer is thesedays:

How can I UPDATE a table based on another table, using values from groups of rows?

Advertisement

Answer

Preface:

Approach 1: Composed CTEs with elementary aggregates, then `UPDATE`:

Approach 2: Skip the self-`JOIN` with `FIRST_VALUE`:

Approach 2b, but made shorter:

Site	WorkTypeId	Emp_NO	Date
5015	MB	1005	2022-02-01
5015	MI	1005	2022-02-04
5015	PO	1005	2022-02-04
5015	ME	2003	2022-01-01
5015	TT	2003	2022-01-10

Site	WorkTypeId	Emp_NO	Date
5015	MB	1005	2022-02-03
5015	MB	1005	2022-02-14
5014	ME	2003	2022-01-09