Skip to content
Advertisement

How can I UPDATE a table based on another table, using values from groups of rows?

I have two tables:

I would like to update values in PTO‘s WorkTypeId column:

  1. EMP NO in Employee (the lookup table) and PTO should match.
  2. A single WorkTypeId value should be picked from only the first occurrence of the month.

For example, given this sample input data:

TABLE Employee:

Site WorkTypeId Emp_NO Date
5015 MB 1005 2022-02-01
5015 MI 1005 2022-02-04
5015 PO 1005 2022-02-04
5015 ME 2003 2022-01-01
5015 TT 2003 2022-01-10

TABLE PTO:

Site WorkTypeId Emp_NO Date
5015 1005 2022-02-03
5015 1005 2022-02-14
5014 2003 2022-01-09

For example:

  • Given Employee with Emp_NO = 1005
    • …there are 3 rows for that Emp_NO in the Employee table, with 3 distinct WorkTypeId values, but differing Date values.
    • So pick the WorkTypeId value for the earliest Date (2022-02-01), which is 'MB'
    • So Emp_NO gets WorkTypeId = 'MB'.
    • And use that single value to fill 1005‘s WorkTypeId cells in the PTO table.
    • But also match by month.

So the expected output in the PTO table is

Site WorkTypeId Emp_NO Date
5015 MB 1005 2022-02-03
5015 MB 1005 2022-02-14
5014 ME 2003 2022-01-09

Advertisement

Answer

Getting a value from a column different to the column used in a MIN/MAX expression in a GROUP BY query still remains a surprisingly difficult thing to do in SQL, and while modern versions of the SQL language (and SQL Server) make it easier, they’re completely non-obvious and counter-intuitive to most people as it necessarily involves more advanced topics like CTEs, derived-tables (aka inner-queries), self-joins and windowing-functions despite the conceptually simple nature of the query.

Anyway, as-ever in modern SQL, there’s usually 3 or 4 different ways to accomplish the same task, with a few gotchas.

Preface:

  • As Site, Date, Year, and Month are all keywords in T-SQL, I’ve escaped them with double-quotes, which is the ISO/ANSI SQL Standards compliant way to escape reserved words.

    • SQL Server supports this by default. If (for some ungodly reason) you have SET QUOTED IDENTIFIER OFF then change the double-quotes to square-brackets: []
  • I assume that the Site column in both tables is just a plain’ ol’ data column, as such:

    • It is not a PRIMARY KEY member column.
    • It should not be used as a GROUP BY.
    • It should not be used in a JOIN predicate.
  • All of the approaches below assume this database state:

  • Both approaches define CTEs e and p that extend Employee and PTO respectively to add computed "Year" and "Month" columns, which avoids having to repeatedly use YEAR( "Date" ) AS "Year" in GROUP BY and JOIN expressions.
    • I suggest you add those as computed-columns in your base tables, if you’re able, as they’ll be useful generally anyway. Don’t forget to index them appropriately too.

Approach 1: Composed CTEs with elementary aggregates, then UPDATE:

Here’s a screenshot of SSMS showing the contents of the PTO table from before, and after, the above query runs:

enter image description here

Approach 2: Skip the self-JOIN with FIRST_VALUE:

This approach gives a shorter, slightly simpler query, but requires SQL Server 2012 or later (and that your database is running in compatibility-level 110 or higher).

Surprisingly, you cannot use FIRST_VALUE in a GROUP BY query, despite its obvious similarities with MIN, but an equivalent query can be built with SELECT DISTINCT:

Doing a SELECT * FROM PTO after this runs gives me the exact same output as Approach 2.

Approach 2b, but made shorter:

Just so @SOS doesn’t feel too smug about their SQL being considerably more shorter than mine 😉, the Approach 2 SQL above can be compacted down to this:

  • The execution-plans for both Approach 2 and Approach 2b are almost identical, excepting that Approach 2b has an extra Computed Scalar step for some reason.
  • The execution plans for Approach 1 and Approach 2 are very different, however, with Approach 1 having more branches than Approach 2 despite their similar semantics.
  • But my execution-plans won’t match yours because it’s very context-dependent, especially w.r.t. what indexes and PKs you have, and if there’s any other columns involved, etc.

Approach 1‘s plan looks like this:

enter image description here

Approach 2b‘s plan looks like this:

enter image description here

@SOS’s plan, for comparison, is a lot simpler… and I honestly don’t know why, but it does show how good SQL Server’s query optimizer is thesedays:

enter image description here

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement