How to write a complex SQL Query?
Below information is the old data of users working on different services with service status and last change date on particular Service_Area and Service_Sector. All users are working on multiple services where some of the services are still active for a particular Service_Area and Service_Sector.
Requirements in words:
I require to select all columns when the Service of the user is inactive and only record on that particular Service_Area and Service_Sector of that user.
As earlier mentioned that a user can have multiple services on a Service_Area and Service_Sector combined. In case if there are more than one services on Service_Area and Service_Sector for the user, then we need to select the record that was last changed if all the services are inactive (I mean, when there is a record in between whose service status is active, then we cant select any user data from that Service_Area and Service_Sector of that user).
Scenarios:
Case 1: When there are 5-services, all of them are inactive then, we need to select the last changed record.
Case 2: When there is at least one service with active status among 5-services then, we cant select that user data from that particular Service_Area and Service_Sector of that user.
Case 3 When there is only one service on that Service_Area and Service_Sector of that user, it’s inactive then, the record will also get selected.
About Table:
Location_Code, User_Id, Service_Area, Service_Sector, Service will always make a unique combination. And there may be many services on this combination Location_Code, User_Id, Service_Area, Service_Sector.
Please find the sample table below with the result table (display expected output).
Sample Table:
Loc_Code | User_Id | Service_Area | Service_Sector | Service | Status | Last_Changed | |
---|---|---|---|---|---|---|---|
101 | 1001 | C | 1 | PAINT | INACTIVE | 11/28/2020 | |
101 | 1002 | A | 1 | WOOD | INACTIVE | 12/7/2020 | |
101 | 1002 | A | 1 | CLEANING | ACTIVE | 11/23/2020 | |
101 | 1002 | A | 1 | PEST | INACTIVE | 12/7/2020 | |
101 | 1002 | A | 2 | LIGHT | INACTIVE | 12/7/2020 | ` |
101 | 1002 | B | 2 | AC | INACTIVE | 11/28/2020 | |
101 | 1002 | B | 2 | HEATER | INACTIVE | 11/30/2020 |
Result Table:
Total 3-records are selected.
- The first record is selected as the user was working on a single inactive service on a particular Service_Area and Service_Sector.
- The second record is selected because Location_Code, User_Id, Service_Area, Service_Sector combination has only one service on it, which is inactive.
- The third record selected because Location_Code, User_Id, Service_Area, Service_Sector combination list 2-inactive services, selected the service which was last changed.
Loc_Code | User_Id | Service_Area | Service_Sector | Service | Status | Last_Changed |
---|---|---|---|---|---|---|
101 | 1001 | c | 1 | PAINT | INACTIVE | 11/28/2020 |
101 | 1002 | A | 2 | LIGHT | INACTIVE | 12/7/2020 |
101 | 1002 | B | 2 | HEATER | INACTIVE | 11/30/2020 |
Advertisement
Answer
Test
WITH cte AS ( SELECT *, !SUM(status != 'INACTIVE') OVER (PARTITION BY loc_code, user_id, service_area, service_sector) only_inactive, ROW_NUMBER() OVER (PARTITION BY loc_code, user_id, service_area, service_sector ORDER BY last_changed DESC) rn FROM test ) SELECT * FROM cte WHERE only_inactive AND rn = 1
how this query is working?
WITH is a clause which creates… think that this is a subquery.
What I do in it? I calculate 2 additional columns. Both uses window functions – it is similar to aggregate function, but with its own grouping and ordering.
First summarises the result of checking does a column not contain specified value. For 'INACTIVE'
the checking result is FALSE which is treated as zero, for all another it is TRUE and 1 respectively. So if none 'INACTIVE'
in this group the sum is zero, and above zero if at least one another value meet. NOT inverts this result.
Second enumerates rows in a group by specified ordering.
In outer query I simply select rows which have non-zero checking result (i.e. the rows from groups which have no another statuses than ‘INACTIVE’), and from these rows only the row which number is 1 (taking into account the ordering – the most recent row).
That’s all.