Skip to content
Advertisement

Issues with GROUP BY and NULL (SQL / BigQuery)

I have some bad data that I need to transform a bit and am struggling to do.

Below is example data and desired output.

What I’m hoping for is if PUNCHIN is NULL, but there is another value of PUNCHIN for that EmployeeNum and APPLY_DATE, then use the other value. Otherwise, use SHIFTSTARTTIME. And the same goes for PUNCHOUT and SHIFTENDTIME.

Originally I thought to use MIN(PUNCHIN) and GROUP BY but there can be multiple entries for EmployeeNum and APPLY_DATE, as seen for EmployeeNum 456

| EmployeeNum |     APPLY_DATE      |       PUNCHIN       |      PUNCHOUT       |   SHIFTSTARTTIME    |    SHIFTENDTIME     |
+-------------+---------------------+---------------------+---------------------+---------------------+---------------------+
|         123 | 2019-06-29T00:00:00 | null                | 2019-06-30T00:55:00 | 2019-06-29T17:00:00 | 2019-06-30T01:00:00 |
|         123 | 2019-06-29T00:00:00 | 2019-06-29T16:54:00 | null                | 2019-06-29T17:00:00 | 2019-06-30T01:00:00 |
|         456 | 2019-12-09T00:00:00 | 2019-12-09T16:15:00 | 2019-12-09T21:15:00 | 2019-12-09T16:15:00 | 2019-12-09T21:15:00 |
|         456 | 2019-12-09T00:00:00 | 2019-12-09T22:00:00 | 2019-12-10T02:00:00 | 2019-12-09T22:00:00 | 2019-12-10T02:00:00 |
|         789 | 2019-12-22T00:00:00 | null                | null                | 2019-12-09T22:00:00 | 2019-12-10T02:00:00 |
+-------------+---------------------+---------------------+---------------------+---------------------+---------------------+ ```

And the output I'm looking for is:

```+-------------+---------------------+---------------------+---------------------+---------------------+---------------------+
| EmployeeNum |     APPLY_DATE      |       PUNCHIN       |      PUNCHOUT       |     SHIFTSTART      |      SHIFTEND       |
+-------------+---------------------+---------------------+---------------------+---------------------+---------------------+
|         123 | 2019-06-29T00:00:00 | 2019-06-29T16:54:00 | 2019-06-30T00:55:00 | 2019-06-29T17:00:00 | 2019-06-30T01:00:00 |
|         456 | 2019-12-09T00:00:00 | 2019-12-09T16:15:00 | 2019-12-09T21:15:00 | 2019-12-09T16:15:00 | 2019-12-09T21:15:00 |
|         456 | 2019-12-09T00:00:00 | 2019-12-09T22:00:00 | 2019-12-10T02:00:00 | 2019-12-09T22:00:00 | 2019-12-10T02:00:00 |
|         789 | 2019-12-22T00:00:00 | 2019-12-09T22:00:00 | 2019-12-10T02:00:00 | 2019-12-09T22:00:00 | 2019-12-10T02:00:00 |
+-------------+---------------------+---------------------+---------------------+---------------------+---------------------+```

Advertisement

Answer

Hmmm . . . I think this does what you want:

select distinct employeenum, apply_date,
       coalesce(punchin, max(punchin) over (partition by employeenum, apply_date), apply_date),
       coalesce(punchout, max(punchout) over (partition by employeenum, apply_date), apply_date)
from t;
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement