I have the below tables:
1) My Company table
id | c_name | c_code | status
----+------------+----------+--------
1 | AAAAAAAAAA | AA1234 | Active
2) My User table
id | c_id | u_name | status | emp_id
----+------------+----------+--------+--------
1 | 1 | XXXXXXXX | Active | 1
2 | 1 | YYYYYYYY | Active | 2
3) My Attendance table
id | u_id | swipe_time | status
----+--------+------------------------+--------
1 | 1 | 2020-08-20 16:00:00 | IN
2 | 1 | 2020-08-20 20:00:00 | OUT
3 | 1 | 2020-08-20 21:00:00 | IN
4 | 1 | 2020-08-21 01:00:00 | OUT
5 | 1 | 2020-08-21 16:00:00 | IN
6 | 1 | 2020-08-21 19:00:00 | OUT
I need to calculate the attendance grouped by date, u_id like below:
Note: The query parameters would be the “From Date”, “To Date” and “Company ID”
u_id | u_name | date | in_time | out_time | hrs
-----+-----------+-------------+----------------------+----------------------+-----
1 | XXXXXXXX | 2020-08-20 | 2020-08-20 16:00:00 | 2020-08-21 01:00:00 | 7
1 | XXXXXXXX | 2020-08-21 | 2020-08-21 16:00:00 | 2020-08-21 19:00:00 | 4
2 | YYYYYYYY | null | null | null | 0
Is this possible in PostgreSQL?
Advertisement
Answer
The tricky part is to expand one row that covers two (calendar) days to two rows and allocating the hours of the “next” day correctly.
The first part is to get a pivot table that combines IN/OUT pairs into a single row.
A simple (yet not very efficient) approach is:
select ain.u_id,
ain.swipe_time as time_in,
(select min(aout.swipe_time)
from attendance aout
where aout.u_id = ain.u_id
and aout.status = 'OUT'
and aout.swipe_time > ain.swipe_time) as time_out
from attendance ain
where ain.status = 'IN'
The next step is to break up the rows with more than one day into two rows.
This assumes that you never have an IN/OUT pair that covers more than two days!
with inout as (
select ain.u_id,
ain.swipe_time as time_in,
(select min(aout.swipe_time)
from attendance aout
where aout.u_id = ain.u_id
and aout.status = 'OUT'
and aout.swipe_time > ain.swipe_time) as time_out
from attendance ain
where ain.status = 'IN'
), expanded as (
select u_id,
time_in::date as "date",
time_in,
time_out
from inout
where time_in::date = time_out::date
union all
select i.u_id,
x.time_in::date as date,
x.time_in,
x.time_out
from inout i
cross join lateral (
select i.u_id,
i.time_in,
i.time_in::date + 1 as time_out
union all
select i.u_id,
i.time_out::date,
i.time_out
) x
where i.time_out::date > i.time_in::date
)
select *
from expanded;
The above returns the following for your sample data:
u_id | date | time_in | time_out
-----+------------+---------------------+--------------------
1 | 2020-08-20 | 2020-08-20 16:00:00 | 2020-08-20 20:00:00
1 | 2020-08-20 | 2020-08-20 21:00:00 | 2020-08-21 00:00:00
1 | 2020-08-21 | 2020-08-21 00:00:00 | 2020-08-21 01:00:00
1 | 2020-08-21 | 2020-08-21 16:00:00 | 2020-08-21 19:00:00
How does this work?
So we first select all those rows that start and end on the same day with this part:
select u_id,
time_in::date as "date",
time_in,
time_out
from inout
where time_in::date = time_out::date
The second part of the union splits up the rows that span two days by using a cross join that generates one row with the original start time and midnight, and another from midnight to the original end time:
select i.u_id,
x.time_in::date as date,
x.time_in,
x.time_out
from inout i
cross join lateral (
-- this generates a row for the first of the two days
select i.u_id,
i.time_in,
i.time_in::date + 1 as time_out
union all
-- this generates the row for the next day
select i.u_id,
i.time_out::date,
i.time_out
) x
where i.time_out::date > i.time_in::date
At the end the new “expanded” rows are then aggregated by grouping them by user and date and left joined to the users
table to get the username as well.
with inout as (
select ain.u_id,
ain.swipe_time as time_in,
(select min(aout.swipe_time)
from attendance aout
where aout.u_id = ain.u_id
and aout.status = 'OUT'
and aout.swipe_time > ain.swipe_time) as time_out
from attendance ain
where ain.status = 'IN'
), expanded as (
select u_id,
time_in::date as "date",
time_in,
time_out
from inout
where time_in::date = time_out::date
union all
select i.u_id,
x.time_in::date as date,
x.time_in,
x.time_out
from inout i
cross join lateral (
select i.u_id,
i.time_in,
i.time_in::date + 1 as time_out
union all
select i.u_id,
i.time_out::date,
i.time_out
) x
where i.time_out::date > i.time_in::date
)
select u.id,
u.u_name,
e."date",
min(e.time_in) as time_in,
max(e.time_out) as time_out,
sum(e.time_out - e.time_in) as duration
from users u
left join expanded e on u.id = e.u_id
group by u.id, u.u_name, e."date"
order by u.id, e."date";
Which then results in:
u_id | date | time_in | time_out | duration
-----+------------+---------------------+---------------------+----------------------------------------------
1 | 2020-08-20 | 2020-08-20 16:00:00 | 2020-08-21 00:00:00 | 0 years 0 mons 0 days 7 hours 0 mins 0.0 secs
1 | 2020-08-21 | 2020-08-21 00:00:00 | 2020-08-21 19:00:00 | 0 years 0 mons 0 days 4 hours 0 mins 0.0 secs
The “duration” column is an interval
which you can format to your liking.