Skip to content
Advertisement

How to merge rows based on date/time

Microsoft SQL Server Management Studio 14.0.17224.0

I have a dataset where users perform tasks for clients. Sometimes tasks can be done at the same time and sometimes not. I need to merge records that have the same time or the time difference between tasks is not more than 2 minutes, showing only the earliest time when the first task was performed and concatenate names of all tasks that are within 2 minutes time criteria.

enter image description here

I’ve created a code that merges records but I don’t know how to integrate “2 minutes criteria”, so currently all tasks performed by a user for the same customer are merged together and also cannot sort tasks in alphabetical order.

Here is the code:

WITH final AS
( 
    SELECT
      CustomerID
      , Date
      , UserID
      , STUFF((SELECT '; ' + data2.[Task]
          FROM 
          #data data2
          WHERE data2.CustomerID  = data1.CustomerID AND data2.UserID = data1.UserID
          ORDER BY data2.[Task] -- sorting doesn't work!
          FOR XML PATH('')), 1, 1, '') [All_Tasks]
    FROM #data data1
)
SELECT
  CustomerID
  , Date = MIN(Date)
  , UserID
  , [All Tasks]
FROM final
--WHERE -- I assume "2 minutes" criteria should be somewhere here or I might need to use UNION ALL somewhere above...
GROUP BY 
  CustomerID
  , UserID
  , [All_Tasks]
ORDER BY  
  CustomerID
  , Date
;

The final result should look like this:

enter image description here

Greatly appreciate any help!

Advertisement

Answer

This will help,

select CustomerID,UserID,min([Date]) [Date],STRING_AGG( TaskDSC,',') TaskDSC
from #data c
group by CustomerID,UserID,
            DATEPART(YEAR, [Date]),
            DATEPART(MONTH, [Date]),
            DATEPART(DAY, [Date]),
            DATEPART(HOUR,[Date]),
            (DATEPART(MINUTE, [Date]) / 3)
            order by  UserID

And if you dont have STRING_AGG, check the below:-

select CustomerID,UserID,min([Date]) [Date], COUNT(*) [TaskCount]
--,STRING_AGG( TaskDSC,',') TaskDSC,
, stuff((select ',' + d.TaskDSC from #data d 
    where d.CustomerID=c.CustomerID and d.UserID=c.UserID and d.Date between MIN(c.date) and max(c.date)
      for xml path('')), 1, 1, '') [TaskDSC]
from #data c
group by CustomerID,UserID,
            DATEPART(YEAR, [Date]),
            DATEPART(MONTH, [Date]),
            DATEPART(DAY, [Date]),
            DATEPART(HOUR,[Date]),
            (DATEPART(MINUTE, [Date]) / 3)
            order by  UserID

My results is as below:-

CustomerID  UserID  Date                    TaskCount   TaskDSC
91237531    ID1     2019-01-02 08:38:00.000 2           T001,T002
91256319    ID2     2019-01-02 14:15:00.000 3           T0001,T0002,T0003
91089280    ID3     2019-01-02 06:01:00.000 1           T1
91268324    ID4     2019-01-02 00:29:00.000 2           T101,T102
91268324    ID4     2019-01-02 00:35:00.000 1           T103
91268351    ID5     2019-01-02 16:56:00.000 1           T11111
91268351    ID5     2019-01-02 17:03:00.000 1           T22212

To group by data with a specific time periods we can use a group by with several Dateparts, for example, if we needed to group our data for each 5 years , we will use:

Group By DATEPART(YEAR, [Date])/5

As this will group by each 5 years, simply sql is deviding the result of the date to first get the year part, and then devide it by 5, resulting in the same number for each 5 years, so the data will be grouped for each five years. to see the result of this you can select the value to understand what sql server is seeing.

No if the period is in months lets say each two months, I think we can use the below:-

Group By DATEPART(YEAR, [Date]),DATEPART(MONTH, [Date])/2

In our requirement we needed each 2 minutes, so we need to add parts till the minuses, (there are other ways too, but this is simple)

DATEPART(YEAR, [Date]),
                DATEPART(MONTH, [Date]),
                DATEPART(DAY, [Date]),
                DATEPART(HOUR,[Date]),
                (DATEPART(MINUTE, [Date]) / 2)

but for the result provided , we used each 3, so the data will match the expected result, so really the solution is grouping each 3 minutes not two.

if we need each 15 sec, we simply add the (DATEPART(Second, [Date]) / 15) and so on.

If the time slots dont serve your solution, we can use a complex recursive function that will look at the first date a task accured and check exactly all the tasks within 2 min, then start with the one after it and go from there , please check the below for a more accurate solution for your question.

; with prev as (
        select c.CustomerID,c.UserID,c.[Date],1 [Depth] from #data c
        union all 
        select c.CustomerID,c.UserID,c.[Date],prev.depth+1 [Depth] from #data c
        inner join prev on prev.CustomerID=c.CustomerID and prev.UserID=c.UserID and c.[Date]>DATEADD(MINUTE,2,prev.[Date])
    )

    select prev.CustomerID,prev.UserID,MIN(prev.Date) [date]
         , stuff((select ',' + d.TaskDSC from #data d 
             where d.CustomerID=prev.CustomerID and d.UserID=prev.UserID and d.[Date] between MIN(prev.[Date]) and DATEADD(MINUTE,2,MIN(prev.[Date]))
            for xml path('')), 1, 1, '') [TaskDSC],
         , (select count(*) from #data d 
               where d.CustomerID=prev.CustomerID 
                  and d.UserID=prev.UserID 
                  and d.[Date] between MIN(prev.[Date]) and DATEADD(MINUTE,2,MIN(prev.[Date]))) [TaskCount]
     from prev 
     group by prev.CustomerID,prev.UserID,depth
     order by prev.UserID,prev.CustomerID

I think this will solve all your concerns.

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement