Microsoft SQL Server Management Studio 14.0.17224.0
I have a dataset where users perform tasks for clients. Sometimes tasks can be done at the same time and sometimes not. I need to merge records that have the same time or the time difference between tasks is not more than 2 minutes, showing only the earliest time when the first task was performed and concatenate names of all tasks that are within 2 minutes time criteria.
I’ve created a code that merges records but I don’t know how to integrate “2 minutes criteria”, so currently all tasks performed by a user for the same customer are merged together and also cannot sort tasks in alphabetical order.
Here is the code:
WITH final AS ( SELECT CustomerID , Date , UserID , STUFF((SELECT '; ' + data2.[Task] FROM #data data2 WHERE data2.CustomerID = data1.CustomerID AND data2.UserID = data1.UserID ORDER BY data2.[Task] -- sorting doesn't work! FOR XML PATH('')), 1, 1, '') [All_Tasks] FROM #data data1 ) SELECT CustomerID , Date = MIN(Date) , UserID , [All Tasks] FROM final --WHERE -- I assume "2 minutes" criteria should be somewhere here or I might need to use UNION ALL somewhere above... GROUP BY CustomerID , UserID , [All_Tasks] ORDER BY CustomerID , Date ;
The final result should look like this:
Greatly appreciate any help!
Advertisement
Answer
This will help,
select CustomerID,UserID,min([Date]) [Date],STRING_AGG( TaskDSC,',') TaskDSC from #data c group by CustomerID,UserID, DATEPART(YEAR, [Date]), DATEPART(MONTH, [Date]), DATEPART(DAY, [Date]), DATEPART(HOUR,[Date]), (DATEPART(MINUTE, [Date]) / 3) order by UserID
And if you dont have STRING_AGG, check the below:-
select CustomerID,UserID,min([Date]) [Date], COUNT(*) [TaskCount] --,STRING_AGG( TaskDSC,',') TaskDSC, , stuff((select ',' + d.TaskDSC from #data d where d.CustomerID=c.CustomerID and d.UserID=c.UserID and d.Date between MIN(c.date) and max(c.date) for xml path('')), 1, 1, '') [TaskDSC] from #data c group by CustomerID,UserID, DATEPART(YEAR, [Date]), DATEPART(MONTH, [Date]), DATEPART(DAY, [Date]), DATEPART(HOUR,[Date]), (DATEPART(MINUTE, [Date]) / 3) order by UserID
My results is as below:-
CustomerID UserID Date TaskCount TaskDSC 91237531 ID1 2019-01-02 08:38:00.000 2 T001,T002 91256319 ID2 2019-01-02 14:15:00.000 3 T0001,T0002,T0003 91089280 ID3 2019-01-02 06:01:00.000 1 T1 91268324 ID4 2019-01-02 00:29:00.000 2 T101,T102 91268324 ID4 2019-01-02 00:35:00.000 1 T103 91268351 ID5 2019-01-02 16:56:00.000 1 T11111 91268351 ID5 2019-01-02 17:03:00.000 1 T22212
To group by data with a specific time periods we can use a group by with several Dateparts, for example, if we needed to group our data for each 5 years , we will use:
Group By DATEPART(YEAR, [Date])/5
As this will group by each 5 years, simply sql is deviding the result of the date to first get the year part, and then devide it by 5, resulting in the same number for each 5 years, so the data will be grouped for each five years. to see the result of this you can select the value to understand what sql server is seeing.
No if the period is in months lets say each two months, I think we can use the below:-
Group By DATEPART(YEAR, [Date]),DATEPART(MONTH, [Date])/2
In our requirement we needed each 2 minutes, so we need to add parts till the minuses, (there are other ways too, but this is simple)
DATEPART(YEAR, [Date]), DATEPART(MONTH, [Date]), DATEPART(DAY, [Date]), DATEPART(HOUR,[Date]), (DATEPART(MINUTE, [Date]) / 2)
but for the result provided , we used each 3, so the data will match the expected result, so really the solution is grouping each 3 minutes not two.
if we need each 15 sec, we simply add the (DATEPART(Second, [Date]) / 15) and so on.
If the time slots dont serve your solution, we can use a complex recursive function that will look at the first date a task accured and check exactly all the tasks within 2 min, then start with the one after it and go from there , please check the below for a more accurate solution for your question.
; with prev as ( select c.CustomerID,c.UserID,c.[Date],1 [Depth] from #data c union all select c.CustomerID,c.UserID,c.[Date],prev.depth+1 [Depth] from #data c inner join prev on prev.CustomerID=c.CustomerID and prev.UserID=c.UserID and c.[Date]>DATEADD(MINUTE,2,prev.[Date]) ) select prev.CustomerID,prev.UserID,MIN(prev.Date) [date] , stuff((select ',' + d.TaskDSC from #data d where d.CustomerID=prev.CustomerID and d.UserID=prev.UserID and d.[Date] between MIN(prev.[Date]) and DATEADD(MINUTE,2,MIN(prev.[Date])) for xml path('')), 1, 1, '') [TaskDSC], , (select count(*) from #data d where d.CustomerID=prev.CustomerID and d.UserID=prev.UserID and d.[Date] between MIN(prev.[Date]) and DATEADD(MINUTE,2,MIN(prev.[Date]))) [TaskCount] from prev group by prev.CustomerID,prev.UserID,depth order by prev.UserID,prev.CustomerID
I think this will solve all your concerns.