I’m having a problem with duplicates using mysql server with stuff function. The database contains few million records and using distinct is out of question (it seems that the query does not even process).
This is my database structure:
PersonID Freetext Importance PersonID Freetext Importance PersonID Freetext Importance PersonID Freetext Importance PersonID Freetext Importance
I have been using following structure for this query. The query works fine (regarding stuff function), but it returns duplicates. For example, if there are five ID’s which are the same, the query returns five rows:
SELECT PersonID, Importance, Freetext = STUFF( (SELECT '~' + Freetext FROM TABLE t1 WHERE t1.PersonID = t2.PersonID FOR XML PATH ('')) , 1, 1, '') from TABLE t2 group By PersonID, Importance order by Importance
How to avoid this?
I would like to have all the freetexts that belong to the same PersonID stuffed together to one single row, ordering the stuffing so that the row that has smallest importance would have it’s freetext stuffed first.
Advertisement
Answer
Just guessing (can’t be sure this gets the desired result without knowing that or the source data), but if you want FOR XML PATH
to list the values in order of importance (smallest to largest), you need to move the ORDER BY
:
SELECT PersonID, [Freetext] = STUFF( (SELECT '~' + Freetext FROM dbo.[TABLE] AS t1 WHERE t1.PersonID = t2.PersonID ORDER BY Importance FOR XML PATH(''), TYPE ).value(N'./text()[1]', N'nvarchar(max)'), 1, 1, '') FROM dbo.[Table] AS t2 GROUP BY PersonID;
If you want to include importance somehow, you’ll need to concatenate it with the freetext; it didn’t make sense in the outer query or the group by (the group by importance was actually what was causing your duplicates).
If you need to prevent duplicate values for freetext, you’ll need to add grouping within the subquery. But that gets complicated too; what if you have two identical values for freetext but they were entered with different importance? Do you discard the one with higher or lower importance?
Going back to the original problem, on SQL Server 2017 and above, you can use a much simpler (and more efficient) struct:
SELECT PersonID, [Freetext] = STRING_AGG([Freetext],'~') WITHIN GROUP (ORDER BY Importance) FROM dbo.[Table] GROUP BY PersonID;