Let’s say my sql dataset looks like this and it has one column:
John John,2345 Joe,1234 Joe Joe,1234 John John,2345 John,2345 Smith Smith
I am going to count the number of times Joe, John, and Smith appear and group them by only their names’ substring. I know I can select it as a sub-string, but that does not work for my case as it shows the sub-string version, not the full string version. The query result that I like to have as an output would be:
("John,2345",5) ("Joe,1234",3) ("Smith",2)
So because John and Joe have the comma, we need to select the whole string, if they don’t have the comma like Smith, we just show the substring which is just name. Note that the number after the comma is the person’s ID. So it will be unique to each name if it exists.
Advertisement
Answer
One concise approach, using SUBSTRING_INDEX
to isolate the base name in each record:
SELECT CONCAT('("', MAX(name), '",', COUNT(*), ')') AS output FROM ( SELECT name, SUBSTRING_INDEX(name, ',', 1) AS name_only FROM yourTable ) t GROUP BY name_only;
The subquery first generates a name only for each input name, which is the portion of the name preceding the comma (should a comma exist). Then, we aggregate by this name only, but select the max of the name, for each name group, to report the full name which you want to see.
Note that as mentioned in the above comments, your table design is not optimal, because it is storing CSV data. Needing a query with the above complexity just to get basic counts might be indication that it is time for a design change.