Skip to content
Advertisement

Group by sub-string while selecting the full string

Let’s say my sql dataset looks like this and it has one column:

John          
John,2345    
Joe,1234      
Joe
Joe,1234
John
John,2345
John,2345
Smith
Smith

I am going to count the number of times Joe, John, and Smith appear and group them by only their names’ substring. I know I can select it as a sub-string, but that does not work for my case as it shows the sub-string version, not the full string version. The query result that I like to have as an output would be:

("John,2345",5)
("Joe,1234",3)
("Smith",2)

So because John and Joe have the comma, we need to select the whole string, if they don’t have the comma like Smith, we just show the substring which is just name. Note that the number after the comma is the person’s ID. So it will be unique to each name if it exists.

Advertisement

Answer

One concise approach, using SUBSTRING_INDEX to isolate the base name in each record:

SELECT
    CONCAT('("', MAX(name), '",', COUNT(*), ')') AS output
FROM
(
    SELECT name, SUBSTRING_INDEX(name, ',', 1) AS name_only
    FROM yourTable
) t
GROUP BY name_only;

screen capture from demo link below

Demo

The subquery first generates a name only for each input name, which is the portion of the name preceding the comma (should a comma exist). Then, we aggregate by this name only, but select the max of the name, for each name group, to report the full name which you want to see.

Note that as mentioned in the above comments, your table design is not optimal, because it is storing CSV data. Needing a query with the above complexity just to get basic counts might be indication that it is time for a design change.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement