I have a table that has many rows (between a few 1000s to a few million). I need my query to do the following:
- group results by the same part of the value in the field;
- order by the biggest group first.
The table has mostly values that have only some part are similar (and i.e. suffix would be different). Since the number of similar values is huge – I cannot predict all of them.
Here is i.e. my table:
+--------+-----------+------+ | Id | Uri | Run | +--------+-----------+------+ | 15145 | select_123| Y | | 15146 | select_345| Y | | 15148 | delete_123| N | | 15150 | select_234| Y | | 15314 | delete_334| N | | 15315 | copy_all | N | | 15316 | merge_all | Y | | 15317 | select_565| Y | | 15318 | copy_all | Y | | 15319 | delete_345| Y | +--------+-----------+------+
What I would like to see, something like this (the Count part is desirable but not required):
+-----------+------+ | Uri | Count| +-----------+------+ | select | 4 | | delete | 3 | | copy_all | 2 | | merge_all| 1 | +-----------+------+
Advertisement
Answer
If you’re using MySQL 5.x, you can strip the trailing _
and digits from the Uri
value using this expression:
LEFT(Uri, LENGTH(Uri) - LOCATE('_', REVERSE(Uri)))
Using a REGEXP
test to see if the Uri
ends in _
and some digits, we can then process the Uri
according to that and then GROUP BY
that value to get the counts:
SELECT CASE WHEN Uri REGEXP '_[0-9]+$' THEN LEFT(Uri, LENGTH(Uri) - LOCATE('_', REVERSE(Uri))) ELSE Uri END AS Uri2, COUNT(*) AS Count FROM data GROUP BY Uri2
Output:
Uri2 Count copy_all 2 delete 3 merge_all 1 select 4