Skip to content
Advertisement

Group rows by the same value in the field, while matching on partial value only

I have a table that has many rows (between a few 1000s to a few million). I need my query to do the following:

  • group results by the same part of the value in the field;
  • order by the biggest group first.

The table has mostly values that have only some part are similar (and i.e. suffix would be different). Since the number of similar values is huge – I cannot predict all of them.

Here is i.e. my table:

+--------+-----------+------+
|   Id   | Uri       | Run  |
+--------+-----------+------+
|  15145 | select_123|    Y |
|  15146 | select_345|    Y |
|  15148 | delete_123|    N |
|  15150 | select_234|    Y |
|  15314 | delete_334|    N |
|  15315 | copy_all  |    N |
|  15316 | merge_all |    Y |
|  15317 | select_565|    Y |
|  15318 | copy_all  |    Y |
|  15319 | delete_345|    Y |
+--------+-----------+------+

What I would like to see, something like this (the Count part is desirable but not required):

+-----------+------+
| Uri       | Count|
+-----------+------+
|  select   |  4   |
|  delete   |  3   |
|  copy_all |  2   |
|  merge_all|  1   |
+-----------+------+

Advertisement

Answer

If you’re using MySQL 5.x, you can strip the trailing _ and digits from the Uri value using this expression:

LEFT(Uri, LENGTH(Uri) - LOCATE('_', REVERSE(Uri)))

Using a REGEXP test to see if the Uri ends in _ and some digits, we can then process the Uri according to that and then GROUP BY that value to get the counts:

SELECT CASE WHEN Uri REGEXP '_[0-9]+$' THEN LEFT(Uri, LENGTH(Uri) - LOCATE('_', REVERSE(Uri)))
       ELSE Uri
       END AS Uri2,
       COUNT(*) AS Count
FROM data
GROUP BY Uri2

Output:

Uri2        Count
copy_all    2
delete      3
merge_all   1
select      4

Demo on SQLFiddle

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement