Skip to content
Advertisement

Using GROUP BY and JOIN-ing non-aggregated columns to query

I know there are many similar questions, but I haven’t managed to solve my problem by reading them. I’d appreciate some pointers.

Here’s some example data from my dummy table below:

id foo bar baz moo ins_date percentage yes no maybe
38 foothing bar_one pizazz amoosing 2018-05-26 06:59:00 81 25 529 196
41 foothing bar_one pizazz amoosing 2018-05-29 06:43:00 83 441 144 49
23 foothing bar_one pizazz amoosing 2018-06-24 08:48:00 62 9 1 16
20 foothing bar_one pizazz amoosing 2018-06-27 10:37:00 94 676 16 400
65 foothing bar_one pizazz amoosing 2018-07-01 08:34:00 92 121 64 225
68 foothing bar_one pizazz amoosing 2018-07-04 01:46:00 91 324 25 289
71 foothing bar_one pizazz amoosing 2018-07-06 23:44:00 65 196 676 100
74 foothing bar_one pizazz amoosing 2018-07-10 09:41:00 92 1024 121 81
77 foothing bar_one pizazz amoosing 2018-07-13 06:47:00 64 576 169 1
96 foothing bar_one pizazz amoosing 2018-08-02 10:34:00 78 1369 256 81
99 foothing bar_one pizazz amoosing 2018-08-04 08:25:00 82 2809 9 256
102 foothing bar_one pizazz amoosing 2018-08-07 06:49:00 87 576 9 676
105 foothing bar_one pizazz amoosing 2018-08-10 03:29:00 68 4225 1089 196
108 foothing bar_one pizazz amoosing 2018-08-13 03:59:00 92 1156 169 484
111 foothing bar_one pizazz amoosing 2018-08-16 05:34:00 63 1764 100 108

I want to achieve the following through a single query:

  1. Filter all rows to get an ins_date between a date range that I choose
  2. Get the maximum ins_date per group of foo, bar, baz and moo
  3. Be able to filter rows by foo, bar, baz and moo
  4. Additionally show non-grouped values in the query, such as percentage, yes, no and maybe.

Overall this is proving to be complex. So far I’ve managed to achieve the first 3 points in the query below, and hopefully this explains what I’m looking for:

SELECT
  s.foo,
  s.bar, 
  s.baz,
  s.moo,
  MAX(s.ins_date) mdate
FROM 
(
  SELECT *
  FROM dummy
  WHERE ins_date
  -- My arbitrary date range goes here
  BETWEEN '2018-07-01 00:00:00'
  AND '2019-11-01 23:59:59'
) s

GROUP BY foo, bar, baz, moo
-- I could add other filters into the 'HAVING' clause
HAVING moo LIKE "%moo%"
AND baz = "baz"

This gives the output:

foo bar baz moo mdate
foothing bar_one baz amoosing 2018-11-29 05:31:00
foothing bar_one baz mooman_being 2019-04-21 10:31:00
foothing bar_one baz strawberry_moosse 2019-03-17 06:37:00

In this example, if I were to change the date constraint to show only dates between 2018-05-01 and 2018-05-29, then for the first row mdate would show 2018-05-29 06:43:00, because that’s the latest (most recent) date within that date range, for that particular grouping of foo/bar/baz/moo.

But I haven’t been able to append the other columns which aren’t related to this grouping. I’ve tried by using a JOIN …

SELECT
  s1.foo,
  s1.bar, 
  s1.baz,
  s1.moo,
  MAX(s1.ins_date) mdate,
  s2.percentage,
  s2.yes,
  s2.maybe,
  s2.no
FROM 
(
  SELECT *
  FROM dummy
  WHERE ins_date
  -- My arbitrary date range goes here
  BETWEEN '2018-07-01 00:00:00'
  AND '2019-11-01 23:59:59'
) s1
-- Attempting to a do a self-join to get the non-aggregated columns
INNER JOIN
(
  SELECT id, percentage, yes, maybe, no
  FROM dummy
) s2
ON s2.id = s1.id

GROUP BY foo, bar, baz, moo
-- I could add other filters into the 'HAVING' clause
HAVING moo LIKE "%moo%"
AND baz = "baz"

But this returns the error:

Expression #6 of SELECT list is not in GROUP BY clause and contains nonaggregated column ‘s2.percentage’ which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

How do I add the non-aggregated columns to my query without breaking it?

I’m using mysql 5.7, so fancy mysql 8 options aren’t available.

Fiddle below:

https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=1980dd582c2235dc0938cb14c781e3c6

Advertisement

Answer

Maybe by aggregating the extra columns?

SELECT foo, bar, baz, moo
, MAX(ins_date) AS mdate
, AVG(percentage) AS avg_perc
, MAX(yes) AS YEAHBABY
, MAX(maybe) AS MAYBEBABY
, MAX(no) AS NONONONONOOO
FROM dummy dum
WHERE ins_date BETWEEN '2018-07-01 00:00:00'
                   AND '2019-11-01 23:59:59'
  AND moo LIKE '%moo%'
  AND baz = 'baz' 
GROUP BY foo, bar, baz, moo
foo      | bar     | baz | moo               | mdate               | avg_perc | YEAHBABY | MAYBEBABY | NONONONONOOO
:------- | :------ | :-- | :---------------- | :------------------ | -------: | -------: | --------: | -----------:
foothing | bar_one | baz | amoosing          | 2018-11-29 05:31:00 |  82.8000 |    11236 |       625 |          841
foothing | bar_one | baz | mooman_being      | 2019-04-21 10:31:00 |  70.0000 |     3969 |        16 |          121
foothing | bar_one | baz | strawberry_moosse | 2019-03-17 06:37:00 |  80.0000 |    23716 |       529 |           49

db<>fiddle here

Or join to the grouped fields and MAX date.

Or use an emulated row_number.

Or use an EXISTS.

SELECT foo, bar, baz, moo
, ins_date
, percentage
, yes, maybe, no
FROM dummy dum
WHERE EXISTS (
  SELECT 1
  FROM dummy dum2
  WHERE dum2.ins_date BETWEEN '2018-07-01 00:00:00'
                          AND '2019-11-01 23:59:59'
    AND dum2.moo LIKE '%moo%'
    AND dum2.baz = 'baz'
    AND dum2.foo = dum.foo
    AND dum2.bar = dum.bar
    AND dum2.baz = dum.baz
    AND dum2.moo = dum.moo
  GROUP BY foo, bar, baz, moo
  HAVING MAX(dum2.ins_date) = dum.ins_date
);
foo      | bar     | baz | moo               | ins_date            | percentage |   yes | maybe |  no
:------- | :------ | :-- | :---------------- | :------------------ | ---------: | ----: | ----: | --:
foothing | bar_one | baz | strawberry_moosse | 2019-03-17 06:37:00 |         80 | 23716 |   529 |  49
foothing | bar_one | baz | mooman_being      | 2019-04-21 10:31:00 |         70 |  3969 |    16 | 121
foothing | bar_one | baz | amoosing          | 2018-11-29 05:31:00 |         97 |  9025 |   361 |   1

db<>fiddle here

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement