Skip to content
Advertisement

Keep variables in the SELECT list which are not used in the GROUP BY nor aggregated

I have 2 tables: table1 (including id, title, and abstract) and table2 (including id, pubdate, and family_id).

Table 1 Table 2

Each id is unique but multiple rows can have the same family_id.

My goal is to display the id, title and pubdate only for the MIN(pubdate) of each family_id so that I should get:

Result

I tried this:

SELECT
  t1.id, t1.title, nt2.pubdate
FROM
  (SELECT
     id, family_id, MIN(pubdate) AS pubdate
   FROM
     table2
   GROUP BY
     family_id) AS nt2
INNER JOIN
  table1 t1
ON
  t1.id = nt2.id

I get the following error message in BigQuery: “SELECT list expression references column id which is neither grouped nor aggregated at [position]“.

I read here that this is due to the fact that all expressions in the respective SELECT list must be either those which are in the GROUP BY or should be with AGGREGATION function.

But, I do need id for merging with table1.

How can I do this?

Advertisement

Answer

In the “Group by” statement, you need to write all the fields you use in the “SELECT” statement. You can see this documentation.

Like this line of code:

GROUP BY Family_id,id

In this case, if the family and id are duplicated in another field, they will group.

Here is an example code:

SELECT
  t1.id, t1.title, nt2.pubdate
FROM
  (
  Select B.* 
  from (
        SELECT
        family_id, MIN(pubdate) AS pubdate
        FROM
            table2
        GROUP BY
            family_id
      ) as A 
        inner join table2  B on A.pubdate = B.pubdate and A.family_id=B.family_id
    ) AS nt2
INNER JOIN
  table1  t1
ON
  t1.id = nt2.id
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement