I want to group the database by gauge_id
and then get the row having maximum time_col
. This is my current code:
rain_gauge_datas = db.session.query(DataEntry.gauge_id, func.max(DataEntry.time_col)).group_by(DataEntry.gauge_id).all()
But I am only able to get both gauge_id
and time_col
. When I add another column (reading), like this:
rain_gauge_datas = db.session.query(DataEntry.gauge_id, DataEntry.reading, func.max(DataEntry.time_col)).group_by(DataEntry.gauge_id).all()
it gives this error.
column "data_entry.reading" must appear in the GROUP BY clause or be used in an aggregate function
How do I return the top row of each groups? Thanks in advance.
Advertisement
Answer
One way to do this would be with a common table expression (CTE). The CTE creates a virtual table of the group by resultset which we can then join against.
The SQL would be
with cte as ( select gauge_id, max(time_col) as max_time from data_entries group by gauge_id ) select d.* from data_entries d join cte on d.gauge_id = cte.gauge_id and d.time_col = cte.max_time;
The SQLAlchemy equivalent would be
cte = (session.query(DataEntry.gauge_id, sa.func.max(DataEntry.time_col).label('max_time')) .group_by(DataEntry.gauge_id) .cte(name='cte')) query = session.query(DataEntry).join(cte, sa.and_( DataEntry.gauge_id == cte.c.gauge_id, DataEntry.time_col == cte.c.max_time ))
(the above example uses “pure” SQLAlchemy rather than Flask-SQLAlchemy – it should be enough to replace sa.
with db.
and session
with db.session
to get it working in Flask_SQLAlchemy)
It’s worth noting that CTEs were not handled efficiently in Postgresql until v12, so if you are on an earlier version it may be better to inline the CTE as a subquery.