Skip to content
Advertisement

First row of all groups in Flask SQLAlchemy

I want to group the database by gauge_id and then get the row having maximum time_col. This is my current code:

rain_gauge_datas = db.session.query(DataEntry.gauge_id, func.max(DataEntry.time_col)).group_by(DataEntry.gauge_id).all()

But I am only able to get both gauge_id and time_col. When I add another column (reading), like this:

rain_gauge_datas = db.session.query(DataEntry.gauge_id, DataEntry.reading, func.max(DataEntry.time_col)).group_by(DataEntry.gauge_id).all()

it gives this error.

column "data_entry.reading" must appear in the GROUP BY clause or be used in an aggregate function

How do I return the top row of each groups? Thanks in advance.

Advertisement

Answer

One way to do this would be with a common table expression (CTE). The CTE creates a virtual table of the group by resultset which we can then join against.

The SQL would be

with cte as (
    select gauge_id, max(time_col) as max_time
    from data_entries
    group by gauge_id
)

select d.* from data_entries d
join cte 
on d.gauge_id = cte.gauge_id 
and d.time_col = cte.max_time;

The SQLAlchemy equivalent would be

cte = (session.query(DataEntry.gauge_id, sa.func.max(DataEntry.time_col).label('max_time'))
              .group_by(DataEntry.gauge_id)
              .cte(name='cte'))
query = session.query(DataEntry).join(cte, sa.and_(
    DataEntry.gauge_id == cte.c.gauge_id,
    DataEntry.time_col == cte.c.max_time
))

(the above example uses “pure” SQLAlchemy rather than Flask-SQLAlchemy – it should be enough to replace sa. with db. and session with db.session to get it working in Flask_SQLAlchemy)

It’s worth noting that CTEs were not handled efficiently in Postgresql until v12, so if you are on an earlier version it may be better to inline the CTE as a subquery.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement