Skip to content
Advertisement

How to edit my postgreSQL query to select the most recent row by date for several columns

I have a query where I’m trying to select the most recent row based on date. So what I would like my query to do is to select the most recent datapoints for a couple columns in my database.

sql = f"""SELECT DISTINCT ON (id), cast(marketcap as money), cast(week52high as money), cast(week52low as money)
                 , to_char(dividend_yield * 100, '99D99%%')
                 , pe_ratio, ROUND(beta,2) 
          FROM security_stats 
          WHERE security_stats.id IN ({placeholders});
       """

For anyone wondering, placeholders is just a list of stocks given by the user.

The goal of this application is to retrieve the most recent observations for financial data points. My query works fine but instead it selects all of the rows per stock, when I just want the the most recent row, per stock.

How can I edit my query to fix the above? I have tried some methods using MAX and LIMIT but couldn’t get it to work!.

EDIT: Note that the columns working with are id and date. These should be distinct, and the date should be the most recent observation.

Got it to work thanks to @GME, using this method below. Special thanks too @ Somy for showing me another method as well!

sql = f"""SELECT DISTINCT ON (id) id, cast(marketcap as money), cast(week52high as money), cast(week52low as money)
                 , to_char(dividend_yield * 100, '99D99%%')
                 , pe_ratio, ROUND(beta,2) 
          FROM security_stats 
          WHERE security_stats.id IN ({placeholders}) ORDER BY id, date desc ;
       """

Advertisement

Answer

The column (or set of columns) in the distinct on clause defines the groups: if you want one row per “stock”, then column stock should belong to that clause.

Then, you need an order by: it starts with the distinct on column(s), followed by one or more columns that define which row should be kept in each group: this is where the “date” column goes.

Your question does not explain what are the names of these to crucial columns in the table. Assuming that we have id and date, the logic would be:

SELECT DISTINCT ON (id) ...
FROM security_stats 
WHERE ...
ORDER BY id, date desc 
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement