Skip to content
Advertisement

For rows with the same A, B, C values, how to select only “earliest” row (specified by column D) within a given time range?

I currently have a query (simplified below):

Basically, the data source we’ve been getting data from writes their timestamps as when their job finished running rather than when the data actually came in (we can’t do anything about this on our end). Their jobs regularly end at 5-6 minutes after the data came in so we infer that the actual minutes value for the data by setting the snap_mins ourselves. (For reasons I won’t go into, we can only use the values on the half hour and need to have a row for the given name/id/node each half hour.)

However, I know this is really really flaky and I want to try to, instead of looking for specific minute values, just grab the first row whose snapshot_date’s minute value is somewhere between 4-10 minutes after each half hour (XX:00 or XX:30). For example, if the table looks like this:

So the query results I want to get would be:

However, I’ve been struggling with determining if it’s even possible to do this. Any help would be greatly appreciated!

Advertisement

Answer

One option

  1. Add the columns defining the associated 30 minute interval
  2. Sequence number the rows within a 30 minute group
  3. Take only sequence #1 from each group

If you really need to omit rows from MINUTE 0-3 and 30-33 from consideration, then add a WHERE condition to the first CTE.

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement