Skip to content
Advertisement

Impala Last_Value() Not giving result as expected

I have a Table in Impala in which I have time information as Unix-Time (with a frequency of 1 mSec) and information about three variables, like given below:

I want to resample the data and to get the last value of the new time window. For example, if I want to resample as 10Sec frequency the output should be the last value of 10Sec window, like given below:

To have this result, I am running the following query:

I have read on some forums that LAST_VALUE() sometimes cause problem, so I tried to achieve the same thing using FIRST_VALUE with ORDER BY DESC. The query is given below:

But in both cases, I am not getting the result as expected. The resampled time ts appeared as expected (with a window of 10Sec) but I am getting random values for Val1, Val2 and Val3 between 0-9sec, 10-19Sec, … windows.

Logic wise this query looks fine and I didnÄt find any problem. Could anybody explain that why I am not getting the right answer using this query.

Thanks !!!

Advertisement

Answer

The problem is this line:

You are partitioning and ordering by the same column, ts — so there is no ordering (or more specifically ordering by a value that is constant throughout the partition results in an arbitrary ordering). You need to preserve the original ts to make this work, using that for ordering:

Incidentally, the issue with last_value() is that it has unexpected behavior when you leave out the window frame (the rows or range part of the window function specification).

The issue is that the default specification is range between unbounded preceding and current row, meaning that last_value() just picks up the value in the current row.

On the other hand, first_value() works fine with the default frame. However, both are equivalent if you include an explicit frame.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement