Skip to content
Advertisement

Getting values from the first row, last row, and an aggregation in MySQL Window function

For a marketing related analysis I need to provide data on the first- and last-touchpoint and of the number of total interactions with our website.

A simplified version of our interaction table looks like this:

Our current approach looks like this:

Currently, we observe that the runtime scales approximately linear with our data, which will become infeasible to compute soon.

An alternative idea is

but in our experiments we never saw the second query complete. Hence, we are not a 100% sure that it yields the same results.

We tried indices on timestamp and (session_id, timestamp), but according to EXPLAIN this didn’t change the query plan.

Is there any fast way to retrieve individual properties from the first and last entry per session_id plus the count per session_id? Note that in our real example there are more parameter like utm_source and utm_medium that we are interested in.

EDIT

Sample data:

Expected result:

I noticed that my second query doesn’t yield the expected result. The last_touchpoint and last_medium are filled with the first value instead. I tried

  • first_value(utm_source) over (partition by session_id order by timestamp desc) as last_touchpoint, and
  • last_value(utm_source) over (partition by session_id order by timestamp range between unbounded preceding and unbounded following) as last_touchpoint,

Advertisement

Answer

The only way you are going to make the query scalable is by reducing the amount of data being processed using a where clause. If I assume that sessions never last more than a day, then I can expand the timeframe for the calculation by a day and use window functions. That results in something like this:

Your use of first_value() should be returning an error — it violates the rules of “full group by” which MySQL 8+ has set by default. No surprise that syntactically incorrect code is not working.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement