Skip to content
Advertisement

How to make a query that filters out only back to back values that do not have any other values in-between?

Write a query that returns all urls involved in a “bounce.” A url X is involved in a bounce if a single user navigates from a page Y to page X, then immediately returns to page Y while visiting no other pages in between. That is, you’ll want to find two visits X->Y (on date d1) and Y->X (on date d2) for the same user u, for which there are no other visits by u between d1 and d2.

You can compare dates just like numbers: d1 < d2 is true if date d1 is prior to date d2.

I self joined this table:

url dt uid src rev
A04 1890-05-14 A A01 10
A02 1890-05-15 A A04 15
A01 1890-05-16 A A04 20

This is my query:

Select *
FROM Visit V1, Visit V2
WHERE V1.url = V2.src 
AND V1.src = V2.url
AND V1.dt < V2.dt
AND V1.uid = V2.uid

I understand that the single user and navigation part, and comparing the dates part.

A01 would not work here because the single user A nagivated from A01 -> A04 -> A01, however, the user visited A02 in between so it would return an empty set. How would I make it so that it return the urls only when the single user visits no other pages while navigating back and forth?

Advertisement

Answer

You can add another condition in your where clause as below:

AND NOT EXISTS (SELECT * FROM Visit visit WHERE visit.uid=V1.uid and visit.dt>V1.dt AND visit.dt<V2.dt)
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement