
Missing rows in full outer join

I am trying to count how many users are observed on each combination of 3 consecutive days. Each of the 3 intermediate tables (t0, t1, t2) has 2 columns: uid (a unique ID) and d0 (or d1, or d2), a flag that is 1 when the user is observed on that day.

The following query:

produces this output from spark.sql(q).toPandas().set_index(["d0","d1","d2"]):

Two rows are obviously missing: 0 1 1 and 1 0 1. Why?!

PS1. I understand why 0 0 0 is missing.

PS2. my_table looks approximately like this:

For this table I expect the query to return


Answer

Regarding the original query: the last FULL JOIN should take into account that t0.uid can be NULL as a result of the first FULL JOIN, so the join condition must use OR, not AND. With AND, any t2 row whose uid is missing from t0 (or from t1) can never satisfy the predicate, because the corresponding side is NULL in the intermediate result; those users fall out as separate unmatched rows instead of forming the 0 1 1 and 1 0 1 patterns.
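A sketch of the corrected condition, matching the presumed shape of the original query:

```sql
FULL JOIN t2
  ON t0.uid = t2.uid OR t1.uid = t2.uid
```

Equivalently, `ON COALESCE(t0.uid, t1.uid) = t2.uid`, which some engines optimize better than an OR-ed join predicate.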

SQL Server db<>fiddle
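To see the mechanics without a database, the two join conditions can be simulated in plain Python. The uid sets below are made-up sample data, and `presence_patterns` is a hypothetical helper that emulates the chained FULL JOINs and collects the distinct (d0, d1, d2) presence patterns:

```python
# Assumed sample data: sets of uids observed on each of the 3 days.
t0 = {1, 2, 4, 6}
t1 = {1, 3, 4, 5}
t2 = {1, 2, 3, 7}

def presence_patterns(cond):
    """Emulate (t0 FULL JOIN t1 ON uid) FULL JOIN t2 ON cond(t0.uid, t1.uid, t2.uid)
    and return the distinct (d0, d1, d2) presence patterns produced."""
    # First FULL JOIN on uid: one row per uid seen in t0 or t1.
    step1 = {uid: (uid in t0, uid in t1) for uid in t0 | t1}
    patterns = set()
    matched_t2 = set()
    for uid, (in0, in1) in step1.items():
        # An unmatched side leaves its uid column NULL (None here).
        u0 = uid if in0 else None
        u1 = uid if in1 else None
        hits = [v for v in t2 if cond(u0, u1, v)]
        if hits:
            matched_t2.update(hits)
            patterns.add((int(in0), int(in1), 1))
        else:
            patterns.add((int(in0), int(in1), 0))
    # FULL JOIN keeps unmatched t2 rows too: NULL d0/d1, i.e. pattern (0, 0, 1).
    if t2 - matched_t2:
        patterns.add((0, 0, 1))
    return patterns

and_cond = lambda u0, u1, v: u0 == v and u1 == v   # original query's condition
or_cond  = lambda u0, u1, v: u0 == v or u1 == v    # corrected condition

print(sorted(presence_patterns(and_cond)))
# [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]  -- 0 1 1 and 1 0 1 gone
print(sorted(presence_patterns(or_cond)))
# all 7 patterns except (0, 0, 0)
```

With `and_cond`, uid 2 (in t0 and t2 only) and uid 3 (in t1 and t2 only) never match their t2 rows, reproducing the two missing rows from the question; with `or_cond` they do.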

Personally, I would stick with Gordon Linoff's solution.

User contributions licensed under: CC BY-SA