Skip to content
Advertisement

How to structure DBT tables with cyclical dependencies

I have one table containing my members.

customer_id name age
1 John 74
2 Sarah 87

Everyday, I get a new table containing the current members.

  • If a new member has joined, I want to add them.
  • If a member has left, I want to nullify their name/id
  • If a current member is still a member then I want to keep them as is.

Imagine that I get a new upload with the following rows

customer_id name age
2 Sarah 87
3 Melvin 23

I then want to generate the table

customer_id name age
Null Null 74
2 Sarah 87
3 Melvin 23

I don’t want to nullify anything by mistake and therefore I want to run a few tests on this table before I replace my old one. The way I’ve done this is by creating a temporary table (let’s call it customer_temp). However, I’ve now created a cyclical dependency since I:

  • Need to read the live table customer in order to create the customer_temp
  • Need to replace the live table customer with customer_temp after I’ve run my tests

Is there anyway I can do this using dbt?

Advertisement

Answer

Destroying data is tricky. I would avoid that unless it’s necessary (e.g., DSAR compliance).

Assuming the new data is loaded into the same table in your database each day, I think this is a perfect candidate for snapshots, with the option for invalidating hard-deleted records. See the docs. This allows you to capture the change history of a table without any data loss.

If you turned on snapshots in the initial state, your snapshot table would look like (assuming the existing records had a timestamp of 1/1):

customer_id name age valid_from valid_to
1 John 74 1/1/2022
2 Sarah 87 1/1/2022

Then, after the source table was updated, re-running dbt snapshot (today) would create this table:

customer_id name age valid_from valid_to
1 John 74 1/1/2022 5/12/2022
2 Sarah 87 1/1/2022
3 Melvin 23 5/12/2022

You can create the format you’d like with a simple query:

select
    case when valid_to is null then customer_id else null end as customer_id,
    case when valid_to is null then name else null end as name,
    age
from {{ ref('my_snapshot') }}
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement