Skip to content
Advertisement

Insert only specific columns with DBT

As of now, I have an INSERT INTO clause, that works like this:

The final_table has 4 columns: ID (incremental/serial), name, age and created_at. The person_table has only name, age, and created_at.

When I translate it to the dbt logic, it looks like this:

However, dbt keeps compiling as an INSERT CLAUSE containing the ID (that it is auto-generated). Does anyone know how to insert only the specific columns that I want?

Advertisement

Answer

You can use row_number to fake an auto-incrementing key in Snowflake.

In a model materialized as a table, that would be as simple as:

But since this is an incremental model, your inner query will be filtered before row_number() is computed, so we need to add on the maximum value of id in the table if the model is being run in incremental mode:

Note that, if you --full-refresh this model, and records were deleted from person_table, then any records created after the deleted records will have their IDs changed. No real way around that, unfortunately! If you need an immutable ID, I would suggest hashing some combination of name, age, and created_at using md5 or dbt_utils.surrogate_key. Of course, those hashed ID’s would not be monotonically increasing.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement