dbt Incremental part 2: Implementing & Testing

Incremental Strategies & How to Use Them

In Part 1, I covered the topic of choosing the appropriate incremental strategy, which significantly impacts the cost and time of each table.

In this blog, I’ll discuss how I implement and test incremental models in the J4J project.

1. Implementing

I will take the fct__jira__issue model as an example.

1.1 Config Incremental Strategy

The _fivetran_synced timestamp was skewed because, at that time, we had just switched from Stitch to Fivetran to ingest Jira data. So, in this case, we chose the merge with clustered strategy.

But if the insert_overwrite strategy was possible, the config block could be as below:

1.2 Choose Timestamp Column and Creat Time Range Filter

  • It is recommended to choose the synced timestamp of the ingestion tool (e.g., _fivetran_synced) for all incremental models.

  • Find out the last modified timestamp using GREATEST() function:

The final fact model is built from several staging models. If we simply choose the timestamp from one staging model, the fields from other staging models might not be updated. Therefore, we must find the most updated timestamp for each row. In this case, I used GREATEST() to pick the last modified _fivetran_synced timestamp.

Read more about GREATEST() function HERE.

Set up a time range filter to sort out data that need to update:

1.3 (optional) Add build_source Column

Later on, the build_source field enables us to assess whether the incremental models have been configured properly. However, it is not a compulsory step.

1.4 Putting It All Together!

2. Testing

In order to debug faster, it is recommended to audit tables from downstream to upstream models (base -> stg -> int -> fct)

To check whether there is a discrepancy between the full-refresh model and incremental model, we should take the following steps:

2.1 Create Full-refresh and Incremental Model

Since the incremental model only cares about recent data, so in J4J, we only compare data in the last 3 days. Therefore, we need to 2 types of model:

(1) completely full-refresh model

(2) model that is built full refresh before N-3 (N = current date) and incremental in the last 3 days and store in custom schema, e.g. dbt_na_incremental_test_mart

To create the latter type:

  • Create a custom filter date range macro:

 

 

  • Add custom filter in base models, e.g. base__jira__worklog model

 

 

2.2 Create Audit Tests

Using audit_helper package to create tests that compare between the full-refresh model and incremental model.

Start with base model, then continue with upstream models

 

References

Alice Bui
Like what you read? Share with a friend.

Contact Us!

Do you have a question or need more info? Please enter your information and describe your inquiry, and we’ll get back to you as soon as possible. Thanks!

Check out our Case Study!

We’d love to hear more from you. Tell us more about yourself and we’ll message you with our case studies as soon as we receive your message!