This post was co-written by Sam Hall, Dakota Kelley & Sunny Yan.
In our previous blog, we discussed how Fivetran and dbt scale for any data volume and workload, both small and large. Now, you might be wondering what these tools can do for your data team and the efficiency of your organization as a whole. Can these tools help reduce the time our data engineers spend fixing things? Can they help speed up the time it takes to make data available to our business users?
In this blog, we will explore how Fivetran and dbt can enable teams of all sizes to drive better business value and do it faster and more efficiently.
Why Does My Data Organization Move So Slowly?
This is a question many business teams ask themselves (or the data team) at some point in an organization’s data-driven journey.
The story is all too common – a business user requests some data, the data team creates and prioritizes a ticket, and said ticket is completed after some number of months (or weeks if you’re lucky) – only for the data to be wrong, and the whole process starts again.
What is going wrong here? There are a couple of major problems:
The Complexity of Traditional/Custom Tooling and ETL
The traditional method of building ETL pipelines involves writing efficient, custom code using architectural patterns that fit the current use case. Over time, though, as requirements grow and change, this method frequently duplicates logic and produces brittle, complex pipelines that require a lot of maintenance and upkeep.
This results in increased technical debt, and your team ends up spending their time fixing large code repositories and debugging why certain chunks of code are not working – instead of innovating and building pipelines to enhance your business operations and analytics.
Additionally, these customizations often slow down the ability of your analytics teams to react to changes – data changes, technology changes, and business strategy changes.
This is why lead times between a request for data and its availability to end users grow so long. Your developers are bogged down updating and maintaining some portion of the following:
Custom connection logic for data sources/data targets
Transforming data (matching data from source to target)
Scalability, security, and privacy
Understanding batching logic from source to target
Retry/error handling to ensure the connection works
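To illustrate the kind of boilerplate that last item implies, here is a minimal sketch of the retry wrapper a hand-rolled pipeline typically carries for every connector it maintains (all function and parameter names here are hypothetical, for illustration only):

```python
import time

def with_retries(fetch, max_attempts=3, base_delay=1.0):
    """Retry a flaky extract step with exponential backoff.

    `fetch` is any zero-argument callable that pulls a batch from a
    source. Custom pipelines end up re-implementing wrappers like this
    for every source and target they connect.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            time.sleep(base_delay * 2 ** (attempt - 1))

# Example: a source that fails twice before succeeding.
calls = {"n": 0}

def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return ["row-1", "row-2"]

rows = with_retries(flaky_fetch, base_delay=0.01)
```

Multiply this by dozens of connectors – each with its own batching, schema, and error quirks – and the maintenance burden compounds quickly.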
All of these changes eat up precious FTE hours – and are often viewed as low-value activities. This cuts into time that can be spent delivering new data/features – and often results in leadership wondering why it is taking so long for new products to arrive (which leads to projects being cut).
Additionally, frequent trust issues arise as these pipelines break or data quality suffers. These things all contribute to burnout for your engineers, reduced organizational trust, cut projects, and lack of effectiveness of your data platform.
Change Management Bottlenecks
Remember all those large code repositories created by the above custom ETL pipelines over time? Those are scary for data teams to change. A simple change becomes a massive deal as they try to determine which other elements in the web of dependencies the change might affect.
Additionally, especially in large enterprise engineering teams, meeting(s) must be held across departments for each change – getting everyone in a room together to align on things like architectural best practices, security, back-out planning, and change schedules.
The dream of DevOps was that engineering teams could own the development and deployment of code without all these bottlenecks and still have confidence in the quality and robustness of their code via automation and best practices built into their CI/CD process. This would enable them to deliver frequently and with quality. Your data operations should work the same way.
How do you know your pipelines and data are good until you put them in front of your users? (Answer: you don’t.) Get used to delivering quickly and iteratively, and trust that quality will follow as your engineering and business teams learn from each other.
Can your data team achieve this? A key way to unlock your team from these challenges is to simplify and automate your architecture where possible – which is exactly what Fivetran and dbt aim to do.
How Fivetran and dbt Streamline the ELT Process
Fivetran has abstracted away the work needed to integrate with various sources, simplifying the development effort and automating the entire ingestion process, including things like reloads and incremental logic. Fivetran also takes care of all the manual elements of building and maintaining a data pipeline that are not business-related, so that data teams don’t have to.
For example, when new columns get added to your data source, Fivetran simplifies the handling of schema drift for users and ensures that your data pipeline continues to work.
It also encourages the ELT pattern of moving the filter and transformation logic from the beginning/middle of the pipeline (where it can be difficult to debug or re-execute when changes are made) to the end inside the cloud data warehouse/lake.
These transformations can be democratized among technical and less-technical users to create workable datasets that data analysts/scientists can use to build reports and/or AI/ML models.
This is where dbt comes in – powering the transformations. With dbt, transforming the data according to business logic becomes easy. dbt allows you to write templated SQL using Jinja to create macros. These allow you to stick to DRY principles by making it easy to update logic in one place.
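As a minimal sketch of what this looks like in practice (the macro, model, and column names below are hypothetical), a dbt macro defines reusable logic once, and any model can template it in:

```sql
-- macros/cents_to_dollars.sql: define the conversion logic once
{% macro cents_to_dollars(column_name, precision=2) %}
    round({{ column_name }} / 100.0, {{ precision }})
{% endmacro %}

-- models/fct_orders.sql: reuse the macro wherever it is needed
select
    order_id,
    {{ cents_to_dollars('amount_cents') }} as amount_dollars
from {{ ref('stg_orders') }}
```

If the business later changes the rounding rule, you update the macro in one place and every model that calls it picks up the change on the next run.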
Additionally, with the new Data Mesh constructs that have arrived in dbt, it is now easier to provide contracts on top of your models. These contracts make models reliably reusable, allowing data teams to unify their metrics instead of having business units argue about whose version of a metric is correct.
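For example, a model contract in dbt is declared in YAML and enforced at build time, so a model cannot ship with a shape its consumers don’t expect (the model and column names here are illustrative):

```yaml
# models/schema.yml
models:
  - name: fct_orders
    config:
      contract:
        enforced: true
    columns:
      - name: order_id
        data_type: int
        constraints:
          - type: not_null
      - name: amount_dollars
        data_type: numeric
```

With the contract enforced, a change that drops or retypes a column fails the build instead of silently breaking every downstream consumer.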
Business Impact of Faster Time to Value
Using the Modern Data Stack method of ELT and the combination of Fivetran and dbt significantly decreases the cost of producing new data products. Each new data product is now less resource-intensive because costs are amortized across all use cases, not just for a specific subset of data.
Using Fivetran minimizes the duplication of data movement, and using dbt simplifies the process of transforming data. With these resources now freed up, data organizations can focus on higher-priority activities such as AI/LLM models.
How Fivetran and dbt Enable AI/LLM Development
Fivetran and dbt will not be the tools directly used to build complex machine learning and large language models. However, they will be cornerstone tools for moving and curating the data those models consume.
In order to create and train these models, organizations need access to data from their proprietary domain systems. Simplifying and accelerating the path to moving and modeling this data via tooling like Fivetran and dbt means far more progress can be made on AI/LLM projects instead of that time being spent debugging and building custom systems.
Of course, Fivetran and dbt will not solve every problem. Still, they are built to speed up the process of getting data from source to target and transforming it into a usable format, regardless of whether that is for analytics or model inference.
If your priorities are creating an environment where data can be quickly and readily accessible for value creation – these tools may be worth a look!
A Word on Costs
As a business, your people’s time and creativity are among your most important resources for creating outsized value. That is hard to quantify on a spreadsheet, but understanding how your build vs. buy decision affects your people’s time could be worth a few extra dollars in operational cost.
Don’t get caught up in the trap of only comparing infrastructural costs between a certain tool vs. a new or existing custom solution – make sure to consider the opportunity cost of the decision as well.
As you can see, both Fivetran and dbt provide everything needed to help data teams drive business value efficiently. We’ve covered why the old way of performing ETL can often take longer while showing how embracing modern ELT pipelines helps data teams reduce their time to value.
Now, you might wonder how these tools stay on the cutting edge of the data engineering world. In our next blog, we will dive into what Data Mesh is and how these tools are embracing Data Mesh.