This post was co-written by Sam Hall and Dakota Kelley
As the Modern Data Stack grows and matures, a large variety of tools begin to pop up, solving various problems throughout the ecosystem. If you’ve been watching how Snowflake Data Cloud has been growing and changing over the years, you’ll see that two tools have made very large impacts on the Modern Data Stack: Fivetran and dbt.
In this blog, we will cover what Fivetran and dbt are, but first, to understand why tools like Fivetran and dbt have brought such value to the data ecosystem, we need to go back to the reason for their existence – the emergence of the ELT pattern.
Understanding ELT and why Fivetran + dbt
What is ELT?
ELT enables access to raw data in the warehouse, powers a DevOps-based style of data integration, and taps into the parallel processing power of modern cloud-based data platforms. In short, ELT exemplifies the data strategy required in the era of big data, cloud, and agile analytics.
ELT is short for Extract, Load, and Transform. With ELT, we first extract data from source systems, then load the raw data directly into the data warehouse before finally applying transformations natively within the data warehouse. This is unlike the more traditional ETL method, where data is transformed before loading into the data warehouse.
By bringing raw data into the data warehouse and then transforming it there, ELT provides more flexibility compared to ETL’s fixed pipelines. This makes ELT aligned with modern data practices and helps explain why it has become the dominant pattern, replacing the once-standard ETL approach.
The Story of ELT
In the early days of data warehousing, ETL was the standard for data processing. Data was extracted from mainframes and legacy systems into warehouse databases like Oracle and Teradata using custom-built ETL tools.
The ETL approach made sense – it was required to clean up and transform the data in order for it to successfully load into the warehouse. Data volumes were also generally smaller relative to some of today’s datasets, so ETL made sense.
But then the 2000s brought the rise of the internet and big data, and everything changed. Data volumes exploded as web, mobile, and IoT took off. ETL systems just couldn’t handle the massive flows of raw data. Open source big data tools like Hadoop were experimented with – these could land data into a repository first before transformation. Thus, the early data lakes began following more of the EL-style flow.
But then, in the 2010s, cloud data warehouses, particularly ones like Snowflake, came along and really changed the game. Snowflake was optimized for the cloud, separating storage and computing. It could handle billions of rows of data using columnar data storage and parallel processing.
Because it handled huge amounts of data so easily, ELT was gradually viewed as a viable approach for data processing. As Snowflake and other cloud data warehouses like AWS Redshift and Google BigQuery grew in popularity, it pushed the whole industry towards adopting the ELT pattern.
Why ELT Has Succeeded
Today, ELT has become the norm. Modern data teams need the flexibility to access and query raw data. They need the ability to quickly and efficiently iterate on a variety of curated data assets on top of that raw data for current and future use cases. Our modern data warehouses have no problem running these transformations and more (such as serving LLM models on top of this data), provided the raw data is available. This also works well with modern DevOps practices – we want simple reusable tooling, not custom ETL scripts. ETL just can’t keep up anymore.
Challenges with Building an ELT Solution
Moving to ELT is not without its challenges, though. Adjusting to new patterns can be difficult, and without a proper strategy and value proposition can be a drain not just on company resources but on the morale of your people.
It is important to understand the unique challenges that your organization will face when moving to ELT so as to navigate the technical and social aspects of them. In our experience, a few of the top challenges are:
Limited engineering resources. Whether this be limited talent, reduced budgets, or both, data teams today are being forced to do more with less and provide proof of value for the initiatives that they undertake.
Integration with legacy systems and overcoming technical debt. Many larger organizations will have one or more data warehousing technologies, and most likely, those will be implementing an ETL strategy vs an ELT strategy. Being able to easily integrate these data sources (alongside new ones) is critical to successfully providing value using ELT and a modern data platform.
Change management and cultural shifts. The ability to iterate quickly and provide value to the business is not a technology problem but a people and process problem. Providing a path to improving and modernizing change management for your business and helping teams transition to a more value-driven and iterative mindset is one of the most critical things you can do as an organization to drive success with ELT – or any initiative.
Why We Build ELT Solutions with dbt and Fivetran
At phData we love the combination of Fivetran for extraction and loading (EL) and dbt for transformation (T). These are tried and tested tools that work well for small businesses as well as large enterprises.
Fivetran solves the very difficult integration problem via their 200+ pre-built connectors that connect to everything from web applications and APIs to on-premise application databases and ERP systems. If you find that your particular connector doesn’t exist, it is extremely easy to stand up a custom connector to ingest that data source.
Meanwhile, dbt is the standard for data transformation management on top of modern cloud data warehouses. It brings software development best practices like CI/CD, Testing, Automation, Idempotency, and many others.
All while having an extremely low barrier to entry, only requiring SQL to begin to get comfortable, and a Pythonic templating engine known as Jinja allows you to template and automate various models.
How Fivetran and dbt Work Together
So how do these two pieces of tech work together? They both provide different pieces of the ELT process. Fivetran will do all of the heavy lifting to get data of various volumes and velocities into a cloud data warehouse without a large engineering effort.
While dbt provides the mechanism that allows you to transform that data through various stages and standards. With each tool playing an important role within the modern data stack, they not only excel within their niches but they allow organizations of all sizes, from small start-ups all the way to large enterprises, to reduce their time to value.
As you can see, using these tools comes with a new paradigm shift we must embrace. However, in the process of embracing ELT, we gain many efficiencies and best practices that allow our organization to do more with less and reduce the fatigue that our engineers develop with their current pipelines.
If you find yourself intrigued but want more of a deep dive into why these tools work and less of a philosophical one. Keep your eyes peeled for part 2 of our blog, where we’ll dive into all of the features, including the things that make both Fivetran and dbt enterprise-grade!