October 12, 2023

How Fivetran + dbt provides Enterprise Scale to ELT Pipelines

By Sam Hall

This blog was co-written by Sam Hall and Dakota Kelley

In our previous blog, we discussed some ways Fivetran and dbt solve ELT for enterprise data consumption and analytics. In this post, we want to build on that by diving deeper into one of the most significant ways these platforms benefit a business: scale.

As your data organization grows, the scalability of your data platform matters. Data consumers within your organization will demand more data and want it delivered in more places, more frequently. The ability to expand and grow your data mobility and transformation effortlessly is key to maintaining an edge in today’s competitive marketplace. 

We believe choosing foundational data platforms like Fivetran and dbt ensures that both startups and large enterprises can continue to accelerate their data usage without having to worry about when scalability limits will be hit. 

Throughout this blog, we’ll explore four compelling instances where Fivetran and dbt really shine at helping organizations scale.

Operational Simplicity at Scale

One of the most powerful aspects of solutions such as Fivetran and dbt Cloud lies in their role as managed services, abstracting a substantial portion of the infrastructure creation and maintenance that may have been previously required of your team. 

This enables your team’s capacity to be channeled into leveraging your data instead of using precious resources on building and sustaining that infrastructure alone.

Nothing is completely hands off, though. 

When the data or pipeline configuration needs to be changed, tools like Fivetran and dbt reduce the time required to make the change, and increase the confidence your team can have around the change. This is because they use a declarative methodology for building your data pipelines — you provide the configuration for your desired end state, and Fivetran/dbt’s automation handles the logistics of the actual creation/updating of the infrastructure and data. 
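
The declarative idea can be sketched in a few lines of Python. This is purely illustrative and is not Fivetran or dbt code: the function `plan_changes`, the connector names, and the `sync_minutes` setting are all hypothetical. The sketch compares a desired end state with the current state and derives the actions needed to reconcile the two.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: resource name -> settings for that resource.
Config = Dict[str, dict]


def plan_changes(current: Config, desired: Config) -> List[Tuple[str, str]]:
    """Return (action, resource) pairs that move `current` toward `desired`."""
    plan = []
    for name in sorted(desired):
        if name not in current:
            plan.append(("create", name))
        elif current[name] != desired[name]:
            plan.append(("update", name))
    for name in sorted(current):
        if name not in desired:
            plan.append(("delete", name))
    return plan


# What exists today (hypothetical connectors).
current_state = {
    "db2_orders": {"sync_minutes": 60},
    "sqlserver_crm": {"sync_minutes": 60},
}
# What we declare we want.
desired_state = {
    "db2_orders": {"sync_minutes": 15},  # changed setting
    "salesforce": {"sync_minutes": 30},  # new source
}

plan = plan_changes(current_state, desired_state)
print(plan)
```

The team only edits `desired_state`; working out which resources to create, update, or delete is left to the automation.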

Fivetran is engineered to gracefully accommodate these changes alongside your organization’s evolving requirements and use cases. This means you can start by replicating and centralizing data from legacy technologies like DB2 and SQL Server and then, as your data practice evolves, easily add sources like Salesforce or ServiceNow.

Even farther along the maturity curve, being able to enable features like data encryption/decryption lets your data organization easily take advantage of leading-edge capabilities.

dbt, on the other hand, leverages the elasticity of the database engine on which it runs and is built to manage the sprawl of your organization’s business logic and lineage across its various data assets, so the scale of your data assets is limited only by your team’s creativity.

Cost Efficiency at Scale

As your business grows and data becomes more critical and impactful for the decisions that it supports, it becomes more important to free up your teams to continue to build and innovate on top of what has already been created. 

What keeps them from doing this? Often, it is having to go back and address technical debt and bugs in previously implemented software.  

Fivetran and dbt alleviate this by abstracting away known challenges related to data movement and transformation, especially for large enterprises where edge cases abound. This strategic abstraction empowers your team to accelerate value creation while continuing to ensure reliability and trust. 

Additionally, these platforms automate much of the complexity involved in building new pipelines from the ground up, getting data to your team faster, and facilitating smoother change management — ultimately saving your organization significant time and effort.

Because of this, the dollars you spend on the automation and simplicity of Fivetran and dbt open up opportunities for additional value that would not be there if your teams were busy building the pipelines and maintaining infrastructure themselves.

In other words, by paying to automate the tasks that companies like Fivetran and dbt have already solved, the chances of greater ROI from dollars spent on your data teams improve significantly.

But Wait, Building Pipelines Myself is Still Cheaper, Right?

I’m going to answer that question with a classic…it depends. To elaborate, it depends on the goals of your team and broader organization. If you have a business that is not expecting to grow or significantly change the way it uses data, it might be cheaper in the long run to build all of the infrastructure yourself.

However, if your organization is growing and/or your team is trying to build value quickly, a SaaS solution could make a lot of sense, even if it means investing a little more money initially.

In the long run, using your team’s time to innovate instead of constantly updating/migrating your homegrown solutions will almost certainly outweigh the cost incurred by the SaaS product.

Scaling with the Evolving Data Landscape

Within your enterprise, you will inevitably have data scattered throughout your applications and infrastructure, and centralizing all that data alongside new sources can also become increasingly complex as time goes on and requirements grow. 

Fivetran is here to simplify that, providing a single platform that can centralize your data in a performant and optimized manner at scale. 

One way Fivetran scales its replication is through its recently added options for database technologies like DB2 or SQL Server that use high-volume agents (HVA). These minimize the resources needed to replicate large amounts of data from these technologies by reading the database logs instead of querying the database directly.

Figure 1: Fivetran Integration Map

dbt similarly has an ever-growing library of packages that integrate with functionality from other platforms like Great Expectations, while also continuing to add new functionality that leans into the latest trends, whether that is data vault, data mesh, or something else.

Additionally, dbt can expand upon the scalability of Fivetran. Incremental processing and data freshness checks become straightforward thanks to the metadata Fivetran brings into your cloud data warehouse, which lets you scale your pipelines quickly.
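
As a rough illustration of a freshness check built on sync metadata, the Python sketch below assumes each replicated table carries a sync timestamp column such as Fivetran’s `_fivetran_synced`. The function name `freshness_status`, the thresholds, and the sample timestamps are hypothetical; the warn/error split simply mirrors the common idea of a softer and a harder freshness limit.

```python
from datetime import datetime, timedelta, timezone


def freshness_status(synced_values, now, warn_after, error_after):
    """Classify a table as 'pass', 'warn', or 'error' by its newest sync time."""
    age = now - max(synced_values)
    if age > error_after:
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"


# Hypothetical values read from a sync timestamp column.
synced = [
    datetime(2023, 10, 12, 6, 0, tzinfo=timezone.utc),
    datetime(2023, 10, 12, 9, 30, tzinfo=timezone.utc),
]
# A fixed "current time" keeps the example deterministic.
now = datetime(2023, 10, 12, 12, 0, tzinfo=timezone.utc)

status = freshness_status(
    synced,
    now,
    warn_after=timedelta(hours=2),
    error_after=timedelta(hours=6),
)
print(status)
```

Here the newest record was synced two and a half hours before `now`, so the table passes the six-hour limit but trips the two-hour warning.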

Optimizing for Scale

So what does it look like to actually optimize your data pipelines for scale?

The Fivetran managed SaaS platform is built to automatically scale with your data volume and workload variety. Its 99.9% uptime guarantee and cloud-based architecture ensure it can handle data at cloud scale.

The managed platform has both a simple, intuitive UI for getting started and a mature API for enterprise-scale pipeline management using software development best practices such as version control and CI/CD.

One challenge with using a managed web-based platform in an enterprise architecture is that many enterprise-level organizations do not want to send their data outside of their secure network boundary but still want to take advantage of the scalability and best practices that Fivetran provides.

This is where Fivetran’s Local Data Processing (LDP) product comes into play. 

LDP is a software product you can install inside your data center that acts as a hub for all your data movement, replicating data to destinations inside and/or outside your network. Although this is a product that you install and manage yourself, it is built to scale on your infrastructure:

Figure 2: Hub Disk Sizing Guide

This software also includes many ways to optimize your replication. For instance, slicing, which is similar to indexing, allows you to define partitions in your source data that can be processed in parallel when your data replication volumes are especially large. 
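
To make the idea of slicing concrete, here is a minimal Python sketch. It is not LDP’s implementation: the helper names `slice_rows` and `replicate_slice`, the modulo-based slicing rule, and the sample rows are assumptions chosen for illustration. It splits source rows into slices on a key and processes the slices in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def slice_rows(rows, key, num_slices):
    """Assign each row to a slice using the key modulo the slice count."""
    slices = [[] for _ in range(num_slices)]
    for row in rows:
        slices[row[key] % num_slices].append(row)
    return slices


def replicate_slice(rows):
    """Stand-in for copying one slice to the target system."""
    return [dict(row) for row in rows]


# Hypothetical source table with ids 1 through 10.
source = [{"id": i, "amount": i * 10} for i in range(1, 11)]
slices = slice_rows(source, "id", 3)

# Each slice is handled by its own worker.
with ThreadPoolExecutor(max_workers=3) as pool:
    copied = list(pool.map(replicate_slice, slices))

# Reassemble the target and order it by id for comparison.
target = sorted((row for part in copied for row in part), key=lambda r: r["id"])
print([len(s) for s in slices], target == source)
```

Because the slices do not overlap and together cover every row, the work can be spread across workers without copying any row twice.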

Additionally, when sending data to a destination like the Snowflake Data Cloud, LDP can batch up the sent data to optimize Snowflake cost and performance. Finally, the compare feature will allow you to do periodic table (or even record) level checks to ensure the data in the source and target locations match.  
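
A record-level comparison between source and target can be sketched as follows. This is an illustrative approach using row digests, not a description of how LDP’s compare feature works internally; the function names and sample rows are hypothetical.

```python
import hashlib
import json


def row_digest(row):
    """Hash a row's contents in a key-order-independent way."""
    payload = json.dumps(row, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def compare_tables(source, target, key):
    """Report keys that are missing, extra, or different in the target."""
    src = {row[key]: row_digest(row) for row in source}
    tgt = {row[key]: row_digest(row) for row in target}
    shared = set(src) & set(tgt)
    return {
        "missing_in_target": sorted(set(src) - set(tgt)),
        "extra_in_target": sorted(set(tgt) - set(src)),
        "mismatched": sorted(k for k in shared if src[k] != tgt[k]),
    }


source_rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 20},
    {"id": 3, "amount": 30},
]
target_rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 25},  # value drifted
    {"id": 4, "amount": 40},  # not present in the source
]

report = compare_tables(source_rows, target_rows, "id")
print(report)
```

A table-level check would be the cheaper first pass (for example, comparing row counts), with a record-level pass like this one reserved for tables that fail it.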

Meanwhile, within dbt, you can focus on taking advantage of the built-in materializations. With Fivetran, you get synced dates that you can use to optimize and incrementally load your data. The community also creates many packages to help further optimize, including an option to incrementally load based on streams within Snowflake.
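
The incremental pattern can be sketched with Python’s built-in `sqlite3` module standing in for the warehouse. The table names `raw_orders` and `fct_orders` are hypothetical, and the sketch assumes the raw table carries a sync timestamp column such as `_fivetran_synced`. In dbt, a filter like this would typically live inside an incremental model rather than in hand-written Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (id INTEGER, amount REAL, _fivetran_synced TEXT);
    CREATE TABLE fct_orders (id INTEGER, amount REAL, _fivetran_synced TEXT);
    INSERT INTO raw_orders VALUES
        (1, 10.0, '2023-10-01T00:00:00Z'),
        (2, 20.0, '2023-10-02T00:00:00Z');
""")

# Only pick up rows synced after the newest row already loaded.
# ISO-8601 strings in a single time zone sort correctly as text.
INCREMENTAL_SQL = """
    INSERT INTO fct_orders
    SELECT id, amount, _fivetran_synced
    FROM raw_orders
    WHERE _fivetran_synced > (
        SELECT COALESCE(MAX(_fivetran_synced), '') FROM fct_orders
    )
"""


def run_incremental(connection):
    """Run one incremental load and return how many rows were added."""
    count_sql = "SELECT COUNT(*) FROM fct_orders"
    before = connection.execute(count_sql).fetchone()[0]
    connection.execute(INCREMENTAL_SQL)
    after = connection.execute(count_sql).fetchone()[0]
    return after - before


first_run = run_incremental(conn)   # target is empty, so everything loads
conn.execute("INSERT INTO raw_orders VALUES (3, 30.0, '2023-10-03T00:00:00Z')")
second_run = run_incremental(conn)  # only the newly synced row loads
third_run = run_incremental(conn)   # nothing new to load
print(first_run, second_run, third_run)
```

Each run touches only rows newer than the current high-water mark, which is what keeps compute roughly proportional to the new data rather than to the full table.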

Adding in the built-in Slim CI ensures that you not only transform your data efficiently but also avoid wasting compute by recalculating transformations you’ve already run.
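
The core idea behind this kind of selective rebuild can be sketched in Python: compare the current project against a previous state, then rebuild only what changed plus everything downstream of it. This is a conceptual sketch and not dbt’s implementation; the model names, checksums, and the `models_to_rebuild` helper are hypothetical. In dbt itself, this kind of selection is usually expressed with the `state:modified+` selector.

```python
from collections import deque


def models_to_rebuild(previous, current, children):
    """Return modified models plus all of their downstream dependents."""
    modified = {
        name for name, checksum in current.items()
        if previous.get(name) != checksum
    }
    selected = set(modified)
    queue = deque(modified)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return sorted(selected)


# Hypothetical checksums of each model's code from the last production run.
previous_state = {
    "stg_orders": "a1", "stg_customers": "b1",
    "fct_orders": "c1", "dim_customers": "d1", "rpt_revenue": "e1",
}
# Same project after a pull request that edits only stg_orders.
current_state = dict(previous_state, stg_orders="a2")

# Parent -> children edges of the model graph.
children = {
    "stg_orders": ["fct_orders"],
    "stg_customers": ["dim_customers"],
    "fct_orders": ["rpt_revenue"],
}

selected = models_to_rebuild(previous_state, current_state, children)
print(selected)
```

In this example `stg_customers` and `dim_customers` are left alone, so the CI run spends no compute on models the change cannot affect.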

By leveraging these tools, you can make your transformation pipelines anywhere from 5x to 10x faster, saving on cloud compute costs.

Governance

When talking about scaling, governance doesn’t often come up. But governance is an extremely important part of protecting your data system as it grows in complexity. 

How do you scale and continue to ensure your consumers trust the data being provided? How do you give consumers proper access as your data assets grow? And how do you provide transparency to things that might be going wrong in your pipelines or data?

Whether you are starting from scratch or managing a large data organization, here are some advantages of Fivetran and dbt for data governance: 

  • Fivetran and dbt can be integrated with your enterprise authentication solution via SSO, and both implement access control via RBAC within the product. 

  • The combination of Fivetran and dbt allows you to provide transparency into the lineage of your data automatically.

  • Both Fivetran and dbt (Cloud) have implemented a platform for configuring alerts based on ingestion or transformation failures, integrating with common tools like Slack, email, and webhooks.

  • Policies become an integral part of your ingestion and transformation pipeline, making enforcing proper data privacy policies extremely easy.

  • dbt has created model governance constructs that make it easy to contract out data and its structure in a way that is both easy to control and easy to share, embracing data mesh principles.
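
To illustrate what contracting out a model’s structure means in practice, here is a small Python sketch of a contract check. It is an assumption-laden illustration rather than dbt’s implementation: the `contract_violations` helper, the column names, and the type names are hypothetical. The idea is that a model’s declared columns and types are compared with what was actually produced before consumers see it.

```python
def contract_violations(contract, actual):
    """List the ways `actual` columns deviate from the declared `contract`."""
    problems = []
    for column, expected_type in contract.items():
        if column not in actual:
            problems.append(f"missing column: {column}")
        elif actual[column] != expected_type:
            problems.append(
                f"type mismatch on {column}: "
                f"expected {expected_type}, got {actual[column]}"
            )
    for column in actual:
        if column not in contract:
            problems.append(f"unexpected column: {column}")
    return problems


# Declared shape that downstream consumers rely on (hypothetical).
contract = {
    "customer_id": "integer",
    "email": "varchar",
    "signup_date": "date",
}
# Shape the model actually produced after a careless change.
actual = {
    "customer_id": "integer",
    "email": "integer",
    "created_by": "varchar",
}

violations = contract_violations(contract, actual)
for line in violations:
    print(line)
```

Failing the build when this list is non-empty is what turns a documented structure into an enforced one.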

As you can see, the combination of these platforms allows you to scale not only how much data you process, but also the access, security, and quality of that data. Governance doesn’t have to be scary or an obstacle for your cloud data warehouse. Instead, it should be embraced in a scalable manner, as both Fivetran and dbt make possible.

Closing

Fivetran and dbt provide everything needed to scale an organization of any size. We’ve covered many of the features that these platforms bring to help the cloud data warehouse scale while maintaining proper governance and controls. 

However, we imagine you might be wondering if Fivetran and dbt can also provide a faster time to value. Watch out for Part 4 of this series, where we will cover how these platforms not only scale well, but provide teams of all sizes the ability to drive faster business value.
