This post was co-written by Sam Hall and Dakota Kelley
In our previous blog, we talked about some of the challenges that larger enterprise data teams may have around modernizing their data architecture to implement best practices and free up time and resources to build more value using their data.
In this blog, we will talk about some of the ways that a data stack including Fivetran and dbt can solve those kinds of challenges, as well as the opportunities that these tools bring with them. No tool is perfect, but as these products mature, they solve many of the challenges that their predecessors struggled with and open up new ways to provide value.
Fivetran + dbt: Solving Enterprise ELT
Simplified Source Integration
One of the challenges for an enterprise is the sheer volume and spread of data technologies that may have accumulated as the business has shifted and evolved over the years. The same use case may generate data in multiple ways, and the interfaces for consuming that data will be many and diverse.
Fivetran is built to natively move data from your enterprise technologies into a more modern, efficient analytics platform. Fivetran’s acquisition of HVR in 2021 established it as a leader in database integration, adding agent-based connections to popular legacy systems such as IBM’s DB2, SAP, and Oracle.
This acquisition (which has now been branded as Local Data Processing and HVA), combined with Fivetran’s 200+ other pre-built connectors to on-prem, cloud, and application data sources, ensures that your organization will have one tool to ingest the data it needs wherever it lives.
Additionally, Fivetran focuses on simplicity and automation in pipeline creation and maintenance, no matter what the source is, so it can serve as a reliable, long-term solution for your team. Its 99.9% uptime means that Fivetran can grow alongside your platform and data with minimal effort on your part.
Today’s data landscape is changing quickly and has been for the last 5-10 years. For enterprises looking to modernize, that generally means adopting a flexible data stack, and flexibility is one of the hallmarks of the so-called “modern data stack”.
Being able to integrate with both existing and future data tooling, as well as adhering to open-source standards, should be a key consideration when choosing your data stack.
Fivetran and dbt both provide large enterprise organizations the capability to avoid lock-in to any particular data destination or cloud. They provide loose coupling between the business logic that processes your data and the platform and data that it is executed upon.
With Fivetran, you can quickly and easily switch between different data warehouse technologies in which to land your data, as well as popular open-source lake formats such as Apache Iceberg.
dbt works the same way: your business models are defined in SQL within dbt (which is itself open source) and are executed on whichever engine you choose to connect it to. This means your business can use the best tool for each workload, which will evolve with time, without worrying about deep lock-in to a particular technology or platform.
The number one thing likely on the mind of every large enterprise is how to save money and maximize its investments. As organizational consumption of data grows, it becomes increasingly expensive to maintain legacy tools and processes, which also takes away from the resources your team has to investigate new ways to bring value to your business.
Both Fivetran and dbt have solutions that can help you optimize the engineering costs of the modern data stack, requiring less ongoing maintenance as your data consumption grows.
First and foremost, both dbt and Fivetran provide robust methods for incrementally loading and processing your most sensitive and critical data workloads. This means processing only the data that has changed since the last load, which prevents unnecessary reprocessing and wasted compute. It often allows you to use smaller compute engines for less time during each execution.
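To make the idea of incremental processing concrete, here is a minimal Python sketch. It is an illustration only, not how Fivetran or dbt implement it: the `updated_at` field, the `incremental_load` function, and the sample rows are all hypothetical, and the sketch assumes every source row carries a reliable modification timestamp.

```python
from datetime import datetime


def incremental_load(source_rows, target, last_synced):
    """Upsert only the rows modified after the previous high-water mark."""
    changed = [row for row in source_rows if row["updated_at"] > last_synced]
    for row in changed:
        target[row["id"]] = row  # insert or overwrite by primary key
    # New high-water mark: latest timestamp seen, or unchanged if nothing is new.
    new_mark = max((row["updated_at"] for row in changed), default=last_synced)
    return len(changed), new_mark


# Hypothetical source table and a target that already holds row 1.
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1), "status": "new"},
    {"id": 2, "updated_at": datetime(2024, 1, 5), "status": "paid"},
    {"id": 3, "updated_at": datetime(2024, 1, 9), "status": "new"},
]
target = {1: source[0]}

processed, mark = incremental_load(source, target, datetime(2024, 1, 1))
```

Only rows 2 and 3 are touched on this run, and a second run with the returned mark would process nothing, which is where the compute savings come from.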
Fivetran also prices per MAR (monthly active row). This means that you are charged only for the number of unique primary keys that change in a month, no matter how many times those records change. Also, as your consumption increases, your cost per row decreases. This is a game changer when it comes to cost, especially for transactional data, where the same records change often.
On top of this, dbt can monitor the freshness of your source data, and Fivetran conveniently records the load time of every row in the _fivetran_synced column. By checking that column for new data and then applying a simple selector, you can run only the models that have fresh data, again reducing which models are run and when.
Finally, dbt provides state selection criteria that let you optimize your CI/CD process by running only the models that have changed and their downstream dependencies, so you avoid recomputing data that hasn’t changed. All of these features are built into dbt and are extremely easy to implement and use within dbt Cloud.
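As a rough illustration of why MAR favors frequently updated records, the sketch below counts distinct primary keys per month from a list of change events. This is a simplified model of the idea described above, not Fivetran's actual billing calculation; the event shape and the `monthly_active_rows` function are assumptions made for the example.

```python
from collections import defaultdict


def monthly_active_rows(change_events):
    """Count distinct (table, primary key) pairs touched in each month."""
    active = defaultdict(set)
    for event in change_events:
        month = event["synced_at"][:7]  # "YYYY-MM" prefix of an ISO date string
        active[month].add((event["table"], event["pk"]))
    return {month: len(keys) for month, keys in active.items()}


# Hypothetical change log: order 1 is updated three times in March.
events = [
    {"table": "orders", "pk": 1, "synced_at": "2024-03-01"},
    {"table": "orders", "pk": 1, "synced_at": "2024-03-02"},
    {"table": "orders", "pk": 1, "synced_at": "2024-03-15"},
    {"table": "orders", "pk": 2, "synced_at": "2024-03-20"},
    {"table": "orders", "pk": 1, "synced_at": "2024-04-02"},
]

mar = monthly_active_rows(events)
```

In this example March contains four change events but only two active rows, because the three updates to order 1 count once.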
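The freshness-based selection described above can be sketched in a few lines of Python. This is only a conceptual model: in dbt the comparison is handled by its source freshness tooling and selectors, while the function names, source names, and the `model_sources` mapping here are hypothetical. The sketch assumes you have the latest `_fivetran_synced` value per source from the previous run and from the current check.

```python
from datetime import datetime


def fresher_sources(previous, current):
    """Return the sources whose latest _fivetran_synced value has advanced."""
    return {
        name
        for name, synced in current.items()
        if name not in previous or synced > previous[name]
    }


def models_to_run(model_sources, fresh):
    """Keep only the models that read from at least one fresher source."""
    return sorted(
        model for model, sources in model_sources.items() if fresh & set(sources)
    )


# Latest _fivetran_synced per source at the previous run and now (hypothetical).
previous = {
    "salesforce.account": datetime(2024, 5, 1, 6, 0),
    "stripe.charge": datetime(2024, 5, 1, 6, 0),
}
current = {
    "salesforce.account": datetime(2024, 5, 1, 6, 0),
    "stripe.charge": datetime(2024, 5, 1, 12, 0),
}
# Each model mapped to every source it depends on, directly or indirectly.
model_sources = {
    "stg_accounts": ["salesforce.account"],
    "stg_charges": ["stripe.charge"],
    "revenue_by_account": ["salesforce.account", "stripe.charge"],
}

fresh = fresher_sources(previous, current)
selected = models_to_run(model_sources, fresh)
```

Here only the Stripe source has new data, so `stg_accounts` is skipped while the two models that depend on Stripe are selected.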
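Conceptually, state selection compares the current project against a saved snapshot of a previous run and walks the dependency graph from whatever changed. The Python sketch below mimics that idea under simplifying assumptions: models are plain SQL strings, the `children` mapping is written by hand, and all names are hypothetical. It does not reproduce dbt's own comparison logic.

```python
import hashlib


def checksum(sql):
    """Fingerprint a model's SQL so two versions can be compared cheaply."""
    return hashlib.sha256(sql.encode()).hexdigest()


def modified_with_downstream(prod, dev, children):
    """Select new or changed models plus everything downstream of them."""
    modified = {
        name
        for name, sql in dev.items()
        if name not in prod or checksum(sql) != checksum(prod[name])
    }
    selected, stack = set(), list(modified)
    while stack:  # depth-first walk over downstream dependencies
        node = stack.pop()
        if node not in selected:
            selected.add(node)
            stack.extend(children.get(node, []))
    return sorted(selected)


# Hypothetical project: production state versus a branch that edits stg_orders.
prod = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "orders_enriched": "select * from stg_orders",
    "daily_revenue": "select * from orders_enriched",
    "customer_ltv": "select * from stg_customers",
}
dev = dict(prod, stg_orders="select * from raw.orders where not is_deleted")
children = {
    "stg_orders": ["orders_enriched"],
    "orders_enriched": ["daily_revenue"],
    "stg_customers": ["customer_ltv"],
}

to_build = modified_with_downstream(prod, dev, children)
```

In this example, editing `stg_orders` selects three of the five models, leaving the customer branch of the graph untouched in CI.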
Security and Compliance
The security and compliance of your data should always be first and foremost. The last thing you want to do is find a tool that checks a bunch of boxes but then doesn’t meet the bare minimum of your security requirements. Thankfully, Fivetran and dbt have built their tools with a security-centric approach.
You can rely on both of these tools, which have made it a priority to obtain key security and compliance certifications like CCPA, GDPR, HIPAA, ISO, PCI, and SOC2. Each cloud-based product also contains enterprise-ready features such as SSO (Single Sign-On), RBAC (Role-Based Access Control), API keys and rotation policies, private cloud connectivity, and audit logging.
dbt also does this while embracing OWASP standards in its development process. It also offers a single-tenant environment that is completely independent of any other customer environment. All of this helps protect your data development life cycle, since dbt manages how your code is run while pushing the actual execution down to the cloud data warehouse.
Fivetran has also recently rolled out Local Data Processing, which expands the deployment options for enterprises looking for a secure platform for replicating all their data. Local Data Processing enables self-hosted deployments of Fivetran in an organization’s on-premises network or VPC, keeping your most sensitive data within the walls of your organization’s network.
Streamlined and Simplified Transformation Process
With dbt, you can streamline and simplify the transformation portion of your pipeline. Thanks to SQL-centric transformations, you no longer need to have deep experience in a particular tool or programming language.
However, you still keep software development best practices like version control, CI/CD, automation, testing, DRY (don’t repeat yourself) principles, linting/parsing, and idempotency. Within dbt, a self-documenting structure and data lineage make it extremely easy to provide transparency and data democratization. Additionally, Fivetran provides many different dbt Packages to help support modeling for common data sources.
Thanks to the reference function (ref), dbt knows how your models depend on one another. This provides not only the lineage and the directed acyclic graph (DAG) but also makes the orchestration of your transformations extremely simple.
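To show how references alone can drive orchestration, here is a small Python sketch that extracts ref-style references from model SQL, builds a dependency graph, and derives a valid run order with a topological sort. It is a toy approximation, not dbt's parser: the regular expression, the `build_order` function, and the model definitions are all hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def build_order(models):
    """Derive a run order in which every model follows the models it references."""
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical models, deliberately listed out of dependency order.
models = {
    "daily_revenue": (
        "select order_date, sum(amount) as revenue "
        "from {{ ref('orders_enriched') }} group by 1"
    ),
    "orders_enriched": (
        "select o.*, c.region from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c using (customer_id)"
    ),
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
}

order = build_order(models)
```

The two staging models come first (in either order), followed by `orders_enriched` and then `daily_revenue`, without anyone writing an explicit schedule.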
As you can see, both Fivetran and dbt bring all of the features necessary to create an enterprise solution using the ELT paradigm. We’ve covered many of the features that these tools bring to the table, as well as the ways they can help you save money and maintain good security practices around your data platform.
However, you might be wondering how this all scales and whether it can really help you reduce your time to value. Watch out for Part 3 of this series, where we will cover how these tools scale and operate at large-scale enterprise levels.