Archway: Self-Service Data Engineering on Cloudera CDH

Data engineering in a production environment is complex. Engineers and data scientists need to be onboarded onto a platform where they can share data and resources, and the process is often longer and more difficult than many people initially realize.

It can be an adventure just getting the right approvals: Is the data allowed to live on this platform? Are there additional security considerations? Are there enough resources available on the platform for the application? And that’s only the beginning. The necessary resources then need to be provisioned. The operations team needs to ensure that quotas and resource queues are configured correctly, that databases are created, that role-based access control is set up properly, and so on.

Not only is this time-consuming; it’s risky. With such a complex, manual process, it’s easy to make a mistake along the way. Mistakes mean project slowdowns, misconfigurations, and extra costs, which can lead in turn to late starts, security vulnerabilities, or failure to get applications off the ground at all.

That’s where Archway comes in: it was built to automate these processes — with ease-of-use, accuracy, flexibility, and security in mind.

An easier way to get value from your data

Archway is an open source project, released under the Apache License, Version 2.0, that automates the creation, approval, and governance of user workspaces in a Cloudera CDH environment. It makes self-service data engineering a reality, with an intuitive user interface (UI) that lets users create workspaces and set up role-based access control in a matter of minutes. That saves you time, reduces onboarding risk, and gives you faster access to your data.

With Archway, users choose the workspace type that fits their needs and enter a name, description, summary, and data compliance information. After reviewing the workspace request, compliance and operations teams can approve the workspace with the click of a button. The workspace resources are then automatically created in the background, and access control is configured immediately.

Archway will automatically provision:

Hive databases
HDFS directories, including user home directories
Kafka topics
YARN resource queues
Sentry roles with their associated Active Directory groups for each of the above resources
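
To make the list above concrete, here is a minimal sketch of what a provisioning plan for one workspace might look like. It is not Archway's implementation: the `Resource` type, the `plan_workspace` function, and every naming convention in it (paths, queue names, role names) are assumptions made up for illustration. It only enumerates the resources; it does not talk to Hive, HDFS, Kafka, YARN, or Sentry.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    kind: str  # e.g. "hive_database", "hdfs_directory"
    name: str  # hypothetical naming convention, not Archway's own


def plan_workspace(workspace: str, with_topic: bool = True) -> list[Resource]:
    """List the resources a workspace would need, plus one role per resource."""
    resources = [
        Resource("hive_database", workspace),
        Resource("hdfs_directory", f"/data/{workspace}"),
        Resource("yarn_queue", f"root.{workspace}"),
    ]
    if with_topic:
        resources.append(Resource("kafka_topic", f"{workspace}.default"))
    # One Sentry role (backed by an Active Directory group) per resource above.
    roles = [
        Resource("sentry_role", f"role_{r.kind}_{workspace}") for r in resources
    ]
    return resources + roles


plan = plan_workspace("sales_analytics")
for resource in plan:
    print(f"{resource.kind:15} {resource.name}")
```

Keeping the plan as plain data, separate from whatever code applies it, is one way a request could be reviewed before anything is created on the cluster.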

Archway User Interface

Because every analytics or machine learning application is different, Archway provides templates for workspaces to fit almost any need, including:

User workspaces — Workspaces for single users, consisting of a database and an optional Kafka topic for their own exploration and play.
Simple workspaces — Workspaces intended for small applications or team collaboration.
Structured workspaces — Workspaces consisting of multiple databases for different stages of the application lifecycle. For example, different databases can be used for landing data, transforming it, and exposing data to end users and analysts.
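
As a rough illustration of how the three templates differ, the sketch below maps a template name to the databases it would create. The function name, the stage suffixes, and the choice of a single database for a simple workspace are all assumptions for the example; Archway's actual templates may be laid out differently.

```python
from __future__ import annotations

# Hypothetical stage suffixes for a structured workspace: one database each
# for landing data, transforming it, and exposing it to end users.
STRUCTURED_STAGES = ("landing", "transform", "serving")


def databases_for(template: str, name: str) -> list[str]:
    """Return the database names a workspace of the given template would get."""
    if template in ("user", "simple"):
        # Assumption: both of these templates use a single database.
        return [name]
    if template == "structured":
        return [f"{name}_{stage}" for stage in STRUCTURED_STAGES]
    raise ValueError(f"unknown workspace template: {template!r}")


print(databases_for("user", "jdoe"))
print(databases_for("structured", "orders"))
```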

Granting additional users access to a workspace takes only a quick user ID search and a choice of access permissions. Removing user access is just as simple.

Likewise, members of data compliance or operations teams can easily view and manage workspaces in the cluster.

Governance, including important information about data security, is handled as part of the Archway workspace approval process. When creating a workspace, users indicate the type of data that will exist in that workspace. Important information, including whether a workspace will include PII, PHI, or PCI, is collected and stored alongside the workspace to be used in future decision making and auditing.
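
One way to picture this is a workspace request that carries its compliance flags and its approvals together, so both can be audited later. The sketch below is an assumption-laden illustration, not Archway's data model: the `WorkspaceRequest` class, its field names, and the rule that both the compliance and operations teams must approve before provisioning are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: both teams named in the approval process must sign off.
REQUIRED_APPROVERS = frozenset({"compliance", "operations"})


@dataclass
class WorkspaceRequest:
    name: str
    pii: bool = False  # personally identifiable information
    phi: bool = False  # protected health information
    pci: bool = False  # payment card data
    approvals: set[str] = field(default_factory=set)

    def approve(self, team: str) -> None:
        if team not in REQUIRED_APPROVERS:
            raise ValueError(f"unknown approver: {team!r}")
        self.approvals.add(team)

    def ready_to_provision(self) -> bool:
        # True only once every required team has approved.
        return REQUIRED_APPROVERS <= self.approvals

    def audit_record(self) -> dict:
        """Flatten the request into a record stored alongside the workspace."""
        return {
            "name": self.name,
            "pii": self.pii,
            "phi": self.phi,
            "pci": self.pci,
            "approved_by": sorted(self.approvals),
        }


request = WorkspaceRequest("claims", phi=True)
request.approve("compliance")
request.approve("operations")
print(request.ready_to_provision(), request.audit_record())
```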

An Archway to self-service data engineering

Archway enables self-service data analysis and engineering by automating the creation and governance of workspaces in Cloudera CDH environments. It’s flexible and secure. It’s open source. And it helps ensure that processes like resource creation and access control never get in the way of a successful data project.

For more details, watch the demo or check out the GitHub repo.

This blog post was written by Tony Foerster and Brock Noland.
