October 10, 2023

dbt Cloud API 101 – Trigger and Polling Jobs

By Arnab Mondal

dbt Cloud is a cloud-hosted solution for deploying dbt through a web-based UI. Anything you can do with dbt Core, you can also do here, along with extra functionality such as creating and scheduling jobs, managing environments, and more.

A major part of dbt Cloud is its API, which lets you manage and orchestrate dbt Cloud jobs. You can also use the API to keep your dbt Cloud job definitions under Git version control.

In this blog, we will look at how to use the API to trigger dbt Cloud jobs and poll them for their status and results.

Why Should You Use the API?

Beyond creating and running jobs and retrieving their artifacts, the dbt Cloud API lets you perform CRUD operations on endpoints for Connections, Environments, Jobs, Licenses, Users, Notifications, Permissions, Projects, Repositories, and more.

In short, anything you can do from the UI, you can also do with the API. The full list of endpoints, and what each one does, is available on the official documentation page.

Note: You can call the following endpoints from any language you like. In this blog, we use Python with the requests library to access the REST API:

				
```python
import json

import requests

# Fetch your account API key from your dbt Cloud profile page
API_KEY = "<your_api_key>"
# Replace with the endpoint URL you want to call
URL = "<ENTER YOUR URL>"

headers_auth = {
    "Authorization": f"Bearer {API_KEY}",
}

# Use requests.post() to interact with POST endpoints
response = requests.get(URL, headers=headers_auth, timeout=30)
response.raise_for_status()
res = response.json()
print(json.dumps(res, indent=4))
```

How to Trigger a Job

Use the following endpoint to trigger a job and kick off a run. A 200 response means a run was created for the job. If the job is already running, the new run is enqueued and executes after the current run finishes. The URL to call is:

Endpoint:
POST https://cloud.getdbt.com/api/v2/accounts/{accountId}/jobs/{jobId}/run/

The required URL parameters are accountId and jobId, the ID of the job you want to trigger. The request body has one required field, cause, a string describing why the job was triggered. The following body fields are optional:
  • git_sha: The Git SHA to check out before running the job.

  • git_branch: The Git branch to check out before running the job.

  • schema_override: You can override the destination schema in the configured target for this specific run.

  • dbt_version_override: You can override the dbt version used for this particular run.

  • threads_override: You can override the number of threads used to run the job.

  • target_name_override: You can override the target.name context variable when the job executes.

  • generate_docs_override: You can override whether the job generates documentation after execution.

  • timeout_seconds_override: You can override the timeout for a specific run. The value is the new timeout in seconds.

  • steps_override: You can override the list of steps (dbt commands) for the run.
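A rough sketch of wrapping the trigger call in a Python helper follows. It uses the standard library's urllib instead of requests so there are no third-party dependencies. The helper names build_trigger_request and trigger_job are hypothetical, and the code assumes the v2 API returns the new run under a top-level data key. Any optional override left as None is omitted from the body, so the job keeps its configured value.

```python
import json
import urllib.request
from typing import Any, Dict, Tuple

BASE_URL = "https://cloud.getdbt.com/api/v2"


def build_trigger_request(
    account_id: int, job_id: int, cause: str, **overrides: Any
) -> Tuple[str, Dict[str, Any]]:
    """Return the (url, body) pair for the trigger-job endpoint.

    cause is the only required body field. Overrides set to None are dropped,
    so the job keeps its configured value for those settings.
    """
    url = f"{BASE_URL}/accounts/{account_id}/jobs/{job_id}/run/"
    body: Dict[str, Any] = {"cause": cause}
    body.update({key: value for key, value in overrides.items() if value is not None})
    return url, body


def trigger_job(
    api_key: str, account_id: int, job_id: int, cause: str, **overrides: Any
) -> Dict[str, Any]:
    """POST to the trigger endpoint and return the newly created run."""
    url, body = build_trigger_request(account_id, job_id, cause, **overrides)
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.loads(response.read().decode("utf-8"))
    # Assumption: the v2 API wraps the run object in a top-level "data" key
    return payload["data"]


# Example (needs a real API key and network access; the IDs are placeholders):
# run = trigger_job("<your_api_key>", 12345, 67890, cause="Triggered via API",
#                   git_branch="main", threads_override=8)
# print(run["id"], run["status"])
```

Building the URL and body in a separate function makes it easy to check exactly what will be sent before making a live call.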

How to List All Runs

This endpoint returns a list of runs for a particular account. The request looks as follows:

Endpoint:
GET https://cloud.getdbt.com/api/v2/accounts/{accountId}/runs/

The only required URL parameter is accountId, the numeric ID of your account. The optional query parameters are as follows:
  • include_related: Specifies the related fields to pull with the run. Valid values are job, trigger, and debug_logs. If debug_logs is not included in the request, the debug logs are truncated to the last 1,000 lines of the debug log output file.

  • job_definition_id: Filters the results to runs of the given job ID.

  • project_id: Filters the results to runs from a particular project.

  • status: Filters the results to runs that match a particular status code. The accepted values are as follows:

    • 1: The runs that are in the queue but haven’t been scheduled yet

    • 2: The runs that are being removed from the queue and are being actively scheduled

    • 3: The runs that are currently executing at the moment

    • 10: The runs which have been completed successfully

    • 20: The runs which failed to complete successfully

    • 30: The runs that were canceled by the API or by a user.

  • order_by: You can use this field to order the results by a particular field. The supported values are:

    • id

    • created_at

    • finished_at

    Prefix the value with - (a minus sign) to reverse the order of the results, e.g., order_by=-finished_at.

  • offset: The number of runs to skip before results are returned. Use it with limit to paginate results.

  • limit: The maximum number of runs to return. It is often used with offset to paginate results.
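Here is a minimal sketch of paging through this endpoint with offset and limit. It assumes each page comes back as a list under a top-level data key, and it stops when a page is shorter than the requested limit. The names build_runs_url and iter_runs are hypothetical, and urllib is used so the example has no third-party dependencies.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Iterator, List

BASE_URL = "https://cloud.getdbt.com/api/v2"


def build_runs_url(account_id: int, **params: Any) -> str:
    """Build the list-runs URL, leaving out query parameters that are None."""
    query = urllib.parse.urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{BASE_URL}/accounts/{account_id}/runs/"
    return f"{url}?{query}" if query else url


def iter_runs(
    api_key: str, account_id: int, page_size: int = 100, **filters: Any
) -> Iterator[Dict[str, Any]]:
    """Yield every matching run, fetching page_size runs per request."""
    offset = 0
    while True:
        url = build_runs_url(account_id, offset=offset, limit=page_size, **filters)
        request = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {api_key}"}
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            # Assumption: the page of runs is returned under the "data" key
            page: List[Dict[str, Any]] = json.loads(response.read().decode("utf-8"))["data"]
        yield from page
        # A page shorter than the limit means there is nothing left to fetch
        if len(page) < page_size:
            return
        offset += page_size


# Example: failed runs of one job, most recently finished first
# (needs a real API key and network access; the IDs are placeholders):
# for run in iter_runs("<your_api_key>", 12345, job_definition_id=67890,
#                      status=20, order_by="-finished_at"):
#     print(run["id"], run["status"])
```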

How to Get the Details of a Specific Run

This endpoint returns the details of a specific run of a job. The endpoint looks as follows:

Endpoint:
GET https://cloud.getdbt.com/api/v2/accounts/{accountId}/runs/{runId}/

The required URL parameters are accountId, the numeric ID of your account, and runId, the ID of the run. There is one optional parameter:

  • include_related: Specifies the related fields to pull with the run. Valid values are job, trigger, and debug_logs. If debug_logs is not included in the request, the debug logs are truncated to the last 1,000 lines of the debug log output file.

This endpoint returns a run's details whether the run succeeded or not, so you can use it to gather the information you need for debugging.
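This endpoint works for runs in any state, so calling it repeatedly is a simple way to poll a triggered run until it finishes. The sketch below shows one way to do that. It assumes the run object sits under data and exposes the numeric status codes listed earlier, with 10, 20, and 30 as the terminal states. The helpers get_run and wait_for_run are hypothetical, and the poll interval and timeout are arbitrary defaults.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

BASE_URL = "https://cloud.getdbt.com/api/v2"

# Terminal status codes from the list-runs section: 10 succeeded, 20 failed, 30 canceled
TERMINAL_STATUSES = {10, 20, 30}


def get_run(api_key: str, account_id: int, run_id: int) -> Dict[str, Any]:
    """Fetch a single run from the run-detail endpoint."""
    url = f"{BASE_URL}/accounts/{account_id}/runs/{run_id}/"
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        # Assumption: the run object sits under the top-level "data" key
        return json.loads(response.read().decode("utf-8"))["data"]


def wait_for_run(
    fetch_run: Callable[[], Dict[str, Any]],
    poll_interval: float = 10.0,
    timeout: float = 3600.0,
) -> Dict[str, Any]:
    """Call fetch_run until the run reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        run = fetch_run()
        if run["status"] in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run.get('id')} still has status {run['status']}")
        time.sleep(poll_interval)


# Example (needs a real API key and network access; the IDs are placeholders):
# run = wait_for_run(lambda: get_run("<your_api_key>", 12345, 67890))
# print("succeeded" if run["status"] == 10 else "did not succeed")
```

The fetch step is passed in as a function, which keeps the polling loop independent of the HTTP call. That makes the loop reusable and lets you test it without network access.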

How to List All Run Artifacts

This endpoint returns the full list of artifact files generated by a completed run.

Endpoint:
GET https://cloud.getdbt.com/api/v2/accounts/{accountId}/runs/{runId}/artifacts/

The required URL parameters are accountId, the numeric ID of your account, and runId, the ID of the run.

You can use this endpoint to list all of a run's artifacts, inspect the code that was executed, and analyze it for bugs and errors as needed.
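This small sketch lists a run's artifacts and picks out the JSON files. It assumes the response's data key holds a list of relative file paths, such as manifest.json. The names artifacts_url, list_artifacts, and json_artifacts are hypothetical.

```python
import json
import urllib.request
from typing import List

BASE_URL = "https://cloud.getdbt.com/api/v2"


def artifacts_url(account_id: int, run_id: int) -> str:
    """URL of the list-artifacts endpoint for one run."""
    return f"{BASE_URL}/accounts/{account_id}/runs/{run_id}/artifacts/"


def list_artifacts(api_key: str, account_id: int, run_id: int) -> List[str]:
    """Return the artifact paths produced by a completed run."""
    request = urllib.request.Request(
        artifacts_url(account_id, run_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        # Assumption: "data" is a list of relative paths such as "manifest.json"
        return json.loads(response.read().decode("utf-8"))["data"]


def json_artifacts(paths: List[str]) -> List[str]:
    """Keep only the JSON artifacts (manifest.json, run_results.json, ...), sorted."""
    return sorted(path for path in paths if path.endswith(".json"))


# Example (needs a real API key and network access; the IDs are placeholders):
# print(json_artifacts(list_artifacts("<your_api_key>", 12345, 67890)))
```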

How to Get a Specific Run Artifact

This endpoint fetches a specific artifact from a completed run. You can retrieve data such as timing information, details about dbt models, and status messages from the dbt model build. The endpoint looks like this:

Endpoint:
GET https://cloud.getdbt.com/api/v2/accounts/{accountId}/runs/{runId}/artifacts/{path}

The required URL parameters are accountId, the numeric ID of your account; runId, the ID of the run; and path, the artifact file to fetch, such as run_results.json, catalog.json, or manifest.json. There is one optional parameter:

  • step: The index of the step in the run whose artifacts you want. The first step in a run has index 1. If step is omitted, the artifacts from the latest step in the run are returned.
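For example, run_results.json lists one result per executed node, each with a status field, so a per-status count is a handy first check after a run. The sketch below assumes the endpoint returns the artifact file's contents directly rather than wrapping them in data. The names artifact_url, get_artifact, and summarize_run_results are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter
from typing import Any, Dict, Optional

BASE_URL = "https://cloud.getdbt.com/api/v2"


def artifact_url(account_id: int, run_id: int, path: str, step: Optional[int] = None) -> str:
    """URL of one artifact; step is 1-based and omitted to get the latest step."""
    url = f"{BASE_URL}/accounts/{account_id}/runs/{run_id}/artifacts/{path}"
    return f"{url}?{urllib.parse.urlencode({'step': step})}" if step is not None else url


def get_artifact(
    api_key: str, account_id: int, run_id: int, path: str, step: Optional[int] = None
) -> Dict[str, Any]:
    """Download a JSON artifact such as run_results.json or manifest.json."""
    request = urllib.request.Request(
        artifact_url(account_id, run_id, path, step),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        # Assumption: the body is the artifact file itself, not a "data" envelope
        return json.loads(response.read().decode("utf-8"))


def summarize_run_results(run_results: Dict[str, Any]) -> Dict[str, int]:
    """Count the node results in run_results.json by their status value."""
    return dict(Counter(result["status"] for result in run_results["results"]))


# Example (needs a real API key and network access; the IDs are placeholders):
# results = get_artifact("<your_api_key>", 12345, 67890, "run_results.json")
# print(summarize_run_results(results))
```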

How to Get Information About a Specific Step

This endpoint fetches information about a given step. The endpoint looks like this:

Endpoint:
GET https://cloud.getdbt.com/api/v2/accounts/{accountId}/steps/{stepId}/

The required URL parameters are accountId, the numeric ID of your account, and stepId, the ID of the step you want data about. There is one optional parameter:

  • include_related: Specifies the related fields to include in the response. Valid values are job, trigger, and debug_logs. If debug_logs is not included in the request, the debug logs are truncated to the last 1,000 lines of the debug log output file.
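This minimal sketch fetches a step. It only builds the documented URL and returns whatever object the API places under data, which is an assumption about the response shape. The names step_url and get_step are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "https://cloud.getdbt.com/api/v2"


def step_url(account_id: int, step_id: int) -> str:
    """URL of the step-detail endpoint."""
    return f"{BASE_URL}/accounts/{account_id}/steps/{step_id}/"


def get_step(api_key: str, account_id: int, step_id: int) -> Dict[str, Any]:
    """Fetch information about one step of a run."""
    request = urllib.request.Request(
        step_url(account_id, step_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        # Assumption: the step object sits under the top-level "data" key
        return json.loads(response.read().decode("utf-8"))["data"]


# Example (needs a real API key and network access; the IDs are placeholders):
# step = get_step("<your_api_key>", 12345, 555)
# print(json.dumps(step, indent=4))
```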

How to Cancel a Run

This endpoint cancels a job run that is in progress. The endpoint looks like this:

Endpoint:
POST https://cloud.getdbt.com/api/v2/accounts/{accountId}/runs/{runId}/cancel/

The required URL parameters are accountId, the numeric ID of your account, and runId, the ID of the run to cancel.
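This sketch sends the cancel request with no body. It includes a small check so you only cancel runs that have not reached a final state, using status codes 1, 2, and 3 from the list-runs section. The names cancel_url, is_in_progress, and cancel_run are hypothetical, and the code assumes the updated run comes back under data.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "https://cloud.getdbt.com/api/v2"

# Queued (1), being scheduled (2), and currently executing (3) runs have not finished
IN_PROGRESS_STATUSES = {1, 2, 3}


def cancel_url(account_id: int, run_id: int) -> str:
    """URL of the cancel-run endpoint."""
    return f"{BASE_URL}/accounts/{account_id}/runs/{run_id}/cancel/"


def is_in_progress(status: int) -> bool:
    """True if a run with this status code has not finished yet."""
    return status in IN_PROGRESS_STATUSES


def cancel_run(api_key: str, account_id: int, run_id: int) -> Dict[str, Any]:
    """POST to the cancel endpoint without a body and return the response's run."""
    request = urllib.request.Request(
        cancel_url(account_id, run_id),
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        # Assumption: the run object sits under the top-level "data" key
        return json.loads(response.read().decode("utf-8"))["data"]


# Example (needs a real API key and network access; the IDs are placeholders):
# run = cancel_run("<your_api_key>", 12345, 67890)
# print(run["id"], run["status"])
```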

Conclusion

You now know how to trigger a dbt Cloud job through the API and the different ways to poll its run details, steps, and artifacts.

Check out phData’s other blogs and contact our team of dbt experts to resolve your dbt issues or help with your dbt Core to dbt Cloud migration.

FAQs

You can pass the schedule, including a cron expression for your job, in the JSON body of the request.

A dbt job executes one or more dbt commands in dbt Cloud. For each job, you can also choose the environment variables, target schema, schedule, and number of threads.
