How to Setup PagerDuty Alerts From Snowflake Using External Functions (Part 2)

In our previous blog, we looked at how to use Snowflake external functions to send email notifications. In this blog, let’s look at how to raise an incident ticket in PagerDuty when a job fails in the Snowflake Data Cloud

In this article, we’ll use the Lambda setup, API configuration, and API integration object that was covered in the previous blog. Before going any further, you’ll want to make sure that they are all set up before continuing to the following steps. 

What is PagerDuty?

PagerDuty is a platform for incident management that serves as the central point for all time-sensitive and business-critical work across a given organization. It integrates with various tools like Jira, Service Desk, ServiceNow, AWS Cloud Watch, and many others. PagerDuty also has an exposed API that’s great for integrating with custom applications to manage incidents, account settings, etc. 

In this blog, we will use the PagerDuty API to raise an incident alert when there is a failure in Snowflake.

Step 1: Create PagerDuty Account

If your organization already uses PagerDuty, then continue to use the same account and proceed to step 2. If not, create an account on the PagerDuty website. Pagerduty provides a free developer account with limited support but still offers 650+ out-of-box integrations.

To start, set up your account using your work email. 

As part of this configuration, it will set up a default escalation policy, which we will use for this blog.

Step 2: Configure API Access Key

You will need an access key to access the services in PagerDuty through API. For setting up an access key, go to the Integrations menu and click on API Access Keys under Developer Tools. Click on Create New API Key button. Enter a name for the key and click on Create Key.

Write down the API key created as this will be the only time the key will be visible.

api-key

Step 3: Setup Service in PagerDuty

The next step will be to set up a service in PagerDuty. A service is required to create incidents/alerts. Click on the Services menu and then click on the Service Directory menu option. Click on the New Service button. This will open up a page as shown below.

Give a name for the service (SnowflakeService) and click on the Next button.

name

In the next page, assign the default escalation policy (which was created when the account was set up).

assign

In the third step ( Reduce Noise), leave the options to default and go to step 4 (Integrations). 

As you can see below, PagerDuty offers different integrations. For this example, click on the option create service without an integration. 

integrations

This creates a basic service in PagerDuty with no integrations. You will be presented with a page as shown below.

activity

Step 4: Prerequisites to Invoke API

PagerDuty offers different APIs to create incidents, alerts, escalation policies, etc. We will be using the CreateIncident API in our example. The sample payload is given below.

				
					{
   "incident": {
     "type": "incident",
     "title": "The server is on fire.",
     "service": {
       "id": "PWIXJZS",
       "type": "service_reference"
     },
     "urgency": "high",
     "incident_key": "baf7cf21b1da41b4b0221008339ff357",
     "body": {
       "type": "incident_body",
       "details": "A disk is getting full on this machine. Please look into the logs"
     },
     "escalation_policy": {
       "id": "PT20YPA",
       "type": "escalation_policy_reference"
     }
   }
}

				
			

In this payload, we need two critical pieces of information in relation to the PagerDuty account:

  1. Service ID: Each service created in PagerDuty will have an unique ID
  2. Escalation Policy ID: Each escalation policy will have an unique account
 

We will use the following curl commands to get the IDs.

Service ID:

				
					curl --request GET \
 --url https://api.pagerduty.com/services \
 --header 'Accept: application/vnd.pagerduty+json;version=2' \
 --header 'Authorization: Token token=<enter token from step 2>' \
 --header 'Content-Type: application/json'

				
			

This will give a response in the form of JSON. If your PagerDuty account has multiple services, then the response may list all services as a JSON array.

Write down the id ( PWWECJW in this case). We will need this in step 5.

				
					 {
           "id": "PWWECJW",
           "name": "SnowflakeService",
           "description": null,
           "created_at": "2022-10-11T20:08:23-04:00",
           "updated_at": "2022-10-11T20:08:23-04:00",
           "status": "active",
           "teams": [],
           "alert_creation": "create_alerts_and_incidents",

				
			

Escalation Policy ID:

				
					curl --request GET \
--url https://api.pagerduty.com/escalation_policies \
--header 'Accept: application/vnd.pagerduty+json;version=2' \
--header 'Authorization: Token token=<enter token from step 2>' \
--header 'Content-Type: application/json'

				
			

Similar to service id, this will give a JSON response. Take note of the id for the default escalation policy.

Step 5: Update AWS Lambda to Create Incident in PagerDuty

We will use the same Lambda code from the previous blog as a base and enhance it on top of it. We will create a template of the request body and use a simple Python code to replace the command to update incident details. We will need an unique id for each incident we create. For this purpose, we will generate an unique UUID and attach it in the incident body.

				
					{
   "incident": {
     "type": "incident",
     "title": "incident_title",
     "service": {
       "id": "PBQKQ1F",
       "type": "service_reference"
     },
     "urgency": "high",
     "incident_key": "uuid_value",
     "body": {
       "type": "incident_body",
       "details": "incident_details"
     },
     "escalation_policy": {
       "id": "P6P2F89",
       "type": "escalation_policy_reference"
     }
   }
 }

				
			

Listed below is the updated Lambda code. As you can see, we are using the template from above and replacing the content with the message received from the Snowflake external function. 

Be sure to update the token that’s highlighted in red in the code snippet below with a note – <enter token from step 2>. In this example, we have hardcoded the API access key, but in a production scenario, this can be retrieved from AWS Secret Manager or any other secrets management service.

				
					import json
import boto3
import uuid
import requests
 
def lambda_handler(event, context):
 
   # 200 is the HTTP status code for "ok".
   status_code = 200
 
   # The return value will contain an array of arrays (one inner array per input row).
   array_of_rows_to_return = [ ]
  
   url = "https://api.pagerduty.com/incidents"
 
   headers = {"Content-Type": "application/json; charset=utf-8" , "Authorization": "Token token=<enter token from step 2>", "Accept": "application/vnd.pagerduty+json;version=2", "From": "dganiger@phdata.io"}
  
   uuid_value=uuid.uuid4()
 
   data='''{
     "incident": {
       "type": "incident",
       "title": "incident_title",
       "service": {
         "id": "PBQKQ1F",
         "type": "service_reference"
       },
       "urgency": "high",
       "incident_key": "uuid_value",
       "body": {
         "type": "incident_body",
         "details": "incident_details"
       },
       "escalation_policy": {
         "id": "P6P2F89",
         "type": "escalation_policy_reference"
       }
     }
   }'''
  
  
 
   try:
       # From the input parameter named "event", get the body, which contains
       # the input rows.
       event_body = event["body"]
       print(event_body)
       #print(json.dumps(event_body))
      
       # Convert the input from a JSON string into a JSON object.
       payload = json.loads(event_body)
       # This is basically an array of arrays. The inner array contains the
       # row number, and a value for each parameter passed to the function.
  
       rows = payload["data"]
       print(rows)
      
       # For each input row in the JSON object...
       for row in rows:
           # Read the input row number (the output row number will be the same).
           row_number = row[0]
          
           #read the notification message subject
           status = row[1]
           subject = row[2]
           message = row[3]
           print(message)
          
           #Create SNS client and send notification
           client = boto3.client('sns')
           response = client.publish (
               #update this with your newly created SNS topic ARN
               TargetArn= "arn:aws:sns:us-east-1:858366249668:snowflake-alert-sns" ,
               Subject = status + " : " + subject ,
               Message = json.dumps({'default':message}),
               MessageStructure = 'json'
               )
          
           if ( status == 'FAILURE' ):
             data = data.replace("uuid_value", str(uuid_value)).replace("incident_title", subject).replace("incident_details", message)
             print(data)
             try:
               response = requests.post(url, headers=headers, json=json.loads(data))
             except Exception as err:
               print(err)
             print(response)
  
           #Update output value with message as below. This will serve as an acknowledgement that email has been sent.
           output_value = ["Email Notification sent"]
 
           # Put the returned row number and the returned value into an array.
           row_to_return = [row_number, output_value]
 
           # ... and add that array to the main array.
           array_of_rows_to_return.append(row_to_return)
          
       json_compatible_string_to_return = json.dumps({"data" : array_of_rows_to_return})
 
   except Exception as err:
       # 400 implies some type of error.
       status_code = 400
       # Tell caller what this function could not handle.
       json_compatible_string_to_return = event_body
       print(err)
 
   # Return the return value and HTTP status code.
   return {
       'statusCode': status_code,
       'body': json_compatible_string_to_return
   }

				
			

Step 6: Execute Snowflake External Function and Verify Incident in PagerDuty

Execute Snowflake external function using the command as shown below.

				
					select snowflake_api_function('FAILURE','Job Failed', 'Stored Proc LoadData failed. Check logs');
				
			

Once the query completes successfully, verify if the incident is created in PagerDuty. This will also send notification to the id configured in service and escalation policy.

The PagerDuty dashboard will show that the incident was created successfully and assigned to specific person.

incidents

Closing

This blog is a simple example of how to create an incident ticket using Lambda for a job failure in Snowflake. You could also integrate PagerDuty with various services like ServiceNow, AWS CloudWatch, and more for additional capabilities. 

Looking to extract more value from your Snowflake investment? Learn how the experts at phData can help by exploring our Snowflake consulting services.

Accelerate and automate your data projects with the phData Toolkit

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.