In our ever-growing remote society, Slack has become a centerpiece of conversation and interaction with each other, our organizations, and even our applications. Thus, Slack becomes, by default, a collector of data about people and their various interactions.
That data may include conversations, user profiles (including anything from a user’s online status to their email/phone number), and even more subtly, attributes such as a user’s reaction to various messages.
Need to understand more about people? Some of those answers might lie in Slack.
It may not come as a surprise that creating a native Fivetran connector for Slack is the subject of a popular request among the Fivetran community. During the time of this writing, Fivetran did release a Slack connector in private preview, but an alternate way of ingesting Slack data via Fivetran could still be applicable for ingesting data that the connector does not support.
In this blog, we’ll cover step-by-step how to leverage Fivetran to integrate Slack data into the Snowflake Data Cloud.
Why Use Fivetran and Snowflake To Transfer and Store Data?
Fivetran is a world-class enterprise application that makes it extremely simple for teams to move data quickly and securely from one location to another. When paired with Snowflake (the premier enterprise data cloud), it creates a low-maintenance, high-performing data pipeline, and storage solution.
In Snowflake, the data can be organized, enriched, and securely accessed by the various consumers that need it. Moving Slack (and other) data using this modern data stack ensures that your organization will be enabled to take full advantage of its data without the maintenance and management required of many other systems.
Process Overview
This blog post builds on the capabilities introduced in my previous post for ingesting custom data sources. It would be beneficial to go back and read that post before attempting to work through this process, but the basic idea of what will be done is demonstrated in the following diagram:
As you can see above, Fivetran will be used to orchestrate an AWS Lambda function call which will request and return data from Slack. Fivetran will then send that returned data to Snowflake.
In this post, we will focus on the steps necessary to create a script that can be run in an AWS Lambda function, which will be able to access the Slack API, as well as return data that Fivetran can use to send off to Snowflake.
You can then follow the steps in my previous post to set up the entire pipeline and plug the script into the Lambda component.
In the end, we will need to:
- Create a slack bot token.
- Create a script that will use the above token to retrieve Slack data and return it in a format that Fivetran expects.
- Put it all together – complete the steps from my previous post, integrating the above script into the Lambda function (step 5), and the token into the Lambda connector setup (step 6).
Step 1: Create a Slack Bot Token
- Navigate to https://api.slack.com/apps while logged into your Slack workspace.
- Create a new app.
- Chose to create an app “From scratch”.
- Name the app, select a workspace that the app should be associated with, and select the “Create App” button.
- The app will be created and you will be redirected to the app configuration page. Select “OAuth and Permissions” in the left side menu.
- Add the appropriate bot scopes to your application. It is recommended to use the principle of least privilege here (only grant the scopes that you actually plan to use). In our example, we will just be querying information about Slack users, so all we should need is the users:read scope. More info on all the slack API scopes here.
- Scroll to the top of the OAuth and Permissions page and select the “Install to Workspace” button.
- Follow the prompts to allow the application access to your Slack workspace, and then you will be redirected back to the OAuth and Permissions page, where there will now be a bot token to copy. Save this token for later.
Step 2: Create a Script For Querying The Slack Web API
We will then need to create the script that will be plugged into a Lambda function in order for Fivetran to call it and retrieve Slack data.
You can use any HTTP client to call the various web APIs that slack provides, but for this guide, we will be using the official Slack Python SDK (There is also a supported Node and Java SDK). This SDK makes constructing the Slack API calls a little easier.
The python SDK is easy to install (`pip install slack_sdk`) and has comprehensive documentation on how to use it. Below I will be writing a script to collect a few attributes of a Slack user to send back to Fivetran and store in Snowflake. I’lll be using the `users_list` SDK method to pull back info on users from my Slack workspace.
This script uses an external pypi library (slack_sdk), which requires bundling the library’s binaries in the package that is deployed to the Lambda function. AWS has good documentation on how to do this here.
from slack_sdk import WebClient
from slack_sdk.http_retry import RetryHandler
def get_client(token):
retry_handler = RetryHandler(max_retry_count=5)
return WebClient(token=token, retry_handlers=[retry_handler])
def lambda_handler(request, context):
# Extract token from the Fivetran request object
token = request['secrets']['apiKey']
# Get user objects from Slack
users = get_users(token=token)
# Construct and send appropriate JSON response back to Fivetran
return {
"state": {},
"schema" : {
"users" : {
"primary_key" : ["id"]
}
},
"insert": {
"users": users
},
"hasMore" : False
}
def get_users(token):
client = get_client(token=token)
response = client.users_list()
members = response.get('members')
users = []
# Extracting only certain user attributes
for member in members:
users.append({
"id": member["id"],
"team_id": member.get("team_id", ""),
"name": member.get("name", ""),
"real_name": member.get("real_name", ""),
"is_bot": member.get("is_bot", "")
})
return users
Below are a few important things to call out about this script.
lambda_handler method
This is meant to be the entry point method of the Lambda function. This is similar to the sample Python function that Fivetran provides. The attributes of the `request` parameter are documented more here.
How is the Slack Token Passed?
One of those `request` parameter attributes is a key named `secrets`. This is populated with a JSON object configured when setting up the Fivetran connector. For this example, I added the JSON expression `{“apiKey”: “<slack_token>”}`, replacing <slack_token> with the Slack token generated earlier.
This token can then be retrieved from the `request` parameter using the expression `request[“secrets”][“apiKey”]` as shown in the script.
Response JSON Structure
The response schema that Fivetran expects from a Lambda function is described in more detail here. The response structure that was created in the script above implements that schema, essentially communicating to Fivetran to insert any user records included in the response to Snowflake and to deduplicate user records using the “id” attribute.
Step 3: Putting It All Together
Now you have the Slack token (embedded in the JSON expression as described above), which will be used in the connector setup (secrets field), and the script necessary to return Slack data to Fivetran.
The final step is to follow the steps in my previous post, substituting the script into the Lambda function (step 5) and the token JSON expression into the secrets field of the Lambda connector setup (step 6).
*Sensitive Data Disclaimer
As with any other data that could contain personally identifiable and private information, your organization should be careful and purposeful when querying, moving, and storing data from Slack.
Data containing personally identifiable information (PII) could easily fall under regulations such as CCPA and GDPR, which can expose an organization to a certain level of risk. phData has extensive experience in handling sensitive data transit and storage and would be happy to assist your organization with your own design and implementation.
Wrap Up
With the above setup, Slack data can be easily ingested into Snowflake using Fivetran. The data ingested will depend on the script that is written for the Lambda function, but the flexibility exists there to query and manipulate any data from the Slack API.
The goal here is to ultimately be maintaining useful Slack data in Snowflake, where it can be leveraged to enrich other business applications or datasets. Exploring those business use cases will be the job of another post for another time.
If this sounds interesting, and you would like to learn more, please reach out to our team at phData! We would be happy to provide expertise on setting up ingestion pipelines like this, as well as exploring how it can add value to your business.