Terraforming Snowflake

Snowflake + Terraform

What is Snowflake?

Snowflake is a managed cloud data warehouse solution. It is similar to BigQuery or Redshift, but has some unique features like separation of compute and storage and strong support for semi-structured data (JSON, Parquet, Avro) that differentiate it.

What is Terraform?

Terraform is a tool from Hashicorp for managing infrastructure via code. You can use it to provision, update, or delete a wide range of resources like EC2 instances, Datadog monitors, OpsGenie schedules and more. This allows you to know the current state of your infrastructure at any given time and to control how it is updated. Terraform ships with many providers. Unfortunately, Snowflake is not one of them! However, it also supports third-party providers, which means users can write their own providers for almost any service. As an example (and shameless plug), check out this one I wrote for Segment’s API.

Why use Terraform with Snowflake?

As luck would have it, the Chan Zuckerberg Initiative has done the hard work for us and open-sourced a Snowflake Terraform Provider that we can use to manage many Snowflake objects like roles, grants, and users. Therefore, by using Terraform with Snowflake, we can always know the current state of our Snowflake environments. Like which schemas exists, the size of a warehouse, and maybe most importantly, which users have access to which Snowflake objects. With this setup we can avoid situations where someone accidentally manually grants a user the wrong role with too much access or spins up a 4X-Large Warehouse that never auto-suspends.

If this all sounds great, then let’s walk through an example using Terraform to setup roles, users, schemas, and grant access in Snowflake.

Setting up Terraform

Before we get started, you’ll need to install Terraform. I’d recommend following the official documentation for this. I’ll be using Terraform 0.12 in the examples below. I’d also suggest setting up a Remote State to make it easier for a team to manage your resources, but that’s not needed to get started. A Remote State writes your Terraform state to a remote data store like AWS S3 that all team members can access versus using local storage for the state.

After you have Terraform setup, you’ll need to download the Snowflake Terraform Provider from the latest releases and move it to the appropriate directory (usually ~/.terraform.d/plugins). Next follow the steps in the provider README for the authentication method of your choosing and create a file called main.tf that looks something like this:

provider "snowflake" {
  account = "your-snowflake-account"
  region  = "your-snowflake-region"
}

After that run terraform init and you should see the following meaning we’re ready to go:

Initializing the backend...

Initializing provider plugins...

Terraform has been successfully initialized! 

Terraforming Schemas

First, we’ll create two databases in Snowflake. Create a file called schemas.tf that looks like this:

locals {
  schemas = {
    "RAW" = {
      database = "DEV"
      comment = "contains raw data from our source systems"
    }
    "ANALYTICS" = {
      database = "DEV"
      comment = "contains tables and views accessible to analysts and reporting"
    }
  }
}

resource "snowflake_schema" "schema" {
  for_each = local.schemas
  name     = each.key
  database = each.value.database
  comment  = each.value.comment
}

Note, we’re taking advantage of the new loops in Terraform 0.12 to avoid copying and pasting the snowflake_role resource every time we add a new role. Then run terraform plan to see what resources will be created. This command should return saying 2 new schemas would be created in Snowflake. If things look good, run terraform apply to create them.

Terraforming Roles

Next, let’s create two Snowflake roles with Terraform. As an aside, dbt has a great blog post on how they structure roles for their clients that I recommend reading. The example here sort of copies this structure.

Create a file roles.tf that looks like this:

locals {
  roles = {
    "LOADER" = {
      comment = "Owns the tables in raw schema"
    }
    "TRANSFORMER" = {
      comment = "Has query permissions on tables in raw schema and owns tables in the analytics schema."
    }
  }
}

resource "snowflake_role" "role" {
  for_each = local.roles
  name     = each.key
  comment  = each.value.comment
}

As before, run terraform plan to see what resources will be created. This command should return saying 2 new roles would be created in Snowflake. If things look good, run terraform apply to create them.

Roles are great, but they’re even better if you can do something with them, so let’s change our schemas.tf file to look like this:

locals {
  schemas = {
    "RAW" = {
      database = "DEV"
      comment = "contains raw data from our source systems"
      usage_roles = ["TRANSFORMER"]
      all_roles = ["LOADER"]
    }
    "ANALYTICS" = {
      database = "DEV"
      comment = "contains tables and views accessible to analysts and reporting"
      usage_roles = []
      all_roles = ["TRANSFORMER"]
    }
  }
}

resource "snowflake_schema" "schema" {
  for_each = local.schemas
  name     = each.key
  database = each.value.database
  comment  = each.value.comment
}

resource "snowflake_schema_grant" "schema_grant_usage" {
  for_each      = local.schemas
  schema_name   = each.key
  database_name = each.value.database
  privilege     = "USAGE"
  roles         = each.value.usage_roles
  shares        = []
}

resource "snowflake_schema_grant" "schema_grant_all" {
  for_each      = local.schemas
  schema_name   = each.key
  database_name = each.value.database
  privilege     = "ALL"
  roles         = each.value.all_roles
  shares        = []
}

A quick terraform plan should show that we are ready to grant the appropriate access on each of our schemas to each of our roles. If things look good, run terraform apply to create the grants in Snowflake.

Terraforming Users

Now that we have roles, let’s make some users for those roles. Create a file called users.tf like this:

locals {
  users = {
    "MAC" = {
      login_name = "MAC_DATAENGINEER@MACANDCHEESE.COM"
      role       = "TRANSFORMER"
      namespace  = "DEV.PUBLIC"
      warehouse  = "TRANSFORMER_WH"
    }
    "CHEESE" = {
      login_name = "CHEESE_DATAENGINEER@MACANDCHEESE.COM"
      role       = "TRANSFORMER"
      namespace  = "DEV.PUBLIC"
      warehouse  = "TRANSFORMER_WH"
    }
    "STITCH" = {
      login_name = "STITCH@MACANDCHEESE.COM"
      role       = "LOADER"
      namespace  = "DEV.PUBLIC"
      warehouse  = "LOADER_WH"
    }
  }
}

resource "snowflake_user" "user" {
  for_each             = local.users
  name                 = each.key
  login_name           = each.value.login_name
  default_role         = each.value.role
  default_namespace    = each.value.namespace
  default_warehouse    = each.value.warehouse
  must_change_password = false
}

Again, run terraform plan to see what resources will be created. This command should return saying 3 new users would be created in Snowflake (note, we also are assuming that passwords do not need to be changed because we’ve setup SSO before we started this). If things look good, run terraform apply to create them.

But wait, we’ve given these users a default_role, but that role hasn’t been granted to them yet! A user that has not been granted their default role won’t be able to do anything in Snowflake. Let’s fix that by modifying the roles.tf file so that it looks like this:

locals {
  roles = {
    "LOADER" = {
      comment = "Owns the tables in raw schema"
      users = ["STITCH"]
    }
    "TRANSFORMER" = {
      comment = "Has query permissions on tables in raw schema and owns tables in the analytics schema."
      users = ["MAC", "CHEESE"]
    }
  }
}

resource "snowflake_role" "role" {
  for_each = local.roles
  name     = each.key
  comment  = each.value.comment
}

resource "snowflake_role_grants" "role_grant" {
  for_each  = local.roles
  role_name = each.key
  users     = each.value.users
  roles     = []
}

Yet again, we’ll run terraform plan to see the changes, which should be that we will now be granting the appropriate roles to our users. If that’s what the output says, run terraform apply to make it so in Snowflake.

And that’s it! We now have roles, schemas, and users created in Snowflake and managed by Terraform. Time to commit these files and put a PR into GitHub! As a next step, check out Atlantis to manage Terraform through Pull Requests or Terraform Cloud to make it easier for teams work with Terraform.