
Most data platform infrastructure is managed by clicking through web consoles. A warehouse gets created in Snowflake's UI. An S3 bucket gets configured in the AWS console. IAM roles get hand-crafted by whoever has admin access that day. This works until it does not — someone deletes a role by accident, a staging environment drifts from production, or an auditor asks who changed the warehouse size last Tuesday and nobody can answer.
Infrastructure as Code solves all three problems. Every resource is defined in version-controlled configuration files. Changes go through pull requests with peer review. The full history of every modification is in git. Disaster recovery becomes terraform apply instead of a two-day scramble. For data platforms specifically, Terraform is the right tool because it has mature providers for both cloud infrastructure (AWS, GCP, Azure) and data platform services (Snowflake, Databricks, Confluent).
This guide covers Snowflake provider setup with key-pair authentication, managing warehouses, databases, schemas, and roles, S3 bucket configuration for data lake storage, remote state management, CI/CD with GitHub Actions, and environment separation.
Snowflake Provider Setup
The Snowflake Terraform provider authenticates using key-pair authentication rather than passwords. This is more secure and avoids embedding credentials in your Terraform configuration. Start by creating a service user in Snowflake and assigning it an RSA key pair.
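The key pair itself can be generated locally with OpenSSL. A minimal sketch of the usual approach (file names are arbitrary): Snowflake expects the private key in PKCS#8 format and the public key's base64 body in the `RSA_PUBLIC_KEY` property.

```shell
# Generate a 2048-bit RSA key and convert it to unencrypted PKCS#8.
# Snowflake also accepts encrypted keys; -nocrypt keeps this sketch simple.
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -nocrypt -out rsa_key.p8

# Derive the public key. Paste its base64 body (the text between the
# BEGIN/END PUBLIC KEY lines) into ALTER USER ... SET RSA_PUBLIC_KEY.
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
```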
```sql
-- Run this in Snowflake to create the Terraform service user
USE ROLE ACCOUNTADMIN;

CREATE USER IF NOT EXISTS TERRAFORM_SVC
  DEFAULT_ROLE = SYSADMIN
  DEFAULT_WAREHOUSE = WH_TERRAFORM
  MUST_CHANGE_PASSWORD = FALSE
  TYPE = SERVICE;

CREATE ROLE IF NOT EXISTS TERRAFORM_ROLE;
GRANT ROLE SYSADMIN TO ROLE TERRAFORM_ROLE;
GRANT ROLE SECURITYADMIN TO ROLE TERRAFORM_ROLE;
GRANT ROLE TERRAFORM_ROLE TO USER TERRAFORM_SVC;

ALTER USER TERRAFORM_SVC SET RSA_PUBLIC_KEY = 'MIIBIjANBgkqh...';
```
With the service user created, configure the Terraform provider to authenticate with the private key.
```hcl
# providers.tf
terraform {
  required_version = ">= 1.5"

  required_providers {
    snowflake = {
      source  = "Snowflake-Labs/snowflake"
      version = "~> 1.0"
    }
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "snowflake" {
  organization_name = var.snowflake_org
  account_name      = var.snowflake_account
  user              = "TERRAFORM_SVC"
  authenticator     = "JWT"
  private_key       = var.snowflake_private_key
  role              = "TERRAFORM_ROLE"
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      ManagedBy   = "terraform"
      Environment = var.environment
      Team        = "data-platform"
    }
  }
}
```
The private key is passed as a variable, never hardcoded. In CI/CD, it comes from a GitHub secret. Locally, it comes from an environment variable or a .tfvars file that is gitignored.
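Locally, one common pattern (file names here are hypothetical) is to gitignore the key and any secret-bearing tfvars file, then supply the key through Terraform's TF_VAR_ environment variable convention:

```shell
# Keep secret material out of version control.
echo "rsa_key.p8" >> .gitignore
echo "secrets.auto.tfvars" >> .gitignore

# Terraform reads any TF_VAR_<name> environment variable as an input variable.
export TF_VAR_snowflake_private_key="$(cat rsa_key.p8 2>/dev/null || true)"
```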
Managing Snowflake Resources
With the provider configured, define your Snowflake warehouses, databases, schemas, and roles. Start with warehouses. Each workload class gets its own warehouse with appropriate sizing and auto-suspend settings.
```hcl
# snowflake_warehouses.tf
resource "snowflake_warehouse" "etl" {
  name              = "WH_ETL_${upper(var.environment)}"
  warehouse_size    = var.environment == "prod" ? "MEDIUM" : "XSMALL"
  auto_suspend      = 120
  auto_resume       = true
  min_cluster_count = 1
  max_cluster_count = var.environment == "prod" ? 3 : 1
  scaling_policy    = "ECONOMY"
  comment           = "ETL batch processing — managed by Terraform"
}

resource "snowflake_warehouse" "bi" {
  name              = "WH_BI_${upper(var.environment)}"
  warehouse_size    = var.environment == "prod" ? "SMALL" : "XSMALL"
  auto_suspend      = 60
  auto_resume       = true
  min_cluster_count = 1
  max_cluster_count = var.environment == "prod" ? 4 : 1
  scaling_policy    = "STANDARD"
  comment           = "BI dashboard queries — managed by Terraform"
}

resource "snowflake_warehouse" "dev" {
  count             = var.environment == "prod" ? 0 : 1
  name              = "WH_DEV_${upper(var.environment)}"
  warehouse_size    = "XSMALL"
  auto_suspend      = 60
  auto_resume       = true
  max_cluster_count = 1
  comment           = "Developer sandbox — managed by Terraform"
}
```
The environment variable controls sizing. Production gets larger warehouses with multi-cluster scaling. Staging and dev get the minimum. The dev warehouse is only created in non-production environments, saving costs.
Next, define databases, schemas, and the role hierarchy. Snowflake's role-based access control is powerful but becomes impossible to manage without code. Terraform makes the permission graph explicit and auditable.
```hcl
# snowflake_databases.tf
resource "snowflake_database" "analytics" {
  name    = "ANALYTICS_${upper(var.environment)}"
  comment = "Primary analytics database — managed by Terraform"
}

resource "snowflake_schema" "bronze" {
  database = snowflake_database.analytics.name
  name     = "BRONZE"
  comment  = "Raw ingested data — immutable audit trail"
}

resource "snowflake_schema" "silver" {
  database = snowflake_database.analytics.name
  name     = "SILVER"
  comment  = "Cleaned and deduplicated data"
}

resource "snowflake_schema" "gold" {
  database = snowflake_database.analytics.name
  name     = "GOLD"
  comment  = "Business-ready metrics and dimensions"
}

# Role hierarchy
resource "snowflake_account_role" "analyst" {
  name    = "ANALYST_${upper(var.environment)}"
  comment = "Read access to gold schema"
}

resource "snowflake_account_role" "engineer" {
  name    = "ENGINEER_${upper(var.environment)}"
  comment = "Read/write access to silver and gold schemas"
}

resource "snowflake_grant_privileges_to_account_role" "analyst_gold_read" {
  account_role_name = snowflake_account_role.analyst.name
  privileges        = ["SELECT"]

  on_schema_object {
    future {
      object_type_plural = "TABLES"
      in_schema          = "${snowflake_database.analytics.name}.${snowflake_schema.gold.name}"
    }
  }
}

resource "snowflake_grant_privileges_to_account_role" "engineer_silver_write" {
  account_role_name = snowflake_account_role.engineer.name
  privileges        = ["SELECT", "INSERT", "UPDATE", "DELETE"]

  on_schema_object {
    future {
      object_type_plural = "TABLES"
      in_schema          = "${snowflake_database.analytics.name}.${snowflake_schema.silver.name}"
    }
  }
}
```
Every role, grant, and schema is defined in code. When a new team member needs access, an engineer adds them to the appropriate role in Terraform and opens a PR. The change is reviewed, approved, and applied through CI. No ad-hoc GRANT statements in a Snowflake worksheet that nobody can trace.
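Granting a role to a person then becomes one more reviewed resource. A hedged sketch using the provider's snowflake_grant_account_role resource (the user name is hypothetical):

```hcl
# Membership is code: adding an analyst is a one-line PR, not a worksheet GRANT.
resource "snowflake_grant_account_role" "jane_analyst" {
  role_name = snowflake_account_role.analyst.name
  user_name = "JANE_DOE" # hypothetical team member
}
```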
S3 Bucket for Data Lake Storage
The data lake needs an S3 bucket with appropriate lifecycle policies, encryption, and access controls. Raw data lands here before being ingested into Snowflake's bronze layer. Iceberg tables may also store their data and metadata files here.
```hcl
# s3.tf
resource "aws_s3_bucket" "data_lake" {
  bucket = "datalake-${var.environment}-${var.aws_account_id}"

  tags = {
    Purpose = "Data lake storage for bronze/raw data"
  }
}

resource "aws_s3_bucket_versioning" "data_lake" {
  bucket = aws_s3_bucket.data_lake.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "data_lake" {
  bucket = aws_s3_bucket.data_lake.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
    bucket_key_enabled = true
  }
}

resource "aws_s3_bucket_lifecycle_configuration" "data_lake" {
  bucket = aws_s3_bucket.data_lake.id

  rule {
    id     = "archive-old-raw-data"
    status = "Enabled"

    filter {
      prefix = "raw/"
    }

    transition {
      days          = 90
      storage_class = "STANDARD_IA"
    }

    transition {
      days          = 365
      storage_class = "GLACIER"
    }
  }

  rule {
    id     = "expire-tmp-files"
    status = "Enabled"

    filter {
      prefix = "tmp/"
    }

    expiration {
      days = 7
    }
  }
}

resource "aws_s3_bucket_public_access_block" "data_lake" {
  bucket = aws_s3_bucket.data_lake.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```
The lifecycle policy moves raw data to cheaper storage tiers after 90 days and archives to Glacier after a year. Temporary files are automatically cleaned up after seven days. Versioning is enabled so accidental deletions can be recovered. Public access is blocked entirely.
IAM Role for Cross-Account Access
Snowflake accesses your S3 bucket through a storage integration that assumes an IAM role. This role needs read/write access to the bucket and a trust policy that allows Snowflake's AWS account to assume it.
```hcl
# iam.tf
data "aws_iam_policy_document" "snowflake_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = [var.snowflake_aws_iam_user_arn]
    }

    condition {
      test     = "StringEquals"
      variable = "sts:ExternalId"
      values   = [var.snowflake_storage_integration_external_id]
    }
  }
}

resource "aws_iam_role" "snowflake_access" {
  name               = "snowflake-data-lake-${var.environment}"
  assume_role_policy = data.aws_iam_policy_document.snowflake_assume_role.json

  tags = {
    Purpose = "Snowflake storage integration access"
  }
}

data "aws_iam_policy_document" "data_lake_access" {
  statement {
    sid    = "AllowListBucket"
    effect = "Allow"
    actions = [
      "s3:ListBucket",
      "s3:GetBucketLocation",
    ]
    resources = [aws_s3_bucket.data_lake.arn]
  }

  statement {
    sid    = "AllowObjectAccess"
    effect = "Allow"
    actions = [
      "s3:GetObject",
      "s3:GetObjectVersion",
      "s3:PutObject",
      "s3:DeleteObject",
    ]
    resources = ["${aws_s3_bucket.data_lake.arn}/*"]
  }
}

resource "aws_iam_role_policy" "snowflake_data_lake" {
  name   = "snowflake-data-lake-access"
  role   = aws_iam_role.snowflake_access.id
  policy = data.aws_iam_policy_document.data_lake_access.json
}
```
The trust policy uses an external ID condition, which prevents the confused deputy problem. Snowflake provides both the IAM user ARN and the external ID when you create a storage integration. These values are passed as Terraform variables.
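The Snowflake side of the integration can be managed in the same configuration. A hedged sketch of a snowflake_storage_integration resource: in practice you create the integration first, read back the IAM user ARN and external ID that Snowflake reports, feed those into the trust policy variables, and apply again.

```hcl
# Snowflake-side integration. After creation, DESCRIBE INTEGRATION in Snowflake
# shows the STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID values that
# the IAM trust policy needs.
resource "snowflake_storage_integration" "data_lake" {
  name                      = "S3_DATA_LAKE_${upper(var.environment)}"
  type                      = "EXTERNAL_STAGE"
  storage_provider          = "S3"
  enabled                   = true
  storage_aws_role_arn      = aws_iam_role.snowflake_access.arn
  storage_allowed_locations = ["s3://${aws_s3_bucket.data_lake.bucket}/"]
}
```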
Remote State with S3 Backend
Terraform state must be stored remotely so your team can collaborate and your CI/CD pipeline can access it. S3 with DynamoDB locking is the standard setup for AWS-based platforms.
```hcl
# backend.tf
terraform {
  backend "s3" {
    bucket         = "terraform-state-data-platform"
    key            = "data-platform/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
}
```
The DynamoDB table provides locking so two engineers cannot run terraform apply simultaneously and corrupt the state. The state file is encrypted at rest in S3. Create the state bucket and DynamoDB table manually before running terraform init — this is the one piece of infrastructure you bootstrap outside of Terraform.
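If you would rather not click through the console for that bootstrap, one common alternative is a tiny throwaway configuration with local state that creates just these two resources. A sketch under that assumption (names match the backend above):

```hcl
# bootstrap/main.tf: applied once with local state, before the main project
resource "aws_s3_bucket" "tf_state" {
  bucket = "terraform-state-data-platform"
}

resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_dynamodb_table" "tf_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # the key name Terraform's S3 backend expects

  attribute {
    name = "LockID"
    type = "S"
  }
}
```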
For environment separation, use either Terraform workspaces or a directory structure. Workspaces are simpler: terraform workspace select prod switches context. A directory structure (environments/prod/, environments/staging/) gives you more isolation but requires duplicating the backend configuration. For most data platform teams, workspaces are sufficient.
CI/CD with GitHub Actions
The final piece is a CI/CD pipeline that runs terraform plan on pull requests and terraform apply on merge to main. This ensures every infrastructure change is reviewed before it is applied and creates an audit trail in your git history.
```yaml
# .github/workflows/terraform.yml
name: Terraform
on:
  pull_request:
    paths:
      - 'terraform/**'
  push:
    branches: [main]
    paths:
      - 'terraform/**'

env:
  TF_VAR_snowflake_private_key: ${{ secrets.SNOWFLAKE_PRIVATE_KEY }}
  TF_VAR_snowflake_org: ${{ secrets.SNOWFLAKE_ORG }}
  TF_VAR_snowflake_account: ${{ secrets.SNOWFLAKE_ACCOUNT }}
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

jobs:
  plan:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.8"

      - name: Terraform Init
        run: terraform init
        working-directory: terraform

      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color -out=tfplan
        working-directory: terraform

      - name: Post plan to PR
        uses: actions/github-script@v7
        with:
          script: |
            const output = `${{ steps.plan.outputs.stdout }}`;
            const truncated = output.length > 60000
              ? output.substring(0, 60000) + '\n... truncated'
              : output;
            github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `## Terraform Plan\n\`\`\`\n${truncated}\n\`\`\``
            });

  apply:
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.8"

      - name: Terraform Init
        run: terraform init
        working-directory: terraform

      - name: Terraform Apply
        run: terraform apply -auto-approve
        working-directory: terraform
```
The plan job runs on every PR and posts the plan output as a comment. Reviewers can see exactly what will change: which warehouses will be resized, which roles will be modified, which buckets will be created.
The apply job runs only on pushes to main, which means it runs only after a PR is approved and merged. The production environment protection rule adds an additional approval gate if you configure it in GitHub.
Environment Separation
For data platforms, the cleanest environment separation uses Terraform workspaces combined with environment-specific variable files. Each environment gets its own .tfvars file with appropriate sizing, and the workspace determines which state file Terraform uses.
```hcl
# variables.tf
variable "environment" {
  type        = string
  description = "Deployment environment (dev, staging, prod)"
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

variable "snowflake_org" {
  type      = string
  sensitive = true
}

variable "snowflake_account" {
  type      = string
  sensitive = true
}

variable "snowflake_private_key" {
  type      = string
  sensitive = true
}

variable "aws_region" {
  type    = string
  default = "us-east-1"
}

variable "aws_account_id" {
  type = string
}

variable "snowflake_aws_iam_user_arn" {
  type        = string
  description = "ARN provided by Snowflake storage integration"
}

variable "snowflake_storage_integration_external_id" {
  type        = string
  description = "External ID from Snowflake storage integration"
}
```
With this setup, deploying to staging is terraform workspace select staging followed by terraform apply -var-file=environments/staging.tfvars. Every resource name includes the environment variable, so there is no collision between environments. The state files are isolated by workspace. And the CI/CD pipeline can target any environment by selecting the appropriate workspace.
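An environment file then carries only the non-secret, per-environment values, while the sensitive variables come from environment variables or CI secrets. A sketch with placeholder values:

```hcl
# environments/staging.tfvars (all values below are placeholders)
environment    = "staging"
aws_region     = "us-east-1"
aws_account_id = "123456789012"

snowflake_aws_iam_user_arn                = "arn:aws:iam::999999999999:user/example"
snowflake_storage_integration_external_id = "EXAMPLE_EXTERNAL_ID"
```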
Importing Existing Resources
Most teams are not starting from scratch. You already have Snowflake warehouses, S3 buckets, and IAM roles that were created manually. Terraform can adopt these existing resources into its state using import blocks (Terraform 1.5 and later) or the older terraform import command. This is how you move from console-managed infrastructure to code-managed infrastructure without recreating anything.
```hcl
# imports.tf — one-time import blocks
import {
  to = snowflake_warehouse.etl
  id = "WH_ETL_PROD"
}

import {
  to = snowflake_database.analytics
  id = "ANALYTICS_PROD"
}

import {
  to = aws_s3_bucket.data_lake
  id = "datalake-prod-123456789"
}

import {
  to = aws_iam_role.snowflake_access
  id = "snowflake-data-lake-prod"
}
```
Run terraform plan after adding imports to see whether your Terraform configuration matches the actual state of the resources. If there are differences — for example, the existing warehouse has a different auto_suspend value than your Terraform code — terraform plan will show the drift. Fix the Terraform code to match reality first, then start making intentional changes through PRs.
Common Pitfalls
Three mistakes trip up most teams when terraforming their data platform. First, managing too many resources at once. Start with infrastructure resources (warehouses, buckets, roles) and leave object-level resources (tables, views, pipes) to dbt or other tools. Terraform is the wrong tool for managing individual Snowflake tables — that is what your transformation layer is for.
Second, not using modules for repeated patterns. If you have five warehouses that differ only in size and name, extract a warehouse module and call it five times with different parameters. This keeps your code DRY and makes it easy to apply consistent policies like auto-suspend settings across all warehouses.
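A hedged sketch of that refactor (the module path and variable names are illustrative):

```hcl
# modules/warehouse/main.tf: one place for shared defaults like auto-suspend
variable "name" {
  type = string
}

variable "size" {
  type    = string
  default = "XSMALL"
}

resource "snowflake_warehouse" "this" {
  name           = var.name
  warehouse_size = var.size
  auto_suspend   = 60 # consistent policy applied to every caller
  auto_resume    = true
}

# main.tf: five warehouses become five short module calls
module "wh_etl" {
  source = "./modules/warehouse"
  name   = "WH_ETL_${upper(var.environment)}"
  size   = var.environment == "prod" ? "MEDIUM" : "XSMALL"
}
```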
Third, forgetting to protect the state file. Your Terraform state contains sensitive information: resource IDs, configuration values, and sometimes secrets. Enable encryption on the S3 state bucket, restrict access to the state file with IAM policies, and never commit the state file to git. The remote backend handles this correctly by default, but verify it during initial setup.
Start with your most critical resources: the production Snowflake warehouses and the S3 bucket. Import them into Terraform state, then expand to roles, schemas, and lower environments. Within a few weeks, your entire data platform infrastructure can be version-controlled, auditable, and recoverable from a single terraform apply.




