Testing Helm Charts: Catch Kubernetes Configuration Bugs Before They Reach Production

A Helm chart is code. A Terraform module is code. A Kubernetes manifest is code. Yet many engineering teams that would never ship application code without tests deploy infrastructure changes with nothing more than a manual helm upgrade and a visual inspection of kubectl get pods.

The consequences are predictable: a values.yaml typo disables autoscaling in production, a resource limit misconfiguration causes OOM kills under load, a missing readinessProbe causes traffic to hit unready pods during rolling deployments. All of these are testable and preventable.

The IaC Testing Pyramid

flowchart TD
    A[Unit: Static Analysis + Unit Tests\nhelm lint, helm unittest, terraform validate, conftest] --> B
    B[Integration: Cluster Test\nhelm install on kind/k3s] --> C
    C[End-to-End: Deploy + Verify\nActual deploy + smoke tests]

    style A fill:#22c55e,color:#fff
    style B fill:#3b82f6,color:#fff
    style C fill:#f59e0b,color:#fff

Layer	Tools	What It Tests	Speed
Static analysis	`helm lint`, `kubeval`, Conftest	Syntax, schema, policies	Seconds
Unit tests	`helm unittest`	Template rendering, values	Seconds
Cluster integration	kind + helm install	Real K8s behavior	Minutes
E2E deploy test	Staging deploy + test suite	Full system behavior	Minutes
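
The cluster integration row is the one layer the rest of this post does not walk through, so here is a minimal sketch of it: install the chart into a throwaway kind cluster, let Helm wait for readiness, and always delete the cluster afterwards. The function name, the CLUSTER_NAME and RELEASE variables, and the 120s timeouts are illustrative assumptions, not part of the original setup.

```shell
#!/usr/bin/env sh
# Sketch of a cluster integration test against a disposable kind cluster.
# CLUSTER_NAME, RELEASE and the timeouts are hypothetical defaults.

cluster_integration_test() {
  local chart_dir="$1"
  local cluster_name="${CLUSTER_NAME:-chart-it}"
  local release="${RELEASE:-scanly-it}"
  local status=0

  # --wait blocks until the control plane is ready or the timeout expires.
  kind create cluster --name "$cluster_name" --wait 120s || return 1

  # --wait makes helm block until the workloads report ready, so a broken
  # readinessProbe or an unschedulable resource request fails at this step.
  helm install "$release" "$chart_dir" \
    --kube-context "kind-${cluster_name}" \
    --wait --timeout 120s || status=$?

  # Tear the cluster down whether or not the install succeeded.
  kind delete cluster --name "$cluster_name"
  return "$status"
}
```

Calling cluster_integration_test ./helm/scanly/ returns Helm's exit status, so it can gate a CI job directly. This layer takes minutes rather than seconds, which is why it sits above the static checks in the pyramid.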

Layer 1: Helm Lint and Schema Validation

Start with the free, fast checks:

# Lint the chart for syntax errors and best practices
helm lint ./helm/scanly/

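# Validate rendered templates against Kubernetes API schema
# (kubeval is no longer maintained; its README points to kubeconform as a replacement)
helm template ./helm/scanly/ | kubeval --strict

# Validate with multiple K8s versions
# (kubeval's default schema source may not cover recent releases; if a version is
# reported as missing, point --schema-location at a repository that publishes it)
helm template ./helm/scanly/ | kubeval --kubernetes-version 1.28.0
helm template ./helm/scanly/ | kubeval --kubernetes-version 1.29.0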

# In CI:
helm template ./helm/scanly/ \
  --set image.tag=test \
  --set environment=staging \
  | kubeval \
  --strict \
  --ignore-missing-schemas
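
The multi-version check above renders the chart once per version. A small loop can render once and validate the same output against each target version. This sketch swaps in kubeconform for kubeval, which is an assumption on top of the original commands; the function name validate_versions is hypothetical.

```shell
#!/usr/bin/env sh
# Render the chart once, then validate the same manifests against several
# Kubernetes versions. Assumes helm and kubeconform are on PATH.

validate_versions() {
  local chart_dir="$1"; shift
  local rendered version failed=0

  # Stop early if the chart does not render at all.
  rendered="$(helm template "$chart_dir" --set image.tag=test)" || return 1

  for version in "$@"; do
    if printf '%s\n' "$rendered" \
      | kubeconform -strict -ignore-missing-schemas \
          -kubernetes-version "$version" -summary; then
      echo "OK   Kubernetes $version"
    else
      echo "FAIL Kubernetes $version"
      failed=1
    fi
  done

  # Non-zero if any version rejected the manifests.
  return "$failed"
}
```

For example, validate_versions ./helm/scanly/ 1.28.0 1.29.0 checks both versions and keeps going after the first failure, so one run reports every incompatible version.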

Layer 2: Helm Unit Tests

helm unittest lets you assert on rendered template output without deploying:

# Install the plugin
helm plugin install https://github.com/helm-unittest/helm-unittest

# Run tests
helm unittest ./helm/scanly/

# helm/scanly/tests/deployment_test.yaml
suite: Deployment tests
tests:
  - it: should use the specified image tag
    values:
      - values-test.yaml
    set:
      image.tag: 'v1.2.3'
    asserts:
      - equal:
          path: spec.template.spec.containers[0].image
          value: registry.example.com/scanly:v1.2.3
        template: templates/deployment.yaml

  - it: should have readinessProbe configured
    asserts:
      - isNotNull:
          path: spec.template.spec.containers[0].readinessProbe
        template: templates/deployment.yaml
      - equal:
          path: spec.template.spec.containers[0].readinessProbe.httpGet.path
          value: /api/health
        template: templates/deployment.yaml

  - it: should have resource limits set
    asserts:
      # isNotEmpty also rejects an empty map such as `limits: {}`
      - isNotEmpty:
          path: spec.template.spec.containers[0].resources.limits
        template: templates/deployment.yaml
      - isNotEmpty:
          path: spec.template.spec.containers[0].resources.requests
        template: templates/deployment.yaml

  - it: should not expose secrets as environment variables directly
    asserts:
      # DATABASE_PASSWORD must not carry a literal `value`; it should come from
      # valueFrom.secretKeyRef. The JSONPath filter assumes a recent helm-unittest release.
      - notExists:
          path: spec.template.spec.containers[0].env[?(@.name == "DATABASE_PASSWORD")].value
        template: templates/deployment.yaml

  - it: number of replicas matches minReplicas HPA value
    set:
      autoscaling.enabled: true
      autoscaling.minReplicas: 3
    asserts:
      - equal:
          path: spec.minReplicas
          value: 3
        template: templates/hpa.yaml

Layer 3: Policy Testing with Conftest

Conftest uses OPA (Open Policy Agent) Rego policies to enforce organizational standards across all Kubernetes manifests:

# policies/kubernetes/deny_latest_tag.rego
package kubernetes.deployment

# rego.v1 syntax parses on OPA 1.x as well as late 0.x releases
import rego.v1

deny contains msg if {
  input.kind == "Deployment"
  container := input.spec.template.spec.containers[_]
  contains(container.image, ":latest")
  msg := sprintf("Container '%v' uses ':latest' tag", [container.name])
}

deny contains msg if {
  input.kind == "Deployment"
  container := input.spec.template.spec.containers[_]
  not contains(container.image, ":")
  msg := sprintf("Container '%v' has no image tag", [container.name])
}

# policies/kubernetes/require_resource_limits.rego
package kubernetes.resources

# rego.v1 syntax parses on OPA 1.x as well as late 0.x releases
import rego.v1

deny contains msg if {
  input.kind == "Deployment"
  container := input.spec.template.spec.containers[_]
  not container.resources.limits
  msg := sprintf("Container '%v' is missing resource limits", [container.name])
}

deny contains msg if {
  input.kind == "Deployment"
  container := input.spec.template.spec.containers[_]
  not container.resources.requests
  msg := sprintf("Container '%v' is missing resource requests", [container.name])
}

warn contains msg if {
  input.kind == "Deployment"
  container := input.spec.template.spec.containers[_]
  # Limits are quantity strings such as "512Mi" or "4Gi", so parse them to bytes first
  units.parse_bytes(container.resources.limits.memory) > 4294967296  # 4Gi
  msg := sprintf("Container '%v' has memory limit exceeding 4Gi", [container.name])
}

# Run Conftest against rendered helm templates
# --all-namespaces is needed because these policies are not in the default "main" package
helm template ./helm/scanly/ --set image.tag=v1.0.0 \
  | conftest test --all-namespaces --policy policies/kubernetes/ -

# Each violated deny rule is reported as a FAIL line carrying the rule's message
# (for example "Container 'frontend' is missing resource limits"),
# and conftest exits non-zero so a CI step running it fails.
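
Policy results depend on the values the chart is rendered with, so a chart that passes with defaults can still violate a rule with production overrides. A minimal sketch of running the same policies once per environment values file follows; the function name policy_check_envs and the values file names in the usage note are hypothetical.

```shell
#!/usr/bin/env sh
# Run the Conftest policies against the chart rendered with each values file.
# Assumes helm and conftest are on PATH and policies live in policies/kubernetes/.

policy_check_envs() {
  local chart_dir="$1"; shift
  local values_file rendered failed=0

  for values_file in "$@"; do
    echo "== $values_file"

    # Render first so a template error is not hidden by the pipe's exit status.
    if ! rendered="$(helm template "$chart_dir" -f "$values_file")"; then
      failed=1
      continue
    fi

    printf '%s\n' "$rendered" \
      | conftest test --all-namespaces --policy policies/kubernetes/ - \
      || failed=1
  done

  # Non-zero if any environment failed to render or violated a policy.
  return "$failed"
}
```

A call such as policy_check_envs ./helm/scanly/ values-staging.yaml values-prod.yaml checks every file before returning, so a single run lists violations for all environments.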

Layer 4: Terraform Testing

# modules/scanly-vpc/tests/vpc_test.tftest.hcl
# Native test files like this one are run by `terraform test` (Terraform 1.6 or later)
variables {
  environment = "test"
  region      = "us-east-1"
  cidr_block  = "10.0.0.0/16"
}

run "verify_vpc_created" {
  command = plan

  assert {
    condition     = aws_vpc.main.cidr_block == var.cidr_block
    error_message = "VPC CIDR block does not match input variable"
  }

  assert {
    condition     = aws_vpc.main.enable_dns_hostnames == true
    error_message = "DNS hostnames must be enabled"
  }
}

run "verify_subnets_in_multiple_azs" {
  command = plan

  assert {
    condition     = length(distinct(aws_subnet.private[*].availability_zone)) >= 2
    error_message = "Private subnets must span at least 2 availability zones"
  }
}

# Run Terraform tests
terraform test

# With specific test directory
terraform test -test-directory=./tests

CI/CD Pipeline Integration

# .github/workflows/iac-tests.yml
name: Infrastructure Tests
on:
  pull_request:
    paths:
      - 'helm/**'
      - 'terraform/**'
      - 'deploy/**'

jobs:
  helm-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Helm
        uses: azure/setup-helm@v3
        with:
          version: v3.13.0

      - name: Helm lint
        run: helm lint ./helm/scanly/

      - name: Install helm-unittest
        run: helm plugin install https://github.com/helm-unittest/helm-unittest

      - name: Run helm unit tests
        run: helm unittest ./helm/scanly/

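      - name: Install conftest
        run: |
          # Release assets embed the version in the file name, so pin one explicitly
          # (0.56.0 is only an example; use the release you have validated)
          CONFTEST_VERSION=0.56.0
          curl -sSL "https://github.com/open-policy-agent/conftest/releases/download/v${CONFTEST_VERSION}/conftest_${CONFTEST_VERSION}_Linux_x86_64.tar.gz" \
            | tar xz conftest
          sudo mv conftest /usr/local/bin/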

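      - name: Run policy tests
        run: |
          # --all-namespaces is needed because the policies are not in the default "main" package
          helm template ./helm/scanly/ --set image.tag=ci \
            | conftest test --all-namespaces --policy policies/kubernetes/ -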

  terraform-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3

      - name: Terraform format check
        run: terraform fmt -check -recursive

      - name: Terraform validate
        run: |
          terraform init -backend=false
          terraform validate

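      - name: Terraform test
        # Planning against the real AWS provider needs credentials; supply them to this step
        # or mock the provider in the test files (mock_provider, Terraform 1.7 or later)
        run: terraform test
        env:
          AWS_DEFAULT_REGION: us-east-1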
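
The fast checks from this workflow can also be run locally before pushing. The following is a rough sketch of a fail-fast wrapper, assuming helm (with the unittest plugin) and terraform are installed; run_step and fast_checks are hypothetical helper names.

```shell
#!/usr/bin/env sh
# Run the seconds-level checks in order and stop at the first failure.

run_step() {
  local name="$1"; shift
  echo "--> $name"
  if "$@"; then
    echo "    ok"
  else
    echo "    FAILED: $name"
    return 1
  fi
}

fast_checks() {
  local chart_dir="$1"
  # Cheapest checks first, so most mistakes surface within seconds.
  run_step "helm lint" helm lint "$chart_dir" || return 1
  run_step "helm unittest" helm unittest "$chart_dir" || return 1
  run_step "terraform fmt" terraform fmt -check -recursive || return 1
}
```

Running fast_checks ./helm/scanly/ from a pre-push hook gives the same early signal as the first CI steps without waiting for a runner.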

Helm Testing Coverage Goals

Test	Tool	Pass Criterion
All templates render without error	`helm lint`	Zero errors
Image tags are never ":latest"	Conftest	Zero violations
Resource limits set on all containers	Conftest	Zero violations
readinessProbe on all deployments	`helm unittest`	Test passes
Secrets use secretKeyRef	Conftest	Zero violations
HPA min replicas >= 2 in prod	`helm unittest`	Test passes
ConfigMap keys match application expectations	`helm unittest`	Test passes

Testing infrastructure code is one of the highest-leverage activities in a platform engineering practice. The cost of an untested Helm chart bug in production is orders of magnitude higher than the cost of five minutes of unit tests in CI.

Verify your application layer is healthy after every infrastructure change: Try ScanlyApp free and run automated functional checks after each infrastructure deployment.

