Testing Helm Charts: Catch Kubernetes Configuration Bugs Before They Reach Production
A Helm chart is code. A Terraform module is code. A Kubernetes manifest is code. Yet most engineering teams that would never ship application code without tests deploy infrastructure changes with nothing more than a manual helm upgrade and a glance at the output of kubectl get pods.
The consequences are predictable: a values.yaml typo disables autoscaling in production, a resource limit misconfiguration causes OOM kills under load, a missing readinessProbe causes traffic to hit unready pods during rolling deployments. All of these are testable and preventable.
The IaC Testing Pyramid
flowchart TD
A[Unit: Static Analysis\nhelm lint, terraform validate, conftest] --> B
B[Integration: Cluster Test\nhelm unittest, kind/k3s] --> C
C[End-to-End: Deploy + Verify\nActual deploy + smoke tests]
style A fill:#22c55e,color:#fff
style B fill:#3b82f6,color:#fff
style C fill:#f59e0b,color:#fff
| Layer | Tools | What It Tests | Speed |
|---|---|---|---|
| Static analysis | helm lint, kubeval, Conftest | Syntax, schema, policies | Seconds |
| Unit tests | helm unittest | Template rendering, values | Seconds |
| Cluster integration | kind + helm install | Real K8s behavior | Minutes |
| E2E deploy test | Staging deploy + test suite | Full system behavior | Minutes |
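The ordering in the table (seconds-fast layers first, minutes-slow layers last) can be sketched as a small fail-fast runner. This is only an illustration: the function name run_layers and the CHART_DIR variable are hypothetical and not part of Helm or any other tool mentioned here.

```shell
#!/usr/bin/env bash
# Sketch: run the cheapest layers first and stop at the first failure,
# so slow cluster tests never start when a lint error already exists.
# CHART_DIR is a hypothetical variable defaulting to this article's chart path.
CHART_DIR="${CHART_DIR:-./helm/scanly/}"

# run_layers CMD...
# Each argument is one command line, ordered from fastest to slowest.
# Returns the exit status of the first failing command, or 0 if all pass.
run_layers() {
  local step status
  for step in "$@"; do
    echo "==> ${step}"
    if bash -c "${step}"; then
      continue
    else
      status=$?
      echo "FAILED: ${step} (exit ${status})" >&2
      return "${status}"
    fi
  done
  return 0
}

# Example invocation (left commented out so that sourcing this file
# does not require Helm to be installed):
# run_layers \
#   "helm lint ${CHART_DIR}" \
#   "helm unittest ${CHART_DIR}"
```

Passing each layer as a single string keeps the list easy to reorder as layers are added; the trade-off is that each step runs in its own bash process and does not see shell functions defined by the caller.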
Layer 1: Helm Lint and Schema Validation
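Start with the free, fast checks. One caveat: kubeval is no longer maintained, and its default schema source may not cover recent Kubernetes releases, so validating against 1.28 or 1.29 can require pointing --schema-location at an up-to-date schema mirror or switching to its suggested successor, kubeconform.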
# Lint the chart for syntax errors and best practices
helm lint ./helm/scanly/
# Validate rendered templates against Kubernetes API schema
helm template ./helm/scanly/ | kubeval --strict
# Validate with multiple K8s versions
helm template ./helm/scanly/ | kubeval --kubernetes-version 1.28.0
helm template ./helm/scanly/ | kubeval --kubernetes-version 1.29.0
# In CI:
helm template ./helm/scanly/ \
  --set image.tag=test \
  --set environment=staging \
  | kubeval \
      --strict \
      --ignore-missing-schemas
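The repeated per-version commands above can be folded into a loop. The following is a minimal sketch, assuming helm and kubeval are on the PATH; the function name validate_versions is hypothetical. It renders the chart once and reports every failing version instead of stopping at the first one.

```shell
#!/usr/bin/env bash
# Sketch: validate one rendered chart against several Kubernetes versions.

# validate_versions CHART VERSION...
# Returns 0 only if the chart renders and every version validates.
validate_versions() {
  local chart version rendered failed
  chart="$1"
  shift
  failed=0
  # Render once so every version is checked against identical manifests.
  rendered="$(helm template "${chart}")" || return 1
  for version in "$@"; do
    if printf '%s\n' "${rendered}" \
      | kubeval --strict --kubernetes-version "${version}"; then
      echo "OK: ${version}"
    else
      echo "FAILED: ${version}" >&2
      failed=1
    fi
  done
  return "${failed}"
}

# Example (commented out so that sourcing this file runs nothing):
# validate_versions ./helm/scanly/ 1.28.0 1.29.0
```

Collecting failures rather than exiting early is a deliberate choice here: a single CI run then shows every Kubernetes version the chart is incompatible with.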
Layer 2: Helm Unit Tests
helm unittest lets you assert on rendered template output without deploying:
# Install the plugin
helm plugin install https://github.com/helm-unittest/helm-unittest
# Run tests
helm unittest ./helm/scanly/
# helm/scanly/tests/deployment_test.yaml
suite: Deployment tests
tests:
  - it: should use the specified image tag
    values:
      - values-test.yaml
    set:
      image.tag: 'v1.2.3'
    asserts:
      - equal:
          path: spec.template.spec.containers[0].image
          value: registry.example.com/scanly:v1.2.3
        template: templates/deployment.yaml
  - it: should have readinessProbe configured
    asserts:
      - isNotNull:
          path: spec.template.spec.containers[0].readinessProbe
        template: templates/deployment.yaml
      - equal:
          path: spec.template.spec.containers[0].readinessProbe.httpGet.path
          value: /api/health
        template: templates/deployment.yaml
  - it: should have resource limits set
    asserts:
      - isNotNull:
          path: spec.template.spec.containers[0].resources.limits
        template: templates/deployment.yaml
      - isNotNull:
          path: spec.template.spec.containers[0].resources.requests
        template: templates/deployment.yaml
  - it: should not expose secrets as environment variables directly
    asserts:
      # A literal `value` on this entry means the secret is inlined;
      # it should come from valueFrom.secretKeyRef instead.
      # Assumes a helm-unittest version with JSONPath filter support.
      - isNull:
          path: spec.template.spec.containers[0].env[?(@.name == "DATABASE_PASSWORD")].value
        template: templates/deployment.yaml
  - it: should set HPA minReplicas from autoscaling.minReplicas
    set:
      autoscaling.enabled: true
      autoscaling.minReplicas: 3
    asserts:
      - equal:
          path: spec.minReplicas
          value: 3
        template: templates/hpa.yaml
Layer 3: Policy Testing with Conftest
Conftest uses OPA (Open Policy Agent) Rego policies to enforce organizational standards across all Kubernetes manifests:
# policies/kubernetes/deny_latest_tag.rego
package kubernetes.deployment

# rego.v1 syntax keeps the policy valid on both recent OPA 0.x and OPA 1.x.
import rego.v1

deny contains msg if {
    input.kind == "Deployment"
    container := input.spec.template.spec.containers[_]
    contains(container.image, ":latest")
    msg := sprintf("Container '%v' uses ':latest' tag", [container.name])
}

deny contains msg if {
    input.kind == "Deployment"
    container := input.spec.template.spec.containers[_]
    not contains(container.image, ":")
    msg := sprintf("Container '%v' has no image tag", [container.name])
}

# policies/kubernetes/require_resource_limits.rego
package kubernetes.resources

import rego.v1

deny contains msg if {
    input.kind == "Deployment"
    container := input.spec.template.spec.containers[_]
    not container.resources.limits
    msg := sprintf("Container '%v' is missing resource limits", [container.name])
}

deny contains msg if {
    input.kind == "Deployment"
    container := input.spec.template.spec.containers[_]
    not container.resources.requests
    msg := sprintf("Container '%v' is missing resource requests", [container.name])
}

warn contains msg if {
    input.kind == "Deployment"
    container := input.spec.template.spec.containers[_]
    # Limits are strings such as "4Gi", so parse the unit suffix;
    # to_number() cannot convert them and the rule would never fire.
    units.parse_bytes(container.resources.limits.memory) > 4294967296 # 4Gi
    msg := sprintf("Container '%v' has memory limit exceeding 4Gi", [container.name])
}
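The 4294967296 threshold in the warn rule is 4 × 1024³ bytes, that is, 4Gi. To sanity-check such thresholds from the command line, a conversion helper along these lines can be used. It is a simplified sketch: the function name quantity_to_bytes is hypothetical, and it handles only whole numbers with the suffixes Ki, Mi, Gi, Ti, k, M, G, and T, not the full Kubernetes quantity grammar (no fractions, exponents, or milli-units).

```shell
#!/usr/bin/env bash
# Sketch: convert a simple Kubernetes memory quantity to bytes.

# quantity_to_bytes QUANTITY
# Prints the byte count, or returns 1 for anything it does not understand.
quantity_to_bytes() {
  local q num mult
  q="$1"
  case "${q}" in
    *Ki) num="${q%Ki}"; mult=1024 ;;
    *Mi) num="${q%Mi}"; mult=$(( 1024 * 1024 )) ;;
    *Gi) num="${q%Gi}"; mult=$(( 1024 * 1024 * 1024 )) ;;
    *Ti) num="${q%Ti}"; mult=$(( 1024 * 1024 * 1024 * 1024 )) ;;
    *k)  num="${q%k}";  mult=1000 ;;
    *M)  num="${q%M}";  mult=$(( 1000 * 1000 )) ;;
    *G)  num="${q%G}";  mult=$(( 1000 * 1000 * 1000 )) ;;
    *T)  num="${q%T}";  mult=$(( 1000 * 1000 * 1000 * 1000 )) ;;
    *)   num="${q}";    mult=1 ;;
  esac
  # Reject empty, non-numeric, and leading-zero values (the latter would
  # be read as octal by shell arithmetic).
  case "${num}" in
    ''|*[!0-9]*|0?*)
      echo "unsupported quantity: ${q}" >&2
      return 1
      ;;
  esac
  echo $(( num * mult ))
}

# Example (commented out so that sourcing this file prints nothing):
# quantity_to_bytes 4Gi
```

Note the difference between binary and decimal suffixes: 4Gi and 4G are not the same limit, which is one reason a numeric comparison on the raw string is unreliable.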
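# Run Conftest against rendered helm templates.
# --all-namespaces is needed because these policies live in the
# kubernetes.* packages rather than Conftest's default "main" package.
helm template ./helm/scanly/ --set image.tag=v1.0.0 \
  | conftest test --all-namespaces --policy policies/kubernetes/ -
# Conftest lists failures and warnings, then a summary count, and exits
# non-zero on any failure. Rendering with image.tag=latest, for example,
# would produce a FAIL line carrying the policy message:
#   Container 'frontend' uses ':latest' tag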
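One caveat with piping helm template straight into conftest: by default a shell pipeline reports the exit status of its last command only, so a rendering failure can be hidden behind whatever conftest returns. A minimal sketch of one way around this is to render to a temporary file first and check it separately. The function name render_and_check is hypothetical, and the --all-namespaces flag assumes policies that live outside Conftest's default main package.

```shell
#!/usr/bin/env bash
# Sketch: fail loudly when rendering fails, then run the policy checks.

# render_and_check CHART POLICY_DIR
# Returns 1 if rendering fails, otherwise conftest's own exit status.
render_and_check() {
  local chart policy_dir rendered status
  chart="$1"
  policy_dir="$2"
  rendered="$(mktemp)" || return 1
  if ! helm template "${chart}" > "${rendered}"; then
    echo "helm template failed; skipping policy checks" >&2
    rm -f "${rendered}"
    return 1
  fi
  status=0
  # Feed the manifests on stdin, as in the pipeline form of the command.
  conftest test --all-namespaces --policy "${policy_dir}" - < "${rendered}" \
    || status=$?
  rm -f "${rendered}"
  return "${status}"
}

# Example (commented out so that sourcing this file runs nothing):
# render_and_check ./helm/scanly/ policies/kubernetes/
```

In bash, setting pipefail achieves a similar effect for the one-line pipeline; the temporary file version has the small advantage of also working in shells without that option.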
Layer 4: Terraform Testing
# modules/scanly-vpc/tests/vpc_test.tftest.hcl
variables {
  environment = "test"
  region      = "us-east-1"
  cidr_block  = "10.0.0.0/16"
}

run "verify_vpc_created" {
  command = plan

  assert {
    condition     = aws_vpc.main.cidr_block == var.cidr_block
    error_message = "VPC CIDR block does not match input variable"
  }

  assert {
    condition     = aws_vpc.main.enable_dns_hostnames == true
    error_message = "DNS hostnames must be enabled"
  }
}

run "verify_subnets_in_multiple_azs" {
  command = plan

  assert {
    condition     = length(distinct(aws_subnet.private[*].availability_zone)) >= 2
    error_message = "Private subnets must span at least 2 availability zones"
  }
}
# Run Terraform tests
terraform test
# With specific test directory
terraform test -test-directory=./tests
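When a repository holds several modules, the same command can be run per module. The sketch below assumes Terraform 1.6 or later (the first release with terraform test) and uses the global -chdir option; the function name test_modules is hypothetical, and the directory in the example is the only module path this article mentions.

```shell
#!/usr/bin/env bash
# Sketch: initialize and test each Terraform module directory in turn.

# test_modules DIR...
# Returns 0 only if every module initializes and passes its tests.
test_modules() {
  local dir failed
  failed=0
  for dir in "$@"; do
    echo "==> ${dir}"
    # init without a backend so no remote state is touched.
    if terraform -chdir="${dir}" init -backend=false -input=false >/dev/null \
      && terraform -chdir="${dir}" test; then
      echo "OK: ${dir}"
    else
      echo "FAILED: ${dir}" >&2
      failed=1
    fi
  done
  return "${failed}"
}

# Example (commented out so that sourcing this file runs nothing):
# test_modules modules/scanly-vpc
```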
CI/CD Pipeline Integration
# .github/workflows/iac-tests.yml
name: Infrastructure Tests
on:
  pull_request:
    paths:
      - 'helm/**'
      - 'terraform/**'
      - 'deploy/**'
jobs:
  helm-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Helm
        uses: azure/setup-helm@v3
        with:
          version: v3.13.0
      - name: Helm lint
        run: helm lint ./helm/scanly/
      - name: Install helm-unittest
        run: helm plugin install https://github.com/helm-unittest/helm-unittest
      - name: Run helm unit tests
        run: helm unittest ./helm/scanly/
      - name: Install conftest
        env:
          # Release assets embed the version in the file name, so pin one.
          # 0.56.0 is only an example; use the release you have validated.
          CONFTEST_VERSION: 0.56.0
        run: |
          curl -fsSL "https://github.com/open-policy-agent/conftest/releases/download/v${CONFTEST_VERSION}/conftest_${CONFTEST_VERSION}_Linux_x86_64.tar.gz" \
            | tar xz conftest
          sudo mv conftest /usr/local/bin/
      - name: Run policy tests
        shell: bash  # enables pipefail, so a failed helm template fails the step
        run: |
          helm template ./helm/scanly/ --set image.tag=ci \
            | conftest test --all-namespaces --policy policies/kubernetes/ -
  terraform-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Terraform format check
        run: terraform fmt -check -recursive
      - name: Terraform validate
        run: |
          terraform init -backend=false
          terraform validate
      - name: Terraform test
        # Plan-mode tests against the AWS provider typically also need
        # credentials or mocked providers; only the region is set here.
        run: terraform test
        env:
          AWS_DEFAULT_REGION: us-east-1
Helm Testing Coverage Goals
| Test | Tool | Pass Criterion |
|---|---|---|
| All templates render without error | helm lint | Zero errors |
| Image tags are never ":latest" | Conftest | Zero violations |
| Resource limits set on all containers | Conftest | Zero violations |
| readinessProbe on all deployments | helm unittest | Test passes |
| Secrets use secretKeyRef | Conftest | Zero violations |
| HPA min replicas >= 2 in prod | helm unittest | Test passes |
| ConfigMap keys match application expectations | helm unittest | Test passes |
Testing infrastructure code is one of the highest-leverage activities in a platform engineering practice. The cost of an untested Helm chart bug in production is orders of magnitude higher than the cost of five minutes of unit tests in CI.
