Terraform: count vs for_each
Both meta-arguments produce N copies of a resource. They look interchangeable in the config. They are not. The difference shows up in the state file — and that's where the production failures live.
What the terraform state representation looks like
With count, Terraform indexes resources positionally:
resource "aws_iam_user" "team" {
  count = length(var.usernames)
  name  = var.usernames[count.index]
}
State addresses:
aws_iam_user.team[0]
aws_iam_user.team[1]
aws_iam_user.team[2]
With for_each, Terraform keys resources by string:
resource "aws_iam_user" "team" {
  for_each = toset(var.usernames)
  name     = each.value
}
State addresses:
aws_iam_user.team["alice"]
aws_iam_user.team["bob"]
aws_iam_user.team["carol"]
Same three users, two very different state shapes. The first is positional; the second is named. Everything below follows from that.
The re-indexing trap
Remove bob from the middle of var.usernames and re-plan.
With count, indices shift under your feet:
aws_iam_user.team[0] alice -> alice (no change)
aws_iam_user.team[1] bob -> carol (replace)
aws_iam_user.team[2] carol -> (gone) (destroy)
Terraform sees team[1] as “the user formerly known as bob, now
carol” and plans to destroy and recreate it. team[2] is gone. You
wanted to delete one user; Terraform plans to destroy two and recreate one.
With for_each, the keyspace is stable:
aws_iam_user.team["alice"] no change
aws_iam_user.team["bob"] destroy
aws_iam_user.team["carol"] no change
Bob goes. Everyone else stays put. The plan does what your brain expected.
Why this matters
The IAM-user example is harmless — users get recreated and life goes on. The shape of the bug is the same when the resources have real side effects:
- A list of aws_db_instance indexed by count. Remove one from the middle and Terraform plans to destroy and recreate every RDS instance that came after it. Each replacement is a multi-hour, multi-snapshot affair.
- Route-table entries or VPC peering links keyed positionally. Re-indexing breaks connectivity for every entry past the one you removed, often silently until traffic hits it.
The common thread: anything with a meaningful identity, stored at a positional address. The plan output reads “destroy 12, create 12” and the apply does what it says.
How do you work around this? Enter moved
You can't just rewrite the resource. State still has [0],
[1], [2]; the config now expects ["alice"],
["bob"], ["carol"]. Terraform's plan:
Terraform will perform the following actions:
# aws_iam_user.team[0] will be destroyed
# aws_iam_user.team[1] will be destroyed
# aws_iam_user.team[2] will be destroyed
# aws_iam_user.team["alice"] will be created
# aws_iam_user.team["bob"] will be created
# aws_iam_user.team["carol"] will be created
Plan: 3 to add, 0 to change, 3 to destroy.
Exactly the failure mode this post is about.
moved blocks tell Terraform to rename addresses in state instead of
destroying and recreating:
moved {
  from = aws_iam_user.team[0]
  to   = aws_iam_user.team["alice"]
}

moved {
  from = aws_iam_user.team[1]
  to   = aws_iam_user.team["bob"]
}

moved {
  from = aws_iam_user.team[2]
  to   = aws_iam_user.team["carol"]
}
Plan output drops to 0 to add, 0 to change, 0 to destroy — pure
address rewrites, applied as part of the next normal run. No destroy, no recreate,
no downtime.
A few practical notes from doing this on real fleets:
- The from addresses must match state exactly. Run terraform state list against the workspace before writing the blocks.
- For lists of more than ~10 items, generate the moved blocks with a quick script over terraform state list output. Hand-writing fifty of them is how typos sneak in and resources end up orphaned.
- Once the migration has been applied, the moved blocks can be removed from the code. They have no effect on subsequent runs — the addresses already exist at their new keys in state. Removing them costs nothing: terraform plans a no-op and the codebase stays clean. Some teams keep them around as an inline changelog of past refactors, which is fine too. Style call, not a safety call.
- The migration only covers items that already exist at the old address. New items go in through for_each as normal.
- For modules being refactored, moved works across module boundaries too — same syntax, fully-qualified addresses on both sides.
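Hand-rolling those blocks is mechanical, so a small script helps. A minimal sketch in shell, assuming you have saved the old count-indexed addresses (from terraform state list) and the new for_each keys to two files, one entry per line, in matching order — the function name, file arguments, and the team resource are illustrative:

```shell
#!/bin/sh
# Generate `moved` blocks pairing old count-indexed addresses with new keys.
#   $1: file of old state addresses, e.g. aws_iam_user.team[0]
#   $2: file of new for_each keys, e.g. alice — same order as $1
generate_moved() {
  paste "$1" "$2" | while read -r addr key; do
    base=${addr%\[*\]}   # strip the trailing [N] index from the address
    printf 'moved {\n  from = %s\n  to   = %s["%s"]\n}\n\n' \
      "$addr" "$base" "$key"
  done
}
```

Redirect the output into a .tf file and run terraform plan; it should show only moves, never destroys.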
Real cases from production
The same shape of bug shows up across totally different parts of a terraform codebase — AWS networking primitives one day, Kubernetes manifests rendered by helm the next. Two cases from a multi-region production fleet.
VPC peering routes
Earlier I said route-table entries keyed positionally are a classic version of this bug. Here is how it actually played out.
Cross-region VPC peerings install routes into every route table on each side. The
original module used count:
resource "aws_route" "region_a_to_region_b_peering" {
  provider                  = aws.region_a
  count                     = length(data.aws_route_tables.region_a_vpc.ids)
  route_table_id            = data.aws_route_tables.region_a_vpc.ids[count.index]
  destination_cidr_block    = data.aws_vpc.region_b_vpc.cidr_block
  vpc_peering_connection_id = aws_vpc_peering_connection.region_a_to_region_b_peering.id
}
The trigger was unrelated work elsewhere in the network stack: a project added new
private subnets to all production clusters. New subnets meant new route tables, and
data.aws_route_tables.region_a_vpc.ids started returning the IDs in a
different order. Every aws_route keyed by count
re-indexed silently.
The plan came back proposing to destroy and recreate every cross-region peering
route across every production cluster. Hundreds of lines of
aws_route.<name>[N] will be destroyed and replaced.
Your backend applications reach RDS, ElastiCache, OpenSearch, and other regional endpoints over those peering routes — including cluster-to-cluster communication. A momentary route deletion is not free; even a short drop knocks out cache traffic for everything in flight, across regions. Not a plan you apply.
The fix was to key by the route table ID itself:
- count          = length(data.aws_route_tables.region_a_vpc.ids)
- route_table_id = data.aws_route_tables.region_a_vpc.ids[count.index]
+ for_each       = toset(data.aws_route_tables.region_a_vpc.ids)
+ route_table_id = each.value
The whole resource, after the change:
resource "aws_route" "region_a_to_region_b_peering" {
  provider                  = aws.region_a
  for_each                  = toset(data.aws_route_tables.region_a_vpc.ids)
  route_table_id            = each.value
  destination_cidr_block    = data.aws_vpc.region_b_vpc.cidr_block
  vpc_peering_connection_id = aws_vpc_peering_connection.region_a_to_region_b_peering.id
}
Route table IDs do not move. Adding new subnets — and therefore new route tables — only adds new entries to the map. Nothing existing changes address.
Paired with one moved block per existing route, per peering, per
region:
moved {
  from = aws_route.region_a_to_region_b_peering[0]
  to   = aws_route.region_a_to_region_b_peering["rtb-aaaaaaaaaaaaaaaaa"]
}

moved {
  from = aws_route.region_a_to_region_b_peering[1]
  to   = aws_route.region_a_to_region_b_peering["rtb-bbbbbbbbbbbbbbbbb"]
}
# ...one per route table, across every peering and every region.
The plan dropped from “destroy and recreate everything” to
0 to add, 0 to change, 0 to destroy. Pure state rewrites. Inter-region
traffic kept flowing.
ingress-nginx upgrades
The task: cut a major version upgrade of ingress-nginx. The new chart adds a handful of new Kubernetes resources alongside the breaking changes called out in the release notes. Routine upgrade work — proactive, of course. Cough. An ingress-nginx CVE knocked on the door.
Side note: ingress-nginx is being retired upstream — in case you missed it. The pattern in this section still applies to whatever you migrate to.
The ingress-nginx manifests are rendered with helm template and
applied via terraform through kubectl_manifest:
resource "kubectl_manifest" "ingress_nginx" {
  count     = length(data.kubectl_path_documents.ingress_nginx.documents)
  yaml_body = element(data.kubectl_path_documents.ingress_nginx.documents, count.index)
}
Plan time: the new chart emits more documents in a different order, and every
kubectl_manifest.ingress_nginx[N] past the inserted resource
re-indexes silently.
The plan output includes lines like:
kubectl_manifest.ingress_nginx[12] will be destroyed
kubectl_manifest.ingress_nginx[13] will be destroyed
# ...and so on, for every resource past the insertion point.
Among those destroys: the ingress-nginx Deployment and its
Service of type LoadBalancer backed by an NLB.
Destroying and recreating that Service tears down the NLB and
provisions a new one — new ARN, new DNS name. Traffic blackholes until DNS
records and any upstream load balancers catch up. For minutes, possibly longer.
Can you afford that in prod? No. Ingress is the chokepoint every external request flows through. A new NLB on every helm upgrade is not a deploy strategy.
The migration is identical in shape to the route-table case:
resource "kubectl_manifest" "ingress_nginx" {
  for_each  = data.kubectl_path_documents.ingress_nginx.manifests
  yaml_body = each.value
}
The manifests attribute is a map keyed by each rendered manifest's
API path — the resource's identity in the cluster, not its position in the
file. Add or remove resources between upgrades and the keys for everything else
don't change.
For kubectl_manifest resources backed by helm output, for_each is
far more resilient to chart refactors than count: keys survive
upgrades, positional indices don't.
The moved blocks key into the new addresses by API path:
moved {
  from = kubectl_manifest.ingress_nginx[0]
  to   = kubectl_manifest.ingress_nginx["/api/v1/namespaces/ingress-nginx"]
}

moved {
  from = kubectl_manifest.ingress_nginx[1]
  to   = kubectl_manifest.ingress_nginx["/api/v1/namespaces/ingress-nginx/serviceaccounts/ingress-nginx"]
}

# ...

moved {
  from = kubectl_manifest.ingress_nginx[39]
  to   = kubectl_manifest.ingress_nginx["/api/v1/namespaces/ingress-nginx/services/ingress-nginx-controller"]
}

# ...

moved {
  from = kubectl_manifest.ingress_nginx[42]
  to   = kubectl_manifest.ingress_nginx["/apis/apps/v1/namespaces/ingress-nginx/deployments/ingress-nginx-controller"]
}
# ...one per existing manifest, generated from `terraform state list`.
Same outcome: the destroy plan disappears. With the moved blocks in
place, the migration itself is a pure state rewrite
(0 to add, 0 to change, 0 to destroy). The next plan, with the
upgraded chart applied, shows only in-place updates for resources whose YAML
changed and creates for the new ones the chart added — no deletions at all.
The NLB stays where it is.
The for_each tradeoff
for_each keys must be known at plan time. Many resource attributes
— an EC2 instance's private IP, an RDS endpoint, a randomly generated
password, a resource's ARN — are only determined by the cloud provider at
creation time. Use any of those as a for_each key on a resource
being created in the same plan and Terraform errors out:
The for_each map includes keys derived from resource attributes
that cannot be determined until apply.
count is more forgiving here: it only needs the number of instances,
not their values, so a length() over a list of not-yet-created resources can
still plan cleanly. (A count whose number is itself unknown until apply fails
the same way, but that is rarer.)
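A minimal sketch of both sides of that tradeoff, with hypothetical resources (an instance pool feeding DNS records created in the same plan; variable names are illustrative):

```hcl
resource "aws_instance" "worker" {
  count         = 2
  ami           = var.ami_id
  instance_type = "t3.micro"
}

# Fails at plan time when the instances do not exist yet:
# private_ip is unknown until apply, so the for_each keys are unknown.
resource "aws_route53_record" "worker" {
  for_each = toset(aws_instance.worker[*].private_ip)

  zone_id = var.zone_id
  name    = "worker-${each.key}"
  type    = "A"
  ttl     = 300
  records = [each.value]
}

# Plans cleanly: count only needs the number of records, and that is
# known from the worker count even though the IPs are not.
resource "aws_route53_record" "worker_by_index" {
  count = length(aws_instance.worker)
  # ...
}
```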
This only bites in some for_each cases, not all; and even then, there are
usually workarounds that keep for_each the right call. A topic for another
post.
Rule of thumb
Default to for_each when items have identity. Default to
count when they do not — fungible replicas, on/off toggles,
conditional creation:
# fungible workers, no identity
resource "aws_instance" "worker" {
  count = var.worker_count
  # ...
}

# conditional creation
resource "aws_cloudwatch_metric_alarm" "high_cost" {
  count = var.enable_cost_alarms ? 1 : 0
  # ...
}
for_each = var.enabled ? toset(["this"]) : toset([]) works for a
toggle but reads worse than count = var.enabled ? 1 : 0. Pick the
one that says what you mean.
When the wrong choice is already in production, moved blocks turn a
destructive plan into a state-only change — the difference between a
Tuesday afternoon refactor and a Tuesday afternoon incident.
Yes, it is grunt work — lots of terraform state inspection and
dozens of moved blocks for resources nobody is looking at. But if you lean on terraform heavily and have not hit the
count-shifting trap yet, you will, and it tends to land on the day you can least
afford it: mid-incident, mid-upgrade, mid-emergency-refactor. The proactive
Tuesday version costs a lot less than the reactive 3am one.