⚙ Deep Dive · Kubernetes Internals

Kubernetes etcd:
The Brain Behind Your Cluster

A complete guide to etcd — the distributed key-value store that holds every secret, config, and object in your cluster. Learn architecture, backup, encryption, tuning, and debugging.

📅 Feb 14, 2026 · ⏱ 12 min read · 🏷 etcd · Raft · Control Plane

What is etcd?

etcd is the single source of truth for your entire Kubernetes cluster. Every object you create — Pods, Services, ConfigMaps, Secrets, RBAC roles — is serialized and stored in etcd. If you lose etcd without a backup, you lose your cluster. Period.

It is a distributed, strongly consistent key-value store built on the Raft consensus algorithm. The Kubernetes API server is the only component that talks to etcd directly; everything else reads and writes through the API server.

"etcd is to Kubernetes what a database is to a web application — except losing the database means losing the entire system state."

[Diagram: control plane (API server :6443, controller-manager, scheduler) connected to a three-member etcd cluster — one leader and two followers on client port :2379, peer traffic on :2380. Raft quorum: 2 of 3 members required to commit.]

Kubernetes control plane architecture — etcd cluster (right) is accessed exclusively by the API server.


How Kubernetes Uses etcd

All cluster state lives under the /registry key prefix. Every resource type gets its own hierarchical path. Here's a snapshot of what lives in etcd:

etcd Key                                    What It Stores
/registry/pods/{namespace}/{name}           Pod spec and status
/registry/services/specs/{ns}/{name}        Service definitions
/registry/secrets/{ns}/{name}               Secrets (base64-encoded; optionally encrypted)
/registry/configmaps/{ns}/{name}            ConfigMaps
/registry/deployments/apps/{ns}/{name}      Deployment specs
/registry/leases/kube-system/{component}    Leader election leases

You can inspect raw etcd data directly using etcdctl:

bash
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  get /registry/pods --prefix --keys-only

The Raft Consensus Algorithm

etcd uses Raft to guarantee linearizable consistency across all members. Every write goes through the elected leader; followers only replicate and vote.

[Sequence diagram: ① client sends WRITE to leader ② leader APPENDs to its log and REPLICATEs to both followers ③ followers ACK (quorum) ④ leader COMMITs and returns 200 OK.]

Raft write path: the leader replicates to followers, waits for a quorum of ACKs, then commits and replies to the client.

Why Odd Node Counts?

3 nodes: tolerates 1 failure (quorum 2)
5 nodes: tolerates 2 failures (quorum 3)
7 nodes: tolerates 3 failures (quorum 4)

A 4-node cluster also requires 3 votes for quorum — the same as 3 nodes — but with added cost and complexity. Always use 3 or 5 in production.
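The quorum rule is plain integer arithmetic: a cluster of n members needs floor(n/2) + 1 votes to commit, so it tolerates n minus quorum failures. A quick sketch:

```shell
# Quorum and fault tolerance for each etcd cluster size.
for n in 1 2 3 4 5 6 7; do
  quorum=$(( n / 2 + 1 ))        # floor(n/2) + 1 votes needed to commit
  tolerance=$(( n - quorum ))    # members that can fail without losing quorum
  echo "$n nodes -> quorum $quorum, tolerates $tolerance failure(s)"
done
```

Note how n=4 prints quorum 3, tolerance 1 — exactly the same tolerance as n=3, which is why even-sized clusters buy nothing.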


Backup and Restore

⚠️

Critical: Always backup etcd before cluster upgrades, node additions, or any destructive operation. Automate this with a CronJob — test restores regularly.

Taking a Snapshot

bash
ETCDCTL_API=3 etcdctl snapshot save \
  /backup/etcd-snapshot-$(date +%Y%m%d).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
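The snapshot command above is easy to automate. A minimal cron-driven sketch — the backup directory, cron location, and 7-day retention are assumptions to adapt to your environment:

```shell
#!/usr/bin/env bash
# Hypothetical daily backup script, e.g. installed as /etc/cron.daily/etcd-backup.
BACKUP_DIR=/backup                                       # assumed backup location
SNAP="$BACKUP_DIR/etcd-snapshot-$(date +%Y%m%d).db"

ETCDCTL_API=3 etcdctl snapshot save "$SNAP" \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the snapshot is readable before trusting it
etcdctl snapshot status "$SNAP" --write-out=table

# Keep only the 7 most recent snapshots (assumed retention policy)
ls -1t "$BACKUP_DIR"/etcd-snapshot-*.db | tail -n +8 | xargs -r rm --
```

Ship the snapshots off-node as well; a backup that lives on the etcd disk dies with the etcd disk.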

Verifying the Snapshot

bash
etcdctl snapshot status /backup/etcd-snapshot.db --write-out=table

Restoring from Snapshot

bash
# 1. Stop API server
mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/

# 2. Restore to a new data dir
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd-restored \
  --name=master \
  --initial-cluster=master=https://127.0.0.1:2380 \
  --initial-cluster-token=etcd-cluster-1 \
  --initial-advertise-peer-urls=https://127.0.0.1:2380

# 3. Update etcd manifest --data-dir, then restore API server
mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/

Encryption at Rest

By default, Secrets stored in etcd are only base64-encoded, not encrypted: anyone with access to etcd or its disk can read them. Enable encryption at rest with an EncryptionConfiguration file passed to the API server:

yaml
# /etc/kubernetes/encryption-config.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
      - configmaps
    providers:
    - aescbc:
        keys:
        - name: key1
          secret: <base64-encoded-32-byte-key>
    - identity: {}
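The aescbc provider expects a base64-encoded, randomly generated 32-byte key. One way to produce it:

```shell
# Generate a random 32-byte AES key and base64-encode it for the config file
head -c 32 /dev/urandom | base64
```

Treat this key like any other root credential: it decrypts every Secret in the cluster.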

Pass this config to the API server and re-encrypt existing secrets:

bash
# API server flag
--encryption-provider-config=/etc/kubernetes/encryption-config.yaml

# Re-encrypt all existing secrets
kubectl get secrets --all-namespaces -o json | kubectl replace -f -
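To confirm encryption actually took effect, read a Secret's raw bytes straight from etcd: an encrypted value starts with a k8s:enc:aescbc:v1:<keyname>: prefix instead of recognizable protobuf. A sketch — the namespace and secret name (default/my-secret) are placeholders:

```shell
# Read a secret's raw stored bytes; encrypted values begin with "k8s:enc:aescbc:v1:"
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  get /registry/secrets/default/my-secret | hexdump -C | head
```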

Performance Tuning

etcd is highly sensitive to disk I/O latency — it writes a WAL entry to disk on every committed write. On a slow disk, the Raft heartbeat can be missed, causing leader re-elections and degraded performance.
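You can measure whether a disk is fast enough before it bites you. The etcd project suggests benchmarking fdatasync latency with fio; a sketch along those lines — the test directory is a placeholder:

```shell
# Benchmark fdatasync latency the way etcd's WAL behaves:
# sequential 2300-byte writes, each followed by an fdatasync.
# 22 MiB / 2300 B is roughly 10,000 synced writes.
fio --name=etcd-disk-check \
    --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/etcd-test \
    --size=22m --bs=2300
# Check the reported fsync/fdatasync percentiles: p99 should be under ~10ms.
```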

Parameter                      Default      Recommended
--heartbeat-interval           100ms        250ms
--election-timeout             1000ms       1250ms
--quota-backend-bytes          2GB          8GB
--auto-compaction-retention    0 (off)      1h
bash
# Compact old revisions
REV=$(etcdctl endpoint status --write-out=json | jq '.[0].Status.header.revision')
etcdctl compact $REV

# Defragment (during low-traffic windows)
etcdctl defrag --endpoints=https://127.0.0.1:2379 [tls-flags]

Monitoring etcd

etcd exposes Prometheus-format metrics on its /metrics endpoint; in kubeadm clusters a dedicated metrics listener runs on port 2381 (--listen-metrics-urls). These are the key metrics to alert on:

Metric                                       What to Watch For
etcd_server_leader_changes_seen_total        Should be near 0; frequent re-elections indicate disk or network trouble
etcd_disk_wal_fsync_duration_seconds         p99 < 10ms; higher means a slow disk
etcd_disk_backend_commit_duration_seconds    p99 < 25ms
etcd_network_peer_round_trip_time_seconds    < 10ms within the same datacenter
etcd_mvcc_db_total_size_in_bytes             Alert at 80% of the storage quota
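The thresholds above translate directly into Prometheus alerting rules. A sketch — the rule names and durations are assumptions:

```yaml
groups:
  - name: etcd
    rules:
      - alert: EtcdHighFsyncLatency
        expr: histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) > 0.01
        for: 10m
        annotations:
          summary: "etcd p99 WAL fsync latency above 10ms (slow disk)"
      - alert: EtcdFrequentLeaderChanges
        expr: increase(etcd_server_leader_changes_seen_total[1h]) > 3
        annotations:
          summary: "etcd leader changed more than 3 times in the last hour"
```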

Common Failure Scenarios

Cluster Loses Quorum

If the cluster loses quorum — a majority of etcd members unavailable — etcd rejects all writes, and linearizable reads fail too. Kubernetes can no longer schedule Pods or update any objects, although already-running workloads keep running. Recover or replace the failed members, or restore from a snapshot.
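When diagnosing a suspected quorum loss, check every member's health first and compare against the quorum requirement; a sketch:

```shell
# Probe all members' health endpoints
ETCDCTL_API=3 etcdctl endpoint health --cluster \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key

# Count members and compute how many must be healthy to commit writes
n=$(ETCDCTL_API=3 etcdctl member list | wc -l)
echo "members: $n, quorum needed: $(( n / 2 + 1 ))"
```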

NOSPACE Alarm

💡

Symptom: etcdserver: mvcc: database space exceeded. etcd rejects all writes because the keyspace has exceeded the storage quota.

bash
# 1. Compact old revisions
etcdctl compact $(etcdctl endpoint status --write-out=json | jq '.[0].Status.header.revision')

# 2. Defrag all members
etcdctl defrag --endpoints=...

# 3. Disarm the alarm
etcdctl alarm disarm --endpoints=...

Security Checklist

  • TLS certificates for client-to-server and peer-to-peer communication
  • Separate PKI for etcd — don't reuse the Kubernetes CA
  • Encryption at rest enabled for Secrets (and ConfigMaps)
  • etcd port 2379 not reachable outside the control plane network
  • Automated daily backups with tested restore runbooks
  • Monitoring and alerting on leader elections and disk latency
  • Storage quota configured with auto-compaction enabled

Further Reading