# Docker on Block Storage with Limited IOPS
Modern container runtimes love fast storage. Put Docker's `/var/lib/docker` on a volume that tops out at a few hundred IOPS and you'll watch builds crawl, pulls stall, and the kubelet raise throttling alerts. Below is a quick, numbers-driven look at why an IOPS-limited block device is the wrong home for Docker data, plus a couple of low-friction fixes.
## Docker is IOPS hungry
- Layer fan-out: every image layer becomes a directory, and the `overlay2` driver may stack up to 128 of them, meaning a single file write can touch multiple inodes (Docker Documentation).
- Small-file storms: pulling even a tiny base image such as `ubuntu:latest` downloads ~29 MB but explodes into thousands of files during extraction (Medium).
- Sequential extraction: Docker still decompresses layers one after another, so each layer competes for the same limited I/O queue (GitHub).
- Copy-on-write churn: every write in a running container triggers a copy-up from the lower to the upper layer, doubling random I/O during things like `npm install` or log rotation.
In short, containers generate lots of small, random I/O operations—the exact pattern cheap cloud volumes dislike.
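You can eyeball this fan-out on any host you control. The sketch below assumes the default `overlay2` driver and the stock `/var/lib/docker` data root; adjust the path if yours lives elsewhere.

```bash
# Pull a small base image and see how many layers and files it expands into.
# Assumes the default overlay2 storage driver and data root at /var/lib/docker.
docker pull ubuntu:latest

# Number of layers in the image's root filesystem:
docker image inspect ubuntu:latest --format '{{len .RootFS.Layers}}'

# Rough count of files under the overlay2 layer directories. This counts every
# image on the host, so run it on an otherwise quiet machine for a clean signal:
sudo find /var/lib/docker/overlay2 -type f | wc -l
```

Each of those extracted files costs metadata I/O to create, which is why even a "small" pull can still issue tens of thousands of operations.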
## How stingy are cloud volumes?
| Cloud device | Baseline IOPS | What that means for Docker |
|---|---|---|
| AWS EBS gp2 | 3 IOPS/GB (min 100); a common 30 GB root volume gets 100 IOPS (Amazon Web Services, Inc.) | One busy `docker build` easily saturates the queue, so other containers block. |
| AWS magnetic (standard) | ≈100 IOPS average (AWS Documentation) | Same ~100-IOPS limit, but with 10× higher latency; builds can be minutes slower. |
| GCP PD Standard | 0.75 read / 1.5 write IOPS per GB, so a 50 GB disk offers 38/75 IOPS (Google Cloud) | Two image pulls in parallel can max it out. |
| Generic advice (Red Hat) | Cloud IOPS throttling can overload CRI-O and the kubelet on I/O-intensive pods (Red Hat Documentation) | Kubernetes misses heartbeats and evicts pods when writes back up. |
Those numbers are orders of magnitude below what a laptop NVMe drive (hundreds of thousands of IOPS) delivers.
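Before blaming Docker, it helps to measure what the volume actually delivers. A short fio run with 4 KiB random writes and direct I/O is a reasonable stand-in for Docker's access pattern; the job below is only a sketch, and the `/var/lib/docker` target directory is an assumption (point it at whatever backs your Docker data).

```bash
# 4 KiB random writes, direct I/O, 60 s: on a throttled volume the reported
# write IOPS should hover right around the advertised cap.
sudo fio --name=docker-randwrite \
    --directory=/var/lib/docker \
    --rw=randwrite --bs=4k --size=1g \
    --ioengine=libaio --iodepth=32 --direct=1 \
    --runtime=60 --time_based --group_reporting

# Remove the 1 GiB scratch file fio leaves behind:
sudo sh -c 'rm -f /var/lib/docker/docker-randwrite.*'
```

Note that gp2 burst credits can mask the baseline for a while, so a longer run gives a truer picture of steady-state performance.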
## What does “IOPS-limited” feel like?
- Container start-time spikes: dev teams report cold-starts jumping from ~2 s to 15–30 s when the volume sits at 100 % utilization.
- `docker build` wall-clock: benchmark runs show a simple multi-stage Go build taking 3× longer on gp2 than on local NVMe because thousands of `COPY` operations saturate the 100-IOPS cap.
- Kubernetes pod churn: the kubelet's image GC and log rotation hammer the disk; when they're throttled, pods fall into CrashLoopBackOff and nodes are tainted with `DiskPressure`.
(Even cloud vendors highlight this: AWS recommends gp3/io2 for “bursty, low-latency container workloads,” while Google suggests SSD PD for “image-heavy builds.”)
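If these nodes are part of a Kubernetes cluster, the kubelet makes the symptom easy to confirm. A quick check, assuming you have `kubectl` access to the cluster (the node name below is a placeholder):

```bash
# Print each node's DiskPressure condition; "True" means the kubelet is already
# reclaiming disk (image GC) and may start evicting pods.
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}'

# Disk-related events and taints on one node (replace worker-1 with a real name):
kubectl describe node worker-1 | grep -iE 'diskpressure|evict|image'
```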
## Tips for taming Docker's IOPS appetite
- Put `/var/lib/docker` on fast local NVMe or gp3/io2/PD-SSD. A single gp3 volume at its 3,000-IOPS baseline (~$0.088/GB-mo plus any extra provisioned IOPS) eliminates the bottleneck for typical CI hosts; see the relocation sketch after this list.
- Keep images small. Every 100 MB shaved is thousands fewer metadata operations.
- Use build-cache mounts and multi-stage builds to cut write amplification.
- Turn on log rotation (`--log-opt max-size=10m --log-opt max-file=3`) so container logs don't overwhelm the device; the daemon-wide equivalent is also shown in the sketch below.
- Monitor disk I/O wait and queue depth; alert when utilization exceeds 80 % or IOPS approach the volume cap.
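The first and fourth tips can be handled together in `daemon.json`. A minimal sketch, assuming a faster disk is already mounted at `/mnt/fast-disk` (a hypothetical mount point), that `/etc/docker/daemon.json` doesn't already contain settings you need to keep, and that a brief daemon restart is acceptable:

```bash
# Stop Docker, copy the existing data root to the faster disk, then point the
# daemon at it and enable log rotation. /mnt/fast-disk is a hypothetical mount.
sudo systemctl stop docker
sudo mkdir -p /mnt/fast-disk/docker
sudo rsync -aHAX /var/lib/docker/ /mnt/fast-disk/docker/

# Overwrites any existing daemon.json; merge by hand if you already have one.
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "data-root": "/mnt/fast-disk/docker",
  "log-driver": "json-file",
  "log-opts": { "max-size": "10m", "max-file": "3" }
}
EOF

sudo systemctl start docker
docker info --format '{{ .DockerRootDir }}'   # should print /mnt/fast-disk/docker
```

The `log-opts` block is the daemon-wide equivalent of the per-container `--log-opt` flags above, so new containers get rotation without any extra run flags.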
## Bottom line
Cheap, low-IOPS volumes look tempting on the invoice, but Docker’s layer-rich, small-file workload will hit their ceilings fast, turning basic operations into patience tests. A modest upgrade to SSD-class storage (or at least provisioned IOPS) usually pays for itself in developer time saved and pod stability gained.