Building Minimal Docker Images with Alpine and GDAL
Building minimal Docker images with Alpine and GDAL requires a multi-stage build strategy that isolates heavy compilation steps, strips unnecessary binaries, and explicitly manages musl libc compatibility. For serverless geospatial workloads on AWS Lambda, Google Cloud Run, or Azure Container Apps, the target is typically under 250MB uncompressed, with GDAL, PROJ, and Python bindings pre-compiled against Alpine’s apk ecosystem. The core approach uses alpine:3.19 (or newer), installs build dependencies, pulls GDAL via Alpine’s package manager, then copies only the runtime artifacts into a clean stage. This eliminates build tools, static libraries, and debug symbols that bloat container payloads and degrade cold-start performance.
Multi-Stage Architecture
The pattern relies on Docker’s multi-stage build capability to separate the heavy dependency resolution phase from the lean execution environment. In the first stage, you install build-base, development headers, and the gdal/py3-gdal packages. Alpine’s package manager resolves complex C/C++ dependencies (PROJ, GEOS, SQLite, TIFF, JPEG) automatically. The second stage pulls only the runtime libraries, Python site-packages, and GDAL data directories. This workflow is a foundational component of Docker Container Optimization for GIS, where image size directly impacts deployment limits, registry pull latency, and execution costs across cloud providers.
Production Dockerfile
The following Dockerfile demonstrates a hardened, production-ready pattern. It avoids manual ./configure && make compilation by leveraging Alpine’s pre-built binaries, then aggressively strips debug symbols and caches.
# Stage 1: Build environment
FROM alpine:3.19 AS builder
# Install build tools and GDAL + Python bindings
RUN apk add --no-cache \
    build-base cmake python3-dev py3-pip \
    proj-dev sqlite-dev libtiff-dev libjpeg-turbo-dev \
    libpng-dev libwebp-dev curl-dev zlib-dev \
    geos-dev expat-dev \
    gdal py3-gdal
# Strip debug symbols from shared libraries to reduce size
RUN strip --strip-unneeded /usr/lib/libgdal.so.*
# Stage 2: Minimal runtime
FROM alpine:3.19
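# Install only runtime dependencies (no compilers, no headers, no CLI tools).
# If importing osgeo reports a missing shared library, add its package here;
# ldd /usr/lib/libgdal.so.* in the builder stage lists what libgdal links against.
RUN apk add --no-cache python3 py3-pip \
    libtiff libjpeg-turbo libpng libwebp libcurl sqlite-libs proj geos expat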
# Copy runtime libraries and Python bindings from builder
COPY --from=builder /usr/lib/libgdal.so.* /usr/lib/
COPY --from=builder /usr/lib/python3.11/site-packages/osgeo /usr/lib/python3.11/site-packages/osgeo
COPY --from=builder /usr/lib/python3.11/site-packages/gdal.py /usr/lib/python3.11/site-packages/
COPY --from=builder /usr/share/gdal /usr/share/gdal
COPY --from=builder /usr/share/proj /usr/share/proj
# Configure GDAL/PROJ data paths and Python environment
ENV GDAL_DATA=/usr/share/gdal
ENV PROJ_LIB=/usr/share/proj
ENV PYTHONPATH=/usr/lib/python3.11/site-packages
ENV PYTHONUNBUFFERED=1
WORKDIR /app
COPY requirements.txt .
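# Alpine 3.19 marks the system Python as externally managed (PEP 668),
# so pip needs this flag (or a virtual environment) to install packages.
RUN pip install --no-cache-dir --break-system-packages -r requirements.txt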
COPY . .
CMD ["python3", "handler.py"]
Critical Compatibility & Optimization Rules
musl vs glibc
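Alpine uses musl instead of the GNU C Library (glibc). Pre-built manylinux wheels (common for rasterio, fiona, shapely, or pyproj) are linked against glibc, so pip on Alpine skips them and falls back to a source build unless the project also publishes musllinux wheels. A glibc-linked binary that is forced onto Alpine fails at load time with errors such as Error loading shared library libm.so.6: No such file or directory, or with unresolved symbol errors. Compile C extensions from source within the Alpine stage, or use Alpine-native packages where the repositories provide them (apk add py3-rasterio py3-fiona). When handling Python dependencies, refer to Packaging & Dependency Management for Serverless GIS for wheel compatibility patterns and fallback strategies.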
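To surface a libc mismatch early, a container can report which C library it runs on at startup. The following is a minimal sketch using only the standard library; the function name detect_libc and the loader-path heuristic are illustrative assumptions, not part of pip or GDAL.

```python
import glob
import platform


def detect_libc(loader_glob: str = "/lib/ld-musl-*.so.1") -> str:
    """Return "glibc", "musl", or "unknown" for the running interpreter.

    Heuristic (an assumption of this sketch): platform.libc_ver() names
    glibc when it finds it; musl is inferred from the presence of its
    dynamic loader under /lib.
    """
    name, _version = platform.libc_ver()
    if name == "glibc":
        return "glibc"
    if glob.glob(loader_glob):
        return "musl"
    return "unknown"


print(detect_libc())
```

Logging this value once at cold start makes it easier to tell a wheel-compatibility failure apart from an ordinary missing dependency.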
PROJ Data & Coordinate Systems
GDAL 3+ requires PROJ 6+, which stores coordinate reference system (CRS) definitions in a SQLite database named proj.db. The COPY --from=builder /usr/share/proj directive makes that database available at runtime. If you encounter PROJ: proj_create_from_database: Cannot find proj.db, verify that PROJ_LIB (or PROJ_DATA, the preferred variable name since PROJ 9.1) points to /usr/share/proj, that the directory actually contains proj.db, and that the proj package is installed in the runtime stage.
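The proj.db lookup can be checked without loading GDAL at all. The sketch below is one possible startup check, assuming the database is located only through the PROJ_DATA or PROJ_LIB environment variables; the helper name find_proj_db is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_proj_db(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the first proj.db found via PROJ_DATA or PROJ_LIB, else None.

    Either variable may list several directories separated by os.pathsep.
    """
    for var in ("PROJ_DATA", "PROJ_LIB"):
        for directory in env.get(var, "").split(os.pathsep):
            if not directory:
                continue
            candidate = Path(directory) / "proj.db"
            if candidate.is_file():
                return candidate
    return None
```

Calling find_proj_db() at the top of the handler and failing fast when it returns None gives a clearer error than a CRS transformation that breaks mid-request.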
Environment Variables & Configuration
GDAL relies heavily on runtime configuration. You can override defaults using GDAL configuration options via ENV directives. Common serverless optimizations include:
- GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR (prevents unnecessary directory scans on object storage)
- CPL_VSIL_CURL_CACHE_SIZE=0 (disables HTTP caching in stateless environments)
- VSI_CACHE=FALSE (reduces memory footprint for large raster operations)
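The same options can be applied from Python when the image's ENV directives cannot be changed, because GDAL falls back to environment variables when resolving configuration options. This is a sketch under that assumption; the names SERVERLESS_GDAL_DEFAULTS and apply_gdal_defaults are illustrative.

```python
import os

# Values mirror the options listed above; tune them per workload.
SERVERLESS_GDAL_DEFAULTS = {
    "GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR",
    "CPL_VSIL_CURL_CACHE_SIZE": "0",
    "VSI_CACHE": "FALSE",
}


def apply_gdal_defaults(environ=os.environ, defaults=SERVERLESS_GDAL_DEFAULTS):
    """Set each option only if the deployment has not already defined it."""
    for key, value in defaults.items():
        environ.setdefault(key, value)
    return environ

# Intended use: call apply_gdal_defaults() before `from osgeo import gdal`.
```

Using setdefault keeps values from the Dockerfile or the platform's configuration authoritative, so the code only fills in what is missing.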
Size Reduction Tactics
- Strip binaries: strip --strip-unneeded removes debug symbols and reduces libgdal.so by ~30–40%.
- Avoid apk cache: Always use --no-cache or run rm -rf /var/cache/apk/* after installations.
- Pin versions: Lock alpine:3.19 and specific gdal versions to prevent unexpected dependency bloat during rebuilds.
- Verify layers: Run docker history <image> and dive <image> to identify hidden bloat in intermediate layers.
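Layer tools show where bytes were added, but not always which directory holds them. As a complement, the sketch below ranks the immediate subdirectories of a path by size from inside the container; the function name largest_dirs is hypothetical, and the script uses only the standard library.

```python
import os


def largest_dirs(root: str, top: int = 10):
    """Return up to `top` immediate subdirectories of `root` as (bytes, path),
    largest first. Symlinks are skipped so shared targets are not counted twice."""
    totals = []
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        size = 0
        for dirpath, _dirs, files in os.walk(entry.path):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    if not os.path.islink(path):
                        size += os.path.getsize(path)
                except OSError:
                    pass  # file vanished or is unreadable; ignore it
        totals.append((size, entry.path))
    return sorted(totals, reverse=True)[:top]
```

Running it against /usr/lib or /usr/share in the final image is a quick way to confirm that headers, static libraries, and caches did not survive into the runtime stage.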
Validation & Deployment Checks
Before pushing to a registry, validate the image against your serverless platform’s constraints:
- Size Check: docker images --format "{{.Repository}}:{{.Tag}} {{.Size}}" should report a size under 250MB uncompressed.
- Import Test: Run docker run --rm <image> python3 -c "from osgeo import gdal; print(gdal.__version__)" against the built image (not the bare alpine:3.19 base) to confirm bindings load without musl errors.
- CRS Validation: Execute docker run --rm <image> python3 -c "from osgeo import osr; s = osr.SpatialReference(); s.ImportFromEPSG(4326); print(s.ExportToWkt())" to verify PROJ data paths.
- Cold-Start Benchmark: Measure initialization time with time docker run --rm <image> python3 handler.py. Target < 1.5s for serverless functions.
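The size check is easy to automate in a CI step. The sketch below assumes a local Docker daemon and reads the size in bytes from docker image inspect; the 250MB budget is interpreted as decimal megabytes to match what docker images prints, and the function names are illustrative.

```python
import subprocess

# 250MB target from the checklist, in decimal bytes (an assumption of this sketch).
SIZE_LIMIT_BYTES = 250 * 1000 * 1000


def image_size_bytes(image: str) -> int:
    """Ask the local Docker daemon for the uncompressed image size in bytes."""
    completed = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Size}}", image],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(completed.stdout.strip())


def within_budget(size_bytes: int, limit_bytes: int = SIZE_LIMIT_BYTES) -> bool:
    """True when the image is strictly smaller than the size budget."""
    return size_bytes < limit_bytes
```

A pipeline could call within_budget(image_size_bytes("<image>")) and fail the build when it returns False, which catches dependency bloat before the image reaches a registry.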
By isolating build artifacts, enforcing musl compatibility, and explicitly managing GDAL/PROJ data paths, you achieve a lean, predictable container that scales efficiently across cloud-native geospatial pipelines.