Reducing Python GDAL Cold Starts with Provisioned Concurrency

Configure ProvisionedConcurrencyConfig (AWS), --min-instances (GCP), or preWarmedInstanceCount (Azure) to keep initialized containers available, then call gdal.AllRegister() at module scope so driver registration runs once per container lifecycle, not once per request. This removes the multi-second startup penalty that the Cold Start Mapping for Python GDAL sequence documents as the primary latency source for serverless raster pipelines. Combined with GDAL_DISABLE_READDIR_ON_OPEN=YES and lru_cache on driver lookups, provisioned containers can deliver consistent sub-200 ms response times regardless of traffic gaps, as long as concurrent demand stays within the provisioned count.

Why This Technique Is Needed

The Cold Start Mapping for Python GDAL sequence identifies four sequential bottlenecks: dynamic library resolution (libgdal.so, libproj.so, libgeos_c.so), driver registration across all compiled format plugins, PROJ coordinate system database loading, and package unpacking from zip layers or container image layers. In unoptimized deployments these phases compound to 4–12 seconds. Serverless platforms tear down execution environments after periods of inactivity, forcing the entire chain to repeat on the next invocation.

Provisioned concurrency interrupts this cycle in two ways. First, the platform keeps a fixed number of containers alive and fully initialized between requests — loaded shared libraries stay resident in process memory, and GDAL’s internal driver registry is never torn down. Second, pairing provisioned instances with module-scope initialization (rather than per-handler initialization) guarantees that when the platform calls your initialization code during container provisioning, GDAL is ready before the first request arrives.

Without this approach, any geospatial API that scales from zero (tile servers, on-demand COG subsetting, spatial query endpoints) will spike to unacceptable latency during traffic ramp-ups or after overnight inactivity windows. Ephemeral Storage Limits in AWS Lambda can add pressure: if GDAL data files are unpacked to /tmp on every cold start, the I/O cost compounds the startup penalty.

Prerequisites

Before implementing, confirm all of the following:

Python runtime: 3.11 or 3.12. GDAL wheels built against older runtimes may silently drop format drivers when linked against the provider’s bundled libpython.
GDAL version: 3.6 or later. Earlier versions lack gdal.GetDriverCount() reliability fixes that affect AllRegister() idempotency.
Layer or container image: GDAL must be present as a Lambda Layer (e.g., arn:aws:lambda:…:gdal36-python312) or baked into a container image based on public.ecr.aws/lambda/python:3.12. Mixed zip + layer deployments require the layer to be listed before the function code.
Environment variables set in function configuration (not in code):
- GDAL_DATA=/opt/share/gdal: path to the GDAL data directory inside the layer or container
- PROJ_LIB=/opt/share/proj: path to the PROJ resource directory (proj.db and any datum grids); PROJ 9.1 and later prefer the name PROJ_DATA
- LD_LIBRARY_PATH=/opt/lib: path containing libgdal.so, libproj.so, and libgeos_c.so
- GDAL_DISABLE_READDIR_ON_OPEN=YES: skips the sibling-file directory listing on every Open() call
IAM permissions: lambda:PutProvisionedConcurrencyConfig and lambda:GetProvisionedConcurrencyConfig for the deploying principal; function execution role needs no additional permissions for this optimization.
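Memory allocation: 1024 MB minimum. GDAL’s driver registration allocates internal C++ caches, and Lambda allocates CPU in proportion to configured memory, so small allocations slow initialization. Lambda provides no swap: a function that exceeds its memory limit is terminated, which discards the warm instance and negates the benefit.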
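The checklist above can be turned into a small preflight check that runs in CI or at the top of a smoke test. The following is a minimal sketch, not part of the original deployment: the function name check_prerequisites and its injectable parameters are hypothetical, and it assumes AWS_LAMBDA_FUNCTION_MEMORY_SIZE (set by the Lambda runtime, in MB) is the memory source. It only inspects the environment and does not import GDAL.

```python
import os
import sys

# Variables from the checklist whose values must resolve to real directories.
REQUIRED_DIRS = ("GDAL_DATA", "PROJ_LIB", "LD_LIBRARY_PATH")
MIN_PYTHON = (3, 11)
MIN_MEMORY_MB = 1024


def check_prerequisites(env, python_version=sys.version_info[:2], isdir=os.path.isdir):
    """Return a list of problems; an empty list means every check passed.

    `env` is a mapping such as os.environ. `python_version` and `isdir`
    are injectable so the check can be exercised without a real layer.
    """
    problems = []
    if tuple(python_version) < MIN_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} is older than 3.11"
        )
    for name in REQUIRED_DIRS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        # Path-list variables pass if at least one entry exists.
        elif not any(isdir(p) for p in value.split(os.pathsep) if p):
            problems.append(f"{name} has no existing directory: {value}")
    if env.get("GDAL_DISABLE_READDIR_ON_OPEN") is None:
        problems.append("GDAL_DISABLE_READDIR_ON_OPEN is not set")
    memory = env.get("AWS_LAMBDA_FUNCTION_MEMORY_SIZE")
    if memory is not None and int(memory) < MIN_MEMORY_MB:
        problems.append(f"memory is {memory} MB, below the {MIN_MEMORY_MB} MB minimum")
    return problems
```

Calling check_prerequisites(os.environ) inside the function and failing the deployment when the list is non-empty catches a misconfigured layer before provisioned instances are paid for.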

Provisioned Concurrency Configuration

Each platform exposes a different control surface. The table below shows exact configuration for all three major providers:

Platform	Configuration	Minimum Warm Instances	Billing Model
AWS Lambda	`ProvisionedConcurrencyConfig` via SAM, CDK, or Console; supports auto-scaling via Application Auto Scaling	1 per function version/alias	Per GB-second of provisioned concurrency (billed while idle), plus per-request and duration charges for invocations
GCP Cloud Run	`--min-instances N` flag or the `autoscaling.knative.dev/minScale` annotation in `service.yaml`; applies per revision	1 per revision	Per vCPU-second + per-GB-memory-second while idle at minimum
Azure Functions Premium	`preWarmedInstanceCount` site configuration property (not a `host.json` key); only available on Elastic Premium (EP1/EP2/EP3) plans	1 per plan	Per instance-hour billed continuously
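The Application Auto Scaling option in the AWS row can be sketched as follows. This is an illustrative outline rather than a tested deployment: the helper names scaling_requests and apply_scaling, the policy name, the capacity bounds, and the 0.7 utilization target are all assumptions, and the alias must already exist with a published version behind it. The request shapes follow the boto3 application-autoscaling client (register_scalable_target and put_scaling_policy).

```python
def scaling_requests(function_name, alias, min_capacity, max_capacity,
                     target_utilization=0.7):
    """Build the two Application Auto Scaling requests for a Lambda alias."""
    common = {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
    }
    # Bounds within which provisioned concurrency may be adjusted.
    target = {**common, "MinCapacity": min_capacity, "MaxCapacity": max_capacity}
    # Target tracking on the predefined provisioned-concurrency utilization metric.
    policy = {
        **common,
        "PolicyName": f"{function_name}-{alias}-pc-utilization",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_utilization,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization",
            },
        },
    }
    return target, policy


def apply_scaling(autoscaling_client, target, policy):
    """Register the scalable target, then attach the tracking policy."""
    autoscaling_client.register_scalable_target(**target)
    autoscaling_client.put_scaling_policy(**policy)
```

In a pipeline this would be called as apply_scaling(boto3.client("application-autoscaling"), *scaling_requests("gdal-tile-server", "live", 5, 50)). Keeping the request construction separate from the API calls makes the parameters easy to review and unit test.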

For AWS Lambda, also alias your provisioned version so traffic shifts happen atomically:

python

# deploy_provisioned.py — run as part of your CI/CD pipeline
import boto3

lambda_client = boto3.client("lambda", region_name="us-east-1")

FUNCTION_NAME = "gdal-tile-server"
ALIAS_NAME    = "live"
PROVISIONED   = 10   # start conservative; tune from spillover metrics

# 1. Publish a new immutable version from $LATEST
version_resp = lambda_client.publish_version(FunctionName=FUNCTION_NAME)
version_arn  = version_resp["FunctionArn"]
version_num  = version_resp["Version"]

# 2. Point the alias at the new version
lambda_client.update_alias(
    FunctionName=FUNCTION_NAME,
    Name=ALIAS_NAME,
    FunctionVersion=version_num,
)

# 3. Attach provisioned concurrency to the alias (not $LATEST)
lambda_client.put_provisioned_concurrency_config(
    FunctionName=FUNCTION_NAME,
    Qualifier=ALIAS_NAME,
    ProvisionedConcurrentExecutions=PROVISIONED,
)

print(f"Provisioned {PROVISIONED} warm instances on {FUNCTION_NAME}:{ALIAS_NAME} (v{version_num})")

Always attach provisioned concurrency to a published version or alias, never to $LATEST. AWS does not allow provisioned concurrency on $LATEST, and GCP ignores min-instances on revisions that receive no traffic share.
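Provisioned concurrency is allocated asynchronously, so a deployment script may want to wait until the instances are ready before shifting traffic or running probes. The sketch below polls get_provisioned_concurrency_config until its Status field reports READY. The helper name wait_until_ready, the timeout, and the polling interval are assumptions; the client is passed in so the function works with any boto3 Lambda client.

```python
import time


def wait_until_ready(lambda_client, function_name, qualifier,
                     timeout_s=600, poll_s=5, sleep=time.sleep):
    """Poll until provisioned concurrency on `qualifier` is READY.

    Returns the final configuration dict. Raises RuntimeError if the
    allocation fails and TimeoutError if it is still pending after
    roughly `timeout_s` seconds of waiting.
    """
    waited = 0
    while True:
        cfg = lambda_client.get_provisioned_concurrency_config(
            FunctionName=function_name, Qualifier=qualifier
        )
        status = cfg["Status"]  # IN_PROGRESS, READY, or FAILED
        if status == "READY":
            return cfg
        if status == "FAILED":
            reason = cfg.get("StatusReason", "no reason given")
            raise RuntimeError(f"Provisioning failed: {reason}")
        if waited >= timeout_s:
            raise TimeoutError(
                f"{function_name}:{qualifier} still {status} after {waited} s"
            )
        sleep(poll_s)
        waited += poll_s
```

Called after put_provisioned_concurrency_config, for example wait_until_ready(lambda_client, FUNCTION_NAME, ALIAS_NAME), this keeps verification probes from hitting on-demand instances while allocation is still in progress.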

Implementation

The following handler pattern ensures GDAL initializes exactly once per container lifecycle. The SVG below illustrates the request lifecycle difference between cold and warm paths.

python

import os
import json
import logging
from functools import lru_cache
from osgeo import gdal

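logger = logging.getLogger(__name__)
# The effective level defaults to WARNING; raise it so the init message below is emitted.
logger.setLevel(logging.INFO)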

# ---------------------------------------------------------------------------
# Module-level initialization — executes once per container lifecycle.
# On a provisioned-concurrency instance this runs at provisioning time,
# not at request time, so the first handler call hits an already-warm GDAL.
# ---------------------------------------------------------------------------

def _init_gdal() -> None:
    """Pre-initialize GDAL with explicit environment validation."""
    gdal.UseExceptions()

    # GDAL_DISABLE_READDIR_ON_OPEN must be set before any Open() call.
    # It prevents GDAL from scanning sibling files on S3 or /tmp on every open —
    # a common source of 5–10 s latency in Layer-based Lambda deployments.
    gdal.SetConfigOption("GDAL_DISABLE_READDIR_ON_OPEN", "YES")

    # GDAL_DATA and PROJ_LIB are injected as function environment variables
    # (e.g., /opt/share/gdal and /opt/share/proj from the Lambda Layer).
    # Validate here so misconfigured deployments fail loudly at init time.
    gdal_data = os.environ.get("GDAL_DATA")
    proj_lib  = os.environ.get("PROJ_LIB")

    if not gdal_data or not os.path.isdir(gdal_data):
        raise RuntimeError(
            f"GDAL_DATA is unset or missing: {gdal_data!r}. "
            "Set GDAL_DATA in your function environment configuration."
        )
    if not proj_lib or not os.path.isdir(proj_lib):
        raise RuntimeError(
            f"PROJ_LIB is unset or missing: {proj_lib!r}. "
            "Set PROJ_LIB in your function environment configuration."
        )

    # Force driver registration upfront. Importing osgeo.gdal normally registers
    # the drivers already, so this call is a cheap, idempotent safeguard.
    gdal.AllRegister()

    driver_count = gdal.GetDriverCount()
    logger.info(
        "GDAL initialized",
        extra={"gdal_drivers": driver_count, "gdal_data": gdal_data, "proj_lib": proj_lib}
    )
    if driver_count < 10:
        raise RuntimeError(
            f"GDAL registered only {driver_count} drivers — Layer installation is incomplete."
        )


# Execute at import time so provisioned-concurrency containers are ready
# before the first request. Failures here abort container initialization
# and generate a clear error in CloudWatch rather than a silent handler failure.
_init_gdal()


@lru_cache(maxsize=32)
def _get_driver(driver_name: str) -> gdal.Driver:
    """Return a cached GDAL driver handle. Avoids repeated C-extension string lookups."""
    driver = gdal.GetDriverByName(driver_name)
    if driver is None:
        raise RuntimeError(
            f"GDAL driver '{driver_name}' is not registered. "
            "Verify the format is compiled into your GDAL build."
        )
    return driver


def handler(event: dict, context: object) -> dict:
    """
    Serverless entry point. GDAL is already initialized on both cold and warm paths.
    On provisioned instances the handler receives requests immediately,
    with no initialization cost.
    """
    input_path: str | None = event.get("input_path")
    if not input_path:
        return {"statusCode": 400, "body": json.dumps({"error": "Missing input_path"})}

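    try:
        # Validate the requested driver is available (result is cached after first call)
        _get_driver("GTiff")
    except RuntimeError as exc:
        logger.error("GDAL driver unavailable", extra={"error": str(exc)})
        return {"statusCode": 500, "body": json.dumps({"error": "Geospatial processing failed"})}

    try:
        # With gdal.UseExceptions() active, a failed Open() raises RuntimeError
        # instead of returning None, so the failure is handled here.
        ds = gdal.Open(input_path, gdal.GA_ReadOnly)
    except RuntimeError as exc:
        logger.error("Could not open raster", extra={"error": str(exc), "input": input_path})
        return {
            "statusCode": 422,
            "body": json.dumps({"error": f"Could not open raster: {input_path}"})
        }

    try:
        meta = {
            "bands":        ds.RasterCount,
            "width":        ds.RasterXSize,
            "height":       ds.RasterYSize,
            "projection":   ds.GetProjection(),
            "geotransform": list(ds.GetGeoTransform()),
        }
        return {"statusCode": 200, "body": json.dumps(meta)}
    except RuntimeError as exc:
        logger.error("GDAL processing error", extra={"error": str(exc), "input": input_path})
        return {"statusCode": 500, "body": json.dumps({"error": "Geospatial processing failed"})}
    finally:
        ds = None  # Drop the reference so GDAL closes the file handle promptly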

Key points in this implementation:

_init_gdal() is called at module scope, not inside the handler. On a provisioned-concurrency container the platform calls this during the provisioning phase, before any request arrives.
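gdal.UseExceptions() is called before AllRegister(). Without it, GDAL returns None on failures instead of raising, which makes failures easy to miss in production. With it, a failed gdal.Open() raises RuntimeError, so open errors have to be caught as exceptions rather than detected with a None check.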
Environment variable validation at init time surfaces misconfigured deployments as noisy container failures, visible in CloudWatch Logs, rather than as silent None driver returns mid-request.
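ds = None explicitly releases the GDAL file handle. The underlying C++ dataset is closed when the last Python reference to it goes away, so dropping the reference as soon as the metadata has been read (ds = None or del ds) closes the file deterministically instead of holding it open for as long as the variable stays in scope.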

Verification

After deploying with provisioned concurrency enabled, use the following probe to confirm warm-path execution:

python

import boto3
import json
import time

client = boto3.client("lambda", region_name="us-east-1")
FUNCTION = "gdal-tile-server:live"   # invoke the alias, not $LATEST

results = []
for i in range(3):
    t0 = time.perf_counter()
    resp = client.invoke(
        FunctionName=FUNCTION,
        Payload=json.dumps({"input_path": "/vsis3/my-bucket/sample.tif"}),
    )
    elapsed_ms = (time.perf_counter() - t0) * 1000
    body = json.loads(resp["Payload"].read())
    results.append({
        "invocation":   i + 1,
        "duration_ms":  round(elapsed_ms),
        "status":       body.get("statusCode"),
        # Request ID for correlating with CloudWatch Logs; it says nothing about concurrency type
        "request_id":   resp.get("ResponseMetadata", {}).get("RequestId", "unknown"),
    })
    time.sleep(0.1)

for r in results:
    print(r)

Expected output on a correctly provisioned function (durations are measured client-side and include network latency; all three invocations should complete in under 300 ms, with none showing a multi-second cold start penalty):

code

{'invocation': 1, 'duration_ms': 187, 'status': 200, 'request_id': '…'}
{'invocation': 2, 'duration_ms': 143, 'status': 200, 'request_id': '…'}
{'invocation': 3, 'duration_ms': 156, 'status': 200, 'request_id': '…'}

To confirm provisioned concurrency is actually being used (not just on-demand warm instances), check the CloudWatch metrics for the alias. ProvisionedConcurrencyInvocations should count the probe requests, and ProvisionedConcurrencyUtilization, which is reported as a fraction between 0 and 1, should rise above 0 while they run. If ProvisionedConcurrencySpilloverInvocations is non-zero, some requests ran on on-demand instances; increase your provisioned count.
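A per-invocation check can complement the metrics. Lambda's invoke API returns the tail of the execution log, base64-encoded, when called with LogType="Tail", and the REPORT line in that tail carries an Init Duration field when initialization was billed to that invocation. The sketch below extracts it; the helper name init_duration_ms is hypothetical, and treating a missing Init Duration as evidence of a warm instance is an assumption worth confirming against your own logs.

```python
import base64
import re

# Matches the optional "Init Duration: 1234.56 ms" field of a REPORT log line.
_INIT_RE = re.compile(r"Init Duration: ([\d.]+) ms")


def init_duration_ms(log_result_b64):
    """Return the Init Duration in ms from a base64 log tail, or None if absent."""
    text = base64.b64decode(log_result_b64).decode("utf-8", errors="replace")
    match = _INIT_RE.search(text)
    return float(match.group(1)) if match else None
```

To use it with the probe above, add LogType="Tail" to the client.invoke(...) call and pass resp["LogResult"] to init_duration_ms. A value other than None on the alias suggests the request landed on an instance that was initialized on demand.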

On GCP Cloud Run, query the container/instance_count metric grouped by its state label (active or idle) and verify that the combined count does not fall below the configured minimum between requests.

Gotchas and Edge Cases

Attaching provisioned concurrency to $LATEST is rejected on AWS. The PutProvisionedConcurrencyConfig call fails with an error and no instances are provisioned. Always publish a version or attach to an alias that points to a published version.
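GDAL_DISABLE_READDIR_ON_OPEN changes how sidecar files are found. With YES, GDAL skips the directory listing, but drivers may still probe for specific sidecars such as .ovr or .aux.xml, which costs extra requests on /vsis3/ paths. EMPTY_DIR goes further: GDAL treats the directory as containing only the file being opened, so external overviews and .aux.xml metadata are not loaded at all. If your pipeline relies on sidecar files, avoid EMPTY_DIR and confirm they are still picked up under YES. See the full option list in GDAL Configuration Options.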
Azure Functions Consumption Plan does not support pre-warmed instances. The preWarmedInstanceCount setting is a site configuration property of Elastic Premium (EP1/EP2/EP3) plans, not a host.json key, and Dedicated (App Service) plans rely on Always On instead. Setting it on a Consumption plan does not keep any instances warm. Review Memory and CPU Allocation for Raster Workloads before choosing Azure as a platform for latency-sensitive GDAL workloads.
GCP Cloud Run min-instances keeps containers alive but does not guarantee pre-warmed state after a revision update. After a new revision is deployed, even minimum instances may experience one cold start cycle before reaching steady state. Issue a synthetic warm-up request as part of your deployment pipeline to force initialization before shifting traffic.
Layer size affects provisioning time (see Python Layer Management and Size Reduction). Oversized layers with unnecessary format drivers increase AllRegister() time during the provisioning phase. Reducing layer size with --no-binary builds or stripped wheels cuts the provisioning window, which shortens the time between a deployment and the first warm request being served.
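The synthetic warm-up request suggested for Cloud Run revision updates can be sketched with the standard library alone. The helper names http_status and warm_up, the latency threshold, and the attempt count are assumptions, and the URL you pass would be a hypothetical health or metadata endpoint of your own service. Note that each request warms only the instance that serves it, so this confirms that at least one instance is initialized, not all of them.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout_s=60):
    """Fetch url and return the HTTP status code, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code


def warm_up(url, attempts=5, max_latency_s=1.0,
            fetch=http_status, clock=time.perf_counter):
    """Request url until one response is both successful and fast.

    Returns the list of observed latencies in seconds. Raises RuntimeError
    if no attempt returns 200 within `max_latency_s`.
    """
    latencies = []
    for _ in range(attempts):
        start = clock()
        status = fetch(url)
        elapsed = clock() - start
        latencies.append(elapsed)
        if status == 200 and elapsed <= max_latency_s:
            return latencies
    raise RuntimeError(f"{url} did not warm up after {attempts} attempts: {latencies}")
```

Running warm_up against the new revision's URL before shifting traffic means the first slow, initializing request is absorbed by the pipeline rather than by a user.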

Frequently Asked Questions

How many provisioned instances do I need for a Python GDAL function?

Start with 5–10 instances for moderate workloads. Monitor ProvisionedConcurrencySpilloverInvocations on AWS or instance_count on GCP. If spillover consistently exceeds 5% of total requests, raise provisioned capacity in 20% increments until the rate drops below 2%.
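The sizing rule can be written down as a small calculation. This sketch is one reading of the rule, not a prescribed algorithm: the function name next_provisioned_count and the scaling_up flag (which records that an increase is already under way) are hypothetical, and the spillover and total counts are assumed to come from the same observation window.

```python
def next_provisioned_count(current, spillover, total, scaling_up=False,
                           raise_above=0.05, stop_below=0.02, step_percent=20):
    """Return the provisioned count to use for the next observation window.

    Scaling starts when the spillover rate exceeds `raise_above` and, once
    started (`scaling_up=True`), continues until it drops below `stop_below`.
    """
    if total <= 0:
        return current  # no traffic observed, nothing to conclude
    rate = spillover / total
    if rate > raise_above or (scaling_up and rate >= stop_below):
        # Integer ceiling of the percentage step, and at least one instance.
        increment = max(1, (current * step_percent + 99) // 100)
        return current + increment
    return current
```

For example, 60 spillover invocations out of 1,000 requests at 10 provisioned instances is a 6% rate, so the function returns 12. A later window at 3% returns 15 only if scaling_up is still set, and a window at 1% leaves the count unchanged.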

Does provisioned concurrency work with container image deployments?

Yes. AWS Lambda supports provisioned concurrency on both zip-packaged and container image functions. Container images often see greater benefit because the larger image layers are already extracted into the container filesystem, removing unpacking overhead from the initialization path.

Will GDAL_DISABLE_READDIR_ON_OPEN break S3 sidecar file detection?

It can. YES skips the directory listing at open time, but drivers may still probe for specific auxiliary files (.ovr, .aux.xml, .wld) with individual requests. EMPTY_DIR tells GDAL to treat the directory as containing only the file being opened, which disables sidecar loading entirely. If your pipeline depends on external overviews or world files, avoid EMPTY_DIR, or pass explicit filenames to GDAL.

Cold Start Mapping for Python GDAL — full initialization sequence breakdown and measurement techniques
Ephemeral Storage Limits in AWS Lambda — /tmp pressure that compounds cold start cost when GDAL data files are extracted on every invocation
Managing /tmp Storage Limits for GeoTIFF Extraction — streaming patterns that avoid disk materialization during warm-path raster reads
Memory and CPU Allocation for Raster Workloads — choosing the right memory tier so GDAL’s C++ allocations do not force swap during provisioning
Python Layer Management and Size Reduction — reducing GDAL layer size to lower provisioning initialization time
