Memory and CPU Allocation for Raster Workloads

On AWS Lambda, allocating 3,538 MB (2 full vCPUs) is the minimum viable tier for a single windowed 10,000 × 10,000 pixel GeoTIFF; anything below that risks OOM kills before GDAL finishes its first overview pass. On GCP Cloud Run, CPU is configured independently of memory, so you must explicitly request at least 2 vCPU alongside a 4 GB memory allocation to prevent GDAL resampling operations from stalling during request processing. Correct memory and CPU sizing is the single biggest lever for pipeline reliability, execution cost, and throughput in serverless raster architectures — misconfiguration leads to silent OOM kills, CPU throttling, and cascading timeout failures that are difficult to attribute without deliberate instrumentation.

This page covers the platform-by-platform resource model, a step-by-step sizing and configuration workflow, verification methods, and the failure signatures that indicate under-allocation.

Why This Constraint Matters for Geospatial Workloads

Raster data is memory-intensive by nature. A modest Sentinel-2 scene at 10 m resolution covers roughly 100 × 100 km, producing ~10,000 × 10,000 pixels per band. Load 12 bands as float32 and you have touched ~4.8 GB of pixel data before any reprojection, resampling, or band math occurs. Serverless platforms enforce hard memory ceilings — 10 GB on AWS Lambda, 32 GB on GCP Cloud Functions 2nd gen — and those ceilings interact with library initialization overhead in ways that do not exist in traditional VM deployments.

The interaction chain that produces OOM failures in serverless raster pipelines:

Library initialization: rasterio imports GDAL shared libraries and registers format drivers, consuming 40–120 MB before any data is loaded. The Cold Start Mapping for Python GDAL sequence confirms driver registration is the dominant cold-start cost, and it competes for the same memory budget as your pixel arrays.
Compressed-to-uncompressed expansion: A 200 MB Cloud Optimized GeoTIFF with LZW or Deflate compression can expand to 1.6 GB in memory. Allocating based on file size, not uncompressed footprint, is the most common sizing error.
GDAL internal cache: GDAL_CACHEMAX defaults to 5% of available RAM. On a 1 GB Lambda function that is only 51 MB — far below what GDAL needs for multi-overview reads. The cache fills, GDAL falls back to repeated disk reads from /tmp, and execution time balloons.
Intermediate array accumulation: Operations like rio_cogeo.cog_translate(), pyproj reprojection via rasterio.warp.reproject(), or numpy histogram equalisation all create temporary arrays alongside the source data. Peak RSS is typically 2–2.5× the raw pixel footprint at the worst moment in the pipeline.

The /tmp ceiling described in Ephemeral Storage Limits in AWS Lambda compounds this: when in-memory processing overflows into /tmp-backed GDAL virtual datasets, you can exhaust the 512 MB default /tmp allocation (expandable to 10 GB) before GDAL registers the write failure, which produces silently corrupt output rather than a raised exception.
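The gap between file size and in-memory footprint can be made concrete with a short calculation. The sketch below is illustrative only: the function name `expansion_report` and its defaults (a 2.5× peak multiplier and 120 MB of library initialization overhead) are assumptions taken from the upper ends of the ranges quoted above, not measured values.

```python
# Bytes per pixel for common raster dtypes
DTYPE_BYTES = {"uint8": 1, "uint16": 2, "int16": 2, "float32": 4, "int32": 4, "float64": 8}


def expansion_report(file_size_bytes, width, height, band_count, dtype="float32",
                     peak_multiplier=2.5, init_overhead_bytes=120 * 1024 * 1024):
    """Compare on-disk size with the estimated in-memory footprint (hypothetical helper)."""
    raw = width * height * band_count * DTYPE_BYTES[dtype]
    return {
        "raw_bytes": raw,                                # uncompressed pixel data
        "expansion_ratio": raw / file_size_bytes,        # how far compression hides the real size
        "peak_bytes": int(raw * peak_multiplier) + init_overhead_bytes,  # assumed worst-case RSS
    }


# A 200 MB compressed COG holding a 4-band float32 raster of 10,000 x 10,000 pixels
report = expansion_report(200_000_000, 10_000, 10_000, 4)
print(f"raw: {report['raw_bytes'] / 1e9:.2f} GB, "
      f"expansion: {report['expansion_ratio']:.1f}x, "
      f"estimated peak: {report['peak_bytes'] / 1e9:.2f} GB")
```

Sizing from `peak_bytes` rather than from the file size avoids the most common error listed above.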

Platform-by-Platform Limits

The three major serverless platforms differ in how memory and CPU are related, billed, and configured. The table lists each platform's documented limits together with the minimum tier this guide recommends for raster work.

	AWS Lambda	GCP Cloud Run	Azure Functions (Consumption)
Max memory	10,240 MB	32 GB (per instance)	1.5 GB
Max vCPU	~6 vCPU at 10,240 MB	8 vCPU (configurable)	~1 vCPU
CPU-memory coupling	Proportional (1 vCPU per 1,769 MB)	Independent	Fixed per plan tier
Billing unit	GB-seconds (memory × duration)	vCPU-seconds + GB-seconds separately	Execution count + GB-seconds
Min viable raster tier	3,538 MB (2 vCPU)	4 GB + 2 vCPU (explicit)	Premium EP2 (7 GB, 2 vCPU)
Max execution time	15 minutes	60 minutes (HTTP); longer per-task timeouts (jobs)	10 minutes (Consumption)
Config mechanism	`MemorySize` in function config	`--memory`, `--cpu` flags or YAML	`functionTimeout` + plan selection
CPU during idle	N/A (invocation-scoped)	Throttled to 0.08 vCPU unless `--no-cpu-throttling`	N/A

Azure Functions Consumption plan is not suitable for heavy raster processing. With a 1.5 GB memory ceiling and a single vCPU, even a multi-band 5,000 × 5,000 pixel float32 mosaic is likely to OOM once the GDAL cache and intermediate arrays are counted. Use the Premium EP2 or EP3 plan (7 GB / 14 GB, 2 / 4 vCPU) for any GDAL-backed transformation workload.

GCP Cloud Run’s idle CPU throttling is a hidden failure mode: if you allocate 2 vCPU but do not pass --no-cpu-throttling, long-running GDAL warps that pause between tile writes may stall as the container drops to 0.08 vCPU between request handler yields.

Memory–CPU Scaling Diagram

[Diagram: AWS Lambda's proportional vCPU scaling across the memory range relevant to raster workloads, annotated with the recommended minimum and maximum tiers.]
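The same relationship can be tabulated in a few lines of Python. This is a sketch under one stated assumption: vCPU share is interpolated linearly from the 1,769 MB = 1 vCPU point and capped at 6 vCPU. Those two anchor points come from the text above, but the linear interpolation between them and the helper name `approx_lambda_vcpu` are illustrative.

```python
LAMBDA_MB_PER_VCPU = 1769   # memory at which Lambda grants one full vCPU
LAMBDA_MAX_VCPU = 6         # ceiling reached at the 10,240 MB maximum


def approx_lambda_vcpu(memory_mb: int) -> float:
    """Approximate vCPU share for a Lambda memory setting (linear model, an assumption)."""
    if not 128 <= memory_mb <= 10_240:
        raise ValueError("Lambda memory must be between 128 and 10,240 MB")
    return min(LAMBDA_MAX_VCPU, memory_mb / LAMBDA_MB_PER_VCPU)


# Print the tiers discussed on this page
for mb in (1769, 3538, 4096, 7076, 10_240):
    print(f"{mb:>6} MB -> ~{approx_lambda_vcpu(mb):.1f} vCPU")
```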

Step-by-Step Implementation

Step 1 — Calculate the Raw Pixel Footprint

Before touching any configuration, establish the mathematical memory floor for your typical input:

python

def raster_memory_floor_bytes(width: int, height: int, band_count: int, dtype_bytes: int = 4) -> int:
    """
    Return a safe Lambda allocation floor in bytes for a raster.
    dtype_bytes: 1=uint8, 2=uint16/int16, 4=float32/int32, 8=float64
    The uncompressed footprint is multiplied by 2.5 inside this function.
    """
    raw = width * height * band_count * dtype_bytes
    # 2.5x accounts for GDAL cache, numpy intermediates, Python object overhead
    return int(raw * 2.5)

# Example: Sentinel-2 L2A, 4 bands, float32
floor = raster_memory_floor_bytes(10980, 10980, 4, 4)
print(f"Recommended minimum: {floor / 1e9:.2f} GB")
# → Recommended minimum: 4.82 GB

For Lambda, round up from your calculated floor. Lambda memory is configurable in 1 MB steps between 128 MB and 10,240 MB, so a 4.82 GB floor (4,822 MB in decimal units) can be rounded up to a convenient value such as 4,864 MB.
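The rounding step can be expressed as a small helper. This is a minimal sketch: the name `lambda_memory_size_mb` and the `step_mb` rounding granularity are assumptions for illustration, and the byte count is converted using decimal megabytes, which is the conservative reading of the floor.

```python
import math

LAMBDA_MIN_MB = 128
LAMBDA_MAX_MB = 10_240


def lambda_memory_size_mb(floor_bytes: int, step_mb: int = 64) -> int:
    """Round a byte floor up to a MemorySize value (hypothetical helper)."""
    floor_mb = math.ceil(floor_bytes / 1_000_000)        # decimal MB, rounds the floor up
    rounded = math.ceil(floor_mb / step_mb) * step_mb    # next multiple of step_mb
    if rounded > LAMBDA_MAX_MB:
        # The input cannot fit in one function; partition it into tiles instead
        raise ValueError(f"{rounded} MB exceeds the Lambda maximum of {LAMBDA_MAX_MB} MB")
    return max(LAMBDA_MIN_MB, rounded)


# Floor from the Sentinel-2 example: 10980 * 10980 * 4 bands * 4 bytes * 2.5
print(lambda_memory_size_mb(4_822_416_000))
```

Raising an error above the maximum, rather than clamping, makes it obvious when an input needs to be tiled rather than given more memory.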

Step 2 — Profile Peak RSS with a Representative Dataset

Use memory_profiler against a real tile before committing to a tier:

python

# profile_raster_handler.py
import os

# Set GDAL configuration before importing rasterio so GDAL and PROJ see it at load time.
os.environ["GDAL_DATA"] = "/opt/gdal/share/gdal"
os.environ["PROJ_LIB"] = "/opt/share/proj"
os.environ["GDAL_CACHEMAX"] = "512"        # MB; tune in step 3
os.environ["GDAL_NUM_THREADS"] = "2"       # match target vCPU
os.environ["CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE"] = "YES"
# LD_LIBRARY_PATH is read by the dynamic loader at process start, so set it in the
# function or container environment (for example /opt/lib), not from Python.

import rasterio  # noqa: E402
from memory_profiler import profile  # noqa: E402

TILE_SIZE = 1024  # pixels per window edge

@profile
def process_raster(src_path: str, dst_path: str) -> None:
    with rasterio.open(src_path) as src:
        # Named out_profile so it does not shadow the memory_profiler decorator
        out_profile = src.profile.copy()
        # NDVI is a single float32 band, so override count and dtype from the source
        out_profile.update(driver="GTiff", count=1, dtype="float32",
                           compress="deflate", tiled=True,
                           blockxsize=TILE_SIZE, blockysize=TILE_SIZE)

        with rasterio.open(dst_path, "w", **out_profile) as dst:
            for _, window in src.block_windows(1):
                data = src.read(window=window)
                # Band math: NDVI = (NIR - Red) / (NIR + Red)
                nir = data[3].astype("float32")
                red = data[2].astype("float32")
                ndvi = (nir - red) / (nir + red + 1e-6)
                dst.write(ndvi, 1, window=window)

if __name__ == "__main__":
    process_raster("s3://bucket/sentinel2_tile.tif", "/tmp/ndvi_output.tif")

Run locally against a tile matching your production input dimensions:

bash

python -m memory_profiler profile_raster_handler.py 2>&1 | grep "MiB"

The “Increment” column in the output shows per-line memory growth. The peak “Mem usage” line is your observed RSS; multiply by 1.2 as a safety buffer when selecting a memory tier.

Step 3 — Configure GDAL Environment Variables

Set these explicitly in every deployment. Never rely on platform defaults — they are tuned for general workloads, not GDAL.

python

# gdal_config.py — import this at the top of your Lambda/Cloud Function handler
import os

def configure_gdal(allocated_memory_mb: int, vcpu_count: float) -> None:
    """
    Apply GDAL tuning for the given serverless allocation.
    Call once at module load, not inside the handler, to avoid per-invocation overhead.
    """
    cache_mb = int(allocated_memory_mb * 0.25)  # 25% of allocation
    threads = max(1, int(vcpu_count))

    os.environ.setdefault("GDAL_DATA", "/opt/gdal/share/gdal")
    os.environ.setdefault("PROJ_LIB", "/opt/share/proj")
    # Has no effect on the already-running process; it only reaches child processes.
    # Set LD_LIBRARY_PATH in the function configuration so the loader sees it at start.
    os.environ.setdefault("LD_LIBRARY_PATH", "/opt/lib")
    os.environ["GDAL_CACHEMAX"] = str(cache_mb)
    os.environ["GDAL_NUM_THREADS"] = str(threads)
    # Stage random-access writes to remote /vsi targets in a local temp file
    os.environ["CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE"] = "YES"
    # Avoid GDAL opening auxiliary .aux.xml files; unnecessary I/O in serverless
    os.environ["GDAL_PAM_ENABLED"] = "NO"
    # Keep GDAL debug output off so it does not inflate CloudWatch log costs
    os.environ["CPL_DEBUG"] = "OFF"

# For a 4096 MB Lambda (≈2.3 vCPU):
configure_gdal(allocated_memory_mb=4096, vcpu_count=2.3)

Step 4 — Configure the Platform Allocation

AWS Lambda (CLI):

bash

aws lambda update-function-configuration \
  --function-name raster-ndvi-processor \
  --memory-size 4096 \
  --ephemeral-storage '{"Size": 4096}' \
  --timeout 300

See the step-by-step guide for configuring 10 GB memory on AWS Lambda for the maximum-tier configuration including CDK and Terraform examples.
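The same update can be applied from Python with boto3. The sketch below separates building the request from sending it so the values can be validated first; the helper names `build_lambda_config` and `apply_lambda_config` are hypothetical, while `update_function_configuration` and its `MemorySize`, `EphemeralStorage`, and `Timeout` parameters are the boto3 Lambda client's own.

```python
def build_lambda_config(function_name: str, memory_mb: int,
                        ephemeral_mb: int, timeout_s: int) -> dict:
    """Validate and assemble arguments for update_function_configuration."""
    if not 128 <= memory_mb <= 10_240:
        raise ValueError("MemorySize must be between 128 and 10,240 MB")
    if not 512 <= ephemeral_mb <= 10_240:
        raise ValueError("Ephemeral storage must be between 512 and 10,240 MB")
    if not 1 <= timeout_s <= 900:
        raise ValueError("Timeout must be between 1 and 900 seconds")
    return {
        "FunctionName": function_name,
        "MemorySize": memory_mb,
        "EphemeralStorage": {"Size": ephemeral_mb},
        "Timeout": timeout_s,
    }


def apply_lambda_config(config: dict, region_name: str = "eu-west-1") -> dict:
    """Send the configuration to AWS; needs boto3 and valid credentials."""
    import boto3  # imported here so the builder above works without boto3 installed
    client = boto3.client("lambda", region_name=region_name)
    return client.update_function_configuration(**config)


config = build_lambda_config("raster-ndvi-processor", 4096, 4096, 300)
# apply_lambda_config(config)  # uncomment to apply against a real function
```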

GCP Cloud Run (YAML manifest):

yaml

# cloudrun-raster-service.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: raster-processor
spec:
  template:
    metadata:
      annotations:
        # Allocate CPU continuously — prevents stall during long GDAL warps
        run.googleapis.com/cpu-throttling: "false"
    spec:
      containerConcurrency: 1   # one raster job per instance to avoid memory contention
      containers:
        - image: gcr.io/PROJECT/raster-processor:latest
          resources:
            limits:
              memory: "8Gi"
              cpu: "4"
          env:
            - name: GDAL_DATA
              value: "/usr/share/gdal"
            - name: PROJ_LIB
              value: "/usr/share/proj"
            - name: GDAL_CACHEMAX
              value: "2048"
            - name: GDAL_NUM_THREADS
              value: "4"

Azure Functions Premium (host.json + Bicep):

json

// host.json for Azure Functions Premium EP2 (7 GB, 2 vCPU)
{
  "version": "2.0",
  "functionTimeout": "00:10:00",
  "extensions": {
    "blobs": { "maxDegreeOfParallelism": 2 }
  }
}

bicep

// azure-function-app.bicep
resource functionApp 'Microsoft.Web/sites@2023-01-01' = {
  name: 'raster-processor'
  properties: {
    siteConfig: {
      appSettings: [
        { name: 'GDAL_DATA', value: '/home/site/wwwroot/.python_packages/lib/python3.11/site-packages/osgeo/gdal-data' }
        { name: 'PROJ_LIB', value: '/home/site/wwwroot/.python_packages/lib/python3.11/site-packages/pyproj/proj_dir/share/proj' }
        { name: 'GDAL_CACHEMAX', value: '768' }
        { name: 'GDAL_NUM_THREADS', value: '2' }
      ]
    }
  }
}

Step 5 — Implement Windowed Reads Aligned to Block Boundaries

Never load entire rasters into memory. Align window sizes to the blockxsize/blockysize in the GeoTIFF header to avoid I/O amplification:

python

import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Tuple

import numpy as np
import rasterio
from rasterio.windows import Window

def aligned_windows(src: rasterio.DatasetReader, tile_size: int = 1024) -> Iterator[Window]:
    """
    Yield windows aligned to the source block structure.
    Misaligned windows cause GDAL to re-read partial blocks, multiplying I/O.
    """
    block_shapes = src.block_shapes
    if block_shapes:
        bh, bw = block_shapes[0]   # rasterio reports (height, width)
    else:
        bh, bw = tile_size, tile_size

    for y in range(0, src.height, bh):
        for x in range(0, src.width, bw):
            yield Window(x, y, min(bw, src.width - x), min(bh, src.height - y))

def process_raster_windowed(src_path: str, dst_path: str, max_workers: int = 2) -> None:
    """
    Process a raster in aligned windows using ThreadPoolExecutor.
    rasterio releases the GIL during I/O, so threading scales for read/write ops.
    """
    local = threading.local()
    handles: List[rasterio.DatasetReader] = []

    def read_window(win: Window) -> Tuple[Window, np.ndarray]:
        # Dataset handles are not safe to share across threads, so each worker opens its own
        if not hasattr(local, "src"):
            local.src = rasterio.open(src_path)
            handles.append(local.src)
        return win, local.src.read(window=win)

    with rasterio.open(src_path) as src:
        profile = src.profile.copy()
        profile.update(dtype="uint16", compress="deflate", tiled=True,
                       blockxsize=512, blockysize=512)
        windows = list(aligned_windows(src))

    batch = max_workers * 2  # cap how many decoded windows wait in memory at once
    try:
        with rasterio.open(dst_path, "w", **profile) as dst, \
                ThreadPoolExecutor(max_workers=max_workers) as executor:
            for i in range(0, len(windows), batch):
                for win, data in executor.map(read_window, windows[i:i + batch]):
                    # Example transform: scale to uint16. The per-window maximum is for
                    # illustration only; use a dataset-wide maximum for consistent output.
                    peak = max(float(data.max()), 1e-6)
                    scaled = (data.astype("float32") / peak * 65535).astype("uint16")
                    dst.write(scaled, window=win)
    finally:
        for handle in handles:
            handle.close()

Measurement and Verification

CloudWatch Metrics (AWS Lambda)

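After deploying with the new configuration, run 50+ invocations against representative inputs and query the results. The built-in AWS/Lambda namespace covers Duration but has no memory metric; peak memory appears in each invocation's REPORT log line and, when the Lambda Insights extension is enabled, as the used_memory_max metric in the LambdaInsights namespace. The query below assumes Lambda Insights is enabled:

python

import boto3
from datetime import datetime, timedelta, timezone

cw = boto3.client("cloudwatch", region_name="eu-west-1")

# (namespace, metric, dimension name); used_memory_max requires Lambda Insights
METRICS = (
    ("LambdaInsights", "used_memory_max", "function_name"),
    ("AWS/Lambda", "Duration", "FunctionName"),
)

def get_memory_utilization(function_name: str, hours: int = 1) -> dict:
    """
    Retrieve peak memory (MB) and Duration (ms) for the last N hours.
    A healthy function should peak at <80% of MemorySize.
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    results = {}

    for namespace, metric, dim_name in METRICS:
        common = dict(
            Namespace=namespace,
            MetricName=metric,
            Dimensions=[{"Name": dim_name, "Value": function_name}],
            StartTime=start,
            EndTime=end,
            Period=3600,
        )
        # Standard and percentile statistics cannot be requested in the same call
        basic = cw.get_metric_statistics(Statistics=["Maximum", "Average"], **common)
        pct = cw.get_metric_statistics(ExtendedStatistics=["p99"], **common)
        results[metric] = {"basic": basic["Datapoints"], "p99": pct["Datapoints"]}
        print(f"{metric}: {results[metric]}")
    return results

get_memory_utilization("raster-ndvi-processor")
# Expected: used_memory_max p99 < 3276 MB for a 4096 MB function (80% ceiling)
# Expected: Duration p99 < 120000 ms for 10,000×10,000 input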

Set a CloudWatch alarm at 80% of allocated memory to catch drift before OOM kills occur. The alarm below uses the Lambda Insights used_memory_max metric, since the built-in AWS/Lambda namespace has no memory metric:

bash

aws cloudwatch put-metric-alarm \
  --alarm-name "raster-processor-memory-high" \
  --metric-name used_memory_max \
  --namespace LambdaInsights \
  --statistic Maximum \
  --period 60 \
  --threshold 3276 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=function_name,Value=raster-ndvi-processor \
  --evaluation-periods 3 \
  --alarm-actions arn:aws:sns:eu-west-1:123456789012:alerts

GCP Cloud Run Metrics

bash

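# Query container memory utilization over the last hour via the Cloud Monitoring API
# (requires GNU date; replace PROJECT with your project ID)
START=$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)
END=$(date -u +%Y-%m-%dT%H:%M:%SZ)
curl -s -G \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  --data-urlencode 'filter=metric.type="run.googleapis.com/container/memory/utilizations" AND resource.labels.service_name="raster-processor"' \
  --data-urlencode "interval.startTime=${START}" \
  --data-urlencode "interval.endTime=${END}" \
  "https://monitoring.googleapis.com/v3/projects/PROJECT/timeSeries"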

Expected output: mean utilization < 0.8 (80%) for a healthy allocation.

Failure Modes and Debugging

1. Silent OOM Kill (no exception in logs)

Signature: The invocation's REPORT line shows Max Memory Used equal to Memory Size; handler logs stop abruptly with no Python traceback; Lambda typically records Runtime exited with error: signal: killed (Runtime.ExitError); downstream consumers see no output.

Root cause: The Linux OOM killer terminates the Python process with SIGKILL before it can log or raise an exception. The equivalent container exit status is 137 (128 + 9).

Fix: Increase MemorySize in steps of at least 512 MB until p99 peak memory falls below 80% of the allocation. For a 1,769 MB function processing a 4-band float32 tile, you likely need 3,538 MB minimum.
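Because the REPORT line carries both the configured and the peak memory, saturation can be detected directly from logs. The sketch below is a minimal parser; the function name `memory_headroom`, the 0.8 threshold default, and the sample log line are illustrative, and the regular expression assumes the usual `Memory Size: N MB` / `Max Memory Used: N MB` layout of Lambda REPORT lines.

```python
import re
from typing import Optional

_REPORT_RE = re.compile(r"Memory Size:\s*(\d+)\s*MB\s+Max Memory Used:\s*(\d+)\s*MB")


def memory_headroom(report_line: str, warn_ratio: float = 0.8) -> Optional[dict]:
    """Extract memory figures from a Lambda REPORT line; None if the line does not match."""
    match = _REPORT_RE.search(report_line)
    if match is None:
        return None
    size_mb, used_mb = int(match.group(1)), int(match.group(2))
    ratio = used_mb / size_mb
    return {
        "memory_size_mb": size_mb,
        "max_used_mb": used_mb,
        "ratio": ratio,
        "over_threshold": ratio > warn_ratio,   # True means the tier is too small
    }


# Hypothetical REPORT line from an invocation that hit its ceiling
sample = ("REPORT RequestId: 00000000-0000-0000-0000-000000000000\t"
          "Duration: 41250.10 ms\tBilled Duration: 41251 ms\t"
          "Memory Size: 1769 MB\tMax Memory Used: 1769 MB")
print(memory_headroom(sample))
```

Running this over a CloudWatch Logs export gives a per-invocation view that does not depend on any optional metrics being enabled.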

2. GDAL Warp Timeout Without Progress

Signature: Task timed out after 300.00 seconds; CloudWatch Duration metric consistently near the configured timeout; no obvious error in logs; output GeoTIFF exists but is truncated.

Root cause: Insufficient vCPU causes GDAL’s internal thread pool to queue work faster than it executes. On GCP Cloud Run without --no-cpu-throttling, the container drops to 0.08 vCPU between write operations, stalling the warp.

Fix: On Lambda, move to a higher memory tier (vCPU scales proportionally). On Cloud Run, add the run.googleapis.com/cpu-throttling: "false" annotation. Confirm: execution time should drop by 40–70% at 4× vCPU versus 1 vCPU for a CPU-bound warp.

3. `GDAL_CACHEMAX` Starvation

Signature: GDAL logs GDAL: GDALDefaultOverviews::BuildOverviews(): building overview, taking long time; repeated disk seeks visible in /proc/self/io reads; execution time 3–5× expected.

Root cause: Default GDAL_CACHEMAX (5% of RAM) is too small to hold frequently accessed overview tiles in memory. GDAL repeatedly re-reads blocks from /tmp or object storage.

Fix: Set GDAL_CACHEMAX to 25% of allocated memory. For a 4,096 MB function: GDAL_CACHEMAX=1024. Never set it above 35% — GDAL does not release this cache to numpy, and array operations will page.

4. Thread Contention from `GDAL_NUM_THREADS=ALL_CPUS`

Signature: Intermittent OSError: [Errno 11] Resource temporarily unavailable during concurrent window reads; CPU utilization in CloudWatch spikes then drops to zero in an alternating pattern.

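Root cause: ALL_CPUS sizes GDAL's thread pool from the CPU count the operating system reports, which is not guaranteed to match the fractional vCPU share Lambda actually schedules for the function. When those GDAL worker threads run alongside your own ThreadPoolExecutor workers, the process is oversubscribed and threads spend their time contending for CPU rather than doing work.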

Fix: Set GDAL_NUM_THREADS to the integer vCPU count, not ALL_CPUS. For 4,096 MB Lambda: GDAL_NUM_THREADS=2. For 10,240 MB: GDAL_NUM_THREADS=6.

5. Azure Consumption Plan OOM (1.5 GB Ceiling)

Signature: Azure Functions host returns HTTP 500; Application Insights shows OutOfMemoryException in the .NET host wrapper (not in your Python code); function runtime restarts.

Root cause: Azure Consumption plan’s 1.5 GB ceiling is below the minimum viable raster footprint. The OOM originates in the Functions host, not your code, so Python-level tracemalloc does not capture it.

Fix: Migrate to a Premium plan: EP2 (7 GB, 2 vCPU) or EP3 (14 GB, 4 vCPU). On the Consumption plan, multi-band float32 rasters larger than approximately 3,000 × 3,000 pixels are unlikely to fit.

Cost and Scaling Considerations

AWS Lambda: The Memory-Duration Tradeoff

Lambda billing is GB-seconds = (MemorySize_GB × Duration_ms / 1000). Because vCPU scales with memory, a higher allocation typically reduces duration enough to produce a net cost reduction up to a crossover point.

python

def lambda_cost_gb_seconds(memory_mb: int, duration_ms: float, price_per_gbs: float = 0.0000166667) -> float:
    """
    Estimate Lambda cost for one invocation.
    price_per_gbs: current AWS Lambda price in USD (us-east-1, arm64 is ~20% cheaper)
    """
    gb = memory_mb / 1024
    seconds = duration_ms / 1000
    return gb * seconds * price_per_gbs

# Hypothetical 10,000×10,000 reprojection:
# At 1,769 MB: takes 180,000 ms (GIL-constrained, 1 vCPU)
cost_low = lambda_cost_gb_seconds(1769, 180_000)   # → $0.0052

# At 3,538 MB: takes 70,000 ms (2 vCPU, parallel I/O)
cost_mid = lambda_cost_gb_seconds(3538, 70_000)    # → $0.0040

# At 7,076 MB: takes 32,000 ms (4 vCPU)
cost_high = lambda_cost_gb_seconds(7076, 32_000)   # → $0.0037

print(f"1769 MB: ${cost_low:.4f} | 3538 MB: ${cost_mid:.4f} | 7076 MB: ${cost_high:.4f}")

For most raster reprojection and band-math workloads, the cost minimum is in the 3,538–7,076 MB range. Beyond ~7 GB, the parallelism gains diminish for pure Python workloads constrained by the GIL.
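Once you have measured durations at several tiers, finding the cost minimum is a one-line comparison. The sketch below reuses the billing formula above; the helper name `cheapest_tier` is an assumption, and the durations are the hypothetical figures from the example, not benchmarks.

```python
from typing import Dict, Tuple


def lambda_cost_usd(memory_mb: int, duration_ms: float,
                    price_per_gbs: float = 0.0000166667) -> float:
    """Cost of one invocation: GB x seconds x price per GB-second."""
    return (memory_mb / 1024) * (duration_ms / 1000) * price_per_gbs


def cheapest_tier(measurements: Dict[int, float]) -> Tuple[int, float]:
    """Return (memory_mb, cost_usd) for the cheapest measured tier."""
    costs = {mb: lambda_cost_usd(mb, ms) for mb, ms in measurements.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]


# memory_mb -> measured duration in ms (hypothetical values from the example above)
measurements = {1769: 180_000, 3538: 70_000, 7076: 32_000}
best_mb, best_cost = cheapest_tier(measurements)
print(f"Cheapest tier: {best_mb} MB at ${best_cost:.4f} per invocation")
```

Feed it p50 or p99 durations from your own runs; the cheapest tier can shift when the workload changes from I/O-bound to CPU-bound.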

GCP Cloud Run: Independent Billing

Cloud Run bills vCPU-seconds and GB-seconds separately. For raster jobs where CPU is the bottleneck, increasing vCPU without increasing memory can reduce total cost, provided duration falls faster than the vCPU count rises. Profile with --cpu 1 versus --cpu 4 and compare the vCPU-second charge (vCPU × seconds × $0.00002400) at each setting.
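That comparison can be sketched as follows. The vCPU price is the figure quoted above; the memory price default ($0.0000025 per GiB-second), the function name `cloud_run_cost_usd`, and the two durations are assumptions for illustration, so substitute current pricing and your own measurements.

```python
def cloud_run_cost_usd(vcpu: float, memory_gib: float, duration_s: float,
                       price_per_vcpu_s: float = 0.000024,
                       price_per_gib_s: float = 0.0000025) -> float:
    """Per-request cost when CPU and memory are billed separately (prices are assumptions)."""
    return duration_s * (vcpu * price_per_vcpu_s + memory_gib * price_per_gib_s)


# Hypothetical profile of one warp at a fixed 4 GiB: 200 s on 1 vCPU, 60 s on 4 vCPU
one_vcpu = cloud_run_cost_usd(vcpu=1, memory_gib=4, duration_s=200)
four_vcpu = cloud_run_cost_usd(vcpu=4, memory_gib=4, duration_s=60)
print(f"1 vCPU: ${one_vcpu:.5f} | 4 vCPU: ${four_vcpu:.5f}")
```

With these illustrative numbers the 4 vCPU run is cheaper only because duration fell by more than the CPU charge rose; a speedup below roughly 3× would reverse the result.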

Scale-Out vs Scale-Up

For datasets that can be partitioned by tile or temporal slice, scale-out (many smaller functions processing tiles in parallel) is almost always cheaper than scale-up (one large function). Chunked I/O for Large Satellite Imagery covers tile partitioning strategies that pair directly with the memory sizing approach here. SQS and Pub/Sub Queue Routing Strategies describes how to distribute those tile jobs across a worker pool without manual orchestration.
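As a rough illustration of scale-out, a raster can be split into independent tile jobs that each fit a small function. The sketch below only generates the job descriptions; the function name `tile_jobs`, the message fields, and the 4,096-pixel tile size are hypothetical, and publishing the messages to a queue is left out.

```python
import json
from typing import Dict, Iterator


def tile_jobs(src_path: str, width: int, height: int,
              tile_size: int = 4096) -> Iterator[Dict]:
    """Yield one job description per tile, clipping tiles at the raster edges."""
    for row_off in range(0, height, tile_size):
        for col_off in range(0, width, tile_size):
            yield {
                "src": src_path,
                "col_off": col_off,
                "row_off": row_off,
                "width": min(tile_size, width - col_off),
                "height": min(tile_size, height - row_off),
            }


jobs = list(tile_jobs("s3://bucket/sentinel2_tile.tif", 10_000, 10_000))
print(f"{len(jobs)} jobs; first message: {json.dumps(jobs[0])}")
```

Each worker then sizes its memory from the tile dimensions rather than the full scene, using the same footprint calculation as Step 1.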

Frequently Asked Questions

How much memory does a 10,000 × 10,000 pixel GeoTIFF need in AWS Lambda?

A 4-band float32 raster at that size requires ~1.6 GB uncompressed. With GDAL’s internal cache, numpy intermediate arrays, and Python object overhead, allocate at least 3.2–4 GB (double the raw footprint) in your Lambda configuration.

Does raising Lambda memory also increase vCPU?

Yes. AWS Lambda allocates vCPU proportionally: 1,769 MB grants one full vCPU; 10,240 MB grants approximately 6 vCPUs. Workloads that run parallel GDAL warps or multi-threaded window I/O benefit from this scaling; single-threaded numpy band math gains little beyond one vCPU.

Should I use threading or multiprocessing for raster band math in serverless functions?

Use threading for I/O-bound operations such as rasterio window reads and writes. rasterio releases the GIL during GDAL I/O, so ThreadPoolExecutor scales effectively. Avoid multiprocessing in serverless contexts — serialization overhead and cold-start memory inflation typically negate any GIL bypass benefit for the tile sizes common in these pipelines.

What GDAL_CACHEMAX value is safe for a 3 GB Lambda function?

Set GDAL_CACHEMAX to 25–30% of allocated memory, so roughly 750–900 MB for a 3 GB function. Higher values starve numpy and rasterio of the heap they need for windowed tile arrays.

How to Configure 10 GB Memory for AWS Lambda Raster Processing — maximum-tier configuration with CDK, Terraform, and CLI examples
Ephemeral Storage Limits in AWS Lambda — /tmp ceiling interaction with large raster intermediates
Cold Start Mapping for Python GDAL — how GDAL driver registration and memory tier affect initialization latency
Chunked I/O for Large Satellite Imagery — tile partitioning to keep per-function memory requirements manageable
IAM Security Boundaries for Cloud GIS — least-privilege role scoping for raster processing pipelines
