HTTP Caching
How TimeTiles implements HTTP caching for scheduled URL imports, following the HTTP caching standard RFC 7234 (since superseded by RFC 9111).
Purpose
The URL fetch cache reduces redundant network requests when importing data from external APIs:
- Avoids re-downloading unchanged data
- Respects API rate limits
- Saves bandwidth and improves performance
- Reduces quota consumption
HTTP Caching Standards (RFC 7234)
ETag (Entity Tag)
Server provides unique identifier for response version:
Response: ETag: "abc123xyz"
Next Request: If-None-Match: "abc123xyz"
Server Response: 304 Not Modified (if unchanged)
Last-Modified
Server indicates when resource was last updated:
Response: Last-Modified: Tue, 21 Oct 2025 07:28:00 GMT
Next Request: If-Modified-Since: Tue, 21 Oct 2025 07:28:00 GMT
Server Response: 304 Not Modified (if unchanged)
Cache-Control Directives
Server specifies caching behavior:
- max-age=N - Cache for N seconds
- no-cache - Revalidate before using cached response
- no-store - Never cache
- private - Do not cache (the response is meant only for a single user's browser cache)
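As a rough illustration of how these directives might be read from a response header, here is a minimal TypeScript sketch. The `CacheDirectives` shape and the `parseCacheControl` function are hypothetical names chosen for this example, not the TimeTiles implementation, and the parser ignores edge cases such as quoted values that contain commas.

```typescript
// Hypothetical shape covering only the directives listed above.
interface CacheDirectives {
  maxAge?: number; // seconds, from max-age=N
  noCache: boolean;
  noStore: boolean;
  isPrivate: boolean;
}

function parseCacheControl(header: string | null | undefined): CacheDirectives {
  const result: CacheDirectives = { noCache: false, noStore: false, isPrivate: false };
  if (!header) return result;

  for (const part of header.split(",")) {
    // Directive names are case-insensitive; a value, if present, follows "=".
    const [rawName, rawValue] = part.trim().split("=", 2);
    const name = (rawName ?? "").trim().toLowerCase();
    const value = rawValue?.trim().replace(/^"|"$/g, "");

    if (name === "no-store") {
      result.noStore = true;
    } else if (name === "no-cache") {
      result.noCache = true;
    } else if (name === "private") {
      result.isPrivate = true;
    } else if (name === "max-age" && value !== undefined && /^\d+$/.test(value)) {
      result.maxAge = Number(value);
    }
    // Any other directive is ignored in this sketch.
  }
  return result;
}
```

A caller would typically pass the raw `Cache-Control` header value and then branch on `noStore`, `noCache`, and `maxAge`.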
Cache Flow
Cache Hit
Request → Check cache → Found & valid → Return cached response ✓
Cache Miss
Request → Check cache → Not found → Fetch from URL → Store → Return response
Revalidation
Request → Check cache → Found but stale → Send conditional request with ETag
  → 304 Not Modified → Update cache metadata → Return cached response ✓
  → 200 OK with new data → Update cache → Return new response
TTL Calculation
The cache determines Time-To-Live using this priority order:
- Cache-Control: no-store → Don’t cache (TTL = 0)
- Cache-Control: no-cache → Don’t cache (TTL = 0)
- Cache-Control: max-age=N → Use N seconds
- Expires header → Calculate from expiration date
- Default TTL → Use URL_FETCH_CACHE_TTL environment variable
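The priority order above can be sketched as a single function. This is an illustrative TypeScript sketch under stated assumptions: `calculateTtlSeconds` and `DEFAULT_TTL_SECONDS` are hypothetical names, header keys are assumed to be lowercased already, and the default of 3600 seconds is a placeholder for whatever `URL_FETCH_CACHE_TTL` is set to.

```typescript
// Placeholder for the URL_FETCH_CACHE_TTL environment variable (value assumed).
const DEFAULT_TTL_SECONDS = 3600;

function calculateTtlSeconds(
  headers: Record<string, string | undefined>, // keys assumed lowercased
  nowMs: number,
  defaultTtl: number = DEFAULT_TTL_SECONDS,
): number {
  const cacheControl = (headers["cache-control"] ?? "").toLowerCase();
  const directives = cacheControl.split(",").map((d) => d.trim());

  // no-store and no-cache both yield a TTL of 0.
  if (directives.includes("no-store") || directives.includes("no-cache")) {
    return 0;
  }

  // max-age=N takes precedence over Expires.
  for (const directive of directives) {
    const match = /^max-age=(\d+)$/.exec(directive);
    if (match && match[1] !== undefined) {
      return Number(match[1]);
    }
  }

  // Expires: seconds remaining until the given date, never negative.
  // An unparseable date is treated as already expired.
  const expires = headers["expires"];
  if (expires !== undefined) {
    const expiresMs = Date.parse(expires);
    if (Number.isNaN(expiresMs)) return 0;
    return Math.max(0, Math.floor((expiresMs - nowMs) / 1000));
  }

  // Nothing usable in the response: fall back to the configured default.
  return defaultTtl;
}
```

Passing the current time as `nowMs` rather than reading the clock inside the function keeps the calculation deterministic and easy to test.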
Cache Key Generation
Cache keys are generated from normalized URLs to maximize hit rate:
Normalization steps:
- Hostname lowercased (API.Example.com → api.example.com)
- Default ports removed (:80 and :443 stripped)
- Trailing slashes removed
- Query parameters sorted alphabetically
- Fragments removed (#section)
Example:
Original: https://API.Example.com:443/events/?limit=100&format=json#top
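Normalized: https://api.example.com/events?format=json&limit=100
These URLs cache as the same entry:
- https://api.example.com/data
- https://API.Example.com/data/
- https://api.example.com:443/data
A URL with query parameters is a separate entry, but parameter order does not matter: https://api.example.com/data?b=2&a=1 caches under the same entry as https://api.example.com/data?a=1&b=2.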
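The normalization steps can be sketched with the standard `URL` and `URLSearchParams` APIs. The function name `normalizeUrlForCacheKey` is hypothetical, and whether TimeTiles hashes the result before using it as a key is not covered here. Note that the URL parser removes a port only when it is the default for the scheme (80 for http, 443 for https).

```typescript
function normalizeUrlForCacheKey(rawUrl: string): string {
  // The URL parser lowercases the hostname and drops the scheme's default port.
  const url = new URL(rawUrl);

  // Remove trailing slashes from the path ("/events/" becomes "/events").
  const path = url.pathname.replace(/\/+$/, "");

  // Sort query parameters by name so parameter order does not affect the key.
  url.searchParams.sort();
  const query = url.searchParams.toString();
  const querySuffix = query === "" ? "" : `?${query}`;

  // The fragment is left out simply by not including url.hash.
  return `${url.protocol}//${url.host}${path}${querySuffix}`;
}
```

With this sketch, the example URL from this section normalizes to `https://api.example.com/events?format=json&limit=100`.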
Storage Architecture
File System Backend
The cache uses persistent disk storage:
Structure:
/var/cache/timetiles/http/
├── index.json # Cache metadata and key index
└── entries/
    ├── abc123.cache # Individual cache entries
    └── def456.cache
Characteristics:
- Survives server restarts and deployments
- Configurable size limits (URL_FETCH_CACHE_MAX_SIZE)
- Automatic cleanup every 6 hours via background job
- Lazy eviction (expired entries removed on next access)
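To make lazy eviction concrete, here is a small TypeScript sketch that uses an in-memory `Map` as a stand-in for index.json. The `IndexEntry` fields and the `lookupEntry` function are hypothetical; the real index schema is not shown in this document. How eviction interacts with keeping stale entries for revalidation is also not specified here, so the sketch treats `expiresAt` as the point past which an entry is discarded.

```typescript
// Hypothetical index record; field names are assumptions for illustration.
interface IndexEntry {
  file: string; // e.g. "entries/abc123.cache"
  expiresAt: number; // epoch milliseconds after which the entry is discarded
}

// Lazy eviction: an expired entry is removed only when a lookup touches it.
function lookupEntry(
  index: Map<string, IndexEntry>,
  key: string,
  nowMs: number,
): IndexEntry | undefined {
  const entry = index.get(key);
  if (entry === undefined) {
    return undefined;
  }
  if (entry.expiresAt <= nowMs) {
    // A disk-backed cache would also delete entry.file at this point.
    index.delete(key);
    return undefined;
  }
  return entry;
}
```

Entries that are never requested again stay in place until the periodic cleanup job removes them, which is why both mechanisms are listed above.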
Why Not Generic Cache?
TimeTiles has a separate generic cache system (CacheManager) used by other components, but the URL fetch cache is standalone for these reasons:
- HTTP-specific features (ETags, 304 responses, Cache-Control)
- Requires persistent storage (filesystem only)
- Different configuration needs
- Dedicated cleanup schedule
Cache Headers
The cache adds an X-Cache header to each response for debugging:
- HIT - Served from cache
- MISS - Fetched from origin server
- STALE - Cached but expired, fallback used
- REVALIDATED - 304 response, cache metadata updated
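The following TypeScript sketch ties these values to the cache flow described earlier: it builds the conditional request headers for a stale entry and maps the outcome to an X-Cache value. All names (`CachedEntry`, `conditionalHeaders`, `xCacheStatus`) are hypothetical. Treating a failed or non-2xx origin response as STALE is an assumption based on the "fallback used" wording, not a confirmed behavior.

```typescript
type XCacheStatus = "HIT" | "MISS" | "STALE" | "REVALIDATED";

// Hypothetical cached-entry metadata; field names are assumptions.
interface CachedEntry {
  etag?: string;
  lastModified?: string;
  expiresAt: number; // epoch milliseconds
}

// Stored validators become conditional request headers for revalidation.
function conditionalHeaders(entry: CachedEntry): Record<string, string> {
  const headers: Record<string, string> = {};
  if (entry.etag !== undefined) headers["If-None-Match"] = entry.etag;
  if (entry.lastModified !== undefined) headers["If-Modified-Since"] = entry.lastModified;
  return headers;
}

// originStatus is the origin's HTTP status, or null if the request failed.
// It is ignored when a fresh entry exists, because no request is sent then.
function xCacheStatus(
  entry: CachedEntry | undefined,
  nowMs: number,
  originStatus: number | null,
): XCacheStatus {
  if (entry === undefined) return "MISS"; // nothing cached: fetched from origin
  if (entry.expiresAt > nowMs) return "HIT"; // cached and still valid
  if (originStatus === 304) return "REVALIDATED"; // stale, origin confirmed it
  if (originStatus !== null && originStatus >= 200 && originStatus < 300) {
    return "MISS"; // stale, origin returned new data
  }
  return "STALE"; // stale and origin unavailable: fall back to cached copy (assumed)
}
```

Checking this header on a scheduled import's fetch is a quick way to see whether the origin was contacted at all.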
Integration Points
Used by:
- url-fetch-job - Scheduled import URL fetching
- Scheduled imports with advancedOptions.useHttpCache: true
Not used by:
- Manual file uploads (no URL involved)
- Geocoding API calls (uses separate cache)
- API endpoint responses (no caching layer)
Design Decisions
Why Filesystem Only?
- Persistence required across deployments
- Large response bodies don’t fit well in memory
- Disk space more scalable than RAM for caching
Why Separate from Generic Cache?
- HTTP-specific features (conditional requests, 304 responses)
- Different lifecycle management (cleanup schedule)
- Configuration independence from generic caching
Why Not Use CDN?
- Scheduled imports run server-side, not from the client
- Need control over revalidation logic
- Privacy: API tokens shouldn’t go through a CDN
Related Documentation
- Usage Limits Configuration - HTTP cache environment variables
- Resource Protection - How caching affects quotas
- API Reference - Cache service implementation