Data Flow
How data enters, is processed, stored, queried, and displayed across TimeTiles.
System Overview
End-to-end view from data entry through processing, storage, and display.
Entry points: Files (CSV, Excel, ODS) are uploaded directly. URL imports and scraper runs start from scheduled or manual triggers. Webhooks can trigger both scheduled imports and scraper runs.
Workflows: Four Payload Workflows on the ingest queue handle the different entry paths. All converge into the same 6-task per-sheet pipeline.
Storage: Events are stored with PostGIS Point geometry, JSONB original data, and denormalized access fields for zero-query permission checks.
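As a minimal sketch of what such an event row might look like, assuming hypothetical field names except for the denormalized access fields named in this document (the PostGIS Point is shown in its GeoJSON form):

```typescript
// Hypothetical row shape; `id`, `location`, and `data` are assumed names.
interface EventRow {
  id: string;
  location: { type: "Point"; coordinates: [number, number] }; // PostGIS Point (lng, lat)
  data: Record<string, unknown>; // JSONB copy of the original source row
  catalogOwnerId: string;        // denormalized for zero-query permission checks
  datasetIsPublic: boolean;      // denormalized for zero-query permission checks
}

const example: EventRow = {
  id: "evt_1",
  location: { type: "Point", coordinates: [13.405, 52.52] },
  data: { title: "Sample event", date: "2024-01-01" },
  catalogOwnerId: "user_1",
  datasetIsPublic: true,
};
```

Keeping the original row in JSONB means imports never lose source fields, even ones the schema does not model.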
Import Pipeline Detail
The 6-task sheet processing pipeline. For stage-by-stage documentation, see Processing Stages.
File Reading
Processing Pipeline
Deduplication strategies: "external-id" (field from source data), "computed-hash" (hash of title + date + location), "content-hash" (hash of entire row), "hybrid" (try external ID, fall back to computed hash).
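The four strategies can be sketched as one key-derivation function. This is a hedged illustration, not the project's implementation: the field names (`title`, `date`, `location`, the external-ID field) and the choice of SHA-256 are assumptions.

```typescript
import { createHash } from "node:crypto";

type Row = Record<string, unknown>;

const sha = (s: string) => createHash("sha256").update(s).digest("hex");

// Hypothetical sketch of the dedupe key per strategy.
function dedupeKey(strategy: string, row: Row, externalIdField = "id"): string {
  switch (strategy) {
    case "external-id": // field taken from the source data
      return String(row[externalIdField]);
    case "computed-hash": // hash of title + date + location
      return sha([row.title, row.date, row.location].map(String).join("|"));
    case "content-hash": // hash of the entire row
      return sha(JSON.stringify(row));
    case "hybrid": // try external ID, fall back to computed hash
      return row[externalIdField] != null
        ? String(row[externalIdField])
        : sha([row.title, row.date, row.location].map(String).join("|"));
    default:
      throw new Error(`unknown strategy: ${strategy}`);
  }
}
```

Note the trade-off: "content-hash" treats any edited field as a new event, while "computed-hash" tolerates edits outside the three identity fields.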
Review gate: The pipeline pauses per-sheet (not per-file) when schema changes are breaking or when it’s the first import. The ingest-process workflow resumes after user approval.
Query & Display Flow
How data flows from PostgreSQL through the filter system to the frontend.
Access control: resolveEventQueryContext determines which catalogs the user can see (public catalogs, plus their own for authenticated users). buildCanonicalFilters normalizes URL parameters into validated SQL conditions, whitelisting field names to guard against SQL injection.
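A minimal sketch of the whitelisting idea, assuming a hypothetical field set and plain parameterized equality conditions (the real buildCanonicalFilters supports richer operators):

```typescript
// Assumed whitelist; only these names may appear in generated SQL.
const FILTERABLE_FIELDS = new Set(["title", "eventDate", "catalogId"]);

// Unknown URL parameters are silently dropped; values only ever travel
// as bind parameters ($1, $2, ...), never as SQL text.
function buildFilters(params: URLSearchParams): { sql: string; values: string[] } {
  const clauses: string[] = [];
  const values: string[] = [];
  for (const [field, value] of params) {
    if (!FILTERABLE_FIELDS.has(field)) continue; // whitelist check
    values.push(value);
    clauses.push(`"${field}" = $${values.length}`);
  }
  return { sql: clauses.join(" AND "), values };
}
```

Because field names come only from the fixed whitelist and values only from bind parameters, no user input is ever interpolated into SQL text.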
PostGIS functions: cluster_events() does server-side spatial clustering by zoom level and bounds. calculate_event_histogram() computes time-bucketed event counts with automatic bucket sizing.
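The "automatic bucket sizing" step can be illustrated with a small range-to-bucket chooser. The thresholds below are assumptions for illustration, not the values used by calculate_event_histogram():

```typescript
type Bucket = "hour" | "day" | "week" | "month" | "year";

// Hypothetical sizing: pick the bucket so the histogram has a
// readable number of bars for the selected time range.
function chooseBucket(rangeMs: number): Bucket {
  const day = 24 * 60 * 60 * 1000;
  if (rangeMs <= 2 * day) return "hour";
  if (rangeMs <= 60 * day) return "day";
  if (rangeMs <= 365 * day) return "week";
  if (rangeMs <= 5 * 365 * day) return "month";
  return "year";
}
```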
Background Systems
Job queues, scheduling, scraper execution, and maintenance.
Three queues: ingest for import workflows (dedicated Docker worker in production), default for trigger jobs like schedule-manager, maintenance for periodic system tasks.
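The routing from job to queue could look like the following sketch; the prefix convention is an assumption (only "schedule-manager" and the three queue names come from this document):

```typescript
type Queue = "ingest" | "default" | "maintenance";

// Hypothetical job-to-queue routing for the three queues above.
function queueFor(task: string): Queue {
  if (task.startsWith("ingest-")) return "ingest";           // import workflows
  if (task.startsWith("maintenance-")) return "maintenance"; // periodic system tasks
  return "default";                                          // trigger jobs, e.g. schedule-manager
}
```

Routing import work to a dedicated queue lets the production Docker worker scale independently of trigger and maintenance jobs.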
Scraper isolation: Scrapers run in rootless Podman containers with read-only filesystem, no network access, CPU/memory limits, and user namespace isolation. The TimeScrape runner (Hono API server) orchestrates execution.
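The isolation properties map onto standard Podman flags. The flag set below is a sketch of that mapping; the exact invocation used by the TimeScrape runner, including the resource limit values, is an assumption:

```typescript
// Hypothetical argv for a locked-down scraper container.
function podmanArgs(image: string): string[] {
  return [
    "run", "--rm",
    "--read-only",               // read-only root filesystem
    "--network=none",            // no network access
    "--cpus=1", "--memory=512m", // CPU/memory limits (assumed values)
    "--userns=auto",             // user namespace isolation
    image,
  ];
}
```

Rootless Podman plus --userns=auto means even a container escape lands in an unprivileged, remapped user ID range on the host.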
Data Organization & Access Control
Multi-tenant data model with denormalized access fields.
Denormalized fields: catalogOwnerId and datasetIsPublic are copied onto every event row. This enables zero-join access control — SQL queries filter directly on these fields instead of joining through catalog/dataset tables.
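The resulting access predicate is a single WHERE fragment with no joins. A minimal sketch, assuming parameterized SQL and the two denormalized columns named above:

```typescript
// Zero-join access check: anonymous users see public datasets,
// authenticated users additionally see their own catalogs.
function accessWhere(userId: string | null): { sql: string; values: string[] } {
  if (userId === null) {
    return { sql: `"datasetIsPublic" = true`, values: [] };
  }
  return {
    sql: `("datasetIsPublic" = true OR "catalogOwnerId" = $1)`,
    values: [userId],
  };
}
```

The cost is write-side bookkeeping: changing a catalog's owner or a dataset's visibility must update the copied fields on every affected event row.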
Site & View resolution: Cached with 5-minute TTL. Sites map domains to configurations. Views define which catalogs/datasets are visible (scopes: all, catalogs, or datasets).
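A 5-minute TTL cache of this kind reduces to a map of values with expiry timestamps. A minimal sketch, assuming nothing about the real resolver's API:

```typescript
// Generic TTL cache; entries expire ttlMs after being set.
class TtlCache<V> {
  private store = new Map<string, { value: V; expires: number }>();
  constructor(private ttlMs = 5 * 60 * 1000) {} // default: 5 minutes

  get(key: string, now = Date.now()): V | undefined {
    const hit = this.store.get(key);
    if (!hit) return undefined;
    if (hit.expires <= now) {
      this.store.delete(key); // lazily evict on read
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V, now = Date.now()): void {
    this.store.set(key, { value, expires: now + this.ttlMs });
  }
}
```

A 5-minute TTL bounds staleness: a changed site or view configuration is visible everywhere within five minutes without any invalidation traffic.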