⚠️ Active Development Notice: TimeTiles is under active development. Information may be placeholder content or not up-to-date.

Configuration

The TimeTiles data ingestion pipeline provides extensive configuration options at both the dataset and system levels. This document explains all available configuration options and their effects on pipeline behavior.

Dataset Configuration

Each dataset can configure its ingestion behavior independently. Configuration is stored in the datasets collection and affects all ingestion runs for that dataset.

ID Strategy Configuration

Controls how unique identifiers are generated and how duplicates are detected.

Strategy Types:

External ID:

  • Uses a specific field from the data as the unique identifier
  • Best for data with explicit, reliable IDs (UUID, database ID, etc.)
  • Requires specifying the field path (e.g., “event_id”, “data.uuid”)
  • Fastest strategy for duplicate detection

Computed Hash:

  • Generates ID by hashing a combination of specified fields
  • Best for data without explicit IDs but with identifying field combinations
  • Requires specifying which fields to include in hash
  • More flexible than external ID, but slightly slower

Auto (Auto-detect Duplicates by Content):

  • Automatically detects duplicates by comparing event content
  • Best for datasets without explicit IDs where content uniqueness matters
  • No configuration needed beyond enabling
  • Default strategy for new datasets

Hybrid:

  • Tries external ID first, falls back to computed hash if external ID is missing
  • Best for datasets with partial ID coverage
  • Combines reliability of external ID with flexibility of computed hash
  • Requires configuring both strategies
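The hybrid strategy can be sketched as follows. This is a minimal illustration, not the pipeline's actual implementation; `getPath` and the `"hash:"` prefix are assumptions made for the example.

```typescript
import { createHash } from "node:crypto";

type Row = Record<string, unknown>;

// Hypothetical helper: read a nested field path like "data.uuid".
const getPath = (row: Row, path: string): unknown =>
  path.split(".").reduce<unknown>((v, key) => (v == null ? undefined : (v as Row)[key]), row);

// Hybrid strategy: prefer the external ID field, fall back to a
// hash of the configured identifying fields when it is missing.
function resolveEventId(row: Row, externalIdPath: string, hashFields: string[]): string {
  const external = getPath(row, externalIdPath);
  if (external !== undefined && external !== null && external !== "") {
    return String(external);
  }
  const material = hashFields
    .map((f) => `${f}=${JSON.stringify(getPath(row, f) ?? null)}`)
    .join("|");
  return "hash:" + createHash("sha256").update(material).digest("hex");
}
```

Because the fallback hash is derived only from the configured fields, two rows with the same identifying values resolve to the same ID even when the external ID column is empty.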

Duplicate Strategy:

Controls what happens when duplicates are detected:

  • Skip: Ignore duplicate rows (most common)
  • Update: Update existing event with new data
  • Version: Create new version of existing event

Deduplication Configuration

Controls early duplicate detection in Stage 2 (Analyze Duplicates).

Enabled/Disabled:

  • Enable to reduce processing volume and prevent duplicate events
  • Disable for datasets where every row should create an event

Strategy:

  • Same options as duplicate strategy: skip, update, or version
  • Controls what happens when a duplicate is found during early detection
  • Should typically match the ID strategy’s duplicate strategy for consistency

Field Specification:

  • For computed-hash: specify which fields constitute a duplicate
  • Can include nested field paths (e.g., “data.event.id”)
  • Field order doesn’t matter (the hash is computed deterministically, independent of the configured order)
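A sketch of an order-independent duplicate hash, assuming fields are canonicalized by sorting their paths before hashing (an assumption for this example, not the pipeline's confirmed internals):

```typescript
import { createHash } from "node:crypto";

// Build a duplicate-detection hash from configured field paths.
// Paths are sorted first, so the configured order is irrelevant.
function duplicateHash(row: Record<string, unknown>, fields: string[]): string {
  const canonical = [...fields]
    .sort()
    .map((path) => {
      const value = path
        .split(".")
        .reduce<unknown>((v, k) => (v == null ? undefined : (v as Record<string, unknown>)[k]), row);
      return `${path}=${JSON.stringify(value ?? null)}`;
    })
    .join("|");
  return createHash("sha256").update(canonical).digest("hex");
}
```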

Schema Configuration

Controls schema behavior, validation, and evolution.

Locked:

  • When true: Require approval for ALL schema changes (even non-breaking)
  • When false: Allow auto-approval based on change classification
  • Use for production datasets with strict governance requirements

Auto-Grow:

  • When true: Allow schema to grow with new optional fields automatically
  • When false: Require approval for any schema changes
  • Prerequisite for auto-approval of non-breaking changes

Auto-Approve Non-Breaking:

  • When true: Non-breaking changes skip manual approval
  • When false: All changes require manual approval
  • Only effective when auto-grow is also enabled
  • Requires locked to be false

Strict Validation:

  • When true: Block imports that don’t match schema exactly
  • When false: Allow type transformations and best-effort parsing
  • Use strict mode for high-quality, well-structured data

Allow Transformations:

  • When true: Apply configured type transformations automatically
  • When false: Reject type mismatches without attempting transformation
  • Enables flexible handling of common type variations

Max Schema Depth:

  • Maximum nesting depth for nested objects in data
  • Prevents excessively deep schemas that impact performance
  • Typical values: 3-5 levels
  • Higher values increase schema complexity and query costs
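The flags above interact, so a combined example helps. The shape below is illustrative only; the actual field names in the datasets collection may differ.

```typescript
// Illustrative shape only — not the collection's confirmed field names.
interface SchemaConfig {
  locked: boolean;
  autoGrow: boolean;
  autoApproveNonBreaking: boolean;
  strictValidation: boolean;
  allowTransformations: boolean;
  maxSchemaDepth: number;
}

// A permissive development profile: grow automatically and
// auto-approve non-breaking changes (requires locked: false).
const devSchemaConfig: SchemaConfig = {
  locked: false,
  autoGrow: true,
  autoApproveNonBreaking: true,
  strictValidation: false,
  allowTransformations: true,
  maxSchemaDepth: 4,
};
```

A production profile with strict governance would instead set `locked: true`, which makes the auto-approval flags moot since all changes then require approval.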

Ingestion Transforms

Transform rules applied to incoming data before validation. Each transform has a type, relevant source/target fields, and an active toggle.

Transform Types:

Rename (rename):

  • Moves a field from one path to another
  • Specify from (source field path) and to (target field path)
  • Use for standardizing field names across different data sources (e.g., “EventName” to “name”)
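A rename transform amounts to moving a value between paths. The sketch below handles top-level paths only, for brevity:

```typescript
// Sketch of the rename transform: move a value from one key to
// another (top-level keys only; nested paths omitted for brevity).
function rename(row: Record<string, unknown>, from: string, to: string): Record<string, unknown> {
  if (!(from in row)) return row;
  const { [from]: value, ...rest } = row;
  return { ...rest, [to]: value };
}
```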

Date Parse (date-parse):

  • Parses date strings from one format to another
  • Specify from (source field), inputFormat, and outputFormat
  • Supported input formats: DD/MM/YYYY, MM/DD/YYYY, YYYY-MM-DD, DD-MM-YYYY, MM-DD-YYYY, DD.MM.YYYY
  • Supported output formats: YYYY-MM-DD (ISO, default), DD/MM/YYYY, MM/DD/YYYY
  • Optional timezone field (e.g., “America/New_York”)
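To make the format conversion concrete, here is a minimal sketch of one case, DD/MM/YYYY to the default ISO output; the real transform supports the full format list and timezones above.

```typescript
// Minimal sketch: parse DD/MM/YYYY and emit YYYY-MM-DD (ISO).
function parseDmyToIso(value: string): string {
  const match = /^(\d{2})\/(\d{2})\/(\d{4})$/.exec(value);
  if (!match) throw new Error(`Unparseable date: ${value}`);
  const [, day, month, year] = match;
  return `${year}-${month}-${day}`;
}
```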

String Operation (string-op):

  • Applies a string operation to a source field
  • Specify from (source field) and operation
  • Operations: uppercase, lowercase, trim, replace (with pattern and replacement fields), expression (custom safe expression using built-in functions)
  • Expression functions include: upper, lower, trim, concat, replace, substring, toNumber, parseDate, parseBool, round, floor, ceil, abs, len, ifEmpty

Concatenate (concatenate):

  • Joins multiple source fields into a single target field
  • Specify fromFields (JSON array of source field paths), to (target field), and separator (default: space)
  • Example: combine “firstName” + “lastName” into “fullName”

Split (split):

  • Splits a single field by a delimiter into multiple target fields
  • Specify from (source field), delimiter (default: comma), and toFields (JSON array of target field names)
  • Example: split “full_name” into “first_name” and “last_name”
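The concatenate and split transforms are near-inverses of each other. A sketch of both (function signatures here are illustrative, not the pipeline's API):

```typescript
// Join multiple source fields into one value (default separator: space).
function concatenate(row: Record<string, string>, fromFields: string[], separator = " "): string {
  return fromFields.map((f) => row[f] ?? "").join(separator);
}

// Split one value by a delimiter into named target fields
// (default delimiter: comma); extra parts are dropped, missing
// parts become empty strings.
function split(value: string, toFields: string[], delimiter = ","): Record<string, string> {
  const parts = value.split(delimiter);
  return Object.fromEntries(toFields.map((f, i) => [f, (parts[i] ?? "").trim()]));
}
```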

Common Fields:

Each transform rule also includes:

  • active: Checkbox to disable a transform without deleting it (default: true)
  • autoDetected: Whether the transform was suggested by auto-detection
  • confidence: Confidence score (0-100) for auto-detected transforms

Field Mapping Overrides

Manual specification of field mappings when auto-detection isn’t sufficient.

Override Types:

Geocoding Field Overrides:

  • Manually specify which field contains addresses
  • Manually specify latitude and longitude field pairs
  • Override auto-detection when field names are non-standard

Timestamp Field Overrides:

  • Manually specify which field contains the event timestamp
  • Override default priority (timestamp, date, datetime, etc.)
  • Handle custom timestamp field names

Required Field Overrides:

  • Force specific fields to be required
  • Mark optional fields as required for validation
  • Enforce data quality standards

Enum Detection Configuration

Controls how the system identifies enumerated (categorical) fields.

Detection Mode:

Count:

  • Field is enum if unique values ≤ threshold
  • Example: threshold=50 means fields with ≤50 unique values become enums
  • Best for small datasets or when you know the expected enum size

Percentage:

  • Field is enum if (unique values / total values) ≤ threshold
  • Example: threshold=0.05 means fields with ≤5% unique values become enums
  • Best for large datasets where absolute counts are misleading

Threshold Values:

  • Count mode: Typical values 20-100 unique values
  • Percentage mode: Typical values 0.01-0.10 (1%-10%)
  • Higher values create more enums (more permissive)
  • Lower values create fewer enums (more strict)
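The two detection modes reduce to a single comparison. A sketch, assuming uniqueness is measured over serialized values:

```typescript
// Decide whether a field is an enum under either detection mode.
// In count mode the threshold is an absolute unique-value count;
// in percentage mode it is a ratio (e.g. 0.05 for 5%).
function isEnumField(values: unknown[], mode: "count" | "percentage", threshold: number): boolean {
  const unique = new Set(values.map((v) => JSON.stringify(v))).size;
  return mode === "count" ? unique <= threshold : unique / values.length <= threshold;
}
```

For example, a 10-row column with 2 unique values passes a count threshold of 50 but fails a percentage threshold of 0.05, since 20% of its values are unique.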

Geographic Field Detection

Controls automatic detection of location fields for geocoding.

Auto-Detect:

  • When true: Automatically identify address and coordinate fields
  • When false: Only use manual field mappings
  • Recommended: true for initial imports, false after field verification

Manual Overrides:

Latitude Path:

  • Specify exact field containing latitude values
  • Overrides auto-detection
  • Use when field name doesn’t match common patterns

Longitude Path:

  • Specify exact field containing longitude values
  • Must be specified if latitude path is specified
  • Use when field name doesn’t match common patterns

Note that the geoFieldDetection group does not include an address/location path field. Location field mapping is configured separately in the fieldMappingOverrides group via locationPath (see Field Mapping Overrides).


System Configuration

Global pipeline settings that affect all datasets and ingestion runs.

Batch Size Configuration

Controls how many rows are processed in each batch for various stages. Batch sizes are configured via config/timetiles.yml (under the batchSizes key) and fall back to hardcoded defaults when the YAML file is absent. There are no environment variable overrides for batch sizes.

Duplicate Analysis Batch Size (batchSizes.duplicateAnalysis):

  • Default: 5,000 rows
  • Memory-efficient for hash map operations
  • Larger values: faster but more memory
  • Smaller values: slower but less memory

Schema Detection Batch Size (batchSizes.schemaDetection):

  • Default: 10,000 rows
  • Larger batches for schema building efficiency
  • Larger values: faster schema convergence
  • Smaller values: more batches, slower convergence

Event Creation Batch Size (batchSizes.eventCreation):

  • Default: 1,000 rows
  • Balances throughput with transaction reliability
  • Larger values: faster but higher transaction timeout risk
  • Smaller values: slower but more reliable

Database Chunk Size (batchSizes.databaseChunk):

  • Default: 1,000 records
  • Batch size for bulk database operations
  • Affects memory and transaction duration
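Putting the defaults together, a `config/timetiles.yml` overriding these values might look like the following (values shown are the documented defaults; omit the file entirely to fall back to the hardcoded defaults):

```yaml
# config/timetiles.yml
batchSizes:
  duplicateAnalysis: 5000
  schemaDetection: 10000
  eventCreation: 1000
  databaseChunk: 1000
```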

Geocoding Processing:

  • Processes ALL unique locations in one pass (not batched by rows)
  • API batch size configured separately (typically 100 requests/minute)
  • Extracts unique addresses/coordinates from entire file first
  • Results cached in lookup map for all rows
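The unique-locations pass can be sketched as below; `geocodeAddress` stands in for a hypothetical provider call and is not the pipeline's actual function name.

```typescript
type Coords = { lat: number; lon: number };

// Collect unique addresses, geocode each exactly once, and return a
// cache that every row can be resolved against afterwards.
async function geocodeAll(
  addresses: string[],
  geocodeAddress: (addr: string) => Promise<Coords>,
): Promise<Map<string, Coords>> {
  const cache = new Map<string, Coords>();
  for (const addr of new Set(addresses)) {
    cache.set(addr, await geocodeAddress(addr)); // one API call per unique address
  }
  return cache;
}
```

With this shape, a million-row file containing a few thousand distinct addresses costs a few thousand API calls, not a million.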

Geocoding

Geocoding providers are configured via the Payload CMS Settings global (/dashboard/globals/settings), not via config files. See the Self-Hosting Configuration docs for setup details.

Workers

Background job workers run as separate processes, one per queue. In production, each queue gets a dedicated Docker container started with pnpm payload jobs:run --cron --queue <name>. Concurrency is controlled by the number of containers deployed, not by a configuration setting. Retry behavior is handled by Payload’s built-in workflow system.


Configuration Changes

Runtime Changes

Most dataset configuration changes take effect immediately for new imports:

  • ID strategy changes
  • Schema configuration changes
  • Type transformation changes
  • Field mapping overrides

Requires Restart

  • Batch size changes in config/timetiles.yml (loaded once at startup via getAppConfig())

Migration Required

  • Changing ID strategy on datasets with existing events
  • Making significant changes to the deduplication strategy