Error Recovery
TimeTiles uses Payload CMS’s built-in workflow retry system for error recovery. Failed tasks are automatically retried, and completed tasks return cached output on re-run so work is never repeated.
Overview
Error recovery is handled at two levels:
- Workflow level: Payload retries failed tasks with configurable retry counts. Completed tasks are cached and skipped on retry.
- Sheet level: Multi-sheet workflows use per-sheet try/catch. A failure in one sheet does not block other sheets.
Error Model
Task handlers use three distinct patterns to communicate results:
Transient Failures (Throw)
Tasks throw errors for unexpected or transient failures (network timeouts, database connection issues, rate limits). Payload’s workflow system retries the task automatically.
Examples:
- Network connection failures (ECONNREFUSED, ETIMEDOUT)
- Database deadlocks or connection timeouts
- API rate limiting (429 responses)
- Temporary resource constraints
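As a sketch, a handler following the throw pattern might look like the following; the task name, error class, and status-code checks are illustrative, not TimeTiles' actual code:

```typescript
// Illustrative sketch of the throw-to-retry pattern. Transient failures
// throw (so the workflow engine retries); success returns structured data.
class TransientError extends Error {}

interface TaskResult {
  output: { geocoded: number };
}

async function geocodeBatchTask(upstreamStatus: number): Promise<TaskResult> {
  if (upstreamStatus === 429) {
    // Rate limited: throwing signals the workflow system to retry later.
    throw new TransientError("Rate limit exceeded (429)");
  }
  if (upstreamStatus >= 500) {
    // Upstream outage: also transient, also retried.
    throw new TransientError(`Upstream error (${upstreamStatus})`);
  }
  // Success: return structured output for the next task.
  return { output: { geocoded: 100 } };
}
```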
Human Review Required (needsReview)
Tasks return { needsReview: true } when a human decision is needed. The pipeline pauses for that sheet only. Other sheets continue processing.
Examples:
- Schema drift detected in automated imports
- High duplicate rate requiring confirmation
- High geocoding failure rate
Success (Return Data)
Tasks return structured data on success. The pipeline continues to the next task.
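A hedged sketch of the two return patterns side by side; the task name, threshold, and output fields are hypothetical:

```typescript
// Illustrative sketch: a task either pauses its sheet for human review
// or returns structured data so the pipeline continues.
type TaskOutcome =
  | { needsReview: true; reason: string } // pause this sheet only
  | { output: { eventsCreated: number } }; // success: continue pipeline

function detectDuplicatesTask(duplicateRate: number): TaskOutcome {
  if (duplicateRate > 0.5) {
    // High duplicate rate: a human decision is needed for this sheet;
    // other sheets keep processing.
    return { needsReview: true, reason: `duplicate rate ${duplicateRate}` };
  }
  return { output: { eventsCreated: 42 } };
}
```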
Per-Sheet Error Isolation
Multi-sheet workflows (manual-ingest, scheduled-ingest, scraper-ingest) process sheets via Promise.allSettled with per-sheet try/catch. When a sheet fails:
- The markSheetFailed function updates the ingestion job to FAILED with error details
- The workflow continues processing remaining sheets
- The parent import file status reflects the aggregate state
The ingest-process workflow (single job, no Promise.allSettled) does not use per-sheet isolation. Errors propagate normally and Payload’s onFail callback fires as expected.
Important: Payload’s onFail callback does not fire when errors are caught by Promise.allSettled. This is why multi-sheet workflows use explicit markSheetFailed instead.
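The isolation described above can be sketched as follows; markSheetFailed here is a simplified stand-in for the real status update, and the sheet/log shapes are illustrative:

```typescript
// Sketch of per-sheet error isolation with Promise.allSettled.
// Because allSettled never rejects, the workflow engine's onFail callback
// would not fire — each sheet records its own failure explicitly instead.
async function markSheetFailed(sheetIndex: number, error: string, log: string[]) {
  log.push(`sheet ${sheetIndex} FAILED: ${error}`);
}

async function processAllSheets(
  sheets: Array<() => Promise<void>>,
  log: string[],
): Promise<void> {
  await Promise.allSettled(
    sheets.map(async (processSheet, i) => {
      try {
        await processSheet();
        log.push(`sheet ${i} ok`);
      } catch (err) {
        // One sheet failing does not block the others.
        await markSheetFailed(i, (err as Error).message, log);
      }
    }),
  );
}
```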
Recovery Approaches
Automatic Retry (Payload Built-In)
When a task throws, Payload retries it up to the configured retry count. On retry:
- Previously completed tasks return cached output immediately
- The failed task re-runs from the beginning
- No duplicate work for completed tasks
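The cached-output semantics can be illustrated with a generic runner; this models the observable behavior, not Payload's internals, and the task slugs are examples:

```typescript
// Generic sketch of retry-with-cached-tasks: completed task outputs are
// cached, so a retry skips them and re-runs only from the failed task.
type Task = { slug: string; run: () => string };

function runWorkflow(tasks: Task[], cache: Map<string, string>): void {
  for (const task of tasks) {
    if (cache.has(task.slug)) continue; // completed: cached output, no re-run
    // A throw here propagates to the caller; earlier results stay cached.
    cache.set(task.slug, task.run());
  }
}
```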
Manual Re-Queue
For workflows that have exhausted retries:
- Fix the underlying issue (API key, permissions, network)
- Re-queue the workflow via the admin interface
- All previously completed tasks are skipped (cached)
- Processing resumes from the failed task
NEEDS_REVIEW Resolution
For sheets paused by needsReview:
- User reviews the issue in the admin interface (schema changes, duplicate rates, etc.)
- User approves or adjusts configuration
- The ingest-jobs afterChange hook automatically queues the ingest-process workflow
- ingest-process runs Create Schema Version, Geocode Batch, and Create Events
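The resume flow above can be sketched with a simplified hook; the hook and queue signatures are stand-ins, not Payload's exact API, and the stage names follow the ones documented here:

```typescript
// Hedged sketch: when an ingestion job leaves NEEDS_REVIEW (the user
// approved or adjusted configuration), queue the ingest-process workflow.
interface IngestJob { id: number; stage: string }
interface JobQueue { queue(workflow: string, input: { ingestJobId: number }): void }

function afterChangeHook(
  doc: IngestJob,
  previousDoc: IngestJob,
  jobs: JobQueue,
): void {
  const approved =
    previousDoc.stage === "NEEDS_REVIEW" && doc.stage !== "NEEDS_REVIEW";
  if (approved) {
    // Resumes the pipeline: Create Schema Version, Geocode Batch, Create Events.
    jobs.queue("ingest-process", { ingestJobId: doc.id });
  }
}
```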
API Endpoints
Retry Failed Ingestion Job
Endpoint: POST /api/ingest-jobs/{id}/retry
Purpose: Manually trigger retry for a failed ingestion job
When to Use:
- Automatic retry hasn’t triggered yet
- Need to retry immediately after fixing underlying issue
- Testing recovery after configuration changes
Reset Ingestion Job
Endpoint: POST /api/ingest-jobs/{id}/reset
Purpose: Reset an ingestion job to a specific stage for a fresh restart
When to Use:
- Need to restart from specific stage
- Configuration changes require reprocessing
- Data corruption requires fresh start from known good stage
Cautions:
- Resetting clears progress from later stages
- May re-process data (ensure idempotency)
- Does not delete already created events (use with caution)
Monitoring
Logging
All error recovery operations are logged with context:
```typescript
// Task failure in multi-sheet workflow
logger.error("Sheet processing failed", {
  ingestJobId: 123,
  sheetIndex: 2,
  error: "Rate limit exceeded (429)",
});

// Workflow retry
logger.info("Workflow task retrying", {
  taskSlug: "geocode-batch",
  retryAttempt: 2,
  importFileId: 456,
});
```
Ingestion Job State
Track processing state in the ingestion job record:
- stage: Current processing stage (for UI progress display)
- errorLog: Error details from the last failure
- NEEDS_REVIEW: Indicates human review is required
- FAILED: Terminal failure state with error context
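As an illustration, a monitoring-side view of the record might be typed as follows; field names beyond the documented stage and errorLog (and the exact stage list) are assumptions:

```typescript
// Illustrative shape of an ingestion job record as seen by monitoring.
type Stage = "PROCESSING" | "NEEDS_REVIEW" | "FAILED" | "COMPLETED";

interface IngestJobRecord {
  id: number;
  stage: Stage;       // drives UI progress display
  errorLog?: string;  // error details from the last failure
}

// FAILED is the terminal failure state; NEEDS_REVIEW is recoverable.
function isTerminalFailure(job: IngestJobRecord): boolean {
  return job.stage === "FAILED";
}
```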
Integration with Scheduled Imports
For scheduled imports (automated URL-based data fetching), error recovery follows the workflow model:
- Workflow fails: Task throws during processing
- Automatic retry: Payload retries the task up to configured limit
- Retries exhausted: Workflow marked as failed
- Next scheduled run: the next trigger creates a fresh workflow instance from scratch
Each scheduled run is an independent workflow instance. Failed workflows do not block future scheduled runs.