Error Recovery
TimeTiles uses Payload CMS’s built-in workflow retry system for error recovery. Failed tasks are automatically retried, and completed tasks return cached output on re-run so work is never repeated.
Overview
Error recovery is handled at two levels:
- Workflow level: Payload retries failed tasks with configurable retry counts. Completed tasks are cached and skipped on retry.
- Sheet level: Multi-sheet workflows use per-sheet try/catch. A failure in one sheet does not block other sheets.
Error Model
Task handlers use three distinct patterns to communicate results:
Transient Failures (Throw)
Tasks throw errors for unexpected or transient failures (network timeouts, database connection issues, rate limits). Payload’s workflow system retries the task automatically.
Examples:
- Network connection failures (ECONNREFUSED, ETIMEDOUT)
- Database deadlocks or connection timeouts
- API rate limiting (429 responses)
- Temporary resource constraints
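As a sketch, a handler following the throw pattern might look like the following; the task name, error class, and status-code checks are illustrative, not TimeTiles' actual code:

```typescript
// Illustrative sketch of the throw-to-retry pattern. Transient failures
// throw (so the workflow engine retries); success returns structured data.
class TransientError extends Error {}

interface TaskResult {
  output: { geocoded: number };
}

async function geocodeBatchTask(upstreamStatus: number): Promise<TaskResult> {
  if (upstreamStatus === 429) {
    // Rate limited: throwing signals the workflow system to retry later.
    throw new TransientError("Rate limit exceeded (429)");
  }
  if (upstreamStatus >= 500) {
    // Upstream outage: also transient, also retried.
    throw new TransientError(`Upstream error (${upstreamStatus})`);
  }
  // Success: return structured output for the next task.
  return { output: { geocoded: 100 } };
}
```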
Human Review Required (needsReview)
Tasks return { needsReview: true } when a human decision is needed. The pipeline pauses for that sheet only. Other sheets continue processing.
Examples:
- Schema drift detected in automated imports
- High duplicate rate requiring confirmation
- High geocoding failure rate
Success (Return Data)
Tasks return structured data on success. The pipeline continues to the next task.
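A hedged sketch of the two return patterns side by side; the task name, threshold, and output fields are hypothetical:

```typescript
// Illustrative sketch: a task either pauses its sheet for human review
// or returns structured data so the pipeline continues.
type TaskOutcome =
  | { needsReview: true; reason: string } // pause this sheet only
  | { output: { eventsCreated: number } }; // success: continue pipeline

function detectDuplicatesTask(duplicateRate: number): TaskOutcome {
  if (duplicateRate > 0.5) {
    // High duplicate rate: a human decision is needed for this sheet;
    // other sheets keep processing.
    return { needsReview: true, reason: `duplicate rate ${duplicateRate}` };
  }
  return { output: { eventsCreated: 42 } };
}
```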
Per-Sheet Error Isolation
Multi-sheet workflows (manual-ingest, scheduled-ingest, scraper-ingest) process sheets via Promise.allSettled with per-sheet try/catch. When a sheet fails:
- The markSheetFailed function updates the ingestion job to FAILED with error details
- The workflow continues processing remaining sheets
- The parent import file status reflects the aggregate state
The ingest-process workflow (single job, no Promise.allSettled) does not use per-sheet isolation. Errors propagate normally and Payload’s onFail callback fires as expected.
Important: Payload’s onFail callback does not fire when errors are caught by Promise.allSettled. This is why multi-sheet workflows use explicit markSheetFailed instead.
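The isolation described above can be sketched as follows; markSheetFailed here is a simplified stand-in for the real status update, and the sheet/log shapes are illustrative:

```typescript
// Sketch of per-sheet error isolation with Promise.allSettled.
// Because allSettled never rejects, the workflow engine's onFail callback
// would not fire — each sheet records its own failure explicitly instead.
async function markSheetFailed(sheetIndex: number, error: string, log: string[]) {
  log.push(`sheet ${sheetIndex} FAILED: ${error}`);
}

async function processAllSheets(
  sheets: Array<() => Promise<void>>,
  log: string[],
): Promise<void> {
  await Promise.allSettled(
    sheets.map(async (processSheet, i) => {
      try {
        await processSheet();
        log.push(`sheet ${i} ok`);
      } catch (err) {
        // One sheet failing does not block the others.
        await markSheetFailed(i, (err as Error).message, log);
      }
    }),
  );
}
```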
Recovery Approaches
Automatic Retry (Payload Built-In)
When a task throws, Payload retries it up to the configured retry count. On retry:
- Previously completed tasks return cached output immediately
- The failed task re-runs from the beginning
- No duplicate work for completed tasks
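The cached-output semantics can be illustrated with a generic runner; this models the observable behavior, not Payload's internals, and the task slugs are examples:

```typescript
// Generic sketch of retry-with-cached-tasks: completed task outputs are
// cached, so a retry skips them and re-runs only from the failed task.
type Task = { slug: string; run: () => string };

function runWorkflow(tasks: Task[], cache: Map<string, string>): void {
  for (const task of tasks) {
    if (cache.has(task.slug)) continue; // completed: cached output, no re-run
    // A throw here propagates to the caller; earlier results stay cached.
    cache.set(task.slug, task.run());
  }
}
```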
Manual Re-Queue
For workflows that have exhausted retries:
- Fix the underlying issue (API key, permissions, network)
- Re-queue the workflow via the admin interface
- All previously completed tasks are skipped (cached)
- Processing resumes from the failed task
NEEDS_REVIEW Resolution
For sheets paused by needsReview:
- User reviews the issue in the admin interface (schema changes, duplicate rates, etc.)
- User approves or adjusts configuration
- The ingest-jobs afterChange hook automatically queues the ingest-process workflow
- ingest-process runs Create Schema Version, Geocode Batch, and Create Events
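The resume flow above can be sketched with a simplified hook; the hook and queue signatures are stand-ins, not Payload's exact API, and the stage names follow the ones documented here:

```typescript
// Hedged sketch: when an ingestion job leaves NEEDS_REVIEW (the user
// approved or adjusted configuration), queue the ingest-process workflow.
interface IngestJob { id: number; stage: string }
interface JobQueue { queue(workflow: string, input: { ingestJobId: number }): void }

function afterChangeHook(
  doc: IngestJob,
  previousDoc: IngestJob,
  jobs: JobQueue,
): void {
  const approved =
    previousDoc.stage === "NEEDS_REVIEW" && doc.stage !== "NEEDS_REVIEW";
  if (approved) {
    // Resumes the pipeline: Create Schema Version, Geocode Batch, Create Events.
    jobs.queue("ingest-process", { ingestJobId: doc.id });
  }
}
```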
API Endpoints
Retry Failed Ingestion Job
Endpoint: POST /api/ingest-jobs/{id}/retry
Purpose: Manually trigger retry for a failed ingestion job
When to Use:
- Automatic retry hasn’t triggered yet
- Need to retry immediately after fixing underlying issue
- Testing recovery after configuration changes
Reset Ingestion Job
Endpoint: POST /api/ingest-jobs/{id}/reset
Purpose: Reset an ingestion job to a specific stage for a fresh restart
When to Use:
- Need to restart from specific stage
- Configuration changes require reprocessing
- Data corruption requires fresh start from known good stage
Cautions:
- Resetting clears progress from later stages
- May re-process data (ensure idempotency)
- Does not delete already created events (use with caution)
Monitoring
Logging
All error recovery operations are logged with context:
```typescript
// Task failure in multi-sheet workflow
logger.error("Sheet processing failed", {
  ingestJobId: 123,
  sheetIndex: 2,
  error: "Rate limit exceeded (429)",
});

// Workflow retry
logger.info("Workflow task retrying", {
  taskSlug: "geocode-batch",
  retryAttempt: 2,
  importFileId: 456,
});
```
Ingestion Job State
Track processing state in the ingestion job record:
- stage: Current processing stage (for UI progress display)
- errorLog: Error details from the last failure
- NEEDS_REVIEW: Indicates human review is required
- FAILED: Terminal failure state with error context
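As an illustration, a monitoring-side view of the record might be typed as follows; field names beyond the documented stage and errorLog (and the exact stage list) are assumptions:

```typescript
// Illustrative shape of an ingestion job record as seen by monitoring.
type Stage = "PROCESSING" | "NEEDS_REVIEW" | "FAILED" | "COMPLETED";

interface IngestJobRecord {
  id: number;
  stage: Stage;       // drives UI progress display
  errorLog?: string;  // error details from the last failure
}

// FAILED is the terminal failure state; NEEDS_REVIEW is recoverable.
function isTerminalFailure(job: IngestJobRecord): boolean {
  return job.stage === "FAILED";
}
```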
Integration with Scheduled Imports
For scheduled imports (automated URL-based data fetching), error recovery follows the workflow model:
- Workflow fails: Task throws during processing
- Automatic retry: Payload retries the task up to configured limit
- Retries exhausted: Workflow marked as failed
- Next scheduled run: the next trigger creates a fresh workflow instance from scratch
Each scheduled run is an independent workflow instance. Failed workflows do not block future scheduled runs.