lib/services/error-recovery

Provides error recovery mechanisms for failed import jobs.

This service handles recovery from various failure scenarios in the import pipeline. It provides retry logic, error classification, and automatic recovery strategies to improve system resilience and reduce manual intervention requirements.

Key responsibilities:

  • Retry failed jobs with exponential backoff
  • Classify errors as recoverable vs permanent
  • Reset job state for recovery attempts
  • Track retry attempts and failure patterns
  • Provide manual recovery tools for operators

Classes

ErrorRecoveryService

Service for handling import job error recovery.

Provides automatic and manual recovery mechanisms for failed import jobs, including error classification, exponential backoff retry scheduling, quota enforcement, and operator intervention tools.

Examples

Basic usage - automatic retry:

import { ErrorRecoveryService } from "@/lib/services/error-recovery";

const result = await ErrorRecoveryService.recoverFailedJob(payload, jobId);
if (result.success) {
  console.log(`Retry scheduled for ${result.nextRetryAt}`);
}

Manual reset by administrator:

await ErrorRecoveryService.resetJobToStage(
  payload,
  jobId,
  PROCESSING_STAGE.GEOCODE_BATCH,
  true // Clear retry counter
);

Get recommendations for all failed jobs:

const recommendations = await ErrorRecoveryService.getRecoveryRecommendations(payload);
const autoRetryable = recommendations.filter(
  (r) => r.recommendedAction === "Automatic retry available"
);

Constructors

Constructor

new ErrorRecoveryService(): ErrorRecoveryService

Returns

ErrorRecoveryService

Methods

recoverFailedJob()

static recoverFailedJob(payload, jobId, retryConfig): Promise<RecoveryResult>

Attempt to recover a failed import job.

This is the primary entry point for the error recovery system. It:

  1. Validates the job exists and is in a failed state
  2. Classifies the error to determine if it’s retryable
  3. Checks retry count hasn’t exceeded the maximum
  4. Verifies user quota if applicable
  5. Calculates exponential backoff delay
  6. Updates the job to retry from the appropriate recovery stage
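
The order of checks above can be sketched as a pure decision function. This is a minimal illustration, not the service's actual code: `JobSnapshot`, `decideRecovery`, and the `job_not_failed`, `not_retryable`, and `max_retries_exceeded` codes are hypothetical names introduced here. Only `retry_scheduled`, `job_not_found`, and `quota_exceeded` appear in this reference.

```typescript
// Hypothetical snapshot of the facts the recovery steps depend on.
interface JobSnapshot {
  status: "failed" | "processing" | "completed";
  retryCount: number;
  errorRetryable: boolean; // outcome of the error classification step
  quotaAvailable: boolean; // outcome of the user quota check
}

interface RecoveryDecision {
  ok: boolean;
  action: string;
}

// Mirrors steps 1-4: each failed check short-circuits with an action code.
function decideRecovery(job: JobSnapshot | undefined, maxRetries: number): RecoveryDecision {
  if (!job) return { ok: false, action: "job_not_found" };
  if (job.status !== "failed") return { ok: false, action: "job_not_failed" }; // assumed code
  if (!job.errorRetryable) return { ok: false, action: "not_retryable" }; // assumed code
  if (job.retryCount >= maxRetries) return { ok: false, action: "max_retries_exceeded" }; // assumed code
  if (!job.quotaAvailable) return { ok: false, action: "quota_exceeded" };
  // Steps 5-6 (backoff delay and job update) would follow here.
  return { ok: true, action: "retry_scheduled" };
}

const decision = decideRecovery(
  { status: "failed", retryCount: 1, errorRetryable: true, quotaAvailable: true },
  3
);
console.log(decision.action);
```

Keeping the checks in a side-effect-free function like this makes the ordering easy to unit test, although the real service may interleave database reads between the steps.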
Parameters
payload

BasePayload

Payload CMS instance for database access

jobId

string | number

ID of the failed import job to recover

retryConfig

Partial<RetryConfig> = {}

Optional retry configuration to override defaults

Returns

Promise<RecoveryResult>

Recovery result indicating success/failure and next retry time

Example
const result = await ErrorRecoveryService.recoverFailedJob(payload, 123);
if (result.success) {
  console.log(`Retry scheduled for ${result.nextRetryAt}`);
} else {
  console.error(`Recovery failed: ${result.error}`);
}

Notes:

  • Uses exponential backoff: 30s, 60s, 120s (base 30s, multiplier 2x, max 5min)
  • Default max retries: 3
  • Respects user quota limits to prevent abuse
  • Jobs are not executed immediately; they are scheduled for pickup by the process-pending-retries job
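
The backoff schedule in the notes above follows from the usual capped exponential formula. The sketch below assumes a zero-based attempt index and uses a hypothetical helper name, `backoffDelayMs`; the service's own rounding or jitter, if any, is not documented here.

```typescript
interface BackoffConfig {
  baseDelayMs: number;
  maxDelayMs: number;
  backoffMultiplier: number;
}

// delay = base * multiplier^attempt, capped at maxDelayMs.
// `attempt` is assumed to be 0 for the first retry.
function backoffDelayMs(attempt: number, config: BackoffConfig): number {
  const uncapped = config.baseDelayMs * Math.pow(config.backoffMultiplier, attempt);
  return Math.min(uncapped, config.maxDelayMs);
}

// Values taken from the notes: base 30s, multiplier 2x, max 5min.
const documentedBackoff: BackoffConfig = {
  baseDelayMs: 30_000,
  maxDelayMs: 300_000,
  backoffMultiplier: 2,
};

const delays = [0, 1, 2].map((attempt) => backoffDelayMs(attempt, documentedBackoff));
console.log(delays); // [30000, 60000, 120000]
```

With the default limit of 3 retries the 5-minute cap is never reached; it only matters when `maxRetries` is raised (the fifth retry would otherwise wait 480s).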
processPendingRetries()

static processPendingRetries(payload): Promise<void>

Process pending retries (should be called periodically).

Scans for failed jobs that are scheduled for retry (based on nextRetryAt) and automatically restarts them from the appropriate recovery stage. This method should be invoked by a scheduled background job every 5 minutes.

Parameters
payload

BasePayload

Payload CMS instance for database access

Returns

Promise<void>

Promise that resolves when processing is complete

Implementation notes:

  • Processes up to 10 retries per invocation to avoid overwhelming the system
  • Only processes jobs where nextRetryAt <= current time
  • Skips jobs with non-retryable error classifications
  • Clears nextRetryAt after queueing to prevent duplicate processing
  • Should be configured as a Payload scheduled task running every 5 minutes
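
The selection rules in the implementation notes can be illustrated with an in-memory filter. This is only a sketch: `PendingJob` and `selectDueRetries` are hypothetical, and the real service presumably expresses the same conditions as a database query.

```typescript
// Hypothetical in-memory view of a failed job awaiting retry.
interface PendingJob {
  id: number;
  nextRetryAt?: Date;
  retryable: boolean; // false for non-retryable error classifications
}

const MAX_RETRIES_PER_RUN = 10; // per-invocation cap from the notes

// Keeps jobs that are retryable and due (nextRetryAt <= now), up to the cap.
function selectDueRetries(
  jobs: PendingJob[],
  now: Date,
  limit: number = MAX_RETRIES_PER_RUN
): PendingJob[] {
  return jobs
    .filter(
      (job) =>
        job.retryable &&
        job.nextRetryAt !== undefined &&
        job.nextRetryAt.getTime() <= now.getTime()
    )
    .slice(0, limit);
}

const sampleNow = new Date("2024-01-01T00:10:00.000Z");
const sampleDue = selectDueRetries(
  [
    { id: 1, nextRetryAt: new Date("2024-01-01T00:05:00.000Z"), retryable: true },
    { id: 2, nextRetryAt: new Date("2024-01-01T01:00:00.000Z"), retryable: true }, // not yet due
    { id: 3, nextRetryAt: new Date("2024-01-01T00:05:00.000Z"), retryable: false }, // skipped
  ],
  sampleNow
);
console.log(sampleDue.map((job) => job.id)); // [1]
```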
Example

Configure in payload.config.ts (cron runs every 5 minutes):

jobs: {
  tasks: [
    {
      slug: "process-pending-retries",
      handler: async ({ req }) => {
        await ErrorRecoveryService.processPendingRetries(req.payload);
      },
      schedule: [
        {
          cron: "0,5,10,15,20,25,30,35,40,45,50,55 * * * *",
          queue: "maintenance"
        }
      ]
    }
  ]
}
resetJobToStage()

static resetJobToStage(payload, jobId, targetStage, clearRetries): Promise<RecoveryResult>

Manually reset a job to a specific stage (for operator intervention).

Allows administrators to manually override the automatic recovery logic and force a job to restart from a specific stage. Useful for debugging, testing, or handling edge cases that the automatic system can’t resolve.

Parameters
payload

BasePayload

Payload CMS instance for database access

jobId

string | number

ID of the import job to reset

targetStage

ProcessingStage

Processing stage to reset the job to

clearRetries

boolean = true

Whether to reset retry counter to 0 (default: true)

Returns

Promise<RecoveryResult>

Recovery result indicating success or failure

Example
// Reset job to geocoding stage and clear retry count
const result = await ErrorRecoveryService.resetJobToStage(
  payload,
  123,
  PROCESSING_STAGE.GEOCODE_BATCH,
  true
);

Important notes:

  • Records manual reset in error log with timestamp and stage information
  • Bypasses all validation checks (use with caution)
  • Does not queue jobs automatically - job will be picked up by normal processing
  • Should only be used by administrators via the reset API endpoint
  • If clearRetries is false, retry count is preserved (useful for debugging retry logic)
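
The effect of `clearRetries` and the error-log entry described above can be sketched as a pure state transition. Everything here is illustrative: the `JobRecord` shape, the log message format, and the stage strings are assumptions, not the stored schema.

```typescript
// Hypothetical minimal job record; the real collection has more fields.
interface JobRecord {
  stage: string;
  retryCount: number;
  errorLog: { message: string; at: string }[];
}

// Returns a new record: stage overwritten, retry count optionally cleared,
// and the manual reset appended to the error log with a timestamp.
function applyManualReset(
  job: JobRecord,
  targetStage: string,
  clearRetries: boolean = true,
  now: Date = new Date()
): JobRecord {
  return {
    stage: targetStage,
    retryCount: clearRetries ? 0 : job.retryCount,
    errorLog: [
      ...job.errorLog,
      { message: `Manual reset from ${job.stage} to ${targetStage}`, at: now.toISOString() },
    ],
  };
}

const stuckJob: JobRecord = { stage: "failed", retryCount: 3, errorLog: [] };
// "geocode-batch" is a placeholder for the value of PROCESSING_STAGE.GEOCODE_BATCH.
const resetJob = applyManualReset(stuckJob, "geocode-batch", false);
console.log(resetJob.retryCount); // 3, because clearRetries was false
```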
getRecoveryRecommendations()

static getRecoveryRecommendations(payload): Promise<object[]>

Get recovery recommendations for failed jobs.

Analyzes all failed jobs in the system and provides actionable recommendations for each. Used by the recommendations API endpoint to help administrators understand which jobs need attention.

Parameters
payload

BasePayload

Payload CMS instance for database access

Returns

Promise<object[]>

Array of job recommendations with classifications and suggested actions

Example
const recommendations = await ErrorRecoveryService.getRecoveryRecommendations(payload);
recommendations.forEach((rec) => {
  console.log(`Job ${rec.jobId}: ${rec.recommendedAction}`);
});

Recommendation categories:

  • “Automatic retry available” - Job can be retried automatically
  • “Manual review required” - User action needed (from classification.suggestedAction)
  • “Manual intervention required - max retries exceeded” - Retry limit hit
  • “No action recommended” - Non-retryable permanent error

Limited to 100 failed jobs per query to prevent performance issues. Access control should be applied by the calling API endpoint.
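
One way to picture how the four categories relate to a job's classification and retry count is the mapping below. The precedence of the checks is an assumption, and `recommendAction` is a hypothetical helper; the service may also substitute `classification.suggestedAction` text for the "Manual review required" label.

```typescript
interface Classification {
  type: "recoverable" | "permanent" | "user-action-required";
  retryable: boolean;
  suggestedAction?: string;
}

// Maps a classification and retry count to one of the documented labels.
function recommendAction(
  classification: Classification,
  retryCount: number,
  maxRetries: number
): string {
  if (classification.type === "user-action-required") {
    return "Manual review required";
  }
  if (!classification.retryable) {
    return "No action recommended";
  }
  if (retryCount >= maxRetries) {
    return "Manual intervention required - max retries exceeded";
  }
  return "Automatic retry available";
}

console.log(recommendAction({ type: "recoverable", retryable: true }, 1, 3));
// "Automatic retry available"
```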

Interfaces

RetryConfig

Configuration for retry behavior.

Properties

maxRetries

maxRetries: number

Maximum number of retry attempts before giving up

baseDelayMs

baseDelayMs: number

Initial delay in milliseconds before first retry

maxDelayMs

maxDelayMs: number

Maximum delay in milliseconds between retries

backoffMultiplier

backoffMultiplier: number

Multiplier for exponential backoff (e.g., 2 = double delay each time)
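
Because `recoverFailedJob` accepts a `Partial<RetryConfig>`, callers only need to supply the fields they want to change. A minimal sketch of how such overrides are typically merged is shown below; `DEFAULT_RETRY_CONFIG` and `resolveRetryConfig` are hypothetical names, with values taken from the defaults quoted in the `recoverFailedJob` notes.

```typescript
interface RetryConfig {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
  backoffMultiplier: number;
}

// Defaults as documented: 3 retries, 30s base, 5min cap, 2x multiplier.
const DEFAULT_RETRY_CONFIG: RetryConfig = {
  maxRetries: 3,
  baseDelayMs: 30_000,
  maxDelayMs: 300_000,
  backoffMultiplier: 2,
};

// Later spread wins, so any field present in `overrides` replaces the default.
function resolveRetryConfig(overrides: Partial<RetryConfig> = {}): RetryConfig {
  return { ...DEFAULT_RETRY_CONFIG, ...overrides };
}

const patient = resolveRetryConfig({ maxRetries: 5 });
console.log(patient.maxRetries, patient.baseDelayMs); // 5 30000
```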


ErrorClassification

Result of error classification analysis.

Properties

type

type: "recoverable" | "permanent" | "user-action-required"

Category of error determining recovery strategy

reason

reason: string

Human-readable explanation of the error

suggestedAction?

optional suggestedAction: string

Optional suggestion for user to resolve the issue

retryable

retryable: boolean

Whether this error can be retried automatically
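
To show how the fields fit together, here is a toy classifier that produces `ErrorClassification` values. The matching rules (the substrings "timeout", "econnreset", and "quota") and the reason strings are invented for illustration; the service's real classification rules are not described in this reference.

```typescript
interface ErrorClassification {
  type: "recoverable" | "permanent" | "user-action-required";
  reason: string;
  suggestedAction?: string;
  retryable: boolean;
}

// Hypothetical rules: transient network errors retry, quota errors need the
// user, and anything unrecognized is treated as permanent.
function classifyError(message: string): ErrorClassification {
  const text = message.toLowerCase();
  if (text.includes("timeout") || text.includes("econnreset")) {
    return { type: "recoverable", reason: "Transient network failure", retryable: true };
  }
  if (text.includes("quota")) {
    return {
      type: "user-action-required",
      reason: "Quota exhausted",
      suggestedAction: "Free up quota or wait for it to reset",
      retryable: false,
    };
  }
  return { type: "permanent", reason: "Unrecognized error", retryable: false };
}

console.log(classifyError("Request timeout after 30s").type); // "recoverable"
```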


RecoveryResult

Result of recovery operation.

Properties

success

success: boolean

Whether the recovery operation succeeded

action

action: string

Action taken or error code (e.g., “retry_scheduled”, “job_not_found”, “quota_exceeded”)

error?

optional error: string

Error message if recovery failed

retryScheduled?

optional retryScheduled: boolean

Whether a retry was successfully scheduled

nextRetryAt?

optional nextRetryAt: Date

Timestamp when the next retry will occur
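
Since most `RecoveryResult` fields are optional, callers should check them before use. The helper below is a hypothetical sketch of one way to turn a result into a log message; the interface is redeclared locally so the snippet stands alone.

```typescript
interface RecoveryResult {
  success: boolean;
  action: string;
  error?: string;
  retryScheduled?: boolean;
  nextRetryAt?: Date;
}

// Builds a human-readable summary, guarding every optional field.
function describeResult(result: RecoveryResult): string {
  if (result.success && result.retryScheduled && result.nextRetryAt) {
    return `Retry scheduled for ${result.nextRetryAt.toISOString()}`;
  }
  if (result.success) {
    return `Recovery succeeded (${result.action})`;
  }
  return `Recovery failed (${result.action}): ${result.error ?? "no error message"}`;
}

console.log(
  describeResult({
    success: true,
    action: "retry_scheduled",
    retryScheduled: true,
    nextRetryAt: new Date("2024-01-01T00:00:30.000Z"),
  })
); // "Retry scheduled for 2024-01-01T00:00:30.000Z"
```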
