# Background Jobs
TimeTiles uses Payload CMS's built-in job queue and workflow system for all asynchronous processing. Jobs and workflows are defined in `lib/jobs/` and registered in `lib/config/payload-shared-config.ts`.
## Key Behavior

- Auto-deletion: Completed jobs are automatically deleted (`deleteJobOnComplete: true`)
- Workflow-based orchestration: The ingestion pipeline uses 4 Payload Workflows, queued by collection `afterChange` hooks
- 3-queue architecture: `ingest` (user-facing workflows), `default` (trigger jobs), `maintenance` (scheduled system jobs)
- Production: 1 Docker worker container per queue via `pnpm payload jobs:run --cron --queue <name>`
- Development: `autoRun` processes all queues within the Next.js process
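The behavior above maps onto the `jobs` section of the Payload config. A minimal sketch, assuming the Payload 3.x jobs config shape; the import paths and helper names (`ingestTasks`, `ingestWorkflows`, `systemJobs`) are illustrative, not the project's actual exports:

```typescript
// Hypothetical excerpt of lib/config/payload-shared-config.ts (names illustrative)
import { buildConfig } from "payload";
import { ingestTasks, ingestWorkflows, systemJobs } from "../jobs"; // hypothetical exports

export default buildConfig({
  // ...collections, db adapter, etc.
  jobs: {
    // Completed job records are removed rather than kept around
    deleteJobOnComplete: true,
    tasks: [...ingestTasks, ...systemJobs],
    workflows: ingestWorkflows,
    // Development only: poll every queue inside the Next.js process
    autoRun:
      process.env.NODE_ENV === "development"
        ? [
            { queue: "ingest", cron: "* * * * *" },
            { queue: "default", cron: "* * * * *" },
            { queue: "maintenance", cron: "* * * * *" },
          ]
        : undefined,
  },
});
```

In production `autoRun` stays off and each queue is drained by its own worker container running `pnpm payload jobs:run --cron --queue <name>`.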
## Ingest Workflows
The ingestion pipeline is orchestrated by 4 Payload Workflows. Each workflow sequences multiple task handlers into a linear pipeline. See Data Ingestion Pipeline for detailed stage documentation.
| Workflow | Trigger | Pipeline |
|---|---|---|
| `manual-ingest` | `ingest-files` `afterChange` hook | dataset-detection, then per-sheet: analyze, detect-schema, validate, create-schema-version, geocode, create-events |
| `scheduled-ingest` | `schedule-manager` job | url-fetch, dataset-detection, then per-sheet pipeline |
| `scraper-ingest` | `schedule-manager` job | scraper-execution, dataset-detection, then per-sheet pipeline |
| `ingest-process` | `ingest-jobs` `afterChange` hook (NEEDS_REVIEW approval) | create-schema-version, geocode, create-events |
All ingest workflows run on the `ingest` queue with per-resource concurrency keys (e.g., `file:{id}`, `sched:{id}`).
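A collection `afterChange` hook that queues one of these workflows might look like the following; a sketch assuming the Payload 3.x `payload.jobs.queue` API, with illustrative field names:

```typescript
// Hypothetical hook for the ingest-files collection (names illustrative)
import type { CollectionAfterChangeHook } from "payload";

export const queueManualIngest: CollectionAfterChangeHook = async ({
  doc,
  operation,
  req,
}) => {
  // Only newly uploaded files should start the pipeline
  if (operation !== "create") return doc;

  await req.payload.jobs.queue({
    workflow: "manual-ingest",
    queue: "ingest",
    input: { fileId: doc.id },
  });

  return doc;
};
```

Queueing from the hook (rather than running the pipeline inline) keeps the request fast and lets the `ingest` worker pick the work up asynchronously.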
## Error Model

- Throw for transient failures: Payload retries the task
- Return `{ needsReview: true }` for human review: the pipeline pauses for that sheet
- Return data for success: the pipeline continues to the next task
- Multi-sheet files use `Promise.allSettled` with per-sheet try/catch; individual sheet failures do not block other sheets
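The per-sheet isolation can be seen in a self-contained simulation (no Payload APIs; sheet names and the three result states are illustrative): each sheet either completes, pauses for review, or fails, and `Promise.allSettled` plus a per-sheet try/catch keeps one sheet's failure from blocking the others.

```typescript
// Simulated per-sheet processing with the three outcomes from the error model
type SheetResult =
  | { status: "completed"; sheet: string }
  | { status: "needsReview"; sheet: string }
  | { status: "failed"; sheet: string; error: string };

async function processSheet(sheet: string): Promise<SheetResult> {
  try {
    if (sheet === "broken") throw new Error("transient parse failure");
    if (sheet === "ambiguous") return { status: "needsReview", sheet };
    return { status: "completed", sheet };
  } catch (err) {
    // Per-sheet try/catch: a failure becomes a result, not a rejection
    return { status: "failed", sheet, error: (err as Error).message };
  }
}

async function processAllSheets(sheets: string[]): Promise<SheetResult[]> {
  const settled = await Promise.allSettled(sheets.map(processSheet));
  // processSheet never rejects, so every entry is fulfilled
  return settled.map((s) =>
    s.status === "fulfilled"
      ? s.value
      : { status: "failed", sheet: "unknown", error: String(s.reason) },
  );
}

processAllSheets(["orders", "broken", "ambiguous"]).then((results) => {
  console.log(results.map((r) => r.status).join(","));
  // completed,failed,needsReview
});
```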
## Ingest Task Handlers
These task handlers are composed by the workflows above. They are not queued individually.
| Task | Purpose |
|---|---|
| `dataset-detection` | Parse file, create ingestion jobs per sheet |
| `analyze-duplicates` | Find internal/external duplicate rows |
| `schema-detection` | Build progressive JSON Schema from data |
| `validate-schema` | Compare detected vs existing schema |
| `create-schema-version` | Persist approved schema version |
| `geocode-batch` | Geocode unique locations via providers |
| `create-events-batch` | Create event records in database |
| `url-fetch` | Download file from URL for scheduled ingest |
| `scraper-execution` | Run scraper in Podman container |
## System Jobs

System jobs use Payload's native `schedule` property for cron-based scheduling.
| Job | Queue | Schedule |
|---|---|---|
| `schedule-manager` | `default` | Every minute |
| `quota-reset` | `maintenance` | Daily at midnight |
| `cache-cleanup` | `maintenance` | Every 6 hours |
| `schema-maintenance` | `maintenance` | Daily at 3:00 AM |
| `audit-log-ip-cleanup` | `maintenance` | Daily at 4:00 AM |
| `execute-account-deletion` | `maintenance` | Daily at 2:00 AM |
| `data-export-cleanup` | `maintenance` | Hourly |
| `cleanup-stuck-scheduled-ingests` | `maintenance` | Hourly |
| `cleanup-stuck-scrapers` | `maintenance` | Hourly |
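A scheduled system job pairs a cron expression with a queue directly on the task definition. A sketch, assuming the Payload 3.x `TaskConfig` shape and its `schedule` property; the handler body is illustrative:

```typescript
// Hypothetical cache-cleanup task using Payload's native schedule property
import type { TaskConfig } from "payload";

export const cacheCleanup: TaskConfig<"cache-cleanup"> = {
  slug: "cache-cleanup",
  // Run every 6 hours on the maintenance queue
  schedule: [{ cron: "0 */6 * * *", queue: "maintenance" }],
  handler: async ({ req }) => {
    // ...delete expired cache entries via req.payload here...
    return { output: {} };
  },
};
```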
## Standalone Task Jobs
These are queued on demand (not scheduled):
| Job | Purpose | Trigger |
|---|---|---|
| `scraper-repo-sync` | Sync scraper manifest from Git repo | Admin action |
| `data-export` | Generate ZIP archive of user data | User request |
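Queueing one of these on demand is a single local-API call. A sketch, assuming the Payload 3.x `payload.jobs.queue` API; the helper name, input fields, and `@payload-config` alias are illustrative:

```typescript
// Hypothetical server-side helper that queues a data export for a user
import { getPayload } from "payload";
import config from "@payload-config"; // path alias is an assumption

export async function requestDataExport(userId: string): Promise<void> {
  const payload = await getPayload({ config });
  await payload.jobs.queue({
    task: "data-export",
    input: { userId },
  });
}
```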
## Adding a New Job

- Create the handler in `lib/jobs/handlers/my-job.ts`
- Export the job config from `lib/jobs/ingest-jobs.ts`
- Add it to the `ALL_JOBS` array in `lib/config/payload-shared-config.ts`
- If it is a workflow task, add it to the appropriate workflow in `lib/jobs/workflows/`
- Create a migration if the job needs new fields
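A handler skeleton for the first step might look like this; a sketch assuming the Payload 3.x `TaskConfig` shape, with a hypothetical slug, input field, and retry count:

```typescript
// Hypothetical lib/jobs/handlers/my-job.ts
import type { TaskConfig } from "payload";

export const myJob: TaskConfig<"my-job"> = {
  slug: "my-job",
  inputSchema: [{ name: "targetId", type: "text", required: true }],
  retries: 2, // transient throws are retried before the job is marked failed
  handler: async ({ input, req }) => {
    // Throw for transient failures; return output for success (see Error Model)
    req.payload.logger.info(`my-job running for ${input.targetId}`);
    return { output: { processed: true } };
  },
};
```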
## Testing Jobs
See Integration Testing Patterns for job testing. Key points:
- Query pending jobs before running (`completedAt: { exists: false }`)
- Verify side effects after running, not job records (they are deleted)
- Use `describe.sequential()` for tests that interact with the job queue
- Use a drain loop with `payload.jobs.run()` to process chained workflow tasks in tests
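The drain loop can be a small test helper: run the queue, count jobs that have not completed, and repeat until none remain. A sketch, assuming `payload.jobs.run()`, Payload's default `payload-jobs` collection slug, and the local `count` API; the pass limit is an arbitrary safety cap:

```typescript
// Hypothetical test helper: drain chained workflow tasks until the queue is empty
import type { Payload } from "payload";

export const drainJobs = async (payload: Payload, maxPasses = 20): Promise<void> => {
  for (let pass = 0; pass < maxPasses; pass++) {
    // Each pass may queue follow-up tasks, so keep running until nothing is pending
    await payload.jobs.run({ queue: "ingest" });
    const pending = await payload.count({
      collection: "payload-jobs",
      where: { completedAt: { exists: false } },
    });
    if (pending.totalDocs === 0) return;
  }
  throw new Error("jobs did not drain within the pass limit");
};
```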