Troubleshooting
This guide helps you diagnose and resolve common issues with the TimeTiles data processing pipeline.
Common Issues
Import Stuck in Processing
Symptoms:
- Import status shows “processing” for an extended period
- Progress hasn’t updated in hours/days
- No error messages visible in UI
Possible Causes:
- Background job worker not running
- Job queue backlog
- Memory exhaustion
- Disk space full
- Database connection issues
- File access permissions
Diagnosis Steps:
- Check job queue status in admin interface
- Review error logs for recent errors
- Verify background workers are running
- Check system resources (memory, disk, CPU)
- Verify database connectivity
- Check file system permissions
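If you have server access, you can surface stalled imports directly instead of clicking through individual records. Below is a minimal sketch using the Payload local API; it assumes the standard @payload-config alias, an import-jobs collection slug matching the terminology in this guide, and illustrative stage values, so adjust all three to your setup.

```typescript
import { getPayload } from "payload";
import config from "@payload-config";

// List import jobs that are neither completed nor failed but have not been
// updated in over an hour -- likely candidates for a stalled worker.
// Collection slug and stage values are assumptions; match them to your config.
async function findStuckImports() {
  const payload = await getPayload({ config });
  const oneHourAgo = new Date(Date.now() - 60 * 60 * 1000).toISOString();

  const stuck = await payload.find({
    collection: "import-jobs",
    where: {
      and: [
        { stage: { not_in: ["completed", "failed"] } },
        { updatedAt: { less_than: oneHourAgo } },
      ],
    },
    limit: 50,
  });

  console.log(`${stuck.totalDocs} potentially stuck import(s)`);
  return stuck.docs;
}
```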
Resolution:
- Restart background job workers if stopped
- Clear job queue backlog by adding workers
- Increase memory allocation if exhausted
- Free up disk space if full
- Restore database connection
- Fix file permissions (workers typically need read access to the import files)
Schema Approval Needed
Symptoms:
- Import pauses at “await-approval” stage
- Notification about schema changes
- Admin interface shows pending approvals
Possible Causes:
- Breaking schema changes detected
- Dataset schema is locked
- Auto-approval disabled
- Manual approval required by policy
Diagnosis Steps:
- Check import-job for schemaValidation.breakingChanges (see the sketch after this list)
- Review dataset schemaConfig.locked setting
- Verify schemaConfig.autoApproveNonBreaking setting
- Review list of breaking changes
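If the admin UI does not make the cause obvious, the same fields can be read programmatically. A hedged sketch with the Payload local API (collection slugs are assumptions; the field paths mirror the diagnosis steps above):

```typescript
import { getPayload } from "payload";
import config from "@payload-config";

// Print the schema-approval state for one import job.
// Collection slugs ("import-jobs", "datasets") are assumptions.
async function inspectSchemaApproval(importJobId: string) {
  const payload = await getPayload({ config });

  // depth: 0 keeps relationship fields as raw IDs.
  const job = (await payload.findByID({
    collection: "import-jobs",
    id: importJobId,
    depth: 0,
  })) as any;
  console.log("Breaking changes:", job.schemaValidation?.breakingChanges);

  const dataset = (await payload.findByID({
    collection: "datasets",
    id: job.dataset,
  })) as any;
  console.log("Schema locked:", dataset.schemaConfig?.locked);
  console.log("Auto-approve non-breaking:", dataset.schemaConfig?.autoApproveNonBreaking);
}
```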
Understanding Change Types:
Breaking Changes (Require Approval):
- Field type changes (string → number)
- Required fields removed
- Constraint narrowing (smaller max length)
- Format changes (date format modifications)
- Enum value restrictions
Non-Breaking Changes (Can Auto-Approve):
- New optional fields
- Constraint expansion (larger max length)
- Enum value additions
- Type generalization (number → string)
Resolution Options:
- Approve Changes: Review and approve via admin interface
- Enable Auto-Grow: Set autoGrow=true for future auto-approval (see the sketch after this list)
- Add Transformations: Configure type transformations to handle mismatches
- Fix Source Data: Correct data to match existing schema
- Reject and Re-Import: Reject changes and prepare corrected file
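If you opt to let the schema evolve on its own, the flag can be flipped on the dataset record. A hedged sketch: whether autoGrow sits under schemaConfig, and the datasets collection slug, are assumptions, and depending on how partial group updates are handled you may need to re-supply the other schemaConfig sub-fields.

```typescript
import { getPayload } from "payload";
import config from "@payload-config";

// Allow non-breaking schema growth to be approved automatically.
// Field placement under schemaConfig is an assumption -- verify it against
// the datasets collection definition before running this.
async function enableAutoGrow(datasetId: string) {
  const payload = await getPayload({ config });
  return payload.update({
    collection: "datasets",
    id: datasetId,
    data: {
      schemaConfig: {
        autoGrow: true,
        autoApproveNonBreaking: true,
      },
    },
  });
}
```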
Duplicate Events Appearing
Symptoms:
- Same event appears multiple times
- Expected duplicates not being detected
- Deduplication not working as expected
Possible Causes:
- Incorrect ID strategy configuration
- External ID field path wrong
- Computed hash fields insufficient
- Deduplication disabled
- ID generation inconsistent
Diagnosis Steps:
- Check dataset.idStrategy.type configuration
- Verify external ID field path exists in data
- Review computed hash field selection
- Check dataset.deduplicationConfig.enabled
- Examine import-job.duplicates summary
ID Strategy Troubleshooting:
External ID Issues:
- Verify field path is correct (case-sensitive)
- Check that field exists in all rows
- Ensure field values are truly unique
- Verify field contains stable identifiers
Computed Hash Issues:
- Include enough fields to ensure uniqueness
- Avoid fields that change (timestamps, counters)
- Use stable, meaningful fields (name, date, location)
- Test that the field combination produces unique hashes
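As a reference point while reviewing your configuration, here is an illustrative shape for the two strategies discussed above. Property names beyond idStrategy and deduplicationConfig (such as externalIdPath, hashFields, and the strategy type values) are assumptions; confirm them against your datasets collection definition.

```typescript
// Strategy 1: trust a stable identifier supplied by the source system.
const externalIdDataset = {
  idStrategy: {
    type: "external",                      // assumed value
    externalIdPath: "properties.case_id",  // case-sensitive path into each row
  },
  deduplicationConfig: { enabled: true },
};

// Strategy 2: hash a combination of stable, meaningful fields.
// Avoid fields that change between exports (timestamps, counters).
const computedHashDataset = {
  idStrategy: {
    type: "computed",                      // assumed value
    hashFields: ["name", "date", "location.city"],
  },
  deduplicationConfig: { enabled: true },
};
```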
Resolution:
- Fix ID Strategy: Update configuration to correct strategy
- Add Missing Fields: Include more fields in computed hash
- Enable Deduplication: Set deduplicationConfig.enabled=true
- Clean Up Duplicates: Manually delete duplicate events
- Re-Import with Correct Config: Delete events and re-import
Geocoding Failures
Symptoms:
- Events created without coordinates
- Geocoding stage shows errors
- “Geocoding failed” messages in logs
Possible Causes:
- Invalid API key
- Rate limit exceeded
- Malformed addresses
- API service down
- Network connectivity issues
- Incorrect field detection
Diagnosis Steps:
- Check API key configuration and validity
- Review rate limit status and quotas
- Examine sample addresses for formatting
- Test API service manually
- Verify network connectivity
- Check geocodingCandidates in import-job
Geocoding Field Detection Issues:
Address Not Detected:
- Field name doesn’t match common patterns
- Add manual field mapping override
- Verify field contains actual addresses
Coordinates Not Detected:
- Latitude/longitude field names non-standard
- Values outside valid ranges (-90 to 90, -180 to 180)
- Fields contain non-numeric data
- Add manual coordinate field mappings (see the sketch below)
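When auto-detection misses your columns, an explicit mapping usually resolves it. The property names below are purely illustrative; check your dataset or import configuration for the actual override fields.

```typescript
// Manual override for geocoding field detection (names are illustrative).
const geocodingFieldOverrides = {
  // Point detection at the column that actually holds the street address.
  addressField: "site_address",
  // Explicit coordinate fields when the column names are non-standard.
  latitudeField: "geo_y",   // values must fall within -90 to 90
  longitudeField: "geo_x",  // values must fall within -180 to 180
};
```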
Resolution:
- Fix API Configuration: Update/renew API key
- Increase Rate Limits: Upgrade API plan or slow down processing
- Clean Address Data: Standardize address formatting
- Manual Field Mapping: Override auto-detection with explicit paths
- Retry Failed Geocoding: Re-run geocoding stage after fixes
- Switch Providers: Use different geocoding service
Schema Conflicts
Symptoms:
- Schema validation errors
- Type mismatch errors
- “Schema conflict” messages
Possible Causes:
- Type changes in source data
- Strict validation enabled
- Transformations not configured
- Data quality issues
- Schema locked
Diagnosis Steps:
- Check schemaValidation details in import-job
- Review dataset.schemaConfig.strictValidation
- Examine dataset.typeTransformations configuration
- Sample source data for type inconsistencies
- Check dataset.schemaConfig.locked
Common Schema Conflicts:
Type Mismatches:
- Previous imports had a numeric field; the new import delivers strings
- A field that previous imports treated as required is missing from the new import
- Date format changed between imports
Field Additions:
- New fields in data that don’t exist in schema
- Auto-grow disabled, blocking new fields
Constraint Violations:
- Values exceed existing min/max constraints
- Enum values outside allowed set
- String lengths exceed maxLength
Resolution:
- Add Transformations: Configure type transformations for known mismatches (see the sketch after this list)
- Enable Auto-Grow: Allow schema to grow with new optional fields
- Disable Strict Validation: Allow best-effort parsing
- Approve Changes: Manually approve schema changes
- Clean Source Data: Fix data quality issues at source
- Reset Schema: Delete schema versions and start fresh (destructive)
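For recurring, well-understood mismatches, a type transformation keeps imports flowing without relaxing validation. The exact shape of dataset.typeTransformations is not documented here, so treat the structure below as an illustrative sketch rather than the real configuration format.

```typescript
// Illustrative transformation: earlier imports stored "amount" as a number,
// but the new export delivers strings such as "1,204.50".
const typeTransformations = [
  {
    fieldPath: "amount",
    fromType: "string",
    toType: "number",
    // Strip thousands separators before parsing; fail the row if unparseable.
    transform: (value: string): number => {
      const parsed = Number.parseFloat(value.replace(/,/g, ""));
      if (Number.isNaN(parsed)) {
        throw new Error(`Cannot convert "${value}" to a number`);
      }
      return parsed;
    },
  },
];
```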
Memory Issues
Symptoms:
- Out of memory errors
- Process crashes during import
- Slow performance, swapping
Possible Causes:
- Batch sizes too large
- Too many concurrent imports
- Memory leak
- Insufficient system memory
- Large file processing
Diagnosis Steps:
- Monitor memory usage during imports
- Check batch size configuration
- Review concurrent import count
- Check for memory growth over time (leaks)
- Review file sizes being processed
Resolution:
- Reduce Batch Sizes: Lower the BATCH_SIZE_* environment variables (see the sketch after this list)
- Limit Concurrency: Reduce max concurrent imports
- Increase System Memory: Add more RAM to server
- Process Files in Chunks: Split large files before import
- Restart Workers Periodically: Mitigate potential memory leaks
- Optimize Transformations: Simplify custom transformation functions
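The exact variable names depend on your deployment, so treat the ones below as placeholders. A typical pattern is to read each batch size with a conservative fallback so a missing or malformed value cannot blow up memory use:

```typescript
// Read batch-size tuning from the environment with conservative defaults.
// Variable names are illustrative -- match them to your deployment config.
const parseBatchSize = (name: string, fallback: number): number => {
  const raw = process.env[name];
  const value = raw === undefined ? NaN : Number.parseInt(raw, 10);
  return Number.isFinite(value) && value > 0 ? value : fallback;
};

const EVENT_BATCH_SIZE = parseBatchSize("BATCH_SIZE_EVENTS", 100);
const GEOCODING_BATCH_SIZE = parseBatchSize("BATCH_SIZE_GEOCODING", 25);
```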
Performance Degradation
Symptoms:
- Imports taking much longer than before
- Slow batch processing
- High database CPU usage
- API timeouts
Possible Causes:
- Database performance issues
- Too many concurrent operations
- Geocoding API slow/rate-limited
- Large schema complexity
- Network latency
- Disk I/O bottleneck
Diagnosis Steps:
- Monitor batch processing times per stage
- Check database query performance and indexes
- Review geocoding API response times
- Measure network latency to external services
- Check disk I/O wait times
- Review schema depth and field counts
Resolution:
- Optimize Database: Add indexes, optimize queries, increase connection pool
- Scale Workers: Add more background job workers
- Upgrade API Plan: Increase geocoding rate limits
- Simplify Schema: Reduce max schema depth, limit field proliferation
- Improve Network: Use CDN, closer regions for APIs
- Faster Storage: Use SSD instead of HDD, increase IOPS
Row-Level Errors
Symptoms:
- Import completes but with errors
- Some rows missing from final events
- Error details in import-job
Possible Causes:
- Data validation failures
- Required fields missing
- Type conversion failures
- Constraint violations
- Malformed data
Diagnosis Steps:
- Review the import-job errors array (see the sketch after this list)
- Check which rows failed
- Examine error messages
- Sample failed rows from source file
- Review schema requirements
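To see exactly which rows failed and why, pull the errors array off the import-job record. A minimal sketch with the Payload local API (the shape of each error entry is an assumption):

```typescript
import { getPayload } from "payload";
import config from "@payload-config";

// Summarize row-level failures for one import job.
// The row/message properties on each error entry are assumptions.
async function summarizeRowErrors(importJobId: string) {
  const payload = await getPayload({ config });
  const job = (await payload.findByID({
    collection: "import-jobs",
    id: importJobId,
    depth: 0,
  })) as any;

  const errors: Array<{ row?: number; message?: string }> = job.errors ?? [];
  console.log(`${errors.length} row(s) failed`);
  for (const err of errors.slice(0, 20)) {
    console.log(`row ${err.row}: ${err.message}`);
  }
}
```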
Common Row Errors:
Missing Required Fields:
- Row missing fields marked as required in schema
- Empty strings or null values in required fields
Type Conversion Failures:
- Cannot parse string to expected type
- Invalid date formats
- Non-numeric values in number fields
Constraint Violations:
- Values outside min/max ranges
- String length exceeds maxLength
- Values not in enum set
Resolution:
- Fix Source Data: Correct problematic rows at source
- Make Fields Optional: Adjust schema to allow null values
- Add Transformations: Configure parsing for known patterns
- Relax Constraints: Expand min/max ranges, maxLength values
- Filter Invalid Rows: Pre-process file to remove invalid rows
- Manual Event Creation: Create events manually for failed rows
Debugging Tools
Version History
Purpose: Review complete processing progression
How to Use:
- Open import-job record in admin interface
- Navigate to “Versions” tab
- Review each stage transition
- Check timestamps to identify bottlenecks
- Examine state changes between versions
What to Look For:
- Long gaps between stage transitions (bottlenecks)
- Stage transitions that failed and retried
- Data changes (progress, errors, validation results)
Error Logs
Purpose: Detailed error information
How to Use:
- Check import-job.errors array for row-level errors
- Review application logs for system-level errors
- Filter by import-job ID for relevant entries
- Look for stack traces and error context
Error Types:
- Row-level: Individual row processing failures
- Batch-level: Entire batch failed
- Stage-level: Stage failed to complete
- System-level: Infrastructure failures
Performance Metrics
Purpose: Identify performance bottlenecks
How to Use:
- Review processing times per stage in import-job
- Check batch processing durations
- Monitor API response times
- Track database query performance
Key Metrics:
- Time per stage
- Rows processed per second
- API requests per minute
- Database query duration
Manual Intervention
Purpose: Resume or modify processing manually
How to Use:
- Update import-job stage via admin interface
- Queue specific job manually via API
- Modify configuration and retry
- Reset to previous stage if needed
When to Use:
- Automated recovery failed
- Need to skip problematic stage
- Testing configuration changes
- Recovering from corruption
Database Queries
Purpose: Direct inspection of processing state
How to Use:
- Query import-jobs collection for detailed state
- Check import-files for overall status
- Review dataset-schemas for schema history
- Examine events for final results
Useful Queries:
- Find all stuck imports
- Get imports in specific stage
- Check schema versions by dataset
- Count events per import
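The sketches below show what a couple of these queries could look like with the Payload local API. Collection slugs follow the terminology in this guide, and the relationship field names (dataset, importJob) are assumptions.

```typescript
import { getPayload } from "payload";
import config from "@payload-config";

async function pipelineSnapshot(datasetId: string, importJobId: string) {
  const payload = await getPayload({ config });

  // Imports sitting in a specific stage.
  const awaitingApproval = await payload.find({
    collection: "import-jobs",
    where: { stage: { equals: "await-approval" } },
    limit: 100,
  });

  // Schema versions recorded for one dataset, newest first.
  const schemaHistory = await payload.find({
    collection: "dataset-schemas",
    where: { dataset: { equals: datasetId } },
    sort: "-createdAt",
  });

  // Events produced by one import (limit: 1 -- only totalDocs is needed).
  const events = await payload.find({
    collection: "events",
    where: { importJob: { equals: importJobId } },
    limit: 1,
  });

  console.log({
    awaitingApproval: awaitingApproval.totalDocs,
    schemaVersions: schemaHistory.totalDocs,
    eventsForImport: events.totalDocs,
  });
}
```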
Recovery Procedures
Stage-Level Recovery
When to Use: Entire stage failed, need to retry from beginning of that stage
Steps:
- Identify last successful stage from import-job record
- Review error logs to understand failure cause
- Fix underlying issue (API key, permissions, etc.)
- Reset stage to previous successful state via admin interface
- Queue appropriate job to resume processing
- Monitor for successful completion
Cautions:
- May re-process data (ensure idempotency)
- Previous stage results should be intact
- Verify fix before resuming
Batch-Level Recovery
When to Use: Partial batch completed, need to resume from interruption point
Steps:
- Check progress.current vs progress.total in import-job
- Identify last successfully processed batch number
- Verify data integrity of partial results
- Queue job with correct batch number to resume
- Monitor progress to ensure continuous processing
- Verify final counts match expected
Cautions:
- Batch boundaries must align correctly
- Partial results may exist in database
- Check for duplicate processing
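The arithmetic for the resume point is simple but easy to get wrong under pressure. A hedged sketch, assuming progress.current and progress.total count rows and that batch numbering is zero-indexed:

```typescript
// Work out which batch to resume from after an interruption.
// Assumes progress counts rows and batches are zero-indexed; batchSize must
// match the configuration used for the original run.
function nextBatchToProcess(
  progress: { current: number; total: number },
  batchSize: number,
) {
  const resumeAtBatch = Math.floor(progress.current / batchSize);
  const totalBatches = Math.ceil(progress.total / batchSize);
  return { resumeAtBatch, totalBatches };
}

// Example: 2,500 of 10,000 rows processed with a batch size of 100
// -> resume at batch 25 of 100.
console.log(nextBatchToProcess({ current: 2500, total: 10000 }, 100));
```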
Complete Restart
When to Use: Import is corrupted beyond repair, need fresh start
Steps:
- Mark current import-job as failed
- Document what went wrong for postmortem
- Delete any partially created events (if needed)
- Apply lessons learned (fix config, transformations, etc.)
- Create new import-job from same file
- Monitor new import for successful completion
Cautions:
- May lose progress (starts from beginning)
- Duplicate events possible if not cleaned up
- Ensure underlying issue is fixed first
Data Integrity Recovery
When to Use: Corruption detected, need to validate/repair data
Steps:
- Identify scope of corruption (which events affected)
- Export affected events for backup
- Delete corrupted events
- Re-import from original file with corrected configuration
- Verify event counts and data integrity
- Compare before/after to ensure correctness
Cautions:
- Very destructive operation
- Always backup before deletion
- Test on staging first
Automatic Error Recovery
TimeTiles includes an automatic error recovery system for failed imports. See the Error Recovery documentation for complete details on:
- Error classification (recoverable, permanent, user-action-required)
- Automatic retry with exponential backoff
- Recovery API endpoints (/retry, /reset, /recommendations)
- Integration with scheduled imports
- Best practices for error recovery
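The endpoint names above come from the Error Recovery documentation, but the URL prefix and authentication depend on your deployment, so treat the path in this sketch as an assumption:

```typescript
// Trigger a retry for a failed import via the recovery API.
// The /api/import-jobs/:id prefix is an assumption -- check the Error Recovery
// documentation for the exact route and required authentication.
async function retryImport(importJobId: string, baseUrl = "http://localhost:3000") {
  const response = await fetch(`${baseUrl}/api/import-jobs/${importJobId}/retry`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
  });
  if (!response.ok) {
    throw new Error(`Retry request failed: ${response.status} ${response.statusText}`);
  }
  return response.json();
}
```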
Prevention Best Practices
Monitoring
- Set up alerts for stuck imports (>1 hour in same stage)
- Monitor error rates and investigate spikes
- Track performance metrics over time
- Regular review of pending approvals
Configuration
- Start with conservative settings
- Test configuration changes in staging first
- Document configuration decisions
- Version control configuration files
Data Quality
- Validate data before import when possible
- Maintain consistent data formats
- Communicate schema changes in advance
- Pre-process problematic data
Capacity Planning
- Monitor resource usage trends
- Scale before hitting limits
- Plan for peak import periods
- Test with production-scale data
These troubleshooting techniques and recovery procedures should help you diagnose and resolve most common pipeline issues effectively.