Troubleshooting
This guide helps you diagnose and resolve common issues with the TimeTiles data processing pipeline.
Common Issues
Import Stuck in Processing
Symptoms:
- Import status shows “processing” for an extended period
- Progress hasn’t updated in hours/days
- No error messages visible in UI
Possible Causes:
- Background job worker not running
- Job queue backlog
- Memory exhaustion
- Disk space full
- Database connection issues
- File access permissions
Diagnosis Steps:
- Check job queue status in admin interface
- Review error logs for recent errors
- Verify background workers are running
- Check system resources (memory, disk, CPU)
- Verify database connectivity
- Check file system permissions
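If you have server access, you can surface stalled imports directly instead of clicking through individual records. Below is a minimal sketch using the Payload local API; it assumes the standard @payload-config alias, an import-jobs collection slug matching the terminology in this guide, and illustrative stage values, so adjust all three to your setup.

```typescript
import { getPayload } from "payload";
import config from "@payload-config";

// List import jobs that are neither completed nor failed but have not been
// updated in over an hour -- likely candidates for a stalled worker.
// Collection slug and stage values are assumptions; match them to your config.
async function findStuckImports() {
  const payload = await getPayload({ config });
  const oneHourAgo = new Date(Date.now() - 60 * 60 * 1000).toISOString();

  const stuck = await payload.find({
    collection: "import-jobs",
    where: {
      and: [
        { stage: { not_in: ["completed", "failed"] } },
        { updatedAt: { less_than: oneHourAgo } },
      ],
    },
    limit: 50,
  });

  console.log(`${stuck.totalDocs} potentially stuck import(s)`);
  return stuck.docs;
}
```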
Resolution:
- Restart background job workers if stopped
- Clear job queue backlog by adding workers
- Increase memory allocation if exhausted
- Free up disk space if full
- Restore database connection
- Fix file permissions (workers typically need read access to the import files)
Schema Approval Needed
Symptoms:
- Import pauses at “await-approval” stage
- Notification about schema changes
- Admin interface shows pending approvals
Possible Causes:
- Breaking schema changes detected
- Dataset schema is locked
- Auto-approval disabled
- Manual approval required by policy
Diagnosis Steps:
- Check import-job for schemaValidation.breakingChanges (see the sketch after this list)
- Review dataset schemaConfig.locked setting
- Verify schemaConfig.autoApproveNonBreaking setting
- Review list of breaking changes
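If the admin UI does not make the cause obvious, the same fields can be read programmatically. A hedged sketch with the Payload local API (collection slugs are assumptions; the field paths mirror the diagnosis steps above):

```typescript
import { getPayload } from "payload";
import config from "@payload-config";

// Print the schema-approval state for one import job.
// Collection slugs ("import-jobs", "datasets") are assumptions.
async function inspectSchemaApproval(importJobId: string) {
  const payload = await getPayload({ config });

  // depth: 0 keeps relationship fields as raw IDs.
  const job = (await payload.findByID({
    collection: "import-jobs",
    id: importJobId,
    depth: 0,
  })) as any;
  console.log("Breaking changes:", job.schemaValidation?.breakingChanges);

  const dataset = (await payload.findByID({
    collection: "datasets",
    id: job.dataset,
  })) as any;
  console.log("Schema locked:", dataset.schemaConfig?.locked);
  console.log("Auto-approve non-breaking:", dataset.schemaConfig?.autoApproveNonBreaking);
}
```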
Understanding Change Types:
Breaking Changes (Require Approval):
- Field type changes (string → number)
- Required fields removed
- Constraint narrowing (smaller max length)
- Format changes (date format modifications)
- Enum value restrictions
Non-Breaking Changes (Can Auto-Approve):
- New optional fields
- Constraint expansion (larger max length)
- Enum value additions
- Type generalization (number → string)
Resolution Options:
- Approve Changes: Review and approve via admin interface
- Enable Auto-Grow: Set autoGrow=true for future auto-approval (see the sketch after this list)
- Add Transformations: Configure type transformations to handle mismatches
- Fix Source Data: Correct data to match existing schema
- Reject and Re-Import: Reject changes and prepare corrected file
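If you opt to let the schema evolve on its own, the flag can be flipped on the dataset record. A hedged sketch: whether autoGrow sits under schemaConfig, and the datasets collection slug, are assumptions, and depending on how partial group updates are handled you may need to re-supply the other schemaConfig sub-fields.

```typescript
import { getPayload } from "payload";
import config from "@payload-config";

// Allow non-breaking schema growth to be approved automatically.
// Field placement under schemaConfig is an assumption -- verify it against
// the datasets collection definition before running this.
async function enableAutoGrow(datasetId: string) {
  const payload = await getPayload({ config });
  return payload.update({
    collection: "datasets",
    id: datasetId,
    data: {
      schemaConfig: {
        autoGrow: true,
        autoApproveNonBreaking: true,
      },
    },
  });
}
```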
Duplicate Events Appearing
Symptoms:
- Same event appears multiple times
- Expected duplicates not being detected
- Deduplication not working as expected
Possible Causes:
- Incorrect ID strategy configuration
- External ID field path wrong
- Computed hash fields insufficient
- Deduplication disabled
- ID generation inconsistent
Diagnosis Steps:
- Check dataset.idStrategy.type configuration
- Verify external ID field path exists in data
- Review computed hash field selection
- Check dataset.deduplicationConfig.enabled
- Examine import-job.duplicates summary
ID Strategy Troubleshooting:
External ID Issues:
- Verify field path is correct (case-sensitive)
- Check that field exists in all rows
- Ensure field values are truly unique
- Verify field contains stable identifiers
Computed Hash Issues:
- Include enough fields to ensure uniqueness
- Avoid fields that change (timestamps, counters)
- Use stable, meaningful fields (name, date, location)
- Test that the field combination produces unique hashes
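As a reference point while reviewing your configuration, here is an illustrative shape for the two strategies discussed above. Property names beyond idStrategy and deduplicationConfig (such as externalIdPath, hashFields, and the strategy type values) are assumptions; confirm them against your datasets collection definition.

```typescript
// Strategy 1: trust a stable identifier supplied by the source system.
const externalIdDataset = {
  idStrategy: {
    type: "external",                      // assumed value
    externalIdPath: "properties.case_id",  // case-sensitive path into each row
  },
  deduplicationConfig: { enabled: true },
};

// Strategy 2: hash a combination of stable, meaningful fields.
// Avoid fields that change between exports (timestamps, counters).
const computedHashDataset = {
  idStrategy: {
    type: "computed",                      // assumed value
    hashFields: ["name", "date", "location.city"],
  },
  deduplicationConfig: { enabled: true },
};
```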
Resolution:
- Fix ID Strategy: Update configuration to correct strategy
- Add Missing Fields: Include more fields in computed hash
- Enable Deduplication: Set deduplicationConfig.enabled=true
- Clean Up Duplicates: Manually delete duplicate events
- Re-Import with Correct Config: Delete events and re-import
Geocoding Failures
Symptoms:
- Events created without coordinates
- Geocoding stage shows errors
- “Geocoding failed” messages in logs
Possible Causes:
- Invalid API key
- Rate limit exceeded
- Malformed addresses
- API service down
- Network connectivity issues
- Incorrect field detection
Diagnosis Steps:
- Check API key configuration and validity
- Review rate limit status and quotas
- Examine sample addresses for formatting
- Test API service manually
- Verify network connectivity
- Check geocodingCandidates in import-job
Geocoding Field Detection Issues:
Address Not Detected:
- Field name doesn’t match common patterns
- Add manual field mapping override
- Verify field contains actual addresses
Coordinates Not Detected:
- Latitude/longitude field names non-standard
- Values outside valid ranges (-90 to 90, -180 to 180)
- Fields contain non-numeric data
- Add manual coordinate field mappings (see the sketch below)
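When auto-detection misses your columns, an explicit mapping usually resolves it. The property names below are purely illustrative; check your dataset or import configuration for the actual override fields.

```typescript
// Manual override for geocoding field detection (names are illustrative).
const geocodingFieldOverrides = {
  // Point detection at the column that actually holds the street address.
  addressField: "site_address",
  // Explicit coordinate fields when the column names are non-standard.
  latitudeField: "geo_y",   // values must fall within -90 to 90
  longitudeField: "geo_x",  // values must fall within -180 to 180
};
```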
Resolution:
- Fix API Configuration: Update/renew API key
- Increase Rate Limits: Upgrade API plan or slow down processing
- Clean Address Data: Standardize address formatting
- Manual Field Mapping: Override auto-detection with explicit paths
- Retry Failed Geocoding: Re-run geocoding stage after fixes
- Switch Providers: Use different geocoding service
Schema Conflicts
Symptoms:
- Schema validation errors
- Type mismatch errors
- “Schema conflict” messages
Possible Causes:
- Type changes in source data
- Strict validation enabled
- Transformations not configured
- Data quality issues
- Schema locked
Diagnosis Steps:
- Check schemaValidation details in import-job
- Review dataset.schemaConfig.strictValidation
- Examine dataset.typeTransformations configuration
- Sample source data for type inconsistencies
- Check dataset.schemaConfig.locked
Common Schema Conflicts:
Type Mismatches:
- Previous imports had a numeric field; the new import delivers strings
- A field that previous imports treated as required is missing from the new import
- Date format changed between imports
Field Additions:
- New fields in data that don’t exist in schema
- Auto-grow disabled, blocking new fields
Constraint Violations:
- Values exceed existing min/max constraints
- Enum values outside allowed set
- String lengths exceed maxLength
Resolution:
- Add Transformations: Configure type transformations for known mismatches (see the sketch after this list)
- Enable Auto-Grow: Allow schema to grow with new optional fields
- Disable Strict Validation: Allow best-effort parsing
- Approve Changes: Manually approve schema changes
- Clean Source Data: Fix data quality issues at source
- Reset Schema: Delete schema versions and start fresh (destructive)
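For recurring, well-understood mismatches, a type transformation keeps imports flowing without relaxing validation. The exact shape of dataset.typeTransformations is not documented here, so treat the structure below as an illustrative sketch rather than the real configuration format.

```typescript
// Illustrative transformation: earlier imports stored "amount" as a number,
// but the new export delivers strings such as "1,204.50".
const typeTransformations = [
  {
    fieldPath: "amount",
    fromType: "string",
    toType: "number",
    // Strip thousands separators before parsing; fail the row if unparseable.
    transform: (value: string): number => {
      const parsed = Number.parseFloat(value.replace(/,/g, ""));
      if (Number.isNaN(parsed)) {
        throw new Error(`Cannot convert "${value}" to a number`);
      }
      return parsed;
    },
  },
];
```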
Memory Issues
Symptoms:
- Out of memory errors
- Process crashes during import
- Slow performance, swapping
Possible Causes:
- Batch sizes too large
- Too many concurrent imports
- Memory leak
- Insufficient system memory
- Large file processing
Diagnosis Steps:
- Monitor memory usage during imports
- Check batch size configuration
- Review concurrent import count
- Check for memory growth over time (leaks)
- Review file sizes being processed
Resolution:
- Reduce Batch Sizes: Lower the BATCH_SIZE_* environment variables (see the sketch after this list)
- Limit Concurrency: Reduce max concurrent imports
- Increase System Memory: Add more RAM to server
- Process Files in Chunks: Split large files before import
- Restart Workers Periodically: Mitigate potential memory leaks
- Optimize Transformations: Simplify custom transformation functions
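The exact variable names depend on your deployment, so treat the ones below as placeholders. A typical pattern is to read each batch size with a conservative fallback so a missing or malformed value cannot blow up memory use:

```typescript
// Read batch-size tuning from the environment with conservative defaults.
// Variable names are illustrative -- match them to your deployment config.
const parseBatchSize = (name: string, fallback: number): number => {
  const raw = process.env[name];
  const value = raw === undefined ? NaN : Number.parseInt(raw, 10);
  return Number.isFinite(value) && value > 0 ? value : fallback;
};

const EVENT_BATCH_SIZE = parseBatchSize("BATCH_SIZE_EVENTS", 100);
const GEOCODING_BATCH_SIZE = parseBatchSize("BATCH_SIZE_GEOCODING", 25);
```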
Performance Degradation
Symptoms:
- Imports taking much longer than before
- Slow batch processing
- High database CPU usage
- API timeouts
Possible Causes:
- Database performance issues
- Too many concurrent operations
- Geocoding API slow/rate-limited
- Large schema complexity
- Network latency
- Disk I/O bottleneck
Diagnosis Steps:
- Monitor batch processing times per stage
- Check database query performance and indexes
- Review geocoding API response times
- Measure network latency to external services
- Check disk I/O wait times
- Review schema depth and field counts
Resolution:
- Optimize Database: Add indexes, optimize queries, increase connection pool
- Scale Workers: Add more background job workers
- Upgrade API Plan: Increase geocoding rate limits
- Simplify Schema: Reduce max schema depth, limit field proliferation
- Improve Network: Use CDN, closer regions for APIs
- Faster Storage: Use SSD instead of HDD, increase IOPS
Row-Level Errors
Symptoms:
- Import completes but with errors
- Some rows missing from final events
- Error details in import-job
Possible Causes:
- Data validation failures
- Required fields missing
- Type conversion failures
- Constraint violations
- Malformed data
Diagnosis Steps:
- Review the import-job errors array (see the sketch after this list)
- Check which rows failed
- Examine error messages
- Sample failed rows from source file
- Review schema requirements
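To see exactly which rows failed and why, pull the errors array off the import-job record. A minimal sketch with the Payload local API (the shape of each error entry is an assumption):

```typescript
import { getPayload } from "payload";
import config from "@payload-config";

// Summarize row-level failures for one import job.
// The row/message properties on each error entry are assumptions.
async function summarizeRowErrors(importJobId: string) {
  const payload = await getPayload({ config });
  const job = (await payload.findByID({
    collection: "import-jobs",
    id: importJobId,
    depth: 0,
  })) as any;

  const errors: Array<{ row?: number; message?: string }> = job.errors ?? [];
  console.log(`${errors.length} row(s) failed`);
  for (const err of errors.slice(0, 20)) {
    console.log(`row ${err.row}: ${err.message}`);
  }
}
```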
Common Row Errors:
Missing Required Fields:
- Row missing fields marked as required in schema
- Empty strings or null values in required fields
Type Conversion Failures:
- Cannot parse string to expected type
- Invalid date formats
- Non-numeric values in number fields
Constraint Violations:
- Values outside min/max ranges
- String length exceeds maxLength
- Values not in enum set
Resolution:
- Fix Source Data: Correct problematic rows at source
- Make Fields Optional: Adjust schema to allow null values
- Add Transformations: Configure parsing for known patterns
- Relax Constraints: Expand min/max ranges, maxLength values
- Filter Invalid Rows: Pre-process file to remove invalid rows
- Manual Event Creation: Create events manually for failed rows
Debugging Tools
Version History
Purpose: Review complete processing progression
How to Use:
- Open import-job record in admin interface
- Navigate to “Versions” tab
- Review each stage transition
- Check timestamps to identify bottlenecks
- Examine state changes between versions
What to Look For:
- Long gaps between stage transitions (bottlenecks)
- Stage transitions that failed and retried
- Data changes (progress, errors, validation results)
Error Logs
Purpose: Detailed error information
How to Use:
- Check import-job.errors array for row-level errors
- Review application logs for system-level errors
- Filter by import-job ID for relevant entries
- Look for stack traces and error context
Error Types:
- Row-level: Individual row processing failures
- Batch-level: Entire batch failed
- Stage-level: Stage failed to complete
- System-level: Infrastructure failures
Performance Metrics
Purpose: Identify performance bottlenecks
How to Use:
- Review processing times per stage in import-job
- Check batch processing durations
- Monitor API response times
- Track database query performance
Key Metrics:
- Time per stage
- Rows processed per second
- API requests per minute
- Database query duration
Manual Intervention
Purpose: Resume or modify processing manually
How to Use:
- Update import-job stage via admin interface
- Queue specific job manually via API
- Modify configuration and retry
- Reset to previous stage if needed
When to Use:
- Automated recovery failed
- Need to skip problematic stage
- Testing configuration changes
- Recovering from corruption
Database Queries
Purpose: Direct inspection of processing state
How to Use:
- Query import-jobs collection for detailed state
- Check import-files for overall status
- Review dataset-schemas for schema history
- Examine events for final results
Useful Queries:
- Find all stuck imports
- Get imports in specific stage
- Check schema versions by dataset
- Count events per import
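The sketches below show what a couple of these queries could look like with the Payload local API. Collection slugs follow the terminology in this guide, and the relationship field names (dataset, importJob) are assumptions.

```typescript
import { getPayload } from "payload";
import config from "@payload-config";

async function pipelineSnapshot(datasetId: string, importJobId: string) {
  const payload = await getPayload({ config });

  // Imports sitting in a specific stage.
  const awaitingApproval = await payload.find({
    collection: "import-jobs",
    where: { stage: { equals: "await-approval" } },
    limit: 100,
  });

  // Schema versions recorded for one dataset, newest first.
  const schemaHistory = await payload.find({
    collection: "dataset-schemas",
    where: { dataset: { equals: datasetId } },
    sort: "-createdAt",
  });

  // Events produced by one import (limit: 1 -- only totalDocs is needed).
  const events = await payload.find({
    collection: "events",
    where: { importJob: { equals: importJobId } },
    limit: 1,
  });

  console.log({
    awaitingApproval: awaitingApproval.totalDocs,
    schemaVersions: schemaHistory.totalDocs,
    eventsForImport: events.totalDocs,
  });
}
```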
Recovery Procedures
Stage-Level Recovery
When to Use: Entire stage failed, need to retry from beginning of that stage
Steps:
- Identify last successful stage from import-job record
- Review error logs to understand failure cause
- Fix underlying issue (API key, permissions, etc.)
- Reset stage to previous successful state via admin interface
- Queue appropriate job to resume processing
- Monitor for successful completion
Cautions:
- May re-process data (ensure idempotency)
- Previous stage results should be intact
- Verify fix before resuming
Batch-Level Recovery
When to Use: Partial batch completed, need to resume from interruption point
Steps:
- Check progress.current vs progress.total in import-job
- Identify last successfully processed batch number
- Verify data integrity of partial results
- Queue job with correct batch number to resume
- Monitor progress to ensure continuous processing
- Verify final counts match expected
Cautions:
- Batch boundaries must align correctly
- Partial results may exist in database
- Check for duplicate processing
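The arithmetic for the resume point is simple but easy to get wrong under pressure. A hedged sketch, assuming progress.current and progress.total count rows and that batch numbering is zero-indexed:

```typescript
// Work out which batch to resume from after an interruption.
// Assumes progress counts rows and batches are zero-indexed; batchSize must
// match the configuration used for the original run.
function nextBatchToProcess(
  progress: { current: number; total: number },
  batchSize: number,
) {
  const resumeAtBatch = Math.floor(progress.current / batchSize);
  const totalBatches = Math.ceil(progress.total / batchSize);
  return { resumeAtBatch, totalBatches };
}

// Example: 2,500 of 10,000 rows processed with a batch size of 100
// -> resume at batch 25 of 100.
console.log(nextBatchToProcess({ current: 2500, total: 10000 }, 100));
```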
Complete Restart
When to Use: Import is corrupted beyond repair, need fresh start
Steps:
- Mark current import-job as failed
- Document what went wrong for postmortem
- Delete any partially created events (if needed)
- Apply lessons learned (fix config, transformations, etc.)
- Create new import-job from same file
- Monitor new import for successful completion
Cautions:
- May lose progress (starts from beginning)
- Duplicate events possible if not cleaned up
- Ensure underlying issue is fixed first
Data Integrity Recovery
When to Use: Corruption detected, need to validate/repair data
Steps:
- Identify scope of corruption (which events affected)
- Export affected events for backup
- Delete corrupted events
- Re-import from original file with corrected configuration
- Verify event counts and data integrity
- Compare before/after to ensure correctness
Cautions:
- Very destructive operation
- Always backup before deletion
- Test on staging first
Automatic Error Recovery
TimeTiles includes an automatic error recovery system for failed imports. See the Error Recovery documentation for complete details on:
- Error classification (recoverable, permanent, user-action-required)
- Automatic retry with exponential backoff
- Recovery API endpoints (/retry, /reset, /recommendations)
- Integration with scheduled imports
- Best practices for error recovery
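The endpoint names above come from the Error Recovery documentation, but the URL prefix and authentication depend on your deployment, so treat the path in this sketch as an assumption:

```typescript
// Trigger a retry for a failed import via the recovery API.
// The /api/import-jobs/:id prefix is an assumption -- check the Error Recovery
// documentation for the exact route and required authentication.
async function retryImport(importJobId: string, baseUrl = "http://localhost:3000") {
  const response = await fetch(`${baseUrl}/api/import-jobs/${importJobId}/retry`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
  });
  if (!response.ok) {
    throw new Error(`Retry request failed: ${response.status} ${response.statusText}`);
  }
  return response.json();
}
```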
Prevention Best Practices
Monitoring
- Set up alerts for stuck imports (>1 hour in same stage)
- Monitor error rates and investigate spikes
- Track performance metrics over time
- Regular review of pending approvals
Configuration
- Start with conservative settings
- Test configuration changes in staging first
- Document configuration decisions
- Version control configuration files
Data Quality
- Validate data before import when possible
- Maintain consistent data formats
- Communicate schema changes in advance
- Pre-process problematic data
Capacity Planning
- Monitor resource usage trends
- Scale before hitting limits
- Plan for peak import periods
- Test with production-scale data
These troubleshooting techniques and recovery procedures should help you diagnose and resolve most common pipeline issues effectively.