lib/jobs/handlers/analyze-duplicates-job
Defines the job handler for analyzing duplicates in an imported file.
This job performs two types of duplicate detection:
- Internal Duplicates: Identifies rows within the same import file that are duplicates of each other based on the dataset’s unique ID strategy.
- External Duplicates: Checks for rows in the import file that are duplicates of existing events already in the database for the same dataset.
The results, including lists of duplicate rows and a summary, are stored in the corresponding import-jobs document.
If deduplication is disabled for the dataset, the job skips the analysis and proceeds to the next stage.
Upon completion, it transitions the import job to the SCHEMA_DETECTION stage.
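The two detection passes described above can be sketched as a single loop over the file's rows. This is a minimal illustration, not the handler's actual implementation: `Row`, `DuplicateSummary`, and `summarizeDuplicates` are hypothetical names, each row's `uniqueId` is assumed to be precomputed from the dataset's unique ID strategy, and the existing event IDs are assumed to be already loaded into a `Set`. Counting `uniqueRows` as the rows that are neither internal nor external duplicates is also an assumption.

```typescript
// Hypothetical types and helper; none of these names come from the real handler.
interface Row {
  rowNumber: number;
  uniqueId: string; // assumed: derived beforehand from the dataset's unique ID strategy
}

interface DuplicateSummary {
  totalRows: number;
  uniqueRows: number;
  internalDuplicates: number;
  externalDuplicates: number;
}

function summarizeDuplicates(rows: Row[], existingIds: Set<string>): DuplicateSummary {
  const seen = new Set<string>();
  let internalDuplicates = 0;
  let externalDuplicates = 0;

  for (const row of rows) {
    if (seen.has(row.uniqueId)) {
      // A later row repeating an ID already seen in this file.
      internalDuplicates++;
      continue;
    }
    seen.add(row.uniqueId);
    if (existingIds.has(row.uniqueId)) {
      // First occurrence in the file, but the ID already exists in the database.
      externalDuplicates++;
    }
  }

  return {
    totalRows: rows.length,
    // Assumption: a row is "unique" when it is neither kind of duplicate.
    uniqueRows: rows.length - internalDuplicates - externalDuplicates,
    internalDuplicates,
    externalDuplicates,
  };
}
```

In this sketch a row that repeats an earlier row in the file is counted only as an internal duplicate, even if its ID also exists in the database. The real job may classify such rows differently.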
Variables
analyzeDuplicatesJob
const analyzeDuplicatesJob: object
Type Declaration
slug
slug: "analyze-duplicates" = JOB_TYPES.ANALYZE_DUPLICATES
handler()
handler: (context) => Promise<{ output: { skipped: boolean; }; } | { output: { totalRows: number; uniqueRows: number; internalDuplicates: number; externalDuplicates: number; }; }>
Parameters
context
Returns
Promise<{ output: { skipped: boolean; }; } | { output: { totalRows: number; uniqueRows: number; internalDuplicates: number; externalDuplicates: number; }; }>
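Because the handler resolves to one of two output shapes, a caller has to tell them apart before reading the counts. The sketch below shows one way to do that with TypeScript's `in` narrowing. `AnalyzeDuplicatesResult` and `describeResult` are hypothetical names introduced for illustration; only the two output shapes are taken from the signature above.

```typescript
// Hypothetical alias mirroring the handler's documented return type.
type AnalyzeDuplicatesResult =
  | { output: { skipped: boolean } }
  | {
      output: {
        totalRows: number;
        uniqueRows: number;
        internalDuplicates: number;
        externalDuplicates: number;
      };
    };

// Hypothetical helper: turns either result shape into a readable message.
function describeResult(result: AnalyzeDuplicatesResult): string {
  const { output } = result;
  if ("skipped" in output) {
    // Narrowed to the skipped shape (deduplication disabled for the dataset).
    return output.skipped ? "analysis skipped" : "analysis not skipped";
  }
  // Narrowed to the summary shape.
  return (
    `${output.uniqueRows} of ${output.totalRows} rows unique ` +
    `(${output.internalDuplicates} internal, ${output.externalDuplicates} external duplicates)`
  );
}
```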