lib/jobs/handlers/analyze-duplicates-job
Defines the job handler for analyzing duplicates in an imported file.
This job performs two types of duplicate detection:
- Internal Duplicates: Identifies rows within the same import file that are duplicates of each other based on the dataset’s unique ID strategy.
- External Duplicates: Identifies rows in the import file that duplicate events already stored in the database for the same dataset.
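The two checks above can be sketched as a single pass over the rows. This is a minimal illustration, not the handler's actual code. `classifyDuplicates`, `getUniqueId` (standing in for the dataset's unique ID strategy) and `existingIds` (standing in for the IDs of events already stored for the dataset, assumed to be prefetched into a set) are hypothetical names. The sketch also assumes that the first occurrence of an ID within the file is kept while later ones are flagged as internal duplicates, and that a database match takes priority, so each row lands in exactly one bucket.

```typescript
// Hypothetical row shape; the real row type is not shown on this page.
type Row = Record<string, unknown>;

interface DuplicateClassification {
  internal: number[]; // indices of rows repeating an ID seen earlier in the same file
  external: number[]; // indices of rows whose ID already exists in the database
  unique: number[]; // indices of all remaining rows
}

// Hypothetical helper: `getUniqueId` stands in for the dataset's unique ID strategy,
// `existingIds` for IDs of events already stored for the same dataset.
function classifyDuplicates(
  rows: Row[],
  getUniqueId: (row: Row) => string,
  existingIds: ReadonlySet<string>,
): DuplicateClassification {
  const seen = new Set<string>();
  const result: DuplicateClassification = { internal: [], external: [], unique: [] };

  rows.forEach((row, index) => {
    const id = getUniqueId(row);
    if (existingIds.has(id)) {
      // Assumption: a database match wins over an in-file repeat.
      result.external.push(index);
    } else if (seen.has(id)) {
      // The first occurrence was kept; this later one is an internal duplicate.
      result.internal.push(index);
    } else {
      seen.add(id);
      result.unique.push(index);
    }
  });

  return result;
}

// Usage: IDs taken from a hypothetical `id` column.
const sampleRows: Row[] = [{ id: "a" }, { id: "b" }, { id: "a" }, { id: "c" }];
const sample = classifyDuplicates(sampleRows, (row) => String(row.id), new Set(["c"]));
// sample.internal is [2], sample.external is [3], sample.unique is [0, 1]
```

Classifying each row exactly once keeps the counts additive (total = unique + internal + external). The real job may report overlapping lists instead; this page does not say.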
The results, including lists of duplicate rows and a summary, are stored in the corresponding import-jobs document.
If deduplication is disabled for the dataset, the job skips the analysis and proceeds to the next stage.
Upon completion, it transitions the import job to the SCHEMA_DETECTION stage.
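The skip-or-analyze flow and the final stage transition might look like the following sketch. `ImportJobDoc`, `runAnalyzeDuplicatesStage` and the `analyze` callback are hypothetical stand-ins: the real handler is asynchronous, receives a `context` argument and writes to the import-jobs document, none of which is modeled here. The `uniqueRows` formula assumes the internal and external lists are disjoint.

```typescript
interface DuplicateSummary {
  totalRows: number;
  uniqueRows: number;
  internalDuplicates: number;
  externalDuplicates: number;
}

// Hypothetical in-memory stand-in for an import-jobs document.
interface ImportJobDoc {
  stage: string;
  duplicates?: { internal: number[]; external: number[]; summary: DuplicateSummary };
}

// Hypothetical result of the row-level analysis (row indices per bucket).
interface RowAnalysis {
  totalRows: number;
  internal: number[];
  external: number[];
}

type StageOutput = { output: { skipped: true } } | { output: DuplicateSummary };

function runAnalyzeDuplicatesStage(
  job: ImportJobDoc,
  deduplicationEnabled: boolean,
  analyze: () => RowAnalysis, // only invoked when deduplication is enabled
): StageOutput {
  if (!deduplicationEnabled) {
    // Deduplication is off: skip the analysis and advance the job.
    job.stage = "SCHEMA_DETECTION";
    return { output: { skipped: true } };
  }

  const { totalRows, internal, external } = analyze();
  const summary: DuplicateSummary = {
    totalRows,
    // Assumption: internal and external lists never share a row.
    uniqueRows: totalRows - internal.length - external.length,
    internalDuplicates: internal.length,
    externalDuplicates: external.length,
  };

  // Store the duplicate row lists and the summary, then advance the job.
  job.duplicates = { internal, external, summary };
  job.stage = "SCHEMA_DETECTION";
  return { output: summary };
}
```

Passing the analysis as a callback makes the "skip" path cheap: when deduplication is disabled, no rows are examined at all.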
Variables
analyzeDuplicatesJob
const analyzeDuplicatesJob: object
Type declaration
slug
slug: "analyze-duplicates" = JOB_TYPES.ANALYZE_DUPLICATES
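The documentation renders `slug` with the literal type "analyze-duplicates" and the initializer `JOB_TYPES.ANALYZE_DUPLICATES`. One common way to get that combination is a `JOB_TYPES` object declared with `as const`. The sketch below assumes that; its `JOB_TYPES` and `analyzeDuplicatesJobSketch` are hypothetical stand-ins, not the real definitions.

```typescript
// Hypothetical stand-in: the real JOB_TYPES object lives elsewhere in the codebase.
const JOB_TYPES = {
  ANALYZE_DUPLICATES: "analyze-duplicates",
} as const;

// Because of `as const`, the property keeps the literal type instead of widening to `string`.
const analyzeDuplicatesJobSketch = {
  slug: JOB_TYPES.ANALYZE_DUPLICATES,
};

// This assignment type-checks only because `slug` has the literal type "analyze-duplicates".
const slugCheck: "analyze-duplicates" = analyzeDuplicatesJobSketch.slug;
console.log(slugCheck); // prints "analyze-duplicates"
```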
handler()
handler: (context) => Promise<{ output: { skipped: boolean; totalRows?: undefined; uniqueRows?: undefined; internalDuplicates?: undefined; externalDuplicates?: undefined; }; } | { output: { skipped?: undefined; totalRows: number; uniqueRows: number; internalDuplicates: number; externalDuplicates: number; }; }>
Parameters
context
Returns
Promise<{ output: { skipped: boolean; totalRows?: undefined; uniqueRows?: undefined; internalDuplicates?: undefined; externalDuplicates?: undefined; }; } | { output: { skipped?: undefined; totalRows: number; uniqueRows: number; internalDuplicates: number; externalDuplicates: number; }; }>
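A caller awaiting this Promise has to tell the two variants apart before reading the counts. Because the first variant types `skipped` as `boolean`, a truthiness check on `skipped` does not rule that variant out (it could be `false`). Testing whether `totalRows` is `undefined` does, since `totalRows` is `undefined` in one variant and `number` in the other. The alias `AnalyzeDuplicatesResult` and the function `describeDuplicateOutput` below are hypothetical; the alias simply restates the documented resolved value.

```typescript
// Hypothetical alias restating the documented resolved value of handler().
type AnalyzeDuplicatesResult =
  | {
      output: {
        skipped: boolean;
        totalRows?: undefined;
        uniqueRows?: undefined;
        internalDuplicates?: undefined;
        externalDuplicates?: undefined;
      };
    }
  | {
      output: {
        skipped?: undefined;
        totalRows: number;
        uniqueRows: number;
        internalDuplicates: number;
        externalDuplicates: number;
      };
    };

function describeDuplicateOutput(result: AnalyzeDuplicatesResult): string {
  const { output } = result;
  if (output.totalRows === undefined) {
    // Skipped variant: no counts are reported.
    return `skipped=${output.skipped}`;
  }
  // `output` is now narrowed to the summary variant, so every count is a number.
  const duplicates = output.internalDuplicates + output.externalDuplicates;
  return `${output.uniqueRows}/${output.totalRows} unique rows, ${duplicates} duplicates`;
}
```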