lib/services/schema-detection/types
Core types for the schema detection plugin.
This module defines the interfaces for schema detectors, detection context, and detection results used throughout the plugin.
Interfaces
SchemaDetector
A schema detector is a single plugin that handles ALL detection for a file or dataset. It works like a geocoding provider: you select one, and it does the whole job.
Properties
name
name: string
Unique detector name (used for selection and DB storage)
label
label: string
Human-readable label for admin UI
description?
optional description?: string
Description for admin UI
canHandle
canHandle: (context) => boolean | Promise<boolean>
Checks whether this detector can handle the given input. Return false to fall back to the default detector.
Parameters
context
Returns
boolean | Promise<boolean>
detect
detect: (context) => DetectionResult | Promise<DetectionResult>
Performs ALL detection in a single call, returning language, field mappings, and patterns together.
Parameters
context
Returns
DetectionResult | Promise<DetectionResult>
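The following is a minimal, hypothetical detector that illustrates the contract. The types are re-declared locally and trimmed to what the sketch uses; real code would import them from this module. The detector name `title-column` and its fixed English language result are illustrative assumptions and not part of the plugin.

```typescript
// Local, trimmed re-declarations of the documented types so this sketch is
// self-contained; real code would import them from this module instead.
interface FieldMapping { path: string; confidence: number }
interface LanguageResult { code: string; name: string; confidence: number; isReliable: boolean }
interface DetectorConfig { enabled: boolean; priority: number; options?: Record<string, unknown> }
interface DetectionContext {
  fieldStats: Record<string, unknown>; // FieldStatistics treated as opaque here
  sampleData: Record<string, unknown>[];
  headers: string[];
  config: DetectorConfig;
}
interface DetectionResult {
  language: LanguageResult;
  fieldMappings: {
    title: FieldMapping | null;
    description: FieldMapping | null;
    timestamp: FieldMapping | null;
    locationName: FieldMapping | null;
    geo: null; // geo detection is left out of this sketch
  };
  patterns: { idFields: string[]; enumFields: string[] };
}
interface SchemaDetector {
  name: string;
  label: string;
  description?: string;
  canHandle: (context: DetectionContext) => boolean | Promise<boolean>;
  detect: (context: DetectionContext) => DetectionResult | Promise<DetectionResult>;
}

// Hypothetical detector: it only accepts datasets with a "title" column and
// assumes English instead of running real language detection.
const titleColumnDetector: SchemaDetector = {
  name: "title-column",
  label: "Title column detector",
  description: "Maps a literal 'title' column; illustrative only.",
  canHandle: (context) => context.headers.indexOf("title") !== -1,
  detect: (context) => ({
    language: { code: "eng", name: "English", confidence: 1, isReliable: true },
    fieldMappings: {
      title: { path: "title", confidence: 1 },
      description: null,
      timestamp: null,
      locationName: null,
      geo: null,
    },
    patterns: {
      // Treat a column literally named "id" as the identifier, if present.
      idFields: context.headers.filter((h) => h === "id"),
      enumFields: [],
    },
  }),
};
```

Both methods may return a promise, so calling code should `await` them. Returning false from `canHandle` lets the default detector take over.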
DetectionContext
Context passed to detectors containing all information needed for detection.
Properties
fieldStats
fieldStats: Record<string, FieldStatistics>
Field statistics from the schema builder
sampleData
sampleData: Record<string, unknown>[]
Sample data rows
headers
headers: string[]
Column headers
config
config: DetectorConfig
Configuration from the database (if available)
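The sketch below shows one way to assemble a context from parsed rows. `buildDetectionContext` is a hypothetical helper. `FieldStatistics` is modeled as an opaque record because this page does not document its shape.

```typescript
// FieldStatistics is re-exported from the schema builder and its shape is not
// documented on this page, so it is modeled as an opaque record here.
type FieldStatistics = Record<string, unknown>;
interface DetectorConfig { enabled: boolean; priority: number; options?: Record<string, unknown> }
interface DetectionContext {
  fieldStats: Record<string, FieldStatistics>;
  sampleData: Record<string, unknown>[];
  headers: string[];
  config: DetectorConfig;
}

// Hypothetical helper: collect headers from the sample rows in first-seen order
// and bundle everything into a DetectionContext.
function buildDetectionContext(
  sampleData: Record<string, unknown>[],
  fieldStats: Record<string, FieldStatistics>,
  config: DetectorConfig,
): DetectionContext {
  const headers: string[] = [];
  for (const row of sampleData) {
    for (const key of Object.keys(row)) {
      if (headers.indexOf(key) === -1) headers.push(key);
    }
  }
  return { fieldStats, sampleData, headers, config };
}
```

When a file has an explicit header row, passing that row as `headers` would keep columns that happen to be absent from every sampled row.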
DetectionResult
Complete detection result returned by a detector.
Properties
language
language: LanguageResult
Detected language
fieldMappings
fieldMappings: FieldMappingsResult
Field mappings; all semantic field detection is consolidated here
patterns
patterns: PatternResult
Pattern detection results (structural patterns only)
LanguageResult
Language detection result.
Properties
code
code: string
ISO 639-3 language code (e.g., ‘eng’, ‘deu’, ‘fra’)
name
name: string
Human-readable language name
confidence
confidence: number
Confidence score from 0 to 1
isReliable
isReliable: boolean
Whether the detection is considered reliable (confidence > 0.5)
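The `isReliable` flag follows directly from the confidence score. This is a minimal sketch with two assumptions: that the 0.5 cut-off stated above is the default, and that it corresponds to `DetectionOptions.languageConfidenceThreshold`. `makeLanguageResult` is a hypothetical helper.

```typescript
interface LanguageResult { code: string; name: string; confidence: number; isReliable: boolean }

// Hypothetical helper: derive isReliable from the confidence score using a
// strict "greater than" comparison, so a score exactly at the threshold is
// treated as unreliable.
function makeLanguageResult(code: string, name: string, confidence: number, threshold = 0.5): LanguageResult {
  return { code, name, confidence, isReliable: confidence > threshold };
}
```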
FieldMapping
A single field mapping with confidence score.
Properties
path
path: string
Path to the field in the data
confidence
confidence: number
Confidence score from 0 to 1
GeoFieldMapping
Geo field mapping - supports both separate and combined coordinate formats.
Properties
type
type: "separate" | "combined"
Geo field type: separate latitude/longitude columns, or a single combined coordinate field
confidence
confidence: number
Overall confidence score
latitude?
optional latitude?: FieldMapping
For separate lat/lng columns
longitude?
optional longitude?: FieldMapping
For separate lat/lng columns
combined?
optional combined?: object
For a combined coordinate field (e.g., “lat,lng” or GeoJSON)
combined.path
path: string
combined.format
format: string
locationField?
optional locationField?: FieldMapping
Address/location field for geocoding (when coordinates are not available)
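The `type` value determines where the coordinates live in each row. The hypothetical `readCoordinates` below sketches how a consumer might use a `GeoFieldMapping`. It assumes a combined field holds a comma-separated "lat,lng" string and does not attempt to parse GeoJSON, because this page does not list the actual `format` values.

```typescript
interface FieldMapping { path: string; confidence: number }
interface GeoFieldMapping {
  type: "separate" | "combined";
  confidence: number;
  latitude?: FieldMapping;
  longitude?: FieldMapping;
  combined?: { path: string; format: string };
  locationField?: FieldMapping;
}

// Accept numbers and non-empty numeric strings; anything else becomes NaN.
const toNumber = (value: unknown): number =>
  typeof value === "number" ? value
    : typeof value === "string" && value.trim() !== "" ? Number(value)
    : NaN;

// Hypothetical reader: returns [lat, lng] for one row, or null if the mapping
// does not yield two finite numbers. For combined fields it only understands a
// "lat,lng" string; GeoJSON and other formats are not handled in this sketch.
function readCoordinates(row: Record<string, unknown>, geo: GeoFieldMapping): [number, number] | null {
  if (geo.type === "separate" && geo.latitude && geo.longitude) {
    const lat = toNumber(row[geo.latitude.path]);
    const lng = toNumber(row[geo.longitude.path]);
    return isFinite(lat) && isFinite(lng) ? [lat, lng] : null;
  }
  if (geo.type === "combined" && geo.combined) {
    const raw = row[geo.combined.path];
    if (typeof raw !== "string") return null;
    const parts = raw.split(",").map(toNumber);
    if (parts.length !== 2 || !isFinite(parts[0]) || !isFinite(parts[1])) return null;
    return [parts[0], parts[1]];
  }
  // No usable coordinates: a caller could geocode geo.locationField instead.
  return null;
}
```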
FieldMappingsResult
All field mappings detected for a schema.
Properties
title
title: FieldMapping | null
Title/name field
description
description: FieldMapping | null
Description/details field
timestamp
timestamp: FieldMapping | null
Timestamp/date field
locationName
locationName: FieldMapping | null
Location name/venue field
geo
geo: GeoFieldMapping | null
Geo coordinates; all coordinate information is kept in one place
PatternResult
Structural pattern detection results.
Properties
idFields
idFields: string[]
Fields that appear to be unique identifiers
enumFields
enumFields: string[]
Fields that appear to be enumerations (low cardinality)
DetectorConfig
Configuration for a detector stored in database.
Properties
enabled
enabled: boolean
Whether this detector is enabled
priority
priority: number
Priority (a lower value means a higher priority)
options?
optional options?: Record<string, unknown>
Detector-specific options
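A lower `priority` value wins, so stored configs can be turned into an evaluation order as sketched below. `rankDetectors` is hypothetical; this page does not specify how the plugin actually orders or selects detectors.

```typescript
interface DetectorConfig { enabled: boolean; priority: number; options?: Record<string, unknown> }

// Hypothetical helper: given configs keyed by detector name, drop disabled
// detectors and sort the rest so that lower priority numbers come first.
function rankDetectors(configs: Record<string, DetectorConfig>): string[] {
  return Object.keys(configs)
    .filter((name) => configs[name].enabled)
    .sort((a, b) => configs[a].priority - configs[b].priority);
}
```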
ValidatorConfig
Configuration for overriding built-in field validators.
Properties
minStringPct?
optional minStringPct?: number
Minimum string percentage threshold (overrides per-field-type defaults, such as 0.8 for title).
idealLengthRange?
optional idealLengthRange?: [number, number]
Ideal length range [min, max] for full score.
acceptableLengthRange?
optional acceptableLengthRange?: [number, number]
Acceptable length range [min, max] for partial score.
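This page does not spell out how the two length ranges feed into a score. The hypothetical `scoreTextField` below makes three assumptions:
- a full score of 1 inside the ideal range;
- a partial score of 0.5 inside the acceptable range;
- a hard cut-off of 0 when the share of string values is below `minStringPct`, which defaults to the 0.8 mentioned for title.

The two numeric inputs stand in for values that would come from field statistics.

```typescript
interface ValidatorConfig {
  minStringPct?: number;
  idealLengthRange?: [number, number];
  acceptableLengthRange?: [number, number];
}

// Hypothetical validator for a text-like field.
// stringPct: share of values that are strings (0-1); avgLength: mean string length.
function scoreTextField(stringPct: number, avgLength: number, cfg: ValidatorConfig): number {
  const minPct = cfg.minStringPct !== undefined ? cfg.minStringPct : 0.8;
  if (stringPct < minPct) return 0; // not string-like enough
  const inRange = (range: [number, number] | undefined): boolean =>
    range !== undefined && avgLength >= range[0] && avgLength <= range[1];
  if (inRange(cfg.idealLengthRange)) return 1; // full score
  if (inRange(cfg.acceptableLengthRange)) return 0.5; // assumed partial score
  return 0;
}
```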
DetectionOptions
Options to customize schema detection behavior.
All options are optional. When omitted, detection uses built-in defaults.
Options can be passed to createDefaultDetector() or to individual utility functions.
Properties
language?
optional language?: string
Force a specific ISO 639-3 language code (skips language detection).
additionalLanguages?
optional additionalLanguages?: string[]
Additional languages to check alongside the detected language.
languageConfidenceThreshold?
optional languageConfidenceThreshold?: number
Language confidence threshold; results below it are marked unreliable.
customLanguageDetector?
optional customLanguageDetector?: (sampleData, headers) => LanguageResult
Fully replace the built-in language detector.
Parameters
sampleData
Record<string, unknown>[]
headers
string[]
Returns
LanguageResult
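A replacement language detector only needs to honor this signature. The sketch below is hypothetical and deliberately naive. It guesses German when a few German header words appear, and otherwise returns an unreliable English fallback. `GERMAN_HINTS` and the scoring rule are assumptions for illustration.

```typescript
interface LanguageResult { code: string; name: string; confidence: number; isReliable: boolean }

// Hypothetical header words used as a German signal; illustrative only.
const GERMAN_HINTS = ["titel", "beschreibung", "datum", "ort"];

// Naive custom detector: confidence is the share of headers matching a hint.
// sampleData is accepted to match the signature but unused in this sketch.
function headerLanguageDetector(_sampleData: Record<string, unknown>[], headers: string[]): LanguageResult {
  const hits = headers
    .map((h) => h.toLowerCase())
    .filter((h) => GERMAN_HINTS.indexOf(h) !== -1).length;
  const confidence = headers.length > 0 ? hits / headers.length : 0;
  return hits > 0
    ? { code: "deu", name: "German", confidence, isReliable: confidence > 0.5 }
    : { code: "eng", name: "English", confidence: 0, isReliable: false }; // unreliable fallback
}
```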
fieldPatterns?
optional fieldPatterns?: Partial<Record<string, Partial<Record<string, RegExp[]>>>>
Extra field-name patterns keyed by field type then language code.
replacePatterns?
optional replacePatterns?: string[]
Field types whose default patterns should be replaced (not appended) by fieldPatterns.
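The append-or-replace behavior of `fieldPatterns` and `replacePatterns` can be sketched as a merge over the nested table (field type, then language code, then patterns). `mergeFieldPatterns` is a hypothetical helper and not the plugin's internal code.

```typescript
type LanguagePatterns = Partial<Record<string, RegExp[]>>;
type PatternTable = Partial<Record<string, LanguagePatterns>>;

// Extra patterns are appended per field type and language, unless the field
// type is listed in `replace`, in which case its defaults are discarded.
function mergeFieldPatterns(defaults: PatternTable, extra: PatternTable, replace: string[] = []): PatternTable {
  const result: PatternTable = {};
  for (const fieldType of Object.keys(defaults).concat(Object.keys(extra))) {
    if (result[fieldType]) continue; // key appeared in both tables; already merged
    const base: LanguagePatterns = replace.indexOf(fieldType) !== -1 ? {} : defaults[fieldType] || {};
    const add: LanguagePatterns = extra[fieldType] || {};
    const merged: LanguagePatterns = {};
    for (const lang of Object.keys(base).concat(Object.keys(add))) {
      merged[lang] = (base[lang] || []).concat(add[lang] || []);
    }
    result[fieldType] = merged;
  }
  return result;
}
```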
scoringWeights?
optional scoringWeights?: [number, number]
Scoring weights [patternWeight, validationWeight] (default [0.6, 0.4]).
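This sketch assumes the two weights combine linearly, as a weighted sum of a pattern score and a validation score that each lie between 0 and 1. Under that assumption, a field with a perfect name match (1) and a middling validation score (0.5) scores 0.6 × 1 + 0.4 × 0.5 = 0.8 with the default weights. `combineScores` is hypothetical; this page does not document the plugin's exact formula.

```typescript
// Hypothetical combination rule: weighted sum of the two component scores.
function combineScores(
  patternScore: number,
  validationScore: number,
  weights: [number, number] = [0.6, 0.4], // documented defaults
): number {
  const [patternWeight, validationWeight] = weights;
  return patternWeight * patternScore + validationWeight * validationScore;
}
```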
validatorOverrides?
optional validatorOverrides?: Partial<Record<string, ValidatorConfig>>
Per-field-type validator config overrides.
customValidators?
optional customValidators?: Partial<Record<string, (stats) => number>>
Per-field-type custom validator functions that fully replace the built-in validator.
latitudePatterns?
optional latitudePatterns?: RegExp[]
Extra latitude column-name patterns.
longitudePatterns?
optional longitudePatterns?: RegExp[]
Extra longitude column-name patterns.
combinedCoordinatePatterns?
optional combinedCoordinatePatterns?: RegExp[]
Extra combined-coordinate column-name patterns.
replaceCoordinatePatterns?
optional replaceCoordinatePatterns?: boolean
When true, custom coordinate patterns replace the defaults instead of being prepended to them.
coordinateBounds?
optional coordinateBounds?: object
Custom coordinate bounds for validation.
latitude?
optional latitude?: object
latitude.min
min: number
latitude.max
max: number
longitude?
optional longitude?: object
longitude.min
min: number
longitude.max
max: number
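A bounds check might look like the sketch below. It assumes that an omitted bound falls back to the full geographic range (latitude -90 to 90, longitude -180 to 180). `isWithinBounds` is a hypothetical helper.

```typescript
interface CoordinateBounds {
  latitude?: { min: number; max: number };
  longitude?: { min: number; max: number };
}

// Hypothetical check: both values must fall inside their (inclusive) bounds.
function isWithinBounds(lat: number, lng: number, bounds: CoordinateBounds = {}): boolean {
  const latB = bounds.latitude || { min: -90, max: 90 }; // assumed default
  const lngB = bounds.longitude || { min: -180, max: 180 }; // assumed default
  return lat >= latB.min && lat <= latB.max && lng >= lngB.min && lng <= lngB.max;
}
```

Narrowing the bounds to a dataset's known region can also catch swapped latitude and longitude columns. For example, with bounds of roughly 47 to 55 for latitude and 5 to 16 for longitude, the swapped pair (13.4, 52.5) is rejected.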
addressPatterns?
optional addressPatterns?: RegExp[]
Extra address/location column-name patterns.
replaceAddressPatterns?
optional replaceAddressPatterns?: boolean
When true, custom address patterns replace the defaults instead of being prepended to them.
enumThreshold?
optional enumThreshold?: number
Enum detection threshold (absolute count or percentage depending on enumMode).
enumMode?
optional enumMode?: "count" | "percentage"
Enum detection mode: “count” uses absolute unique-value count, “percentage” uses ratio.
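The sketch below contrasts the two `enumMode` settings. `looksLikeEnum` is hypothetical, and the inclusive comparison (`<=`) is an assumption. This page does not document default thresholds, so the caller passes one explicitly.

```typescript
// Hypothetical enum check for one column.
// "count": distinct values must not exceed an absolute threshold.
// "percentage": distinct/total ratio must not exceed the threshold (0-1).
function looksLikeEnum(
  uniqueCount: number,
  totalCount: number,
  threshold: number,
  mode: "count" | "percentage",
): boolean {
  if (totalCount === 0) return false; // no data, no enum
  return mode === "count" ? uniqueCount <= threshold : uniqueCount / totalCount <= threshold;
}
```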
idPatterns?
optional idPatterns?: RegExp[]
Extra ID column-name patterns.
replaceIdPatterns?
optional replaceIdPatterns?: boolean
When true, custom ID patterns replace the defaults instead of being prepended to them.
skip?
optional skip?: object
Skip individual pipeline stages.
language?
optional language?: boolean
fieldMapping?
optional fieldMapping?: boolean
coordinates?
optional coordinates?: boolean
enums?
optional enums?: boolean
ids?
optional ids?: boolean
additionalFieldTypes?
optional additionalFieldTypes?: Record<string, { patterns: Partial<Record<string, RegExp[]>>; validator: (stats) => number; }>
Register additional field types beyond the standard five.
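Registering an extra field type means supplying per-language name patterns plus a validator that returns a score. In this sketch the `price` field type is hypothetical. This page does not document the validator's `stats` type, so a placeholder interface with a single `numericPct` property stands in for it.

```typescript
// Placeholder for the undocumented stats type (assumption for this sketch).
interface HypotheticalStats { numericPct: number }

type AdditionalFieldTypes = Record<string, {
  patterns: Partial<Record<string, RegExp[]>>;
  validator: (stats: HypotheticalStats) => number;
}>;

const additionalFieldTypes: AdditionalFieldTypes = {
  price: {
    // Column-name patterns keyed by ISO 639-3 language code.
    patterns: { eng: [/^price$/i, /^cost$/i], deu: [/^preis$/i] },
    // Score 0-1: the share of numeric values, clamped to that range.
    validator: (stats) => Math.min(1, Math.max(0, stats.numericPct)),
  },
};
```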
SchemaDetectionPluginOptions
Options for the schema detection Payload plugin.
Properties
enabled?
optional enabled?: boolean
Enables or disables the plugin entirely
detectors?
optional detectors?: SchemaDetector[]
Built-in detectors to register (default: [defaultDetector])
collectionSlug?
optional collectionSlug?: string
Collection slug for schema detectors config (default: ‘schema-detectors’)
extendDatasets?
optional extendDatasets?: boolean
Add detector selection field to Datasets collection
datasetsCollectionSlug?
optional datasetsCollectionSlug?: string
Dataset collection slug to extend (default: ‘datasets’)
Type Aliases
SchemaDetectionPlugin
SchemaDetectionPlugin = (options?) => (config) => Config
Type for the schema detection Payload plugin function.
Parameters
options?
Returns
(config) => Config
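The curried `(options?) => (config) => Config` shape follows the common Payload pattern of a factory that returns a config transformer. The sketch below only illustrates that shape, using minimal local stand-ins for Payload's types. `sketchPlugin` registers an empty collection under `collectionSlug` (default "schema-detectors") and treats an unset `enabled` as enabled; that default is an assumption. The real plugin's changes to the config are not shown.

```typescript
// Minimal local stand-ins; real code would use Payload's own Config types.
interface CollectionConfig { slug: string; fields: unknown[] }
interface Config { collections?: CollectionConfig[] }
interface PluginOptions { enabled?: boolean; collectionSlug?: string }

// Hypothetical plugin: options in, config transformer out.
const sketchPlugin = (options: PluginOptions = {}) => (config: Config): Config => {
  if (options.enabled === false) return config; // disabled: pass config through unchanged
  const slug = options.collectionSlug || "schema-detectors";
  return {
    ...config,
    collections: (config.collections || []).concat([{ slug, fields: [] }]),
  };
};
```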
References
FieldStatistics
Re-exports FieldStatistics