lib/services/schema-builder
Implements a service for progressively building a JSON schema from data samples.
This class is designed to analyze records incrementally, typically in batches, to infer a schema without needing to load the entire dataset into memory. It tracks statistics for each field, such as data types, occurrence counts, and unique values.
Key features:
- Processes data in batches to build up a schema over time.
- Uses quicktype-core to generate a formal JSON schema from data samples.
- Detects potential ID fields, geographic coordinate fields, and enumerations (enums).
- Tracks field statistics and type conflicts.
- Can compare the generated schema against a previous version to detect changes.
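The batch-by-batch idea in the list above can be illustrated with a minimal sketch. `TinyProgressiveBuilder`, `FieldStats`, and `typesOf` are hypothetical stand-ins written for this example; they are not the real `ProgressiveSchemaBuilder`, and the sketch does not call quicktype-core. It only shows how per-field statistics might accumulate across batches without holding the records themselves.

```typescript
// Hypothetical stand-in for illustration only; NOT the real ProgressiveSchemaBuilder.
type DataRecord = Record<string, unknown>;

interface FieldStats {
  occurrences: number;
  types: Set<string>;
  uniqueValues: Set<string>;
}

class TinyProgressiveBuilder {
  private readonly fields = new Map<string, FieldStats>();
  private recordCount = 0;

  // maxUniqueValues caps how many distinct values are remembered per field.
  constructor(private readonly maxUniqueValues = 100) {}

  processBatch(records: DataRecord[]): { schemaChanged: boolean } {
    let schemaChanged = false;
    for (const record of records) {
      this.recordCount += 1;
      for (const name of Object.keys(record)) {
        const value = record[name];
        let stats = this.fields.get(name);
        if (stats === undefined) {
          stats = { occurrences: 0, types: new Set<string>(), uniqueValues: new Set<string>() };
          this.fields.set(name, stats);
          schemaChanged = true; // a field seen for the first time
        }
        const typeName =
          value === null ? "null" : Array.isArray(value) ? "array" : typeof value;
        if (!stats.types.has(typeName)) {
          stats.types.add(typeName);
          schemaChanged = true; // a new type observed for this field
        }
        stats.occurrences += 1;
        if (stats.uniqueValues.size < this.maxUniqueValues) {
          stats.uniqueValues.add(JSON.stringify(value));
        }
      }
    }
    return { schemaChanged };
  }

  getSummary(): { recordCount: number; fieldCount: number } {
    return { recordCount: this.recordCount, fieldCount: this.fields.size };
  }

  typesOf(field: string): string[] {
    const stats = this.fields.get(field);
    return stats === undefined ? [] : Array.from(stats.types).sort();
  }
}

// Usage: three batches, the last one introduces a type conflict on `id`.
const builder = new TinyProgressiveBuilder();
const first = builder.processBatch([{ id: 1, name: "a" }, { id: 2, name: "b" }]);
const second = builder.processBatch([{ id: 3, name: "c" }]);
const third = builder.processBatch([{ id: "4", name: "d" }]);
```

Only the statistics persist between calls, so memory use depends on the number of fields and the unique-value cap rather than on the size of the dataset.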
Classes
ProgressiveSchemaBuilder
Constructors
Constructor
new ProgressiveSchemaBuilder(initialState?, config?): ProgressiveSchemaBuilder
Parameters
initialState?
config?
Partial<{ maxSamples: number; maxUniqueValues: number; enumThreshold: number; enumMode: "count" | "percentage"; maxDepth: number; }>
Returns
ProgressiveSchemaBuilder
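The reference does not spell out how `enumThreshold` and `enumMode` interact. One plausible reading, shown here purely as an assumption, is that `"count"` treats the threshold as a maximum number of distinct values, while `"percentage"` treats it as a maximum share of distinct values among all observed values. `looksLikeEnum` and `EnumConfig` are hypothetical names for this sketch and are not part of the documented API.

```typescript
// ASSUMPTION: this interpretation of enumThreshold/enumMode is a guess,
// not confirmed by the reference above.
interface EnumConfig {
  enumThreshold: number;
  enumMode: "count" | "percentage";
}

function looksLikeEnum(
  uniqueCount: number,
  totalCount: number,
  config: EnumConfig,
): boolean {
  if (totalCount === 0 || uniqueCount === 0) {
    return false; // nothing observed yet, so no evidence either way
  }
  if (config.enumMode === "count") {
    // Threshold read as an absolute cap on distinct values.
    return uniqueCount <= config.enumThreshold;
  }
  // Threshold read as a percentage of distinct values among all values.
  return (uniqueCount / totalCount) * 100 <= config.enumThreshold;
}

const byCount = looksLikeEnum(3, 1000, { enumThreshold: 10, enumMode: "count" });
const byShare = looksLikeEnum(60, 1000, { enumThreshold: 5, enumMode: "percentage" });
```

Under this reading, `byCount` is `true` (3 distinct values is within the cap of 10) and `byShare` is `false` (6% distinct exceeds the 5% limit).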
Methods
processBatch()
processBatch(records): object
Parameters
records
DataRecord[]
Returns
object
schemaChanged
schemaChanged: boolean
changes
changes: SchemaChange[]
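A caller might feed records to `processBatch()` in fixed-size slices and count how often the schema changes. The sketch below assumes only the documented return shape (`schemaChanged`, `changes`); `BatchProcessor` and `feedInBatches` are hypothetical helpers, and `changes` is typed as `unknown[]` because the fields of `SchemaChange` are not listed here.

```typescript
type DataRecord = Record<string, unknown>;

// Structural stand-in for anything exposing the documented processBatch() shape.
interface BatchProcessor {
  processBatch(records: DataRecord[]): { schemaChanged: boolean; changes: unknown[] };
}

// Hypothetical helper: slices `records` into batches and returns how many
// batches reported a schema change.
function feedInBatches(
  builder: BatchProcessor,
  records: DataRecord[],
  batchSize: number,
): number {
  if (batchSize <= 0) {
    throw new RangeError("batchSize must be positive");
  }
  let changedBatches = 0;
  for (let start = 0; start < records.length; start += batchSize) {
    const result = builder.processBatch(records.slice(start, start + batchSize));
    if (result.schemaChanged) {
      changedBatches += 1;
    }
  }
  return changedBatches;
}

// Usage with a fake processor that reports a change only on its first batch.
const seenBatchSizes: number[] = [];
const fakeBuilder: BatchProcessor = {
  processBatch(records) {
    seenBatchSizes.push(records.length);
    return { schemaChanged: seenBatchSizes.length === 1, changes: [] };
  },
};
const changed = feedInBatches(fakeBuilder, [{ a: 1 }, { a: 2 }, { a: 3 }, { a: 4 }, { a: 5 }], 2);
```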
getSchema()
getSchema(): Promise<Record<string, unknown>>
Returns
Promise<Record<string, unknown>>
compareWithPrevious()
compareWithPrevious(previousSchema): SchemaComparison
Parameters
previousSchema
Record<string, unknown>
Returns
SchemaComparison
getSchemaSync()
getSchemaSync(): Record<string, unknown>
Returns
Record<string, unknown>
getState()
getState(): SchemaBuilderState
Returns
SchemaBuilderState
getFieldStatistics()
getFieldStatistics(): Record<string, FieldStatistics>
Returns
Record<string, FieldStatistics>
getSummary()
getSummary(): object
Returns
object
recordCount
recordCount: number
fieldCount
fieldCount: number
version
version: number
detectedPatterns
detectedPatterns: object
detectedPatterns.idFields
idFields: string[]
detectedPatterns.geoFields
geoFields: object
detectedPatterns.geoFields.latitude?
optional latitude: string
detectedPatterns.geoFields.longitude?
optional longitude: string
detectedPatterns.geoFields.confidence
confidence: number
detectedPatterns.enumFields
enumFields: string[]
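The object returned by `getSummary()` can be written out as a TypeScript interface, which makes the optional geo fields easier to handle. The interface below mirrors the properties listed above; the names `BuilderSummary` and `describeSummary` are hypothetical and exist only for this sketch.

```typescript
// Shape transcribed from the getSummary() reference; the interface name is hypothetical.
interface BuilderSummary {
  recordCount: number;
  fieldCount: number;
  version: number;
  detectedPatterns: {
    idFields: string[];
    geoFields: { latitude?: string; longitude?: string; confidence: number };
    enumFields: string[];
  };
}

// Hypothetical formatter: renders a summary as a short, human-readable report.
function describeSummary(summary: BuilderSummary): string {
  const { idFields, geoFields, enumFields } = summary.detectedPatterns;
  // latitude and longitude are optional, so both must be present to report a pair.
  const geo =
    geoFields.latitude !== undefined && geoFields.longitude !== undefined
      ? `${geoFields.latitude}/${geoFields.longitude} (confidence ${geoFields.confidence})`
      : "none";
  return [
    `v${summary.version}: ${summary.recordCount} records, ${summary.fieldCount} fields`,
    `ids: ${idFields.join(", ") || "none"}`,
    `geo: ${geo}`,
    `enums: ${enumFields.join(", ") || "none"}`,
  ].join("\n");
}

const report = describeSummary({
  recordCount: 250,
  fieldCount: 4,
  version: 2,
  detectedPatterns: {
    idFields: ["id"],
    geoFields: { latitude: "lat", longitude: "lng", confidence: 0.9 },
    enumFields: [],
  },
});
```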
References
detectEnums
Re-exports detectEnums
detectGeoFields
Re-exports detectGeoFields
detectIdFields
Re-exports detectIdFields
compareSchemas
Re-exports compareSchemas
FieldStatistics
Re-exports FieldStatistics
SchemaBuilderState
Re-exports SchemaBuilderState
SchemaChange
Re-exports SchemaChange
SchemaComparison
Re-exports SchemaComparison