Validators
Data validation rules and checks
Built-in Validators
Excel-to-sql provides six built-in validators covering the most common data validation scenarios.
1. Unique Validator
Ensures that all values in a column are unique, preventing duplicate records.
Ensure column values are unique:
from excel_to_sql import ExcelToSqlite
from excel_to_sql.validators import UniqueValidator
sdk = ExcelToSqlite()
result = sdk.import_excel(
"customers.xlsx",
"customers",
validation_rules=[
ValidationRule("email", "unique")
]
)2. Range Validator
Validates that numeric values fall within a specified minimum and maximum range.
Validate numeric ranges:
from excel_to_sql.validators import ValidationRule
sdk = ExcelToSqlite()
result = sdk.import_excel(
"products.xlsx",
"products",
validation_rules=[
ValidationRule("price", "range", min_value=0, max_value=1000000),
ValidationRule("quantity", "range", min_value=0, max_value=10000),
ValidationRule("age", "range", min_value=0, max_value=120)
]
)3. Regex Validator
Uses regular expressions to validate text patterns like email addresses, phone numbers, and postal codes.
Pattern matching with regular expressions:
result = sdk.import_excel(
"users.xlsx",
"users",
validation_rules=[
ValidationRule("email", "regex", pattern=r"^[^@]+@[^@]+\.[^@]+$"),
ValidationRule("phone", "regex", pattern=r"^\+?\d{10,15}$"),
ValidationRule("postal_code", "regex", pattern=r"^\d{5}(-\d{4})?$")
]
)4. Enum Validator
Restricts values to a predefined set of allowed values, useful for status codes and categories.
Values must be in a set:
result = sdk.import_excel(
"products.xlsx",
"products",
validation_rules=[
ValidationRule("status", "enum", values=["Active", "Inactive", "Pending"]),
ValidationRule("category", "enum", values=["Electronics", "Hardware", "Software"])
]
)5. NotNull Validator
Ensures that a column contains no null values, critical for required fields like IDs and foreign keys.
Ensure no null values:
result = sdk.import_excel(
"orders.xlsx",
"orders",
validation_rules=[
ValidationRule("id", "not_null"),
ValidationRule("customer_id", "not_null"),
ValidationRule("order_date", "not_null")
]
)Custom Validators
For business-specific validation logic, create custom validator functions with full control over the validation process.
Create your own validation logic:
from excel_to_sql.validators import CustomValidator
def validate_positive(value):
return value > 0, "Value must be positive"
sdk.import_excel(
"products.xlsx",
"products",
validation_rules=[
CustomValidator("price", validate_positive)
]
)Multiple Validators
You can apply multiple validation rules to the same column, creating layered validation logic.
Apply multiple validators to one column:
validation_rules = [
ValidationRule("email", "unique"),
ValidationRule("email", "not_null"),
ValidationRule("email", "regex", pattern=r"^[^@]+@[^@]+\.[^@]+$")
]Validation Results
The import operation returns detailed validation results that you can process programmatically.
Check for Errors
Always check for validation errors after import to catch data quality issues early.
result = sdk.import_excel(
"data.xlsx",
"mytype",
validation_rules=[...]
)
if result['validation_errors']:
print(f"Validation failed with {len(result['validation_errors'])} errors")
for error in result['validation_errors']:
print(f" {error['column']}: {error['message']}")Validation Error Format
Each validation error includes row number, column, problematic value, and descriptive message.
{
'row': 42,
'column': 'email',
'value': 'invalid-email',
'message': 'Value does not match pattern',
'severity': 'ERROR'
}Configuration File
Validators can be defined in your configuration file for consistent validation across imports.
Set up validation in config/mappings.json:
{
"customers": {
"target_table": "customers",
"primary_key": ["id"],
"validation_rules": [
{
"column": "id",
"type": "unique"
},
{
"column": "email",
"type": "regex",
"pattern": "^[^@]+@[^@]+\\.[^@]+$"
},
{
"column": "age",
"type": "range",
"min_value": 0,
"max_value": 120
}
]
}
}Validation Types Reference
Quick reference for all available validator types and their typical use cases.
| Type | Description | Example |
|---|---|---|
unique | All values must be unique | Primary keys |
not_null | No null values allowed | Required fields |
range | Numeric range check | Age, price, quantity |
regex | Pattern matching | Email, phone, URL |
enum | Value must be in set | Status codes |
custom | Custom validator function | Business logic |
Best Practices
Follow these guidelines for effective data validation strategies.
1. Validate Critical Data
Always validate:
- Primary keys (unique, not_null)
- Foreign keys (reference integrity)
- Required fields (not_null)
- Format-sensitive fields (regex)
2. Order Matters
Validators run in definition order:
validation_rules = [
ValidationRule("id", "not_null"), # Check 1
ValidationRule("id", "unique") # Check 2
]3. Use Appropriate Validators
Choose the right validator for the data type:
# Good
ValidationRule("email", "regex", pattern=r"...")
# Less efficient
CustomValidator("email", lambda x: "@" in x)See Also
- Transformations - Data transformations
- Profiling - Quality analysis
- Configuration - Validation configuration