excel-to-sql

Validators

Data validation rules and checks

Built-in Validators

Excel-to-sql provides six built-in validators covering the most common data validation scenarios.

1. Unique Validator

Ensures that all values in a column are unique, preventing duplicate records.

Ensure column values are unique:

from excel_to_sql import ExcelToSqlite
from excel_to_sql.validators import UniqueValidator

sdk = ExcelToSqlite()

result = sdk.import_excel(
    "customers.xlsx",
    "customers",
    validation_rules=[
        ValidationRule("email", "unique")
    ]
)

2. Range Validator

Validates that numeric values fall within a specified minimum and maximum range.

Validate numeric ranges:

from excel_to_sql.validators import ValidationRule

sdk = ExcelToSqlite()

result = sdk.import_excel(
    "products.xlsx",
    "products",
    validation_rules=[
        ValidationRule("price", "range", min_value=0, max_value=1000000),
        ValidationRule("quantity", "range", min_value=0, max_value=10000),
        ValidationRule("age", "range", min_value=0, max_value=120)
    ]
)

3. Regex Validator

Uses regular expressions to validate text patterns like email addresses, phone numbers, and postal codes.

Pattern matching with regular expressions:

result = sdk.import_excel(
    "users.xlsx",
    "users",
    validation_rules=[
        ValidationRule("email", "regex", pattern=r"^[^@]+@[^@]+\.[^@]+$"),
        ValidationRule("phone", "regex", pattern=r"^\+?\d{10,15}$"),
        ValidationRule("postal_code", "regex", pattern=r"^\d{5}(-\d{4})?$")
    ]
)

4. Enum Validator

Restricts values to a predefined set of allowed values, useful for status codes and categories.

Values must be in a set:

result = sdk.import_excel(
    "products.xlsx",
    "products",
    validation_rules=[
        ValidationRule("status", "enum", values=["Active", "Inactive", "Pending"]),
        ValidationRule("category", "enum", values=["Electronics", "Hardware", "Software"])
    ]
)

5. NotNull Validator

Ensures that a column contains no null values, critical for required fields like IDs and foreign keys.

Ensure no null values:

result = sdk.import_excel(
    "orders.xlsx",
    "orders",
    validation_rules=[
        ValidationRule("id", "not_null"),
        ValidationRule("customer_id", "not_null"),
        ValidationRule("order_date", "not_null")
    ]
)

Custom Validators

For business-specific validation logic, create custom validator functions with full control over the validation process.

Create your own validation logic:

from excel_to_sql.validators import CustomValidator

def validate_positive(value):
    return value > 0, "Value must be positive"

sdk.import_excel(
    "products.xlsx",
    "products",
    validation_rules=[
        CustomValidator("price", validate_positive)
    ]
)

Multiple Validators

You can apply multiple validation rules to the same column, creating layered validation logic.

Apply multiple validators to one column:

validation_rules = [
    ValidationRule("email", "unique"),
    ValidationRule("email", "not_null"),
    ValidationRule("email", "regex", pattern=r"^[^@]+@[^@]+\.[^@]+$")
]

Validation Results

The import operation returns detailed validation results that you can process programmatically.

Check for Errors

Always check for validation errors after import to catch data quality issues early.

result = sdk.import_excel(
    "data.xlsx",
    "mytype",
    validation_rules=[...]
)

if result['validation_errors']:
    print(f"Validation failed with {len(result['validation_errors'])} errors")
    for error in result['validation_errors']:
        print(f"  {error['column']}: {error['message']}")

Validation Error Format

Each validation error includes row number, column, problematic value, and descriptive message.

{
    'row': 42,
    'column': 'email',
    'value': 'invalid-email',
    'message': 'Value does not match pattern',
    'severity': 'ERROR'
}

Configuration File

Validators can be defined in your configuration file for consistent validation across imports.

Set up validation in config/mappings.json:

{
  "customers": {
    "target_table": "customers",
    "primary_key": ["id"],
    "validation_rules": [
      {
        "column": "id",
        "type": "unique"
      },
      {
        "column": "email",
        "type": "regex",
        "pattern": "^[^@]+@[^@]+\\.[^@]+$"
      },
      {
        "column": "age",
        "type": "range",
        "min_value": 0,
        "max_value": 120
      }
    ]
  }
}

Validation Types Reference

Quick reference for all available validator types and their typical use cases.

TypeDescriptionExample
uniqueAll values must be uniquePrimary keys
not_nullNo null values allowedRequired fields
rangeNumeric range checkAge, price, quantity
regexPattern matchingEmail, phone, URL
enumValue must be in setStatus codes
customCustom validator functionBusiness logic

Best Practices

Follow these guidelines for effective data validation strategies.

1. Validate Critical Data

Always validate:

  • Primary keys (unique, not_null)
  • Foreign keys (reference integrity)
  • Required fields (not_null)
  • Format-sensitive fields (regex)

2. Order Matters

Validators run in definition order:

validation_rules = [
    ValidationRule("id", "not_null"),      # Check 1
    ValidationRule("id", "unique")         # Check 2
]

3. Use Appropriate Validators

Choose the right validator for the data type:

# Good
ValidationRule("email", "regex", pattern=r"...")

# Less efficient
CustomValidator("email", lambda x: "@" in x)

See Also

On this page