Pattern Detection
How Auto-Pilot detects patterns in your data
What Gets Detected
Auto-Pilot analyzes your data to identify five key patterns that help optimize your database schema and data quality.
1. Primary Keys
Finds columns that can serve as unique identifiers for each row through a combination of naming conventions and statistical analysis.
Identifies columns that uniquely identify rows:
Detection Algorithm:
- Check for columns named: id, ID, pk, PK, key, KEY
- Analyze uniqueness: 100% unique values = candidate
- Check for reasonable cardinality (not too high, not too low)
- Verify data type consistency
Example:
# Detected: ID column as primary key
"primary_key": ["id"]Confidence Score: 0.95 (High)
2. Foreign Keys
Discovers relationships between tables by analyzing column names and checking if values in one table reference primary keys in another table.
Detects relationships between tables:
Detection Algorithm:
- Look for columns ending in: id, ID, id, ID
- Check if values reference other tables
- Analyze naming patterns (customer_id → customers.id)
- Validate referential integrity
Example:
# Detected: customer_id references customers.id
"foreign_keys": [{
"column": "customer_id",
"references": "customers.id",
"confidence": 0.87
}]3. Value Mappings
Identifies columns containing coded values that should be translated to human-readable text for better data comprehension.
Finds columns with repeating code patterns:
Detection Algorithm:
- Analyze unique value count (< 20 = candidate)
- Check for numeric codes (0/1, 1/2/3)
- Detect binary patterns (T/F, Y/N, Yes/No)
- Find French codes (ENTRÉE/SORTIE, ACTIF/INACTIF)
- Calculate mapping confidence
Examples:
Binary Status:
{
"column": "status",
"mappings": {
"1": "Active",
"0": "Inactive"
},
"confidence": 0.92
}French Codes:
{
"column": "movement_type",
"mappings": {
"ENTRÉE": "inbound",
"SORTIE": "outbound",
"ACTIF": "active",
"INACTIF": "inactive"
},
"detected_language": "french",
"confidence": 0.98
}4. Split Fields
Detects redundant columns where values are split across multiple columns when they could be combined into a single column using COALESCE.
Identifies redundant status columns that should be combined:
Detection Algorithm:
- Find columns with overlapping values
- Detect mutual exclusivity (if col1 has value, col2 is null)
- Calculate COALESCE potential
- Recommend combination
Example:
# Detected: status_active and status_inactive are redundant
{
"split_fields": [{
"columns": ["status_active", "status_inactive"],
"recommendation": "COALESCE(status_active, status_inactive) as status",
"confidence": 0.89
}]
}5. Data Types
Analyzes actual data values to determine the most appropriate SQL data type for each column, optimizing storage and query performance.
Infers optimal SQL types from actual data:
Type Inference:
# Integer detection
if all_values.isdigit():
type = "integer"
# Float detection
elif contains_decimal and is_numeric():
type = "float"
# Boolean detection
elif unique_values == [0, 1] or unique_values == ["0", "1"]:
type = "boolean"
# Date detection
elif matches_iso8601_pattern():
type = "date"
# Default
else:
type = "string"Examples:
{
"id": {"type": "integer", "confidence": 0.99},
"price": {"type": "float", "confidence": 0.97},
"active": {"type": "boolean", "confidence": 0.94},
"created_at": {"type": "date", "confidence": 0.91},
"name": {"type": "string", "confidence": 0.85}
}Confidence Scores
Auto-Pilot assigns a confidence score to each detection based on how certain it is about the pattern. Higher scores indicate more reliable detections.
Each detection includes a confidence score (0-1):
| Score | Meaning | Action |
|---|---|---|
| 0.90-1.00 | Very High | Auto-accept |
| 0.70-0.89 | High | Review recommended |
| 0.50-0.69 | Medium | Manual review |
| 0.00-0.49 | Low | Manual configuration |
Interactive Review
Interactive mode presents each detection with supporting evidence so you can make informed decisions about whether to accept or override it.
In interactive mode, you'll see:
✓ Detected Primary Key
Column: id
Confidence: 0.95
Unique values: 150/150 (100%)
Accept? [Y/n]:Override Detections
You're always in control. Interactive mode lets you override any Auto-Pilot detection with your own choice.
You can override any detection:
# Auto-Pilot suggests id as PK, but you want product_id
excel-to-sql magic --data . --interactive
# When prompted, enter product_id insteadAdvanced: Python SDK
Pattern detection is also available programmatically through the Python SDK for custom workflows and automation.
Use pattern detection programmatically:
from excel_to_sql.auto_pilot import PatternDetector
detector = PatternDetector()
patterns = detector.detect_patterns(df, "products")
print(f"Primary Key: {patterns['primary_key']}")
print(f"Foreign Keys: {patterns['foreign_keys']}")
print(f"Value Mappings: {patterns['value_mappings']}")See Also
- Quality Scoring - Data quality analysis
- Recommendations - Actionable suggestions
- CLI: magic command - Auto-Pilot usage