How to Safely Mask Table and Column Names Before Sending SQL Queries to AI
Sending your database schema to ChatGPT or Claude can expose sensitive business information: customer table names, revenue column names, internal system identifiers, and PII field names. Masking your schema before AI processing protects your data while still letting you get accurate query help. This guide covers what to mask and why, a working Python implementation, how to reverse the masking after the AI generates SQL, and compliance considerations for regulated industries.
Schema: reveals business structure, strategy, and sensitive field names
Masking: replaces real names with neutral TABLE_01 / col_01 placeholders
Reversible: keep a mapping to restore original names after AI generates SQL
GDPR/SOC2: schema masking supports data governance and compliance goals
Why Mask SQL Schema Before Sending to AI?
Your database schema is more revealing than most teams realize. Table names and column names encode your entire business model, operational strategy, and the categories of sensitive data you hold. Sending this to an external AI service creates risks that extend beyond data privacy into competitive intelligence and regulatory compliance.
What your schema reveals
Table names like "churn_risk_score", "revenue_by_account", "competitor_tracking", "acquisition_pipeline" reveal your business strategy and competitive intelligence. Column names like "ssn", "salary", "health_condition", "credit_score" signal regulated data categories. Even aggregate table names like "ml_training_features" reveal your internal AI/ML capabilities. All of this gets sent to external AI providers when you paste schema for query help — and may be used in training future models.
Before and After Masking
Without masking, your schema exposes business strategy, PII categories, and competitive intelligence to the AI provider.
-- ❌ What you send WITHOUT masking:
CREATE TABLE customer_churn_risk (
customer_id BIGINT PRIMARY KEY,
email VARCHAR(255),
churn_score DECIMAL(5,2), -- reveals you track churn risk
revenue_ltv DECIMAL(10,2), -- reveals lifetime value data
health_score INT, -- may signal regulated health data
competitor_used VARCHAR(100), -- reveals competitive intelligence tracking
is_at_risk BOOLEAN,
last_login_date DATE
);
-- External AI now knows: your churn model, revenue tracking, health data, competitor analysis

With masking, the AI still gets structure and types, which is sufficient for accurate SQL generation:
-- ✅ What you send WITH masking:
CREATE TABLE TABLE_01 (
col_01 BIGINT PRIMARY KEY,
col_02 VARCHAR(255),
col_03 DECIMAL(5,2),
col_04 DECIMAL(10,2),
col_05 INT,
col_06 VARCHAR(100),
col_07 BOOLEAN,
col_08 DATE
);
-- AI sees: a table with 8 columns of various types — gives accurate SQL help
-- Your business logic stays private

Python Schema Masker Implementation
import re
import json
from typing import Dict


class SqlSchemaMasker:
    """
    Mask table and column names in SQL schemas before AI processing.
    Maintains a reversible mapping to restore original names in AI-generated SQL.
    """

    def __init__(self):
        self.table_map: Dict[str, str] = {}    # real_name -> TABLE_XX
        self.column_map: Dict[str, str] = {}   # real_name -> col_XX
        self.reverse_map: Dict[str, str] = {}  # masked -> real
        self._table_counter = 1
        self._col_counter = 1

    def _mask_table(self, name: str) -> str:
        if name not in self.table_map:
            masked = f"TABLE_{self._table_counter:02d}"
            self.table_map[name] = masked
            self.reverse_map[masked.lower()] = name
            self.reverse_map[masked] = name
            self._table_counter += 1
        return self.table_map[name]

    def _mask_column(self, name: str) -> str:
        if name not in self.column_map:
            masked = f"col_{self._col_counter:02d}"
            self.column_map[name] = masked
            self.reverse_map[masked.lower()] = name
            self.reverse_map[masked] = name
            self._col_counter += 1
        return self.column_map[name]

    def mask_schema(self, sql: str) -> str:
        """Mask a full schema (one or more CREATE TABLE statements)."""
        # Mask CREATE TABLE statements
        sql = re.sub(
            r'CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?[`"]?(\w+)[`"]?',
            lambda m: f"CREATE TABLE {self._mask_table(m.group(1))}",
            sql, flags=re.IGNORECASE
        )
        # Mask column names in column definitions
        # Pattern: leading whitespace + column_name + data_type
        sql = re.sub(
            r'^(\s+)[`"]?(\w+)[`"]?\s+(BIGINT|INT|INTEGER|SMALLINT|TINYINT'
            r'|VARCHAR|CHAR|TEXT|LONGTEXT|MEDIUMTEXT'
            r'|DECIMAL|NUMERIC|FLOAT|DOUBLE|REAL'
            r'|BOOLEAN|BOOL|DATE|DATETIME|TIMESTAMP|TIME'
            r'|JSON|BLOB|BINARY)',
            lambda m: f"{m.group(1)}{self._mask_column(m.group(2))} {m.group(3)}",
            sql, flags=re.IGNORECASE | re.MULTILINE
        )
        # Mask FOREIGN KEY (column) clauses
        sql = re.sub(
            r'FOREIGN\s+KEY\s*\([`"]?(\w+)[`"]?\)',
            lambda m: f"FOREIGN KEY ({self.column_map.get(m.group(1), m.group(1))})",
            sql, flags=re.IGNORECASE
        )
        # Mask REFERENCES table(column) clauses
        sql = re.sub(
            r'REFERENCES\s+[`"]?(\w+)[`"]?\s*\([`"]?(\w+)[`"]?\)',
            lambda m: (
                f"REFERENCES {self.table_map.get(m.group(1), m.group(1))}"
                f"({self.column_map.get(m.group(2), m.group(2))})"
            ),
            sql, flags=re.IGNORECASE
        )
        return sql

    def unmask_sql(self, sql: str) -> str:
        """Replace masked names back to originals in AI-generated SQL."""
        result = sql
        # Sort by length descending to avoid partial replacements (col_10 before col_1)
        for masked, original in sorted(
            self.reverse_map.items(), key=lambda x: -len(x[0])
        ):
            result = re.sub(
                r'\b' + re.escape(masked) + r'\b',
                original,
                result,
                flags=re.IGNORECASE
            )
        return result

    def export_mapping(self) -> str:
        """Export the masking map as JSON for reference or storage."""
        return json.dumps({
            "tables": self.table_map,
            "columns": self.column_map
        }, indent=2)

    @classmethod
    def from_mapping(cls, mapping_json: str) -> 'SqlSchemaMasker':
        """Restore a masker from a previously exported mapping."""
        masker = cls()
        mapping = json.loads(mapping_json)
        masker.table_map = mapping["tables"]
        masker.column_map = mapping["columns"]
        # Build reverse map
        for real, masked in mapping["tables"].items():
            masker.reverse_map[masked.lower()] = real
            masker.reverse_map[masked] = real
        for real, masked in mapping["columns"].items():
            masker.reverse_map[masked.lower()] = real
            masker.reverse_map[masked] = real
        return masker
# ─── Example Usage ───────────────────────────────────────────────────────────
masker = SqlSchemaMasker()

original_schema = """
CREATE TABLE customer_churn_risk (
    customer_id BIGINT PRIMARY KEY,
    email VARCHAR(255),
    churn_score DECIMAL(5,2),
    revenue_ltv DECIMAL(10,2),
    health_score INT,
    is_at_risk BOOLEAN,
    last_login_date DATE
);

CREATE TABLE competitor_tracking (
    entry_id BIGINT PRIMARY KEY,
    customer_id BIGINT,
    competitor_name VARCHAR(100),
    switch_date DATE,
    FOREIGN KEY (customer_id) REFERENCES customer_churn_risk(customer_id)
);
"""

masked = masker.mask_schema(original_schema)
print("=== SEND TO AI ===")
print(masked)

# Save mapping for later use
mapping = masker.export_mapping()
print("\n=== SAVE THIS MAPPING ===")
print(mapping)

# After AI generates a query, restore original names:
ai_query = """
SELECT col_01, col_03, col_06
FROM TABLE_01
WHERE col_06 = true
  AND col_04 > 1000
ORDER BY col_03 DESC;
"""

restored = masker.unmask_sql(ai_query)
print("\n=== RESTORED QUERY ===")
print(restored)
# SELECT customer_id, churn_score, is_at_risk
# FROM customer_churn_risk
# WHERE is_at_risk = true
#   AND revenue_ltv > 1000
# ORDER BY churn_score DESC;

What to Mask vs What to Preserve
Always mask — business intelligence
Table names containing business logic (churn_risk, revenue_forecast, competitor_tracking), column names revealing metrics (ltv, arr, mrr), names indicating ML/AI features (churn_probability, propensity_score), and any name that reveals internal product or strategic information.
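One way to enforce this in code is a simple token-based denylist check run over every table and column name before anything leaves your environment. The term list below is a hypothetical starting point, not an exhaustive catalog; extend it with your own organization's vocabulary.

```python
import re

# Hypothetical denylist of business-intelligence terms; extend it for your org.
SENSITIVE_BI_TERMS = {
    "churn", "revenue", "forecast", "competitor", "ltv", "arr", "mrr",
    "propensity", "pipeline", "acquisition",
}

def is_sensitive_name(name: str) -> bool:
    """Return True when any underscore-separated token is a flagged BI term."""
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    return any(tok in SENSITIVE_BI_TERMS for tok in tokens)

print(is_sensitive_name("churn_risk_score"))  # True
print(is_sensitive_name("order_line_items"))  # False
```

Splitting on underscores rather than substring matching avoids false positives like "marrow" matching "arr".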
Always mask — PII indicators
Column names indicating regulated or sensitive data: ssn, social_security, salary, health_condition, medical_record, credit_score, ethnicity, date_of_birth. Even the existence of these column names signals regulated data categories to an external service.
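A sketch of the same idea for PII: scan the column list for indicator substrings before masking, so anything flagged can be double-checked by hand. The indicator list is an illustrative assumption drawn from the categories above; tune it to the regulations that apply to you.

```python
# Hypothetical PII indicator list; tune per organization and regulation.
PII_INDICATORS = [
    "ssn", "social_security", "salary", "health", "medical",
    "credit_score", "ethnicity", "date_of_birth",
]

def flag_pii_columns(column_names):
    """Return the column names whose lowercased form contains a PII indicator."""
    return [
        name for name in column_names
        if any(ind in name.lower() for ind in PII_INDICATORS)
    ]

cols = ["user_id", "ssn", "salary_band", "last_login", "health_condition"]
print(flag_pii_columns(cols))  # ['ssn', 'salary_band', 'health_condition']
```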
Preserve — structure metadata
Data types (INT, VARCHAR, BOOLEAN), constraints (PRIMARY KEY, NOT NULL, UNIQUE), relationship structure (FOREIGN KEY), cardinality hints. The AI needs this structural information to generate syntactically correct SQL. Types and constraints are not sensitive.
Keep your mapping safe
The masking map is as sensitive as the schema itself — it's a direct lookup between real and masked names. Store it in your secrets manager or an encrypted file alongside the query. Never send the mapping to the AI service.
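If a secrets manager is not available, at minimum write the mapping with owner-only file permissions. This is a minimal POSIX sketch (the filename is arbitrary); it assumes a fresh file, since O_CREAT does not change the mode of an existing one.

```python
import json
import os

def save_mapping(mapping_json: str, path: str) -> None:
    """Write the masking map with owner-only permissions (0600).
    Minimal POSIX sketch; prefer a secrets manager in production."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(mapping_json)

mapping = json.dumps({"tables": {"customer_churn_risk": "TABLE_01"}})
save_mapping(mapping, "schema_mapping.json")

with open("schema_mapping.json") as f:
    print(json.load(f)["tables"])  # {'customer_churn_risk': 'TABLE_01'}
```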
Compliance and Governance Considerations
GDPR and data minimization
GDPR requires data minimization — processing only the data strictly necessary for the purpose. Sending schema to an AI service for query generation is a processing activity. Masking ensures you're not exposing PII category information unnecessarily to third parties.
SOC 2 and vendor risk
SOC 2 compliance requires assessing data shared with third-party vendors (including AI providers). Masking schema reduces the scope of sensitive data shared with AI vendors. Document the masking process as a control in your vendor risk management program.
HIPAA considerations
For healthcare organizations: column names that indicate HIPAA Protected Health Information (patient_id, diagnosis_code, medication, health_record) must not be sent to AI services that are not covered by a Business Associate Agreement (BAA). Masking is a technical safeguard, not a substitute for one. Consider using on-premise or BAA-covered AI services for healthcare schema.
Internal policy and training
Establish a policy that all schema sent to external AI services must be masked first. Document the process. Train developers on what constitutes sensitive schema information. Consider building masking into your SQL IDE or AI assistant integration as an automatic step.
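Making the masking step automatic can be as simple as a wrapper that every AI call must go through. The sketch below is a hypothetical pre-send hook that masks only table names for brevity; a real integration would reuse the full SqlSchemaMasker from this guide, and `send_fn` stands in for whatever AI client function you use.

```python
import re

def mask_before_send(schema_sql: str, send_fn):
    """Hypothetical pre-send hook: mask table names, call the AI client
    (send_fn), and return the reply plus the name mapping."""
    mapping: dict = {}

    def _mask(match):
        name = match.group(1)
        # Assign TABLE_01, TABLE_02, ... on first sight of each name.
        mapping.setdefault(name, f"TABLE_{len(mapping) + 1:02d}")
        return f"CREATE TABLE {mapping[name]}"

    masked = re.sub(r"CREATE\s+TABLE\s+(\w+)", _mask, schema_sql,
                    flags=re.IGNORECASE)
    return send_fn(masked), mapping

# Stand-in lambda instead of a real AI client call:
reply, mapping = mask_before_send(
    "CREATE TABLE customer_churn_risk (id BIGINT);",
    lambda prompt: f"received {len(prompt)} chars",
)
print(mapping)  # {'customer_churn_risk': 'TABLE_01'}
```

Because the hook owns both the masking and the client call, developers cannot forget the step: the real schema never reaches the transport layer.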
Our AI SQL Masker tool handles this masking step automatically.