How to Mask JSON Payloads Before Sending to AI Without Breaking Structure
When using AI to analyze JSON data, you often need to mask sensitive fields — PII, API keys, payment details — before the payload reaches the LLM. The challenge is masking values while preserving the JSON structure so the AI can still analyze field names, types, and relationships. This guide shows how to do it in Python and JavaScript.
PII: personally identifiable information must be masked before AI processing
Structure: preserved, so the AI still understands the schema and field names
Reversible: token mapping lets you restore values after AI analysis
GDPR/HIPAA: masking supports regulatory compliance when using AI tools
Why Mask JSON Before Sending to AI?
The compliance risk
Hosted LLMs (GPT-4, Claude, Gemini) process your data on external servers. Sending real user data (emails, phone numbers, SSNs, credit card numbers) may violate GDPR, HIPAA, and enterprise data policies. Masking replaces sensitive values with placeholders that keep the structure the AI needs for analysis without exposing the actual PII.
| Item | Approach | Trade-offs |
|---|---|---|
| No masking | Send real data directly to AI | Easiest, but real PII leaves your system and may violate GDPR/HIPAA |
| Full redaction | Replace all values with [REDACTED] | Safe but AI loses context — can't analyze data patterns or types |
| Pattern masking | Mask values, preserve format hints (****@domain.com) | Balanced — AI understands field type, value is hidden |
| Token mapping (reversible) | Replace with TOKEN_A1B2 — track original in memory | Best for round-trips — AI response tokens can be restored |
| Type-only schema | Send only field names and types, no values | Maximum privacy — AI can only analyze structure, not values |
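The type-only approach in the last row can be sketched in a few lines. This is a minimal illustration, not a standard format: the function name `type_schema` and the choice to describe a list by its first element only are assumptions made for this example.

```python
from typing import Any

def type_schema(obj: Any) -> Any:
    """Replace every value with its JSON type name, keeping keys and nesting."""
    if isinstance(obj, dict):
        return {k: type_schema(v) for k, v in obj.items()}
    if isinstance(obj, list):
        # Assumption: the first element is representative of the whole list
        return [type_schema(obj[0])] if obj else []
    if isinstance(obj, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(obj, (int, float)):
        return "number"
    if isinstance(obj, str):
        return "string"
    if obj is None:
        return "null"
    return type(obj).__name__

payload = {
    "user": {"email": "alice@company.com", "age": 30, "active": True},
    "tags": ["vip", "beta"],
    "note": None,
}
schema = type_schema(payload)
print(schema)
# {'user': {'email': 'string', 'age': 'number', 'active': 'boolean'}, 'tags': ['string'], 'note': 'null'}
```

No value leaves the process in this variant, so it suits questions about structure only; the AI cannot comment on formats or ranges.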
Simple JSON Masker — Python
import json
import re
from typing import Any


class JsonMasker:
    """Mask sensitive JSON fields while preserving structure for AI analysis."""

    # Key-name patterns that mark a field as sensitive. These are substring
    # matches, so 'mail' also matches a key such as 'mailing_list'.
    SENSITIVE_PATTERNS = [
        r'(email|mail)',
        r'(phone|mobile|tel)',
        r'(ssn|social_security)',
        r'(password|passwd|pwd|secret)',
        r'(credit_card|card_number|cvv)',
        r'(api_key|access_token|bearer|private_key)',
        r'(address|street|zip|postal)',
        r'(date_of_birth|dob|birth)',
        r'(ip_address|ip_addr)',
    ]

    def __init__(self, mask_char='*', preserve_length=True):
        self.mask_char = mask_char
        self.preserve_length = preserve_length

    def _is_sensitive(self, key: str) -> bool:
        key_lower = key.lower()
        return any(re.search(p, key_lower) for p in self.SENSITIVE_PATTERNS)

    def _mask_value(self, value: Any, key: str) -> Any:
        # Containers under a sensitive key: mask every string inside them
        if isinstance(value, dict):
            return {k: self._mask_value(v, k) for k, v in value.items()}
        if isinstance(value, list):
            return [self._mask_value(item, key) for item in value]
        if not isinstance(value, str):
            # Numbers, booleans, and null pass through unchanged, so store
            # numeric secrets (such as a CVV) as strings if they must be masked
            return value
        if self.preserve_length:
            if '@' in value:  # email: preserve domain for context
                local, domain = value.split('@', 1)
                return f"{self.mask_char * len(local)}@{domain}"
            return self.mask_char * len(value)
        return f"[MASKED_{key.upper()}]"

    def mask(self, obj: Any) -> Any:
        if isinstance(obj, dict):
            return {
                k: self._mask_value(v, k) if self._is_sensitive(k) else self.mask(v)
                for k, v in obj.items()
            }
        elif isinstance(obj, list):
            return [self.mask(item) for item in obj]
        return obj
masker = JsonMasker()
sensitive_data = {
    "user": {
        "name": "Alice Johnson",       # not masked: 'name' is not in the patterns
        "email": "alice@company.com",  # → *****@company.com
        "phone": "+1-555-123-4567",    # → *************** (15 characters)
        "age": 30,                     # not masked: number preserved
        "address": "123 Main Street",  # → *************** (15 characters)
    },
    "api_key": "sk-abc123xyz",         # → ************ (12 characters)
    "order_id": "ORD-2024-001",        # not masked
    "total": 149.99,                   # not masked: float
}
masked = masker.mask(sensitive_data)
print(json.dumps(masked, indent=2))
# {
#   "user": {
#     "name": "Alice Johnson",
#     "email": "*****@company.com",
#     "phone": "***************",
#     "age": 30,
#     "address": "***************"
#   },
#   "api_key": "************",
#   "order_id": "ORD-2024-001",
#   "total": 149.99
# }
Reversible Masking with Token Mapping
import json
import uuid
from typing import Any


class ReversibleJsonMasker:
    """Replace sensitive values with tokens, restore after AI processing."""

    SENSITIVE_KEYS = {'email', 'phone', 'ssn', 'name', 'address',
                      'ip_address', 'date_of_birth', 'credit_card'}

    def __init__(self):
        self.token_map = {}    # token → original value (original type kept)
        self.reverse_map = {}  # (type, value) → token (deduplication)

    def _tokenize(self, value: Any) -> str:
        """Same value always gets the same token (consistent across payload)."""
        lookup = (type(value), value)  # keeps 1 and 1.0 apart
        if lookup in self.reverse_map:
            return self.reverse_map[lookup]
        token = f"TOKEN_{uuid.uuid4().hex[:8].upper()}"
        self.token_map[token] = value
        self.reverse_map[lookup] = token
        return token

    def mask(self, obj, sensitive_keys=None):
        keys = sensitive_keys or self.SENSITIVE_KEYS
        if isinstance(obj, dict):
            # bool is an int subclass, so it is excluded explicitly.
            # Containers under a sensitive key are recursed into, so their
            # inner keys must be listed in `keys` as well.
            return {
                k: self._tokenize(v)
                if k.lower() in keys
                and isinstance(v, (str, int, float))
                and not isinstance(v, bool)
                else self.mask(v, keys)
                for k, v in obj.items()
            }
        elif isinstance(obj, list):
            return [self.mask(item, keys) for item in obj]
        return obj

    def unmask(self, text: str) -> str:
        """Replace tokens back with original values in AI response text."""
        for token, original in self.token_map.items():
            text = text.replace(token, str(original))
        return text

    def unmask_json(self, obj) -> Any:
        """Recursively restore tokens in a JSON object."""
        if isinstance(obj, str):
            return self.token_map.get(obj, obj)
        if isinstance(obj, dict):
            return {k: self.unmask_json(v) for k, v in obj.items()}
        if isinstance(obj, list):
            return [self.unmask_json(item) for item in obj]
        return obj
# Usage with Claude/OpenAI API
masker = ReversibleJsonMasker()
original_payload = {
    "user": {"name": "Alice Johnson", "email": "alice@example.com"},
    "order": {"id": "ORD-001", "total": 149.99}
}
masked_payload = masker.mask(original_payload)
# → {"user": {"name": "TOKEN_A1B2C3D4", "email": "TOKEN_E5F6A7B8"}, "order": {...}}
# Tokens are random on every run; the values shown here are illustrative.

# Send masked_payload to AI. The reply is simulated here from the real tokens.
name_token = masked_payload["user"]["name"]
email_token = masked_payload["user"]["email"]
ai_response = f"The customer {name_token} ({email_token}) placed order ORD-001"

# Restore original values in the AI's response
restored = masker.unmask(ai_response)
# → "The customer Alice Johnson (alice@example.com) placed order ORD-001"
JavaScript / TypeScript Implementation
type JsonValue = string | number | boolean | null | JsonObject | JsonValue[];
type JsonObject = { [key: string]: JsonValue };

// Key-name patterns that mark a field as sensitive (substring matches)
const SENSITIVE_PATTERNS = [
  /email|mail/i,
  /phone|mobile|tel/i,
  /password|passwd|pwd|secret/i,
  /credit_card|card_number|cvv/i,
  /api_key|access_token|bearer|private_key/i,
  /address|street|zip|postal/i,
  /ssn|social_security/i,
  /date_of_birth|dob|birth/i,
  /ip_address|ip_addr/i,
];

function isSensitiveKey(key: string): boolean {
  return SENSITIVE_PATTERNS.some(p => p.test(key));
}

function maskValue(value: JsonValue, key: string): JsonValue {
  // Containers under a sensitive key: mask every string inside them
  if (Array.isArray(value)) return value.map(item => maskValue(item, key));
  if (value && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, maskValue(v, k)])
    );
  }
  if (typeof value !== 'string') return value; // preserve numbers, booleans, null
  // Email: preserve domain for context (split on the first '@' only)
  const at = value.indexOf('@');
  if (at !== -1) {
    return '*'.repeat(at) + value.slice(at);
  }
  return '*'.repeat(value.length);
}

export function maskJson(obj: JsonValue): JsonValue {
  if (Array.isArray(obj)) return obj.map(item => maskJson(item));
  if (obj && typeof obj === 'object') {
    return Object.fromEntries(
      Object.entries(obj as JsonObject).map(([k, v]) => [
        k,
        isSensitiveKey(k) ? maskValue(v, k) : maskJson(v),
      ])
    );
  }
  return obj;
}
// Reversible token masker
export class TokenMasker {
  private tokenMap = new Map<string, string>();   // token → original
  private reverseMap = new Map<string, string>(); // original → token

  private tokenize(value: string): string {
    const existing = this.reverseMap.get(value);
    if (existing !== undefined) return existing;
    let token: string;
    do {
      // padEnd keeps every token the same length, so no token is a prefix of another
      token = 'TOKEN_' + Math.random().toString(36).slice(2, 10).padEnd(8, '0').toUpperCase();
    } while (this.tokenMap.has(token)); // retry on the rare collision
    this.tokenMap.set(token, value);
    this.reverseMap.set(value, token);
    return token;
  }

  mask(obj: JsonValue, sensitiveKeys = new Set(['email', 'phone', 'name', 'address'])): JsonValue {
    if (typeof obj === 'string') return obj;
    if (Array.isArray(obj)) return obj.map(item => this.mask(item, sensitiveKeys));
    if (obj && typeof obj === 'object') {
      return Object.fromEntries(
        Object.entries(obj as JsonObject).map(([k, v]) => [
          k,
          sensitiveKeys.has(k.toLowerCase()) && typeof v === 'string'
            ? this.tokenize(v)
            : this.mask(v, sensitiveKeys),
        ])
      );
    }
    return obj;
  }

  unmask(text: string): string {
    let result = text;
    this.tokenMap.forEach((original, token) => {
      // Function replacer: '$' sequences in the original value are inserted literally
      result = result.replaceAll(token, () => original);
    });
    return result;
  }
}
// Usage:
const masker = new TokenMasker();
const masked = masker.mask({ name: 'Alice', email: 'alice@example.com', orderId: 'ORD-001' }) as JsonObject;
// → { name: 'TOKEN_XXXXXXXX', email: 'TOKEN_YYYYYYYY', orderId: 'ORD-001' } (tokens are random)
// The AI reply is simulated here from the real token
const aiResponse = `Customer ${masked.name} placed order ORD-001`;
console.log(masker.unmask(aiResponse));
// → 'Customer Alice placed order ORD-001'
What to Mask vs What to Preserve
Always mask
Email addresses, phone numbers, SSN/national IDs, credit card numbers, CVV, passwords, API keys, access tokens, home addresses, dates of birth, IP addresses, biometric data.
Usually preserve
Field names (keys), data types, boolean values, numeric values (age: 30), non-PII identifiers (order_id, product_id), timestamps (format only), enum values (status: "active").
Context-dependent
Full names (mask if customer PII, keep if public figure references), company names (depends on sensitivity), location (city-level usually OK, street address always mask).
Preserve for AI analysis
JSON structure and nesting, field names and relationships, data types (string vs number vs boolean), approximate numeric ranges, array lengths and structure.
Masking Workflow for AI Analysis
Audit your JSON payload
Before sending anything to AI, map every field in your JSON and classify each as: PII (must mask), business-sensitive (consider masking), or safe (can send as-is). Document this classification in your codebase as a reference.
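The audit step can be partly automated by listing every field path in a sample payload with a proposed classification for a human to review. The sketch below is one way to do that; the hint lists `PII_HINTS` and `BUSINESS_HINTS` and the function names are hypothetical and would need to reflect your own data inventory.

```python
from typing import Any, Dict, Iterator, Tuple

# Hypothetical classification hints (substring matches on key names)
PII_HINTS = ("email", "phone", "ssn", "address", "birth")
BUSINESS_HINTS = ("revenue", "margin", "salary")

def iter_leaves(obj: Any, path: str = "", key: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_path, owning_key) for every leaf value in a JSON-like object."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            yield from iter_leaves(v, f"{path}.{k}" if path else k, k)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_leaves(item, f"{path}[]", key)
    else:
        yield path, key

def classify(key: str) -> str:
    """Propose a class for a field based on its key name."""
    lowered = key.lower()
    if any(hint in lowered for hint in PII_HINTS):
        return "pii"
    if any(hint in lowered for hint in BUSINESS_HINTS):
        return "business-sensitive"
    return "safe"

def audit(payload: Any) -> Dict[str, str]:
    """Map each field path to its proposed classification."""
    return {path: classify(key) for path, key in iter_leaves(payload)}

sample = {
    "user": {"email": "a@b.com", "age": 30},
    "orders": [{"id": "ORD-1", "margin": 0.2}],
}
classification = audit(sample)
for field_path, label in classification.items():
    print(f"{field_path}: {label}")
# user.email: pii
# user.age: safe
# orders[].id: safe
# orders[].margin: business-sensitive
```

The output is a starting point for the documented classification, not a replacement for reviewing fields whose names give no hint of their content.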
Choose masking strategy
For one-way analysis (AI explains the schema), use simple pattern masking (*** replacement). For round-trip analysis (AI rewrites data and you need the originals back), use reversible token mapping. Avoid full redaction ([REDACTED]) when the AI needs to reason about value formats or types, because it removes that context.
Apply masking before serialization
Mask the in-memory object before JSON.stringify or json.dumps. Never log, cache, or transmit the unmasked version outside your system boundary. The masked copy is what gets sent to the AI API.
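A small guard at the serialization boundary can catch values that slip past key-based masking, for example an email pasted into a free-text field. The following is a sketch under stated assumptions: `SENSITIVE_KEYS`, `mask_copy`, and `serialize_for_ai` are hypothetical names, and the list of known secrets is assumed to come from your own records.

```python
import json
from typing import Any, Iterable

SENSITIVE_KEYS = {"email", "phone"}  # hypothetical key list for this sketch

def mask_copy(obj: Any) -> Any:
    """Build a masked copy; the original object is never modified."""
    if isinstance(obj, dict):
        return {
            k: "***" if k.lower() in SENSITIVE_KEYS and isinstance(v, str) else mask_copy(v)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [mask_copy(item) for item in obj]
    return obj

def serialize_for_ai(record: Any, known_secrets: Iterable[str]) -> str:
    """Mask first, serialize second, then verify no known secret survived."""
    body = json.dumps(mask_copy(record), ensure_ascii=False)
    for secret in known_secrets:
        if secret in body:
            raise ValueError("unmasked sensitive value found in outgoing payload")
    return body

record = {
    "user": {"email": "alice@company.com", "phone": "+1-555-123-4567"},
    "total": 149.99,
}
body = serialize_for_ai(record, ["alice@company.com", "+1-555-123-4567"])
print(body)
# {"user": {"email": "***", "phone": "***"}, "total": 149.99}
```

The substring check is deliberately simple and will miss values that JSON escaping alters (quotes, backslashes), so treat it as a safety net rather than proof.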
Send to AI with context
Tell the AI what you've done: "Field values for PII fields have been masked with ***. Analyze the schema structure and field relationships." This helps the AI give better structural analysis without being confused by the masked values.
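One way to attach that context is to build the prompt from the masked payload plus a list of the masked field paths. This sketch only assembles the text; the function name `build_prompt` and the exact wording of the notice are assumptions, and the call to your AI provider's client is left out on purpose.

```python
import json
from typing import Any, List

def build_prompt(masked_payload: Any, masked_fields: List[str]) -> str:
    """Combine a masking notice, the task, and the masked JSON into one prompt."""
    notice = (
        "The values of these fields were masked before sending and are not real data: "
        + ", ".join(masked_fields)
        + ". Asterisks stand in for hidden characters."
    )
    task = "Analyze the schema structure and field relationships."
    return f"{notice}\n{task}\n\n{json.dumps(masked_payload, indent=2)}"

masked_payload = {"user": {"email": "*****@company.com", "age": 30}}
prompt = build_prompt(masked_payload, ["user.email"])
print(prompt)
```

Listing field paths rather than describing the masking in general terms lets the model distinguish masked strings from values that really are made of asterisks.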
Restore tokens in AI response
If using token mapping and the AI returns tokens in its response (e.g., "TOKEN_A1B2 has an invalid email format"), run the unmask() function on the AI's response text to restore original values before displaying to users.
Audit and log the masking
Log which fields were masked and which were sent as-is, including the masking strategy used and timestamp. This audit trail is essential for GDPR compliance (demonstrating appropriate technical measures) and HIPAA breach assessment.
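An audit entry of this kind might look like the sketch below. The record layout, the logger name `masking_audit`, and the function `build_audit_record` are assumptions for illustration; the key point is that the entry holds field paths and metadata only, never the values themselves.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Dict, Iterable

logger = logging.getLogger("masking_audit")  # hypothetical logger name

def build_audit_record(
    masked: Iterable[str], passthrough: Iterable[str], strategy: str
) -> Dict[str, object]:
    """Describe one masking run using field paths only (no field values)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "strategy": strategy,
        "masked_fields": sorted(masked),
        "passthrough_fields": sorted(passthrough),
    }

record = build_audit_record(
    masked=["user.email", "api_key"],
    passthrough=["order_id", "total"],
    strategy="pattern",
)
# Emitted as one JSON line; configure handlers and retention to suit your setup
logger.info(json.dumps(record))
print(record["masked_fields"])
# ['api_key', 'user.email']
```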