PII Janitor

The PII Janitor is Bauxite’s high-speed redaction engine. It ensures that sensitive user data never leaves your infrastructure by intercepting prompts and replacing Personally Identifiable Information (PII) with ephemeral tokens before they reach the LLM.

How It Works: The “Vault Swap”

The Janitor operates on a three-stage lifecycle: Detect, Vault, and Re-Identify. The entire process runs inside Bauxite's 20 MB memory limit (the "Straitjacket") and uses zero-allocation buffers to keep overhead low.

1. Detection

As the request stream enters the intercept, the Janitor scans the text using optimized Regex patterns.

  • Stream-Safe: It scans the data through a 32 KB sliding window, so the entire prompt is never loaded into memory at once.

  • Low Latency: Detection adds less than 1ms of overhead to the request path.

2. Vaulting (The In-Memory Vault)

When a match (e.g., an email address) is found, it is not “deleted.” Instead, it is moved to a Request-Scoped Vault.

  • Tokenization: john.doe@example.com becomes [EMAIL_1].

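  • Volatility: The vault is an in-memory map from token to original value, tied to the specific HTTP request context. Values are held as byte slices (map[string][]byte) rather than Go strings, because strings are immutable and could not be zeroed in place when the request ends.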
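
A minimal sketch of the vaulting step, assuming a hypothetical Vault type with Store and Redact helpers (none of these names come from Bauxite itself). Values are stored as byte slices so they can be zeroed later, as the Purge example in this document does. For simplicity, every occurrence gets a fresh token, even if the same address appears twice.

```go
package main

import (
	"context"
	"fmt"
	"regexp"
)

// emailRe is the Email pattern from the Supported Entities table.
var emailRe = regexp.MustCompile(`[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}`)

// Vault (hypothetical) maps tokens such as "[EMAIL_1]" to original values.
type Vault struct {
	entries map[string][]byte
	counts  map[string]int // next index per label
}

func NewVault() *Vault {
	return &Vault{entries: map[string][]byte{}, counts: map[string]int{}}
}

// Store copies value into the vault and returns the token that replaces it.
func (v *Vault) Store(label string, value []byte) string {
	v.counts[label]++
	token := fmt.Sprintf("[%s_%d]", label, v.counts[label])
	v.entries[token] = append([]byte(nil), value...)
	return token
}

// Redact replaces every match of re in text with a freshly issued token.
func Redact(v *Vault, re *regexp.Regexp, label string, text []byte) []byte {
	return re.ReplaceAllFunc(text, func(match []byte) []byte {
		return []byte(v.Store(label, match))
	})
}

// vaultKey is an unexported context key, so only this package can reach the vault.
type vaultKey struct{}

// WithVault attaches a new vault to ctx. In an HTTP handler, ctx would be
// r.Context(), which scopes the vault to a single request.
func WithVault(ctx context.Context) (context.Context, *Vault) {
	v := NewVault()
	return context.WithValue(ctx, vaultKey{}, v), v
}

func main() {
	_, v := WithVault(context.Background())
	out := Redact(v, emailRe, "EMAIL", []byte("Contact john.doe@example.com today"))
	fmt.Println(string(out))
}
```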

3. Re-Identification

When the LLM responds, it might reference the token (e.g., “I have sent an email to [EMAIL_1]”). The Janitor intercepts the response stream, looks up [EMAIL_1] in the vault, and substitutes the original value before the response reaches the user.
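
The lookup-and-replace step might look like the sketch below. The token pattern and the Reidentify function are assumptions made for illustration. The sketch works on a complete buffer; a streaming version would also have to handle a token split across two chunks. Tokens that are not in the vault are left untouched.

```go
package main

import (
	"fmt"
	"regexp"
)

// tokenRe matches labels of the form [EMAIL_1] or [INTERNAL_ID_12].
// The exact token grammar is an assumption based on the examples in this document.
var tokenRe = regexp.MustCompile(`\[[A-Z][A-Z_]*_[0-9]+\]`)

// Reidentify swaps every known token in response back to its original value.
func Reidentify(vault map[string][]byte, response []byte) []byte {
	return tokenRe.ReplaceAllFunc(response, func(tok []byte) []byte {
		if orig, ok := vault[string(tok)]; ok {
			return orig
		}
		return tok // unknown token: pass through unchanged
	})
}

func main() {
	vault := map[string][]byte{"[EMAIL_1]": []byte("john.doe@example.com")}
	resp := []byte("I have sent an email to [EMAIL_1]")
	fmt.Println(string(Reidentify(vault, resp)))
}
```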


Supported Entities

The PII Janitor comes with pre-configured patterns for common sensitive data types:

Entity Type   Example Pattern                          Redaction Label
Email         [a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}  [EMAIL_X]
API Keys      sk-[a-zA-Z0-9]{32,}                    [SECRET_X]
Credit Cards  \d{4}-\d{4}-\d{4}-\d{4}                [CARD_X]
Phone (US)    \(\d{3}\) \d{3}-\d{4}                  [PHONE_X]
IP Addresses  \d{1,3}(\.\d{1,3}){3}                  [IP_X]
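
The example patterns above are all valid in Go's standard regexp package (RE2 syntax). The sketch below compiles them into a small detector list and classifies a sample string. The detector type and the Classify function are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// detector pairs a redaction label with its compiled pattern.
type detector struct {
	label string
	re    *regexp.Regexp
}

// detectors holds the example patterns from the table, in table order.
var detectors = []detector{
	{"EMAIL", regexp.MustCompile(`[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}`)},
	{"SECRET", regexp.MustCompile(`sk-[a-zA-Z0-9]{32,}`)},
	{"CARD", regexp.MustCompile(`\d{4}-\d{4}-\d{4}-\d{4}`)},
	{"PHONE", regexp.MustCompile(`\(\d{3}\) \d{3}-\d{4}`)},
	{"IP", regexp.MustCompile(`\d{1,3}(\.\d{1,3}){3}`)},
}

// Classify returns the label of the first detector that matches anywhere
// in s, or "" if none does.
func Classify(s string) string {
	for _, d := range detectors {
		if d.re.MatchString(s) {
			return d.label
		}
	}
	return ""
}

func main() {
	for _, s := range []string{"john.doe@example.com", "(555) 123-4567", "192.168.0.1", "hello"} {
		fmt.Printf("%q -> %q\n", s, Classify(s))
	}
}
```

Note that the patterns are deliberately loose. The IP pattern, for example, also matches out-of-range values such as 999.999.999.999, and the email pattern as written only matches lowercase letters.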

Memory Safety & The “Straitjacket”

The PII Janitor is the primary reason for Bauxite’s strict memory limits. To prevent data leaks, we implement Hard Erasure:

  1. No Disk Spillage: If a vault grows too large, the request is terminated rather than swapped to disk.
  2. Explicit Wiping: When a request finishes, the Janitor does not wait for the Garbage Collector. It iterates through the vault and zeroes out the byte slices.
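  3. CGO-Free: The entire logic is pure Go, avoiding the memory-safety pitfalls of C-based regex libraries.

// Example of the Janitor's "Wipe" function

// Vault holds the original values for one request, keyed by token.
// Values are byte slices, not strings, so they can be overwritten in place.
type Vault struct {
    entries map[string][]byte
}

// Purge zeroes every stored value, then removes it from the vault.
func (v *Vault) Purge() {
    for key, value := range v.entries {
        // Overwrite memory with zeroes
        for i := range value {
            value[i] = 0
        }
        // Deleting the current key while ranging over a map is safe in Go
        delete(v.entries, key)
    }
}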
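
The first two rules above (terminate instead of spilling, wipe on completion) can be combined as in the following sketch. The size cap, the ErrVaultFull error, and the handle function are hypothetical and only illustrate the pattern. The actual limit Bauxite applies to a vault is not stated here.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrVaultFull (hypothetical) signals that the request must be terminated.
var ErrVaultFull = errors.New("vault: size limit exceeded")

// Vault is a size-capped variant of the request-scoped vault.
type Vault struct {
	entries  map[string][]byte
	size     int // bytes currently held
	maxBytes int // hard cap; exceeding it is an error, never a spill to disk
}

func NewVault(maxBytes int) *Vault {
	return &Vault{entries: map[string][]byte{}, maxBytes: maxBytes}
}

// Store copies value into the vault, or fails if the cap would be exceeded.
func (v *Vault) Store(token string, value []byte) error {
	if v.size+len(value) > v.maxBytes {
		return ErrVaultFull
	}
	v.entries[token] = append([]byte(nil), value...)
	v.size += len(value)
	return nil
}

// Purge zeroes and removes every entry, as in the example above.
func (v *Vault) Purge() {
	for key, value := range v.entries {
		for i := range value {
			value[i] = 0
		}
		delete(v.entries, key)
	}
	v.size = 0
}

// handle stands in for a request handler: the deferred Purge runs on every
// exit path, including the early return when the vault is full.
func handle(values []string, maxBytes int) error {
	v := NewVault(maxBytes)
	defer v.Purge()
	for i, s := range values {
		if err := v.Store(fmt.Sprintf("[EMAIL_%d]", i+1), []byte(s)); err != nil {
			return err // the caller terminates the request
		}
	}
	return nil
}

func main() {
	fmt.Println(handle([]string{"a@b.io"}, 64))
	fmt.Println(handle([]string{"a@b.io", "john.doe@example.com"}, 16))
}
```

Using defer keeps the wipe tied to the request's lifetime rather than to the garbage collector, which matches the Explicit Wiping rule.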

Configuration

You can enable or disable specific detectors in your config.yaml:

pii_janitor:
  enabled: true
  strategy: "redact" # options: redact, hash, block
  entities:
    - email
    - api_keys
    - credit_cards
  custom_patterns:
    - name: "INTERNAL_ID"
      regex: "ID-[0-9]{5}"