
OpenAI Releases Privacy Filter: A 1.5B-Parameter Open-Source PII Redaction Model with 50M Active Parameters

by Josh
April 29, 2026
in AI, Analytics and Automation


OpenAI just quietly dropped something worth paying close attention to. Released on Hugging Face under an Apache 2.0 license, Privacy Filter is an open, bidirectional token-classification model purpose-built for detecting and redacting personally identifiable information (PII) in text. It is small enough to run in a web browser or on a laptop and fast enough for high-throughput data sanitization pipelines.

What It Does

Privacy Filter is a Named Entity Recognition (NER) model, but one tuned specifically for the privacy use case. It detects eight categories of sensitive spans: account_number, private_address, private_email, private_person, private_phone, private_url, private_date, and secret. The secret category covers credential formats, project-specific token patterns, and high-entropy strings — the model card explicitly calls out missed detection of ‘novel credential formats’ and ‘secrets split across surrounding syntax’ as known failure modes, which signals what the category is trained to target.

The intended use case is clear: dev teams that need to clean datasets, scrub logs, or pre-process user-generated content before it enters a training pipeline or gets stored in a data warehouse. Because it runs on-premises and on commodity hardware, it fits squarely into the growing set of edge-deployable AI tools that organizations can adopt without routing sensitive data to a third-party API.
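To make the redaction workflow concrete, here is a minimal sketch of the scrubbing step. The span format and the `redact` helper are illustrative assumptions, not the model's actual output schema or API:

```python
# Minimal redaction sketch. Assumes detected spans arrive as
# (start, end, label) character offsets -- an illustrative format,
# not the model's actual output schema.

def redact(text, spans):
    """Replace each detected span with a [LABEL] placeholder.

    Spans are (start, end, label) with end exclusive; processed
    right-to-left so earlier offsets stay valid after replacement.
    """
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label.upper()}]" + text[end:]
    return text

sample = "Contact Jane Doe at jane@example.com before Friday."
spans = [
    (8, 16, "private_person"),   # "Jane Doe"
    (20, 36, "private_email"),   # "jane@example.com"
]
print(redact(sample, spans))
# -> Contact [PRIVATE_PERSON] at [PRIVATE_EMAIL] before Friday.
```

In a real pipeline this step would run after the model's span detection, on each log line or dataset record before storage.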

The Architecture is the Real Story

Privacy Filter has 1.5 billion total parameters but only 50 million active at inference time. That roughly 30x gap is explained entirely by the model’s sparse mixture-of-experts (MoE) feed-forward design.

Architecturally, the model is ‘similar to gpt-oss, albeit of a smaller size.’ It is built on 8 pre-norm transformer blocks with a residual stream width (d_model) of 640. Attention uses grouped-query attention (GQA) with rotary positional embeddings (RoPE) — 14 query heads over 2 KV heads, meaning 7 query heads share each KV head — which reduces the memory footprint of the key-value cache significantly compared to standard multi-head attention. RoPE is also what enables the model’s 128,000-token context window. The feed-forward layers use sparse MoE with 128 total experts and top-4 routing per token: for each token, 4 of the 128 experts are activated, and all other expert parameters remain dormant. This is exactly the mechanism that produces the 30x gap between total and active parameter counts.
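The total-versus-active arithmetic is easy to sanity-check. A back-of-the-envelope sketch — only the expert count, routing, and headline parameter counts come from the release; the assumption that nearly all parameters sit in the experts is mine:

```python
# Back-of-the-envelope check of the total-vs-active parameter gap.
# 128 experts with top-4 routing come from the model description;
# the idealized split (all params in experts) is an assumption.

TOTAL_PARAMS = 1.5e9
ACTIVE_PARAMS = 50e6
NUM_EXPERTS = 128
TOP_K = 4

# Fraction of expert parameters touched per token under top-4 routing.
active_expert_fraction = TOP_K / NUM_EXPERTS   # 4/128 = 1/32

# If nearly all parameters lived in the experts, active params would be:
idealized_active = TOTAL_PARAMS * active_expert_fraction  # ~46.9M

print(f"expert fraction active per token: {active_expert_fraction:.4f}")
print(f"idealized active params: {idealized_active / 1e6:.1f}M "
      f"(reported: {ACTIVE_PARAMS / 1e6:.0f}M)")
print(f"total/active ratio: {TOTAL_PARAMS / ACTIVE_PARAMS:.0f}x")
```

The idealized ~46.9M lands close to the reported 50M, with the remainder plausibly accounted for by shared (always-active) components such as attention and embeddings.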

A Three-Phase Training Pipeline

What makes this model architecturally unusual is not just its size, but how it was built. Privacy Filter was produced in three distinct phases.

First, it was pretrained autoregressively as a standard next-token prediction language model — in the tradition of GPT-style decoders. Second, that checkpoint was architecturally converted: the language-model head was replaced with a token-classification head over the privacy label taxonomy, and the attention mechanism was switched from causal (unidirectional) to bidirectional banded attention with a band size of 128, giving each token an effective context window of 257 tokens (the token itself plus 128 on each side). Third, the converted model was post-trained with a supervised classification loss — a distinct fine-tuning phase using labeled PII data, separate from the architectural conversion step.
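The banded attention pattern from the conversion step can be sketched as a simple mask, assuming the straightforward reading that token i attends to all tokens within the band on either side (a small band is used here so the mask is easy to inspect):

```python
import numpy as np

# Sketch of a bidirectional banded attention mask. With band size B,
# token i attends to tokens j with |i - j| <= B: itself plus B tokens
# on each side, an effective window of 2*B + 1 tokens.
# (Band size 128 -> 257-token window, as described in the article.)

def banded_mask(seq_len: int, band: int) -> np.ndarray:
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= band

mask = banded_mask(seq_len=8, band=2)
print(mask.astype(int))
# Each row has at most 2*2 + 1 = 5 ones, centered on the diagonal;
# unlike a causal mask, the matrix is symmetric (bidirectional).
```

Note the contrast with the causal mask used during pretraining, which would zero out everything above the diagonal.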

The autoregressive pretraining gives the model rich language representations learned from far more data and compute than any task-specific budget would support. The architectural conversion enables bidirectional context, which is essential for NER — in ‘Alice Smith called’, the right context (‘Smith called’) makes ‘Alice’ unambiguously a person name, but with only left context available it could be missed. The supervised post-training then specializes those representations for the privacy detection task.

Compared to classical masked-language-model approaches like BERT, this is a post-training conversion of an autoregressive model rather than a native masked-LM setup — a meaningful distinction in how the base representations were formed.

Constrained Viterbi Decoding Instead of Argmax

The label scheme Privacy Filter uses is BIOES — Begin, Inside, Outside, End, Single. Each of the 8 privacy categories gets four boundary-tagged token classes (B-, I-, E-, S-) plus the background class O, yielding 33 total output classes per token. For a sequence of length T, the output logits have shape [T, 33].
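The 33-class count falls straight out of the scheme, as a quick enumeration shows (label string formatting here is illustrative; the model's exact label names may differ):

```python
# Enumerate the BIOES label space: four boundary-tagged classes per
# category (B-, I-, E-, S-) plus the single background class O.

CATEGORIES = [
    "account_number", "private_address", "private_email",
    "private_person", "private_phone", "private_url",
    "private_date", "secret",
]

LABELS = ["O"] + [
    f"{prefix}-{cat}" for cat in CATEGORIES for prefix in "BIES"
]

print(len(LABELS))   # 8 categories * 4 boundary tags + 1 = 33
print(LABELS[:5])
```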

Rather than taking a per-token argmax over those 33 logits, which could produce incoherent label sequences like B- followed immediately by S-, the model runs a constrained Viterbi decoder at inference time. The decoder uses linear-chain transition scoring and enforces valid BIOES boundary transitions. It scores complete label paths using start, transition, and end terms, along with six transition-bias parameters that control behaviors such as background persistence, span entry, span continuation, span closure, and boundary-to-boundary handoff. This global path optimization improves span coherence and boundary stability by making each token decision depend on sequence-level structure, not just local logits — particularly valuable in noisy or mixed-format text.

Those six transition-bias parameters are also user-tunable at runtime. This lets developers push toward broader, more contiguous masking for higher recall, or tighten boundaries for higher precision, without retraining the model.
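A minimal sketch of constrained BIOES decoding, using a toy one-category label set and hand-picked logits; the real model's start/end/transition scoring terms and tunable biases are omitted, so this only illustrates the hard-constraint mechanism:

```python
import numpy as np

# Sketch of constrained Viterbi decoding over a BIOES label space.
# Invalid BIOES transitions (e.g. B- followed directly by S-) score
# -inf, so the best path is always a coherent sequence of spans.

def allowed(prev: str, nxt: str) -> bool:
    pt, nt = prev[0], nxt[0]                 # tag letters
    pc = prev[2:] if pt != "O" else None     # category of each label
    nc = nxt[2:] if nt != "O" else None
    if pt in ("O", "E", "S"):                # outside any open span
        return nt in ("O", "B", "S")
    # pt in ("B", "I"): inside a span -- must continue or close it
    return nt in ("I", "E") and nc == pc

def viterbi(logits: np.ndarray, labels: list) -> list:
    T, L = logits.shape
    trans = np.full((L, L), -np.inf)
    for i, p in enumerate(labels):
        for j, n in enumerate(labels):
            if allowed(p, n):
                trans[i, j] = 0.0
    # Valid starts: O, B-, S-; valid ends: O, E-, S-.
    start = np.array([0.0 if l[0] in "OBS" else -np.inf for l in labels])
    end = np.array([0.0 if l[0] in "OES" else -np.inf for l in labels])
    score = start + logits[0]
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans        # [prev, next]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + logits[t]
    path = [int((score + end).argmax())]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return [labels[i] for i in reversed(path)]

labels = ["O", "B-secret", "I-secret", "E-secret", "S-secret"]
logits = np.array([
    [0.0, 2.0, 0.0, 0.0, 1.9],   # B- narrowly beats S- here
    [0.0, 0.0, 0.1, 0.0, 3.0],   # argmax picks S-: invalid after B-
    [1.0, 0.0, 0.0, 0.5, 0.0],
])
print(viterbi(logits, labels))
# -> ['S-secret', 'S-secret', 'O']
```

Per-token argmax on these logits would emit the incoherent sequence B-, S-, O; the constrained decoder instead settles on two valid single-token spans.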

Key Takeaways

  • OpenAI released Privacy Filter, an open-source PII redaction model under Apache 2.0, capable of detecting eight sensitive span categories including account_number, private_person, secret, and more — deployable on-premises without routing data to an external API.
  • The model has 1.5B total parameters but only 50M active at inference, thanks to a sparse MoE feed-forward design with 128 experts and top-4 routing per token — making it lightweight enough to run in a browser or on a laptop.
  • The backbone is architecturally similar to gpt-oss: 8 pre-norm transformer blocks, d_model=640, grouped-query attention with RoPE, and a sparse MoE FFN — first pretrained autoregressively, then converted to a bidirectional banded attention encoder, then post-trained with a supervised classification loss.
  • At inference, it runs constrained Viterbi decoding over a BIOES label scheme rather than per-token argmax, producing coherent span boundaries with six tunable transition-bias parameters that let engineers adjust the precision/recall tradeoff at runtime without retraining.

Check out the Model Weights.
