In financial crime compliance, disposition narratives are the written summaries of an investigation’s outcome. They capture what an analyst found (or didn’t find) in an alert or case and document the decision (e.g. close the alert, file a report, or escalate). These narratives are often reviewed by regulators and auditors as evidence that your compliance investigations are thorough and appropriate. A weak or generic narrative can raise red flags. For example, New York regulators penalized a bank in part because its alert disposition records were too broad and lacked detail, making it “difficult to assess the adequacy of compliance investigations”.

The high stakes mean disposition narratives must be consistent, factual, and explainable. They should stick to verifiable facts and avoid conjecture or emotional language. Industry guidance on investigation reports consistently urges objectivity: stick to factual evidence and verified information, and use clear, impartial language throughout the report. In practice, this means every narrative should tell the “who, what, when, where, and why” of the case in a clear, concise way. Regulators like FinCEN even suggest that a good suspicious activity report (SAR) narrative should clearly state the disposition (what actions the institution took) and where supporting documentation can be found. A well-written disposition narrative demonstrates that the institution’s decision (be it closing an alert as a false positive or filing an official report) was based on consistent logic and evidence. This consistency not only helps internal QA and knowledge sharing, but also builds trust with examiners who need to see that each alert was handled in a rigorous, explainable manner.

The Promise and Risk of Using LLMs

It’s no surprise that many compliance teams have looked to Large Language Models (LLMs) such as OpenAI’s GPT-4, Anthropic’s Claude, or Meta’s LLaMA to help draft these narratives. The promise is enticing: LLMs can produce well-structured, grammatically sound text in seconds. In theory, an AI assistant could save analysts time by auto-generating the first draft of a narrative, ensuring no spelling mistakes, and even adopting a consistent format. Given the repetitive nature of writing case summaries, an LLM could accelerate the workflow and let analysts focus on investigation rather than wordsmithing.

However, alongside that promise come serious risks that must be managed in regulated contexts. One major concern is the tendency of LLMs to hallucinate, i.e. to confidently make up information that wasn’t in the input data. By design, these models will fill gaps with plausible-sounding statements if prompted, which means they can present fiction as fact. In a financial investigation narrative, a hallucination could be catastrophic: imagine the AI fabricating a nonexistent transaction or misstating a regulatory requirement. In fact, LLMs have been observed to invent official-sounding details: for example, an AI might cite a fake regulatory rule (an “IFRS 99 standard”) that does not exist, and do so with complete confidence. This kind of unverifiable or incorrect statement is dangerous in a compliance report, where every claim must be backed by evidence. There’s also the issue of tone and consistency. If not carefully constrained, a general-purpose LLM might produce narratives in varying styles or slip into an inappropriate tone (too casual, or using terms that an examiner would deem unprofessional). Ensuring a steady, impartial tone across all narratives is hard when an off-the-shelf model might one day respond with a formal report and the next with a chatty summary. Such inconsistencies could undermine the credibility of your documentation.

Data leakage and privacy are another critical risk. Using cloud-based LLM APIs out of the box means sending potentially sensitive data (customer information, transaction details, etc.) to a third-party server. In highly regulated industries, that’s often a non-starter. Major banks like JPMorgan Chase, Wells Fargo, and Goldman Sachs have outright banned employees from using ChatGPT-style tools at work due to fears that proprietary client data could be inadvertently transmitted and stored on external servers. Financial institutions operate under strict privacy laws (GDPR in Europe, GLBA in the U.S., and the California Consumer Privacy Act, to name a few), and accidentally exposing PII via an AI service can lead to compliance breaches. Even if the LLM provider has a policy of not retaining or training on your prompts, the fact remains that your data leaves your controlled environment. There’s also the long-term reputational risk: if a regulator or customer discovered that an AI had introduced flawed information or leaked data during an investigation, trust in your compliance program could be severely damaged. In finance, trust once lost is hard to regain. In short, LLMs bring great writing ability, but without strong guardrails they pose unacceptable risks in the context of disposition narratives.

Flagright’s Approach: A Privacy-First AI Stack

Flagright ultimately decided to build a tightly controlled AI stack in-house, rather than send customer data to third-party AI services. This approach was driven by the unique demands of compliance use-cases. Below are the key pillars of Flagright’s privacy-first AI infrastructure:

  • No customer PII sent to external LLMs: We do not, under any circumstances, feed personally identifiable information or sensitive case details into an external model that might retain it. Any AI model that processes customer data as part of Flagright’s platform runs either on the customer’s infrastructure or in Flagright’s isolated environment, never in a public shared cloud. This eliminates the worry that confidential data could leak or be seen by unauthorized parties. (It’s worth noting that even big banks have taken this stance, preferring not to transmit data to services like ChatGPT due to privacy concerns.)
  • Abstracted and anonymized prompts: Wherever a third-party component is involved, and even internally for additional safety, Flagright ensures that prompts are fully abstracted. This means that before the AI sees case data, we scrub or tokenize direct identifiers. For example, “John Doe’s account 12345” might be converted to “Customer X’s account [ID]” in the prompt. The model generates a narrative with placeholders, which are mapped back to the real identifiers after the AI’s output is received (a minimal sketch of this idea follows this list). This way, even if the model were breached, no actual PII would be present in its input or output. This design, combined with zero data persistence on the AI processing nodes, gives our customers complete control over their sensitive information.
  • Secure, auditable deployment: Our AI models run behind secure APIs within the Flagright platform, with bank-grade security and full audit logging. Data is encrypted at rest and in transit (AES-256 encryption, FIPS-compliant). Each request to the narrative generator and each output can be logged (without sensitive content) for audit trails, so there is transparency on when and how narratives were produced. The infrastructure runs in isolated, ephemeral computing environments, much like a temporary, sandboxed server that is destroyed after processing, so no long-lived server retains residual data. We also silo data per region: if a customer requires their data to stay in the EU, the AI service is hosted in an EU data center and the data never leaves that region (even logs and backups are region-local). Flagright undergoes regular third-party audits and holds certifications like ISO 27001 and SOC 2 Type II to validate these controls. Our entire pipeline is GDPR compliant, meaning it meets strict EU data protection rules (transparency, minimal data use, rights to deletion and audit).
  • Alignment and testing (no hallucinations): We heavily fine-tuned our models on real-world AML/fraud investigation data to align them with the factual, professional tone needed. The AI was trained to follow a strict schema and style: for instance, always include the key investigative facts (who, what, when, where, outcome) and avoid any wording that isn’t backed by the input data. We incorporated human feedback loops and even “red team” exercises, essentially stress-testing the AI with tricky cases to catch and correct instances of conjecture or bias. The result is a model that behaves more like a diligent junior analyst than a creative storyteller. It doesn’t ad-lib new suspicions; it sticks to the script of the evidence. Of course, we continue to monitor outputs; any hint of an unsupported statement triggers retraining or rule adjustments. By running this process internally, we can iterate quickly and ensure the model’s knowledge stays up-to-date (and domain-specific) without exposing data externally.
  • Human-in-the-loop and approval workflows: Flagright’s AI-generated narratives are not unmonitored robo-reports. We built the system to integrate with compliance team workflows. This means an analyst can choose to use the AI suggestion as a draft, edit it, or even require a second pair of eyes before a narrative is finalized. For customers in highly scrutinized environments, an approval step can be configured: the AI drafts the narrative, but a supervisor must review and approve it in the case management system before it’s marked complete. The AI is there to reduce workload and enforce consistency, but ultimate control remains with the compliance professionals. Every narrative can be traced and edited, which is vital for accountability. In practice, many teams find the AI’s output is solid enough to use immediately, but having the option to intervene adds an extra layer of comfort and safety.
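
To make the prompt-abstraction approach described above concrete, here is a minimal sketch in Python. The field names, placeholder format, and helper functions are illustrative assumptions rather than Flagright’s actual implementation; the point is simply that identifiers are tokenized before any prompt is built and restored only after the model’s draft comes back.

```python
def abstract_case(case: dict) -> tuple[dict, dict]:
    """Replace direct identifiers with placeholders before any prompt is built.

    Returns the abstracted case plus the mapping needed to restore the real
    values after the model responds. Field names are illustrative only.
    """
    mapping = {
        "[CUSTOMER_1]": case["customer_name"],
        "[ACCOUNT_1]": case["account_id"],
    }
    abstracted = dict(case)
    abstracted["customer_name"] = "[CUSTOMER_1]"
    abstracted["account_id"] = "[ACCOUNT_1]"
    return abstracted, mapping


def restore_identifiers(draft: str, mapping: dict) -> str:
    """Map placeholders in the model's draft back to the real identifiers."""
    for placeholder, value in mapping.items():
        draft = draft.replace(placeholder, value)
    return draft


case = {
    "customer_name": "John Doe",
    "account_id": "12345",
    "alert_reason": "transactions kept just below the reporting threshold",
}
abstracted, mapping = abstract_case(case)

# The prompt contains only placeholders, never raw PII.
prompt = (
    f"Summarize the disposition for {abstracted['customer_name']} "
    f"(account {abstracted['account_id']}): {abstracted['alert_reason']}."
)

# Pretend the model returned this draft; real identifiers are restored locally.
draft = f"The alert on {abstracted['customer_name']}'s account {abstracted['account_id']} was reviewed and closed."
print(restore_identifiers(draft, mapping))
```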

By engineering our stack with these principles, Flagright ensures that no sensitive data leaks and that the AI’s output quality is rigorously controlled. This wasn’t as simple as calling an API; it meant investing in our own model training and infrastructure. But for an application as sensitive as disposition narratives, that investment was necessary to earn the trust of risk-averse compliance teams. The payoff is a system where we and our clients know exactly how the AI handles data and why it writes what it writes, with full transparency.
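
As an illustration of what “rigorously controlled output” can mean in practice, the sketch below checks a generated draft against two simple rules: every required section must be present, and any dollar figure the model mentions must also appear in the underlying case facts. The schema and rules here are assumptions made for the sake of example, not a description of Flagright’s internal checks.

```python
import re

# Illustrative schema: the sections every narrative draft must cover.
REQUIRED_SECTIONS = ("who", "what", "when", "where", "outcome")


def validate_draft(draft: dict, case_facts: str) -> list[str]:
    """Return a list of problems found in a structured narrative draft.

    `draft` maps section names to generated text; `case_facts` is the raw
    evidence the model was given. Both shapes are hypothetical.
    """
    problems = []

    # 1. Structural check: every required section is present and non-empty.
    for section in REQUIRED_SECTIONS:
        if not draft.get(section, "").strip():
            problems.append(f"missing or empty section: {section}")

    # 2. Grounding check: dollar amounts in the narrative must come from the
    #    input evidence; otherwise the draft is flagged for human review.
    narrative_text = " ".join(draft.get(s, "") for s in REQUIRED_SECTIONS)
    for amount in re.findall(r"\$[\d,]+(?:\.\d{2})?", narrative_text):
        if amount not in case_facts:
            problems.append(f"unsupported figure in narrative: {amount}")

    return problems


draft = {
    "who": "Customer X, account [ID]",
    "what": "Series of cash deposits of $9,500 each",
    "when": "Between 1 and 15 March",
    "where": "Branch deposits in a single city",
    "outcome": "Alert escalated for SAR consideration",
}
print(validate_draft(draft, case_facts="cash deposits of $9,500 on multiple dates"))
```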

How This Helps Compliance Teams

Building a privacy-first, controlled AI for narratives directly benefits the compliance and fraud teams who use it. Here’s how our approach translates into value for the end users (the analysts, investigators, and their managers):

  1. Consistent, high-quality narratives every time: The AI has essentially learned the house style for investigation summaries, so it produces narratives that are uniform in structure and tone. This means that whether you have one analyst or ten, the written output will adhere to the same standards and include the necessary details. Consistency was a pain point for many teams: one person might write “Customer’s account was closed due to suspicious pattern X” while another writes a verbose paragraph or omits the rationale. Now, the Flagright AI ensures those variations disappear. Each narrative is factual, concise, and covers the key points regulators expect (who was involved, what happened, why it is or isn’t suspicious). This not only satisfies reviewers but also helps new analysts ramp up quickly by giving them clear examples. And because the content is generated from the actual case data and follows predefined templates, it inherently stays explainable and auditable.
  2. Human-in-the-loop flexibility: Compliance officers remain in control. Analysts can review and tweak the AI-generated narrative easily within the workflow. In many cases, edits are minor (if needed at all), but having a human validate the narrative gives confidence that nothing important was misrepresented. If the AI suggests something that doesn’t sit right, the analyst can correct it before finalizing. Importantly, the presence of an AI draft doesn’t remove accountability; it is still the investigator’s responsibility to ensure the narrative is accurate. We’ve simply given them a powerful assistant to eliminate the grunt work. For management, this flexibility means they can deploy AI assistance at the pace that fits their comfort: some might auto-approve narratives to save maximum time, while others might enforce a quick check on each. The system supports both modes (a simple sketch of such an approval flow follows this list).
  3. Objective and factual reporting (no “judgmental” language): By training the model on actual case data and company policies, we’ve made sure it sticks to justifiable statements and an impartial tone. The AI won’t speculate or insert opinions; it won’t write something like “the customer was probably structuring to avoid taxes” unless that conclusion is explicitly supported by documented evidence in the case. Instead, it might say “alert triggered for potential structuring; customer consistently kept transactions just below the $10k threshold, which is indicative of avoidance of reporting requirements”, a factual description. This addresses a subtle risk with human-written narratives: people sometimes (even inadvertently) use loaded language or jump to conclusions. Our AI is calibrated to avoid unsupported claims. It provides the facts and reasoning, leaving any ultimate judgment (e.g. whether to file a SAR) to the human decision-makers. This impartiality means narratives are less likely to draw regulator criticism for being biased or speculative. They read as professional summaries, not stories or accusations.
  4. Regulator-ready documentation, with less effort: Because the narratives are consistent, comprehensive, and generated from actual case data, they are essentially inspection-ready. In the event of an audit or regulatory exam, the institution can produce case files knowing that each one has a well-formed narrative that covers the required bases. This reduces the risk of exam findings like “insufficient documentation of the disposition of alerts”, a deficiency for which banks have been cited in the past. And from a productivity angle, compliance teams complete investigations faster (some have seen time spent on writing cut by 75% or more). They can redirect that time into investigating more alerts or refining their detection scenarios. In sum, the team becomes more efficient without sacrificing quality; in fact, quality goes up because every narrative gets the benefit of an AI that never has an off day or a rushed moment.
  5. Safe adoption of AI in a high-scrutiny environment: Perhaps the overarching benefit is that Flagright’s infrastructure lets compliance teams harness AI safely, even in an environment where mistakes can be costly. The teams don’t have to just “trust a magic box”; they have transparency and control. They know that we aren’t sending their data to unknown servers, and they know that the model has been tailored for their exact use-case (with the ability to audit its outputs). This fosters trust in the tool, which is crucial. By addressing the typical AI concerns (hallucinations, data leakage, lack of explainability) up front, we’ve made it possible for risk-averse institutions to actually use this technology rather than shy away from it. And the outcome is not just internal efficiency; it’s also better external compliance. When every alert is dispositioned with a clear, factual narrative, the organization is inherently in a stronger position to demonstrate the effectiveness of its AML program. It’s no longer “did we document that well enough?” but rather “we have a robust narrative for each case, generated and vetted systematically.”
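
To illustrate the configurable review modes described in point 2, here is a minimal sketch of an approval flow as a small state machine. The class, states, and method names are hypothetical and only meant to show how a draft can require either a single analyst review or an additional supervisor approval before it is finalized.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class NarrativeStatus(Enum):
    AI_DRAFTED = auto()
    ANALYST_REVIEWED = auto()
    APPROVED = auto()


@dataclass
class NarrativeRecord:
    """Hypothetical record of an AI-drafted narrative moving through review."""
    text: str
    require_supervisor: bool = True   # models the configurable approval step
    status: NarrativeStatus = NarrativeStatus.AI_DRAFTED
    audit_trail: list[str] = field(default_factory=list)

    def analyst_review(self, analyst: str, edited_text: str | None = None) -> None:
        """Analyst accepts the draft as-is or edits it; either way it is logged."""
        if edited_text is not None:
            self.text = edited_text
        self.status = NarrativeStatus.ANALYST_REVIEWED
        self.audit_trail.append(f"reviewed by {analyst}")
        if not self.require_supervisor:
            self.status = NarrativeStatus.APPROVED

    def supervisor_approve(self, supervisor: str) -> None:
        """Optional second pair of eyes before the case is marked complete."""
        if self.status is not NarrativeStatus.ANALYST_REVIEWED:
            raise ValueError("narrative must be analyst-reviewed before approval")
        self.status = NarrativeStatus.APPROVED
        self.audit_trail.append(f"approved by {supervisor}")


record = NarrativeRecord(text="AI-drafted disposition narrative ...")
record.analyst_review("analyst_1")
record.supervisor_approve("supervisor_1")
print(record.status, record.audit_trail)
```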

Conclusion

In the rush of excitement around AI, it’s easy to grab the latest model and throw it at a problem. But when it comes to regulated workflows, especially something as sensitive as AML/fraud case narratives, a naive plug-and-play approach is a recipe for trouble. The experience at Flagright taught us that to truly benefit from LLMs, we had to invest in infrastructure that earns trust. That meant prioritizing privacy, control, and alignment from day one. By comparing options and ultimately building our own privacy-first AI stack, we avoided the pitfalls of hallucinations and data exposure while still reaping the efficiency and consistency benefits of AI. The result is an AI copilot that our customers (and their regulators) can feel confident about: one that boosts productivity and holds up under scrutiny.

For compliance leaders and risk teams, the takeaway is clear: you can embrace AI in your processes, but do it thoughtfully. Ensure the solution you adopt has the necessary safeguards and is tailored to your needs, rather than treating it as a black box. In high-stakes compliance, trust is everything: both the trust you place in your tools and the trust regulators place in you. With the right infrastructure, AI can enhance that trust by making your narratives more reliable and your operations more transparent. Flagright chose the harder road of building and refining its own AI platform to achieve this, and we believe it’s the right approach for anyone who cannot compromise on data security or result accuracy.

Interested in seeing this in action? Book a demo and let us show you how disposition narratives powered by a privacy-first LLM can transform your compliance workflow, without ever putting your data or reputation at risk.