In AML and fraud monitoring, expected behavior refers to the normal transaction patterns of a given customer: their typical transfer amounts, transaction frequency, usual beneficiaries or destinations, and so on. Defining this baseline is critical because it enables compliance teams to distinguish routine activity from anomalies. Transactions that deviate significantly from a customer’s expected behavior can be red flags for money laundering or fraud. Conversely, what’s “unusual” for one customer might be completely ordinary for another, so a one-size-fits-all threshold will either miss risks or generate false alarms.
Regulators and industry standards increasingly demand a risk-based approach that incorporates customer-specific behavior norms. Global authorities like FATF and the EBA emphasize continuous monitoring of customer activity against an individualized risk baseline. The importance of tracking expected vs. actual behavior isn’t just theoretical; there have been real enforcement cases underscoring this need. For example, an FCA action against Santander UK revealed that a small business customer that had told the bank to expect around £5,000 in monthly deposits was in fact receiving millions per month – far beyond its stated profile. The bank’s systems generated some alerts, but poor tuning and follow-up allowed ~£298 million to flow through the account unchecked. In another case, NatWest was fined after a client’s deposits (totaling £265 million) wildly exceeded expected volumes without timely intervention. These incidents show that failing to compare current behavior against a proper baseline can let obvious anomalies slip through, or, if expectations aren’t factored in at all, bury analysts in “noise” alerts. In short, knowing a customer’s normal habits is key to catching the abnormal.
Key Methods to Establish Behavioral Baselines
To define what’s “normal” for a customer, analysts use a few fundamental statistical measures. Each has its strengths and weaknesses in capturing typical behavior:
Simple Average (Mean)
The average (mean) is the sum of all observed values divided by the count of values. It’s straightforward and often used as a quick baseline for transaction amounts or counts. For example, if a customer made 10 transfers totaling $5,000 last month, their average transfer amount is $500. A simple average is easy to compute and understand, which makes it a common reference point.
However, the mean can be misleading when there are outliers. One or two unusually large transactions will skew the average upward (or downward, if the outliers are very small). In other words, the mean is not a robust measure of central tendency; it’s sensitive to extreme values. If a user usually sends $200-$500 per transfer but made one $10,000 transfer, the average might shoot up into the thousands and no longer reflect the typical range. This is a major con of relying solely on the mean: it may not represent the true normal when the data distribution is skewed or contains anomalies.
On the pro side, the average does capture overall trends and is useful if the data doesn’t contain extreme outliers. In many cases, compliance teams might start with an average transaction size or average count per period as a baseline, then adjust for its weaknesses using the measures below.
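To make that outlier sensitivity concrete, here’s a minimal Python sketch (the transfer amounts are purely illustrative):

```python
# Purely illustrative: a customer's usual transfers plus one large outlier.
usual_transfers = [220, 250, 300, 280, 310, 240, 290, 260, 330, 270]
with_outlier = usual_transfers + [10_000]

mean_usual = sum(usual_transfers) / len(usual_transfers)
mean_skewed = sum(with_outlier) / len(with_outlier)

print(f"Mean without the outlier: ${mean_usual:,.2f}")         # $275.00
print(f"Mean with one $10,000 transfer: ${mean_skewed:,.2f}")  # $1,159.09
```

A single large transfer more than quadruples the “average,” even though nothing about the customer’s routine behavior has changed.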
Median
The median is the middle value of a data set when sorted from low to high (or the average of the two middle values if there’s an even number of points). By definition, half of the transactions are above the median and half below. This makes the median a robust indicator of the “typical” value, especially for skewed distributions. Unlike the mean, the median isn’t dragged up or down by a few extreme outliers. In a dataset of transfers that are mostly around $300 but with one or two huge transfers, the median will likely stay near $300, reflecting what’s normal for the majority of transactions.
For AML compliance monitoring, using the median transaction amount can be very useful.
Pros: It provides a realistic center of gravity for a customer’s behavior, filtering out sporadic spikes. If a customer’s median transfer is $250, that gives a good sense of their usual transaction size even if they occasionally have a $5,000 wire.
Cons: The median doesn’t convey anything about variability or range by itself; it won’t tell you whether the customer sometimes makes $1,000 or $5,000 transfers unless you also look at other metrics. Also, if activity volume is low (say a customer has only a few transactions), the median is less informative (in the extreme case of a single transaction, the median equals the mean equals that value).
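Continuing the illustrative numbers from the sketch above, the median barely moves when the $10,000 outlier is added:

```python
import statistics

usual_transfers = [220, 250, 300, 280, 310, 240, 290, 260, 330, 270]
with_outlier = usual_transfers + [10_000]

# The median stays near the customer's typical transfer size...
print(statistics.median(usual_transfers))  # 275.0
print(statistics.median(with_outlier))     # 280
# ...while the mean jumps from $275 to roughly $1,159.
print(round(statistics.mean(with_outlier), 2))  # 1159.09
```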
Standard Deviation
While the mean and median gauge the center of the behavior, the standard deviation (std dev) measures the spread of the data, how much variation there is around the average. In transaction monitoring, standard deviation is useful to understand how volatile or consistent a customer’s behavior is. If most of a user’s transactions are usually in a tight band (e.g. $200-$300), the standard deviation will be relatively small; if their transaction amounts swing wildly between $50 and $5,000, the std dev will be large.
The key use of standard deviation is to set an adaptive threshold for anomaly detection. Statistically, for many distributions (assuming a normal bell-curve approximation), about 95% of observations lie within ±2 standard deviations of the mean, and ~99.7% lie within ±3 standard deviations. That means anything beyond ~2 or 3 std dev from the average is quite rare and can be considered unusual. For example, if a customer’s average transfer is $500 with a standard deviation of $100, then an $800 transaction sits 3 std dev above the mean; fewer than 0.3% of points would naturally fall that far from the mean in a normal distribution, making it an extreme outlier in context. In practice, such a transaction would be flagged for review because it’s far outside the expected range.
Pros: Standard deviation gives a concrete way to quantify how far off a current transaction is from the norm. Using rules like “flag if amount is more than 2σ above the average” adapts to each customer’s variability (what’s high for a usually steady customer might be normal for a volatile one).
Cons: Std dev assumes a somewhat symmetric distribution around the mean; it may not be as meaningful if the distribution is highly skewed (where the median is more relevant) or if there are frequent outliers (which themselves inflate the std dev). Also, calculating a meaningful std dev requires a decent history of data points; it’s less reliable for a new customer with little transaction history.
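Here is a minimal sketch of this kind of z-score check, using the illustrative $500 mean and $100 standard deviation from the example above:

```python
def z_score(amount: float, mean: float, std_dev: float) -> float:
    """How many standard deviations an amount sits from the customer's mean."""
    return (amount - mean) / std_dev

# Illustrative figures from the text: mean $500, std dev $100, new $800 transfer.
z = z_score(800, mean=500, std_dev=100)
print(z)  # 3.0

# Flag anything at or beyond 3 standard deviations from the customer's mean.
if abs(z) >= 3:
    print("Flag for review: amount is far outside this customer's expected range")
```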
In summary, mean, median, and standard deviation together help sketch a customer’s normal behavior profile. The mean gives an overall average, the median provides a robust typical value, and the standard deviation sets a gauge for what counts as a significant deviation from the norm. Next, we’ll see how these metrics are applied in real monitoring scenarios.
Applying These to Transaction Monitoring
Defining a baseline means establishing an expected range for each user’s behavior and then continuously comparing new events against that personal benchmark. Compliance analysts often create a profile like: “This user usually sends $200-$500 per transfer to 2-3 known beneficiaries per week.” With such a baseline, the system can then automatically flag transactions that fall outside this expected pattern.
In practice, this might involve rules such as: flag if a transaction amount exceeds X times the user’s average (mean), or if it falls outside 2 standard deviations of their normal amount range. Similarly, frequency-based checks can be used. For example, if a user typically makes around 5 transactions a week (with a certain std dev), suddenly making 20 in a day would be an anomaly. These techniques essentially enable dynamic thresholds that adjust to each customer. Rather than a static rule like “alert on any transfer above $10,000,” the threshold for each user can be proportional to their own historical behavior (e.g., “alert if the amount is more than 3× this user’s 90-day average”). According to industry best practices, layering in such user-specific behavior signals helps surface risks that wouldn’t be visible under one-size-fits-all rules. By comparing each transaction to the customer’s own baseline (their median amount, average frequency, usual payees, and so on), institutions can stay responsive as behavior shifts over time.
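As a rough sketch of such a per-customer dynamic threshold (the 90-day window, the 3× multiple, and the 2σ limit are illustrative assumptions, not any particular product’s defaults):

```python
from statistics import mean, stdev

def dynamic_threshold_alert(amount: float, history_90d: list[float],
                            multiple: float = 3.0, z_limit: float = 2.0) -> bool:
    """Alert when an amount breaks either per-customer rule: it exceeds `multiple`
    times the 90-day average, or it sits more than `z_limit` standard deviations
    above that average."""
    if len(history_90d) < 2:
        return False  # too little history to build a meaningful baseline
    avg = mean(history_90d)
    sd = stdev(history_90d)
    exceeds_multiple = amount > multiple * avg
    exceeds_sigma = sd > 0 and (amount - avg) / sd > z_limit
    return exceeds_multiple or exceeds_sigma

# Example: a customer who usually sends $200-$500 per transfer.
history = [220, 480, 300, 350, 410, 260, 330, 290, 450, 380]
print(dynamic_threshold_alert(5_000, history))  # True: far beyond 3x the average and 2 sigma
print(dynamic_threshold_alert(400, history))    # False: within the expected range
```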
Let’s consider two examples of how these baseline metrics help in detection:
- AML Red Flag Example: Suppose an elderly customer usually makes one local transfer of around $300 every month to pay a utility bill. This month, however, they attempt an international wire of $20,000 to an overseas account. This transaction is way outside the customer’s normal range: it is more than 60 times their average amount and goes to a completely new destination. Such a spike would immediately stand out: it exceeds the user’s typical amount by well over 3 standard deviations, and it doesn’t match their usual pattern of domestic payments. The monitoring system would flag this for review as a potential suspicious transaction (it could indicate the account was taken over by criminals or the customer is suddenly involved in high-risk activity). In fact, many classic money laundering red flags are exactly this: transactions inconsistent with the customer’s known profile. By defining “expected behavior” up front (e.g. normal amount and geography) and alerting on deviations, the institution sharply increases its chances of catching illicit activity like this in real time rather than after the fact.
- Fraud Red Flag Example: Consider a user who typically logs in from Berlin and makes at most 2-3 small transactions a week via their digital wallet. If suddenly there’s a login from a new device in another country, followed by a rapid burst of 15 transfer attempts in one day, that’s a major anomaly. Here, the unusual pattern is a combination of factors: a new login location (never seen before) plus a spike in transaction volume far above the user’s normal weekly count. A rule might catch this by saying “if the number of transactions in 24 hours > 5× the user’s average daily transactions and the originating device is new, then flag” (a rough sketch of this combined check follows the list). In essence, the user’s expected velocity (transactions per day/week) has been shattered. This is likely indicative of account takeover fraud: the fraudster is trying to drain the account quickly. By comparing the activity against the customer’s baseline (low and steady) and noticing a multi-standard-deviation surge in frequency, the system can flag the session for immediate investigation. The compliance team could then intervene or freeze transactions before serious damage is done.
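Here is a minimal sketch of that combined velocity-plus-new-device check; the field names, the 5× multiplier, and the device list are illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class CustomerBaseline:
    avg_daily_txn_count: float                  # e.g. derived from the last 90 days of activity
    known_device_ids: set[str] = field(default_factory=set)

def account_takeover_alert(txns_last_24h: int, device_id: str,
                           baseline: CustomerBaseline,
                           velocity_multiple: float = 5.0) -> bool:
    """Flag a burst of activity that comes from a device the customer has never used."""
    velocity_breach = txns_last_24h > velocity_multiple * baseline.avg_daily_txn_count
    new_device = device_id not in baseline.known_device_ids
    return velocity_breach and new_device

# Example: a customer who averages ~0.4 transactions per day (2-3 per week).
baseline = CustomerBaseline(avg_daily_txn_count=0.4, known_device_ids={"device-berlin-01"})
print(account_takeover_alert(15, "device-unknown-99", baseline))  # True: 15 >> 5 x 0.4, new device
print(account_takeover_alert(2, "device-berlin-01", baseline))    # False: normal volume, known device
```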
In both examples, the key is context. A $20,000 wire or 15 transactions in a day might not be alarming at all for some high-net-worth or business customers. What makes such activity suspicious is that it’s out of character for that specific user. By establishing each user’s typical behavior (through metrics like median amounts, average frequencies, etc.) and monitoring deviations, banks and fintechs dramatically improve detection accuracy. They can catch truly suspicious outliers while ignoring irrelevant noise. This balance reduces false positives (alerts for routine activity that just looks large in a generic sense) and false negatives (missed true threats).
How Flagright Implements It
Flagright’s real-time transaction monitoring platform is designed with this expected behavior paradigm at its core. Through a powerful no-code rule builder, compliance teams can define dynamic rules and thresholds tailored to each customer’s own history, without writing a single line of code. In practice, this means an institution can easily set up conditions like:
- “Alert if the transaction amount is more than 3× the customer’s average in the last 30 days.”
- “Flag if the count of transactions this week exceeds the customer’s weekly average by 2 standard deviations.”
- “Block the transfer if its value is above the 99th percentile of the customer’s past 6 months’ transactions (roughly mean + 2.3σ under a normal approximation).”
Under the hood, Flagright’s rule engine provides dynamic variables and functions that automatically calculate these baseline metrics per user in real time. Fincrime teams can drag and drop placeholders for a user’s median transaction amount, average daily volume, 90-day transaction velocity, and so on, directly into their rule definitions. The platform continuously updates each customer’s baseline metrics as new data comes in, so the thresholds remain current. Real-time comparison is key: when a new transaction or event arrives via the API, Flagright instantly evaluates it against the user-specific benchmarks (like their own median or last N days’ totals) without slowing down processing. This immediate, on-the-fly analysis means suspicious anomalies can be caught the moment they occur.
Notably, Flagright’s approach merges these statistical baselines with broader risk logic. For example, the system allows combining behavior-based checks with other risk factors in a single rule (as we saw in the fraud example with a new device plus high volume). FIs can choose from a library of common factors or define custom logic using any data field available. Transaction velocity, recipient country, device ID, and more can all be incorporated alongside median/mean comparisons to create context-rich rules. The result is a highly flexible monitoring framework: compliance teams can set expected ranges for each user’s activity and specify what degree of deviation should trigger an alert, all through an intuitive interface. And because the rule builder is code-free, adjusting thresholds or adding new “expected behavior” checks is quick; no engineering deployment is needed. Teams can even simulate rule changes against historical data (using Flagright’s rule simulation and “shadow rules” features) to fine-tune their scenarios before activating them live.
Crucially, Flagright’s platform operates in real time. As soon as a user’s behavior veers off their normal path, the system can respond. If a user suddenly increases their transaction velocity or starts sending unusually large volumes, their risk score or alert status adjusts automatically. This dynamic baseline tracking feeds continuously into the transaction monitoring and customer risk scoring engines. Flagright users have found that by leveraging each customer’s expected behavior in this way, they detect threats earlier and reduce false positives: the monitoring isn’t crying wolf at every large transaction, only those that are large for that customer. It’s a product-aware solution that mirrors the best practices we’ve discussed: define the baseline, watch for divergence.
Conclusion: Know the Baseline, Catch the Anomaly
In the fight against financial crime, one of the most powerful questions a compliance team can ask is, “Is this behavior expected for this customer?” Establishing a baseline for each user, whether it’s their typical transaction size, normal login pattern, or usual transaction count, is essential to answer that question. We’ve seen that different metrics serve different purposes: the median often gives a truer picture of typical transaction value in the presence of outliers, the mean can indicate overall trends, and the standard deviation provides a quantitative yardstick for what’s significantly outside the norm. No single metric covers it all; a combination is often ideal. For example, using the median for transaction amounts (to handle skewed distributions) paired with the standard deviation (to gauge variability) can establish a robust expected range. Compliance leads and risk analysts should choose the right tool for each behavior pattern, for example median or percentile-based thresholds for transaction amounts versus standard deviation or velocity measures for the frequency of actions.
It’s also important to remember that expected behavior isn’t static. A customer’s habits can evolve over time (gradually or suddenly), and what was abnormal last year might be routine this year, or vice versa. This is why continuous monitoring and dynamic baselining are so crucial. By constantly recalibrating what “normal” looks like for each user, you ensure that your detection logic stays relevant and effective. A system grounded in expected behavior will catch the truly suspicious anomalies, those needle-in-haystack deviations that signal risk, while gracefully handling the day-to-day fluctuations that reflect genuine customer activity.
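As a rough illustration of that recalibration, a rolling window keeps the baseline anchored to recent behavior so that old habits age out (keeping the last 90 transactions here is an illustrative stand-in for a time-based window such as 90 days):

```python
import statistics
from collections import deque

class RollingBaseline:
    """Keep a customer's expected range anchored to their most recent transactions."""

    def __init__(self, max_history: int = 90):
        self.amounts = deque(maxlen=max_history)  # oldest observations age out automatically

    def update(self, amount: float) -> None:
        self.amounts.append(amount)

    def expected_range(self, z: float = 2.0) -> tuple[float, float]:
        """Return a (low, high) band of mean +/- z standard deviations."""
        if len(self.amounts) < 2:
            return (0.0, float("inf"))  # not enough history yet: don't constrain
        avg = statistics.mean(self.amounts)
        sd = statistics.stdev(self.amounts)
        return (avg - z * sd, avg + z * sd)

# As the customer's habits shift, the expected range follows them.
baseline = RollingBaseline(max_history=90)
for amount in [250, 300, 275, 320, 290]:
    baseline.update(amount)
print(baseline.expected_range())  # a band around the ~$287 average
```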
Flagright’s real-time, AI-native platform was built with these principles in mind. It empowers financial institutions to define and adjust behavioral baselines easily, and to detect when reality strays from expectation, all in real time. Know the baseline, catch the anomaly: this mantra can significantly enhance both AML compliance and fraud prevention outcomes.
Ready to see how Flagright enables dynamic behavior modeling and anomaly detection for your organization? Book a demo.