Trade Observations

Stop Guessing: When Different Models Agree on the Same Stop Level

February 1, 2026
#trading-systems #risk-management #machine-learning #stops #MAE

Comparing Random Forest and Gradient Boosting stop models using MAE-to-date and recovery probability.


The Question

I’ve been working on replacing static stop rules with a data-driven, regime-aware stop framework.

The core question is simple:

At what point does a trade become statistically unlikely to recover?

Rather than guessing, I modeled probability of recovery as a function of MAE-to-date and market context, and then asked multiple models the same question.

What surprised me was not the answer itself, but how consistent it was.


The Setup

Data

  • Instrument: ES (1-minute bars)
  • Regime: PA-FIRST
  • Sample size: 639 trades
  • Features:
    • MAE-to-date (true adverse excursion using high/low)
    • ATR (1m)
    • Distance from EMA
    • EMA slope
    • Minutes in trade
    • Regime (one-hot encoded)

Label

A trade is considered recovered if final PnL > 0.

The models predict:

P(recover | current MAE-to-date, context)
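The labeling step can be sketched as follows. This is a minimal illustration, not the actual pipeline; column names like `trade_id` and `final_pnl` are assumptions about the schema.

```python
import pandas as pd

def label_snapshots(snapshots: pd.DataFrame, trades: pd.DataFrame) -> pd.DataFrame:
    """Attach the recovery label to every 1-minute snapshot.

    A trade counts as recovered if its final PnL ended positive;
    every snapshot of that trade inherits the same binary label.
    (Illustrative schema: `trade_id`, `final_pnl` are assumed names.)
    """
    final_pnl = trades.set_index("trade_id")["final_pnl"]
    labeled = snapshots.copy()
    labeled["recovered"] = (labeled["trade_id"].map(final_pnl) > 0).astype(int)
    return labeled
```

Because the label is defined per trade but predicted per snapshot, every snapshot of a losing trade is a negative example, which is exactly what lets the model learn how P(recover) decays as MAE-to-date grows.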


Models Compared

I trained and evaluated two nonlinear models:

1. Random Forest

  • Strong baseline
  • Handles nonlinearity and interactions
  • Often criticized for instability

2. Gradient Boosted Trees (Histogram-based)

  • Faster convergence
  • Strong bias control
  • Often outperforms RF on tabular data

Both models were trained identically:

  • Grouped by trade_id (no leakage)
  • Same features
  • Same probability threshold extraction logic

How the Stop Level Is Derived

Instead of using the model output directly, I apply a policy extraction step:

  1. Predict P(recover) at each 1-minute snapshot
  2. Bin snapshots by MAE-to-date
  3. Find the first MAE bin where:

mean P(recover) < 0.20

  4. Use that bin's MAE as the model-derived max stop level

This turns a probabilistic model into a deterministic, auditable risk rule.
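The extraction step above can be sketched in a few lines; the bin width here is an illustrative choice, not the production setting.

```python
import numpy as np
import pandas as pd

def extract_stop_level(mae_to_date, p_recover, bin_width=0.5, threshold=0.20):
    """Bin snapshots by MAE-to-date and return the right edge of the
    first bin whose mean predicted P(recover) falls below `threshold`.

    bin_width is an assumed parameter for illustration.
    """
    df = pd.DataFrame({"mae": mae_to_date, "p": p_recover})
    edges = np.arange(0.0, df["mae"].max() + bin_width, bin_width)
    df["bin"] = pd.cut(df["mae"], edges)
    mean_p = df.groupby("bin", observed=True)["p"].mean()
    for interval, p in mean_p.items():
        if p < threshold:
            return float(interval.right)
    return None  # no MAE level crossed the threshold in-sample
```

Running both models' predictions through the same `extract_stop_level` logic is what makes the comparison fair: any difference in the derived stop comes from the models, not the extraction.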


The Result

Both models independently produced the same stop level:

| Regime   | Threshold         | Max Stop (pts) | Observations |
|----------|-------------------|----------------|--------------|
| PA-FIRST | P(recover) < 0.20 | 9.9            | 639          |

This is remarkably close to the heuristic I had previously derived by hand:

  • Caution zone ≈ 9.5 pts
  • Hard failures accelerate ≈ 10–11 pts
  • Kill switch ≈ 12 pts

Why This Matters

When different model families agree, it usually means:

  • The signal is structural, not model-specific
  • MAE-to-date is the correct axis
  • The decision boundary is stable
  • The result is unlikely to be a coincidence

In other words:

This stop level is being discovered, not fit.


Design Implications

In live trading, this becomes:

  • Model exit: MAE-to-date ≈ 9.9 pts
  • Hard kill switch: 12.0 pts (safety backstop)
  • Execution floor: small buffer (e.g. 0.5 pts) to avoid noise
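Put together, the live-side rule is tiny and fully deterministic. A minimal sketch, with the function name and return convention being my own, not the actual engine's interface:

```python
def should_exit(mae_to_date: float,
                model_stop: float = 9.9,
                kill_switch: float = 12.0,
                buffer: float = 0.5) -> str:
    """Deterministic exit rule derived offline from the models.

    Returns "kill" at the hard backstop, "exit" once MAE-to-date reaches
    the model stop plus a small execution buffer, else "hold".
    """
    if mae_to_date >= kill_switch:
        return "kill"
    if mae_to_date >= model_stop + buffer:
        return "exit"
    return "hold"
```

Note the ordering: the kill switch is checked first, so even a misconfigured model stop can never widen the hard backstop.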

The model doesn’t replace discipline — it quantifies it.


A Subtle but Important Insight

This approach does not require loading models in production.

The models are used offline to learn regime-conditioned stop policies, which are then written to a database and consumed by the live execution engine.

That keeps live systems:

  • simpler
  • safer
  • easier to reason about
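Concretely, the hand-off can be as simple as one row per regime in a policy table. A sketch using SQLite, with the table name and columns being illustrative assumptions:

```python
import sqlite3

def publish_policy(db_path, regime, max_stop_pts, recover_threshold, n_obs):
    """Write a regime-conditioned stop policy row for the live engine.

    The models themselves never leave the research environment; only
    this small, auditable artifact does. (Schema is illustrative.)
    """
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS stop_policy (
        regime TEXT PRIMARY KEY,
        max_stop_pts REAL,
        recover_threshold REAL,
        n_observations INTEGER)""")
    con.execute("INSERT OR REPLACE INTO stop_policy VALUES (?, ?, ?, ?)",
                (regime, max_stop_pts, recover_threshold, n_obs))
    con.commit()
    con.close()
```

The live engine then needs only a primary-key lookup per regime, with no model runtime, no feature pipeline, and no version skew between research and production code.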

What’s Next

This was just PA-FIRST.

The real test (and likely divergence) comes with:

  • ATM-FIRST trades
  • higher volatility regimes
  • time-conditioned policies (early vs late trade)
  • asymmetric logic (tighten vs exit)

But the takeaway stands:

If Random Forests and Gradient Boosting agree on the same stop level, the market is telling you something worth listening to.


This post is part of an ongoing effort to replace intuition-driven trading rules with observable, testable system behavior.