Adding Random Forest to Machine A: Dual-Model GTO Signal Comparison
One of the long-term goals of Trade Observations is not just to automate trade decisions, but to continuously improve how those decisions are made.
Machine A is responsible for generating the primary 5-minute GTO signal that drives execution decisions for ES/MES futures. Until now, that signal was produced by a TensorFlow/Keras model that returns a probability representing:
Probability the next close is up
That model has been stable and useful, but it raised an important question:
Is Keras actually the best model for this job?
To answer that properly, I added a second model to Machine A:
Random Forest
Instead of replacing the Keras model immediately, the better engineering decision was to run both models side by side and compare them under identical market conditions.
This creates a much stronger decision framework:
- same instrument
- same 5-minute bars
- same feature set
- same labels
- same thresholds
- same production environment
Only the model changes.
That gives a true apples-to-apples comparison.
Existing Machine A Architecture
Machine A currently follows this flow:
NinjaTrader → MSMQ / RTD → Python → Model Inference → Excel (PyXLL) → Signal Push → Database
The Keras model already produces:
- probability of next close being up
- directional action (Long / Short / Flat)
- Kelly sizing
- expectancy
That information is written to Excel, logged to SQLite, and pushed into the downstream execution workflow.
The system supports both:
- PA-FIRST
- ATM-FIRST
execution frameworks.
Why Random Forest
Random Forest takes a very different modeling approach from neural networks.
Keras strengths
- excellent for nonlinear relationships
- flexible architecture
- supports sequence-style modeling
- strong probability output
Random Forest strengths
- fast to train
- highly interpretable
- robust against noisy features
- strong feature importance analysis
- easier overfitting inspection
Most importantly:
Random Forest gives visibility into why predictions happen.
That matters when the model is driving real capital decisions.
Training Setup
The goal was not to invent a new problem.
The goal was to compare models on the exact same problem.
Shared Features
Both models use the same core 5-minute RTH feature set:
- Open
- High
- Low
- Close
- ChopIndex
- ITD168
- TRG168
- ITD-TRG
- Chopi1BarChg
- TRG1BarChg
- EMA
- PeriodHH
- PeriodLL
- ITD-EMA
Shared Label Logic
Labels are generated using ITD and TRG directional agreement:
Label = 1
When:
- current ITD > previous ITD
- current TRG > previous TRG
Label = -1
When:
- current ITD < previous ITD
- current TRG < previous TRG
Label = 0
Everything else.
For the first comparison, Random Forest was trained as a binary model:
label == 1 → 1
all others → 0
This matches the Keras bullish-probability framework.
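The labeling rules above can be sketched in a few lines of pandas. This is an illustrative helper, not the production code; the column names follow the shared feature list, and the function name make_labels is an assumption:

```python
import pandas as pd

def make_labels(df: pd.DataFrame) -> pd.Series:
    """Three-way label from ITD/TRG directional agreement,
    then collapsed to the binary bullish target both models share."""
    itd_up = df["ITD168"] > df["ITD168"].shift(1)
    trg_up = df["TRG168"] > df["TRG168"].shift(1)
    itd_dn = df["ITD168"] < df["ITD168"].shift(1)
    trg_dn = df["TRG168"] < df["TRG168"].shift(1)

    label = pd.Series(0, index=df.index)
    label[itd_up & trg_up] = 1   # both indicators rising
    label[itd_dn & trg_dn] = -1  # both indicators falling

    # Binary collapse for the RF comparison: 1 stays 1, everything else -> 0
    return (label == 1).astype(int)
```

The first bar has no previous value, so both comparisons are False and it lands at 0, which matches the "everything else" rule.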
Threshold Logic
Both models use the same live thresholds:
prob_up >= 0.62 → Long
prob_up <= 0.38 → Short
otherwise → Flat
This is critical.
Without identical thresholds, comparing models becomes misleading.
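The shared threshold mapping is simple enough to pin down in a short sketch; the function name prob_to_action is illustrative, but the cutoffs are the live values from the post:

```python
def prob_to_action(prob_up: float,
                   long_th: float = 0.62,
                   short_th: float = 0.38) -> str:
    """Map a bullish probability to a GTO action using the shared
    live thresholds. Identical for both models by construction."""
    if prob_up >= long_th:
        return "Long"
    if prob_up <= short_th:
        return "Short"
    return "Flat"
```

Because both models pass through the same function, any difference in trade frequency comes from the probability distributions themselves, not the decision rule.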
Production Integration
The Random Forest model was added to Machine A without disrupting the existing Keras workflow.
What Changed
A second model loader was added to PyXLL:
XL_LOAD_LATEST_RF_FROM_S3()
This loads:
- RF .joblib model
- RF metadata JSON
from S3 into Excel process memory.
Live Inference
A new PyXLL macro was added:
XL_RF_PREDICT_FILL_AND_LOG()
This:
- reads the latest feature window from the worksheet
- reconstructs the exact feature order used during training
- performs RF inference
- converts probability to GTO action
- writes output to Excel
- logs predictions to SQLite
The existing Keras macro remains unchanged.
Both now run side by side.
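The core of the RF inference step might look roughly like the sketch below, assuming a joblib-serialized scikit-learn classifier and a metadata JSON that records the training-time feature order. The helper name rf_predict, the file paths, and the dict-shaped input row are assumptions, not the actual PyXLL macro internals:

```python
import json
import joblib
import numpy as np

def rf_predict(model_path: str, meta_path: str, feature_row: dict) -> float:
    """Load the .joblib model and its metadata JSON, rebuild the
    exact feature order used during training, and return prob_up."""
    model = joblib.load(model_path)
    with open(meta_path) as f:
        meta = json.load(f)

    # Reconstruct the training-time column order from metadata
    ordered = [feature_row[name] for name in meta["feature_order"]]
    X = np.asarray(ordered, dtype=float).reshape(1, -1)

    # predict_proba returns [[p_class0, p_class1]]; class 1 = "next close up"
    return float(model.predict_proba(X)[0, 1])
```

Keeping the feature order in metadata rather than hard-coding it in the macro is what makes "reconstructs the exact feature order used during training" safe across retrains.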
Comparison Database
A shared SQLite database was created:
model_compare.sqlite3
with two tables:
- keras_predictions
- rf_predictions
This allows direct comparison of:
- probabilities
- directional actions
- agreement vs disagreement
- trade frequency
- non-flat accuracy
- performance by regime
This is far better than comparing screenshots or spreadsheet cells manually.
It creates a real research workflow.
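A minimal sketch of that workflow, assuming each table keys on the shared 5-minute bar timestamp. The database and table names come from the post; the column names (bar_time, prob_up, action) are assumptions:

```python
import sqlite3

con = sqlite3.connect("model_compare.sqlite3")

# Illustrative schema: one row per model per 5-minute bar
con.executescript("""
CREATE TABLE IF NOT EXISTS keras_predictions (
    bar_time TEXT PRIMARY KEY, prob_up REAL, action TEXT);
CREATE TABLE IF NOT EXISTS rf_predictions (
    bar_time TEXT PRIMARY KEY, prob_up REAL, action TEXT);
""")

# Bar-by-bar comparison: join on the shared timestamp so every
# agreement/disagreement case is a queryable row, not a screenshot
rows = con.execute("""
SELECT k.bar_time,
       k.action AS keras_action,
       r.action AS rf_action,
       k.action = r.action AS agree
FROM keras_predictions k
JOIN rf_predictions r USING (bar_time)
""").fetchall()
```

Once predictions land in both tables, frequency, agreement rate, and non-flat accuracy all reduce to SQL aggregates.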
First RF Results
Initial RF training produced:
Binary Accuracy
77.4%
Signal vs Raw Label Accuracy
62.6%
Non-Flat Signal Accuracy
71.9%
That last number matters most.
It means:
When Random Forest commits to a Long or Short signal, it is correct nearly 72% of the time against the raw label.
That is a strong early result.
Even more important:
The probability distribution is healthy and not collapsing around 0.50.
That means the model is actually making directional decisions—not just hiding in uncertainty.
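One quick way to watch for that collapse is to measure how much probability mass sits in a dead zone around 0.50. The helper name prob_dispersion and the band width are illustrative choices, not part of the production monitoring:

```python
import numpy as np

def prob_dispersion(probs, band: float = 0.05) -> float:
    """Fraction of predictions hiding in the 0.50 +/- band zone.
    A model that never commits pushes this toward 1.0; a model
    making real directional calls keeps it low."""
    probs = np.asarray(probs, dtype=float)
    near_half = np.abs(probs - 0.5) <= band
    return float(near_half.mean())
```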
What Will Be Measured Next
The real question is not:
Which model has better accuracy?
The real question is:
Which model produces better trading decisions?
Those are not always the same thing.
The comparison now focuses on:
Signal Frequency
Does one model overtrade?
Does one stay flat too often?
Agreement Cases
When both models agree:
- are outcomes stronger?
- is confidence higher?
Disagreement Cases
When models disagree:
- which model wins more often?
- is disagreement itself a useful signal?
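The agreement/disagreement questions above reduce to a small groupby once the two prediction tables are merged per bar. This is a sketch under assumptions: the merged frame's column names (keras_action, rf_action, label) and the helper agreement_report are hypothetical, and "correct" here means the non-flat action matched the raw label's direction:

```python
import pandas as pd

def agreement_report(df: pd.DataFrame) -> pd.DataFrame:
    """Hit rate for each model, split by whether the models agreed.
    Only bars where BOTH models committed to a direction are counted."""
    both = df[(df["keras_action"] != "Flat") & (df["rf_action"] != "Flat")].copy()
    both["agree"] = both["keras_action"] == both["rf_action"]
    # A Long call is a hit when the raw label is 1; a Short call
    # is a hit when it is not
    both["keras_hit"] = (both["keras_action"] == "Long") == (both["label"] == 1)
    both["rf_hit"] = (both["rf_action"] == "Long") == (both["label"] == 1)
    return both.groupby("agree")[["keras_hit", "rf_hit"]].mean()
```

The agree=False rows answer "which model wins more often"; if the agree=True rows show a materially higher hit rate, disagreement itself becomes a stand-aside signal.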
Regime Performance
Which model performs better in:
- trends
- trading ranges
- breakout environments
- PA-FIRST vs ATM-FIRST
Risk Quality
Does one model produce better stop behavior and better downstream trade management?
That matters more than raw probability scores.
Final Thought
This is not about proving Random Forest is better than Keras.
It is about building a better decision engine.
Good trading systems are not built by defending old assumptions.
They are built by measuring alternatives honestly.
Machine A is now capable of doing exactly that.
And that is a much bigger upgrade than simply adding another model.
It turns Machine A into a research platform.
That is where real edge comes from.