Adding Random Forest to Machine A: Dual-Model GTO Signal Comparison
One of the long-term goals of Trade Observations is not just to automate trade decisions, but to continuously improve how those decisions are made.
Machine A is responsible for generating the primary 5-minute GTO signal that drives execution decisions for ES/MES futures. Until now, that signal was produced by a TensorFlow/Keras model that returns a probability representing:
Probability the next close is up
That model has been stable and useful, but it raised an important question:
Is Keras actually the best model for this job?
To answer that properly, I added a second model to Machine A:
Random Forest
Instead of replacing the Keras model immediately, the better engineering decision was to run both models side by side and compare them under identical market conditions.
This creates a much stronger decision framework:
- same instrument
- same 5-minute bars
- same feature set
- same labels
- same thresholds
- same production environment
Only the model changes.
That gives a true apples-to-apples comparison.
Existing Machine A Architecture
Machine A currently follows this flow:
NinjaTrader → MSMQ / RTD → Python → Model Inference → Excel (PyXLL) → Signal Push → Database
The Keras model already produces:
- probability of next close being up
- directional action (Long / Short / Flat)
- Kelly sizing
- expectancy
That information is written to Excel, logged to SQLite, and pushed into the downstream execution workflow.
The system supports both:
- PA-FIRST
- ATM-FIRST
execution frameworks.
Why Random Forest
Random Forest takes a very different modeling approach from neural networks.
Keras strengths
- excellent for nonlinear relationships
- flexible architecture
- supports sequence-style modeling
- strong probability output
Random Forest strengths
- fast to train
- highly interpretable
- robust against noisy features
- strong feature importance analysis
- easier overfitting inspection
Most importantly:
Random Forest gives visibility into why predictions happen.
That matters when the model is driving real capital decisions.
Training Setup
The goal was not to invent a new problem.
The goal was to compare models on the exact same problem.
Shared Features
Both models use the same core 5-minute RTH feature set:
- Open
- High
- Low
- Close
- ChopIndex
- ITD168
- TRG168
- ITD-TRG
- Chopi1BarChg
- TRG1BarChg
- EMA
- PeriodHH
- PeriodLL
- ITD-EMA
Shared Label Logic
Labels are generated using ITD and TRG directional agreement:
Label = 1
When:
- current ITD > previous ITD
- current TRG > previous TRG
Label = -1
When:
- current ITD < previous ITD
- current TRG < previous TRG
Label = 0
Everything else.
For the first comparison, Random Forest was trained as a binary model:
label == 1 → 1
all others → 0
This matches the Keras bullish-probability framework.
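The labeling rules above can be sketched in a few lines of pandas. This is an illustrative helper, not the production code; the column names follow the shared feature list, and the function name make_labels is an assumption:

```python
import pandas as pd

def make_labels(df: pd.DataFrame) -> pd.Series:
    """Three-way label from ITD/TRG directional agreement,
    then collapsed to the binary bullish target both models share."""
    itd_up = df["ITD168"] > df["ITD168"].shift(1)
    trg_up = df["TRG168"] > df["TRG168"].shift(1)
    itd_dn = df["ITD168"] < df["ITD168"].shift(1)
    trg_dn = df["TRG168"] < df["TRG168"].shift(1)

    label = pd.Series(0, index=df.index)
    label[itd_up & trg_up] = 1   # both indicators rising
    label[itd_dn & trg_dn] = -1  # both indicators falling

    # Binary collapse for the RF comparison: 1 stays 1, everything else -> 0
    return (label == 1).astype(int)
```

The first bar has no previous value, so both comparisons are False and it lands at 0, which matches the "everything else" rule.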
Threshold Logic
Both models use the same live thresholds:
prob_up >= 0.62 → Long
prob_up <= 0.38 → Short
otherwise → Flat
This is critical.
Without identical thresholds, comparing models becomes misleading.
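The shared threshold mapping is simple enough to pin down in a short sketch; the function name prob_to_action is illustrative, but the cutoffs are the live values from the post:

```python
def prob_to_action(prob_up: float,
                   long_th: float = 0.62,
                   short_th: float = 0.38) -> str:
    """Map a bullish probability to a GTO action using the shared
    live thresholds. Identical for both models by construction."""
    if prob_up >= long_th:
        return "Long"
    if prob_up <= short_th:
        return "Short"
    return "Flat"
```

Because both models pass through the same function, any difference in trade frequency comes from the probability distributions themselves, not the decision rule.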
Production Integration
The Random Forest model was added to Machine A without disrupting the existing Keras workflow.
What Changed
A second model loader was added to PyXLL:
XL_LOAD_LATEST_RF_FROM_S3()
This loads:
- RF .joblib model
- RF metadata JSON
from S3 into Excel process memory.
Live Inference
A new PyXLL macro was added:
XL_RF_PREDICT_FILL_AND_LOG()
This:
- reads the latest feature window from the worksheet
- reconstructs the exact feature order used during training
- performs RF inference
- converts probability to GTO action
- writes output to Excel
- logs predictions to SQLite
The existing Keras macro remains unchanged.
Both now run side by side.
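The core of the RF inference step might look roughly like the sketch below, assuming a joblib-serialized scikit-learn classifier and a metadata JSON that records the training-time feature order. The helper name rf_predict, the file paths, and the dict-shaped input row are assumptions, not the actual PyXLL macro internals:

```python
import json
import joblib
import numpy as np

def rf_predict(model_path: str, meta_path: str, feature_row: dict) -> float:
    """Load the .joblib model and its metadata JSON, rebuild the
    exact feature order used during training, and return prob_up."""
    model = joblib.load(model_path)
    with open(meta_path) as f:
        meta = json.load(f)

    # Reconstruct the training-time column order from metadata
    ordered = [feature_row[name] for name in meta["feature_order"]]
    X = np.asarray(ordered, dtype=float).reshape(1, -1)

    # predict_proba returns [[p_class0, p_class1]]; class 1 = "next close up"
    return float(model.predict_proba(X)[0, 1])
```

Keeping the feature order in metadata rather than hard-coding it in the macro is what makes "reconstructs the exact feature order used during training" safe across retrains.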
Comparison Database
A shared SQLite database was created:
model_compare.sqlite3
with two tables:
- keras_predictions
- rf_predictions
This allows direct comparison of:
- probabilities
- directional actions
- agreement vs disagreement
- trade frequency
- non-flat accuracy
- performance by regime
This is far better than comparing screenshots or spreadsheet cells manually.
It creates a real research workflow.
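A minimal sketch of that workflow, assuming each table keys on the shared 5-minute bar timestamp. The database and table names come from the post; the column names (bar_time, prob_up, action) are assumptions:

```python
import sqlite3

con = sqlite3.connect("model_compare.sqlite3")

# Illustrative schema: one row per model per 5-minute bar
con.executescript("""
CREATE TABLE IF NOT EXISTS keras_predictions (
    bar_time TEXT PRIMARY KEY, prob_up REAL, action TEXT);
CREATE TABLE IF NOT EXISTS rf_predictions (
    bar_time TEXT PRIMARY KEY, prob_up REAL, action TEXT);
""")

# Bar-by-bar comparison: join on the shared timestamp so every
# agreement/disagreement case is a queryable row, not a screenshot
rows = con.execute("""
SELECT k.bar_time,
       k.action AS keras_action,
       r.action AS rf_action,
       k.action = r.action AS agree
FROM keras_predictions k
JOIN rf_predictions r USING (bar_time)
""").fetchall()
```

Once predictions land in both tables, frequency, agreement rate, and non-flat accuracy all reduce to SQL aggregates.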
First RF Results
Initial RF training produced:
Binary Accuracy
77.4%
Signal vs Raw Label Accuracy
62.6%
Non-Flat Signal Accuracy
71.9%
That last number matters most.
It means:
When Random Forest commits to a Long or Short signal, it is correct nearly 72% of the time against the raw label.
That is a strong early result.
Even more important:
The probability distribution is healthy and not collapsing around 0.50.
That means the model is actually making directional decisions—not just hiding in uncertainty.
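One quick way to watch for that collapse is to measure how much probability mass sits in a dead zone around 0.50. The helper name prob_dispersion and the band width are illustrative choices, not part of the production monitoring:

```python
import numpy as np

def prob_dispersion(probs, band: float = 0.05) -> float:
    """Fraction of predictions hiding in the 0.50 +/- band zone.
    A model that never commits pushes this toward 1.0; a model
    making real directional calls keeps it low."""
    probs = np.asarray(probs, dtype=float)
    near_half = np.abs(probs - 0.5) <= band
    return float(near_half.mean())
```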
What Will Be Measured Next
The real question is not:
Which model has better accuracy?
The real question is:
Which model produces better trading decisions?
Those are not always the same thing.
The comparison now focuses on:
Signal Frequency
Does one model overtrade?
Does one stay flat too often?
Agreement Cases
When both models agree:
- are outcomes stronger?
- is confidence higher?
Disagreement Cases
When models disagree:
- which model wins more often?
- is disagreement itself a useful signal?
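The agreement/disagreement questions above reduce to a small groupby once the two prediction tables are merged per bar. This is a sketch under assumptions: the merged frame's column names (keras_action, rf_action, label) and the helper agreement_report are hypothetical, and "correct" here means the non-flat action matched the raw label's direction:

```python
import pandas as pd

def agreement_report(df: pd.DataFrame) -> pd.DataFrame:
    """Hit rate for each model, split by whether the models agreed.
    Only bars where BOTH models committed to a direction are counted."""
    both = df[(df["keras_action"] != "Flat") & (df["rf_action"] != "Flat")].copy()
    both["agree"] = both["keras_action"] == both["rf_action"]
    # A Long call is a hit when the raw label is 1; a Short call
    # is a hit when it is not
    both["keras_hit"] = (both["keras_action"] == "Long") == (both["label"] == 1)
    both["rf_hit"] = (both["rf_action"] == "Long") == (both["label"] == 1)
    return both.groupby("agree")[["keras_hit", "rf_hit"]].mean()
```

The agree=False rows answer "which model wins more often"; if the agree=True rows show a materially higher hit rate, disagreement itself becomes a stand-aside signal.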
Regime Performance
Which model performs better in:
- trends
- trading ranges
- breakout environments
- PA-FIRST vs ATM-FIRST
Risk Quality
Does one model produce better stop behavior and better downstream trade management?
That matters more than raw probability scores.
Final Thought
This is not about proving Random Forest is better than Keras.
It is about building a better decision engine.
Good trading systems are not built by defending old assumptions.
They are built by measuring alternatives honestly.
Machine A is now capable of doing exactly that.
And that is a much bigger upgrade than simply adding another model.
It turns Machine A into a research platform.
That is where real edge comes from.