You ran 10,000 simulations and they all made money. Congratulations. None of them were real.
The most dangerous backtest is the one that confirms what you already believe.
The data generation problem
Every backtest starts with a dataset. And every dataset is a survivor.
The historical data your strategy runs on has representation issues baked into it. Delistings, halts, and blown-up tickers are either missing or poorly represented. The rare events that would destroy your strategy — flash crashes, liquidity vacuums, circuit breakers — are, by definition, rare in the sample.
You cannot stress-test for black swans using a dataset that barely contains them.
This doesn't mean backtesting is useless. It means the confidence interval is wider than most traders want to admit. Your backtest tells you what would have happened in a world that already happened, with data that already survived.
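As a toy illustration of how much survivorship alone can flatter a result, here's a sketch on purely synthetic data (all numbers invented): a universe where blown-up names drop out of the dataset, compared against the full universe they came from.

```python
import random

random.seed(42)

# Synthetic universe of 1,000 tickers. Each gets an annual return drawn
# from a bell curve; the worst blow up and get delisted -- i.e. they
# vanish from the historical dataset a backtest would run on.
universe = []
for _ in range(1000):
    r = random.gauss(0.07, 0.25)   # "true" annual return
    delisted = r < -0.40           # blow-ups disappear from the sample
    universe.append((r, delisted))

true_mean = sum(r for r, _ in universe) / len(universe)

survivors = [r for r, d in universe if not d]
survivor_mean = sum(survivors) / len(survivors)

print(f"true universe mean return: {true_mean:+.2%}")
print(f"survivor-only mean return: {survivor_mean:+.2%}")
# The survivor-only figure is systematically higher: the dataset already
# excluded the names that would have hurt you.
```

The bias is mechanical, not subtle: every filter that removes dead tickers removes losses, so the surviving sample overstates the edge of anything tested on it.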
There is no "fair market value"
Price is a construct. The number on your screen is the last trade — not some objective measurement of what an instrument is worth.
Many traders operate on the assumption that the exchange price IS the market price. It isn't. Not exactly.
Consider futures: the composite price you see is often built from multiple venues and contract months using merge strategies. How that price gets assembled — first-traded, last-traded, volume-weighted, bid-ask midpoint — changes what the number means. Most traders never look at how their data vendor builds the tape. They just trust the line on the chart.
That line is an editorial decision, not a fact.
If your strategy depends on precise entries and exits, you're building on a price that was constructed by someone else's methodology. And if you don't know what that methodology is, you're trading someone else's assumptions as if they were your own.
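To make that concrete, here's a sketch with hypothetical trades (ticker, venues, prices, and sizes all invented) showing how two common merge rules turn the same prints into different "prices":

```python
# Simultaneous trades in the same instrument across three venues.
# Tuples are (ticker, venue, price, size) -- all values illustrative.
trades = [
    ("ESZ5", "A", 5001.25, 10),
    ("ESZ5", "B", 5001.50, 200),
    ("ESZ5", "C", 5001.00, 5),
]

# Rule 1: "last traded" -- the composite is simply the most recent print.
last_traded = trades[-1][2]

# Rule 2: volume-weighted -- each print counts in proportion to its size.
total_size = sum(size for *_, size in trades)
vwap = sum(price * size for *_, price, size in trades) / total_size

print(f"last-traded composite:    {last_traded:.2f}")
print(f"volume-weighted composite: {vwap:.2f}")
```

Same tape, two defensible methodologies, two different numbers: the last-traded rule reports the thin 5-lot print, while the volume-weighted rule sits near the 200-lot. Neither is "the" price.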
Market impact is unmodeled
Here is the part that almost no simulation captures: your orders move the market.
If your strategy sells 100,000 contracts into the bid, the bid moves. The fill you modeled at the theoretical price doesn't exist anymore. The liquidity you assumed was there evaporates the moment you try to take it.

Backtesting doesn't simulate this. Paper trading doesn't simulate this. Market replay doesn't simulate this either.
Replay features are impressive engineering — they reconstruct the order book, they play back time and sales, they let you practice in "real" conditions. But the book you're seeing is the book that existed without you in it. The moment you participate at scale, you change the thing you're measuring.
This isn't a software limitation. It's a physics problem. You cannot observe a market and participate in it simultaneously without altering the observation.
The three false comforts
Backtesting, paper trading, and market replay share the same structural flaw: they simulate a market that doesn't know you're in it.
Backtesting runs your logic against historical data and, unless you model them explicitly, assumes perfect fills at historical prices. Out of the box it ignores slippage, partial fills, and the market's reaction to your orders.
Paper trading lets you practice execution in real time without capital. But the fills are synthetic — you get filled at prices that the real market may not have offered you, because your order didn't actually compete for liquidity.
Market replay reconstructs historical sessions and lets you trade them as if they were live. It's the closest approximation — and it still can't model how the market would have responded to your participation.
All three are useful for learning mechanics, testing logic, and building discipline. None of them tell you what happens when real money enters a real book.
What honest testing actually requires
If the tools are flawed, what do you do?
You use them anyway — but you stop treating their output as proof. A backtest is a hypothesis, not a conclusion. Paper trading is rehearsal, not performance.
Honest testing means:
- acknowledging that your dataset has survivorship bias and limited tail events,
- understanding how your data vendor constructs price,
- sizing your live positions as if the backtest overstated your edge (because it probably did),
- tracking live performance separately from simulated performance and comparing honestly,
- and accepting that the gap between simulation and reality is not a bug — it's the cost of doing business.
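The tracking point above can start as simply as keeping two return series and measuring how much of the simulated edge survived. The helper below is a hypothetical sketch with invented per-trade numbers; a real comparison would also look at slippage per fill, fill rates, and the shape of the distributions:

```python
def edge_haircut(sim_returns, live_returns):
    """Fraction of the simulated per-trade edge realized in live trading.

    Compares mean per-trade return in simulation vs live. Illustrative
    only -- means alone hide a lot.
    """
    sim_edge = sum(sim_returns) / len(sim_returns)
    live_edge = sum(live_returns) / len(live_returns)
    return live_edge / sim_edge if sim_edge else float("nan")

# Hypothetical per-trade returns, as fractions:
sim = [0.004, 0.006, -0.002, 0.005, 0.003]
live = [0.002, 0.003, -0.004, 0.003, 0.001]

survived = edge_haircut(sim, live)
print(f"fraction of simulated edge realized live: {survived:.0%}")
```

If that number sits well below one trade after trade, the gap itself is the finding: it tells you how much to distrust the next backtest.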
The traders who last aren't the ones with the best backtests. They're the ones who understood what the backtest couldn't tell them.
Open questions
- What was your biggest gap between backtest results and live performance?
- Have you built a system that actually survived contact with real markets? What did you have to change?
- How do you account for data quality when evaluating a strategy?