Backtesting metrics

This article explains the backtesting metrics used to describe and evaluate the characteristics and performance of an anomaly or a portfolio of anomalies.


Count and Time

  • N°Years: the number of years of history of a financial instrument.

  • N°Trades: the number of trades.

  • N°Trades 1y: the average number of trades made per calendar year.

  • Avg Trade Duration (days): it is the average duration, in calendar days, of a single trade.


Total Return and Profit

  • Total[R]: it is the total return in percentage.

  • Total profit: it is the total profit, in dollars, starting from a capital of 100.

Yearly Trade Returns

  • Avg [R] annualized: it's the average return of the trades, in percentage, annualized.

  • Avg [R] 1y: it is the average yearly return of the trades.

  • Stdev [R] 1y: it's the standard deviation of the yearly returns.

  • RR 1y: the Reward / Risk (RR) is the ratio between the average yearly return and the standard deviation of the yearly returns.

  • Winning % 1y: it is the percentage of the positive yearly returns.

  • Sharpe:

    • the 'Sharpe ratio' is a metric to evaluate the risk-adjusted return of the Anomaly or the Portfolio (let's call them 'strategy').
    • It indicates how well the strategy has performed in comparison to a 'Risk-Free' rate of return.
    • It is computed as the ratio between the yearly excess return of the strategy over the risk-free rate (the US 3-month T-Bill rate) and the standard deviation of the yearly returns of the strategy.
  • Sortino:

    • the 'Sortino ratio' is a metric to evaluate the risk-adjusted return of the anomaly/portfolio.
    • It is similar to the Sharpe ratio and differs only in the denominator: it considers only the standard deviation of the downside risk, rather than that of the entire (upside + downside) risk.
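
As a rough illustration of the yearly metrics above, here is a minimal Python sketch. It is not the platform's implementation: the function name `yearly_metrics`, the use of the sample standard deviation, and the downside-deviation convention (root mean square of the negative yearly returns, with positive years counted as zero) are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, stdev


def yearly_metrics(yearly_returns, risk_free_rates):
    """Both inputs are yearly values in percent, one entry per year."""
    avg = mean(yearly_returns)
    sd = stdev(yearly_returns)  # sample standard deviation (assumption)
    excess = avg - mean(risk_free_rates)
    # Assumption: downside deviation = root mean square of the negative
    # yearly returns, with positive years counted as zero.
    downside = sqrt(mean([min(r, 0.0) ** 2 for r in yearly_returns]))
    return {
        "avg_r_1y": avg,
        "stdev_r_1y": sd,
        "rr_1y": avg / sd,
        "winning_pct_1y": 100 * sum(r > 0 for r in yearly_returns) / len(yearly_returns),
        "sharpe": excess / sd,
        "sortino": excess / downside,  # undefined if no year is negative
    }


# Hypothetical yearly returns and risk-free rates, in percent.
yearly = yearly_metrics([10.0, -5.0, 15.0, 20.0], [2.0, 2.0, 2.0, 2.0])
print(yearly)
```

With these made-up numbers, three years out of four are positive, so 'Winning % 1y' is 75%, and the Sortino ratio is higher than the Sharpe ratio because only one year contributes to the downside deviation.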

Trade Returns

  • Avg [R]: it is the average return of the trades.

  • Stdev [R]: it is the standard deviation of the returns.

  • Reward / Risk (RR): the ratio between the average return and the standard deviation of the returns.

  • Winning %: it is the percentage of winning trades.

  • Profit Factor (PF):

    • is an index of trading quality that expresses, as a single number, the relationship between the results obtained and the risks taken.
    • It is computed by dividing the sum of the profits by the sum of the losses (in absolute value).
  • Stability:

    • it's the stability of the equity line of the backtest.
    • It ranges from 0% (minimum stability) to 100% (maximum stability).
    • A high stability means that the equity line has risen steadily and linearly over time.
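
The per-trade metrics can be sketched in a few lines of Python. This is only an illustration under assumptions: `trade_metrics` is a hypothetical helper, the sample standard deviation is used, and a trade counts as winning when its return is strictly positive. 'Stability' is left out because its exact formula is not given here.

```python
from statistics import mean, stdev


def trade_metrics(returns):
    """returns: per-trade returns in percent."""
    wins = [r for r in returns if r > 0]
    losses = [r for r in returns if r < 0]
    gross_profit = sum(wins)
    gross_loss = -sum(losses)  # losses taken in absolute value
    return {
        "avg_r": mean(returns),
        "stdev_r": stdev(returns),  # sample standard deviation (assumption)
        "rr": mean(returns) / stdev(returns),
        "winning_pct": 100 * len(wins) / len(returns),
        # With no losing trade the ratio is unbounded.
        "profit_factor": gross_profit / gross_loss if gross_loss > 0 else float("inf"),
    }


# Hypothetical trade returns, in percent.
trade_stats = trade_metrics([2.0, -1.0, 3.0, -2.0, 4.0])
print(trade_stats)
```

In this example the profits sum to 9 and the losses to 3, so the Profit Factor is 3, and 3 trades out of 5 are winners (60%).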

Drawdown

  • Avg Dwn:

    • is the average drawdown of the equity line of the backtest.
    • The lower the average drawdown, the closer the equity line has stayed to its all-time highs.
  • Max Dwn:

    • is the maximum drawdown of the equity line.
  • Max Dwn / Avg [R] 1y:

    • is the ratio between the maximum drawdown of the anomaly and its average yearly return.
    • It expresses how many years it could take to recover from a drawdown equal to the maximum historical drawdown.
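
The drawdown metrics can be illustrated with the following Python sketch. It assumes that the drawdown at each point is the percentage distance of the equity line from its running all-time high, and that the average is taken over every point of the equity line (including points at a new high, where the drawdown is zero). The name `drawdown_metrics` and these conventions are assumptions for the example.

```python
def drawdown_metrics(equity, avg_yearly_return_pct):
    """equity: equity-line values; avg_yearly_return_pct: 'Avg [R] 1y' in percent."""
    peak = equity[0]
    drawdowns = []
    for value in equity:
        peak = max(peak, value)  # running all-time high
        drawdowns.append(100 * (peak - value) / peak)  # percent below the high
    max_dwn = max(drawdowns)
    return {
        "avg_dwn": sum(drawdowns) / len(drawdowns),
        "max_dwn": max_dwn,
        # Rough number of years needed to recover the maximum drawdown.
        "max_dwn_over_avg_r_1y": max_dwn / avg_yearly_return_pct,
    }


# Hypothetical equity line and average yearly return (5%).
dd = drawdown_metrics([100.0, 110.0, 99.0, 104.5, 121.0], avg_yearly_return_pct=5.0)
print(dd)
```

Here the equity falls from 110 to 99, a 10% maximum drawdown; with an average yearly return of 5%, the ratio suggests about 2 years to recover.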

Risk in the worst case scenario

  • C-VaR:
    • C-VaR stands for 'Conditional Value at Risk', and is also called 'Expected Shortfall'.
    • It is an advanced metric and is used in portfolio optimization for effective risk management.
    • It is computed by taking the average of the “extreme” losses in the tail of the distribution of historical returns, beyond the Value at Risk (VaR) cutoff point, usually 99%. In other words, it is the average of the worst 1% (the worst centile) of historical returns.
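
A minimal Python sketch of the historical C-VaR described above: sort the returns, keep the worst 1%, and average them. The function name `cvar` and the rounding rule used to decide how many observations fall in the tail are assumptions; other conventions (for example interpolating at the cutoff) are possible.

```python
def cvar(returns, confidence=0.99):
    """Average of the worst (1 - confidence) share of historical returns."""
    ordered = sorted(returns)  # worst returns first
    # Assumption: tail size is rounded to the nearest whole observation,
    # and always contains at least one return.
    n_tail = max(1, round(len(ordered) * (1 - confidence)))
    tail = ordered[:n_tail]
    return sum(tail) / len(tail)


# Hypothetical sample: 200 returns from -10.0% to +9.9% in 0.1% steps.
sample_returns = [i / 10 for i in range(-100, 100)]
cvar_99 = cvar(sample_returns)        # mean of the worst 2 returns
cvar_95 = cvar(sample_returns, 0.95)  # mean of the worst 10 returns
print(cvar_99, cvar_95)
```

With 200 observations the 99% tail contains only 2 returns, which is why C-VaR needs a reasonably long history to be meaningful.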

Winning/Losing Streaks

  • Z-Score streaks
    • the Z-Score streaks measures how likely it is that our streaks of trades (consecutive wins and consecutive losses) are random.
    • It usually fluctuates between -3 and +3, but can occasionally go above or below these levels.
      • A positive Z-score means that a profitable position is likely to be followed by a losing one, and a losing one by a winning one, so the probability of long winning or losing streaks is low.
      • A Z-score of 0 means that we are dealing with completely random results.
      • A negative Z-score means that a profitable position is likely to be followed by more profitable positions, and a losing position by more losing positions, so winning or losing streaks are probable.
    • For example, if the last trade was a winning one, we can expect the following one to be:
      • if the Z-Score is near +3 ==> losing
      • if the Z-Score is near 0 ==> 50% losing, 50% winning
      • if the Z-Score is near -3 ==> winning.
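
The exact formula is not stated here. As an assumption, the sketch below uses the runs-test statistic commonly quoted in trading literature, Z = (N·(R − 0.5) − P) / sqrt(P·(P − N) / (N − 1)), where N is the number of trades, R the number of streaks, and P = 2·W·L (W wins, L losses). Its sign convention matches the description above: many short streaks give a positive value, few long streaks a negative one. The function name `z_score_streaks` is hypothetical.

```python
from math import sqrt


def z_score_streaks(returns):
    """Runs-test Z-score on the win/loss sequence of the trades."""
    outcomes = [r > 0 for r in returns]  # True = winning trade
    n = len(outcomes)
    wins = sum(outcomes)
    losses = n - wins
    # A new streak starts every time the outcome changes.
    streaks = 1 + sum(a != b for a, b in zip(outcomes, outcomes[1:]))
    p = 2 * wins * losses
    # Undefined when all trades are wins or all are losses (p == 0).
    return (n * (streaks - 0.5) - p) / sqrt(p * (p - n) / (n - 1))


# Hypothetical sequences of 10 trades each.
z_alternating = z_score_streaks([1.0, -1.0] * 5)         # win, loss, win, ...
z_streaky = z_score_streaks([1.0] * 5 + [-1.0] * 5)      # 5 wins, then 5 losses
print(z_alternating, z_streaky)
```

The perfectly alternating sequence yields a Z-score of about +3, while the sequence made of one long winning streak followed by one long losing streak yields a clearly negative value.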

Excess Metrics of Anomalies

  • Exc. Avg [R] ann.: it is the difference between the 'average gross return annualized' of the anomaly and the 'other trades' one.

  • Exc. Avg [R]: it is the difference between the 'average gross return' of the anomaly and the 'other trades' one.

  • Exc. RR: it's the difference between the 'reward / risk' of the anomaly and the 'other trades' one.

  • Exc. Winning %: it is the difference between the 'positive percentage' of the anomaly and the 'other trades' one.

Note that:

  • Excess metrics are computed over the 'Other Trades': trades made on the same instrument but in different periods from the Anomaly's.
  • For example, the other trades of "Apple TDW 1" are "Apple TDW 2,3,4,5".
  • Technicality: the 'other trades' returns are duration-adjusted according to the average trade duration of the anomaly. In the "Apple TDW 1" case, the returns of "Apple TDW 2,3,4,5" are divided by 4.
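
The duration adjustment can be sketched as follows, using the "Apple TDW 1" case: the anomaly lasts 1 day and the other trades span 4 days, so their returns are divided by 4 before the comparison. The function name, its parameters, and the sample returns are hypothetical; only the division by the duration ratio comes from the note above.

```python
from statistics import mean


def excess_avg_return(anomaly_returns, other_returns, anomaly_days, other_days):
    """'Exc. Avg [R]': anomaly average minus duration-adjusted 'other trades' average."""
    scale = other_days / anomaly_days  # 4 in the "Apple TDW 1" example
    adjusted_others = [r / scale for r in other_returns]
    return mean(anomaly_returns) - mean(adjusted_others)


# Hypothetical returns in percent: 1-day anomaly trades vs 4-day 'other trades'.
exc_avg = excess_avg_return(
    anomaly_returns=[0.5, 0.3, 0.4],
    other_returns=[0.8, 0.4, 1.2],
    anomaly_days=1,
    other_days=4,
)
print(exc_avg)
```

In this example the anomaly averages 0.4% per trade and the other trades 0.8% over 4 days, that is 0.2% once duration-adjusted, so the excess average return is about 0.2%.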

Excess Metrics of Portfolios

  • Exc. Avg [R] 1y on Bench.: it's the average yearly return of the portfolio minus the benchmark one.

  • Exc. Avg [R] 1y on RF: it is the average yearly return of the portfolio minus the Risk Free one.

  • Exc. RR 1y on Bench.: the yearly Reward / Risk of the portfolio minus the benchmark one.

  • Winning % 1y on Bench.: the percentage of years in which the return of the portfolio has been higher than the benchmark one.

  • Winning % 1y on RF: it's the percentage of years in which the return of the portfolio has been higher than the Risk Free one.

Note that:

  • Excess metrics are computed over 'Benchmark' or over 'Risk-free rate' returns.
    • The 'Benchmark' is the S&P 500.
    • The 'Risk-free rate' is the annualized US 3 months T-Bill rate.
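
The portfolio excess metrics compare yearly series year by year. The sketch below is an illustration with made-up numbers; `excess_vs_reference` is a hypothetical helper, and the sample standard deviation is assumed for the Reward / Risk.

```python
from statistics import mean, stdev


def excess_vs_reference(portfolio, reference):
    """Yearly returns in percent, aligned by year (reference = benchmark or risk-free)."""
    beaten = sum(p > ref for p, ref in zip(portfolio, reference))
    return {
        "exc_avg_r_1y": mean(portfolio) - mean(reference),
        "winning_pct_1y": 100 * beaten / len(portfolio),
    }


def excess_rr(portfolio, benchmark):
    """'Exc. RR 1y on Bench.': portfolio RR minus benchmark RR."""
    return mean(portfolio) / stdev(portfolio) - mean(benchmark) / stdev(benchmark)


# Hypothetical yearly returns, in percent.
portfolio = [12.0, -3.0, 8.0, 15.0]
benchmark = [10.0, -6.0, 9.0, 11.0]   # e.g. S&P 500
risk_free = [2.0, 1.0, 1.5, 3.0]      # e.g. 3-month T-Bill rate

on_bench = excess_vs_reference(portfolio, benchmark)
on_rf = excess_vs_reference(portfolio, risk_free)
exc_rr_bench = excess_rr(portfolio, benchmark)
print(on_bench, on_rf, exc_rr_bench)
```

Here the portfolio beats the benchmark in 3 years out of 4 (75%) with an average excess return of 2 percentage points per year.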

Scores and Ratings

  • Score
    • The 'Score' is the origin of the 'Rating'. Statistically speaking, the 'Score' is computed as '1 / p-value', where the p-value comes from a statistical test chosen according to the distribution of the returns.

    • There are 2 types of scores:

      • Score on zero: (for 'anomalies' and 'portfolios') measures how significantly a set of returns differs from a set of returns with zero mean.
      • Score on others: (for 'anomalies' only) measures how significantly a set of returns differs from the 'other trades' set of returns (explained above with the Apple TDW example).
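
To make the '1 / p-value' idea concrete, here is a purely illustrative Python sketch of a 'Score on zero'. The actual statistical test is not specified here and depends on the distribution of the returns; the two-sided z-test with a normal approximation used below is an assumption chosen only because it needs nothing beyond the standard library.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def score_on_zero(returns):
    """Illustrative score: 1 / p-value of a two-sided z-test of mean == 0."""
    standard_error = stdev(returns) / sqrt(len(returns))
    z = mean(returns) / standard_error
    # Assumption: normal approximation; the real test may differ.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return float("inf") if p_value == 0 else 1 / p_value


# Hypothetical return sets, in percent.
score_flat = score_on_zero([1.0, -1.0, 1.0, -1.0])              # mean exactly zero
score_weak = score_on_zero([1.0, -2.0, 1.5, -0.5, 1.0, 0.5])    # small, noisy edge
score_strong = score_on_zero([1.0, 2.0, 1.5, 2.5, 1.0, 2.0])    # consistent edge
print(score_flat, score_weak, score_strong)
```

A set of returns with a mean of exactly zero gets a p-value of 1 and therefore a score of 1, the minimum; the more consistently the returns differ from zero, the smaller the p-value and the larger the score.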

There are 3 types of Ratings:

  • Rating

    • is the 'Rating' derived from the 'Score' and goes from 0 to 5 stars.
    • The higher the Rating, the more the returns are statistically significantly different from zero or from another set of returns.
    • the Rating is given according to a 'Score' clustering.
  • Rating FC

    • it's a variation of the Rating
    • it is computed as the weighted average of 3 Ratings given in the same period (In-Sample, Out-of-Sample, or Entire-Backtest). The formula is: 0.4 'Rating Net' + 0.4 'Rating on Others' + 0.2 'Rating Gross'.
  • Rating FC summary

    • it is a Rating derived from the 'Rating FC' computed over the entire backtest returns. The formula is approximately: 0.4 'Rating Net All' + 0.4 'Rating on Others All' + 0.2 'Rating Gross All'.
    • It is only approximate because, in addition, the final formula gives:
      • a bonus to the Anomalies that were able to keep their good 'In-Sample' performance in the 'Out-of-Sample' period;
      • a malus in the opposite case.
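
The weighted average behind the 'Rating FC' can be written as a one-line function. The sketch below covers only the 0.4 / 0.4 / 0.2 weighting stated above; the bonus and malus applied in the 'Rating FC summary' are not specified, so they are deliberately not modelled. The function name `rating_fc` is hypothetical.

```python
def rating_fc(rating_net, rating_on_others, rating_gross):
    """Weighted average of three ratings (each from 0 to 5 stars) for one period."""
    return 0.4 * rating_net + 0.4 * rating_on_others + 0.2 * rating_gross


# Hypothetical ratings for a single period.
fc = rating_fc(rating_net=4, rating_on_others=3, rating_gross=5)
print(round(fc, 2))
```

With ratings of 4, 3 and 5 stars, the weighted average is 1.6 + 1.2 + 1.0 = 3.8 stars.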