ETF Due Diligence: Chasing Quality, Not Performance

Written by Elisabeth Kashner, CFA | Nov 2, 2017

The Wall Street Journal lit a fire in the mutual fund due diligence world last week, publishing The Morningstar Mirage, a 19-page analysis of Morningstar’s five-star rating system. In it, the WSJ charged that “funds that earned high [Morningstar] star ratings attracted the vast majority of investor dollars. Most of them failed to perform.” The WSJ continued, “On the average, five-star funds eventually turn into merely ordinary performers.”

The cost of performance chasing is staggering. ETF investors are not immune. Just ask the crowd that invested $8.3 billion in WisdomTree Europe Hedged Equity Fund (HEDJ-US) between January 1, 2015 and June 30, 2016.

The cumulative net return of HEDJ during that time was a mere 0.11%, while AUM increased from $5.6 billion to $10.6 billion.

Over that 18-month period, buy-and-hold investors earned 0.07% per year. On a dollar-weighted basis, investors lost 21.58% per year. Dollar-weighted returns show the average experience of the “hot money,” the investors who created and redeemed shares of the fund during the period, because they take into account the rate of return on every new investment (inflow) and every sale (outflow).

That 21.65-percentage-point gap between buy-and-hold and tactical trading in HEDJ shows just how painful performance chasing can be.
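To make the buy-and-hold versus dollar-weighted distinction concrete, here is a minimal Python sketch. It annualizes a cumulative return (HEDJ’s 0.11% over 18 months works out to roughly 0.07% per year) and computes a dollar-weighted return as the internal rate of return (IRR) of an investor’s cash flows, solved by bisection. The function names and the cash flows in the usage lines are hypothetical illustrations, not HEDJ data and not FactSet’s calculation.

```python
def annualize(cumulative_return: float, years: float) -> float:
    """Convert a cumulative return (0.0011 means 0.11%) to a compound annual rate."""
    return (1.0 + cumulative_return) ** (1.0 / years) - 1.0


def npv(rate: float, flows: list[tuple[float, float]]) -> float:
    """Net present value of (time_in_years, amount) cash flows at an annual rate."""
    return sum(amount / (1.0 + rate) ** t for t, amount in flows)


def money_weighted_return(
    flows: list[tuple[float, float]],
    low: float = -0.99,
    high: float = 10.0,
    tol: float = 1e-10,
) -> float:
    """Dollar-weighted (money-weighted) annual return: the IRR of the flows.

    Flows are from the investor's side: purchases are negative, sales and
    the ending market value are positive. Solved by bisection on [low, high].
    """
    f_low = npv(low, flows)
    if f_low * npv(high, flows) > 0:
        raise ValueError("no sign change in NPV between low and high")
    while high - low > tol:
        mid = (low + high) / 2.0
        f_mid = npv(mid, flows)
        if f_low * f_mid <= 0:
            high = mid  # root lies in [low, mid]
        else:
            low, f_low = mid, f_mid  # root lies in [mid, high]
    return (low + high) / 2.0


# Hypothetical example: buy $100 at t=0; the fund rallies 20% by t=0.5,
# when a performance chaser adds $100; the fund then falls 20% by t=1.5.
flows = [(0.0, -100.0), (0.5, -100.0), (1.5, 100.0 * 1.2 * 0.8 + 100.0 * 0.8)]
buy_and_hold = annualize(1.2 * 0.8 - 1.0, 1.5)
dollar_weighted = money_weighted_return(flows)
print(f"buy-and-hold {buy_and_hold:.2%}/yr, dollar-weighted {dollar_weighted:.2%}/yr")
```

With these made-up flows (a second purchase made after a 20% rally, just before a 20% decline), the dollar-weighted return comes out at roughly -9.7% per year, versus about -2.7% per year for buy-and-hold. Bisection is used here for simplicity; cash-flow series that change sign more than once can have multiple IRRs, so a real tool would need to check for that.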

In March of this year, I compared fund returns with investors’ returns for 60 ETFs in the U.S. total market, U.S. large cap, and Developed Ex-U.S. total market segments over a five-year period. That covers all of the “smart beta” funds, plus a few representative vanilla funds, that had a five-year track record with no change in investment strategy that might cloud the results.

The results were clear. Buy-and-hold won out in 53 of the 60 funds. Dollar-weighted investors underperformed their buy-and-hold counterparts by an average of 1.7% per year, per fund.
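The tally described in that study is simple arithmetic once each fund has an annualized buy-and-hold return and an annualized dollar-weighted return. The sketch below counts buy-and-hold wins and averages the per-fund gap; the function name, tickers, and numbers are hypothetical placeholders, not the 60 funds from the study.

```python
from statistics import mean


def summarize_gap(results: dict[str, tuple[float, float]]) -> tuple[int, float]:
    """Tally buy-and-hold wins and the average annual return gap per fund.

    results maps a ticker to (buy_and_hold_annual, dollar_weighted_annual),
    both as decimals (0.05 means 5% per year).
    """
    wins = sum(1 for bh, dw in results.values() if bh > dw)
    avg_gap = mean(bh - dw for bh, dw in results.values())
    return wins, avg_gap


# Hypothetical tickers and returns, for illustration only.
sample = {
    "FUND_A": (0.10, 0.08),
    "FUND_B": (0.07, 0.075),
    "FUND_C": (0.12, 0.09),
}
wins, avg_gap = summarize_gap(sample)
print(f"buy-and-hold won in {wins} of {len(sample)} funds; average gap {avg_gap:.2%}/yr")
```

Averaging the gap per fund (rather than weighting by assets) matches the “per year, per fund” framing of the result.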

How Performance Chasing Falls Short

Performance chasing often ends badly, because few funds can outperform quarter after quarter, year after year. As Morningstar explained to The Wall Street Journal in a written response to its article, “Reversion to the mean is a powerful force that can affect any investment vehicle.” Performance history, even when risk-adjusted, is not a reliable predictor of future results.

Morningstar CEO Kunal Kapoor also wrote in the rebuttal, “Our research finds that the star rating points investors toward cheaper funds that are easier to own and likelier to outperform in the future. These are qualities that correspond with investor success.”

We agree that fund quality, meaning low cost and low operational risk, is a predictable factor that influences long-term fund performance, because every penny that leaks out of an investment account is a penny that can’t provide future returns.

Fund quality is mostly a function of costs, both operational and trading, but it also includes a risk assessment that measures the probability of rare, high-impact events with potentially severe consequences. The biggest costs are tracking difference, which captures operational drag, and trading charges, which can be measured by the bid-ask spread. ETF closure is an example of a severe negative event, as is ETN issuer default.

Analysis of these costs and risks allows us to differentiate between funds with similar mandates and equal expense ratios, such as iShares TIPS Bond ETF (TIP-US) and PIMCO Broad US TIPS Index ETF (TIPZ-US). Both funds cost 0.20% per year, but the iShares product tracks its index more closely, and trades with higher volumes and much lower spreads. That’s why FactSet ETF Analytics gives TIP an A letter grade, and TIPZ a B.

Costs can add up over time. An investor who chose A-rated ETFS Physical Platinum Shares (PPLT-US) over F-rated iPath Bloomberg Platinum Subindex Total Return ETN (PGM-US) for a one-year holding period would have kept an additional 2.05% on average, based on the median tracking difference and average spreads. That’s a huge performance boost, with no added market risk.
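One simple way to combine the two cost measures is to add the annual tracking-difference drag, scaled by the holding period, to one round-trip bid-ask spread. The Python sketch below does that; the function name and inputs are hypothetical, and this linear approximation is an assumption, not necessarily how FactSet derived the 2.05% figure.

```python
def holding_cost(tracking_difference: float, spread: float, years: float = 1.0) -> float:
    """Rough cost of owning a fund, as a decimal fraction of the amount invested.

    tracking_difference: annual fund return minus index return (negative = drag).
    spread: average bid-ask spread as a fraction of price. Buying at the ask and
    selling at the bid costs about one full spread per round trip.
    Linear in years: ignores compounding, which is small for low costs.
    """
    return -tracking_difference * years + spread


# Hypothetical inputs, not the actual PPLT or PGM statistics.
cheap = holding_cost(tracking_difference=-0.005, spread=0.0005)
costly = holding_cost(tracking_difference=-0.015, spread=0.004)
print(f"advantage of the cheaper fund over one year: {costly - cheap:.2%}")
```

Because the spread is paid once per round trip while tracking difference accrues every year, the spread matters most for short holding periods and tracking difference dominates for long ones.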

Performance analysis simply is not a part of an assessment of operating and trading costs. Instead, we look at fund performance and market risk in a separate section, called Fit.

ETF Analytics’ Fit score ranks funds on the basis of active risk. A score of 100 indicates that the fund completely reflects its opportunity set; a score of zero points to extreme bets against the market. Fit measures relative risk, without extrapolating returns.
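The article does not spell out how Fit is calculated, so the Python sketch below uses a common, related idea as a hypothetical stand-in: holdings overlap with the segment benchmark, scaled to 0-100. A fund that holds the benchmark exactly scores 100; one that shares no holdings scores 0. The function name and weights are illustrative assumptions.

```python
def overlap_score(fund: dict[str, float], benchmark: dict[str, float]) -> float:
    """Holdings overlap on a 0-100 scale (a hypothetical stand-in for Fit).

    Weights are fractions that each sum to 1. The overlap is the sum over all
    names of the smaller of the two weights, which equals 1 minus active share.
    """
    names = fund.keys() | benchmark.keys()
    overlap = sum(min(fund.get(n, 0.0), benchmark.get(n, 0.0)) for n in names)
    return 100.0 * overlap


benchmark = {"A": 0.5, "B": 0.3, "C": 0.2}  # hypothetical index weights
tilted = {"A": 0.8, "B": 0.2}  # hypothetical concentrated fund
print(overlap_score(benchmark, benchmark), overlap_score(tilted, benchmark))
```

Here the tilted fund scores about 70: it overweights A, underweights B, and skips C entirely, and each of those choices is a bet against the benchmark.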

Why not returns? It turns out that historical performance is an unreliable predictor of future performance, even when the outperformance is rigorously measured and statistically significant.

Take the statistically significant case first. In the ETF universe, risk-adjusted outperformance that passes even the most basic tests of statistical significance is hard to come by. And it’s not just because vanilla ETFs are built to mimic the market rather than outperform it. It’s worse. Any time I’ve tested the risk-adjusted returns of “smart beta” ETFs that promote their potential for outperformance, the vast majority have performed in line with the risks taken. Not over, not under. The last time I did this, with a five-year look-back as of March 31, 2017, only 8.8% of the 53 funds I tested delivered positive alpha at the generous 90% confidence level. In layman’s terms, most of the raw over- and underperformance was attributable to risk and statistical noise.
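A basic version of that significance test regresses a fund’s excess returns on the market’s excess returns and checks whether the intercept (alpha) is positive and large relative to its standard error. The Python sketch below is a single-factor illustration under stated assumptions: the function names and return series are hypothetical, the test is two-sided, and the critical value uses a normal approximation rather than the t distribution. It is not the exact model used in the study.

```python
from math import sqrt
from statistics import NormalDist, mean


def alpha_t_stat(fund_excess: list[float], market_excess: list[float]) -> tuple[float, float]:
    """Regress fund excess returns on market excess returns by OLS.

    Returns (alpha, t-statistic of alpha), with alpha in per-period units.
    """
    n = len(fund_excess)
    if n != len(market_excess) or n < 3:
        raise ValueError("need two matching series with at least 3 observations")
    mx, my = mean(market_excess), mean(fund_excess)
    sxx = sum((x - mx) ** 2 for x in market_excess)
    sxy = sum((x - mx) * (y - my) for x, y in zip(market_excess, fund_excess))
    beta = sxy / sxx
    alpha = my - beta * mx
    residuals = [y - alpha - beta * x for x, y in zip(market_excess, fund_excess)]
    s2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    se_alpha = sqrt(s2 * (1.0 / n + mx * mx / sxx))  # standard error of the intercept
    return alpha, alpha / se_alpha


def has_significant_positive_alpha(
    fund_excess: list[float], market_excess: list[float], confidence: float = 0.90
) -> bool:
    """True if alpha is positive and its t-statistic exceeds the two-sided critical value."""
    alpha, t = alpha_t_stat(fund_excess, market_excess)
    critical = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.645 at 90%
    return alpha > 0 and t > critical


# Hypothetical monthly excess returns; a real test would use about 60 months.
market = [0.01, -0.02, 0.03, 0.00, -0.01, 0.02]
noise = [0.001, -0.001, 0.0005, -0.0005, 0.001, -0.001]
fund = [m + 0.005 + e for m, e in zip(market, noise)]
print(alpha_t_stat(fund, market), has_significant_positive_alpha(fund, market))
```

In practice most funds land with t-statistics well inside the critical band, which is what “performing in line with the risks taken” looks like in this framework.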

Even when you look only at raw performance, persistence of returns turns out to be vanishingly rare. Just ask Aye Soe and Ryan Poirier, authors of the semi-annual SPIVA study, who explained in their June 2017 Persistence Scorecard, “no large-cap, mid-cap, or small-cap funds managed to remain in the top quartile at the end of the five year measurement period. This figure paints a negative picture regarding long-term persistence in mutual fund returns.” They went on to say, “The data show a stronger likelihood for the best-performing funds to become the worst-performing funds than vice versa.”
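The persistence test in that quote comes down to set intersection: take the funds in the top quartile in the first period and keep only those still in the top quartile in every later period. The Python sketch below is a simplified, hypothetical version, not S&P’s methodology; it makes no survivorship adjustments, so a fund missing from a later period simply drops out.

```python
def top_quartile(returns: dict[str, float]) -> set[str]:
    """Funds whose return ranks in the top 25% for one period."""
    ranked = sorted(returns, key=returns.get, reverse=True)
    cutoff = max(1, len(ranked) // 4)
    return set(ranked[:cutoff])


def persistent_top_quartile(periods: list[dict[str, float]]) -> set[str]:
    """Funds that stay in the top quartile in every period."""
    survivors = top_quartile(periods[0])
    for period in periods[1:]:
        survivors &= top_quartile(period)
    return survivors


# Hypothetical annual returns for four funds over two periods.
periods = [
    {"A": 0.10, "B": 0.05, "C": 0.02, "D": -0.01},
    {"A": 0.03, "B": 0.08, "C": 0.01, "D": 0.00},
]
print(persistent_top_quartile(periods))
```

With only one fund per quartile here, last period’s leader (A) is knocked out by B, leaving an empty set; across five years and thousands of funds, the SPIVA authors found the same empty result.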

In other words, performance chasing is often quite costly.

ETFs that offer broad-based, cap-weighted coverage of a well-defined market segment are built to reflect the market rather than outperform it. In order to outperform the market, a fund has to offer exposure to risks that are different from the market baseline, aka relative risk. No relative risk means no outperformance, and no underperformance. There’s nothing to chase.
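The claim that no relative risk means neither outperformance nor underperformance follows from a one-line identity. In the sketch below, w_i^F and w_i^B are the weights of security i in the fund and in its benchmark, r_i is that security’s return, and, as simplifying assumptions, the period is single and fees and trading costs are ignored:

```latex
R_F - R_B = \sum_i w_i^F r_i - \sum_i w_i^B r_i = \sum_i \left( w_i^F - w_i^B \right) r_i
```

When every w_i^F equals w_i^B, each term is zero whatever the r_i turn out to be, so the fund can only match its benchmark. Any active return has to come from weight differences, which is exactly the relative risk a fund must take in order to outperform.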

ETF Analytics’ Fit score gives high marks for minimizing relative risk.

For example, iShares Core MSCI Europe (IEUR-US) scores a 99 for Fit, because it provides broad-based, unbiased coverage of the developed European equity markets. By contrast, WisdomTree Europe Hedged Equity Fund (remember the hot money flowing into HEDJ?) holds a narrow portfolio of dividend-paying, export-oriented firms, weighted by annual cash dividends paid and subject to specific caps, with a currency-hedge overlay. Investors have to get four bets right to win with HEDJ: Europe, exporters, dividend emphasis, and currency hedging. That’s why HEDJ scores 39 in Fit.

These portfolio differences make for vastly different returns. While HEDJ had an excellent month (as of October 27, 2017), it trailed badly over the six-month lookback, even against other currency-hedged or “smart beta” funds like DBEU and EUMV.

ETF Analytics’ Fit score combines portfolio comparisons with an analysis of return differences. Investors who prefer to ride the mean, rather than watch their investments surge and then fall back in potentially costly mean reversion, need only search out a high Fit score. While these investors will bear the market risk that comes with exposure to the segment, they can at least avoid relative risk. For those who prefer to make tactical bets, the Fit score allows for a quick assessment of the active risk chosen.
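The return-difference side of that analysis is usually summarized as tracking error: the annualized standard deviation of the fund’s return minus its benchmark’s return. The Python sketch below computes it from equally spaced periodic returns; the function name, the monthly default, and the sample series are assumptions for illustration, and the sketch does not claim to reproduce how FactSet weights this input within Fit.

```python
from math import sqrt
from statistics import stdev


def tracking_error(fund: list[float], benchmark: list[float], periods_per_year: int = 12) -> float:
    """Annualized standard deviation of fund-minus-benchmark returns.

    Assumes equally spaced periodic returns (monthly by default) and scales
    by the square root of periods per year.
    """
    if len(fund) != len(benchmark) or len(fund) < 2:
        raise ValueError("need two matching series with at least 2 observations")
    diffs = [f - b for f, b in zip(fund, benchmark)]
    return stdev(diffs) * sqrt(periods_per_year)


# Hypothetical monthly returns for a fund and its segment benchmark.
fund = [0.02, 0.00, 0.02, 0.00]
bench = [0.01, 0.01, 0.01, 0.01]
print(f"tracking error: {tracking_error(fund, bench):.2%} per year")
```

A fund that mirrors its segment has tracking error near zero; a fund like HEDJ, with several layered bets, shows large and volatile return differences, which is the relative risk a low Fit score flags.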

Analyst Pick, a designation granted to one fund per segment, combines classic ETF due diligence with a straightforward investment philosophy that minimizes active risk. FactSet’s ETF Analyst Pick recommends ETFs that investors can hold for an extended time, for as long as they want exposure to a market segment. Analyst Pick is the antidote to performance chasing.

By applying basic cost and liquidity thresholds, Analyst Pick narrows each segment’s offering to the funds that are cheap and low risk to hold, and easy to trade. Next, the methodology selects the simplest vanilla investment strategies, favoring those with the highest Fit score.
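The two-step selection described here can be sketched as a screen followed by a ranking. In the Python sketch below, the Fund fields, the cost and liquidity thresholds, the fallback when no vanilla fund passes, and the expense-ratio tie-break are all hypothetical assumptions; FactSet’s actual thresholds and rules are not given in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fund:
    ticker: str
    expense_ratio: float     # annual, as a decimal (0.0004 = 0.04%)
    median_spread: float     # bid-ask spread as a fraction of price
    avg_daily_volume: float  # average dollar volume traded per day
    is_vanilla: bool         # broad, cap-weighted, no strategy tilt
    fit: int                 # 0-100 Fit-style score


def pick_fund(
    funds: list[Fund],
    max_expense: float = 0.0020,
    max_spread: float = 0.0010,
    min_volume: float = 1_000_000.0,
) -> Optional[Fund]:
    """Screen on cost and liquidity, then prefer vanilla funds with the highest Fit."""
    eligible = [
        f
        for f in funds
        if f.expense_ratio <= max_expense
        and f.median_spread <= max_spread
        and f.avg_daily_volume >= min_volume
    ]
    if not eligible:
        return None
    # Fall back to all eligible funds if none is vanilla (an assumption).
    candidates = [f for f in eligible if f.is_vanilla] or eligible
    # Highest Fit wins; ties go to the lower expense ratio.
    return max(candidates, key=lambda f: (f.fit, -f.expense_ratio))


# Hypothetical segment with four funds.
segment = [
    Fund("FUND_A", 0.0004, 0.0001, 5e8, True, 99),
    Fund("FUND_B", 0.0003, 0.0002, 2e8, True, 97),
    Fund("FUND_C", 0.0015, 0.0005, 5e7, False, 60),
    Fund("FUND_D", 0.0010, 0.0050, 1e7, True, 100),  # fails the spread screen
]
print(pick_fund(segment))
```

Screening first keeps the Fit ranking from rewarding a fund that tracks its segment perfectly but is expensive or hard to trade, as FUND_D is here.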

Here’s how it works within the US Total Market Equity segment, a favorite for long-term equity exposure. The ETF Analyst Pick, Vanguard Total Stock Market ETF (VTI-US), posts a headline expense ratio that is 0.01% higher than the segment low, but executes as if it were even cheaper, with a median annual tracking difference of -0.01%. VTI is also cheap to trade, at a 0.01% spread. Most importantly, VTI’s Fit score of 99 assures investors that the fund does what it claims to do, giving exposure to the US total market without making bets against the market.

The table below, from FactSet’s new ETF screener, makes it clear why VTI stands out in this crowded segment.

ETF due diligence works best when costs and risks are foremost. Low-cost, liquid, well-run funds preserve investor capital for taking desired risks, such as exposure to a particular market segment. Pointing to the odds-on bet of broad-based, cap-weighted exposure helps position long-term investors to avoid active risk and costly performance chasing.
