This post was made possible by Financial Modeling Prep, which gave me full and free API access to their excellent database. This post is more about Business Analysis than Machine Learning, but I’ll write a technical one soon.
Drawing conclusions from Income/Cash Flow/Balance Sheet statements requires two things:
A deep understanding of finance and how different items in financial statements reflect real-world business operations
Good intuition, developed through experience
While machine learning models struggle with the first aspect, they excel at the second. That's why I decided to train a neural network on virtually every public financial statement issued since 2000 to discover what statistical and financial patterns a model could learn from experience.
When analyzing financial statements for investment decisions, I typically ask questions like:
If a company's revenue grew strongly from year n-3 to year n-1, should I expect continued growth or regression to the mean?
Is long-term debt an indicator of healthy investment and long-term planning, or poor financial management?
When a company performs well, is it more likely to issue new stocks (increasing capital) or buy them back (returning capital to shareholders)? In other words, should I expect dilution?
More broadly, is studying financial statements worth my time before investing?
The model I have trained provides some clues. It was trained to predict 8 metrics [1] of next year’s financial statements based on the last three years of history, each year comprising 101 components [2].
Is it possible to predict the future by looking at financial statements alone?
To some extent.
A naive, yet surprisingly effective, approach is simply to carry figures over from year n-1 to year n. A small corrective factor can be applied to account for average year-over-year growth.
Any improvement in accuracy beyond this baseline counts as a success. Below, I compare the model's performance to this basic persistence model. The metrics listed are those I consider most relevant for investors and stockholders.
Here, the “loss” is the inverse of accuracy. It is a common machine learning term which, in this case, means the average prediction error, measured in standard deviations [3].
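For concreteness, here is a minimal sketch of what this baseline and loss computation could look like (the array shapes, names, and growth factor are illustrative, not the exact ones from my pipeline):

```python
import numpy as np

def persistence_forecast(history: np.ndarray, growth: float = 1.0) -> np.ndarray:
    """Carry year n-1 over to year n, optionally scaled by an average
    year-over-year growth factor."""
    # history: (n_companies, n_years, n_metrics)
    return history[:, -1, :] * growth

def average_loss(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """Mean absolute error expressed in standard deviations of each metric
    (in the actual pipeline, the data was normalized beforehand, see [3])."""
    return float(np.mean(np.abs(y_pred - y_true) / y_true.std(axis=0)))
```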
On average, my model is 16% more precise than a persistence model for these metrics (0.1341 vs 0.1596 average loss). It performs best on the first five metrics but shows poor performance on the last three.
This discrepancy exists because the heuristics the model learned for predicting dividends, revenue, and stockholders' equity don't transfer to the future. While these patterns were learned from historical data (2000 to 2022/2023), they prove counterproductive when forecasting future performance. [4]
It looks like there is no way to reliably predict the Revenue of a company just by looking at its past financial statements. The same holds true for Dividends Paid and Stockholders’ Equity.
The narratives “revenue should keep growing as it has in recent years” and “revenue will have to regress to the mean” are undecidable, and mostly cancel out. If anything, the latter is slightly more accurate (see footnote [5]). This answers my first question.
In contrast, metrics related to net income show much higher predictability.
Interpreting predictability
First of all, it should be noted that the least predictable metrics are also the least volatile. Across the ~25,000 public companies I analyzed, year-over-year changes in Revenue accounted on average for only 7% of the standard deviation of revenue across companies. To put it plainly: revenue varies much more across companies than it does year over year within the same company. The same can be said of Stockholders' Equity.
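As a rough sketch, this ratio can be computed from a long-format table with one row per company and fiscal year (the file and column names below are hypothetical):

```python
import pandas as pd

df = pd.read_csv("statements.csv")  # assumed columns: company, year, revenue

# Spread of within-company year-over-year changes...
yoy_std = df.sort_values("year").groupby("company")["revenue"].diff().std()

# ...versus the spread of revenue across all companies.
cross_std = df["revenue"].std()
print(f"YoY variation / cross-company spread: {yoy_std / cross_std:.0%}")
```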
It is surprising, though, that these metrics can't be predicted at all. On reflection, I think it boils down to this:
We have to know the initial state of a system to predict its evolution
Financial statements are made to reflect the financial state of a company. The metrics they display have survived selective pressure from investors and regulators to convey the most important and useful financial information.
But they don't say much about the operational state of a business (which predicts revenue) or the emotional state of board members (which predicts Dividends and Stockholders' Equity). The initial state of a system is necessary to predict its evolution, and failing to capture it impairs prediction. Lacking the information needed for a decent prediction, the model learns noise and makes predictions based on nonsensical metrics it has constructed.
It's like a gambler thinking, “Each time red came up twice in a row, green followed.” He has learned a rule that held for some past examples but is no more likely than chance to hold in the future. The same gambler, if he has lots of experience but little mathematical ability, will also have observed (rather than calculated) that black and red come up with about the same probability. That rule, at least, is valid.
My model has learned the same kind of simple (and valid) rule: the best bet on the evolution of Dividends, Revenue, and Stockholders' Equity is that they stay the same year over year. It has just “improved” on this somewhat, lowering the error on past data at the expense of real predictive ability (just as devising complex roulette rules beyond “black and red are roughly 50-50” is harmful to one's wallet).
In the coming weeks, I'm going to try to build a model capable of digesting operational and unstructured data, like the text of SEC filings and company websites.
What Has the Model Learned? (And How Useful Is It?)
Having a black box that makes predictions can be useful, but understanding its reasoning is even better. With a bit of math, we can peek inside.
How does it predict EPS?
EPS is arguably the most important metric for a shareholder to follow. Let's examine how the model predicts it [6].
The table below shows the relative importance of each metric in predicting EPS. Negative numbers indicate that the factor has a negative influence on EPS. However, "influence" should be understood more as correlation. Since the model doesn't fully grasp how different numbers interact, it sometimes makes counterintuitive connections. For example, it considers Cost of Revenue a positive predictor of future EPS, simply because it tends to grow at about the same rate as Revenue itself. Therefore, please interpret the following tables with caution.
The numbers listed below are the gradients of predicted EPS with respect to the different inputs. They give an idea of the relative weight of each input metric in EPS forecasting, and have been normalized so that the most important gradient is 1.
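In PyTorch terms, the computation is roughly the following sketch (the model and tensor names are placeholders; footnote [6] explains why averaging gradients over a batch is meaningful for this nearly linear model):

```python
import torch

def input_gradients(model: torch.nn.Module, x: torch.Tensor, output_index: int) -> torch.Tensor:
    """Average gradient of one predicted metric (here, EPS) with respect to
    every input, normalized so the largest magnitude is 1."""
    x = x.clone().requires_grad_(True)           # x: (batch, n_inputs), normalized
    model(x)[:, output_index].sum().backward()   # samples are independent, so each row of
                                                 # x.grad holds that sample's gradient
    grads = x.grad.mean(dim=0)                   # average sensitivity over the batch
    return grads / grads.abs().max()             # most important gradient becomes ±1
```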
A few observations stand out: naturally, EPS and Diluted EPS are the best predictors of themselves, following the "persistence-is-the-best-default-model" rule. Beyond that, the results become harder to interpret. At first glance, they seem counterintuitive:
Free Cash Flow, CapEx, and Net Cash Provided By Operating Activities show positive gradients, while Operating Cash Flow doesn't, despite being closely related.
Net Income has the most negative gradient, while Operating Income shows a reasonably positive one.
Total Debt and Net Debt display opposite gradients across all years.
What's happening here is that the model is creating its own formulas and constructing the metrics it needs to predict future EPS. It essentially arrives at the realization: "Hmm, future EPS seems to be related to the previous year's Operating Cash Flow minus the previous year's CapEx; there might be something here. And interestingly, most of the time, the previous year's FCF is nearly identical to this subtraction. However, when there's a discrepancy, I'd rather trust FCF, so I'm going to assign a large positive weight to the previous year's FCF, a positive weight to CapEx, and a negative weight to Operating CF." [7]
It kind of hedges its bets.
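To make that hedge concrete, here is a toy version of the reasoning. The weight is made up; only the signs mirror the gradient table (FCF and CapEx positive, Operating Cash Flow negative):

```python
def hedged_fcf_signal(fcf: float, ocf: float, capex: float, w: float = 1.5) -> float:
    # w*FCF + (w-1)*CapEx - (w-1)*OCF collapses to FCF exactly when
    # FCF == OCF - CapEx, and leans on FCF whenever the inputs disagree.
    return w * fcf + (w - 1.0) * capex - (w - 1.0) * ocf

print(hedged_fcf_signal(70.0, 100.0, 30.0))  # identity holds -> 70.0 (== FCF)
print(hedged_fcf_signal(65.0, 100.0, 30.0))  # disagreement  -> 62.5, pulled toward
                                             # FCF's side, away from OCF - CapEx (70)
```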
What about Free Cash Flow?
The model is still hedging, but the results are far more explainable. It appears that the best predictor of Free Cash Flow (FCF), aside from FCF itself, is the “net change from year n-3 to year n-1 in things that are, represent, or consume cash (and, in the last case, are not strictly mandatory).” This includes investments, equity, retained earnings… essentially, everything that reflects the company's ability to generate cash, even if it doesn't appear as FCF.
Interestingly, historical data matters much more here than it does for predicting EPS. EPS can be reliably predicted [8] using only year n-1 data, or very nearly so. In contrast, when predicting FCF, data from years n-2 and n-3 combined carry nearly the same weight as data from year n-1.
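Reusing the gradients from the earlier sketch, this per-year weight can be quantified by summing absolute gradient mass over each year's block of inputs (assuming the 303 inputs are laid out as three consecutive blocks of 101, ordered from year n-3 to year n-1):

```python
import torch

def gradient_share_by_year(grads: torch.Tensor) -> torch.Tensor:
    """Fraction of total absolute gradient carried by each of the three input years."""
    per_year = grads.abs().reshape(3, 101).sum(dim=1)  # rows: n-3, n-2, n-1 (assumed order)
    return per_year / per_year.sum()
```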
Now for Net Income, then I’ll go to bed
That's quite interesting. The model almost didn't hedge here. Two main takeaways:
Revenue shows a positive gradient in year n-1 but a negative one in year n-3. This suggests that, regarding revenue's impact on future Net Income, growth is more crucial than absolute figures.
The best negative predictor is dividends paid. So, the notion of board members milking a company dry before things go south might not be just a legend after all. On a personal note, I once received a 28% dividend yield from a Polish company before things took a turn for the worse (the company isn't defunct yet, but it's definitely past its prime). This illustrates that while predicting dividend payments might not be reliable, the payment itself can carry significant predictive weight.
Conclusion / Executive Summary
Forecasting company financials from past financials does work, but only for metrics loosely connected to operations and board members' sentiment.
Free Cash Flow, EPS, and Net Income are quite predictable, while Revenue and Dividends are not. This last observation suggests financial statements alone say nothing valuable about the mindset of board members or the operational state of a business.
Model hedging makes the EPS prediction heuristics difficult to interpret. However, the prediction heuristics for Free Cash Flow and Net Income are easier to grasp. You can sort and search the gradient tables to get a better sense of each financial statement metric's influence on predictions.
Using only financial statements to predict EPS, the prior year's data provides 70% of the predictive value. This figure drops to 55% for Free Cash Flow and 57% for Net Income.
Further Work
It is surprising that a prediction model works at all; I wasn't expecting such significant results. Still, much more can be done by including unstructured/text data. For example, I've always wondered whether a business's performance could be predicted from the content of its website.
It would also be a good idea to break this analysis down by sector. Banks and manufacturing companies don't include the same items in their financial statements. And even though this doesn't matter much for prediction (the model is able to say “Oh, Cost of Revenue is 0, it's more likely to be a bank than something else”), it would definitely help with interpreting gradients.
Unfortunately, I might need to sell one or two kidneys to afford enough compute to train a model for that...
Another post will follow soon, discussing the technical aspects of this project, including some interesting math and a novel (yet simple) architecture I developed to handle the data's sparsity.
["revenue", "netIncome", "eps", "epsdiluted", "freeCashFlow", "totalStockholdersEquity", "operatingCashFlow", "dividendsPaid"]
[2] So the input dimension is 303, plus a currency embedding. (The currency symbol, as a string, is difficult to feed into a neural network. Representing it as a single float would likely be too compressive, so I chose a 2-dimensional embedding to represent it.)
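For the curious, a minimal sketch of that design choice (layer sizes and names are illustrative, not the actual architecture):

```python
import torch
import torch.nn as nn

class StatementEncoder(nn.Module):
    """Illustrative input stage: 303 numeric features concatenated with a
    learned 2-dimensional embedding of the reporting currency."""
    def __init__(self, n_currencies: int, hidden: int = 64):
        super().__init__()
        self.currency = nn.Embedding(n_currencies, 2)  # currency symbol -> integer id -> 2D vector
        self.proj = nn.Linear(303 + 2, hidden)

    def forward(self, numeric: torch.Tensor, currency_id: torch.Tensor) -> torch.Tensor:
        x = torch.cat([numeric, self.currency(currency_id)], dim=-1)
        return self.proj(x)
```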
[3] The data was normalized beforehand.
[4] In fact, if we dig deeper into the last three metrics, we see something interesting: the average of the persistence model and my model performs better than either alone. This means my model has learned something valuable from past data but over-applies it to the future, by a factor of roughly 2. In other words, (Model + 2 × PersistenceModel) / 3 is the best prediction model for these metrics.
[5] This regression-to-the-mean narrative is the one that prevails in past data, as it is what the model has learned. The gradients of predicted revenue with respect to revenue in years n-2 and n-3 are positive, meaning the model has learned that past glory is a better indicator of future revenue than strong growth. The effect is so small that this nitpicking only deserves a footnote.
[6] As this is a business post, I won't go too deep into the math, but the general idea is that, since the model is nearly linear (it involves only two low-dimensional LeakyReLU non-linearities), its gradients are not far from constant. So the local sensitivity of my model with respect to its inputs is roughly constant, and I can average it over a batch to get a good idea of each input's influence on the predicted output.
[7] FCF ≈ Operating Cash Flow − CapEx, but with low reliability (the input data wasn't always clean), and it looks like λ · FCF − (1 − λ) · (Operating Cash Flow − CapEx) is a better proxy for FCF than FCF itself. The gradients reflect that, giving CapEx a positive influence.
[8] If a 30% improvement over the persistence model is deemed reliable.