Generative AI in Investing: English Solved, Judgment Unsolved
A couple of years ago, it was reasonable to say that generative AI could help long-term fundamental investors without threatening to replace them. The logic was straightforward: GenAI was fluent in language but unreliable in math, and fundamental investing lives in the messy overlap between narrative, numbers, and judgment (see this NY Times article). That statement is no longer fully accurate, but it is not obsolete either. What has changed is that several of its most visible weaknesses have been engineered around, while the hardest parts of investing remain largely untouched.
If you asked a model in 2022 or early 2023 to build a discounted cash flow model, it really could not do it: even with a default DCF template, it would often misuse discount rates, apply growth assumptions inconsistently, or quietly violate basic accounting norms. Free cash flow was a common example: models could explain conceptually why capex mattered, then proceed to subtract it twice. They would reconcile EBITDA to free cash flow in prose while producing numbers that did not add up. These errors were especially dangerous because they were delivered confidently. The output looked like analysis, but it could not survive a spreadsheet audit.
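The failure mode is easy to pin down in a few lines. Capex belongs in the free cash flow bridge exactly once; the early-model error was deducting it a second time after it had already been taken out. A minimal sketch with purely illustrative figures:

```python
# Free cash flow bridge with hypothetical, illustrative numbers.
# Operating cash flow already reflects taxes and working-capital changes,
# so capex must be subtracted exactly once.

ebitda = 500.0            # earnings before interest, taxes, D&A
taxes_paid = 80.0
change_in_nwc = 30.0      # increase in net working capital (a cash outflow)
capex = 120.0

operating_cash_flow = ebitda - taxes_paid - change_in_nwc   # 390.0
free_cash_flow = operating_cash_flow - capex                # capex deducted here, once

# The early-model bug: subtracting capex again after already deducting it.
buggy_fcf = free_cash_flow - capex

print(free_cash_flow)  # 270.0
print(buggy_fcf)       # 150.0 -- understates FCF by a full capex charge
```

The prose explanation and the buggy arithmetic could coexist in one answer, which is exactly why the output looked like analysis while failing a spreadsheet audit.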
The same was true for multi-step reasoning. A model might identify operating leverage as a key driver of margin expansion in one paragraph, then assume flat margins in its valuation inputs two paragraphs later. If you asked it to stress test assumptions, it would often change one variable while forgetting that others were mechanically linked. These issues reflected a core limitation of early models when asked to reason numerically over multiple steps.
What has improved since then is not raw mathematical intelligence, but rather architecture and workflow design. Serious users no longer ask GenAI to compute intrinsic value directly. Instead, the model now acts as an orchestrator. It pulls structured financials from trusted data sources, runs calculations in external tools or code environments, checks outputs against constraints, and then explains what changed and why. The math happens outside the model. The model supervises, interrogates, and contextualizes it.
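The orchestration pattern described above can be sketched in a few lines. All function and variable names here are hypothetical, and the "model" role is reduced to two deterministic steps for illustration: validate inputs against constraints, then delegate the arithmetic to an external calculator rather than producing numbers itself.

```python
# Sketch of the orchestrator pattern: the model never does the arithmetic.
# It routes structured inputs to a deterministic tool and applies guardrails.
# All names and thresholds are illustrative assumptions, not a real API.

def npv(cash_flows, discount_rate):
    """External math tool: discount year-1..N cash flows to present value."""
    return sum(cf / (1 + discount_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))

def check_constraints(cash_flows, discount_rate):
    """Guardrails applied before any output is trusted."""
    assert 0 < discount_rate < 0.30, "discount rate outside plausible range"
    assert len(cash_flows) >= 1, "need at least one forecast year"

def orchestrate_valuation(cash_flows, discount_rate):
    # 1. Supervise: validate inputs (the model's checking role).
    check_constraints(cash_flows, discount_rate)
    # 2. Delegate: math happens in the external tool, not the model.
    value = npv(cash_flows, discount_rate)
    # 3. Contextualize: return the number alongside its assumptions.
    return {"npv": round(value, 2), "rate": discount_rate, "years": len(cash_flows)}

result = orchestrate_valuation([100, 110, 121], 0.10)
print(result)  # {'npv': 272.73, 'rate': 0.1, 'years': 3}
```

The point of the pattern is separation of concerns: the discounting lives in audited code, so the model's contribution shifts from computing to interrogating and explaining.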
This distinction matters in practice. Consider a real-world scenario like analyzing a capital-intensive industrial company. A few years ago, asking GenAI to model maintenance versus growth capex was an invitation for confusion. Today, the model can help identify which disclosures matter, extract capex language from multiple filings, build scenarios where reinvestment assumptions differ, and then interpret how those differences affect free cash flow durability. The investor still decides which scenario is realistic, but the time to reach that decision collapses.
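The scenario step is simple enough to show directly. A minimal sketch, with hypothetical figures and labels, of how different maintenance/growth capex splits change reported versus "durable" free cash flow (treating growth capex as discretionary):

```python
# Illustrative capex scenario comparison. All figures are hypothetical.
# Reported FCF deducts all capex; "durable" FCF deducts only maintenance
# capex, isolating what the business generates before discretionary growth.

operating_cash_flow = 400.0

scenarios = {
    # label: (maintenance_capex, growth_capex)
    "lean_reinvestment":  (90.0,  30.0),
    "heavy_reinvestment": (90.0, 160.0),
}

results = {}
for label, (maintenance, growth) in scenarios.items():
    fcf = operating_cash_flow - maintenance - growth
    durable_fcf = operating_cash_flow - maintenance
    results[label] = (fcf, durable_fcf)
    print(f"{label}: fcf={fcf:.0f}, durable_fcf={durable_fcf:.0f}")
```

Both scenarios show the same durable FCF; only the reinvestment choice differs. That is the interpretive question the model can frame quickly, while the investor still judges which reinvestment path is realistic.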
Another concrete improvement is temporal coherence. Earlier models struggled with long corporate histories. If a company promised disciplined capital allocation five years ago, pivoted to empire building three years later, and now claims renewed focus on returns, older models often treated each statement in isolation. Today, models can track language shifts across years of earnings calls, flag when “temporary” margin pressure becomes permanent rhetoric, and surface inconsistencies that a human analyst might miss simply due to fatigue. This does not produce insight by itself, but it sharpens investigative leverage.
Yet critical limitations remain, and they show up precisely where investing becomes difficult. Take causality as an example: a model can describe why margins fell, citing input costs, competition, or pricing pressure, but it does not truly understand which forces dominate or how they interact dynamically.
The same problem appears in regime shifts. Models trained on decades of data struggle when historical relationships break. When capital markets tighten, when governance norms shift, or when competitive moats erode faster than expected, GenAI tends to extrapolate rather than adapt. It can articulate risks, but it does not feel the asymmetry of outcomes. It does not know when a situation is uninvestable versus merely uncomfortable.
Most importantly, GenAI cannot bear responsibility. Concentrated fundamental investing often requires acting before the evidence is clean. Investors buy when numbers look ugly, narratives are hostile, and timing feels wrong. These decisions are not optimization problems. They are commitments made under uncertainty with real consequences. GenAI can generate scenarios, but it cannot choose one and live with it.
For GenAI to truly replace fundamental investors, several things would need to change. Models would need stable, auditable reasoning across long horizons, not just convincing outputs at a point in time. They would need feedback loops tied to realized investment outcomes. They would also need to internalize incentives, governance failures, and behavioral reactions in a way that generalizes across cycles. Finally, they would need institutional structures that allow them to take responsibility for decisions, something that is as legal and organizational as it is technical.
Will this happen? Over a long enough horizon, parts of it likely will. Investment processes are not sacred. But public market investing is shaped by human behavior, reflexivity, and narrative shifts that resist full formalization.
In the meantime, the impact of GenAI is already tangible. It dramatically lowers the cost of first-pass research. It accelerates variant perception by forcing assumptions into the open. It improves analytical hygiene by exposing inconsistencies that humans often rationalize away.
The right framing today is not whether GenAI can invest. It is whether investors who ignore GenAI will be outcompeted by those who use it as a disciplined amplifier. GenAI has raised the floor of analysis meaningfully. It has not lowered the ceiling of judgment. And in markets where the payoff goes to those willing to be decisively right when consensus is unsure, that distinction still defines the edge.