From NLP to Hype and Financial Bubbles: Integrating News Attention with Bubble Detection Models
June 9, 2026

In 2017, eight scientists from the Google research team published the remarkable article “Attention Is All You Need” in the proceedings Advances in Neural Information Processing Systems, introducing the Transformer neural network architecture. The paper has been cited over 173,000 times and ranks among the top 100 most cited papers of the 21st century. It builds on the attention principle introduced in 2014 by Bahdanau, Cho and Turing Award winner Bengio, who proposed neural machine translation by jointly learning to align and translate. The Transformer approach has become the main architecture for a wide variety of AI tasks, including large language models. In machine learning, “attention” refers to a mechanism that allows models to focus on specific parts of the input data during the learning process and to determine the relative importance of each component within a sequence.

Turning to financial economics, financial news (whether measured by volume, unusual frequency, or sentiment, that is, positive versus negative tone) has long attracted the attention of researchers seeking to forecast market dynamics, as the adage “buy on rumors, sell on news” suggests. Financial bubbles, however, remain among the most challenging phenomena to model and trade. Traditional models relying solely on price dynamics often fail to detect bubbles in real time, a key objective for stock picking and portfolio selection. Advances in natural language processing (NLP) now enable researchers to quantify market attention and sentiment using financial news and social media activity.

This paper builds on recent research on sentiment in financial markets and integrates these insights into quantitative bubble detection models derived from the Log-Periodic Power Law (LPPL) literature, while incorporating a Hype Index that measures disproportionate news attention at a given moment, in order to obtain a hype-adjusted view of speculative dynamics. 

Within this framework, sentiment and news intensity modify bubble scores derived from price dynamics. The resulting Hyped Log-Periodic Power Law (HLPPL) model improves the identification of emerging bubbles and enables the detection of negative bubbles, corresponding to temporarily undervalued assets. The approach further highlights the importance of the choice of numéraire with respect to which prices are expressed (e.g., gold versus the dollar), emphasizing that bubbles must be assessed relative to a chosen reference asset.

Empirical illustrations across equities and cryptocurrencies show how media attention and narrative amplification interact with price dynamics during speculative episodes. Taken together, these results suggest that incorporating information flows and market narratives can significantly improve the early detection and interpretation of financial bubbles. 

Introduction

Financial bubbles have long attracted the attention of economists, policymakers, and market participants because of their profound impact on financial stability and economic cycles. Historical episodes include the Dutch Tulip Mania of the seventeenth century and the South Sea Bubble in the UK—known as the world’s first financial crash—in which Isaac Newton himself is said to have lost the equivalent of 40 million pounds in today’s currency when South Sea Company stock fell by about 80% from its peak of around £1,000 in August 1720. More recent examples, including the dot-com boom of the late 1990s, the housing market bubble preceding the 2008 global financial crisis, and the rapid rise of technology and artificial intelligence-related equities, highlight both the recurring nature of speculative episodes and the persistent difficulty of recognizing bubbles before they burst.

A central challenge in the study of bubbles is that speculative dynamics often appear indistinguishable from strong fundamental growth while they are unfolding. Traditional approaches to bubble detection, therefore, rely heavily on price dynamics, attempting to identify statistical signatures such as super-exponential growth or accelerating volatility patterns. Among these approaches, the LPPL framework, introduced by Sornette et al. (1996), aims to predict the “critical date” at which the bubble will burst. This framework has been particularly influential in modeling the characteristic price patterns observed during bubble regimes.

However, price trajectories alone may not fully capture the mechanisms that drive speculative dynamics. Financial markets are strongly influenced by information flows, narratives, and investor attention. News coverage, social media discussions, and market commentary can amplify optimistic expectations and reinforce positive feedback loops among investors. As a result, speculative episodes are often accompanied by rapid increases in media attention and narrative intensity.

Recent advances in NLP make it possible to quantify these information flows systematically. By analyzing large corpora of financial news and textual data, researchers can construct measures of market sentiment, narrative intensity, and media attention. These signals provide a new source of information that complements traditional price-based indicators.

The article builds on three strands of recent research that combine natural language processing techniques with quantitative finance: NLP-based measures of investor sentiment, a Hype Index, and a quantitative bubble detection model derived from the LPPL framework. Together, these approaches provide a framework for integrating price dynamics with systematic measures of market attention and sentiment extracted from textual data.

The central idea is that news attention can amplify the self-reinforcing feedback loops that characterize financial bubbles; in a different setting, bank runs arise from a similar mechanism, as described, for instance, in the model proposed by the Nobel Prize winners Diamond and Dybvig. When rising prices attract media coverage and investor interest, this attention can in turn reinforce further price increases, creating a narrative-driven amplification mechanism that accelerates speculative dynamics. Hence, incorporating measures of sentiment and hype into bubble diagnostics provides a more comprehensive view of market behavior than price dynamics alone.

In this sense, the analysis connects the literature on speculative bubbles and critical phenomena in financial markets with the recent advances in machine learning and natural language processing applied to financial text data. 

Market Sentiment and Volatility Prediction

In the paper titled “A Sentiment Analysis Approach to the Prediction of Market Volatility,” Deveikyte et al. (2022) explore the predictive power of sentiment analysis on financial market behavior, focusing in particular on the FTSE 100. More precisely, the authors investigate whether sentiment extracted from financial news headlines and Twitter posts can forecast next-day market returns and volatility. To process the textual data, they employ NLP techniques, including sentiment scoring and Latent Dirichlet Allocation (LDA) for topic modeling. In NLP, LDA is a generative statistical model in which documents are represented as random mixtures of a small number of latent topics, and each topic is characterized by a probability distribution over words. The resulting features are then used as inputs into a logistic regression classifier to predict the direction of market volatility.

Cao et al. (2025) further underscore the importance of information flows and market narratives in shaping asset price dynamics. They introduce a Hype-Adjusted Probability Measure, which incorporates sentiment extracted from financial news into traditional probabilistic frameworks used in financial modeling, where “hype” captures the effect of excessive news circulating in both general and financial media.

The central idea is that investor beliefs are influenced not only by price movements and fundamental information, but also by the tone and intensity of media coverage surrounding an asset. Using NLP techniques, financial news can be analyzed to quantify positive and negative narratives associated with a company or sector. Incorporating these signals into probabilistic frameworks allows models to capture how narratives and sentiment influence expectations about future price dynamics. 

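A classical definition of the daily sentiment score was proposed by Gabrovsek et al. (2016) as:

$$\mathrm{Sent}_d = \frac{N_d(\mathrm{pos}) - N_d(\mathrm{neg})}{N_d(\mathrm{pos}) + N_d(\mathrm{neu}) + N_d(\mathrm{neg}) + 3}$$

where $N_d(\mathrm{pos})$, $N_d(\mathrm{neg})$, and $N_d(\mathrm{neu})$ represent the numbers of positive, negative, and neutral news headlines on day $d$.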
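To make the score concrete, the following is a minimal Python sketch. It assumes the smoothed form in which the difference between positive and negative counts is divided by the total count plus a constant of 3; the function name, the `smoothing` parameter, and the headline counts are illustrative and do not come from the cited papers.

```python
def daily_sentiment(n_pos: int, n_neg: int, n_neu: int, smoothing: float = 3.0) -> float:
    """Daily sentiment score from headline counts.

    Assumed form: (n_pos - n_neg) / (n_pos + n_neu + n_neg + smoothing).
    The smoothing constant keeps days with very few headlines close to zero.
    """
    if min(n_pos, n_neg, n_neu) < 0:
        raise ValueError("headline counts must be non-negative")
    return (n_pos - n_neg) / (n_pos + n_neu + n_neg + smoothing)


# Hypothetical counts for one trading day.
score = daily_sentiment(n_pos=12, n_neg=4, n_neu=9)
print(round(score, 4))
```

The score is bounded between -1 and 1, and a day with no headlines scores exactly zero rather than raising a division error.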

While sentiment measures capture the direction or tone of news coverage, they do not fully account for the intensity of media attention. An asset may receive overwhelmingly positive sentiment but relatively little coverage, or it may attract extremely high levels of attention regardless of sentiment. To address this distinction, Cao et al. (2025) introduce the Hype Index, a measure designed to quantify disproportionate media attention.

The Hype Index compares the frequency with which an asset is mentioned in financial news with a baseline reflecting its economic scale, typically measured through market capitalization. When an asset receives significantly more attention than would be expected based on its size, it can be described as “hyped.” Conversely, assets receiving relatively little attention may be considered under-hyped.

This concept provides a practical way to measure the imbalance between media attention and the firm’s fundamentals. Empirical evidence suggests that periods of extreme hype often coincide with rapid price appreciation, elevated volatility, and heightened speculative activity—features commonly associated with the formation of financial bubbles. 

Consequently, for bubble detection, the Hype Index provides a complementary signal to price-based diagnostics, allowing practitioners to incorporate information about market attention and narrative amplification alongside traditional price dynamics.
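The comparison between news attention and economic scale described above can be sketched in a few lines of Python. The ratio form used here (share of mentions divided by share of market capitalization) is an assumption chosen for illustration, not necessarily the definition used by Cao et al. (2025); the tickers and figures are hypothetical.

```python
def hype_index(mentions: dict, market_caps: dict) -> dict:
    """Ratio of each asset's share of news mentions to its share of market cap.

    A value above 1 means more attention than size alone would suggest ("hyped");
    a value below 1 means less ("under-hyped"). The ratio form is an assumption.
    """
    total_mentions = sum(mentions.values())
    total_cap = sum(market_caps.values())
    if total_mentions <= 0 or any(cap <= 0 for cap in market_caps.values()):
        raise ValueError("need at least one mention and strictly positive market caps")
    return {
        ticker: (mentions[ticker] / total_mentions) / (market_caps[ticker] / total_cap)
        for ticker in mentions
    }


# Hypothetical universe: mention counts and market caps (in billions).
mentions = {"AAA": 600, "BBB": 300, "CCC": 100}
caps = {"AAA": 200.0, "BBB": 600.0, "CCC": 200.0}
hype = hype_index(mentions, caps)
```

In this toy universe the asset "AAA" attracts 60% of mentions with 20% of market capitalization, so it would be flagged as hyped, while the two others receive less attention than their size suggests.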

Financial Bubbles Detection

A large body of financial economics literature has investigated bubbles, focusing on their definition, identification, and attempts to estimate their expected burst dates. One strand of this literature analyzes financial bubbles using only price trajectories. Among the most influential approaches is the LPPL framework developed by Didier Sornette and collaborators (Sornette et al., 1996). LPPL models capture the accelerating growth and oscillatory behavior often observed during speculative bubble phases, where prices exhibit super-exponential dynamics (as opposed to the geometric Brownian motion underlying standard option pricing models), accompanied by increasingly frequent fluctuations.

The LPPL framework models the price trajectory as approaching a critical time $t_c$ representing the theoretical end of the bubble regime. As the system approaches this critical point, price dynamics accelerate while oscillations become more frequent. The log-price in the LPPL model is expressed as follows:

$$\ln p(t) = A + B\,(t_c - t)^{m} + C\,(t_c - t)^{m} \cos\big(\omega \ln(t_c - t) - \phi\big)$$

where $A$ is the baseline log-price level, $B$ captures the super-exponential growth rate, $C$ determines the amplitude of log-periodic oscillations, $m \in (0, 1)$ is the critical exponent, $\omega$ is the log-periodic frequency, $\phi$ is the phase parameter, and $t_c$ denotes the critical time corresponding to the theoretical termination of the bubble regime.
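As an illustration, the standard LPPL log-price, a power-law trend in the time remaining to the critical time plus a log-periodic cosine term, can be evaluated with a short Python function. This is a sketch of the formula only, not of the calibration procedure; the parameter values are hypothetical and chosen so that the trend accelerates upward as the critical time approaches.

```python
import math


def lppl_log_price(t: float, tc: float, m: float, omega: float,
                   a: float, b: float, c: float, phi: float) -> float:
    """LPPL log-price: A + B*(tc - t)^m + C*(tc - t)^m * cos(omega*ln(tc - t) - phi)."""
    if not 0.0 < m < 1.0:
        raise ValueError("critical exponent m must lie in (0, 1)")
    dt = tc - t
    if dt <= 0.0:
        raise ValueError("the formula is only defined before the critical time (t < tc)")
    power = dt ** m
    return a + b * power + c * power * math.cos(omega * math.log(dt) - phi)


# Hypothetical parameters; time is measured in trading days.
params = dict(tc=500.0, m=0.5, omega=8.0, a=5.0, b=-0.02, c=0.002, phi=0.0)
path = [lppl_log_price(t, **params) for t in range(0, 500, 50)]
```

Because the cosine depends on the logarithm of the remaining time, the oscillations in `path` become more frequent as `t` approaches `tc`, which is the signature the model looks for.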

In the “volatility-confined” LPPL formulation (Lin, Ren, and Sornette, 2014), the price trajectory follows this super-exponential structure while the residual component remains mean-reverting. This property improves the stability of calibration and allows researchers to estimate bubble dynamics using rolling windows without incorporating future information, which would be undesirable.

Despite their success, price-based models face an important limitation: they rely solely on observed price dynamics and therefore do not incorporate information about media attention and investor sentiment, which often drive speculative behavior. In modern financial markets, information flows and narrative amplification can play a central role in reinforcing positive feedback loops among investors. 

As a result, price trajectories alone may be insufficient to detect emerging bubbles in their early stages. Incorporating measures of news attention and sentiment can therefore provide valuable complementary signals, helping to distinguish between price dynamics driven by fundamentals and those amplified by market narratives. 

HLPPL Model

To integrate price dynamics with information flows derived from textual data, we introduce the HLPPL framework. The objective is to combine the traditional LPPL bubble representation, a log-periodic power law for the log stock price that grows faster than the geometric Brownian motion underlying standard option pricing models, with signals capturing media attention (hype) and sentiment extracted from financial news using NLP.

The starting point of the approach is the concept of a bubble score, which measures the deviation between the observed market price and the price implied by the LPPL model fit. Formally, the bubble score is defined as:

$$\varepsilon(t) = \ln p(t) - \ln \hat{p}(t)$$

where $\ln p(t)$ denotes the observed log market price at time $t$, $\ln \hat{p}(t)$ represents the LPPL model-fitted log price, and $\varepsilon(t)$ captures the deviation between the two. This quantity measures the extent to which market prices diverge from the trajectory predicted by the bubble model.

Bubble scores provide a convenient way to quantify the strength of bubble dynamics in asset prices. When the observed price exceeds the LPPL fitted price, the bubble score is positive:

$$\varepsilon(t) > 0$$

Conversely, when the observed price falls below the LPPL fitted trajectory, the bubble score becomes negative:

$$\varepsilon(t) < 0$$

While positive scores correspond to classical bubble behavior, negative bubble scores capture situations where asset prices fall significantly below the trajectory implied by the LPPL structure. These cases can be interpreted as “negative bubbles,” corresponding to temporarily undervalued assets or accelerated downward price dynamics. 
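A minimal Python sketch of this computation follows, taking the bubble score to be the observed log price minus the LPPL-fitted log price. The function names and the sample numbers are illustrative; in practice the fitted values would come from a calibrated LPPL model.

```python
import math


def bubble_scores(prices: list, fitted_log_prices: list) -> list:
    """Deviation of the observed log price from the LPPL-fitted log price, per date."""
    if len(prices) != len(fitted_log_prices):
        raise ValueError("series must have the same length")
    return [math.log(p) - fit for p, fit in zip(prices, fitted_log_prices)]


def label(score: float) -> str:
    """Sign convention from the text: above the fit is positive, below is negative."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "on-trajectory"


# Hypothetical observed prices and fitted log prices for three dates.
observed = [100.0, 110.0, 95.0]
fitted = [math.log(100.0), math.log(105.0), math.log(102.0)]
scores = bubble_scores(observed, fitted)
labels = [label(s) for s in scores]
```

Working in logs means each score reads approximately as a percentage deviation from the fitted trajectory, which makes scores comparable across assets with very different price levels.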

However, price-based bubble scores alone may still fail to capture the influence of information flows and market narratives. To address this limitation, the HLPPL framework incorporates two additional signals derived from textual data:

$H_t$: the Hype Index, measuring the intensity of media attention,

$S_t$: the news sentiment signal, capturing the tone of news coverage.

The adjusted bubble scores combine the price-based bubble score with $H_t$ and $S_t$, using two weights that control the influence of the hype and sentiment signals. The Hype Index $H_t$ captures the level of news attention associated with the asset, while the sentiment measure $S_t$ reflects the polarity of media coverage.

In this formulation, hype and sentiment act as adjustment terms that modify the bubble score derived from price dynamics alone. When strong news attention and positive sentiment coincide with accelerating price dynamics, the adjusted bubble score increases, reinforcing the bubble signal. Conversely, when price movements occur without corresponding narrative amplification, the additional signals act as corrective buffers, reducing the likelihood of false bubble detection. 
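The mechanism can be illustrated with a short Python sketch. It assumes the simplest possible specification, an additive adjustment in which the price-based score is shifted by weighted hype and sentiment terms that have been centred so that zero means a normal news environment. The functional form, the weight names `alpha` and `beta`, and all numbers are assumptions made for illustration and are not taken from the HLPPL working paper.

```python
def adjusted_bubble_score(score: float, hype: float, sentiment: float,
                          alpha: float = 0.5, beta: float = 0.5) -> float:
    """Price-based bubble score shifted by weighted hype and sentiment terms.

    Assumption: a simple additive adjustment, score + alpha*hype + beta*sentiment,
    with hype and sentiment already centred so that 0 means "normal".
    """
    return score + alpha * hype + beta * sentiment


# Same price-based score, two hypothetical news environments.
amplified = adjusted_bubble_score(0.10, hype=0.8, sentiment=0.6)
unsupported = adjusted_bubble_score(0.10, hype=-0.1, sentiment=-0.1)
```

With identical price behavior, the first case is reinforced by strong attention and positive tone, while the second is pulled back toward zero because the move has no narrative support.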

By integrating LPPL-based diagnostics with NLP-derived measures of market attention and sentiment, the HLPPL framework provides a more comprehensive approach to detecting speculative dynamics in financial markets.

 

Empirical Examples

Empirical illustrations demonstrate how the integrated framework can identify bubble signals across multiple markets.

Semiconductor equities provide a useful example due to the rapid growth associated with artificial intelligence infrastructure. The SOXX index shows periods of accelerated growth in which price dynamics alone suggest potential bubble behavior.

Bubble Thresholds

Bubble scores alone do not automatically generate actionable signals. To convert scores into bubble signals, threshold values must be learned from historical data.

These thresholds are not constant. Instead, they depend on the asset, the time period, and the path of market dynamics. Machine learning techniques can be used to estimate optimal thresholds based on historical bubble episodes.

This adaptive approach allows the model to distinguish between normal periods of growth and true bubble dynamics.
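As a simple stand-in for the learned thresholds described above, the Python sketch below derives a time-varying threshold for one asset from a rolling quantile of its own past bubble scores, using only data available before each date. The quantile rule, the window length, and the sample scores are assumptions for illustration; a machine learning estimator trained on labeled bubble episodes would replace this rule in practice.

```python
from typing import Optional


def rolling_quantile_threshold(scores: list, window: int, q: float = 0.95) -> list:
    """For each date, the q-quantile of the previous `window` scores (no look-ahead)."""
    if window < 1 or not 0.0 < q < 1.0:
        raise ValueError("need window >= 1 and 0 < q < 1")
    thresholds: list = []
    for i in range(len(scores)):
        past = sorted(scores[max(0, i - window):i])
        if len(past) < window:
            thresholds.append(None)  # not enough history yet
            continue
        # Nearest-rank quantile on the sorted history.
        rank = min(len(past) - 1, int(q * len(past)))
        thresholds.append(past[rank])
    return thresholds


def bubble_signals(scores: list, thresholds: list) -> list:
    """Flag dates where the score exceeds its history-based threshold."""
    return [t is not None and s > t for s, t in zip(scores, thresholds)]


# Hypothetical bubble scores for one asset.
scores = [0.01, 0.02, -0.01, 0.03, 0.02, 0.15, 0.02]
thresholds: "list[Optional[float]]" = rolling_quantile_threshold(scores, window=4, q=0.75)
signals = bubble_signals(scores, thresholds)
```

Because each threshold is computed from the asset's own recent history, the same score can trigger a signal for a normally quiet asset and none for a persistently volatile one.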

Figure: Bubble threshold comparison, SPX Index vs. ORCL US Equity.

News and Sentiment

Financial bubbles are not driven solely by price dynamics. Narratives, investor attention, and media coverage often play a critical role in amplifying speculative behavior. The rapid diffusion of information through financial news and social media platforms creates feedback loops between price movements and market narratives. As a result, integrating news signals into quantitative models can significantly improve the detection of bubble dynamics.

In this framework, we incorporate news intensity and sentiment measures into the bubble detection process. These signals are extracted using natural language processing techniques applied to financial news articles and other textual data sources. Two types of signals are particularly relevant:

  • News attention (hype) – measured by the intensity with which a company or asset is mentioned in financial news relative to a baseline level of attention.

  • News sentiment – measured by the average polarity of news coverage, capturing whether the tone of reporting is positive, neutral, or negative.

The integration of these signals allows the bubble detection model to distinguish between price movements driven by fundamentals and those amplified by excessive market attention.

To illustrate the effect of news and sentiment signals, we examine the case of Oracle Corporation (ORCL). Figure X presents the news signal dynamics alongside the residual structure of the LPPL model both before and after incorporating news and sentiment adjustments.

The top plot shows the evolution of news signals, including publication counts and aggregated sentiment scores. Periods of elevated attention correspond to spikes in news coverage, which often coincide with major corporate announcements or market narratives.

The bottom subplots compare two residual structures:

  • Top panel: residuals from the LPPL model using price dynamics alone;

  • Bottom panel: residuals after adjusting for news and sentiment signals.

The inclusion of news and sentiment information introduces corrective buffers to the bubble detection process. When price movements are supported by strong news-driven narratives, the adjusted residuals reflect this amplification. Conversely, when price fluctuations occur without corresponding news support, the model dampens the bubble score.

This adjustment improves the robustness of bubble detection by reducing false positives that arise from short-term volatility or purely technical price movements.

More broadly, the integration of news signals highlights the role of information flows as a catalyst for speculative dynamics. Excessive media attention can accelerate the positive feedback loops that characterize financial bubbles, while negative news shocks may contribute to the rapid unwinding of speculative positions.

Change of Numéraire and Relative Bubbles

An important conceptual insight is that bubbles are inherently relative phenomena. Asset prices are always measured relative to a chosen numéraire, such as a currency, a bond, or a market index.

Changing the numéraire can reveal new perspectives on bubble dynamics. For example, analyzing the price of NVIDIA relative to gold or relative to the S&P 500 can produce different interpretations of whether a bubble exists. This perspective highlights that bubble detection should account for relative valuation rather than relying solely on absolute price levels.
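A change of numéraire amounts to dividing the asset price by the price of the reference asset before any bubble diagnostic is applied. The Python sketch below illustrates this with hypothetical dollar prices; the function name and the numbers are illustrative only.

```python
import math


def log_relative_price(asset_prices: list, numeraire_prices: list) -> list:
    """Log price of the asset expressed in units of the numéraire."""
    if len(asset_prices) != len(numeraire_prices):
        raise ValueError("series must have the same length")
    return [math.log(a / n) for a, n in zip(asset_prices, numeraire_prices)]


# Hypothetical dollar prices: the asset doubles, and so does the numéraire.
asset_usd = [100.0, 150.0, 200.0]
gold_usd = [2000.0, 3000.0, 4000.0]
in_gold = log_relative_price(asset_usd, gold_usd)
in_usd = [math.log(p) for p in asset_usd]
```

In this toy case the dollar series rises steadily while the gold-denominated series is flat, so a price-based diagnostic could flag accelerating growth in dollars and nothing at all in gold terms.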

Conclusion

This paper presents an integrated framework linking price-based bubble diagnostics with measures of market attention derived from textual data. Building on the LPPL framework, we introduce the HLPPL approach, which adjusts traditional bubble scores using signals capturing news intensity and sentiment extracted from financial news.

The central contribution of the framework is to combine quantitative price dynamics with information flows and market narratives. While LPPL-based diagnostics provide a structural description of bubble-like price trajectories, the addition of hype and sentiment signals allows the model to incorporate the role of media attention and narrative amplification in speculative episodes. This integration improves the robustness of bubble detection by strengthening signals when price dynamics and narratives reinforce one another, while reducing false positives when price movements occur without corresponding information flows.

The empirical illustrations presented in this paper demonstrate how the framework can be applied across different markets, including technology equities and cryptocurrencies. In particular, examples involving semiconductor equities, AI-related companies, and digital assets highlight how speculative dynamics often emerge alongside rapid increases in news coverage and narrative intensity.

More broadly, the analysis emphasizes that financial bubbles are not purely price-driven phenomena. They are shaped by feedback loops between prices, investor expectations, and information flows. Advances in natural language processing now make it possible to quantify these information flows systematically, providing new tools for studying speculative dynamics in financial markets.

The paper also highlights an important conceptual insight: bubbles are inherently relative phenomena, as asset prices are always evaluated relative to a chosen numéraire. Examining price dynamics under alternative numéraires can therefore reveal additional perspectives on speculative behavior and valuation dynamics.

Future research may further extend this framework to other asset classes (such as interest rates) and information sources. Potential directions include applications to commodities, macroeconomic indicators, and alternative data such as social media or prediction markets. As advances in artificial intelligence continue to improve the analysis of large textual datasets, integrating NLP-based signals with financial modeling may provide valuable new approaches for understanding market dynamics and identifying bubbles in the making.

References

Cao, Z., Geman, H. (2025). ‘A hype-adjusted probability measure for NLP stock return forecasting’. Frontiers in Artificial Intelligence.

Cao, Z., Geman, H., et al. ‘Identifying and Quantifying Financial Bubbles with the Hyped Log-Periodic Power Law Model’. Working paper.

Deveikyte, J., Geman, H., Piccari, C., and Provetti, A. (2022). ‘A sentiment analysis approach to the prediction of market volatility’. Frontiers in Artificial Intelligence.

Lin, L., Ren, R., Sornette, D. (2014). ‘The volatility-confined LPPL model’. International Review of Financial Analysis.

Vaswani, A., et al. (2017). ‘Attention is all you need’. Advances in Neural Information Processing Systems.
