Word2Prices: embedding central bank communications for inflation prediction

BIS Working Papers | No 1253 | 26 March 2025

Summary

Focus

Machine learning tools that transform texts into numbers, capturing their meaning and sentiment, expand the range of possible economic analyses. Central bank communications are particularly important: speeches, press conferences and other central bank texts contain information about the economy. But to evaluate this information for prediction tasks, machine learning tools need to decode texts in real time, using only what was available at each date. In this paper, we assess a lightweight machine learning model called Word2Vec, re-trained each quarter. The task is simple yet challenging: do central bank speeches help to predict inflation out of sample?

Contribution

The prominence of large language models has put a spotlight on the use of text as a source of data in economics. But these models are often not transparent about their knowledge cutoff date, which makes it hard to test their out-of-sample performance. In addition, large language models need significant resources to re-train from scratch. In contrast, a less powerful model, Word2Vec, can be re-trained from scratch at each date. We compare inflation forecasts based on Word2Vec, estimated with European Central Bank press conference remarks, with forecasts based on past values of inflation.

Findings

Texts processed by Word2Vec improve euro area core inflation forecasts multiple quarters ahead. This indicates that Word2Vec does capture relevant information. Large language models also improve forecasts, but they are likely subject to look-ahead bias. Traditional techniques commonly used in economics, eg counting specific words or measuring the overall sentiment of each text, yield a more limited improvement. We also show that this performance is not explained by the ECB's forecasts, indicating that Word2Vec extracts novel information.
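
The out-of-sample comparison described in this summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's specification: the regression form (lagged inflation plus one text-based measure), the expanding window, the function names and all parameter values are hypothetical. The data are simulated so that the text measure leads inflation by construction, which means the resulting numbers say nothing about the paper's actual findings.

```python
import random

def ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def rmse(errors):
    return (sum(e * e for e in errors) / len(errors)) ** 0.5

def out_of_sample_rmse(inflation, text_measure, horizon=1,
                       first_train=20, use_text=True):
    """Expanding-window forecasts of inflation[t + horizon] made at time t."""
    errors = []
    for t in range(first_train, len(inflation) - horizon):
        X, y = [], []
        # Use only pairs whose target was already observed at time t.
        for s in range(0, t - horizon + 1):
            row = [1.0, inflation[s]]
            if use_text:
                row.append(text_measure[s])
            X.append(row)
            y.append(inflation[s + horizon])
        beta = ols(X, y)
        x_now = [1.0, inflation[t]]
        if use_text:
            x_now.append(text_measure[t])
        forecast = sum(b * x for b, x in zip(beta, x_now))
        errors.append(inflation[t + horizon] - forecast)
    return rmse(errors)

# Simulated data (hypothetical): the text measure leads inflation by design.
random.seed(0)
T = 80
text = [random.gauss(0, 1) for _ in range(T)]
infl = [0.0]
for t in range(1, T):
    infl.append(0.5 * infl[t - 1] + 0.8 * text[t - 1] + random.gauss(0, 0.3))

rmse_ar = out_of_sample_rmse(infl, text, use_text=False)
rmse_text = out_of_sample_rmse(infl, text, use_text=True)
print("benchmark RMSE:", round(rmse_ar, 3))
print("with text RMSE:", round(rmse_text, 3))
```

The point of the design is that coefficients are re-estimated at every forecast date using only observations available at that date, so a lower error for the text-augmented model cannot come from future information.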


Abstract

Word embeddings are vectors of real numbers associated with words, designed to capture semantic and syntactic similarity between the words in a corpus of text. We estimate the word embeddings of the European Central Bank's introductory statements at monetary policy press conferences by using a simple natural language processing model (Word2Vec), based only on the information and model parameters available as of each press conference. We show that a measure based on such embeddings helps improve core inflation forecasts multiple quarters ahead. Other common textual analysis techniques, such as dictionary-based metrics or sentiment metrics, do not obtain the same results. The information contained in the embeddings remains valuable for out-of-sample forecasting even after controlling for the central bank inflation forecasts, which are an important input for the introductory statements.
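
As a rough illustration of the real-time constraint described above, the sketch below re-estimates word vectors at each date using only the statements available up to that date. It is not Word2Vec: to stay dependency-free, it swaps in simple co-occurrence count vectors as a stand-in for learned embeddings. The toy statements, dates, window size and function names are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy corpus of (date, text) pairs, made up purely for illustration.
STATEMENTS = [
    ("2020-01", "inflation remains subdued and growth is weak"),
    ("2020-04", "price pressures remain subdued while growth is weak"),
    ("2020-07", "inflation is rising and price pressures are rising"),
]

def cooccurrence_vectors(texts, window=2):
    """Map each word to counts of its neighbours (a crude embedding stand-in)."""
    vectors = defaultdict(Counter)
    for text in texts:
        tokens = text.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = (sqrt(sum(x * x for x in u.values()))
            * sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def real_time_vectors(statements):
    """Yield (date, vectors), where vectors use only texts dated <= date."""
    ordered = sorted(statements)
    for t, (date, _) in enumerate(ordered):
        past_texts = [text for _, text in ordered[: t + 1]]
        yield date, cooccurrence_vectors(past_texts)

for date, vecs in real_time_vectors(STATEMENTS):
    # Similarity between two words as measured with information up to `date`.
    print(date, round(cosine(vecs["inflation"], vecs["price"]), 3))
```

Because the vectors are rebuilt from scratch at every date, a word that first appears in a later statement has no representation earlier on, which is the property that rules out look-ahead bias.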

JEL classification: E31, E37, E58

Keywords: embeddings, inflation, forecasting, central bank texts