International Journal of Engineering and Management Research

  • Year: 2024
  • Volume: 14
  • Issue: 5

Sea MNF vs. LDA: Unveiling the Power of Short Text Mining in Financial Markets

1Tencent Inc., China

2Uber Technologies Inc., USA

3China Academy of Art, China

Abstract

The objective of this study is to construct a time series forecasting framework that incorporates textual features. Using text mining techniques, we extract topic and sentiment information from a large collection of futures-related news headlines, and we use these text-derived features as exogenous variables for prediction. This paper addresses two questions: why headlines rather than full articles, and why futures news rather than gold news. News headlines serve as summaries of the full articles and capture most of the essential information. Our approach also follows Li et al. [1,2,3,4,5], who used news headlines to extract topic and sentiment information. The choice of futures news over gold news is justified by the scarcity of crude oil news and by the established, complex correlations among the prices of futures such as gold, natural gas, and crude oil. Research by Sujit & Kumar (2011) suggests that gold price fluctuations can affect the WTI index, and the dependence of different countries on crude oil can influence their currency exchange rates, thereby affecting the purchasing power of gold. Villar & Joutz (2006) report that a 20% temporary shock to WTI has a 5% contemporaneous impact on natural gas prices [6,7,8,9].

We construct a daily topic strength index following the SeaMNF approach, which gives the probability that each headline belongs to each topic. The optimal number of topics is selected using Pointwise Mutual Information (PMI) scores. Because media outlets publish a large number of news articles each day, we compute the average weight of news as the topic strength for the day. The topic strength index for day t is defined as the sum of the weights of the first topic across all news articles published on that day [10,11,12,13,14,15].
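
The daily aggregation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `normalize` and `daily_topic_strength`, the input layout (one row of non-negative topic weights per headline), and the sample values are all assumptions made for illustration. Since the text mentions both an average and a sum of topic weights, the sketch returns both for each day.

```python
from collections import defaultdict


def normalize(weights):
    """Scale non-negative topic weights so they sum to 1 (a probability vector)."""
    total = sum(weights)
    if total == 0:
        return [0.0] * len(weights)
    return [w / total for w in weights]


def daily_topic_strength(headlines, topic=0):
    """Return {date: (sum, mean)} of one topic's weight over that day's headlines.

    `headlines` is an iterable of (date, raw_topic_weights) pairs; this layout
    is a hypothetical stand-in for the document-topic output of a topic model.
    """
    by_day = defaultdict(list)
    for date, raw in headlines:
        by_day[date].append(normalize(raw)[topic])
    return {d: (sum(ws), sum(ws) / len(ws)) for d, ws in sorted(by_day.items())}


# Hypothetical topic weights for three headlines and two topics.
headlines = [
    ("2024-01-02", [3.0, 1.0]),
    ("2024-01-02", [1.0, 1.0]),
    ("2024-01-03", [0.0, 2.0]),
]

index = daily_topic_strength(headlines, topic=0)
for day, (total, mean) in index.items():
    print(day, round(total, 3), round(mean, 3))
```

Here `topic=0` stands for the first topic. The resulting daily series could then be aligned with the price series by date and supplied to a forecasting model as an exogenous variable.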

Keywords

PSO-SVR Hybrid Model, Machine Learning, Uncertainty Sentiment, Empirical Asset Pricing