Structuring News Flow for Sentiment and Impact Analysis

Introduction

To make informed investment decisions, it is crucial to stay updated on the latest economic trends, market movements, individual companies, industries, and more. Political events, company announcements, regulatory changes, and other significant news must all be considered.

Large banks and hedge funds employ numerous people to process news streams and respond appropriately to events. Smaller companies and independent traders lack the resources to cover such a vast amount of news, forcing them to spend a significant portion of their time on this task.

Everyone invests a lot of time in this process, yet coverage remains limited, since it is impossible to review everything. The ability of large language models (LLMs) to effectively process textual information opens up new possibilities for significantly enhancing this process.

In this article, we discuss how we tackled this challenge for a private hedge fund.


Data Sources

When collecting data and preparing it for subsequent analysis, it is crucial to differentiate original sources from reprints, interpretations, opinions, etc. Data from original sources should carry more weight during subsequent processing and analysis.
  • Major News Providers
    Primary data sources with broad coverage, high quality, and fast delivery.
    Examples: Bloomberg, Reuters, etc.
  • News Wire Services
    Publish official press releases, providing primary source information.
    Examples: BusinessWire, PR Newswire, GlobeNewswire, etc.
  • Specialized Financial Publications
    Sources of analytical information, research reports, and opinions from influential figures.
    Examples: Wall Street Journal, The Economist, Financial Times, etc.
  • Financial Media Portals and Aggregators
    Offer editorial content and materials, gather data from various sources, and provide collaborative analytics and opinions.
    Examples: Yahoo Finance, Barron's, The Motley Fool, Investing.com, etc.
  • General News Aggregators
    Collect news from multiple sources and present it in one place for easy access.
    Examples: Google News, Flipboard, etc.
  • General News Services
    Broad coverage of news across various sectors, often including finance.
    Examples: CNN, BBC, ABC News, etc.
  • Social Media
    Platforms where news spreads quickly and investors can gauge public sentiment.
    Examples: Twitter, LinkedIn, Reddit, etc.

Information Collection Methods

  • API

    A primary method for working with paid services, which can use either push or pull models.

    For time-critical data processing, push might be preferred, depending on the service's implementation.
  • RSS Feeds

    A widespread method supported by most websites, allowing for quick reception of headlines and descriptions.

    It's very versatile and easy to implement, though it doesn't provide access to the full content (see the polling sketch after this list).
  • Social Networks APIs

    Most social networks provide programmatic interfaces for accessing content.

    These often have strict rate limits that prevent gathering large volumes of data, but they are excellent for targeted monitoring of specific news streams.
  • Web Crawling

    The least reliable and most time-consuming method. It is crucial to consider terms and conditions, as not all websites allow content scraping.

    Despite its challenges and limitations, this method is indispensable for achieving broad coverage, especially when finding rare information not covered by major news agencies.

    This is particularly important when dealing with news about small-cap companies.
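
As a sketch of the RSS approach, the snippet below polls feeds with the third-party feedparser library and yields only unseen entries. The feed URL is a hypothetical placeholder.

```python
# A minimal RSS polling sketch using the third-party feedparser library
# (pip install feedparser); the feed URL is a hypothetical placeholder.
import time
import feedparser

FEED_URLS = ["https://example.com/news/rss"]  # hypothetical feed
seen_ids: set[str] = set()

def poll_feeds():
    """Fetch each feed and yield entries not seen before."""
    for url in FEED_URLS:
        feed = feedparser.parse(url)
        for entry in feed.entries:
            uid = entry.get("id") or entry.get("link")
            if uid and uid not in seen_ids:
                seen_ids.add(uid)
                yield {
                    "title": entry.get("title", ""),
                    "summary": entry.get("summary", ""),  # often just a teaser
                    "published": entry.get("published", ""),
                    "link": entry.get("link", ""),
                }

while True:
    for item in poll_feeds():
        print(item["published"], item["title"])
    time.sleep(60)  # RSS is pull-based: poll at a modest interval
```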

Main Objectives of Information Collection and Processing

In the project described in this article, the client's primary objectives were:
  • Automation of News Sentiment Analysis for Individual Companies

    While there are existing solutions for sentiment analysis, using LLMs for this task offers new possibilities compared to traditional NLP algorithms.

    Beyond language tone (negative/neutral/positive), LLMs' semantic analysis capabilities allow for determining a news item's impact and significance and which specific business segment it affects (see the sketch after this list).
  • Automation of News Flow Analysis by Specific Segments
    Many specialized resources allow for effective filtering of news and events by broad categories, such as market, economy, and company. However, there's no convenient means for monitoring news by more specific segments.

    General news aggregators (like Google) offer broader filtering capabilities, but this forces users to sift through vast amounts of information irrelevant for investment purposes.
  • Development of an Interactive Analytical Tool
    To aid analysts in tracking major news events through custom semantic filters.
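
To illustrate the first objective, here is a minimal sketch of an LLM prompt that returns impact, significance, and the affected segment alongside tone. The schema is illustrative, and call_llm stands for any prompt-to-text helper around whatever chat-completion client is in use.

```python
# Sketch: LLM-based sentiment that goes beyond tone, returning impact,
# significance, and the affected business segment. call_llm is a hypothetical
# prompt -> text helper; the JSON schema is illustrative.
import json

def analyze_news(ticker: str, article: str, call_llm) -> dict:
    prompt = (
        f"Analyze this news about {ticker}. Reply with JSON keys: "
        '"sentiment" (negative/neutral/positive), "impact" (-5..5), '
        '"significance" (0..10), "segment" (affected business segment).\n'
        f"Article:\n{article}"
    )
    return json.loads(call_llm(prompt))
```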

Results of news flow analysis can be used for:
  1. Developing independent trading strategies
  2. Enhancing risk management systems for both automated and manual trading
  3. Accelerating the work of analysts, traders, and portfolio managers in finding new ideas and updating their understanding of the current situation


Technological Implementation Details


The news flow analysis process consists of the following main steps:
  1. Information Search and Retrieval
  2. Pre-processing of Individual News
  3. Analysis of News Groups: deduplication, filtering, identifying key events, determining interconnections
  4. Processing User Requests

Information Search and Retrieval

This step utilizes methods discussed in the "Information Collection Methods" section.

Key tasks include:

1. Prioritizing Information for Processing
Resource constraints make it impractical and economically inefficient to process all available information. Prioritization is essential due to the time factor, as urgent news must be processed first.

Effective strategies include:
  • Scoring potential significance using LLMs based on headlines and brief descriptions, which is faster than processing the full text and can save resources (see the sketch after this list).
  • Deduplication through semantic search (if the original news has already been processed, interpretations/reprints can be postponed).
  • Using market information to manage processing priorities. Significant price changes should prioritize information processing for those companies.
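
A minimal sketch of the headline-scoring strategy, assuming the OpenAI Python SDK; the model name, prompt, and threshold are illustrative choices, not the project's actual configuration.

```python
# Sketch: score headline significance with an LLM before deciding whether to
# fetch and process the full article. Requires OPENAI_API_KEY in the
# environment; model, prompt, and threshold are illustrative.
import json
from openai import OpenAI

client = OpenAI()

def score_headline(headline: str, description: str) -> dict:
    prompt = (
        "You are a financial news triage assistant. Given a headline and a "
        "short description, reply with JSON containing the keys "
        '"significance" (integer 0-10), "tickers" (list of symbols), and '
        '"urgent" (boolean).\n'
        f"Headline: {headline}\nDescription: {description}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(resp.choices[0].message.content)

# Only items scoring above a threshold get full-text processing.
verdict = score_headline(
    "Acme Corp receives FDA approval for lead drug candidate",  # hypothetical
    "Approval covers the US market; launch expected in Q3.",
)
if verdict["urgent"] or verdict["significance"] >= 7:
    pass  # enqueue for full processing
```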

2. Discovery of Additional, Unique Information
When potentially interesting events are detected, LLMs can generate additional search engine queries to expand the analysis context and improve processing results.
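
A sketch of such query expansion; call_llm again stands for any prompt-to-text helper.

```python
# Sketch: when a potentially interesting event is detected, ask an LLM for
# follow-up search-engine queries to widen the analysis context. call_llm is
# a hypothetical prompt -> text helper.
def expand_event_context(event_summary: str, call_llm) -> list[str]:
    prompt = (
        "An event of potential market significance was detected:\n"
        f"{event_summary}\n"
        "Propose up to 5 short search-engine queries that would surface "
        "background, prior coverage, or related filings. One query per line."
    )
    return [q.strip() for q in call_llm(prompt).splitlines() if q.strip()]
```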

Pre-processing Individual News

  • Determine the type of received information (e.g., official company press release, analysts' opinion, lawsuit, political news, economic news)
  • Identify main topics and keywords
  • Highlight significant content: dates, numerical indicators, key entities (companies, people, institutions, etc.)
  • Generate embeddings for subsequent semantic search based on the highlights

It's crucial to use appropriate embedding methods, as different models are optimized for different tasks. In this case, multiple embedding models were used: one for clustering and another for semantic search.
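
For illustration, a dual-model setup with sentence-transformers might look like this; the model choices are ours, not necessarily those used in the project.

```python
# Sketch of the dual-model setup described above: one embedding model for
# clustering, a different one for semantic search. Model names are
# illustrative choices (pip install sentence-transformers).
from sentence_transformers import SentenceTransformer

cluster_model = SentenceTransformer("all-MiniLM-L6-v2")           # compact, fast
search_model = SentenceTransformer("multi-qa-mpnet-base-dot-v1")  # retrieval-tuned

highlights = [
    "Acme Corp Q2 revenue up 18% YoY; raises full-year guidance",
    "Regulator opens antitrust probe into Acme Corp cloud unit",
]

cluster_vecs = cluster_model.encode(highlights, normalize_embeddings=True)
search_vecs = search_model.encode(highlights, normalize_embeddings=True)
```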

The information is stored in vector storage. Choosing the right vector storage implementation is vital, considering the need to store a large amount of metadata alongside the embeddings.

Effective handling of metadata is essential for search and filtering tasks.

Based on our experience, many popular vector databases struggle with this task, and performance drops significantly with relatively small data volumes (~50GB). While we don't publicly share our benchmarks, we are open to discussing our experiences upon request.
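
The article doesn't name the vector store used in the project; the sketch below uses Chroma purely to illustrate storing rich metadata alongside embeddings and filtering on it at query time.

```python
# Sketch of embeddings stored with metadata, plus a metadata-filtered query.
# Chroma (pip install chromadb) stands in for the actual store used.
import random
import chromadb

def fake_vec(dim: int = 384) -> list[float]:
    return [random.random() for _ in range(dim)]  # stand-in for a real embedding

client = chromadb.Client()
news = client.create_collection("news")

news.add(
    ids=["n1"],
    embeddings=[fake_vec()],
    documents=["Acme Corp Q2 revenue up 18% YoY"],
    metadatas=[{"ticker": "ACME", "type": "press_release",
                "published": "2024-05-01T13:30:00Z", "source": "example.com"}],
)

# Semantic search restricted by metadata: only press releases about ACME.
hits = news.query(
    query_embeddings=[fake_vec()],
    n_results=10,
    where={"$and": [{"ticker": "ACME"}, {"type": "press_release"}]},
)
```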

Analyzing News Groups

Topic Modeling

Necessary for structuring the collected information, which is crucial for subsequent analysis and for identifying the most actively discussed topics.

There are many algorithms for this task, such as LDA (Latent Dirichlet Allocation) and LDA + LLM combinations.

Our experience shows that iterative clustering of embeddings followed by LLM analysis for topic formation works best: it is more efficient and produces higher-quality topics.
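
A single-pass simplification of this approach is sketched below (the real pipeline clusters iteratively); KMeans and the prompt are illustrative, and call_llm is a hypothetical prompt-to-text helper.

```python
# Sketch: group embeddings with KMeans, then ask an LLM to name each cluster
# from sample headlines. A single-pass simplification of the iterative
# clustering described above.
import numpy as np
from sklearn.cluster import KMeans

def label_topics(embeddings: np.ndarray, headlines: list[str],
                 call_llm, n_topics: int = 20) -> dict[int, str]:
    labels = KMeans(n_clusters=n_topics, n_init=10,
                    random_state=0).fit_predict(embeddings)
    topics = {}
    for k in range(n_topics):
        sample = [h for h, lab in zip(headlines, labels) if lab == k][:10]
        topics[k] = call_llm(
            "Give a short topic name (max 6 words) for these headlines:\n"
            + "\n".join(sample)
        ).strip()
    return topics
```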

Deduplication

Essential for performance, as there's no point in processing the same information multiple times.

However, it's crucial not to lose information: different articles may present the same facts but draw opposite conclusions, which matters for subsequent analysis. Tracking publication timestamps helps identify the original source of information.

Deduplicated information should retain all of its sources so that analysts can trace everything back. The number and quality of sources also serve as an additional criterion of an event's significance, analogous to the PageRank used by search engines.
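
A minimal sketch of source-preserving deduplication over normalized embeddings; the similarity threshold is illustrative and needs tuning.

```python
# Sketch: semantic deduplication that never discards a source. Items whose
# embeddings exceed a cosine-similarity threshold merge into one event whose
# source list keeps growing; the earliest timestamp marks the likely original.
import numpy as np

THRESHOLD = 0.92  # illustrative
events: list[dict] = []  # each: {"vec", "sources", "first_seen"}

def add_item(vec: np.ndarray, source: str, published: str) -> dict:
    vec = vec / np.linalg.norm(vec)
    for ev in events:
        if float(ev["vec"] @ vec) >= THRESHOLD:   # near-duplicate of known event
            ev["sources"].append(source)          # retain every source
            ev["first_seen"] = min(ev["first_seen"], published)  # ISO-8601 sorts lexically
            return ev
    ev = {"vec": vec, "sources": [source], "first_seen": published}
    events.append(ev)
    return ev
```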

Updating the Financial Knowledge Graph

This graph models the interconnections of all collected information. It improves the algorithmic efficiency of search and information processing and allows LLMs to automatically seek out the information needed to analyze an event and its potential impact.
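
A sketch of such an update, with networkx standing in for a production graph store; the node kinds and relation names are an illustrative schema, not the project's.

```python
# Sketch of a knowledge-graph update. Entities become nodes, LLM-extracted
# relations become typed edges, and each event links to what it mentions.
import networkx as nx

G = nx.MultiDiGraph()

def add_event(event_id: str, entities: list[str],
              relations: list[tuple[str, str, str]]) -> None:
    """relations: (subject, predicate, object) triples extracted by the LLM."""
    G.add_node(event_id, kind="event")
    for e in entities:
        G.add_node(e, kind="entity")
        G.add_edge(event_id, e, rel="mentions")
    for subj, pred, obj in relations:
        G.add_edge(subj, obj, rel=pred, evidence=event_id)

add_event("evt-001", ["Acme Corp", "FDA"],
          [("FDA", "approved_product_of", "Acme Corp")])
# Traversing a company's neighborhood then surfaces related events and the
# channels through which impact may propagate.
```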

Processing User Requests

Request Analysis

Using LLMs to determine the request type, normalize it from free-text form, and identify key entities (e.g., whether the request pertains to a specific company or to multiple companies).
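
A sketch of request normalization; the request types and JSON schema are illustrative, and call_llm is a hypothetical prompt-to-text helper.

```python
# Sketch: normalize a free-text analyst request into a typed structure that
# downstream steps can act on. Schema and request types are illustrative.
import json

def analyze_request(text: str, call_llm) -> dict:
    prompt = (
        "Classify this analyst request. Reply with JSON keys: "
        '"request_type" (company_news | segment_monitor | comparison | '
        'event_impact), "tickers" (list), "segments" (list), '
        '"time_range" (ISO interval or null).\n'
        f"Request: {text}"
    )
    return json.loads(call_llm(prompt))

# e.g. "What moved mid-cap lithium miners this week?" might normalize to
# {"request_type": "segment_monitor", "tickers": [], "segments": ["lithium mining"], ...}
```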

Analysis Plan Creation

Identify the necessary information, comparisons, and conclusions, and how the result will be presented. The analysis plan is formed based on the request type using LLMs.

Searching Relevant Information and Filling the Context

Generate queries to the knowledge graph based on the analysis plan using LLMs. The LLM is provided with context about the types of entities and relations in the knowledge graph.
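
A sketch of schema-aware query generation; the query language depends on the chosen store, and Cypher appears here purely as an example.

```python
# Sketch: let the LLM translate an analysis-plan step into a graph query,
# given a schema description. Schema and query language are illustrative;
# call_llm is a hypothetical prompt -> text helper.
SCHEMA = """Nodes: Company(ticker), Event(id, date, type), Sector(name)
Edges: (Event)-[:MENTIONS]->(Company), (Company)-[:IN_SECTOR]->(Sector)"""

def plan_step_to_query(step: str, call_llm) -> str:
    return call_llm(
        f"Graph schema:\n{SCHEMA}\n"
        f"Write a single Cypher query for this analysis step:\n{step}"
    )
```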

Information Analysis

Since the volume of information extracted for analysis can exceed the LLM's context size, an iterative analysis process is necessary; this should be accounted for during plan formation. The process can be two-pass: first compress the information to get a broad view, then conduct a detailed analysis of the most significant parts.
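
A sketch of this two-pass, map-reduce-style process; call_llm is a hypothetical prompt-to-text helper.

```python
# Sketch of the two-pass process: compress every document for a broad view,
# rank the summaries, then analyze only the most significant originals.
def two_pass_analysis(documents: list[str], question: str, call_llm,
                      top_k: int = 5) -> str:
    # Pass 1: compress each document so the whole set fits in context.
    summaries = [
        call_llm(f"Summarize in 3 sentences; keep figures and dates:\n{d}")
        for d in documents
    ]
    ranking = call_llm(
        f"Question: {question}\n"
        f"Return the indices of the {top_k} most relevant items, "
        "comma-separated:\n"
        + "\n".join(f"{i}: {s}" for i, s in enumerate(summaries))
    )
    idxs = [int(i) for i in ranking.replace(" ", "").split(",") if i.isdigit()]
    top = [i for i in idxs if i < len(documents)][:top_k]
    # Pass 2: detailed analysis over the originals that matter most.
    detail = "\n\n".join(documents[i] for i in top)
    return call_llm(f"Question: {question}\nAnalyze in detail:\n{detail}")
```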

Generating a User Response

The LLM generates a response for the user based on the analysis results. The response format and content should align with the user's goals identified during request analysis.

Key Results

To meet the objective, we needed to cover approximately 4,000 different tickers traded on NYSE and NASDAQ. The system processes information from over 11,000 unique sources to achieve this.

The primary economic benefit of implementing such a system is the accelerated work of analysts and portfolio managers. The cost of implementing and maintaining this system is significantly lower than analysts' salaries, and its implementation drastically improves their work speed and quality. Unfortunately, we lack sufficient data for a statistically significant study, so we rely solely on subjective assessments here.