May 15, 2025 8:45:48 AM

Machine Learning Approach to Trading with Twitter-Based Sentiment Data


At Context Analytics, our S-Factor Feed is the cornerstone of our product suite—our flagship and longest-standing data stream. Over time, we’ve demonstrated its value across use cases, from isolating the impact of individual sentiment factors to studying combinations of signals using static thresholds or quintile bucketing.

These approaches are useful for understanding how sentiment factors behave, but the most common and practical use of our S-Factor data feed is as a direct input to machine learning-based trading models, specifically as complementary features alongside traditional quantitative indicators such as momentum, P/E ratios, and trading volume.

In this post, we walk through an example of that integration by combining common momentum features with the S-Factor sentiment data to train a machine learning model that predicts stock returns, builds a daily portfolio, and evaluates performance against the S&P 500 (SPY).

Modeling Framework: Momentum Meets Sentiment

For this research, we constructed a daily close-to-close equity strategy using an XGBoost classification model, a gradient-boosted tree ensemble in which each new tree is trained to correct the errors of the trees before it.

Standard Price-Based Features

We began with classic momentum signals for each security:

1-Day Momentum
3-Day Momentum
5-Day Momentum

These serve as baseline indicators of price trends and are well-established inputs in systematic strategies.
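
As an illustration, k-day momentum can be computed as the percentage change in the closing price over the last k trading days. The sketch below is not the code used in this research; the column names (date, ticker, close) and the function name are assumptions made for the example.

```python
import pandas as pd

def add_momentum_features(prices: pd.DataFrame, windows=(1, 3, 5)) -> pd.DataFrame:
    """Append k-day close-to-close momentum columns, computed per ticker.

    Assumes one row per ticker per trading day with hypothetical
    columns 'date', 'ticker', and 'close'.
    """
    out = prices.sort_values(["ticker", "date"]).copy()
    closes = out.groupby("ticker")["close"]
    for k in windows:
        # Close today divided by the close k trading days ago, minus 1.
        # Shifting within each ticker avoids leaking prices across securities.
        out[f"mom_{k}d"] = out["close"] / closes.shift(k) - 1
    return out

# Toy data for two hypothetical tickers
prices = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-05", "2022-01-06"] * 2),
    "ticker": ["AAA"] * 4 + ["BBB"] * 4,
    "close": [100.0, 102.0, 101.0, 104.0, 50.0, 49.0, 51.0, 52.0],
})
features = add_momentum_features(prices)
```

The first k rows of each ticker have no momentum value (NaN), since there is no close k days earlier to compare against.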

Feature Engineering with the S-Factor Feed

The S-Factor Feed delivers 15 sentiment fields at the 15:40 ET timestamp, capturing market sentiment over the preceding 24 hours. We incorporated both base features from this feed and engineered derivatives to extract as much signal as possible:

Base S-Factor Features Used:

raw-s, raw-s-mean, raw-volatility
s-score
sv-mean, sv-volatility, sv-score
s-dispersion

Engineered Sentiment Features:

24-Hour Deltas: s-score, s-volume, sv-score, s-buzz
3-Day Rolling Statistics: mean and standard deviation of s-score, s-volume, sv-score
3-Day Momentum: s-score
Percentile Ranks: s-score, sv-score, s-buzz
Composite Metrics:
- buzz * s-score
- s-score / volume (Sentiment Volume Ratio)
- s-score / dispersion (Dispersion-Adjusted Sentiment)

These features showcase the depth and flexibility available through feature engineering with our data. The granularity of the S-Factor feed provides ample opportunity to customize sentiment inputs to fit any strategy design, time horizon, or model architecture.
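
The engineered features listed above could be derived along the following lines. This is a sketch under several assumptions, not the production pipeline: one 15:40 ET snapshot per ticker per day, underscore column names (s_score, s_volume, sv_score, s_buzz, s_dispersion), 3-day momentum taken as a simple difference, percentile ranks computed cross-sectionally within each date, and "volume" in the ratio read as s-volume.

```python
import numpy as np
import pandas as pd

def engineer_sentiment_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive deltas, rolling stats, ranks, and composites from daily snapshots."""
    out = df.sort_values(["ticker", "date"]).copy()
    by_ticker = out.groupby("ticker")

    # 24-hour deltas: change versus the previous day's snapshot
    for col in ["s_score", "s_volume", "sv_score", "s_buzz"]:
        out[f"{col}_delta_1d"] = by_ticker[col].diff()

    # 3-day rolling mean and standard deviation, per ticker
    for col in ["s_score", "s_volume", "sv_score"]:
        out[f"{col}_mean_3d"] = by_ticker[col].transform(lambda s: s.rolling(3).mean())
        out[f"{col}_std_3d"] = by_ticker[col].transform(lambda s: s.rolling(3).std())

    # 3-day sentiment momentum (assumed here to be a 3-day difference)
    out["s_score_mom_3d"] = by_ticker["s_score"].diff(3)

    # Percentile ranks (assumed cross-sectional: ranked against other tickers that day)
    by_date = out.groupby("date")
    for col in ["s_score", "sv_score", "s_buzz"]:
        out[f"{col}_pct_rank"] = by_date[col].rank(pct=True)

    # Composite metrics; zero denominators become NaN instead of +/-inf
    out["buzz_x_s_score"] = out["s_buzz"] * out["s_score"]
    out["sent_volume_ratio"] = out["s_score"] / out["s_volume"].where(out["s_volume"] != 0)
    out["disp_adj_sent"] = out["s_score"] / out["s_dispersion"].where(out["s_dispersion"] != 0)
    return out

# Toy snapshots for two hypothetical tickers
snapshots = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-05", "2022-01-06"] * 2),
    "ticker": ["AAA"] * 4 + ["BBB"] * 4,
    "s_score": [0.5, 1.0, -0.5, 2.0, -1.0, 0.0, 1.0, 0.5],
    "s_volume": [10.0, 20.0, 0.0, 40.0, 5.0, 8.0, 12.0, 9.0],
    "sv_score": [0.2, 0.4, -1.0, 1.5, -0.3, 0.1, 0.6, 0.2],
    "s_buzz": [1.0, 1.5, 0.8, 2.0, 0.9, 1.1, 1.3, 1.0],
    "s_dispersion": [0.5, 0.25, 0.5, 1.0, 0.4, 0.5, 0.8, 0.5],
})
sentiment_features = engineer_sentiment_features(snapshots)
```

Tree-based models such as XGBoost accept missing values natively, so the NaNs produced by the warm-up rows and zero denominators can be left in place rather than imputed.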

Model Training and Evaluation

We trained the XGBoost classifier to classify the next day's close-to-close return as:

Positive (1) if return > 0
Non-Positive (0) otherwise
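
A minimal sketch of this labeling step, assuming the same hypothetical price table (date, ticker, close) used for illustration elsewhere. The one detail worth noting is that the most recent row of each ticker has no next-day close yet, so it is dropped rather than labeled 0.

```python
import pandas as pd

def add_next_day_label(prices: pd.DataFrame) -> pd.DataFrame:
    """Label each row 1 if the next close-to-close return is > 0, else 0."""
    out = prices.sort_values(["ticker", "date"]).copy()
    next_close = out.groupby("ticker")["close"].shift(-1)
    out["next_ret"] = next_close / out["close"] - 1
    # A return of exactly zero counts as non-positive (0)
    out["label"] = (out["next_ret"] > 0).astype(int)
    # Rows without a next-day close have an unknown outcome: drop them
    return out.dropna(subset=["next_ret"])

prices = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-05", "2022-01-06",
                            "2022-01-03", "2022-01-04", "2022-01-05"]),
    "ticker": ["AAA"] * 4 + ["BBB"] * 3,
    "close": [100.0, 102.0, 101.0, 104.0, 50.0, 50.0, 51.0],
})
labeled = add_next_day_label(prices)
```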

Training & Tuning

Training period: December 1, 2011 to January 1, 2022
Tuning method: Randomized Search Cross-Validation using scikit-learn, across various XGBoost hyper-parameter combinations

Evaluation Results

When inspecting the classification report on the test set, we found:

The model performs very well at identifying positive returns, with a recall of 0.93 for the positive class
However, it does a poor job on non-positive returns

This means the model is biased toward predicting positive returns, so its hard class labels are unsuitable for picking both the long and short legs of a long-short portfolio. Because recall for the positive class is very strong, however, we can still use the model to identify likely winners.
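
To make the per-class recall concrete, here is a small sketch on made-up labels (not the model's actual predictions). Recall for a class is the share of its true members that the model labeled correctly, so a classifier that predicts "positive" almost everywhere scores high on class 1 and low on class 0.

```python
import numpy as np

def recall_per_class(y_true, y_pred) -> dict:
    """Recall for classes 0 and 1: correct predictions / actual members of the class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = {}
    for cls in (0, 1):
        actual = y_true == cls
        # Undefined (NaN) if the class never occurs in y_true
        recalls[cls] = float((y_pred[actual] == cls).mean()) if actual.any() else float("nan")
    return recalls

# Toy example of a classifier that leans heavily toward the positive class
y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 1, 0, 1, 1, 1, 0, 1])
recalls = recall_per_class(y_true, y_pred)
print(recalls)  # {0: 0.2, 1: 0.8}
```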

So instead of treating the classification as binary, we leveraged the model’s predicted probabilities directly to build our portfolio.

Building the Portfolio

Using the trained model, we built a daily long-only portfolio as follows:

Universe: S&P 500 constituents
Prediction timing: Daily before market close
Selection: Top 100 stocks by predicted probability of positive return
Weighting: Proportional to the predicted probability

This approach plays to the model’s strength—identifying positive return opportunities—while avoiding weaker downside predictions. By selecting the top 100 stocks and weighting them by predicted probability, allocation reflects the model’s confidence. The resulting portfolio is well-diversified, with weights forming a near-normal distribution centered around 1%, and a maximum of just 1.2%.
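
The selection and weighting rule can be sketched as follows, using randomly generated probabilities in place of real model output (the function name and the toy probability range are assumptions). With 100 holdings the average weight is exactly 1/100 = 1%, and because the top-ranked probabilities fall in a narrow band, individual weights stay close to that average.

```python
import numpy as np

def top_n_probability_weights(proba_up, n_select: int = 100) -> np.ndarray:
    """Long-only weights: top n_select names, weighted in proportion to probability."""
    proba_up = np.asarray(proba_up, dtype=float)
    n_select = min(n_select, proba_up.size)
    # Indices of the n_select highest predicted probabilities
    top_idx = np.argsort(proba_up)[::-1][:n_select]
    weights = np.zeros_like(proba_up)
    # Normalize so the selected weights sum to 1 (fully invested, no leverage)
    weights[top_idx] = proba_up[top_idx] / proba_up[top_idx].sum()
    return weights

# Toy probabilities for a 500-name universe
rng = np.random.default_rng(42)
proba_up = rng.uniform(0.45, 0.65, size=500)
weights = top_n_probability_weights(proba_up, n_select=100)
```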

Out-of-Sample Performance

Over the 3.5-year out-of-sample testing period, our model-driven portfolio demonstrated strong and consistent outperformance:

Cumulative Return:
+4% above the SPY benchmark, highlighting the model’s ability to generate excess returns even in a large-cap, competitive universe.
Risk-Adjusted Metrics:
- Sharpe Ratio: Improved relative to SPY, indicating better return per unit of total risk
- Sortino Ratio: Also improved relative to SPY, reflecting enhanced downside protection and return consistency during volatile periods
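
For reference, both ratios can be computed from a series of daily returns as sketched below. The conventions here are common ones but are assumptions about this study: a zero risk-free rate and target return, 252 trading days per year for annualization, and downside deviation measured over all days (with gains counted as zero).

```python
import numpy as np

TRADING_DAYS = 252  # assumed annualization factor

def sharpe_ratio(daily_returns, daily_rf: float = 0.0) -> float:
    """Annualized mean excess return divided by total volatility."""
    excess = np.asarray(daily_returns, dtype=float) - daily_rf
    return float(np.sqrt(TRADING_DAYS) * excess.mean() / excess.std(ddof=1))

def sortino_ratio(daily_returns, daily_target: float = 0.0) -> float:
    """Annualized mean excess return divided by downside deviation only."""
    excess = np.asarray(daily_returns, dtype=float) - daily_target
    # Only shortfalls below the target contribute to downside risk
    downside = np.minimum(excess, 0.0)
    downside_dev = np.sqrt(np.mean(downside ** 2))
    return float(np.sqrt(TRADING_DAYS) * excess.mean() / downside_dev)

# Toy daily returns, not results from the study
returns = np.array([0.01, -0.01, 0.02, -0.02, 0.015])
sharpe = sharpe_ratio(returns)
sortino = sortino_ratio(returns)
```

The Sharpe ratio penalizes upside and downside volatility alike, while the Sortino ratio penalizes only losses, which is why the two are usually reported together.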

This outperformance underscores not only the predictive power of combining momentum with sentiment, but also the robustness of the model across varied market conditions. The daily rebalanced strategy maintained a stable edge, validating both the modeling framework and the quality of engineered sentiment features.

Feature Insights

[Figure: feature importance chart]

Feature importance (via SHAP values and F Scores) revealed:

Momentum indicators as the strongest drivers
S-Factor fields like raw-volatility, s-buzz, raw-s-mean, and engineered deltas as additive signals

This confirms that sentiment data enhances traditional signals, providing incremental predictive power beyond common price-based factors.

Conclusion

This research underscores the versatility and strength of the S-Factor feed in trading applications. Whether through straightforward thresholds or deep integration into machine learning workflows, sentiment data from Context Analytics can amplify alpha and enable more informed, nuanced trading decisions.

Although this strategy operated on a daily cadence, the feature engineering techniques and modeling approach apply to intraday, weekly, or even monthly strategies.

Want to see how S-Factors can enhance your models?
Visit contextanalytics-ai.com to learn more or request a data trial.

