Machine Learning Approach to Trading with Twitter-Based Sentiment Data
At Context Analytics, our S-Factor Feed is the cornerstone of our product suite—our flagship and longest-standing data stream. Over time, we’ve demonstrated its value across use cases, from isolating the impact of individual sentiment factors to studying combinations of signals using static thresholds or quintile bucketing.
These approaches are useful for understanding how sentiment factors behave, but the most common and practical use case for our S-Factor data feed is integrating it directly into machine learning-based trading models, specifically as complementary features alongside traditional quantitative indicators such as momentum, P/E ratios, and trading volume.
In this post, we walk through an example of that integration by combining common momentum features with the S-Factor sentiment data to train a machine learning model that predicts stock returns, builds a daily portfolio, and evaluates performance against the S&P 500 (SPY).
Modeling Framework: Momentum Meets Sentiment
For this research, we constructed a daily close-to-close equity strategy using an XGBoost classification model, a tree-based ensemble machine learning method that learns sequentially by correcting prior prediction errors.
Standard Price-Based Features
We began with classic momentum signals for each security:
- 1-Day Momentum
- 3-Day Momentum
- 5-Day Momentum
These serve as baseline indicators of price trends and are well-established inputs in systematic strategies.
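As a minimal sketch of how such features might be computed, the snippet below treats N-day momentum as the trailing N-day simple return on the closing price. That definition, the column names (`date`, `ticker`, `close`), and the helper name `add_momentum` are assumptions for illustration, not the exact implementation used in this research.

```python
import pandas as pd

def add_momentum(prices: pd.DataFrame, windows=(1, 3, 5)) -> pd.DataFrame:
    """Append trailing N-day momentum columns, computed separately per ticker.

    Expects columns 'date', 'ticker', 'close' (hypothetical names).
    """
    out = prices.sort_values(["ticker", "date"]).copy()
    closes = out.groupby("ticker")["close"]
    for n in windows:
        # close[t] / close[t - n] - 1; the first n rows of each ticker are NaN
        out[f"mom_{n}d"] = out["close"] / closes.shift(n) - 1.0
    return out

# Toy prices for two made-up tickers
prices = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"] * 2),
    "ticker": ["AAA"] * 4 + ["BBB"] * 4,
    "close": [100.0, 110.0, 121.0, 133.1, 50.0, 49.0, 48.0, 47.0],
})
features = add_momentum(prices, windows=(1, 3))
print(features)
```

Grouping by ticker before shifting keeps one security's prices from leaking into another's momentum values.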
Feature Engineering with the S-Factor Feed
The S-Factor Feed delivers 15 sentiment fields at the 15:40 ET timestamp, capturing 24-hour market sentiment. We incorporated both base features from this feed and engineered derivatives to maximize the signal extraction:
Base S-Factor Features Used:
- raw-s, raw-s-mean, raw-volatility
- s-score
- sv-mean, sv-volatility, sv-score
- s-dispersion
Engineered Sentiment Features:
- 24-Hour Deltas: s-score, s-volume, sv-score, s-buzz
- 3-Day Rolling Statistics: mean and standard deviation of s-score, s-volume, sv-score
- 3-Day Momentum: s-score
- Percentile Ranks: s-score, sv-score, s-buzz
- Composite Metrics:
  - buzz * s-score
  - s-score / volume (Sentiment Volume Ratio)
  - s-score / dispersion (Dispersion-Adjusted Sentiment)
These features showcase the depth and flexibility available through feature engineering with our data. The granularity of the S-Factor feed provides ample opportunity to customize sentiment inputs to fit any strategy design, time horizon, or model architecture.
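The engineered features listed above could be derived along the following lines. This is a sketch under several assumptions: one 15:40 ET snapshot per ticker per trading day, underscore-style column names (`s_score`, `s_volume`, `sv_score`, `s_buzz`, `s_dispersion`), sentiment momentum taken as a simple 3-day difference, and percentile ranks taken cross-sectionally within each date. The function name and output column names are hypothetical.

```python
import numpy as np
import pandas as pd

def engineer_sentiment(df: pd.DataFrame) -> pd.DataFrame:
    """Add delta, rolling, rank, and composite sentiment features (illustrative)."""
    out = df.sort_values(["ticker", "date"]).copy()
    by_ticker = out.groupby("ticker")

    # 24-hour deltas: change versus the previous daily snapshot
    for col in ["s_score", "s_volume", "sv_score", "s_buzz"]:
        out[f"{col}_delta_1d"] = by_ticker[col].diff()

    # 3-day rolling mean and standard deviation
    for col in ["s_score", "s_volume", "sv_score"]:
        out[f"{col}_mean_3d"] = by_ticker[col].transform(lambda s: s.rolling(3).mean())
        out[f"{col}_std_3d"] = by_ticker[col].transform(lambda s: s.rolling(3).std())

    # 3-day sentiment momentum, assumed here to be a plain difference
    out["s_score_mom_3d"] = by_ticker["s_score"].diff(3)

    # Percentile ranks, assumed cross-sectional (across tickers on each date)
    for col in ["s_score", "sv_score", "s_buzz"]:
        out[f"{col}_pct_rank"] = out.groupby("date")[col].rank(pct=True)

    # Composite metrics; zero denominators become NaN instead of infinity
    out["buzz_x_score"] = out["s_buzz"] * out["s_score"]
    out["sent_volume_ratio"] = out["s_score"] / out["s_volume"].replace(0, np.nan)
    out["disp_adj_sent"] = out["s_score"] / out["s_dispersion"].replace(0, np.nan)
    return out

# Toy snapshots for two made-up tickers
dates = pd.date_range("2024-01-02", periods=4, freq="B")
sample = pd.DataFrame({
    "date": list(dates) * 2,
    "ticker": ["AAA"] * 4 + ["BBB"] * 4,
    "s_score": [1.0, 2.0, 0.0, 3.0, 0.5, -1.0, 1.5, 2.0],
    "s_volume": [10.0, 20.0, 0.0, 40.0, 5.0, 5.0, 5.0, 5.0],
    "sv_score": [0.2, 0.4, -0.1, 0.9, 0.0, 0.1, 0.3, 0.2],
    "s_buzz": [1.1, 1.5, 0.8, 2.0, 0.9, 1.0, 1.2, 1.0],
    "s_dispersion": [0.5, 0.4, 0.6, 0.5, 0.3, 0.3, 0.2, 0.4],
})
sentiment_features = engineer_sentiment(sample)
```

Tree-based models such as XGBoost can handle the NaN values that the rolling windows and guarded ratios produce, so no imputation step is shown here.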
Model Training and Evaluation
We trained the XGBoost classifier to predict the next day’s return (close-to-close) as:
- Positive (1) if return > 0
- Non-Positive (0) otherwise
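The labeling rule above can be sketched as follows. The column names and the helper `add_next_day_label` are hypothetical; the only logic taken from the text is that the target is 1 when the next close-to-close return is strictly positive and 0 otherwise.

```python
import pandas as pd

def add_next_day_label(prices: pd.DataFrame) -> pd.DataFrame:
    """Attach the next-day close-to-close return and its binary label."""
    out = prices.sort_values(["ticker", "date"]).copy()
    next_close = out.groupby("ticker")["close"].shift(-1)
    out["fwd_ret_1d"] = next_close / out["close"] - 1.0
    # 1 if the forward return is strictly positive, else 0 (flat days count as 0)
    out["label"] = (out["fwd_ret_1d"] > 0).astype(int)
    # The last row of each ticker has no next close, so it cannot be labeled
    return out.dropna(subset=["fwd_ret_1d"])

prices = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
    "ticker": ["AAA"] * 4,
    "close": [100.0, 101.0, 101.0, 99.0],
})
labeled = add_next_day_label(prices)
print(labeled[["date", "fwd_ret_1d", "label"]])
```

Shifting the close backward by one row aligns each day's features with the return realized over the following session, which is what keeps the label free of look-ahead into the feature date.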
Training & Tuning
- Training period: December 1, 2011 to January 1, 2022
- Tuning method: Randomized Search Cross-Validation using scikit-learn, across various XGBoost hyper-parameter combinations
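A randomized search of this kind might look like the sketch below. The parameter grid, the `roc_auc` scoring choice, the use of `TimeSeriesSplit` to keep validation folds later in time than training folds, and the synthetic data are all assumptions for illustration; the actual search space used in this research is not specified here. If xgboost is not installed, the snippet falls back to scikit-learn's `GradientBoostingClassifier` purely so that it still runs.

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

try:
    from xgboost import XGBClassifier as Booster
except ImportError:
    # Stand-in only: a different gradient-boosting implementation
    from sklearn.ensemble import GradientBoostingClassifier as Booster

# Synthetic stand-in for the real feature matrix and labels
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)

# Hypothetical search space; these names exist on both estimators
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [2, 3, 4],
    "learning_rate": [0.01, 0.05, 0.1],
    "subsample": [0.6, 0.8, 1.0],
}

search = RandomizedSearchCV(
    estimator=Booster(),
    param_distributions=param_distributions,
    n_iter=5,                        # number of sampled combinations
    scoring="roc_auc",
    cv=TimeSeriesSplit(n_splits=3),  # folds respect time ordering
    random_state=0,
)
search.fit(X, y)

best_model = search.best_estimator_
proba_up = best_model.predict_proba(X)[:, 1]  # probability of the positive class
print(search.best_params_)
```

Rows are assumed to be sorted by date before fitting, since `TimeSeriesSplit` splits on row order.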
Evaluation Results
When inspecting the classification report on the test set, we found:
- The model performs very well at identifying positive returns, with a recall of 0.93 for the positive class
- However, it does a poor job on non-positive returns
This means the model is biased toward predicting positive returns, which makes the direct output unsuitable for selecting securities in a long-short portfolio construction. But since we have a very strong recall for the positive class, we can still leverage the model to identify likely winners.
So instead of treating the classification as binary, we leveraged the model’s predicted probabilities directly to build our portfolio.
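To illustrate the reasoning, the toy example below uses made-up probabilities, not results from this study. A classifier that leans toward the positive class shows high recall on winners and low recall on losers at a 0.5 cutoff, yet ranking by predicted probability can still surface the strongest candidates.

```python
import numpy as np

def recall(y_true, y_pred, cls):
    """Share of actual `cls` examples that were predicted as `cls`."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    actual = y_true == cls
    return float((y_pred[actual] == cls).mean())

# Illustrative values only: five actual winners followed by five actual losers
y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
proba = np.array([0.71, 0.66, 0.62, 0.58, 0.49, 0.60, 0.57, 0.55, 0.52, 0.45])

# Hard classification at 0.5 labels almost everything as positive
y_pred = (proba >= 0.5).astype(int)
recall_pos = recall(y_true, y_pred, 1)
recall_neg = recall(y_true, y_pred, 0)

# Ranking by probability instead: take the three most confident names
top_k = np.argsort(proba)[::-1][:3]
top_k_hit_rate = float(y_true[top_k].mean())

print(recall_pos, recall_neg, top_k_hit_rate)
```

The hard labels discard how confident the model is, while the ranking keeps that information, which is the property the portfolio construction relies on.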
Building the Portfolio
Using the trained model, we built a daily long-only portfolio as follows:
- Universe: S&P 500 constituents
- Prediction timing: Daily before market close
- Selection: Top 100 stocks by predicted probability of positive return
- Weighting: Proportional to the predicted probability
This approach plays to the model’s strength—identifying positive return opportunities—while avoiding weaker downside predictions. By selecting the top 100 stocks and weighting them by predicted probability, allocation reflects the model’s confidence. The resulting portfolio is well-diversified, with weights forming a near-normal distribution centered around 1%, and a maximum of just 1.2%.
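The selection and weighting rules can be sketched in a few lines. The function name, ticker labels, and randomly generated probabilities are hypothetical; only the top-100 selection and probability-proportional weighting come from the description above.

```python
import numpy as np
import pandas as pd

def build_long_only_weights(proba: pd.Series, top_n: int = 100) -> pd.Series:
    """Pick the top_n names by predicted probability and weight them proportionally."""
    selected = proba.nlargest(top_n)
    return selected / selected.sum()  # weights sum to 1

# Made-up probabilities for a 500-name universe
rng = np.random.default_rng(42)
tickers = [f"TICK{i:03d}" for i in range(500)]
proba = pd.Series(rng.uniform(0.40, 0.65, size=500), index=tickers)

weights = build_long_only_weights(proba, top_n=100)
print(weights.describe())
```

Because the top 100 probabilities are close to one another in magnitude, proportional weighting keeps every position near the equal-weight level of 1/100 = 1%, which is consistent with the tight weight distribution described above.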
Out-of-Sample Performance
Over the 3.5-year out-of-sample testing period, our model-driven portfolio demonstrated strong and consistent outperformance:
- Cumulative Return: +4% above the SPY benchmark, highlighting the model’s ability to generate excess returns even in a large-cap, competitive universe.
- Risk-Adjusted Metrics:
  - Sharpe Ratio: Improved, indicating better return per unit of total risk
  - Sortino Ratio: Also improved, reflecting enhanced downside protection and return consistency during volatile periods
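For readers who want to reproduce metrics of this kind on their own return series, the sketch below shows one common way to compute them from daily returns. The 252-day annualization factor, the zero risk-free rate and zero downside target, and the sample returns are assumptions; the exact conventions behind the figures above are not specified here.

```python
import numpy as np

TRADING_DAYS = 252  # assumed annualization factor

def cumulative_return(daily_returns) -> float:
    """Compounded return over the whole period."""
    r = np.asarray(daily_returns, dtype=float)
    return float(np.prod(1.0 + r) - 1.0)

def sharpe_ratio(daily_returns, rf_daily: float = 0.0) -> float:
    """Annualized mean excess return divided by total volatility."""
    excess = np.asarray(daily_returns, dtype=float) - rf_daily
    return float(np.sqrt(TRADING_DAYS) * excess.mean() / excess.std(ddof=1))

def sortino_ratio(daily_returns, target_daily: float = 0.0) -> float:
    """Annualized mean excess return divided by downside deviation."""
    excess = np.asarray(daily_returns, dtype=float) - target_daily
    downside = np.minimum(excess, 0.0)  # only below-target days contribute
    downside_dev = np.sqrt(np.mean(downside ** 2))
    return float(np.sqrt(TRADING_DAYS) * excess.mean() / downside_dev)

# Made-up daily returns for illustration
sample_returns = [0.01, -0.01, 0.02]
print(cumulative_return(sample_returns))
print(sharpe_ratio(sample_returns))
print(sortino_ratio(sample_returns))
```

The Sortino ratio penalizes only below-target days, so a strategy whose volatility comes mostly from upside moves scores higher on Sortino than on Sharpe.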
This outperformance underscores not only the predictive power of combining momentum with sentiment, but also the robustness of the model across varied market conditions. The daily rebalanced strategy maintained a stable edge, validating both the modeling framework and the quality of engineered sentiment features.
Feature Insights
Feature importance (via SHAP values and F Scores) revealed:
- Momentum indicators as the strongest drivers
- S-Factor fields like raw-volatility, s-buzz, raw-s-mean, and engineered deltas as additive signals
This indicates that sentiment data enhances traditional signals, providing incremental predictive power beyond common price-based factors.
Conclusion
This research underscores the versatility and strength of the S-Factor feed in trading applications. Whether through straightforward thresholds or deep integration into machine learning workflows, sentiment data from Context Analytics can amplify alpha and enable more informed, nuanced trading decisions.
Although this strategy operated on a daily cadence, the feature engineering techniques and modeling approach apply to intraday, weekly, or even monthly strategies.
Want to see how S-Factors can enhance your models?
Visit contextanalytics-ai.com to learn more or request a data trial.