Bringing good news to thousands of readers around the world.
When Nomad approached us, they had developed a mobile news app that spread positive news to thousands of readers around the world. We helped develop a real-time news classification engine that scanned daily RSS feeds, news sites, and search crawlers to identify positive news and push it to readers on their platform.
Nomad supports mental health by providing readers around the world with positive news. Download the app for iOS or Android today.
The challenge
Identifying positive news is challenging in environments dominated by negative coverage, where nuanced context can obscure sentiment. Advanced language models like XLNet help address this by accurately interpreting complex text relationships and subtle sentiment cues, enabling more reliable classification even amid overwhelming negative news.
XLNet is a transformer-based language model. It excels at understanding complex relationships within text and detecting subtle indicators of sentiment.
How often does the algorithm run?
The algorithm runs once a day. Articles are collected throughout the day, and the model classifies the complete daily dataset in a single batch run.
What is the high level architecture proposed?
The proposed architecture consists of two pipelines: an inference pipeline that is scheduled daily, and a model training pipeline that is triggered when performance decays.
Inference Pipeline (scheduled daily)
The inference pipeline combines the data engineering and machine learning tasks in a single Airflow workflow. It runs daily and performs the following steps:
1. Aggregates news articles from various sources and stores them in a PostgreSQL database.
2. Retrieves the latest model artifacts from an S3 bucket for use in predictions.
3. Uses Apache Spark for preprocessing tasks such as data cleaning and transformation.
4. Runs predictions with the model and saves the results to the backend database.
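The ordering of the four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the production Airflow workflow: the function names, the in-memory stand-ins for PostgreSQL, S3, and Spark, and the toy keyword-based model are all hypothetical and exist only to make the sketch runnable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Article:
    title: str
    body: str
    label: Optional[str] = None  # filled in by the prediction step


def aggregate_articles(sources: Dict[str, List[Article]]) -> List[Article]:
    """Step 1: collect articles from every source (stand-in for the PostgreSQL load)."""
    return [article for articles in sources.values() for article in articles]


def load_model() -> Callable[[str], str]:
    """Step 2: fetch the latest model (stand-in for downloading artifacts from S3).

    A toy keyword rule replaces the real classifier so the sketch runs anywhere.
    """
    positive_words = {"rescued", "recovered", "celebrates"}  # hypothetical vocabulary

    def model(text: str) -> str:
        return "positive" if positive_words & set(text.split()) else "other"

    return model


def preprocess(articles: List[Article]) -> List[Article]:
    """Step 3: clean and normalize text (stand-in for the Spark job)."""
    return [
        Article(a.title.strip(), " ".join(a.body.lower().split()))
        for a in articles
        if a.body.strip()  # drop articles with an empty body
    ]


def predict(articles: List[Article], model: Callable[[str], str]) -> List[Article]:
    """Step 4: label each article (a real run would save these to the backend database)."""
    return [Article(a.title, a.body, model(a.body)) for a in articles]


def run_daily_pipeline(sources: Dict[str, List[Article]]) -> List[Article]:
    """Run the four steps in the order listed above."""
    articles = aggregate_articles(sources)
    model = load_model()
    cleaned = preprocess(articles)
    return predict(cleaned, model)
```

In an orchestrator such as Airflow, each of these functions would typically become its own task, with the scheduler enforcing the same order once per day.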
Model Training Pipeline (triggers based on performance decay)
This pipeline retrains the model when its performance declines. Retraining is triggered when recall falls below 85%.
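The retraining trigger amounts to a single comparison against the threshold. The sketch below assumes recall is expressed as a fraction between 0 and 1; the names `RECALL_THRESHOLD` and `should_retrain` are hypothetical and not taken from the actual pipeline.

```python
RECALL_THRESHOLD = 0.85  # from the text: retrain when recall drops below 85%


def should_retrain(current_recall: float, threshold: float = RECALL_THRESHOLD) -> bool:
    """Return True when the monitored recall has fallen below the threshold."""
    if not 0.0 <= current_recall <= 1.0:
        raise ValueError("recall must be a fraction between 0 and 1")
    return current_recall < threshold
```

With this rule, a recall of exactly 85% does not trigger retraining; only values strictly below the threshold do.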
Batch vs. Stream system
This process runs as a daily batch job, where data is scraped throughout the day and the model generates predictions on the complete daily dataset. This approach aligns with business needs by capturing all news published during the day and producing consolidated predictions at day’s end.
How do we monitor the AI after launch?
Experts review the model’s predictions daily, focusing on high-risk articles labeled as positive. These reviews are used to monitor recall, the share of truly positive articles that the model correctly identifies. Recall is tracked daily, and the model achieved 89% recall on production data.
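As an illustration of the metric being tracked, recall can be computed from a reviewed sample by comparing expert labels with model labels. The function below is a sketch under the assumption that each reviewed article has a boolean expert label and a boolean model label; the name `recall` and the input format are hypothetical.

```python
from typing import Sequence


def recall(expert_labels: Sequence[bool], model_labels: Sequence[bool]) -> float:
    """Recall = true positives / (true positives + false negatives).

    expert_labels[i] is True when the reviewer judged article i to be positive;
    model_labels[i] is True when the model labeled article i as positive.
    """
    if len(expert_labels) != len(model_labels):
        raise ValueError("both label sequences must have the same length")
    pairs = list(zip(expert_labels, model_labels))
    true_positives = sum(1 for expert, model in pairs if expert and model)
    false_negatives = sum(1 for expert, model in pairs if expert and not model)
    if true_positives + false_negatives == 0:
        raise ValueError("the reviewed sample contains no positive articles")
    return true_positives / (true_positives + false_negatives)
```

For example, if experts mark four articles as positive and the model caught three of them, recall for that sample is 0.75.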
Results
The solution delivered measurable results by significantly improving content quality and reliability on the platform. The real-time news classification engine reduced fake news by over 70% while achieving high classification accuracy. These outcomes were driven by training the model on a large volume of curated news articles and posts, ensuring more trustworthy and relevant content for readers.
70%+
Reduction in fake news, verified by the Nomad Content team
94%+
Accuracy achieved, verified by the Nomad Content team
200,000+
News articles and posts used for training