GOODABLE SUCCESS STORY
Bringing good news to thousands of readers around the world using AI
The story
When Goodable approached us, they had developed a mobile news app that spread positive news to thousands of readers around the world. We helped develop a real-time news classification engine that scans daily RSS feeds, news sites, and search crawlers to identify positive news and push it to readers on their platform.
Tech stack
About Goodable
Goodable supports mental health by providing readers around the world with positive news. Download the iOS or Android app from your app store today.
The challenge
Identifying positive news amid a deluge of negative articles is a significant challenge: negative articles outnumber positive ones 21 to 1. Compounding the problem, context is nuanced, and news that looks positive may turn out to be negative upon closer examination.
To address this, advanced language models such as XLNet can be employed. XLNet is built on the transformer architecture and excels at capturing complex relationships within text and detecting subtle sentiment cues. This enables more accurate categorization, especially in environments saturated with negative news.
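To make the 21:1 imbalance concrete, the short calculation below scores a hypothetical classifier that labels every article as negative. Only the ratio comes from the text above; the baseline itself and the sample size are assumptions chosen for illustration.

```python
# Illustrative arithmetic only: the 21:1 ratio comes from the text,
# the "always negative" baseline is a hypothetical strawman.
NEGATIVE_PER_POSITIVE = 21


def baseline_scores(total_positive: int) -> dict:
    """Score a classifier that labels every article as negative."""
    total_negative = total_positive * NEGATIVE_PER_POSITIVE
    total = total_positive + total_negative
    true_negative = total_negative  # every negative article is labeled correctly
    true_positive = 0               # no positive article is ever found
    return {
        "positive_share": total_positive / total,
        "accuracy": (true_positive + true_negative) / total,
        "recall": true_positive / total_positive,
    }


scores = baseline_scores(total_positive=100)
print({name: round(value, 3) for name, value in scores.items()})
```

Such a baseline is right about 95% of the time yet never surfaces a single positive article, which is why accuracy alone can mislead on data this skewed and why recall is tracked separately.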
Product & process
How often does the algorithm run?
The model runs once a day as part of the inference pipeline. A tracking mechanism monitors the precision, recall, and F1-score of the model's predictions over time. If these metrics fall below predefined thresholds (currently 85% for recall), the system automatically triggers model retraining to restore performance.
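The check described above can be sketched as follows. This is a minimal illustration, not the production tracker: the 85% recall threshold comes from the text, while the `Metrics` class, the function names, and the sample labels are hypothetical.

```python
from dataclasses import dataclass

RECALL_THRESHOLD = 0.85  # from the text; all other names are illustrative


@dataclass
class Metrics:
    precision: float
    recall: float
    f1: float


def compute_metrics(expert_labels, model_labels, positive="positive") -> Metrics:
    """Compare model predictions with reference labels for the positive class."""
    pairs = list(zip(expert_labels, model_labels))
    tp = sum(1 for e, m in pairs if e == positive and m == positive)
    fp = sum(1 for e, m in pairs if e != positive and m == positive)
    fn = sum(1 for e, m in pairs if e == positive and m != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Metrics(precision, recall, f1)


def needs_retraining(metrics: Metrics, threshold: float = RECALL_THRESHOLD) -> bool:
    """Signal retraining when recall drops below the threshold."""
    return metrics.recall < threshold


# Hypothetical sample: 10 positive and 10 negative articles.
expert = ["positive"] * 10 + ["negative"] * 10
model = ["positive"] * 8 + ["negative"] * 2 + ["positive"] * 1 + ["negative"] * 9
metrics = compute_metrics(expert, model)
print(metrics, needs_retraining(metrics))
```

In this sample the model finds 8 of the 10 positive articles, so recall is 0.80 and the retraining signal fires.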
What is the proposed high-level architecture?
The entire workflow is automated with Apache Airflow and consists of two pipelines:
Inference Pipeline (scheduled daily)
This pipeline combines the data engineering and machine learning tasks in a single Airflow workflow. It runs daily and performs the following steps:
Aggregates news articles from various online platforms and stores them in a PostgreSQL database.
Retrieves the latest model artifacts from an S3 bucket for use in predictions.
Utilizes Apache Spark for preprocessing tasks such as data cleaning and transformation.
Performs predictions using the model and saves the results to the backend database for further analysis.
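The order of the four steps above can be sketched in plain Python. This is an illustration under loud assumptions, not the Airflow workflow itself: `sqlite3` stands in for PostgreSQL, a dictionary stands in for the S3 bucket, ordinary string handling stands in for Spark, and `toy_model` is a hypothetical keyword rule rather than the real classifier.

```python
import sqlite3


def aggregate_articles(db, articles):
    """Step 1: store collected articles (sqlite3 stands in for PostgreSQL)."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS articles "
        "(id INTEGER PRIMARY KEY, text TEXT, label TEXT)"
    )
    db.executemany("INSERT INTO articles (text) VALUES (?)", [(a,) for a in articles])


def load_latest_model(artifact_store):
    """Step 2: fetch the newest model artifact (a dict stands in for S3)."""
    latest_key = max(artifact_store)  # assumes keys sort by version
    return artifact_store[latest_key]


def preprocess(text):
    """Step 3: clean the text (plain Python stands in for Spark)."""
    return " ".join(text.lower().split())


def run_inference(db, model):
    """Step 4: predict on unlabeled articles and save the results."""
    rows = db.execute("SELECT id, text FROM articles WHERE label IS NULL").fetchall()
    for article_id, text in rows:
        label = model(preprocess(text))
        db.execute("UPDATE articles SET label = ? WHERE id = ?", (label, article_id))


def toy_model(text):
    """Hypothetical stand-in for the trained classifier."""
    return "positive" if "rescued" in text else "negative"


db = sqlite3.connect(":memory:")
aggregate_articles(db, ["Volunteers  RESCUED a stranded whale", "Markets fall sharply"])
model = load_latest_model({"model-v1": lambda t: "negative", "model-v2": toy_model})
run_inference(db, model)
labels = [row[0] for row in db.execute("SELECT label FROM articles ORDER BY id")]
print(labels)
```

In a scheduler such as Airflow, each function would typically become its own task so that a failed step can be retried without repeating the earlier ones.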
Model Training Pipeline (triggers based on performance decay)
This pipeline retrains the model when its performance declines, that is, when recall falls below the 85% threshold.
Batch vs Stream system
The system is a batch job triggered daily. Data is scraped each day, and the model makes predictions on that day's dataset. This design matches the business need: news is collected throughout the day, and batch predictions are produced at day's end.
How do we monitor the AI after launch to ensure we did a “good” job?
Experts review the model's predictions daily, with a particular focus on articles it identifies as "positive", because these carry the highest risk. Comparing the model's predictions with the expert-reviewed classifications lets us monitor recall, the metric that measures how well the model identifies positive cases.
This recall score, reflecting the model's ability to minimize missed positive articles, is reported on a daily basis.
The model achieved a recall of 89% on production data.
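A daily recall report of the kind described above could be computed along these lines. The record layout `(date, model_label, expert_label)`, the function name, and the sample dates are assumptions made for this sketch; the actual review data format is not described in this case study.

```python
from collections import defaultdict


def daily_recall(reviews, positive="positive"):
    """Return {date: recall} from (date, model_label, expert_label) records."""
    found = defaultdict(int)   # expert-positive articles the model also labeled positive
    actual = defaultdict(int)  # all expert-positive articles
    for day, model_label, expert_label in reviews:
        if expert_label == positive:
            actual[day] += 1
            if model_label == positive:
                found[day] += 1
    # Days without any expert-positive article have no defined recall and are omitted.
    return {day: found[day] / actual[day] for day in sorted(actual)}


# Hypothetical review log for two days.
reviews = [
    ("2024-01-01", "positive", "positive"),
    ("2024-01-01", "positive", "positive"),
    ("2024-01-01", "positive", "positive"),
    ("2024-01-01", "negative", "positive"),  # a missed positive article
    ("2024-01-01", "negative", "negative"),
    ("2024-01-02", "positive", "positive"),
    ("2024-01-02", "positive", "positive"),
    ("2024-01-02", "positive", "negative"),  # a false positive, ignored by recall
]
report = daily_recall(reviews)
print(report)
```

Note that recall only counts missed positive articles; the false positive on the second day does not lower it, which is why precision is tracked alongside it.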
Results
+70%
Reduction in fake news, verified by the Goodable Content team
+94%
Accuracy achieved, verified by the Goodable Content team
200,000+
News articles and posts used for training