GOODABLE SUCCESS STORY

Bringing good news to thousands of readers around the world using AI

The story

When Goodable approached us, they had developed a mobile news app that spread positive news to thousands of readers around the world. We helped develop a real-time news classification engine that scans daily RSS feeds, news sites, and search crawlers to identify positive news and push it to readers on their platform.

About Goodable

Goodable supports readers' mental health by providing positive news to people around the world. The app is available for iOS and Android.

The challenge

Identifying positive news amid a deluge of negative articles is a significant challenge, particularly when there are 21 negative articles for every positive one. Compounding the problem is nuance: depending on context, positive news may appear negative on closer examination.

To address this, advanced language models such as XLNet can be employed. XLNet models excel at deciphering intricate relationships within text and discerning subtle sentiment cues. This enables more accurate categorization, especially in environments saturated with negative news.
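To illustrate why a 21-to-1 imbalance makes plain accuracy a misleading yardstick, the short calculation below scores a trivial baseline that labels every article as negative. The batch size of 2,200 articles is a made-up figure chosen only to match the ratio; it is not Goodable data.

```python
# Hypothetical daily batch matching the 21:1 ratio cited above.
negatives = 2100
positives = 100

# A trivial baseline that labels every article "negative".
true_negatives = negatives   # every negative article is labelled correctly
false_negatives = positives  # every positive article is missed
true_positives = 0

accuracy = (true_positives + true_negatives) / (negatives + positives)
recall = true_positives / (true_positives + false_negatives)

print(f"accuracy = {accuracy:.1%}")  # accuracy = 95.5%
print(f"recall   = {recall:.1%}")    # recall   = 0.0%
```

The baseline looks strong on accuracy while finding no positive news at all, which is why a metric such as recall on the positive class is the more informative one here.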

What are XLNet models?

XLNet models are advanced language processing models built on the Transformer architecture. They excel at understanding complex relationships within text and detecting subtle indicators of sentiment.

Product & process

How often does the algorithm run?

The inference pipeline runs once a day. Alongside it, a tracking mechanism monitors the precision, recall, and F1-score of the model's predictions over time. If recall falls below a predefined threshold, currently set at 85%, the system automatically initiates model retraining so that performance recovers.
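The check described above can be sketched in a few lines of Python. This is a minimal illustration, not the production code: only the 85% recall threshold comes from the text, while the names `Metrics`, `compute_metrics`, and `should_retrain` and the sample labels are assumptions made for the example.

```python
from dataclasses import dataclass

RECALL_THRESHOLD = 0.85  # from the text; every other name here is hypothetical


@dataclass
class Metrics:
    precision: float
    recall: float
    f1: float


def compute_metrics(y_true: list[int], y_pred: list[int]) -> Metrics:
    """Binary metrics where 1 = positive news and 0 = everything else."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return Metrics(precision, recall, f1)


def should_retrain(metrics: Metrics, threshold: float = RECALL_THRESHOLD) -> bool:
    """Signal retraining when recall drops below the threshold."""
    return metrics.recall < threshold


# Made-up labels: 4 truly positive articles, 3 of them found, 1 false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = compute_metrics(y_true, y_pred)
print(metrics)
print("retrain:", should_retrain(metrics))
```

In this made-up batch recall is 75%, which is below the 85% threshold, so the sketch would signal a retraining run.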

What is the high-level architecture proposed?

The entire workflow is automated using Apache Airflow, consisting of two pipelines:

Inference Pipeline (scheduled daily)

The pipeline orchestrates a comprehensive sequence of data engineering and machine learning tasks, all consolidated within a unified Airflow workflow. Set to activate daily, this pipeline executes a series of operations:

Aggregates news articles from various online platforms and stores them in a PostgreSQL database.

Retrieves the latest model artifacts from an S3 bucket for use in predictions.

Uses Apache Spark for preprocessing tasks such as data cleaning and transformation.

Runs predictions with the model and saves the results to the backend database for further analysis.
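The order of these four steps can be sketched as plain Python functions chained together. This is only an illustration under loud assumptions: in the system described, each step would be an Airflow task backed by PostgreSQL, S3, and Spark, whereas the sketch uses in-memory stand-ins, a keyword rule in place of the real classifier, and hypothetical function names and sample articles throughout.

```python
from typing import Callable

Article = dict[str, str]


def aggregate_articles() -> list[Article]:
    # Stand-in for collecting articles and storing them in PostgreSQL.
    return [
        {"id": "a1", "text": "  Local Volunteers Rebuild Playground  "},
        {"id": "a2", "text": "Storm Damages Coastal Road"},
    ]


def load_model() -> Callable[[str], str]:
    # Stand-in for fetching the latest model artifact from S3.
    # A keyword rule replaces the real classifier purely for illustration.
    def predict(text: str) -> str:
        return "positive" if "rebuild" in text else "other"

    return predict


def preprocess(articles: list[Article]) -> list[Article]:
    # Stand-in for the Spark cleaning and transformation job.
    return [{**a, "text": a["text"].strip().lower()} for a in articles]


def predict_and_store(
    articles: list[Article], model: Callable[[str], str], sink: dict[str, str]
) -> None:
    # Stand-in for writing predictions to the backend database.
    for article in articles:
        sink[article["id"]] = model(article["text"])


def run_daily_inference() -> dict[str, str]:
    """Run the four steps in the order listed above."""
    results: dict[str, str] = {}
    raw = aggregate_articles()
    model = load_model()
    clean = preprocess(raw)
    predict_and_store(clean, model, results)
    return results


print(run_daily_inference())  # {'a1': 'positive', 'a2': 'other'}
```

Keeping each step as its own function mirrors how an orchestrator schedules them as separate tasks, so a failure in one step can be retried without repeating the others.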

Model Training Pipeline (triggers based on performance decay)

This pipeline retrains the model when performance declines. It is triggered when recall falls below 85%.

Batch vs Stream system

It is a batch job triggered daily. The approach involves scraping data daily, with the model making predictions on the daily dataset. This decision aligns with business needs, as news is collected throughout the day, culminating in batch predictions by day's end.

How do we monitor the AI after launch to ensure we did a “good” job?

Experts review the model's predictions daily, with a particular focus on articles it identifies as "positive", because these carry the highest risk. Comparing the model's predictions with the expert-reviewed classifications lets us monitor recall, the metric that shows how many of the truly positive articles the model identifies.

This recall score, reflecting the model's ability to minimize missed positive articles, is reported on a daily basis.

The model achieved a recall score of 89% on production data.
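A daily recall report of the kind described above could be computed along these lines. This is a hedged sketch: the `reviews` rows, the dates, the label strings, and the `daily_recall` helper are all invented for illustration and do not reflect Goodable's actual schema or data.

```python
from collections import defaultdict

# (review date, expert label, model label); all rows are made-up examples.
reviews = [
    ("2024-01-01", "positive", "positive"),
    ("2024-01-01", "positive", "other"),
    ("2024-01-01", "other", "positive"),
    ("2024-01-02", "positive", "positive"),
    ("2024-01-02", "positive", "positive"),
    ("2024-01-02", "other", "other"),
]


def daily_recall(rows):
    """Recall per day: share of expert-confirmed positives the model also flagged."""
    hits = defaultdict(int)       # expert says positive and model agrees
    positives = defaultdict(int)  # all expert-labelled positives
    for day, expert, model in rows:
        if expert == "positive":
            positives[day] += 1
            if model == "positive":
                hits[day] += 1
    return {day: hits[day] / positives[day] for day in sorted(positives)}


report = daily_recall(reviews)
for day, recall in report.items():
    print(day, f"{recall:.0%}")
```

Days on which the experts confirm no positive articles are left out of the report, since recall is undefined when there is nothing to find.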

Results

+70%

Reduction in fake news, verified by the Goodable Content team

+94%

Accuracy achieved, verified by the Goodable Content team

200,000+

News articles and posts used for training
