Overview
Every news article requires a topic label. Tagtaly uses a weighted keyword matching algorithm to classify articles into 23 detailed topics, which are then mapped to 4 main dashboard categories: Politics, Lifestyle, Entertainment, and Money. This document explains the classification methodology and how accuracy is maintained.
The Four Main Categories
All articles ultimately belong to one of four broad categories displayed on the Tagtaly dashboard:
- Politics: Government, elections, policy, international relations
- Lifestyle: Health, weather, crime, safety, general interest
- Entertainment: Celebrity, media, social trends, entertainment news
- Money: Finance, markets, business, economics
The 23 Detailed Topics
Within these 4 categories, Tagtaly recognizes 23 specific topics for finer-grained analysis:
| Category | Detailed Topics |
|---|---|
| Politics | UK Politics, US Politics, International Politics, Breaking News |
| Lifestyle | Health & Safety, Weather & Disasters, Crime & Justice, General Interest |
| Entertainment | Celebrity, Entertainment News, Social Media Trends |
| Money | Finance, Markets, Economics, Business |
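The topic-to-category mapping can be pictured as a simple lookup table. The sketch below is illustrative only: it covers just the topics named in the table above, and the names `CATEGORY_TOPICS`, `TOPIC_TO_CATEGORY`, and `category_for` are hypothetical, not Tagtaly's actual identifiers.

```python
# Hypothetical lookup table, built only from the topics listed in the table above.
CATEGORY_TOPICS = {
    "Politics": ["UK Politics", "US Politics", "International Politics", "Breaking News"],
    "Lifestyle": ["Health & Safety", "Weather & Disasters", "Crime & Justice", "General Interest"],
    "Entertainment": ["Celebrity", "Entertainment News", "Social Media Trends"],
    "Money": ["Finance", "Markets", "Economics", "Business"],
}

# Invert the table so each detailed topic points at its dashboard category.
TOPIC_TO_CATEGORY = {
    topic: category
    for category, topics in CATEGORY_TOPICS.items()
    for topic in topics
}


def category_for(topic: str) -> str:
    """Return the dashboard category for a detailed topic (KeyError if unknown)."""
    return TOPIC_TO_CATEGORY[topic]
```

For example, `category_for("Markets")` returns `"Money"`.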
Classification Algorithm
How It Works
Rather than a machine-learning classifier, which would require labeled training data, Tagtaly uses weighted keyword matching. For each article headline and summary, the algorithm:
- Extracts key terms from headline and summary
- Compares against keyword lists for each of the 23 topics
- Weights matches based on keyword importance and location
- Assigns the topic with the highest score
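The four steps above can be sketched roughly as follows. This is a minimal illustration, not Tagtaly's implementation: the keyword lists, the weight values, the headline multiplier, and the fallback topic are all assumptions made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical keyword weights; the real lists and values are not published here.
TOPIC_KEYWORDS = {
    "UK Politics": {"parliament": 3.0, "minister": 2.0, "election": 2.0},
    "Finance": {"bank": 2.0, "interest": 1.5, "inflation": 3.0},
    "Health & Safety": {"nhs": 3.0, "hospital": 2.0, "vaccine": 2.0},
}
HEADLINE_MULTIPLIER = 2.0  # assumption: a headline match counts double
FALLBACK_TOPIC = "General Interest"  # assumption: default when nothing matches


def tokenize(text):
    """Step 1: extract lowercase terms from the text."""
    return re.findall(r"[a-z0-9']+", text.lower())


def score_topics(headline, summary):
    """Steps 2 and 3: match terms against each topic and add weighted scores."""
    scores = defaultdict(float)
    for location_weight, text in ((HEADLINE_MULTIPLIER, headline), (1.0, summary)):
        for token in tokenize(text):
            for topic, keywords in TOPIC_KEYWORDS.items():
                if token in keywords:
                    scores[topic] += keywords[token] * location_weight
    return dict(scores)


def classify(headline, summary):
    """Step 4: assign the topic with the highest score."""
    scores = score_topics(headline, summary)
    if not scores:
        return FALLBACK_TOPIC
    return max(scores, key=scores.get)
```

With these made-up weights, `classify("Minister faces parliament vote", "The election looms.")` returns `"UK Politics"`, because the two headline matches are doubled and outweigh everything else.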
Weighting System
Keywords have different weights depending on their category and significance.
Topic Keyword Examples
Each topic has 10-30 associated keywords.
Classification Accuracy
Validation Metrics
Tagtaly tested the keyword algorithm against 1,000+ manually labeled articles. Results:
- Overall accuracy: 84%
- Politics: 92% (clear keywords)
- Entertainment: 88% (distinctive language)
- Money: 81% (vocabulary overlaps with Politics)
- Lifestyle: 76% (broad category that overlaps with the others)
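Figures like these can be computed by comparing each manual label with the algorithm's output. The sketch below assumes the per-category figure is the share of manually labeled articles in that category that the algorithm got right; the function name and input shape are hypothetical.

```python
from collections import Counter


def accuracy_report(pairs):
    """Compute overall and per-category accuracy.

    pairs: iterable of (manual_label, predicted_label) tuples.
    Returns (overall_accuracy, {category: accuracy}).
    """
    total = Counter()
    correct = Counter()
    for manual, predicted in pairs:
        total[manual] += 1
        if manual == predicted:
            correct[manual] += 1
    per_category = {category: correct[category] / total[category] for category in total}
    overall = sum(correct.values()) / sum(total.values()) if total else 0.0
    return overall, per_category


# Toy data for illustration only, not Tagtaly's validation set.
sample = [
    ("Politics", "Politics"),
    ("Politics", "Money"),
    ("Money", "Money"),
    ("Lifestyle", "Lifestyle"),
]
overall, per_category = accuracy_report(sample)
```

On this toy sample, three of four predictions match, so `overall` is 0.75 and `per_category["Politics"]` is 0.5.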
Edge Cases & Challenges
Certain articles are difficult to classify:
- Multi-topic stories: An article about the "economic impact of new health policy" could fall under Politics or Money. The algorithm assigns whichever topic scores highest.
- Ambiguous keywords: "Bank" could mean a financial institution or a riverbank. Other matched keywords in the headline and summary help disambiguate.
- Trending stories: Unexpected news (e.g., royal scandals) may not match the existing keyword lists well.
- Wordplay and slang: Headlines that rely on metaphor or slang can mislead keyword matching.
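To see why multi-topic stories are fragile, it helps to look at the full ranking of scores rather than only the winner. The sketch below uses invented scores and a hypothetical `min_margin` threshold; the source does not say whether Tagtaly flags close calls this way.

```python
def rank_topics(scores):
    """Sort (topic, score) pairs from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def is_close_call(scores, min_margin=1.0):
    """Return True when the top two topics are within min_margin of each other."""
    ranked = rank_topics(scores)
    if len(ranked) < 2:
        return False
    return ranked[0][1] - ranked[1][1] < min_margin


# Invented scores for an article on the economic impact of a health policy.
example_scores = {"UK Politics": 6.0, "Economics": 5.5, "Health & Safety": 2.0}
winner = rank_topics(example_scores)[0][0]
```

Here `winner` is `"UK Politics"`, but the 0.5-point gap to Economics means `is_close_call(example_scores)` is true, which is the kind of article a human reviewer might want to check.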
Manual Override & Feedback
Human Review
While classification runs automatically, Tagtaly's editorial team can manually override topic assignments when needed. Each override is logged and used to improve the algorithm.
Continuous Improvement
When manual overrides occur frequently for a topic, new keywords are added to that topic's keyword list. This keeps classification accurate as language and news topics evolve.
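One way to spot topics that need new keywords is to count overrides in the log. The sketch below is an assumption-laden illustration: the log format, the `min_overrides` threshold, and the choice to count by the corrected topic (the one the algorithm missed) are all hypothetical.

```python
from collections import Counter


def topics_needing_keywords(override_log, min_overrides=5):
    """List topics that editors corrected articles *to* at least min_overrides times.

    override_log: iterable of (assigned_topic, corrected_topic) tuples.
    Topics are returned most-overridden first.
    """
    counts = Counter(
        corrected
        for assigned, corrected in override_log
        if assigned != corrected
    )
    return [topic for topic, count in counts.most_common() if count >= min_overrides]


# Toy log for illustration only.
log = [("Finance", "UK Politics")] * 3 + [("Celebrity", "Social Media Trends")]
flagged = topics_needing_keywords(log, min_overrides=2)
```

With this toy log, `flagged` is `["UK Politics"]`: three articles were moved into that topic, which suggests its keyword list is missing terms.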
Performance & Integration
Speed
Classification runs after article collection and before sentiment analysis. Processing 500 articles takes less than 60 seconds, which is fast enough for near-real-time analysis.
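The stated figures imply a budget of at most 0.12 seconds per article (60 s / 500). A small timing harness like the one below could check a classifier against that budget; `within_budget` and its `classify_fn` parameter are hypothetical names, and the stand-in classifier is a placeholder rather than Tagtaly's code.

```python
import time

BATCH_SIZE = 500        # batch size quoted in the text
BUDGET_SECONDS = 60.0   # time limit quoted in the text


def per_article_budget(batch_size=BATCH_SIZE, budget_seconds=BUDGET_SECONDS):
    """Average time available per article, in seconds."""
    return budget_seconds / batch_size


def within_budget(classify_fn, articles, budget_seconds=BUDGET_SECONDS):
    """Time classify_fn over (headline, summary) pairs.

    Returns (elapsed_seconds, True if elapsed is under the budget).
    """
    start = time.perf_counter()
    for headline, summary in articles:
        classify_fn(headline, summary)
    elapsed = time.perf_counter() - start
    return elapsed, elapsed < budget_seconds


# Placeholder classifier and articles, for illustration only.
articles = [("Example headline", "Example summary")] * BATCH_SIZE
elapsed, ok = within_budget(lambda headline, summary: "General Interest", articles)
```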
Integration with Other Systems
Topic labels feed downstream:
- Dashboard grouping by category
- Virality detection (politics stories have different viral patterns than entertainment)
- Sentiment analysis (context-aware scoring)
- Trend tracking (topic surge detection)
Questions?
For questions about topic classification, contact admin@tagtaly.com.