Overview

Every news article requires a topic label. Tagtaly uses a weighted keyword matching algorithm to classify articles into 23 detailed topics, which are then mapped to 4 main dashboard categories: Politics, Lifestyle, Entertainment, and Money. This document explains the classification methodology and how accuracy is maintained.

The Four Main Categories

All articles ultimately belong to one of four broad categories displayed on the Tagtaly dashboard:

Dashboard Categories:
  • Politics: Government, elections, policy, international relations
  • Lifestyle: Health, weather, crime, safety, general interest
  • Entertainment: Celebrity, media, social trends, entertainment news
  • Money: Finance, markets, business, economics

The 23 Detailed Topics

Within these 4 categories, Tagtaly recognizes 23 specific topics for finer-grained analysis:

Category       Detailed Topics
Politics       UK Politics, US Politics, International Politics, Breaking News
Lifestyle      Health & Safety, Weather & Disasters, Crime & Justice, General Interest
Entertainment  Celebrity, Entertainment News, Social Media Trends
Money          Finance, Markets, Economics, Business
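
One way to represent this mapping in code is a small lookup table. The sketch below is illustrative only: the names CATEGORY_TOPICS, TOPIC_TO_CATEGORY, and dashboard_category are hypothetical, and it includes only the topics listed in the table above.

```python
# Category -> detailed topics, copied from the table above.
CATEGORY_TOPICS = {
    "Politics": ["UK Politics", "US Politics", "International Politics", "Breaking News"],
    "Lifestyle": ["Health & Safety", "Weather & Disasters", "Crime & Justice", "General Interest"],
    "Entertainment": ["Celebrity", "Entertainment News", "Social Media Trends"],
    "Money": ["Finance", "Markets", "Economics", "Business"],
}

# Invert the table so each detailed topic points at its dashboard category.
TOPIC_TO_CATEGORY = {
    topic: category
    for category, topics in CATEGORY_TOPICS.items()
    for topic in topics
}


def dashboard_category(topic: str) -> str:
    """Return the dashboard category for a detailed topic (hypothetical helper)."""
    return TOPIC_TO_CATEGORY[topic]
```

With this table, dashboard_category("Crime & Justice") returns "Lifestyle", which matches the row above.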

Classification Algorithm

How It Works

Rather than using machine learning (which requires training data), Tagtaly uses weighted keyword matching. For each article headline and summary, the algorithm:

  1. Extracts key terms from headline and summary
  2. Compares against keyword lists for each of the 23 topics
  3. Weights matches based on keyword importance and location
  4. Assigns the topic with the highest score
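
The four steps above can be sketched roughly as follows. This is a minimal illustration, not Tagtaly's production code: the keyword lists are abbreviated, the per-keyword weights and the headline multiplier of 2.0 are assumptions, and the names TOPIC_KEYWORDS, LOCATION_MULTIPLIER, extract_terms, and classify are hypothetical.

```python
import re
from collections import defaultdict
from typing import List, Optional, Tuple

# Hypothetical, abbreviated keyword lists: keyword -> weight, per topic.
TOPIC_KEYWORDS = {
    "UK Politics": {"parliament": 15, "minister": 15, "election": 10},
    "Health & Safety": {"hospital": 15, "vaccine": 15, "health": 10},
    "Markets": {"shares": 15, "stocks": 15, "investors": 10},
}

# Assumed location weighting: a headline match counts double a summary match.
LOCATION_MULTIPLIER = {"headline": 2.0, "summary": 1.0}


def extract_terms(text: str) -> List[str]:
    """Step 1: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def classify(headline: str, summary: str) -> Tuple[Optional[str], float]:
    """Steps 2-4: score every topic and return the best (topic, score)."""
    scores = defaultdict(float)
    for location, text in (("headline", headline), ("summary", summary)):
        for term in extract_terms(text):
            for topic, keywords in TOPIC_KEYWORDS.items():
                if term in keywords:
                    # Step 3: weight by keyword importance and by location.
                    scores[topic] += keywords[term] * LOCATION_MULTIPLIER[location]
    if not scores:
        return None, 0.0  # no keyword matched at all
    best = max(scores, key=scores.get)  # Step 4: highest score wins
    return best, scores[best]
```

In this sketch an article with no matching keywords returns None rather than a default topic; how unmatched articles are actually handled is not stated here.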

Weighting System

Keywords have different weights depending on their category and significance:

KEYWORD_WEIGHTS = {
    'local_politics': 15,      # UK politics keywords
    'global_keywords': 10,     # International relevance
    'engagement_keywords': 3,  # Trending/viral markers
}

# Example:
# Headline: "UK Parliament debates NHS funding"
#   - "Parliament" = 15 points (local politics)
#   - "NHS" = 15 points (health keyword)
#   - "debate" = 3 points (engagement)
#   - Total = 33 points → "UK Politics" topic
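
The arithmetic in the example above can be reproduced with a few lines of Python. The per-keyword table EXAMPLE_WEIGHTS and the function score_headline are hypothetical, and prefix matching (so that "debates" counts as a hit for "debate") is an assumption; the matching rule itself is not specified here.

```python
# Weights taken from the worked example; the lookup table itself is hypothetical.
EXAMPLE_WEIGHTS = {"parliament": 15, "nhs": 15, "debate": 3}


def score_headline(headline: str) -> int:
    """Sum the weights of all example keywords found in the headline."""
    total = 0
    for word in headline.lower().split():
        for keyword, weight in EXAMPLE_WEIGHTS.items():
            # Assumed prefix match: "debates" counts as a hit for "debate".
            if word.startswith(keyword):
                total += weight
                break  # count each headline word at most once
    return total


print(score_headline("UK Parliament debates NHS funding"))  # 15 + 3 + 15 = 33
```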

Topic Keyword Examples

Each topic has 10-30 associated keywords:

Politics: parliament, election, minister, policy, legislation, government, congress, senate, vote, bill
Health & Safety: hospital, disease, pandemic, health, vaccine, medical, doctor, illness, injury, safety
Crime & Justice: police, arrest, court, trial, crime, murder, theft, justice, law, sentence
Entertainment: actor, celebrity, film, movie, music, concert, award, show, entertainment, star
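
Because several lists can match the same article, it can help to see which keywords fired for each topic. The sketch below uses the four lists shown above as plain sets; the names TOPIC_KEYWORD_SETS and matched_keywords are hypothetical, and matching on exact lowercase words is a simplifying assumption.

```python
import re
from typing import Dict, Set

# Keyword lists copied from the examples above.
TOPIC_KEYWORD_SETS = {
    "Politics": {"parliament", "election", "minister", "policy", "legislation",
                 "government", "congress", "senate", "vote", "bill"},
    "Health & Safety": {"hospital", "disease", "pandemic", "health", "vaccine",
                        "medical", "doctor", "illness", "injury", "safety"},
    "Crime & Justice": {"police", "arrest", "court", "trial", "crime",
                        "murder", "theft", "justice", "law", "sentence"},
    "Entertainment": {"actor", "celebrity", "film", "movie", "music",
                      "concert", "award", "show", "entertainment", "star"},
}


def matched_keywords(text: str) -> Dict[str, Set[str]]:
    """Return, per topic, the keywords that appear as whole words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    hits = {topic: keywords & words for topic, keywords in TOPIC_KEYWORD_SETS.items()}
    # Drop topics with no matching keyword.
    return {topic: found for topic, found in hits.items() if found}
```

For a headline such as "Police arrest doctor at hospital", both Crime & Justice and Health & Safety match two keywords each, which is the kind of overlap the weighting is meant to resolve.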

Classification Accuracy

Validation Metrics

Tagtaly tested the keyword algorithm against 1,000+ manually labeled articles. Results:

Accuracy Scores:
  • Overall accuracy: 84%
  • Politics: 92% (clear keywords)
  • Entertainment: 88% (distinctive language)
  • Money: 81% (mixed with politics)
  • Lifestyle: 76% (broad category, overlaps)
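
Scores like these can be computed from pairs of manual and predicted labels. The sketch below assumes each per-category figure is the share of manually labeled articles in that category that the algorithm labeled correctly; that definition is an assumption, and accuracy_report is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def accuracy_report(pairs: Iterable[Tuple[str, str]]) -> Tuple[float, Dict[str, float]]:
    """Compute overall and per-category accuracy.

    pairs: (manual_label, predicted_label) for each validated article.
    """
    total = Counter()
    correct = Counter()
    for manual, predicted in pairs:
        total[manual] += 1
        if manual == predicted:
            correct[manual] += 1
    if not total:
        return 0.0, {}  # nothing to score
    per_category = {category: correct[category] / total[category] for category in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_category
```

Feeding in four toy pairs, one of which is wrong, yields an overall accuracy of 0.75; the real validation set is not reproduced here.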

Edge Cases & Challenges

Certain articles are difficult to classify:

Manual Override & Feedback

Human Review

Classification runs automatically, but Tagtaly's editorial team can manually override a topic assignment when needed. Each override is logged so it can inform later improvements to the algorithm.
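
A logged override might look something like the record below. This is a sketch under stated assumptions: the OverrideRecord fields, the apply_override helper, and the in-memory list used as a log are all hypothetical, since the storage format is not described here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class OverrideRecord:
    """One manual correction (hypothetical structure)."""
    article_id: str
    predicted_topic: str
    corrected_topic: str
    logged_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def apply_override(labels: Dict[str, str], log: List[OverrideRecord],
                   article_id: str, corrected_topic: str) -> OverrideRecord:
    """Replace the automatic label and append a record to the override log."""
    record = OverrideRecord(article_id, labels[article_id], corrected_topic)
    labels[article_id] = corrected_topic
    log.append(record)
    return record
```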

Continuous Improvement

When manual overrides occur frequently for a topic, new keywords are added to that topic's keyword list. This keeps classification accurate as language and news topics evolve.
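
Spotting topics with frequent overrides can be as simple as counting log entries. In the sketch below, the threshold of 3 and the choice to count by the topic editors corrected to (rather than the topic originally predicted) are assumptions, and topics_needing_review is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def topics_needing_review(override_log: Iterable[Tuple[str, str]],
                          min_overrides: int = 3) -> List[str]:
    """Return topics that editors corrected to at least min_overrides times.

    override_log: (predicted_topic, corrected_topic) pairs.
    """
    counts = Counter(
        corrected
        for predicted, corrected in override_log
        if predicted != corrected  # ignore entries that changed nothing
    )
    # most_common() orders topics from most to least overridden.
    return [topic for topic, n in counts.most_common() if n >= min_overrides]
```

A topic returned by this function is a candidate for new keywords, since the algorithm repeatedly failed to assign it.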

Performance & Integration

Speed

Classification runs after article collection and before sentiment analysis. Processing 500 articles takes less than 60 seconds, which is fast enough for near-real-time analysis.
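
The figure above works out to an average budget of 60 / 500 = 0.12 seconds per article. A batch can be checked against that budget with a small timing harness like the one below; within_budget is a hypothetical helper, and classify_fn stands in for whatever classification function is in use.

```python
import time
from typing import Callable, List, Sequence, Tuple


def within_budget(classify_fn: Callable[[str], str],
                  articles: Sequence[str],
                  budget_seconds: float = 60.0) -> Tuple[List[str], float, bool]:
    """Classify a batch and report whether it finished within the time budget."""
    start = time.perf_counter()
    labels = [classify_fn(article) for article in articles]
    elapsed = time.perf_counter() - start
    return labels, elapsed, elapsed <= budget_seconds
```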

Integration with Other Systems

Topic labels feed downstream:

Questions?

For questions about topic classification, contact admin@tagtaly.com.
