
BBC Latest News Dataset 2021

Source: bbc.co.uk · Format: json

CrawlFeeds is not affiliated with, endorsed by, or sponsored by the BBC. This dataset is independently collected from publicly available pages on bbc.co.uk. "BBC" is a registered trademark used here solely to describe the source of the data.

Records: 1.17 million

Format: JSON

Last collected: not specified

Description

This dataset contains more than 1 million BBC news articles, with all data points present on each article page extracted. The articles were first collected in 2021 and cover all news categories on the BBC site.

This news dataset is well suited to text classification, identifying popular categories, NLP, and other research purposes.

The dataset is available in JSON format.
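The page gives the format as JSON but does not say how records are laid out in the file. The sketch below is a minimal loader that assumes either a single JSON array of article objects or JSON Lines (one object per line). The function names and the file name in the usage line are hypothetical.

```python
import json
from typing import Any, Iterator


def iter_records(text: str) -> Iterator[dict[str, Any]]:
    """Yield article records from a JSON array or a JSON Lines string.

    Assumption: the export is either one JSON array of objects or one JSON
    object per line; the actual file layout is not documented.
    """
    stripped = text.lstrip()
    if stripped.startswith("["):
        # Whole-file JSON array: parse once, then yield each object.
        yield from json.loads(stripped)
    else:
        # JSON Lines: parse each non-blank line independently.
        for line in text.splitlines():
            if line.strip():
                yield json.loads(line)


def load_records(path: str) -> list[dict[str, Any]]:
    """Read a (hypothetical) local export file and return all records."""
    with open(path, encoding="utf-8") as fh:
        return list(iter_records(fh.read()))
```

For example, `records = load_records("bbc_news_2021.json")` with a hypothetical local file. At roughly 1.17 million articles, reading the whole file into memory may be heavy; if the export turns out to be JSON Lines, iterating over the open file line by line keeps memory use low.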

Data fields

category

BBC news category

crawled_at

Timestamp when the article was collected
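Finding popular categories can start with a simple tally of the `category` field. A minimal sketch using `collections.Counter`, assuming each record is a dict as produced by a JSON loader; `top_categories` is a hypothetical helper name, and records with a missing or empty category are grouped under a placeholder label.

```python
from collections import Counter
from typing import Any, Iterable


def top_categories(records: Iterable[dict[str, Any]], n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent values of the `category` field.

    Records with a missing or empty category are counted as "unknown"
    (a placeholder label chosen for this sketch).
    """
    counts = Counter((rec.get("category") or "unknown") for rec in records)
    return counts.most_common(n)
```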

Use cases

Media coverage analysis

Analyze how news topics evolve across categories and regions over time.

News recommendation systems

Build personalized news recommendation engines using article metadata and categories.

Topic modeling

Discover emerging themes and recurring discussions across more than a million news articles.

Named entity recognition

Extract people, organizations, locations, and events from large-scale news content.

News summarization

Train AI models to generate concise summaries from long-form news articles.

Media bias and framing research

Compare reporting patterns across categories and analyze narrative trends.

News search applications

Build semantic search engines using article content, tags, and metadata.

Historical news archive analysis

Study major events and reporting trends throughout 2021.

Language model training

Use high-quality editorial content for NLP, LLM fine-tuning, and text understanding tasks.

Knowledge graph construction

Create structured relationships between entities, events, categories, and locations.

Frequently asked questions

Which BBC news sections are included in the dataset?

The dataset covers all major news categories available on the BBC website during the 2021 collection period.

Does the dataset include the complete article text?

Yes. Both cleaned content and the original raw extracted content are included, allowing different processing approaches.

Can I analyze news trends throughout 2021?

Yes. Publication dates enable chronological analysis of events, topics, and reporting patterns during 2021.
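Chronological analysis usually begins with article counts per month. A minimal sketch, assuming timestamps are ISO 8601 strings (for example `2021-03-14T09:30:00`), which the page does not confirm. It defaults to `crawled_at`, the only date field documented above, which records collection time rather than publication time. The name of the publication-date field is not documented here, so pass it through the `field` parameter after checking the actual records; `monthly_counts` is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime
from typing import Any, Iterable


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp (assumed format), accepting a trailing 'Z'."""
    if value.endswith("Z"):
        # Older Python versions of fromisoformat reject the 'Z' suffix.
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def monthly_counts(records: Iterable[dict[str, Any]], field: str = "crawled_at") -> dict[str, int]:
    """Count records per calendar month ("YYYY-MM") using the given date field.

    Records where the field is missing, not a string, or unparseable are skipped.
    """
    counts: Counter = Counter()
    for rec in records:
        raw = rec.get(field)
        if not isinstance(raw, str) or not raw:
            continue
        try:
            month = parse_timestamp(raw).strftime("%Y-%m")
        except ValueError:
            continue
        counts[month] += 1
    # Sort chronologically; "YYYY-MM" strings sort in date order.
    return dict(sorted(counts.items()))
```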

What is the difference between raw_content and content?

raw_content holds the original text as extracted from the article page, while content holds a cleaned version of that text. Users can choose whichever format best suits their analysis.
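A minimal sketch for choosing between the two text fields and pairing the result with `category` to build text-classification examples. It assumes `content` and `raw_content` are plain strings; the exact cleaning applied to `content` is not documented, and `article_text` and `classification_pairs` are hypothetical helper names.

```python
from typing import Any, Iterable


def article_text(rec: dict[str, Any], prefer_clean: bool = True) -> str:
    """Return `content`, falling back to `raw_content` (or the reverse).

    Assumption: both fields, when present, are plain strings. Set
    prefer_clean=False to work with the text exactly as extracted.
    """
    order = ("content", "raw_content") if prefer_clean else ("raw_content", "content")
    for key in order:
        value = rec.get(key)
        if isinstance(value, str) and value.strip():
            return value
    return ""


def classification_pairs(records: Iterable[dict[str, Any]]) -> list[tuple[str, str]]:
    """Build (text, category) pairs, skipping records missing either part."""
    pairs = []
    for rec in records:
        text = article_text(rec)
        label = rec.get("category")
        if text and label:
            pairs.append((text, label))
    return pairs
```

Falling back to the other field, rather than dropping the record, keeps articles whose preferred field happens to be empty; whether that is desirable depends on how different the two versions are in practice.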

Is the dataset useful for journalism and media research?

Yes. Researchers can examine reporting patterns, category distribution, regional coverage, and editorial trends across more than one million articles.
