BBC Latest News Dataset 2021

Source: bbc.co.uk · Format: JSON

CrawlFeeds is not affiliated with, endorsed by, or sponsored by the BBC. This dataset is independently collected from publicly available pages on bbc.co.uk. "BBC" is a registered trademark used here solely to describe the source of the data.

Records: 1.17 million
Fields: 13
Format: JSON
Last collected: not specified

Description

This dataset contains more than 1 million BBC news articles, with all the data points present on each article page extracted. The articles were first collected in 2021 and cover all the categories present on the BBC site.

This news dataset is well suited to text classification, finding popular categories, NLP, and other research purposes.

The dataset is available in JSON format.

Data fields

tags: Keywords and topics associated with the article
title: News article headline
news_post_date: Date the article was published
raw_content: Original extracted article content
content: Cleaned article text for analysis
url: Original BBC news article URL
author: Article author or contributor
language: Language of the article
id: Unique article identifier
region: Geographic region related to the article
short_description: Brief summary of the news article
category: BBC news category
crawled_at: Timestamp when the article was collected
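
As a quick sanity check on a downloaded file, the 13 field names above can be compared against each record. The sketch below is illustrative only: it assumes a JSON Lines layout (one JSON object per line), which this page does not confirm, and the helper names `iter_records` and `missing_fields` and the sample values are hypothetical.

```python
import json

# The 13 field names listed in the "Data fields" section.
EXPECTED_FIELDS = {
    "tags", "title", "news_post_date", "raw_content", "content", "url",
    "author", "language", "id", "region", "short_description",
    "category", "crawled_at",
}


def iter_records(lines):
    """Yield one dict per non-empty line (ASSUMES a JSON Lines layout)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def missing_fields(record):
    """Return the expected field names that are absent from a record."""
    return EXPECTED_FIELDS - set(record)


# Hypothetical sample lines; the values are invented for illustration.
sample_lines = [
    json.dumps({name: "" for name in EXPECTED_FIELDS}),
    json.dumps({"id": "example-1", "title": "Example headline"}),
]

records = list(iter_records(sample_lines))
report = [sorted(missing_fields(r)) for r in records]

for number, missing in enumerate(report, start=1):
    status = "complete" if not missing else "missing: " + ", ".join(missing)
    print(f"record {number}: {status}")
```

If the file turns out to be a single JSON array instead, `json.load` on the open file would replace `iter_records`; the field check itself stays the same.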

Use cases

Media coverage analysis

Analyze how news topics evolve across categories and regions over time.

News recommendation systems

Build personalized news recommendation engines using article metadata and categories.

Topic modeling

Discover emerging themes and recurring discussions across millions of news articles.

Named entity recognition

Extract people, organizations, locations, and events from large-scale news content.

News summarization

Train AI models to generate concise summaries from long-form news articles.

Media bias and framing research

Compare reporting patterns across categories and analyze narrative trends.

News search applications

Build semantic search engines using article content, tags, and metadata.

Historical news archive analysis

Study major events and reporting trends throughout 2021.

Language model training

Use high-quality editorial content for NLP, LLM fine-tuning, and text understanding tasks.

Knowledge graph construction

Create structured relationships between entities, events, categories, and locations.

Frequently asked questions

Which BBC news sections are included in the dataset?

The dataset covers all major news categories available on the BBC website during the 2021 collection period.

Does the dataset include the complete article text?

Yes. Both cleaned content and the original raw extracted content are included, allowing different processing approaches.

Can I analyze news trends throughout 2021?

Yes. Publication dates enable chronological analysis of events, topics, and reporting patterns during 2021.
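
A minimal sketch of such a chronological breakdown is shown here, counting articles per month and category. It assumes `news_post_date` holds an ISO 8601 date or datetime string, which this page does not specify; the function names, category labels, and sample records are hypothetical.

```python
from collections import Counter
from datetime import datetime


def month_key(date_text):
    """Return 'YYYY-MM' for an ISO 8601 date or datetime string."""
    return datetime.fromisoformat(date_text).strftime("%Y-%m")


def articles_per_month(records):
    """Count records by (month, category), skipping unparseable dates."""
    counts = Counter()
    for record in records:
        date_text = record.get("news_post_date")
        if not date_text:
            continue
        try:
            month = month_key(date_text)
        except ValueError:
            # ASSUMPTION: dates are ISO 8601; other formats are skipped here.
            continue
        counts[(month, record.get("category") or "unknown")] += 1
    return counts


# Hypothetical records; dates and categories are invented for illustration.
sample = [
    {"news_post_date": "2021-03-14", "category": "Business"},
    {"news_post_date": "2021-03-20T09:30:00", "category": "Business"},
    {"news_post_date": "2021-04-01", "category": "Technology"},
    {"news_post_date": "not a date", "category": "Technology"},
]

counts = articles_per_month(sample)
for (month, category), total in sorted(counts.items()):
    print(month, category, total)
```

Records with dates in another format are silently skipped in this sketch, so it is worth inspecting a few real values from the free sample before relying on the counts.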

What is the difference between raw_content and content?

The dataset provides both the original extracted article text and a cleaned version, allowing users to choose the format that best suits their analysis.

Is the dataset useful for journalism and media research?

Yes. Researchers can examine reporting patterns, category distribution, regional coverage, and editorial trends across more than one million articles.

$370.00
One-time payment · instant delivery
First-time buyer: 20% off applied automatically

Dataset highlights

1.17M+ BBC news articles collected throughout 2021
Coverage across all major BBC news categories
Includes both cleaned article content and raw extracted content
Author, publication date, region, and language metadata included
Rich article metadata for news classification and content analysis
Ideal for journalism research, NLP, and AI training projects
Structured JSON format for scalable data processing
Supports media intelligence, trend analysis, and news aggregation applications

Our services

Data at any scale, any source

Pre-built datasets, custom scraping, specialist feeds, and image extraction, all from one team.

From $175
Web data collection

Custom scraping for any website: fields, volume, frequency, and format to your spec. Captchas, proxies, and infrastructure fully managed.

  • Any website or domain
  • One-time or recurring
  • CSV, JSON, Parquet
  • Scraper maintenance included
From $225
ImageHub: image extraction

Bulk image downloads with custom folder structures and updated file paths in your records. Delivered via Google Drive or your dashboard.

  • Bulk image downloads
  • Custom folder structure
  • File paths updated in CSV
  • Google Drive delivery
Standard scraping: $175+
Large-scale & enterprise: Custom
Image extraction: $225+

Why CrawlFeeds

Everything you need, none of the overhead

No scrapers to build. No proxies to manage. No infrastructure to maintain.

Instant delivery

Pre-built datasets download immediately after purchase. Custom projects scoped and delivered in days, not weeks.

Free sample first

Every dataset comes with a free sample download. Verify quality, structure, and field coverage before committing.

Fully custom

Need different fields, more volume, or a different source? We scope and build to your exact specification.

Recurring feeds

Weekly, monthly, or quarterly refresh. Delta or full-refresh delivery. Keeps your pipeline current without rebuilding.

Dedicated support

Live chat support on every order. Enterprise projects get a dedicated account manager, SLA, and NDA options.

Legal & compliant

Publicly available data only. No IP violation, no ToS grey areas. Clear sourcing framing on every dataset.

Customer stories

Trusted by data teams worldwide

★★★★★

"Saved us 100+ hours of manual data collection. Data quality is excellent and delivery was instant. Would recommend to any team that needs structured web data fast."

Sarah M.
Data Analyst, TechCorp

★★★★★

"The free sample let me verify quality before purchasing. Much cheaper than hiring a developer to build a scraper, and the data came clean and ready to use."

John D.
Marketing Manager, GrowthLabs

★★★★★

"Fresh data, reliable delivery, and a team that actually responds. The first-time buyer discount was a nice bonus. We've now used CrawlFeeds across three projects."

Emily R.
Founder, DataInsights

Ready to get your data?

Browse 500+ ready datasets or submit a custom request, and we'll scope and deliver to your exact requirements.