Long-Form Article & Blog Content Dataset — 10M Records

Description

This dataset contains over 10 million articles from Medium.com, one of the web's most prominent publishing platforms. Each record carries article metadata: title, subtitle, raw content (HTML markup), cleaned content (plain text), author details, and publication timestamps. Engagement metrics include total claps and comments count, alongside reading time, language, tags, and word count. The data suits content strategy, market research, NLP applications, and trend analysis, for example identifying emerging topics, analyzing reader engagement, and gathering competitive intelligence.

Highlights

10,086,154 Medium article records.
Covers Medium.com articles with 24 data points per record.
Freshly scraped Medium article content.
Includes full content, claps, comments, tags.

Sample Data

Preview of available data:

Url	Tags	Image	Title	Author	Source	Content	Is Free	Post Id	Uniq Id	Version	Language	Sub Title	Author Url	Created At	Scraped At	Modified At	Raw Content	Total Claps	Words Count	Imported At	Published At	Reading Time	Comments Count
https://0-o.medium.com/meri...	Funny,Funny Story,Redmond,Q...	https://miro.medium.com/max...	Meritocracy	0-o	medium.com	Meritocracy\n\nMy feet kick...	\N	38c4ee5102a5	d98acb22-061c-5180-8805-95b...	1772643118844	en	Within minutes his smile ar...	https://medium.com/@0-o	2020-11-17 03:07:15.302000000	15/08/2025 22:15:02	2021-12-16 15:50:40.642000000	<section>\n<p>My feet kicke...	43	\N	2026-03-04 16:51:58	2020-11-17 03:22:52.171000000	3 min	0
https://000kmmolloy.medium....	About Me,10 Things About Me...	https://miro.medium.com/max...	About me — KmMolloy	KmMolloy	medium.com	About me — KmMolloy\n\nwrit...	\N	4e62384ca694	33114930-b808-5c98-9f35-666...	1772654574271	en	write — read — mentor write...	https://medium.com/@000kmmo...	2023-01-01 17:35:25.359000000	13/08/2025 00:36:20	2024-01-22 17:44:29.003000000	<section>\n<h3>write — read...	181	\N	2026-03-04 20:02:54	2023-01-02 14:27:03.493000000	1 min	5
https://001-dharmendra.medi...	Bqml,Bigquery Ml,Python	\N	Unleashing Machine Learning...	dharmendra mishra	medium.com	Unleashing Machine Learning...	\N	3ea7fe8cd765	76d203db-0af2-59d9-a14a-59c...	1772655248072	en	BigQuery, a fully managed d...	https://medium.com/@001-dha...	2023-10-25 14:09:28.970000000	11/08/2025 17:03:19	2023-10-25 14:30:05.039000000	<section>\n<p>BigQuery, a f...	2	\N	2026-03-04 20:14:08	2023-10-25 14:30:04.113000000	2 min	0
https://00110011.medium.com...	Bootcamp Experience,Mental ...	\N	A goodbye to my old self…	syd	medium.com	A goodbye to my old self…\n...	\N	4260cd5506a3	f49dbfbd-0f8a-5384-987b-660...	1772649000702	en	So it is quite a humbling e...	https://medium.com/@00110011	2021-01-29 18:02:30.436000000	19/08/2025 01:46:02	2021-12-29 00:25:36.063000000	<section>\n<p>So it is quit...	66	\N	2026-03-04 18:30:00	2021-01-29 19:40:32.521000000	3 min	1
https://00110011.medium.com...	JavaScript,Promises,Fetch,C...	https://miro.medium.com/max...	Do you promise to use async...	syd	medium.com	Do you promise to use async...	\N	81c8a006f91c	d0013c42-c88f-5088-a041-dca...	1772655126561	en	This week in Flatiron bootc...	https://medium.com/@00110011	2021-02-28 02:37:54.662000000	24/08/2025 21:49:05	2022-01-01 11:59:20.509000000	<section>\n<p>This week in ...	11	\N	2026-03-04 20:12:06	2021-02-28 04:36:29.447000000	3 min	0
https://00110011.medium.com...	Typescript,JavaScript,Front...	https://miro.medium.com/max...	Introducing TypeScript…	syd	medium.com	Introducing TypeScript…\n\n...	\N	c3d34f64fc02	c0a53e98-e81f-53ad-aaae-40f...	1772654667241	en	If you are a ride or die Ja...	https://medium.com/@00110011	2021-07-03 02:20:03.287000000	22/08/2025 16:04:17	2022-01-06 08:09:52.723000000	<section>\n<p>If you are a ...	3	\N	2026-03-04 20:04:27	2021-07-03 03:40:21.513000000	3 min	0
https://00110011.medium.com...	React,Props,Flatiron School...	https://miro.medium.com/max...	Props to you, React!	syd	medium.com	Props to you, React!\n\nIn ...	\N	2293ea82e9b5	8cabd6e8-7c97-5621-81e5-cd1...	1774551069655	en	In another attempt to teach...	https://medium.com/@00110011	2021-03-15 22:30:23.220000000	27/08/2025 12:59:09	2022-01-07 15:04:04.903000000	<section>\n<p>In another at...	100	\N	2026-03-26 18:51:09	2021-03-16 03:32:38.338000000	3 min	0
https://00110011.medium.com...	Hooks,React,React Hook,Reac...	https://miro.medium.com/max...	Redux and Hooks… friends or...	syd	medium.com	Redux and Hooks… friends or...	\N	7b17975e320a	13658140-ffae-53fd-9123-075...	1772654978450	en	While learning React and Re...	https://medium.com/@00110011	2021-06-08 01:31:42.797000000	20/08/2025 22:11:13	2022-01-06 09:56:37.966000000	<section>\n<p>While learnin...	153	\N	2026-03-04 20:09:38	2021-06-26 17:36:08.059000000	3 min	0
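
The preview above shows a few conventions worth handling when loading the data: missing values appear as a literal `\N`, most timestamps use `YYYY-MM-DD HH:MM:SS` with nanosecond fractions while `Scraped At` uses `DD/MM/YYYY HH:MM:SS`, and `Reading Time` is a string such as `3 min`. The following is a minimal sketch of one way to normalize such rows, assuming a tab-separated export whose header matches the preview; the delivered files may use a different delimiter or column naming, and the helper names (`parse_timestamp`, `clean_row`, `read_records`) are hypothetical.

```python
import csv
import io
from datetime import datetime

NULL_MARKER = r"\N"  # literal backslash-N marks a missing value in the preview


def parse_timestamp(value):
    """Parse either timestamp layout seen in the preview; None stays None."""
    if value is None:
        return None
    if "/" in value:
        # e.g. "15/08/2025 22:15:02" (day/month/year, as in Scraped At)
        return datetime.strptime(value, "%d/%m/%Y %H:%M:%S")
    if "." in value:
        # e.g. "2020-11-17 03:07:15.302000000": trim nanoseconds to microseconds
        head, frac = value.split(".", 1)
        return datetime.strptime(f"{head}.{frac[:6]}", "%Y-%m-%d %H:%M:%S.%f")
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


def clean_row(row):
    """Convert one raw row (dict of strings) into typed Python values."""
    rec = {k: (None if v == NULL_MARKER else v) for k, v in row.items()}
    for col in ("Created At", "Published At", "Modified At", "Scraped At", "Imported At"):
        rec[col] = parse_timestamp(rec.get(col))
    for col in ("Total Claps", "Comments Count", "Words Count"):
        rec[col] = int(rec[col]) if rec.get(col) is not None else None
    # "3 min" -> 3
    reading = rec.get("Reading Time")
    rec["Reading Time"] = int(reading.split()[0]) if reading else None
    # "Funny,Funny Story" -> ["Funny", "Funny Story"]
    rec["Tags"] = rec["Tags"].split(",") if rec.get("Tags") else []
    return rec


def read_records(handle):
    """Yield cleaned records from a tab-separated file-like object (assumed layout)."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield clean_row(row)


# Usage with a shortened row taken from the preview (only a subset of columns).
sample = (
    "Title\tTags\tCreated At\tScraped At\tTotal Claps\tWords Count\tReading Time\n"
    "Meritocracy\tFunny,Funny Story\t2020-11-17 03:07:15.302000000\t"
    "15/08/2025 22:15:02\t43\t\\N\t3 min\n"
)
records = list(read_records(io.StringIO(sample)))
print(records[0]["Scraped At"], records[0]["Total Claps"], records[0]["Words Count"])
```

Columns that are absent from a given file simply come back as `None`, so the same function can be used on partial exports.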

Data Fields

This dataset includes the following data points, among others (the sample preview shows all 24 columns):

Url

Source

Title

Sub Title

Author

Author Url

Is Free

Post Id

Image

Reading Time

Created At

Published At

Modified At

Comments Count

Total Claps

Language

Why This Data

This Medium articles dataset supports market intelligence and competitive research based on what authors publish and how readers respond. It is well suited to:

Market Research: Understand market trends and customer preferences as reflected in article topics and engagement
Competitive Analysis: Compare how pricing, products, and strategies are discussed across articles
Business Intelligence: Make data-driven decisions
Price Monitoring: Track how articles discuss price changes to inform your own pricing

Use Cases

This dataset is perfect for various applications:

Consumer Demand & Pricing Insights: Analyze article content and engagement (claps, comments) to identify trending product features, emerging consumer needs, and sentiment towards product categories for optimizing e-commerce pricing and offerings.

Industry & Competitor Intelligence: Monitor article content, tags, and author activity to identify emerging industry trends, track competitor strategies, and gauge public perception of brands and technologies.

Domain-Specific NLP Model Training: Utilize the vast Raw Content, Tags, and Comments Count as a rich dataset to train and validate custom AI/ML models for advanced sentiment analysis, topic modeling, and entity extraction.

Future Product Feature & Innovation Discovery: Analyze Tags, Title, and highly engaged articles (claps, comments) over time to identify discussions around unmet user needs, innovative concepts, and future technology shifts guiding product roadmaps.

High-Impact Content Strategy Optimization: Identify successful topics, keywords, and content formats by analyzing top-performing articles based on Total Claps, Comments Count, Tags, and Reading Time to inform a data-driven SEO and content marketing plan.
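
As a rough illustration of the content-strategy use case, the sketch below ranks tags by average claps per article. It assumes records have already been parsed into dictionaries with a `Tags` list and an integer `Total Claps` (those key names follow the sample preview header; the function name `top_tags_by_claps` and its parameters are hypothetical). The clap values in the usage example are taken from the preview rows, with tag lists shortened to the tags that are fully visible there.

```python
from collections import defaultdict


def top_tags_by_claps(records, min_articles=1, limit=10):
    """Return (tag, average claps, article count) tuples, highest average first."""
    totals = defaultdict(lambda: [0, 0])  # tag -> [sum of claps, number of articles]
    for rec in records:
        claps = rec.get("Total Claps")
        if claps is None:
            continue  # skip records with no clap count
        for tag in rec.get("Tags", []):
            totals[tag][0] += claps
            totals[tag][1] += 1
    ranked = [
        (tag, clap_sum / count, count)
        for tag, (clap_sum, count) in totals.items()
        if count >= min_articles
    ]
    # Sort by average claps (descending), then tag name for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


articles = [
    {"Tags": ["React", "Props"], "Total Claps": 100},
    {"Tags": ["Hooks", "React"], "Total Claps": 153},
    {"Tags": ["JavaScript", "Promises"], "Total Claps": 11},
    {"Tags": ["Typescript", "JavaScript"], "Total Claps": 3},
]
ranking = top_tags_by_claps(articles, min_articles=2)
print(ranking)
```

Requiring a minimum number of articles per tag (`min_articles`) keeps a single viral post from dominating the ranking; the same pattern could be extended to `Comments Count` or `Reading Time`.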

Get Access to This Dataset

Start using this dataset today. Available in CSV, JSON, and Excel formats with flexible access options.


Frequently Asked Questions

How frequently is the Medium article data updated, and how fresh is the information?

Each record carries a `scraped_at` timestamp showing when that article was last collected. Exact update cycles can vary, so use this field to judge the freshness of individual articles.

In what format is the Medium articles dataset provided, and what are the typical delivery methods?

The data is typically available in structured formats like JSON or CSV, suitable for programmatic access or bulk downloads. Specific delivery methods depend on the provider's integration options and user requirements.

Are there options to customize the Medium article data, such as selecting specific fields or requesting additional information?

Yes, you can usually customize data requests by selecting from the available fields, such as `title`, `author`, `tags`, `reading_time`, and both `raw_content` and `content`, to match your specific needs.

What are the primary use cases for this Medium articles dataset, and who would benefit most from it?

This dataset is suited to content analysis, trend tracking, competitive research, and author performance evaluation. Researchers, content strategists, data scientists, and marketers can all make use of the article metadata and content.

Can I get a sample of the Medium articles dataset before making a commitment, and what kind of support is available?

Yes, samples are typically available to help users evaluate the data quality and structure prior to a full subscription or purchase. Support channels often include documentation, email, or dedicated technical assistance.

Does this dataset cover all articles published on Medium.com, and are there any language restrictions?

The dataset includes over 10 million records from `medium.com`, each with a `language` data point, which suggests coverage is not restricted to a single language. However, details on completeness (for example, whether every article ever published is included) may require further inquiry.

How can I filter or search through the 10 million+ Medium articles in the dataset effectively?

You can filter the dataset on data points such as `author`, `tags`, `language`, `published_at` date ranges, `total_claps`, or `reading_time` to narrow down and analyze specific subsets of articles.
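
The kind of filtering this FAQ refers to (by `language`, `tags`, `total_claps`, and a `published_at` date range) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: records are taken to be dictionaries keyed by the snake_case field names used in the FAQ, with `tags` already split into a list, `total_claps` as an integer, and `published_at` as a `datetime`. The function name `filter_articles` and its parameters are hypothetical, and the actual key names in a delivered file may differ.

```python
from datetime import datetime


def filter_articles(records, language=None, tag=None, min_claps=None,
                    published_from=None, published_to=None):
    """Yield records matching every filter that is not None."""
    for rec in records:
        if language is not None and rec.get("language") != language:
            continue
        if tag is not None and tag not in rec.get("tags", []):
            continue
        if min_claps is not None:
            claps = rec.get("total_claps")
            if claps is None or claps < min_claps:
                continue
        if published_from is not None or published_to is not None:
            published = rec.get("published_at")
            if published is None:
                continue  # cannot place an undated record in a date range
            if published_from is not None and published < published_from:
                continue
            if published_to is not None and published > published_to:
                continue
        yield rec


# Values below are taken from the sample preview (tag lists shortened).
articles = [
    {"post_id": "38c4ee5102a5", "language": "en", "tags": ["Funny", "Funny Story"],
     "total_claps": 43, "published_at": datetime(2020, 11, 17, 3, 22, 52)},
    {"post_id": "2293ea82e9b5", "language": "en", "tags": ["React", "Props"],
     "total_claps": 100, "published_at": datetime(2021, 3, 16, 3, 32, 38)},
    {"post_id": "c3d34f64fc02", "language": "en", "tags": ["Typescript", "JavaScript"],
     "total_claps": 3, "published_at": datetime(2021, 7, 3, 3, 40, 21)},
]

popular_react = [r["post_id"] for r in filter_articles(articles, tag="React", min_claps=50)]
from_2021 = [
    r["post_id"]
    for r in filter_articles(
        articles,
        published_from=datetime(2021, 1, 1),
        published_to=datetime(2021, 12, 31, 23, 59, 59),
    )
]
print(popular_react, from_2021)
```

Because the function is a generator, it can be chained over a streaming reader without loading all 10 million+ records into memory at once.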
