Description

This dataset contains over 10 million articles from Medium.com, one of the web's most prominent publishing platforms. Each record carries core metadata: article title, subtitle, raw content, cleaned content, author details, and publication timestamps. Engagement metrics (total claps and comments count) sit alongside reading time, language, tags, and word count. The data suits content strategy, market research, NLP applications, and trend analysis: use it to monitor digital narratives, analyze reader engagement, identify emerging topics, and gather competitive intelligence.

Highlights

  • 10,086,154 Medium article records.

  • 24 data points per article, all sourced from Medium.com.

  • Freshly scraped Medium article content.

  • Includes full content, claps, comments, tags.

Sample Data

Preview of available data:

Url | Tags | Image | Title | Author | Source | Content | Is Free | Post Id | Uniq Id | Version | Language | Sub Title | Author Url | Created At | Scraped At | Modified At | Raw Content | Total Claps | Words Count | Imported At | Published At | Reading Time | Comments Count
https://0-o.medium.com/meri... | Funny,Funny Story,Redmond,Q... | https://miro.medium.com/max... | Meritocracy | 0-o | medium.com | Meritocracy\n\nMy feet kick... | \N | 38c4ee5102a5 | d98acb22-061c-5180-8805-95b... | 1772643118844 | en | Within minutes his smile ar... | https://medium.com/@0-o | 2020-11-17 03:07:15.302000000 | 15/08/2025 22:15:02 | 2021-12-16 15:50:40.642000000 | <section>\n<p>My feet kicke... | 43 | \N | 2026-03-04 16:51:58 | 2020-11-17 03:22:52.171000000 | 3 min | 0
https://000kmmolloy.medium.... | About Me,10 Things About Me... | https://miro.medium.com/max... | About me — KmMolloy | KmMolloy | medium.com | About me — KmMolloy\n\nwrit... | \N | 4e62384ca694 | 33114930-b808-5c98-9f35-666... | 1772654574271 | en | write — read — mentor write... | https://medium.com/@000kmmo... | 2023-01-01 17:35:25.359000000 | 13/08/2025 00:36:20 | 2024-01-22 17:44:29.003000000 | <section>\n<h3>write — read... | 181 | \N | 2026-03-04 20:02:54 | 2023-01-02 14:27:03.493000000 | 1 min | 5
https://001-dharmendra.medi... | Bqml,Bigquery Ml,Python | \N | Unleashing Machine Learning... | dharmendra mishra | medium.com | Unleashing Machine Learning... | \N | 3ea7fe8cd765 | 76d203db-0af2-59d9-a14a-59c... | 1772655248072 | en | BigQuery, a fully managed d... | https://medium.com/@001-dha... | 2023-10-25 14:09:28.970000000 | 11/08/2025 17:03:19 | 2023-10-25 14:30:05.039000000 | <section>\n<p>BigQuery, a f... | 2 | \N | 2026-03-04 20:14:08 | 2023-10-25 14:30:04.113000000 | 2 min | 0
https://00110011.medium.com... | Bootcamp Experience,Mental ... | \N | A goodbye to my old self… | syd | medium.com | A goodbye to my old self…\n... | \N | 4260cd5506a3 | f49dbfbd-0f8a-5384-987b-660... | 1772649000702 | en | So it is quite a humbling e... | https://medium.com/@00110011 | 2021-01-29 18:02:30.436000000 | 19/08/2025 01:46:02 | 2021-12-29 00:25:36.063000000 | <section>\n<p>So it is quit... | 66 | \N | 2026-03-04 18:30:00 | 2021-01-29 19:40:32.521000000 | 3 min | 1
https://00110011.medium.com... | JavaScript,Promises,Fetch,C... | https://miro.medium.com/max... | Do you promise to use async... | syd | medium.com | Do you promise to use async... | \N | 81c8a006f91c | d0013c42-c88f-5088-a041-dca... | 1772655126561 | en | This week in Flatiron bootc... | https://medium.com/@00110011 | 2021-02-28 02:37:54.662000000 | 24/08/2025 21:49:05 | 2022-01-01 11:59:20.509000000 | <section>\n<p>This week in ... | 11 | \N | 2026-03-04 20:12:06 | 2021-02-28 04:36:29.447000000 | 3 min | 0
https://00110011.medium.com... | Typescript,JavaScript,Front... | https://miro.medium.com/max... | Introducing TypeScript… | syd | medium.com | Introducing TypeScript…\n\n... | \N | c3d34f64fc02 | c0a53e98-e81f-53ad-aaae-40f... | 1772654667241 | en | If you are a ride or die Ja... | https://medium.com/@00110011 | 2021-07-03 02:20:03.287000000 | 22/08/2025 16:04:17 | 2022-01-06 08:09:52.723000000 | <section>\n<p>If you are a ... | 3 | \N | 2026-03-04 20:04:27 | 2021-07-03 03:40:21.513000000 | 3 min | 0
https://00110011.medium.com... | React,Props,Flatiron School... | https://miro.medium.com/max... | Props to you, React! | syd | medium.com | Props to you, React!\n\nIn ... | \N | 2293ea82e9b5 | 8cabd6e8-7c97-5621-81e5-cd1... | 1774551069655 | en | In another attempt to teach... | https://medium.com/@00110011 | 2021-03-15 22:30:23.220000000 | 27/08/2025 12:59:09 | 2022-01-07 15:04:04.903000000 | <section>\n<p>In another at... | 100 | \N | 2026-03-26 18:51:09 | 2021-03-16 03:32:38.338000000 | 3 min | 0
https://00110011.medium.com... | Hooks,React,React Hook,Reac... | https://miro.medium.com/max... | Redux and Hooks… friends or... | syd | medium.com | Redux and Hooks… friends or... | \N | 7b17975e320a | 13658140-ffae-53fd-9123-075... | 1772654978450 | en | While learning React and Re... | https://medium.com/@00110011 | 2021-06-08 01:31:42.797000000 | 20/08/2025 22:11:13 | 2022-01-06 09:56:37.966000000 | <section>\n<p>While learnin... | 153 | \N | 2026-03-04 20:09:38 | 2021-06-26 17:36:08.059000000 | 3 min | 0
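The preview rows mix several value conventions: `\N` appears where a value is missing, Scraped At is written day-first (`15/08/2025 22:15:02`) while the other timestamps are year-first with nine fractional digits, Tags is a comma-separated string, and Reading Time reads like `3 min`. The sketch below shows one way to turn a row into typed Python values. It is a minimal illustration based only on the preview: the constant and function names are hypothetical, the Title Case keys are assumed to match the delivered column names, and the full dataset may contain layouts not seen here. Time zones are not stated in the preview, so the parsed datetimes are left naive.

```python
from datetime import datetime

NULL_MARKER = r"\N"  # null placeholder as it appears in the preview rows

# Timestamp layouts seen in the preview; other layouts may exist in the full data.
TIMESTAMP_FORMATS = (
    "%Y-%m-%d %H:%M:%S.%f",  # Created At, Modified At, Published At
    "%Y-%m-%d %H:%M:%S",     # Imported At
    "%d/%m/%Y %H:%M:%S",     # Scraped At (day first)
)
TIMESTAMP_FIELDS = ("Created At", "Published At", "Modified At", "Scraped At", "Imported At")
INTEGER_FIELDS = ("Total Claps", "Comments Count", "Words Count")


def parse_timestamp(value):
    """Parse one of the observed timestamp layouts into a naive datetime."""
    if "." in value:
        # The preview shows nine fractional digits; strptime's %f takes at most six.
        head, fraction = value.split(".", 1)
        value = f"{head}.{fraction[:6]}"
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {value!r}")


def normalize_record(row):
    """Return a copy of `row` with nulls, timestamps, integers, tags, and reading time typed."""
    record = {key: (None if value == NULL_MARKER else value) for key, value in row.items()}
    for field in TIMESTAMP_FIELDS:
        if record.get(field) is not None:
            record[field] = parse_timestamp(record[field])
    for field in INTEGER_FIELDS:
        if record.get(field) is not None:
            record[field] = int(record[field])
    if record.get("Tags") is not None:
        record["Tags"] = [tag.strip() for tag in record["Tags"].split(",") if tag.strip()]
    if record.get("Reading Time") is not None:
        # "3 min" -> 3; assumes the value always starts with a whole number of minutes.
        record["Reading Time"] = int(record["Reading Time"].split()[0])
    return record
```

Normalizing once at load time keeps later filtering and aggregation free of string comparisons against `\N` and of per-query date parsing.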

Data Fields

This dataset includes the following data points:

Url
Source
Title
Sub Title
Author
Author Url
Is Free
Post Id
Image
Reading Time
Created At
Published At
Modified At
Comments Count
Total Claps
Language
Tags
Raw Content
Content
Uniq Id
Scraped At
Words Count
Version
Imported At

Why This Data

This Medium articles dataset supports market intelligence and competitive research based on published content and reader engagement. It is well suited to:

  • Market Research: Understand topic trends and reader preferences
  • Competitive Analysis: Compare topics, publishing activity, and engagement across authors and brands
  • Business Intelligence: Make data-driven content and product decisions
  • Engagement Monitoring: Track claps and comments over time to see which subjects gain or lose attention

Use Cases

This dataset is perfect for various applications:

Consumer Demand & Pricing Insights: Analyze article content and engagement (claps, comments) to identify trending product features, emerging consumer needs, and sentiment towards product categories for optimizing e-commerce pricing and offerings.

Industry & Competitor Intelligence: Monitor article content, tags, and author activity to identify emerging industry trends, track competitor strategies, and gauge public perception of brands and technologies.

Domain-Specific NLP Model Training: Use Raw Content, Tags, and Comments Count as training and validation data for custom AI/ML models covering sentiment analysis, topic modeling, and entity extraction.
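In the preview, Raw Content values begin with HTML markup such as `<section>` and `<p>`, so an NLP pipeline that reads Raw Content will usually need to strip tags before tokenizing. The following is a minimal sketch using only Python's standard-library `html.parser`; the class and function names are hypothetical, and the list of block-level tags is an assumption about which elements occur in the data. Where the cleaned Content field is sufficient, this step can be skipped.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, inserting a line break after block-level elements."""

    # Assumed set of block-level tags; extend it if the data uses others.
    BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "li", "blockquote", "pre", "section"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self._parts = []

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self._parts.append("\n")

    def handle_data(self, data):
        self._parts.append(data)

    def text(self):
        # Trim each line and drop the empty ones left behind by nested blocks.
        lines = (line.strip() for line in "".join(self._parts).splitlines())
        return "\n".join(line for line in lines if line)


def raw_content_to_text(raw_html):
    """Convert an HTML fragment into plain text with one block per line."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()  # flush any buffered trailing text
    return parser.text()
```

Keeping one block per line preserves paragraph and heading boundaries, which is often useful for sentence splitting or for building per-paragraph training examples.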

Future Product Feature & Innovation Discovery: Analyze Tags, Title, and high-engagement articles (claps, comments) over time to surface discussions of unmet user needs, innovative concepts, and technology shifts that can guide product roadmaps.

High-Impact Content Strategy Optimization: Identify successful topics, keywords, and content formats by analyzing top-performing articles based on Total Claps, Comments Count, Tags, and Reading Time to inform a data-driven SEO and content marketing plan.

Get Access to This Dataset

Start using this dataset today. Available in CSV, JSON, and Excel formats with flexible access options.

Frequently Asked Questions

Update frequency: Each record carries a `scraped_at` timestamp showing when that article was last collected. Update cycles can vary, so use this field to check how recent an individual record is.

Formats and delivery: The data is available in structured formats such as CSV, JSON, and Excel, suitable for programmatic access or bulk download. Specific delivery methods depend on the provider's integration options and your requirements.

Custom fields: You can usually customize a data request by selecting from the available fields, such as `title`, `author`, `tags`, `reading_time`, and both `raw_content` and `content`.

Intended users: This dataset is suited to content analysis, trend tracking, competitive research, and author performance evaluation. Researchers, content strategists, data scientists, and marketers can all make use of the article metadata and content.

Samples and support: Samples are typically available so that you can evaluate data quality and structure before a full subscription or purchase. Support channels often include documentation, email, or dedicated technical assistance.

Coverage: The dataset includes over 10 million records from `medium.com`, and each record has a `language` field, so articles can be segmented by language. Completeness (for example, whether every article ever published is included) is not stated and may require further inquiry.

Filtering: You can filter the dataset by fields such as `author`, `tags`, `language`, `published_at` date ranges, `total_claps`, or `reading_time` to narrow down and analyze specific subsets of articles.
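A filter of this kind might look like the sketch below. It is a minimal illustration, not a provider API: the function name and parameters are hypothetical, the snake_case keys follow the field names used in this FAQ, and each record is assumed to be already typed (`published_at` as a `datetime`, `tags` as a list of strings, `total_claps` as an integer or `None`).

```python
from datetime import datetime


def filter_articles(records, language=None, tag=None, min_claps=None,
                    published_from=None, published_before=None):
    """Yield records that satisfy every criterion that is not None.

    The date range is half-open: published_from <= published_at < published_before.
    Records with no publication date are excluded when a date bound is given.
    """
    for record in records:
        if language is not None and record.get("language") != language:
            continue
        if tag is not None and tag not in (record.get("tags") or []):
            continue
        if min_claps is not None and (record.get("total_claps") or 0) < min_claps:
            continue
        published = record.get("published_at")
        if published_from is not None and (published is None or published < published_from):
            continue
        if published_before is not None and (published is None or published >= published_before):
            continue
        yield record


# Example call: English-language articles tagged "React", published in 2021,
# with at least 50 claps. `articles` stands for any iterable of typed records.
# matches = list(filter_articles(articles, language="en", tag="React", min_claps=50,
#                                published_from=datetime(2021, 1, 1),
#                                published_before=datetime(2022, 1, 1)))
```

Because the function is a generator, it can stream over a large file one record at a time instead of loading all rows into memory.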