How to Convert HTML to Markdown (Complete Guide)
Learn why developers convert HTML to Markdown, which tools work best, and how to automate the process for documentation, AI ingestion, and Obsidian.
Markdown has become the universal format for documentation, README files, and AI context. But most web content exists as HTML — and moving between the two is a daily task for developers, technical writers, and researchers.
This guide covers everything you need to know about HTML to Markdown conversion: why to do it, how tools work under the hood, and which approach fits your use case.
Why Convert HTML to Markdown?
HTML is designed for browsers. It carries layout, styling, and semantic information that is largely irrelevant when you want to work with content. Markdown strips all of that away, leaving just the text and its structure.
Common reasons to convert HTML to Markdown:
- Documentation: Import external docs into your MkDocs, Docusaurus, or Notion workspace
- AI ingestion: Markdown can be 60–80% more token-efficient than equivalent HTML for LLMs, depending on how markup-heavy the page is
- Obsidian notes: The Obsidian vault is built on Markdown — HTML doesn’t fit natively
- Version control: Markdown diffs cleanly in Git; HTML is noisy and hard to review
How HTML to Markdown Conversion Works
The conversion process has two main steps:
1. Parsing the HTML
The converter parses the HTML into a DOM tree of elements. This is where element types matter, because each type maps to a specific Markdown construct:
- <h1> → # Heading
- <strong> or <b> → **bold**
- <em> or <i> → _italic_
- <a href="..."> → [text](url)
- <ul><li> → - list item
- <code> → `code`
- <pre><code> → ```code block```
2. Serialization
The parsed tree is walked depth-first and each node is converted to its Markdown equivalent. Whitespace is normalized, and elements that have no Markdown equivalent (like <div>, <span>, <style>) are stripped or passed through as plain text.
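The two steps above can be sketched in a few lines. This is a minimal illustration, not how Turndown or any other converter is actually implemented: the `{ tag, attrs, children }` node shape and the `rules` table are assumptions made for the example, and a hand-built tree stands in for the parsed DOM.

```javascript
// Hypothetical node shape: elements are { tag, attrs, children }, text nodes are strings.
const rules = {
  h1: (text) => `# ${text}\n\n`,
  p: (text) => `${text}\n\n`,
  strong: (text) => `**${text}**`,
  b: (text) => `**${text}**`,
  em: (text) => `_${text}_`,
  i: (text) => `_${text}_`,
  code: (text) => `\`${text}\``,
  a: (text, attrs) => `[${text}](${attrs.href})`,
};

function toMarkdown(node) {
  // Text node: collapse runs of whitespace into a single space.
  if (typeof node === 'string') return node.replace(/\s+/g, ' ');

  // Depth-first: serialize the children before the element itself.
  const text = (node.children ?? []).map(toMarkdown).join('');

  // Elements without a rule (div, span, ...) pass their text through unchanged.
  const rule = rules[node.tag];
  return rule ? rule(text, node.attrs ?? {}) : text;
}

// Stand-in for the tree a parser would build from:
// <div><h1>Hello</h1><p>A <strong>bold</strong> <a href="https://example.com">link</a>.</p></div>
const tree = {
  tag: 'div',
  children: [
    { tag: 'h1', children: ['Hello'] },
    {
      tag: 'p',
      children: [
        'A ',
        { tag: 'strong', children: ['bold'] },
        ' ',
        { tag: 'a', attrs: { href: 'https://example.com' }, children: ['link'] },
        '.',
      ],
    },
  ],
};

const sketchOutput = toMarkdown(tree).trim();
console.log(sketchOutput);
```

The `<div>` wrapper has no rule, so it contributes nothing of its own and only its children appear in the output.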
GitHub Flavored Markdown (GFM)
Standard Markdown doesn’t support tables. For most developer use cases, you want a converter that supports GFM, which adds:
- Tables (| col | col |)
- Task lists (- [x] Done)
- Strikethrough (~~deleted~~)
- Fenced code blocks with language hints (```javascript)
Our HTML to Markdown converter uses Turndown with the turndown-plugin-gfm extension, which handles all of these.
Common Edge Cases
Images
<img src="photo.jpg" alt="A photo"> becomes ![A photo](photo.jpg). If the alt attribute is missing, you get ![](photo.jpg), which is valid but not ideal for accessibility or SEO.
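One way to keep missing alt text from slipping through unnoticed is to record a warning while converting. The sketch below is a standalone illustration, and the `imageToMarkdown` helper and its `warnings` array are hypothetical names, not part of any library.

```javascript
// Hypothetical helper: builds the Markdown image syntax and records images without alt text.
function imageToMarkdown({ src, alt = '' }, warnings = []) {
  if (alt.trim() === '') {
    warnings.push(`Missing alt text: ${src}`);
  }
  return `![${alt}](${src})`;
}

const altWarnings = [];
const withAlt = imageToMarkdown({ src: 'photo.jpg', alt: 'A photo' }, altWarnings);
const withoutAlt = imageToMarkdown({ src: 'photo.jpg' }, altWarnings);

console.log(withAlt);     // ![A photo](photo.jpg)
console.log(withoutAlt);  // ![](photo.jpg)
console.log(altWarnings); // [ 'Missing alt text: photo.jpg' ]
```

The warnings can then be reviewed after a conversion run, so alt text is written by hand where it matters.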
Nested Lists
HTML supports arbitrary nesting. Markdown also supports it, but indentation rules vary between parsers. A good converter uses 2-space or 4-space indentation consistently.
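Consistent indentation is easiest to see in code. The following sketch serializes a nested list with a fixed indent per level. The `{ text, children }` item shape and the `renderList` function are assumptions for illustration only.

```javascript
// Hypothetical item shape: { text, children? }, where children is another list of items.
function renderList(items, depth = 0, indentSize = 2) {
  const indent = ' '.repeat(depth * indentSize);
  return items
    .map((item) => {
      const line = `${indent}- ${item.text}`;
      // Recurse one level deeper for nested items, keeping the same indent size.
      const nested = item.children?.length
        ? '\n' + renderList(item.children, depth + 1, indentSize)
        : '';
      return line + nested;
    })
    .join('\n');
}

const shoppingList = [
  {
    text: 'Fruit',
    children: [{ text: 'Apple' }, { text: 'Pear', children: [{ text: 'Conference' }] }],
  },
  { text: 'Vegetables' },
];

console.log(renderList(shoppingList));
// - Fruit
//   - Apple
//   - Pear
//     - Conference
// - Vegetables
```

Passing `4` as the third argument produces the 4-space variant. Two spaces is enough for `- ` bullets under CommonMark, but ordered lists need at least three because the `1. ` marker is wider, so 4-space output tends to be the safer choice when the target parser is unknown.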
Tables
HTML tables can have merged cells (colspan, rowspan). Markdown tables cannot. A converter will drop merged cells and produce a flat table, which may lose information from complex layouts.
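To make the information loss concrete, here is one possible flattening strategy, sketched under assumptions: a cell with a colspan is expanded into its text followed by empty cells, so later columns stay aligned. The row shape and the `toGfmTable` function are hypothetical, rowspan is not handled, and this is not necessarily what any particular converter does.

```javascript
// Hypothetical row shape: an array of { text, colspan? } cells. The first row is the header.
function expandRow(cells) {
  // A cell spanning N columns becomes its text followed by N - 1 empty cells.
  return cells.flatMap(({ text, colspan = 1 }) => [text, ...Array(colspan - 1).fill('')]);
}

function toGfmTable(rows) {
  const expanded = rows.map(expandRow);
  const width = Math.max(...expanded.map((row) => row.length));
  // Pad short rows so every row has the same number of columns.
  const padded = expanded.map((row) => [...row, ...Array(width - row.length).fill('')]);
  const line = (row) => `| ${row.join(' | ')} |`;
  const [header, ...body] = padded;
  const separator = line(Array(width).fill('---'));
  return [line(header), separator, ...body.map(line)].join('\n');
}

const flatTable = toGfmTable([
  [{ text: 'Name' }, { text: 'Q1' }, { text: 'Q2' }],
  [{ text: 'Total', colspan: 2 }, { text: '42' }],
]);

console.log(flatTable);
// | Name | Q1 | Q2 |
// | --- | --- | --- |
// | Total |  | 42 |
```

The output is a valid GFM table, but nothing in it says that "Total" originally spanned two columns. For tables where that matters, keeping the original HTML may be the better option.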
Inline Styles
<span style="color: red">text</span> — there is no Markdown equivalent for inline color. The style is stripped and only the text survives.
When to Use a URL-to-Markdown Tool Instead
If you’re converting a full web page (not just a fragment of HTML), use a URL to Markdown converter instead. URL converters:
- Fetch the full page
- Apply Mozilla Readability to strip navigation, ads, sidebars
- Convert only the article content to Markdown
This produces much cleaner results than pasting the full page HTML into an HTML converter.
Automating HTML to Markdown
For bulk conversion, consider using the Turndown library directly in Node.js:
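// Assumes both packages are installed: npm install turndown turndown-plugin-gfm
import TurndownService from 'turndown';
import { gfm } from 'turndown-plugin-gfm';

// Use ATX headings (# Heading) and fenced code blocks instead of the defaults
const td = new TurndownService({ headingStyle: 'atx', codeBlockStyle: 'fenced' });
td.use(gfm); // adds tables, task lists, and strikethrough

const markdown = td.turndown('<h1>Hello</h1><p>World</p>');
console.log(markdown); // # Hello\n\nWorld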
For browser environments, you can use the same library — it works in both Node.js and the browser.
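For a real batch, it usually helps if one bad input does not abort the whole run. The sketch below shows that pattern with an in-memory map of file names to HTML. Reading and writing files is left out, and `convertAll` and `stubConvert` are hypothetical names. The `convert` argument can be any function from an HTML string to a Markdown string, for example `(html) => td.turndown(html)` with the Turndown instance from the snippet above.

```javascript
// Converts every entry and collects failures instead of stopping at the first error.
function convertAll(pages, convert) {
  const converted = {};
  const failed = {};
  for (const [name, html] of Object.entries(pages)) {
    try {
      // Rename page.html or page.htm to page.md in the result.
      converted[name.replace(/\.html?$/i, '.md')] = convert(html);
    } catch (error) {
      failed[name] = error.message;
    }
  }
  return { converted, failed };
}

// Stand-in converter for this example only: strips tags instead of producing real Markdown.
const stubConvert = (html) => {
  if (typeof html !== 'string') throw new TypeError('expected an HTML string');
  return html.replace(/<[^>]+>/g, '').trim();
};

const batch = convertAll(
  { 'intro.html': '<p>Hello</p>', 'broken.htm': null },
  stubConvert,
);

console.log(batch.converted); // { 'intro.md': 'Hello' }
console.log(batch.failed);    // { 'broken.htm': 'expected an HTML string' }
```

Keeping the converter as a parameter also makes the batch logic easy to test without installing any conversion library.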
Try It Free
The fastest way to get started is our HTML to Markdown Converter — paste HTML, get Markdown, copy and go. No sign-up required.