markdown-read
v3.8.0
Turn a URL or HTML file into Markdown.
Markdown Read
Convert any URL to Markdown.
Try it online: HTML To Markdown
Tech Stack
- `@mozilla/readability` for extracting the readable content of a page
- `turndown` for converting HTML to Markdown
- `jsdom` for parsing HTML
Usage
Node.js must be installed on your system. Then install the package globally:
```sh
$ npm i -g markdown-read

# Convert a web page to Markdown
$ markdown https://example.com
## Example Domain

This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.

[More information...](https://www.iana.org/domains/example)
```
Options
- `--header`: Adds custom headers to the request. This is useful for setting a `User-Agent` string or any other HTTP header the target website requires.

Example:

```sh
$ markdown https://httpbin.org/get --header 'User-Agent: Markdown Reader'
```
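How the CLI splits the `--header` value is not documented here. As a minimal sketch, assuming the value follows the curl-style `Name: value` form shown above, a parser could split on the first colon only, so values that contain colons (such as URLs) survive intact. `parseHeaderFlag` and `toHeaderRecord` are hypothetical helpers, not part of markdown-read.

```typescript
// Hypothetical helper: turns a curl-style "Name: value" string into a [name, value] pair.
function parseHeaderFlag(raw: string): [string, string] {
  // Split on the first colon only, so values such as URLs keep their own colons.
  const colon = raw.indexOf(":");
  const headerName = colon === -1 ? "" : raw.slice(0, colon).trim();
  if (headerName === "") {
    throw new Error(`Invalid header "${raw}": expected "Name: value"`);
  }
  return [headerName, raw.slice(colon + 1).trim()];
}

// Hypothetical helper: collects several raw header strings into a plain name-to-value object.
function toHeaderRecord(rawHeaders: string[]): Record<string, string> {
  const record: Record<string, string> = {};
  for (const raw of rawHeaders) {
    const [headerName, headerValue] = parseHeaderFlag(raw);
    record[headerName] = headerValue;
  }
  return record;
}
```

A plain object like this is the kind of value one would expect the `headers` option of `markdown()` to take, though that shape is an assumption rather than something the README states.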
API Reference
`markdown(url: string, options?: MarkdownOptions): Promise<MarkdownContent | null>`
Converts a web page to Markdown format.
- `url`: The URL of the web page to convert.
- `options`: Optional settings for document retrieval and Markdown conversion:
  - `headers`: Additional headers to include in the request.
  - `fetcher`: A custom function that fetches the HTML content.
  - All options from `TurndownOptions` are also supported.
Returns a Promise that resolves to a `MarkdownContent` object, or `null` if conversion fails.
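The README does not spell out the `fetcher` signature. A minimal sketch, assuming a fetcher receives the page URL and resolves to the HTML string: the hypothetical `createStaticFetcher` serves pages from memory, which can be handy for offline tests.

```typescript
// Assumed shape of a fetcher: takes the page URL and resolves to its HTML.
// The README only says "Custom function to fetch the HTML content", so this is a guess.
type HtmlFetcher = (pageUrl: string) => Promise<string>;

// Hypothetical fetcher that returns stored HTML instead of touching the network.
function createStaticFetcher(pages: Record<string, string>): HtmlFetcher {
  return async (pageUrl: string) => {
    const html = pages[pageUrl];
    if (html === undefined) {
      // Reject for unknown URLs so a missing fixture is noticed.
      throw new Error(`No stored HTML for ${pageUrl}`);
    }
    return html;
  };
}
```

Under that assumption it would be passed as `markdown('https://example.com/', { fetcher: createStaticFetcher({ ... }) })`; check the package's type definitions for the exact signature before relying on it.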
MarkdownContent
The `MarkdownContent` object extends `ReadabilityContent` and includes:
- `markdown`: The converted Markdown content.
- `length`: The length of the Markdown content.
- `url`: The original URL of the web page.
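To make the shape concrete, here is an approximate TypeScript description. The Readability fields shown (`title`, `byline`, `excerpt`, `siteName`) are a subset of what Readability's `parse()` returns, and their exact types in this package are an assumption. `withFrontMatter` is a hypothetical helper showing one way to consume the result by prepending a small YAML front-matter block.

```typescript
// Approximate subset of the Readability result; exact field types are assumed.
interface ReadabilityContentSketch {
  title: string | null;
  byline: string | null;
  excerpt: string | null;
  siteName: string | null;
}

// The three fields this README lists for MarkdownContent.
interface MarkdownContentSketch extends ReadabilityContentSketch {
  markdown: string;
  length: number;
  url: string;
}

// Hypothetical helper: prepends YAML front matter built from the result's metadata.
function withFrontMatter(result: MarkdownContentSketch): string {
  const lines = ["---", `source: ${result.url}`];
  // JSON.stringify wraps values in double quotes, which YAML also accepts.
  if (result.title) lines.push(`title: ${JSON.stringify(result.title)}`);
  if (result.byline) lines.push(`author: ${JSON.stringify(result.byline)}`);
  lines.push("---", "", result.markdown);
  return lines.join("\n");
}
```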
`turndown(html: string, options?: TurndownOptions): string`
Converts HTML content to Markdown.
- `html`: The HTML string to convert.
- `options`: Optional settings for Turndown conversion. These options override the default settings.
Returns the Markdown representation of the input HTML.
Default Options
```js
{
  emDelimiter: '*',
  codeBlockStyle: 'fenced',
  fence: '```',
  headingStyle: 'atx',
  bulletListMarker: '+'
}
```
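"These options override the default settings" is consistent with a shallow merge. The sketch below assumes exactly that; `resolveOptions` and `DEFAULT_TURNDOWN_OPTIONS` are illustrative names, not exports of markdown-read. For comparison, Turndown's own defaults are `setext` headings, `indented` code blocks, `_` for emphasis and `*` for bullets, so these defaults change all four.

```typescript
// The option names and allowed values are Turndown's; the interface itself is a sketch.
interface TurndownOptionsSketch {
  emDelimiter?: "_" | "*";
  codeBlockStyle?: "indented" | "fenced";
  fence?: "```" | "~~~";
  headingStyle?: "setext" | "atx";
  bulletListMarker?: "-" | "+" | "*";
}

// Mirrors the default options listed above.
const DEFAULT_TURNDOWN_OPTIONS: Required<TurndownOptionsSketch> = {
  emDelimiter: "*",
  codeBlockStyle: "fenced",
  fence: "```",
  headingStyle: "atx",
  bulletListMarker: "+",
};

// Shallow merge: every key the caller passes wins, every other key keeps its default.
function resolveOptions(overrides: TurndownOptionsSketch = {}): Required<TurndownOptionsSketch> {
  return { ...DEFAULT_TURNDOWN_OPTIONS, ...overrides };
}
```

One caveat of a plain spread: a key passed explicitly as `undefined` would replace the default, so a real implementation might filter those out first.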
Example
```js
import { turndown } from 'markdown-read';

const html = '<h1>Hello</h1><em>World</em>';
const options = {
  headingStyle: 'setext',
  emDelimiter: '_'
};

const markdown = turndown(html, options);
console.log(markdown);
// Output:
// Hello
// =====
//
// _World_
```
For a full list of available options, please refer to the Turndown Options documentation.
Advanced Features
- Handles lazy-loaded images by setting their `src` attribute.
- Extracts byline information from meta tags.
- Supports platform-specific processing for various websites.
- Uses Mozilla's Readability for content extraction.
- Allows custom fetching logic through the `fetcher` option.
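Lazy-loading pages often leave `src` empty or pointing at a placeholder and keep the real URL in a `data-*` attribute. A minimal sketch of the lazy-image idea, assuming the attribute names `data-src`, `data-original` and `data-lazy-src` (the package's actual list is not documented here). `promoteLazySrc` is hypothetical and works on any object with `getAttribute`/`setAttribute`, including a jsdom `<img>` element.

```typescript
// Minimal slice of the DOM Element API the sketch needs.
interface AttributeHost {
  getAttribute(qualifiedName: string): string | null;
  setAttribute(qualifiedName: string, value: string): void;
}

// Assumed lazy-loading attribute names, checked in order.
const LAZY_SRC_ATTRIBUTES = ["data-src", "data-original", "data-lazy-src"];

// Copies the first non-empty lazy attribute into src when src is missing
// or is a data: URI placeholder. Returns true if src was changed.
function promoteLazySrc(img: AttributeHost): boolean {
  const current = img.getAttribute("src");
  if (current && !current.startsWith("data:")) return false; // keep a real src
  for (const attr of LAZY_SRC_ATTRIBUTES) {
    const candidate = img.getAttribute(attr);
    if (candidate) {
      img.setAttribute("src", candidate);
      return true;
    }
  }
  return false;
}
```

With jsdom, it could be applied to every image before the document reaches Readability, e.g. `document.querySelectorAll('img').forEach(promoteLazySrc)`.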