htmlarkdown

v1.0.2 (published 2 years ago)

HTML-to-Markdown converter that can output HTML-syntax when required. (eg. when there's "align" attribute in , or "width" in <img>)


evitanrelta

HTMLarkdown is an HTML-to-Markdown converter that's able to output HTML-syntax when required,
like when center-aligning or resizing images.

Written completely in TypeScript.
Has many Jest tests, covering many edge-case conversions.
Leave an issue/PR if you can think of more!
For now, it's designed for GFM.
Try it out at the demo site below!
https://evitanrelta.github.io/htmlarkdown

How is this different?

Switching to HTML-syntax

Whenever elements cannot be represented in markdown-syntax, HTMLarkdown will switch to HTML-syntax.

Note: The HTML-switching is controlled by the rules' Rule.toUseHtmlPredicate.
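To make the idea concrete, here is a minimal, self-contained sketch of a rule that decides between the two syntaxes. Everything in it (SketchElement, SketchRule, applyRule and the plain-object element shape) is hypothetical and only mirrors the idea behind Rule.toUseHtmlPredicate; it is not HTMLarkdown's actual Rule interface. An image is used because markdown's ![alt](src) has no way to express a width.

```typescript
// Hypothetical plain-object element (no DOM needed for this sketch).
interface SketchElement {
    tag: string
    attrs: Record<string, string>
}

// Hypothetical rule shape, loosely modelled on the idea of Rule.toUseHtmlPredicate.
interface SketchRule {
    toUseHtmlPredicate: (el: SketchElement) => boolean
    toMarkdown: (el: SketchElement) => string
    toHtml: (el: SketchElement) => string
}

const imageRule: SketchRule = {
    // Markdown's ![alt](src) can't express a width, so fall back to HTML.
    toUseHtmlPredicate: (el) => 'width' in el.attrs,
    toMarkdown: (el) => `![${el.attrs.alt ?? ''}](${el.attrs.src ?? ''})`,
    toHtml: (el) => {
        const attrs = Object.entries(el.attrs)
            .map(([name, value]) => `${name}="${value}"`)
            .join(' ')
        return `<img ${attrs}>`
    },
}

// Picks the syntax per element, based on the rule's predicate.
function applyRule(rule: SketchRule, el: SketchElement): string {
    return rule.toUseHtmlPredicate(el) ? rule.toHtml(el) : rule.toMarkdown(el)
}

console.log(applyRule(imageRule, { tag: 'img', attrs: { src: 'cat.png', alt: 'cat' } }))
// => ![cat](cat.png)
console.log(applyRule(imageRule, { tag: 'img', attrs: { src: 'cat.png', width: '100' } }))
// => <img src="cat.png" width="100">
```

Keeping the predicate separate from the two output functions means the same rule can be reused whichever syntax ends up being chosen.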

But HTMLarkdown tries to use as little HTML-syntax as possible, mixing markdown and HTML where needed.

Depending on the situation, HTMLarkdown will switch between markdown's backslash-escaping and HTML-escaping.
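A simplified sketch of that choice, assuming a small hand-picked set of markdown special characters on one side and escaping of &, < and > on the other. The '&<>' value of htmlEscapingMode in the Configuring section hints at a similar set, but this sketch does not reproduce that option's semantics, and the function names are hypothetical. The reason for switching at all: inside an HTML block, a renderer doesn't process markdown, so a backslash would show up literally, while character references like &lt; still work.

```typescript
// Simplified, hand-picked set of characters that markdown may treat specially.
const MARKDOWN_SPECIAL_CHARS = /[\\`*_[\]#]/g

// Markdown-syntax context: prefix special characters with a backslash.
function backslashEscape(text: string): string {
    return text.replace(MARKDOWN_SPECIAL_CHARS, (ch) => '\\' + ch)
}

// HTML-syntax context: backslashes aren't escapes there, so use character references.
// '&' is replaced first so the '&' in '&lt;' / '&gt;' isn't escaped a second time.
function htmlEscape(text: string): string {
    return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
}

// Hypothetical switch: the escaping used depends on the surrounding syntax.
function escapeText(text: string, insideHtmlSyntax: boolean): string {
    return insideHtmlSyntax ? htmlEscape(text) : backslashEscape(text)
}

console.log(escapeText('*not italic*', false)) // => \*not italic\*
console.log(escapeText('a < b & c', true)) // => a &lt; b &amp; c
```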

Handling of edge cases

Adding separators between adjacent lists, to prevent markdown-renderers from combining them.
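Why the separator matters: in CommonMark/GFM, two bullet lists using the same marker and separated only by a blank line are parsed as one (loose) list. The sketch below assumes an empty HTML comment as the separator, which is one common workaround; the exact separator HTMLarkdown emits may differ, and renderList/renderAdjacentLists are hypothetical helpers.

```typescript
// Assumption: an empty HTML comment is the separator; HTMLarkdown's may differ.
const LIST_SEPARATOR = '<!-- -->'

// Renders one bullet list using '-' markers.
function renderList(items: string[]): string {
    return items.map((item) => `- ${item}`).join('\n')
}

// Renders neighbouring lists with a separator between them, so a
// markdown-renderer doesn't merge them into a single list.
function renderAdjacentLists(lists: string[][]): string {
    return lists.map(renderList).join(`\n\n${LIST_SEPARATOR}\n\n`)
}

console.log(renderAdjacentLists([['a', 'b'], ['c']]))
// Prints:
// - a
// - b
//
// <!-- -->
//
// - c
```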

And more!
But this section is getting too long so...

Installation

npm install htmlarkdown

Usage

Markdown conversion (either from `Element` or `string`)

import { HTMLarkdown } from 'htmlarkdown'

/** Convert an element! */
const htmlarkdown = new HTMLarkdown()
const container = document.getElementById('container')! // `!`: assumes the element exists
console.log(container.outerHTML)
// => '<div id="container"><h1>Heading</h1></div>'
htmlarkdown.convert(container)
// => '# Heading'


/** 
 * Or a HTML string! 
 * Whichever u prefer. It's 2022, I don't judge :^)
 */
const htmlString = `
<h1>Heading</h1>
<p>Paragraph</p>
`
const htmlStrWithContainer = `<div>${htmlString}</div>`
htmlarkdown.convert(htmlString)
// Set 2nd param 'hasContainer' to true, for container-wrapped string.
htmlarkdown.convert(htmlStrWithContainer, true)
// Both output => '# Heading\n\nParagraph'

Note: If an element is given to convert, it's deep-cloned before any processing/conversion.
Thus, you don't have to worry about it mutating the original element :)
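The same defensive-copy idea as a minimal, browser-free sketch. The plain-object TreeNode type and all helpers are hypothetical (in the DOM, a deep copy is what Node.cloneNode(true) gives you); the point is only that a mutating pre-process step ever sees the copy, never the caller's tree.

```typescript
// Hypothetical plain-object tree standing in for a DOM element.
interface TreeNode {
    tag: string
    children: TreeNode[]
}

// Recursive deep copy (the DOM equivalent would be node.cloneNode(true)).
function deepClone(node: TreeNode): TreeNode {
    return { tag: node.tag, children: node.children.map(deepClone) }
}

// A pre-process-like step that mutates the tree it's given.
function removeDivs(node: TreeNode): void {
    node.children = node.children.filter((child) => child.tag !== 'div')
    node.children.forEach(removeDivs)
}

// Clone first, so the caller's tree is never touched.
function processCopy(input: TreeNode): TreeNode {
    const working = deepClone(input)
    removeDivs(working)
    return working
}

const original: TreeNode = {
    tag: 'section',
    children: [{ tag: 'div', children: [] }, { tag: 'p', children: [] }],
}
const processed = processCopy(original)
console.log(original.children.length, processed.children.length) // => 2 1
```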

Configuring

/** Configure when creating an instance. */
const htmlarkdown = new HTMLarkdown({
    htmlEscapingMode: '&<>',
    maxPrettyTableWidth: Number.POSITIVE_INFINITY,
    addTrailingLinebreak: true
})

/** Or on an existing instance. */
htmlarkdown.options.maxPrettyTableWidth = -1

Plugins

Plugins are of type (htmlarkdown: HTMLarkdown) => void.
They take in a HTMLarkdown instance and configure it by mutating it.

There are 2 plugin-options available in the options object: preloadPlugins and plugins.
The difference is:

preloadPlugins loads the plugins first, before your other options (like "presets"),
allowing you to overwrite the plugins' changes:

const enableTrailingLinebreak: Plugin = (htmlarkdown) => {
    htmlarkdown.options.addTrailingLinebreak = true
}
const htmlarkdown = new HTMLarkdown({
    addTrailingLinebreak: false,
    preloadPlugins: [enableTrailingLinebreak],
})
htmlarkdown.options.addTrailingLinebreak // false (your option overwrote the plugin's change)

plugins loads the plugins after your other options,
meaning the plugins can overwrite your options:

const enableTrailingLinebreak: Plugin = (htmlarkdown) => {
    htmlarkdown.options.addTrailingLinebreak = true
}
const htmlarkdown = new HTMLarkdown({
    addTrailingLinebreak: false,
    plugins: [enableTrailingLinebreak],
})
htmlarkdown.options.addTrailingLinebreak // true (the plugin overwrote your option)
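The difference between the two options comes down to ordering. Below is a hypothetical, trimmed-down converter (SketchConverter and its types are illustrative, not HTMLarkdown's implementation) that applies preloadPlugins, then the user's options, then plugins, matching the behaviour described in the two paragraphs above.

```typescript
// Hypothetical, trimmed-down stand-ins for the options and plugin types.
interface SketchOptions {
    addTrailingLinebreak: boolean
    preloadPlugins: SketchPlugin[]
    plugins: SketchPlugin[]
}
type SketchPlugin = (converter: SketchConverter) => void

class SketchConverter {
    options: SketchOptions = {
        addTrailingLinebreak: false,
        preloadPlugins: [],
        plugins: [],
    }

    constructor(userOptions: Partial<SketchOptions> = {}) {
        // 1. preloadPlugins run first, like presets...
        this.loadPlugins(userOptions.preloadPlugins ?? [])
        // 2. ...then the user's own options are applied on top...
        Object.assign(this.options, userOptions)
        // 3. ...and finally plugins run, so they get the last word.
        this.loadPlugins(userOptions.plugins ?? [])
    }

    // Each plugin configures the instance by mutating it.
    loadPlugins(plugins: SketchPlugin[]): void {
        plugins.forEach((plugin) => plugin(this))
    }
}

const enableTrailingLinebreak: SketchPlugin = (converter) => {
    converter.options.addTrailingLinebreak = true
}

const preloaded = new SketchConverter({
    addTrailingLinebreak: false,
    preloadPlugins: [enableTrailingLinebreak],
})
const loadedAfter = new SketchConverter({
    addTrailingLinebreak: false,
    plugins: [enableTrailingLinebreak],
})
console.log(preloaded.options.addTrailingLinebreak) // => false
console.log(loadedAfter.options.addTrailingLinebreak) // => true
```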

You can also load plugins on existing instances:

htmlarkdown.loadPlugins([myPlugin])

Making a copy of an instance

The conversion of an HTMLarkdown instance depends solely on its options property.
So you can create a copy of an instance like this:

const htmlarkdown = new HTMLarkdown()
const copy = new HTMLarkdown(htmlarkdown.options)

Configuring rules/processes

See the How it works section below for info on what the rules/processes do.

/**
 * Overwriting default rules/processes.
 * (does NOT include the defaults)
 */
const htmlarkdown = new HTMLarkdown({
    preProcesses: [myPreProcess1, myPreProcess2],
    rules: [myRule1, myRule2],
    textProcesses: [myTextProcess1, myTextProcess2],
    postProcesses: [myPostProcess1, myPostProcess2]
})

/**
 * Adding on to default rules/processes.
 * (includes the defaults)
 */
const extended = new HTMLarkdown() // new name: `htmlarkdown` is already declared above
extended.addPreProcess(myPreProcess)
extended.addRule(myRule)
extended.addTextProcess(myTextProcess)
extended.addPostProcess(myPostProcess)

How it works

HTMLarkdown has 3 distinct phases:

Pre-processing
The container-element that's received (and deep-cloned) by the convert method is passed consecutively to each PreProcess in options.preProcesses.
Conversion
The pre-processed container-element is then recursively converted to markdown.
Elements are converted by Rule in options.rules.
Text-nodes are converted by TextProcess in options.textProcesses.
The output strings of the rules/text-processes are then concatenated, to give the raw markdown.
Post-processing
The raw markdown string is then passed consecutively to each PostProcess in options.postProcesses, to give the final markdown.
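A minimal sketch of the three phases, assuming simplified, hypothetical types: the names echo PreProcess/Rule/TextProcess/PostProcess from above, but their shapes here are made up for illustration, and HTMLarkdown's real ones are richer. That the first matching rule wins, and that elements without a rule just pass their children's markdown through, are both assumptions of this sketch.

```typescript
// Hypothetical, simplified node and step types (HTMLarkdown's real types are richer).
type SketchNode =
    | { kind: 'text'; text: string }
    | { kind: 'element'; tag: string; children: SketchNode[] }

type PreProcess = (container: SketchNode) => SketchNode
interface Rule {
    filter: (tag: string) => boolean
    // Gets the element's tag and its children's already-converted markdown.
    convert: (tag: string, innerMarkdown: string) => string
}
type TextProcess = (text: string) => string
type PostProcess = (markdown: string) => string

interface PipelineOptions {
    preProcesses: PreProcess[]
    rules: Rule[]
    textProcesses: TextProcess[]
    postProcesses: PostProcess[]
}

// Phase 2: recursive conversion of the pre-processed tree.
function convertNode(node: SketchNode, options: PipelineOptions): string {
    if (node.kind === 'text') {
        // Text-nodes pass through each TextProcess in turn.
        return options.textProcesses.reduce((text, step) => step(text), node.text)
    }
    const tag = node.tag
    const inner = node.children.map((child) => convertNode(child, options)).join('')
    // Assumption: first matching rule wins; unmatched elements pass their content through.
    const rule = options.rules.find((r) => r.filter(tag))
    return rule ? rule.convert(tag, inner) : inner
}

function runPipeline(container: SketchNode, options: PipelineOptions): string {
    // Phase 1: each PreProcess receives the previous one's output.
    const preProcessed = options.preProcesses.reduce((node, step) => step(node), container)
    // Phase 2: the concatenated rule/text-process outputs form the raw markdown.
    const raw = convertNode(preProcessed, options)
    // Phase 3: each PostProcess receives the previous one's output.
    return options.postProcesses.reduce((markdown, step) => step(markdown), raw)
}

const options: PipelineOptions = {
    preProcesses: [],
    rules: [
        { filter: (tag) => tag === 'h1', convert: (_tag, inner) => `# ${inner}\n\n` },
        { filter: (tag) => tag === 'p', convert: (_tag, inner) => `${inner}\n\n` },
    ],
    textProcesses: [(text) => text.trim()],
    postProcesses: [(markdown) => markdown.trim()],
}
const container: SketchNode = {
    kind: 'element',
    tag: 'div',
    children: [
        { kind: 'element', tag: 'h1', children: [{ kind: 'text', text: 'Heading' }] },
        { kind: 'element', tag: 'p', children: [{ kind: 'text', text: 'Paragraph' }] },
    ],
}
console.log(JSON.stringify(runPipeline(container, options))) // => "# Heading\n\nParagraph"
```

With just a heading rule and a paragraph rule, the sketch produces the same '# Heading\n\nParagraph' output as the Usage example; the trailing blank lines left by the rules are removed in the post-processing phase.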

Contributing

Bugs

HTMLarkdown is still under development, so there'll likely be bugs.

So the easiest way to contribute is to submit an issue (with the bug label), especially for any incorrect markdown-conversions :)

For any incorrect markdown-conversions, state the:

input HTML
current incorrect markdown output
expected markdown output

New conversions, ideas, features, tests

If you have any new element-conversions / ideas / features / tests that you think should be added, leave an issue with the feature or improve label!

feature label is for new features
improve label is for improvements on existing features
Understandably, there are gray areas on what is a "feature" and what is an "improvement". So just go with whichever seems more appropriate :)

Other markdown specs

Currently, HTMLarkdown has been designed to output markdown for GitHub specifically (ie. GFM).
BUT, if there's another markdown spec. that you'd like to design for (maybe as a plugin?), do leave an issue/discussion :D

Coding-related stuff

Code-formatting is handled by Prettier, so no need to worry about it :)

Any new feature should

be documented via TSDoc
come with new unit-tests
pass all new/existing tests

As for which merging method to use, check out the discussion.

Contributors

So far it's just me, so pls send help! :^)

Roadmap

If you've any new ideas / features, check out the Contributing section for it!

Element conversions

Block-elements:

[x] Headings (For now, only ATX-style)
[x] Paragraph
[x] Codeblock
[x] Blockquote
[x] Lists
(ordered, unordered, tight and loose)
[x] (GFM) Table
[ ] (GFM) Task-list

Below are some planned block-elements that don't have a markdown-equivalent:

[x]  (handled by a noop-rule)
[x] <div> (For now, handled by a noop-rule)
[ ] Definition list (ie. <dl>, <dt>, <dd>)
[ ] Collapsible section (ie. <details>)

Text-formattings:

[x] Bold (For now, only outputs in asterisks **BOLD**)
[x] Italic (For now, only outputs in asterisks *ITALIC*)
[x] (GFM) ~~Strikethrough~~
[x] Code
[x] Link (For now, only inline links)
[x] Superscript (ie. <sup>)
[x] Subscript (ie. <sub>)
[x] Underline (ie. <u>, <ins>)
(didn't know underlines were possible till recently)

Misc:

[x] Images (For now, only inline links)
[x] Horizontal-rule (ie. <hr>)
[x] Linebreaks (ie. <br>)
[ ] Preserved HTML comments (Issue #25) (eg. )

Features to be added:

Custom id attributes

Go to [section with id](#my-section)

<p id="my-section">
  My section
</p>

Reversing GitHub's Issue/PR autolinks
Ability to customise how a codeblock's syntax-highlighting language is obtained from the <pre><code> elements
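One way such a customisation point could look, as a hypothetical sketch: the default getter reads the language-* class that CommonMark-style renderers put on <code> (e.g. <code class="language-ts">), and callers may pass their own getter instead. CodeElementInfo, LanguageGetter, fromLanguageClass, fromFirstClass and renderCodeblock are illustrative names, not part of HTMLarkdown.

```typescript
// Hypothetical stand-in for the attributes of a <pre><code> element.
interface CodeElementInfo {
    className: string
}

// A pluggable strategy for finding a codeblock's language.
type LanguageGetter = (code: CodeElementInfo) => string

// Default: read the 'language-*' class (e.g. class="language-ts hljs"); '' if absent.
const fromLanguageClass: LanguageGetter = (code) => {
    const match = code.className.match(/(?:^|\s)language-(\S+)/)
    return match ? match[1] : ''
}

// Custom: treat the first class as the language (e.g. class="python").
const fromFirstClass: LanguageGetter = (code) => code.className.trim().split(/\s+/)[0] ?? ''

// Three backticks, built at runtime so the fence string stays readable here.
const FENCE = '`'.repeat(3)

function renderCodeblock(
    code: string,
    info: CodeElementInfo,
    getLanguage: LanguageGetter = fromLanguageClass,
): string {
    return `${FENCE}${getLanguage(info)}\n${code}\n${FENCE}`
}

console.log(renderCodeblock('let x = 1', { className: 'language-ts hljs' }))
console.log(renderCodeblock('x = 1', { className: 'python' }, fromFirstClass))
```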

License

The MIT License (MIT).
So it's freeeeeee