htmlarkdown
v1.0.2
HTML-to-Markdown converter that can output HTML-syntax when required. (eg. when there's "align" attribute in <p>, or "width" in <img>)
HTMLarkdown is a HTML-to-Markdown converter that's able to output HTML-syntax when required.
For example, when center-aligning content or resizing images.
- Written completely in TypeScript.
- Has many Jest tests, covering many edge-case conversions.
Leave an issue/PR if you can think of more!
- For now, is designed for GFM.
- Try it out at the demo site below!
https://evitanrelta.github.io/htmlarkdown
How is this different?
Switching to HTML-syntax
Whenever elements cannot be represented in markdown-syntax, HTMLarkdown will switch to HTML-syntax:
Note: The HTML-switching is controlled by the rules' Rule.toUseHtmlPredicate.
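To illustrate the idea of a predicate that decides when to switch, here is a minimal sketch. It does not use the library's actual Rule type: SimpleElement, UseHtmlPredicate, paragraphNeedsHtml and convertParagraph are all hypothetical names made up for this example, and the real toUseHtmlPredicate signature may differ.

```typescript
// Hypothetical, simplified element shape (NOT the library's types).
interface SimpleElement {
  tagName: string
  attributes: Record<string, string>
  text: string
}

// Hypothetical predicate type: returns true when HTML-syntax is needed.
type UseHtmlPredicate = (element: SimpleElement) => boolean

// Assumption: a paragraph with an "align" attribute has no markdown equivalent.
const paragraphNeedsHtml: UseHtmlPredicate = (element) =>
  'align' in element.attributes

function convertParagraph(element: SimpleElement): string {
  if (!paragraphNeedsHtml(element)) {
    // Plain markdown paragraph: just the text.
    return element.text
  }
  // Otherwise keep the element as HTML, attributes included.
  const attrs = Object.keys(element.attributes)
    .map((name) => ` ${name}="${element.attributes[name]}"`)
    .join('')
  return `<p${attrs}>${element.text}</p>`
}

const plain = convertParagraph({ tagName: 'P', attributes: {}, text: 'Hello' })
const centered = convertParagraph({
  tagName: 'P',
  attributes: { align: 'center' },
  text: 'Hello',
})
console.log(plain)
console.log(centered)
```

The point is only that the markdown-vs-HTML decision lives in one small function per rule, so it can be swapped out without touching the conversion itself.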
But HTMLarkdown tries to use as little HTML-syntax as possible, mixing markdown and HTML if needed:
Depending on the situation, HTMLarkdown will switch between markdown's backslash-escaping or HTML-escaping:
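A rough sketch of what switching between the two escaping styles could look like. This is not the library's implementation: the function names and the exact set of escaped characters are assumptions chosen for illustration.

```typescript
// Backslash-escape characters that markdown would treat as syntax.
// (Assumed character set; the library's own set may differ.)
function escapeForMarkdown(text: string): string {
  return text.replace(/[\\*_\[\]<>]/g, (char) => `\\${char}`)
}

// Entity-escape characters that HTML would treat as syntax.
function escapeForHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
}

// Pick the escaping style based on where the text will end up.
function escapeText(text: string, insideHtml: boolean): string {
  return insideHtml ? escapeForHtml(text) : escapeForMarkdown(text)
}

const inMarkdown = escapeText('1 * 2 < 3', false)
const inHtml = escapeText('1 * 2 < 3', true)
console.log(inMarkdown)
console.log(inHtml)
```

Inside HTML-syntax, markdown backslashes would show up literally, which is why text there needs entity-escaping instead.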
Handling of edge cases
Adding separators in-between adjacent lists to prevent them from being combined by markdown-renderers:
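A small sketch of the adjacent-list case, under stated assumptions: the Block shape and joinBlocks are hypothetical, and the empty HTML comment used as the separator is an assumption (the separator HTMLarkdown actually emits may be different).

```typescript
// Hypothetical model of already-converted blocks.
interface Block {
  kind: 'list' | 'other'
  markdown: string
}

// Assumption: an empty HTML comment is used as the separator.
const LIST_SEPARATOR = '<!-- -->'

function joinBlocks(blocks: Block[]): string {
  const parts: string[] = []
  blocks.forEach((block, index) => {
    const previous = blocks[index - 1]
    // Two lists back-to-back would be merged by renderers, so split them.
    if (previous !== undefined && previous.kind === 'list' && block.kind === 'list') {
      parts.push(LIST_SEPARATOR)
    }
    parts.push(block.markdown)
  })
  return parts.join('\n\n')
}

const joined = joinBlocks([
  { kind: 'list', markdown: '- a\n- b' },
  { kind: 'list', markdown: '- c' },
  { kind: 'other', markdown: 'Paragraph' },
])
console.log(joined)
```

Without the separator, a renderer would treat "- c" as a third item of the first list.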
And more!
But this section is getting too long so...
Installation
npm install htmlarkdown
Usage
Markdown conversion (either from Element or string)
import { HTMLarkdown } from 'htmlarkdown'
/** Convert an element! */
const htmlarkdown = new HTMLarkdown()
const container = document.getElementById('container')
console.log(container.outerHTML)
// => '<div id="container"><h1>Heading</h1></div>'
htmlarkdown.convert(container)
// => '# Heading'
/**
* Or a HTML string!
* Whichever u prefer. It's 2022, I don't judge :^)
*/
const htmlString = `
<h1>Heading</h1>
<p>Paragraph</p>
`
const htmlStrWithContainer = `<div>${htmlString}</div>`
htmlarkdown.convert(htmlString)
// Set 2nd param 'hasContainer' to true, for container-wrapped string.
htmlarkdown.convert(htmlStrWithContainer, true)
// Both output => '# Heading\n\nParagraph'
Note: If an element is given to convert, it's deep-cloned before any processing/conversion.
Thus, you don't have to worry about it mutating the original element :)
Configuring
/** Configure when creating an instance. */
const htmlarkdown = new HTMLarkdown({
htmlEscapingMode: '&<>',
maxPrettyTableWidth: Number.POSITIVE_INFINITY,
addTrailingLinebreak: true
})
/** Or on an existing instance. */
htmlarkdown.options.maxPrettyTableWidth = -1
Plugins
Plugins are of type (htmlarkdown: HTMLarkdown): void.
They take in a HTMLarkdown instance and configure it by mutating it.
There are 2 plugin-options available in the options object: preloadPlugins and plugins.
The difference is:
preloadPlugins loads the plugins first, before your other options (like "presets"), allowing you to overwrite the plugins' changes:
const enableTrailingLinebreak: Plugin = (htmlarkdown) => {
    htmlarkdown.options.addTrailingLinebreak = true
}
const htmlarkdown = new HTMLarkdown({
    addTrailingLinebreak: false,
    preloadPlugins: [enableTrailingLinebreak],
})
htmlarkdown.options.addTrailingLinebreak // false
plugins loads the plugins after your other options, meaning plugins can overwrite your options:
const enableTrailingLinebreak: Plugin = (htmlarkdown) => {
    htmlarkdown.options.addTrailingLinebreak = true
}
const htmlarkdown = new HTMLarkdown({
    addTrailingLinebreak: false,
    plugins: [enableTrailingLinebreak],
})
htmlarkdown.options.addTrailingLinebreak // true
You can also load plugins on existing instances:
htmlarkdown.loadPlugins([myPlugin])
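The load order described above can be modelled with a tiny stand-in class. This is a sketch of the ordering only, not HTMLarkdown's source: ModelConverter, ModelPlugin and ConstructorOptions are hypothetical names, and the single option is just enough to show the effect.

```typescript
interface ModelOptions {
  addTrailingLinebreak: boolean
}

// Hypothetical stand-in for the plugin type.
type ModelPlugin = (instance: ModelConverter) => void

interface ConstructorOptions extends Partial<ModelOptions> {
  preloadPlugins?: ModelPlugin[]
  plugins?: ModelPlugin[]
}

class ModelConverter {
  options: ModelOptions = { addTrailingLinebreak: false }

  constructor(userOptions: ConstructorOptions = {}) {
    const { preloadPlugins = [], plugins = [], ...rest } = userOptions
    // 1. Preloaded plugins run first, like presets.
    this.loadPlugins(preloadPlugins)
    // 2. The user's own options then overwrite whatever the presets set.
    Object.assign(this.options, rest)
    // 3. Regular plugins run last, so they can overwrite the user's options.
    this.loadPlugins(plugins)
  }

  loadPlugins(plugins: ModelPlugin[]): void {
    for (const plugin of plugins) {
      plugin(this)
    }
  }
}

const enableTrailingLinebreak: ModelPlugin = (instance) => {
  instance.options.addTrailingLinebreak = true
}

const preloaded = new ModelConverter({
  addTrailingLinebreak: false,
  preloadPlugins: [enableTrailingLinebreak],
})
const lateLoaded = new ModelConverter({
  addTrailingLinebreak: false,
  plugins: [enableTrailingLinebreak],
})
console.log(preloaded.options.addTrailingLinebreak, lateLoaded.options.addTrailingLinebreak)
```

In this model the user's explicit option wins over a preloaded plugin, while a regular plugin wins over the user's option.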
Making a copy of an instance
The conversion of a HTMLarkdown instance solely depends on its options property.
Meaning, you can create a copy of an instance like this:
const htmlarkdown = new HTMLarkdown()
const copy = new HTMLarkdown(htmlarkdown.options)
Configuring rules/processes
See the "How it works" section for info on what the rules/processes do.
/**
* Overwriting default rules/processes.
* (does NOT include the defaults)
*/
const htmlarkdown = new HTMLarkdown({
preProcesses: [myPreProcess1, myPreProcess2],
rules: [myRule1, myRule2],
textProcesses: [myTextProcess1, myTextProcess2],
postProcesses: [myPostProcess1, myPostProcess2]
})
/**
* Adding on to default rules/processes.
* (includes the defaults)
*/
const htmlarkdown = new HTMLarkdown()
htmlarkdown.addPreProcess(myPreProcess)
htmlarkdown.addRule(myRule)
htmlarkdown.addTextProcess(myTextProcess)
htmlarkdown.addPostProcess(myPostProcess)
How it works
HTMLarkdown has 3 distinct phases:
1. Pre-processing
   The container-element that's received (and deep-cloned) by the convert method is passed consecutively to each PreProcess in options.preProcesses.
2. Conversion
   The pre-processed container-element is then recursively converted to markdown.
   Elements are converted by the Rules in options.rules.
   Text-nodes are converted by the TextProcesses in options.textProcesses.
   The rules' and text-processes' output strings are then appended to each other, to give the raw markdown.
3. Post-processing
   The raw markdown string is then passed consecutively to each PostProcess in options.postProcesses, to give the final markdown.
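The three phases can be sketched as a small pipeline. Everything here is a simplified stand-in: the library works on real DOM nodes, while ModelNode, ModelPreProcess, ModelRule, ModelTextProcess, ModelPostProcess and this convert function are hypothetical and only mirror the flow described above.

```typescript
// Hypothetical, simplified node model (the library uses real DOM nodes).
type ModelNode =
  | { type: 'text'; value: string }
  | { type: 'element'; tag: string; children: ModelNode[] }
type ModelElement = Extract<ModelNode, { type: 'element' }>

type ModelPreProcess = (container: ModelElement) => ModelElement
type ModelRule = {
  tag: string
  convert: (element: ModelElement, inner: string) => string
}
type ModelTextProcess = (text: string) => string
type ModelPostProcess = (markdown: string) => string

interface PipelineOptions {
  preProcesses: ModelPreProcess[]
  rules: ModelRule[]
  textProcesses: ModelTextProcess[]
  postProcesses: ModelPostProcess[]
}

function convert(container: ModelElement, options: PipelineOptions): string {
  // Phase 1: pass the container through each pre-process in turn.
  const processed = options.preProcesses.reduce<ModelElement>(
    (element, preProcess) => preProcess(element),
    container,
  )

  // Phase 2: recursively convert elements (rules) and text (text-processes).
  const convertNode = (node: ModelNode): string => {
    if (node.type === 'text') {
      return options.textProcesses.reduce<string>(
        (text, textProcess) => textProcess(text),
        node.value,
      )
    }
    const inner = node.children.map(convertNode).join('')
    const rule = options.rules.find((candidate) => candidate.tag === node.tag)
    // Elements without a matching rule just pass their content through.
    return rule ? rule.convert(node, inner) : inner
  }
  const rawMarkdown = processed.children.map(convertNode).join('')

  // Phase 3: pass the raw markdown through each post-process in turn.
  return options.postProcesses.reduce<string>(
    (markdown, postProcess) => postProcess(markdown),
    rawMarkdown,
  )
}

const markdown = convert(
  {
    type: 'element',
    tag: 'div',
    children: [
      { type: 'element', tag: 'h1', children: [{ type: 'text', value: ' Heading ' }] },
    ],
  },
  {
    preProcesses: [],
    rules: [{ tag: 'h1', convert: (_element, inner) => `# ${inner}\n\n` }],
    textProcesses: [(text) => text.trim()],
    postProcesses: [(raw) => raw.trim()],
  },
)
console.log(markdown)
```

Keeping each phase as a plain list of functions is what makes it possible to overwrite or add to the defaults, as shown in the "Configuring rules/processes" section.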
Contributing
Bugs
HTMLarkdown is still under development, so there'll likely be bugs.
So the easiest way to contribute is to submit an issue (with the bug label), especially for any incorrect markdown-conversions :)
For any incorrect markdown-conversions, state the:
- input HTML
- current incorrect markdown output
- expected markdown output
New conversions, ideas, features, tests
If you have any new element-conversions / ideas / features / tests that you think should be added, leave an issue with the feature or improve label!
- The feature label is for new features
- The improve label is for improvements on existing features

Understandably, there are gray areas on what is a "feature" and what is an "improvement". So just go with whichever seems more appropriate :)
Other markdown specs
Currently, HTMLarkdown has been designed to output markdown for GitHub specifically (ie. GFM).
BUT, if there's another markdown spec. that you'd like to design for (maybe as a plugin?), do leave an issue/discussion :D
Coding-related stuff
Code-formatting is handled by Prettier, so no need to worry bout it :)
Any new feature should:
- be documented via TSDoc
- come with new unit-tests
- pass all new/existing tests
As for which merging method to use, check out the discussion.
Contributors
So far it's just me, so pls send help! :^)
Roadmap
If you've any new ideas / features, check out the Contributing section for it!
Element conversions
Block-elements:
- [x] Headings (For now, only ATX-style)
- [x] Paragraph
- [x] Codeblock
- [x] Blockquote
- [x] Lists (ordered, unordered, tight and loose)
- [x] (GFM) Table
- [ ] (GFM) Task-list

(Below are some planned block-elements that don't have a markdown-equivalent)
- [x] <span> (handled by a noop-rule)
- [x] <div> (For now, handled by a noop-rule)
- [ ] Definition list (ie. <dl>, <dt>, <dd>)
- [ ] Collapsible section (ie. <details>)
Text-formattings:
- [x] Bold (For now, only outputs in asterisks: **BOLD**)
- [x] Italic (For now, only outputs in asterisks: *ITALIC*)
- [x] (GFM) ~~Strikethrough~~
- [x] Code
- [x] Link (For now, only inline links)
- [x] Superscript (ie. <sup>)
- [x] Subscript (ie. <sub>)
- [x] Underline (ie. <u>, <ins>)
  (didn't know underlines were possible till recently)
Misc:
- [x] Images (For now, only inline links)
- [x] Horizontal-rule (ie. <hr>)
- [x] Linebreaks (ie. <br>)
- [ ] Preserved HTML comments (Issue #25)
  (eg. <!-- COMMENT -->)
Features to be added:
- Custom id attributes
  Go to [section with id](#my-section)
  <p id="my-section"> My section </p>
- Reversing GitHub's Issue/PR autolinks
- Ability to customise how a codeblock's syntax-highlighting language is obtained from the <pre><code> elements
License
The MIT License (MIT).
So it's freeeeeee