dismark
v0.1.1
Published
A markdown parser matching discord's parsing on desktop bug for bug
Downloads
179
Readme
Dismark
This is a discord-flavored markdown parser matching discord's desktop parsing bug for bug.
Contents
Installation
npm i --save dismark
Usage
import { MarkdownParser } from "dismark";
const parser = new MarkdownParser();
console.log(JSON.stringify(parser.parse("Hello *World*"), null, 4));
// {
// "type": "doc",
// "content": [
// {
// "type": "plain",
// "content": "Hello "
// },
// {
// "type": "italics",
// "content": {
// "type": "doc",
// "content": [
// {
// "type": "plain",
// "content": "World"
// }
// ]
// }
// }
// ]
// }
AST Structure
AST nodes take the following form:
export type document_fragment = {
type: "doc";
content: markdown_node[];
};
export type plain_text = {
type: "plain";
content: string;
};
export type italics = {
type: "italics";
content: markdown_node;
};
export type bold = {
type: "bold";
content: markdown_node;
};
export type underline = {
type: "underline";
content: markdown_node;
};
export type strikethrough = {
type: "strikethrough";
content: markdown_node;
};
export type spoiler = {
type: "spoiler";
content: markdown_node;
};
export type inline_code = {
type: "inline_code";
content: string;
};
export type code_block = {
type: "code_block";
language: string | null;
content: string;
};
export type header = {
type: "header";
level: number;
content: markdown_node;
};
export type subtext = {
type: "subtext";
content: markdown_node;
};
export type masked_link = {
type: "masked_link";
target: string;
content: markdown_node;
};
export type list = {
type: "list";
start_number: number | null;
items: markdown_node[];
};
export type blockquote = {
type: "blockquote";
content: markdown_node;
};
export type markdown_node =
| document_fragment
| plain_text
| italics
| bold
| underline
| strikethrough
| spoiler
| inline_code
| code_block
| header
| subtext
| masked_link
| list
| blockquote;
Example
Here's a simple example for working with the generated AST: This function walks the AST and prints out the plain text content, stripping all markdown formatting:
function extract_text(node: markdown_node): string {
switch (node.type) {
case "doc":
return node.content.map(extract_text).join("");
case "plain":
case "inline_code":
case "code_block":
return node.content;
case "italics":
case "bold":
case "underline":
case "strikethrough":
case "spoiler":
case "masked_link":
return extract_text(node.content);
case "header":
case "blockquote":
case "subtext":
return extract_text(node.content) + "\n";
case "list":
return node.items.map(extract_text).join("");
default:
throw new Error(`Unhandled markdown ast node type ${(node as any).type}`);
}
}
function strip_formatting(ast: markdown_node) {
return extract_text(ast);
}
const parser = new MarkdownParser();
const ast = parser.parse(`# foo
**bar** baz
- bar`);
console.log(strip_formatting(ast));
// prints:
// foo
// bar baz
// bar
Parse Rules
The MarkdownParser
class can be constructed with a list of parse rules to use. The default rules, available through
MarkdownParser.default_rules
are the following:
EscapeRule
BoldRule
UnderlineRule
ItalicsRule
StrikethroughRule
SpoilerRule
CodeBlockRule
InlineCodeRule
BlockquoteRule
SubtextRule
HeaderRule
LinkRule
ListRule
TextRule
Example
As an example, to construct a parser that only parses bold text, you can construct the parser with
const parser = new MarkdownParser([new BoldRule()]);
Custom Rules
The Rule
base class is defined as:
export type match_result = RegExpMatchArray;
export type parser_state = {
at_start_of_line: boolean;
in_quote: boolean;
};
export abstract class Rule {
// attempt to match a rule at the start of the substring `remaining`
abstract match(remaining: string, parser: MarkdownParser, state: parser_state): match_result | null;
// given a `match_result`, parse a markdown node
abstract parse(match: match_result, parser: MarkdownParser, state: parser_state, remaining: string): parse_result;
// attempt to coalesce to sequential markdown nodes (e.g. to merge sequential plain text nodes into a single node)
coalesce?(a: markdown_node, b: markdown_node): markdown_node | null;
}
Example
Below is an example rule to add support for $$math$$
syntax:
const MATH_RE = /^\$\$(.+?)\$\$/s;
type math = {
type: "math";
content: string;
};
class MathRule extends Rule {
override match(remaining: string): match_result | null {
return remaining.match(MATH_RE);
}
override parse(match: match_result, parser: MarkdownParser, state: parser_state): parse_result {
return {
node: {
type: "math",
content: match[1],
} as markdown_node | math as markdown_node, // unfortunately for now this hack is needed in typescript
fragment_end: match[0].length,
};
}
}
const custom_parser = new MarkdownParser([
new EscapeRule(),
new BoldRule(),
new UnderlineRule(),
new ItalicsRule(),
new StrikethroughRule(),
new SpoilerRule(),
new MathRule(), // <-- added here
new CodeBlockRule(),
new InlineCodeRule(),
new BlockquoteRule(),
new SubtextRule(),
new HeaderRule(),
new LinkRule(),
new ListRule(),
new TextRule(),
]);
const math_ast = custom_parser.parse("foo $$a^n + b^n = c^n$$ bar") as markdown_node | math;
console.log(math_ast);
// {
// type: "doc",
// content: [
// { type: "plain", content: "foo " },
// { type: "math", content: "a^n + b^n = c^n" },
// { type: "plain", content: " bar" }
// ]
// }