social-components-parser
v0.3.1
Published
`social-components` post content parser
Downloads
51
Maintainers
Readme
social-components-parser
Utility functions for parsing markup into Content.
API
parseHtmlContent
import { parseHtmlContent } from 'social-components-parser'
Parses HTML markup into Content.
parseHtmlContent(html: string, {
syntax,
// Optional parameters:
context,
onUnknownElementType,
getContentElementsForUnknownElementType
}): Content?
Options:
syntax: object[]
— A set of rules for parsing HTML markup. See the Syntax section of this document.context?: object
— An optional object that gets passed tocreateElement()
functions (seesyntax
).onUnknownElementType?: ({ element: Element, elementMarkup: string }) => void
— A function that gets called when an unsupported DOM element is found.getContentElementsForUnknownElementType?: ({ element: Element, getContentElements: ContentElement[] }) => ContentElement[]
— A function that could be used to modify the default behavior when encountering an unsupported DOM element. The default behavior is to return the child content elements (getContentElements()
).
Examples
import { parseHtmlContent } from 'social-components-parser'
const content = parseHtmlContent('Some <b>emphasized</b> text', {
syntax: [{
tag: 'b',
createElement(content) {
return {
type: 'text',
style: 'bold',
content
}
}
}]
})
// `content` is a list of block elements (paragraphs).
// The first paragraph is a list of inline elements.
content === [
[
'Some ',
{
type: 'text',
style: 'bold',
content: 'emphasized'
},
' text'
]
]
import { parseHtmlContent } from 'social-components-parser'
const content = parseHtmlContent('Some <span class="red">emphasized</span> text', {
syntax: [{
tag: 'span',
attributes: {
name: 'class',
value: 'red'
},
createElement(content) {
return {
type: 'text',
style: 'bold',
content
}
}
}]
})
// `content` is a list of block elements (paragraphs).
// The first paragraph is a list of inline elements.
content === [
[
'Some ',
{
type: 'text',
style: 'bold',
content: 'emphasized'
},
' text'
]
]
import { parseHtmlContent } from 'social-components-parser'
const content = parseHtmlContent('Some <b><i>heavily</i> emphasized</b> text', {
syntax: [{
tag: 'b',
createElement(content) {
return {
type: 'text',
style: 'bold',
content
}
}
}, {
tag: 'i',
createElement(content) {
return {
type: 'text',
style: 'italic',
content
}
}
}]
})
// `content` is a list of block elements (paragraphs).
// The first paragraph is a list of inline elements.
content === [
[
'Some ',
{
type: 'text',
style: 'bold',
content: [
{
type: 'text',
style: 'italic',
content: 'heavily'
},
' emphasized'
]
},
' text'
]
]
Syntax
The syntax
parameter should be a list of definitions for each possible type of a content block.
A content block type is defined by properties:
tag: string
— An HTML tag.attributes?: object[]
— An optional set of attributes the HTML tag must have. Each attribute object should be of shape:name: string
— HTML attribute name.value: string | RegExp
— HTML attribute value.
convertContentToText?: boolean
— One could set this flag totrue
to instruct the parser to convert the contents of the HTML tag from HTML markup to simple text, effectively stripping any HTML tags from the child nodes and leaving just the text content. Example:<pre>some <b>bold</b> text</pre>
→<pre>some bold text</pre>
. This feature could be used to simplify parsing certain HTML tags having too complex internal structure.content?: boolean
— By default, empty HTML tags get skipped unlesscontent: false
flag is specified for a certain content block type.block?: boolean
— Set this flag totrue
if the content element is a "block element" rather than an "inline element", otherwise it won't be parsed correctly.createElement: (childContent?: Content, utility, options) => Content
— Returns theContent
representing the parsed HTML tag. Receives the parsedContent
of the child elements as an argument.utility: object
— An object providing utility functions:getAttribute(name: string)? string
— Returns an attribute of the HTML tag.
context?: object
— Thecontext
parameter of theparseHtmlContent()
function.
Other
splitContentIntoParagraphsByMultipleLineBreaks
Sometimes, the markup could contain multiple line breaks in the midst of some deeply-nested part of content.
Some <b><i>multiline<br/><br/>heavily</i> emphasized</b> text
When parsed using parseHtmlContent()
, it will return the content where <br/>
s w:
[
[
'Some ',
{
type: 'text',
style: 'bold',
content: [
{
type: 'text',
style: 'italic',
content: [
'multiline',
'\n',
'\n',
'heavily'
]
},
' emphasized'
]
},
' text'
]
]
In order to convert those multiple "line breaks" into a "paragraph break", one could use the splitContentIntoParagraphsByMultipleLineBreaks()
function that is exported from social-components
package.
import splitContentIntoParagraphsByMultipleLineBreaks from 'social-components/utility/post/splitContentIntoParagraphsByMultipleLineBreaks.js'
const normalizedContent = splitContentIntoParagraphsByMultipleLineBreaks(content)
normalizedContent === [
[
'Some ',
{
type: 'text',
style: 'bold',
content: [
{
type: 'text',
style: 'italic',
content: 'multiline'
}
]
}
],
[
{
type: 'text',
style: 'bold',
content: [
{
type: 'text',
style: 'italic',
content: 'heavily'
},
' emphasized'
]
},
' text'
]
]
Test
npm test
GitHub
On March 9th, 2020, GitHub, Inc. silently banned my account (erasing all my repos, issues and comments) without any notice or explanation. Because of that, all source codes had to be promptly moved to GitLab. The GitHub repo is now only used as a backup (you can star the repo there too), and the primary repo is now the GitLab one. Issues can be reported in any repo.