parse-entities
v4.0.1
Parse HTML character references.
What is this?
This is a small and powerful decoder of HTML character references (often called entities).
When should I use this?
You can use this for spec-compliant decoding of character references. It’s small and fast enough to do that well. You can also use this when making a linter, because it emits distinct warnings, each with a reason and positional info on where it occurred.
Install
This package is ESM only. In Node.js (version 14.14+, 16.0+), install with npm:
npm install parse-entities
In Deno with esm.sh:
import {parseEntities} from 'https://esm.sh/parse-entities@4'
In browsers with esm.sh:
<script type="module">
import {parseEntities} from 'https://esm.sh/parse-entities@4?bundle'
</script>
Use
import {parseEntities} from 'parse-entities'

console.log(parseEntities('alpha &amp; bravo'))
// => alpha & bravo

console.log(parseEntities('charlie &copycat; delta'))
// => charlie ©cat; delta

console.log(parseEntities('echo &copy; foxtrot &#8800; golf &#x1D306; hotel'))
// => echo © foxtrot ≠ golf 𝌆 hotel
API
This package exports the identifier parseEntities.
There is no default export.
parseEntities(value[, options])
Parse HTML character references.
options
Configuration (optional).
options.additional
Additional character to accept (string?, default: '').
This allows other characters, without error, when following an ampersand.
options.attribute
Whether to parse value as an attribute value (boolean?, default: false).
This results in slightly different behavior.
options.nonTerminated
Whether to allow nonterminated references (boolean, default: true).
For example, &copycat for ©cat.
This behavior is compliant with the spec but can lead to unexpected results.
options.position
Starting position of value (Position or Point, optional).
Useful when dealing with values nested in some sort of syntax tree.
The default is:
{line: 1, column: 1, offset: 0}
options.warning
Error handler (Function?).
options.text
Text handler (Function?).
options.reference
Reference handler (Function?).
options.warningContext
Context used when calling warning ('*', optional).
options.textContext
Context used when calling text ('*', optional).
options.referenceContext
Context used when calling reference ('*', optional).
Returns
Decoded value (string).
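The context options set the value of this inside a handler. The sketch below simulates that with Function.prototype.call instead of running the parser, so the context object, the file name, and the sample arguments are all made up for illustration. It assumes the handler is a regular function, because an arrow function would not receive this from its caller.

```javascript
// Hypothetical context object; in real use it would be passed as
// `options.warningContext`.
const context = {file: 'example.html', count: 0}

// A regular function (not an arrow function), so the caller can set `this`.
// The parameters follow the documented `warning(reason, point, code)`.
function warning(reason, point, code) {
  this.count++
  return `${this.file}:${point.line}:${point.column}: ${reason} (code ${code})`
}

// Simulated invocation: stands in for the parser calling the handler
// with `this` set to the context.
const message = warning.call(
  context,
  'Example reason',
  {line: 2, column: 3, offset: 10},
  5
)

console.log(message)
// => example.html:2:3: Example reason (code 5)
```

In real use, the same pair would be passed as parseEntities(value, {warning, warningContext: context}).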
function warning(reason, point, code)
Error handler.
Parameters
this (*): refers to warningContext when given to parseEntities
reason (string): human-readable reason for emitting a parse error
point (Point): place where the error occurred
code (number): machine-readable code for the error
The following codes are used:
| Code | Example | Note |
| ---- | ------------------ | --------------------------------------------- |
| 1 | foo &amp bar | Missing semicolon (named) |
| 2 | foo &#123 bar | Missing semicolon (numeric) |
| 3 | Foo &bar baz | Empty (named) |
| 4 | Foo &# | Empty (numeric) |
| 5 | Foo &bar; baz | Unknown (named) |
| 6 | Foo &#128; baz | Disallowed reference |
| 7 | Foo &#xD800; baz | Prohibited: outside permissible unicode range |
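For a linter, the codes above can be mapped back to their notes inside a warning handler. What follows is a minimal sketch, not part of the package: createWarningCollector and warningNotes are hypothetical names, and the final call simulates a warning by hand instead of running the parser.

```javascript
// Notes copied from the table above, keyed by warning code.
const warningNotes = {
  1: 'Missing semicolon (named)',
  2: 'Missing semicolon (numeric)',
  3: 'Empty (named)',
  4: 'Empty (numeric)',
  5: 'Unknown (named)',
  6: 'Disallowed reference',
  7: 'Prohibited: outside permissible unicode range'
}

// Hypothetical helper: returns a handler plus the list it fills.
function createWarningCollector() {
  const messages = []

  // Same parameters as the documented `warning(reason, point, code)`.
  function warning(reason, point, code) {
    messages.push({
      code,
      note: warningNotes[code] || 'Unknown code',
      reason,
      line: point.line,
      column: point.column
    })
  }

  return {warning, messages}
}

// Simulated call with made-up values, standing in for the parser.
const collector = createWarningCollector()
collector.warning('Example reason', {line: 1, column: 5, offset: 4}, 1)

console.log(collector.messages[0].note)
// => Missing semicolon (named)
```

In real use, the handler would be passed as parseEntities(value, {warning: collector.warning}), and collector.messages would be read afterwards.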
function text(value, position)
Text handler.
Parameters
this (*): refers to textContext when given to parseEntities
value (string): string of content
position (Position): place where value starts and ends
function reference(value, position, source)
Character reference handler.
Parameters
this (*): refers to referenceContext when given to parseEntities
value (string): decoded character reference
position (Position): place where source starts and ends
source (string): raw source of character reference
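The text and reference handlers can share state to record a value as a list of segments. The sketch below is illustrative only: createSegmentRecorder is a hypothetical helper, the calls are simulated by hand, and the {start, end} shape used for Position is an assumption that this document does not spell out.

```javascript
// Hypothetical helper: builds `text` and `reference` handlers that
// append to one shared list.
function createSegmentRecorder() {
  const segments = []

  // Same parameters as the documented `text(value, position)`.
  function text(value, position) {
    segments.push({type: 'text', value, position})
  }

  // Same parameters as the documented `reference(value, position, source)`.
  function reference(value, position, source) {
    segments.push({type: 'reference', value, source, position})
  }

  return {text, reference, segments}
}

// Simulated calls for the input 'echo &copy;'.
// Assumption: a Position is an object with `start` and `end` points.
const recorder = createSegmentRecorder()

recorder.text('echo ', {
  start: {line: 1, column: 1, offset: 0},
  end: {line: 1, column: 6, offset: 5}
})

recorder.reference(
  '©',
  {
    start: {line: 1, column: 6, offset: 5},
    end: {line: 1, column: 12, offset: 11}
  },
  '&copy;'
)

console.log(recorder.segments.map((segment) => segment.type).join(', '))
// => text, reference
```

In real use, both handlers would be passed together, as in parseEntities(value, {text: recorder.text, reference: recorder.reference}).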
Types
This package is fully typed with TypeScript.
It exports the additional types Options, WarningHandler, ReferenceHandler, and TextHandler.
Compatibility
This package is at least compatible with all maintained versions of Node.js. As of now, that is Node.js 14.14+ and 16.0+. It also works in Deno and modern browsers.
Security
This package is safe: it matches the HTML spec to parse character references.
Related
wooorm/stringify-entities: encode HTML character references
wooorm/character-entities: info on character references
wooorm/character-entities-html4: info on HTML4 character references
wooorm/character-entities-legacy: info on legacy character references
wooorm/character-reference-invalid: info on invalid numeric character references
Contribute
Yes please! See How to Contribute to Open Source.