normalize-html-whitespace
v1.0.0
Safely remove repeating whitespace from HTML text.
Using \s to normalize HTML whitespace will strip out characters that are actually rendered by a web browser. That is a lossy change: it produces a different visual result. This package collapses runs of whitespace characters down to a single space, while leaving the following characters untouched:
\u00a0 or &nbsp; (non-breaking space)
\ufeff or &#xfeff; (zero-width non-breaking space)
…as well as these lesser-known ones:
\u1680 or &#x1680; (Ogham space mark)
\u180e or &#x180e; (Mongolian vowel separator)
\u2000 or &#x2000; (en quad)
\u2001 or &#x2001; (em quad)
\u2002 or &#x2002; (en space)
\u2003 or &#x2003; (em space)
\u2004 or &#x2004; (three-per-em space)
\u2005 or &#x2005; (four-per-em space)
\u2006 or &#x2006; (six-per-em space)
\u2007 or &#x2007; (figure space)
\u2008 or &#x2008; (punctuation space)
\u2009 or &#x2009; (thin space)
\u200a or &#x200a; (hair space)
\u2028 or &#x2028; (line separator)
\u2029 or &#x2029; (paragraph separator)
\u202f or &#x202f; (narrow non-breaking space)
\u205f or &#x205f; (medium mathematical space)
\u3000 or &#x3000; (ideographic space)
For the sake of completeness, the following character, which is not part of \s, is also left untouched:
\u200b or &#x200b; (zero-width breaking space)
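To make the difference concrete, here is a minimal sketch comparing a naive \s replacement with a narrower pattern. The narrower character class (tab, line feed, form feed, carriage return, space) is an assumption chosen for illustration and is not taken from this package's source; the names naive and conservative are hypothetical.

```javascript
// Naive approach: \s also matches \u00a0, \u2009 and the other characters listed above.
const naive = (text) => text.replace(/\s+/g, ' ');

// Illustrative narrower pattern (an assumption, not this package's actual regex):
// only tab, line feed, form feed, carriage return and space are collapsed.
const conservative = (text) => text.replace(/[\t\n\f\r ]+/g, ' ');

// Two non-breaking spaces, a thin space, and a run of ordinary whitespace.
const sample = 'price:\u00a0\u00a010\u2009kg  \n  total';

// The naive version turns the non-breaking and thin spaces into plain spaces.
console.log(naive(sample) === 'price: 10 kg total');

// The narrower version keeps them and only collapses the ordinary run.
console.log(conservative(sample) === 'price:\u00a0\u00a010\u2009kg total');
```

Both comparisons print true. The first result has lost the non-breaking spaces that a browser would have rendered, which is the lossy change described above.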
Note: this package does not contain an HTML parser. It is meant to be used on text nodes only.
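As a rough sketch of what "text nodes only" can mean in practice, the function below walks an already-parsed tree and applies a normalizer to text nodes, leaving element nodes alone. The tree shape ({ type, tagName, children, value }), the function name normalizeTextNodes, and the stand-in normalizer are all hypothetical; in real use you would pass the function exported by this package and a tree produced by whatever HTML parser you already use.

```javascript
// Hypothetical tree shape: { type: 'element', tagName, children } or { type: 'text', value }.
// `normalize` is any string -> string function, e.g. the function this package exports.
function normalizeTextNodes(node, normalize) {
  if (node.type === 'text') {
    // Only text content is passed to the normalizer.
    return { ...node, value: normalize(node.value) };
  }
  // Element nodes are copied as-is; their children are visited recursively.
  const children = (node.children || []).map((child) => normalizeTextNodes(child, normalize));
  return { ...node, children };
}

// Stand-in normalizer so the sketch runs without the package installed
// (an assumption for illustration, not this package's source).
const standIn = (text) => text.replace(/[\t\n\f\r ]+/g, ' ');

const tree = {
  type: 'element',
  tagName: 'p',
  children: [
    { type: 'text', value: 'Hello   \n  ' },
    { type: 'element', tagName: 'b', children: [{ type: 'text', value: 'wide\u00a0\u00a0world' }] },
  ],
};

const result = normalizeTextNodes(tree, standIn);
console.log(JSON.stringify(result.children[0].value)); // "Hello "
```

Passing raw markup straight to the normalizer instead would also touch whitespace inside tags and attribute values, which is why the walk is limited to text nodes.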
Installation
Node.js >= 8 is required. Type this at the command line:
npm install normalize-html-whitespace
Usage
const normalizeWhitespace = require('normalize-html-whitespace');
normalizeWhitespace('  foo     bar   baz  ');
//-> ' foo bar baz '