mnl-ws-norm

v1.0.3

Light-weight tool for normalizing whitespace, splitting lines, and accurately tokenizing words (no regex). Multiple natural languages supported.

Light-weight tool for normalizing whitespace and accurately tokenizing words (no regex). Multiple natural languages supported. Useful for scraping, machine learning, and data analysis.

Installation

npm install mnl-ws-norm

function isWhiteSpace(char)

Returns true if char is a whitespace character.

char must be passed as a string with a length of 1.

import {isWhiteSpace} from 'mnl-ws-norm';

console.log("Half-width space is white space: " + isWhiteSpace(" "));
console.log("Tab is white space: " + isWhiteSpace("\t"));
console.log("'A' is white space: " + isWhiteSpace("A"));
console.log("'\\n' is white space: " + isWhiteSpace("\n"));
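One way a regex-free check like this can work is a lookup in a fixed set of characters. The sketch below is only an illustration under that assumption: isWhiteSpaceSketch, WHITESPACE_CHARS, the list of code points, and the TypeError on invalid input are all hypothetical and are not taken from the package's source.

```javascript
// Hypothetical sketch: a regex-free whitespace check backed by a Set lookup.
// The code point list is an assumption chosen for illustration, not the
// package's actual table.
const WHITESPACE_CHARS = new Set([
  "\u0009", // tab
  "\u000A", // line feed
  "\u000B", // vertical tab
  "\u000C", // form feed
  "\u000D", // carriage return
  "\u0020", // half-width space
  "\u00A0", // no-break space
  "\u3000", // full-width (ideographic) space
]);

function isWhiteSpaceSketch(char) {
  // Rejecting bad input is a choice of this sketch only.
  if (typeof char !== "string" || char.length !== 1) {
    throw new TypeError("char must be a string with a length of 1");
  }
  return WHITESPACE_CHARS.has(char);
}

console.log(isWhiteSpaceSketch(" "));      // true
console.log(isWhiteSpaceSketch("\u3000")); // true
console.log(isWhiteSpaceSketch("A"));      // false
```

A set lookup keeps the check constant-time per character and makes it easy to see exactly which code points count as whitespace.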

function isLineBreak(char)

Returns true if char is a line break.

char must be passed as a string with a length of 1.

import {isLineBreak} from 'mnl-ws-norm';

console.log("'\\n' is line break: " + isLineBreak("\n"));
console.log("Half-width space is line break: " + isLineBreak(" "));

function splitBySpaces(inputStr)

inputStr is the string from which words are to be tokenized. The words are returned as an array.

inputStr must be passed as a string.

Note: This function splits inputStr by all whitespace characters (spaces, line breaks, etc.).

import {splitBySpaces} from 'mnl-ws-norm';

//Source string 1 with half-width spaces (Unicode: U+0020) and a tab (Unicode: U+0009).
var sourceStr1 = "Hey, everybody,\thow are you doing?";

//Source string 2 with half-width spaces and a \n character (Unicode: U+000A).
var sourceStr2 = "Hey, everybody\nhow are you doing?";

//Source string 3 with half-width spaces and a full-width space (Unicode: U+3000).
var sourceStr3 = "Hey, everybody,\u3000how are you doing?";

//The join method is used in this example to separate the elements in the returned array.

console.log("sourceStr1: " + splitBySpaces(sourceStr1).join("-"));
console.log("sourceStr2: " + splitBySpaces(sourceStr2).join("-"));
console.log("sourceStr3: " + splitBySpaces(sourceStr3).join("-"));
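For readers curious how words can be tokenized without a regular expression, here is a minimal sketch that scans the string one character at a time. It is an assumption-based illustration: splitBySpacesSketch and SPLIT_CHARS are hypothetical names, the character set is deliberately short, and the package's own implementation may differ.

```javascript
// Hypothetical sketch of regex-free tokenizing by scanning characters.
// SPLIT_CHARS is a short, assumed list; the real package may recognize more.
const SPLIT_CHARS = new Set(["\t", "\n", "\r", " ", "\u3000"]);

function splitBySpacesSketch(inputStr) {
  const words = [];
  let current = "";
  for (const char of inputStr) {
    if (SPLIT_CHARS.has(char)) {
      // A whitespace character ends the current word, if there is one.
      if (current.length > 0) {
        words.push(current);
        current = "";
      }
    } else {
      current += char;
    }
  }
  // Keep the final word when the string does not end in whitespace.
  if (current.length > 0) words.push(current);
  return words;
}

console.log(splitBySpacesSketch("Hey, everybody,\u3000how are you doing?").join("-"));
// Hey,-everybody,-how-are-you-doing?
```

Because empty runs are skipped, consecutive whitespace characters never produce empty strings in the result.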

function splitByLines(inputStr, removeExtraSpaces = false)

Required parameter -> inputStr

inputStr is the string from which lines are to be tokenized. The lines are returned as an array.

inputStr must be passed as a string.

Optional parameter -> removeExtraSpaces

By default, leading/trailing spaces are not removed from lines. Specifying removeExtraSpaces as true removes leading/trailing spaces.

import {splitBySpaces, splitByLines} from 'mnl-ws-norm';

var sourceStr = "Hey, everybody.\nHow are you doing?\rI am alright.";

var lines = splitByLines(sourceStr);

//The join method is used in this example to separate the elements in the returned array.

for (var i = 0; i < lines.length; i++) {
	console.log("Line " + i.toString() + " : " + (splitBySpaces(lines[i])).join("-"));
}
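The effect of the removeExtraSpaces flag can be pictured with a small stand-alone sketch. Everything here is an assumption made for illustration: splitByLinesSketch is a hypothetical name, only "\n" and "\r" are treated as line breaks, and "\r\n" is counted as a single break, which the package may or may not do.

```javascript
// Hypothetical sketch of line splitting with optional trimming.
// Assumptions: only "\n" and "\r" are line breaks, and "\r\n" is one break.
function splitByLinesSketch(inputStr, removeExtraSpaces = false) {
  const lines = [];
  let current = "";
  for (let i = 0; i < inputStr.length; i++) {
    const char = inputStr[i];
    if (char === "\n" || char === "\r") {
      // Skip the "\n" of a "\r\n" pair so it is not counted twice.
      if (char === "\r" && inputStr[i + 1] === "\n") i++;
      lines.push(removeExtraSpaces ? current.trim() : current);
      current = "";
    } else {
      current += char;
    }
  }
  // The text after the last line break is the final line.
  lines.push(removeExtraSpaces ? current.trim() : current);
  return lines;
}

var text = "Hey, everybody.\nHow are you doing?\rI am alright.";
console.log(splitByLinesSketch(text).join(" | "));
// Hey, everybody. | How are you doing? | I am alright.

console.log(splitByLinesSketch("  padded  \r\n  line  ", true).join("|"));
// padded|line
```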

function normSpaces(inputStr, spaceType, removeExtraSpaces = false)

Required parameters -> inputStr, spaceType

inputStr is the string in which the whitespace characters are to be replaced.

inputStr must be passed as a string.

spaceType is the string used to replace all whitespace characters in inputStr.

spaceType must be passed as a string.

Optional parameter -> removeExtraSpaces

By default, extra whitespace characters are not removed from inputStr.

Specifying removeExtraSpaces as true removes extra whitespace characters from inputStr.

Note: Regardless of the value of removeExtraSpaces, the returned string may have leading/trailing whitespace characters, so you may want to use the trim() method as necessary.

import {normSpaces} from 'mnl-ws-norm';

//Source string with consecutive half-width spaces (Unicode: U+0020) and a tab (Unicode: U+0009).
var sourceStr = "  Hey,  everybody, \thow  are  you  doing?  ";

//Whitespace characters in sourceStr are replaced with a half-width space; extra spaces are kept.
console.log(normSpaces(sourceStr, " "));

//Whitespace characters in sourceStr are replaced with a half-width space, and extra spaces are removed.
console.log(normSpaces(sourceStr, " ", true));

//Whitespace characters in sourceStr are replaced with a full-width space (Unicode: U+3000), and extra spaces are removed.
console.log(normSpaces(sourceStr, "\u3000", true));
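The note about leading/trailing whitespace is easier to see in a stand-alone sketch. The code below is a rough approximation under stated assumptions: normSpacesSketch and NORM_CHARS are hypothetical names, the character set is a short assumed list, and "removing extra whitespace" is read here as collapsing each run of whitespace into a single replacement, which may not match the package exactly.

```javascript
// Hypothetical sketch of whitespace normalization followed by trim().
// NORM_CHARS is a short, assumed list of whitespace characters.
const NORM_CHARS = new Set(["\t", "\n", "\r", " ", "\u3000"]);

function normSpacesSketch(inputStr, spaceType, removeExtraSpaces = false) {
  let result = "";
  let previousWasSpace = false;
  for (const char of inputStr) {
    if (NORM_CHARS.has(char)) {
      // With removeExtraSpaces, a run of whitespace yields one replacement.
      if (!(removeExtraSpaces && previousWasSpace)) result += spaceType;
      previousWasSpace = true;
    } else {
      result += char;
      previousWasSpace = false;
    }
  }
  return result;
}

var normalized = normSpacesSketch("  Hey,  everybody, \thow  are  you?  ", " ", true);
console.log("[" + normalized + "]");        // [ Hey, everybody, how are you? ]
console.log("[" + normalized.trim() + "]"); // [Hey, everybody, how are you?]
```

The brackets in the output make the remaining leading and trailing space visible before trim() is applied.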

Other languages

  1. Python -> https://github.com/Rairye/mnl-ws-norm