mnl-punct-norm

v1.0.2

Tool for stripping and normalizing punctuation and other non-alphanumeric characters. Supports multiple natural languages. Useful for scraping, machine learning, and data analysis.

Note: "Non-alphanumeric characters" include characters such as emojis and pictographs, but not whitespace characters. To normalize whitespace characters, see the mnl-ws-norm package: https://www.npmjs.com/package/mnl-ws-norm

Hereinafter, the terms "punctuation marks" and "punctuation" refer to both punctuation marks and other non-alphanumeric characters.

Installation

npm install mnl-punct-norm

function isPunct(char)

This function checks whether a character is a punctuation mark.

Required parameter -> char

Note: char must be a string with a length of 1.


import {isPunct} from 'mnl-punct-norm';

//Half-width period used in English, etc.
console.log("Half-width period isPunct -> " + isPunct("."));

//Full-width period used in Japanese, etc.
console.log("Full-width period isPunct -> " + isPunct("。"));

//Hiragana character
console.log("Hiragana character isPunct -> " + isPunct("あ"));

//Kanji
console.log("Kanji character isPunct -> " + isPunct("私"));

//Pictograph example
console.log("★ isPunct -> " + isPunct("★"));
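For readers curious how such a check might work, here is a minimal sketch. It is not the package's actual implementation. It assumes that "punctuation" means any character that is not a Unicode letter, a Unicode number, or whitespace, in line with the definition given earlier. The name isPunctSketch is hypothetical.

```javascript
// Hypothetical stand-in for isPunct; NOT the mnl-punct-norm implementation.
// Assumption: a character counts as "punctuation" if it is not a Unicode
// letter (\p{L}), a Unicode number (\p{N}), or whitespace (\s).
const LETTER_NUMBER_OR_SPACE = /^[\p{L}\p{N}\s]$/u;

function isPunctSketch(char) {
  // Mirrors the documented requirement: a string with a length of 1.
  if (typeof char !== "string" || char.length !== 1) {
    throw new TypeError("char must be a string with a length of 1");
  }
  return !LETTER_NUMBER_OR_SPACE.test(char);
}

console.log(isPunctSketch("."));  // true  (half-width period)
console.log(isPunctSketch("。")); // true  (full-width period)
console.log(isPunctSketch("あ")); // false (Hiragana)
console.log(isPunctSketch("私")); // false (Kanji)
console.log(isPunctSketch("★")); // true  (pictograph)
```

In JavaScript, many emojis have a string length of 2 because they are stored as two UTF-16 code units, so this sketch would reject them with a TypeError rather than classify them.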

function stripPunct(inputStr, inputSkips = "")

This function strips all punctuation marks from a string.

Required parameter -> inputStr

inputStr is the string from which punctuation marks are to be stripped.

inputStr must be passed as a string.

Optional parameter -> inputSkips

inputSkips is a string containing a sequence of punctuation marks that are not to be stripped from inputStr.

inputSkips must be passed as a string.


import {stripPunct} from 'mnl-punct-norm';

var sourceStr = "This light-weight module, which provides multi-language support, normalizes punctuation in strings.";

//Strips all punctuation from sourceStr.
console.log(stripPunct(sourceStr));

//Strips all punctuation from sourceStr, except for hyphens.
console.log(stripPunct(sourceStr, "-"));

//Strips all punctuation from sourceStr, except for hyphens and commas.
console.log(stripPunct(sourceStr, "-,"));

var japaneseStr = "私は人間(にんげん)です。";

//Strips all punctuation from japaneseStr.
console.log(stripPunct(japaneseStr));

//Strips all punctuation from japaneseStr, except for parentheses.
console.log(stripPunct(japaneseStr, "()"));
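The role of inputSkips can be illustrated with a small self-contained sketch. This is an approximation written for illustration, not the package's code: stripPunctSketch is a hypothetical name, and the sketch assumes that every character other than a Unicode letter, number, or whitespace character counts as punctuation.

```javascript
// Hypothetical stand-in for stripPunct; NOT the mnl-punct-norm implementation.
// Assumption: anything that is not a letter, number, or whitespace is punctuation.
const KEEP_CHAR = /^[\p{L}\p{N}\s]$/u;

function stripPunctSketch(inputStr, inputSkips = "") {
  // Each character listed in inputSkips is exempt from stripping.
  const skips = new Set(inputSkips);
  let result = "";
  for (const ch of inputStr) {
    if (KEEP_CHAR.test(ch) || skips.has(ch)) {
      result += ch;
    }
  }
  return result;
}

console.log(stripPunctSketch("multi-language support, normalized."));
// multilanguage support normalized
console.log(stripPunctSketch("multi-language support, normalized.", "-"));
// multi-language support normalized
console.log(stripPunctSketch("私は人間(にんげん)です。", "()"));
// 私は人間(にんげん)です
```

Storing the skipped characters in a Set keeps each lookup cheap even when inputSkips is long.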

function stripOuterPunct(word)

This function strips leading and trailing punctuation from a word, stopping on each side at the first alphanumeric or whitespace character.

Required parameter -> word


import {stripOuterPunct} from 'mnl-punct-norm';

var source1 = "(((((((hey-buddy-how-is-it-going))))))";
console.log(stripOuterPunct(source1));

var source2 = "x=3-2";
console.log(stripOuterPunct(source2));
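The "stop at the first alphanumeric or whitespace character" behavior can be approximated with a single anchored regular expression. The sketch below is an illustration only; stripOuterPunctSketch is a hypothetical name, and the definition of punctuation (anything that is not a Unicode letter, number, or whitespace) is an assumption.

```javascript
// Hypothetical stand-in for stripOuterPunct; NOT the mnl-punct-norm implementation.
// Matches a run of punctuation anchored at the start OR at the end of the string.
const OUTER_PUNCT = /^[^\p{L}\p{N}\s]+|[^\p{L}\p{N}\s]+$/gu;

function stripOuterPunctSketch(word) {
  // Inner punctuation is untouched because it is not adjacent to either anchor.
  return word.replace(OUTER_PUNCT, "");
}

console.log(stripOuterPunctSketch("(((((((hey-buddy-how-is-it-going))))))"));
// hey-buddy-how-is-it-going
console.log(stripOuterPunctSketch("x=3-2"));
// x=3-2
```

The second call returns its input unchanged because the string already begins and ends with alphanumeric characters.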

function replacePunct(inputStr, inputSkips = "", replacement = " ")

This function replaces all punctuation marks in a string with either a half-width space (default) or a user-specified string.

Required parameter -> inputStr

inputStr is the string in which punctuation marks are to be replaced. inputStr must be passed as a string.

Optional parameters -> inputSkips, replacement

inputSkips is a string containing a sequence of punctuation marks that are not to be replaced with the replacement string.

inputSkips must be passed as a string.

replacement is a string that is used to replace punctuation marks (a half-width space by default).

replacement must be passed as a string.

Note: If the replacement string would be added directly after a space or after another substring that is equal to the replacement string, it is not added. This avoids creating extra spaces or repeated substrings in the returned string.

Likewise, when multiple punctuation marks appear in a row, they are replaced with a single instance of the replacement string.

Note: The returned string may contain leading or trailing spaces, so call trim() on the result if needed.

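import {replacePunct} from 'mnl-punct-norm';

//Source strings (the same ones used in the stripPunct examples).
var sourceStr = "This light-weight module, which provides multi-language support, normalizes punctuation in strings.";
var japaneseStr = "私は人間(にんげん)です。";

//Replaces all punctuation in sourceStr with a half-width space.
console.log(replacePunct(sourceStr));

//Replaces all punctuation in sourceStr with " <PUNCT> ".
console.log(replacePunct(sourceStr, "", " <PUNCT> "));

//Replaces all punctuation in japaneseStr with a full-width space (U+3000).
console.log(replacePunct(japaneseStr, "", "\u3000"));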

//String with multiple punctuation marks. The extra spaces in the string are not normalized by the function.
var multiplePunctStr = "Wha ... what are you     doing!?!?";

//Example in which multiple punctuation marks are used in a row.
console.log(replacePunct(multiplePunctStr));

//Example in which multiple punctuation marks are used in a row, with replacement passed as " <PUNCT> ".
console.log(replacePunct(multiplePunctStr, "", " <PUNCT> "));
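The run-collapsing and duplicate-avoidance rules described in the notes can be sketched as follows. This is one possible reading of those notes, not the package's implementation, and its output may differ from what replacePunct returns. In particular, the sketch only skips the replacement when the output built so far already ends with the replacement string. The name replacePunctSketch is hypothetical.

```javascript
// Hypothetical stand-in for replacePunct; NOT the mnl-punct-norm implementation.
// Assumption: anything that is not a letter, number, or whitespace is punctuation.
const NOT_PUNCT = /^[\p{L}\p{N}\s]$/u;

function replacePunctSketch(inputStr, inputSkips = "", replacement = " ") {
  const skips = new Set(inputSkips);
  let result = "";
  let inPunctRun = false;
  for (const ch of inputStr) {
    if (NOT_PUNCT.test(ch) || skips.has(ch)) {
      result += ch;
      inPunctRun = false;
      continue;
    }
    // Only the first mark of a run can emit a replacement.
    if (inPunctRun) continue;
    inPunctRun = true;
    // Assumed rule: do not repeat the replacement if the output already ends with it.
    if (!result.endsWith(replacement)) {
      result += replacement;
    }
  }
  return result;
}

console.log(JSON.stringify(replacePunctSketch("done!?")));                  // "done "
console.log(JSON.stringify(replacePunctSketch("done!?", "", " <PUNCT> "))); // "done <PUNCT> "
console.log(JSON.stringify(replacePunctSketch("done!?").trim()));           // "done"
```

The last call shows why trim() is worth applying: trailing punctuation becomes a trailing replacement string.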

Use Case: Removing non-standard punctuation marks (such as emojis) while keeping standard punctuation marks

There may be cases where you only want to remove non-standard punctuation marks, for example in text taken from reviews, comment sections, or other places on the Web. To do this, pass the standard punctuation marks that you want to keep as inputSkips.

import {stripPunct} from 'mnl-punct-norm';

var englishPunct = "\"!#$%&\'()=-~^\\|[]{}:*;+,.<>?/_";

//Source string containing both standard and non-standard punctuation marks.
var sourceStr = "★♡▲ Have a nice day!! ★♡▲";

//Non-standard punctuation marks are removed, while the standard punctuation marks remain.
console.log(stripPunct(sourceStr, englishPunct));

Other Languages

Python -> https://github.com/Rairye/mnl-punct-norm