virastar

v0.21.0

Cleaning-up Persian Texts!

Virastar (ویراستار)

Virastar is a Persian text cleaner.

A JavaScript port of aziz/virastar, with lots of help from ebraminio/persiantools.

see live demo

Install

npm install virastar

Usage

var Virastar = require('virastar');
var virastar = new Virastar();

virastar.cleanup("فارسي را كمی درست تر می نويسيم"); // Outputs: "فارسی را کمی درست‌تر می‌نویسیم"

Browser

<script src="lib/virastar.js"></script>
<script>
  var virastar = new Virastar();
  alert(virastar.cleanup("فارسي را كمی درست تر می نويسيم"));
</script>

Virastar([text] [,options])

text

Type: string

The Persian source string to be cleaned.

options

Type: object

Virastar("سلام 123" ,{"fix_english_numbers":false}); // Outputs: "سلام 123"

Options and Specifications

Virastar comes with a list of options to control its behavior.

normalize_eol

default: true

  • replaces windows line endings (\r\n) with unix ones (\n)
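
As a rough illustration of what this option describes, the sketch below normalizes line endings with a single regular expression. The function name `normalizeEol` is hypothetical and this is not Virastar's own source.

```javascript
// Illustrative sketch only; not Virastar's actual source.
function normalizeEol (text) {
  // Replace each Windows line ending (\r\n) with a Unix one (\n).
  return text.replace(/\r\n/g, '\n');
}

console.log(JSON.stringify(normalizeEol('one\r\ntwo\r\n'))); // "one\ntwo\n"
```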

decode_htmlentities

default: true

  • converts numeric and selected named html entities into their original characters

fix_dashes

default: true

  • replaces triple dashes with mdash
  • replaces double dashes with ndash
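
The two rules above could be sketched as follows. The helper name `fixDashes` is made up for this example, and the order of the two replacements (triple dash first) is an assumption needed to keep the rules from interfering with each other.

```javascript
// Illustrative sketch only; not Virastar's actual source.
function fixDashes (text) {
  return text
    .replace(/-{3}/g, '\u2014') // triple dash -> mdash (U+2014)
    .replace(/-{2}/g, '\u2013'); // double dash -> ndash (U+2013)
}

console.log(fixDashes('a---b') === 'a\u2014b'); // true
console.log(fixDashes('a--b') === 'a\u2013b'); // true
```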

fix_three_dots

default: true

  • removes spaces between dots
  • replaces three dots with ellipsis character
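
A minimal sketch of these two steps, assuming that only plain spaces separate the dots. `fixThreeDots` is a hypothetical helper, not part of the library's API.

```javascript
// Illustrative sketch only; not Virastar's actual source.
function fixThreeDots (text) {
  return text
    .replace(/\. +(?=\.)/g, '.') // drop spaces that sit between two dots
    .replace(/\.{3}/g, '\u2026'); // three dots -> ellipsis character (U+2026)
}

console.log(fixThreeDots('wait. . .') === 'wait\u2026'); // true
```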

normalize_ellipsis

default: true

  • replaces more than one ellipsis with one
  • replaces (space|tab|zwnj) after ellipsis with one space

normalize_dates

default: true

  • re-orders date parts with slash as delimiter

fix_english_quotes_pairs

default: true

  • replaces english quote pairs (“”) with their persian equivalent («»)
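
One way to picture the pair replacement is a regular expression that matches an opening curly quote, the quoted text, and the closing curly quote. The name `fixQuotePairs` is hypothetical, and the sketch assumes the pairs are not nested.

```javascript
// Illustrative sketch only; not Virastar's actual source.
function fixQuotePairs (text) {
  // U+201C ... U+201D  ->  U+00AB ... U+00BB
  return text.replace(/\u201C([^\u201C\u201D]*)\u201D/g, '\u00AB$1\u00BB');
}

console.log(fixQuotePairs('\u201Ctest\u201D') === '\u00ABtest\u00BB'); // true
```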

fix_english_quotes

default: true

  • replaces english quote marks with their persian equivalent

fix_hamzeh

default: true

  • replaces ه followed by (space|ZWNJ|lrm) followed by ی with هٔ
  • replaces ه followed by (space|ZWNJ|lrm|nothing) followed by ء with هٔ
  • replaces هٓ or single-character ۀ with the standard هٔ

fix_hamzeh_arabic

default: false

  • converts arabic hamzeh ة to هٔ

cleanup_rlm

default: true

  • converts Right-to-left marks followed by persian characters to zero-width non-joiners (ZWNJ)

cleanup_zwnj

default: true

  • converts all soft hyphens (&shy;) into zwnj
  • removes more than one zwnj
  • cleans zwnj after characters that don't connect to the next
  • cleans zwnj before and after numbers, english words, spaces and punctuation
  • removes unnecessary zwnj on start/end of each line

fix_arabic_numbers

default: true

  • replaces arabic numbers with their persian equivalent

fix_english_numbers

default: true

  • replaces english numbers with their persian equivalent
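
Both number options come down to mapping one digit range onto another. The sketch below shows the idea for English (0-9) and Arabic (U+0660 to U+0669) digits. `toPersianDigits` is a hypothetical helper, and it converts every digit it sees, whereas a real cleaner would likely skip digits inside URLs, English words and similar contexts.

```javascript
// Illustrative sketch only; not Virastar's actual source.
var PERSIAN_ZERO = 0x06F0; // code point of the persian digit zero

function toPersianDigits (text) {
  return text
    // english digits 0-9
    .replace(/[0-9]/g, function (d) {
      return String.fromCharCode(PERSIAN_ZERO + Number(d));
    })
    // arabic digits U+0660..U+0669
    .replace(/[\u0660-\u0669]/g, function (d) {
      return String.fromCharCode(PERSIAN_ZERO + d.charCodeAt(0) - 0x0660);
    });
}

console.log(toPersianDigits('123') === '\u06F1\u06F2\u06F3'); // true
console.log(toPersianDigits('\u0664\u0665') === '\u06F4\u06F5'); // true
```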

fix_numeral_symbols

default: true

  • replaces english percent signs with their persian equivalent (U+066A)
  • replaces dots between numbers with the decimal separator (U+066B)
  • replaces commas between numbers with the thousands separator (U+066C)
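
The three symbol rules can be sketched with one regular expression each. This example assumes the digits have already been converted to Persian digits (U+06F0 to U+06F9), which may not match the order in which the library applies its rules. `fixNumeralSymbols` is a hypothetical name.

```javascript
// Illustrative sketch only; not Virastar's actual source.
// Assumption: digits are already persian (U+06F0..U+06F9).
function fixNumeralSymbols (text) {
  return text
    .replace(/([\u06F0-\u06F9])%/g, '$1\u066A') // percent sign after a digit
    .replace(/([\u06F0-\u06F9])\.(?=[\u06F0-\u06F9])/g, '$1\u066B') // decimal separator
    .replace(/([\u06F0-\u06F9]),(?=[\u06F0-\u06F9])/g, '$1\u066C'); // thousands separator
}

var input = '\u06F1,\u06F2\u06F3\u06F4.\u06F5%';
var expected = '\u06F1\u066C\u06F2\u06F3\u06F4\u066B\u06F5\u066A';
console.log(fixNumeralSymbols(input) === expected); // true
```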

fix_misc_non_persian_chars

default: true

  • replaces arabic normal/swash kaf with its persian equivalent
  • replaces arabic/urdu/pushtu/uyghur yeh with its persian equivalent
  • replaces kurdish he with its persian equivalent
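
A character-map lookup is a natural fit for this kind of substitution. The table below is a small subset chosen for illustration, not the library's full list, and `fixNonPersianChars` is a hypothetical name.

```javascript
// Illustrative sketch only; not Virastar's actual source.
// Assumption: this table is a partial, hand-picked subset.
var REPLACEMENTS = {
  '\u0643': '\u06A9', // arabic kaf -> persian kaf
  '\u06AA': '\u06A9', // swash kaf -> persian kaf
  '\u064A': '\u06CC', // arabic yeh -> persian yeh
  '\u0649': '\u06CC', // alef maksura -> persian yeh
  '\u06D5': '\u0647' // kurdish he (ae) -> persian heh
};

function fixNonPersianChars (text) {
  return text.replace(/[\u0643\u06AA\u064A\u0649\u06D5]/g, function (ch) {
    return REPLACEMENTS[ch];
  });
}

console.log(fixNonPersianChars('\u0643\u064A') === '\u06A9\u06CC'); // true
```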

fix_punctuations

default: true

  • replaces , and ; with their persian equivalents

fix_question_mark

default: true

  • replaces question marks with their persian equivalent
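
Taken together, fix_punctuations and fix_question_mark map three ASCII marks onto their Persian counterparts. The sketch below replaces every occurrence, which is a simplification: a real cleaner would presumably need context checks so that English text is left alone. `fixPunctuation` is a hypothetical name.

```javascript
// Illustrative sketch only; not Virastar's actual source.
function fixPunctuation (text) {
  return text
    .replace(/,/g, '\u060C') // comma -> persian comma
    .replace(/;/g, '\u061B') // semicolon -> persian semicolon
    .replace(/\?/g, '\u061F'); // question mark -> persian question mark
}

console.log(fixPunctuation('a, b; c?') === 'a\u060C b\u061B c\u061F'); // true
```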

fix_perfix_spacing

default: true

  • puts zwnj between the word and the prefix:
    • mi*, nemi*, bi*
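
To make the prefix rule concrete, here is a simplified sketch that handles only the mi* and nemi* prefixes when they are followed by a single space. It ignores bi* and any exceptions the library may handle, and `fixPrefixSpacing` is a hypothetical name.

```javascript
// Illustrative sketch only; not Virastar's actual source.
var ZWNJ = '\u200C'; // zero-width non-joiner

function fixPrefixSpacing (text) {
  // (start or whitespace) + optional "ne" + "mi" + one space + a non-space
  // character: swap the space for a zwnj.
  return text.replace(/(^|\s)(\u0646?\u0645\u06CC) (?=\S)/g, '$1$2' + ZWNJ);
}

var spaced = '\u0645\u06CC \u0646\u0648\u06CC\u0633\u06CC\u0645';
var joined = '\u0645\u06CC\u200C\u0646\u0648\u06CC\u0633\u06CC\u0645';
console.log(fixPrefixSpacing(spaced) === joined); // true
```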

fix_suffix_spacing

default: true

  • puts zwnj between the word and the suffix:
    • *ha, *haye
    • *am, *at, *ash, *ei, *eid, *eem, *and, *man, *tan, *shan
    • *tar, *tari, *tarin
    • *hayee, *hayam, *hayat, *hayash, *hayetan, *hayeman, *hayeshan

fix_suffix_misc

default: true

  • replaces ه followed by ئ or ی, and then by ی, with ه‌ای

fix_spacing_for_braces_and_quotes

default: true

  • removes spaces inside, and more than one space outside, (), [], {}, “” and «»

fix_spacing_for_punctuations

default: true

  • removes space before punctuation
  • removes more than one space after punctuation, except when followed by new-lines
  • removes space after colon that separates time parts
  • removes space after dots in numbers
  • removes space before some common domain tlds
  • removes space between question and exclamation marks
  • removes space between same marks

fix_diacritics

default: true

  • cleans zwnj before diacritic characters
  • cleans more than one diacritic character
  • cleans spaces before diacritic characters

remove_diacritics

default: false

  • removes all diacritic characters
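
Stripping diacritics can be sketched as removing one Unicode range. The range used here (U+064B to U+0652, fathatan through sukun) is an assumption about which marks count as diacritics, and `removeDiacritics` is a hypothetical helper.

```javascript
// Illustrative sketch only; not Virastar's actual source.
function removeDiacritics (text) {
  // U+064B..U+0652: fathatan, dammatan, kasratan, fatha, damma, kasra,
  // shadda and sukun.
  return text.replace(/[\u064B-\u0652]/g, '');
}

var marked = '\u0633\u064E\u0644\u0627\u0645'; // "salam" with a fatha
console.log(removeDiacritics(marked) === '\u0633\u0644\u0627\u0645'); // true
```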

fix_persian_glyphs

default: true

  • converts incorrect persian glyphs to standard characters

fix_misc_spacing

default: true

  • removes space before parentheses in miscellaneous cases
  • removes space before braces containing numbers

cleanup_spacing

default: true

  • replaces more than one space with just a single one
  • cleans whitespace/zwnj between new-lines

cleanup_line_breaks

default: true

  • cleans more than two contiguous line breaks

cleanup_begin_and_end

default: true

  • removes space/tab/zwnj/nbsp from the beginning of the new-lines
  • removes spaces, tabs, zwnj, direction marks and new lines from the beginning and end of text

markdown

markdown_normalize_braces

default: true

  • removes spaces between [] and () ([text] (link) into [text](link))
  • removes space between ! and opening brace (! [alt](src) into ![alt](src))
  • removes spaces inside double (), [], {} ([[ text ]] into [[text]])
  • removes spaces between double (), [], {} ([[text] ] into [[text]])
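
The first two rules in this list could look roughly like the sketch below. It assumes plain spaces only and does not cover the double-brace rules. `normalizeMarkdownBraces` is a hypothetical name.

```javascript
// Illustrative sketch only; not Virastar's actual source.
function normalizeMarkdownBraces (text) {
  return text
    .replace(/\] +\(/g, '](') // [text] (link) -> [text](link)
    .replace(/! +\[/g, '!['); // ! [alt](src) -> ![alt](src)
}

console.log(normalizeMarkdownBraces('[text] (link)')); // [text](link)
console.log(normalizeMarkdownBraces('! [alt] (src)')); // ![alt](src)
```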

markdown_normalize_lists

default: true

  • removes extra lines between two items on a markdown list beginning with -, * or #

skip_markdown_ordered_lists_numbers_conversion

default: false

  • skips converting english numbers of ordered lists in markdown

aggressive editing

cleanup_extra_marks

default: true

  • replaces more than one exclamation mark with just one
  • replaces more than one english or persian question mark with just one
  • re-orders consecutive marks: ?! into !?
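
The three rules above can be sketched as a chain of replacements. When English and Persian question marks are mixed in one run, this sketch keeps the first one, which is an assumption and not something this README specifies. `cleanupExtraMarks` is a hypothetical name.

```javascript
// Illustrative sketch only; not Virastar's actual source.
function cleanupExtraMarks (text) {
  return text
    .replace(/!{2,}/g, '!') // many exclamation marks -> one
    .replace(/([?\u061F])[?\u061F]+/g, '$1') // many question marks -> the first one
    .replace(/([?\u061F])!/g, '!$1'); // ?! -> !?
}

console.log(cleanupExtraMarks('what??!!')); // what!?
```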

kashidas_as_parenthetic

default: true

  • replaces kashidas with ndash in parenthetic phrases

cleanup_kashidas

default: true

  • converts kashida between numbers to ndash
  • removes all kashidas between non-whitespace characters

extras

preserve_frontmatter

default: true

  • preserves frontmatter data in the text

preserve_HTML

default: true

  • preserves all html tags in the text

preserve_comments

default: true

  • preserves all html comments in the text

preserve_entities

default: true

  • preserves all html entities in the text

preserve_URIs

default: true

  • preserves all uri strings in the text

preserve_brackets

default: false

  • preserves strings inside square brackets ([])

preserve_braces

default: false

  • preserves strings inside curly braces ({})

preserve_nbsps

default: true

  • preserves all no-break space entities in the text

License

This software is licensed under the MIT License. View the license.