wikitext2plaintext

v0.1.0

A simple, brute-force, non-exhaustive method for converting wikimedia markdown/wikitext to plaintext

Wikitext2Plaintext

JavaScript module used to convert MediaWiki markdown/wikitext to plain text. This module strips out the markdown by applying regular expressions repeatedly, and is therefore not intended for use in high-performance applications.

The module was developed for scenarios where the wikitext only needs to be converted to plaintext on a best-effort basis and perfect results are not required. It was designed for wikitext/markdown from MediaWiki, and specifically for Wikipedia database dumps.

Note that this library does NOT convert the HTML version of a wiki page; it only converts the wikitext/markdown version.
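To make the "repeated regular expressions" approach concrete, here is a minimal, self-contained sketch of the general technique. It is not the module's source: the rule names, the patterns, and the `stripWithRules` helper are all assumptions made up for this illustration.

```javascript
// Illustrative only: these rules and patterns are assumptions for this
// sketch, not the regular expressions wikitext2plaintext actually uses.
const SKETCH_RULES = [
  // '''bold''' -> bold
  { name: 'BOLD', pattern: /'''(.+?)'''/g, replacement: '$1' },
  // [[Target|label]] -> label
  { name: 'LINK_WITH_LABEL', pattern: /\[\[[^\[\]|]+\|([^\[\]]+)\]\]/g, replacement: '$1' },
  // [[Target]] -> Target
  { name: 'LINK_PLAIN', pattern: /\[\[([^\[\]|]+)\]\]/g, replacement: '$1' },
  // Collapse runs of blank lines left behind by earlier rules
  { name: 'BLANK_LINES', pattern: /(\r?\n){3,}/g, replacement: '\n\n' },
];

function stripWithRules(text, rules) {
  // Apply each rule in order; every rule rescans the whole string,
  // which is simple but costs one full pass per rule.
  return rules.reduce(
    (out, rule) => out.replace(rule.pattern, rule.replacement),
    text
  );
}

console.log(
  stripWithRules(
    "'''Ada''' worked with [[Charles Babbage|Babbage]] in [[London]].",
    SKETCH_RULES
  )
);
// Ada worked with Babbage in London.
```

Because every rule is a full pass over the input, the cost grows with the number of rules, which is consistent with the performance caveat above.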

Install

npm install wikitext2plaintext

Usage

To remove wikitext from a string using the default options:

const WT2PT = require('wikitext2plaintext');

var wt = new WT2PT();

wt.parse('## The Title ##\r\n*List item 1\r\n*List item 2\r\n');

/*
The Title
- List item 1
- List item 2
*/

If you do not want certain wikitext removed, or a specific rule is causing problems for your use case, you can exclude individual rules or entire rule groups. The rules and rule groups are listed at the bottom of this page.

API

Constructor

You must create an instance of the parser prior to using the functions.

const wikitext2plaintext = require('wikitext2plaintext');

var wt2pt = new wikitext2plaintext();

wt2pt.parse(wiki_text)

  • wiki_text (string) - Contains the wiki/markdown text to convert to plain text
  • Return value (string) - Contains the plain text version of the wiki text which was passed in

This is the main function used to convert wiki text to plain text.

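const wikitext2plaintext = require('wikitext2plaintext');

var wt2pt = new wikitext2plaintext();
var plaintext;

// Call parse on the instance created above
plaintext = wt2pt.parse('## The Title ##\r\n*List item 1\r\n*List item 2\r\n');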

console.log(plaintext);
/*
The Title
- List item 1
- List item 2
*/

wt2pt.exclude_rule_group(rule_group_name)

Causes the specified rule group (see table below) to be excluded. All rules within that rule group will NOT be applied when the parse function is called.

  • rule_group_name (string) - The name of the rule group to exclude during parsing.
  • Return value (n/a) - No value returned from this function

wt2pt.include_rule_group(rule_group_name)

Causes the specified rule group (see table below) to be included. All rules within that rule group will be applied when the parse function is called.

  • rule_group_name (string) - The name of the rule group to include during parsing.
  • Return value (n/a) - No value returned from this function

wt2pt.exclude_rule(rule_name)

Causes the specified rule (see table below) to be excluded. The specified rule will NOT be applied when the parse function is called.

  • rule_name (string) - The name of the rule to exclude during parsing.
  • Return value (n/a) - No value returned from this function

wt2pt.include_rule(rule_name)

Causes the specified rule (see table below) to be included. The specified rule will be applied when the parse function is called.

  • rule_name (string) - The name of the rule to include during parsing.
  • Return value (n/a) - No value returned from this function

wt2pt.repeat_rule_group(rule_group_name, repeat_count)

Calling this function causes the rule group (rule_group_name) to be applied multiple times (repeat_count). This is done in order to handle nested markdown.

  • rule_group_name (string) - The name of the rule group to repeat.
  • repeat_count (number) - The number of times the rule group should be repeated (between 1 and 1000)
  • Return value (n/a) - No value returned from this function
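To see why repeating a rule group helps with nesting, consider a pattern that can only match an innermost template. The sketch below is an assumption-laden illustration and not the module's code: the `INNERMOST_TEMPLATE` pattern and the `applyRepeatedly` helper are invented for this example.

```javascript
// Hypothetical rule: remove an innermost {{...}} template, i.e. one
// whose body contains no further curly braces.
const INNERMOST_TEMPLATE = /\{\{[^{}]*\}\}/g;

function applyRepeatedly(text, pattern, repeatCount) {
  let out = text;
  for (let i = 0; i < repeatCount; i += 1) {
    const next = out.replace(pattern, '');
    if (next === out) break; // nothing left to strip, stop early
    out = next;
  }
  return out;
}

const nested = 'Born in London{{cite|{{year|1815}}}}.';

// One pass removes only the inner template and leaves the outer shell.
console.log(applyRepeatedly(nested, INNERMOST_TEMPLATE, 1));
// Born in London{{cite|}}.

// A second pass removes the outer template, which is now innermost.
console.log(applyRepeatedly(nested, INNERMOST_TEMPLATE, 2));
// Born in London.
```

Each additional pass peels off one level of nesting, so in this sketch the repeat count needs to be at least the deepest nesting level you expect in the input.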

Rules & Rule Groups

All rules in bold are run by default.

|Rule Name|Rule Group|Description|
|-----|-----|-----|
|BOLD_TAGS|N/A|Removes any bold tags (leaves text)|
|HEADER_TAGS|N/A|Removes any header tags (leaves text)|
|WIKI_TABLES_REMOVE|WIKI_TABLES|Removes wiki tables entirely (including removal of text)|
|FILE_LINKS|LINKS|Removes media/file references and replaces with the alt description|
|LOCAL_LINKS_ALT|LINKS|Replaces local wiki links with their alt link text|
|LOCAL_LINKS|LINK|Replaces local links with their name (when no alt text exists)|
|EXTERNAL_LINKS_ALT|LINKS|Replaces external links with their alt text|
|EXTERNAL_LINKS_REMOVE|LINKS|Removes external links which have no alt text|
|EXTERNAL_LINKS_KEEP_URL|LINKS|Replaces external links which have no alt text with the URL|
|CATEGORIES_FORMAT|N/A|Replaces a reference to a category with "Category - "|
|CATEGORIES_REMOVE|N/A|Removes any category references|
|LIST_DEPTH_6|LISTS|Prefix depth 6 list elements with 6 dashes in place of markdown|
|LIST_DEPTH_5|LISTS|Prefix depth 5 list elements with 5 dashes in place of markdown|
|LIST_DEPTH_4|LISTS|Prefix depth 4 list elements with 4 dashes in place of markdown|
|LIST_DEPTH_3|LISTS|Prefix depth 3 list elements with 3 dashes in place of markdown|
|LIST_DEPTH_2|LISTS|Prefix depth 2 list elements with 2 dashes in place of markdown|
|LIST_DEPTH_1|LISTS|Prefix depth 1 list elements with 1 dash in place of markdown|
|HTML_REF_TAGS|HTML_TAGS|Removes HTML "ref" tags|
|HTML_COMMENT_TAGS|HTML_TAGS|Removes HTML "comment" tags|
|HTML_MATH_TAGS|HTML_TAGS|Removes HTML "math" tags|
|HTML_SUB_TAGS|HTML_TAGS|Removes HTML "sub" tags|
|HTML_SUP_TAGS|HTML_TAGS|Removes HTML "sup" tags|
|HTML_BLOCKQUOTE_TAGS|HTML_TAGS|Removes HTML "blockquote" tags|
|CITE_TITLE|DBL_CURLY_TAGS|Replaces Wikipedia "cite" templates with the title of the cite|
|CITATION_TITLE_1|DBL_CURLY_TAGS|Replaces Wikipedia "citation" templates with the title and publisher|
|CITATION_TITLE_2|DBL_CURLY_TAGS|Removes Wikipedia "citation" templates with the title and publisher (reverse)|
|ISBN_FORMAT|DBL_CURLY_TAGS|Replaces ISBN templates with the ISBN number|
|IMDB_STATIC|DBL_CURLY_TAGS|Replaces IMDB templates with static text: "IMDB Reference"|
|DMOZ_FORMAT|DBL_CURLY_TAGS|Replaces DMOZ templates with the name of the DMOZ reference|
|OFFICIAL_WEB_STATIC|DBL_CURLY_TAGS|Replaces official website links with static text: "Official Website"|
|CITE_REMOVE|DBL_CURLY_TAGS|Removes all cite templates|
|CURLY_OTHER|DBL_CURLY_TAGS|Removes all templates/content enclosed in double curly brackets|
|REPEATED_BLANK_LINES_REMOVE|N/A|Removes repeated blank lines which get created when removing markdown|