
@nlptools/nlptools

Main NLPTools package - Complete suite of NLP algorithms and utilities

This is the main NLPTools package (@nlptools/nlptools), which re-exports all algorithms and utilities from the entire toolkit. It provides a single entry point to all of the string distance and similarity algorithms, text splitting utilities, and tokenization utilities.

Features

  • 🎯 All-in-One: Complete access to all NLPTools algorithms
  • 📦 Convenient: Single import for all functionality
  • ✂️ Text Splitting: Document chunking and text processing utilities
  • 🪙 Tokenization: Fast text encoding and decoding for LLM models
  • 📏 Distance & Similarity: Comprehensive string comparison algorithms
  • 🚀 Performance Optimized: Automatically uses the fastest implementations available
  • 📝 TypeScript First: Full type safety with comprehensive API
  • 🔧 Easy to Use: Consistent API across all algorithms

Installation

# Install with npm
npm install @nlptools/nlptools

# Install with yarn
yarn add @nlptools/nlptools

# Install with pnpm
pnpm add @nlptools/nlptools

Usage

Basic Setup

import * as nlptools from "@nlptools/nlptools";

// All algorithms are available as named functions
console.log(nlptools.levenshtein("kitten", "sitting")); // 3
console.log(nlptools.jaro("hello", "hallo")); // 0.8666666666666667
console.log(nlptools.cosine("abc", "bcd")); // 0.6666666666666666

Distance vs Similarity

Most algorithms come in both a raw distance version and a normalized similarity version:

// Distance algorithms (lower is more similar)
const distance = nlptools.levenshtein("cat", "bat"); // 1

// Similarity algorithms (higher is more similar, 0-1 range)
const similarity = nlptools.levenshtein_normalized("cat", "bat"); // 0.6666666666666666

Text Splitting

This package includes text splitters from @nlptools/splitter:

import { RecursiveCharacterTextSplitter } from "@nlptools/nlptools";

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});

const text = "Your long document text here...";
const chunks = await splitter.splitText(text);
console.log(chunks);

Tokenization

This package includes tokenization utilities from @nlptools/tokenizer:

import { Tokenizer } from "@nlptools/nlptools";

// Load tokenizer from HuggingFace Hub
const modelId = "HuggingFaceTB/SmolLM3-3B";
const tokenizerJson = await fetch(
  `https://huggingface.co/${modelId}/resolve/main/tokenizer.json`,
).then((res) => res.json());
const tokenizerConfig = await fetch(
  `https://huggingface.co/${modelId}/resolve/main/tokenizer_config.json`,
).then((res) => res.json());

const tokenizer = new Tokenizer(tokenizerJson, tokenizerConfig);

// Encode text
const encoded = tokenizer.encode("Hello World");
console.log(encoded.ids); // [9906, 4435]
console.log(encoded.tokens); // ['Hello', 'ĠWorld']

// Get token count
const tokenCount = tokenizer.encode("This is a sentence.").ids.length;
console.log(`Token count: ${tokenCount}`);

Available Algorithm Categories

This package includes all algorithms from @nlptools/distance, @nlptools/splitter, and @nlptools/tokenizer:

Edit Distance Algorithms

  • levenshtein - Classic edit distance
  • fastest_levenshtein - High-performance Levenshtein distance
  • damerau_levenshtein - Edit distance with transpositions
  • myers_levenshtein - Myers bit-parallel algorithm
  • jaro - Jaro similarity
  • jarowinkler - Jaro-Winkler similarity
  • hamming - Hamming distance for equal-length strings
  • sift4_simple - SIFT4 algorithm
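
The edit-distance functions follow the same two-string call signature shown in Basic Setup. A minimal sketch (the numeric outputs assume the conventional definitions of each algorithm):

// Plain Levenshtein counts insertions, deletions, and substitutions
console.log(nlptools.levenshtein("ca", "ac")); // 2 (two substitutions)

// Damerau-Levenshtein also counts transpositions, so the swap is one edit
console.log(nlptools.damerau_levenshtein("ca", "ac")); // 1

// Hamming distance requires equal-length strings
console.log(nlptools.hamming("karolin", "kathrin")); // 3

// Jaro-Winkler boosts pairs that share a common prefix
console.log(nlptools.jarowinkler("martha", "marhta")); // ~0.96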

Sequence-based Algorithms

  • lcs_seq - Longest common subsequence
  • lcs_str - Longest common substring
  • ratcliff_obershelp - Gestalt pattern matching
  • smith_waterman - Local sequence alignment
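
A sketch assuming the same two-string signature as the algorithms in Basic Setup (whether each returns a raw length/score or a normalized value is not documented here):

// Longest common subsequence vs. longest common substring of the same pair
console.log(nlptools.lcs_seq("ABCBDAB", "BDCABA"));
console.log(nlptools.lcs_str("ABCBDAB", "BDCABA"));

// Gestalt pattern matching (the approach behind Python's difflib)
console.log(nlptools.ratcliff_obershelp("pattern", "patten"));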

Token-based Algorithms

  • jaccard - Jaccard similarity
  • cosine - Cosine similarity
  • sorensen - Sørensen-Dice coefficient
  • tversky - Tversky index
  • overlap - Overlap coefficient
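
A sketch assuming these accept two strings directly, like the cosine example in Basic Setup (tversky typically also takes weighting parameters, so it is omitted here):

console.log(nlptools.jaccard("night", "nacht"));  // intersection over union of the character sets
console.log(nlptools.sorensen("night", "nacht")); // Sørensen-Dice coefficient
console.log(nlptools.overlap("night", "nacht"));  // intersection over the smaller set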

Bigram Algorithms

  • jaccard_bigram - Jaccard similarity on character bigrams
  • cosine_bigram - Cosine similarity on character bigrams
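
Same idea, but computed over character bigrams rather than single characters (again assuming the two-string signature):

console.log(nlptools.jaccard_bigram("night", "nacht"));
console.log(nlptools.cosine_bigram("night", "nacht"));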

Naive Algorithms

  • prefix - Prefix similarity
  • suffix - Suffix similarity
  • length - Length-based similarity
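
These are cheap baselines; a sketch assuming the same two-string signature:

console.log(nlptools.prefix("prefix", "preface"));      // based on shared leading characters
console.log(nlptools.suffix("walking", "running"));     // based on shared trailing characters
console.log(nlptools.length("short", "a bit longer"));  // based on the length difference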

Text Splitters

  • RecursiveCharacterTextSplitter - Splits text recursively using different separators
  • CharacterTextSplitter - Splits text by character count
  • MarkdownTextSplitter - Specialized splitter for Markdown documents
  • TokenTextSplitter - Splits text by token count
  • LatexTextSplitter - Specialized splitter for LaTeX documents
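
For example, the Markdown splitter can presumably be configured like the RecursiveCharacterTextSplitter shown in Usage (the chunkSize/chunkOverlap options here are an assumption carried over from that example):

import { MarkdownTextSplitter } from "@nlptools/nlptools";

const mdSplitter = new MarkdownTextSplitter({
  chunkSize: 500,   // assumed option, mirroring the RecursiveCharacterTextSplitter example
  chunkOverlap: 50, // assumed option
});

const mdChunks = await mdSplitter.splitText("# Title\n\nSome markdown content...");
console.log(mdChunks.length);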

Tokenization Utilities

  • Tokenizer - Main tokenizer class for encoding and decoding text
  • encode() - Convert text to token IDs and tokens
  • decode() - Convert token IDs back to text
  • tokenize() - Split text into token strings
  • AddedToken - Custom token configuration class
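
Building on the tokenizer instance created in the Tokenization section above, decoding and tokenizing would look like this (a sketch; exact output depends on the model's vocabulary):

// Round-trip: encode to token IDs, then decode back to text
const ids = tokenizer.encode("Hello World").ids;
console.log(tokenizer.decode(ids)); // "Hello World" (modulo tokenizer normalization)

// tokenize() returns the token strings without the IDs
console.log(tokenizer.tokenize("Hello World"));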

Universal Compare Function

You can also pick an algorithm by name at runtime and run it on a pair of strings with compare:

const result = nlptools.compare("hello", "hallo", "jaro");
console.log(result); // 0.8666666666666667

Performance

The package automatically selects the fastest implementation available:

  • WebAssembly algorithms: 10-100x faster than pure JavaScript
  • High-performance implementations: Including fastest-levenshtein for optimal speed
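
A rough way to see this on your own data is to time the two Levenshtein entry points against each other (a sketch; the speedup you observe will depend on your inputs and runtime):

const a = "supercalifragilisticexpialidocious";
const b = "supercalifragilisticexpialidocous";

console.time("levenshtein");
for (let i = 0; i < 10_000; i++) nlptools.levenshtein(a, b);
console.timeEnd("levenshtein");

console.time("fastest_levenshtein");
for (let i = 0; i < 10_000; i++) nlptools.fastest_levenshtein(a, b);
console.timeEnd("fastest_levenshtein");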

License