near-duplicates

v0.2.0

Published

15 days ago

A TypeScript npm package for finding near duplicate string pairs

Downloads

326

Vulnerabilities: 0 High, 0 Medium, 0 Low

jimexist

duplicates ukkonen string-deduplication levenshtein

near-duplicates


A TypeScript npm package that finds near-duplicate pairs in a set of strings.

Usage

npm install near-duplicates

import { findNearDuplicates } from "near-duplicates";

findNearDuplicates(["hello", "hallo", "halo"]);

// Optionally pass the maximum distance as the second argument.
// If omitted, it defaults to 12.
findNearDuplicates(["hello", "hallo", "halo"], 2);
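To make the "maximum distance" idea concrete, here is a minimal brute-force sketch of near-duplicate detection using Levenshtein distance. It is not the package's implementation (the keywords suggest it uses Ukkonen's algorithm), and the names `levenshtein` and `naiveNearDuplicates`, the inclusive threshold, and the `[i, j, distance]` result shape are all assumptions made for illustration.

```typescript
// Illustrative sketch only: hypothetical helpers, not part of near-duplicates.

/** Levenshtein distance via dynamic programming with two rolling rows. */
function levenshtein(a: string, b: string): number {
  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

/**
 * Compares every pair of strings (O(n^2) comparisons) and keeps the pairs
 * whose distance is at most maxDistance (assumed inclusive here).
 */
function naiveNearDuplicates(
  strings: string[],
  maxDistance: number,
): Array<[number, number, number]> {
  const pairs: Array<[number, number, number]> = [];
  for (let i = 0; i < strings.length; i++) {
    for (let j = i + 1; j < strings.length; j++) {
      const d = levenshtein(strings[i], strings[j]);
      if (d <= maxDistance) pairs.push([i, j, d]);
    }
  }
  return pairs;
}

console.log(naiveNearDuplicates(["hello", "hallo", "halo"], 1));
```

With a threshold of 1, "hello" and "halo" are excluded because they are two edits apart; raising the threshold to 2 would include all three pairs.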

Benchmarks

The following benchmark measures findNearDuplicates with the default maximum distance of 12. The hz column is operations per second; the timing columns are in milliseconds.

 DEV  v2.1.4 near-duplicates

 ✓ test/index.bench.ts (5) 5211ms
   ✓ findNearDuplicates (5) 5210ms
     name                                       hz     min      max     mean      p75      p99     p995     p999     rme  samples
   · small strings                      214,439.85  0.0043   0.2387   0.0047   0.0046   0.0053   0.0071   0.0592  ±0.38%   107220   fastest
   · 10 random strings (20-500 chars)    29,533.57  0.0148   0.2318   0.0339   0.0369   0.0557   0.1355   0.1764  ±0.58%    14767
   · 100 random strings (20-500 chars)    1,469.32  0.5852   1.3275   0.6806   0.6818   1.1802   1.1990   1.3275  ±1.05%      735
   · 500 random strings (20-500 chars)     98.7281  9.1328  11.3120  10.1288  10.8487  11.3120  11.3120  11.3120  ±2.03%       50
   · 2k random strings (20-500 chars)       6.2139  155.68   173.60   160.93   162.18   173.60   173.60   173.60  ±2.25%       10   slowest
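One way to read these numbers is to estimate how the mean time grows with the number of strings. The sketch below fits an exponent k in t ∝ n^k between consecutive rows, using the mean values copied from the table. The helper name `scalingExponent` is hypothetical, and since the inputs are random strings from a single run, the result is only indicative.

```typescript
// Mean times in milliseconds, copied from the benchmark table above.
const means: Array<[number, number]> = [
  [100, 0.6806],
  [500, 10.1288],
  [2000, 160.93],
];

/** Exponent k in t ∝ n^k implied by two (n, t) measurements. */
function scalingExponent(
  [n1, t1]: [number, number],
  [n2, t2]: [number, number],
): number {
  return Math.log(t2 / t1) / Math.log(n2 / n1);
}

for (let i = 1; i < means.length; i++) {
  const k = scalingExponent(means[i - 1], means[i]);
  console.log(
    `${means[i - 1][0]} -> ${means[i][0]} strings: exponent ≈ ${k.toFixed(2)}`,
  );
}
```

The exponents come out at roughly 1.7 and 2.0, which suggests the running time approaches quadratic growth in the number of strings for larger inputs.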
