string-similarity-algorithm

v1.1.0

Published

3 years ago

A lib to compare similarity of two strings

Downloads

0High
0Medium
0Low

creep

string-similarity-algorithm

A set of string similarity algorithm implementations.

[TOC]

Install

npm i string-similarity-algorithm --save

Usage

import similarity from 'string-similarity-algorithm'

const x = '赵丽颖否认产子'
const y = '赵丽颖极力否认生子'

const lcsScore = similarity(x, y, 'lcs') // 0.75
const levenshteinScore = similarity(x, y, 'levenshtein') // 0.6666666666666667

API

similarity (x: string, y: string, type: Type = 'lcs', options?: SimhashOptions): SimilarityReturn

Calculate the similarity of string x and string y.

type Type = 'lcs' | 'levenshtein' | 'simhash'
type SimilarityReturn = number | SimhashSimilarityReturn

type
- lcs: use function lcs.
- levenshtein: use function levenshtein.
- simhash: use function simhashSimilarity.

lcs (x: string, y: string): number

Calculates the similarity between strings x and y using longest common subsequence.

levenshtein (x: string, y: string): number

Calculates the similarity between strings x and y using levenshtein distance(edit distance).

lcslen (x: string, y: string): number

Return the longest common subsequence length of string x and string y.

levenshteinDistance (x: string, y: string): number

Return the edit distance of string x and string y.

simhash (s: string, options: SimhashOptions = {}): number

Return simhash of string s.

interface SimhashOptions {
  hashType?: HashType, // default is hashlittle
  kshinglesN?: number  // default is 3
}

hammingDistance (x: number, y: number): number

Return hamming distance of x and y.

hammingWeight (x: number): number

Return hamming weight(number of 1 bits) of x.

simhashSimilarity (x: string, y: string, options: SimhashOptions = {}): SimhashSimilarityReturn

Calculates the similarity between strings x and y using simhash, hamming distance and hammingWeight).

interface SimhashSimilarityReturn {
  score: number, // similar score of x and y, scope: [0, 1]
  hammingDistance: number // hamming distance of x and y
}

Others

If strings x and y is short(eg x and y are doc titles), the best is lcs, worst is simhash.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

string-similarity-algorithm

Install

Usage

API

similarity (x: string, y: string, type: Type = 'lcs', options?: SimhashOptions): SimilarityReturn

lcs (x: string, y: string): number

levenshtein (x: string, y: string): number

lcslen (x: string, y: string): number

levenshteinDistance (x: string, y: string): number

simhash (s: string, options: SimhashOptions = {}): number

hammingDistance (x: number, y: number): number

hammingWeight (x: number): number

simhashSimilarity (x: string, y: string, options: SimhashOptions = {}): SimhashSimilarityReturn

Others