string-similarity-algorithm
v1.1.0
Published
A lib to compare similarity of two strings
Downloads
12
Maintainers
Readme
string-similarity-algorithm
A set of string similarity algorithm implementations.
[TOC]
Install
npm i string-similarity-algorithm --save
Usage
import similarity from 'string-similarity-algorithm'
const x = '赵丽颖否认产子'
const y = '赵丽颖极力否认生子'
const lcsScore = similarity(x, y, 'lcs') // 0.75
const levenshteinScore = similarity(x, y, 'levenshtein') // 0.6666666666666667
API
similarity (x: string, y: string, type: Type = 'lcs', options?: SimhashOptions): SimilarityReturn
Calculate the similarity of string x
and string y
.
type Type = 'lcs' | 'levenshtein' | 'simhash'
type SimilarityReturn = number | SimhashSimilarityReturn
- type
- lcs: use function
lcs
. - levenshtein: use function
levenshtein
. - simhash: use function
simhashSimilarity
.
- lcs: use function
lcs (x: string, y: string): number
Calculates the similarity between strings x and y using longest common subsequence
.
levenshtein (x: string, y: string): number
Calculates the similarity between strings x and y using levenshtein distance
(edit distance
).
lcslen (x: string, y: string): number
Return the longest common subsequence length of string x
and string y
.
levenshteinDistance (x: string, y: string): number
Return the edit distance of string x
and string y
.
simhash (s: string, options: SimhashOptions = {}): number
Return simhash of string s
.
interface SimhashOptions {
hashType?: HashType, // default is hashlittle
kshinglesN?: number // default is 3
}
hammingDistance (x: number, y: number): number
Return hamming distance of x and y.
hammingWeight (x: number): number
Return hamming weight(number of 1 bits) of x.
simhashSimilarity (x: string, y: string, options: SimhashOptions = {}): SimhashSimilarityReturn
Calculates the similarity between strings x and y using simhash
, hamming distance
and hammingWeight
).
interface SimhashSimilarityReturn {
score: number, // similar score of x and y, scope: [0, 1]
hammingDistance: number // hamming distance of x and y
}
Others
If strings x and y is short(eg x and y are doc titles), the best is lcs
, worst is simhash
.