k-means-plus
v1.0.0
Published
K-means++ clustering a classification of data, so that points assigned to the same cluster are similar (in some sense). It is identical to the K-means algorithm, except for the selection of initial conditions.
Downloads
1
Readme
major-colors
K-means++ clustering a classification of data, so that points assigned to the same cluster are similar (in some sense). It is identical to the K-means algorithm, except for the careful selection of initial conditions. It is a way of avoiding the sometimes poor clusterings found by the standard k-means algorithm. k-means++ variance address this major theoretic shortcoming and guarantee an approximation ratio O(log k) in expectation (over the randomness of the algorithm), where k is the number of clusters used.
Install
npm i k-means-plus
yarn add k-means-plus
Usage
import KMeansCluster from 'k-means-plus';
const cluster = new KMeansCluster({ distance, maximumIterations, convergedFn });
const results = cluster.cluster(vectors, k);
import KMeansCluster from 'k-means-plus';
const data = [
{'company': 'Microsoft' , 'size': 91259, 'revenue': 60420},
{'company': 'IBM' , 'size': 400000, 'revenue': 98787},
{'company': 'Skype' , 'size': 700, 'revenue': 716},
{'company': 'SAP' , 'size': 48000, 'revenue': 11567},
{'company': 'Yahoo!' , 'size': 14000 , 'revenue': 6426 },
{'company': 'eBay' , 'size': 15000, 'revenue': 8700},
];
// Create the data 2D-array (vectors) describing the data
let vectors = [];
for (let i = 0 ; i < data.length ; i++) {
vectors.push([ data[i]['size'] , data[i]['revenue']]);
}
const k = 4;
const cluster = new KMeansCluster();
const {
model: { observations, centroids, assignments },
iterations,
durationMs
} = cluster.cluster(vectors, k);
Constructor
- distance -
(vector1: [number], vector2: [number]) => number
distance function used for clustering between two vectors. - Default: euclidean distance
- maximumIterations -
number
maximum iterations of clustering. - Default: 200
- convergedFn -
(centroids1: [[number]], centroids2: [[number]]) => boolean
determine if two consecutive set of centroids are converged. - Default: the clusters are converged if the cluster assignments are the same.
Inputs
- vectors -
[[number]]
point vectors to cluster - k - number of final clusters
Outputs
type result = {
model: {
observations: [[number]], // the original vectors to cluster
centroids: [[number]], // vectors of the centers of cluster
assignments: [number] // mapping from index of original vector to the index of cluter center it belongs to
},
iterations: number, // number of iterations ran before converging
durationMs: number // the duration of the algorithm
}