@urschrei/ckmeans
v1.0.5
A Rust implementation of Wang and Song's Ckmeans clustering algorithm
Ckmeans
Ckmeans clustering is an improvement on 1-dimensional (univariate) heuristic-based clustering approaches such as Jenks. The algorithm was developed by Haizhou Wang and Mingzhou Song (2011) as a dynamic programming approach to the problem of clustering numeric data into groups with the least within-group sum-of-squared-deviations.
Minimizing the difference within groups – what Wang & Song refer to as withinss, or within sum-of-squares – means that groups are optimally homogeneous and the data is split into representative groups. This is very useful for visualization, where one may wish to represent a continuous variable in discrete colour or style groups. This function can provide groups that emphasize differences between data.
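To make the withinss quantity concrete, here is a minimal sketch that computes it for two candidate groupings of the same values. The function names withinss and totalWithinss are illustrative helpers, not part of this package's API.

```javascript
// Sum of squared deviations of one group from its own mean.
function withinss(group) {
  const mean = group.reduce((sum, x) => sum + x, 0) / group.length;
  return group.reduce((sum, x) => sum + (x - mean) ** 2, 0);
}

// Total within-cluster sum of squares for a whole partition.
function totalWithinss(clusters) {
  return clusters.reduce((sum, group) => sum + withinss(group), 0);
}

// Two ways of splitting the same six values into two groups.
const tight = [[1, 2, 3], [10, 11, 12]];
const loose = [[1, 2, 3, 10], [11, 12]];

console.log(totalWithinss(tight)); // 4
console.log(totalWithinss(loose)); // 50.5
```

The first split has the lower total, so it is the more homogeneous grouping; Ckmeans finds the split that minimizes this total for a given number of clusters.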
Because it is a dynamic programming approach, the algorithm relies on two matrices: one stores incrementally computed sums of squared deviations, and the other stores the backtracking indexes used to recover the clusters.
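The two-matrix idea can be sketched in plain JavaScript as follows. This is a simple O(k·n²) illustration of the dynamic programme, not the package's Rust implementation (which may use a faster fill strategy); ckmeansSketch and all internal names are hypothetical.

```javascript
// Illustrative dynamic programme for optimal 1-D clustering.
function ckmeansSketch(data, k) {
  const x = Array.from(data).sort((a, b) => a - b);
  const n = x.length;
  if (k < 1 || k > n) throw new RangeError("k must be between 1 and data.length");

  // Prefix sums give the squared deviation of any sorted slice in O(1).
  const sum = new Float64Array(n + 1);
  const sumSq = new Float64Array(n + 1);
  for (let i = 0; i < n; i++) {
    sum[i + 1] = sum[i] + x[i];
    sumSq[i + 1] = sumSq[i] + x[i] * x[i];
  }
  // Squared deviation of x[j..i] (inclusive) from its own mean.
  const ssq = (j, i) => {
    const s = sum[i + 1] - sum[j];
    return Math.max(0, sumSq[i + 1] - sumSq[j] - (s * s) / (i - j + 1));
  };

  // cost[m][i]: minimal withinss for x[0..i] split into m + 1 clusters.
  // start[m][i]: index at which the last of those clusters begins.
  const cost = Array.from({ length: k }, () => new Array(n).fill(Infinity));
  const start = Array.from({ length: k }, () => new Array(n).fill(0));

  for (let i = 0; i < n; i++) cost[0][i] = ssq(0, i);
  for (let m = 1; m < k; m++) {
    for (let i = m; i < n; i++) {
      for (let j = m; j <= i; j++) {
        const c = cost[m - 1][j - 1] + ssq(j, i);
        if (c < cost[m][i]) {
          cost[m][i] = c;
          start[m][i] = j;
        }
      }
    }
  }

  // Backtrack from the last element to recover the cluster boundaries.
  const clusters = [];
  let end = n - 1;
  for (let m = k - 1; m >= 0; m--) {
    const begin = start[m][end];
    clusters.unshift(x.slice(begin, end + 1));
    end = begin - 1;
  }
  return clusters;
}

console.log(ckmeansSketch([10, 1, 12, 2, 11, 3], 2)); // [[1, 2, 3], [10, 11, 12]]
```

Sorting first is what makes the problem tractable: in one dimension an optimal cluster is always a contiguous run of the sorted values, so only the positions of the boundaries need to be searched.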
Unlike the original implementation, this implementation does not include any code to automatically determine the optimal number of clusters: this information needs to be explicitly provided. It does, however, provide the roundbreaks method, which produces nclusters - 1 breaks for labelling.
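As a sketch of how such breaks might be used for labelling, the helpers below turn an ascending array of breaks into legend labels and assign a class index to each value. classIndex and legendLabels are hypothetical helpers, not part of this package, and the choice to place a value equal to a break in the upper class is an assumption.

```javascript
// Zero-based class index of `value` given ascending `breaks`.
// Assumption: a value equal to a break falls into the upper class.
function classIndex(value, breaks) {
  let index = 0;
  while (index < breaks.length && value >= breaks[index]) index++;
  return index;
}

// Legend labels for the breaks.length + 1 classes.
// Assumes at least one break, i.e. at least two clusters.
function legendLabels(breaks) {
  const labels = [`< ${breaks[0]}`];
  for (let i = 1; i < breaks.length; i++) {
    labels.push(`${breaks[i - 1]} to ${breaks[i]}`);
  }
  labels.push(`>= ${breaks[breaks.length - 1]}`);
  return labels;
}

const exampleBreaks = [9, 40]; // two breaks, as produced for three clusters
console.log(legendLabels(exampleBreaks)); // ["< 9", "9 to 40", ">= 40"]
console.log([7, 12, 78].map((v) => classIndex(v, exampleBreaks))); // [0, 1, 2]
```

The class index can then be used to look up a colour or style for each value in a visualization.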
How To Use
Browser as ES Module
// preliminary ritual
import _initCkmeansWasm, { ckmeans_wasm, roundbreaks_wasm } from "@urschrei/ckmeans";

const CKMEANS_WASM_VERSION = "1.0.5";
const CKMEANS_WASM_CDN_URL = `https://cdn.jsdelivr.net/npm/@urschrei/ckmeans@${CKMEANS_WASM_VERSION}/ckmeans_bg.wasm`;
let WASM_READY = false;

// Fetch and initialise the WASM binary once; later calls return immediately.
export async function initCkmeansWasm() {
    if (WASM_READY) {
        return;
    }
    await _initCkmeansWasm(CKMEANS_WASM_CDN_URL);
    console.log(`got wasm from ${CKMEANS_WASM_CDN_URL}`);
    WASM_READY = true;
}
await initCkmeansWasm();
// Now let's calculate some clusters and breaks
const data = [3.0, 12.0, 13.0, 14.0, 15.0, 16.0, 2.0, 2.0, 3.0,
    5.0, 7.0, 1.0, 2.0, 5.0, 7.0,
    1.0, 5.0, 82.0, 1.0, 1.3, 1.1, 78.0];
const nclusters = 3;

try {
    // Call the named import directly: no `wasm` namespace object is defined above
    const clusters = ckmeans_wasm(data, nclusters);
    // [
    //     [1.0, 1.0, 1.0, 1.1, 1.3, 2.0, 2.0, 2.0, 3.0, 3.0, 5.0,
    //      5.0, 5.0, 7.0, 7.0],
    //     [12., 13., 14., 15., 16.],
    //     [78., 82.]
    // ]
    console.info(clusters);
} catch (error) {
    console.error("Error:", error);
}

try {
    const breaks = roundbreaks_wasm(data, nclusters);
    // [9.0, 40.0]
    console.info(breaks);
} catch (error) {
    console.error("Error:", error);
}
Observable
ckmeans_wasm = {
    const wasm_module = await import(
        'https://unpkg.com/@urschrei/[email protected]/ckmeans.js'
    );
    await wasm_module.default();
    return wasm_module.ckmeans_wasm;
}
Perf
Clustering 100k floats into 5 clusters takes ~38 ms.
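To get a comparable number on your own machine, a rough timing harness along these lines could be used. It is only a sketch: timeClustering is a hypothetical helper, the input is uniformly random data in the range 0 to 1000, and it assumes the clustering function accepts a Float64Array. How the figure above was measured is not stated, so results may differ.

```javascript
// Time a single call of `clusterFn` on `n` pseudo-random floats.
// Returns the elapsed wall-clock time in milliseconds.
function timeClustering(clusterFn, n = 100_000, nclusters = 5) {
  const data = Float64Array.from({ length: n }, () => Math.random() * 1000);
  const t0 = performance.now();
  clusterFn(data, nclusters);
  return performance.now() - t0;
}
```

After the WASM module has been initialised, calling timeClustering(ckmeans_wasm) reports the time for one run; averaging several runs gives a steadier figure.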