kuromojist
v2.0.0
Published
Calculate some costs from kuromoji.js
Downloads
68
Readme
kuromojist
Calculate some costs ( like Mecab, %pw
, %pc
, %pC
) from kuromoji.js (kuromojin)
Attention
This package use Unexposed API
Tokenizer.getLattice
lattice.nodes_end_at
Tokenizer.viterbi_searcher.search
Example
const { analyzeCosts } = require('kuromojist')
analyzeCosts('すもももももももものうち').then( result => {
console.log(result)
})
This result
[
{ word_id: -1, surface_form: '', cost: 0, edge_cost: 0, shortest_cost: 0 },
{ word_id: 415760, surface_form: 'すもも', cost: 7546, edge_cost: -283, shortest_cost: 7263 },
{ word_id: 93220, surface_form: 'も', cost: 4669, edge_cost: -4158, shortest_cost: 7774 },
{ word_id: 1614710, surface_form: 'もも', cost: 7219, edge_cost: 17, shortest_cost: 15010 },
{ word_id: 93220, surface_form: 'も', cost: 4669, edge_cost: -4158, shortest_cost: 15521 },
{ word_id: 1614710, surface_form: 'もも', cost: 7219, edge_cost: 17, shortest_cost: 22757 },
{ word_id: 93100, surface_form: 'の', cost: 4816, edge_cost: -4442, shortest_cost: 23131 },
{ word_id: 62510, surface_form: 'うち', cost: 5796, edge_cost: -5198, shortest_cost: 23729 },
{ word_id: -1, surface_form: '', cost: 0, edge_cost: -2484, shortest_cost: 21245 }
]
Result Object
word_id
- Same as
kuromoji.tokenize
- Same as
surface_form
- Same as
kuromoji.surface_form
- Same as
cost
- Word Cost
- Mecab format:
%pw
edge_cost
- Lattice Edge Cost
- Mecab format:
%pC
shortest_cost
- Minimum connection costs (accumulated)
- Mecab format:
%pc