@goblinalchemist/nodejieba

v0.0.27

Chinese word segmentation for Node.js

Downloads

60

Readme


NodeJieba: the Node.js version of the "Jieba" Chinese word segmenter

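Introduction

NodeJieba is the Node.js implementation of the "Jieba" Chinese word segmenter. CppJieba provides the underlying segmentation algorithms, which makes NodeJieba a Chinese word segmentation component for Node.js that combines high performance with ease of use.

Features

  • Flexible dictionary loading: it works without configuring any dictionary paths, and the paths can be customized when you need your own dictionaries.
  • The underlying algorithms are implemented in C++ for high performance.
  • Multiple segmentation algorithms are supported; see the CppJieba README.md for a description of each.
  • Words can be added to the dictionary dynamically.

For implementation details, see the related blog posts.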

Installation

npm install nodejieba

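Because the default registry can be slow and access to GitHub can be unreliable, you can install from a mirror hosted in mainland China with the following command: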

npm install nodejieba --registry=https://registry.npmmirror.com --nodejieba_binary_host_mirror=https://registry.npmmirror.com/-/binary/nodejieba/

Usage

import { cut } from "nodejieba";

const result = cut("南京市长江大桥");
console.log(result);
//["南京市","长江大桥"]

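For more detailed usage, see the test cases.

Configuring dictionary loading

If you do not call the dictionary loading function yourself, the default dictionaries are loaded automatically the first time you call cut or another feature function.

To trigger dictionary loading explicitly, call the following function.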

import { load } from "nodejieba";

load();

The call above automatically loads all of the default dictionaries.
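The load-on-first-use behavior described above can be pictured with a small sketch. This is not NodeJieba's internal code: `cutLike`, `loaded`, and `loadCount` are hypothetical names, and the stand-in "segmentation" simply splits a string into characters. It only illustrates that an explicit `load` call and a first feature call lead to a single load.

```javascript
// Illustrative sketch only: these names are hypothetical, not nodejieba internals.
let loaded = false;
let loadCount = 0;

// Stand-in for an explicit dictionary load.
function load() {
  loaded = true;
  loadCount += 1;
}

// Stand-in for a feature function such as cut: loads on first use if needed.
function cutLike(sentence) {
  if (!loaded) {
    load();
  }
  // Placeholder behavior: split into characters instead of real segmentation.
  return Array.from(sentence);
}

cutLike("ab");
cutLike("cd");
console.log(loadCount); // 1
```

Calling `load()` up front mainly moves the loading cost to a moment you choose, such as application startup, rather than the first request.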

To load your own dictionary instead of a default one, pass parameters.

For example, to load your own user dictionary:

import { load } from "nodejieba";

load({
  userDict: "./test/testdata/userdict.utf8",
});

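All options of the dictionary loading function load are optional; any option that is omitted is filled in with its default value. The code above is therefore equivalent to the code below.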

import {
  DEFAULT_DICT,
  DEFAULT_HMM_DICT,
  DEFAULT_IDF_DICT,
  DEFAULT_STOP_WORD_DICT,
  load,
} from "nodejieba";

load({
  dict: DEFAULT_DICT,
  hmmDict: DEFAULT_HMM_DICT,
  userDict: "./test/testdata/userdict.utf8",
  idfDict: DEFAULT_IDF_DICT,
  stopWordDict: DEFAULT_STOP_WORD_DICT,
});
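The default-filling rule can be sketched as a plain object merge. This is an assumption about the general idea, not the package's actual implementation: `resolveLoadOptions` is a hypothetical helper, and the values in `DEFAULTS` are placeholders standing in for the real exported `DEFAULT_*` constants.

```javascript
// Hypothetical placeholder values; the real package exports DEFAULT_DICT and friends.
const DEFAULTS = {
  dict: "default.dict",
  hmmDict: "default.hmm",
  userDict: "default.user",
  idfDict: "default.idf",
  stopWordDict: "default.stop",
};

// Fill any missing option with its default; options passed by the caller win.
function resolveLoadOptions(options = {}) {
  return { ...DEFAULTS, ...options };
}

const resolved = resolveLoadOptions({
  userDict: "./test/testdata/userdict.utf8",
});
console.log(resolved.userDict); // ./test/testdata/userdict.utf8
console.log(resolved.dict); // default.dict
```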

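Dictionary reference

  • dict: the main dictionary, with weights and part-of-speech tags; the default is recommended.
  • hmmDict: the hidden Markov model; the default is recommended.
  • userDict: the user dictionary; customize it to suit your needs.
  • idfDict: the IDF data required for keyword extraction.
  • stopWordDict: the stop-word list required for keyword extraction.

Part-of-speech tagging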

import { tag } from "nodejieba";

console.log(tag("红掌拨清波"));
//[ { word: '红掌', tag: 'n' },
//  { word: '拨', tag: 'v' },
//  { word: '清波', tag: 'n' } ]

For more detailed usage, see the test cases.
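Because tag returns an array of { word, tag } objects, the result can be post-processed with ordinary array methods. The sketch below reuses the sample output from the tagging example as literal data, so it runs without the package installed. `wordsWithTag` is a hypothetical helper, and reading 'n' as noun and 'v' as verb is an assumption about the tag set.

```javascript
// Sample tag() result copied from the tagging example in this README.
const tagged = [
  { word: "红掌", tag: "n" },
  { word: "拨", tag: "v" },
  { word: "清波", tag: "n" },
];

// Hypothetical helper: keep only the words whose tag matches `wanted`.
function wordsWithTag(items, wanted) {
  return items.filter((item) => item.tag === wanted).map((item) => item.word);
}

console.log(wordsWithTag(tagged, "n")); // [ '红掌', '清波' ]
```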

Keyword extraction

import { extract, textRankExtract } from "nodejieba";

const topN = 4;

console.log(extract("升职加薪,当上CEO,走上人生巅峰。", topN));
//[ { word: 'CEO', weight: 11.739204307083542 },
//  { word: '升职', weight: 10.8561552143 },
//  { word: '加薪', weight: 10.642581114 },
//  { word: '巅峰', weight: 9.49395840471 } ]

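console.log(textRankExtract("升职加薪,当上CEO,走上人生巅峰。", topN));
// Note: the sample output below lists five entries and includes words that are
// not in the input, so it appears to come from a longer sentence; actual results will differ.
//[ { word: '当上', weight: 1 },
//  { word: '不用', weight: 0.9898479330698993 },
//  { word: '多久', weight: 0.9851260595435759 },
//  { word: '加薪', weight: 0.9830464899847804 },
//  { word: '升职', weight: 0.9802777682279076 } ]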

For more detailed usage, see test/demo.js.
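To show roughly how an IDF table and a stop-word list feed into keyword extraction, here is a minimal TF-IDF sketch. It is not the library's implementation: the IDF values, the stop words, the fallback IDF, and the pre-segmented token list are all made up for illustration, and `extractKeywords` is a hypothetical function.

```javascript
// Hypothetical IDF table and stop-word set; real data comes from idfDict and stopWordDict.
const idf = new Map([
  ["升职", 10],
  ["加薪", 9],
  ["巅峰", 8],
]);
const stopWords = new Set(["当上", "走上"]);
const FALLBACK_IDF = 5; // assumed value for words missing from the table

// Score each non-stop-word token as (term frequency * IDF) and keep the top N.
function extractKeywords(tokens, topN) {
  const tf = new Map();
  for (const token of tokens) {
    if (stopWords.has(token)) continue;
    tf.set(token, (tf.get(token) || 0) + 1);
  }
  return [...tf.entries()]
    .map(([word, count]) => ({
      word,
      weight: count * (idf.get(word) ?? FALLBACK_IDF),
    }))
    .sort((a, b) => b.weight - a.weight)
    .slice(0, topN);
}

// Assumed segmentation of the example sentence, with punctuation dropped.
const tokens = ["升职", "加薪", "当上", "CEO", "走上", "人生", "巅峰"];
const keywords = extractKeywords(tokens, 2);
console.log(keywords.map((k) => k.word)); // [ '升职', '加薪' ]
```

With real dictionaries the weights and ranking will differ; the point is only that stop words are dropped before scoring and that rarer words (higher IDF) rank higher.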

Node.js support

  • v16
  • v18
  • v20

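Applications

Related projects

Performance benchmarks

Performance is excellent: by the author's account, this is probably the fastest Chinese word segmentation library for Node.js currently available. For details, see: Jieba Chinese word segmentation series performance benchmarks

Online demo

http://cppjieba-webdemo.herokuapp.com/ (Chrome is recommended)

Reference: a series of NodeJieba installation problems on Windows and how they were overcome

Support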

Email: [email protected]

Author

Contributors

Code Contributors

This project exists thanks to all the people who contribute.

Financial Contributors

Become a financial contributor and help us sustain our community. [Contribute]

Individuals

Organizations

Support this project with your organization. Your logo will show up here with a link to your website. Contribute