brill-pos-tagger

v0.0.4

Part of speech tagger based on Eric Brill's algorithm

Brill's POS Tagger

Installation

npm install brill-pos-tagger

Usage

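// The relative path assumes the script runs from the root of the package's source tree.
var Tagger = require("./lib/brill_pos_tagger");

// Adjust base_folder to the location of the package on your machine.
var base_folder = "/home/hugo/workspace/brill-pos-tagger";
var rules_file = base_folder + "/data/tr_from_pos.txt";
var lexicon_file = base_folder + "/data/lexicon.json";
var default_category = 'N';

// The callback is invoked once the tagger has been set up, or with an error.
var tagger = new Tagger(lexicon_file, rules_file, default_category, function(error) {
  if (error) {
    console.log(error);
  }
  else {
    // The sentence is passed as an array of words.
    var sentence = ["I", "see", "the", "man", "with", "the", "telescope"];
    console.log(JSON.stringify(tagger.tag(sentence)));
  }
});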

Lexicon

The lexicon is either a JSON file that has the following structure:

{
  "word1": ["cat1"],
  "word2": ["cat2", "cat3"],
  ...
}

or a text file:

word1 cat1 cat2
word2 cat3
...

Words may have multiple categories in the lexicon file. The tagger uses only the first one.
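
As a rough illustration of the two lexicon formats, the sketch below converts the text format into the same shape as the JSON format and looks up the first category of a word. The helper names parseTextLexicon and initialCategory are hypothetical and not part of this package, and falling back to a default category for unknown words is an assumption based on the default_category argument shown under Usage.

```javascript
// Hypothetical helper (not part of brill-pos-tagger): parse the text lexicon
// format ("word cat1 cat2" per line) into the shape of the JSON lexicon.
function parseTextLexicon(text) {
  const lexicon = {};
  text.split(/\r?\n/).forEach(function (line) {
    const fields = line.trim().split(/\s+/);
    if (fields.length < 2) {
      return; // skip blank lines and lines without a category
    }
    lexicon[fields[0]] = fields.slice(1);
  });
  return lexicon;
}

// Hypothetical helper: return the first category listed for a word.
// Assumption: unknown words receive the default category.
function initialCategory(lexicon, word, defaultCategory) {
  if (Object.prototype.hasOwnProperty.call(lexicon, word) && lexicon[word].length > 0) {
    return lexicon[word][0];
  }
  return defaultCategory;
}

const exampleLexicon = parseTextLexicon("word1 cat1 cat2\nword2 cat3\n");
console.log(JSON.stringify(exampleLexicon));
console.log(initialCategory(exampleLexicon, "word1", "N"));   // prints "cat1"
console.log(initialCategory(exampleLexicon, "unknown", "N")); // prints "N"
```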

Specifying transformation rules

Transformation rules are specified as follows:

OLD_CAT NEW_CAT PREDICATE PARAMETER

This means that if the category of the word at the current position is OLD_CAT and the predicate is true, the category is replaced by NEW_CAT. The predicate may use the parameter in different ways. Sometimes the parameter specifies the required outcome of the predicate:

NN CD CURRENT-WORD-IS-NUMBER YES

This means that if the category of the current word is NN and the outcome of CURRENT-WORD-IS-NUMBER is YES, the category is replaced by CD. The parameter can also be used to check the category of another word in the sentence:

VBD NN PREV-TAG DT

Here the category of the previous word must be DT for the rule to be applied.
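
To make the rule format concrete, here is a minimal sketch that splits a rule line into its four parts. The function parseRule and the field names of the object it returns are hypothetical and may not match how the package reads its rules file. Parameters are kept as an array on the assumption that a rule may carry more than one.

```javascript
// Hypothetical helper (not part of brill-pos-tagger): split a rule line of the
// form "OLD_CAT NEW_CAT PREDICATE PARAMETER" into named fields.
function parseRule(line) {
  const fields = line.trim().split(/\s+/);
  if (fields.length < 4) {
    throw new Error("Malformed transformation rule: " + line);
  }
  return {
    oldCategory: fields[0],
    newCategory: fields[1],
    predicateName: fields[2],
    parameters: fields.slice(3) // one or more parameters
  };
}

console.log(JSON.stringify(parseRule("NN CD CURRENT-WORD-IS-NUMBER YES")));
console.log(JSON.stringify(parseRule("VBD NN PREV-TAG DT")));
```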

Algorithm

The tagger applies transformation rules that may change the category of words. The input sentence must be split into words, each of which is assigned a category. The tagged sentence is then processed from left to right. At each position every rule is applied once, in the order in which the rules are specified. The core of the algorithm:

function(sentence) {
  var tagged_sentence = new Array(sentence.length);

  // snip

  // Apply transformation rules
  for (var i = 0, size = sentence.length; i < size; i++) {
    this.transformation_rules.forEach(function(rule) {
      rule.apply(tagged_sentence, i);
    });
  }
  return(tagged_sentence);
}
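
The following self-contained sketch plays the loop above through on toy data. It is an illustration under stated assumptions, not the package's implementation: the names tagSentence, applyRule and toyPredicates, the { word, tag } object representation, the toy lexicon, and the behaviour of PREV-TAG at the start of a sentence are all made up for this example.

```javascript
// Assumed predicate table with a single predicate for this example.
const toyPredicates = {
  // True when the word before position i carries the given tag.
  "PREV-TAG": function (tagged, i, parameter) {
    return i > 0 && tagged[i - 1].tag === parameter;
  }
};

// Apply one rule at position i: replace oldCategory by newCategory
// when the rule's predicate holds.
function applyRule(rule, tagged, i) {
  const predicate = toyPredicates[rule.predicateName];
  if (tagged[i].tag === rule.oldCategory && predicate(tagged, i, rule.parameter)) {
    tagged[i].tag = rule.newCategory;
  }
}

function tagSentence(words, lexicon, rules, defaultCategory) {
  // Initial tagging: first category from the lexicon, else the default.
  const tagged = words.map(function (word) {
    const known = Object.prototype.hasOwnProperty.call(lexicon, word);
    return { word: word, tag: known ? lexicon[word][0] : defaultCategory };
  });
  // Left to right; at each position all rules are applied once, in order.
  for (let i = 0; i < tagged.length; i++) {
    rules.forEach(function (rule) {
      applyRule(rule, tagged, i);
    });
  }
  return tagged;
}

// Toy data, invented for this example.
const toyLexicon = { the: ["DT"], cut: ["VBD", "NN"] };
const toyRules = [
  { oldCategory: "VBD", newCategory: "NN", predicateName: "PREV-TAG", parameter: "DT" }
];

console.log(JSON.stringify(tagSentence(["the", "cut"], toyLexicon, toyRules, "N")));
// prints [{"word":"the","tag":"DT"},{"word":"cut","tag":"NN"}]
```

Because rules run in order at each position, a later rule sees the category written by an earlier rule at the same position.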

Adding a predicate

Predicates are defined in the module lib/Predicate.js. To add one, create a function in that file that serves as the predicate. A predicate accepts a tagged sentence, the current position in the sentence that is being tagged, and the parameter(s) from the rule, such as the expected outcome of the predicate. An example of a predicate that checks the category of the current word:

function current_word_is_tag(tagged_sentence, i, parameter) {
  return(tagged_sentence[i][0] === parameter);
}

Some predicates accept two parameters. The next step is to map a keyword to the predicate so that it can be used in the transformation rules. The mapping is also defined in the grammar file:

var predicates = {
  "CURRENT-WORD-IS-TAG": current_word_is_tag,
  "PREV-WORD-IS-CAP": prev_word_is_cap
}
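
As a sketch of how a predicate with a YES/NO outcome parameter and its keyword mapping might fit together, the example below gives one possible body for prev_word_is_cap and looks it up by keyword. The function body, the evaluatePredicate helper, and the { word, tag } representation of a tagged word are assumptions made for illustration. The package's own predicates and sentence representation may differ.

```javascript
// Hypothetical implementation: is the previous word capitalized?
// The parameter is the expected outcome, "YES" or "NO".
function prev_word_is_cap(tagged_sentence, i, parameter) {
  if (i === 0) {
    return false; // design choice for this sketch: no previous word, no match
  }
  const first = tagged_sentence[i - 1].word.charAt(0);
  const isCapitalized = first !== first.toLowerCase();
  return (isCapitalized ? "YES" : "NO") === parameter;
}

// Keyword-to-function mapping, in the style of the table above.
const predicateTable = {
  "PREV-WORD-IS-CAP": prev_word_is_cap
};

// Hypothetical helper: resolve a keyword from a rule and evaluate the predicate.
function evaluatePredicate(keyword, tagged_sentence, i, parameter) {
  if (!Object.prototype.hasOwnProperty.call(predicateTable, keyword)) {
    throw new Error("Unknown predicate: " + keyword);
  }
  return predicateTable[keyword](tagged_sentence, i, parameter);
}

const exampleSentence = [
  { word: "Mr", tag: "NNP" },
  { word: "smith", tag: "NN" }
];
console.log(evaluatePredicate("PREV-WORD-IS-CAP", exampleSentence, 1, "YES")); // prints true
console.log(evaluatePredicate("PREV-WORD-IS-CAP", exampleSentence, 1, "NO"));  // prints false
```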

Acknowledgements/references

  • Part of speech tagger by Percy Wegmann, https://code.google.com/p/jspos/
  • Node.js version of jspos: https://github.com/neopunisher/pos-js
  • A simple rule-based part of speech tagger, Eric Brill, in: ANLC '92, Proceedings of the Third Conference on Applied Natural Language Processing, pages 152-155, http://dl.acm.org/citation.cfm?id=974526