@datagrok/nlp

v1.1.0

Natural Language Processing Package

NLP

NLP is a Datagrok package for natural language processing. The package provides integration with AWS Translate, a neural machine translation service, and extends Datagrok with info panels for text files.

Natural Language Processing, or NLP for short, is a branch of artificial intelligence that builds a bridge between computers and human languages. This field has many applications, including:

Text extraction

It all starts with extracting text, the building block for other, more complex tasks. Because demand is high, it is essential to support as many popular text file formats as possible. The platform comes with a built-in file browser for easy file management. The package extends it by processing text from PDF, DOC, DOCX, ODT, and other text formats.

Extract text from PDF

Language identification

Determining the language of a document is an important preprocessing step for many language-related tasks. Automatic language detection may be part of applications that perform machine translation or semantic analysis. Datagrok's language identification is powered by Google's Compact Language Detector v3 (CLD3) and supports over 100 languages. As with text extraction, this functionality is used in the Translation info panel.

Neural machine translation

The package creates a new info panel for text files. It uses the AWS Translate service, which supports over 70 languages.

To translate a text, navigate to the file browser and select one of the demo files (see the texts folder). Alternatively, open your personal folder and drag and drop your file onto the platform. Now, whenever you click the file, the context panel on the right suggests translating it.

Translate text files

The source language is identified automatically, but you can always change it manually. The default target language is English, so be sure to choose another option if the original text is in English.
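As a rough illustration of the request behind such a panel, here is a minimal Python sketch of a call to AWS Translate. It is not the package's own code (the package is written in JavaScript). The helper name `translate_text` and its defaults are assumptions made for this example. The client is passed in so the function can be tried without credentials. With a real `boto3` client, `"auto"` asks the service to detect the source language, and the target defaults to English as described above.

```python
def translate_text(client, text, target_language="en", source_language="auto"):
    """Translate `text` and return (source_language_code, translated_text).

    `client` is expected to behave like boto3.client("translate").
    This helper is hypothetical and only mirrors the panel's behaviour.
    """
    if not text.strip():
        raise ValueError("nothing to translate")
    response = client.translate_text(
        Text=text,
        SourceLanguageCode=source_language,  # "auto" lets the service detect it
        TargetLanguageCode=target_language,  # English by default, as in the panel
    )
    return response["SourceLanguageCode"], response["TranslatedText"]


# Usage (needs the boto3 package and configured AWS credentials):
#   import boto3
#   client = boto3.client("translate")
#   detected, english = translate_text(client, "Bonjour le monde")
```

If the language has already been identified by another tool, pass its code as `source_language` instead of relying on `"auto"`.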

Text statistics

Texts are increasingly analyzed for readability. Readability scores take various parameters into account, such as the average number of words per sentence, the average number of syllables per word, and the percentage of long words.

The Text Statistics info panel calculates two common formulas:

Calculate text statistics
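The parameters mentioned above can be computed with a few lines of Python. The sketch below is an illustration only. The formulas the panel uses are not named in this text, so the Flesch reading-ease score appears here as one widely used example and may not be one of them. The syllable counter is a crude vowel-group heuristic, and the "long word" threshold of more than six letters is an assumption.

```python
import re


def count_syllables(word):
    """Approximate syllables as groups of consecutive vowels (at least one)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_parameters(text):
    """Return the basic parameters that readability scores are built from."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    long_words = sum(1 for w in words if len(w) > 6)  # assumed threshold
    return {
        "words_per_sentence": len(words) / len(sentences),
        "syllables_per_word": syllables / len(words),
        "long_word_percent": 100.0 * long_words / len(words),
    }


def flesch_reading_ease(text):
    """Flesch reading-ease score: higher values mean easier text."""
    p = text_parameters(text)
    return 206.835 - 1.015 * p["words_per_sentence"] - 84.6 * p["syllables_per_word"]
```

Short sentences made of one-syllable words score high, while long sentences with many polysyllabic words score low.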

Search

The package provides tools for finding similar texts.

Open a table and select a cell in a text column. If the column's quality is not already set, set it to Text in the column properties:

  • Right-click the column and select Column Properties... to open a dialog.
  • Press + in Tags and add the quality tag with the value Text. The column tooltip now shows quality: Text.

Add quality text

Select any cell in the column and expand Similar in the Context Panel. The panel lists the column's entries that are similar to the selected text. Search results are separated by a line, and common words are shown in bold:

Similar panel

Explore the obtained search results in the Similar panel:

  • Click to navigate directly to the grid cell containing the text of interest
  • Right-click to add a word to filters

Navigate and filters
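The similarity measure behind the Similar panel is not stated here. As a hedged sketch of the general idea, the Python example below ranks candidate texts by Jaccard similarity over their word sets and reports the shared words (the ones a UI could show in bold). The function names and the choice of Jaccard similarity are assumptions for illustration, not the package's algorithm.

```python
import re


def tokenize(text):
    """Lower-case the text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def rank_similar(query, candidates, top_n=3):
    """Return up to `top_n` tuples (score, candidate, common_words), best first.

    The score is the Jaccard similarity of the two word sets:
    |intersection| / |union|.
    """
    query_words = tokenize(query)
    results = []
    for candidate in candidates:
        words = tokenize(candidate)
        union = query_words | words
        common = query_words & words
        score = len(common) / len(union) if union else 0.0
        results.append((score, candidate, sorted(common)))
    results.sort(key=lambda item: item[0], reverse=True)
    return results[:top_n]


# Usage:
#   rank_similar("the quick brown fox",
#                ["the quick red fox", "a lazy dog", "the quick brown fox jumps"])
```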

Videos

User Meeting 9: Natural Language Processing

Developer notes

The package demonstrates two ways of developing info panels for Datagrok: with panel scripts and with JavaScript panel functions.

To write a panel script in any of the languages supported by the platform, you should indicate the panel tag and specify conditions for the panel to be shown (in the condition header parameter):

# name: language detection
# language: python
# input: file file {semtype: text} [a text to analyze]
# output: string language {semtype: lang} [detected language]
# tags: nlp, panel
# condition: file.isfile && file.size < 1e6 && supportedext(file.name)

The scripts folder contains more examples of such panel scripts, which are written in Python and work specifically on text files.
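To make the condition in the header above more concrete, here is a plain-Python mirror of its logic: the panel applies only to regular files smaller than 1e6 bytes with a supported extension. On the platform, the condition is evaluated by Datagrok itself. The names `supported_ext` and `panel_applies` are hypothetical, and the extension set contains only the formats this README names explicitly, so the real list is probably longer.

```python
import os

# Only the formats named in this README; the full list is an unknown.
SUPPORTED_EXTENSIONS = {".pdf", ".doc", ".docx", ".odt"}
MAX_SIZE_BYTES = 1e6  # same threshold as in the condition header


def supported_ext(filename):
    """Check the file extension, ignoring case."""
    return os.path.splitext(filename)[1].lower() in SUPPORTED_EXTENSIONS


def panel_applies(path):
    """Mirror of: file.isfile && file.size < 1e6 && supportedext(file.name)."""
    return (
        os.path.isfile(path)
        and os.path.getsize(path) < MAX_SIZE_BYTES
        and supported_ext(os.path.basename(path))
    )
```

The cheap checks come first, so the size and extension are inspected only for paths that actually point to a file.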

A different approach is used to add an info panel from a JavaScript file. The panel function should be properly annotated to return a widget. A simplified example is shown below:

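// DG and ui come from the Datagrok JavaScript API
import * as DG from 'datagrok-api/dg';
import * as ui from 'datagrok-api/ui';

//name: Translation
//tags: panel, widgets
//input: file textfile
//output: widget result
//condition: isTextFile(textfile)
export function translationPanel(textfile) {
    // The returned widget is rendered as the content of the info panel
    return new DG.Widget(ui.divText("Lost in Translation"));
}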

Refer to src/package.js to see the panel's complete code.
