@syonfox/gpt-3-encoder

v1.4.0-rc5

Javascript BPE Encoder Decoder for GPT-2 / GPT-3. The "gpt-3-encoder" module provides functions for encoding and decoding text using the Byte Pair Encoding (BPE) algorithm. It can be used to process text data for input into machine learning models, or to

Downloads

400

GPT-3-Encoder

JavaScript library for encoding and decoding text using Byte Pair Encoding (BPE), as used in the GPT-2 and GPT-3 models by OpenAI. This is a fork of the original Python implementation by OpenAI, which can be found here.

This fork includes additional features such as the countTokens and tokenStats functions, as well as updated documentation.
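For readers new to BPE, the following toy illustrates only the merge step. The merge table `demoRanks`, the function name `applyMerges`, and the sample word are made up for this sketch. The real encoder works on bytes and uses OpenAI's published vocabulary and merge ranks, so treat this as a conceptual aid rather than the library's implementation.

```javascript
// Toy BPE merge loop (illustrative only; not the library's code).
// `ranks` maps a "left right" pair to its merge priority (lower merges first).
function applyMerges(word, ranks) {
    let symbols = Array.from(word); // start from single characters
    while (symbols.length > 1) {
        // Find the adjacent pair with the best (lowest) rank.
        let best = null;
        for (let i = 0; i < symbols.length - 1; i++) {
            const rank = ranks.get(symbols[i] + ' ' + symbols[i + 1]);
            if (rank !== undefined && (best === null || rank < best.rank)) {
                best = { rank, left: symbols[i], right: symbols[i + 1] };
            }
        }
        if (best === null) break; // no known pair left to merge

        // Merge every occurrence of the chosen pair, scanning left to right.
        const merged = [];
        for (let i = 0; i < symbols.length; i++) {
            if (i < symbols.length - 1 && symbols[i] === best.left && symbols[i + 1] === best.right) {
                merged.push(best.left + best.right);
                i++; // skip the right half of the merged pair
            } else {
                merged.push(symbols[i]);
            }
        }
        symbols = merged;
    }
    return symbols;
}

// Hypothetical merge table for the demo.
const demoRanks = new Map([
    ['l o', 0],
    ['lo w', 1],
    ['e r', 2],
]);

console.log(applyMerges('lower', demoRanks)); // [ 'low', 'er' ]
```

In the sketch, each resulting symbol would then be looked up in a vocabulary to produce an integer token id.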

Installation

To install with npm (if you want this fork specifically, the scoped package name is @syonfox/gpt-3-encoder):

npm install gpt-3-encoder

Usage

JSDocs

Also check out the browser demo.

Compatible with Node >= 12

To use the library in your project, import it as follows:

const GPT3Encoder = require('gpt-3-encoder');

Additional Features

In addition to the original encoding and decoding functions, this fork includes the following additional features:

countTokens(text: string): number

This function returns the number of tokens in the provided text, after encoding it using BPE.

tokenStats(text: string): object

This function returns an object containing statistics about the tokens in the provided text, after encoding it using BPE. The returned object includes the following properties:

  • count: the total number of tokens in the text.
  • unique: the number of unique tokens in the text.
  • frequencies: an object containing the frequency of each token in the text.
  • positions: an object mapping tokens to positions in the encoded string.
  • tokens: the encoded tokens, the same as the output of encode.

Compatibility

This library is compatible with both Node.js and browser environments. We have used webpack to build /dist/bundle.js, which is 1.5 MB including the data. A compiled version for both environments is included in the package.

Credits

This library was created as a fork of the original GPT-3-Encoder library by latitudegames.
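To make the tokenStats properties listed under Additional Features more concrete, here is a minimal sketch that computes the same kind of summary over a plain array of token ids. `summarizeTokens` is a hypothetical helper written for illustration, not the library's tokenStats. The choice to store positions as arrays of indices is an assumption, and the sample ids are made up.

```javascript
// Sketch of the statistics described above, computed over an array of token ids.
// `summarizeTokens` is a hypothetical helper, not the library's tokenStats.
function summarizeTokens(tokens) {
    const frequencies = {};
    const positions = {};
    tokens.forEach((token, index) => {
        frequencies[token] = (frequencies[token] || 0) + 1;
        if (!positions[token]) positions[token] = [];
        positions[token].push(index); // assumed shape: token -> list of indices
    });
    return {
        count: tokens.length,
        unique: Object.keys(frequencies).length,
        frequencies,
        positions,
        tokens,
    };
}

// Made-up token ids standing in for the output of encode().
const sample = [10, 42, 10, 7];
console.log(summarizeTokens(sample));
```

With the real package you would call tokenStats on a string directly; the sketch only shows how such counts can be derived from the token array.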

Example

See browser.html and demo.js. Note that you may need to load the library from the appropriate location in node_modules, or adjust the npm package name to match the one you installed.


// CommonJS (use '@syonfox/gpt-3-encoder' instead if that is the name you installed):
const {encode, decode, countTokens, tokenStats} = require('gpt-3-encoder')
// ES modules alternative (use one style or the other, not both):
// import {encode, decode, countTokens, tokenStats} from "gpt-3-encoder"

const str = 'This is an example sentence to try encoding out on!'
const encoded = encode(str)
console.log('Encoded this string looks like: ', encoded)

console.log('We can look at each token and what it represents')
for (let token of encoded) {
    console.log({token, string: decode([token])})
}

//example count tokens usage
if (countTokens(str) > 5) {
    console.log("String is over five tokens, inconceivable");
}

const decoded = decode(encoded)
console.log('We can decode it back into:\n', decoded)
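One possible use of token counting is keeping text under a model's token limit. The sketch below takes encode and decode as parameters so that it runs without the package installed. `truncateToTokens` and the whitespace-based stand-in tokenizer are hypothetical and exist only for this illustration; with the real package you would pass in its encode and decode functions.

```javascript
// Hypothetical helper: keep at most `limit` tokens of `text`.
// `encode` and `decode` are passed in so the sketch has no dependencies.
function truncateToTokens(text, limit, encode, decode) {
    const tokens = encode(text);
    if (tokens.length <= limit) return text;
    return decode(tokens.slice(0, limit));
}

// Stand-in tokenizer for the demo only: one token per space-separated word.
const fakeEncode = (text) => text.split(' ');
const fakeDecode = (tokens) => tokens.join(' ');

console.log(truncateToTokens('one two three four', 2, fakeEncode, fakeDecode)); // one two
```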

Developers

I have added some other examples to the examples folder. Please take a look at package.json to see the available scripts.

git clone https://github.com/syonfox/GPT-3-Encoder.git

cd GPT-3-Encoder

npm install # install dev deps (docs tests build)

npm run test # run tests
npm run docs # build docs

npm run build # builds it for the browser
npm run browser # launches demo in firefox
npm run demo # runs node.js demo


less Encoder.js # the main code is here

firefox ./docs/index.html # view docs locally

npm publish --access public # dev publish to npm


Performance

Built bpe_ranks in 100 ms

// using js loading (probably before cache)
Loaded encoder in 121 ms
Loaded bpe_ranks in 91 ms

// using fs loading
Loaded encoder in 32 ms
Loaded bpe_ranks in 44 ms

// back to js loading
Loaded encoder in 35 ms
Loaded bpe_ranks in 40 ms
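Timings like the ones above can be collected with a small helper around the code being measured. This is a sketch for Node.js only: `timed` and the demo workload are hypothetical, and the project's own measurement code may look different.

```javascript
// Hypothetical timing helper; the project's own measurement code may differ.
function timed(label, fn) {
    const start = process.hrtime.bigint(); // nanosecond-resolution clock in Node.js
    const result = fn();
    const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
    console.log(`${label} in ${elapsedMs.toFixed(0)} ms`);
    return { result, elapsedMs };
}

// Stand-in workload: build a lookup table, loosely similar in spirit to bpe_ranks.
const { result: table } = timed('Built demo table', () => {
    const m = new Map();
    for (let i = 0; i < 50000; i++) m.set('pair' + i, i);
    return m;
});
console.log(table.size);
```

Running the same measurement several times helps separate cold-start cost from cached loads, which is the distinction the numbers above hint at.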

Todo

More stats that work well with this token representation.

Clean up and keep it simple.

Here are some additional suggestions for improving the GPT-3 Encoder:

  • Add more unit tests to ensure the correctness and reliability of the code. This can be particularly important for the encode and decode functions, which are the main functions of the encoder.
  • Add more documentation and examples to help users understand how to use the encoder and integrate it into their own projects. This could include additional JSDoc comments, as well as additional documentation in the README file and/or GitHub Pages.
  • Consider adding support for other languages and character sets. Currently, the encoder only supports ASCII characters, but there may be a demand for support for other languages and character sets.
  • Explore potential optimizations and performance improvements for the encode and decode functions. Some ideas might include using faster data structures (such as a hash map or a trie), implementing more efficient algorithms, or using multi-threading or web workers to take advantage of multiple cores or processors.
  • Consider adding support for other models or use cases. For example, you could add support for other OpenAI models (such as GPT-2 or GPT-3) or for other applications of BPE encoding (such as machine translation or natural language processing).
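One of the optimization ideas above, caching results in a hash map, can be sketched as follows. `memoize` and `slowTokenize` are hypothetical names used only for illustration. This makes no claim about whether the library already caches its per-word results.

```javascript
// Cache results of a pure string -> value function in a Map.
function memoize(fn) {
    const cache = new Map();
    return (key) => {
        if (!cache.has(key)) cache.set(key, fn(key));
        return cache.get(key);
    };
}

// Stand-in for an expensive per-word tokenization step.
let calls = 0;
const slowTokenize = (word) => {
    calls++;
    return Array.from(word);
};

const fastTokenize = memoize(slowTokenize);
fastTokenize('hello');
fastTokenize('hello'); // served from the cache
console.log(calls); // 1
```

A cache like this trades memory for speed, so a long-running process may want to bound its size.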