react-native-tokenizers
v0.1.9
React Native library for Hugging Face tokenizers.
A React Native Turbo Module library for using the Hugging Face tokenizers library.
Install
react-native-tokenizers only supports React Native 0.76 and above.
yarn add react-native-tokenizers
pnpm add react-native-tokenizers
npm install react-native-tokenizers
Supported tokenizers
distilgpt2: new DistilGpt2Tokenizer()
bert-base-cased: new BertBaseCased()
bert-base-uncased: new BertBaseUncased()
mosaic-bert-base-uncased: new MosaicBertBaseUncased()
You can also create a tokenizer with your own dictionary.
const tokenizer = new CustomTokenizer(dictionary);
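The format that dictionary must take is not documented here. Assuming it is a plain token-to-ID mapping, the hypothetical helper below (parseVocab is not part of react-native-tokenizers) builds one from a BERT-style vocab.txt file, where the token on line N, counting from 0, has ID N.

```typescript
// Hypothetical helper: builds a token -> id record from a BERT-style vocab.txt,
// where the token on line N (0-based) has id N.
// ASSUMPTION: CustomTokenizer's `dictionary` argument is a token -> id record;
// check the library's type definitions before relying on this shape.
function parseVocab(vocabText: string): Record<string, number> {
  const lines = vocabText.split(/\r?\n/);
  // A trailing newline yields one empty final entry; drop only that entry so
  // every real line keeps its position as its id.
  if (lines.length > 0 && lines[lines.length - 1] === "") {
    lines.pop();
  }
  const dictionary: Record<string, number> = {};
  lines.forEach((token, id) => {
    dictionary[token] = id;
  });
  return dictionary;
}
```

Under that assumption, usage would look like new CustomTokenizer(parseVocab(vocabText)), where vocabText is the file's contents loaded however your app bundles assets.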
Alternatively, download a tokenizer's dictionary from the Hugging Face Hub by model name. Loading is asynchronous, so await the result:
const tokenizer = await PreTrainedTokenizer.load('bert-large-cased');
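Because PreTrainedTokenizer.load downloads over the network, the returned promise may reject on a flaky connection. One option is to wrap the call in a small retry helper. withRetry below is a hypothetical sketch, not part of react-native-tokenizers, and its attempt count and delays are arbitrary defaults.

```typescript
// Hypothetical helper: retries an async operation (such as a Hub download)
// up to `attempts` times, waiting a little longer after each failure.
async function withRetry<T>(
  operation: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  if (attempts < 1) {
    throw new RangeError("attempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Linear backoff: baseDelayMs, then 2 * baseDelayMs, and so on.
        await new Promise<void>((resolve) => setTimeout(resolve, baseDelayMs * attempt));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

Usage: const tokenizer = await withRetry(() => PreTrainedTokenizer.load('bert-large-cased'));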
Usage
import { Token, BertBaseUncased } from 'react-native-tokenizers';
// Create a new instance of the tokenizer
const tokenizer = new BertBaseUncased();
// Tokenize a string into tokens
const tokens: Token[] = tokenizer.tokenize("Hello there");
for (const token of tokens) {
  console.log("Token ID:", token.id);
  console.log("Token string:", token.token);
  console.log("Token range:", token.start, token.end);
}
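// Sketch (hypothetical helper, not part of react-native-tokenizers): recover
// the exact slice of the input that each token covers. With BertBaseUncased the
// token strings are lowercased, while slicing the input keeps the original
// casing. ASSUMPTION: start/end are UTF-16 code-unit offsets into the input
// with an exclusive end (String.prototype.slice semantics); for ASCII input,
// byte, code-point and UTF-16 offsets coincide.
interface TokenSpan {
  id: number;
  token: string;
  start: number;
  end: number;
}
function coveredText(input: string, spans: TokenSpan[]): string[] {
  return spans.map((span) => input.slice(span.start, span.end));
}
// Usage: coveredText("Hello there", tokens)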
// Tokenize a string into token IDs
const tokenIds: number[] = tokenizer.getIds("Hello there");
console.log("Token IDs:", tokenIds);
// Get token strings for the token IDs
const tokenStrings: string[] = tokenIds
  .map(id => tokenizer.idToToken(id))
  // A type predicate narrows (string | undefined)[] to string[] without a cast
  .filter((token): token is string => token !== undefined);
console.log("Tokens from IDs:", tokenStrings);
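// Sketch (hypothetical helper, not part of react-native-tokenizers): turn
// BERT-style WordPiece strings, such as tokenStrings above, back into
// approximate text. Pieces starting with "##" continue the previous word, and
// all-caps bracketed tokens ([CLS], [SEP], [PAD], [UNK], [MASK]) are dropped.
// This is only an approximation: punctuation comes back space-separated, and it
// does not apply to distilgpt2's byte-level BPE tokens.
function joinWordPieces(pieces: string[]): string {
  const words: string[] = [];
  for (const piece of pieces) {
    if (/^\[[A-Z]+\]$/.test(piece)) continue; // skip special tokens
    if (piece.startsWith("##") && words.length > 0) {
      words[words.length - 1] += piece.slice(2); // glue continuation piece on
    } else {
      words.push(piece);
    }
  }
  return words.join(" ");
}
// Usage: joinWordPieces(tokenStrings)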
// Convert a token string back to its ID
const tokenId = tokenizer.tokenToId("hello");
if (tokenId !== undefined) {
  console.log(`The ID for the token "hello" is ${tokenId}`);
} else {
  console.log(`The token "hello" does not exist.`);
}
// Tokenize a batch of strings
const batchInputs = ["Hello there", "How are you?"];
const batchTokenIds = tokenizer.tokenizeBatch(batchInputs);
console.log("Batch Token IDs:", batchTokenIds);
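// Sketch (hypothetical helper, not part of react-native-tokenizers): models
// usually expect every sequence in a batch to have the same length, so pad the
// batch and build attention masks (1 = real token, 0 = padding).
// ASSUMPTIONS: batchTokenIds is a number[][]; padId defaults to 0, the [PAD] ID
// in the BERT vocabularies (distilgpt2 defines no pad token, so choose one
// deliberately there). Truncation to maxLength simply cuts the tail.
function padBatch(
  batch: number[][],
  padId = 0,
  maxLength?: number,
): { inputIds: number[][]; attentionMask: number[][] } {
  // Pad to the longest sequence unless an explicit length is given.
  const target = maxLength ?? Math.max(0, ...batch.map((ids) => ids.length));
  const inputIds: number[][] = [];
  const attentionMask: number[][] = [];
  for (const ids of batch) {
    const kept = ids.slice(0, target);
    const padding = target - kept.length;
    inputIds.push([...kept, ...new Array<number>(padding).fill(padId)]);
    attentionMask.push([
      ...new Array<number>(kept.length).fill(1),
      ...new Array<number>(padding).fill(0),
    ]);
  }
  return { inputIds, attentionMask };
}
// Usage: padBatch(batchTokenIds)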
// Tokenize a string including special tokens
const tokensWithSpecial: Token[] = tokenizer.tokenizeWithSpecialTokens("Hello there");
for (const token of tokensWithSpecial) {
  console.log("Token with Special ID:", token.id);
  console.log("Token with Special string:", token.token);
  console.log("Token with Special range:", token.start, token.end);
}
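BERT-style models have a fixed maximum input length (512 positions for the bert-base models), and that budget includes the [CLS] and [SEP] tokens that tokenizeWithSpecialTokens presumably adds for BERT vocabularies. For longer inputs, one option is to split the plain IDs from getIds into windows and wrap each window yourself. chunkWithSpecialTokens below is a hypothetical sketch, not part of react-native-tokenizers; look up clsId and sepId for your vocabulary, for example with tokenizer.tokenToId("[CLS]") and tokenizer.tokenToId("[SEP]").

```typescript
// Hypothetical helper: splits plain token IDs (as returned by getIds) into
// windows of at most maxLength IDs, each wrapped as [CLS] ... [SEP].
// An empty input produces no windows.
function chunkWithSpecialTokens(
  ids: number[],
  maxLength: number,
  clsId: number,
  sepId: number,
): number[][] {
  const room = maxLength - 2; // two slots are reserved for [CLS] and [SEP]
  if (room < 1) {
    throw new RangeError("maxLength must leave room for at least one token");
  }
  const chunks: number[][] = [];
  for (let i = 0; i < ids.length; i += room) {
    chunks.push([clsId, ...ids.slice(i, i + room), sepId]);
  }
  return chunks;
}
```

The last window can be much shorter than the others; overlapping windows are a common refinement when context at chunk boundaries matters.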
Check out the example App.tsx for more.