segmenter
v2.0.1
Work with graphemes, words, and sentences using a small, simple, and fast API built on Intl.Segmenter
Install
npm install segmenter
Why
- Intl.Segmenter is supported in all major browsers, and 94% of users have it available. It's time for adoption.
- If your use case is anything other than iterating over all graphemes, words, or sentences in a text, Intl.Segmenter can be a little hard to work with.
- In many cases, working with graphemes is preferable to working with characters, because graphemes are what the end user sees. For example, the emoji 👨‍🔧️ is a single grapheme but consists of 6 UTF-16 code units (its length is 6). A for loop over it makes 6 iterations, and a for...of loop makes 4, one per code point. That's confusing; just use graphemes.
- Before Intl.Segmenter, working with graphemes required libraries like graphemer, which is 94KB in size.
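The grapheme point above can be checked without any library. This is a minimal sketch, assuming a runtime that ships Intl.Segmenter (for example Node.js 16 or later); the emoji is spelled out with escapes so its invisible zero-width joiner (U+200D) and variation selector (U+FE0F) are visible in the source.

```typescript
// 👨‍🔧️ as code points: MAN + ZERO WIDTH JOINER + WRENCH + VARIATION SELECTOR-16.
const mechanic = "\u{1F468}\u{200D}\u{1F527}\u{FE0F}";

// A classic for loop walks UTF-16 code units: 2 surrogate pairs + 2 single units = 6.
let forIterations = 0;
for (let i = 0; i < mechanic.length; i++) forIterations++;

// for...of walks code points instead: 4.
let forOfIterations = 0;
for (const _codePoint of mechanic) forOfIterations++;

// Grapheme segmentation groups the whole sequence into the 1 symbol a user sees.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const graphemeCount = Array.from(segmenter.segment(mechanic)).length;

console.log(forIterations, forOfIterations, graphemeCount); // 6 4 1
```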
Usage
import { graphemeAt, graphemeRangeAt, wordAt, wordRangeAt } from "segmenter";
graphemeAt("👨‍🔧️ the fixer", 3); // "👨‍🔧️"
graphemeRangeAt("👨‍🔧️ the fixer", 3); // { start: 0, end: 6 }
wordAt("hello-world", 0); // "hello"
wordRangeAt("hello-world", 0); // { start: 0, end: 5 }
API
Graphemes
graphemeAt(string: string, position: number): string | undefined
Get the grapheme at position in string. Returns undefined if position is out of bounds or string is empty.
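A function like graphemeAt can be built directly on Intl.Segmenter's segments.containing(). The sketch below is illustrative and not the package's actual source; it assumes position is a UTF-16 code unit index (the unit containing() takes, and consistent with the Usage example) and uses the runtime's default locale.

```typescript
// Hypothetical re-implementation for illustration, not taken from the segmenter package.
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

function graphemeAt(string: string, position: number): string | undefined {
  // containing() returns undefined when position < 0 or position >= string.length,
  // which also covers the empty string.
  return graphemeSegmenter.segment(string).containing(position)?.segment;
}

const text = "\u{1F468}\u{200D}\u{1F527}\u{FE0F} the fixer";
console.log(graphemeAt(text, 3) === text.slice(0, 6)); // true: index 3 falls inside the emoji
console.log(graphemeAt(text, 7)); // t
console.log(graphemeAt("", 0)); // undefined
```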
graphemeRangeAt(string: string, position: number): { start: number; end: number; } | undefined
Get the start and end positions of the grapheme at position in string (end is exclusive). Returns undefined if position is out of bounds or string is empty.
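The range variant only needs the segment's index and length. Again this is a sketch rather than the package's code; the exclusive end is inferred from the Usage example, where the 6-code-unit emoji yields { start: 0, end: 6 }.

```typescript
// Hypothetical sketch; end is exclusive, so string.slice(start, end) is the grapheme itself.
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

function graphemeRangeAt(
  string: string,
  position: number,
): { start: number; end: number } | undefined {
  const data = graphemeSegmenter.segment(string).containing(position);
  if (data === undefined) return undefined; // out of bounds or empty string
  // data.index is the first code unit of the grapheme containing position.
  return { start: data.index, end: data.index + data.segment.length };
}

const text = "\u{1F468}\u{200D}\u{1F527}\u{FE0F} the fixer";
console.log(graphemeRangeAt(text, 3)); // { start: 0, end: 6 }
console.log(graphemeRangeAt(text, 100)); // undefined
```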
graphemes(string: string): string[]
Get all graphemes in string as an array.
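Collecting every grapheme is a one-liner over the segment iterator. This sketch (not the package's source) also shows why it matters: reversing by graphemes keeps the emoji intact, whereas reversing string.split("") would separate its surrogate halves.

```typescript
// Hypothetical sketch: map each segment record to its text.
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

function graphemes(string: string): string[] {
  return Array.from(graphemeSegmenter.segment(string), (data) => data.segment);
}

const mechanic = "\u{1F468}\u{200D}\u{1F527}\u{FE0F}";
const text = `hi ${mechanic}`;
console.log(graphemes(text).length); // 4
// Reversing by grapheme moves the emoji as one unit.
console.log(graphemes(text).reverse().join("") === `${mechanic} ih`); // true
```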
Words
wordAt(string: string, position: number): string | undefined
Get the word at position in string. Returns undefined if position is out of bounds or string is empty.
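wordAt can follow the same approach with granularity "word". In this hypothetical sketch a position on punctuation or whitespace returns that non-word segment; the README does not say whether the package does the same, so treat that behavior as an assumption.

```typescript
// Hypothetical sketch, not the segmenter package's code.
const wordSegmenter = new Intl.Segmenter(undefined, { granularity: "word" });

function wordAt(string: string, position: number): string | undefined {
  return wordSegmenter.segment(string).containing(position)?.segment;
}

// "hello-world" segments as "hello", "-", "world": the hyphen ends a word.
console.log(wordAt("hello-world", 0)); // hello
console.log(wordAt("hello-world", 8)); // world
console.log(wordAt("hello-world", 5)); // -
```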
wordRangeAt(string: string, position: number): { start: number; end: number; } | undefined
Get the start and end positions of the word at position in string (end is exclusive). Returns undefined if position is out of bounds or string is empty.
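A typical use of a word range is expanding a caret to the whole word under it, as a double-click selection does. The wordRangeAt below is a hypothetical sketch with an exclusive end (matching the Usage example's { start: 0, end: 5 } for "hello"), and caret is an illustrative variable.

```typescript
// Hypothetical sketch, not the segmenter package's code.
const wordSegmenter = new Intl.Segmenter(undefined, { granularity: "word" });

function wordRangeAt(
  string: string,
  position: number,
): { start: number; end: number } | undefined {
  const data = wordSegmenter.segment(string).containing(position);
  if (data === undefined) return undefined;
  return { start: data.index, end: data.index + data.segment.length };
}

const line = "hello-world";
const caret = 8; // a caret inside "world"
const range = wordRangeAt(line, caret);
if (range !== undefined) {
  // Select the word around the caret.
  console.log(range, line.slice(range.start, range.end)); // { start: 6, end: 11 } world
}
```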
words(string: string): string[]
Get all words in string as an array.
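Word segmentation also yields spaces and punctuation as segments. The README does not say whether words() keeps them; this hypothetical sketch assumes it does not and keeps only segments whose isWordLike flag is true.

```typescript
// Hypothetical sketch. Assumption: whitespace and punctuation segments
// (isWordLike === false) are not counted as words.
const wordSegmenter = new Intl.Segmenter(undefined, { granularity: "word" });

function words(string: string): string[] {
  const result: string[] = [];
  for (const data of wordSegmenter.segment(string)) {
    if (data.isWordLike) result.push(data.segment);
  }
  return result;
}

// Contractions stay whole and numbers count as word-like.
console.log(words("Hello, world! It's 2024.").join(" | ")); // Hello | world | It's | 2024
```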
Sentences
Note: Intl.Segmenter doesn't do a perfect job of detecting sentences. For example, "I went to Dr. Smith's office" will be split into two sentences.
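The limitation is easy to reproduce with Intl.Segmenter directly. A minimal sketch using the "en" locale; under the standard Unicode sentence-break rules, a period followed by a space and an uppercase letter ends a sentence, so the abbreviation is not recognized.

```typescript
const sentenceSegmenter = new Intl.Segmenter("en", { granularity: "sentence" });

const text = "I went to Dr. Smith's office";
const parts = Array.from(sentenceSegmenter.segment(text), (data) => data.segment);

// "Dr." is treated as a sentence end, so the text comes back in two pieces.
console.log(JSON.stringify(parts)); // ["I went to Dr. ","Smith's office"]
```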
sentenceAt(string: string, position: number): string | undefined
Get the sentence at position in string. Returns undefined if position is out of bounds or string is empty.
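sentenceAt can be sketched the same way with granularity "sentence". A sentence segment from Intl.Segmenter includes the whitespace that follows it; whether the package trims that is not stated here, and this hypothetical sketch does not.

```typescript
// Hypothetical sketch, not the segmenter package's code.
const sentenceSegmenter = new Intl.Segmenter("en", { granularity: "sentence" });

function sentenceAt(string: string, position: number): string | undefined {
  return sentenceSegmenter.segment(string).containing(position)?.segment;
}

const text = "Hello there. How are you?";
console.log(JSON.stringify(sentenceAt(text, 3))); // "Hello there. "
console.log(JSON.stringify(sentenceAt(text, 15))); // "How are you?"
```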
sentenceRangeAt(string: string, position: number): { start: number; end: number; } | undefined
Get the start and end positions of the sentence at position in string (end is exclusive). Returns undefined if position is out of bounds or string is empty.
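A sentence range is handy for editing the sentence under a cursor. This is a hypothetical sketch with an exclusive end; replaceSentenceAt is an illustrative helper, not part of the package.

```typescript
// Hypothetical sketch, not the segmenter package's code.
const sentenceSegmenter = new Intl.Segmenter("en", { granularity: "sentence" });

function sentenceRangeAt(
  string: string,
  position: number,
): { start: number; end: number } | undefined {
  const data = sentenceSegmenter.segment(string).containing(position);
  if (data === undefined) return undefined;
  return { start: data.index, end: data.index + data.segment.length };
}

// Illustrative helper: swap out the sentence that contains position.
function replaceSentenceAt(string: string, position: number, replacement: string): string {
  const range = sentenceRangeAt(string, position);
  if (range === undefined) return string;
  return string.slice(0, range.start) + replacement + string.slice(range.end);
}

const text = "Hello there. How are you?";
console.log(sentenceRangeAt(text, 15)); // { start: 13, end: 25 }
console.log(replaceSentenceAt(text, 15, "Fine, thanks.")); // Hello there. Fine, thanks.
```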
sentences(string: string): string[]
Get all sentences in string as an array.
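A sentences() sketch maps each sentence segment to its text. In this hypothetical version each sentence keeps its trailing whitespace, as Intl.Segmenter returns it, so callers who want clean strings trim them.

```typescript
// Hypothetical sketch, not the segmenter package's code.
const sentenceSegmenter = new Intl.Segmenter("en", { granularity: "sentence" });

function sentences(string: string): string[] {
  return Array.from(sentenceSegmenter.segment(string), (data) => data.segment);
}

const parts = sentences("Hello there. How are you? Fine.");
console.log(parts.length); // 3
console.log(JSON.stringify(parts.map((s) => s.trim()))); // ["Hello there.","How are you?","Fine."]
```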