segment-string

v0.0.8

A lightweight wrapper around Intl.Segmenter for segment-aware string operations

Key Features

  • Intuitive Intl.Segmenter Wrapper: Simplifies text segmentation with a clean API.
  • Standards-Based: Built on native Intl.Segmenter for robust compatibility.
  • Lightweight & Tree-Shakeable: Minimal footprint with optimal bundling (836B minified + gzipped).
  • Highly Performant: Uses iterators for efficient, on-demand processing.
  • Full TypeScript Support: Strict types for safe, predictable usage.

Installation

npm install segment-string

Getting Started

segment-string is a lightweight wrapper for Intl.Segmenter that simplifies locale-sensitive text segmentation in JavaScript and TypeScript. It lets you segment and manipulate text by grapheme, word, or sentence, which helps with cases such as emoji built from several code points or language-specific word boundaries.

import { SegmentString } from "segment-string";

const str = new SegmentString("Hello, world! 👩‍👩‍👧‍👦🌍🌈");

// Segment by grapheme
console.log([...str.graphemes()]); // ['H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd', '!', ' ', '👩‍👩‍👧‍👦', '🌍', '🌈']
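To show why grapheme-aware segmentation matters, here is a minimal sketch that uses the native Intl.Segmenter directly rather than segment-string. The helper name countGraphemes is hypothetical and exists only for this illustration; it compares UTF-16 code units, code points, and graphemes for a single family emoji.

```typescript
// Sketch built on the native Intl.Segmenter, not on segment-string itself.
// `countGraphemes` is a hypothetical helper introduced for illustration.
function countGraphemes(input: string, locale: string = "en"): number {
	const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
	// segment() returns a lazy iterable, so we can count without building an array.
	const iterator = segmenter.segment(input)[Symbol.iterator]();
	let count = 0;
	while (!iterator.next().done) {
		count++;
	}
	return count;
}

// Family emoji: four person emoji joined by three zero-width joiners (U+200D).
const family = "\u{1F469}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

console.log(family.length); // 11 UTF-16 code units
console.log(Array.from(family).length); // 7 code points
console.log(countGraphemes(family)); // 1 grapheme
```

The three numbers differ because String.prototype.length counts code units, string iteration yields code points, and only grapheme segmentation matches what a reader sees as one character.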

SegmentString Class

The SegmentString class encapsulates a string and provides methods for segmentation, counting, and retrieving segments at specified indices with locale and granularity options.

Constructor

new SegmentString(str: string, locales?: Intl.LocalesArgument);
  • str: The string to segment.
  • locales: Optional locale or list of locales (for example, a BCP 47 language tag) to use for segmentation.

Methods

segments(granularity: Granularity, options?: SegmentationOptions | WordSegmentationOptions): Iterable<string>

Segments the string by the specified granularity and returns the segments as strings.

rawSegments(granularity: Granularity, options?: SegmentationOptions | WordSegmentationOptions): Intl.Segments | Iterable<Intl.SegmentData>

Returns raw Intl.SegmentData objects based on granularity and options.

segmentCount(granularity: Granularity, options?: SegmentationOptions | WordSegmentationOptions): number

Counts segments in the string based on the specified granularity.

segmentAt(index: number, granularity: Granularity, options?: SegmentationOptions | WordSegmentationOptions): string | undefined

Retrieves the segment at a specific index, supporting negative indices.

rawSegmentAt(index: number, granularity: Granularity, options?: SegmentationOptions | WordSegmentationOptions): Intl.SegmentData | undefined

Returns the raw segment data at a specific index, supporting negative indices.
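Negative indices here presumably behave like Array.prototype.at, counting back from the last segment. The following is a sketch of that idea using Intl.Segmenter directly; segmentAtSketch is a hypothetical name, and the library's actual implementation may differ (for example, it may avoid materializing every segment).

```typescript
// Hypothetical helper for illustration; not segment-string's source code.
function segmentAtSketch(
	input: string,
	index: number,
	granularity: "grapheme" | "word" | "sentence",
): string | undefined {
	const segmenter = new Intl.Segmenter("en", { granularity });
	const segments = Array.from(segmenter.segment(input), (data) => data.segment);
	// Negative indices count back from the end, as with Array.prototype.at.
	const resolved = index < 0 ? segments.length + index : index;
	// Out-of-range positions fall through to undefined.
	return segments[resolved];
}

console.log(segmentAtSketch("abc", 0, "grapheme")); // "a"
console.log(segmentAtSketch("abc", -1, "grapheme")); // "c"
console.log(segmentAtSketch("abc", 5, "grapheme")); // undefined
```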

graphemes(options?: SegmentationOptions): Iterable<string>

Returns an iterable of grapheme segments as strings.

rawGraphemes(options?: SegmentationOptions): Iterable<Intl.SegmentData>

Returns an iterable of raw grapheme segments.

graphemeCount(options?: SegmentationOptions): number

Counts grapheme segments in the string.

graphemeAt(index: number, options?: SegmentationOptions): string | undefined

Returns the grapheme at a specific index, supporting negative indices.

rawGraphemeAt(index: number, options?: SegmentationOptions): Intl.SegmentData | undefined

Returns the raw grapheme data at a specific index, supporting negative indices.

words(options?: WordSegmentationOptions): Iterable<string>

Returns an iterable of word segments, with optional filtering for word-like segments.

rawWords(options?: WordSegmentationOptions): Iterable<Intl.SegmentData>

Returns an iterable of raw word segments, with optional filtering for word-like segments.

wordCount(options?: WordSegmentationOptions): number

Counts word segments in the string.

wordAt(index: number, options?: WordSegmentationOptions): string | undefined

Returns the word at a specific index, supporting negative indices.

rawWordAt(index: number, options?: WordSegmentationOptions): Intl.SegmentData | undefined

Returns the raw word data at a specific index, supporting negative indices.

sentences(options?: SegmentationOptions): Iterable<string>

Returns an iterable of sentence segments.

rawSentences(options?: SegmentationOptions): Iterable<Intl.SegmentData>

Returns an iterable of raw sentence segments.

sentenceCount(options?: SegmentationOptions): number

Counts sentence segments in the string.

sentenceAt(index: number, options?: SegmentationOptions): string | undefined

Returns the sentence at a specific index, supporting negative indices.

rawSentenceAt(index: number, options?: SegmentationOptions): Intl.SegmentData | undefined

Returns the raw sentence data at a specific index, supporting negative indices.
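As a point of reference for the sentence methods, this sketch shows what the underlying Intl.Segmenter produces at sentence granularity. It does not use segment-string, and the variable names are made up for the example. With the native API, each segment keeps its trailing whitespace; whether the wrapper passes segments through unchanged is an assumption.

```typescript
// Native Intl.Segmenter at sentence granularity (not segment-string itself).
const sentenceSegmenter = new Intl.Segmenter("en", { granularity: "sentence" });
const paragraph = "Hello world. How are you?";

const sentenceList = Array.from(
	sentenceSegmenter.segment(paragraph),
	(data) => data.segment,
);

console.log(sentenceList.length); // 2
// Segments partition the input, so joining them reproduces it exactly.
console.log(sentenceList.join("") === paragraph); // true
```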

[Symbol.iterator](): Iterator<string>

Returns an iterator over the graphemes of the string.
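The difference from iterating a plain string is that the built-in string iterator yields code points, not graphemes. Below is a sketch of a grapheme-based iterable; GraphemeIterableSketch is a hypothetical class written for this illustration and is not the library's implementation.

```typescript
// Hypothetical class for illustration only; not segment-string's source.
class GraphemeIterableSketch implements Iterable<string> {
	private readonly value: string;

	constructor(value: string) {
		this.value = value;
	}

	// Yield one grapheme cluster at a time instead of one code point.
	*[Symbol.iterator](): Iterator<string> {
		const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
		for (const { segment } of segmenter.segment(this.value)) {
			yield segment;
		}
	}
}

// "café" written with a combining acute accent (e + U+0301).
const cafe = "cafe\u0301";
const byCodePoint = Array.from(cafe);
const byGrapheme = Array.from(new GraphemeIterableSketch(cafe));

console.log(byCodePoint.length); // 5
console.log(byGrapheme.length); // 4
```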


Example Usage

import { SegmentString } from "segment-string";

const text = new SegmentString("Hello, world! 👩‍👩‍👧‍👦🌍🌈");

// Segmenting by words
for (const word of text.words()) {
	console.log(word); // logs each segment in turn: 'Hello', ',', ' ', 'world', '!', ' ', '👩‍👩‍👧‍👦', '🌍', '🌈'
}

// Segmenting graphemes and counting
console.log([...text.graphemes()]); // ['H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd', '!', ' ', '👩‍👩‍👧‍👦', '🌍', '🌈']
console.log("Grapheme count:", text.graphemeCount()); // 17
console.log("String length:", text.toString().length); // 29

// Accessing a specific word
const secondWord = text.wordAt(1, { isWordLike: true }); // 'world'
console.log(secondWord);

SegmentSplitter Class

Alternatively, the SegmentSplitter class creates an instance that you can pass directly to JavaScript's String.prototype.split for basic segmentation.

Constructor

new SegmentSplitter<T extends Granularity>(granularity: T, options?: SegmentationOptions<T>);
  • granularity: Specifies the segmentation granularity level ('grapheme', 'word', or 'sentence').
  • options: Optional settings to customize the segmentation for the given granularity.

Example Usage

import { SegmentSplitter } from "segment-string";

const str = "Hello, world!";
const wordSplitter = new SegmentSplitter("word", { isWordLike: true });
const words = str.split(wordSplitter);
console.log(words); // ["Hello", "world"]
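String.prototype.split accepts any object that defines a Symbol.split method, which is the standard hook that makes a custom splitter possible. The sketch below illustrates that mechanism with Intl.Segmenter; WordSplitterSketch is a hypothetical stand-in, and SegmentSplitter itself may be implemented differently.

```typescript
// Hypothetical stand-in showing the Symbol.split protocol; not the library's code.
class WordSplitterSketch {
	// String.prototype.split delegates to this method when given this object.
	[Symbol.split](input: string): string[] {
		const segmenter = new Intl.Segmenter("en", { granularity: "word" });
		const words: string[] = [];
		for (const { segment, isWordLike } of segmenter.segment(input)) {
			// Keep only word-like segments, dropping punctuation and spaces.
			if (isWordLike) {
				words.push(segment);
			}
		}
		return words;
	}
}

const parts = "Hello, world!".split(new WordSplitterSketch());
console.log(parts); // ["Hello", "world"]
```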

Individual Functions

getRawSegments

function getRawSegments(
	str: string,
	granularity: Granularity,
	options?: SegmentationOptions | WordSegmentationOptions,
): Intl.Segments | Iterable<Intl.SegmentData>;
  • Description: Returns raw Intl.SegmentData objects based on granularity and options.
  • Parameters:
    • str: The string to segment.
    • granularity: Specifies the segmentation level ('grapheme', 'word', or 'sentence').
    • options: Optional settings: locales selects the locale, and isWordLike keeps only word-like segments when segmenting by word.
  • Returns: An iterable of raw Intl.SegmentData.
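For readers unfamiliar with Intl.SegmentData, the sketch below prints the fields of the raw objects that the native Intl.Segmenter yields at word granularity. It calls the platform API directly rather than getRawSegments, and the variable names are invented for the example.

```typescript
// Raw segment data from the native API (not via segment-string).
const rawSegmenter = new Intl.Segmenter("en", { granularity: "word" });
const rawData = Array.from(rawSegmenter.segment("Hi there"));

for (const data of rawData) {
	// index is the UTF-16 offset of the segment within the input string;
	// isWordLike is only set when segmenting by word.
	console.log(data.index, JSON.stringify(data.segment), data.isWordLike);
}
// 0 "Hi" true
// 2 " " false
// 3 "there" true
```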

getSegments

function getSegments(
	str: string,
	granularity: Granularity,
	options?: SegmentationOptions | WordSegmentationOptions,
): Iterable<string>;
  • Description: Returns segments of the string as plain strings.
  • Parameters: Same as getRawSegments.
  • Returns: An iterable of segments as strings.

segmentCount

function segmentCount(
	str: string,
	granularity: Granularity,
	options?: SegmentationOptions | WordSegmentationOptions,
): number;
  • Description: Returns the count of segments based on granularity and options.
  • Parameters: Same as getRawSegments.
  • Returns: Number of segments.

rawSegmentAt

function rawSegmentAt(
	str: string,
	index: number,
	granularity: Granularity,
	options?: SegmentationOptions | WordSegmentationOptions,
): Intl.SegmentData | undefined;
  • Description: Returns the raw segment data at a specified index, supporting negative indices.
  • Parameters: Same as getRawSegments, plus index, the position of the segment to return.
  • Returns: The Intl.SegmentData at the specified index, or undefined if out of bounds.

segmentAt

function segmentAt(
	str: string,
	index: number,
	granularity: Granularity,
	options?: SegmentationOptions | WordSegmentationOptions,
): string | undefined;
  • Description: Returns the segment at a specified index, supporting negative indices.
  • Parameters: Same as getRawSegments, plus index, the position of the segment to return.
  • Returns: The segment at the specified index or undefined if out of bounds.

filterRawWordLikeSegments

function filterRawWordLikeSegments(
	segments: Intl.Segments,
): Iterable<Intl.SegmentData>;
  • Description: Filters and returns an iterable of raw word-like segment data where isWordLike is true.
  • Parameters:
    • segments: The segments to filter.
  • Returns: An iterable of Intl.SegmentData for each word-like segment.

filterWordLikeSegments

function filterWordLikeSegments(segments: Intl.Segments): Iterable<string>;
  • Description: Filters and returns an iterable of word-like segments as strings where isWordLike is true.
  • Parameters:
    • segments: The segments to filter.
  • Returns: An iterable of strings for each word-like segment.
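A filter like this can stay lazy if it is written as a generator. The following is a sketch under that assumption; wordLikeOnlySketch is a hypothetical name, and the library's own filter functions may be written differently.

```typescript
// Hypothetical lazy filter; name and shape are assumptions for illustration.
function* wordLikeOnlySketch(
	segments: Iterable<Intl.SegmentData>,
): Generator<string> {
	for (const data of segments) {
		// isWordLike is false for punctuation and whitespace segments.
		if (data.isWordLike) {
			yield data.segment;
		}
	}
}

const wordSegments = new Intl.Segmenter("en", { granularity: "word" }).segment(
	"Hello, world!",
);

// Pull only the first word; the rest of the input is not filtered yet.
const firstWord = wordLikeOnlySketch(wordSegments).next().value;
console.log(firstWord); // "Hello"

// Or consume everything.
const allWords = Array.from(wordLikeOnlySketch(wordSegments));
console.log(allWords); // ["Hello", "world"]
```

Because the generator only advances when asked, a caller that needs just the first few words does not pay for filtering the whole string.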

💙 This package was templated with create-typescript-app.