txt-util

v0.0.1

Published

a month ago

A text utility library

Downloads

0High
0Medium
0Low

jvickers

Text Util

A utility library for various text and HTML document processing tasks.

Installation

To install the library, use npm:

npm install text-util

API

`analyzeText(text)`

Analyzes the given text and returns various statistics about it.

Parameters:

text (string): The text to analyze.

Returns:

An object containing the following properties:
- spaceCount
- fullStopCount
- semicolonCount
- squareBracketCount
- colonCount
- doubleQuoteCount
- curlyBracketCount
- otherPunctuationCount
- spaceToFullStopRatio
- spaceToParenthesesRatio
- fullStopSpaceToFullStopRatio
- squareBracketToCharactersRatio
- curlyBracketToCharactersRatio
- colonToCharactersRatio
- doubleQuoteToSpaceRatio

`simple_ratio_analyse_text(text)`

Alias for analyzeText.

`indicate_is_computer_language(text)`

Determines if the given text is likely to be computer code.

Parameters:

text (string): The text to analyze.

Returns:

true if the text is likely to be computer code, false otherwise.

`get_html_document_text_nodes(html, downloaded_url_id, filter, do_trim_white_space = true)`

Extracts text nodes from an HTML document.

Parameters:

html (string): The HTML document.
downloaded_url_id (number): The ID of the downloaded URL.
filter (function): A filter function to apply to the nodes.
do_trim_white_space (boolean): Whether to trim white space from the text nodes.

Returns:

An array of text nodes.

`get_html_document_a_elements(html_document, downloaded_url_id)`

Extracts <a> elements from an HTML document.

Parameters:

html_document (string): The HTML document.
downloaded_url_id (number): The ID of the downloaded URL.

Returns:

An array of <a> elements.

`iterate_html_document_nodes(html)`

Iterates over the nodes of an HTML document.

Parameters:

html (string): The HTML document.

Yields:

Each node in the document.

`join_url_parts(...a)`

Joins URL parts into a single URL.

Parameters:

...a (string[]): The URL parts to join.

Returns:

The joined URL.

`normalise_linked_url(page_url, linked_url, archive_url)`

Normalizes a linked URL based on the page URL and archive URL.

Parameters:

page_url (string): The page URL.
linked_url (string): The linked URL.
archive_url (string): The archive URL.

Returns:

The normalized URL.

`get_scheme_based_authority(url)`

Gets the scheme-based authority of a URL.

Parameters:

url (string): The URL.

Returns:

The scheme-based authority.

`is_simple_url(url)`

Determines if a URL is simple.

Parameters:

url (string): The URL to check.

Returns:

true if the URL is simple, false otherwise.

`get_string_byte_length(str)`

Gets the byte length of a string.

Parameters:

str (string): The string.

Returns:

The byte length of the string.

`compress_sync(inputString, level = 10)`

Compresses a string using Brotli compression.

Parameters:

inputString (string): The string to compress.
level (number): The compression level (default is 10).

Returns:

The compressed buffer.

`decompress_sync(buf)`

Decompresses a Brotli-compressed buffer.

Parameters:

buf (Buffer): The compressed buffer.

Returns:

The decompressed string.

`calc_hash_256bit(value)`

Calculates a 256-bit hash of a value.

Parameters:

value (string): The value to hash.

Returns:

The hash as a buffer.

`count_whitespace(str)`

Counts the number of whitespace characters in a string.

Parameters:

str (string): The string.

Returns:

The number of whitespace characters.

`get_char_info(str)`

Gets information about the first character in a string.

Parameters:

str (string): The string.

Returns:

An object containing information about the character.

`lazy_lookup_get_char_info(character)`

Gets information about a Unicode character using lazy lookup.

Parameters:

character (string): The Unicode character.

Returns:

An object containing information about the character.

`fn_tn_filter(tn)`

Filters text nodes based on their parent element.

Parameters:

tn (object): The text node.

Returns:

true if the text node passes the filter, false otherwise.

`get_url_domain(url)`

Gets the domain of a URL.

Parameters:

url (string): The URL.

Returns:

The domain of the URL.

`remake_html(str_html)`

Remakes an HTML document.

Parameters:

str_html (string): The HTML document.

Returns:

The remade HTML document.

`remake_trim_html(str_html)`

Remakes an HTML document by trimming text nodes.

Parameters:

str_html (string): The HTML document.

Returns:

The remade HTML document.

`remake_html_to_els_structure(str_html)`

Remakes an HTML document to an elements structure.

Parameters:

str_html (string): The HTML document.

Returns:

The remade HTML document.

`count_character_types(str)`

Counts the types of characters in a string.

Parameters:

str (string): The string.

Returns:

An object containing the counts of different character types.

`iterate_chars(str)`

Iterates over the characters of a string.

Parameters:

str (string): The string.

Yields:

Each character in the string.

`iterate_string_chars_info(text)`

Iterates over the characters of a string and provides information about each character.

Parameters:

text (string): The string.

Yields:

An object containing information about each character.

`iterate_char_type_groups_tokenize_string(str)`

Iterates over character type groups and tokenizes a string.

Parameters:

str (string): The string.

Yields:

An object representing each token.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

Text Util

Installation

API

analyzeText(text)

simple_ratio_analyse_text(text)

indicate_is_computer_language(text)

get_html_document_text_nodes(html, downloaded_url_id, filter, do_trim_white_space = true)

get_html_document_a_elements(html_document, downloaded_url_id)

iterate_html_document_nodes(html)

join_url_parts(...a)

normalise_linked_url(page_url, linked_url, archive_url)

get_scheme_based_authority(url)

is_simple_url(url)

get_string_byte_length(str)

compress_sync(inputString, level = 10)

decompress_sync(buf)

calc_hash_256bit(value)

count_whitespace(str)

get_char_info(str)

lazy_lookup_get_char_info(character)

fn_tn_filter(tn)

get_url_domain(url)

remake_html(str_html)

remake_trim_html(str_html)

remake_html_to_els_structure(str_html)

count_character_types(str)

iterate_chars(str)

iterate_string_chars_info(text)

iterate_char_type_groups_tokenize_string(str)

`analyzeText(text)`

`simple_ratio_analyse_text(text)`

`indicate_is_computer_language(text)`

`get_html_document_text_nodes(html, downloaded_url_id, filter, do_trim_white_space = true)`

`get_html_document_a_elements(html_document, downloaded_url_id)`

`iterate_html_document_nodes(html)`

`join_url_parts(...a)`

`normalise_linked_url(page_url, linked_url, archive_url)`

`get_scheme_based_authority(url)`

`is_simple_url(url)`

`get_string_byte_length(str)`

`compress_sync(inputString, level = 10)`

`decompress_sync(buf)`

`calc_hash_256bit(value)`

`count_whitespace(str)`

`get_char_info(str)`

`lazy_lookup_get_char_info(character)`

`fn_tn_filter(tn)`

`get_url_domain(url)`

`remake_html(str_html)`

`remake_trim_html(str_html)`

`remake_html_to_els_structure(str_html)`

`count_character_types(str)`

`iterate_chars(str)`

`iterate_string_chars_info(text)`

`iterate_char_type_groups_tokenize_string(str)`