fod4se

v0.2.0

Published

15 days ago

A playful text normalization library that cleans up profanity, decodes obfuscated words, and gives you full control over how the text is displayed in the end.

Downloads

369

0High
0Medium
0Low

victorafael

profanity obfuscated filter censorship bad language

Flexible Obfuscation Dictionary For Sanitization Enforcement (FOD4SE)

A powerful text sanitization library with support for multiple languages, symbol normalization, and flexible configuration options.

Features

Built-in dictionaries for multiple languages
Symbol normalization (e.g., @ → a, $ → s)
Partial or full word replacement
Left-to-right or right-to-left replacement direction
Customizable replacement characters
Ignore list support
Full or partial word matching

Installation

npm install fod4se

Quick Start

import { LanguageFilter } from "fod4se";

// Create a filter with English dictionary
const filter = new LanguageFilter({ baseLanguage: "en" });

// Clean text
const cleaned = filter.getSafe("Your text here");

Alternative Usages

You can also use the getSafeText and analyzeText functions directly without creating a LanguageFilter instance.

Using getSafeText

import { getSafeText } from "fod4se";

const cleaned = getSafeText("Your text here", { baseLanguage: "en" });
console.log(cleaned); // Returns sanitized text

Using analyzeText

import { analyzeText } from "fod4se";

const result = analyzeText("Text to analyze", { baseLanguage: "en" });
console.log(result.cleaned); // Sanitized text
console.log(result.profanity); // true if anything was found
console.log(result.matches); // Array of matches with details

Detailed Usage

Basic Usage with Built-in Dictionary

import { LanguageFilter } from "fod4se";

const filter = new LanguageFilter({
  baseLanguage: "en", // Use built-in English dictionary
});

filter.getSafe("Text to clean"); // Returns sanitized text

The base dictionaries are at an early stage of development and are very incomplete. If you miss something, refer to the contributing section.

Custom Dictionary

import { LanguageFilter } from "fod4se";

const filter = new LanguageFilter({
  baseLanguage: "none",
  config: {
    profanity: ["word1", "word2"],
    ignore: ["goodword1", "goodword2"],
  },
});

Advanced Analysis

import { LanguageFilter } from "fod4se";

const filter = new LanguageFilter({ baseLanguage: "en" });
const result = filter.analyze("Text to analyze");

console.log(result.cleaned); // Sanitized text
console.log(result.profanity); // true if anything was found
console.log(result.matches); // Array of matches with details

Custom Configuration

import { LanguageFilter, regexTemplate } from "fod4se";

const filter = new LanguageFilter({
  baseLanguage: "en",
  config: {
    replaceString: "#@", // Pattern used in replacement (What is this #@#@#)
    replaceRatio: 0.5, // Replace 50% of matched words
    replaceDirection: "LTR", // Replace from left to right
    matchTemplate: regexTemplate.partialMatch, // Match partial words
    ignoreSymbols: true, // Don't normalize symbols
  },
});

Configuration Options

LanguageFilter Options

| Option | Type | Default | Description | | ------------ | ------------------------- | ------- | -------------------------- | | baseLanguage | "none" | "en" | "pt-br" | - | Built-in dictionary to use | | config | FSConfig | - | Configuration object |

FSConfig Options

| Option | Type | Default | Description | | ---------------- | -------------- | ---------------------- | ----------------------------------- | | profanity | string[] | [] | Custom list of words to filter | | ignore | string[] | [] | Words to exclude from filtering | | replaceString | string | "*" | Character(s) used for replacement | | replaceRatio | number | 1 | Portion of word to replace (0 to 1) | | replaceDirection | "LTR" | "RTL" | "RTL" | Direction of partial replacement | | matchTemplate | string | regexTemplate.fullWord | Word matching pattern | | ignoreSymbols | boolean | false | Disable symbol normalization |

Match Templates

import { getSafeText, regexTemplate } from "fod4se";

const text = "c4t category [cat]";
const profanity = ["cat"];

const templates = [
  //regexTemplate.fullWord matches "cat" but not "category":
  regexTemplate.fullWord,
  //regexTemplate.partialMatch matches both "cat" and "category"
  regexTemplate.partialMatch,
  //custom template to match only [cat]
  "\\[{0}\\]",
];

templates
  .map((matchTemplate) => getSafeText(text, profanity, { matchTemplate }))
  .forEach((result) => console.log(result));
/*
Outputs:
*** category [***] //full
*** ***egory [***] //partial
c4t category ***** //custom
*/

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.