fod4se
v0.2.0
Published
A playful text normalization library that cleans up profanity, decodes obfuscated words, and gives you full control over how the text is displayed in the end.
Downloads
369
Maintainers
Readme
Flexible Obfuscation Dictionary For Sanitization Enforcement (FOD4SE)
A powerful text sanitization library with support for multiple languages, symbol normalization, and flexible configuration options.
Features
- Built-in dictionaries for multiple languages
- Symbol normalization (e.g.,
@
→ a,$
→s
) - Partial or full word replacement
- Left-to-right or right-to-left replacement direction
- Customizable replacement characters
- Ignore list support
- Full or partial word matching
Installation
npm install fod4se
Quick Start
import { LanguageFilter } from "fod4se";
// Create a filter with English dictionary
const filter = new LanguageFilter({ baseLanguage: "en" });
// Clean text
const cleaned = filter.getSafe("Your text here");
Alternative Usages
You can also use the getSafeText
and analyzeText
functions directly without creating a LanguageFilter
instance.
Using getSafeText
import { getSafeText } from "fod4se";
const cleaned = getSafeText("Your text here", { baseLanguage: "en" });
console.log(cleaned); // Returns sanitized text
Using analyzeText
import { analyzeText } from "fod4se";
const result = analyzeText("Text to analyze", { baseLanguage: "en" });
console.log(result.cleaned); // Sanitized text
console.log(result.profanity); // true if anything was found
console.log(result.matches); // Array of matches with details
Detailed Usage
Basic Usage with Built-in Dictionary
import { LanguageFilter } from "fod4se";
const filter = new LanguageFilter({
baseLanguage: "en", // Use built-in English dictionary
});
filter.getSafe("Text to clean"); // Returns sanitized text
The base dictionaries are at an early stage of development and are very incomplete. If you miss something, refer to the contributing section.
Custom Dictionary
import { LanguageFilter } from "fod4se";
const filter = new LanguageFilter({
baseLanguage: "none",
config: {
profanity: ["word1", "word2"],
ignore: ["goodword1", "goodword2"],
},
});
Advanced Analysis
import { LanguageFilter } from "fod4se";
const filter = new LanguageFilter({ baseLanguage: "en" });
const result = filter.analyze("Text to analyze");
console.log(result.cleaned); // Sanitized text
console.log(result.profanity); // true if anything was found
console.log(result.matches); // Array of matches with details
Custom Configuration
import { LanguageFilter, regexTemplate } from "fod4se";
const filter = new LanguageFilter({
baseLanguage: "en",
config: {
replaceString: "#@", // Pattern used in replacement (What is this #@#@#)
replaceRatio: 0.5, // Replace 50% of matched words
replaceDirection: "LTR", // Replace from left to right
matchTemplate: regexTemplate.partialMatch, // Match partial words
ignoreSymbols: true, // Don't normalize symbols
},
});
Configuration Options
LanguageFilter Options
| Option | Type | Default | Description | | ------------ | ------------------------- | ------- | -------------------------- | | baseLanguage | "none" | "en" | "pt-br" | - | Built-in dictionary to use | | config | FSConfig | - | Configuration object |
FSConfig Options
| Option | Type | Default | Description | | ---------------- | -------------- | ---------------------- | ----------------------------------- | | profanity | string[] | [] | Custom list of words to filter | | ignore | string[] | [] | Words to exclude from filtering | | replaceString | string | "*" | Character(s) used for replacement | | replaceRatio | number | 1 | Portion of word to replace (0 to 1) | | replaceDirection | "LTR" | "RTL" | "RTL" | Direction of partial replacement | | matchTemplate | string | regexTemplate.fullWord | Word matching pattern | | ignoreSymbols | boolean | false | Disable symbol normalization |
Match Templates
import { getSafeText, regexTemplate } from "fod4se";
const text = "c4t category [cat]";
const profanity = ["cat"];
const templates = [
//regexTemplate.fullWord matches "cat" but not "category":
regexTemplate.fullWord,
//regexTemplate.partialMatch matches both "cat" and "category"
regexTemplate.partialMatch,
//custom template to match only [cat]
"\\[{0}\\]",
];
templates
.map((matchTemplate) => getSafeText(text, profanity, { matchTemplate }))
.forEach((result) => console.log(result));
/*
Outputs:
*** category [***] //full
*** ***egory [***] //partial
c4t category ***** //custom
*/
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.