mtl-text-processor

v0.7.0

A set of tools to pre-process text for machine translation and post-process the results back into full sentences.

Get started

Install:

npm install mtl-text-processor

Use it in your project:

const { TextProcessor } = require('mtl-text-processor');
// Pass your options object here; new TextProcessor() with no arguments also works.
const myProcessor = new TextProcessor(/* myOptions */);
const myProcess = myProcessor.process("My text.");
const translatableStrings = myProcess.getTranslatableLines();
/**
 * Send translatableStrings (Array<string>) to the translator.
 * Store the response as an array in translationResult.
 */
myProcess.setTranslatedLines(...translationResult);
const translatedLines = myProcess.getTranslatedLines();

Usage varies a bit depending on the translator. If your translator already accepts and returns arrays, you're good to go. If it can only handle one string at a time, you'll need to send each line separately. You don't have to deliver all translations at once, though; setTranslatedLines can be called once per line:

// ...
translatableStrings.forEach(translatableLine => {
    // sendToMyTranslator stands for your own function returning one translated string
    const translation = sendToMyTranslator(translatableLine);
    myProcess.setTranslatedLines(translation);
});
const translatedLines = myProcess.getTranslatedLines();
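Most translator clients are asynchronous, so a synchronous sendToMyTranslator is rarely available in practice. Below is a minimal sketch, assuming a hypothetical translateOne(line) function that returns a Promise of one translated string. It translates the lines one at a time, in order, and resolves to an array you could then spread into setTranslatedLines.

```javascript
/**
 * Translate lines one at a time, preserving their order.
 * translateOne is a hypothetical async function: (string) => Promise<string>.
 */
async function translateSequentially(lines, translateOne) {
    const results = [];
    for (const line of lines) {
        // Wait for each request before sending the next one, so a translator
        // that only handles one string at a time is never flooded.
        results.push(await translateOne(line));
    }
    return results;
}

// Usage with a stand-in translator that just upper-cases its input:
const fakeTranslator = async (line) => line.toUpperCase();

translateSequentially(["hello.", "world."], fakeTranslator).then(translated => {
    console.log(translated); // [ 'HELLO.', 'WORLD.' ]
    // With mtl-text-processor you would then call:
    // myProcess.setTranslatedLines(...translated);
});
```

Awaiting inside the loop is deliberate: it trades speed for never having more than one request in flight.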

Complete Example using Axios and Sugoi Translator to translate a .txt file

In this example, Axios sends the HTTP requests to a Sugoi Translator server listening on port 14366. You'll need to install both axios and mtl-text-processor:

npm install mtl-text-processor axios

const axios = require('axios');
const fs = require('fs');
const { TextProcessor } = require('mtl-text-processor');

const file = "MyTextFile";
const text = fs.readFileSync(`./${file}.txt`, { encoding: "utf8" });
const myProcessor = new TextProcessor();
const myProcess = myProcessor.process(text);
const toTranslate = myProcess.getTranslatableLines();

const translations = [];

// Number of lines sent to the translator in each HTTP request
const batchSize = 10;
let i = 0;

function sendBatch () {
    // Take the next batchSize lines that haven't been sent yet
    const batch = toTranslate.slice(i, i + batchSize);
    i += batch.length;

    // 127.0.0.1 instead of 0.0.0.0: 0.0.0.0 is a listen address and is not
    // a valid destination on every OS (notably Windows)
    axios.post("http://127.0.0.1:14366/", {
        message: "translate sentences",
        content: batch
    }).then(res => {
        // The server responds with an array of translated strings
        translations.push(...res.data);
        if (i < toTranslate.length) {
            sendBatch();
        } else {
            // Everything is translated: reassemble and write the output file
            myProcess.setTranslatedLines(...translations);
            fs.writeFileSync(
                `./${file}_translated.txt`,
                myProcess.getTranslatedLines().join("\n"),
                { encoding: "utf8" }
            );
        }
    }).catch(err => {
        console.error("Translation request failed:", err.message);
    });
}

sendBatch();
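The same flow can be written with async/await, which avoids the recursive .then() callbacks. This sketch assumes a hypothetical translateBatch(lines) function that returns a Promise of an array of translations (in the Sugoi example, it would wrap axios.post). It also checks that each response has exactly as many entries as the batch, since a mismatch would shift every following line out of place.

```javascript
/**
 * Send lines to a translator in fixed-size batches, one batch at a time.
 * translateBatch is a hypothetical async function: (string[]) => Promise<string[]>.
 */
async function translateInBatches(lines, batchSize, translateBatch) {
    const translations = [];
    for (let start = 0; start < lines.length; start += batchSize) {
        const batch = lines.slice(start, start + batchSize);
        const result = await translateBatch(batch);
        // A short or long response would misalign every later line, so fail loudly.
        if (!Array.isArray(result) || result.length !== batch.length) {
            throw new Error(
                `Batch starting at line ${start}: expected ${batch.length} translations, got ` +
                (Array.isArray(result) ? result.length : typeof result)
            );
        }
        translations.push(...result);
    }
    return translations;
}

// Usage with a stand-in translator. For a local Sugoi server, translateBatch could be:
// batch => axios.post("http://127.0.0.1:14366/", { message: "translate sentences", content: batch })
//              .then(res => res.data)
const fakeBatchTranslator = async (batch) => batch.map(line => `[${line}]`);

translateInBatches(["a", "b", "c"], 2, fakeBatchTranslator).then(result => {
    console.log(result); // [ '[a]', '[b]', '[c]' ]
});
```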

Why?

Machine translators are great, but most of them can't handle being thrown a lot of text at once: if they don't crash outright, they produce less than ideal translations.

The main goal of this Text Processor is to safeguard untranslatable symbols, but the way it does so also produces smaller sentences that are translated in isolation and then put back together. This is a clear win when the sentences have no relation to one another (for example, two entirely separate sentences that merely appear in sequence, with no references between them). Translating in isolation also helps with symbols such as a name or a sub-sentence inside brackets, because it makes their translation more consistent.
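To illustrate translating small pieces in isolation and putting them back together, here is a minimal sketch (not the library's own algorithm; splitParagraphs and joinParagraphs are hypothetical names). It splits text on blank lines, remembers the exact separators, and rejoins the translated paragraphs with those same separators so the original layout survives.

```javascript
/**
 * Split text into paragraphs on runs of blank lines, keeping the separators
 * so the original layout can be restored after translation.
 */
function splitParagraphs(text) {
    // The capturing group makes split() keep the separators, alternating:
    // [paragraph, separator, paragraph, separator, ..., paragraph]
    const pieces = text.split(/(\n\s*\n)/);
    const paragraphs = pieces.filter((_, index) => index % 2 === 0);
    const separators = pieces.filter((_, index) => index % 2 === 1);
    return { paragraphs, separators };
}

/** Interleave translated paragraphs with the original separators. */
function joinParagraphs(translated, separators) {
    let result = translated[0] ?? "";
    for (let i = 0; i < separators.length; i++) {
        result += separators[i] + (translated[i + 1] ?? "");
    }
    return result;
}

// Usage: translate each paragraph on its own; upper-casing stands in for a real translator.
const example = splitParagraphs("First line.\n\nSecond line.");
const translatedParagraphs = example.paragraphs.map(p => p.toUpperCase());
console.log(JSON.stringify(joinParagraphs(translatedParagraphs, example.separators)));
// "FIRST LINE.\n\nSECOND LINE."
```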

The processor tries to give the machine translator the contextual information that "a thing is in this part of the sentence", without overwhelming it with the entire content of that symbol.
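A minimal sketch of the placeholder idea, assuming you pick the regular expression and placeholder style yourself (this is not the library's API; escapeSymbols and restoreSymbols are hypothetical). Each match is swapped for a short placeholder such as {{A}}, so the translator still sees that "a thing" is there, and the original sequence is put back after translation.

```javascript
// Default placeholder style {{A}}, {{B}}, ...; letters A-Z only, so this
// sketch supports at most 26 symbols per line.
function defaultPlaceholder(i) {
    if (i >= 26) throw new RangeError("this sketch only supports 26 placeholders per line");
    return `{{${String.fromCharCode(65 + i)}}}`;
}

/**
 * Replace every match of `pattern` (which must have the g flag) with a placeholder.
 * Returns the escaped text plus the stored originals, in placeholder order.
 */
function escapeSymbols(text, pattern, makePlaceholder = defaultPlaceholder) {
    const stored = [];
    const escaped = text.replace(pattern, match => {
        stored.push(match);
        return makePlaceholder(stored.length - 1);
    });
    return { escaped, stored };
}

/** Put the stored originals back in place of their placeholders. */
function restoreSymbols(translated, stored, makePlaceholder = defaultPlaceholder) {
    let result = translated;
    stored.forEach((original, i) => {
        // split/join replaces every occurrence of this placeholder
        result = result.split(makePlaceholder(i)).join(original);
    });
    return result;
}

// Usage: hide \n[1]-style name scripts from the translator.
const line = "\\n[1] looked at \\n[2].";
const { escaped, stored } = escapeSymbols(line, /\\n\[\d+\]/g);
console.log(escaped); // {{A}} looked at {{B}}.
// Pretend the translator returned this, leaving the placeholders untouched:
console.log(restoreSymbols("{{A}} glanced at {{B}}.", stored)); // \n[1] glanced at \n[2].
```

Passing a different makePlaceholder (for example one producing %A, %B) lets you match whichever symbol style your translator handles best.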

Overview of uses

  • Escaping Symbols: Lets you set up regular expressions to catch and remove untranslatable symbols, such as scripts. It can also be used to save the translation of names for later: the translator is fed a Placeholder Symbol instead of the original sequence. Most translators support certain symbols that they understand as "a thing", so the translator still benefits from the context without having to deal with the hard-to-understand original sequence.
    • Can be used for untranslatable scripts (e.g. \n[0] used as a name, or an actual script call).
    • Can be used when you have a hand-crafted translation for a name: instead of translating the name, the translator keeps the Placeholder.
    • There are many choices of placeholder, and the ideal choice varies with the translator. Some translators, such as Google and DeepL, understand symbols like {{A}} as relevant to the context but not to be touched. Other translators handle symbols such as %A better.
  • Isolating Sentences: Internal parts of a sentence can be isolated for translation. Their place in the original sentence will be replaced by a placeholder.
  • Sentence Splitting: Sentences can be split at reasonable points so that each part is translated in isolation.
    • Most translators can't properly handle several paragraphs in a row, so splitting sentences on new paragraphs gives better translation quality. It's faster and cheaper, too!
  • Protected Untranslatables: Some parts of a sentence, like the bounding brackets of a quote, don't need to be sent to the translator at all. The TextProcessor stores them until it gets the translation for what's inside, then puts everything back together.
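For the Protected Untranslatables idea, here is a minimal sketch (hypothetical helper names and bracket list, not the library's implementation). It removes a pair of bounding quote brackets, such as the Japanese 「」 used for dialogue, so only the inner text would be sent to the translator, then wraps the translation back in the same brackets.

```javascript
// Bracket pairs to protect; this list is an assumption, extend it to suit your text.
const BRACKET_PAIRS = [["「", "」"], ["『", "』"], ["(", ")"], ["\"", "\""]];

/**
 * If the whole line is wrapped in a known bracket pair, return the inner text
 * plus the brackets to restore later; otherwise return the line unchanged.
 */
function protectBrackets(line) {
    for (const [open, close] of BRACKET_PAIRS) {
        if (line.length >= open.length + close.length &&
            line.startsWith(open) && line.endsWith(close)) {
            const inner = line.slice(open.length, line.length - close.length);
            // Skip lines like 「a」と「b」, where the brackets don't wrap the whole line.
            if (!inner.includes(close)) {
                return { inner, open, close };
            }
        }
    }
    return { inner: line, open: "", close: "" };
}

/** Re-wrap a translated inner text in the brackets that were removed. */
function restoreBrackets(translatedInner, { open, close }) {
    return open + translatedInner + close;
}

// Usage: only "こんにちは" would be sent to the translator.
const protectedLine = protectBrackets("「こんにちは」");
console.log(protectedLine.inner); // こんにちは
console.log(restoreBrackets("Hello", protectedLine)); // 「Hello」
```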

Author

@reddo9999

License

GNU General Public License v3.0


README created with ❤️ by md-generate