



Words Splitter

Split a French sentence written without any spaces or accents into words.

Example: jepenserestersurparisouprendreletgvpourmarseille → je pense rester sur Paris ou prendre le TGV pour Marseille

Useful for domain names and URLs, which can't contain spaces. Example: https://desurlsansespaces.sé
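The examples feed the splitter lowercase text with no spaces, accents or punctuation. If you start from ordinary text or a full URL, a small helper like the sketch below can reduce it to that form first. `toSplitterInput` and `labelFromUrl` are hypothetical names, not part of words-splitter, and splitting only the label just before the TLD is an assumption.

```javascript
// Hypothetical helpers (not part of words-splitter): reduce text or a URL to
// the lowercase, accent-free, space-free form used in the examples.

// Decompose characters (NFD), drop the combining accent marks, lowercase,
// then remove anything that is not a basic Latin letter or a digit.
function toSplitterInput(text) {
  return text
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')
    .toLowerCase()
    .replace(/[^a-z0-9]/g, '');
}

// Assumption: the text worth splitting is the label just before the TLD,
// e.g. "desurlsansespaces" in "desurlsansespaces.sé".
// Limitation: URL returns accented labels in punycode ("xn--..."), so this
// sketch only handles names that are plain ASCII.
function labelFromUrl(url) {
  const labels = new URL(url).hostname.split('.');
  const name = labels.length > 1 ? labels[labels.length - 2] : labels[0];
  return toSplitterInput(name);
}

console.log(toSplitterInput("J'hésite entre Nancy et Alençon et vous ?"));
// → jhesiteentrenancyetalenconetvous
console.log(labelFromUrl('https://desurlsansespaces.sé'));
// → desurlsansespaces
```

The result of either helper can then be passed to splitSentence.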


npm i words-splitter


const { addWord, splitSentence, loadDictionary } = require('words-splitter');

const main = async () => {
  await loadDictionary(); // Takes a few seconds to load all French words with their probabilities

  addWord('MonMOTÀmoi'); // If you want, you can add your own words to the dictionary
  // "monmotamoi" will now be recognized as a word: "MonMOTÀmoi"
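  // previously, "monmotamoi" was converted to "mon mot à moi"

  // Assumption: splitSentence() is synchronous and returns the split sentence.
  // Each input is its expected output, lowercased, without spaces, accents or punctuation.
  const sentences = [
    'faire10hdetrainaveclasncf',
    'jemappellemathildeetvous',
    'monnomestthomas',
    'oupartezvousenvacances',
    'jhesiteentrenancyetalenconetvous',
    'jepenserestersurparisouprendreletgvpourmarseille',
  ];
  sentences.forEach((sentence) => console.log(splitSentence(sentence)));
  // faire 10h de train avec la SNCF
  // je m'appelle Mathilde et vous
  // mon nom est Thomas
  // ou partez-vous en vacances
  // j'hésite entre Nancy et Alençon et vous
  // je pense rester sur Paris ou prendre le TGV pour Marseille

  console.log(splitSentence('quepensezvousdemonmotamoi'));
  // que pensez-vous de MonMOTÀmoi
};

main();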


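The addWord comments in the usage example suggest that words are matched through a lowercase, accent-free key while the output keeps the spelling that was added. The toy dictionary below sketches that idea only; it is an assumption about the behavior, not words-splitter's internals, and `toKey`, `ToyDictionary` and the probabilities are hypothetical.

```javascript
// Toy model (not words-splitter's code): entries are keyed by their
// lowercase, accent-free form and remember how they should be displayed.
function toKey(word) {
  return word.normalize('NFD').replace(/[\u0300-\u036f]/g, '').toLowerCase();
}

class ToyDictionary {
  constructor() {
    this.entries = new Map(); // key -> { display, probability }
  }

  // Hypothetical counterpart of addWord: adding a word with an existing key
  // replaces the earlier entry, so the newest display form wins.
  add(word, probability) {
    this.entries.set(toKey(word), { display: word, probability });
  }

  lookup(key) {
    return this.entries.get(key);
  }
}

const dict = new ToyDictionary();
dict.add('mon', 0.01);
dict.add('MonMOTÀmoi', 0.001);
console.log(dict.lookup('monmotamoi').display); // → MonMOTÀmoi
```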
Algorithm from …

The list of words was created from French Wikipedia articles.
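A common technique for this kind of splitting, and one consistent with a dictionary that stores each word's probability, is a dynamic program that picks the segmentation with the highest product of word probabilities, which is the same as the lowest sum of −log p. The sketch below implements that technique; it is not necessarily the algorithm words-splitter uses, and the probabilities and the `maxWordLength` cap are made up for illustration.

```javascript
// Sketch of probability-based word segmentation (not words-splitter's code).
// best[i] is the lowest cost of splitting text.slice(0, i), where a word costs
// -log(probability); adding costs is equivalent to multiplying probabilities.
function segment(text, probabilities, maxWordLength = 20) {
  const best = new Array(text.length + 1).fill(Infinity);
  const start = new Array(text.length + 1).fill(-1); // where the last word begins
  best[0] = 0;

  for (let end = 1; end <= text.length; end++) {
    for (let begin = Math.max(0, end - maxWordLength); begin < end; begin++) {
      const p = probabilities.get(text.slice(begin, end));
      if (p === undefined || best[begin] === Infinity) continue;
      const cost = best[begin] - Math.log(p);
      if (cost < best[end]) {
        best[end] = cost;
        start[end] = begin;
      }
    }
  }

  if (best[text.length] === Infinity) return null; // no split uses only known words

  // Walk back from the end to recover the chosen words.
  const words = [];
  for (let end = text.length; end > 0; end = start[end]) {
    words.unshift(text.slice(start[end], end));
  }
  return words.join(' ');
}

// Hypothetical probabilities, for illustration only.
const probabilities = new Map([
  ['je', 0.03], ['pense', 0.002], ['pen', 0.0001], ['se', 0.02],
  ['rester', 0.001], ['sur', 0.01], ['paris', 0.001],
]);

console.log(segment('jepenserestersurparis', probabilities));
// → je pense rester sur paris
```

Here "pense" wins over "pen se" because 0.002 is larger than 0.0001 × 0.02. Unlike the package's output, this sketch returns the lowercase keys; restoring accents and capitals (Paris, TGV) would need a display form per entry.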

Building the dictionary

1. Download frwiki-latest-pages-articles.xml from …, extract it, and move it into the current folder.
2. Run node --max_old_space_size=4096 create-dictionary.js. It outputs two files; the one to use is words-fr-new.txt. The process may crash at some point because of excessive memory usage 🤷‍♂️.
3. Run node minimize-dictionary.js to slightly reduce the size of the dictionary. It outputs words-fr-new-min.txt, which is the final dictionary.
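The source of create-dictionary.js is not shown here, so its exact tokenization and output format are unknown. As a hedged illustration of the general idea, and of why memory is the bottleneck, the sketch below counts word occurrences in a Map (one entry per distinct word across the whole dump) and converts the counts into probabilities. `countWords`, `toProbabilities` and the word regex are assumptions.

```javascript
// Hypothetical sketch: count words over lines of text, then turn the counts
// into probabilities. Every distinct word stays in the Map until the end,
// which is why a full Wikipedia dump can need a large heap.
function countWords(lines) {
  const counts = new Map();
  for (const line of lines) {
    // Assumption: a word is a run of letters (accented ones included),
    // optionally joined by hyphens; \p{L} requires the "u" flag.
    for (const match of line.toLowerCase().matchAll(/\p{L}+(?:-\p{L}+)*/gu)) {
      counts.set(match[0], (counts.get(match[0]) ?? 0) + 1);
    }
  }
  return counts;
}

function toProbabilities(counts) {
  let total = 0;
  for (const count of counts.values()) total += count;
  // Most frequent words first, each paired with count / total.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([word, count]) => [word, count / total]);
}

const counts = countWords(['Je pense rester sur Paris.', 'Je pense prendre le TGV.']);
for (const [word, p] of toProbabilities(counts)) {
  console.log(`${word} ${p}`);
}
```

For a real dump, the lines would come from a stream (for example readline over fs.createReadStream) rather than an array, but the counts still grow with the vocabulary.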