@dangbh1002/language-data
v1.0.1
Generate language sentences data for every language in the world.
Build language data
If you are building a language learning application, good data is essential. This sample code helps you generate sentence data (indexed JSON data for every language in the world).
The raw data comes from the Tatoeba project and contains more than 6 million sentences across all languages.
Prepare
1. Install
Install via github:
git clone https://github.com/dangbh1002/language-data.git
Install via npm:
npm install @dangbh1002/language-data
2. Required
Download the 3 large files below at this link and put them in the ./rawData directory:
- links.csv
- sentences_with_audio.csv
- sentences.csv
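To illustrate how these files relate to each other, here is a minimal sketch of pairing sentences with their translations. It is not the package's actual code. It assumes the tab-separated layout of the Tatoeba exports (sentences.csv as id, language code, text; links.csv as sentence id, translation id), and the helper names parseSentences and pairSentences, as well as the sample ID 5000, are hypothetical.

```javascript
// Assumed layout (tab-separated):
//   sentences.csv: id <TAB> lang <TAB> text
//   links.csv:     sentenceId <TAB> translationId
// Tiny inline samples stand in for the real files; the ID 5000 is made up.
const sentencesTsv = [
  "1277\teng\tI have to go to sleep.",
  "5000\tvie\tTôi phải đi ngủ.",
  "1283\teng\tThe password is Muiriel.",
].join("\n");
const linksTsv = ["1277\t5000", "5000\t1277"].join("\n");

// Index every sentence by its ID.
function parseSentences(tsv) {
  const byId = new Map();
  for (const line of tsv.split("\n")) {
    if (!line) continue;
    const [id, lang, ...rest] = line.split("\t");
    byId.set(id, { lang, text: rest.join("\t") });
  }
  return byId;
}

// Keep only the links that go from fromLang to toLang.
function pairSentences(sentences, links, fromLang, toLang) {
  const byId = parseSentences(sentences);
  const pairs = [];
  for (const line of links.split("\n")) {
    if (!line) continue;
    const [srcId, dstId] = line.split("\t");
    const src = byId.get(srcId);
    const dst = byId.get(dstId);
    if (src && dst && src.lang === fromLang && dst.lang === toLang) {
      pairs.push({ [fromLang]: src.text, [toLang]: dst.text });
    }
  }
  return pairs;
}

const pairs = pairSentences(sentencesTsv, linksTsv, "eng", "vie");
console.log(pairs);
```

With the real files, which hold millions of lines, reading line by line from a stream would likely be preferable to loading whole strings into memory.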
3. Project tree
- build/
- node_modules/
- rawData/
  - links.csv
  - sentences_with_audio.csv
  - sentences.csv
- src/
  - language-data.js
  - letter-code.js
- .gitignore
- index.js
- package.json
- README.md
How To Use
1. Find the languages you want to translate between in the ./letter-code.js file.
/**
* This sample code only has 5 languages, but you can add any language you want.
* You can find the 3-letter code of every language in ./rawData/sentences.csv.
*/
// Keys are the language names passed on the command line.
let letterCode = {
"english": "eng",
"vietnamese": "vie",
"japanese": "jpn",
"french": "fra",
"german": "deu"
};
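As a rough illustration of how a language name could be turned into its 3-letter code, the sketch below looks a name up in a shortened copy of the map and fails loudly on unknown names. The resolveCode helper is hypothetical and is not part of this package.

```javascript
// Shortened copy of the map, for illustration only.
const letterCode = { english: "eng", vietnamese: "vie" };

// Hypothetical helper: resolve a language name to its 3-letter code.
function resolveCode(name) {
  const key = String(name).trim().toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(letterCode, key)) {
    const known = Object.keys(letterCode).join(", ");
    throw new Error(`Unsupported language "${name}". Known languages: ${known}`);
  }
  return letterCode[key];
}

console.log(resolveCode("English")); // eng
```

Normalizing the name before the lookup means "English" and "english" resolve to the same code, and the error message lists the names that are available.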
2. Run the command
npm start @OriginLanguage @TargetLanguage
3. Examples
- Build indexed JSON for translating English to Vietnamese:
npm start english vietnamese
- Build indexed JSON for translating English to Japanese:
npm start english japanese
- Build indexed JSON for translating Japanese to English:
npm start japanese english
4. Result
After running the command, you'll get the JSON file in the ./build directory.
For example: ./build/english-to-vietnamese.json
[
{"eng":"I have to go to sleep.","vie":"Tôi phải đi ngủ.","author":"tatoeba","soundID":"1277","syllables":6},
{"eng":"The password is Muiriel.","vie":"Mật mã là Muiriel.","author":"tatoeba","soundID":"1283","syllables":4},
{"eng":"I just don't know what to say.","vie":"Tôi không biết nên nói gì cả.","author":"tatoeba","soundID":"1288","syllables":7},
{"eng":"I don't know what you mean.","vie":"Tôi không biết ý của bạn là gì.","author":"tatoeba","soundID":"1408","syllables":6}
...
]
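Once a file like this exists, an application can filter the records by their fields. The sketch below uses the four sample records inline and selects the short sentences. The pickByLength helper is hypothetical, and the fields are used only as they appear in the sample.

```javascript
// The four sample records shown above, inlined so the sketch runs on its own.
const records = [
  { eng: "I have to go to sleep.", vie: "Tôi phải đi ngủ.", author: "tatoeba", soundID: "1277", syllables: 6 },
  { eng: "The password is Muiriel.", vie: "Mật mã là Muiriel.", author: "tatoeba", soundID: "1283", syllables: 4 },
  { eng: "I just don't know what to say.", vie: "Tôi không biết nên nói gì cả.", author: "tatoeba", soundID: "1288", syllables: 7 },
  { eng: "I don't know what you mean.", vie: "Tôi không biết ý của bạn là gì.", author: "tatoeba", soundID: "1408", syllables: 6 },
];

// Hypothetical helper: keep records whose "syllables" value is at most the limit.
function pickByLength(items, maxSyllables) {
  return items.filter((item) => item.syllables <= maxSyllables);
}

const short = pickByLength(records, 4);
console.log(short.map((item) => item.eng)); // [ 'The password is Muiriel.' ]
```

In a real application the array would more likely be loaded from the generated file, for example with JSON.parse(fs.readFileSync(path, "utf8")), instead of being written inline.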
Created by Brian Dhang. Powered by Tatoeba, Node.js, JavaScript, csvtojson and love.
All rights reserved.