@tremho/locale-string-tables
v2.0.1
locale-string-tables
Foundational code for i18n locale based string table lookup
This module allows localization strings to be provided for an array of languages and regions. It has a cascading structure: the more specific language + region entries supersede the more general language-only entries, and lookups finally fall back to a 'common' group for strings that are not localized.
Pluralization is also supported, as are ordinals (as of 1.0.0).
This module is not necessarily meant to provide the end-all-be-all API for your application and its localization needs, but rather the foundation for one that you may create within your application. It is an implementation of the time-honored design of using key-value data sets to represent strings by identifiers that are switched in scope per the locale setting.
Since the strings are managed separately and as part of your application's resources, you can easily choose whether you wish to keep all the string tables for all locales bundled with a single distributed application, or simply exclude those that do not apply to certain regions in order to save bundle size.
In memory, only those languages that have been loaded into scope occupy JavaScript array space. Unused locales in the data files remain on disk.
Revision History
v 1.1.0 (prerelease)
- new API:
  - added enumerateAvailableLocales()
  - added
- changed API (breaking)
  These changes only affect apps needing to declare a FileOps object:
  - FileOps no longer supports property rootPath.
  - FileOps now requires property i18nPath, to indicate the i18n folder directly.
  - Similarly, the init parameter 'customLocation' behavior is modified, as it is no longer relative to the (now non-existent) rootPath.
- non-API changes
  - Fix major bug in loading of pluralRules scripts.
  - documentation updates
v 1.0.0 - 1.0.1
- Initial release and subsequent doc updates
Setting up for use in an application
At the root of your project, create a folder named "i18n". Other names are possible, but this is the standard default used by the library.
Within this folder, you may create a folder named 'common'. This folder will hold string definitions that are independent of language.
Create a new folder for each language that you will support, using the ISO 639 2-letter language code (lower case) for that language (e.g. 'en' or 'fr'). In this language folder you will create files that define strings for a particular language, independent of region: for example, English, regardless of whether it's US or GB.
Create a new folder for each language-region you will support, using the ISO 639 2-letter language code (lower case) followed by a dash ('-') and the ISO 3166 2-letter region code (upper case), as in RFC 1766 (e.g. 'en-US'). In these folders you will create files that define strings unique to that language-region: for example, idioms and phrases, or format order and detail.
You may also create folders named 'common-<region>' where <region> is the
ISO 3166 code for regions that you will support.
In these folders, you may wish to define strings (such as formats or other non-literal text)
that apply to a geographical region regardless of the language.
Your i18n folder tree might look like this, for instance:

```
i18n
  common
    serviceEndpoints.json
    metricUnits.json
  common-US
    USImpUnits.json
  en
    dateStrings.json
    welcome.json
    account.json
    order.json
  en-GB
    dateStrings.json
    welcome.json
    account.json
    order.json
  es
    dateStrings.json
    welcome.json
    account.json
    order.json
  fr
    dateStrings.json
    welcome.json
    account.json
    order.json
  fr-CA
    welcomeCA.json
    orderCA.json
```
The .json files in each folder can have any name, but must have the .json extension and must contain valid JSON. You may wish to choose file names that represent the strings for use in various parts of your application, or follow another scheme.
Each identifier resolves to a single value for a given locale: if an identifier exists in more than one place, the one with the greatest hierarchical priority is used for the chosen locale.
For example, suppose strings representing formats and unit names for the metric system were defined in metricUnits.json and placed in the common folder. These strings will apply to all languages unless overridden. Since the metric system is used by most of the world, this is convenient. However, since the US does not use this system, the same string identifiers are defined in a file we'll call USImpUnits.json, which is placed in the common-US folder. Since the common-US folder is more specific than the common folder, these strings take precedence for the US region.
Because it is in the common-US folder, any locale language that uses the US as its region will use the US standard definitions for these unit strings, unless further overridden by a language or language-region entry.
Precedence always occurs from most specific to least, so, in order of precedence:
- lang-region
- lang
- common-region
- common
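To make the precedence order concrete, here is a minimal sketch of the lookup cascade. It assumes each folder's files have already been merged into one flat table per folder; `cascadeFor`, `resolveString` and the `tables` map are hypothetical illustrations, not this library's API or internal code.

```typescript
// Hypothetical sketch of the i18n lookup cascade; not the library's internal code.
type Table = Record<string, string>

// Folder names to search, most specific first, for a locale such as 'en-US'.
function cascadeFor(locale: string): string[] {
  const [lang, region] = locale.split('-')
  const order: string[] = []
  if (region) order.push(`${lang}-${region}`) // lang-region
  order.push(lang)                            // lang
  if (region) order.push(`common-${region}`)  // common-region
  order.push('common')                        // common
  return order
}

// Return the first definition of `id` found along the cascade, or undefined.
function resolveString(tables: Record<string, Table>, locale: string, id: string): string | undefined {
  for (const folder of cascadeFor(locale)) {
    const value = tables[folder]?.[id]
    if (value !== undefined) return value
  }
  return undefined
}

// Example data mirroring the metric / US units discussion above.
const tables: Record<string, Table> = {
  common: { 'unit.distance': 'km', greeting: 'Hi' },
  'common-US': { 'unit.distance': 'mi' },
  en: { greeting: 'Hello' },
  'en-GB': { greeting: 'Hello there' },
}

console.log(resolveString(tables, 'en-US', 'unit.distance')) // mi
console.log(resolveString(tables, 'en-GB', 'unit.distance')) // km
console.log(resolveString(tables, 'fr-CA', 'greeting'))      // Hi
```

Note that a missing folder (such as common-GB here) is simply skipped, so the cascade degrades gracefully to the next, more general level.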
Note that strings are just that: Strings of text. Most commonly, these are translations of words and phrases into different languages, but they may also be format templates, URLs, keywords, or other text not necessarily meant for human reading, but nevertheless to be in context when in the prescribed locale scope. You might think of these types of string resources almost like configuration settings.
A string table .json file entry is a straightforward key/value association between string identifiers (string IDs) and the value of that string in the target language.
Identifiers can be any legal JSON string you like. However, the convention adopted and promoted by this library is to use prefix notation to help identify the context of the string and its use in the application. We've adopted the convention of using dots (.) to separate prefix and suffix portions of the identifier. You may choose to do the same, or you may wish to use something different, such as dashes, slashes, spaces, or camelCase. Whatever convention you choose, try to be consistent, as string IDs will multiply and become complex to manage as your app grows.
Using @tremho/gen-format
Consider using the npm package @tremho/gen-format, which provides generalized formatting support including localization for Date/Time among other things, and is based upon this library. If you set up gen-format using the i18n tables that are associated with it, you will have a pre-established localization structure you can continue to populate with your own strings across a large number of language locales.
If you use gen-format, you do not need to import or set up this @tremho/locale-string-tables module independently.
Using locale-string-tables
Please note that all the code examples used here are written in TypeScript. If using plain JavaScript, please convert the import and export statements to valid require and module.exports form (or whatever other module syntax your framework may use).
You must create an instance of locale-string-tables and use this in your application. The steps to creating and initializing the instance are as follows:
Install locale-string-tables. If you haven't already, you can install it as follows:
npm install @tremho/locale-string-tables
Define a 'FileOps' object. This is simply an object with the following members:
- function read(pathname): a function that reads and returns the text from the given pathname.
- function enumerate(relDir, fileCallback): a function that recursively enumerates the directories starting at the folder designated by relDir, a relative pathname as referenced from your project root (in which you have placed your i18n folder tree). This function sends the files it finds within this tree back through fileCallback, with the fully realized pathname of the file as the argument.
- string property or getter function i18nPath: returns the relative or absolute path to the location of the i18n folder that holds the locale strings, usually off of the application root path.
Since different applications may use this module in different contexts, it is up to the application to supply these basic file operations. If you are working with Node, the following code will work. If you are relying on a different platform file system, you will need to adjust to match your platform.
NodeFileOps.ts
```typescript
import * as fs from 'fs'
import * as path from 'path'

// project root that relative paths are resolved against
const root = './'

class NodeFileOps {
  // read a text file, returning its contents as a string
  read(realPath: string): string {
    return fs.readFileSync(realPath, 'utf8')
  }

  // enumerate all files within the given folder tree,
  // sending the paths of files found through the callback
  enumerate(dirPath: string, callback: (filePath: string) => void) {
    const apath = path.normalize(path.join(root, dirPath))
    if (!fs.existsSync(apath)) {
      console.warn('warning: path not found ' + apath)
      return
    }
    const entries = fs.readdirSync(apath)
    entries.forEach(file => {
      const pn = path.join(root, dirPath, file)
      const state = fs.lstatSync(pn)
      if (state.isDirectory()) {
        this.enumerate(path.join(dirPath, file), callback)
      } else {
        callback(pn)
      }
    })
  }

  // property (or getter) that provides the path
  // to the `i18n` folder tree itself
  get i18nPath() { return './i18n/' }
}

// note that we instantiate this class before exporting
export default new NodeFileOps()
```
Create a module for your instance. In this example, we'll call this module i18n. It should start out looking something like this:
i18n.ts
```typescript
import {getSystemLocale, LocaleStrings} from '@tremho/locale-string-tables'
// Have your FileOps ready (change this import line to suit your own FileOps object).
// NodeFileOps.ts exports an instance as its default export.
import NodeFileOps from './NodeFileOps'

// Construct the instance
const i18n = new LocaleStrings()
// init it with your FileOps object
i18n.init(NodeFileOps)

// (optional) preload locales you wish to use
// (they will load on demand anyway if you choose not to do that here)
i18n.loadForLocale(getSystemLocale())
i18n.loadForLocale('en')
i18n.loadForLocale('en-GB')
i18n.loadForLocale('fr-FR')
i18n.loadForLocale('fr-CA')

// set the current locale
i18n.setLocale(getSystemLocale())

// export this for your app to use
export default i18n
```
Use and apply in your own modules
```typescript
import i18n from './i18n'

function someFunction() {
  // first English
  i18n.setLocale('en-US')
  let greet = i18n.getLocaleString('example.greeting')
  console.log(greet)
  // then French
  i18n.setLocale('fr-FR')
  greet = i18n.getLocaleString('example.greeting')
  console.log(greet)
  // then Spanish
  i18n.setLocale('es-ES')
  greet = i18n.getLocaleString('example.greeting')
  console.log(greet)
}
```
In this hypothetical module, if someFunction is called, it will attempt to display a "hello" greeting in each of the three selected languages. What actually happens when you run this will depend upon your setup.
Create a file named example.json with the following contents and place it in a folder at i18n/en:
example.json
```json
{
  "example.greeting" : "Hello"
}
```
Copy this file to folders at i18n/es and i18n/fr also, and then edit these so that the one in the fr folder looks like this:
```json
{
  "example.greeting" : "Bonjour"
}
```
and the one in the i18n/es folder like this:
```json
{
  "example.greeting" : "Hola"
}
```
Now when you run your app, it should work as expected. If you ran this program before supplying these strings, or with one of the referenced languages missing, you would see warning messages as well as the string "%$$>example.greeting<$$%" returned.
See the API docs for more on managing this behavior and using strings.
API
getSystemLocale
Interrogates services of the underlying platform to determine the current system locale. For browser contexts, this comes from the window.navigator object. For NativeScript, this comes from the platform info. Node does not have a convenient means of identifying this, so it falls to the default case, which is to assign the system locale as 'en-US'.
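If an application running under Node wants a better guess than that 'en-US' default, one option outside this library is to read the locale that the JavaScript runtime's Intl implementation resolved from its environment, and pass the result to setLocale yourself. The `detectLocale` helper below is a hypothetical sketch, not part of @tremho/locale-string-tables.

```typescript
// Hypothetical helper; not part of @tremho/locale-string-tables.
function detectLocale(fallback = 'en-US'): string {
  // Browsers (and some server runtimes) expose navigator.language.
  const nav = (globalThis as unknown as { navigator?: { language?: string } }).navigator
  if (nav && nav.language) return nav.language
  // Otherwise ask Intl which default locale the runtime resolved.
  try {
    const resolved = new Intl.DateTimeFormat().resolvedOptions().locale
    if (resolved) return resolved
  } catch {
    // Intl missing or unusable: use the fallback below.
  }
  return fallback
}

console.log(detectLocale())
```

The value returned depends on the runtime and its environment (for example, the LANG variable or ICU data available to Node), so treat it as a hint rather than a guarantee.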
init
We must initialize LocaleStrings with a FileOps object that contains the following members:
- read(filepath): reads text from the given true file path (e.g. fs.readFileSync).
- enumerate(relDir, fileCallback): recursively enumerates the folder (relative to the presumed root) and calls back with the full file path of each file.
- i18nPath: a string property or getter function that returns the relative or absolute path to the presumed i18n folder, usually at project root.
Parameters
- fileOps (FileOps): object containing the necessary file operations for this environment
- customLocation (string?): optional path that overrides the i18nPath in FileOps as the folder off of root for the locale string files
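For unit tests, or for platforms where the string tables are bundled rather than read from disk, the three FileOps members can be backed by an in-memory map. This is a sketch under assumptions: `MemoryFileOps` and `FileOpsLike` are hypothetical names that simply mirror the read, enumerate and i18nPath members described above; they are not exported by this library.

```typescript
// Hypothetical shape mirroring the documented FileOps members.
interface FileOpsLike {
  read(realPath: string): string
  enumerate(relDir: string, fileCallback: (fullPath: string) => void): void
  readonly i18nPath: string
}

// Hypothetical in-memory FileOps: keys are file paths, values are file contents.
class MemoryFileOps implements FileOpsLike {
  readonly i18nPath: string
  private readonly files: Record<string, string>

  constructor(files: Record<string, string>, i18nPath = 'i18n/') {
    this.files = files
    this.i18nPath = i18nPath
  }

  read(realPath: string): string {
    const text = this.files[realPath]
    if (text === undefined) throw new Error(`file not found: ${realPath}`)
    return text
  }

  // Paths are flat keys, so "recursive" enumeration is a prefix match on the folder.
  enumerate(relDir: string, fileCallback: (fullPath: string) => void): void {
    const prefix = relDir.endsWith('/') ? relDir : relDir + '/'
    for (const filePath of Object.keys(this.files)) {
      if (filePath.startsWith(prefix)) fileCallback(filePath)
    }
  }
}

const ops = new MemoryFileOps({
  'i18n/en/example.json': '{ "example.greeting": "Hello" }',
  'i18n/fr/example.json': '{ "example.greeting": "Bonjour" }',
})
const found: string[] = []
ops.enumerate('i18n/fr', p => found.push(p))
console.log(found, JSON.parse(ops.read(found[0]))['example.greeting'])
```

Assuming init accepts any object with these three members, an instance like this could be passed to i18n.init in place of the NodeFileOps object from the setup section.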
loadForLocale
Loads the translation string tables for a given locale, but does not change the current setting.
Parameters
- locale (string): the RFC 1766 language-region specifier
Returns LoadStats: a LoadStats object containing details of the loaded table
setLocale
Switches to a new locale. If the locale has not been previously loaded, it is loaded now.
Parameters
- locale (string): the RFC 1766 language-region specifier
Returns LoadStats: a LoadStats object containing details of the table that has been set
isLocaleLoaded
Tests to see if the given locale is loaded.
Parameters
- locale (string): the RFC 1766 language-region specifier
Returns boolean: true if the specified locale has been loaded
hasLocaleString
Tests to see if the given string Id can be found in the current locale table.
Parameters
- id (string): the string identifier to find in the current locale
Returns boolean: true if the specified string id exists in the currently active table
getLocaleString
Returns the localized string according to the current locale for the string Id passed.
If the string does not exist, the value supplied by useDefault is returned instead.
If useDefault is undefined, a decorated version of the string ID is returned, as "%$$>string.id<$$%".
If silent is not true, the console will emit a warning indicating that the requested string Id is not found in the table, and will show the default or decorated return value also. This may be useful in reconciling string tables. Pass true for the silent option to prevent these messages.
Parameters
- id (string): the string identifier to find in the current locale
- useDefault (string | undefined): the string to return if the id is not found in the table. Omit it, or pass undefined, to have a 'decorated' version of the id returned in this case.
- silent (boolean): if the string id is not found, a warning is emitted to the console. Passing true here silences these warnings.
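To make the fallback order concrete, the following sketch restates the rules above against a plain object. `lookupString` is a hypothetical name, not a library export, and the warning text is illustrative; the library's actual message may differ.

```typescript
type Table = Record<string, string>

// Hypothetical re-statement of the documented getLocaleString fallback rules.
function lookupString(table: Table, id: string, useDefault?: string, silent = false): string {
  if (Object.prototype.hasOwnProperty.call(table, id)) return table[id]
  // Not found: use the caller's default, or else a decorated copy of the id.
  const result = useDefault !== undefined ? useDefault : `%$$>${id}<$$%`
  if (!silent) console.warn(`string id "${id}" not found; returning "${result}"`)
  return result
}

const strings: Table = { 'example.greeting': 'Hello' }
console.log(lookupString(strings, 'example.greeting'))                  // Hello
console.log(lookupString(strings, 'example.farewell', 'Goodbye', true)) // Goodbye
console.log(lookupString(strings, 'example.farewell', undefined, true)) // %$$>example.farewell<$$%
```

The decorated form is deliberately conspicuous, so a missing translation is easy to spot on screen even when warnings are silenced.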
populateObjectStrings
Traverses the object (deep by default, or without recursion if shallow is true) looking for string properties that begin with '@'. These strings are parsed as @token:default, meaning that the substring following the '@' character, up to the first ':' character or else to the end of the string, is used as a token into the locale string table. If there is a ':' character in the string, the substring following it is used as the default if the string table does not have the token entry.
This is effectively equivalent to getLocaleString(token, default) for the strings converted. This method is a convenient means of translating many strings at once and of populating objects with values that include localizable string data.
Note that this version translates in place, without a return object. This makes it unsuitable for re-translation, but useful for passing functional objects.
Parameters
- obj (object): Object to be traversed for '@token' and '@token:default' patterns.
- shallow (boolean?): Optional; if true, recursion is prohibited.
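The in-place traversal described above can be sketched as follows, using a plain object as the string table. `parseTokenDefault` and `populateInPlace` are hypothetical names, not the library's implementation; missing tokens without a default fall back to the decorated id, following the getLocaleString rules.

```typescript
type Table = Record<string, string>

// Split '@token:default' into its parts; null if the string is not a token reference.
function parseTokenDefault(s: string): { token: string; def?: string } | null {
  if (!s.startsWith('@')) return null
  const body = s.slice(1)
  const colon = body.indexOf(':')
  return colon < 0 ? { token: body } : { token: body.slice(0, colon), def: body.slice(colon + 1) }
}

// Replace '@token[:default]' strings in place; recurse into nested objects unless shallow.
function populateInPlace(obj: Record<string, unknown>, table: Table, shallow = false): void {
  for (const key of Object.keys(obj)) {
    const value = obj[key]
    if (typeof value === 'string') {
      const parsed = parseTokenDefault(value)
      if (parsed) obj[key] = table[parsed.token] ?? parsed.def ?? `%$$>${parsed.token}<$$%`
    } else if (!shallow && value !== null && typeof value === 'object') {
      populateInPlace(value as Record<string, unknown>, table, shallow)
    }
  }
}

const menuTable: Table = { 'menu.file': 'File', 'menu.edit': 'Edit' }
const menu = { title: '@menu.file', items: { edit: '@menu.edit', help: '@menu.help:Help' }, id: 42 }
populateInPlace(menu, menuTable)
console.log(JSON.stringify(menu)) // {"title":"File","items":{"edit":"Edit","help":"Help"},"id":42}
```

Because the template strings are overwritten, calling this again after a locale change finds no '@' strings left to translate, which is why translateObjectStrings is preferred for re-translation.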
translateObjectStrings
Preferred method of translating a set of strings. See populateObjectStrings for a general description.
However, this makes a COPY of the passed-in object with the translated values, which allows the original to be used for re-translation more easily.
Parameters
- obj (object): Object to be traversed for '@token' and '@token:default' patterns.
- shallow (boolean?): Optional; if true, recursion is prohibited.
Returns object: the resulting object with translated strings.
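A copying variant leaves the template object untouched, so the same template can be translated once per locale. This is another hypothetical sketch (`translateValue` and `translateCopy` are illustrative names, not library exports); a separate table per locale stands in for calling setLocale, and arrays are carried over as-is for brevity.

```typescript
type Table = Record<string, string>

// Translate one '@token[:default]' string; other strings pass through unchanged.
function translateValue(s: string, table: Table): string {
  if (!s.startsWith('@')) return s
  const body = s.slice(1)
  const colon = body.indexOf(':')
  const token = colon < 0 ? body : body.slice(0, colon)
  const def = colon < 0 ? undefined : body.slice(colon + 1)
  return table[token] ?? def ?? `%$$>${token}<$$%`
}

// Build a translated copy of `obj`; nested objects are copied too unless `shallow` is set.
function translateCopy(obj: Record<string, unknown>, table: Table, shallow = false): Record<string, unknown> {
  const out: Record<string, unknown> = {}
  for (const [key, value] of Object.entries(obj)) {
    if (typeof value === 'string') {
      out[key] = translateValue(value, table)
    } else if (!shallow && value !== null && typeof value === 'object' && !Array.isArray(value)) {
      out[key] = translateCopy(value as Record<string, unknown>, table, shallow)
    } else {
      out[key] = value // numbers, arrays, or (when shallow) nested objects are carried over as-is
    }
  }
  return out
}

const template = { title: '@dialog.title:Untitled', buttons: { ok: '@button.ok' } }
const en = translateCopy(template, { 'dialog.title': 'Save changes?', 'button.ok': 'OK' })
const fr = translateCopy(template, { 'dialog.title': 'Enregistrer ?', 'button.ok': 'OK' })
console.log(en.title, '|', fr.title, '|', template.title) // Save changes? | Enregistrer ? | @dialog.title:Untitled
```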
getTokenDefault
Parses an incoming string for possible localized substitutions.
The string is searched for patterns of the form @token:default, meaning that the substring following the '@' character to the first occurrence of a ':' character, or else the remainder of the string, is used as a token into the locale string table.
If there is a ':' character in the string, the substring following it, up to the next '@' or the end of the string, is used as the default if the string table does not have the token entry.
This is effectively equivalent to getLocaleString(token, default) for the strings converted.
If a literal '@' or ':' is desired, use @@ and ::, respectively.
Parameters
- inStr (string): the string with @token:default substitutions to make
- silent
Returns string: the translated or default string.
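The scanning rules above can be sketched as a small character-by-character parser. This is a hypothetical illustration (`expandTokens` and the `Lookup` callback are not library names), and it makes one assumption the text leaves open: a token without a default also ends at the next '@', so several tokens can appear in one string.

```typescript
type Lookup = (token: string) => string | undefined

// Hypothetical sketch of '@token:default' expansion; not the library's implementation.
// Assumption: a token ends at ':', at the next '@', or at the end of the string.
function expandTokens(inStr: string, lookup: Lookup): string {
  let out = ''
  let i = 0
  while (i < inStr.length) {
    const ch = inStr[i]
    if ((ch === '@' || ch === ':') && inStr[i + 1] === ch) {
      out += ch // '@@' and '::' are literal escapes
      i += 2
    } else if (ch !== '@') {
      out += ch // ordinary text is copied through
      i++
    } else {
      i++ // skip the '@'
      let token = ''
      while (i < inStr.length && inStr[i] !== ':' && inStr[i] !== '@') token += inStr[i++]
      let def: string | undefined
      if (inStr[i] === ':') {
        i++ // skip the ':'
        def = ''
        while (i < inStr.length) {
          if ((inStr[i] === '@' || inStr[i] === ':') && inStr[i + 1] === inStr[i]) {
            def += inStr[i] // escaped literal inside a default
            i += 2
          } else if (inStr[i] === '@') {
            break // the next token starts here
          } else {
            def += inStr[i++]
          }
        }
      }
      // Same fallback order as getLocaleString(token, default).
      out += lookup(token) ?? def ?? `%$$>${token}<$$%`
    }
  }
  return out
}

const greetings: Record<string, string> = { greet: 'Hello' }
const lookup: Lookup = token => greetings[token]
console.log(expandTokens('@greet:Hi@sep:, @who:friend', lookup)) // Hello, friend
```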
getPluralizedString
Provides pluralization support.
Also provides ordinal counting support. That is, return the "nth item".
In English, pluralization is pretty simple: You either have a singular or a plural.
The string tables alone could be used here: look up word.plural for counts != 1 and make sure the table has the correct entries (e.g. 'dog' and 'dogs', 'sheep' and 'sheep', 'ox' and 'oxen').
Further, we could eliminate the need for too many duplicates by adding rules (i.e. append 's' by default) that are overridden if there is a string table '.plural' entry.
Other languages are not so simple. See discussion online for this topic in detail. For example, Arabic or Russian (and other languages) support multiple forms of pluralized words for common items depending upon the count. There may be a different name for a 'few' things than for 'many'. Or for 'zero'. Or when fractional amounts are involved. Some languages use different wording for counts with 1 as the last digit. And so it goes. Using tables with suffixes will work for all of these, but one must prepare.
The ECMAScript Internationalization API (ECMA-402) specification for Intl.PluralRules and its select method supports the following plural results:
'zero', 'one', 'two', 'few', 'many' and 'other' (where 'other' is synonymous with 'plural' by default).
This in turn should be used to look up the corresponding correct word form in the i18n table.
The i18n table for the language / locale must contain the word referenced in singular form and may also require the pluralized form(s) as needed (per language).
The pluralized form of the word is held in an id that is the same as the singular word identifier plus a suffix (e.g. '.plural'). For example:

```json
{
  "item.cow" : "cow",
  "item.cow.plural" : "cows",
  "item.sheep": "sheep",
  "item.sheep.plural" : "sheep"
}
```
In other languages, one may use the other suffixes: ".zero", ".two", ".few", ".many" or ".plural".
Note that these map directly to the terminology of the Intl.PluralRules specification, but with these exceptions:
- The PluralRules 'one' is not used. The "no-suffix" original identifier is used.
- The PluralRules 'other' is changed to 'plural' as the suffix (more semantically aligned to English, at least).
Note that "simple plurals" need not be literally provided in the table if the pluralization script can assign pluralization correctly. For instance, in the above example, "item.cow" need not have a literal "item.cow.plural" complement, since the word "cow" can be automatically pluralized to "cows" correctly. However, "item.sheep" will probably need the literal entry to prevent the algorithm from naming it as "sheeps".
Automatic pluralization is the domain of the findPlural method. The pluralRules-en.js script provided supplies the simple version for English, and handles appended "s" or "es" in most common cases, but does not handle exceptions (so use literals when in doubt).
The getPluralizedString method encapsulates this into a single place. However, it requires proper setup to be useful.
Specific plural string ids can be placed directly into the string table data by including the suffix ".plural". For example:
animal.names.cow = 'cow'
animal.names.cow.plural = 'cows'
animal.names.sheep = 'sheep'
animal.names.sheep.plural = 'sheep'
- As noted, some languages pluralize differently depending upon the count; for example, there may be different words for 1 cow, 2 cows, 6 cows, or 20 cows.
  - This is the role of getPluralRulesSelect (see below), or Intl.PluralRules.select() if available.
  - This should be supported by relevant pluralRules scripts where possible and practical.
- To support this behavior using the string tables, append the 'select' result to the string id, as in:
  animal.name.cow
  animal.name.cow.two
  animal.name.cow.few
  animal.name.cow.many
  animal.name.cow.plural
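The mapping from an Intl.PluralRules category to the suffix convention described above fits in a few lines. In this hypothetical sketch, `pluralSuffix` and `lookupPlural` are illustrative names, and the fallback chain (exact category, then '.plural', then the singular entry) is an assumption, not documented library behavior.

```typescript
type Table = Record<string, string>

// Map an Intl.PluralRules category to the suffix convention described above.
function pluralSuffix(category: string): string {
  if (category === 'one') return ''          // 'one' uses the bare identifier
  if (category === 'other') return '.plural' // 'other' is spelled '.plural'
  return '.' + category                      // '.zero', '.two', '.few', '.many'
}

// Hypothetical lookup: exact category entry, then '.plural', then the singular entry.
function lookupPlural(table: Table, locale: string, id: string, count: number): string | undefined {
  const category = new Intl.PluralRules(locale).select(count)
  return table[id + pluralSuffix(category)] ?? table[id + '.plural'] ?? table[id]
}

const herd: Table = {
  'item.cow': 'cow', 'item.cow.plural': 'cows',
  'item.sheep': 'sheep', 'item.sheep.plural': 'sheep',
}
console.log(lookupPlural(herd, 'en', 'item.cow', 1))   // cow
console.log(lookupPlural(herd, 'en', 'item.cow', 3))   // cows
console.log(lookupPlural(herd, 'en', 'item.sheep', 0)) // sheep
```

For English, Intl.PluralRules reports 'other' for zero, so "0 cows" uses the '.plural' entry; languages with a distinct zero category would look for '.zero' first.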
Besides the string tables, the other source for pluralization is the pluralRules script. This code is within a script named for the language, as in pluralRules-en.js for the en language.
This script may supply each of three methods. These are optional, and default behavior will occur if not defined.
- getPluralRulesSelect takes two arguments:
  - count: The number of items
  - type: [optional] is one of 'cardinal' or 'ordinal'. 'cardinal' is the default. 'ordinal' is not yet supported here. See the PluralRules definitions of these.
  The function should return per PluralRules.select for this language. It may choose to implement this directly as a pass-through to Intl.PluralRules if that is available. It must return one of 'zero', 'one', 'two', 'few', 'many', or 'other' accordingly. Note that for English, the allowed returns (for type 'cardinal') are 'one' and 'other'. (Future support for 'ordinal' in English will define the other return values per the Intl spec.)
- findPlural takes two arguments:
  - single: The word in singular form
  - count: The count to pluralize to
  The function should return the pluralized version of the word in that form, either by rule or internal lookup, or else return null.
If the plural rule script is not available, the Intl.PluralRules method will be used directly to get the correct plural-suffixed string from the i18n table, assuming a full or partial implementation of ECMAScript Intl is available to the system.
If the stringId has no singular entry in the table, then an empty string will be returned.
If none of these support features are available, all requests will return a string "%$$%" (where lang is the language requested).
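As an illustration of the two hooks just described, here is a hypothetical English sketch. It borrows the getPluralRulesSelect and findPlural names and argument lists from the text above, but the module shape and how the library loads the script are not shown here, and the real pluralRules-en.js shipped with the library may be written differently.

```typescript
// Hypothetical English sketch of the pluralRules hooks; not the shipped pluralRules-en.js.

// Pass-through to Intl.PluralRules, as the text above allows.
function getPluralRulesSelect(count: number, type: 'cardinal' | 'ordinal' = 'cardinal'): string {
  return new Intl.PluralRules('en', { type }).select(count)
}

// Rule-based English plural: "es" after sibilant endings, otherwise "s".
// Irregular words (sheep, ox) are not handled; give those literal '.plural' table entries.
function findPlural(single: string, count: number): string | null {
  if (!single) return null
  if (getPluralRulesSelect(count) === 'one') return single
  if (/(s|x|z|ch|sh)$/i.test(single)) return single + 'es'
  return single + 's'
}

console.log(findPlural('cow', 2), findPlural('box', 3), findPlural('cow', 1)) // cows boxes cow
```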
Ordinal support in the pluralRules script
The pluralRules-<lang>.js script may also include support for ordinals by supplying the following function:
- makeOrdinal takes two arguments:
  - single: The word in singular form
  - count: The count to make ordinal
  The function should return the correct ordinal form of the word, either by rule or internal lookup, or else return null. The ordinal should contain the word as well, not just the count. For example, in English, asking for 0, 1, 2, 3, 11 and 99 'cows' would return "zeroeth cow", "first cow", "second cow", "third cow", "eleventh cow", "99th cow".
If the ordinal fails, the word itself is returned unmodified.
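A hypothetical English makeOrdinal along those lines is sketched below. The cutoff at which it switches from spelled-out words to digits (after "twentieth") is an assumption for illustration; the numeric suffix comes from the CLDR ordinal category that Intl.PluralRules reports for English ('one' to "st", 'two' to "nd", 'few' to "rd", 'other' to "th").

```typescript
// Hypothetical English makeOrdinal sketch; the library's pluralRules-en.js may differ.
const ORDINAL_WORDS = [
  'zeroeth', 'first', 'second', 'third', 'fourth', 'fifth', 'sixth', 'seventh',
  'eighth', 'ninth', 'tenth', 'eleventh', 'twelfth', 'thirteenth', 'fourteenth',
  'fifteenth', 'sixteenth', 'seventeenth', 'eighteenth', 'nineteenth', 'twentieth',
]

// Numeric suffix per English ordinal category reported by Intl.
const ORDINAL_SUFFIX: Record<string, string> = { one: 'st', two: 'nd', few: 'rd', other: 'th' }
const englishOrdinals = new Intl.PluralRules('en', { type: 'ordinal' })

// Returns e.g. "third cow" or "22nd cow"; null for counts it cannot handle.
function makeOrdinal(single: string, count: number): string | null {
  if (!Number.isInteger(count) || count < 0) return null
  const ordinal = count < ORDINAL_WORDS.length
    ? ORDINAL_WORDS[count]
    : `${count}${ORDINAL_SUFFIX[englishOrdinals.select(count)]}`
  return `${ordinal} ${single}`
}

console.log([0, 1, 2, 3, 11, 99].map(n => makeOrdinal('cow', n)).join(', '))
// zeroeth cow, first cow, second cow, third cow, eleventh cow, 99th cow
```

Returning null (rather than the bare word) leaves the "return the word unmodified" fallback to the caller, matching the division of labor described above.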
Parameters
- locale (string): The locale to pluralize this id for. If not given, the system locale is used.
- stringId (string): The i18n string identifier for the singular form of the word to pluralize
- count (number): The number of items involved in the pluralization or ordinal count
- type (string?): allowed types are 'cardinal' and 'ordinal'; the default is 'cardinal'.
Returns string
pluralize
Uses the application-supplied pluralRules-<lang>.js script to pluralize a word, or return it in ordinal form.
See the discussion of the pluralRules script in the documentation for getPluralizedString.
This function works the same way, but you pass the singular form of the word itself, not an identifier to look up in the tables.
pluralRules-en.js is provided by this library. Other languages do not have pluralRules scripts supplied.
Parameters
- locale (string): The locale to pluralize this word for. If not given, the system locale is used.
- word (string): The word to be pluralized or ordinated, in singular form
- count (number): The number of items involved in the pluralization or ordinal count
- type (string?): allowed types are 'cardinal' and 'ordinal'; the default is 'cardinal'.
getInstalledLocales
Returns an array of all the locales that have been currently loaded.
Returns Array<string>: array of loaded locale strings
enumerateAvailableLocales
Enumerates all available locales by lang-region identifier via a callback function that accepts the locale name as a string parameter.
This walks the i18n folder tree to determine which potential locales are available. This differs from getInstalledLocales, which only lists those that have been loaded into memory.
Note that folders in the i18n tree that do not contain files will not be enumerated.
The 'common' folders are not included in the enumeration, just the named languages and regions.
Parameters
- callback: a function that accepts the locale identifier as a string. This function will be called for each locale enumerated.
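The two rules above (empty folders are skipped, 'common' folders are excluded) fall out naturally if locales are derived from the files that a FileOps enumerate call reports. This is a hypothetical sketch of that idea; `localesFromPaths` is not a library function.

```typescript
// Hypothetical sketch: derive locale names from file paths like those FileOps.enumerate reports.
function localesFromPaths(paths: string[], i18nRoot = 'i18n/'): string[] {
  const found = new Set<string>()
  for (const p of paths) {
    const normalized = p.replace(/\\/g, '/') // tolerate Windows separators
    const at = normalized.indexOf(i18nRoot)
    if (at < 0 || !normalized.endsWith('.json')) continue
    const parts = normalized.slice(at + i18nRoot.length).split('/')
    if (parts.length < 2) continue // a file directly inside the i18n folder
    const folder = parts[0]
    // 'common' and 'common-<region>' folders are not locales.
    if (folder === 'common' || folder.startsWith('common-')) continue
    found.add(folder) // only folders that actually contain files get here
  }
  return Array.from(found).sort()
}

const samplePaths = [
  'i18n/common/metricUnits.json',
  'i18n/common-US/USImpUnits.json',
  'i18n/en/welcome.json',
  'i18n/en/order.json',
  'i18n/en-GB/welcome.json',
  'i18n/fr-CA/orderCA.json',
]
localesFromPaths(samplePaths).forEach(locale => console.log(locale)) // en, en-GB, fr-CA
```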
StringTable
String Tables are a simple name/value pairing from a JSON file. This can be used as the basis for configuration, localization, or other common mappings.
All string table files are relative to the app folder root.
getString
Returns a string from the string table
Parameters
- name
Returns string
setString
Sets the value of a string identifier.
Parameters
- name
- value
numStrings
Returns the number of strings in this table
load
Loads string values from a JSON file on disk. This is an asynchronous, non-blocking call that returns a Promise.
Parameters
- filePath
- silent (boolean?): true to suppress the file-not-found error. Other errors may still throw.
Returns Promise