Internationalization (i18n) and localization (l10n)¶
Babel¶
Marking strings for translation¶
In order to translate strings, they have to be marked. This ensures that they will be picked up by the extraction process later on. Only strings which are visible in the UI have to be translated.
Python¶
Warning
Do not use f-strings
for translatable text. Interpolation is not supported for these types of strings (see Issue).
Instead, use format
:
- _(f"Hi {username}")
+ _("Hi {username}".format(username=username))
Marking strings in python is done via the Flask-BabelEx
package. First, make sure to import the package and the *gettext
function. We usually import it as _
to be consistent throughout different packages.
from flask_babelex import lazy_gettext as _
Now strings can be marked for translation:
error_message = _("Invalid scheme")
Certain data, identifiers or keywords do not have to be translated. This can be achieved with string interpolation:
_("Invalid {scheme}").format(scheme)
Adding notes for translators helps them to better understand the context where this string will be used:
# NOTE: This is displayed in ....
_("Invalid {scheme}").format(scheme)
Jinja Templates¶
The gettext
function is made available in jinja templates as well:
<p>
{{ _('Only published versions are displayed.') }}
</p>
JavaScript¶
Marking strings in JavaScript is done via the i18next
module. First, make sure to import i18next
from the module.
import { i18next } from "@translations/invenio_app_rdm/i18next";
Now strings can be marked for translation:
<p>{i18next.t("Only published versions are displayed.")}</p>
For strings containg HTML/React nodes, the Trans
component is the way to go (full documentation):
import { Trans } from '@translations/i18next';
const title = record.title;
<Trans>
The record {{ title }} can <b>only</b> be accessed by <b>users specified</b> in the permissions.
</Trans>
For more complex strings, custom variables can be passed to the Trans
component as well.
<Trans
defaults="On <bold>{{ date }}</bold> the record and the files will automatically be made publicly accessible. Until then, the record and the files can <bold>only</bold> be accessed by <bold>users specified</bold> in the permissions."
values={{ date: fmtDate }}
components={{ bold: <b /> }}
/>
This will interpolate the nodes from the values
and components
properties and is a good approach, if the nodes are to be used multiple times.
Singular & Plural¶
Sometimes, certain words or phrases in a sentence change based on a number or the amount of objects. There is also support for this, by passing the count
property.
const records = [{ title: "Record 1" }, { title: "Record 2" },];
<Trans
i18nKey="numberOfRecords"
count={records.length}>
There are {{ count: records.length }} records.
</Trans>
This will generate the following two ids for translation:
{
"numberOfRecords": "There is one record.",
"numberOfRecords_plural": "There are {{ count }} records."
}
Given that you must use the reserved property count
, the translation is different when having to translate sentences that have multiple (nested) plurals, e.g. 2 views, 1 download
.
In this case, you will have to (see official documentation here):
- in your code, define the translated string as a key only, with all the params:
i18next.t("viewsAndDownloads", { views: uniqueViews, downloads: uniqueDownloads })
- make sure that the translation keys are defined, with interpolation, using the
count
reserved variable:{ "viewsAndDownloads": "$t(views, {\"count\": {{views}} }), $t(downloads, {\"count\": {{downloads}} })", "views": "{{count}} view", "views_plural": "{{count}} views", "downloads": "{{count}} download", "downloads_plural": "{{count}} downloads" }
Lazy vs non-lazy¶
Coming soon
Storage of strings and timestamps¶
Strings should be stored in UTF-8
.
Timestamps should be stored in UTC and localized when displayed.
EDTF localization¶
The EDTF (Extended Date Time Format) is used when dealing with imprecise dates and for their region specific display.
Python¶
Here we make use of the babel-edtf
module. It allows to pass the locale to the format_edtf
function. As always, make sure to import it before you use it.
from babel_edtf import format_edtf
format_edtf('2020-01', locale='en')
'Jan 2020'
It also allows to pass a certain format (one of: short
, medium
, long
, full
):
format_edtf('2020-01/2020-09', format='long', locale='en')
'January – September 2020'
Extracting Strings¶
After strings have been marked for translation, it is now time to extract them. Instructions on how to do so can be found in the respective package in the file located at .tx/config
. Depending on the package, only Python or JavaScript may be available for translation.
Python¶
Configuration¶
- In the root folder of your package update the
.tx/config
file:
[main]
host = https://www.transifex.com
# python conf
[<parent_namespace_of_your_transifex_projext>.<package_name>-messages]
file_filter = <package_name>/translations/<lang>/LC_MESSAGES/messages.po
source_file = <package_name>/translations/messages.pot
source_lang = en
type = PO
- also in the root folder add
babel.ini
if not there:
# Extraction from Python source files
[python: **.py]
encoding = utf-8
# Extraction from Jinja2 templates
[jinja2: **/templates/**.html]
encoding = utf-8
extensions = jinja2.ext.autoescape, jinja2.ext.with_
- also in the root folder add these configuration line in
setup.cfg
if not there:
[compile_catalog]
directory = <package_name>/translations/
use-fuzzy = True
- remember to add this line for
MANIFEST.in
recursive-include <package_name> *.po *.pot *.mo
- also the entypoint in
setup.py
entry_points={
....
'invenio_i18n.translations': [
'messages = <package_name>',
],
},
- remember to add your new language to the instance configuration by modifying
I18N_LANGUAGES
variable, e.g.I18N_LANGUAGES = [('de', _('German'))]
Usage¶
In the root folder of the package run
python setup.py extract_messages
This command will gather all the marked strings from .py and jinja template files, group them by their ID and place everything in the source file specified in the config (usually at <package_name>/translations/messages.pot
).
Push translation sources (.pot and .po) to transifex. It will add the missing strings to the translation awaiting list.
tx push -s -t
NOTE: When there is no resource for this package on transifex, one will be created automatically on the first push.
For the next step you need to have some translations done on the Transifex website. Otherwise next step will not be successfully done, because there will be no translated content to work with. Translations are stored under the namespace defined in your .tx/config
file.
Pull translations from Transifex:
tx pull -l <lang>
JavaScript¶
Configuration¶
Navigate to the <package_name>/theme/assets/semantic-ui/translations/<package_name>
folder.
Create following folder and file structure:
|-messages/
|-index.js
|-scripts/
|-compileCatalog.js
|-initCatalog.js
|-i18next.js
|-i18next-scanner.config.js
|-package.json
in the root folder of your package update the .tx/config
file:
[main]
host = https://www.transifex.com
# React conf
[<parent_namespace_of_your_transifex_projext>.<package_name>-messages-ui]
file_filter = <package_name>/assets/semantic-ui/translations/<package_name>/messages/<lang>/messages.po
source_file = <package_name>/assets/semantic-ui/translations/<package_name>/translations.pot
source_lang = en
type = PO
where:
i18next.js
is your translation configurationi18next-scanner.config.js
is translation string scanner configurationcompileCatalog.js
is a script transforming Transifex*.po
output translations into.json
files used by ReactinitCatalog.js
is a script initializing a new language configurationpackage.json
configuration and dependencies of the translation mini-app
example content of above files available here:
Usage¶
Install i18n dev dependencies via (dependencies from package.json)
npm install
Now run to add a new language
npm run init_catalog lang <your_new_lang_two_letters_abbreviation>
Now run
npm run extract_messages
This command will gather all the marked strings from .js files, group them by their ID and place everything in the source file specified in the config (usually at <package_name>/theme/assets/semantic-ui/translations/<package_name>/translations.pot
).
Update the ./messages/index.js file to add the configuration of your new language to the React translation configuration map
import TRANSLATE_<lang> from './<lang>/translations.json'
export const translations = {
...rest,
<lang>: { translation: TRANSLATE_<lang> }
}
If you don't already have it, install the python transifex client. You will need also the transifex.com account and your private API key
pip install transifex-client
Push translation sources (.pot and .po) to transifex. It will add the missing strings to the translation awaiting list.
tx push -s -t
NOTE: When there is no resource for this package on transifex, one will be created automatically on the first push.
For the next step you need to have some translations done on the Transifex website. Otherwise next step will not be successfully done, because there won't be no translated content to work with. Translations are stored under the namespace defined in your .tx/config
file
Pull translations from Transifex:
tx pull -l <lang>
You can use -f
to overwrite existing .po
files. After this step you should see new .po
files pulled to package_name>/theme/assets/semantic-ui/translations/<package_name>/messages/<lang>/
folder. If there are no new files, the translations were empty.
Compile the .json
files for React to use.
npm run compile_catalog
npm run compile_catalog lang <lang> # for specific language
Invenio-CLI¶
In addition, if you want to translate strings from your invenio.cfg
file or the code in your instance's site folder you can do so with the invenio-cli. Note that this workflow also enables you to overwrite translations from any other existing module's string.
Extract messages catalog¶
In your instance there is a translations
folder. The first step would be to extract the strings to a message catalog:
invenio-cli translations extract
Running this command will generate the corresponding .pot
file:
/translations
|- babel.ini
|- messages.pot
Absolute Paths in Localization Files
Absolute paths in messages.po
and messages.pot
are comments for human reading context. They don't affect the compiled .mo
file or app behavior.
Initialize a message catalog for a locale¶
The second step is to initialize the catalog for the specific locale of your choice.
These locales are the keys of the I18N_LANGUAGES
configuration variable. For example, for German would be:
# invenio.cfg file
I18N_LANGUAGES = [
('de', _('German')),
]
invenio-cli translations init -l de
Running this command will produce the corresponding .po
files:
/translations
|- babel.ini
|- messages.pot
|- de/LC_MESSAGES
|- messages.po
Compile the catalogs¶
Finally, you need to compile the catalogs. That can be achieved with the translations
command.
However, this process is also a part of the setup process so it will be executed when running invenio-cli services setup
(or its containerized counter part).
invenio-cli translations compile
Running this command will produce the corresponding .mo
files:
/translations
|- babel.ini
|- messages.pot
|- de/LC_MESSAGES
|- messages.po
|- messages.mo