Original link: ocavue.com/django_tran…

Overview

Recently, as our company prepares to expand overseas, we added internationalization and localization support to our Django system. Internationalization is usually abbreviated as i18n, because there are 18 letters between the I and the N in "Internationalization". Localization is abbreviated as L10n, because there are 10 letters between the L and the N in "Localization". Note the convention of a lowercase i and an uppercase L, which keeps the two letters from being confused.

To put it simply: i18n builds the framework that makes a program adaptable, and L10n supplies the content tailored to each region. Here's a simple example:

i18n:

datetime.now().strftime('%Y/%m/%d')  # before i18n
datetime.now().strftime(timeformat)  # after i18n

L10n:

timeformat = {
    'cn': '%Y/%m/%d',
    'us': '%m/%d/%Y',
    'fr': '%d/%m/%Y',
}

See the W3C’s explanation for a more specific definition.
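As a minimal sketch of how the two halves fit together: the i18n part is code that looks a format up instead of hard-coding it, and the L10n part is the per-region table. The format_date helper and its fallback to 'cn' are assumptions made for illustration; they are not part of Django.

```python
from datetime import date

# L10n: one date format per region
timeformat = {
    'cn': '%Y/%m/%d',
    'us': '%m/%d/%Y',
    'fr': '%d/%m/%Y',
}

def format_date(value, region, default_region='cn'):
    """i18n: look the format up instead of hard-coding it."""
    fmt = timeformat.get(region, timeformat[default_region])
    return value.strftime(fmt)

d = date(2018, 3, 9)
print(format_date(d, 'us'))  # 03/09/2018
print(format_date(d, 'fr'))  # 09/03/2018
print(format_date(d, 'de'))  # unknown region, falls back to 'cn': 2018/03/09
```

Supporting a new region then means adding one entry to the table, with no change to the calling code.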

i18n covers a wide range of concerns: languages, time zones, currency units, singular and plural forms, character encodings and even text direction (RTL). This article focuses only on the multilingual aspect of i18n.

On an Arabic version of Windows, the text and even the interface layout run in the opposite direction to the Chinese version (photo source)

Basic steps

Django is a large, batteries-included framework that already provides a multilingual solution. I did a little comparison and couldn't find a library that worked better with Django than the official solution. Django's solution can be broken down into four simple steps:

  1. Some necessary configuration
  2. Mark the text to be translated in your code
  3. Use the makemessages command to generate the .po file
  4. Use the compilemessages command to compile the .mo file

Let's look at each step in detail.

Step 1: Configuration

Add these to settings.py first

import os

LOCALE_PATHS = (
    # must be a directory, so join onto the folder containing this file
    os.path.join(os.path.dirname(__file__), 'language'),
)
MIDDLEWARE = (
    ...
    'django.middleware.locale.LocaleMiddleware',
    ...
)
LANGUAGES = (
    ('en', 'English'),
    ('zh', 'Chinese'),
)

LOCALE_PATHS: Specify the location of the generated files in steps 3 and 4 below. Older versions of Django required manually creating this directory.

LocaleMiddleware: Lets Django recognize and select the appropriate language. It should be placed after SessionMiddleware and before CommonMiddleware.

LANGUAGES: Specifies which languages are available in this project.

Step 2: Tag the text

Previously we had no need for multiple languages, so we wrote Chinese strings directly in our AJAX responses, like this:

return JsonResponse({"msg": "Too long", "code": 1, "data": None})

Now that the project is multilingual, you need to tell Django which strings need to be translated. The example above would be written like this:

from django.utils.translation import gettext as _

return JsonResponse({"msg": _("Too long"), "code": 1, "data": None})

This uses the gettext function to wrap the original string so that Django can return the appropriate string for the current language. A single underscore _ is generally used to improve readability.

Because we use AJAX for almost all communication with the front end, we don't use much of Django's template functionality (our front-end multilingual tool, by the way, is i18next). Still, here is how text is marked in Django templates:

<title>{% trans "This is the title." %}</title>
<title>{% trans myvar %}</title>

The trans tag tells Django to translate the content inside the tag. Refer to the official documentation for more details.

Step 3: makemessages

Before performing this step, verify that GNU Gettext is installed by running xgettext --version. GNU Gettext is the standard i18n and L10n library. Django, like the multilingual modules of many other languages and libraries, calls GNU Gettext, so some of the Django features described below actually come from GNU Gettext. If it is not installed, you can install it as follows:

ubuntu:

$ apt update
$ apt install gettext

macOS:

$ brew install gettext
$ brew link --force gettext

windows

After installing GNU Gettext, run the following command in your Django project:

$ python3 manage.py makemessages --locale en

You can then find the generated file: language/en/LC_MESSAGES/django.po. Replace en in the command above with another language code to generate the django.po file for that language. Its contents look something like this:

#: path/file.py:397
msgid "Order has been deleted"
msgstr ""

Django finds every string wrapped in the gettext function and stores it as a msgid in django.po. The msgstr below each msgid is what that msgid should be translated into; edit this file to tell Django the translations. The comment shows the file and line where the msgid appears.

There are a few interesting features about this file:

  • Django groups identical msgids that appear in multiple files into a single entry: "edit once, translate everywhere".
  • If a msgid is later deleted from the source code, then after running makemessages again, the msgid and its msgstr are kept in django.po as a comment.
  • Since the string in the source code is really just an ID, I can write strings with no real meaning in the source code, such as _("ERROR_MSG42"), and then translate "ERROR_MSG42" into both Chinese and English.
  • Placeholders in template strings are preserved in this file. For example, you can use named placeholders so that different languages can put the placeholders in a different order. Here is an example:

py file:

_('Today is {month} {day}.').format(month=m, day=d)
_('Today is %(month)s %(day)s.') % {'month': m, 'day': d}

po file:

msgid "Today is {month} {day}."
msgstr "Aujourd'hui est {day} {month}."

msgid "Today is %(month)s %(day)s."
msgstr "Aujourd'hui est %(day)s %(month)s."
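To make the msgid/msgstr relationship concrete, here is a rough sketch using only Python's standard gettext module. DictTranslations is a hypothetical stand-in that keeps the catalog in a dict; real gettext (and Django) reads the compiled django.mo file instead. It shows two behaviours described above: an empty msgstr falls back to the msgid, and named placeholders can be reordered by the translator.

```python
import gettext

class DictTranslations(gettext.NullTranslations):
    """Toy catalog: a dict of msgid -> msgstr standing in for django.mo."""

    def __init__(self, entries):
        super().__init__()
        self.entries = entries

    def gettext(self, message):
        # An empty or missing msgstr means "not translated yet":
        # fall back to the msgid itself.
        return self.entries.get(message) or message

fr = DictTranslations({
    "Today is {month} {day}.": "Aujourd'hui est {day} {month}.",
    "Order has been deleted": "",  # still untranslated
})
_ = fr.gettext

print(_("Today is {month} {day}.").format(month="mai", day=3))
# Aujourd'hui est 3 mai.
print(_("Order has been deleted"))
# Order has been deleted
```

Because the placeholders are named, the source code passes month and day the same way for every language, and only the msgstr decides their order.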

Step 4: compilemessages

After editing the django.po file, run the following command:

$ python3 manage.py compilemessages --locale en

Django will call GNU Gettext's msgfmt program and compile django.po into a binary file named django.mo in the same directory. This is the file the program actually reads at runtime.

After completing the four steps above, change your browser's language settings and you'll see Django respond in different languages.

↑ Chrome language settings

Advanced features

i18n_patterns

Sometimes we want to select the language through the URL. This has several advantages, for example the data returned from a given URL is always in the same language. Django's own documentation uses this approach:

Simplified Chinese: https://docs.djangoproject.com/zh-hans/2.0/

English: https://docs.djangoproject.com/en/2.0/

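To do this, wrap the URL patterns with i18n_patterns in your URLconf:

from django.conf.urls.i18n import i18n_patterns
from django.urls import path

urlpatterns = i18n_patterns(
    path('category/<slug:slug>/', news_views.category),
    path('<slug:slug>/', news_views.details),
)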

For details, refer to the official Django documentation.

How does Django decide which language to use?

We mentioned earlier that LocaleMiddleware decides which language to use. Specifically, LocaleMiddleware checks the following sources in order of decreasing priority:

  • i18n_patterns
  • request.session[settings.LANGUAGE_SESSION_KEY]
  • request.COOKIES[settings.LANGUAGE_COOKIE_NAME]
  • request.META['HTTP_ACCEPT_LANGUAGE'], that is, the Accept-Language header of the HTTP request
  • settings.LANGUAGE_CODE
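The priority order above can be sketched in plain Python. This is a simplified illustration, not Django's actual implementation: pick_language and all of its parameters are hypothetical names, and the Accept-Language parsing only handles exact language codes with optional q-values (Django's own matching is more thorough, for example with regional variants).

```python
def pick_language(supported, default, url_prefix=None, session_lang=None,
                  cookie_lang=None, accept_language=''):
    """Return the first supported language, checking sources by priority."""
    # 1-3: URL prefix, then session, then cookie
    for candidate in (url_prefix, session_lang, cookie_lang):
        if candidate in supported:
            return candidate

    # 4: Accept-Language header, e.g. "fr;q=0.9, zh;q=0.8"
    weighted = []
    for part in accept_language.split(','):
        lang, sep, params = part.strip().partition(';')
        if not lang:
            continue
        q = 1.0
        if params.strip().startswith('q='):
            try:
                q = float(params.strip()[2:])
            except ValueError:
                q = 0.0
        if q > 0:
            weighted.append((q, lang.lower()))
    # highest q first; sorted() is stable, so ties keep header order
    for q, lang in sorted(weighted, key=lambda item: -item[0]):
        if lang in supported:
            return lang

    # 5: fall back to the equivalent of settings.LANGUAGE_CODE
    return default

supported = {'en', 'zh'}
print(pick_language(supported, 'en', cookie_lang='zh', accept_language='en'))  # zh
print(pick_language(supported, 'en', accept_language='fr;q=0.9, zh;q=0.8'))    # zh
print(pick_language(supported, 'en', accept_language='fr'))                    # en
```

The first call shows why a cookie set by the front end wins over the browser's language settings.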

Our company chose to store the language in a cookie. When the user manually selects a language, the front end can modify the cookie directly without calling a back-end API. Users who have never chosen a language have no such cookie and simply follow the browser settings. The default value of settings.LANGUAGE_COOKIE_NAME is django_language; the front-end developers didn't want "django" to appear in their code, so I added LANGUAGE_COOKIE_NAME = 'app_language' to settings.py 😂.

In a view, you can check which language LocaleMiddleware selected through request.LANGUAGE_CODE. You can even manually set the language used by the current thread with the activate function:

from django.utils.translation import activate

activate('en')

ugettext

In the Python 2 era, there were two functions, ugettext and gettext, to distinguish Unicode strings from bytestrings. In Python 3, ugettext and gettext are equivalent because all strings are Unicode. The official documentation says ugettext may be deprecated in the future, but as of now (Django 2.0) it is not deprecated.

gettext_lazy

Here's an example to give you an intuition for the difference between gettext_lazy and gettext:

from django.utils.translation import gettext, gettext_lazy, activate, get_language

gettext_str = gettext("Hello World!")
gettext_lazy_str = gettext_lazy("Hello World!")

print(type(gettext_str))
# <class 'str'>
print(type(gettext_lazy_str))
# <class 'django.utils.functional.lazy.<locals>.__proxy__'>

print("current language:", get_language())
# current language: zh
print(gettext_str, gettext_lazy_str)
# <Chinese translation> <Chinese translation>

activate("en")

print("current language:", get_language())
# current language: en
print(gettext_str, gettext_lazy_str)
# <Chinese translation> Hello World!

The gettext function returns a plain string, while gettext_lazy returns a proxy object. Only when that object is actually used is the translation looked up, based on the language active in the current thread.
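The idea behind the proxy object can be imitated in a few lines of plain Python. The sketch below is a toy model, not Django's implementation: catalogs, LazyString and the functions here are hypothetical stand-ins that merely reuse Django's names, and the Chinese string in the catalog is an assumed translation.

```python
# Toy catalogs: language code -> {msgid: msgstr}
catalogs = {'zh': {'Hello World!': '你好，世界！'}, 'en': {}}
current_language = 'zh'

def activate(language):
    global current_language
    current_language = language

def gettext(message):
    # translate right now, with whatever language is active
    return catalogs[current_language].get(message, message)

class LazyString:
    """Remembers the msgid and translates only when converted to str."""

    def __init__(self, message):
        self.message = message

    def __str__(self):
        return gettext(self.message)

def gettext_lazy(message):
    return LazyString(message)

eager = gettext('Hello World!')      # translated immediately, under 'zh'
lazy = gettext_lazy('Hello World!')  # nothing translated yet

activate('en')
print(eager)  # 你好，世界！ (fixed at call time)
print(lazy)   # Hello World! (resolved at print time)
```

The eager value is frozen in the language that was active when gettext ran, while the lazy one follows whatever language is active at the moment it is rendered.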

This feature is especially useful in Django's models, because the code that defines a string in a model is executed only once, when the module is loaded. In subsequent requests, this lazy "string" is rendered differently depending on the language of each request.

from django.db import models
from django.utils.translation import gettext_lazy as _

class MyThing(models.Model):
    name = models.CharField(help_text=_('This is the help text'))

class YourThing(models.Model):
    kind = models.ForeignKey(
        ThingKind,
        on_delete=models.CASCADE,
        related_name='kinds',
        verbose_name=_('kind'),
    )

Modify the source code using AST/FST

Since our project is very large, adding _(...) to every string by hand would be far too tedious, so I looked for a way to automate it.

My first choice was Python's built-in ast module. The basic idea is to find all the strings in the project through the AST, wrap each of them in _(...), and finally convert the modified syntax tree back into code.

However, because the AST does not support formatting information well, it is easy to make changes to the code that result in formatting clutter. So we found an improvement called FST (Full Syntax Tree). My FST library of choice is RedBaron. The core code is as follows:
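A minimal sketch of the ast approach, and of the formatting loss it causes, might look like this. It is an illustration rather than the script used in the project: WrapChinese and this version of has_chinese_char are written here for the example, docstrings and f-strings are not handled, and ast.unparse requires Python 3.9 or later.

```python
import ast

def has_chinese_char(text):
    # rough check: any character in the common CJK ideograph range
    return any('\u4e00' <= ch <= '\u9fff' for ch in text)

class WrapChinese(ast.NodeTransformer):
    def visit_Call(self, node):
        # leave strings that are already inside _() untouched
        if isinstance(node.func, ast.Name) and node.func.id == '_':
            return node
        return self.generic_visit(node)

    def visit_Constant(self, node):
        if isinstance(node.value, str) and has_chinese_char(node.value):
            call = ast.Call(
                func=ast.Name(id='_', ctx=ast.Load()),
                args=[node],
                keywords=[],
            )
            return ast.copy_location(call, node)
        return node

source = 'msg = "太长了"  # shown to the user\ndone = _("已删除")\n'

tree = WrapChinese().visit(ast.parse(source))
ast.fix_missing_locations(tree)
result = ast.unparse(tree)
print(result)
```

The printed code has the first string wrapped in _(), but the comment is gone and the quote style has changed, because the AST does not record either of them.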

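from redbaron import RedBaron

# has_chinese_char, is_aleady_gettext and is_docstring are helper
# functions that are not shown here; see the complete script
root = RedBaron(original_code)

for node in root.find_all("StringNode"):
    if (
        has_chinese_char(node)
        and not is_aleady_gettext(node)
        and not is_docstring(node)
    ):
        node.replace("_({})".format(node))

modified_code = root.dumps()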

I put the complete code in a Gist. It is a one-off script and written rather casually, but you can use it as a reference.

There are also some problems with using RedBaron, noted here. The biggest is that RedBaron is no longer maintained, so some newer syntax, such as Python 3.6 f-strings, is not supported. Second, the library is slow compared with the standard-library ast module; my computer sounded like an airplane engine every time I ran the script. Third, it can produce some strange formatting:

Before modification:

OutStockSheet = {
    1: 'Unshipped',
    2: 'Out of stock',
    3: 'Deleted'
}

After modification (the closing parenthesis to the right of 'Deleted' moves to the next line):

OutStockSheet = {
    1: _('Unshipped'),
    2: _('Out of stock'),
    3: _('Deleted'
)}

The last point can be solved with a formatting tool.

utf8 vs utf-8

Some older .py files in the project start with # coding: utf8. For Python, utf8 is an alias of utf-8, so there is no problem. When Django calls GNU Gettext it passes utf-8 as the encoding via a parameter, but GNU Gettext also reads the encoding declaration in the file, and that declaration takes priority. Unfortunately utf8 is an unknown encoding to GNU Gettext, so it falls back to ASCII and then reports errors when it encounters Chinese characters (stupid!):

$ python3 manage.py makemessages --locale en
...
xgettext: ./path/filename.py:1: Unknown encoding "utf8". Proceeding with ASCII instead.
xgettext: Non-ASCII comment at or before ./path/filename.py:26.

So I need to change # coding: utf8 to # coding: utf-8, or simply delete the line, because Python 3 already uses UTF-8 as the default source encoding.
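The following sketch shows both halves of the problem: Python resolves the two spellings to the same codec, and a small regular expression can rewrite the declaration into the form GNU Gettext accepts. CODING_RE and fix_coding_line are hypothetical helpers written for this example; since an encoding declaration is only valid on the first two lines of a file, a real script should apply the fix only there.

```python
import codecs
import re

# Python treats both spellings as the same codec
print(codecs.lookup('utf8').name)   # utf-8
print(codecs.lookup('utf-8').name)  # utf-8

# Matches a declaration such as "# coding: utf8" or "# -*- coding: utf8 -*-"
CODING_RE = re.compile(r'^([ \t\f]*#.*?coding[:=][ \t]*)utf8\b', re.IGNORECASE)

def fix_coding_line(line):
    """Rewrite 'utf8' to 'utf-8' in an encoding declaration line."""
    return CODING_RE.sub(r'\1utf-8', line)

print(fix_coding_line('# coding: utf8'))          # # coding: utf-8
print(fix_coding_line('# -*- coding: utf8 -*-'))  # # -*- coding: utf-8 -*-
print(fix_coding_line('# coding: utf-8'))         # unchanged
```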

Conclusion

The multilingual capabilities of Django (and of the GNU Gettext behind it) are extensive: ngettext handles singular and plural forms, pgettext handles words with multiple meanings, and gettext_noop marks a string for translation without translating it, which is useful when the translated text goes into the HTTP response but the untranslated text should stay in the log.

This article focuses on the features and pitfalls I've encountered in my own practice, and I hope it gives you a basic understanding of how to use Django's multilingual support. Comments are welcome 👏.
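For a quick feel of the plural and context functions, here is a small sketch using Python's standard gettext module, whose ngettext and pgettext take their arguments in the same order as the Django functions of the same name. No catalog is loaded here (NullTranslations is an empty catalog), so the English source strings come back unchanged; the example strings are made up, and pgettext in the standard library requires Python 3.8 or later.

```python
import gettext

t = gettext.NullTranslations()  # empty catalog: returns the source strings

# ngettext picks the singular or plural form based on the number
for count in (1, 3):
    print(t.ngettext('%d order', '%d orders', count) % count)
# 1 order
# 3 orders

# pgettext adds a context, so one English word can get two translations
print(t.pgettext('month name', 'May'))  # May
print(t.pgettext('verb', 'May'))        # May
```

With a real catalog, the two pgettext calls would map to separate entries in the .po file, each with its own msgstr.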


This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 China Mainland License.