Recently, I was looking through two relatively new Python books and found that they both made the same elementary mistake!

The books are “Python Programming: From Getting Started to Practice” and “Father and Son’s Programming Journey”. Both are bestsellers, both were released in new editions in October 2020, and both use Python 3.7+ syntax.

However, when it comes to naming variables, they make the same mistake: they still repeat the Python 2-era claim that names may only combine “letters, numbers, and underscores.”

In fact, Python 3.x allows non-ASCII Unicode characters in identifiers, so Chinese variable names, for example, are perfectly legal.

>>> 姓名 = "Python Cat"
>>> print(f"I am {姓名}, welcome to follow!")
I am Python Cat, welcome to follow!

I’m not sure how many new editions of books still repeat the old rule, since I don’t have any other samples on hand. However, translated books are likely to have this problem. In addition, some less rigorous domestic books may make the same mistake because they draw on outdated materials.

I’m afraid that some students who are new to Python will get the wrong idea. This may not be a serious problem, but it should be avoided, and it is easy to avoid.

Therefore, I think this topic is worth talking about.

Programming languages share a very common concept, the identifier, commonly known as a name, which identifies entities such as variables, constants, functions, classes, and symbols.

There are some basic rules you must consider when defining identifiers:

  • What characters can it be composed of?
  • Is it case sensitive?
  • Are some special words reserved? (i.e. keywords/reserved words)
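
For Python 3, these three questions can be checked directly. The following is only a small sketch: the helper `is_usable_name` and the sample names are my own illustrations, built on the standard `str.isidentifier()` method and the `keyword` module.

```python
import keyword


def is_usable_name(name: str) -> bool:
    """Hypothetical helper: well-formed identifier that is not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


# 1. Which characters are allowed? isidentifier() applies the language's own rule.
for candidate in ["user_name1", "1user", "user-name"]:
    print(candidate, candidate.isidentifier())

# 2. Case sensitivity: these are two distinct names.
count = 1
Count = 2
print(count, Count)

# 3. Reserved words: "class" is well formed, but it cannot be used as a name.
print("class".isidentifier(), keyword.iskeyword("class"), is_usable_name("class"))
```

Note that `str.isidentifier()` alone says nothing about keywords, which is why the sketch combines the two checks.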

For the first question, most programming languages followed this rule in their early versions: identifiers consist of letters, numbers, and underscores, and cannot start with a number. A few languages are exceptions and also give special symbols such as $, @, and % a role in names (PHP, Ruby, Perl, etc.).

Earlier versions of Python, specifically those prior to 3.0, followed this rule. Here’s what the official documentation says:

identifier ::=  (letter|"_") (letter | digit | "_")*
letter     ::=  lowercase | uppercase
lowercase  ::=  "a"..."z"
uppercase  ::=  "A"..."Z"
digit      ::=  "0"..."9"

Reference: docs.python.org/2.7/referen…
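
To make the old rule concrete, here is a rough sketch that expresses the Python 2 grammar as a regular expression. The pattern and the name `is_py2_style_identifier` are my own hand translation for illustration, not something taken from the Python source, and keywords are not checked.

```python
import re

# Assumption: a hand-written equivalent of the Python 2 grammar, i.e.
# (letter | "_") followed by any number of (letter | digit | "_").
# The character classes are spelled out because \w also matches
# non-ASCII characters in Python 3 patterns.
PY2_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_py2_style_identifier(name: str) -> bool:
    """Return True if name fits the ASCII-only rule (keywords are ignored)."""
    return PY2_IDENTIFIER.fullmatch(name) is not None


for candidate in ["spam", "_private", "spam2", "2spam", "姓名"]:
    print(candidate, is_py2_style_identifier(candidate))
```

Under this rule, a Chinese name such as `姓名` is rejected, which is exactly the restriction the two books still describe.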

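However, this rule no longer holds as of version 3.0. The official Python 3 documentation explains that identifiers within the ASCII range follow the same rule as in Python 2.x, while Python 3.0 additionally allows characters from outside the ASCII range (see PEP 3131).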

Reference: docs.python.org/3/reference…
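
A quick way to see the Python 3 rule in action is to ask the interpreter itself. This is just a sketch: `classify` is a hypothetical helper, and the sample strings are arbitrary. It pairs `str.isidentifier()` with the Unicode general category of the first character, taken from the standard `unicodedata` module.

```python
import unicodedata


def classify(name: str):
    """Hypothetical helper: (accepted by Python 3?, category of the first character)."""
    return name.isidentifier(), unicodedata.category(name[0])


for candidate in ["name", "姓名", "Δ", "2x", "€"]:
    accepted, category = classify(candidate)
    print(f"{candidate!r}: identifier={accepted}, first character category={category}")
```

Letters from any script (categories such as Lu, Ll, and Lo) can start a name, while digits and symbols such as the currency sign cannot.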

As the Internet became widespread, software had to work across the languages of many countries, and programming languages faced growing demands for internationalization as well.

The Unicode standard was first released in 1991 and has since been adopted by major programming languages. So far, at least 73 programming languages support Unicode variable names (data source: rosettacode.org/wiki/Unicod…).

In 2007, while Python was working toward its landmark 3.0 release, official support for Unicode in identifiers was also on the table, which led to the important PEP 3131, “Supporting Non-ASCII Identifiers”.

Source: www.python.org/dev/peps/pe…

In fact, the Unicode character set covers far more than Chinese, which is the part we care about most.

All of the following are acceptable variable names (use with caution, the cat is not responsible if you get beaten up for it…):

>>> ψ = 1
>>> Δ = 1
>>> ಠ_ಠ = "hello"
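
If you are unsure whether an exotic name is legal, you can let the compiler decide. The sketch below uses a hypothetical helper, `try_compile`, that feeds a single line to the built-in `compile()` and reports whether a `SyntaxError` occurs. The snowman line is my own counter-example.

```python
def try_compile(source: str) -> str:
    """Hypothetical helper: compile one line and report whether Python accepts it."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return "SyntaxError"
    return "ok"


lines = [
    'Δ = 1',            # a Greek capital letter is a letter, so this is fine
    'ಠ_ಠ = "hello"',    # Kannada letters joined by an underscore
    'ಠ _ ಠ = "hello"',  # with spaces this becomes three separate names
    '☃ = 1',            # a symbol, not a letter
]
for line in lines:
    print(f"{line!r}: {try_compile(line)}")
```

The first two lines compile, while the last two are rejected: the name must be written without internal spaces, and symbols outside the letter categories are not allowed.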

To sum up, what some Python books say about variable naming rules is out of date, so don’t be misled by them!

Python 3, as a modern, internationalization-oriented language, has good support for Unicode. Whether you should actually name identifiers in Chinese in your own project is another matter…