The premise

While studying Tomcat today, I started it from the source code, and the messages printed to the console were garbled. As a Java developer I am no stranger to garbled text, so I first changed the encoding settings in the configuration files as usual. Whether I chose GBK or UTF-8, the console output stayed garbled. I then searched online for blog posts from people who had hit the same problem and found a workable solution, but it still left some issues. After more searching and debugging, I finally solved the problem.

First solution attempt

A record of debugging garbled Chinese console output when starting Tomcat from source

I started by following the second method in that blog post and modified one method in each of two classes:

  • getString(final String key, final Object... args) in the class org.apache.tomcat.util.res.StringManager
  • getMessage(String errCode) in the class org.apache.jasper.compiler.Localizer

One statement was added to the corresponding method in both classes:

value = new String(value.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);

After restarting, the console messages were displayed correctly.
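To make the effect of that one statement concrete, here is a minimal sketch of the round trip, assuming the garbling comes from UTF-8 bytes that were decoded as ISO-8859-1. The class name MojibakeRoundTrip and the sample string are made up for illustration and are not part of Tomcat.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeRoundTrip {
    // Simulates the bug: UTF-8 bytes decoded as if they were ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // The repair statement from the post: recover the raw bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A short Chinese sample, written with Unicode escapes so the source file stays ASCII.
        String original = "\u670d\u52a1\u5668\u542f\u52a8";
        String garbled = misdecode(original);
        System.out.println(garbled.equals(original));         // false
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

The repair works because ISO-8859-1 maps every byte to exactly one character, so the original bytes can be recovered without loss. It only fixes strings that actually pass through the patched method, though.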

Second resolution attempt

However, when I opened the host-manager and manager applications that ship with Tomcat, garbled characters appeared on the pages again.

A lengthy debugging session followed to figure out why the garbling was happening.

Why is there a garble problem?

  • First I picked a class whose output was garbled in the console, then clicked through to the place where it produces that output

  • From there, step back to the getString method that it calls

This turns out to be the same method that the blogger modified earlier, so the logical next step is to keep tracing where the string comes from.

Then step into the next layer of the source.

From here you can see that the bundle’s getString method is called, so let’s continue inside

If you look closely at the upper left corner, the top-level entry is rt.jar, so this is JDK code loaded by the Bootstrap ClassLoader. Let's step in further.

This is the getObject method of ResourceBundle, and the key call inside it is handleGetObject, so let's step into that.

The handleGetObject method of PropertyResourceBundle is called. The key statement in this method is lookup.get(key).

From the definition and the debugger contents, it is easy to see that the lookup map holds the message strings we are after. The values stored in the map are already garbled, so any string fetched by key is garbled from the start.

The earlier fix took the garbled string after it had been fetched and re-decoded it as UTF-8. My guess is that the console output is now correct while host-manager and manager are still garbled because other places use these strings without going through the manually re-decoded path. So I wondered whether the strings in lookup could be made correct in the first place. The messages are loaded by a ResourceBundle, and further debugging showed that the loading relies on a Control.

As you can see, our bundle is finally created by the newBundle method of Control

Now we have the answer. When the bundle is created, the JDK code hands the raw InputStream to PropertyResourceBundle, which decodes an InputStream as ISO-8859-1 by default, whereas our files are saved in UTF-8. That mismatch is what produces the garbled characters. At first I wanted to change the decoding by wrapping the stream in a reader:

InputStreamReader isr = new InputStreamReader(stream, "UTF-8");
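The same mismatch can be reproduced outside Tomcat with java.util.Properties, which PropertyResourceBundle uses for parsing. The sketch below is only an illustration: the class name PropertiesEncodingDemo and the key demo.key are hypothetical, and the "file" is an in-memory byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class PropertiesEncodingDemo {
    // Builds the bytes of a one-line properties file as an editor would save it in UTF-8.
    static byte[] utf8File(String key, String value) {
        return (key + "=" + value + "\n").getBytes(StandardCharsets.UTF_8);
    }

    // Properties.load(InputStream) decodes the bytes as ISO-8859-1.
    static String loadFromStream(byte[] file, String key) {
        try {
            Properties props = new Properties();
            props.load(new ByteArrayInputStream(file));
            return props.getProperty(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Properties.load(Reader) uses whatever charset the reader was created with.
    static String loadFromUtf8Reader(byte[] file, String key) {
        try {
            Properties props = new Properties();
            props.load(new InputStreamReader(new ByteArrayInputStream(file), StandardCharsets.UTF_8));
            return props.getProperty(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String value = "\u542f\u52a8\u5b8c\u6210";
        byte[] file = utf8File("demo.key", value);
        System.out.println(value.equals(loadFromStream(file, "demo.key")));     // false
        System.out.println(value.equals(loadFromUtf8Reader(file, "demo.key"))); // true
    }
}
```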

However, I then realized that the code that creates the bundle is part of the JDK source and cannot be modified. Does that mean there is no other way? No. Look closely at the getBundle overloads of ResourceBundle.

That's right: in addition to the overload that takes only a baseName, there are overloads that also accept a Control. Since the bundle is created by calling the newBundle method of Control, we just need to extend the Control class and override newBundle, and in the overridden newBundle method wrap the InputStream in an InputStreamReader.

I searched the Internet and found this answer on StackOverflow

java – How to use UTF-8 in resource properties with ResourceBundle – Stack Overflow

The answer gives a complete and detailed explanation and also includes a complete solution. The root cause of the garbling is that ISO-8859-1 is used by default when the InputStream is read. The IDE I use is IDEA, whose default file encoding is UTF-8, so the encoding of the saved files and the decoding used at load time do not match, and the text is displayed garbled. Now that the problem is identified, how do we solve it?

The first method is to convert every character in the saved files that falls outside ISO-8859-1 into \uXXXX format. You can use the native2ascii tool that comes with the JDK.

But converting everything is a hassle and there are a lot of files, so I don't think many people will use this method.
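For reference, the \uXXXX conversion described in the first method can be sketched in a few lines of Java. This is a rough illustration, not a reimplementation of native2ascii: the class name UnicodeEscaper is hypothetical, and the sketch escapes every non-ASCII character, which is a superset of what ISO-8859-1 loading strictly needs. Note also that native2ascii was removed from the JDK in version 9, so the tool is only available with older JDKs.

```java
final class UnicodeEscaper {
    // Replaces every non-ASCII character with its four-digit Unicode escape,
    // leaving plain ASCII untouched. Supplementary characters end up as two
    // escapes (one per surrogate), which the properties format accepts.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7F) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the sample line with both Chinese characters turned into escapes.
        System.out.println(escape("demo.key=\u542f\u52a8"));
    }
}
```

A file converted this way contains only ASCII bytes, so it reads the same whether it is decoded as ISO-8859-1 or UTF-8.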

The second option is to pass in a custom UTF8Control when creating a ResourceBundle

Let’s take a look at a ready-made UTF8Control class provided by the blogger

package org.apache.tomcat.util.res;

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class UTF8Control extends ResourceBundle.Control {
    // Override the parent class's newBundle method
    @Override
    public ResourceBundle newBundle
            (String baseName, Locale locale, String format, ClassLoader loader, boolean reload)
            throws IllegalAccessException, InstantiationException, IOException {
        // The below is a copy of the default implementation (copy and paste).
        String bundleName = toBundleName(baseName, locale);
        String resourceName = toResourceName(bundleName, "properties");
        ResourceBundle bundle = null;
        InputStream stream = null;
        if (reload) {
            URL url = loader.getResource(resourceName);
            if (url != null) {
                URLConnection connection = url.openConnection();
                if (connection != null) {
                    connection.setUseCaches(false);
                    stream = connection.getInputStream();
                }
            }
        } else {
            stream = loader.getResourceAsStream(resourceName);
        }
        if (stream != null) {
            try {
                // Only this line is changed to make it read properties files as UTF-8.
                // This is the key point: the original InputStream is wrapped in an
                // InputStreamReader so that the file is decoded as UTF-8.
                bundle = new PropertyResourceBundle(new InputStreamReader(stream, StandardCharsets.UTF_8));
            } finally {
                stream.close();
            }
        }
        return bundle;
    }
}
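To see a Control of this kind in use, here is a self-contained sketch that writes a UTF-8 properties file to a temporary directory and loads it through ResourceBundle.getBundle with a custom control. Everything named here (Utf8BundleDemo, Utf8OnlyControl, DemoStrings, demo.key) is hypothetical and exists only for the demonstration. The control is a trimmed-down variant without the reload branch, and the sketch does not show how Tomcat's StringManager is wired to UTF8Control.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

final class Utf8BundleDemo {
    // Hypothetical trimmed-down control: same idea as UTF8Control, without the reload branch.
    static final class Utf8OnlyControl extends ResourceBundle.Control {
        @Override
        public ResourceBundle newBundle(String baseName, Locale locale, String format,
                                        ClassLoader loader, boolean reload) throws IOException {
            String resourceName = toResourceName(toBundleName(baseName, locale), "properties");
            try (InputStream stream = loader.getResourceAsStream(resourceName)) {
                if (stream == null) {
                    return null; // no properties file for this locale
                }
                // Decode the file as UTF-8 instead of handing over the raw stream.
                return new PropertyResourceBundle(new InputStreamReader(stream, StandardCharsets.UTF_8));
            }
        }
    }

    // Writes DemoStrings.properties in UTF-8 to a temp directory and reads demo.key back.
    static String lookup(String value) {
        try {
            Path dir = Files.createTempDirectory("utf8-bundle-demo");
            Path file = dir.resolve("DemoStrings.properties");
            Files.write(file, ("demo.key=" + value + "\n").getBytes(StandardCharsets.UTF_8));
            try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()}, null)) {
                // The overload that accepts a Control is what lets us swap in the UTF-8 reader.
                ResourceBundle bundle = ResourceBundle.getBundle(
                        "DemoStrings", Locale.ROOT, loader, new Utf8OnlyControl());
                return bundle.getString("demo.key");
            } finally {
                Files.deleteIfExists(file);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String value = "\u542f\u52a8\u5b8c\u6210";
        System.out.println(value.equals(lookup(value))); // true
    }
}
```

Because the fix lives in the loading step, every caller that reads from the bundle gets correctly decoded strings, with no per-method re-decoding.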

Let’s test the results

The third solution is to rely on the IDE. When Eclipse processes .properties files, it automatically converts characters outside the ISO-8859-1 range to \uXXXX format, so with Eclipse you don't need any extra setup and can start Tomcat without any garbling.

As you can see, I have now commented out all of the statements added earlier, so let's run it and see what happens.

Reference links

java – How to use UTF-8 in resource properties with ResourceBundle – Stack Overflow

A record of debugging garbled Chinese console output when starting Tomcat from source, zhoutaoping1992's blog, CSDN