This is the fourth article in a series. The earlier articles are:

Good News for programmers – An introduction to Apache Commons

Good news for Programmers – Apache Commons Lang

Programmer’s Gospel – Apache Commons IO

Apache Commons Codec provides a number of codec-related utility classes. The latest version at the time of writing is 1.15, which requires Java 7 or later.

Maven coordinates are as follows:

<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.15</version>
</dependency>

The following is the overall structure:

Only the commonly used classes are covered here; interested readers can browse the source code for the rest.

01. Binary

The binary package mainly provides codec utility classes for hexadecimal, Base64, and Base32.

1. Hexadecimal (Hex)

Hexadecimal is often used to represent binary data in a compact, readable form. For example, an MD5 digest is 128 bits, which is unwieldy to write out bit by bit but takes only 32 hexadecimal characters. Sample code:

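import java.nio.charset.StandardCharsets;
import org.apache.commons.codec.binary.Hex;

// Convert a byte array to a hexadecimal string
String hex = Hex.encodeHexString("123".getBytes(StandardCharsets.UTF_8));
System.out.println(hex); // 313233
// Decode the hexadecimal string (throws DecoderException on invalid input)
byte[] src = Hex.decodeHex(hex);
System.out.println(new String(src, StandardCharsets.UTF_8)); // 123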
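
To make the "two characters per byte" relationship concrete, here is a minimal JDK-only sketch of hex encoding and decoding. The class `HexSketch` and its methods are hypothetical names for illustration; this is not how Codec's `Hex` class is implemented.

```java
import java.nio.charset.StandardCharsets;

// JDK-only sketch of hex encoding; not Codec's actual implementation.
final class HexSketch {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    // Each byte becomes two characters: high nibble first, then low nibble.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(DIGITS[(b >> 4) & 0x0F]).append(DIGITS[b & 0x0F]);
        }
        return sb.toString();
    }

    // Each pair of characters becomes one byte again.
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex characters");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        String hex = toHex("123".getBytes(StandardCharsets.UTF_8));
        System.out.println(hex); // 313233
        System.out.println(new String(fromHex(hex), StandardCharsets.UTF_8)); // 123
        // A 128-bit (16-byte) value such as an MD5 digest needs 32 hex characters.
        System.out.println(toHex(new byte[16]).length()); // 32
    }
}
```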

2. Base64, Base32, Base16

Base64 is one of the most common encodings used to transmit binary data on the network. Base64 is a method of representing binary data based on 64 printable characters. Base32 uses 32 printable characters, and Base16 uses 16 (which is actually hexadecimal).

Name | Alphabet | Padded when the input runs short?
base16 | Digits 0 to 9 and letters A to F | No, the bit count is always a multiple of 4
base32 | Letters A to Z and digits 2 to 7 | Yes
base64 | Uppercase letters A-Z, lowercase letters a-z, digits 0 to 9, "+" and "/" | Yes

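// Base64 encoding
String base64 = Base64.encodeBase64String("Test".getBytes(StandardCharsets.UTF_8));
System.out.println(base64);
// Base64 decoding
byte[] src = Base64.decodeBase64(base64);
System.out.println(new String(src, StandardCharsets.UTF_8));
// Check whether the string contains only valid Base64 characters
Base64.isBase64(base64);

// Base32 is used through an instance
Base32 base32Codec = new Base32();
String base32 = base32Codec.encodeAsString("Test".getBytes(StandardCharsets.UTF_8));
byte[] src32 = base32Codec.decode(base32);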
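
The padding column of the table is easiest to see with inputs of different lengths. The sketch below uses the JDK's `java.util.Base64` instead of Codec so that it runs without dependencies; `Base64PaddingSketch` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Uses the JDK's java.util.Base64 rather than Codec, to keep the sketch dependency-free.
final class Base64PaddingSketch {
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encode("123")); // MTIz (3 bytes = 24 bits, no padding needed)
        System.out.println(encode("12"));  // MTI= (2 bytes, one padding character)
        System.out.println(encode("1"));   // MQ== (1 byte, two padding characters)
        System.out.println(decode(encode("Test"))); // Test
    }
}
```

Base64 works on groups of 3 bytes (24 bits, four 6-bit characters), which is why only inputs whose length is not a multiple of 3 are padded with "=".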

Codec also provides stream wrappers for the Base family, which encode or decode data as it passes through a stream, as shown in the following example:

// Base64 encoding and decoding as streams
// Note: the Base64 encoding of "123" is "MTIz"

// Base64-encode while reading from the input stream
InputStream is = new ByteArrayInputStream("123".getBytes(StandardCharsets.UTF_8));
Base64InputStream ebis = new Base64InputStream(is, true);
String enc = IOUtils.toString(ebis, "UTF-8"); // MTIz (the default chunking may append a line separator)

// Decode while reading the Base64 data stream
is = new ByteArrayInputStream(enc.getBytes(StandardCharsets.UTF_8));
Base64InputStream dbis = new Base64InputStream(is, false);
String dec = IOUtils.toString(dbis, "UTF-8"); // 123

// -----------------------

// Base64-encode data as it is written to the output stream
final String data = "123";
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Base64OutputStream ebos = new Base64OutputStream(baos, true);
IOUtils.write(data, ebos, "UTF-8");
ebos.close(); // flushes the final encoded group and any padding
String enc2 = baos.toString("UTF-8"); // MTIz (the default chunking may append a line separator)

// Decode Base64 data as it is written to the output stream
baos = new ByteArrayOutputStream();
Base64OutputStream dbos = new Base64OutputStream(baos, false);
IOUtils.write(enc2, dbos, "UTF-8"); // write the encoded text, not the raw data
dbos.close();
String dec2 = baos.toString("UTF-8"); // read from the underlying stream: 123
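
The same streaming idea can be tried without any dependency through the JDK's `Base64.Encoder.wrap` and `Base64.Decoder.wrap`. This is a sketch with hypothetical names (`Base64StreamSketch`, `encodeViaStream`, `decodeViaStream`), not Codec's stream classes; it mainly shows why the encoding stream has to be closed before the result is read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// JDK-only streaming sketch; Codec's Base64InputStream/Base64OutputStream play the same role.
final class Base64StreamSketch {
    static String encodeViaStream(byte[] raw) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the encoder writes the last group and its padding to the sink.
        try (OutputStream enc = Base64.getEncoder().wrap(sink)) {
            enc.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }

    static byte[] decodeViaStream(String base64) {
        InputStream source = new ByteArrayInputStream(base64.getBytes(StandardCharsets.US_ASCII));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream dec = Base64.getDecoder().wrap(source)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = dec.read(buf)) != -1) {
                sink.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String enc = encodeViaStream("12".getBytes(StandardCharsets.UTF_8));
        System.out.println(enc); // MTI=
        System.out.println(new String(decodeViaStream(enc), StandardCharsets.UTF_8)); // 12
    }
}
```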

02. URL

URLs are encoded because some characters in a URL can be ambiguous.

For example, URL parameters are passed as key=value pairs separated by ampersands (&), such as /s?q=abc&ie=utf-8. If a value itself contains = or &, the server receiving the URL will parse it incorrectly, so the ambiguous & and = characters must be escaped, that is, encoded.

URLs are written in ASCII, not Unicode, which means a URL cannot directly contain non-ASCII characters such as Chinese. Otherwise, problems may arise when the client browser and the server support different character sets.

The principle of URL encoding is to represent unsafe characters using safe characters (printable characters with no special purpose or meaning).

The sample codec code is shown below

URLCodec urlCodec = new URLCodec();
// URL encoding (throws EncoderException)
String encUrl = urlCodec.encode("Http://x.com?f=");
System.out.println(encUrl);
// URL decoding (throws DecoderException)
String decUrl = urlCodec.decode(encUrl);
System.out.println(decUrl);
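
To see the ambiguity problem directly, the following JDK-only sketch encodes a value that contains & and =, as well as a Chinese character. It uses `java.net.URLEncoder` and `java.net.URLDecoder` rather than Codec's `URLCodec`; `UrlEncodeSketch` and `buildQuery` are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// JDK-only sketch of percent-encoding query parameters.
final class UrlEncodeSketch {
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Encodes key and value separately so that only the literal "=" separates them.
    static String buildQuery(String key, String value) {
        return encode(key) + "=" + encode(value);
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("q", "a&b=c")); // q=a%26b%3Dc
        System.out.println(encode("\u4e2d"));         // %E4%B8%AD (the UTF-8 bytes of the character)
        System.out.println(decode("a%26b%3Dc"));      // a&b=c
    }
}
```

Each key and value is encoded on its own; encoding the whole query string at once would also escape the separators that are meant to stay literal.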

03. Digest algorithms

A digest algorithm is a one-way hash algorithm with the following properties.

  • The input length is arbitrary

  • The output length is fixed

  • For every given input, it is easy to calculate the output

  • Irreversible, unable to deduce the original data from the output

  • The output changes unpredictably with the input. That is, changing a single byte of the input produces a very different result
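
Two of the properties above, fixed output length and sensitivity to small input changes, can be observed with a short JDK-only sketch. SHA-256 is chosen here only as an example; `DigestPropertiesSketch` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Demonstrates fixed output length and the effect of a one-character change.
final class DigestPropertiesSketch {
    static byte[] sha256(String text) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    // Counts the bit positions in which two equal-length digests differ.
    static int differingBits(byte[] a, byte[] b) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            count += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return count;
    }

    public static void main(String[] args) {
        // The output is 32 bytes regardless of the input length.
        System.out.println(sha256("").length);
        System.out.println(sha256("a much longer input than the empty string").length);
        // Inputs that differ in one character give digests that differ in many bits.
        System.out.println(differingBits(sha256("Test"), sha256("Tesu")) + " of 256 bits differ");
    }
}
```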

Because of these properties, digest algorithms are mainly used for data integrity checks. For example, a resource published on the Internet often comes with a digest value (typically MD5). After downloading the resource, the user can compute its MD5 value with a tool and compare it with the published one. If the values differ, the file is incomplete or has been altered.
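
The download check described above might look like the following JDK-only sketch. `IntegrityCheckSketch`, `md5Hex`, and `matches` are hypothetical names, and the in-memory stream stands in for a downloaded file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Compares the MD5 of downloaded content with the published value.
final class IntegrityCheckSketch {
    // Reads the stream in chunks so large files never have to fit in memory.
    static String md5Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest()) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    static boolean matches(InputStream download, String publishedMd5) {
        return md5Hex(download).equalsIgnoreCase(publishedMd5);
    }

    public static void main(String[] args) {
        String published = "202cb962ac59075b964b07152d234b70"; // MD5 of "123"
        InputStream intact = new ByteArrayInputStream("123".getBytes(StandardCharsets.UTF_8));
        InputStream damaged = new ByteArrayInputStream("12".getBytes(StandardCharsets.UTF_8));
        System.out.println(matches(intact, published));  // true
        System.out.println(matches(damaged, published)); // false
    }
}
```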

Digests can also serve as data fingerprints. The "instant upload" feature of cloud drives, for example, relies on the digest value. Before uploading a file, the client computes its digest and sends it to the server. If the server already holds a file with the same digest, it treats the contents as identical, so the file does not need to be uploaded: the server simply points the new entry at the existing file's storage path. This makes the upload instantaneous and also saves server disk space, because files with the same content owned by different users all point to the same stored file.
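
As a rough illustration of the instant-upload idea, the sketch below keeps an in-memory map from digest to storage path. Everything in it is an assumption made for illustration (the class `InstantUploadSketch`, the choice of SHA-256, the path strings); a real service would also persist the index and handle concurrency.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Toy server-side index: digest of the content -> path where that content is stored.
final class InstantUploadSketch {
    private final Map<String, String> pathByDigest = new HashMap<>();

    static String fingerprint(byte[] content) {
        try {
            StringBuilder sb = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-256").digest(content)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    // Returns the path that holds the content; new content is recorded under requestedPath,
    // known content reuses the existing path and nothing new is stored.
    String upload(String requestedPath, byte[] content) {
        return pathByDigest.computeIfAbsent(fingerprint(content), digest -> requestedPath);
    }

    int storedFiles() {
        return pathByDigest.size();
    }

    public static void main(String[] args) {
        InstantUploadSketch server = new InstantUploadSketch();
        byte[] movie = "same bytes".getBytes(StandardCharsets.UTF_8);
        System.out.println(server.upload("/store/alice/movie", movie)); // /store/alice/movie
        System.out.println(server.upload("/store/bob/movie", movie));   // /store/alice/movie
        System.out.println(server.storedFiles());                        // 1
    }
}
```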

Many systems also store passwords as MD5 hashes, which is not secure. The MD5 values of many common inputs are publicly known, so such hashes are easy to crack with rainbow tables. MD5 is therefore not recommended for storing passwords; the BCrypt algorithm is recommended instead.
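
BCrypt is not part of the JDK or of Commons Codec, so the sketch below substitutes PBKDF2 (`PBKDF2WithHmacSHA256`), which the JDK does ship, to illustrate the same idea: a per-password random salt plus a deliberately slow hash. The class name, the iteration count, and the key length are illustrative assumptions, not recommendations from the article.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Salted, slow password hashing with PBKDF2 as a JDK-only stand-in for BCrypt.
final class PasswordHashSketch {
    private static final int ITERATIONS = 100_000; // illustrative value only
    private static final int KEY_BITS = 256;

    // A fresh random salt per password defeats precomputed (rainbow) tables.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    static byte[] hash(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2WithHmacSHA256 not available", e);
        } finally {
            spec.clearPassword();
        }
    }

    // Constant-time comparison of the recomputed hash with the stored one.
    static boolean verify(char[] password, byte[] salt, byte[] storedHash) {
        return MessageDigest.isEqual(hash(password, salt), storedHash);
    }

    public static void main(String[] args) {
        byte[] salt = newSalt();
        byte[] stored = hash("s3cret".toCharArray(), salt);
        System.out.println(verify("s3cret".toCharArray(), salt, stored)); // true
        System.out.println(verify("guess".toCharArray(), salt, stored));  // false
    }
}
```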

The main families of algorithms are as follows:

  • MD (Message Digest)

  • SHA (Secure Hash Algorithm)

  • MAC (Message Authentication Code)

1. MD series

The common members are MD2, MD4, and MD5.

With Java's built-in API, computing an MD5 hex digest takes more than ten lines of code. With Codec it takes one:

// MD5 digest as a hex string; overloads accept a String, a byte array, or an InputStream
String md5 = DigestUtils.md5Hex("Test");
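
For comparison, a JDK-only version of the same MD5 hex digest might look like the sketch below (`JdkMd5Sketch` is a hypothetical name). It shows the boilerplate that the Codec one-liner hides: obtaining the algorithm, handling the checked exception, and converting the bytes to hex.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// MD5 hex digest using only java.security.MessageDigest.
final class JdkMd5Sketch {
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // two lowercase hex characters per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("123")); // 202cb962ac59075b964b07152d234b70
    }
}
```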

2. SHA series

The SHA series includes SHA-1, SHA-224, SHA-256, SHA-384, SHA-512, SHA3-224, SHA3-256, SHA3-384, and SHA3-512. For security reasons, SHA-256 or stronger is the usual choice today, and SHA-384 or stronger is recommended. Of course, the longer the digest, the longer it takes to compute.

// Overloads accept a String, a byte array, or an InputStream (the sha3_* methods need Java 9 or later)
String sha1 = DigestUtils.sha1Hex("Test");
String sha256 = DigestUtils.sha256Hex("Test");
String sha384 = DigestUtils.sha384Hex("Test");
String sha512 = DigestUtils.sha512Hex("Test");
String sha3_256 = DigestUtils.sha3_256Hex("Test");
String sha3_384 = DigestUtils.sha3_384Hex("Test");
String sha3_512 = DigestUtils.sha3_512Hex("Test");
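
The number in each algorithm name is the digest length in bits (SHA-1 produces 160 bits). A small JDK-only sketch, with the hypothetical name `ShaLengthSketch`, prints the resulting hex lengths:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Prints how many hex characters each SHA variant produces for the same input.
final class ShaLengthSketch {
    static int hexLength(String algorithm, String text) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm).digest(text.getBytes(StandardCharsets.UTF_8));
            return digest.length * 2; // two hex characters per byte
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " not available", e);
        }
    }

    public static void main(String[] args) {
        for (String algorithm : new String[] {"SHA-1", "SHA-256", "SHA-384", "SHA-512"}) {
            System.out.println(algorithm + ": " + hexLength(algorithm, "Test") + " hex characters");
        }
        // SHA-1: 40, SHA-256: 64, SHA-384: 96, SHA-512: 128
    }
}
```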

3. The HMAC series

HMAC (keyed-hash message authentication code) algorithms are hash algorithms that take a key. They build on the MD and SHA families of message digest algorithms:

MD series: HMacMD2, HMacMD4, HMacMD5

SHA series: HMacSHA1, HMacSHA224, HMacSHA256, HMacSHA384, HMacSHA512

String key = "asdf3234asdf3234asdf3234asdf3234";
// The value to digest may also be a byte array, a stream, a file, etc.
String valueToDigest = "Test data";
// Compute an HMAC-MD5 digest
String hmacMd5 = new HmacUtils(HmacAlgorithms.HMAC_MD5, key).hmacHex(valueToDigest);
// Compute HMAC-SHA digests
String hmacSha256 = new HmacUtils(HmacAlgorithms.HMAC_SHA_256, key).hmacHex(valueToDigest);
String hmacSha384 = new HmacUtils(HmacAlgorithms.HMAC_SHA_384, key).hmacHex(valueToDigest);
String hmacSha512 = new HmacUtils(HmacAlgorithms.HMAC_SHA_512, key).hmacHex(valueToDigest);
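
What distinguishes an HMAC from a plain digest is the key: the same message yields a different result under a different key. The JDK-only sketch below uses `javax.crypto.Mac` to show this; `HmacSketch` and `hmacHex` are hypothetical names, and the keys are placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Keyed digest with javax.crypto.Mac; algorithm names follow the JDK, e.g. "HmacSHA256".
final class HmacSketch {
    static String hmacHex(String algorithm, String key, String message) {
        try {
            Mac mac = Mac.getInstance(algorithm);
            mac.init(new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), algorithm));
            StringBuilder sb = new StringBuilder();
            for (byte b : mac.doFinal(message.getBytes(StandardCharsets.UTF_8))) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(algorithm + " not available or key rejected", e);
        }
    }

    public static void main(String[] args) {
        String withKeyA = hmacHex("HmacSHA256", "key-a", "Test data");
        String withKeyB = hmacHex("HmacSHA256", "key-b", "Test data");
        System.out.println(withKeyA.length());         // 64 hex characters
        System.out.println(withKeyA.equals(withKeyB)); // false: the key changes the result
    }
}
```

Only a party that knows the key can recompute the value, which is why HMAC is used to authenticate messages rather than merely to detect accidental corruption.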

4. The command line

The Codec package also provides a command-line entry point for the digest algorithms.

java -cp ./commons-codec-1.15.jar org.apache.commons.codec.cli.Digest MD5 123

04. Conclusion

Besides the utility classes described above, Codec contains other, less commonly used ones that are not covered here. Interested readers can browse the source code.

Stay tuned as I continue to introduce other useful utility libraries from the Commons family.