This is the fifth article in a series. The earlier articles are:

Good News for Programmers – An Introduction to Apache Commons

Good News for Programmers – Apache Commons Lang

Good News for Programmers – Apache Commons IO

Good News for Programmers – Apache Commons Codec

Apache Commons Compress provides utility classes for working with compressed and archived files. At the time of writing, the latest version of Compress is 1.21, and it requires Java 8 or higher.

Maven coordinates are as follows:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.21</version>
</dependency>

The overall package structure is as follows:

org.apache.commons.compress

org.apache.commons.compress.archivers

org.apache.commons.compress.changes

org.apache.commons.compress.compressors

org.apache.commons.compress.parallel

org.apache.commons.compress.utils

org.apache.commons.compress.harmony

Only the most commonly used parts are covered here. If you are interested in the rest, you can browse the source code.

01. Compression

Compression: using an algorithm to reduce the amount of space a file occupies

Decompression: restoring the file with the corresponding inverse algorithm

Compress comes with many compression-related classes. The main ones are as follows:

GzipCompressorOutputStream: compresses “*.gz” files

GzipCompressorInputStream: decompresses “*.gz” files

BZip2CompressorOutputStream: compresses “*.bz2” files

BZip2CompressorInputStream: decompresses “*.bz2” files

XZCompressorOutputStream: compresses “*.xz” files

XZCompressorInputStream: decompresses “*.xz” files

FramedLZ4CompressorOutputStream: compresses “*.lz4” files

FramedLZ4CompressorInputStream: decompresses “*.lz4” files

BlockLZ4CompressorOutputStream: compresses “*.block_lz4” files

BlockLZ4CompressorInputStream: decompresses “*.block_lz4” files

Pack200CompressorOutputStream: compresses “*.pack” files

Pack200CompressorInputStream: decompresses “*.pack” files

DeflateCompressorOutputStream: compresses “*.deflate” files

DeflateCompressorInputStream: decompresses “*.deflate” files

LZMACompressorOutputStream: compresses “*.lzma” files

LZMACompressorInputStream: decompresses “*.lzma” files

FramedSnappyCompressorOutputStream: compresses “*.sz” files

FramedSnappyCompressorInputStream: decompresses “*.sz” files

ZCompressorInputStream: decompresses “*.Z” files

Here are two quick examples.

1. gzip

Gzip is a common compression tool on Unix and Linux, and it is widely used to compress content served by web sites. It has options such as the compression level, which can be set through GzipParameters. The JDK also ships its own GZIPInputStream and GZIPOutputStream classes, which are used in a similar way.

// gzip compression
String file = "/test.js";
GzipParameters parameters = new GzipParameters();
parameters.setCompressionLevel(Deflater.BEST_COMPRESSION);
// 3 is the operating system code for Unix in the gzip header
parameters.setOperatingSystem(3);
parameters.setFilename(FilenameUtils.getName(file));
parameters.setComment("Test file");
parameters.setModificationTime(System.currentTimeMillis());
// Open every stream in the try header so that all of them are closed on failure
try (InputStream is = new FileInputStream(file);
     OutputStream fos = new FileOutputStream(file + ".gz");
     GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(fos, parameters)) {
    IOUtils.copy(is, gzos);
}
// gzip decompression
String gzFile = "/test.js.gz";
try (InputStream is = new FileInputStream(gzFile);
     GzipCompressorInputStream gis = new GzipCompressorInputStream(is)) {
    // Metadata stored in the gzip header (file name, comment, modification time)
    GzipParameters p = gis.getMetaData();
    File targetFile = new File("/test.js");
    FileUtils.copyToFile(gis, targetFile);
    targetFile.setLastModified(p.getModificationTime());
}
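For comparison, the JDK's own gzip classes mentioned above can be sketched as a small in-memory round trip. This is only an illustration: the class name JdkGzipRoundTrip and its helper methods are made up for this example, and it uses java.util.zip rather than Commons Compress, so header fields such as the file name or comment cannot be set here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of the JDK or Commons Compress
final class JdkGzipRoundTrip {

    // Compress a byte array with the JDK's GZIPOutputStream
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzos = new GZIPOutputStream(bos)) {
            gzos.write(data);
        }
        return bos.toByteArray();
    }

    // Decompress a gzip byte array with the JDK's GZIPInputStream
    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gzis = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int len;
            while ((len = gzis.read(buf)) != -1) {
                bos.write(buf, 0, len);
            }
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "console.log('hello');".getBytes(StandardCharsets.UTF_8);
        byte[] restored = gunzip(gzip(original));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

The Commons Compress classes are the better fit when you need the gzip metadata or want to switch formats without changing the surrounding code.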

2. bz2

bz2 is a common compressed file format on Linux. Files with the “.bz2” suffix are produced by bzip2, a compression tool with a high compression ratio.

// bz2 compression
String srcFile = "/test.tar";
String targetFile = "/test.tar.bz2";
try (InputStream is = new FileInputStream(srcFile);
     OutputStream os = new FileOutputStream(targetFile);
     BZip2CompressorOutputStream bzos = new BZip2CompressorOutputStream(os)) {
    IOUtils.copy(is, bzos);
}
// bz2 decompression
String bzFile = "/test.tar.bz2";
try (InputStream is = new FileInputStream(bzFile);
     BZip2CompressorInputStream bzis = new BZip2CompressorInputStream(is)) {
    File targetFile = new File("/test.tar");
    FileUtils.copyToFile(bzis, targetFile);
}

The other compression algorithms are used in much the same way as bz2, so I won’t give code examples for them here.

02. Archive

Archiving: combining many scattered files into a single file whose total size stays roughly the same

Unpacking: extracting the files from an archive

Compress comes with many archiving-related classes:

TarArchiveOutputStream: archives “*.tar” files

TarArchiveInputStream: unpacks “*.tar” files

ZipArchiveOutputStream: archives and compresses “*.zip” files

ZipArchiveInputStream: unpacks and decompresses “*.zip” files

JarArchiveOutputStream: archives and compresses “*.jar” files

JarArchiveInputStream: unpacks and decompresses “*.jar” files

DumpArchiveInputStream: unpacks “*.dump” files (the dump format is read-only, there is no matching output stream)

CpioArchiveOutputStream: archives “*.cpio” files

CpioArchiveInputStream: unpacks “*.cpio” files

ArArchiveOutputStream: archives “*.ar” files

ArArchiveInputStream: unpacks “*.ar” files

ArjArchiveInputStream: unpacks “*.arj” files (read-only as well)

SevenZOutputFile: archives and compresses “*.7z” files

SevenZFile: unpacks and decompresses “*.7z” files

Among them, ZIP, JAR and 7z support both archiving and compression, so the data can be compressed while it is being archived.

Because an archive holds many individual files, there is the concept of an ArchiveEntry: each ArchiveEntry represents one directory or file inside the archive. Let’s take a quick look at some examples.

1. tar

tar is a commonly used archiving tool on Unix and Linux systems. It combines multiple files into a single file with the “.tar” suffix and does not compress anything by itself.

// tar archiving
public void tar() throws IOException {
    File srcDir = new File("/test");
    String targetFile = "/test.tar";
    try (TarArchiveOutputStream tos = new TarArchiveOutputStream(
            new FileOutputStream(targetFile))) {
        tarRecursive(tos, srcDir, "");
    }
}

// Recursively archive the files and directories under a directory
private void tarRecursive(TarArchiveOutputStream tos, File srcFile, String basePath) throws IOException {
    if (srcFile.isDirectory()) {
        File[] files = srcFile.listFiles();
        String nextBasePath = basePath + srcFile.getName() + "/";
        if (ArrayUtils.isEmpty(files)) {
            // Empty directory: write an entry without any content
            TarArchiveEntry entry = new TarArchiveEntry(srcFile, nextBasePath);
            tos.putArchiveEntry(entry);
            tos.closeArchiveEntry();
        } else {
            for (File file : files) {
                tarRecursive(tos, file, nextBasePath);
            }
        }
    } else {
        TarArchiveEntry entry = new TarArchiveEntry(srcFile, basePath + srcFile.getName());
        tos.putArchiveEntry(entry);
        FileUtils.copyFile(srcFile, tos);
        tos.closeArchiveEntry();
    }
}
// tar unpacking
public void untar() throws IOException {
    String outPath = "/test";
    try (InputStream is = new FileInputStream("/test.tar");
         TarArchiveInputStream tis = new TarArchiveInputStream(is)) {
        TarArchiveEntry nextEntry;
        while ((nextEntry = tis.getNextTarEntry()) != null) {
            String name = nextEntry.getName();
            File file = new File(outPath, name);
            if (nextEntry.isDirectory()) {
                // If it is a directory, create it together with any missing parents
                file.mkdirs();
            } else {
                // Write the file to its target path
                FileUtils.copyToFile(tis, file);
                file.setLastModified(nextEntry.getLastModifiedDate().getTime());
            }
        }
    }
}
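One thing the unpacking code above does not do is check where an entry name ends up. An archive from an untrusted source may contain names such as “../evil.txt”, which would be written outside the output directory. A minimal sketch of such a check, using only the JDK, might look like this; SafeEntryPath and resolve are hypothetical names chosen for illustration and are not part of Commons Compress.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, not part of Commons Compress
final class SafeEntryPath {

    // Resolve an entry name against the output directory and reject
    // names that would end up outside of that directory
    static Path resolve(Path outDir, String entryName) throws IOException {
        Path base = outDir.toAbsolutePath().normalize();
        Path target = base.resolve(entryName).normalize();
        if (!target.startsWith(base)) {
            throw new IOException("Entry is outside of the target directory: " + entryName);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path outDir = Paths.get("/test");
        // A normal entry resolves to a path below the output directory
        System.out.println(resolve(outDir, "dir/1.txt"));
        try {
            resolve(outDir, "../evil.txt");
        } catch (IOException e) {
            // A traversal attempt is rejected
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

In the loop above, the result of resolve(Paths.get(outPath), name).toFile() could be used in place of new File(outPath, name).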

2. 7z

7z is a relatively new compression format with a very high compression ratio.

Main features of the 7Z format:

  • Open structure

  • High compression ratio

  • Powerful AES-256 encryption

  • Can be combined with any compression, conversion or encryption method

  • Supports files of up to 16000000000 GB

  • Unicode file names

  • Supports solid compression

  • Supports compression of the archive headers

// 7z compression
public void _7z() throws IOException {
    try (SevenZOutputFile outputFile = new SevenZOutputFile(new File("/test.7z"))) {
        File srcFile = new File("/test");
        _7zRecursive(outputFile, srcFile, "");
    }
}

// Recursively compress the files and directories under a directory
private void _7zRecursive(SevenZOutputFile _7zFile, File srcFile, String basePath) throws IOException {
    if (srcFile.isDirectory()) {
        File[] files = srcFile.listFiles();
        String nextBasePath = basePath + srcFile.getName() + "/";
        if (ArrayUtils.isEmpty(files)) {
            // Empty directory: write an entry without any content
            SevenZArchiveEntry entry = _7zFile.createArchiveEntry(srcFile, nextBasePath);
            _7zFile.putArchiveEntry(entry);
            _7zFile.closeArchiveEntry();
        } else {
            for (File file : files) {
                _7zRecursive(_7zFile, file, nextBasePath);
            }
        }
    } else {
        SevenZArchiveEntry entry = _7zFile.createArchiveEntry(srcFile, basePath + srcFile.getName());
        _7zFile.putArchiveEntry(entry);
        byte[] bs = FileUtils.readFileToByteArray(srcFile);
        _7zFile.write(bs);
        _7zFile.closeArchiveEntry();
    }
}
// 7z decompression
public void un7z() throws IOException {
    String outPath = "/test";
    try (SevenZFile archive = new SevenZFile(new File("/test.7z"))) {
        SevenZArchiveEntry entry;
        while ((entry = archive.getNextEntry()) != null) {
            File file = new File(outPath, entry.getName());
            if (entry.isDirectory()) {
                file.mkdirs();
            }
            if (entry.hasStream()) {
                // Read the content of the current entry into memory, then write it out
                final byte[] buf = new byte[1024];
                final ByteArrayOutputStream baos = new ByteArrayOutputStream();
                for (int len; (len = archive.read(buf)) > 0;) {
                    baos.write(buf, 0, len);
                }
                FileUtils.writeByteArrayToFile(file, baos.toByteArray());
            }
        }
    }
}

3. ar, arj, cpio, dump, zip, jar

The classes for these formats are used in much the same way as the tar classes, so I won’t give an example for each of them.
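To give a feel for how closely the other formats follow the tar pattern, here is a minimal in-memory zip sketch. The class name ZipRoundTrip and its methods are hypothetical and exist only for this illustration; it assumes commons-compress is on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipArchiveInputStream;
import org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream;

// Hypothetical helper class, for illustration only
final class ZipRoundTrip {

    // Write a single entry into a zip archive held in memory
    static byte[] zip(String entryName, byte[] content) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipArchiveOutputStream zos = new ZipArchiveOutputStream(bos)) {
            ZipArchiveEntry entry = new ZipArchiveEntry(entryName);
            zos.putArchiveEntry(entry);
            zos.write(content);
            zos.closeArchiveEntry();
        }
        return bos.toByteArray();
    }

    // Collect the names of all entries in a zip archive
    static List<String> listEntries(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipArchiveInputStream zis =
                     new ZipArchiveInputStream(new ByteArrayInputStream(zipBytes))) {
            ArchiveEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        byte[] zipBytes = zip("dir/a.txt", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(listEntries(zipBytes));
    }
}
```

The putArchiveEntry, write and closeArchiveEntry sequence is the same one used in the tar example; only the stream and entry classes change.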

03. Modify the archive file

Sometimes we need to modify the files inside an archive, for example to add or delete a file or to change a file’s content. Of course, we could unpack everything, make the changes and pack it all up again, but that takes more code, and with large archives it also takes a long time. Is there a way to modify the contents of an archive directly in code?

The org.apache.commons.compress.changes package provides classes for exactly this purpose. Let’s look at a simple example.

String tarFile = "/test.tar";
// Write the result to a temporary file first: opening a FileOutputStream on
// test.tar itself would truncate the archive before it has been read
String tmpFile = tarFile + ".tmp";
File addFile = new File("/a.txt");
try (InputStream is = new FileInputStream(tarFile);
     OutputStream os = new FileOutputStream(tmpFile);
     InputStream addStream = new FileInputStream(addFile);
     TarArchiveInputStream tais = new TarArchiveInputStream(is);
     TarArchiveOutputStream taos = new TarArchiveOutputStream(os)) {
    ChangeSet changes = new ChangeSet();
    // Delete "dir/1.txt" from "test.tar"
    changes.delete("dir/1.txt");
    // Delete the "t" directory and everything under it from "test.tar"
    changes.deleteDir("t");
    // Add the file, or replace it if it already exists
    ArchiveEntry addEntry = taos.createArchiveEntry(addFile, addFile.getName());
    // add() accepts a third argument: true replaces an existing entry (default), false keeps it
    changes.add(addEntry, addStream);
    // Perform the modification
    ChangeSetPerformer performer = new ChangeSetPerformer(changes);
    ChangeSetResults result = performer.perform(tais, taos);
}
// Replace the original test.tar once all streams have been closed
Files.move(Paths.get(tmpFile), Paths.get(tarFile), StandardCopyOption.REPLACE_EXISTING);
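The ChangeSetResults object returned by perform reports what actually happened to each entry. The following in-memory sketch shows one way to inspect it. ChangeSetDemo, tarOf and apply are hypothetical names made up for this illustration, and the entry names are invented test data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.changes.ChangeSet;
import org.apache.commons.compress.changes.ChangeSetPerformer;
import org.apache.commons.compress.changes.ChangeSetResults;

// Hypothetical helper class, for illustration only
final class ChangeSetDemo {

    // Build a small tar archive in memory; each entry contains its own name as content
    static byte[] tarOf(String... names) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (TarArchiveOutputStream tos = new TarArchiveOutputStream(bos)) {
            for (String name : names) {
                byte[] content = name.getBytes(StandardCharsets.UTF_8);
                TarArchiveEntry entry = new TarArchiveEntry(name);
                entry.setSize(content.length);
                tos.putArchiveEntry(entry);
                tos.write(content);
                tos.closeArchiveEntry();
            }
        }
        return bos.toByteArray();
    }

    // Delete "dir/1.txt", add "a.txt", and return the report of what was done
    static ChangeSetResults apply(byte[] tar, ByteArrayOutputStream out) throws IOException {
        try (TarArchiveInputStream tais = new TarArchiveInputStream(new ByteArrayInputStream(tar));
             TarArchiveOutputStream taos = new TarArchiveOutputStream(out)) {
            ChangeSet changes = new ChangeSet();
            changes.delete("dir/1.txt");
            byte[] added = "new content".getBytes(StandardCharsets.UTF_8);
            TarArchiveEntry entry = new TarArchiveEntry("a.txt");
            entry.setSize(added.length);
            changes.add(entry, new ByteArrayInputStream(added));
            return new ChangeSetPerformer(changes).perform(tais, taos);
        }
    }

    public static void main(String[] args) throws IOException {
        ChangeSetResults results = apply(tarOf("dir/1.txt", "dir/2.txt"), new ByteArrayOutputStream());
        System.out.println("Deleted: " + results.getDeleted());
        System.out.println("Added from change set: " + results.getAddedFromChangeSet());
        System.out.println("Copied from original: " + results.getAddedFromStream());
    }
}
```

Checking these lists after perform is a simple way to confirm that a delete really matched an entry name in the archive.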

04. Other

1. Simple factory

Commons Compress also provides simple factory classes for obtaining compression and archive streams dynamically.

// Use the factory to obtain archive streams dynamically
ArchiveStreamFactory factory = new ArchiveStreamFactory();
String archiveName = ArchiveStreamFactory.TAR;
InputStream is = new FileInputStream("/in.tar");
OutputStream os = new FileOutputStream("/out.tar");
// The implementation class is chosen dynamically: here ais is actually a
// TarArchiveInputStream and aos is a TarArchiveOutputStream
ArchiveInputStream ais = factory.createArchiveInputStream(archiveName, is);
ArchiveOutputStream aos = factory.createArchiveOutputStream(archiveName, os);
// Other business operations

// ------------------------

// Use the factory to obtain compressor streams dynamically
CompressorStreamFactory factory = new CompressorStreamFactory();
String compressName = CompressorStreamFactory.GZIP;
InputStream is = new FileInputStream("/in.gz");
OutputStream os = new FileOutputStream("/out.gz");
// The implementation class is chosen dynamically: here cis is actually a
// GzipCompressorInputStream and cos is a GzipCompressorOutputStream
CompressorInputStream cis = factory.createCompressorInputStream(compressName, is);
CompressorOutputStream cos = factory.createCompressorOutputStream(compressName, os);
// Other business operations
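The factories can also guess the format from the first bytes of a stream, which helps when the file suffix cannot be trusted. Below is a minimal in-memory sketch of this. DetectCompressor and its methods are hypothetical names for illustration, and the example sticks to gzip and bzip2 because both are implemented inside commons-compress itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import org.apache.commons.compress.compressors.CompressorException;
import org.apache.commons.compress.compressors.CompressorInputStream;
import org.apache.commons.compress.compressors.CompressorStreamFactory;

// Hypothetical helper class, for illustration only
final class DetectCompressor {

    // Compress with the format selected by name, e.g. CompressorStreamFactory.GZIP
    static byte[] compress(String name, byte[] data) throws IOException, CompressorException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (OutputStream cos = new CompressorStreamFactory().createCompressorOutputStream(name, bos)) {
            cos.write(data);
        }
        return bos.toByteArray();
    }

    // Detect the format from the stream signature. The stream must support mark/reset:
    // a ByteArrayInputStream does, a FileInputStream needs a BufferedInputStream wrapper
    static String detect(byte[] compressed) throws CompressorException {
        return CompressorStreamFactory.detect(new ByteArrayInputStream(compressed));
    }

    // Decompress without naming the format; the factory detects it
    static byte[] decompress(byte[] compressed) throws IOException, CompressorException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (CompressorInputStream cis = new CompressorStreamFactory()
                .createCompressorInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int len;
            while ((len = cis.read(buf)) != -1) {
                bos.write(buf, 0, len);
            }
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException, CompressorException {
        byte[] data = "some text to compress".getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(CompressorStreamFactory.BZIP2, data);
        System.out.println("Detected format: " + detect(packed));
        System.out.println(new String(decompress(packed), StandardCharsets.UTF_8));
    }
}
```

ArchiveStreamFactory offers the same kind of detection for archives through its createArchiveInputStream(InputStream) overload, again for streams that support mark/reset.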

2. Decompressing and unpacking together

Most of the examples above perform a single operation, but how do we handle a file such as “test.tar.gz”, which is both archived and compressed?

It is actually very simple. We don’t need to decompress first and unpack afterwards: both can be done at the same time by wrapping the corresponding streams (I have to say the decorator pattern behind Java IO is a really clever design). Let’s look at a code example.

// Decompress and unpack the test.tar.gz file
String outPath = "/test";
// Decompression comes first, so wrap the file stream with a gzip stream,
// then wrap the gzip stream with a tar stream to unpack it
try (InputStream is = new FileInputStream("/test.tar.gz");
     CompressorInputStream gis = new GzipCompressorInputStream(is);
     ArchiveInputStream tgis = new TarArchiveInputStream(gis)) {
    ArchiveEntry nextEntry;
    while ((nextEntry = tgis.getNextEntry()) != null) {
        String name = nextEntry.getName();
        File file = new File(outPath, name);
        if (nextEntry.isDirectory()) {
            // If it is a directory, create it together with any missing parents
            file.mkdirs();
        } else {
            // Write the file to its target path
            FileUtils.copyToFile(tgis, file);
            file.setLastModified(nextEntry.getLastModifiedDate().getTime());
        }
    }
}
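The opposite direction works the same way: to create a “.tar.gz”, wrap the gzip output stream with a tar output stream. Here is a minimal in-memory sketch. TarGzRoundTrip and its methods are hypothetical names for illustration, and the entry name is invented test data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

// Hypothetical helper class, for illustration only
final class TarGzRoundTrip {

    // Archive one entry with tar and compress the result with gzip in a single pass
    static byte[] tarGz(String entryName, byte[] content) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // The tar stream writes into the gzip stream, which writes into the byte array
        try (GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bos);
             TarArchiveOutputStream tos = new TarArchiveOutputStream(gzos)) {
            TarArchiveEntry entry = new TarArchiveEntry(entryName);
            entry.setSize(content.length);
            tos.putArchiveEntry(entry);
            tos.write(content);
            tos.closeArchiveEntry();
        }
        return bos.toByteArray();
    }

    // Decompress and unpack in a single pass, returning entry name -> text content
    static Map<String, String> read(byte[] tarGz) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (TarArchiveInputStream tis = new TarArchiveInputStream(
                new GzipCompressorInputStream(new ByteArrayInputStream(tarGz)))) {
            ArchiveEntry entry;
            while ((entry = tis.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int len;
                // read() returns -1 at the end of the current entry
                while ((len = tis.read(buf)) != -1) {
                    content.write(buf, 0, len);
                }
                result.put(entry.getName(), new String(content.toByteArray(), StandardCharsets.UTF_8));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] packed = tarGz("dir/hello.txt", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(read(packed));
    }
}
```

Note the order of the wrappers: when writing, tar is the outermost stream. When reading, gzip is applied first and tar wraps it.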

05. Conclusion

Besides the utility classes described above, there are other, less commonly used ones that I won’t cover here. If you are interested, you can browse the source code.

Stay tuned as I continue to introduce other useful utility libraries from Apache Commons.