“This is the second day of my participation in the First Challenge 2022. For details, see: First Challenge 2022”.

Hello, everyone. I am Huaijin Shake Yu, a big data newbie and the father of two little “gold-swallowing beasts”, Jia and Jia: an all-round dad who can both write code and teach.

If you like my article, please [follow ⭐] + [like 👍] + [comment 📃]. Your support is my motivation, and I look forward to growing together with you ~

preface

Hive can compress both intermediate and final output data to improve throughput and performance.

Generally, a format with a high compression ratio produces smaller files but compresses and decompresses more slowly, and vice versa. The commonly used compression formats are GZIP, BZIP2, and Snappy.
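
To make the ratio-versus-speed trade-off concrete, here is a minimal sketch that uses the JDK's java.util.zip.Deflater at its fastest and its strongest level. It is an analogy under stated assumptions: two DEFLATE levels stand in for different codecs (this is not GZIP, BZIP2, or Snappy themselves), the sample data is made up, and LevelDemo is a hypothetical class name. Exact sizes and timings will vary with the data and the machine.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class LevelDemo {
    // Compress the input with the given Deflater level and return the compressed bytes.
    static byte[] deflate(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up, JSON-like lines, similar in spirit to a TEXTFILE of JSON records.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append("{\"id\":").append(i).append(",\"name\":\"user").append(i % 100).append("\"}\n");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        // Level 1 favours speed, level 9 favours compression ratio.
        for (int level : new int[] {Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION}) {
            long start = System.nanoTime();
            byte[] packed = deflate(raw, level);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("level %d: %d -> %d bytes in %d us%n", level, raw.length, packed.length, micros);
        }
    }
}
```

Run it and compare the two printed lines; typically the higher level gives a smaller output at a higher CPU cost.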

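Compression format	Compressed size	Compression speed	Splittable
GZIP	Medium	Medium	No
BZIP2	Small	Slow	Yes
Snappy	Large	Fast	No (only inside a container format)

Snappy is one of the most commonly used codecs for big data storage because it compresses and decompresses quickly. A Snappy-compressed plain text file cannot be split, so Snappy is usually paired with a splittable container format such as SequenceFile, ORC, or Parquet.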

The cause

While running a batch job today, I hit an error when reading a Snappy file on HDFS. The table uses org.apache.hadoop.hive.serde2.JsonSerDe to parse JSON data stored as TEXTFILE with Snappy compression, and I simply wanted to see the actual content inside the Snappy file.

Solution

1. View the file content on the command line

hadoop fs -text /XXX/XXX.snappy

The -text command decompresses the file and prints its content to the console; you can also redirect the output to a local file.

The drawback is that this makes more complex processing, or producing statistics, inconvenient.

2. Parse the file in code

Parsing the file in code lets you read and process the data, output statistics, or locate abnormal records directly.
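
As one way to locate abnormal data, the sketch below scans a decompressed stream line by line and reports suspicious records. It assumes one JSON record per line (as in the TEXTFILE JSON table described earlier) and uses a crude check of my own, not the JsonSerDe logic: a line is flagged if it does not start with { and end with }. In the Hadoop case the InputStream would be the one returned by codec.createInputStream(...); here a ByteArrayInputStream stands in so the example runs with the JDK alone. BadRecordFinder and findSuspectLines are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BadRecordFinder {
    // Returns 1-based line numbers of lines that do not look like a single JSON object.
    // Crude heuristic (assumption): a good record starts with '{' and ends with '}'.
    // Blank lines are flagged too.
    static List<Integer> findSuspectLines(InputStream in) throws IOException {
        List<Integer> suspects = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            int lineNo = 0;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                String t = line.trim();
                if (!(t.startsWith("{") && t.endsWith("}"))) {
                    suspects.add(lineNo);
                }
            }
        }
        return suspects;
    }

    public static void main(String[] args) throws IOException {
        // Line 2 is truncated and line 3 is not JSON at all.
        String sample = "{\"id\":1}\n{\"id\":2\nnot json\n{\"id\":4}\n";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(findSuspectLines(in)); // prints [2, 3]
    }
}
```

For real validation, a proper JSON parser would replace the heuristic; the line-numbered report is the part that helps pinpoint the bad record in a large file.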

CompressionCodec provides two methods for compressing and decompressing data. To compress data as it is written to an output stream, call createOutputStream(OutputStream out) to create a CompressionOutputStream; data written to it reaches the underlying stream in compressed form. Conversely, to decompress data read from an input stream, call createInputStream(InputStream in) to obtain a CompressionInputStream, which reads compressed data from the underlying stream and returns it uncompressed.
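
The same wrap-the-stream pattern exists in the JDK. The sketch below uses java.util.zip.GZIPOutputStream and GZIPInputStream as stand-ins for CompressionOutputStream and CompressionInputStream so that it runs without Hadoop on the classpath. It illustrates the pattern only: GZIP here is the JDK's implementation, not Hadoop's GzipCodec, and StreamWrapDemo is a hypothetical class name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StreamWrapDemo {
    // Compress: wrap the underlying stream ('sink'); bytes written to the wrapper reach it compressed.
    static byte[] compress(byte[] plain) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(sink)) {
            out.write(plain);
        } // closing the wrapper finishes the stream and writes the GZIP trailer
        return sink.toByteArray();
    }

    // Decompress: wrap the underlying stream; reads from the wrapper return the original bytes.
    static byte[] decompress(byte[] packed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            return in.readAllBytes(); // Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "{\"id\":1}\n{\"id\":2}\n{\"id\":3}\n".getBytes(StandardCharsets.UTF_8);
        byte[] back = decompress(compress(plain));
        System.out.println("round trip ok: " + Arrays.equals(plain, back)); // prints "round trip ok: true"
    }
}
```

With Hadoop, the two wrapper constructors would be replaced by codec.createOutputStream(...) and codec.createInputStream(...); the rest of the read/write logic stays the same.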

Compression format	Codec class
GZIP	org.apache.hadoop.io.compress.GzipCodec
BZIP2	org.apache.hadoop.io.compress.BZip2Codec
Snappy	org.apache.hadoop.io.compress.SnappyCodec
DEFLATE	org.apache.hadoop.io.compress.DefaultCodec
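
CompressionCodecFactory.getCodec, used in the decompression code, chooses a codec from the file name's extension. The sketch below is a simplified, hypothetical illustration of that suffix lookup, built from each codec's default extension; it is not the factory's real implementation, which also discovers codecs from configuration. CodecBySuffix and codecFor are made-up names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CodecBySuffix {
    // Default file extensions of the codecs in the table (what getDefaultExtension() returns).
    static final Map<String, String> CODECS = new LinkedHashMap<>();
    static {
        CODECS.put(".gz", "org.apache.hadoop.io.compress.GzipCodec");
        CODECS.put(".bz2", "org.apache.hadoop.io.compress.BZip2Codec");
        CODECS.put(".snappy", "org.apache.hadoop.io.compress.SnappyCodec");
        CODECS.put(".deflate", "org.apache.hadoop.io.compress.DefaultCodec");
    }

    // Return the codec class name whose extension ends the file name, or null if none matches.
    static String codecFor(String fileName) {
        for (Map.Entry<String, String> e : CODECS.entrySet()) {
            if (fileName.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(codecFor("a1-k01-1642561371606.snappy")); // org.apache.hadoop.io.compress.SnappyCodec
        System.out.println(codecFor("part-00000.txt"));              // null
    }
}
```

In real code, call factory.getCodec(new Path(filename)) instead of maintaining such a map; the map only shows why a file named *.snappy is matched to SnappyCodec, and why a file without a known extension yields null.
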
The reference code for compression and decompression is as follows:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.CompressionInputStream;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;

// The class name is illustrative
public class SnappyFileTool {

    public static void main(String[] args) throws IOException {
        // A backslash must be escaped in a Java string literal
        decompress("d:\\a1-k01-1642561371606.snappy");
    }

    // method: fully qualified codec class name, e.g. org.apache.hadoop.io.compress.SnappyCodec
    public static void compress(String filename, String method) throws ClassNotFoundException, IOException {
        // 1 The file to be compressed
        File fileIn = new File(filename);
        // 2 Load the codec class by name
        Class<?> codecClass = Class.forName(method);
        Configuration conf = new Configuration();
        // 3 Instantiate the codec through Hadoop's ReflectionUtils
        CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
        // 4 Name the output file with the codec's default extension (e.g. ".snappy")
        File fileOut = new File(filename + codec.getDefaultExtension());
        try (InputStream in = new FileInputStream(fileIn);
             CompressionOutputStream cout = codec.createOutputStream(new FileOutputStream(fileOut))) {
            // 5 Copy the data through the compressing stream with a 5 MB buffer
            IOUtils.copyBytes(in, cout, 1024 * 1024 * 5, false);
        } // 6 try-with-resources closes both streams
    }

    public static void decompress(String filename) throws IOException {
        Configuration conf = new Configuration();
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);
        // 1 Infer the codec from the file extension
        CompressionCodec codec = factory.getCodec(new Path(filename));
        // 2 Stop if no codec matches the extension
        if (codec == null) {
            System.out.println("Cannot find codec for file " + filename);
            return;
        }
        // 3 Wrap the file in a decompressing stream; 4 write the plain data to <filename>.decoded
        try (CompressionInputStream cin = codec.createInputStream(new FileInputStream(filename));
             OutputStream out = new FileOutputStream(new File(filename + ".decoded"))) {
            // 5 Copy the decompressed bytes with a 5 MB buffer
            IOUtils.copyBytes(cin, out, 1024 * 1024 * 5, false);
        } // 6 try-with-resources closes both streams
    }
}

conclusion


You can follow the WeChat official account “Huaijin Shake Yu jia and Jia” to download resources.

Snappy, even in disguise, I still recognize you
