Preface

Thrift supports serialization and deserialization in binary, compact, and JSON formats, so developers can choose the concrete protocol flexibly. The protocol is also freely extensible: a new version of a protocol can remain fully compatible with the old version.

Main text

Introduction to data interchange formats

Currently popular data interchange formats can be divided into the following categories:

(1) Self-describing type

The serialized data carries its complete structure, including field names and values. XML, JSON, Java Serializable, and Baidu's McPack/Compack all fall into this category. Because every value is labeled with its field name, reordering the attributes has no effect on serialization or deserialization.

(2) Semi-describing type

The serialized data drops some information, such as field names, and instead introduces an index (usually an ID plus a type) that maps each value to its attribute. Google Protobuf and Thrift are examples.

(3) Non-describing type

Baidu's infpack is reportedly implemented this way. It discards most of the descriptive information, so its performance and compression ratio are the best of the three, but backward compatibility takes extra development work. The author does not know the details.
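To make the difference between the three categories concrete, here is a minimal Java sketch that encodes the same two string fields in three hand-rolled ways: with field names, with a type tag plus field ID, and with values only. The class and method names are hypothetical, the layouts are simplified illustrations rather than the real JSON, Thrift, or infpack formats, and the tag value 11 is simply borrowed from Thrift's type ID for strings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: three simplified encodings of the same two string fields.
final class FormatCategorySketch {
    static final byte STRING_TAG = 11; // assumption: reuse Thrift's type ID for strings

    // Category 1: field names travel with the values (naive JSON-like text, no escaping).
    static byte[] withFieldNames(String key, String value) {
        String text = "{\"key\":\"" + key + "\",\"value\":\"" + value + "\"}";
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Category 2: each value is preceded by a 1-byte type tag and a 2-byte field ID.
    static byte[] withIdAndType(String key, String value) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(2 * (1 + 2 + 4) + k.length + v.length)
                .put(STRING_TAG).putShort((short) 1).putInt(k.length).put(k)
                .put(STRING_TAG).putShort((short) 2).putInt(v.length).put(v)
                .array();
    }

    // Category 3: length-prefixed values only; both sides must agree on the field order.
    static byte[] valuesOnly(String key, String value) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(2 * 4 + k.length + v.length)
                .putInt(k.length).put(k)
                .putInt(v.length).put(v)
                .array();
    }

    public static void main(String[] args) {
        System.out.println(withFieldNames("key1", "value1").length); // 31
        System.out.println(withIdAndType("key1", "value1").length);  // 24
        System.out.println(valuesOnly("key1", "value1").length);     // 18
    }
}
```

The less descriptive information is kept, the smaller the payload becomes, and the more the reader has to know in advance to decode it.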

Exchange format   | Type   | Advantages    | Disadvantages
XML               | Text   | Easy to read  | Bloated; does not support binary data types
JSON              | Text   | Easy to read  | Discards type information (for example, "score":100 is ambiguous between int and double); does not support binary data types
Java Serializable | Binary | Simple to use | Bloated; limited to the Java ecosystem
Thrift            | Binary | Efficient     | Hard to read; backward compatibility relies on certain conventions
Google Protobuf   | Binary | Efficient     | Hard to read; backward compatibility relies on certain conventions

Thrift data types

  1. Basic types:
     • bool: Boolean value
     • byte: 8-bit signed integer
     • i16: 16-bit signed integer
     • i32: 32-bit signed integer
     • i64: 64-bit signed integer
     • double: 64-bit floating point number
     • string: UTF-8 encoded string
     • binary: unencoded sequence of bytes
  2. Struct type: struct: a user-defined struct object
  3. Container types:
     • list: an ordered list of elements
     • set: an unordered set of unique elements
     • map: a mapping from unique keys to values
  4. Exception type: exception: an exception type
  5. Service type: service: a service class

Thrift’s serialization protocol

Thrift lets the user choose the protocol used between client and server. Protocols fall broadly into text and binary categories. Binary protocols are generally used to save bandwidth and improve transmission efficiency, while text-based protocols are sometimes chosen to meet specific project or product requirements. The common protocols are:

  • TBinaryProtocol: transmits data in a binary encoding
  • TCompactProtocol: transmits data in an efficient, dense binary encoding
  • TJSONProtocol: transmits data in a JSON text encoding
  • TSimpleJSONProtocol: a write-only JSON protocol, suitable for parsing by scripting languages

A test of Thrift serialization

(a). First write a simple thrift file called pair.thrift:

struct Pair {
    1: required string key
    2: required string value
}

Both fields are marked required here, so they must be assigned before use; otherwise a TProtocolException is thrown at runtime. If a field uses the default requiredness or is marked optional, no non-null validation is performed at runtime.

(b). Compile the file to generate the Java source code:

thrift -gen java pair.thrift

(c). Write test code for serialization and deserialization:

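  • Serialization test: write the Pair object to a file
// Requires java.io.*, org.apache.thrift.TException,
// org.apache.thrift.protocol.TBinaryProtocol and org.apache.thrift.transport.TIOStreamTransport.
private static void writeData() throws IOException, TException {
    Pair pair = new Pair();
    pair.setKey("key1").setValue("value1");
    // try-with-resources closes the stream even if write() throws
    try (FileOutputStream fos = new FileOutputStream(new File("pair.txt"))) {
        pair.write(new TBinaryProtocol(new TIOStreamTransport(fos)));
    }
}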
  • Deserialization test: parse the file and rebuild the Pair object
private static void readData() throws TException, IOException {
    Pair pair = new Pair();
    // try-with-resources closes the stream even if read() throws
    try (FileInputStream fis = new FileInputStream(new File("pair.txt"))) {
        pair.read(new TBinaryProtocol(new TIOStreamTransport(fis)));
    }
    System.out.println("key => " + pair.getKey());
    System.out.println("value => " + pair.getValue());
}

(d). Run the test. If the key and value are printed as expected, serialization and deserialization completed successfully.

Thrift protocol source code

(I) writeData() analysis

Start with the pair.write(TProtocol) method.

The scheme() method decides whether serialization uses a TupleScheme or a StandardScheme; StandardScheme is the default.

The write() method of StandardScheme performs the following steps:

(a). Verify that every field defined as required in the Thrift IDL file has been assigned.

public void validate() throws org.apache.thrift.TException {
  // check for required fields
  if (key == null) {
    throw new org.apache.thrift.protocol.TProtocolException("Required field 'key' was not present! Struct: " + toString());
  }
  if (value == null) {
    throw new org.apache.thrift.protocol.TProtocolException("Required field 'value' was not present! Struct: " + toString());
  }
}

(b). Call writeStructBegin() to mark the start of the struct. In TBinaryProtocol the method body is empty, so nothing is written.

public void writeStructBegin(TStruct struct) {}

(c). Write each field of the Pair one by one, including the field start mark, field value, and field end mark.

if (struct.key != null) {
  oprot.writeFieldBegin(KEY_FIELD_DESC);
  oprot.writeString(struct.key);
  oprot.writeFieldEnd();
}
// remaining fields omitted...

(1). First comes the field start marker, which consists of the type and the field ID. The type is the identifier of the field's data type. The field ID is the number assigned to the field in the Thrift IDL, for example 1 for key and 2 for value.

public void writeFieldBegin(TField field) throws TException {
  writeByte(field.type);
  writeI16(field.id);
}

Thrift provides the TType class, which assigns a unique type ID to each data type.

public final class TType {
    public static final byte STOP   = 0;   // end of fields (stop marker)
    public static final byte VOID   = 1;   // void (no value)
    public static final byte BOOL   = 2;   // Boolean
    public static final byte BYTE   = 3;   // byte
    public static final byte DOUBLE = 4;   // double-precision floating point
    public static final byte I16    = 6;   // short integer
    public static final byte I32    = 8;   // integer
    public static final byte I64    = 10;  // long integer
    public static final byte STRING = 11;  // string
    public static final byte STRUCT = 12;  // struct
    public static final byte MAP    = 13;  // map
    public static final byte SET    = 14;  // set
    public static final byte LIST   = 15;  // list
    public static final byte ENUM   = 16;  // enumeration
}
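As a small illustration of the field start marker, the sketch below builds the three header bytes for the two fields of Pair. It does not use the Thrift library; FieldHeaderSketch and fieldHeader are hypothetical names, and the layout (one type byte followed by a big-endian 16-bit field ID) is taken from the writeFieldBegin() code shown above.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: builds the 3-byte field header that writeFieldBegin() emits.
final class FieldHeaderSketch {
    static final byte TYPE_STRING = 11; // same value as TType.STRING

    static byte[] fieldHeader(byte type, short id) {
        // ByteBuffer is big-endian by default: high byte of the ID first.
        return ByteBuffer.allocate(3).put(type).putShort(id).array();
    }

    public static void main(String[] args) {
        byte[] keyHeader = fieldHeader(TYPE_STRING, (short) 1);
        byte[] valueHeader = fieldHeader(TYPE_STRING, (short) 2);
        System.out.printf("%02x %02x %02x%n", keyHeader[0], keyHeader[1], keyHeader[2]);       // 0b 00 01
        System.out.printf("%02x %02x %02x%n", valueHeader[0], valueHeader[1], valueHeader[2]); // 0b 00 02
    }
}
```

The field name never appears in the header; only the type ID and the numeric field ID are written.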

(2). Next, the value of the field is written. Depending on the field's data type, this is done by one of the following methods: writeByte(), writeBool(), writeI16(), writeI32(), writeI64(), writeDouble(), writeString(), and writeBinary().

TBinaryProtocol stages the bytes it writes or reads in a temporary byte array of length 8.

private final byte[] inoutTemp = new byte[8];

Background 1: hexadecimal notation. A literal that starts with 0x is hexadecimal; 0xff is 255 in decimal. The hexadecimal digits A, B, C, D, E, and F stand for 10, 11, 12, 13, 14, and 15.

Hexadecimal to decimal: F stands for 15, and the digit at position n (counting from 0, right to left) has weight 16^n, so 0xFF = 15 × 16^1 + 15 × 16^0 = 255. Hexadecimal to binary to decimal: 0xFF = 1111 1111 = 2^8 - 1 = 255.

Background 2: bit-shift operators. >> is the right-shift operator. For example, with int i = 15, i >> 2 evaluates to 3, and the bits shifted out are discarded. << is the left-shift operator and works in the opposite direction.

This is easier to see in binary: 0000 1111 (15) shifted right by two bits becomes 0000 0011 (3), and 0001 1010 (26) shifted right by three bits becomes 0000 0011 (3).
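The two background notes can be checked with a few lines of plain Java. This is only a sketch: HexAndShiftSketch and hexToDecimal are hypothetical names, and the helper assumes its input contains valid hexadecimal digits only.

```java
// Hypothetical helper class for the hexadecimal and bit-shift notes.
final class HexAndShiftSketch {

    // Converts a hexadecimal string to decimal using positional weights of 16.
    // Assumes every character is a valid hex digit.
    static int hexToDecimal(String hex) {
        int result = 0;
        for (char c : hex.toCharArray()) {
            result = result * 16 + Character.digit(c, 16); // 'F' -> 15
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(hexToDecimal("FF"));         // 255
        System.out.println(0xFF == (1 << 8) - 1);       // true
        System.out.println(15 >> 2);                    // 3
        System.out.println(26 >> 3);                    // 3
        System.out.println(15 << 2);                    // 60
        System.out.println(Integer.toBinaryString(26)); // 11010
    }
}
```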

  • writeByte(): writes a single byte of data.
public void writeByte(byte b) throws TException {
  inoutTemp[0] = b;
  trans_.write(inoutTemp, 0, 1);
}
  • writeBool(): writes Boolean data.
public void writeBool(boolean b) throws TException {
  writeByte(b ? (byte)1 : (byte)0);
}
  • writeI16(): writes short integer (short) data.
public void writeI16(short i16) throws TException {
  inoutTemp[0] = (byte) (0xff & (i16 >> 8));
  inoutTemp[1] = (byte) (0xff & (i16));
  trans_.write(inoutTemp, 0, 2);
}
  • writeI32(): writes integer (int) data.
public void writeI32(int i32) throws TException {
  inoutTemp[0] = (byte) (0xff & (i32 >> 24));
  inoutTemp[1] = (byte) (0xff & (i32 >> 16));
  inoutTemp[2] = (byte) (0xff & (i32 >> 8));
  inoutTemp[3] = (byte) (0xff & (i32));
  trans_.write(inoutTemp, 0, 4);
}
  • writeI64(): writes long integer (long) data.
public void writeI64(long i64) throws TException {
  inoutTemp[0] = (byte) (0xff & (i64 >> 56));
  inoutTemp[1] = (byte) (0xff & (i64 >> 48));
  inoutTemp[2] = (byte) (0xff & (i64 >> 40));
  inoutTemp[3] = (byte) (0xff & (i64 >> 32));
  inoutTemp[4] = (byte) (0xff & (i64 >> 24));
  inoutTemp[5] = (byte) (0xff & (i64 >> 16));
  inoutTemp[6] = (byte) (0xff & (i64 >> 8));
  inoutTemp[7] = (byte) (0xff & (i64));
  trans_.write(inoutTemp, 0, 8);
}
  • writeDouble(): writes double-precision floating point (double) data.
public void writeDouble(double dub) throws TException {
  writeI64(Double.doubleToLongBits(dub));
}
  • writeString(): writes the length of the string first, then its content.
public void writeString(String str) throws TException {
  try {
    byte[] dat = str.getBytes("UTF-8");
    writeI32(dat.length);
    trans_.write(dat, 0, dat.length);
  } catch (UnsupportedEncodingException uex) {
    throw new TException("JVM DOES NOT SUPPORT UTF-8");
  }
}
  • writeBinary(): writes binary (byte array) data; the input is an NIO ByteBuffer.
public void writeBinary(ByteBuffer bin) throws TException {
  int length = bin.limit() - bin.position();
  writeI32(length);
  trans_.write(bin.array(), bin.position() + bin.arrayOffset(), length);
}
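The write methods above can be re-created in a few stand-alone lines to see which bytes they produce. The sketch below is an assumption-labeled re-implementation, not Thrift code: PrimitiveWriteSketch and its methods are hypothetical names, and readableBytes() only works for array-backed buffers, just like the bin.array() call in writeBinary().

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-alone re-creation of the encodings used by the write methods.
final class PrimitiveWriteSketch {

    // Big-endian split of an int, using the same shifts and masks as writeI32().
    static byte[] encodeI32(int value) {
        return new byte[] {
            (byte) (0xff & (value >> 24)),
            (byte) (0xff & (value >> 16)),
            (byte) (0xff & (value >> 8)),
            (byte) (0xff & value)
        };
    }

    // A double is written as the 8 bytes of its IEEE 754 bit pattern, high byte first.
    static byte[] encodeDouble(double value) {
        return ByteBuffer.allocate(8).putLong(Double.doubleToLongBits(value)).array();
    }

    // A string is its UTF-8 byte count (not its char count) followed by the bytes.
    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + utf8.length).put(encodeI32(utf8.length)).put(utf8).array();
    }

    // The readable window of an array-backed buffer, located the way writeBinary() does.
    static byte[] readableBytes(ByteBuffer bin) {
        int length = bin.limit() - bin.position();
        int start = bin.position() + bin.arrayOffset();
        return Arrays.copyOfRange(bin.array(), start, start + length);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodeI32(0x12345678)));         // [18, 52, 86, 120]
        System.out.println(Long.toHexString(Double.doubleToLongBits(1.0))); // 3ff0000000000000
        System.out.println(Arrays.toString(encodeString("key1")));          // [0, 0, 0, 4, 107, 101, 121, 49]
        ByteBuffer window = ByteBuffer.wrap(new byte[] {10, 20, 30, 40, 50, 60}, 2, 3);
        System.out.println(Arrays.toString(readableBytes(window.slice()))); // [30, 40, 50]
    }
}
```

The slice() call in main() produces a buffer whose position is 0 but whose arrayOffset() is 2, which is why the start index has to add both values.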

(3). After each field is written, writeFieldEnd() marks the end of the field.

public void writeFieldEnd() {}

(d). When all fields have been written, the field stop marker is written.

public void writeFieldStop() throws TException {
  writeByte(TType.STOP);
}

(e). When all data has been written, writeStructEnd() marks the end of the struct.

public void writeStructEnd() {}
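Putting the steps together, the following sketch lays out the bytes for a Pair with key1/value1 by hand, without the Thrift library. It assumes, as the empty method bodies shown above suggest, that the struct begin, field end, and struct end markers write nothing; PairWireSketch and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: lays out a Pair step by step without the Thrift library.
final class PairWireSketch {
    static final byte STOP = 0;    // TType.STOP
    static final byte STRING = 11; // TType.STRING

    static byte[] encodePair(String key, String value) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer out = ByteBuffer.allocate((3 + 4 + k.length) + (3 + 4 + v.length) + 1);
        // (b) writeStructBegin(): writes nothing
        putStringField(out, (short) 1, k); // (c) field "key"
        putStringField(out, (short) 2, v); // (c) field "value"
        out.put(STOP);                     // (d) writeFieldStop()
        // (e) writeStructEnd(): writes nothing
        return out.array();
    }

    private static void putStringField(ByteBuffer out, short id, byte[] utf8) {
        out.put(STRING).putShort(id);      // writeFieldBegin(): type, then field ID
        out.putInt(utf8.length).put(utf8); // writeString(): length, then bytes
        // writeFieldEnd(): writes nothing
    }

    static String toHex(byte[] data) {
        StringBuilder hex = new StringBuilder();
        for (byte b : data) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        byte[] wire = encodePair("key1", "value1");
        System.out.println(wire.length); // 25
        System.out.println(toHex(wire)); // 0b0001000000046b6579310b00020000000676616c75653100
    }
}
```

Of the 25 bytes, 10 are string content; the remaining 15 are two 3-byte field headers, two 4-byte length prefixes, and the stop byte.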

(II) readData() analysis

Start with the pair.read(TProtocol) method.

Reading uses the same StandardScheme as writing. The read() method of StandardScheme performs the following steps:

(a). Call readStructBegin() to read the start marker of the struct.

iprot.readStructBegin();

(b). Loop over the fields in the data and read them into the Pair object until org.apache.thrift.protocol.TType.STOP is encountered. The call to iprot.readFieldBegin() reads the field start marker, which must happen before each field is read.

while (true) {
  schemeField = iprot.readFieldBegin();
  if (schemeField.type == org.apache.thrift.protocol.TType.STOP) {
    break;
  }
  // read the field (omitted)...
}
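The loop can be sketched without the Thrift library as follows. The example assumes the field layout described earlier (a type byte, a big-endian 16-bit field ID, then the value) and handles string fields only; FieldLoopSketch and readStringFields are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the read loop for a struct that contains only string fields.
final class FieldLoopSketch {
    static final byte STOP = 0;    // TType.STOP
    static final byte STRING = 11; // TType.STRING

    // Returns field ID -> string value, in the order the fields appear in the data.
    static Map<Short, String> readStringFields(byte[] wire) {
        ByteBuffer in = ByteBuffer.wrap(wire);
        Map<Short, String> fields = new LinkedHashMap<>();
        while (true) {
            byte type = in.get();     // field header: type byte first
            if (type == STOP) {
                break;                // STOP is a lone byte with no field ID after it
            }
            short id = in.getShort(); // field header: 2-byte field ID
            if (type != STRING) {
                throw new IllegalArgumentException("this sketch only handles strings, got type " + type);
            }
            byte[] utf8 = new byte[in.getInt()];
            in.get(utf8);
            fields.put(id, new String(utf8, StandardCharsets.UTF_8));
        }
        return fields;
    }

    public static void main(String[] args) {
        byte[] wire = {
            11, 0, 1, 0, 0, 0, 4, 'k', 'e', 'y', '1',
            11, 0, 2, 0, 0, 0, 6, 'v', 'a', 'l', 'u', 'e', '1',
            0
        };
        System.out.println(readStringFields(wire)); // {1=key1, 2=value1}
    }
}
```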

(c). Based on the field ID defined in the Thrift IDL, read the matching field, assign its value to the Pair object, and mark the field as set (provided the field is defined as required in the IDL). A field with an unknown ID or an unexpected type is skipped.

switch (schemeField.id) {
  case 1: // KEY
    if (schemeField.type == org.apache.thrift.protocol.TType.STRING) {
      struct.key = iprot.readString();
      struct.setKeyIsSet(true);
    } else {
      org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type);
    }
    break;
  case 2: // VALUE
    if (schemeField.type == org.apache.thrift.protocol.TType.STRING) {
      struct.value = iprot.readString();
      struct.setValueIsSet(true);
    } else {
      org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type);
    }
    break;
  default:
    org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type);
}
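The skip branches are what let an older reader tolerate data written by a newer schema. The sketch below shows the idea with a reader that only knows field 1 and discards everything else. It is a simplified stand-in for TProtocolUtil.skip(), covering scalar types and strings only (the real method also has to handle containers and nested structs); SkipSketch and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a reader that knows only field 1 and skips the rest.
final class SkipSketch {
    static final byte STOP = 0, BOOL = 2, BYTE = 3, DOUBLE = 4,
            I16 = 6, I32 = 8, I64 = 10, STRING = 11; // values from TType

    // Simplified skip: scalar types and strings only.
    static void skip(ByteBuffer in, byte type) {
        switch (type) {
            case BOOL:
            case BYTE:   advance(in, 1); break;
            case I16:    advance(in, 2); break;
            case I32:    advance(in, 4); break;
            case I64:
            case DOUBLE: advance(in, 8); break;
            case STRING: advance(in, in.getInt()); break; // length prefix, then that many bytes
            default: throw new IllegalArgumentException("type not covered by this sketch: " + type);
        }
    }

    private static void advance(ByteBuffer in, int count) {
        in.position(in.position() + count);
    }

    // Returns the value of field 1 if it is a string, or null if it is absent.
    static String readKeyOnly(byte[] wire) {
        ByteBuffer in = ByteBuffer.wrap(wire);
        String key = null;
        while (true) {
            byte type = in.get();
            if (type == STOP) {
                return key;
            }
            short id = in.getShort();
            if (id == 1 && type == STRING) {
                byte[] utf8 = new byte[in.getInt()];
                in.get(utf8);
                key = new String(utf8, StandardCharsets.UTF_8);
            } else {
                skip(in, type); // unknown ID or unexpected type: discard the value
            }
        }
    }
}
```

Because every field header carries the type, the reader can compute how many bytes to discard even for a field ID it has never seen.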

Depending on the field's data type, the value is read by one of readByte(), readBool(), readI16(), readI32(), readI64(), readDouble(), readString(), and readBinary().

  • readByte(): reads a single byte of data.
public byte readByte() throws TException {
  if (trans_.getBytesRemainingInBuffer() >= 1) {
    byte b = trans_.getBuffer()[trans_.getBufferPosition()];
    trans_.consumeBuffer(1);
    return b;
  }
  readAll(inoutTemp, 0, 1);
  return inoutTemp[0];
}
  • readBool(): reads Boolean data.
public boolean readBool() throws TException {
  return (readByte() == 1);
}
  • readI16(): reads short integer (short) data.
public short readI16() throws TException {
  byte[] buf = inoutTemp;
  int off = 0;

  if (trans_.getBytesRemainingInBuffer() >= 2) {
    buf = trans_.getBuffer();
    off = trans_.getBufferPosition();
    trans_.consumeBuffer(2);
  } else {
    readAll(inoutTemp, 0, 2);
  }

  return (short) (((buf[off] & 0xff) << 8) |
                 ((buf[off+1] & 0xff)));
}
  • readI32(): reads integer (int) data.
public int readI32() throws TException {
  byte[] buf = inoutTemp;
  int off = 0;

  if (trans_.getBytesRemainingInBuffer() >= 4) {
    buf = trans_.getBuffer();
    off = trans_.getBufferPosition();
    trans_.consumeBuffer(4);
  } else {
    readAll(inoutTemp, 0, 4);
  }
  return ((buf[off] & 0xff) << 24) |
         ((buf[off+1] & 0xff) << 16) |
         ((buf[off+2] & 0xff) << 8) |
         ((buf[off+3] & 0xff));
}
  • readI64(): reads long integer (long) data.
public long readI64() throws TException {
  byte[] buf = inoutTemp;
  int off = 0;

  if (trans_.getBytesRemainingInBuffer() >= 8) {
    buf = trans_.getBuffer();
    off = trans_.getBufferPosition();
    trans_.consumeBuffer(8);
  } else {
    readAll(inoutTemp, 0, 8);
  }

  return ((long)(buf[off]   & 0xff) << 56) |
         ((long)(buf[off+1] & 0xff) << 48) |
         ((long)(buf[off+2] & 0xff) << 40) |
         ((long)(buf[off+3] & 0xff) << 32) |
         ((long)(buf[off+4] & 0xff) << 24) |
         ((long)(buf[off+5] & 0xff) << 16) |
         ((long)(buf[off+6] & 0xff) << 8) |
         ((long)(buf[off+7] & 0xff));
}
  • readDouble(): reads double-precision floating point (double) data.
public double readDouble() throws TException {
  return Double.longBitsToDouble(readI64());
}
  • readString(): reads string data. It first reads the 4-byte string length and validates it, then checks whether the transport's buffer still holds at least that many unconsumed bytes. If so, it reads directly from the buffer; otherwise, it reads the data from the transport.
public String readString() throws TException {
  int size = readI32();
  checkStringReadLength(size);

  if (trans_.getBytesRemainingInBuffer() >= size) {
    try {
      String s = new String(trans_.getBuffer(), trans_.getBufferPosition(), size, "UTF-8");
      trans_.consumeBuffer(size);
      return s;
    } catch (UnsupportedEncodingException e) {
      throw new TException("JVM DOES NOT SUPPORT UTF-8");
    }
  }
  return readStringBody(size);
}

When the data has to come from the transport, readStringBody() is used:

public String readStringBody(int size) throws TException {
  try {
    byte[] buf = new byte[size];
    trans_.readAll(buf, 0, size);
    return new String(buf, "UTF-8");
  } catch (UnsupportedEncodingException uex) {
    throw new TException("JVM DOES NOT SUPPORT UTF-8");
  }
}
  • readBinary(): reads binary (byte array) data. It works like string reading but returns a ByteBuffer.
public ByteBuffer readBinary() throws TException {
  int size = readI32();
  checkStringReadLength(size);

  if (trans_.getBytesRemainingInBuffer() >= size) {
    ByteBuffer bb = ByteBuffer.wrap(trans_.getBuffer(), trans_.getBufferPosition(), size);
    trans_.consumeBuffer(size);
    return bb;
  }

  byte[] buf = new byte[size];
  trans_.readAll(buf, 0, size);
  return ByteBuffer.wrap(buf);
}
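Two details in the read methods are easy to overlook: the & 0xff mask and the (long) cast. The stand-alone sketch below contrasts each with a deliberately wrong variant and also shows a length-prefixed string read. All names are hypothetical, and the length checks in decodeString() are an assumption about what a sanity check could look like, not a copy of checkStringReadLength().

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone sketch of details in the read methods.
final class PrimitiveReadSketch {

    // Same reassembly as readI32(): mask each byte to 0..255 before shifting.
    static int decodeI32(byte[] buf, int off) {
        return ((buf[off] & 0xff) << 24)
             | ((buf[off + 1] & 0xff) << 16)
             | ((buf[off + 2] & 0xff) << 8)
             | (buf[off + 3] & 0xff);
    }

    // Deliberately wrong: without the mask, a byte >= 0x80 is sign-extended.
    static int decodeI32WithoutMask(byte[] buf, int off) {
        return (buf[off] << 24) | (buf[off + 1] << 16) | (buf[off + 2] << 8) | buf[off + 3];
    }

    // Same as the top byte in readI64(): widen to long before shifting by 56.
    static long topByteOfI64(byte b) {
        return ((long) (b & 0xff)) << 56;
    }

    // Deliberately wrong: an int shift uses only the low 5 bits of the distance (56 & 31 == 24).
    static long topByteOfI64WithoutCast(byte b) {
        return (b & 0xff) << 56;
    }

    // Length-prefixed UTF-8 string with basic sanity checks on the length (assumed checks).
    static String decodeString(ByteBuffer in, int maxLength) {
        int size = in.getInt();
        if (size < 0 || size > maxLength || size > in.remaining()) {
            throw new IllegalArgumentException("bad string length: " + size);
        }
        byte[] utf8 = new byte[size];
        in.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = {0, 0, 0, (byte) 0x80};
        System.out.println(decodeI32(raw, 0));                              // 128
        System.out.println(decodeI32WithoutMask(raw, 0));                   // -128
        System.out.println(topByteOfI64((byte) 1) == 1L << 56);             // true
        System.out.println(topByteOfI64WithoutCast((byte) 1) == 1L << 24);  // true
        byte[] wire = {0, 0, 0, 4, 'k', 'e', 'y', '1'};
        System.out.println(decodeString(ByteBuffer.wrap(wire), 1024));      // key1
    }
}
```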

(d). After each field's data is read, readFieldEnd() reads the field end marker.

public void readFieldEnd() {}

(e). When all fields have been read, readStructEnd() reads the end marker of the struct.

public void readStructEnd() {}

(f). After reading, the fields defined as required in the Thrift IDL are again checked to make sure they are non-null and valid.

public void validate() throws org.apache.thrift.TException {
  // check for required fields
  if (key == null) {
    throw new org.apache.thrift.protocol.TProtocolException("Required field 'key' was not present! Struct: " + toString());
  }
  if (value == null) {
    throw new org.apache.thrift.protocol.TProtocolException("Required field 'value' was not present! Struct: " + toString());
  }
}
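To see the effect of this check in isolation, here is a small sketch that reduces Pair to its two fields and the required-field validation. RequiredCheckSketch is a hypothetical stand-in for the generated class, and it throws IllegalStateException instead of TProtocolException so that it runs without the Thrift library.

```java
// Hypothetical stand-in for the generated Pair class, reduced to the required-field check.
final class RequiredCheckSketch {
    String key;
    String value;

    // Mirrors validate(): a required field that is still null is an error.
    void validate() {
        if (key == null) {
            throw new IllegalStateException("Required field 'key' was not present!");
        }
        if (value == null) {
            throw new IllegalStateException("Required field 'value' was not present!");
        }
    }

    // Convenience wrapper: true if both required fields are present.
    static boolean isValid(String key, String value) {
        RequiredCheckSketch pair = new RequiredCheckSketch();
        pair.key = key;
        pair.value = value;
        try {
            pair.validate();
            return true;
        } catch (IllegalStateException missingField) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("key1", "value1")); // true
        System.out.println(isValid("key1", null));     // false
    }
}
```

A struct whose serialized data lacks field 2 would end up in the second state after the read loop, and validation would reject it.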

Conclusion

By this point, you should have a solid understanding of how Thrift's serialization and deserialization mechanisms are implemented and why they are efficient.

Related links

  1. Apache Thrift Series Details (I) – Overview and Introduction

  2. Apache Thrift series details (II) – Network services Model

  3. Apache Thrift – Serialization mechanism

