
Original content takes real effort; you are welcome to follow the WeChat public account: Qiya cloud storage

Overview

We have already covered a lot of general storage knowledge, with in-depth posts on the Linux file system, the block layer, and disks. Today's post is a Go programming tip: how do you write an in-memory structure to disk? How would you approach it?

Take a moment to think about it. At its core this is just an IO operation; the key question is where the data comes from. Most of the data we handle comes from files or network packets, and it is easy to forget that an in-memory structure is itself data: an array of bytes, a sequence of 0s and 1s, no different from any other data.

Data on disk has no notion of structure. It is just 0s and 1s, and a file is simply an array of bytes (the byte is the smallest addressable unit of storage in a computer). Once that is clear, the whole problem comes down to two points: when writing, convert the struct into a byte array; when reading, convert the byte array back into a struct.
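The following minimal sketch illustrates the point that a file is just a byte array. It assumes Go 1.16+ for os.WriteFile, os.ReadFile, and os.MkdirTemp. The helper roundTripFile and the file name demo.bin are hypothetical and chosen only for illustration. The bytes go in and come back out unchanged, because the file system never interprets them.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
)

// roundTripFile (hypothetical helper) writes data to a file in a fresh
// temporary directory and reads it back. The file system only stores bytes.
func roundTripFile(data []byte) ([]byte, error) {
	dir, err := os.MkdirTemp("", "bytes-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir) // clean up the temporary directory

	path := filepath.Join(dir, "demo.bin") // hypothetical file name
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return nil, err
	}
	return os.ReadFile(path)
}

func main() {
	data := []byte{0x01, 0x00, 0x01, 0x01} // any bytes will do
	got, err := roundTripFile(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(bytes.Equal(data, got)) // true
}
```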


How do you do that?

Converting a struct into a byte array is called serialization.

This is not limited to memory-to-disk. Memory-to-network, and any other cross-platform or cross-medium transfer, also requires turning the data into the common form that every computer system understands: a byte array.

The byte is the smallest unit of storage that every computer interprets the same way: a universal, unambiguous unit.

Serialization

Serialization can be elaborate or simple, but either way the target is a byte array. Different approaches have produced different protocols, the best known being JSON and PB (Protocol Buffers).

For example:

// Because of alignment, a Test value occupies 16 bytes in memory
type Test struct {
    F1 uint64
    F2 uint32
    F3 byte
}

JSON protocol

The following code serializes the struct into a byte array using the JSON format, then deserializes it back.

package main

import (
    "encoding/json"
    "fmt"
)

// Because of alignment, a Test value occupies 16 bytes in memory
type Test struct {
    F1 uint64
    F2 uint32
    F3 byte
}

func main() {
    t := Test{F1: 0x1234, F2: 0x4567, F3: 12}

    // Test serialization: struct -> JSON bytes
    bs, err := json.Marshal(&t)
    if err != nil {
        panic(err)
    }
    fmt.Printf("t -> []byte\t: %v\n", bs)

    // Test deserialization: JSON bytes -> struct
    t1 := Test{}
    err = json.Unmarshal(bs, &t1)
    if err != nil {
        panic(err)
    }
    fmt.Printf("[]byte -> t1\t: %v\n", t1)
}

Output:

$ go build -gcflags "-N -l" ./test_json_1.go
$ ./test_json_1
t -> []byte     : [123 34 70 49 34 58 52 54 54 48 44 34 70 50 34 58 49 55 55 54 55 44 34 70 51 34 58 49 50 125]
[]byte -> t1    : {4660 17767 12}

Easy enough. This is the approach we use most often, but real requirements come in all shapes and sizes.

For example, a Test value occupies 16 bytes in memory, yet its JSON serialization takes 30 bytes. What if you must serialize Test into exactly 16 bytes? JSON cannot do that, because it gives you no explicit control over the serialized layout. In that case you have to write your own serialization.

Custom serialization

So how do you customize it? Here is a hand-written serialization of Test:

package main

import (
    "encoding/binary"
    "errors"
    "fmt"
)

// Because of alignment, a Test value occupies 16 bytes in memory
type Test struct {
    F1 uint64
    F2 uint32
    F3 byte
}

func (t *Test) Marshal() ([]byte, error) {
    // Create a 16-byte buffer (bytes 13..15 stay zero)
    buf := make([]byte, 16)
    // Serialize each field at a fixed offset, in big-endian order
    binary.BigEndian.PutUint64(buf[0:8], t.F1)
    binary.BigEndian.PutUint32(buf[8:12], t.F2)
    buf[12] = t.F3

    return buf, nil
}

func (t *Test) Unmarshal(buf []byte) error {
    if len(buf) != 16 {
        return errors.New("length not match")
    }
    // Deserialize each field from the same offsets, in big-endian order
    t.F1 = binary.BigEndian.Uint64(buf[0:8])
    t.F2 = binary.BigEndian.Uint32(buf[8:12])
    t.F3 = buf[12]
    return nil
}

func main() {
    t := Test{F1: 0x1234, F2: 0x4567, F3: 12}

    // Test serialization
    bs, err := t.Marshal()
    if err != nil {
        panic(err)
    }
    fmt.Printf("t -> []byte\t: %v\n", bs)

    // Test deserialization
    t1 := Test{}
    err = t1.Unmarshal(bs)
    if err != nil {
        panic(err)
    }
    fmt.Printf("[]byte -> t1\t: %v\n", t1)
}

This gives us precise control over the serialized byte array. Running the program produces the following output:

$ go build -gcflags "-N -l" ./test_json.go
$ ./test_json

t -> []byte     : [0 0 0 0 0 0 18 52 0 0 69 103 12 0 0 0]
[]byte -> t1    : {4660 17767 12}

If the decimal output is confusing, here it is in hexadecimal:

t -> []byte     : [0 0 0 0 0 0 0x12 0x34 0 0 0x45 0x67 0x0c 0 0 0]
[]byte -> t1    : {0x1234 0x4567 0x0c}

That is what serialization and deserialization look like, and every other serialization protocol works on the same principle.

Note that in either case the conversion must be lossless. If Test.F1 is 0x1234 before serialization, it must still be 0x1234 after deserialization, not 0x3333.
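One way to check the lossless requirement is to do a round trip and then test for equality. The sketch below swaps in encoding/binary's binary.Write and binary.Read instead of the hand-written Marshal/Unmarshal; encode and decode are hypothetical helper names. binary.Write writes fixed-size fields in declaration order, with the given byte order and no padding. It therefore produces 13 bytes rather than the 16-byte layout above.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

type Test struct {
	F1 uint64
	F2 uint32
	F3 byte
}

// encode (hypothetical) writes t's fields in declaration order, big-endian, no padding.
func encode(t Test) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.BigEndian, t); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode (hypothetical) reads the fields back in the same order and byte order.
func decode(b []byte) (Test, error) {
	var t Test
	err := binary.Read(bytes.NewReader(b), binary.BigEndian, &t)
	return t, err
}

func main() {
	t := Test{F1: 0x1234, F2: 0x4567, F3: 12}
	b, err := encode(t)
	if err != nil {
		panic(err)
	}
	t1, err := decode(b)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(b), b) // 13 [0 0 0 0 0 0 18 52 0 0 69 103 12]
	fmt.Println(t1 == t)   // true: the round trip is lossless
}
```

Because every field of Test is comparable, `t1 == t` compares all of them, which is exactly the lossless check described above.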

Big-endian and little-endian

You may have noticed that I used binary.BigEndian.PutUint64, which writes a 64-bit integer into a byte array in big-endian order. That raises some questions. What is big-endian? What is little-endian? And what conventions are usually followed?

Important: byte order only matters for multi-byte basic types. A single byte has no byte order.

For example, take the uint32 integer 0x11223344. How does the computer store it in memory? How is it stored on disk?

As mentioned earlier, the unit of storage is the byte, and a uint32 is 4 bytes. In what order should those 4 bytes be placed?

Intuitively, there are two options (written left to right, from low address to high address):

  • First layout: 0x11 0x22 0x33 0x44
  • Second layout: 0x44 0x33 0x22 0x11

The first is called big-endian, and the second is called little-endian.

Key point: if the most significant byte is at the lowest address, the order is big-endian. If the least significant byte is at the lowest address, it is little-endian.
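encoding/binary exposes both byte orders explicitly, so the two layouts can be reproduced with it. The helper layouts in this sketch is hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// layouts (hypothetical) returns the big-endian and little-endian byte layouts of v.
func layouts(v uint32) (big, little [4]byte) {
	binary.BigEndian.PutUint32(big[:], v)       // most significant byte first
	binary.LittleEndian.PutUint32(little[:], v) // least significant byte first
	return big, little
}

func main() {
	big, little := layouts(0x11223344)
	fmt.Printf("big-endian:    % x\n", big[:])    // big-endian:    11 22 33 44
	fmt.Printf("little-endian: % x\n", little[:]) // little-endian: 44 33 22 11
}
```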

In practice, most processor architectures, such as x86, are little-endian, while some, such as the PowerPC 970, are big-endian.

Some common conventions about byte order:

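  • CPUs generally process data in little-endian order;
  • Network protocols conventionally use big-endian ("network byte order"), and on-disk formats are often big-endian as well, although the on-disk order is really decided by each format;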

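For example, 0x11223344 is stored as the bytes 0x11 0x22 0x33 0x44 in big-endian order and 0x44 0x33 0x22 0x11 in little-endian order. Which one is better? Big-endian matches the order in which people write numbers, but neither is inherently better. What matters is that the writer and the reader agree.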

So when writing custom serialization, pay attention to byte order, because this is where incompatibilities creep in. Suppose you don't explicitly serialize an integer as big-endian and simply store it to disk in the machine's default order. When you read it back, should you decode it as big-endian or little-endian? That is the pitfall.
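The pitfall is easy to reproduce: write an integer in one byte order and read it back in the other. misread is a hypothetical helper that does exactly that. The value comes back with its bytes reversed, and nothing reports an error.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// misread (hypothetical) writes v as big-endian, then wrongly reads it back as little-endian.
func misread(v uint32) uint32 {
	buf := make([]byte, 4)
	binary.BigEndian.PutUint32(buf, v)     // the writer's choice
	return binary.LittleEndian.Uint32(buf) // the reader's mismatched choice
}

func main() {
	fmt.Printf("%#x\n", misread(0x11223344)) // 0x44332211
}
```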


The raw approach

Do we really need this much ceremony to turn a struct into a byte array?

Not necessarily. A struct is itself a block of memory, that is, a byte array. Forcibly converting the struct's address into a []byte is therefore the simplest and most primitive form of serialization.

Forced type conversion

package main

import (
    "fmt"
    "unsafe"
)

// Because of alignment, a Test value occupies 16 bytes in memory
type Test struct {
    F1 uint64
    F2 uint32
    F3 byte
}

// Struct2Bytes returns a []byte of length n that aliases the memory at p.
// No copy is made. unsafe.Slice requires Go 1.17+.
func Struct2Bytes(p unsafe.Pointer, n int) []byte {
    return unsafe.Slice((*byte)(p), n)
}

func main() {
    t := Test{F1: 0x1234, F2: 0x4567, F3: 12}
    bytes := Struct2Bytes(unsafe.Pointer(&t), int(unsafe.Sizeof(t)))
    fmt.Printf("t -> []byte\t: %v, len:%d\n", bytes, len(bytes))
}

Output:

$ ./test_direct
t -> []byte     : [52 18 0 0 0 0 0 0 103 69 0 0 12 0 0 0], len:16

This output is little-endian. The struct's bytes appear in the machine's native byte order, which on x86 is little-endian, as mentioned earlier.

Reading and writing files

This part is straightforward, since we have already covered in detail how structs and byte arrays convert into each other. What you write to a file is a byte array, and what you read from a file is a byte array. It's as simple as that.

The steps for writing a struct are:

  1. View the struct as a byte array (forced conversion);
  2. Write that byte array to the file.

The steps for reading a struct are:

  1. Read the data from the file into a byte buffer;
  2. Convert the buffer back into the struct (forced conversion).

Here’s an example:

package main

import (
    "fmt"
    "log"
    "os"
    "unsafe"
)

// Because of alignment, a Test value occupies 16 bytes in memory
type Test struct {
    F1 uint64
    F2 uint32
    F3 byte
}

// Struct2Bytes returns a []byte of length n that aliases the memory at p (Go 1.17+).
func Struct2Bytes(p unsafe.Pointer, n int) []byte {
    return unsafe.Slice((*byte)(p), n)
}

func main() {
    t := Test{F1: 0x1234, F2: 0x4567, F3: 12}
    // Forced conversion: view t's memory as bytes
    bytes := Struct2Bytes(unsafe.Pointer(&t), int(unsafe.Sizeof(t)))
    fmt.Printf("t -> []byte\t: %v\n", bytes)

    fd, err := os.OpenFile("test_bytes.txt", os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0666)
    if err != nil {
        log.Fatalf("create failed, err:%v\n", err)
    }
    defer fd.Close()

    // Write the struct's bytes to the file
    if _, err = fd.Write(bytes); err != nil {
        log.Fatalf("write failed, err:%v\n", err)
    }

    t1 := Test{}
    // View t1's memory as a 16-byte buffer
    t1Bytes := Struct2Bytes(unsafe.Pointer(&t1), int(unsafe.Sizeof(t1)))
    // Read the file contents directly into t1's memory
    if _, err = fd.ReadAt(t1Bytes, 0); err != nil {
        log.Fatalf("read failed, err:%v\n", err)
    }

    fmt.Printf("t1 -> []byte\t: %v\n", t1Bytes)
}

The output is as follows:

$ ./test_direct
t -> []byte     : [52 18 0 0 0 0 0 0 103 69 0 0 12 0 0 0]
t1 -> []byte    : [52 18 0 0 0 0 0 0 103 69 0 0 12 0 0 0]

t and t1 are byte-for-byte identical. Perfect. Finally, let's look at the file itself:

$ hexdump -C test_bytes.txt
00000000  34 12 00 00 00 00 00 00  67 45 00 00 0c 00 00 00  |4.......gE......|
00000010


See that? When you force a struct into a byte array, it is written in the machine's native byte order, which here is little-endian.

Conclusion

  1. From the storage point of view, a file is essentially a byte array;
  2. Reading and writing an in-memory struct to a disk file is, at its core, a conversion to and from a byte array;
  3. Converting a struct into a byte array is called serialization; converting a byte array back into a struct is called deserialization;
  4. Serialization can be elaborate or simple, but either way the target is a byte array. Different approaches have produced different protocols, the best known being JSON and PB (Protocol Buffers);
  5. A struct is itself a block of memory, that is, a byte array, so forcibly converting the struct's address to []byte is the simplest and most primitive form of serialization;
  6. Be careful with forced struct conversion: for multi-byte types such as integers it uses the machine's native byte order, so the data may be incompatible with a machine or format that uses the other byte order;
  7. If JSON doesn't give you the control you need, define your own serialization rules, with an explicit byte order;
  8. Have you used hexdump before? It is a handy tool for viewing the binary contents of a file.
