Sometimes Python has to process binary data, for example when reading files or working with sockets. Python's struct module handles this: it converts between Python values and the byte layout of C structures.

The three most important functions in the struct module are pack(), unpack(), and calcsize():

pack(fmt, v1, v2, ...) packs the values into a bytes object (a byte stream laid out like a C structure) according to the format string fmt.

unpack(fmt, buffer) parses the bytes in buffer according to fmt and returns the parsed values as a tuple. The buffer must be exactly calcsize(fmt) bytes long.

calcsize(fmt) returns the number of bytes that the format fmt occupies.
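
As a minimal sketch of how the three functions fit together, the snippet below packs two integers, checks the size, and unpacks them again. The format "<HI" and the values 1 and 2 are chosen only for illustration.

```python
import struct

# Pack an unsigned short and an unsigned int in little-endian order.
packed = struct.pack("<HI", 1, 2)

# calcsize reports how many bytes the format occupies: 2 + 4 = 6.
size = struct.calcsize("<HI")

# unpack always returns a tuple with one entry per format field.
values = struct.unpack("<HI", packed)

print(packed)  # b'\x01\x00\x02\x00\x00\x00'
print(size)    # 6
print(values)  # (1, 2)
```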

struct supports the following format characters (Python types as in Python 3):

Format  C type               Python type        Standard size (bytes)
x       pad byte             no value           1
c       char                 bytes of length 1  1
b       signed char          integer            1
B       unsigned char        integer            1
?       _Bool                bool               1
h       short                integer            2
H       unsigned short       integer            2
i       int                  integer            4
I       unsigned int         integer            4
l       long                 integer            4
L       unsigned long        integer            4
q       long long            integer            8
Q       unsigned long long   integer            8
f       float                float              4
d       double               float              8
s       char[]               bytes              1 per byte
p       char[]               bytes              1 per byte
P       void *               integer            native only (see Note 4)

Note 1. q and Q are available in native mode only if the platform's C compiler supports 64-bit long long values.

Note 2. Each format character may be preceded by a number that gives a repeat count; for example, 2I means two unsigned ints.

Note 3. For s the number is the length of the string, not a repeat count: 4s is one string of 4 bytes. p is a Pascal string, stored as a length byte followed by the data.

Note 4. P converts a pointer; its size depends on the machine word length.

Note 5. P, the last entry in the table, therefore occupies 4 bytes on a 32-bit machine and 8 bytes on a 64-bit machine.
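
The notes on counts, s, p, and P can be checked with a short sketch. The sample value b"ab" and the variable names are illustrative assumptions.

```python
import struct

# A count before a numeric code repeats it: "3h" is the same as "hhh".
repeat_size = struct.calcsize("<3h")   # 6
spelled_size = struct.calcsize("<hhh")  # 6

# For "s" the count is the length of ONE bytes field; short input is
# padded with NUL bytes.
fixed = struct.pack("<4s", b"ab")
print(fixed)   # b'ab\x00\x00'

# "p" stores a Pascal string: one length byte, then the data, then padding.
pascal = struct.pack("<4p", b"ab")
print(pascal)  # b'\x02ab\x00'

# "P" is native only; its size follows the machine word (4 or 8 bytes).
pointer_size = struct.calcsize("P")
print(pointer_size)
```

When unpacking, "4s" returns the padding as well (b'ab\x00\x00'), while "4p" uses the length byte and returns just b'ab'.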

When exchanging data with C structures, keep in mind that C and C++ compilers usually align struct members (typically on 4-byte boundaries on a 32-bit system), and that by default struct converts values using the local machine's byte order, sizes, and alignment. The first character of the format string changes the byte order, size, and alignment. It is defined as follows:

Character  Byte order              Size and alignment
@          native                  native (fields padded to C alignment)
=          native                  standard (no padding)
<          little-endian           standard (no padding)
>          big-endian              standard (no padding)
!          network (= big-endian)  standard (no padding)

Put the character in the first position of fmt, as in '@5s6sif'. If it is omitted, '@' is assumed.
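
The effect of the prefix character can be seen in a small sketch. The format "ci" and the value 1 are chosen for illustration; the native size is printed rather than asserted because it depends on the platform.

```python
import struct

# Standard sizes with no padding: 1 (char) + 4 (int) = 5 bytes.
standard_size = struct.calcsize("<ci")
print(standard_size)  # 5

# Native mode may insert padding so that the int is aligned
# (commonly 3 pad bytes, giving 8), but this is platform-dependent.
native_size = struct.calcsize("@ci")
print(native_size)

# The same value packed in different byte orders.
little = struct.pack("<I", 1)
big = struct.pack(">I", 1)
network = struct.pack("!I", 1)
print(little)   # b'\x01\x00\x00\x00'
print(big)      # b'\x00\x00\x00\x01'
print(network)  # b'\x00\x00\x00\x01'
```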

Example 1:

Let's say we have this C structure:

struct Header
{
    unsigned short id;
    char tag[4];
    unsigned int version;
    unsigned int count;
};

Suppose recv() has returned the bytes of such a struct in s, and we now need to parse it with unpack():

import struct
id, tag, version, count = struct.unpack("!H4s2I", s)

In the format string above, ! means the data is parsed in network byte order, because it was received from the network and is transmitted in network byte order. The H that follows is the unsigned short id, 4s is the 4-byte string tag, and 2I stands for the two unsigned ints version and count.

With a single unpack() call, the information is now stored in id, tag, version, and count.

Similarly, it is easy to pack local data into the struct format:

ss = struct.pack("!H4s2I", id, tag, version, count)

pack() converts id, tag, version, and count into a Header in the specified format. ss is now a bytes object (a byte stream laid out like the C structure) that can be sent with socket.send(ss).
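
A round trip through pack() and unpack() for this header might look like the sketch below. The field values (7, b"DEMO", 1, 3) and the names HEADER_FMT and header_id are made up for illustration. Note that "!" uses no padding, so the packed header is 14 bytes, whereas a C compiler would typically pad struct Header to 16 bytes; the sender is assumed here to serialize the fields without padding.

```python
import struct

HEADER_FMT = "!H4s2I"  # network byte order, standard sizes, no padding

# Hypothetical field values standing in for a real header.
header = struct.pack(HEADER_FMT, 7, b"DEMO", 1, 3)

# 2 (H) + 4 (4s) + 4 (I) + 4 (I) = 14 bytes.
header_size = struct.calcsize(HEADER_FMT)
print(header_size)  # 14

header_id, tag, version, count = struct.unpack(HEADER_FMT, header)
print(header_id, tag, version, count)  # 7 b'DEMO' 1 3
```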

Example 2:

import struct
a = 12.34

Convert a to binary:

data = struct.pack('d', a)

data is now a bytes object whose contents are identical to the binary storage of a. The format is 'd' because a Python float is a C double; an integer format such as 'i' would reject 12.34.

The reverse operation converts the existing binary data back to a Python value:

a, = struct.unpack('d', data)

Notice that unpack() always returns a tuple, even when only one value was packed. So when decoding a single value, write

a, = struct.unpack('d', data) or (a,) = struct.unpack('d', data)

If you write a = struct.unpack('d', data) instead, then a is the tuple (12.34,) rather than the original float.
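
The single-value case can be sketched as follows. The variable names are illustrative, and the comparison with 'f' is added only to show why a 4-byte float does not reproduce 12.34 exactly.

```python
import struct

a = 12.34

# 'd' is a C double, the same representation Python uses for float.
data = struct.pack("d", a)

result = struct.unpack("d", data)   # a one-element tuple
(value,) = result                    # tuple unpacking yields the float
print(result)  # (12.34,)
print(value)   # 12.34

# With 'f' the value is rounded to single precision on the way through.
single = struct.unpack("f", struct.pack("f", a))[0]
print(single == a)  # False
```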

If the data consists of several values, it works like this:

a = b'hello'
b = b'world!'
c = 2
d = 45.123
data = struct.pack('5s6sif', a, b, c, d)

data can now be written to a file in binary form, for example with binfile.write(data), and read back later:

data = binfile.read()

unpack() then decodes it into Python variables:

a, b, c, d = struct.unpack('5s6sif', data)

'5s6sif' is the format string (fmt). It is made up of optional numbers followed by format characters: 5s is a string of five bytes, 2i would be two integers, and so on. The available characters, with their C types and the corresponding Python types, are listed in the table above. Because f is a 4-byte float, d comes back as a single-precision approximation of 45.123.
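
Put together, writing a packed record to a file and reading it back could look like the sketch below. The temporary directory and the file name record.bin are assumptions made so the example is self-contained.

```python
import os
import struct
import tempfile

FMT = "5s6sif"
record = struct.pack(FMT, b"hello", b"world!", 2, 45.123)

# Hypothetical location for the demo file.
path = os.path.join(tempfile.mkdtemp(), "record.bin")

# Always use binary mode for packed data.
with open(path, "wb") as binfile:
    binfile.write(record)

with open(path, "rb") as binfile:
    raw = binfile.read()

a, b, c, d = struct.unpack(FMT, raw)
print(a, b, c)  # b'hello' b'world!' 2
print(d)        # close to 45.123 (single precision)
```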

Note: problems encountered when handling binary files

When processing binary files, open them as follows:

binfile = open(filepath, 'rb')    # read a binary file
binfile = open(filepath, 'wb')    # write a binary file

Compared with text mode, binfile = open(filepath, 'r'), there are two differences:

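First, in text mode ('r') on Windows, the C runtime treats the byte 0x1A as end of file (EOF); this affects Python 2, which relies on that text mode. The problem does not exist with 'rb'. In other words, if you write in binary mode and read in text mode, only part of the file is read when it contains 0x1A, whereas 'rb' reads all the way to the end of the file. In Python 3, text mode additionally decodes the bytes into str, so packed data should always be read with 'rb' and written with 'wb'.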

Second, for the string x = 'ABC\ndef', len(x) returns 7. The \n is the newline character, which is the single byte 0x0A. When the string is written in 'w' (text) mode, Windows automatically replaces 0x0A with the two bytes 0x0D 0x0A, so the file is actually 8 bytes long. When the file is read in 'r' (text) mode, the pair is converted back to the original newline character. In 'wb' (binary) mode every byte is written unchanged and read back as is. So if you write in text mode and read in binary mode, you have to account for the extra byte 0x0D, also known as carriage return. None of this happens on Linux, because Linux uses only 0x0A for line breaks.
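
The newline translation can be reproduced on any platform with a small sketch. It assumes Python 3 and passes newline="\r\n" to open() to mimic what Windows text mode does by default; the file name newline_demo.txt is hypothetical.

```python
import os
import tempfile

x = "ABC\ndef"  # 7 characters, one of them "\n" (0x0A)
path = os.path.join(tempfile.mkdtemp(), "newline_demo.txt")

# newline="\r\n" makes text mode write "\n" as "\r\n" on any platform.
with open(path, "w", newline="\r\n") as textfile:
    textfile.write(x)

# Reading in binary mode shows the extra 0x0D byte.
with open(path, "rb") as binfile:
    raw = binfile.read()

print(len(x))    # 7
print(len(raw))  # 8
print(raw)       # b'ABC\r\ndef'

# Binary mode round-trips every byte unchanged, including 0x1A and 0x0A.
payload = bytes([0x1A, 0x0A, 0x0D])
with open(path, "wb") as binfile:
    binfile.write(payload)
with open(path, "rb") as binfile:
    restored = binfile.read()

print(restored == payload)  # True
```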
