Executables on Apple devices are stored in the Mach-O binary format, which is the file you usually work with when reverse engineering or doing static analysis. A common tool for inspecting Mach-O files is MachOView. This article uses MachOView to analyze how the Code Signature is stored.

This article also focuses on these questions:

1. How does MachOView determine what each part of a Mach-O file contains?

2. How is binary data stored, and how do we locate a given piece of it?

3. How do we interpret the meaning of the raw bytes?

Background

1. A binary file simply stores its content in binary form, so it can be read into memory unchanged for further processing. For example, storing the value 3.1415927 as text takes 9 bytes, one ASCII code for each of the characters 3 . 1 4 1 5 9 2 7, whereas a binary 32-bit float needs only 4 bytes (little-endian): DB 0F 49 40.

2. A binary file read into memory is usually stored contiguously and needs no additional processing: it looks the same in memory as it does on disk.

3. Each process is assigned its own virtual address space, and all of its addressing happens within that space. A mapping table translates virtual addresses to physical addresses.

4. As a simplification: virtual address = random base address (ASLR) + logical address (in-segment offset).

There are also many offsets, which simply say how many bytes something lies from a given position. The key is to confirm which position the offset is relative to. Different data segments use different anchor points, but in general an offset is relative to the start of the current data segment.

5. A Mach-O file in FAT format can be understood as several single-architecture Mach-O files laid out one after another, so when analyzing one architecture you must add that architecture's offset within the file.

6. uint32_t is 4 bytes and uint8_t is 1 byte.
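The text-versus-binary size comparison in point 1 can be reproduced with a few lines of Python. This is only an illustrative sketch; the variable names are made up for the example.

```python
import struct

value = 3.1415927

# Text form: one ASCII byte per character, including the decimal point.
text_bytes = str(value).encode("ascii")

# Binary form: IEEE 754 single precision, least significant byte first.
binary_bytes = struct.pack("<f", value)

print(len(text_bytes), text_bytes)
print(len(binary_bytes), " ".join(f"{b:02X}" for b in binary_bytes))
```

The second line of output shows the same four bytes quoted above, DB 0F 49 40.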

The Mach-O format

Format analysis

A quick look at the Mach-O data structure:

The Mach-O file is roughly divided into three parts:

Header

Describes the Mach-O file as a whole, including the CPU architecture, sub-version, file type, and number of load commands. Numeric content is easy to represent, but what about categories such as the file format or the CPU architecture? Binary data is ultimately just numbers, so categories can only be represented by numbers too, using values that are assigned a special meaning. The values that identify the file format are usually called magic numbers. For example, 0xCAFEBABE means a FAT file and 0xFEEDFACF means a 64-bit Mach-O file. The CPU architecture itself is stored in the header's cputype field, for example 0x0100000C for ARM64.

The Header is defined at: opensource.apple.com/source/xnu/…
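As a rough sketch of how a tool might use the magic number, the snippet below classifies a file from its first four bytes. The helper name identify_magic and the description strings are made up for this example; the three magic values are the ones defined in the Mach-O headers.

```python
import struct

# Magic values from <mach-o/fat.h> and <mach-o/loader.h>.
MAGICS = {
    0xCAFEBABE: "FAT (universal) binary",
    0xFEEDFACE: "32-bit Mach-O",
    0xFEEDFACF: "64-bit Mach-O",
}


def identify_magic(first4: bytes) -> str:
    """Classify a file from its first four bytes (hypothetical helper)."""
    (big,) = struct.unpack(">I", first4[:4])
    (little,) = struct.unpack("<I", first4[:4])
    # The FAT magic is stored big-endian; a thin Mach-O for a little-endian
    # CPU stores its magic little-endian, so both readings are tried.
    for candidate in (big, little):
        if candidate in MAGICS:
            return MAGICS[candidate]
    return "unknown"


print(identify_magic(bytes.fromhex("CAFEBABE")))
print(identify_magic(bytes.fromhex("CFFAEDFE")))
```

In practice the four bytes would come from reading the start of the file, for example with open(path, "rb").read(4).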

Load Commands

Records the type and location of each data segment. A load command holds only descriptive information and tags, such as file offsets and sizes, not the content itself.

Data

Holds the actual content. Different types of information correspond to different data layouts. Note the arrows on the right side of the figure running from Load Commands to Data: what each part of Data means is specified by Load Commands.

In book terms: Header is the cover, Load Commands is the table of contents, and Data is the body of the book.

Looking for Code Signature

The focus of this section is locating the Code Signature content. MachOView does not parse it and shows it as raw data, which makes it a good case for analysis.

The file analyzed is the system ls at /bin/ls; open it in MachOView. ls is a FAT file that contains two architectures. The Fat Header records the CPU type, offset, and size of each architecture.
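To make the role of the Fat Header concrete, here is a minimal sketch that lists the slices recorded in it. It assumes the classic 32-bit fat_header / fat_arch layout, whose fields are all stored big-endian. The function name and the sample bytes are invented for illustration; only the slice offset 0x4000 matches the value used later in this article.

```python
import struct

FAT_MAGIC = 0xCAFEBABE
CPU_NAMES = {0x01000007: "x86_64", 0x0100000C: "arm64"}


def parse_fat_header(data: bytes):
    """Return one dict per architecture slice listed in a FAT header."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a FAT file")
    slices = []
    for i in range(nfat_arch):
        # struct fat_arch: cputype, cpusubtype, offset, size, align (20 bytes)
        cputype, _subtype, offset, size, _align = struct.unpack_from(
            ">5I", data, 8 + 20 * i
        )
        slices.append(
            {"arch": CPU_NAMES.get(cputype, hex(cputype)), "offset": offset, "size": size}
        )
    return slices


# Synthetic header with a single x86_64 slice; the size and align values
# are placeholders, not measurements taken from /bin/ls.
sample = struct.pack(">II", FAT_MAGIC, 1) + struct.pack(
    ">5I", 0x01000007, 3, 0x4000, 0xE890, 14
)
print(parse_fat_header(sample))
```

On a real file, the same function could be given the first few hundred bytes of the binary.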

For X86_64, expand Load Commands and find LC_CODE_SIGNATURE, which describes the code signature:

The right side shows the actual data, with MachOView's description of each field:

Data Offset: 53808, which is 0xD230 in hexadecimal

Data Size: 5728, which is 0x1660 in hexadecimal

The hexadecimal values in the Data column are what is actually stored in the file; the Value column is MachOView's decoding of them.
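The decoding that MachOView does here can be sketched by hand. LC_CODE_SIGNATURE uses the 16-byte linkedit_data_command layout (cmd, cmdsize, dataoff, datasize), and in an X86_64 slice these fields are little-endian. The function name below is hypothetical, and the raw bytes are reconstructed from the two values shown above rather than copied from the file.

```python
import struct

LC_CODE_SIGNATURE = 0x1D


def parse_code_signature_command(raw: bytes):
    """Decode a linkedit_data_command and return (dataoff, datasize)."""
    cmd, cmdsize, dataoff, datasize = struct.unpack("<4I", raw[:16])
    if cmd != LC_CODE_SIGNATURE:
        raise ValueError("not an LC_CODE_SIGNATURE command")
    return dataoff, datasize


# cmd = 0x1D, cmdsize = 16, dataoff = 0xD230, datasize = 0x1660
raw = bytes.fromhex("1D000000" "10000000" "30D20000" "60160000")
dataoff, datasize = parse_code_signature_command(raw)
print(dataoff, hex(dataoff), datasize, hex(datasize))
```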

This offset is not the same kind of offset as the one in the Fat Header, which is relative to the start of the whole file. The offset here is relative to the start of the X86_64 slice, which begins at 0x4000 in the file. So the actual file offset is: 0xD230 + 0x4000 = 0x11230.

Find the Code Signature content in the Data section:

Here pFile is the offset relative to the start of the whole file, and the Code Signature starts at the position calculated above: 0x11230. Its last byte is at 0x11230 + 0x1660 - 0x1 = 0x1288F.
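The two address calculations above are simple enough to check in code. The helper name signature_file_range is made up for this sketch; the three input values are the ones from this article.

```python
def signature_file_range(slice_offset: int, dataoff: int, datasize: int):
    """Return (first, last) file offsets of the code signature bytes."""
    first = slice_offset + dataoff   # slice-relative offset -> file offset
    last = first + datasize - 1      # inclusive position of the final byte
    return first, last


first, last = signature_file_range(0x4000, 0xD230, 0x1660)
print(hex(first), hex(last))
```

Subtracting 1 is needed because the first byte already counts toward the size.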

Parsing Code Signature

CS_SuperBlob

Now that we have found the code signature, let’s parse it. The first step is to find the data definitions, and then work out what the data means. The Code Signature structures are defined at: opensource.apple.com/source/xnu/…

The entire signature header is a CS_SuperBlob structure, which is defined as follows:

typedef struct __SC_SuperBlob {
	uint32_t magic;					/* magic number */
	uint32_t length;				/* total length of SuperBlob */
	uint32_t count;					/* number of index entries following */
	CS_BlobIndex index[];			/* (count) entries */
	/* followed by Blobs in no particular order as indicated by offsets in index */
} CS_SuperBlob; 

The first field of this structure is magic, which takes one of the following values:

/* Magic numbers used by Code Signing */
enum {
	CSMAGIC_REQUIREMENT = 0xfade0c00,		/* single Requirement blob */
	CSMAGIC_REQUIREMENTS = 0xfade0c01,		/* Requirements vector (internal requirements) */
	CSMAGIC_CODEDIRECTORY = 0xfade0c02,		/* CodeDirectory blob */
	CSMAGIC_EMBEDDED_SIGNATURE = 0xfade0cc0,	/* embedded form of signature data */
	CSMAGIC_EMBEDDED_SIGNATURE_OLD = 0xfade0b02,	/* XXX */
	CSMAGIC_EMBEDDED_ENTITLEMENTS = 0xfade7171,	/* embedded entitlements */
	CSMAGIC_DETACHED_SIGNATURE = 0xfade0cc1,	/* multi-arch collection of embedded signatures */
	CSMAGIC_BLOBWRAPPER = 0xfade0b01,		/* CMS Signature, among other things */
	/* ... */
};

The second field is length, the total length of the SuperBlob.

The third field is count, the number of index entries that follow.

The fourth field is an array of CS_BlobIndex structures, one per entry.

Big-endian and little-endian

1. Multi-byte values can be stored in two byte orders: big-endian and little-endian.

2. Take the 16-bit value 0x1234. In little-endian order the low-order byte 0x34 is stored at the lower address and the high-order byte 0x12 at the higher address; in big-endian order it is the other way around.
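A short Python check makes the two byte orders visible. It uses only the built-in int.to_bytes and int.from_bytes methods; the variable names are illustrative.

```python
value = 0x1234

little = value.to_bytes(2, "little")  # low-order byte first
big = value.to_bytes(2, "big")        # high-order byte first

print(little.hex(), big.hex())  # prints: 3412 1234

# Reading the bytes back with the matching byte order recovers the value.
assert int.from_bytes(little, "little") == value
assert int.from_bytes(big, "big") == value
```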

Data parsing

Let’s take the first line of the Code Signature data and analyze it:

Note that the data view has two columns, Data LO and Data HI, which show the bytes of each row in file order: the lower-addressed bytes first, then the higher-addressed bytes. Code signature blobs are stored in big-endian (network) byte order, unlike the little-endian fields of the X86_64 Mach-O header and load commands, so each field can be read in natural order, which gives the following:

magic

0xFADE0CC0, corresponding to CSMAGIC_EMBEDDED_SIGNATURE, represents the embedded code signature data.

length

0x1486, from which we can calculate the position of the last byte: 0x11230 + 0x1486 - 0x1 = 0x126B5

The byte marked in red is where the Code Signature ends; everything after it is 0x00 padding that carries no information.

count

The value is 3, meaning that three index entries follow, each of which is a CS_BlobIndex structure.
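The three header fields can be decoded with one struct call. This is a sketch that assumes the blob fields are big-endian, as code signature data is; the function name is hypothetical and the 12 sample bytes are rebuilt from the values read above.

```python
import struct

CSMAGIC_EMBEDDED_SIGNATURE = 0xFADE0CC0


def parse_superblob_header(data: bytes, base: int = 0):
    """Return (magic, length, count) of a CS_SuperBlob starting at base."""
    return struct.unpack_from(">3I", data, base)


header = bytes.fromhex("FADE0CC0" "00001486" "00000003")
magic, length, count = parse_superblob_header(header)
print(hex(magic), hex(length), count)

# Position of the last SuperBlob byte, given its file offset 0x11230.
last_byte = 0x11230 + length - 1
print(hex(last_byte))
```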

CS_BlobIndex

Let’s look at the CS_BlobIndex structure:

/* * Structure of an embedded-signature SuperBlob */

typedef struct __BlobIndex {
	uint32_t type;					/* type of entry */
	uint32_t offset;				/* offset of entry */
} CS_BlobIndex;

It has two fields: type is the type of the entry and offset is the offset of the entry.

As with magic, each type is identified by a number with a special meaning. These values are defined in the same enum as the magic numbers above:

CSSLOT_CODEDIRECTORY = 0,				/* slot index for CodeDirectory */
CSSLOT_INFOSLOT = 1,
CSSLOT_REQUIREMENTS = 2,
CSSLOT_RESOURCEDIR = 3,
CSSLOT_APPLICATION = 4,
CSSLOT_ENTITLEMENTS = 5,

CSSLOT_ALTERNATE_CODEDIRECTORIES = 0x1000,	/* first alternate CodeDirectory, if any */
CSSLOT_ALTERNATE_CODEDIRECTORY_MAX = 5,		/* max number of alternate CD slots */
CSSLOT_ALTERNATE_CODEDIRECTORY_LIMIT = CSSLOT_ALTERNATE_CODEDIRECTORIES + CSSLOT_ALTERNATE_CODEDIRECTORY_MAX, /* one past the last */

CSSLOT_SIGNATURESLOT = 0x10000,			/* CMS Signature */

Data parsing

Let’s go back to the data and analyze it according to the structure above:

Three CS_BlobIndex entries can be parsed:

type      Type meaning            offset
0x00      CSSLOT_CODEDIRECTORY    0x24
0x02      CSSLOT_REQUIREMENTS     0x261
0x10000   CSSLOT_SIGNATURESLOT    0x29D

These offsets are much smaller than the file offset of the Code Signature, so they cannot be file offsets; they are relative to the start of the Code Signature.

The table also works as a table of contents: it gives the "page number" of each of the three parts (three structures) that make up the rest of the content.
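Reading the index entries and turning their relative offsets into file offsets can be sketched as follows. The function name and the SLOT_NAMES lookup are made up for this example, and the sample bytes are rebuilt from the values in the table rather than copied from the file.

```python
import struct

SLOT_NAMES = {
    0: "CSSLOT_CODEDIRECTORY",
    2: "CSSLOT_REQUIREMENTS",
    0x10000: "CSSLOT_SIGNATURESLOT",
}


def parse_blob_index(data: bytes, superblob_file_offset: int):
    """Return (slot name, relative offset, file offset) for each entry."""
    _magic, _length, count = struct.unpack_from(">3I", data, 0)
    entries = []
    for i in range(count):
        # Entries start right after the 12-byte header, 8 bytes each.
        slot_type, offset = struct.unpack_from(">II", data, 12 + 8 * i)
        name = SLOT_NAMES.get(slot_type, hex(slot_type))
        entries.append((name, offset, superblob_file_offset + offset))
    return entries


sample = bytes.fromhex(
    "FADE0CC0" "00001486" "00000003"
    "00000000" "00000024"
    "00000002" "00000261"
    "00010000" "0000029D"
)
for name, rel, absolute in parse_blob_index(sample, 0x11230):
    print(name, hex(rel), hex(absolute))
```

The first entry's offset, 0x24, is exactly the size of the header plus the three index entries (12 + 3 × 8 = 36 bytes), so the CodeDirectory follows the index directly.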

CS_CodeDirectory

Let’s start with CSSLOT_CODEDIRECTORY, which corresponds to the CS_CodeDirectory structure:

/* * C form of a CodeDirectory. */
typedef struct __CodeDirectory {
	uint32_t magic;					/* magic number (CSMAGIC_CODEDIRECTORY) */
	uint32_t length;				/* total length of CodeDirectory blob */
	uint32_t version;				/* compatibility version */
	uint32_t flags;					/* setup and mode flags */
	uint32_t hashOffset;			/* offset of hash slot element at index zero */
	uint32_t identOffset;			/* offset of identifier string */
	uint32_t nSpecialSlots;			/* number of special hash slots */
	uint32_t nCodeSlots;			/* number of ordinary (code) hash slots */
	uint32_t codeLimit;				/* limit to main image signature range */
	uint8_t hashSize;				/* size of each hash in bytes */
	uint8_t hashType;				/* type of hash (cdHashType* constants) */
	uint8_t platform;				/* platform identifier; zero if not platform binary */
	uint8_t	pageSize;				/* log2(page size in bytes); 0 => infinite */
	uint32_t spare2;				/* unused (must be zero) */
	/* Version 0x20100 */
	uint32_t scatterOffset;				/* offset of optional scatter vector */
	/* Version 0x20200 */
	uint32_t teamOffset;				/* offset of optional team identifier */
	/* followed by dynamic content as located by offset fields above */
} CS_CodeDirectory;

Data parsing

Let’s extract this data first, and then analyze it according to the structure:

Only a few important fields are analyzed here.

magic is 0xFADE0C02, the marker that identifies a CodeDirectory.

length is 0x23D, the total length of the CodeDirectory blob.

identOffset is 0x30, the offset of the identifier string, which corresponds to the bundle ID.

Note that the CodeDirectory is nested inside the SuperBlob, so this offset is relative to the start of the CodeDirectory itself, that is, to its magic 0xFADE0C02, which sits at file offset 0x11230 + 0x24 = 0x11254. So the file offset of the identifier is:

0x11254 + 0x30 = 0x11284
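The fixed part of CS_CodeDirectory up to spare2 maps onto a single struct format string. The sketch below assumes big-endian fields; the namedtuple and function names are invented, and the sample fills in only the three values discussed above, with every other field left as a zero placeholder.

```python
import struct
from collections import namedtuple

CodeDirectory = namedtuple(
    "CodeDirectory",
    "magic length version flags hashOffset identOffset nSpecialSlots "
    "nCodeSlots codeLimit hashSize hashType platform pageSize spare2",
)

# Nine uint32 fields, four uint8 fields, one uint32 field: 44 bytes.
CD_FORMAT = ">9I4BI"


def parse_code_directory(data: bytes, base: int = 0) -> CodeDirectory:
    """Decode the fixed CodeDirectory fields that precede scatterOffset."""
    return CodeDirectory._make(struct.unpack_from(CD_FORMAT, data, base))


# Placeholder sample: only magic, length and identOffset are real values.
sample = struct.pack(CD_FORMAT, 0xFADE0C02, 0x23D, 0, 0, 0, 0x30, 0, 0, 0, 0, 0, 0, 0, 0)
cd = parse_code_directory(sample)

cd_file_offset = 0x11230 + 0x24
ident_file_offset = cd_file_offset + cd.identOffset
print(hex(cd.magic), hex(cd.length), hex(ident_file_offset))
```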

You might wonder how we know where the string ends when we have only an offset and no size. In fact, you do not need the size: a C string ends with the terminator \0, which is the byte 0x00.

The identifier of ls is com.apple.ls.

One more thing: strings in Mach-O are encoded in UTF-8 rather than ASCII, but UTF-8 is ASCII-compatible, so content like this can be read correctly as ASCII.
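Reading a NUL-terminated string from a known offset takes only a few lines. The helper name read_cstring and the sample buffer are made up for this sketch; the buffer simply places the identifier at offset 0x30 as in the CodeDirectory above.

```python
def read_cstring(data: bytes, offset: int) -> str:
    """Read bytes from offset up to (not including) the first 0x00 byte."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("utf-8")


# 0x30 filler bytes, then the identifier, its terminator, and more data.
blob = b"\xff" * 0x30 + b"com.apple.ls\x00" + b"\xaa" * 4
print(read_cstring(blob, 0x30))
```

Decoding as UTF-8 also handles plain ASCII content, since ASCII is a subset of UTF-8.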

CS_GenericBlob

The CSSLOT_SIGNATURESLOT entry corresponds to the CS_GenericBlob structure:

typedef struct __SC_GenericBlob {
	uint32_t magic;				/* magic number */
	uint32_t length;			/* total length of blob */
	char data[];
} CS_GenericBlob;

In the previous table, we recorded its offset as 0x29D, so it starts at 0x11230 + 0x29D = 0x114CD. Find this position and parse the data according to the structure:

magic is 0xFADE0B01, which corresponds to CSMAGIC_BLOBWRAPPER, the wrapper used for the CMS signature stored in CSSLOT_SIGNATURESLOT.

length is 0x11E9 (4585 bytes), the total size of the CS_GenericBlob including its 8-byte header (magic and length). Everything after the header is data, which holds the certificate part.

We can calculate where the certificate data ends. length already counts the header, so the last byte is at 0x114CD + 0x11E9 - 0x1 = 0x126B5, the same position where the SuperBlob ends. The data itself starts at 0x114CD + 0x8 = 0x114D5 and is 0x11E9 - 0x8 = 0x11E1 bytes long.
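The bounds of the data inside a CS_GenericBlob follow from its two header fields. Because length covers the whole blob, including the 8-byte header, the last byte sits at start + length - 1. The sketch below assumes big-endian fields; the function name is hypothetical and the header bytes are rebuilt from the values above.

```python
import struct

CSMAGIC_BLOBWRAPPER = 0xFADE0B01
HEADER_SIZE = 8  # uint32 magic + uint32 length


def blob_payload_range(header: bytes, blob_file_offset: int):
    """Return (magic, first payload byte, last payload byte) file offsets."""
    magic, length = struct.unpack(">II", header[:HEADER_SIZE])
    first = blob_file_offset + HEADER_SIZE
    last = blob_file_offset + length - 1  # length includes the header
    return magic, first, last


magic, first, last = blob_payload_range(bytes.fromhex("FADE0B01" "000011E9"), 0x114CD)
print(hex(magic), hex(first), hex(last), last - first + 1)
```

The payload size printed at the end is length minus the header, 0x11E1 bytes, and the last byte coincides with the end of the SuperBlob at 0x126B5.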

Note: According to the book “iOS Application Reverse and Security”, with the help of a binary tool such as 010 Editor (the Hopper tool is also needed), we can copy out the data part, save it in .cer format, and get a certificate file. However, this did not work for ls, so presumably the data here contains more than a single certificate and needs to be split further.

Jtool

As long as we have the corresponding data structures, all the information in the signature can be parsed. But analyzing it byte by byte every time is laborious. Could we write a program to do the analysis above? Of course, and a tool for this already exists: jtool. jtool is more powerful than otool and parses the data in more detail. It can be installed via Homebrew:

$ brew install jtool

The signature of the x86_64 architecture can be queried with jtool:

$ jtool -arch x86_64 --sig /bin/ls

The output is:

Blob at offset: 53808 (5728 bytes) is an embedded signature
Code Directory (573 bytes)
		Version:     20100
		Flags:       none
		Platform Binary
		CodeLimit:   0xd230
		Identifier:  com.apple.ls (0x30)
		CDHash:	     46cc1da7c874a5853984a286ffecb48daf2f65f023d10258a31118acfc8a3697 (computed)
		# of Hashes: 14 code + 2 special
		Hashes @125 size: 32 Type: SHA-256
Requirement Set (60 bytes) with 1 requirement:
	0: Designated Requirement (@20, 28 bytes): SIZE: 28
		Ident: (com.apple.ls) AND Apple Anchor
Blob Wrapper (4585 bytes) (0x10000 is CMS (RFC3852) signature)
CA: Apple Certification Authority CN: Apple Root CA
CA: Apple Certification Authority CN: Apple Code Signing Certification Authority
CA: Apple Certification Authority CN: Apple Root CA
CA: Apple Certification Authority CN: Apple Root CA
CA: Apple Certification Authority CN: Apple Code Signing Certification Authority
CA: Apple Software CN: Software Signing
Time: 201222002625Zi

Offset 53808 in the first line is 0xD230 in hexadecimal, the offset recorded in LC_CODE_SIGNATURE.

According to the output, the code signature consists of three parts: Code Directory, Requirement Set, and Blob Wrapper. The certificate section lists six certificate entries, which indicates that it contains further structures that can be split.
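The numbers in the jtool output can be cross-checked against the CodeDirectory layout. The sketch below rests on stated assumptions: the identifier string directly follows the 0x30-byte header, the special-slot hashes directly follow the identifier, the code-slot hashes start at the hash offset, and pages are 4096 bytes. None of these were read from the file; they are simply consistent with the output above.

```python
import math

header_size = 0x30                   # identOffset reported by jtool
identifier = b"com.apple.ls\x00"     # identifier plus its NUL terminator
hash_size = 32                       # SHA-256
special_slots, code_slots = 2, 14    # "# of Hashes: 14 code + 2 special"
code_limit = 0xD230
page_size = 4096                     # assumption, not shown in the output

# Special-slot hashes sit just before the hash offset.
hash_offset = header_size + len(identifier) + special_slots * hash_size
total_length = hash_offset + code_slots * hash_size
pages = math.ceil(code_limit / page_size)

print(hash_offset, total_length, pages)
```

Under these assumptions the results agree with "Hashes @125", "Code Directory (573 bytes)", and the 14 code hashes reported by jtool.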

Review

If you have read this far, go back to the three opening questions to test your understanding.

1. How does MachOView determine what each part of a Mach-O file contains?

2. How is binary data stored, and how do we locate a given piece of it?

3. How do we interpret the meaning of the raw bytes?