Preface

  • The earlier reverse engineering articles covered the basic knowledge and the required tools. Hands-on projects, assembly, and jailbreak content are being prepared and will follow in later updates, so please stay tuned.

  • I am now starting a series of articles on low-level internals. Its main thread runs from dyld loading the executable, to the main entry function, to the loading of classes, categories, protocols, and so on, exploring the underlying principles step by step.

  • This article starts with the nature of methods. A few earlier articles in the series are still missing; they will be filled in later, together with a table of contents.

Background knowledge

Runtime

Any discussion of the nature of Objective-C (OC) has to start with the Runtime.

The official documentation

This is only a brief overview of the Runtime to help us explore the nature of methods. The details of the Runtime mechanism and its usage will be covered in a later update.

Runtime Brief Introduction

◈ Objective-C extends the C language with object-oriented features and Smalltalk-style messaging. The core of this extension is a Runtime library written in C and assembly. It is the foundation of Objective-C's object-oriented and dynamic mechanisms.

◈ Objective-C is a dynamic language, which means it requires not only a compiler but also a runtime system that dynamically creates classes and objects and sends and forwards messages.

◈ Understanding the Objective-C Runtime mechanism can help us better understand the language and, when appropriate, extend it to solve design or technical problems on a project at the system level.

The Runtime version

There are actually two versions of the Runtime: "Modern" and "Legacy". Objective-C 2.0, the version we use today, runs on the Modern runtime, which is used by iOS applications and by 64-bit applications on macOS 10.5 and later. Older 32-bit macOS programs still use the Legacy runtime of Objective-C 1.

The biggest difference between the two versions is that when you change the layout of a class's instance variables, the Legacy runtime requires you to recompile its subclasses, whereas the Modern runtime does not.

Runtime is basically written in C and assembly. You can access the open source code maintained by Apple here. Apple and GNU each maintain an open source version of the Runtime, and efforts are made to maintain consistency between the two versions.

Runtime API

The document address

Read more.

Uses of the Runtime

For ordinary developers, the Runtime is mainly a way to implement various requirements and effects on top of its dynamic mechanism. To recap:

  • Associated Objects: add properties to a category
  • Method Swizzling: add and replace methods, and the implementation of KVO
  • Message forwarding: hot updates that fix bugs (JSPatch)
  • Automatic archiving and unarchiving for NSCoding
  • Automatic conversion between dictionaries and models (MJExtension)
  • ...

In fact, given the Runtime's mechanisms and the APIs it provides, we are free to use it to build many different kinds of functionality.

Simple summary

  • In C, turning code into a running program takes three steps: compile, link, and run. By link time, the types of objects and the implementations of functions are already determined.

  • In Objective-C, LLVM defers some of the work normally done at compile and link time to run time. That is, even for a compiled .ipa package, what happens when a method is called is not known until the program is running. This is also what made "hot fixes" popular.

This design makes Objective-C flexible, even allowing us to dynamically change the implementation of a method while the program is running.
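The difference between a call fixed at link time and one resolved while running can be modeled in a few lines of plain C. This is only a sketch of the idea, not the real Runtime: the names `eat_slot`, `send_eat`, and `replace_eat` are hypothetical and exist just for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a "method" is a slot that holds a function pointer.
   The slot is plain data, so it can be changed while the program runs. */
typedef int (*slot_imp)(void);

static int eat_original(void) { return 1; }  /* stands for the shipped behavior */
static int eat_patched(void)  { return 2; }  /* stands for a hot-fixed behavior */

/* A direct C call would hard-code eat_original at link time.
   This slot is read again on every call instead. */
static slot_imp eat_slot = eat_original;

/* Call through the slot, similar in spirit to a message send. */
static int send_eat(void) {
    assert(eat_slot != NULL);
    return eat_slot();
}

/* Swap the implementation at run time and return the previous one. */
static slot_imp replace_eat(slot_imp new_imp) {
    slot_imp old = eat_slot;
    eat_slot = new_imp;
    return old;
}
```

Calling `send_eat()` before and after `replace_eat(eat_patched)` yields different results even though the call site never changes, which is the property that method replacement relies on.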

This leads to today's topic, the nature of methods, which comes down to the message sending mechanism.

Instance object – Class object – metaclass object

See: OC class object / instance object / metaclass analysis. Later articles in this series will continue with the nature of classes and objects.

Nature of method

Exploration

With that said, let's drop the all-knowing point of view and explore the full flow of an OC method call from scratch.

Example

Create a new Command Line project with the following code:

// main.m
#import <Foundation/Foundation.h>

@interface LBObject : NSObject
- (void)eat;
@end

@implementation LBObject
- (void)eat{
    NSLog(@"eat");
}
@end

void run(){
    NSLog(@"%s",__func__);
}

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        LBObject * obj = [LBObject alloc];
        [obj eat]; // OC method
        
        run();  // C function
    }
    return 0;
}

Compile time – clang

clang -rewrite-objc main.m -o main.cpp

Since Clang is the compiler front end of LLVM, this command lets us see the C++ source that the Objective-C code is rewritten into.

Open main.cpp and go straight to the implementation of the main function at the bottom.

C function and OC method

In the figure above, we can clearly see that the call to the C function run is bound to its implementation at compile time, while the OC method call is compiled into a call to the objc_msgSend function. This is the message sending mechanism mentioned in the Runtime section.

LLVM and the Runtime together use this approach to make dynamic behavior possible.

Therefore, it is concluded that:

The essence of the OC method is to call functions such as objc_msgSend.

Why "functions such as"? Because different kinds of calls compile to different variants: a call to super uses objc_msgSendSuper, a method that returns a struct uses objc_msgSend_stret, and so on.

objc_msgSend

From the compiled code we can see that objc_msgSend takes at least two arguments: an id, which is the object that receives the message, and a SEL, which names the method. The Runtime uses the SEL to find the matching IMP (the function pointer of the implementation) and calls it, which is how methods are invoked dynamically. We call this mechanism message sending.
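The receiver, selector, and implementation relationship can be sketched in plain C. This is a toy model under stated assumptions, not the Runtime's data structures: every `toy_` name is hypothetical, selectors are plain strings, and the lookup is a linear scan with no cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy types; the real id, SEL and IMP belong to the Runtime. */
typedef struct toy_object toy_object;
typedef const char *toy_sel;                              /* selector = method name */
typedef int (*toy_imp)(toy_object *self, toy_sel _cmd);   /* implementation */

typedef struct { toy_sel name; toy_imp imp; } toy_method;

typedef struct {
    const char *name;
    const toy_method *methods;
    size_t method_count;
} toy_class;

/* Every object starts with a pointer to its class (its "isa"). */
struct toy_object { const toy_class *isa; int meals; };

/* Implementation stored behind the "eat" selector. */
static int lb_eat(toy_object *self, toy_sel _cmd) {
    (void)_cmd;
    return ++self->meals;
}

static const toy_method lb_methods[] = { { "eat", lb_eat } };
static const toy_class LBObjectClass = { "LBObject", lb_methods, 1 };

/* Find the implementation registered for sel in cls, or NULL. */
static toy_imp toy_lookup(const toy_class *cls, toy_sel sel) {
    assert(cls != NULL && sel != NULL);
    for (size_t i = 0; i < cls->method_count; i++) {
        if (strcmp(cls->methods[i].name, sel) == 0) return cls->methods[i].imp;
    }
    return NULL;
}

/* Message send: receiver + selector -> implementation -> call.
   Returns 0 for a nil receiver and -1 when no method responds. */
static int toy_msg_send(toy_object *receiver, toy_sel sel) {
    if (receiver == NULL) return 0;
    toy_imp imp = toy_lookup(receiver->isa, sel);
    return imp ? imp(receiver, sel) : -1;
}
```

With `toy_object obj = { &LBObjectClass, 0 };`, the call `toy_msg_send(&obj, "eat")` plays the role of `[obj eat]`: the caller names only the receiver and the selector, and the function to run is found at call time.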

Different method calls

Calling an instance method
LBObject *obj = [LBObject alloc];
[obj eat];
// What an instance method call compiles down to
// The essence of a method call is sending a message: receiver, selector (message number), ... arguments (message body)
objc_msgSend(obj, sel_registerName("eat"));

Calling a class method
objc_msgSend(objc_getClass("LBObject"), sel_registerName("eat"));

Calling a superclass instance method
struct objc_super lbSuper;
lbSuper.receiver = obj;
lbSuper.super_class = [LBSuper class];
// __OBJC2__ requires only receiver and super_class
objc_msgSendSuper(&lbSuper, @selector(sayHello));

Calling a superclass class method
struct objc_super myClassSuper;
myClassSuper.receiver = [obj class];
myClassSuper.super_class = class_getSuperclass(object_getClass([obj class])); // superclass of the metaclass
objc_msgSendSuper(&myClassSuper, sel_registerName("test_classFunc"));

Tip

To call objc_msgSend directly, set "Enable Strict Checking of objc_msgSend Calls" to NO in the target's Build Settings; otherwise compilation fails with an error.
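With strict checking left on, objc_msgSend is declared without an argument list, so a common alternative to disabling the check is to cast it to a function pointer with the exact signature of the method being called. The plain-C sketch below shows only that casting idea; `generic_fn`, `get_untyped_entry`, and `call_typed` are hypothetical names and nothing here touches the real Runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry point whose declared type carries
   no argument information. */
typedef void (*generic_fn)(void);

/* The function that actually runs has a precise signature. */
static int add_meals(int current, int extra) { return current + extra; }

/* Hand the function out through the untyped pointer type. */
static generic_fn get_untyped_entry(void) {
    return (generic_fn)add_meals;
}

/* Cast back to the exact signature before calling. Calling through a
   mismatched type would be undefined behavior. */
static int call_typed(int current, int extra) {
    generic_fn raw = get_untyped_entry();
    assert(raw != NULL);
    int (*typed)(int, int) = (int (*)(int, int))raw;
    return typed(current, extra);
}
```

The cast spells out the argument and return types at the call site, which is the information the strict check asks the caller to provide.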

objc_msgSend source code analysis

Now for the main part.

  • Open the objc4 source code. (Compilable objc4 source, password R5v6.)

  • Search for objc_msgSend and go directly to ENTRY _objc_msgSend in objc-msg-arm64.s.

In the assembly source, a function begins with ENTRY followed by the function name and ends with END_ENTRY. Here we use the arm64 architecture as an example.

Assembly source code

objc_msgSend is written in assembly. Why? In my view, for the following reasons:

  • Limitations
    • C is a static language. A single C function cannot accept arguments of unknown number and type and then jump to an arbitrary function pointer.
  • Efficiency
    • High-level code is eventually translated into assembly anyway, so writing this frequently executed path directly in assembly makes it as fast as possible.
  • Security
    • As mentioned in the reverse engineering articles, assembly is often used to call methods and implement functions so that system functions are harder to hook.

The source code is as follows:

ENTRY _objc_msgSend
	UNWIND _objc_msgSend, NoFrame

	cmp	p0, #0 // nil check and tagged pointer check
#if SUPPORT_TAGGED_POINTERS
	b.le	LNilOrTagged		//  (MSB tagged pointer looks negative)
#else
	b.eq	LReturnZero
#endif
	// person - isa - class
	ldr	p13, [x0]		// p13 = isa
	GetClassFromIsa_p16 p13		// p16 = class
LGetIsaDone:
	CacheLookup NORMAL		// calls imp or objc_msgSend_uncached

#if SUPPORT_TAGGED_POINTERS
LNilOrTagged:
	b.eq	LReturnZero		// nil check
	
	//...

The code is too long to paste in full; see the source for the rest.

Process analysis

This article will not analyze the assembly line by line. The later reverse engineering articles will cover assembly, and the registers and instructions will be analyzed properly then.

A quick summary of the _objc_msgSend assembly code process is as follows:

  • 1️⃣: objc_msgSend has two parts. The first part, written in assembly, looks up the method cache. It runs until bl __class_lookupMethodAndLoadCache3, which jumps into C code and continues with the lookUpImpOrForward process.

  • 2️⃣: Get the real isa of the object. For objects that are not tagged pointers, isa is a union that uses bit fields to store various information, which will be described in detail later.

  • 3️⃣: Enter the CacheLookup process. Locate cache_t by pointer offset, then probe the bucket hash table using the key obtained by hashing the SEL. If the IMP is found, it is called; if not, go to JumpMiss.

  • 4️⃣: On a miss, continue to __objc_msgSend_uncached -> MethodTableLookup.

  • 5️⃣: Call bl __class_lookupMethodAndLoadCache3 to enter the slow lookup process.

What follows is the message lookup and forwarding process implemented in C. For reasons of space, the next article will describe the complete process.
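The cache step in the list above can be illustrated with a small hash table in plain C. This is a simplified sketch, not the layout of cache_t: the `toy_` names are hypothetical, selector keys are modeled as nonzero integers, the table has a fixed size, and collisions are resolved by probing forward, which is an assumption made for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*cache_imp)(void);

/* One bucket maps a selector key to an implementation; key 0 means empty. */
typedef struct { uintptr_t key; cache_imp imp; } toy_bucket;

#define TOY_CACHE_SIZE 8u                 /* must be a power of two */

/* Hypothetical miniature of a per-class method cache. */
typedef struct {
    toy_bucket buckets[TOY_CACHE_SIZE];
    uintptr_t mask;                       /* TOY_CACHE_SIZE - 1 */
    size_t occupied;
} toy_cache;

static int imp_eat(void) { return 1; }    /* sample implementations */
static int imp_run(void) { return 2; }

static void toy_cache_init(toy_cache *c) {
    for (size_t i = 0; i < TOY_CACHE_SIZE; i++) {
        c->buckets[i].key = 0;
        c->buckets[i].imp = NULL;
    }
    c->mask = TOY_CACHE_SIZE - 1;
    c->occupied = 0;
}

/* Fast path: mask the key to get a start index, then probe until the
   key is found (hit) or an empty bucket is reached (miss). */
static cache_imp toy_cache_lookup(const toy_cache *c, uintptr_t key) {
    assert(key != 0);
    uintptr_t i = key & c->mask;
    for (size_t probes = 0; probes < TOY_CACHE_SIZE; probes++) {
        const toy_bucket *b = &c->buckets[i];
        if (b->key == key) return b->imp;
        if (b->key == 0) return NULL;
        i = (i + 1) & c->mask;            /* collision: try the next bucket */
    }
    return NULL;
}

/* Store a result found by the slow path. Returns 0 when the table is
   full; one bucket is always kept empty so probing terminates. */
static int toy_cache_insert(toy_cache *c, uintptr_t key, cache_imp imp) {
    assert(key != 0);
    uintptr_t i = key & c->mask;
    while (c->buckets[i].key != 0 && c->buckets[i].key != key) i = (i + 1) & c->mask;
    if (c->buckets[i].key == 0) {
        if (c->occupied + 1 >= TOY_CACHE_SIZE) return 0;
        c->occupied++;
    }
    c->buckets[i].key = key;
    c->buckets[i].imp = imp;
    return 1;
}
```

A NULL result from `toy_cache_lookup` corresponds to the miss case in step 3, after which a slow lookup would run and its result would be stored with `toy_cache_insert`.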

IMP _class_lookupMethodAndLoadCache3(id obj, SEL sel, Class cls)
{        
    return lookUpImpOrForward(cls, sel, obj, 
                              YES/*initialize*/, NO/*cache*/, YES/*resolver*/);
}

Conclusion:

The nature of an OC method is as follows:

At compile time, LLVM turns a method call into a call to objc_msgSend or one of its variants. At run time, the assembly code first looks in the cache for the IMP that corresponds to the SEL. If it is found, the IMP is called directly. If it is not found, the slow path of message lookup and message forwarding begins.
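The whole flow, fast path first and slow path on a miss, can be summarized in a plain-C sketch. It rests on assumptions that go beyond this article: the slow path is modeled as a walk up the superclass chain, the cache holds a single entry, and forwarding is reduced to a fallback function. All `flow_` names and the Animal/Dog classes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef const char *(*flow_imp)(void);
typedef struct { const char *sel; flow_imp imp; } flow_method;

typedef struct flow_class {
    const struct flow_class *superclass;
    const flow_method *methods;
    size_t method_count;
    flow_method cache;          /* one-entry cache, enough for the sketch */
} flow_class;

static const char *animal_sleep(void) { return "sleep"; }
static const char *dog_eat(void)      { return "eat"; }
static const char *flow_forward(void) { return "forward"; }  /* fallback */

static const flow_method animal_methods[] = { { "sleep", animal_sleep } };
static const flow_method dog_methods[]    = { { "eat", dog_eat } };
static flow_class Animal = { NULL, animal_methods, 1, { NULL, NULL } };
static flow_class Dog    = { &Animal, dog_methods, 1, { NULL, NULL } };

/* Slow path: search the class, then each superclass in turn. */
static flow_imp flow_lookup_slow(const flow_class *cls, const char *sel) {
    for (const flow_class *c = cls; c != NULL; c = c->superclass) {
        for (size_t i = 0; i < c->method_count; i++) {
            if (strcmp(c->methods[i].sel, sel) == 0) return c->methods[i].imp;
        }
    }
    return flow_forward;        /* nothing found: hand over to forwarding */
}

/* Fast path first; on a miss, run the slow path and fill the cache.
   *was_cached reports which path produced the result. */
static flow_imp flow_resolve(flow_class *cls, const char *sel, int *was_cached) {
    assert(cls != NULL && sel != NULL && was_cached != NULL);
    if (cls->cache.sel != NULL && strcmp(cls->cache.sel, sel) == 0) {
        *was_cached = 1;
        return cls->cache.imp;
    }
    *was_cached = 0;
    flow_imp imp = flow_lookup_slow(cls, sel);
    cls->cache.sel = sel;
    cls->cache.imp = imp;
    return imp;
}
```

Resolving "sleep" on `Dog` twice takes the slow path the first time and the cache the second time, while an unknown selector such as "fly" ends in the fallback.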