Exploring the Underlying Principles of OC: Document Summary

This article analyzes what happens under the hood when an object calls a method: the fast lookup process, the slow lookup process, dynamic method resolution, message forwarding, and the error method that runs when the lookup finally fails.

Main Contents:

  1. Fast lookup process
  2. Slow lookup process
  3. Dynamic method resolution
  4. Message forwarding (redirecting the message receiver, redirecting the message)
  5. Lookup failure error

1. Preparation

1.1 A brief overview of Runtime

For a basic understanding of the Runtime and of method calls, see my other blog post on the official Runtime guide.

1.1.1 What is Runtime

OC is object-oriented and has a dynamic mechanism. It defers some of the critical work from compile time and link time to run time. The Runtime is what provides this run-time support, allowing us to dynamically create and modify classes, objects, methods, properties, protocols, and so on.

Compile time is the process in which source code is translated into machine-readable code. Run time, by contrast, is the process in which the code is loaded into memory and executed. It is a dynamic process.

1.1.2 What is the Function of Runtime

  • Provide a runtime environment for object orientation
  • Memory layout and underlying execution logic

1.2 Isa, Cache, and Method list

The method lookup process involves isa, the cache, and the method lists, so it requires understanding the underlying structure of objects and of classes. See my other blog post on the underlying structure of the OC class for details.

A brief introduction:

isa is a member of every object and holds the class information. We can therefore get the class through the object's isa, and then get the data in the class through the class, such as methods, properties, protocols, and member variables.

The cache is a member of the class and stores SEL and IMP pairs. When we want to find an IMP by its SEL, we first look in the class's cache.

The method list lives in the rw data of the class. When the cache has no entry for the SEL, the IMP has to be looked up in the method list.
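To make the relationship between isa, the cache, and the method list concrete, here is a minimal sketch in C. It is a simplified model written for this article, not the real runtime definitions: every `toy_` name is hypothetical, and the real class stores its method lists behind additional layers that are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of the layout described above.
   The toy_ names are illustrative, not the real runtime types. */
typedef const char *toy_sel;          /* stands in for SEL */
typedef void (*toy_imp)(void);        /* stands in for IMP */

struct toy_bucket {                   /* one cache entry */
    toy_imp imp;
    toy_sel sel;
};

struct toy_cache {                    /* the cache is a hash table of buckets */
    struct toy_bucket *buckets;
    uint32_t mask;                    /* capacity - 1 */
    uint32_t occupied;
};

struct toy_method {                   /* one entry of a method list */
    toy_sel sel;
    toy_imp imp;
};

struct toy_class {
    struct toy_class *isa;            /* the class's own isa (metaclass) */
    struct toy_class *superclass;
    struct toy_cache cache;           /* starts two pointers into the class */
    struct toy_method *methods;       /* stands in for the rw method list */
    size_t method_count;
};

struct toy_object {
    struct toy_class *isa;            /* first member: leads to the class */
};

/* The cache sits after isa and superclass, i.e. 2 * pointer size in. */
static_assert(offsetof(struct toy_class, cache) == 2 * sizeof(void *),
              "cache is expected two pointers into the class");
```

The one detail worth carrying forward is the offset of the cache: it comes after two pointers, which is the same 16-byte offset the assembly uses later when it loads the cache from the class on a 64-bit device.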

1.3 How to explore?

For example, objc_msgSend performs a cache lookup first, followed by a method list lookup, and dynamic method resolution is performed only after those lookups fail.


So every step of my analysis of method calls here is traceable. The important thing is to explore, not to memorize knowledge points.

  1. Start with an upper-level method call and use Clang to see how it is implemented underneath. It turns out to be implemented with objc_msgSend.
  2. Look up objc_msgSend in the source code. Its implementation is in assembly, and exploring the assembly shows that it performs the cache lookup.
  3. After the cache lookup misses, execution enters lookUpImpOrForward, whose implementation is found in the C source code.
  4. If no method is found there, execution enters resolveMethod_locked, which starts dynamic method resolution.
  5. If the method still cannot be found, the result is the error function _objc_msgForward_impcache.
  6. After dynamic method resolution, the method lists are searched once more. If the method is still not found, I could not see the next step (forwarding) in the source, so I explore the forwarding process in two ways: printing the message-send log with instrumentObjcMessageSends, and decompiling with Hopper/IDA.

2. The nature of the method

To explore method calls, you first need to know what a method is. Compiling with Clang makes it clear that a method call is, underneath, a message sent through objc_msgSend.


In a word: the essence of a method is message sending.

Sending a message is simply sending a message to a recipient object telling the recipient object which function we want to execute.

There are many different message functions, and we’ll examine only two common ones, objc_msgSend and objc_msgSendSuper.

2.1 Understanding of objc_msgSend

2.1.1 Underlying structure

Source:

OBJC_EXPORT void
objc_msgSend(void /* id self, SEL op, ... */)
    OBJC_AVAILABLE(10.0, 2.0, 9.0, 1.0, 2.0);

Description:

  • self is the current object (the message receiver); the method list of its class is reached through it
  • op is the method selector (SEL), which is used to look up the IMP
  • self and op are mandatory hidden arguments; if the method takes other arguments, they follow these two.
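The role of the two hidden arguments can be sketched with a toy dispatcher in C. This is only an illustration under simplifying assumptions: the `toy_` names are hypothetical, selectors are compared with strcmp (the real runtime compares selector pointers), and the method table hangs directly off the object rather than off its class.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_object toy_object;

/* Every toy IMP receives the receiver and the selector first,
   followed by the method's own argument. */
typedef int (*toy_imp)(toy_object *self, const char *op, int arg);

struct toy_method { const char *sel; toy_imp imp; };

struct toy_object {
    const struct toy_method *methods;   /* simplified: table kept on the object */
    size_t count;
    int value;
};

/* A sample method body: uses self, ignores op. */
int toy_add(toy_object *self, const char *op, int arg) {
    (void)op;
    return self->value + arg;
}

/* Toy message send: find the IMP for op, then call it with (self, op, arg).
   Returns -1 when no method matches. */
int toy_msg_send(toy_object *self, const char *op, int arg) {
    assert(self != NULL && op != NULL);
    for (size_t i = 0; i < self->count; i++) {
        if (strcmp(self->methods[i].sel, op) == 0) {
            return self->methods[i].imp(self, op, arg);
        }
    }
    return -1;
}
```

Calling `toy_msg_send(&obj, "add:", 2)` on an object whose value is 40 goes through the table and ends up in `toy_add` with the receiver and selector already filled in, which is the shape of every Objective-C method call.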

2.1.2 validation

Code:

#import "WYPerson.h"
#import "WYCat.h"
#import <objc/runtime.h>
#import <objc/message.h>

@implementation WYPerson

- (void)msgSendTest:(BOOL)abc {
    NSLog(@"test objc_msgSend");
}

@end

Note:

  • To call objc_msgSend directly, you must import the header with #import <objc/message.h>
  • In the target's Build Settings, search for "msg" and change Enable Strict Checking of objc_msgSend Calls from YES to NO; otherwise the objc_msgSend call will report an error

The method call

// Method call
WYPerson *person = [WYPerson alloc];
objc_msgSend(person, sel_registerName("msgSendTest:"), YES);

// Result:
// 2021-10-15 19:16:19.021944+0800 Message send [2866:47753] test objc_msgSend

2.2 Understanding of objc_msgSendSuper

2.2.1 Underlying Structure

Source:

struct objc_super {
    /// Specifies an instance of a class.
    __unsafe_unretained _Nonnull id receiver;

    /// Specifies the particular superclass of the instance to message.
#if !defined(__cplusplus) && !__OBJC2__
    /* For compatibility with old objc-runtime.h header */
    __unsafe_unretained _Nonnull Class class;
#else
    __unsafe_unretained _Nonnull Class super_class;
#endif
    /* super_class is the first class to search */
};

OBJC_EXPORT void
objc_msgSendSuper(void /* struct objc_super *super, SEL op, ... */)
    OBJC_AVAILABLE(10.0, 2.0, 9.0, 1.0, 2.0);

Description:

  • super is a pointer to an objc_super structure
  • op is the method selector
  • The objc_super structure contains the receiver and the parent class in which the search starts

2.2.2 Verification

Code:

The parent WYPerson class implements the msgSendSuperTest method

@interface WYPerson : NSObject
- (void)msgSendSuperTest;
@end

@implementation WYPerson
- (void)msgSendSuperTest {
    NSLog(@"%s: test objc_msgSendSuper", __func__);
}
@end

Subclass WYStudent does not implement the msgSendSuperTest method

@interface WYStudent : WYPerson

- (void)msgSendSuperTest;
@end

#import "WYStudent.h"

@implementation WYStudent

@end

Main function call

WYPerson *person = [WYPerson alloc];
WYStudent *student = [WYStudent alloc];
    
struct objc_super wySuper;
wySuper.receiver = person;
wySuper.super_class = [WYStudent class];
    
objc_msgSendSuper(&wySuper, sel_registerName("msgSendSuperTest"));

Results:

809788+0800 Message send [4750:91856] -[WYPerson msgSendSuperTest]: test objc_msgSendSuper
2021-10-15 20:14:45.809861+0800 Message send [47:91856] -[WYPerson msgSendSuperTest]: test objc_msgSendSuper

Description:

  • When a subclass calls a parent-class method through super, objc_msgSendSuper is what is used underneath
  • We simply do not perceive it at the upper level
  • In the objc_super structure, set the receiver to the current object and then set the parent class. Note that the receiver is still the current object, not a parent object

2.2.3 Case Analysis

WYStudent inherits from WYPerson, and WYStudent defines a method called objc_msgSendSuperTest. What do you think it prints when this method is called?

- (void)objc_msgSendSuperTest {
    NSLog(@"parent class: %@ -- subclass: %@", [super class], [self class]);
}

Analysis: both print WYStudent. [super class] only means that the lookup for the class method starts from the parent class. The receiver in the objc_super structure is still the current object, not a parent object, and the class method that ends up running reports the class of its receiver, so it prints WYStudent. For [self class], self is the hidden argument that represents the current object, so the current object calls the class method and again gets its own class, WYStudent.

Verification results:

2021-10-15 20:29:42.676387+0800 Message send [5275:104688] parent class: WYStudent -- subclass: WYStudent
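Why [super class] and [self class] give the same answer can be sketched with a toy model in C. This is a simplified illustration, not runtime code: the `toy_` names are hypothetical, and the model assumes that only the root class implements the "class" method, which simply reports the class of whatever receiver it is handed.

```c
#include <assert.h>
#include <stddef.h>

struct toy_class {
    const char *name;
    const struct toy_class *superclass;
    int implements_class;      /* nonzero if this class implements "class" */
};

struct toy_object { const struct toy_class *isa; };

/* Mirrors the two members of objc_super. */
struct toy_super {
    const struct toy_object *receiver;
    const struct toy_class *super_class;   /* first class to search */
};

/* The shared "class" implementation: it only looks at the receiver. */
const struct toy_class *toy_class_imp(const struct toy_object *self) {
    return self->isa;
}

/* Search upward from `start`; the receiver is passed through unchanged. */
const struct toy_class *toy_send_class(const struct toy_object *receiver,
                                       const struct toy_class *start) {
    assert(receiver != NULL);
    for (const struct toy_class *c = start; c != NULL; c = c->superclass) {
        if (c->implements_class) {
            return toy_class_imp(receiver);
        }
    }
    return NULL;   /* nothing implements "class" in this chain */
}

/* [self class]: search starts at the receiver's own class. */
const struct toy_class *toy_self_class(const struct toy_object *self) {
    return toy_send_class(self, self->isa);
}

/* [super class]: search starts at super_class, receiver stays the same. */
const struct toy_class *toy_super_class(const struct toy_super *sup) {
    return toy_send_class(sup->receiver, sup->super_class);
}
```

The only difference between the two entry points is the class where the search begins. Because the implementation that is eventually found is handed the same receiver in both cases, both calls report the receiver's class.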

2.3 Summary

  • The essence of a method is message sending
  • objc_msgSend sends a message whose lookup starts in the receiver's own class
  • objc_msgSendSuper sends a message whose lookup starts in the parent class

3. Fast lookup

The fast lookup process finds the IMP in the cache of the current class. The cache stores SEL and IMP pairs, so the corresponding IMP can be found quickly from the SEL.

3.1 Searching for Source Code

We know from above that the method call is actually executing the objc_msgSend function, so start with the objc_msgSend function.

  • A global search of the objc source code for objc_msgSend does not find a C implementation of this function, because the function (including the cache lookup) is written in assembly
  • So search globally for _objc_msgSend instead and find its ENTRY

3.2 Assembly source code analysis

3.2.1 Overall Overview

Some readers may find an assembly implementation intimidating. To make things easier, I first describe the overall process and then look at how the assembly implements it.

It is important to be familiar with the cache first; see the cache part of the underlying analysis of the OC class.

The following diagram covers the cache lookup process, the important assembly statements, and the register changes. If something later is confusing, come back to this diagram.

  1. Start the lookup and get the Class
    1. Get the object's Class through self's isa
  2. Get buckets
    1. Get the cache from the Class
    2. Get buckets and mask from the cache
  3. Hash lookup
    1. Compute the bucket's address by hashing the method selector _cmd with the mask
    2. Take the sel and IMP out of the bucket and compare the sel with _cmd
  4. Hash collision lookup
    1. If the sel does not match, a collision may have occurred, so start the collision search
    2. Step to the previous bucket and check whether it is the one we need
    3. If the first bucket is reached without a match, jump to the last bucket and continue the search
    4. If the search reaches the first bucket again without a match, fall back to the slow lookup
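The steps above can be sketched as a small C function. This is a simplified model under stated assumptions, not the runtime's code: the `toy_` names are hypothetical, selectors and IMPs are plain integers, and a single do/while loop replaces the two cloned loops of the assembly. The probing order (hash with `_cmd & mask`, step to the previous index, wrap from index 0 to `mask`) follows the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t toy_sel;     /* stands in for SEL; 0 means "empty slot" */
typedef uintptr_t toy_imp;     /* stands in for IMP; 0 means "no IMP" */

struct toy_bucket { toy_imp imp; toy_sel sel; };

/* Returns the cached IMP, or 0 on a cache miss
   (the caller would then start the slow lookup). */
toy_imp toy_cache_lookup(const struct toy_bucket *buckets, uint32_t mask,
                         toy_sel cmd) {
    assert(buckets != NULL && cmd != 0);
    assert(((mask + 1) & mask) == 0);           /* capacity is a power of two */

    uint32_t begin = (uint32_t)(cmd & mask);    /* hash: _cmd & mask */
    uint32_t i = begin;
    do {
        if (buckets[i].sel == cmd) {
            return buckets[i].imp;              /* cache hit */
        }
        if (buckets[i].sel == 0) {
            return 0;                           /* empty slot: miss */
        }
        i = i ? i - 1 : mask;                   /* previous slot, wrap to last */
    } while (i != begin);

    return 0;                                   /* scanned every slot: miss */
}
```

With four buckets (mask 3) and three selectors that all hash to index 1, the entries sit at indexes 1, 0, and 3, and the function visits exactly those slots in that order before it reaches the empty slot at index 2 and reports a miss.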

3.2.2 Starting query and obtaining Class

Source:

    // Entry point of the message send
    ENTRY _objc_msgSend
    // No frame
    UNWIND _objc_msgSend, NoFrame

    // Check for nil and for small objects (tagged pointers).
    // cmp compares. p0 is the first register and holds the message receiver,
    // because the receiver is passed as the first argument.
    // The second argument is the method selector, so p1 is _cmd.
    // If the function returns a value, the first register also holds the return value.
    cmp p0, #0                      // nil check and tagged pointer check
#if SUPPORT_TAGGED_POINTERS
    // Small objects are supported: b.le branches when less than or equal,
    // so nil and tagged pointers both go to LNilOrTagged
    b.le LNilOrTagged               // (MSB tagged pointer looks negative)
#else
    // Small objects are not supported: b.eq branches when equal, so nil returns zero
    b.eq LReturnZero
#endif
    // The first member of an object is isa, so loading from x0 yields isa
    ldr p13, [x0]                   // p13 = isa
    // p16 = isa (p13) & ISA_MASK
    GetClassFromIsa_p16 p13         // p16 = class
LGetIsaDone:
    // calls imp or objc_msgSend_uncached
    CacheLookup NORMAL, _objc_msgSend

#if SUPPORT_TAGGED_POINTERS
LNilOrTagged:
    b.eq LReturnZero                // nil check

    // tagged
    adrp x10, _objc_debug_taggedpointer_classes@PAGE
    add x10, x10, _objc_debug_taggedpointer_classes@PAGEOFF
    ubfx x11, x0, #60, #4
    ldr x16, [x10, x11, LSL #3]
    adrp x10, _OBJC_CLASS_$___NSUnrecognizedTaggedPointer@PAGE
    add x10, x10, _OBJC_CLASS_$___NSUnrecognizedTaggedPointer@PAGEOFF
    cmp x10, x16
    b.ne LGetIsaDone

    // ext tagged
    adrp x10, _objc_debug_taggedpointer_ext_classes@PAGE
    add x10, x10, _objc_debug_taggedpointer_ext_classes@PAGEOFF
    ubfx x11, x0, #52, #8
    ldr x16, [x10, x11, LSL #3]
    b LGetIsaDone
// SUPPORT_TAGGED_POINTERS
#endif

Assembly statement interpretation:

  1. ENTRY _objc_msgSend
    • The msgSend process starts here
    • Whenever you see ENTRY, it marks the beginning of a routine
  2. UNWIND _objc_msgSend, NoFrame
    • Indicates that there is no frame
  3. cmp p0, #0
    • Checks whether p0 is nil
    • cmp stands for compare
    • p0 is self, the message receiver, because the receiver is passed as the first argument
    • When the function returns, p0 holds the return value
  4. #if SUPPORT_TAGGED_POINTERS
    • Whether small object types (tagged pointers) are supported
  5. b.le LNilOrTagged
    • b means branch (jump)
    • b.le branches when the comparison is less than or equal, so LNilOrTagged is executed when p0 is nil or looks negative (a tagged pointer)
  6. b.eq LReturnZero
    • b.eq branches when the comparison is equal
    • If the value is 0, return directly
  7. ldr p13, [x0]
    • Saves isa to p13
    • p0, as you just saw, is the message receiver; [x0] reads the memory that this register points to, and the first member of an object is isa
    • ldr loads the value stored at the address in x0 into p13
  8. GetClassFromIsa_p16 p13
    1. Saves the Class information to p16
    2. It extracts the Class information stored in isa and saves it to the p16 register
    3. So p16 = isa (p13) & ISA_MASK
    4. It looks like a function call, but in assembly it is a macro
  9. CacheLookup NORMAL, _objc_msgSend
    1. Starts the cache lookup process

Code logic:

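First check whether the receiver is nil; if it is, return directly. If it is not nil, determine whether it is a small object type (tagged pointer); if it is, the class is obtained in a different way. Otherwise, get the class from isa and start the cache lookup.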
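The three-way decision above can be sketched in C. This is a simplified model, not runtime code: the `toy_` names are hypothetical, receivers are treated as raw 64-bit values, and the ISA_MASK value is the one I believe the arm64 build of this runtime version uses, so treat it as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed arm64 ISA_MASK for the runtime version discussed here. */
#define TOY_ISA_MASK 0x0000000ffffffff8ULL

enum toy_receiver_kind { TOY_NIL, TOY_TAGGED, TOY_NORMAL };

/* Models "cmp p0, #0" followed by b.le / b.eq:
   zero is nil, a set most significant bit looks negative (tagged pointer),
   and everything else is an ordinary object pointer. */
enum toy_receiver_kind toy_classify(uint64_t receiver) {
    if (receiver == 0) {
        return TOY_NIL;
    }
    if (receiver >> 63) {
        return TOY_TAGGED;
    }
    return TOY_NORMAL;
}

/* Models "ubfx x11, x0, #60, #4": the top four bits select the class slot. */
unsigned toy_tag_index(uint64_t receiver) {
    assert(toy_classify(receiver) == TOY_TAGGED);
    return (unsigned)((receiver >> 60) & 0xf);
}

/* Models GetClassFromIsa_p16: class = isa & ISA_MASK. */
uint64_t toy_class_from_isa(uint64_t isa) {
    return isa & TOY_ISA_MASK;
}
```

A single signed comparison against zero is enough in the assembly because both special cases, nil and tagged pointers, compare as less than or equal to zero. The sketch spells the two cases out separately for readability.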

3.2.3 Cache Search process

Source:

.macro CacheLookup
    //
    // Restart protocol:
    //
    //   As soon as we're past the LLookupStart$1 label we may have loaded
    //   an invalid cache pointer or mask.
    //
    //   When task_restartable_ranges_synchronize() is called,
    //   (or when a signal hits us) before we're past LLookupEnd$1,
    //   then our PC will be reset to LLookupRecover$1 which forcefully
    //   jumps to the cache-miss codepath which have the following
    //   requirements:
    //
    //   GETIMP:
    //     The cache-miss is just returning NULL (setting x0 to 0)
    //
    //   NORMAL and LOOKUP:
    //   - x0 contains the receiver
    //   - x1 contains the selector
    //   - x16 contains the isa
    //   - other registers are set as per calling conventions
    //
LLookupStart$1:

    // ldr loads a value into a register. x16 holds the class.
    // #define CACHE (2 * __SIZEOF_POINTER__), i.e. 16 bytes,
    // so offsetting the class by 16 bytes reaches cache_t.
    ldr p11, [x16, #CACHE]              // p11 = mask|buckets

    // 64-bit real devices take the HIGH_16 branch
#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16
    // "and" is a bitwise AND: keep the low 48 bits to get buckets
    and p10, p11, #0x0000ffffffffffff   // p10 = buckets
    // LSR #48 shifts the cache value right by 48 bits, leaving the top 16 bits, i.e. mask.
    // _cmd & mask is the hash, saved to p12.
    and p12, p1, p11, LSR #48           // x12 = _cmd & mask
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4
    and p10, p11, #~0xf                 // p10 = buckets
    and p11, p11, #0xf                  // p11 = maskShift
    mov p12, #0xffff
    lsr p11, p12, p11                   // p11 = mask = 0xffff >> p11
    and p12, p1, p11                    // x12 = _cmd & mask
#else
#error Unsupported cache mask storage for ARM64.
#endif

    // LSL #(1+PTRSHIFT) is a logical shift left by 4 bits, i.e. *16.
    // Each bucket contains a sel and an imp, so it occupies 16 bytes.
    // Offsetting the first bucket's address by index * 16 gives the address
    // of the bucket selected by the hash; it is stored in p12.
    add p12, p10, p12, LSL #(1+PTRSHIFT)
                    // p12 = buckets + ((_cmd & mask) << (1+PTRSHIFT))

    ldp p17, p9, [x12]                  // {imp, sel} = *bucket
1:  cmp p9, p1                          // if (bucket->sel != _cmd)
    b.ne 2f                             //     scan more
    CacheHit $0                         // call or return imp

    // Not hit: if this is the first bucket, go to 3f;
    // otherwise load the previous bucket and check again.
2:  // not hit: p12 = not-hit bucket
    CheckMiss $0                        // miss if bucket->sel == 0
    cmp p12, p10                        // wrap if bucket == buckets
    b.eq 3f
    ldp p17, p9, [x12, #-BUCKET_SIZE]!  // {imp, sel} = *--bucket
    b 1b                                // loop

3:  // wrap: p12 = first bucket, w11 = mask
#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16
    add p12, p12, p11, LSR #(48 - (1+PTRSHIFT))
                    // p12 = buckets + (mask << 1+PTRSHIFT)
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4
    add p12, p12, p11, LSL #(1+PTRSHIFT)
                    // p12 = buckets + (mask << 1+PTRSHIFT)
#else
#error Unsupported cache mask storage for ARM64.
#endif

    // Clone scanning loop to miss instead of hang when cache is corrupt.
    // The slow path may detect any corruption and halt later.

    ldp p17, p9, [x12]                  // {imp, sel} = *bucket
1:  cmp p9, p1                          // if (bucket->sel != _cmd)
    b.ne 2f                             //     scan more
    CacheHit $0                         // call or return imp

2:  // not hit: p12 = not-hit bucket
    CheckMiss $0                        // miss if bucket->sel == 0
    cmp p12, p10                        // wrap if bucket == buckets
    b.eq 3f
    ldp p17, p9, [x12, #-BUCKET_SIZE]!  // {imp, sel} = *--bucket
    b 1b                                // loop

LLookupEnd$1:
LLookupRecover$1:
3:  // double wrap
    JumpMiss $0

.endmacro

Assembly statement interpretation:

  1. .macro CacheLookup
    1. This is the definition of CacheLookup
    2. .macro defines an assembly macro, so you can recognize macro definitions by it in the future
  2. ldr p11, [x16, #CACHE]
    1. Offset the Class address by 16 bytes to get the cache
    2. The cache value is loaded into register p11: p11 = mask|buckets
    3. ldr loads a value into the p11 register
    4. x16 is the value stored in register p16, currently the Class
    5. "#value" denotes an immediate. A global search for CACHE shows that it is twice the pointer size, or 16 bytes: #define CACHE (2 * __SIZEOF_POINTER__)
  3. and p10, p11, #0x0000ffffffffffff
    1. Gets buckets and puts it in p10
    2. p10 = mask|buckets & 0x0000ffffffffffff
    3. buckets occupies the low 48 bits (of which 44 are actually used)
    4. and is a bitwise AND; the result is saved to register p10
  4. and p12, p1, p11, LSR #48
    1. p12 is the hash index obtained by the hash algorithm, x12 = _cmd & mask
    2. LSR is a logical shift right; p11, LSR #48 shifts mask|buckets 48 places to the right, which yields mask
    3. and p12, p1, p11, LSR #48 computes _cmd & mask and saves it to p12
  5. add p12, p10, p12, LSL #(1+PTRSHIFT)
    1. Gets the computed bucket address and stores it in p12
    2. LSL #(1+PTRSHIFT) is a logical shift left by 4 bits, i.e. *16
    3. p10 is the address of the first bucket. To get the bucket at a given index, the address has to be offset
    4. The offset is index * 16 (a bucket contains sel and imp, so it is 16 bytes)
    5. So the first address plus index * 16 gives the computed bucket, which is placed in p12
  6. ldp p17, p9, [x12]
    1. Loads imp and sel into p17 and p9, respectively
  7. cmp p9, p1
    1. Compares the sel in the bucket with the _cmd that was passed in
  8. cmp p12, p10
    1. Compares the current bucket with buckets to see whether it is the first bucket
  9. ldp p17, p9, [x12, #-BUCKET_SIZE]!
    1. Moves the bucket address back by one bucket size
    2. That is, gets the previous bucket and loads its imp and sel into p17 and p9
    3. BUCKET_SIZE is the size of one bucket
  10. add p12, p12, p11, LSR #(48 - (1+PTRSHIFT))
    1. If the current bucket is the first one, move to the last position
    2. Shifting the cache value 44 bits to the right is the same as shifting mask 4 bits to the left, i.e. mask*16
    3. mask = capacity - 1, so mask is the index of the last position; multiplied by 16, it is the address distance from the first position to the last
    4. The PTRSHIFT value is 3, so 1+PTRSHIFT is 4
    5. Why does mask|buckets leave four zero bits in the middle? To make this calculation easier
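The bit arithmetic in the statements above can be checked with a short C sketch. It models only the HIGH_16 layout described here (mask in the top 16 bits, bucket address in the low 44 bits, four zero bits in between) and uses hypothetical `toy_` names; it is not the runtime's code.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MASK_SHIFT   48   /* mask lives in the top 16 bits */
#define TOY_BUCKET_SHIFT 4    /* 1 + PTRSHIFT with PTRSHIFT == 3: 16-byte buckets */

/* Build the packed mask|buckets word. */
uint64_t toy_pack(uint64_t buckets, uint16_t mask) {
    assert((buckets >> 44) == 0);     /* only the low 44 bits carry the address */
    return ((uint64_t)mask << TOY_MASK_SHIFT) | buckets;
}

/* and p10, p11, #0x0000ffffffffffff */
uint64_t toy_buckets(uint64_t packed) {
    return packed & 0x0000ffffffffffffULL;
}

/* p11, LSR #48 */
uint64_t toy_mask(uint64_t packed) {
    return packed >> TOY_MASK_SHIFT;
}

/* add p12, p10, p12, LSL #(1+PTRSHIFT) after p12 = _cmd & mask */
uint64_t toy_bucket_addr(uint64_t packed, uint64_t cmd) {
    return toy_buckets(packed) + ((cmd & toy_mask(packed)) << TOY_BUCKET_SHIFT);
}

/* add p12, p12, p11, LSR #(48 - (1+PTRSHIFT)):
   packed >> 44 equals mask << 4 because bits 44..47 are zero. */
uint64_t toy_last_bucket_addr(uint64_t packed) {
    return toy_buckets(packed) + (packed >> (TOY_MASK_SHIFT - TOY_BUCKET_SHIFT));
}
```

For example, with buckets at 0x100008000 and mask 7, a selector whose low bits are 5 maps to 0x100008050, and the last bucket is at 0x100008070. The four zero bits are what let a single right shift by 44 produce `mask * 16` without a separate mask-and-multiply step.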

Code logic:

  1. Offset the Class address by 16 bytes to reach the cache and load it into p11
  2. Get buckets by ANDing the cache value with the buckets mask and store it in p10
  3. Shift mask|buckets 48 places to the right to get mask, AND it with _cmd to get the hash index, and store it in p12 (hash algorithm)
  4. Offset buckets by this index to get the address of the target bucket and store it in p12
  5. Load the imp and sel of the target bucket into p17 and p9, respectively
  6. Check whether bucket->sel != _cmd
    1. If they are equal, the method is found, so get the imp and call or return it
    2. If not, enter the first loop (hash collision algorithm)
  7. Check whether the bucket is the first element; if not, move back one position and compare again
  8. If it is the first element, jump to the last element and start the second loop, again moving back one position at a time and comparing
  9. If the second loop also reaches the first element without a match, the method does not exist in the cache

3.2.4 Starting to Obtain IMP

If one is found, the CacheHit is executed. Source:

.macro CacheHit
.if $0 == NORMAL
    TailCallCachedImp x17, x12, x1, x16     // authenticate and call imp
.elseif $0 == GETIMP
    mov p0, p17
    cbz p0, 9f                              // don't ptrauth a nil imp
    AuthAndResignAsIMP x0, x12, x1, x16     // authenticate imp and re-sign as IMP
9:  ret                                     // return IMP
.elseif $0 == LOOKUP
    // No nil check for ptrauth: the caller would crash anyway when they
    // jump to a nil IMP. We don't care if that jump also fails ptrauth.
    AuthAndResignAsIMP x17, x12, x1, x16    // authenticate imp and re-sign as IMP
    ret                                     // return imp via x17
.else
.abort oops
.endif
.endmacro

In the NORMAL case, the IMP is obtained and called by executing TailCallCachedImp x17, x12, x1, x16.

3.2.5 Start to search the method list

3.2.5.1 CheckMiss

Source:

.macro CheckMiss
    // miss if bucket->sel == 0
.if $0 == GETIMP
    cbz p9, LGetImpMiss
.elseif $0 == NORMAL
    cbz p9, __objc_msgSend_uncached
.elseif $0 == LOOKUP
    cbz p9, __objc_msgLookup_uncached
.else
.abort oops
.endif
.endmacro

Note: if the bucket's sel is empty, the cache has no entry for the method, so execution jumps to __objc_msgSend_uncached to start the slow lookup.

3.2.5.2 __objc_msgSend_uncached

Source:

STATIC_ENTRY __objc_msgSend_uncached
 UNWIND __objc_msgSend_uncached, FrameWithNoSaves

 // THIS IS NOT A CALLABLE C FUNCTION
 // Out-of-band p16 is the class to search
 
 MethodTableLookup
 TailCallFunctionPointer x17

 END_ENTRY __objc_msgSend_uncached

Note: after the jump to __objc_msgSend_uncached, MethodTableLookup is executed to search the method list.

3.2.5.3 MethodTableLookup

Source:

.macro MethodTableLookup

    // push frame
    SignLR
    stp fp, lr, [sp, #-16]!
    mov fp, sp

    // save parameter registers: x0..x8, q0..q7
    sub sp, sp, #(10*8 + 8*16)
    stp q0, q1, [sp, #(0*16)]
    stp q2, q3, [sp, #(2*16)]
    stp q4, q5, [sp, #(4*16)]
    stp q6, q7, [sp, #(6*16)]
    stp x0, x1, [sp, #(8*16+0*8)]
    stp x2, x3, [sp, #(8*16+2*8)]
    stp x4, x5, [sp, #(8*16+4*8)]
    stp x6, x7, [sp, #(8*16+6*8)]
    str x8, [sp, #(8*16+8*8)]

    // lookUpImpOrForward(obj, sel, cls, LOOKUP_INITIALIZE | LOOKUP_RESOLVER)
    // receiver and selector already in x0 and x1
    mov x2, x16
    mov x3, #3
    bl _lookUpImpOrForward

    // IMP in x0
    mov x17, x0

    // restore registers and return
    ldp q0, q1, [sp, #(0*16)]
    ldp q2, q3, [sp, #(2*16)]
    ldp q4, q5, [sp, #(4*16)]
    ldp q6, q7, [sp, #(6*16)]
    ldp x0, x1, [sp, #(8*16+0*8)]
    ldp x2, x3, [sp, #(8*16+2*8)]
    ldp x4, x5, [sp, #(8*16+4*8)]
    ldp x6, x7, [sp, #(8*16+6*8)]
    ldr x8, [sp, #(8*16+8*8)]

    mov sp, fp
    ldp fp, lr, [sp], #16
    AuthenticateLR

.endmacro

Description:

  • You can see that it eventually jumps to _lookUpImpOrForward
  • The comment shows the arguments that are passed in, (obj, sel, cls, LOOKUP_INITIALIZE | LOOKUP_RESOLVER), which means behavior is 0011

  • _lookUpImpOrForward is important: it is where the slow lookup described next starts.
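The value 0011 is a bit mask, which a few lines of C make explicit. The enum values below are the ones listed in the comment of the lookUpImpOrForward source; the two `toy_` helper functions are hypothetical names added for illustration only.

```c
#include <assert.h>

enum {
    LOOKUP_INITIALIZE = 1,  /* 0001 */
    LOOKUP_RESOLVER   = 2,  /* 0010 */
    LOOKUP_CACHE      = 4,  /* 0100 */
    LOOKUP_NIL        = 8   /* 1000 */
};

/* A flag is set when ANDing it with behavior gives a nonzero result. */
int toy_has_flag(int behavior, int flag) {
    return (behavior & flag) != 0;
}

/* XOR with a flag that is known to be set clears exactly that flag,
   which is how the resolver is limited to a single attempt. */
int toy_clear_resolver(int behavior) {
    assert(toy_has_flag(behavior, LOOKUP_RESOLVER));
    return behavior ^ LOOKUP_RESOLVER;
}
```

`LOOKUP_INITIALIZE | LOOKUP_RESOLVER` is 3, which matches the `mov x3, #3` in the assembly. After the resolver bit is cleared the value is 1 (0001), so a second pass through the same check no longer enters dynamic method resolution.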

3.3 Summary and Analysis

Code execution flow:

Schematic diagram of register storage:

3.4 Code logic difficulty analysis

3.4.1 Why are there two loops?

  • The first loop steps back from the current bucket toward the first bucket, and the second loop steps back from the last bucket.

  • This is determined by the hash collision algorithm of the cache. The algorithm searches the previous positions first; when it reaches the first bucket, it continues from the last bucket and keeps stepping back until it reaches the first bucket again

  • Hash collision algorithm:

static inline mask_t cache_next(mask_t i, mask_t mask) {
    // If i is nonzero, move to index i-1, i.e. the previous position.
    // If i is 0, wrap around to mask, i.e. the last position.
    return i ? i-1 : mask;
}

3.4.2 How is the current bucket set to the last element of buckets?

  • The last element of buckets is located directly by adding mask|buckets shifted 44 places to the right (equivalent to mask shifted 4 places to the left) to the first address of buckets
  • mask = the total number of buckets - 1, so offsetting by mask * 16 bytes jumps to the last bucket

  • After the shift, the low four bits are 0 because buckets only occupies the low 44 bits and the four bits in the middle are left empty for exactly this purpose
  • mask followed by those four zero bits is the address offset of the last bucket

3.4.3 How do I Find a Bucket

By hashing: the index is computed from _cmd and mask.

The hash algorithm

static inline mask_t cache_hash(SEL sel, mask_t mask) 
{
    return (mask_t)(uintptr_t)sel & mask;
}

3.4.4 Why loop after computing the hash index instead of deciding directly?

Because the hash collision algorithm is applied at insertion time: if the hashed position is already taken, the entry is stored in another position. The lookup therefore has to repeat the same step-back search to find it.
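The insertion side can be sketched in C to show why the lookup has to probe. This is a simplified model with hypothetical `toy_` names: it keeps the two helper functions in the same form as cache_hash and cache_next, but leaves out everything else the real insert does, such as growing the cache before it fills up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t toy_sel;     /* 0 means "empty slot" */
typedef uintptr_t toy_imp;

struct toy_bucket { toy_imp imp; toy_sel sel; };

/* Same form as cache_hash: index = sel & mask. */
uint32_t toy_cache_hash(toy_sel sel, uint32_t mask) {
    return (uint32_t)(sel & mask);
}

/* Same form as cache_next: previous index, wrapping from 0 to mask. */
uint32_t toy_cache_next(uint32_t i, uint32_t mask) {
    return i ? i - 1 : mask;
}

/* Stores (sel, imp) in the first free slot of the probe sequence.
   Returns the slot index that was used, or -1 if every slot is taken. */
int toy_cache_insert(struct toy_bucket *buckets, uint32_t mask,
                     toy_sel sel, toy_imp imp) {
    assert(buckets != NULL && sel != 0);
    uint32_t begin = toy_cache_hash(sel, mask);
    uint32_t i = begin;
    do {
        if (buckets[i].sel == 0) {
            buckets[i].sel = sel;
            buckets[i].imp = imp;
            return (int)i;
        }
        i = toy_cache_next(i, mask);
    } while (i != begin);
    return -1;
}
```

With four slots (mask 3), inserting four selectors that all hash to index 1 places them at indexes 1, 0, 3, and 2 in that order. A lookup that only checked index 1 would miss three of them, which is why the fast path walks the same sequence.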

3.5 A few minor questions about assembly

1. Why is objc_msgSend implemented in assembly?

  • Assembly implementations are particularly fast
  • The number and types of the arguments are not known in advance, whereas C and C++ need more definite signatures, so dynamic arguments are harder to implement there.

2. How do I find the assembly files?

  • The suffix for assembly files is .s
  • Since our daily development targets the arm64 architecture, look in the files whose names end in arm64.s

3. Symbols in assembly carry a leading _. To find the corresponding function in C files, delete the _; for example, _objc_msgSend in assembly is objc_msgSend in C

3.6 Summary

  1. When a message is sent, the cache of the receiver's class is searched first
  2. The sel that was passed in is compared with the sel stored in the cache. If a matching sel is found, its IMP is called or returned
  3. Finding the bucket involves the hash algorithm: _cmd & mask gives the index of the bucket to check first
  4. If that bucket does not match, the entry may have been moved by a hash collision, so the search steps back through the previous positions. On reaching the first position it continues from the last position until it reaches the first element again. If nothing is found, the slow lookup starts
  5. The simple logic is: self -> isa -> Class -> cache -> buckets -> index (_cmd & mask) -> bucket -> sel and imp

4. Slow lookup

Slow lookup process: when the fast lookup does not find the IMP, the search continues in the method list of the class and then in the caches and method lists of the parent classes. This process is the slow lookup.

It is implemented in C.

We start with lookUpImpOrForward, which is where the fast lookup ended.

The main content covers the overall process, the binary search, the parent class lookup, and dynamic method resolution. Dynamic method resolution is only touched on here and is covered in more detail in the next chapter.

The method list query in the slow lookup touches on how the method lists of classes and categories are built. That will be analyzed later, together with the loading process of classes and categories. For now, just note anything about method list construction that comes up here without digging further.
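Before reading the real function, the order of the search can be sketched in C. This is a heavily simplified model with hypothetical `toy_` names: locking, class realization and initialization, the resolver, and cache filling are all left out, selectors are compared with strcmp, and a small array stands in for each class's cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*toy_imp)(void);

struct toy_method { const char *sel; toy_imp imp; };

struct toy_class {
    const struct toy_class *superclass;
    const struct toy_method *methods;  size_t method_count;  /* own method list */
    const struct toy_method *cached;   size_t cached_count;  /* stand-in for the cache */
};

void toy_forward(void) {}      /* stand-in for _objc_msgForward_impcache */
void toy_run(void) {}          /* sample method bodies for experiments */
void toy_say_hello(void) {}

static toy_imp toy_find(const struct toy_method *list, size_t n, const char *sel) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i].sel, sel) == 0) {
            return list[i].imp;
        }
    }
    return NULL;
}

/* Order of the search: the class's own method list, then for each parent
   class its cache followed by its method list; forward when the chain ends. */
toy_imp toy_slow_lookup(const struct toy_class *cls, const char *sel) {
    assert(cls != NULL && sel != NULL);
    for (const struct toy_class *cur = cls;;) {
        toy_imp imp = toy_find(cur->methods, cur->method_count, sel);
        if (imp) {
            return imp;
        }
        cur = cur->superclass;
        if (cur == NULL) {
            return toy_forward;        /* no class in the chain has it */
        }
        imp = toy_find(cur->cached, cur->cached_count, sel);
        if (imp) {
            return imp;
        }
    }
}
```

The cache of the starting class is not consulted in the loop because the fast lookup has already checked it. Parent-class caches are checked because nothing has looked at them yet.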

4.1 Overall Process

4.1.1 Source code analysis

Source:

/*
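 1. If we come in from assembly, i.e. the imp was not found in the cache, behavior is 0011, LOOKUP_INITIALIZE | LOOKUP_RESOLVER
 2. If we come in through lookUpImpOrNil, behavior is 1100, behavior | LOOKUP_CACHE | LOOKUP_NIL
 3. If we come in from class_getInstanceMethod, i.e. only the method lists are being searched, behavior is 0010, LOOKUP_RESOLVER
 4. During dynamic resolution, the call comes through resolveMethod_locked: behavior is 0100, behavior | LOOKUP_CACHE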
 */

IMP lookUpImpOrForward(id inst, SEL sel, Class cls, int behavior)
{
    /*
      method lookup
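     enum {
         LOOKUP_INITIALIZE = 1, // 0001
         LOOKUP_RESOLVER = 2,   // 0010
         LOOKUP_CACHE = 4,      // 0100
         LOOKUP_NIL = 8,        // 1000
     };
     behavior is a combination of these values.
     ANDing behavior with one of these flags is nonzero only when that flag is set and 0 otherwise, which is how each flag is tested.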
     */
    
    
    // Message forwarding (the error-reporting method)
    const IMP forward_imp = (IMP)_objc_msgForward_impcache;
    IMP imp = nil;
    Class curClass;

    runtimeLock.assertUnlocked();

    // Optimistic cache lookup
    // Multithreading
    /*
     We get here from the dynamic method resolution path,
     which means this call looks in the cache first
     */
    if (fastpath(behavior & LOOKUP_CACHE)) {
        imp = cache_getImp(cls, sel);
        if (imp) goto done_nolock;
    }

    // runtimeLock is held during isRealized and isInitialized checking
    // to prevent races against concurrent realization.

    // runtimeLock is held during method search to make
    // method-lookup + cache-fill atomic with respect to method addition.
    // Otherwise, a category could be added but ignored indefinitely because
    // the cache was re-filled with the old value after the cache flush on
    // behalf of the category.

    runtimeLock.lock();

    // We don't want people to be able to craft a binary blob that looks like
    // a class but really isn't one and do a CFI attack.
    //
    // To make these harder we want to make sure this is a class that was
    // either built into the binary or legitimately registered through
    // objc_duplicateClass, objc_initializeClassPair or objc_allocateClassPair.
    //
    // TODO: this check is quite costly during process startup.
    // Is this a known class, i.e. has it already been loaded into memory
    checkIsKnownClass(cls);

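    // Realize the class (i.e. finish building the class data according to the class structure); the inheritance chains of both the class and the metaclass have to be realized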
    if (slowpath(!cls->isRealized())) {
        cls = realizeClassMaybeSwiftAndLeaveLocked(cls, runtimeLock);
        // runtimeLock may have been dropped but is now locked again
    }

    /*
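        When we enter this function because the cache lookup missed, behavior is 0011.
        behavior & LOOKUP_INITIALIZE means the class should be initialized here if needed,
        i.e. the initialize function is executed.
        This shows that initialize is only called when the method lists are being searched.

        So the conditions are: 1) the method was not found in the cache and we are searching the method lists; 2) the class has not been initialized yet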
     */
    if (slowpath((behavior & LOOKUP_INITIALIZE) && !cls->isInitialized())) {
        cls = initializeAndLeaveLocked(cls, inst, runtimeLock);
        // runtimeLock may have been dropped but is now locked again

        // If sel == initialize, class_initialize will send +initialize and 
        // then the messenger will send +initialize again after this 
        // procedure finishes. Of course, if this is not being called 
        // from the messenger then it won't happen. 2778172
    }

    runtimeLock.assertLocked();
    curClass = cls;

    // The code used to lookpu the class's cache again right after
    // we take the lock but for the vast majority of the cases
    // evidence shows this is a miss most of the time, hence a time loss.
    //
    // The only codepath calling into this without having performed some
    // kind of cache lookup is class_getInstanceMethod().

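    // unreasonableClassCount -- the upper bound on the number of classes to iterate over
    /*
     1. Search the methods of the class
     2. Stop when the parent class is nil
     3. The for loop walks the method lists of the parent classes
     */
    // This for loop walks up the parent class chain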
    for (unsigned attempts = unreasonableClassCount();;) {
        // curClass method list.
        Method meth = getMethodNoSuper_nolock(curClass, sel);
        if (meth) {
            // Found: return the imp and store it in the cache
            imp = meth->imp;
            goto done;
        }

        /*
         1. Assign superclass to curClass
         2. If the parent class is nil, i.e. we are past NSObject, whose parent is nil, fall back to default forwarding
         */
        if (slowpath((curClass = curClass->superclass) == nil)) {
            // No implementation found, and method resolver didn't help.
            // Use forwarding.
            imp = forward_imp;
            break;
        }

        // Halt if there is a cycle in the superclass chain.
        // If the loop reaches the upper bound, report memory corruption and stop
        if (slowpath(--attempts == 0)) {
            _objc_fatal("Memory corruption in class list.");
        }

        // Superclass cache.
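        // Get the imp from the parent class's cache; only an imp stored in that cache can be returned here
        imp = cache_getImp(curClass, sel);
        
        // If the parent class's cache holds the error function, stop searching
        // and leave the loop so that the original class's resolver and forwarding can run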
        if (slowpath(imp == forward_imp)) {
            // Found a forward:: entry in a superclass.
            // Stop searching, but don't cache yet; call method
            // resolver for this class first.
            /*
             If the error function is found in a parent class, stop searching, do not cache it, and start dynamic method resolution for the current class
             */
            break;
        }
        // If the parent class has the method, done stores it in the original class's cache
        if (fastpath(imp)) {
            // Found the method in a superclass. Cache it in this class.
            goto done;
        }
    }

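    // We get here when the loop above exits through break
    // No implementation found. Try method resolver once.
    /*
     When this function is re-entered through lookUpImpOrNil during dynamic
     method resolution, behavior is 1100, and LOOKUP_RESOLVER is 0010,
     so this branch is not taken.
     */
    // behavior acts as a flag, so this branch is entered only once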
    if (slowpath(behavior & LOOKUP_RESOLVER)) {
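        // XOR: differing bits give 1, equal bits give 0.
        // Reaching this point means behavior is xx1x; after the XOR it becomes xx0x
        behavior ^= LOOKUP_RESOLVER;
        // Dynamic method resolution
        // The return value here is never nil; if nothing is found, forward_imp is returned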
        return resolveMethod_locked(inst, sel, cls, behavior);
    }

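    // If the loop exited through break and the resolver branch above was skipped,
    // execution falls through to done with imp == forward_imp, so forward_imp does get cached here.
    // Otherwise done is reached with the imp that was found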
 done:
    log_and_fill_cache(cls, imp, sel, inst, curClass);
    runtimeLock.unlock();
 done_nolock:
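    // If behavior is 1xxx, ANDing it with 1000 is non-zero; if in addition imp is forward_imp, nil is returned
    // This happens when the function is called through lookUpImpOrNil, for example during dynamic method resolution, and the lookup ends with forward_imp
    // forward_imp ends up in the cache when a lookup without LOOKUP_RESOLVER falls through to done with imp == forward_imp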
    if (slowpath((behavior & LOOKUP_NIL) && imp == forward_imp)) {
        return nil;
    }
    // A lookup that re-enters after dynamic method resolution always passes through done_nolock, because the return sits below that label
    return imp;
}


Code analysis:

  1. Preparation
    1. If LOOKUP_CACHE is set, check whether the SEL and IMP are already present in the class's cache_t
    2. Check whether the current class is a known class, that is, a class that has already been loaded (analyzed in the class-loading article)
    3. Realize the class if needed. The classes on its inheritance chain and on its metaclass's inheritance chain are realized as well, which prepares for the later parent-class search and class-method search (also analyzed under class loading)
    4. Check whether the class is initialized. If not, initialize it, including its inheritance chain and its metaclass's inheritance chain
  2. Method query
    1. Get the Method corresponding to the SEL from the current class's method list
    2. If it is found, take its IMP and jump to done
    3. Otherwise move to the parent class. If the parent class is nil, imp is set to forward_imp and the loop exits; if the parent class exists, the search continues there
  3. Parent class query
    1. The parent class's cache is searched through the assembly routine cache_getImp
    2. If the cache returns forward_imp, the error function, stop searching and leave the loop
    3. If an IMP is found, jump to done, where it is stored in the current class's cache
  4. Dynamic method resolution (covered later, in section 5)

4.1.2 Overall process

The overall process:

  1. Initialize the class -> query the method list -> loop over the parent classes -> dynamic method resolution
  2. A method that is found is saved to the current class's cache
  3. If the parent class's cache returns the error function, or the parent class is nil, imp is set to the error function and dynamic method resolution starts

Quick lookup: the cache is checked again when lookUpImpOrForward is re-entered after dynamic method resolution, because the resolver or another thread may have inserted the method into the cache in the meantime.

Initialization: this covers class loading (the known-class check), class realization, and class initialization. Classes form a doubly linked structure: a parent class knows its subclasses, and a subclass knows its parent class.

Method list query:

  1. The method list is searched with a binary search
  2. The list is sorted when the class is loaded (by SEL address), which is what makes binary search possible
  3. This binary search is written neatly: the middle position is always computed from the starting position, so later iterations only need to move the starting position

Parent query: the for loop walks up the parent classes. For each parent, the cache is searched first (through the assembly routine cache_getImp), then the method list.

Schematic diagram:

Summary:

  1. First check class loading, class realization, and class initialization. Once all are complete, the methodList query can begin
  2. The methodList query uses binary search. Binary search requires sorted data, and the list was sorted (by SEL address) while the class was being realized
  3. The for loop walks the parent classes: first the parent's cache, then the parent's methodList. The cache is again searched by the assembly routine
  4. If nothing is found, dynamic method resolution and message forwarding follow

4.2 Binary search method

4.2.1 getMethodNoSuper_nolock

Source:

static method_t *
getMethodNoSuper_nolock(Class cls, SEL sel)
{
    runtimeLock.assertLocked();

    ASSERT(cls->isRealized());
    // fixme nil cls?
    // fixme nil sel?

    // methods() returns an array of method lists; each method list holds
    // several methods. See the later article on class loading for details.
    auto const methods = cls->data()->methods();
    for (auto mlists = methods.beginLists(),
              end = methods.endLists();
         mlists != end;
         ++mlists)
    {
        // <rdar://problem/46904873> getMethodNoSuper_nolock is the hottest
        // caller of search_method_list, inlining it turns
        // getMethodNoSuper_nolock into a frame-less function and eliminates
        // any store from this codepath.
        method_t *m = search_method_list_inline(*mlists, sel);
        if (m) return m;
    }

    return nil;
}

The method lists are obtained through cls->data()->methods().

  • In the underlying structure of the class, the methods function returns an array of method lists. The array holds all the method lists, and each method list holds the individual methods.
  • So the array has to be looped over to get each method list, and a binary search is then run on each list

4.2.2 findMethodInSortedMethodList

search_method_list_inline calls findMethodInSortedMethodList, and the binary search itself is implemented in findMethodInSortedMethodList.

Source:

ALWAYS_INLINE static method_t *
findMethodInSortedMethodList(SEL key, const method_list_t *list)
{
    ASSERT(list);

    const method_t * const first = &list->first;
    const method_t *base = first;
    const method_t *probe;
    // The SEL address is used as the search key
    uintptr_t keyValue = (uintptr_t)key;
    // Number of entries still to be searched
    uint32_t count;

    // count is halved on every iteration
    for (count = list->count; count != 0; count >>= 1) {
        // Middle position = starting position + half of the count
        probe = base + (count >> 1);

        uintptr_t probeValue = (uintptr_t)probe->name;

        if (keyValue == probeValue) {
            // `probe` is a match.
            // Rewind looking for the *first* occurrence of this value.
            // This is required for correct category overrides.
            // While this is not the first entry and the previous entry has the same name, step back
            while (probe > first && keyValue == (uintptr_t)probe[-1].name) {
                probe--;
            }
            return (method_t *)probe;
        }

        // If the key is on the right, move the starting position to probe + 1;
        // together with the shift, the count becomes (count - 1) / 2
        if (keyValue > probeValue) {
            base = probe + 1;
            count--;
        }
    }

    return nil;
}

Note: the algorithm is simple and the comments in the code cover most of it, so only a few points about the algorithm are highlighted here.

Points to note

  1. Only the middle position is compared, never the positions before or after it, and the middle position is computed from the starting position and the remaining count
  2. The count therefore changes on every iteration: it is halved, that is, shifted right by 1
  3. Calculation of the starting position
    1. If the key is to the right of the middle position, the starting position moves to probe + 1. Because the probe itself is excluded as well, the count is first decremented by 1 and then halved
    2. If the key is to the left of the middle position, the starting position does not change
  4. One detail is easy to overlook: when the count is even there is no single middle element, so where does the probe land?
    1. count >> 1 drops the lowest bit, so the result is rounded down
    2. With an even count the probe is therefore the second of the two middle elements (for example, index 2 when the count is 4)
  5. Category methods with the same name
    1. When several categories are loaded, the later a category is loaded, the earlier its methods sit in the list, so after a match the search walks backward to the first entry with the same name
    2. Categories are loaded after the class itself, so category methods sit in front of the class's own methods, and the methods of later-loaded categories sit in front of those of earlier ones
    3. The list is sorted by selector address, and the match test compares the name field, which holds the SEL. Many people assume the list is sorted by method-name string, but it is not
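The points above can be tried out in plain C. The following is only a minimal sketch that mirrors the loop shape of findMethodInSortedMethodList; demo_method and demo_find_first are hypothetical names used for illustration, and the name field is an integer standing in for a SEL address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for method_t: a sort key plus a payload. */
typedef struct {
    uintptr_t name;    /* plays the role of the SEL address */
    int       imp_id;  /* illustrative payload, not a real IMP */
} demo_method;

/* Returns the FIRST entry whose name equals key, or NULL if there is none.
 * The list must already be sorted by name in ascending order. */
const demo_method *demo_find_first(const demo_method *list, uint32_t n,
                                   uintptr_t key)
{
    const demo_method *first = list;
    const demo_method *base = list;
    const demo_method *probe;
    uint32_t count;

    for (count = n; count != 0; count >>= 1) {
        probe = base + (count >> 1);          /* middle of the remaining range */
        if (key == probe->name) {
            /* Rewind to the first entry with the same name. */
            while (probe > first && key == probe[-1].name) {
                probe--;
            }
            return probe;
        }
        if (key > probe->name) {
            base = probe + 1;                 /* continue in the right half */
            count--;                          /* the probe itself is excluded */
        }
    }
    return NULL;
}
```

With three entries sharing the same key, the function returns the earliest of them, which corresponds to the method of the category that was loaded last.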

4.3 Parent Class Search process

The parent-class flow consists of searching the parent's cache through assembly and then returning to C to search the parent's method list.

The cache search goes through cache_getImp. Search for cache_getImp in the assembly source and follow it down; the details are not repeated here.

Source:

for (unsigned attempts = unreasonableClassCount();;) {
    // curClass method list.
    Method meth = getMethodNoSuper_nolock(curClass, sel);
    if (meth) {
        // Found: take the imp; done stores it in the cache
        imp = meth->imp;
        goto done;
    }

    /*
     1. Assign the superclass to curClass
     2. If the parent class is nil (NSObject's parent is nil), fall back to forwarding
     */
    if (slowpath((curClass = curClass->superclass) == nil)) {
        // No implementation found, and method resolver didn't help.
        // Use forwarding.
        imp = forward_imp;
        break;
    }

    // Halt if there is a cycle in the superclass chain.
    if (slowpath(--attempts == 0)) {
        _objc_fatal("Memory corruption in class list.");
    }

    // Superclass cache.
    imp = cache_getImp(curClass, sel);

    if (slowpath(imp == forward_imp)) {
        // Found a forward:: entry in a superclass.
        // Stop searching, but don't cache yet; call method
        // resolver for this class first.
        break;
    }
    if (fastpath(imp)) {
        // Found the method in a superclass. Cache it in this class.
        goto done;
    }
}

Description:

  1. A simple loop
    1. Start by searching the current class's methodList
    2. Then search the parent class's cache
    3. Then search the parent class's methodList
  2. cache_getImp is implemented in assembly as _cache_getImp, which passes GETIMP as $0 to the cache lookup
  3. If the implementation is found in the parent class's cache, the code jumps to CacheHit and the imp is returned directly
  4. If the implementation is not found in the parent class's cache, the code jumps to CheckMiss or JumpMiss, which check $0, go to LGetImpMiss, and return nil

Process diagram:

Summary:

  1. If the method is not found in the current class's methodList, the parent class's cache is searched first
  2. If it is not in the parent's cache, the parent's method list is searched
  3. If it is still not found, the search keeps moving up until the parent is nil. A nil parent means the current class is the root class, NSObject, so the chain has been exhausted; the loop exits and dynamic method resolution starts
  4. A method found in a parent class is stored in the current class's cache
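The order "own method list, then parent cache, then parent method list" can be modelled in a few lines of C. This is a simplified sketch under stated assumptions, not the runtime's code: demo_class, demo_slow_lookup, and the integer IMP values are hypothetical, each class has a one-entry method list and a one-entry cache, and the resolver step is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: an IMP is just an int; -1 stands in for forward_imp. */
enum { DEMO_NO_IMP = 0, DEMO_FORWARD_IMP = -1 };

typedef struct demo_class {
    struct demo_class *superclass;
    const char *list_sel;    /* one-entry stand-in for the method list */
    int         list_imp;
    const char *cache_sel;   /* one-entry stand-in for the cache */
    int         cache_imp;
} demo_class;

static int demo_sel_equal(const char *a, const char *b)
{
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}

/* Walks the superclass chain in the same order as the loop above. */
int demo_slow_lookup(demo_class *cls, const char *sel)
{
    demo_class *cur = cls;
    int imp = DEMO_NO_IMP;

    for (;;) {
        /* 1. Method list of the class currently being visited. */
        if (demo_sel_equal(cur->list_sel, sel)) {
            imp = cur->list_imp;
            break;
        }
        /* 2. Move up; a NULL parent means the chain is exhausted. */
        cur = cur->superclass;
        if (cur == NULL) {
            return DEMO_FORWARD_IMP;   /* not cached in this sketch */
        }
        /* 3. Cache of the parent, before its method list. */
        if (demo_sel_equal(cur->cache_sel, sel)) {
            imp = cur->cache_imp;
            break;
        }
    }
    /* A hit is always stored in the cache of the class the lookup started from. */
    cls->cache_sel = sel;
    cls->cache_imp = imp;
    return imp;
}
```

Looking up a parent's method through a child fills the child's cache, so a grandchild asking for the same selector later finds it in the child's cache without reaching the root's method list.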

4.4 Summary

  1. The slow lookup searches the method list of the class, then the cache and method list of each parent class in turn, up to NSObject. If nothing is found even there, dynamic method resolution starts, after which the lookup runs again. If the method is still not found, forward_imp, the error function, is returned. If it is found, it is stored in the current class's cache.
  2. The method list is searched with binary search. After a match, the search continues backward in the list to find a method of the same name added by a category.

5. Dynamic method analysis

lookUpImpOrForward breaks out of the loop when neither the current class nor any of its parent classes has the method, and then starts dynamic method resolution, which is examined in this section.

5.1 Overall Description

As is well known, dynamic method resolution uses the resolveInstanceMethod or resolveClassMethod method to add an IMP to the original SEL at run time. Once the IMP has been added, it can be found in the cache or the method list, and the SEL message is sent correctly.

So our next task is to look at how this process is implemented at the bottom.

5.2 Understanding of Behavior

The behavior parameter is passed into lookUpImpOrForward and decides which parts of the code are executed and which are skipped. To follow the code flow, it is important to understand how behavior is used.

Definition:

/* method lookup */
enum {
    LOOKUP_INITIALIZE = 1,  // 0001
    LOOKUP_RESOLVER = 2,    // 0010
    LOOKUP_CACHE = 4,       // 0100
    LOOKUP_NIL = 8,         // 1000
};

There are four flags. behavior is tested against each of them with a bitwise AND: the result is non-zero only when the corresponding bit is set in behavior, and 0 otherwise, which tells whether that flag applies.

The four flags control four decisions: initialization, dynamic method resolution, cache lookup, and returning nil.

Searching globally for lookUpImpOrForward shows where the function is called and which of the four flags each call site passes.

  1. If it comes in from the assembly, that is, the imp was not found in the cache, behavior is 0011 (LOOKUP_INITIALIZE | LOOKUP_RESOLVER): initialization and dynamic method resolution are both allowed.
  2. If it comes in through lookUpImpOrNil, behavior is 1100 (behavior | LOOKUP_CACHE | LOOKUP_NIL): the cache is checked first and nil is returned instead of forward_imp; there is no dynamic method resolution.
  3. If it comes in through class_getInstanceMethod, behavior is 0010 (LOOKUP_RESOLVER). This function is called during the message-redirection stage of message forwarding.
  4. If it comes in from resolveMethod_locked after dynamic method resolution, behavior | LOOKUP_CACHE is passed (LOOKUP_CACHE is 0100). The resolver bit has already been cleared, so dynamic method resolution is not entered again.
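The bit arithmetic behind these four cases can be checked with a small C sketch. The enum values are the ones from the definition above; the demo_ helper functions are hypothetical and exist only to name each combination.

```c
/* Flag values as defined in the runtime source quoted above. */
enum {
    LOOKUP_INITIALIZE = 1,  /* 0001 */
    LOOKUP_RESOLVER   = 2,  /* 0010 */
    LOOKUP_CACHE      = 4,  /* 0100 */
    LOOKUP_NIL        = 8   /* 1000 */
};

/* behavior used when the assembly fast path misses: 0011. */
int demo_behavior_from_msgsend(void)
{
    return LOOKUP_INITIALIZE | LOOKUP_RESOLVER;
}

/* behavior built by lookUpImpOrNil: 1100 when the incoming behavior is 0. */
int demo_behavior_from_lookup_or_nil(int behavior)
{
    return behavior | LOOKUP_CACHE | LOOKUP_NIL;
}

/* Non-zero only if the resolver bit is set. */
int demo_wants_resolver(int behavior)
{
    return (behavior & LOOKUP_RESOLVER) != 0;
}

/* XOR clears the bit; this is only meaningful when the bit is currently set. */
int demo_clear_resolver(int behavior)
{
    return behavior ^ LOOKUP_RESOLVER;
}
```

Starting from 0011, clearing the resolver bit gives 0001, and adding LOOKUP_CACHE on re-entry gives 0101, which still has the resolver bit off.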

Specific use:

LOOKUP_CACHE

// Optimistic cache lookup
// With multiple threads, another thread may already have cached the method
if (fastpath(behavior & LOOKUP_CACHE)) {
    imp = cache_getImp(cls, sel);
    if (imp) goto done_nolock;
}

LOOKUP_INITIALIZE

/*
    When this function is entered after a cache miss, behavior is 0011.
    behavior & LOOKUP_INITIALIZE means the class should be initialized if it has not been yet,
    that is, +initialize is sent.
 */
if (slowpath((behavior & LOOKUP_INITIALIZE) && !cls->isInitialized())) {
    cls = initializeAndLeaveLocked(cls, inst, runtimeLock);
    // runtimeLock may have been dropped but is now locked again

    // If sel == initialize, class_initialize will send +initialize and
    // then the messenger will send +initialize again after this
    // procedure finishes. Of course, if this is not being called
    // from the messenger then it won't happen. 2778172
}

LOOKUP_RESOLVER

/*
 When this function is re-entered through lookUpImpOrNil during dynamic
 method resolution, behavior is 1100, and LOOKUP_RESOLVER is 0010,
 so this branch is not taken.
 */
if (slowpath(behavior & LOOKUP_RESOLVER)) {
    // Reaching this point means behavior is xx1x; after the XOR it becomes xx0x
    behavior ^= LOOKUP_RESOLVER;
    // The return value is never nil; if nothing is found, forward_imp is returned
    return resolveMethod_locked(inst, sel, cls, behavior);
}

LOOKUP_NIL

// If behavior is 1xxx, ANDing it with 1000 is non-zero;
// if in addition imp is forward_imp, nil is returned
if (slowpath((behavior & LOOKUP_NIL) && imp == forward_imp)) {
    return nil;
}

5.3 lookUpImpOrForward Starts dynamic method resolution

Source:

// No implementation found. Try method resolver once.
/*
 When this function is re-entered through lookUpImpOrNil during dynamic
 method resolution, behavior is 1100, and LOOKUP_RESOLVER is 0010,
 so this branch is not taken.
 */
if (slowpath(behavior & LOOKUP_RESOLVER)) {
    // Reaching this point means behavior is xx1x; after the XOR it becomes xx0x
    behavior ^= LOOKUP_RESOLVER;
    // The return value is never nil; if nothing is found, forward_imp is returned
    return resolveMethod_locked(inst, sel, cls, behavior);
}

Description:

  1. Dynamic method resolution is entered when the loop in the code above exits through break. As analyzed earlier, the loop breaks when the method is not found, so dynamic method resolution begins exactly when no method was found
  2. Dynamic method resolution runs at most once per slow lookup
  3. The check on behavior prevents a loop: without it, the re-query of the IMP after resolution would start dynamic method resolution again
  4. resolveMethod_locked calls lookUpImpOrForward again and passes behavior along as an argument, so that second call does not enter dynamic method resolution
  5. The actual resolution is performed by the resolveMethod_locked function
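The "only once" behavior can be reproduced with two mutually recursive C functions. This is a deliberately stripped-down sketch: demo_lookup and demo_resolve are hypothetical stand-ins for lookUpImpOrForward and resolveMethod_locked, the method is assumed never to be found, and a counter records how often the resolver runs.

```c
enum {
    LOOKUP_INITIALIZE = 1,
    LOOKUP_RESOLVER   = 2,
    LOOKUP_CACHE      = 4,
    LOOKUP_NIL        = 8
};
enum { DEMO_FORWARD_IMP = -1 };   /* hypothetical stand-in for forward_imp */

int demo_resolver_calls = 0;      /* how many times the resolver was run */

int demo_lookup(int behavior);

/* Stands in for resolveMethod_locked: run the resolver, then look up again. */
int demo_resolve(int behavior)
{
    demo_resolver_calls++;
    return demo_lookup(behavior | LOOKUP_CACHE);
}

/* Stands in for lookUpImpOrForward when the method is never found. */
int demo_lookup(int behavior)
{
    if (behavior & LOOKUP_RESOLVER) {
        behavior ^= LOOKUP_RESOLVER;   /* clear the bit before re-entering */
        return demo_resolve(behavior);
    }
    return DEMO_FORWARD_IMP;
}
```

Because the bit is cleared before demo_resolve is called, the nested demo_lookup skips the resolver branch and the recursion stops after one round.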

5.4 Dynamic method parsing starts in resolveMethod_locked

Source:

static NEVER_INLINE IMP
resolveMethod_locked(id inst, SEL sel, Class cls, int behavior)
{
    runtimeLock.assertLocked();
    ASSERT(cls->isRealized());

    runtimeLock.unlock();

    // Not a metaclass: resolve as an instance method
    if (! cls->isMetaClass()) {
        // try [cls resolveInstanceMethod:sel]
        resolveInstanceMethod(inst, sel, cls);
    }
    // A metaclass: resolve as a class method
    else {
        // try [nonMetaClass resolveClassMethod:sel]
        // and [cls resolveInstanceMethod:sel]
        resolveClassMethod(inst, sel, cls);
        // If the imp still cannot be found, try resolveInstanceMethod as well
        if (!lookUpImpOrNil(inst, sel, cls)) {
            resolveInstanceMethod(inst, sel, cls);
        }
    }

    // chances are that calling the resolver have populated the cache
    // so attempt using it
    // behavior is xx0x here, so this query does not enter dynamic method resolution again
    return lookUpImpOrForward(inst, sel, cls, behavior | LOOKUP_CACHE);
}

Description:

  1. If the class passed in is not a metaclass, the instance method is resolved by calling resolveInstanceMethod()
  2. If the class passed in is a metaclass, the class method is resolved by calling resolveClassMethod()
  3. After resolution, lookUpImpOrForward is called again to query the method. behavior is xx0x at this point, so this query does not enter dynamic method resolution
  4. LOOKUP_CACHE is added because the resolver, or another thread, may already have put the method into the cache

5.5 resolveInstanceMethod Dynamically resolves instance methods

Source:

/***********************************************************************
* resolveInstanceMethod
* Call +resolveInstanceMethod, looking for a method to be added to class cls.
* cls may be a metaclass or a non-meta class.
* Does not check if the method already exists.
**********************************************************************/
static void resolveInstanceMethod(id inst, SEL sel, Class cls)
{
    runtimeLock.assertUnlocked();
    ASSERT(cls->isRealized());
    SEL resolve_sel = @selector(resolveInstanceMethod:);

    // resolveInstanceMethod: is a class method, so it is looked up in
    // cls->ISA(), the metaclass (or the root metaclass when cls is a metaclass)
    if (!lookUpImpOrNil(cls, resolve_sel, cls->ISA())) {
        // Resolver not implemented.
        return;
    }

    // Send resolveInstanceMethod: through objc_msgSend.
    // This also works when cls is a metaclass.
    BOOL (*msg)(Class, SEL, SEL) = (typeof(msg))objc_msgSend;
    bool resolved = msg(cls, resolve_sel, sel);

    // Cache the result (good or bad) so the resolver doesn't fire next time.
    // +resolveInstanceMethod adds to self a.k.a. cls
    // The imp is queried again here whatever the value of resolved
    IMP imp = lookUpImpOrNil(inst, sel, cls);

    if (resolved  &&  PrintResolving) {
        if (imp) {
            _objc_inform("RESOLVE: method %c[%s %s] "
                         "dynamically resolved to %p",
                         cls->isMetaClass() ? '+' : '-',
                         cls->nameForLogging(), sel_getName(sel), imp);
        }
        else {
            // Method resolver didn't add anything?
            _objc_inform("RESOLVE: +[%s resolveInstanceMethod:%s] returned YES"
                         ", but no new implementation of %c[%s %s] was found",
                         cls->nameForLogging(), sel_getName(sel),
                         cls->isMetaClass() ? '+' : '-',
                         cls->nameForLogging(), sel_getName(sel));
        }
    }
}

Steps: 1. Check whether the metaclass (up to the root metaclass) implements resolveInstanceMethod:. 2. Send resolveInstanceMethod: to the class. 3. Query the IMP again with lookUpImpOrNil.

Description:

  1. resolveInstanceMethod accepts either a class or a metaclass, which means it can also resolve class methods dynamically.
  2. The resolveInstanceMethod: method of the class is executed first.
  3. After it has run, the imp is queried again with lookUpImpOrNil, without checking whether resolveInstanceMethod: returned YES.
  4. The return value of resolveInstanceMethod: is only used for logging; it plays no part in deciding whether an IMP was added for the sel.
  5. lookUpImpOrNil calls lookUpImpOrForward with behavior 1100, which lacks LOOKUP_RESOLVER, so dynamic method resolution is not performed again.
  6. So all we need to do is add an imp for the sel inside resolveInstanceMethod: and the query will succeed.

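That the resolver's return value does not influence the re-query can be shown with a small C model. Everything here is hypothetical and only mimics the control flow: demo_class is a tiny method table, demo_resolver plays the role of +resolveInstanceMethod: and deliberately returns 0 (NO) even though it adds an implementation.

```c
#include <string.h>

enum { DEMO_NO_IMP = 0 };

typedef struct { const char *sel; int imp; } demo_entry;
typedef struct { demo_entry methods[8]; int count; } demo_class;

/* Linear search is enough for this sketch. */
int demo_find(const demo_class *cls, const char *sel)
{
    for (int i = 0; i < cls->count; i++) {
        if (strcmp(cls->methods[i].sel, sel) == 0) {
            return cls->methods[i].imp;
        }
    }
    return DEMO_NO_IMP;
}

/* Hypothetical resolver: adds an imp for one selector, but returns NO. */
int demo_resolver(demo_class *cls, const char *sel)
{
    if (strcmp(sel, "mehtodDynamically") == 0 && cls->count < 8) {
        cls->methods[cls->count].sel = sel;
        cls->methods[cls->count].imp = 42;
        cls->count++;
    }
    return 0;
}

/* Mirrors the flow above: run the resolver, ignore its result, query again. */
int demo_resolve_instance_method(demo_class *cls, const char *sel)
{
    int resolved = demo_resolver(cls, sel);
    (void)resolved;              /* used only for logging in the real function */
    return demo_find(cls, sel);
}
```

The second query succeeds for mehtodDynamically although the resolver reported NO, and fails for any selector the resolver did not handle.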
lookUpImpOrNil

static inline IMP
lookUpImpOrNil(id obj, SEL sel, Class cls, int behavior = 0)
{
    // behavior | LOOKUP_CACHE | LOOKUP_NIL is 1100 when behavior is 0
    return lookUpImpOrForward(obj, sel, cls, behavior | LOOKUP_CACHE | LOOKUP_NIL);
}

5.6 resolveClassMethod Dynamically resolves class methods

Source:

/***********************************************************************
* resolveClassMethod
* Call +resolveClassMethod, looking for a method to be added to class cls.
* cls should be a metaclass.
* Does not check if the method already exists.
**********************************************************************/
static void resolveClassMethod(id inst, SEL sel, Class cls)
{
    runtimeLock.assertUnlocked();
    ASSERT(cls->isRealized());
    ASSERT(cls->isMetaClass());

    // Check whether resolveClassMethod: is implemented
    if (!lookUpImpOrNil(inst, @selector(resolveClassMethod:), cls)) {
        // Resolver not implemented.
        return;
    }

    Class nonmeta;
    {
        mutex_locker_t lock(runtimeLock);
        nonmeta = getMaybeUnrealizedNonMetaClass(cls, inst);
        // +initialize path should have realized nonmeta already
        if (!nonmeta->isRealized()) {
            _objc_fatal("nonmeta class %s (%p) unexpectedly not realized",
                        nonmeta->nameForLogging(), nonmeta);
        }
    }
    BOOL (*msg)(Class, SEL, SEL) = (typeof(msg))objc_msgSend;
    // The message is sent to the non-meta class, not to the metaclass
    bool resolved = msg(nonmeta, @selector(resolveClassMethod:), sel);

    // Cache the result (good or bad) so the resolver doesn't fire next time.
    // +resolveClassMethod adds to self->ISA() a.k.a. cls
    IMP imp = lookUpImpOrNil(inst, sel, cls);

    if (resolved  &&  PrintResolving) {
        if (imp) {
            _objc_inform("RESOLVE: method %c[%s %s] "
                         "dynamically resolved to %p",
                         cls->isMetaClass() ? '+' : '-',
                         cls->nameForLogging(), sel_getName(sel), imp);
        }
        else {
            // Method resolver didn't add anything?
            _objc_inform("RESOLVE: +[%s resolveClassMethod:%s] returned YES"
                         ", but no new implementation of %c[%s %s] was found",
                         cls->nameForLogging(), sel_getName(sel),
                         cls->isMetaClass() ? '+' : '-',
                         cls->nameForLogging(), sel_getName(sel));
        }
    }
}

Steps: 1. Check whether the metaclass implements resolveClassMethod:. 2. Obtain the original (non-meta) class of the metaclass. 3. Send resolveClassMethod: to that class.

Description:

  1. The cls passed in here must be a metaclass, not a class
  2. This function performs dynamic method resolution for class methods
  3. Internally, the resolveClassMethod: method is executed
  4. nonmeta = getMaybeUnrealizedNonMetaClass(cls, inst); obtains the original class of the metaclass.
  5. As the code shows, when a class method is called, the receiver passed to objc_msgSend must be a class, not a metaclass. The method is stored in the metaclass, but the message receiver is still the class.
  6. After resolveClassMethod: has run, the IMP is queried again with lookUpImpOrNil

5.7 Answers to some questions

When cls is a metaclass, resolveClassMethod is executed first, and if lookUpImpOrNil then fails, resolveInstanceMethod is executed as well. Why is that?

A class method is stored in the metaclass as an instance method, so it can be resolved in two ways: through resolveClassMethod, or through resolveInstanceMethod with the metaclass passed as cls. resolveClassMethod is tried first; if the IMP still cannot be found, the second way is tried and the query is repeated.

This is not caused by NSObject sitting at the top of the metaclass inheritance chain, because resolveClassMethod is also sent directly to the original class, and when cls is not a metaclass the resolver is sent directly to cls itself.

Next, look at the condition under which lookUpImpOrNil returns nil.

Look at lookUpImpOrNil

static inline IMP
lookUpImpOrNil(id obj, SEL sel, Class cls, int behavior = 0)
{
    // behavior | LOOKUP_CACHE | LOOKUP_NIL is 1100 when behavior is 0
    return lookUpImpOrForward(obj, sel, cls, behavior | LOOKUP_CACHE | LOOKUP_NIL);
}

Note: the behavior passed in here contains LOOKUP_NIL. From the analysis of behavior we know that LOOKUP_NIL is only set when the call comes through lookUpImpOrNil.

Look at lookUpImpOrForward again

Cache lookup

// Optimistic cache lookup
// With multiple threads, another thread may already have cached the method
if (fastpath(behavior & LOOKUP_CACHE)) {
    imp = cache_getImp(cls, sel);
    if (imp) goto done_nolock;
}

Explanation: this is the only place that jumps directly to done_nolock (the done path falls through to it as well). When an imp is found in the cache, execution continues at done_nolock.

done_nolock

 done_nolock:
    // If behavior is 1xxx, ANDing it with 1000 is non-zero;
    // if in addition imp is forward_imp, nil is returned
    if (slowpath((behavior & LOOKUP_NIL) && imp == forward_imp)) {
        return nil;
    }

Description:

  • In done_nolock, if the imp is forward_imp, nil is returned
  • behavior is 1100, so (behavior & LOOKUP_NIL) is always non-zero

Summary: if the imp in the cache is forward_imp after resolveClassMethod has run, lookUpImpOrNil returns nil and resolveInstanceMethod is executed.
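The final check at done_nolock boils down to one condition, which the following C sketch isolates. demo_finish_lookup and the integer IMP values are hypothetical; only the flag values come from the source.

```c
enum {
    LOOKUP_INITIALIZE = 1,
    LOOKUP_RESOLVER   = 2,
    LOOKUP_CACHE      = 4,
    LOOKUP_NIL        = 8
};

/* Hypothetical IMP values: 0 plays nil, -1 plays forward_imp. */
enum { DEMO_NIL_IMP = 0, DEMO_FORWARD_IMP = -1 };

/* Mirrors the tail of the lookup: forward_imp is turned into nil
 * only when the caller asked for that with LOOKUP_NIL. */
int demo_finish_lookup(int imp, int behavior)
{
    if ((behavior & LOOKUP_NIL) && imp == DEMO_FORWARD_IMP) {
        return DEMO_NIL_IMP;
    }
    return imp;
}
```

A caller that goes through the lookUpImpOrNil path (1100) sees nil for a failed lookup, while the objc_msgSend path (0011) still receives forward_imp; a real imp passes through unchanged in both cases.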

5.8 Open questions (comments welcome)

1. How can the cls passed in be a metaclass when we call a method? Who would pass a metaclass directly? Looking at the call sites of lookUpImpOrForward does not show where a metaclass is passed, which is puzzling.

2. lookUpImpOrNil has already been executed once inside resolveInstanceMethod or resolveClassMethod. Why is lookUpImpOrForward called again afterwards?

Guess:

  • lookUpImpOrNil does query the method again after it has been dynamically resolved
  • However, the outer lookUpImpOrForward call has not returned a value yet, so the lookup has to be done once more here to produce it
  • LOOKUP_CACHE is passed so that the cache is checked first, because the earlier query during dynamic method resolution may already have cached the method

5.9 Simple Verification

In resolveInstanceMethod:, check for the mehtodDynamically SEL and add the resolveInstanceMethodTest function as its implementation.

Code:

// The parameters must be written this way (self and _cmd), to match what the runtime passes.
void resolveInstanceMethodTest(id self, SEL _cmd)
{
    NSLog(@"hi, I am the implementation of mehtodDynamically, added by dynamic method resolution");
}

/* Add an IMP for the sel */
+ (BOOL)resolveInstanceMethod:(SEL)sel
{
    if (sel == @selector(mehtodDynamically)) {
        // Arguments: class, SEL, IMP, type encoding of the return value and parameters
        class_addMethod([self class], sel, (IMP)resolveInstanceMethodTest, "v@:");
        return YES;
    }
    if (sel == @selector(eat)) {
        return NO;
    }
    return [super resolveInstanceMethod:sel];
}

// Calling code
WYStudent *student = [WYStudent alloc];
[student mehtodDynamically];

Results:

hi, I am the implementation of mehtodDynamically, added by dynamic method resolution

5.10 Summary

  1. The slow lookup searches the method list of the current class. If the method is not there, the cache and method list of each parent class are queried
  2. The search continues up to NSObject. If NSObject does not have the method either, dynamic method resolution begins, and afterwards the lookup runs again. If the method is still not found, forward_imp, the error function, is returned
  3. Finding the imp for a SEL in a method list uses binary search
  4. Dynamic method resolution is implemented by resolveInstanceMethod and resolveClassMethod

6. Message forwarding

Above, we started from objc_msgSend and went through the cache lookup, the method-list lookup, and dynamic method resolution. After dynamic method resolution, lookUpImpOrForward runs again; if the method is still not found, the error function is assigned to imp. Searching the source further turns up no code related to message forwarding, so the method-call process, which is the objc_msgSend process, ends here.

It is commonly said that message forwarding happens after dynamic method resolution, so how does that come about?

The official documentation states that when an object cannot respond to a message because it has no method implementation, the runtime system notifies the object with a forwardInvocation: message.

So we can use this notification to forward the message to another object.

Note:

  • In this sense, message forwarding is not part of message sending. The actual message-sending process is the objc_msgSend process analyzed above, which ends with dynamic method resolution.
  • Message forwarding is possible because the system notifies the object when a message cannot be delivered, and we forward the message inside that notification.
  • Message sending means sending a message to this object; message forwarding falls outside that scope.

6.1 Analysis of Message Forwarding

The official documentation tells us that if no IMP is found after dynamic method resolution, the message-forwarding methods are notified, but that part of the code is not in the open source. So how can the execution flow be analyzed?

The first idea that comes to mind is decompilation. Apple does not provide the source for this part, but the compiled implementation must exist, so the flow can be seen by decompiling it.

Another way is to look at which methods are executed before the crash. The methods that run after dynamic method resolution and before the error method are the message-forwarding methods.

6.1.1 Analysis by Hopper decompilation

Decompilation is hard to cover properly, and going into detail would make this post too long, so only a brief introduction is given here; a more detailed post on decompilation will follow.

Normally, reading source means seeing how upper-level code compiles down to lower-level code, that is, to assembly. When the source cannot be viewed, the only option is to disassemble: go from the low-level code back to higher-level code.

Hopper and IDA are tools for statically analyzing executable files: they disassemble an executable into pseudocode, control-flow graphs, and more. Here we use Hopper.

Step 1: Get the image file

  • The LLDB command image list prints the paths of all loaded image files

  • The image file is obtained from that path

Step 2: Use Hopper Disassembler to open the image file

  • Search globally for the method you want to find
  • Then follow the calls starting from that method
6.2 Printing the message-send log with instrumentObjcMessageSends

instrumentObjcMessageSends can print method-call information, so it can be used to view the method-invocation process and check whether it includes message forwarding.

6.2.1 Turn on objcMsgLogEnabled

Following lookUpImpOrForward -> log_and_fill_cache -> logMessageSend, the implementation of instrumentObjcMessageSends can be found just below logMessageSend in the source. So calling instrumentObjcMessageSends in main prints the method-call log.

Code:

#import <Foundation/Foundation.h>
// The header that declares LGPerson must also be imported here.

// Implemented inside the objc runtime; declared here so that the call compiles
extern void instrumentObjcMessageSends(BOOL flag);

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        LGPerson *person = [LGPerson alloc];
        instrumentObjcMessageSends(YES);   // start logging
        [person sayHello];
        instrumentObjcMessageSends(NO);    // stop logging
        NSLog(@"Hello, World!");
    }
    return 0;
}

  • The extern declaration is required, otherwise the compiler reports an error. It tells the compiler that the function is defined in another file
  • Pass YES to turn logging on
  • Pass NO once the call of interest has finished, so that logging does not affect the rest of the program

6.2.2 Run the code and go to the /tmp directory

  • logMessageSend writes the log to a file in /tmp whose name starts with msgSends
  • After a run, you can find the log file in this directory

6.2.3 Viewing the log file

  • Two dynamic method resolution entries: resolveInstanceMethod:
  • Two fast message forwarding entries: forwardingTargetForSelector:
  • Two slow message forwarding entries: methodSignatureForSelector: + forwardInvocation:
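As a small convenience, the log can also be located from code. This is a minimal sketch, assuming the file name pattern /tmp/msgSends-<pid> that logMessageSend uses in the objc4 source; the helper names WYMessageLogPath and WYReadMessageLog are hypothetical and not part of any framework.

```objc
#import <Foundation/Foundation.h>
#include <unistd.h>

// Hypothetical helper: builds the log path for the current process.
// Assumes the "/tmp/msgSends-<pid>" naming used by logMessageSend in objc4.
static NSString *WYMessageLogPath(void) {
    return [NSString stringWithFormat:@"/tmp/msgSends-%d", (int)getpid()];
}

// Hypothetical helper: returns the log contents, or nil if no log was written
// (for example because instrumentObjcMessageSends(YES) was never called).
static NSString *WYReadMessageLog(void) {
    return [NSString stringWithContentsOfFile:WYMessageLogPath()
                                     encoding:NSUTF8StringEncoding
                                        error:NULL];
}
```

Calling NSLog(@"%@", WYReadMessageLog()) after instrumentObjcMessageSends(NO) would print the recorded calls without leaving Xcode.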

6.3 Message Receiver Redirection

WYCat declares and implements an eat method. WYPerson declares eat but does not implement it, so we call eat on a WYPerson object to see whether the message receiver can be redirected to WYCat.

WYCat source code:

@interface WYCat : NSObject
@property (nonatomic, assign, readonly) int age;
@property (nonatomic, copy, readwrite) NSString *name;
- (void)eat;
@end

@implementation WYCat
- (void)eat {
    NSLog(@"hello, I'm cat, but I'm called by Person");
}
@end

WYPerson source code:

@interface WYPerson : NSObject
- (void)runtimeTest;
- (void)mehtodDynamically;
// Declared, but no implementation is provided
- (void)eat;
- (void)getCatProperty;
- (void)msgSendSuperTest;
@end

@implementation WYPerson

// Return the object that should receive the message instead
- (id)forwardingTargetForSelector:(SEL)aSelector {
    if (aSelector == @selector(eat)) {
        return [[WYCat alloc] init];
    }
    return [super forwardingTargetForSelector:aSelector];
    // return nil;
}

@end

Call in main:

// Message receiver redirection
WYPerson *person = [WYPerson alloc];
[person eat];

Running results:

2021-10-17 15:12:28.041183+0800 Message send [85460:1551709] hello, I'm cat, but I'm called by Person

Conclusion:

WYPerson has no implementation of eat, yet the call succeeds. In the message receiver redirection method we check whether the selector is eat and, if so, return a WYCat object as the new receiver, which then executes eat.

6.4 Message Redirection

Here the eat method is still not implemented in WYPerson, and the message receiver redirection method returns nil. In message redirection we can then change the selector or the receiver of the message.

WYPerson function implementation:

/* Return a method signature object that describes the return type and parameter types of the method */
- (NSMethodSignature *)methodSignatureForSelector:(SEL)aSelector {
    if ([NSStringFromSelector(aSelector) isEqualToString:@"eat"]) {
        return [NSMethodSignature signatureWithObjCTypes:"v@:"];
    }
    return [super methodSignatureForSelector:aSelector];
}

/* forwardInvocation: notifies the current object and passes in the message as an NSInvocation */
- (void)forwardInvocation:(NSInvocation *)anInvocation {
    // 1. Candidate receiver
    WYCat *cat = [[WYCat alloc] init];
    // anInvocation is the message; [anInvocation selector] is its method selector
    if ([cat respondsToSelector:[anInvocation selector]]) {
        // Option 1: redirect the receiver and send the message in one step
        // [anInvocation invokeWithTarget:cat];

        // Option 2: reset the message receiver
        // anInvocation.target = cat;

        // Option 3: reset the selector
        anInvocation.selector = @selector(forwardInvocationTest);
        // Send the message
        [anInvocation invoke];
    } else {
        [super forwardInvocation:anInvocation];
    }
}

- (void)forwardInvocationTest {
    NSLog(@"Hello, I am the message redirection implementation; I run when eat cannot be found");
}

Running results:

Hello, I am the message redirection implementation; I run when eat cannot be found

Note:

  • In message redirection we modify the message itself. NSInvocation is the message, so we look at its class interface to see what can be changed.
  • Two properties are writable: target and selector, which represent the message receiver and the method selector respectively.
  • The methodSignature property is read-only, so we cannot change it here. It must be supplied in the methodSignatureForSelector: method.
@interface NSInvocation : NSObject

+ (NSInvocation *)invocationWithMethodSignature:(NSMethodSignature *)sig;

@property (readonly, retain) NSMethodSignature *methodSignature;

- (void)retainArguments;
@property (readonly) BOOL argumentsRetained;

@property (nullable, assign) id target;
@property SEL selector;

- (void)getReturnValue:(void *)retLoc;
- (void)setReturnValue:(void *)retLoc;

- (void)getArgument:(void *)argumentLocation atIndex:(NSInteger)idx;
- (void)setArgument:(void *)argumentLocation atIndex:(NSInteger)idx;

- (void)invoke;
- (void)invokeWithTarget:(id)target;

@end
  • Note that the method signature must match the real signature of the method that ends up being called; otherwise the method still cannot be found.

The following code confirms this. With an added parameter, the selector becomes forwardInvocationTest: and no longer matches the selector and the "v@:" signature set above, so the call still fails with a not-found error.

- (void)forwardInvocationTest:(NSString *)ABC {
    NSLog(@"Hello, I am the message redirection implementation; I run when eat cannot be found");
}

  • methodSignatureForSelector: must be implemented, because the method signature has to be supplied before forwardInvocation: can be called
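To make the type string "v@:" less opaque, the sketch below asks NSMethodSignature to break an encoding into its parts. The helper name WYDescribeSignature is hypothetical; it only uses signatureWithObjCTypes:, methodReturnType, numberOfArguments and getArgumentTypeAtIndex:.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: describes an Objective-C type-encoding string.
// In "v@:", v is the void return type, @ is self (an object) and : is _cmd (a SEL).
static NSString *WYDescribeSignature(const char *types) {
    NSMethodSignature *sig = [NSMethodSignature signatureWithObjCTypes:types];
    NSMutableString *out =
        [NSMutableString stringWithFormat:@"returns %s, %lu args:",
                                          sig.methodReturnType,
                                          (unsigned long)sig.numberOfArguments];
    // self and _cmd are always arguments 0 and 1
    for (NSUInteger i = 0; i < sig.numberOfArguments; i++) {
        [out appendFormat:@" %s", [sig getArgumentTypeAtIndex:i]];
    }
    return out;
}
```

WYDescribeSignature("v@:") describes a method with no explicit parameters, such as eat, while "v@:@" describes one that takes a single object parameter, such as forwardInvocationTest:. The two encodings differ, which is why a signature built for one does not fit the other.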

6.5 Summary

  1. Message forwarding is not part of message sending. When a message cannot be delivered, the receiver is simply notified, and inside that notification we change the selector or the receiver so that the message is sent a second time.
  2. Message receiver redirection can only change the message receiver: the message is forwarded to another object, which executes the method of the same name.
  3. Message redirection can change both the message receiver and the method selector, so the message can be forwarded to any method of another object.
  4. Message redirection must supply a method signature through methodSignatureForSelector:.
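The order of the stages can be observed with a small self-contained sketch. All names here (DemoCat, DemoPerson, gTrace, Record, RunForwardingDemo) are hypothetical and exist only for illustration, and the code assumes ARC. Each stage is recorded once, the first time it is seen, so repeated calls to the same hook do not change the result.

```objc
#import <Foundation/Foundation.h>
#import <objc/runtime.h>

// Hypothetical trace of the stages, in the order they are first reached.
static NSMutableArray<NSString *> *gTrace;

static void Record(NSString *stage) {
    if (![gTrace containsObject:stage]) [gTrace addObject:stage];
}

@interface DemoCat : NSObject
- (void)eat;
@end
@implementation DemoCat
- (void)eat { Record(@"DemoCat eat"); }
@end

@interface DemoPerson : NSObject
@end
// Declared in a category and never implemented, so sending eat must be forwarded.
@interface DemoPerson (Unimplemented)
- (void)eat;
@end

@implementation DemoPerson
// Stage 1: dynamic method resolution (declined here)
+ (BOOL)resolveInstanceMethod:(SEL)sel {
    if (sel_isEqual(sel, @selector(eat))) Record(@"resolveInstanceMethod:");
    return [super resolveInstanceMethod:sel];
}
// Stage 2: message receiver redirection (declined here, super returns nil)
- (id)forwardingTargetForSelector:(SEL)aSelector {
    if (sel_isEqual(aSelector, @selector(eat))) Record(@"forwardingTargetForSelector:");
    return [super forwardingTargetForSelector:aSelector];
}
// Stage 3a: supply the signature needed to build the NSInvocation
- (NSMethodSignature *)methodSignatureForSelector:(SEL)aSelector {
    if (sel_isEqual(aSelector, @selector(eat))) {
        Record(@"methodSignatureForSelector:");
        return [NSMethodSignature signatureWithObjCTypes:"v@:"];
    }
    return [super methodSignatureForSelector:aSelector];
}
// Stage 3b: message redirection, here by changing the receiver
- (void)forwardInvocation:(NSInvocation *)anInvocation {
    Record(@"forwardInvocation:");
    [anInvocation invokeWithTarget:[[DemoCat alloc] init]];
}
@end

// Sends the unimplemented message and returns the recorded stage order.
static NSString *RunForwardingDemo(void) {
    gTrace = [NSMutableArray array];
    [[[DemoPerson alloc] init] eat];
    return [gTrace componentsJoinedByString:@" > "];
}
```

Returning a DemoCat from forwardingTargetForSelector: instead of calling super would end the process at stage 2, and the two later hooks would never be reached.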

7. Identifying the error function

During message sending, lookUpImpOrForward returns a forwarding function called _objc_msgForward_impcache when the fast lookup, the slow lookup, and dynamic method resolution have all failed. This path ends in the error report, so we analyze it to see the last thing that happens after a message fails to be sent.

7.1 Finding the error function

Here you can see the error function _objc_msgForward_impcache:

// method forwarding (error)
const IMP forward_imp = (IMP)_objc_msgForward_impcache;

7.2 Assembly Search

  • Search globally for _objc_msgForward_impcache first; it is found in assembly
  • It jumps to __objc_msgForward
  • __objc_msgForward calls __objc_forward_handler, so we look for __objc_forward_handler next

7.3 Finding __objc_forward_handler

A global search does not find it in assembly, so it is probably written in C. Removing one leading underscore and searching the source for _objc_forward_handler shows that the final error function is objc_defaultForwardHandler.

Find the source code as follows:

void *_objc_forward_handler = (void*)objc_defaultForwardHandler;

7.4 Analysis of the objc_defaultForwardHandler Function

Looking at the code of the error function, we find the familiar error that is reported when a called method cannot be found: unrecognized selector sent to instance. We have finally reached the end of message sending.

objc_defaultForwardHandler source:

// Default forward handler halts the process.
__attribute__((noreturn, cold)) void
objc_defaultForwardHandler(id self, SEL sel)
{
    _objc_fatal("%c[%s %s]: unrecognized selector sent to instance %p "
                "(no message forward handler is installed)", 
                class_isMetaClass(object_getClass(self)) ? '+' : '-', 
                object_getClassName(self), sel_getName(sel), self);
}

8. Summary

Starting from a method call in upper-level code, we used Clang to look at the underlying implementation and saw that the message is sent through objc_msgSend. We then explored objc_msgSend step by step: the cache lookup in assembly, the method-list lookup and dynamic method resolution in lookUpImpOrForward, the second chance that message forwarding gives a message (found through the official documentation and the printed log), and finally the familiar error function. That is everything that happens when we call a method.