Preface

The previous article explored objc_msgSend's cache lookup (fast lookup) process. This article continues with the method list lookup (slow lookup) process, which runs when the cache has no entry for the selector.

1: __objc_msgSend_uncached process analysis

objc_msgSend looks up the IMP for a SEL in the cache of the class or metaclass. If the IMP is not found in the cache, it jumps to MissLabelDynamic (the __objc_msgSend_uncached process) and looks for the IMP in the method list.

To find the source, search for __objc_msgSend_uncached in the objc4-818.2 project. In the search results, hold Command and click the small arrow next to a file name to collapse all files, then expand objc-msg-arm64.s. Select the STATIC_ENTRY __objc_msgSend_uncached result to view the assembly source for __objc_msgSend_uncached.

Note: Code guarded by __has_feature(ptrauth_calls) (pointer authentication, available on A12 Bionic and later chips) is not analyzed in this article.


1.1: __objc_msgSend_uncached assembly source code

    STATIC_ENTRY __objc_msgSend_uncached
    UNWIND __objc_msgSend_uncached, FrameWithNoSaves

    // THIS IS NOT A CALLABLE C FUNCTION
    // Out-of-band p15 is the class to search
    
    // Look for imp for _cmd
    MethodTableLookup
    // x17 = IMP
    TailCallFunctionPointer x17

    END_ENTRY __objc_msgSend_uncached
  • MethodTableLookup: uses the _cmd in x1 to look up the corresponding IMP in the method list of the class or metaclass.
  • TailCallFunctionPointer: calls the IMP that was found.

1.2: MethodTableLookup assembly source code

.macro MethodTableLookup
	
    SAVE_REGS MSGSEND
    
    // /* method lookup */
    // enum {
    // LOOKUP_INITIALIZE = 1,
    // LOOKUP_RESOLVER = 2,
    // LOOKUP_NIL = 4,
    // LOOKUP_NOCACHE = 8,
    // };
    
    // lookUpImpOrForward(obj, sel, cls, LOOKUP_INITIALIZE | LOOKUP_RESOLVER)
    // receiver and selector already in x0 and x1
    // x2 = x16 = class
    mov	x2, x16
    // x3 = LOOKUP_INITIALIZE | LOOKUP_RESOLVER = 3
    mov	x3, #3
    // The official comment above already describes the call:
    // lookUpImpOrForward(obj, sel, cls, LOOKUP_INITIALIZE | LOOKUP_RESOLVER)
    // bl = branch with link. Before jumping to _lookUpImpOrForward, it saves the
    // address of the next instruction (mov x17, x0) in the link register (LR).
    // When _lookUpImpOrForward returns, execution resumes at the address held in LR.
    bl	_lookUpImpOrForward
    
    // x0 is the first argument register and also the return value register;
    // here it holds the IMP returned by _lookUpImpOrForward.
    // IMP in x0
    // x17 = x0 = IMP
    mov	x17, x0

    RESTORE_REGS MSGSEND

.endmacro
  • The official comments say it all: x0 is the receiver and x1 is the selector. mov x2, x16 copies x16 (the class or metaclass) into x2, mov x3, #3 sets x3 to 3, and then the function lookUpImpOrForward(obj, sel, cls, LOOKUP_INITIALIZE | LOOKUP_RESOLVER) is called.
  • mov x17, x0: x0 is the first register and also the return value register. It holds the IMP returned by _lookUpImpOrForward, which is then copied into x17.
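The flag arithmetic behind mov x3, #3 can be checked with a small C++ sketch. The enum values are copied from the comment quoted in MethodTableLookup; the helper names initialBehavior and hasFlag are hypothetical and only serve the illustration.

```cpp
// Values copied from the enum quoted in the MethodTableLookup comments.
enum {
    LOOKUP_INITIALIZE = 1,
    LOOKUP_RESOLVER   = 2,
    LOOKUP_NIL        = 4,
    LOOKUP_NOCACHE    = 8,
};

// Hypothetical helper: the value the assembly places in x3.
constexpr int initialBehavior() {
    return LOOKUP_INITIALIZE | LOOKUP_RESOLVER;
}

// Hypothetical helper: true when `flag` is set in `behavior`.
constexpr bool hasFlag(int behavior, int flag) {
    return (behavior & flag) != 0;
}

// 0001 | 0010 = 0011, which is the literal 3 used by `mov x3, #3`.
static_assert(initialBehavior() == 3, "LOOKUP_INITIALIZE | LOOKUP_RESOLVER must be 3");
```

Because each flag occupies its own bit, the callee can test them independently: with this starting value the resolver bit is set, while LOOKUP_NIL and LOOKUP_NOCACHE are not.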

1.3: TailCallFunctionPointer assembly source code

#if __has_feature(ptrauth_calls)
// JOP

.macro TailCallFunctionPointer
    // $0 = function pointer value
    braaz	$0
.endmacro
// JOP
#else
// not JOP
.macro TailCallFunctionPointer
    // $0 = function pointer value
    // Jump to $0, i.e. call IMP
    br	$0
.endmacro
// not JOP
#endif
  • br $0: branches to the address in $0, which calls the IMP.
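Taken together, the uncached path is "look up a function pointer, then jump to it". The following C++ sketch mimics that two-step shape with ordinary function pointers. Every name in it (ToyIMP, toyLookUpImp, toyMsgSendUncached, forwardStub) is hypothetical, and a plain call stands in for the real tail call.

```cpp
#include <cassert>
#include <cstring>

// Toy stand-in for an IMP: a function that receives the receiver.
using ToyIMP = int (*)(int receiver);

int doubleIt(int receiver) { return receiver * 2; }
int forwardStub(int) { return -1; }  // plays the role of the forwarding IMP

// Step 1 (MethodTableLookup): resolve a selector name to a function pointer.
// Like the call with behavior = 3, it never returns null: unknown selectors
// resolve to the forwarding stub.
ToyIMP toyLookUpImp(const char *sel) {
    assert(sel != nullptr);
    if (std::strcmp(sel, "doubleIt") == 0) return doubleIt;
    return forwardStub;
}

// Step 2 (TailCallFunctionPointer): jump to whatever step 1 returned,
// passing the original receiver through unchanged.
int toyMsgSendUncached(int receiver, const char *sel) {
    ToyIMP imp = toyLookUpImp(sel);
    return imp(receiver);
}
```

Calling toyMsgSendUncached(21, "doubleIt") runs doubleIt, while an unknown selector ends up in forwardStub, mirroring how the real lookup hands back a forwarding IMP instead of nil.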

1.4: __objc_msgSend_uncached flow chart

2: lookUpImpOrForward function process

The bl _lookUpImpOrForward in MethodTableLookup calls lookUpImpOrForward. A global search for _lookUpImpOrForward finds only the call site in the assembly file and no implementation, because the leading underscore is added to C symbol names at the assembly level. So search globally for lookUpImpOrForward instead, hold Command and click the small arrow next to a file name to collapse all files, then expand objc-runtime-new.mm and find the lookUpImpOrForward function to read the relevant source code.

2.1: lookUpImpOrForward function source analysis

NEVER_INLINE
IMP lookUpImpOrForward(id inst, SEL sel, Class cls, int behavior)
{
    // behavior = 3 = LOOKUP_INITIALIZE | LOOKUP_RESOLVER
    // forward_imp is the IMP used for message forwarding
    const IMP forward_imp = (IMP)_objc_msgForward_impcache;
    IMP imp = nil;
    Class curClass;

    runtimeLock.assertUnlocked();
    
    // Check whether the class or metaclass has been initialized. If it has not,
    // behavior = LOOKUP_INITIALIZE | LOOKUP_RESOLVER | LOOKUP_NOCACHE = 11
    // (binary 1011)
    if (slowpath(!cls->isInitialized())) {
        // The first message sent to a class is often +new or +alloc, or +self
        // which goes through objc_opt_* or various optimized entry points.
        //
        // However, the class isn't realized/initialized yet at this point,
        // and the optimized entry points fall down through objc_msgSend,
        // which ends up here.
        //
        // We really want to avoid caching these, as it can cause IMP caches
        // to be made with a single entry forever.
        //
        // Note that this check is racy as several threads might try to
        // message a given class for the first time at the same time,
        // in which case we might cache anyway.
        behavior |= LOOKUP_NOCACHE;
    }

    // runtimeLock is held during isRealized and isInitialized checking
    // to prevent races against concurrent realization.

    // runtimeLock is held during method search to make
    // method-lookup + cache-fill atomic with respect to method addition.
    // Otherwise, a category could be added but ignored indefinitely because
    // the cache was re-filled with the old value after the cache flush on
    // behalf of the category.
    
    // Take the lock to guarantee thread safety
    runtimeLock.lock();

    // We don't want people to be able to craft a binary blob that looks like
    // a class but really isn't one and do a CFI attack.
    //
    // To make these harder we want to make sure this is a class that was
    // either built into the binary or legitimately registered through
    // objc_duplicateClass, objc_initializeClassPair or objc_allocateClassPair.
    
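    // Check that the class is registered (i.e. it was loaded by dyld), to guard
    // against attacks that use forged classes
    checkIsKnownClass(cls);
    
    // Recursively realize the class, its superclasses, and its metaclasses,
    // and initialize the class and its superclasses.
    // Not analyzed here; class loading and initialization will get a separate article.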
    cls = realizeAndInitializeIfNeeded_locked(inst, cls, behavior & LOOKUP_INITIALIZE);
    // runtimeLock may have been dropped but is now locked again
    runtimeLock.assertLocked();
    curClass = cls;

    // The code used to lookup the class's cache again right after
    // we take the lock but for the vast majority of the cases
    // evidence shows this is a miss most of the time, hence a time loss.
    //
    // The only codepath calling into this without having performed some
    // kind of cache lookup is class_getInstanceMethod().
    
    // unreasonableClassCount() is the upper bound on class iterations (per the function's own comment).
    // Infinite loop that looks up the IMP for sel; it exits via break or goto.
    for (unsigned attempts = unreasonableClassCount();;) {
        // Check for a shared cache (preoptimized) cache. This usually applies only to system code; ordinary methods do not take this branch.
        if (curClass->cache.isConstantOptimizedCache(/* strict */true)) {
#if CONFIG_USE_PREOPT_CACHES
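            /*
            Shared caches are supported, so query the shared cache again: another
            thread may have called this method while the lookup was in progress,
            in which case the entry is now in the cache.
            */
            // Look up the IMP for sel in the cache
            imp = cache_getImp(curClass, sel);
            // If an IMP was found, jump to done_unlock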
            if (imp) goto done_unlock;
            curClass = curClass->cache.preoptFallbackClass();
#endif
        } else {
            // curClass method list.
            // Use binary search to look up sel in curClass's method list
            Method meth = getMethodNoSuper_nolock(curClass, sel);
            if (meth) {                 // found the method for sel
                imp = meth->imp(false); // get its IMP
                goto done;              // jump to done
            }
            
            // Move to the superclass. If it is nil, take the branch below; otherwise continue after it.
            if (slowpath((curClass = curClass->getSuperclass()) == nil)) {
                // No implementation found, and method resolver didn't help.
                // Use forwarding.
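                // The whole inheritance chain (cls -> superclass -> ... -> nil) was searched
                // without finding an IMP for sel, and dynamic method resolution
                // did not help either, so message forwarding begins.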
                imp = forward_imp;
                break;
            }
        }

        // Halt if there is a cycle in the superclass chain.
        if (slowpath(--attempts == 0)) {
            _objc_fatal("Memory corruption in class list.");
        }

        // Superclass cache.
        // Look up the IMP for sel in the superclass's cache
        imp = cache_getImp(curClass, sel);
        if (slowpath(imp == forward_imp)) {
            // Found a forward:: entry in a superclass.
            // Stop searching, but don't cache yet; call method
            // resolver for this class first.
            
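            // In other words: if the superclass returned forward_imp, leave the loop
            // without caching it, so that dynamic method resolution for this class
            // (resolveMethod_locked below) runs first.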
            break;
        }
        if (fastpath(imp)) {
            // Found the method in a superclass. Cache it in this class.
            // The IMP for sel was found in a superclass's cache; go to done, which caches it in this class.
            goto done;
        }
    }

    // No implementation found. Try method resolver once.
    // behavior = 3 = 0011, LOOKUP_RESOLVER = 2 = 0010
    // 0011 & 0010 = 0010 = 2, so the condition holds the first time.
    // On the next pass behavior = 1 = 0001, and 0001 & 0010 = 0, so it fails.
    // Dynamic method resolution therefore runs only once.
    if (slowpath(behavior & LOOKUP_RESOLVER)) {
        // behavior = 3 ^ 2 = 0011 ^ 0010 = 0001 = 1
        behavior ^= LOOKUP_RESOLVER;
        // Dynamic method resolution
        return resolveMethod_locked(inst, sel, cls, behavior);
    }

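 done: // An IMP for sel was found in a method list of this class or a superclass, or in a superclass's cache
    // LOOKUP_NOCACHE is not set, i.e. this is not a first message such as +new, +alloc, or +self to an uninitialized class
    if (fastpath((behavior & LOOKUP_NOCACHE) == 0)) {
#if CONFIG_USE_PREOPT_CACHES // shared cache support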
        while (cls->cache.isConstantOptimizedCache(/* strict */true)) {
            cls = cls->cache.preoptFallbackClass();
        }
#endif
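        // Insert the sel and IMP into the class's cache. Note: this is the cache of the receiver's class.
        // This ties back to the earlier exploration of cache_t:
        // the read and write paths of cache_t now form a closed loop.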
        log_and_fill_cache(cls, imp, sel, inst, curClass);
    }
 done_unlock:
    // Unlock
    runtimeLock.unlock();
    /*
     If (behavior & LOOKUP_NIL) is set and imp == forward_imp
     (no IMP was found and dynamic method resolution did not help),
     return nil directly.
    */
    if (slowpath((behavior & LOOKUP_NIL) && imp == forward_imp)) {
        return nil;
    }
    return imp;
}

2.1.1: Slow lookup process analysis

Slow lookup process

  • Check whether the receiver's class is registered. If not, a fatal error is reported.
  • Check whether cls (a class or metaclass) is realized and initialized. If not, realize and initialize cls recursively, together with the superclasses and metaclasses on its inheritance chain and isa chain. (Reason: when a method is not found in a class, the search moves to the superclass, and so on until there is no superclass; class methods live in the metaclass.)

Recursive loop lookup

Search the receiver's class

  • Check whether the class has a shared cache. If it does, query the cache again, because another thread may have called the method and cached it while this lookup was in progress; a hit is returned directly. If it does not, search the receiver's class.
  • Use binary search to find the IMP for sel in the method list of the receiver's class.
  • If the IMP is not found in the method list of the receiver's class, get the superclass and start searching up the superclass chain.

Search the cache of the superclass (or super-metaclass) (curClass = superclass)

  • If there is a cycle in the superclass chain, a fatal error is reported and the process stops.
  • Look up the IMP for sel in the superclass's cache. If it is not found there, search the superclass's method list.

Search the method list of the superclass (or super-metaclass) (curClass = superclass)

  • Use binary search to find the IMP for sel in the superclass's method list. If it is not found, move on to the superclass's superclass and repeat: cache first, then method list, until the superclass is nil.

Method found

  • If the IMP for sel is found in a method list of the class or its superclass chain, or in a superclass's cache, the loop exits and the code decides whether the cache should be filled (it is skipped for a first +new, +alloc, or +self sent to an uninitialized class). If so, the sel and IMP are inserted into the cache of the receiver's class.

Dynamic method resolution

  • If a superclass's cache returns forward_imp, or every superclass has been searched without finding an IMP, the loop exits and the code checks whether dynamic method resolution has already run. If it has not, dynamic method resolution runs and then calls the slow lookup process again (this time dynamic method resolution is not repeated).
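The "resolve only once" behavior comes from clearing the LOOKUP_RESOLVER bit before the retry. The C++ sketch below models just that control flow for a selector that is never found. The two flag values come from the source; ToyLookupStats, toyLookup, and toyResolveThenRetry are hypothetical names, and the real lookup and resolver work is omitted.

```cpp
#include <cassert>

enum { LOOKUP_INITIALIZE = 1, LOOKUP_RESOLVER = 2 };

// Hypothetical counters used to observe how often each step runs.
struct ToyLookupStats {
    int lookups = 0;
    int resolverCalls = 0;
};

int toyLookup(int behavior, ToyLookupStats &stats);

// Stand-in for resolveMethod_locked: run the resolver, then retry the lookup.
// It must be entered with the resolver bit already cleared.
int toyResolveThenRetry(int behavior, ToyLookupStats &stats) {
    assert((behavior & LOOKUP_RESOLVER) == 0);
    stats.resolverCalls++;
    return toyLookup(behavior, stats);
}

// Stand-in for a slow lookup that never finds the method.
// Returns the behavior value in effect when the lookup finally gives up.
int toyLookup(int behavior, ToyLookupStats &stats) {
    stats.lookups++;
    if (behavior & LOOKUP_RESOLVER) {
        behavior ^= LOOKUP_RESOLVER;  // 0011 ^ 0010 = 0001
        return toyResolveThenRetry(behavior, stats);
    }
    return behavior;  // the real code would continue with forward_imp here
}
```

Starting from behavior = 3, this model performs two lookups and exactly one resolver call, and ends with behavior = 1, which is why the second pass skips the resolver branch.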

Message forwarding

  • If the search reaches nil without finding an IMP and dynamic method resolution did not help, forward_imp is inserted into the cache and the message forwarding process begins.
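The search order described in this section can be modeled with a small C++ sketch. It is deliberately simplified: there is no locking, no dynamic method resolution, and hash maps stand in for both the method list and cache_t. ToyClass, ToyIMP, kForwardImp, and toyLookUpImpOrForward are hypothetical names, not runtime APIs.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using ToyIMP = int;                 // an IMP is just an integer id in this model
constexpr ToyIMP kForwardImp = -1;  // plays the role of forward_imp

struct ToyClass {
    ToyClass *superclass = nullptr;
    std::unordered_map<std::string, ToyIMP> methods;  // stands in for the method list
    std::unordered_map<std::string, ToyIMP> cache;    // stands in for cache_t
};

// Walk cls -> superclass -> ... -> nullptr.
// The receiver's class: method list only (its cache already missed).
// Each superclass: cache first, then method list.
// Whatever is found, including the forwarding IMP, is cached in the
// receiver's class, never in the class where it was found.
ToyIMP toyLookUpImpOrForward(ToyClass *cls, const std::string &sel) {
    assert(cls != nullptr);
    ToyIMP imp = kForwardImp;
    for (ToyClass *cur = cls; cur != nullptr; cur = cur->superclass) {
        if (cur != cls) {
            auto cached = cur->cache.find(sel);
            if (cached != cur->cache.end()) { imp = cached->second; break; }
        }
        auto found = cur->methods.find(sel);
        if (found != cur->methods.end()) { imp = found->second; break; }
    }
    cls->cache[sel] = imp;
    return imp;
}
```

With a subclass whose superclass implements a method, the lookup returns the superclass's IMP and fills only the subclass's cache, so the next send to the same receiver class would hit the fast path.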

2.1.2: lookUpImpOrForward flow chart

2.2: Implementing and initializing classes

static Class
realizeAndInitializeIfNeeded_locked(id inst, Class cls, bool initialize)
{
    runtimeLock.assertLocked();
    // !cls->isRealized(): the class has not been realized yet, so realize it
    if (slowpath(!cls->isRealized())) {
        cls = realizeClassMaybeSwiftAndLeaveLocked(cls, runtimeLock);
        // runtimeLock may have been dropped but is now locked again
    }

    if (slowpath(initialize && !cls->isInitialized())) {
        cls = initializeAndLeaveLocked(cls, inst, runtimeLock);
        // runtimeLock may have been dropped but is now locked again

        // If sel == initialize, class_initialize will send +initialize and
        // then the messenger will send +initialize again after this
        // procedure finishes. Of course, if this is not being called
        // from the messenger then it won't happen. 2778172
    }
    return cls;
}
  • Recursively realizes the class and the classes on its isa chain and superclass chain.
  • Initializes the class and the classes on its superclass chain (recursively, up to nil).
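As a rough picture of "initialize recursively up the superclass chain", the C++ sketch below initializes a class only after its superclass, and only once. It assumes superclass-first ordering and ignores locking and +initialize messaging entirely. ToyClass and toyInitialize are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyClass {
    std::string name;
    ToyClass *superclass = nullptr;
    bool initialized = false;
};

// Initialize every superclass first, then cls itself.
// `order` records the names in the order they were initialized.
void toyInitialize(ToyClass *cls, std::vector<std::string> &order) {
    assert(cls != nullptr);
    if (cls->initialized) return;  // already done: nothing to repeat
    if (cls->superclass != nullptr) {
        toyInitialize(cls->superclass, order);
    }
    cls->initialized = true;
    order.push_back(cls->name);
}
```

For a two-level hierarchy the recorded order is root first, then subclass, and a second call on the same class leaves the record unchanged.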

2.3: Binary search

template<class getNameFunc>
ALWAYS_INLINE static method_t *
findMethodInSortedMethodList(SEL key, const method_list_t *list, const getNameFunc &getName)
{
    ASSERT(list);

    auto first = list->begin(); // Position of the first method
    auto base = first;
    decltype(first) probe;
    
    // Convert key to uintptr_t, because the elements of a fixed-up method_list_t are sorted by this value
    uintptr_t keyValue = (uintptr_t)key;
    uint32_t count;
    
    // count starts as the number of elements; count >> 1 == count / 2,
    // and count >>= 1 means count = count / 2.
    // Case 1: count = 8, target sel at index 2.
    //   1st pass: count = 8. 2nd pass: count = 8 >> 1 = 4.
    // Case 2: count = 8, target sel at index 7.
    //   1st pass: count = 8. 2nd pass: count = 7 >> 1 = 3 (after count--).
    //   3rd pass: count = 2 >> 1 = 1 (after count--).
    for (count = list->count; count != 0; count >>= 1) {
        // Probe the middle of the current range.
        // Case 1: 1st pass probe = 0 + (8 / 2) = 4; 2nd pass probe = 0 + (4 / 2) = 2.
        // Case 2: 1st pass probe = 0 + (8 / 2) = 4; 2nd pass probe = 5 + (3 / 2) = 6;
        //         3rd pass probe = 7 + (1 / 2) = 7.
        probe = base + (count >> 1);
        
        // uintptr_t value of the sel at probe
        uintptr_t probeValue = (uintptr_t)getName(probe);
        
        // Case 1: 1st pass key = 2, probe = 4, not equal; 2nd pass key = 2, probe = 2, equal, return the method_t *.
        // Case 2: 1st pass key = 7, probe = 4, not equal; 2nd pass key = 7, probe = 6, not equal;
        //         3rd pass key = 7, probe = 7, equal.
        if (keyValue == probeValue) {
            // `probe` is a match.
            // Rewind looking for the *first* occurrence of this value.
            // This is required for correct category overrides.
            
            // While probe is not the first element and the sel just before it also equals keyValue,
            // step back: a category defines a method with the same name.
            while (probe > first && keyValue == (uintptr_t)getName((probe - 1))) {
                probe--;
            }
            return &*probe;
        }
        
        // Case 1: 2 is not greater than 4, so skip this branch and continue the loop.
        // Case 2: 1st pass 7 > 4, enter; 2nd pass 7 > 6, enter.
        if (keyValue > probeValue) {
            // Case 2: 1st pass base = 4 + 1 = 5, count-- gives 7;
            //         2nd pass base = 6 + 1 = 7, count-- gives 2.
            base = probe + 1;
            count--;
        }
    }
    
    return nil; // Not found
}

Note: Methods in the method list are sorted by selector address using the fixupMethodList function.

  • Binary search probes the middle element of the current range each time. If it matches and no method with the same name precedes it, the method found is returned directly.
  • If a category has a method with the same name, the category's method is returned; if several categories define it, the method of the last loaded category is returned.
  • If the probe does not match, the search range keeps narrowing; if nothing is found in the end, nil is returned.
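To try the same loop outside the runtime, here is a standalone C++ sketch that runs it over a vector sorted by selector value and rewinds to the first match. ToyMethod, toyFindFirst, and the source tag are hypothetical; the sketch assumes that, among entries with equal selectors, the earliest one is the override that should win.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for method_t: a selector value plus a tag
// saying where the method came from (0 = class, 1+ = category).
struct ToyMethod {
    std::uintptr_t sel;
    int source;
};

bool bySel(const ToyMethod &a, const ToyMethod &b) { return a.sel < b.sel; }

// Same loop shape as findMethodInSortedMethodList.
// Returns the index of the FIRST entry whose sel equals key, or -1.
long toyFindFirst(const std::vector<ToyMethod> &list, std::uintptr_t key) {
    const bool sorted = std::is_sorted(list.begin(), list.end(), bySel);
    assert(sorted);  // the search is only valid on a sorted list
    (void)sorted;

    std::size_t base = 0;
    for (std::size_t count = list.size(); count != 0; count >>= 1) {
        std::size_t probe = base + (count >> 1);
        if (key == list[probe].sel) {
            // Rewind to the first occurrence of this selector value.
            while (probe > 0 && list[probe - 1].sel == key) probe--;
            return static_cast<long>(probe);
        }
        if (key > list[probe].sel) {
            base = probe + 1;  // continue in the upper half
            count--;           // the probed element is no longer a candidate
        }
    }
    return -1;
}
```

For a list whose selector values are 10, 20, 20, 20, 30, 40, searching for 20 first lands on index 3 and then rewinds to index 1, the earliest of the three duplicates.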

2.4: Binary search case diagram

Note: base is the start index of the current search range, and count is the number of elements remaining in that range.
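The two cases from the code comments (8 elements, target at index 2 and at index 7) can be replayed with a small tracer. This C++ sketch searches over plain indices and records base, count, and probe on every pass; ToyStep and toyTrace are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One loop iteration: the range start, the range size, and the probed index.
struct ToyStep {
    std::uint32_t base;
    std::uint32_t count;
    std::uint32_t probe;
};

// Runs the same loop over the indices 0..n-1 and records each pass
// until the probe lands on `target`.
std::vector<ToyStep> toyTrace(std::uint32_t n, std::uint32_t target) {
    assert(target < n);
    std::vector<ToyStep> steps;
    std::uint32_t base = 0;
    for (std::uint32_t count = n; count != 0; count >>= 1) {
        std::uint32_t probe = base + (count >> 1);
        steps.push_back({base, count, probe});
        if (target == probe) break;
        if (target > probe) {
            base = probe + 1;
            count--;
        }
    }
    return steps;
}
```

toyTrace(8, 2) records (base 0, count 8, probe 4) and (0, 4, 2). toyTrace(8, 7) records (0, 8, 4), (5, 3, 6), and (7, 1, 7), matching the values in the comments of findMethodInSortedMethodList.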

2.5: cache_getImp

In the slow lookup process, cache_getImp is called to run the fast lookup when a class supports the shared cache, and each time a superclass is visited. Hold Command and Control and click cache_getImp to jump to it: in the C++ file, cache_getImp is only declared.

Like objc_msgSend, this fast lookup is implemented in assembly. Search globally for cache_getImp and open objc-msg-arm64.s to see the definition:

    // Arguments: p0 = class, p1 = sel
    STATIC_ENTRY _cache_getImp

    // Store the class in p16
    GetClassFromIsa_p16 p0, 0
    CacheLookup GETIMP, _cache_getImp, LGetImpMissDynamic, LGetImpMissConstant

// On a cache miss, return nil directly.
// p0 is the first register and also the return value register.
LGetImpMissDynamic:
    mov p0, #0
    ret

LGetImpMissConstant:
    mov p0, p2
    ret

    END_ENTRY _cache_getImp

The GetClassFromIsa_p16 macro is invoked here with src = class and needs_auth = 0. Because the class was already verified earlier in the slow lookup process, the macro simply assigns p16 = class.

.macro GetClassFromIsa_p16 src, needs_auth, auth_address /* note: auth_address is not required if !needs_auth */

#if SUPPORT_INDEXED_ISA
    // armv7k or arm64_32
    // Indexed isa
    mov p16, \src                         // optimistically set dst = src
    tbz p16, #ISA_INDEX_IS_NPI_BIT, 1f    // done if not non-pointer isa
    // isa in p16 is indexed
    adrp x10, _objc_indexed_classes@PAGE
    add x10, x10, _objc_indexed_classes@PAGEOFF
    ubfx p16, p16, #ISA_INDEX_SHIFT, #ISA_INDEX_BITS  // extract index
    ldr p16, [x10, p16, UXTP #PTRSHIFT]   // load class from array
1:

#elif __LP64__
.if \needs_auth == 0 // _cache_getImp takes an authed class already
    // This branch is taken
    // p16 = class
    mov p16, \src
.else
    // 64-bit packed isa
    ExtractISA p16, \src, \auth_address
.endif
#else
    // 32-bit raw isa
    mov p16, \src

#endif

.endmacro

The CacheLookup process was already analyzed in the earlier article on objc_msgSend (part 1 of the analysis of the nature of methods in the iOS runtime). Here it is invoked as CacheLookup GETIMP, _cache_getImp, LGetImpMissDynamic, LGetImpMissConstant.

  • If the cache misses, the LGetImpMissDynamic path runs.

// On a cache miss, return nil directly.
// p0 is the first register and also the return value register.
LGetImpMissDynamic:
    mov p0, #0
    ret

  • If the cache hits, the GETIMP branch of the CacheHit macro runs.

// Mode ($0) = GETIMP
.macro CacheHit
.if $0 == NORMAL
    TailCallCachedImp x17, x10, x1, x16    // authenticate and call imp
.elseif $0 == GETIMP
    // This branch is taken
    // p0 = p17 (imp)
    mov p0, p17
    // If p0 is 0, jump to label 9 and return 0
    cbz p0, 9f                             // don't ptrauth a nil imp
    // imp ^= class, i.e. decode the imp
    AuthAndResignAsIMP x0, x10, x1, x16    // authenticate imp and re-sign as IMP
9:  ret                                    // return IMP
.elseif $0 == LOOKUP
    // No nil check for ptrauth: the caller would crash anyway when they
    // jump to a nil IMP. We don't care if that jump also fails ptrauth.
    AuthAndResignAsIMP x17, x10, x1, x16   // authenticate imp and re-sign as IMP
    cmp x16, x15
    cinc x16, x16, ne                      // x16 += 1 when x15 != x16 (for instrumentation ; fallback to the parent class)
    ret                                    // return imp via x17
.else
.abort oops
.endif
.endmacro

  • p17 holds the imp; mov p0, p17 copies it into p0.
  • If the imp is 0, execution jumps straight to label 9 and returns 0.
  • The AuthAndResignAsIMP macro decodes the imp (it was encoded when cache_t::insert stored it) and returns the decoded imp.

.macro AuthAndResignAsIMP
    // $0 = cached imp, $1 = address of cached imp, $2 = SEL, $3 = class
    // $0 = $0 ^ $3 = imp ^ class
    eor $0, $0, $3
.endmacro

  • That is, the imp taken from the cache is decoded.
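The reason a single eor can decode the imp is that XOR with the same value is its own inverse. The C++ sketch below shows the round trip on plain integers; toyEncodeImp and toyDecodeImp are hypothetical helpers, and the addresses in the usage note are made-up values.

```cpp
#include <cstdint>

// Encoding on insert: stored = imp ^ class.
constexpr std::uintptr_t toyEncodeImp(std::uintptr_t imp, std::uintptr_t cls) {
    return imp ^ cls;
}

// Decoding on lookup, as in `eor $0, $0, $3`: imp = stored ^ class.
constexpr std::uintptr_t toyDecodeImp(std::uintptr_t stored, std::uintptr_t cls) {
    return stored ^ cls;
}

// (imp ^ cls) ^ cls == imp for any pair of values.
static_assert(toyDecodeImp(toyEncodeImp(0x1000, 0x2468), 0x2468) == 0x1000,
              "XOR with the same class value must restore the imp");
```

Decoding with a different class value does not give back the original imp, so in this model a stored entry is only meaningful together with the class it was cached for.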

2.6: Why use assembly for cache lookups and C++ for the rest

  1. Assembly is closer to machine language and therefore faster and more efficient, which maximizes the benefit of having a cache at all.
  2. Assembly is safer.
  3. The parameters of a C function must be declared explicitly, whereas assembly can pass along whatever arguments the caller supplied, which gives more flexibility.

3: unrecognized selector sent to xxx

3.1: The "single dog has no girlfriend" case

Start with a case: the single dog has no girlfriend.

Create an SDSingleDog class that declares a girlfriend method but does not implement it (single dogs don't have girlfriends).

#import <Foundation/Foundation.h>
#import "SDSingleDog.h"

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        
        SDSingleDog *singleDog = [SDSingleDog alloc];
        [singleDog girlfriend];
        
    }
    return 0;
}

Run the program: it prints an error and crashes (so not having a girlfriend is a big crash).

In this case, the classic unrecognized selector sent to XXX exception is thrown when neither the fast lookup nor the slow lookup finds the method, and neither dynamic method resolution nor message forwarding has been implemented.

3.2: Exploring the unrecognized selector source code

A global search for unrecognized selector sent finds 3 related functions and methods.

But with breakpoints set, it turns out the program never reaches any of those three places. The method comments and the function call stack show that the exception is actually raised by methods in CoreFoundation, so this will be explored later when analyzing message forwarding.

4: Why a class can successfully call an NSObject instance method

4.1: Case code

@interface NSObject (Goddess)

- (void)kneelAndLick;

@end

@implementation NSObject (Goddess)

- (void)kneelAndLick {
    NSLog(@"Single dog kneel and lick Goddess");
}

@end

#import <Foundation/Foundation.h>
#import "SDSingleDog.h"
#import "NSObject+Goddess.h"

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        // The class object calls an instance method
        [SDSingleDog kneelAndLick];
    }
    return 0;
}

*********************** Output ***********************
2021-07-08 16:40:25.510538+0800 KCObjcBuild[4157:161599] Single dog kneel and lick Goddess
Program ended with exit code: 0

4.2: Case analysis

/***********************************************************************
* class_getClassMethod.  Return the class method for the specified
* class and selector.
**********************************************************************/
Method class_getClassMethod(Class cls, SEL sel)
{
    if (!cls || !sel) return nil;

    return class_getInstanceMethod(cls->getMeta(), sel);
}

According to the underlying source code analysis:

  • At the runtime level, Objective-C has no real distinction between instance methods and class methods. A class is itself an object whose class is its metaclass, and class methods are stored in the metaclass in the form of instance methods.
  • The superclass of the root metaclass is the root class. When the target method is not found in the root metaclass, the search continues in the root class and returns the root class's instance method.
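The chain that makes [SDSingleDog kneelAndLick] work can be modeled in a few lines of C++. The sketch wires up isa and superclass pointers by hand and reuses one lookup for both kinds of method, in the spirit of class_getClassMethod delegating to class_getInstanceMethod on the metaclass. ToyClass, toyFindOwner, and toyFindClassMethodOwner are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>

struct ToyClass {
    std::string name;
    ToyClass *isa = nullptr;         // class -> metaclass
    ToyClass *superclass = nullptr;
    std::set<std::string> methods;   // instance methods stored on this (meta)class
};

// Walk the superclass chain from `start` and return the class whose
// method set contains `sel`, or nullptr if none does.
const ToyClass *toyFindOwner(const ToyClass *start, const std::string &sel) {
    for (const ToyClass *cur = start; cur != nullptr; cur = cur->superclass) {
        if (cur->methods.count(sel) != 0) return cur;
    }
    return nullptr;
}

// A class method lookup is an instance method lookup that starts
// at the metaclass.
const ToyClass *toyFindClassMethodOwner(const ToyClass *cls, const std::string &sel) {
    assert(cls != nullptr && cls->isa != nullptr);
    return toyFindOwner(cls->isa, sel);
}
```

When the root metaclass's superclass points at the root class, a class-method lookup on any subclass eventually reaches the root class's instance methods, which is where the category method kneelAndLick lives in the case above.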

5: What's next

The method lookup process is actually quite complex. With the fast lookup and slow lookup processes now explored, stay tuned for dynamic method resolution and the message forwarding process.