In previous articles we looked in detail at the isa, superclass, and bits fields of the objc_class structure. This article explores the remaining field, cache.

1. Source code analysis of the cache field

First, let's look at the data type of the cache field in the objc_class structure:

struct objc_class : objc_object {
    objc_class(const objc_class&) = delete;
    objc_class(objc_class&&) = delete;
    void operator=(const objc_class&) = delete;
    void operator=(objc_class&&) = delete;
    // Class ISA;
    Class superclass;
    cache_t cache;             // formerly cache pointer and vtable
    class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags
    ...
};

The cache field has type cache_t, so let's look at cache_t next:

struct cache_t {
private:
    explicit_atomic<uintptr_t> _bucketsAndMaybeMask;
    union {
        struct {
            explicit_atomic<mask_t>    _maybeMask;
#if __LP64__   // true on 64-bit Linux and macOS
            uint16_t                   _flags;
#endif
            uint16_t                   _occupied;   // number of buckets currently stored
        };
        explicit_atomic<preopt_cache_t *> _originalPreoptCache;   // 8 bytes
    };
    ...
    // Some of the important methods:
    mask_t mask() const;
    // increment _occupied by one
    void incrementOccupied();
    void setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask);
    // empty buckets
    static bucket_t *emptyBuckets();
    static bucket_t *allocateBuckets(mask_t newCapacity);
    static bucket_t *emptyBucketsForCapacity(mask_t capacity, bool allocate = true);
    static struct bucket_t * endMarker(struct bucket_t *b, uint32_t cap);
    unsigned capacity() const;
    struct bucket_t *buckets() const;
    mask_t occupied() const;
    // insert: sel (method selector), imp (address of the function implementation), receiver (method receiver)
    void insert(SEL sel, IMP imp, id receiver);
    ...
};

The __LP64__ macro is defined on platforms that use the LP64 data model, in which long and pointers are 64 bits wide. This includes 64-bit macOS and Linux.
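As a quick way to check which branch of the `#if __LP64__` block applies on a given machine, the following minimal C sketch reports whether the macro is defined and how wide long and pointers are. The function names are hypothetical and not part of the runtime.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 when the compiler defines __LP64__. */
int toy_is_lp64(void) {
#if defined(__LP64__)
    return 1;
#else
    return 0;
#endif
}

/* Width of long and of a data pointer, in bytes. */
size_t toy_long_size(void)    { return sizeof(long); }
size_t toy_pointer_size(void) { return sizeof(void *); }
```

When `toy_is_lp64()` returns 1, both sizes are 8, and cache_t contains the _flags member.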

Many members of cache_t refer to the bucket_t data structure, so let's look at bucket_t next:

struct bucket_t {
private:
    // IMP-first is better for arm64e ptrauth and no worse for arm64.
    // SEL-first is better for armv7* and i386 and x86_64.
#if __arm64__
    explicit_atomic<uintptr_t> _imp;
    explicit_atomic<SEL> _sel;
#else
    explicit_atomic<SEL> _sel;
    explicit_atomic<uintptr_t> _imp;
#endif
    ...
    inline SEL sel() const { return _sel.load(memory_order_relaxed); }

    inline IMP imp(UNUSED_WITHOUT_PTRAUTH bucket_t *base, Class cls) const {
        uintptr_t imp = _imp.load(memory_order_relaxed);
        if (!imp) return nil;
#if CACHE_IMP_ENCODING == CACHE_IMP_ENCODING_PTRAUTH
        SEL sel = _sel.load(memory_order_relaxed);
        return (IMP)
            ptrauth_auth_and_resign((const void *)imp,
                                    ptrauth_key_process_dependent_code,
                                    modifierForSEL(base, sel, cls),
                                    ptrauth_key_function_pointer, 0);
#elif CACHE_IMP_ENCODING == CACHE_IMP_ENCODING_ISA_XOR
        return (IMP)(imp ^ (uintptr_t)cls);
#elif CACHE_IMP_ENCODING == CACHE_IMP_ENCODING_NONE
        return (IMP)imp;
#else
#error Unknown method cache IMP encoding.
#endif
    }
};

Each bucket_t holds a sel and an imp, so the cache field is a cache of methods. Let's explore how it is used to cache them.

2. Exploring the cache field

2.1 Exploring with LLDB

Start by writing the following code in a project in which the objc source builds and runs:

@interface Person : NSObject
- (void)sayHi;
@end

@implementation Person
- (void)sayHi {
    NSLog(@"hi!");
}
@end

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        // class_data_bits_t
        Person *p = [Person alloc];
        NSLog(@"%@", p);
    }
    return 0;
}

Then set a breakpoint on the NSLog line, run the program, and debug in LLDB as follows:

// 1. Print the address of object p
(lldb) p/x p
(Person *) $0 = 0x0000000101204ab0
// 2. Print the heap memory of object p
(lldb) x/4gx $0
0x101204ab0: 0x011d800100008491 0x0000000000000000
0x101204ac0: 0x0000000000000000 0xf33a90460c6508ff
// 3. Get the address of the Person class (the objc_class structure) from isa.
//    Note: ISA_MASK must be chosen for the platform the program runs on.
(lldb) p/x 0x011d800100008491 & 0x00007ffffffffff8ULL
(unsigned long long) $1 = 0x0000000100008490
// 4. Get the address of the cache member by offsetting the class address by 16 bytes
(lldb) p/x (cache_t *)0x00000001000084a0
(cache_t *) $2 = 0x00000001000084a0
// 5. Print the contents of the cache_t
(lldb) p/x *$2
(cache_t) $3 = {
  _bucketsAndMaybeMask = {
    std::__1::atomic<unsigned long> = {
      Value = 0x00000001003643a0
    }
  }
   = {
     = {
      _maybeMask = {
        std::__1::atomic<unsigned int> = {
          Value = 0x00000000
        }
      }
      _flags = 0x8010
      _occupied = 0x0000
    }
    _originalPreoptCache = {
      std::__1::atomic<preopt_cache_t *> = {
        Value = 0x0000801000000000
      }
    }
  }
}
// 6. Try to print the Value of these fields
(lldb) p/x $3._bucketsAndMaybeMask
(explicit_atomic<unsigned long>) $4 = {
  std::__1::atomic<unsigned long> = {
    Value = 0x00000001003643a0
  }
}
(lldb) p/x $4.Value
error: <user expression 5>:1:4: no member named 'Value' in 'explicit_atomic<unsigned long>'
$4.Value
~~ ^
(lldb) p/x $3._originalPreoptCache
(explicit_atomic<preopt_cache_t *>) $5 = {
  std::__1::atomic<preopt_cache_t *> = {
    Value = 0x0000801000000000
  }
}
(lldb) p/x $5.Value
error: <user expression 7>:1:4: no member named 'Value' in 'explicit_atomic<preopt_cache_t *>'
$5.Value
~~ ^
(lldb) p/x $3._maybeMask
(explicit_atomic<unsigned int>) $6 = {
  std::__1::atomic<unsigned int> = {
    Value = 0x00000000
  }
}
(lldb) p/x $6.Value
error: <user expression 9>:1:4: no member named 'Value' in 'explicit_atomic<unsigned int>'
$6.Value
~~ ^
// 7. The fields cannot be read this way, so we probably need to call a method of
//    cache_t to fetch the buckets. In the source we found:
//        struct bucket_t *buckets() const;
// 8. Call buckets() in LLDB to get to sel and imp
(lldb) p/x $3.buckets()
(bucket_t *) $8 = 0x00000001003643a0
// 9. Print the first bucket_t by indexing from that address
(lldb) p/x $8[0]
(bucket_t) $9 = {
  _sel = {
    std::__1::atomic<objc_selector *> = (null) {
      Value = nil
    }
  }
  _imp = {
    std::__1::atomic<unsigned long> = {
      Value = 0x0000000000000000
    }
  }
}
// 10. Both are empty because no instance method of Person has been called yet,
//     so nothing has been cached. Call an instance method of Person:
(lldb) po [p sayHi]
2021-06-23 18:54:17.981387+0800 KCObjcBuild[27259:2783273] hi!
(lldb) p/x $3
(lldb) p/x [Person class]
(Class) $10 = 0x0000000100008490 Person
(lldb) p/x (cache_t *)0x00000001000084a0
(cache_t *) $11 = 0x00000001000084a0
(lldb) p/x *$11
(cache_t) $12 = {
  _bucketsAndMaybeMask = {
    std::__1::atomic<unsigned long> = {
      Value = 0x0000000101205b40
    }
  }
   = {
     = {
      _maybeMask = {
        std::__1::atomic<unsigned int> = {
          Value = 0x00000007
        }
      }
      _flags = 0x8010
      _occupied = 0x0001
    }
    _originalPreoptCache = {
      std::__1::atomic<preopt_cache_t *> = {
        Value = 0x0001801000000007
      }
    }
  }
}
// 11. _occupied changed from 0 to 1 and _maybeMask from 0x00000000 to 0x00000007.
//     _originalPreoptCache and _bucketsAndMaybeMask changed accordingly.
(lldb) p/x $12.buckets()
(bucket_t *) $13 = 0x0000000101205b40
(lldb) p/x $13[0]
(bucket_t) $14 = {
  _sel = {
    std::__1::atomic<objc_selector *> = "" {
      Value = 0x0000000100003e08 ""
    }
  }
  _imp = {
    std::__1::atomic<unsigned long> = {
      Value = 0x000000000000bc20
    }
  }
}
// 12. Print sel and imp of the first bucket_t. sel() and imp(nil, [Person class])
//     are methods defined in the bucket_t structure.
(lldb) p/x $14.sel()
(SEL) $15 = 0x0000000100003e08 "sayHi"
(lldb) p/x $14.imp(nil, [Person class])
(IMP) $16 = 0x00000001000038b0 (KCObjcBuild`-[Person sayHi] at main.m:39)
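The raw _imp value in the session (0x000000000000bc20) differs from the IMP returned by imp() (0x00000001000038b0). This is consistent with the CACHE_IMP_ENCODING_ISA_XOR branch of bucket_t::imp, which XORs the stored value with the class address. A minimal C sketch of that arithmetic follows, assuming this session ran with the ISA_XOR encoding. The function names are hypothetical, not runtime API.

```c
#include <stdint.h>

/* Hypothetical helpers illustrating the ISA_XOR encoding:
   the bucket stores imp ^ cls, and XOR-ing with cls again recovers imp. */
uint64_t toy_encode_imp(uint64_t imp, uint64_t cls) {
    return imp ^ cls;
}

uint64_t toy_decode_imp(uint64_t stored, uint64_t cls) {
    return stored ^ cls;
}
```

With the values from the session, `toy_decode_imp(0xbc20, 0x100008490)` yields 0x1000038b0, the address LLDB reports for -[Person sayHi].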

2.2 Exploring with custom data types

In the runnable source project above, debugging with LLDB gave us a good understanding of the fields of the objc_class structure. But what if the downloaded source cannot be built and run? Besides debugging the source with LLDB, we can define data structures with the same layout as those in the source and use them to explore these fields. First create a project, then define the following structures in the main file, mirroring the data structures in the source:

struct sgy_bucket_t {
    SEL _sel;
    IMP _imp;
};

struct sgy_cache_t {
    struct sgy_bucket_t *buckets; // 8
    uint32_t    _maybeMask; // 4
    uint16_t     _flags;  // 2
    uint16_t     _occupied; // 2
};

struct sgy_class_data_bits_t {
    uintptr_t bits;
};

struct sgy_objc_class {
    Class ISA;
    Class superclass;
    struct sgy_cache_t cache;             // formerly cache pointer and vtable
    struct sgy_class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags
};
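The cast from Class to the custom structure only works if the member offsets match the real layout. The following C sketch checks the offsets the article relies on: cache at 16 bytes, right after isa and superclass, and a 16-byte cache. It assumes a 64-bit platform and uses void * as a stand-in for Class, SEL, and IMP. All `toy_` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the sgy_* structures, using void * for Class/SEL/IMP. */
struct toy_bucket { void *sel; void *imp; };

struct toy_cache {
    struct toy_bucket *buckets;  /* 8 bytes */
    uint32_t maybe_mask;         /* 4 bytes */
    uint16_t flags;              /* 2 bytes */
    uint16_t occupied;           /* 2 bytes */
};

struct toy_class {
    void *isa;                   /* 8 bytes */
    void *superclass;            /* 8 bytes */
    struct toy_cache cache;
    uintptr_t bits;
};

/* Offsets and sizes as the compiler lays them out. */
size_t toy_cache_offset(void) { return offsetof(struct toy_class, cache); }
size_t toy_cache_size(void)   { return sizeof(struct toy_cache); }
size_t toy_bits_offset(void)  { return offsetof(struct toy_class, bits); }
```

On a 64-bit platform these return 16, 16, and 32, which matches the 16-byte offset used in the LLDB session to reach cache.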

Define the Person class in the main file, write the following code in the main function, and run the program.

struct sgy_bucket_t {
    SEL _sel;   // method selector (unique name)
    IMP _imp;   // address of the function that implements the method
};

struct sgy_cache_t {
    struct sgy_bucket_t *buckets;   // start address of the buckets; each bucket is reached by offset
    uint32_t    _maybeMask;         // mask (capacity - 1)
    uint16_t     _flags;            // flags
    uint16_t     _occupied;         // number of buckets in use
};

struct sgy_class_data_bits_t {
    uintptr_t bits;
};

struct sgy_objc_class {
    Class ISA;                            // isa
    Class superclass;                     // superclass
    struct sgy_cache_t cache;             // method cache
    struct sgy_class_data_bits_t bits;    // class data (member variables, methods; class methods live in the metaclass)
};

@interface Person : NSObject
- (void)sayHi;
@end

@implementation Person
- (void)sayHi {
    NSLog(@"hi!");
}
@end

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        Person *p = [Person alloc];
        struct sgy_objc_class *cls = (__bridge struct sgy_objc_class *)([Person class]);
        // call sayHi
        [p sayHi];
        NSLog(@"%hu ---- %u", cls->cache._occupied, cls->cache._maybeMask);
        struct sgy_bucket_t *buckets = cls->cache.buckets;
        for (int i = 0; i < (cls->cache._maybeMask); i++) {
            NSLog(@"sel = %@, imp = %p", NSStringFromSelector(buckets[i]._sel), buckets[i]._imp);
        }
    }
    return 0;
}

Run the program. The output is as follows:

2021-06-25 10:05:29.981153+0800 OBJC source debug[32446:3445346] hi!
2021-06-25 10:05:29.981525+0800 OBJC source debug[32446:3445346] 1 ---- 3
2021-06-25 10:05:29.981572+0800 OBJC source debug[32446:3445346] sel = (null), imp = 0x0
2021-06-25 10:05:29.981599+0800 OBJC source debug[32446:3445346] sel = (null), imp = 0x0
2021-06-25 10:05:29.981662+0800 OBJC source debug[32446:3445346] sel = sayHi, imp = 0xbcb8

sayHi is not stored in the first slot of buckets, so the buckets are evidently not filled in order like a plain array. Why is that? If the bucket_t entries were stored as a plain array, finding, inserting, or deleting an entry would require a traversal, which is O(n) and too slow for something as low-level as method lookup. With a linked list, the insert or delete itself is O(1), but the position still has to be found by traversal first, so the overall cost is O(n) as well. The best choice is therefore a hash table, in which lookup, insertion, and deletion take O(1) time on average, a large improvement in performance. To verify this conjecture, we define several more methods in the Person class, call them, and then look at buckets again. The code is as follows:

@interface Person : NSObject
- (void)sayHi;
- (void)sayHi2;
- (void)sayHi3;
- (void)sayHi4;
- (void)sayHi5;
@end

@implementation Person
- (void)sayHi {
    NSLog(@"hi! %s", __FUNCTION__);
}
- (void)sayHi2 {
    NSLog(@"hi2! %s", __FUNCTION__);
}
- (void)sayHi3 {
    NSLog(@"hi3! %s", __FUNCTION__);
}
- (void)sayHi4 {
    NSLog(@"hi4! %s", __FUNCTION__);
}
- (void)sayHi5 {
    NSLog(@"hi5! %s", __FUNCTION__);
}
@end

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        Person *p = [Person alloc];
        struct sgy_objc_class *cls = (__bridge struct sgy_objc_class *)([Person class]);
        // call the five instance methods
        [p sayHi];
        [p sayHi2];
        [p sayHi3];
        [p sayHi4];
        [p sayHi5];
        NSLog(@"%hu ---- %u", cls->cache._occupied, cls->cache._maybeMask);
        struct sgy_bucket_t *buckets = cls->cache.buckets;
        for (int i = 0; i < (cls->cache._maybeMask); i++) {
            NSLog(@"sel = %@, imp = %p", NSStringFromSelector(buckets[i]._sel), buckets[i]._imp);
        }
    }
    return 0;
}

Run the program and view the result, as follows:

2021-06-25 11:52:16.453344+0800 OBJC source debug[33139:3508642] hi! -[Person sayHi]
2021-06-25 11:52:16.453811+0800 OBJC source debug[33139:3508642] hi2! -[Person sayHi2]
2021-06-25 11:52:16.453859+0800 OBJC source debug[33139:3508642] hi3! -[Person sayHi3]
2021-06-25 11:52:16.453889+0800 OBJC source debug[33139:3508642] hi4! -[Person sayHi4]
2021-06-25 11:52:16.453918+0800 OBJC source debug[33139:3508642] hi5! -[Person sayHi5]
2021-06-25 11:52:16.453964+0800 OBJC source debug[33139:3508642] 3 ---- 7
2021-06-25 11:52:16.454121+0800 OBJC source debug[33139:3508642] sel = sayHi5, imp = 0xbd18
2021-06-25 11:52:16.454181+0800 OBJC source debug[33139:3508642] sel = sayHi4, imp = 0xbd28
2021-06-25 11:52:16.454213+0800 OBJC source debug[33139:3508642] sel = sayHi3, imp = 0xbdf8
2021-06-25 11:52:16.454240+0800 OBJC source debug[33139:3508642] sel = (null), imp = 0x0
2021-06-25 11:52:16.454266+0800 OBJC source debug[33139:3508642] sel = (null), imp = 0x0
2021-06-25 11:52:16.454318+0800 OBJC source debug[33139:3508642] sel = (null), imp = 0x0
2021-06-25 11:52:16.454373+0800 OBJC source debug[33139:3508642] sel = (null), imp = 0x0

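_maybeMask has changed from 3 to 7 and _occupied from 1 to 3. Only sayHi3, sayHi4, and sayHi5 are present, which suggests that the earlier entries were dropped when the cache expanded. Traversing the sel and imp of each bucket_t in buckets, the data does look like it is stored in a hash table.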
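The observed numbers can be reproduced with a toy model of such a hash-table cache. The C sketch below is a simplification under stated assumptions, not the runtime's implementation: the start index is `sel & mask`, collisions are resolved by stepping to the next slot, the initial capacity is 4, one slot is counted for an end-marker bucket, and the table doubles and discards its old entries once it would be more than 3/4 full. All `toy_` names and constants are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct { uintptr_t sel; uintptr_t imp; } toy_bucket;   /* sel == 0 means empty */
typedef struct { toy_bucket *buckets; uint32_t mask; uint32_t occupied; } toy_cache;

/* Assumed constants for this sketch. The end-marker bucket is only
   counted here, not actually stored in the array. */
enum { TOY_INIT_CAPACITY = 4, TOY_END_MARKER = 1 };

static uint32_t toy_capacity(const toy_cache *c) {
    return c->buckets ? c->mask + 1 : 0;
}

/* Replace the bucket array; old entries are discarded, not copied. */
static int toy_reallocate(toy_cache *c, uint32_t capacity) {
    toy_bucket *fresh = calloc(capacity, sizeof *fresh);
    if (!fresh) return -1;
    free(c->buckets);
    c->buckets = fresh;
    c->mask = capacity - 1;
    c->occupied = 0;
    return 0;
}

/* Cache (sel, imp); returns the slot index, or -1 on failure. */
int toy_insert(toy_cache *c, uintptr_t sel, uintptr_t imp) {
    uint32_t capacity = toy_capacity(c);
    if (capacity == 0) {
        if (toy_reallocate(c, TOY_INIT_CAPACITY) != 0) return -1;
    } else if (c->occupied + 1 + TOY_END_MARKER > capacity * 3 / 4) {
        if (toy_reallocate(c, capacity * 2) != 0) return -1;   /* expand */
    }
    uint32_t begin = (uint32_t)sel & c->mask;   /* hash: low bits of sel */
    uint32_t i = begin;
    do {
        if (c->buckets[i].sel == 0) {           /* free slot: store here */
            c->buckets[i].sel = sel;
            c->buckets[i].imp = imp;
            c->occupied++;
            return (int)i;
        }
        if (c->buckets[i].sel == sel) return (int)i;   /* already cached */
        i = (i + 1) & c->mask;                  /* collision: try next slot */
    } while (i != begin);
    return -1;
}
```

Starting from an empty `toy_cache`, one insert leaves occupied at 1 and mask at 3. Five inserts leave occupied at 3 and mask at 7, because the third insert triggers the expansion and discards the first two entries. These are the same pairs as in the two program runs above.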

2.3 Why does _maybeMask become 7 after calling an instance method in LLDB?

Both explorations called only one instance method, yet with the custom data structures _maybeMask became 3, while in LLDB it became 7. Why? We suspect that LLDB calls other instance methods before sayHi, which causes the cache to expand. Let's verify this. Every time a method is cached, insert is called to add a bucket_t to buckets, so we print some information about each call inside the insert method, as shown in the following figure:

Set a breakpoint at the following position in main:

Compile and run the program, clear the console output, and call the sayHi method of p with an LLDB command, as shown below:

The output shows three insert calls, all with the same receiver (0x10060e0b0). So respondsToSelector: and class are also called before sayHi, and the sel and imp of those methods are cached in buckets as well. But that is only three methods, which at first glance does not exceed three quarters of the capacity, yet the cache has clearly expanded. Why is that? Let's write the following code in insert:

Compile and run the program. When the breakpoint is reached, type the following command to view the printed information:

These are all the buckets before sayHi is inserted. Note that in the fourth bucket sel prints as an empty string and imp is the start address of buckets. When the sel and imp of sayHi are inserted, insert checks whether the number of occupied buckets, counting the new entry and this extra end bucket, would exceed 3/4 of the total capacity. If so, the code in the red box is executed, as shown in the following figure:

The reallocate function calls the following function:

Let’s look at the allocateBuckets function

This function calls endMarker to get the address of the last bucket in buckets, as shown below:

allocateBuckets then fills in the last bucket according to the architecture. On ARM, sel is set to 1 and imp is set to the start address of newBuckets minus 1. On other architectures, sel is set to 1 and imp is set to the start address of newBuckets.
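The end-marker setup described above can be sketched in plain C as follows. This is an illustration under assumptions rather than the runtime's code: `toy_allocate_buckets` and its `arm_style` flag are hypothetical, and "start address minus 1" is interpreted here as one bucket before the start of the array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uintptr_t sel; uintptr_t imp; } toy_bucket;

/* Allocate zeroed buckets and mark the last one as the end marker.
   arm_style != 0 selects the "start address minus one bucket" variant
   (an assumed reading of the ARM case). */
toy_bucket *toy_allocate_buckets(size_t capacity, int arm_style) {
    assert(capacity >= 1);
    toy_bucket *buckets = calloc(capacity, sizeof *buckets);
    if (!buckets) return NULL;

    toy_bucket *end = &buckets[capacity - 1];   /* what endMarker would return */
    end->sel = 1;                               /* marker value, not a real selector */
    end->imp = arm_style
        ? (uintptr_t)buckets - sizeof(toy_bucket)
        : (uintptr_t)buckets;
    return buckets;
}
```

For a capacity of 4 in the non-ARM variant, the fourth bucket ends up with sel equal to 1 and imp equal to the start address of the array, which matches the odd-looking fourth bucket printed before sayHi was inserted.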

So why did Apple's developers do this? Why set the sel and imp of the last bucket every time buckets are allocated or reallocated? The marker is there for objc_msgSend to use when it walks the buckets, which we will discuss later when we look at objc_msgSend.