Review

We have seen that a class has roughly the following structure: it is made up of isa, superclass, cache, and bits. cache_t is a struct type with two members: _bucketsAndMaybeMask and a union.

private:
    explicit_atomic<uintptr_t> _bucketsAndMaybeMask;
    union {
        struct {
            explicit_atomic<mask_t> _maybeMask; // size is 4
#if __LP64__
            uint16_t _flags;                    // size is 2
#endif
            uint16_t _occupied;                 // size is 2
        };
        explicit_atomic<preopt_cache_t *> _originalPreoptCache;
    };
classDiagram

LGPerson --> cache_t
 cache_t -->bucket_t
LGPerson :  isa
LGPerson :  superClass
LGPerson :  cache
LGPerson :  bits
class cache_t{
_buckets
_mask
_flags
_occupied
 
}
class bucket_t{
_sel
_imp
}
 

The essence of cache_t

A method call is known to look up an IMP in memory by its SEL (method number). cache_t exists to make method calls faster and more efficient, so that the runtime does not have to traverse the methods in memory on every call. For each called method, cache_t stores the sel and imp in the current class structure as a bucket_t, so that later calls can find them (the receiver is passed to the insert call as well). PS, the relationship between sel and imp: a sel is a method number, like an entry name in a book's table of contents; an imp is a function pointer address, like the page number that entry points to.

See the source code for cache_t

In the cache_t source we find buckets(). This is similar to the methods() provided by class_data_bits_t: in both cases the data is obtained through a method.

Continue checking buckets

The source code is as follows

struct bucket_t {
private:
    // IMP-first is better for arm64e ptrauth and no worse for arm64.
    // SEL-first is better for armv7* and i386 and x86_64.
#if __arm64__
    explicit_atomic<uintptr_t> _imp;
    explicit_atomic<SEL> _sel;
#else
    explicit_atomic<SEL> _sel;
    explicit_atomic<uintptr_t> _imp;
#endif
    .... // the methods that follow are omitted
};

It’s pretty obvious:

  • bucket_t distinguishes real devices from other architectures, but the member variables are the same, _sel and _imp; only their order differs
  • bucket_t stores _sel and _imp, so the cache should be what holds methods

Flowchart for cache_t

classDiagram

objc_class --> cache_t
cache_t --> bucket_t
objc_class : Class isa
objc_class : Class superclass
objc_class : cache_t cache
objc_class : class_data_bits_t bits
class cache_t{
uintptr_t _bucketsAndMaybeMask
mask_t _maybeMask
uint16_t _flags
uint16_t _occupied
capacity()
buckets()
occupied()
incrementOccupied()
setBucketsAndMask()
reallocate()
insert()
}
class bucket_t{
_sel
_imp
}

On a real device (arm64), the mask and the buckets pointer share _bucketsAndMaybeMask (maskAndBuckets), with maskShift = 48, maskZeroBits = 4, maxMask = ((uintptr_t)1 << (64 - maskShift)) - 1 and bucketsMask = ((uintptr_t)1 << (maskShift - maskZeroBits)) - 1, and bucket_t stores _imp before _sel. On other architectures bucket_t stores _sel before _imp.

Verifying with code

Create the LGPerson class with some custom instance methods, instantiate an LGPerson object in main, and then debug with LLDB

#import <Foundation/Foundation.h>

NS_ASSUME_NONNULL_BEGIN

@interface LGPerson : NSObject

@property (nonatomic, copy) NSString *name;
@property (nonatomic) int age;
@property (nonatomic, strong) NSString *hobby;

- (void)saySomething;

@end

NS_ASSUME_NONNULL_END

#import "LGPerson.h"

@implementation LGPerson

- (instancetype)init{
    if (self = [super init]) {
        self.name = @"Cooci";
    }
    return self;
}

- (void)saySomething{
    NSLog(@"%s",__func__);
}

@end
int main(int argc, const char * argv[]) {
    @autoreleasepool {

        
        Class pClass = [LGPerson class];
        NSLog(@"%@",pClass);
    }
    return 0;
}

Run the code and debug it through LLDB

Note: the LGPerson object has not called any instance methods yet, so buckets holds no cached method data

Continue LLDB debugging by calling object methods in LLDB

Looking at buckets() again, the saySomething method has been added

Conclusion:

  • After saySomething is called, _maybeMask and _occupied are set, so these two variables should be related to the cache
  • The bucket_t structure provides the sel() and imp(nil, pClass) methods
  • The sel and imp of saySomething are stored in a bucket, and the bucket is stored in the cache

Analyze the cache outside of the source environment

Having analyzed the cache source above, we can mirror its structures in our own code as follows

#import <Foundation/Foundation.h>
#import "LGPerson.h"
#import <objc/runtime.h>

typedef uint32_t mask_t;  // x86_64 & arm64 asm are less efficient with 16-bits

struct kc_bucket_t {
    SEL _sel;
    IMP _imp;
};

struct kc_cache_t {
    struct kc_bucket_t *_bukets; // 8
    mask_t    _maybeMask;        // 4
    uint16_t  _flags;            // 2
    uint16_t  _occupied;         // 2
};

struct kc_class_data_bits_t {
    uintptr_t bits;
};

// cache class
struct kc_objc_class {
    Class isa;
    Class superclass;
    struct kc_cache_t cache;             // formerly cache pointer and vtable
    struct kc_class_data_bits_t bits;
};

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        LGPerson *p = [LGPerson alloc];
        Class pClass = p.class;  // objc_class
        [p say1];
        [p say2];
        [p say3];
        [p say4];
        [p say1];
        [p say2];
        // [p say3];

        [pClass sayHappy];
        struct kc_objc_class *kc_class = (__bridge struct kc_objc_class *)(pClass);
        NSLog(@"%hu - %u",kc_class->cache._occupied,kc_class->cache._maybeMask);
        // 0 - 8136976 count
        // 1 - 3
        // 1: the source code cannot be debugged
        // 2: LLDB
        // 3: small sample
        // low-level principle
        // a: 1-3 -> 1-7
        // b: (null) - 0x0
        // c: 2-7 + say4 - 0xb850 + no class method
        // d: NSObject

        // walk the buckets and print each cached sel and imp
        for (mask_t i = 0; i<kc_class->cache._maybeMask; i++){
            struct kc_bucket_t bucket = kc_class->cache._bukets[i];
            NSLog(@"%@ - %pf",NSStringFromSelector(bucket._sel),bucket._imp);
        }
        NSLog(@"Hello, World!");
    }
    return 0;
}

Calling only [p say1] and [p say2]:

2021-07-14 16:16:30.205454+0800 003-cache_t[19959:7458549] LGPerson say: -[LGPerson say1]
2021-07-14 16:16:30.205773+0800 003-cache_t[19959:7458549] -[LGPerson say2]
2021-07-14 16:16:30.205810+0800 003-cache_t[19959:7458549] +[LGPerson sayHappy]
2021-07-14 16:16:30.205830+0800 003-cache_t[19959:7458549] cache._occupied 2 - cache._maybeMask 3
2021-07-14 16:16:30.205903+0800 003-cache_t[19959:7458549] _sel say1 - _imp 0xb850f
2021-07-14 16:16:30.205937+0800 003-cache_t[19959:7458549] _sel say2 - _imp 0xb820f
2021-07-14 16:16:30.205965+0800 003-cache_t[19959:7458549] _sel (null) - _imp 0x0f
2021-07-14 16:16:30.205992+0800 003-cache_t[19959:7458549] Hello, World!

Now also calling say3 and say4:

2021-07-14 16:18:31.427671+0800 003-cache_t[19990:7460225] -[LGPerson say1]
2021-07-14 16:18:31.428242+0800 003-cache_t[19990:7460225] -[LGPerson say2]
2021-07-14 16:18:31.428296+0800 003-cache_t[19990:7460225] -[LGPerson say3]
2021-07-14 16:18:31.428329+0800 003-cache_t[19990:7460225] -[LGPerson say4]
2021-07-14 16:18:31.428366+0800 003-cache_t[19990:7460225] +[LGPerson sayHappy]
2021-07-14 16:18:31.428393+0800 003-cache_t[19990:7460225] cache._occupied 2 - cache._maybeMask 7
2021-07-14 16:18:31.428497+0800 003-cache_t[19990:7460225] _sel say4 - _imp 0xb830f
2021-07-14 16:18:31.428547+0800 003-cache_t[19990:7460225] _sel (null) - _imp 0x0f
2021-07-14 16:18:31.428600+0800 003-cache_t[19990:7460225] _sel say3 - _imp 0xb800f
2021-07-14 16:18:31.428636+0800 003-cache_t[19990:7460225] _sel (null) - _imp 0x0f
2021-07-14 16:18:31.428670+0800 003-cache_t[19990:7460225] _sel (null) - _imp 0x0f
2021-07-14 16:18:31.434070+0800 003-cache_t[19990:7460225] _sel (null) - _imp 0x0f
2021-07-14 16:18:31.434127+0800 003-cache_t[19990:7460225] _sel (null) - _imp 0x0f
2021-07-14 16:18:31.434160+0800 003-cache_t[19990:7460225] Hello, World!

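Three things stand out: say1 and say2 are no longer in the cache, say4 is stored ahead of say3 even though it was called later, and _maybeMask has grown from 3 to 7 while _occupied is still 2. Explaining this requires looking at how cache_t inserts entries.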

Analysis of important cache_t methods

The entry point for caching a method in cache_t is insert(SEL sel, IMP imp, id receiver): its parameters include sel and imp, and the method name itself is insert. Let's look at the implementation

Overall process:

  • occupied() gets the current occupancy, that is, how many buckets in the cache are in use
  • newOccupied = occupied() + 1
  • oldCapacity is kept so that the old memory can be freed when the cache is expanded
  • Only the first time a method is cached, the capacity takes its default, capacity = INIT_CACHE_SIZE, that is, capacity = 4: memory for four buckets
  • reallocate(oldCapacity, capacity, /* freeOld */false) allocates the memory; the freeOld argument controls whether the old memory is freed

Exploring the reallocate method

What the reallocate method does:

  • 1. allocateBuckets allocates the memory
  • 2. setBucketsAndMask sets the values of mask and buckets
  • 3. collect_free frees the old memory, depending on freeOld
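
The three steps can be sketched in plain C. This is a minimal illustration under stated assumptions, not the runtime's implementation: sketch_bucket, sketch_cache and sketch_reallocate are hypothetical names, the old memory is freed immediately with free (the real collect_free may defer this), and resetting occupied to 0 reflects the logged output, where the count starts over after an expansion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for bucket_t and cache_t; not the runtime's types. */
typedef struct { uintptr_t sel; uintptr_t imp; } sketch_bucket;

typedef struct {
    sketch_bucket *buckets;
    uint32_t mask;      /* capacity - 1 */
    uint16_t occupied;  /* number of cached methods */
} sketch_cache;

/* Steps 1-3 in order: allocate, store buckets and mask, optionally free. */
static void sketch_reallocate(sketch_cache *c, uint32_t new_capacity, bool free_old)
{
    /* capacities are assumed to be non-zero powers of two */
    assert(new_capacity != 0 && (new_capacity & (new_capacity - 1)) == 0);

    sketch_bucket *old_buckets = c->buckets;

    /* 1. allocate zeroed memory for the new buckets */
    sketch_bucket *new_buckets = calloc(new_capacity, sizeof(sketch_bucket));
    if (new_buckets == NULL) {
        abort();
    }

    /* 2. store the new buckets and mask; the new table starts out empty */
    c->buckets = new_buckets;
    c->mask = new_capacity - 1;
    c->occupied = 0;

    /* 3. release the old memory only when the caller asks for it */
    if (free_old) {
        free(old_buckets);
    }
}
```

Calling sketch_reallocate(&c, 4, false) on an empty cache models the first allocation; a later sketch_reallocate(&c, 8, true) models an expansion, after which none of the old entries are reachable.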

allocateBuckets: the method that allocates memory

The allocateBuckets method does two things:

  • calloc(bytesForCapacity(newCapacity), 1) allocates memory of size newCapacity * sizeof(bucket_t)
  • end->set writes into the last slot of the allocated memory: sel = 1, imp = the address of the first bucket
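
As a rough C sketch of these two steps (sketch_bucket and sketch_allocate_buckets are hypothetical names, and plain integers stand in for SEL and IMP):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for bucket_t. */
typedef struct { uintptr_t sel; uintptr_t imp; } sketch_bucket;

/* Allocate `capacity` zeroed buckets and turn the last one into an end marker.
   Returns NULL if the allocation fails; the caller owns the memory. */
static sketch_bucket *sketch_allocate_buckets(uint32_t capacity)
{
    assert(capacity > 0);

    /* calloc zero-fills, so every slot starts out empty (sel == 0) */
    sketch_bucket *buckets = calloc(capacity, sizeof(sketch_bucket));
    if (buckets == NULL) {
        return NULL;
    }

    /* the end marker lives in the last slot of the allocation */
    sketch_bucket *end = &buckets[capacity - 1];
    end->sel = 1;                   /* sel = 1 marks the end */
    end->imp = (uintptr_t)buckets;  /* imp = address of the first bucket */
    return buckets;
}
```

With a capacity of 4 this leaves slots 0 to 2 empty and slot 3 holding the marker, which is why only three methods fit before the table has to grow.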

setBucketsAndMask method

The source code has three implementations of setBucketsAndMask, one per architecture family, but in every case the job is the same: write the data into _bucketsAndMaybeMask and _maybeMask
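
For the real-device layout, where the mask and the buckets pointer share one 64-bit word, the packing can be sketched as follows. The function names are hypothetical, the shift of 48 is the maskShift value quoted in the class diagram earlier, and the sketch ignores maskZeroBits, so treat it as an illustration of the idea only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: mask in the top 16 bits, buckets address in the low 48. */
#define SKETCH_MASK_SHIFT 48

/* Combine a buckets address and a mask into a single 64-bit word. */
static uint64_t sketch_pack(uint64_t buckets_addr, uint16_t mask)
{
    /* the address must not spill into the bits reserved for the mask */
    assert((buckets_addr >> SKETCH_MASK_SHIFT) == 0);
    return ((uint64_t)mask << SKETCH_MASK_SHIFT) | buckets_addr;
}

/* Recover the mask from the top 16 bits. */
static uint16_t sketch_mask(uint64_t packed)
{
    return (uint16_t)(packed >> SKETCH_MASK_SHIFT);
}

/* Recover the buckets address from the low 48 bits. */
static uint64_t sketch_buckets_addr(uint64_t packed)
{
    return packed & (((uint64_t)1 << SKETCH_MASK_SHIFT) - 1);
}
```

On architectures that keep the mask in its own field, no packing is needed: the pointer goes into _bucketsAndMaybeMask and the mask into _maybeMask.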

collect_free

Its main job is to clear out the old data and reclaim the memory

Conclusion:

insert first works out the occupancy the cache would have after this insertion, then settles the capacity. If the required occupancy does not exceed 3/4 of the total capacity, the method goes straight to the caching step. If it exceeds 3/4, the capacity is doubled, capped at MAX_CACHE_SIZE (1 << 16 in the objc4 source). Expansion does one important thing: it allocates new memory and frees the old memory (freeOld = true at this point), so the methods cached before the expansion are discarded. Only then does the caching step begin.

Caching the method

buckets() is neither an array nor a linked list, but a contiguous block of memory used as a hash table. The hash function computes the starting index from sel and mask. Why is the mask needed? The mask is capacity - 1, and its practical effect is to tell the system that only the first capacity - 1 slots may be used: with capacity = 4, methods can only be stored in the first three slots. Caching then proceeds as follows. If the slot at the computed index holds no data, the method is stored there. If the slot already holds the same sel, the method is already cached and insert returns. If there is a hash collision (same index, different sel), the index is computed again and the search continues until the conflict is resolved.

The flow chart of the insert