Welcome to the iOS Basics series (suggested in order)

iOS low-level – Alloc and init exploration

iOS low-level – Isa for everything

iOS low-level – Analysis of the nature of classes

iOS low-level – cache_t process analysis

iOS low-level – Method lookup process analysis

iOS low-level – Analysis of the message forwarding process

iOS low-level – How dyld loads an app

iOS low-level – Class load analysis

iOS low-level – Load analysis of categories

1. Overview of this article

This article analyzes how class structures are loaded and how class data is processed while dyld initializes the main program. All of this still belongs to the flow that runs before the main() function.

2. Class loading exploration

2.1 Finding an entry point

While the main program is being prepared for initialization, libobjc's _objc_init() is called to initialize all the class structures in the project. _objc_init() is therefore the entry point for this analysis.

2.2 _objc_init() analysis

Search for _objc_init( directly in the libobjc source.

Let's take a look at its header comment to understand what it does:

* Bootstrap initialization. Registers our image notifier with dyld.
* Called by libSystem BEFORE library initialization time

Go to the implementation and see how it works:

① If initialization has already been performed, return immediately. This ensures that initialization happens only once.
② Read the environment variables that affect the runtime, and print the environment variable help if requested.

Internally, the environment variables are read by string matching. The variables handled here essentially all start with OBJC_, in contrast to dyld's own environment variables, which start with DYLD_.

Run export OBJC_HELP=1 in a terminal such as iTerm2 to view the environment variables the runtime provides. OBJC_PRINT_LOAD_METHODS is the most commonly used: it prints every implemented +load method, and developers can use that output to remove unnecessary +load methods and improve startup speed.
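The prefix check and string matching described above can be sketched roughly as follows. This is a minimal illustration, not libobjc's implementation: RuntimeOption and applyEnvEntry are hypothetical names, and the rule that only the value YES enables an option is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical table entry: an environment variable name and the flag it controls.
struct RuntimeOption {
    const char *name;   // e.g. "OBJC_PRINT_LOAD_METHODS"
    bool enabled;
};

// Applies one "NAME=value" entry to the option table.
// Entries that do not start with "OBJC_" are rejected before any table lookup.
// Returns true if the entry matched a known option.
bool applyEnvEntry(const char *entry, RuntimeOption *opts, int count) {
    assert(entry != nullptr && opts != nullptr);
    if (std::strncmp(entry, "OBJC_", 5) != 0) return false;
    const char *eq = std::strchr(entry, '=');
    if (eq == nullptr) return false;
    const std::string key(entry, static_cast<std::size_t>(eq - entry));
    for (int i = 0; i < count; i++) {
        if (key == opts[i].name) {
            // Assumption of this sketch: only the literal value "YES" enables an option.
            opts[i].enabled = (std::strcmp(eq + 1, "YES") == 0);
            return true;
        }
    }
    return false;
}
```

Feeding it each entry of the process environment in a loop would give the one-pass scan the text describes.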

③ Set up objc's predefined thread-specific keys and their destructors, which store objc's private data.
④ Run the C++ static constructors.

Internally, getLibobjcInitializers reads __objc_init_func from the Mach-O file. Because libobjc's _objc_init() runs before dyld calls the static constructors, they have to be called manually here. This step therefore runs only the C++ static constructors of the system library itself; constructors written by the developer are called later.
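Calling static constructors "manually" amounts to walking an array of function pointers. A minimal sketch, assuming the array has already been located (in the runtime it would come from a Mach-O section); Initializer and runInitializers are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>

// Shape of a static initializer: no arguments, no result.
using Initializer = void (*)();

// Calls every initializer in order and returns how many were called.
// The caller supplies the array directly in this sketch.
std::size_t runInitializers(const Initializer *inits, std::size_t count) {
    std::size_t called = 0;
    for (std::size_t i = 0; i < count; i++) {
        assert(inits[i] != nullptr);
        inits[i]();
        called++;
    }
    return called;
}
```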

⑤ No operation is performed.
void lock_init(void)
{
}

The body is empty. libobjc is written in C and C++, which bring their own locking mechanisms, and those mechanisms also apply to Objective-C. Since the defaults are used, nothing needs to be done here; the call may be kept only for readability.

⑥ Initialize the exception handling system.

Internally this relies on @try/@catch so that exceptions raised while the program runs can be reported in detail.

⑦ Register callbacks with dyld: while dyld initializes the main program, it calls back through these function pointers so that libobjc can handle images being mapped, loaded, and unmapped.

This is the core of _objc_init(); the first six steps are just preparation.
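The callback pattern behind step ⑦ can be sketched with a toy loader. Everything here is hypothetical (ToyLoader, the callback types, and the order in which callbacks fire); dyld's real callbacks receive image paths and mach_header pointers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical callback shapes for this sketch.
using MappedCallback   = void (*)(unsigned count);
using InitCallback     = void (*)(const std::string &path);
using UnmappedCallback = void (*)(const std::string &path);

// A toy loader that stores three callbacks and fires them at the
// matching points of an image's lifetime.
class ToyLoader {
public:
    void registerCallbacks(MappedCallback m, InitCallback i, UnmappedCallback u) {
        mapped_ = m;
        init_ = i;
        unmapped_ = u;
    }
    void loadImages(const std::vector<std::string> &paths) {
        assert(mapped_ != nullptr && init_ != nullptr);
        mapped_(static_cast<unsigned>(paths.size()));   // report the whole batch as mapped
        for (const std::string &p : paths) init_(p);    // then initialize each image
    }
    void unloadImage(const std::string &path) {
        assert(unmapped_ != nullptr);
        unmapped_(path);
    }
private:
    MappedCallback mapped_ = nullptr;
    InitCallback init_ = nullptr;
    UnmappedCallback unmapped_ = nullptr;
};
```

The point of the design is that the loader knows nothing about classes: the runtime hands over three functions once, and from then on it is driven by the loader's events.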

Let's see what libobjc does in the map, load, and unmap callbacks that it registers with dyld.

2.3 map_images() analysis

void
map_images(unsigned count, const char * const paths[],
           const struct mach_header * const mhdrs[])
{
    mutex_locker_t lock(runtimeLock);
    return map_images_nolock(count, paths, mhdrs);
}

map_images() takes runtimeLock and then calls map_images_nolock(). Its parameters are passed in by dyld: count is the number of image files, paths holds the file paths, and mhdrs holds the Mach-O header information (mach_header).

...
while (i--) {
    const headerType *mhdr = (const headerType *)mhdrs[i];
    auto hi = addHeader(mhdr, mhPaths[i], totalClasses, unoptimizedTotalClasses);
    if (!hi) {
        continue;
    }
    if (mhdr->filetype == MH_EXECUTE) {
#if __OBJC2__
        size_t count;
        _getObjc2SelectorRefs(hi, &count);
        selrefCount += count;
        _getObjc2MessageRefs(hi, &count);
        selrefCount += count;
...

map_images_nolock() iterates over the Mach-O headers. When a header's file type is MH_EXECUTE, it reads __objc_selrefs and __objc_msgrefs from the Mach-O file and counts their entries, in preparation for registering selectors.

...
if (firstTime) {
    sel_init(selrefCount);
    arr_init();
...

A one-time runtime initialization is then performed, which must be deferred until the executable itself is found. This initialization includes:

① Register some system method selectors

#define s(x) SEL_##x = sel_registerNameNoLock(#x, NO)
#define t(x,y) SEL_##y = sel_registerNameNoLock(#x, NO)

    s(load);
    s(initialize);
    t(resolveInstanceMethod:, resolveInstanceMethod);
    t(resolveClassMethod:, resolveClassMethod);
    t(.cxx_construct, cxx_construct);
    t(.cxx_destruct, cxx_destruct);
    s(retain);
    s(release);
    s(autorelease);
    s(retainCount);
    s(alloc);
    t(allocWithZone:, allocWithZone);
    s(dealloc);
    s(copy);
    s(new);
    t(forwardInvocation:, forwardInvocation);
    t(_tryRetain, tryRetain);
    t(_isDeallocating, isDeallocating);
    s(retainWeakReference);
    s(allowsWeakReference);

Many of these methods look familiar. So why only these?

Because the runtime itself uses this set of selectors internally, they must be registered in advance. Other selectors are registered later, when the class structures are initialized.

② Autorelease pool initialization and global hash table initialization

void arr_init(void) 
{
    AutoreleasePoolPage::init();
    SideTableInit();
}

The hash tables store the weak reference table, the reference count table, and so on.

Finally, map_images_nolock() calls _read_images() to start reading class information from the Mach-O file and initializing it.

...
if (hCount > 0) {
    _read_images(hList, hCount, totalClasses, unoptimizedTotalClasses);
}

2.4 _read_images() analysis

The purpose of _read_images() is to read class information from the Mach-O file and initialize it, and that information has to be stored in some container.

int namedClassesSize = 
    (isPreoptimized() ? unoptimizedTotalClasses : totalClasses) * 4 / 3;
gdb_objc_realized_classes =
     NXCreateMapTable(NXStrValueMapPrototype, namedClassesSize);
        
allocatedClasses = NXCreateHashTable(NXPtrPrototype, 0, nil);

So, right at the start, libobjc creates two tables, gdb_objc_realized_classes and allocatedClasses, in preparation.
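The sizing expression in the snippet above can be isolated into a small function to see the numbers it produces. namedClassesCapacity is a hypothetical name; the arithmetic is copied from the snippet. A capacity of 4/3 times the class count means the table is about three quarters full once every counted class is inserted, which is presumably the intent.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors the expression from the snippet: choose which count applies,
// then scale it by 4/3 using integer arithmetic.
std::size_t namedClassesCapacity(std::size_t totalClasses,
                                 std::size_t unoptimizedTotalClasses,
                                 bool preoptimized) {
    // The unoptimized count is a subset of the total count.
    assert(unoptimizedTotalClasses <= totalClasses);
    const std::size_t counted = preoptimized ? unoptimizedTotalClasses : totalClasses;
    return counted * 4 / 3;
}
```

For example, 300 classes give a capacity of 400, so 300 entries fill 75% of the table.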

Then why are two tables needed?

  • gdb_objc_realized_classes: named classes that are not in the dyld shared cache, whether realized or not. It expands dynamically with the number of classes.

  • allocatedClasses: all allocated classes (and metaclasses).

This is easy to understand. The runtime relies on these tables for later processing: the large table stores all the original data, and the small table stores the data to be initialized, which improves lookup efficiency.

With the tables created, reading the data begins.

Viewed as a whole, this method is written in an interesting, almost ritualistic way.

It handles each kind of data in turn using similar logic, and each stage ends with the corresponding log output.

Now that the main flow is clear, let's look at some of the more important stages:

① Class processing

for (EACH_HEADER) {
    classref_t *classlist = _getObjc2ClassList(hi, &count);
    if (!mustReadClasses(hi)) {
        continue;
    }
    bool headerIsBundle = hi->isBundle();
    bool headerIsPreoptimized = hi->isPreoptimized();
    for (i = 0; i < count; i++) {
        Class cls = (Class)classlist[i];
        Class newCls = readClass(cls, headerIsBundle, headerIsPreoptimized);
        ...
    }
}

GETSECT(_getObjc2ClassList,           classref_t,      "__objc_classlist");

Jumping to the definition of _getObjc2ClassList leads to this macro, which means:

Class information is read from the __objc_classlist section of the Mach-O file (the other stages work the same way, only the section differs), and readClass() is called on each class that was read.

At first glance, readClass() contains a lot of ro- and rw-related code, so it is tempting to conclude that this is where rw is set up, and many articles say so. Reading too hastily here means overlooking a detail.

That rw code only runs when popFutureNamedClass() returns a class, and in general this branch is not executed. So readClass() mainly does two things:

  • addNamedClass()
  • addClassTableEntry()

addNamedClass() and addClassTableEntry() are similar: they insert the classes and metaclasses that were read into the tables created earlier (addNamedClass() into gdb_objc_realized_classes, addClassTableEntry() into allocatedClasses).

The class processing at this stage only adds the classes to the tables, which loads them into memory; the classes are not yet realized.
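The two registration steps can be sketched with standard containers standing in for the runtime's tables. ToyClass and ClassTables are hypothetical, and the function bodies are simplified stand-ins that reuse the runtime's function names only for readability.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Toy class record: a name plus a pointer to its metaclass (may be null).
struct ToyClass {
    std::string name;
    ToyClass *meta;
};

// Stand-ins for the two tables: a name -> class map and a set of known pointers.
struct ClassTables {
    std::unordered_map<std::string, ToyClass *> named;
    std::unordered_set<ToyClass *> allocated;
};

// Records the class under its name so it can later be found by string.
void addNamedClass(ClassTables &t, ToyClass *cls) {
    assert(cls != nullptr);
    t.named[cls->name] = cls;
}

// Records the class pointer and, if present, its metaclass pointer.
void addClassTableEntry(ClassTables &t, ToyClass *cls) {
    assert(cls != nullptr);
    t.allocated.insert(cls);
    if (cls->meta != nullptr) t.allocated.insert(cls->meta);
}

// Only registration happens here: no method lists are touched,
// so the class is known to the tables but not realized.
ToyClass *readClass(ClassTables &t, ToyClass *cls) {
    addNamedClass(t, cls);
    addClassTableEntry(t, cls);
    return cls;
}
```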

② Selector processing

static size_t UnfixedSelectors;
{
    mutex_locker_t lock(selLock);
    for (EACH_HEADER) {
        if (hi->isPreoptimized()) continue;
        bool isBundle = hi->isBundle();
        SEL *sels = _getObjc2SelectorRefs(hi, &count);
        UnfixedSelectors += count;
        for (i = 0; i < count; i++) {
            const char *name = sel_cname(sels[i]);
            sels[i] = sel_registerNameNoLock(name, isBundle);
        }
    }
}

Selector information is read from the __objc_selrefs section of the Mach-O file, and sel_registerNameNoLock() is called on each selector that was read to insert it into the selector hash table. This mirrors how the runtime registered its own selectors earlier; the remaining selectors are registered here.
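Registering a selector in a hash table is essentially string interning: every occurrence of the same name ends up pointing at one shared copy. A minimal sketch of that idea, with SelectorTable as a hypothetical name and std::unordered_set standing in for the runtime's own table:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// A toy selector table. Registering a name returns a pointer to the single
// stored copy of that string, so equal names yield equal pointers and
// selectors can be compared by address.
class SelectorTable {
public:
    const char *registerName(const std::string &name) {
        assert(!name.empty());
        // insert() hands back the existing element if the name is already stored.
        auto result = names_.insert(name);
        return result.first->c_str();
    }
    std::size_t size() const { return names_.size(); }
private:
    // unordered_set does not move its elements on rehash, so the returned
    // pointers stay valid as more names are added.
    std::unordered_set<std::string> names_;
};
```

This is why the loop above writes the result back into sels[i]: after fix-up, each reference holds the canonical pointer for its name.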

③ Realizing non-lazy classes

for (EACH_HEADER) {
    classref_t *classlist = _getObjc2NonlazyClassList(hi, &count);
    for (i = 0; i < count; i++) {
        Class cls = remapClass(classlist[i]);
        if (!cls) continue;
        ...
        addClassTableEntry(cls);
        ...
        realizeClassWithoutSwift(cls);
    }
}

The non-lazy classes are first read from the __objc_nlclslist section of the Mach-O file. For each of them, remapClass() and addClassTableEntry() are called to make sure the class has been added to the corresponding table, and then realizeClassWithoutSwift() is called to realize the class.

Note that this whole stage only deals with non-lazy classes; lazy classes are not realized here.

So why are only non-lazy classes realized?

The reason is this: the current flow runs before main(), and a project typically contains tens of thousands of classes. Most of them are first used some time after launch, and some are never used at all. It is therefore both reasonable and necessary for Apple to realize only the classes that must be ready before launch.
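The split between startup work and first-use work can be sketched as follows. ToyClass, its nonLazy flag, and the three functions are hypothetical; the sketch only models when the realized flag flips, not what realization does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy class record carrying only what this sketch needs.
struct ToyClass {
    std::string name;
    bool nonLazy;    // listed for realization at startup
    bool realized;
};

// Marks a class as realized; returns true only if this call did the work.
bool realize(ToyClass &cls) {
    if (cls.realized) return false;
    cls.realized = true;
    return true;
}

// Startup pass: realize only the classes flagged as non-lazy.
// Returns how many classes were realized.
int realizeNonLazyClasses(std::vector<ToyClass> &classes) {
    int count = 0;
    for (ToyClass &cls : classes) {
        if (cls.nonLazy && realize(cls)) count++;
    }
    return count;
}

// First-use path: a lazy class is realized when it is first looked up.
// Returns nullptr if no class has the given name.
ToyClass *lookUpForFirstUse(std::vector<ToyClass> &classes, const std::string &name) {
    assert(!name.empty());
    for (ToyClass &cls : classes) {
        if (cls.name == name) {
            realize(cls);
            return &cls;
        }
    }
    return nullptr;
}
```

The startup cost is proportional to the number of non-lazy classes rather than to the total, which is the saving the paragraph above argues for.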

④ Category processing

for (EACH_HEADER) {
    category_t **catlist = _getObjc2CategoryList(hi, &count);
    bool hasClassProperties = hi->info()->hasCategoryClassProperties();
    for (i = 0; i < count; i++) {
        category_t *cat = catlist[i];
        Class cls = remapClass(cat->cls);
        if (!cls) {
            catlist[i] = nil;
            ...
            continue;
        }
        bool classExists = NO;
        if (cat->instanceMethods || cat->protocols || cat->instanceProperties) {
            addUnattachedCategoryForClass(cat, cls, hi);
            if (cls->isRealized()) {
                remethodizeClass(cls);
                classExists = YES;
            }
            ...
        }
        if (cat->classMethods || cat->protocols || (hasClassProperties && cat->_classProperties)) {
            addUnattachedCategoryForClass(cat, cls->ISA(), hi);
            if (cls->ISA()->isRealized()) {
                remethodizeClass(cls->ISA());
            }
            ...
        }
    }
}

The categories are first read from the __objc_catlist section of the Mach-O file. For each category whose class is already realized, remethodizeClass() is called, which internally calls attachCategories() to attach the category's methods, protocols, and properties to the class.
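The "store first, attach once the class is realized" logic in the loop above can be sketched like this. ToyClass, PendingCategories, and the function bodies are hypothetical simplifications; a category is reduced to a list of method names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct ToyClass {
    std::string name;
    bool realized;
    std::vector<std::string> methods;   // the front of the vector is searched first
};

// Categories waiting for their class, keyed by class pointer.
// Each category is modeled as a plain list of method names.
using PendingCategories =
    std::map<ToyClass *, std::vector<std::vector<std::string>>>;

// Moves every pending category's methods to the front of the class's list.
void attachPending(PendingCategories &pending, ToyClass &cls) {
    auto it = pending.find(&cls);
    if (it == pending.end()) return;
    for (const std::vector<std::string> &catMethods : it->second) {
        cls.methods.insert(cls.methods.begin(), catMethods.begin(), catMethods.end());
    }
    pending.erase(it);
}

// Called when a category is discovered: always store it first, then attach
// immediately only if the class is already realized.
void addCategory(PendingCategories &pending, ToyClass &cls,
                 const std::vector<std::string> &catMethods) {
    pending[&cls].push_back(catMethods);
    if (cls.realized) attachPending(pending, cls);
}

// Realizing a class picks up whatever categories were stored for it.
void realize(PendingCategories &pending, ToyClass &cls) {
    assert(!cls.realized);
    cls.realized = true;
    attachPending(pending, cls);
}
```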

Section 2.6 covers part of this, and the next article analyzes categories further (including lazy and non-lazy loading). Here we first analyze the key function realizeClassWithoutSwift().

2.5 realizeClassWithoutSwift() analysis

As its name suggests, realizeClassWithoutSwift() realizes a class without any Swift-side handling. Is there a Swift counterpart? A quick search turns up _objc_realizeClassFromSwift().

Look at its implementation:

It still essentially calls realizeClassWithoutSwift(), so realizeClassWithoutSwift() is the function to analyze whether or not Swift is involved.

Let's see what it does:

Performs first-time initialization on class cls, including allocating its read-write data. Returns the real class structure for the class.

Then look at how it works:

① In most cases the "Normal class" branch is executed. This is where rw is actually created, but note that at this point rw is only given ro and flags; methods, protocols, and so on are not assigned yet.

② Recursively realize the superclass and the metaclass. The recursion stops when cls is nil, that is, after going all the way up to NSObject.

③ Assign the superclass, isa, cache_t, bits, and other fields of the cls structure. This makes sense: realizing a class certainly involves setting up its internal fields.

④ If supercls exists, add cls to supercls as a subclass; otherwise, set cls as a root class. Together, ③ and ④ link cls and supercls in both directions, much like a doubly linked list.

⑤ methodizeClass() fixes up cls's method list, protocol list, and property list, and attaches any outstanding categories.
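The recursion and the two-way linking in the steps above can be sketched as follows. ToyClass and realizeClass are hypothetical, and setting the realized flag before recursing is a choice of this sketch so that cyclic links (a root metaclass pointing at itself) terminate.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyClass {
    std::string name;
    ToyClass *superclass;                // null for a root class
    ToyClass *meta;                      // null where this sketch stops modeling
    bool realized;
    std::vector<ToyClass *> subclasses;  // filled in as subclasses are realized
};

// Realizes cls after first realizing its superclass and metaclass.
// Returns the number of classes realized by this call (0 if already done).
int realizeClass(ToyClass *cls) {
    if (cls == nullptr) return 0;        // recursion exit
    if (cls->realized) return 0;         // each class is realized only once
    cls->realized = true;                // set early so cyclic links terminate

    int count = 1;
    count += realizeClass(cls->superclass);
    count += realizeClass(cls->meta);

    // Downward link: the superclass learns about its new subclass.
    // The upward link (cls->superclass) already exists in this sketch.
    if (cls->superclass != nullptr) {
        assert(cls->superclass->realized);
        cls->superclass->subclasses.push_back(cls);
    }
    return count;
}
```

Realizing one leaf class is enough to pull in its whole ancestry, which is why the runtime never needs a separate pass to realize superclasses.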

2.6 methodizeClass() analysis

Now for methodizeClass().

It does two things, keeping the order of class first and categories second (in some cases ro already contains the category data, which the next article analyzes):

  • It calls attachLists() in turn to attach the class's method list, protocol list, and property list to rw.
  • It calls unattachedCategoriesForClass() to get the list of unattached categories, then calls attachCategories(), which likewise calls attachLists() in turn to attach the categories' method lists, protocol lists, and property lists to rw.

Now for attachCategories():

static void attachCategories(Class cls, category_list *cats, bool flush_caches)
{
    if (!cats) return;
    ...
    auto rw = cls->data();
    prepareMethodLists(cls, mlists, mcount, NO, fromBundle);
    rw->methods.attachLists(mlists, mcount);
    free(mlists);
    if (flush_caches && mcount > 0) flushCaches(cls);
    rw->properties.attachLists(proplists, propcount);
    free(proplists);
    rw->protocols.attachLists(protolists, protocount);
    free(protolists);
}

Internally it makes the same attachLists() calls.

Interestingly, method lists, protocol lists, and property lists can all call attachLists(), which indicates that their underlying structures are similar.

Now for attachLists().

This is how the data is handled:

  • Many lists to many lists: the existing lists are moved back as a whole, and the new lists are inserted at the front.
  • No list to one list: the new list is stored directly as the only list.
  • One list to many lists: the existing list is moved back, and the new lists are placed in front of it.

In short, new data always ends up in front of old data. This explains why, when a category and its class implement the same method, the category's method appears to "override" the class's method (+load excepted).
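The "new lists go in front" rule and its effect on lookup can be sketched in a few lines. This collapses the separate cases into a single vector insert, which is a simplification; Method, MethodArray, and lookUp are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Method {
    std::string selector;
    std::string owner;    // who supplied the implementation
};
using MethodList = std::vector<Method>;

struct MethodArray {
    std::vector<MethodList> lists;   // lists[0] is searched first

    // Inserts the added lists in front of the existing ones; the existing
    // lists shift back as a block and keep their relative order.
    void attachLists(const std::vector<MethodList> &added) {
        const std::size_t oldCount = lists.size();
        lists.insert(lists.begin(), added.begin(), added.end());
        assert(lists.size() == oldCount + added.size());
    }

    // Returns the owner of the first method matching the selector,
    // or an empty string if no list contains it.
    std::string lookUp(const std::string &selector) const {
        for (const MethodList &list : lists) {
            for (const Method &m : list) {
                if (m.selector == selector) return m.owner;
            }
        }
        return "";
    }
};
```

Nothing is deleted when the category's list is attached: the class's own implementation is still in the array, just further back, so a front-to-back search never reaches it.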

At this point, the map_images() work that _objc_init() registered with dyld is finished: the class structures are loaded and the class data is processed. Note, however, that this only applies to non-lazy classes.

3. Related questions and interview questions

1. If a category implements the same method as its class, is the class's method overwritten?

No. The category's method and the class's method both exist, but the category's method is stored ahead of the class's, so method lookup finds the category's method first. This creates the illusion that the category has overwritten the class's method.

2. Which is added to rw first, the class's data or the category's data?

The class first, then the category. Category data must be attached to rw to take effect, so the class has to be loaded first so that rw exists. Even when a category is processed first, it is only stored, and it is attached after the class has been processed.

3. Why is rw needed when ro already exists?

Because Objective-C is dynamic: besides the data determined at compile time, data can also be added at run time.

4. The relationship between ro and rw

ro stores the properties, methods, and protocols of the current class that were determined at compile time.

rw is created at run time. It first takes in the contents of ro, and then the properties, methods, and so on of the class's categories are added to it. rw is a superset of ro.

5. How does the data in the Mach-O file get into memory?

The contents of the corresponding Mach-O sections are read and stored in hash tables, and initialization then proceeds from those tables.
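To make the ro and rw relationship from question 4 concrete, here is a minimal sketch. ClassRO, ClassRW, makeRW, and addMethodAtRuntime are hypothetical names, and copying the method names is a simplification of this sketch rather than a claim about how the runtime stores its lists.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Compile-time data: fixed once the binary is built.
struct ClassRO {
    std::vector<std::string> baseMethods;
};

// Run-time data: keeps a read-only pointer to ro plus a list that can grow.
struct ClassRW {
    const ClassRO *ro;                  // never written through
    std::vector<std::string> methods;   // starts out with ro's methods
};

// Builds rw from ro: rw begins with everything ro has.
ClassRW makeRW(const ClassRO &ro) {
    ClassRW rw;
    rw.ro = &ro;
    rw.methods = ro.baseMethods;
    return rw;
}

// Run-time additions (for example from a category) go into rw only.
void addMethodAtRuntime(ClassRW &rw, const std::string &name) {
    assert(rw.ro != nullptr);
    rw.methods.insert(rw.methods.begin(), name);
}
```

After an addition, rw holds everything ro holds plus the new entry, while ro itself is unchanged, which is the superset relationship described in the answer.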

4. Closing remarks

Compared with dyld, the class loading process is relatively simple, but many interview questions derive from it. Understanding the process is therefore well worth the time.

The next article is a load analysis of categories, which complements this one by covering how classes and categories behave under lazy and non-lazy loading.