
When we think about threads, thread safety immediately comes to mind. How do we make sure threads are safe, so that data shared between multiple threads is not corrupted? With these questions in mind, let's look at how locks work.

1 iOS locks

1.1 Classification of locks

Locks fall into two categories: spin locks and mutexes. A spin lock busy-waits (it keeps polling until the lock becomes available), so it suits short critical sections; a mutex puts a waiting thread to sleep, so it suits longer ones.

  • spinlocks
    • OSSpinLock
    • dispatch_semaphore_t
    • os_unfair_lock_lock
  • The mutex
    • pthread_mutex_t
    • NSLock
    • NSCondition
    • NSRecursiveLock
    • NSConditionLock
    • @synchronized

1.2 Performance of various locks

Test the performance of various locks by locking/unlocking 100,000 times

int kRunTimes = 100000;
{
    // lock = initialize lock
    double_t beginTime = CFAbsoluteTimeGetCurrent();
    for (int i = 0; i < kRunTimes; i++) {
        lock(&lock);
        unlock(&lock);
    }
    double_t endTime = CFAbsoluteTimeGetCurrent();
    NSLog(@"lock: %f ms", (endTime - beginTime) * 1000);
}

Running result of real iPhoneXR

OSSpinLock: 1.433015 ms
dispatch_semaphore_t: 2.267957 ms
os_unfair_lock_lock: 2.338052 ms
pthread_mutex_t: 2.584100 ms
NSLock: 2.802968 ms
NSCondition: 2.210975 ms
PTHREAD_MUTEX_RECURSIVE: 2.773046 ms
NSRecursiveLock: 3.018975 ms
NSConditionLock: 5.580902 ms
@synchronized: 9.202957 ms

Real iPhone12mini running results

OSSpinLock: 0.748038 ms
dispatch_semaphore_t: 1.023054 ms
os_unfair_lock_lock: 0.805020 ms
pthread_mutex_t: 0.934958 ms
NSLock: 1.582980 ms
NSCondition: 1.513004 ms
PTHREAD_MUTEX_RECURSIVE: 2.305984 ms
NSRecursiveLock: 2.532005 ms
NSConditionLock: 8.258939 ms
@synchronized: 3.880978 ms

The running times are roughly as shown in the figure below. Testing shows that each lock performs differently in different environments, and that performance on a real device is better than in the simulator, which suggests @synchronized has been specifically optimized for real devices. @synchronized is also the lock we use most frequently in applications.

2 @synchronized analysis

2.1 @synchronized

Since @synchronized is the easiest to use and the most common, we'll look at it first.

  • Rewrite the code to C++ with clang to view the underlying implementation

The test case

static int a = 0;

int main(int argc, char * argv[]) {
    NSObject *obj = [NSObject alloc];
    @synchronized (obj) {
        a++;
    }
    return 0;
}

From the rewritten C++ code we can see several things:

    1. Locking and unlocking operate on the same object _sync_obj. We usually pass self, because self is directly available (no extra lock object has to be created) and it is not released during the method call, so the lock object is guaranteed to stay valid.
    2. The lock is acquired with objc_sync_enter(_sync_obj) and released with objc_sync_exit(_sync_obj); the code block wrapped by @synchronized runs between the two, which is what makes it act as a lock.
static int a = 0;

int main(int argc, char * argv[]) {
    NSObject *obj = ((NSObject *(*)(id, SEL))(void *)objc_msgSend)((id)objc_getClass("NSObject"), sel_registerName("alloc"));
    {
        id _rethrow = 0;
        id _sync_obj = (id)obj;
        objc_sync_enter(_sync_obj);
        try {
            struct _SYNC_EXIT {
                _SYNC_EXIT(id arg) : sync_exit(arg) {}
                // On scope exit the destructor runs: ~_SYNC_EXIT() -> objc_sync_exit(_sync_obj)
                ~_SYNC_EXIT() { objc_sync_exit(sync_exit); }
                id sync_exit;
            } _sync_exit(_sync_obj);

            // locked code
            a++;
        } catch (id e) { _rethrow = e; }
        {
            struct _FIN {
                _FIN(id reth) : rethrow(reth) {}
                ~_FIN() { if (rethrow) objc_exception_throw(rethrow); }
                id rethrow;
            } _fin_force_rethow(_rethrow);
        }
    }
    return 0;
}
  • Inspect the assembly in Xcode

Next, let's explore the internal implementations of objc_sync_enter and objc_sync_exit.

2.2 @synchronized source code analysis

  • objc_sync_enter
// Begin synchronizing on 'obj'.
// Allocates recursive mutex associated with 'obj' if needed.
// Returns OBJC_SYNC_SUCCESS once lock is acquired.
int objc_sync_enter(id obj)
{
    int result = OBJC_SYNC_SUCCESS;

    if (obj) {
        SyncData* data = id2data(obj, ACQUIRE);
        ASSERT(data);
        data->mutex.lock();
    } else {
        // @synchronized(nil) does nothing
        if (DebugNilSync) {
            _objc_inform("NIL SYNC DEBUG: @synchronized(nil); set a breakpoint on objc_sync_nil to debug");
        }
        objc_sync_nil();
    }

    return result;
}

objc_sync_nil is declared via BREAKPOINT_FUNCTION, which expands to an empty function body, i.e. void objc_sync_nil(void) { }:

BREAKPOINT_FUNCTION( void objc_sync_nil(void) );

/* Use this for functions that are intended to be breakpoint hooks.
   If you do not, the compiler may optimize them away.
   BREAKPOINT_FUNCTION( void stop_on_error(void) ); */
#   define BREAKPOINT_FUNCTION(prototype)                             \
    OBJC_EXTERN __attribute__((noinline, used, visibility("hidden"))) \
    prototype { asm(""); }

So objc_sync_enter(nil), i.e. @synchronized(nil), does nothing.

  • objc_sync_exit

Likewise objc_sync_exit(nil), the exit path of @synchronized(nil), does nothing.

// End synchronizing on 'obj'.
// Returns OBJC_SYNC_SUCCESS or OBJC_SYNC_NOT_OWNING_THREAD_ERROR
int objc_sync_exit(id obj)
{
    int result = OBJC_SYNC_SUCCESS;

    if (obj) {
        SyncData* data = id2data(obj, RELEASE);
        if (!data) {
            result = OBJC_SYNC_NOT_OWNING_THREAD_ERROR;
        } else {
            bool okay = data->mutex.tryUnlock();
            if (!okay) {
                result = OBJC_SYNC_NOT_OWNING_THREAD_ERROR;
            }
        }
    } else {
        // @synchronized(nil) does nothing
    }

    return result;
}

At the same time, we find that objc_sync_enter and objc_sync_exit are strikingly similar; the biggest differences are:

  • objc_sync_enter -> SyncData* data = id2data(obj, ACQUIRE): ACQUIRE acquires the lock
  • objc_sync_exit -> SyncData* data = id2data(obj, RELEASE): RELEASE releases the lock
  • This shows that the core of @synchronized(obj) lies in the implementation of id2data(), and in the structure of the SyncData type.

2.3 The core implementation of @synchronized: id2data

Both objc_sync_enter and objc_sync_exit call id2data, so it must be our focus. Let's find its source code, as follows:

static SyncData* id2data(id object, enum usage why)
{
    spinlock_t *lockp = &LOCK_FOR_OBJ(object);
    SyncData **listp = &LIST_FOR_OBJ(object);
    SyncData* result = NULL;

#if SUPPORT_DIRECT_THREAD_KEYS
    // Check per-thread single-entry fast cache for matching object
    bool fastCacheOccupied = NO;
    SyncData *data = (SyncData *)tls_get_direct(SYNC_DATA_DIRECT_KEY);
    if (data) {
        fastCacheOccupied = YES;

        if (data->object == object) {
            // Found a match in fast cache.
            uintptr_t lockCount;

            result = data;
            lockCount = (uintptr_t)tls_get_direct(SYNC_COUNT_DIRECT_KEY);
            if (result->threadCount <= 0  ||  lockCount <= 0) {
                _objc_fatal("id2data fastcache is buggy");
            }

            switch(why) {
            case ACQUIRE: {
                lockCount++;
                tls_set_direct(SYNC_COUNT_DIRECT_KEY, (void*)lockCount);
                break;
            }
            case RELEASE:
                lockCount--;
                tls_set_direct(SYNC_COUNT_DIRECT_KEY, (void*)lockCount);
                if (lockCount == 0) {
                    // remove from fast cache
                    tls_set_direct(SYNC_DATA_DIRECT_KEY, NULL);
                    // atomic because may collide with concurrent ACQUIRE
                    OSAtomicDecrement32Barrier(&result->threadCount);
                }
                break;
            case CHECK:
                // do nothing
                break;
            }

            return result;
        }
    }
#endif

    // Check per-thread cache of already-owned locks for matching object
    SyncCache *cache = fetch_cache(NO);
    if (cache) {
        unsigned int i;
        for (i = 0; i < cache->used; i++) {
            SyncCacheItem *item = &cache->list[i];
            if (item->data->object != object) continue;

            // Found a match.
            result = item->data;
            if (result->threadCount <= 0  ||  item->lockCount <= 0) {
                _objc_fatal("id2data cache is buggy");
            }

            switch(why) {
            case ACQUIRE:
                item->lockCount++;
                break;
            case RELEASE:
                item->lockCount--;
                if (item->lockCount == 0) {
                    // remove from per-thread cache
                    cache->list[i] = cache->list[--cache->used];
                    // atomic because may collide with concurrent ACQUIRE
                    OSAtomicDecrement32Barrier(&result->threadCount);
                }
                break;
            case CHECK:
                // do nothing
                break;
            }

            return result;
        }
    }

    // Thread cache didn't find anything.
    // Walk in-use list looking for matching object
    // Spinlock prevents multiple threads from creating multiple
    // locks for the same new object.
    // We could keep the nodes in some hash table if we find that there are
    // more than 20 or so distinct locks active, but we don't do that now.

    lockp->lock();

    {
        SyncData* p;
        SyncData* firstUnused = NULL;
        for (p = *listp; p != NULL; p = p->nextData) {
            if ( p->object == object ) {
                result = p;
                // atomic because may collide with concurrent RELEASE
                OSAtomicIncrement32Barrier(&result->threadCount);
                goto done;
            }
            if ( (firstUnused == NULL) && (p->threadCount == 0) )
                firstUnused = p;
        }

        // no SyncData currently associated with object
        if ( (why == RELEASE) || (why == CHECK) )
            goto done;

        // an unused one was found, use it
        if ( firstUnused != NULL ) {
            result = firstUnused;
            result->object = (objc_object *)object;
            result->threadCount = 1;
            goto done;
        }
    }

    // Allocate a new SyncData and add to list.
    // XXX allocating memory with a global lock held is bad practice,
    // might be worth releasing the lock, allocating, and searching again.
    // But since we never free these guys we won't be stuck in allocation very often.
    posix_memalign((void **)&result, alignof(SyncData), sizeof(SyncData));
    result->object = (objc_object *)object;
    result->threadCount = 1;
    new (&result->mutex) recursive_mutex_t(fork_unsafe_lock);
    result->nextData = *listp;
    *listp = result;

 done:
    lockp->unlock();
    if (result) {
        // Only new ACQUIRE should get here.
        // All RELEASE and CHECK and recursive ACQUIRE are
        // handled by the per-thread caches above.
        if (why == RELEASE) {
            // Probably some thread is incorrectly exiting
            // while the object is held by another thread.
            return nil;
        }
        if (why != ACQUIRE) _objc_fatal("id2data is buggy");
        if (result->object != object) _objc_fatal("id2data is buggy");

#if SUPPORT_DIRECT_THREAD_KEYS
        if (!fastCacheOccupied) {
            // Save in fast thread cache
            tls_set_direct(SYNC_DATA_DIRECT_KEY, result);
            tls_set_direct(SYNC_COUNT_DIRECT_KEY, (void*)1);
        } else
#endif
        {
            // Save in thread cache
            if (!cache) cache = fetch_cache(YES);
            cache->list[cache->used].data = result;
            cache->list[cache->used].lockCount = 1;
            cache->used++;
        }
    }

    return result;
}

The code is fairly complex, so let's analyze it piece by piece. The whole function is one big block with the following shape:

// 1
if (data) {
    // fast-cache logic
}
// 2
if (cache) {
    // per-thread cache logic
}
// 3
lockp->lock();
{
    // list search / allocation logic
}
// 4  (done:)
lockp->unlock();
if (result) {
    // cache the new result
}

Let's go through it section by section. #if SUPPORT_DIRECT_THREAD_KEYS means TLS is supported. Thread Local Storage (TLS) is a private, per-thread storage area provided by the operating system, usually of limited capacity. It is accessed through pthread_key_create(), pthread_getspecific(), pthread_setspecific() and pthread_key_delete(). So the if (data) branch (the fast cache) and the if (cache) branch are two per-thread places where SyncData is stored. When both caches miss, the function then performs:

    posix_memalign((void **)&result, alignof(SyncData), sizeof(SyncData));
    result->object = (objc_object *)object;
    result->threadCount = 1;
    new (&result->mutex) recursive_mutex_t(fork_unsafe_lock);
    result->nextData = *listp;
    *listp = result;

This code allocates a new SyncData, initializes it, and assigns it to result. Continuing:

 done:
    lockp->unlock();

At done:, why call unlock()? Because the work in between allocates memory and mutates the global list, and the spinlock keeps those operations thread safe. SyncData is a node in a singly linked list. Why are there two lookups? spinlock_t *lockp = &LOCK_FOR_OBJ(object) obtains the lock for the object via #define LOCK_FOR_OBJ(obj) sDataLists[obj].lock, and SyncData **listp = &LIST_FOR_OBJ(object) obtains the list head the same way:

#define LOCK_FOR_OBJ(obj) sDataLists[obj].lock
#define LIST_FOR_OBJ(obj) sDataLists[obj].data
static StripedMap<SyncList> sDataLists;

Where does sDataLists come from? sDataLists is a static global variable, and StripedMap is a hash map indexed by subscript, i.e. a global hash table. Hash collisions are normally resolved by rehashing until a free slot is found; here, however, collisions are resolved with the zipper method (separate chaining) instead. The SyncList structure is as follows:

struct SyncList {
    SyncData *data;
    spinlock_t lock;

    constexpr SyncList() : data(nil), lock(fork_unsafe_lock) { }
};

We debug through LLDB, as shown in the figure:

Here is the sDataLists structure, with 64 entries in total.

Seeing index 4 again indirectly proves that this is a hash structure: a SyncData's slot is derived from the object. If we lock on self, a SyncData is created and stored in the hash table keyed by self. When two objects hash to the same slot, we have a collision, and in this hash table SyncList resolves it by chaining: its data field is the head of a linked list of SyncData nodes. A linked list is slow to search, but SyncData does not need searching — we only lock and unlock, i.e. only add and remove nodes — which is exactly what the zipper method is good at.

2.4 Notes on using @synchronized

  • The lock object passed to @synchronized should not be nil
  • The lock object must live at least as long as the code it protects, which is why self is generally used
  • Using self as the lock object avoids creating extra lock objects and simplifies storage and release
  • If the same lock object is used very widely, the collision chain grows long, which puts some burden on the zipper lookup, though in practice these operations are not very frequent
  • @synchronized behaves quite differently on the simulator and a real device, because StripedMap has 64 buckets on the simulator but only 8 on a real device
class StripedMap {
#if TARGET_OS_IPHONE && !TARGET_OS_SIMULATOR
    enum { StripeCount = 8 };
#else
    enum { StripeCount = 64 };
#endif

Because of this conditional, StripeCount is 8 on a real device and 64 elsewhere (simulator and macOS), which in turn affects how long the lookups take.