1. Introduction

Tagged Pointer is a technique that Apple introduced to store small objects on 64-bit devices. It has the following features:

  • A Tagged Pointer’s value is no longer an address but the actual value. It is therefore not really an object; it is an ordinary variable in an object’s skin.
  • It is not stored on the heap, needs no malloc or free, and does not take part in reference counting; the system reclaims it.
  • Compared with ordinary objects, memory reads are 3 times more efficient and creation is 106 times faster.
  • Developers can turn the feature off by setting the environment variable OBJC_DISABLE_TAGGED_POINTERS, so it is up to them whether to use it.

2. The source code

The source code quoted below comes from the runtime, objc4-779.1.

2.1 Various flag bits
#if (TARGET_OS_OSX || TARGET_OS_IOSMAC) && __x86_64__
    // 64-bit Mac - tag bit is LSB
# define OBJC_MSB_TAGGED_POINTERS 0
#else
    // Everything else - tag bit is MSB
# define OBJC_MSB_TAGGED_POINTERS 1
#endif

#define _OBJC_TAG_INDEX_MASK 0x7 // 0b111: mask for the 3-bit basic tag; a basic tag of 0b111 means an 8-bit extended tag is used
// array slot includes the tag bit itself
#define _OBJC_TAG_SLOT_COUNT 16
#define _OBJC_TAG_SLOT_MASK 0xf // 0b1111: the tagged pointer flag bit plus the 3-bit basic tag

#define _OBJC_TAG_EXT_INDEX_MASK 0xff
// array slot has no extra bits
#define _OBJC_TAG_EXT_SLOT_COUNT 256
#define _OBJC_TAG_EXT_SLOT_MASK 0xff // 0b1111 1111

#if OBJC_MSB_TAGGED_POINTERS
# define _OBJC_TAG_MASK (1UL<<63) // Tagged pointer flag bit: 1 = tagged, 0 = not tagged
# define _OBJC_TAG_INDEX_SHIFT 60 // Shift of the basic tag (bits 2-4 from the top); combine with _OBJC_TAG_INDEX_MASK to get the basic tag
# define _OBJC_TAG_SLOT_SHIFT 60
# define _OBJC_TAG_PAYLOAD_LSHIFT 4 // LSHIFT and RSHIFT are used together to insert and extract the payload
# define _OBJC_TAG_PAYLOAD_RSHIFT 4
# define _OBJC_TAG_EXT_MASK (0xfUL<<60) // 1111 0000 ... 0000 0000: flag bit plus basic tag 0b111, which marks an extended tag
# define _OBJC_TAG_EXT_INDEX_SHIFT 52 // Shift of the extended tag (bits 5-12 from the top); combine with _OBJC_TAG_EXT_INDEX_MASK to get the extended tag
# define _OBJC_TAG_EXT_SLOT_SHIFT 52
# define _OBJC_TAG_EXT_PAYLOAD_LSHIFT 12
# define _OBJC_TAG_EXT_PAYLOAD_RSHIFT 12
#else
# define _OBJC_TAG_MASK 1UL
# define _OBJC_TAG_INDEX_SHIFT 1
# define _OBJC_TAG_SLOT_SHIFT 0
# define _OBJC_TAG_PAYLOAD_LSHIFT 0
# define _OBJC_TAG_PAYLOAD_RSHIFT 4
# define _OBJC_TAG_EXT_MASK 0xfUL
# define _OBJC_TAG_EXT_INDEX_SHIFT 4
# define _OBJC_TAG_EXT_SLOT_SHIFT 4
# define _OBJC_TAG_EXT_PAYLOAD_LSHIFT 0
# define _OBJC_TAG_EXT_PAYLOAD_RSHIFT 12
#endif

Many bit fields are defined here; a few deserve attention:

  • _OBJC_TAG_MASK: the flag bit that indicates whether the pointer is a tagged pointer
  • _OBJC_TAG_INDEX_MASK: the mask for the 3-bit basic tag; a basic tag equal to 7 (0b111) indicates that an extended tag is present
  • The rest are definitions used in bitwise operations to get the tag value, the extended tag mask, and the left and right shift amounts

2.2 How to determine whether a pointer is a Tagged Pointer

As seen in the definitions, a flag bit indicates whether a pointer is a tagged pointer:

static inline bool 
_objc_isTaggedPointer(const void * _Nullable ptr)
{
    return ((uintptr_t)ptr & _OBJC_TAG_MASK) == _OBJC_TAG_MASK;
}


To determine whether a pointer is a tagged pointer, the flag bit is extracted with a bitwise AND. Note that the position of the flag bit differs by architecture: some use the lowest bit and some use the highest bit.
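The check can be tried outside the runtime. The following is a minimal sketch, not the runtime's code: the `SKETCH_` names are hypothetical stand-ins for the two `_OBJC_TAG_MASK` definitions in section 2.1, and the mask is passed in explicitly so both layouts can be compared in one program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two _OBJC_TAG_MASK definitions. */
#define SKETCH_TAG_MASK_MSB (UINT64_C(1) << 63) /* OBJC_MSB_TAGGED_POINTERS == 1 */
#define SKETCH_TAG_MASK_LSB UINT64_C(1)         /* OBJC_MSB_TAGGED_POINTERS == 0 */

/* Same test as _objc_isTaggedPointer, with the mask passed in explicitly. */
static inline bool sketch_is_tagged(uint64_t bits, uint64_t tag_mask)
{
    /* This sketch only knows the two layouts listed above. */
    assert(tag_mask == SKETCH_TAG_MASK_MSB || tag_mask == SKETCH_TAG_MASK_LSB);
    return (bits & tag_mask) == tag_mask;
}
```

With the MSB mask, a value whose top bit is set counts as tagged; with the LSB mask, only the lowest bit matters. An ordinary heap address is aligned and sits in the lower half of the address space, so it fails both tests.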

2.3 System obfuscation of Tagged Pointers

Before iOS 12, printing a tagged pointer shows its value directly, which is very readable. From iOS 12 on, the printed value is completely unreadable.

- (void)testCase {
    NSString *stringWithFormat1 = [NSString stringWithFormat:@"y"];
    [self formatedLogObject:stringWithFormat1];
}

- (void)formatedLogObject:(id)object {
    if (@available(iOS 12.0, *)) {
        NSLog(@"%p %@ %@", object, object, object_getClass(object));
    } else {
        NSLog(@"0x%6lx %@ %@", object, object, object_getClass(object));
    }
}

Output of the test code above before iOS 12 (0x79 is the ASCII code of the character y):

0xa000000000000791 y NSTaggedPointerString

Output on iOS 12 and later:

0xcb47b8d98a2fa15f y NSTaggedPointerString

Before iOS 12, the printed pointer value clearly shows the data and other information; from iOS 12 on, the printed value is unreadable. The source code shows that Apple obfuscates the pointer so that the value cannot be read directly, which makes it harder to forge a tagged pointer object.

How does Apple obfuscate the pointer?

Step one: shift the payload left and right into position

static inline void * _Nonnull
_objc_makeTaggedPointer(objc_tag_index_t tag, uintptr_t value)
{
    // PAYLOAD_LSHIFT and PAYLOAD_RSHIFT are the payload extraction shifts.
    // They are reversed here for payload insertion.

    // ASSERT(_objc_taggedPointersEnabled());
    if (tag <= OBJC_TAG_Last60BitPayload) {
        // ASSERT(((value << _OBJC_TAG_PAYLOAD_RSHIFT) >> _OBJC_TAG_PAYLOAD_LSHIFT) == value);
        uintptr_t result =
            (_OBJC_TAG_MASK |
             ((uintptr_t)tag << _OBJC_TAG_INDEX_SHIFT) |
             ((value << _OBJC_TAG_PAYLOAD_RSHIFT) >> _OBJC_TAG_PAYLOAD_LSHIFT));
        return _objc_encodeTaggedPointer(result);
    } else {
        // ASSERT(tag >= OBJC_TAG_First52BitPayload);
        // ASSERT(tag <= OBJC_TAG_Last52BitPayload);
        // ASSERT(((value << _OBJC_TAG_EXT_PAYLOAD_RSHIFT) >> _OBJC_TAG_EXT_PAYLOAD_LSHIFT) == value);
        uintptr_t result =
            (_OBJC_TAG_EXT_MASK |
             ((uintptr_t)(tag - OBJC_TAG_First52BitPayload) << _OBJC_TAG_EXT_INDEX_SHIFT) |
             ((value << _OBJC_TAG_EXT_PAYLOAD_RSHIFT) >> _OBJC_TAG_EXT_PAYLOAD_LSHIFT));
        return _objc_encodeTaggedPointer(result);
    }
}
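To see what the shifts produce, the following minimal sketch rebuilds the pre-obfuscation bits for the MSB layout only. The constants are copied from section 2.1 under hypothetical `SK_` names, and the final XOR is deliberately left out, so this is an illustration of the bit layout rather than the runtime function itself.

```c
#include <assert.h>
#include <stdint.h>

/* MSB-layout constants from section 2.1, renamed to mark them as a sketch. */
#define SK_TAG_MASK            (UINT64_C(1) << 63)
#define SK_TAG_INDEX_SHIFT     60
#define SK_PAYLOAD_LSHIFT      4
#define SK_PAYLOAD_RSHIFT      4
#define SK_EXT_MASK            (UINT64_C(0xf) << 60)
#define SK_EXT_INDEX_SHIFT     52
#define SK_EXT_PAYLOAD_LSHIFT  12
#define SK_EXT_PAYLOAD_RSHIFT  12
#define SK_LAST_60BIT_TAG      6
#define SK_FIRST_52BIT_TAG     8
#define SK_LAST_52BIT_TAG      263

/* Builds the un-obfuscated bits: flag bit, tag, then payload. */
static uint64_t sk_make_tagged_bits(unsigned tag, uint64_t value)
{
    if (tag <= SK_LAST_60BIT_TAG) {
        /* Basic tag: 1 flag bit + 3 tag bits + 60 payload bits. */
        return SK_TAG_MASK
             | ((uint64_t)tag << SK_TAG_INDEX_SHIFT)
             | ((value << SK_PAYLOAD_RSHIFT) >> SK_PAYLOAD_LSHIFT);
    }
    /* Extended tag: top nibble 0b1111 + 8 tag bits + 52 payload bits. */
    assert(tag >= SK_FIRST_52BIT_TAG && tag <= SK_LAST_52BIT_TAG);
    return SK_EXT_MASK
         | ((uint64_t)(tag - SK_FIRST_52BIT_TAG) << SK_EXT_INDEX_SHIFT)
         | ((value << SK_EXT_PAYLOAD_RSHIFT) >> SK_EXT_PAYLOAD_LSHIFT);
}
```

Calling `sk_make_tagged_bits(2, 0x791)` gives 0xa000000000000791, the readable value printed before iOS 12: the top nibble 0xa is the flag bit plus tag 2 (NSString), and 0x791 is the payload.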

Step two: XOR the result with a random number

static inline void * _Nonnull
_objc_encodeTaggedPointer(uintptr_t ptr)
{
    return (void *)(objc_debug_taggedpointer_obfuscator ^ ptr); // XOR with objc_debug_taggedpointer_obfuscator to obfuscate the pointer value
}

This random number is generated when dyld loads the image, and it is different every time the program is loaded.

// map_images -- map_images_nolock -- _read_images -- initializeTaggedPointerObfuscator
static void
initializeTaggedPointerObfuscator(void)
{
    // A new tagged pointer obfuscator is initialized each time _objc_init runs
    if (sdkIsOlderThan(10_14, 12_0, 12_0, 5_0, 3_0) ||
        // Set the obfuscator to zero for apps linked against older SDKs,
        // in case they're relying on the tagged pointer representation.
        DisableTaggedPointerObfuscation) {
        objc_debug_taggedpointer_obfuscator = 0;
    } else {
        // Pull random data into the variable, then shift away all non-payload bits.
        // In effect, the obfuscator is different on every app launch.
        arc4random_buf(&objc_debug_taggedpointer_obfuscator,
                       sizeof(objc_debug_taggedpointer_obfuscator));
        objc_debug_taggedpointer_obfuscator &= ~_OBJC_TAG_MASK;
    }
}

I tried to hook this value, either to read it or to pin it to a fixed value for debugging, but failed.

How is the pointer decoded after obfuscation?

Decoding reverses the obfuscation steps. Step one: XOR with the same random number to recover the un-obfuscated value

static inline uintptr_t
_objc_decodeTaggedPointer(const void * _Nullable ptr)
{
    return (uintptr_t)ptr ^ objc_debug_taggedpointer_obfuscator; // XOR again to recover the original value
}


An example makes this clear: encoding XORs once, and decoding XORs again with the same value to get the original back.

// Assume objc_debug_taggedpointer_obfuscator is 0010 and the original data is 1001
Encode: 1001 ^ 0010 = 1011
Decode: 1011 ^ 0010 = 1001
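The same round trip works on full 64-bit values. This is a minimal sketch with hypothetical `sk_` names; the obfuscator is passed in as a parameter instead of being read from `objc_debug_taggedpointer_obfuscator`, and the specific constant in the usage note is an assumption chosen so that the readable value from the pre-iOS 12 output maps to the value printed on iOS 12.

```c
#include <assert.h>
#include <stdint.h>

#define SK_TAG_MASK (UINT64_C(1) << 63)

/* Mirrors `obfuscator &= ~_OBJC_TAG_MASK`: the flag bit is never scrambled. */
static uint64_t sk_make_obfuscator(uint64_t random_bits)
{
    return random_bits & ~SK_TAG_MASK;
}

/* Encoding is a single XOR with the obfuscator. */
static uint64_t sk_encode(uint64_t bits, uint64_t obfuscator)
{
    assert((obfuscator & SK_TAG_MASK) == 0); /* flag bit must survive the XOR */
    return bits ^ obfuscator;
}

/* Decoding is the same XOR, because (x ^ k) ^ k == x. */
static uint64_t sk_decode(uint64_t bits, uint64_t obfuscator)
{
    assert((obfuscator & SK_TAG_MASK) == 0);
    return bits ^ obfuscator;
}
```

With the obfuscator 0x6b47b8d98a2fa6ce, `sk_encode(0xa000000000000791, ...)` yields 0xcb47b8d98a2fa15f and `sk_decode` returns the original. Because the flag bit of the obfuscator is cleared, the encoded value still passes the tagged pointer check.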

Step two: shift left and right again to recover the payload

static inline uintptr_t
_objc_getTaggedPointerValue(const void * _Nullable ptr) 
{
    // ASSERT(_objc_isTaggedPointer(ptr));
    uintptr_t value = _objc_decodeTaggedPointer(ptr);
    uintptr_t basicTag = (value >> _OBJC_TAG_INDEX_SHIFT) & _OBJC_TAG_INDEX_MASK;
    if (basicTag == _OBJC_TAG_INDEX_MASK) {
        return (value << _OBJC_TAG_EXT_PAYLOAD_LSHIFT) >> _OBJC_TAG_EXT_PAYLOAD_RSHIFT;
    } else {
        return (value << _OBJC_TAG_PAYLOAD_LSHIFT) >> _OBJC_TAG_PAYLOAD_RSHIFT;
    }
}

static inline intptr_t
_objc_getTaggedPointerSignedValue(const void * _Nullable ptr) 
{
    // ASSERT(_objc_isTaggedPointer(ptr));
    uintptr_t value = _objc_decodeTaggedPointer(ptr);
    uintptr_t basicTag = (value >> _OBJC_TAG_INDEX_SHIFT) & _OBJC_TAG_INDEX_MASK;
    if (basicTag == _OBJC_TAG_INDEX_MASK) {
        return ((intptr_t)value << _OBJC_TAG_EXT_PAYLOAD_LSHIFT) >> _OBJC_TAG_EXT_PAYLOAD_RSHIFT;
    } else {
        return ((intptr_t)value << _OBJC_TAG_PAYLOAD_LSHIFT) >> _OBJC_TAG_PAYLOAD_RSHIFT;
    }
}

2.4 Tagged Pointer class tags

The class of a tagged pointer is identified by a tag defined in objc_tag_index_t. For example, 2 indicates NSString and 6 indicates NSDate. Three bits can hold at most the value 7, and that value is reserved by the system: it flags that an extended tag follows, so that more classes can support Tagged Pointer.

#if __has_feature(objc_fixed_enum)  ||  __cplusplus >= 201103L
enum objc_tag_index_t : uint16_t
#else
typedef uint16_t objc_tag_index_t;
enum
#endif
{
    // 60-bit payloads
    OBJC_TAG_NSAtom            = 0, 
    OBJC_TAG_1                 = 1, 
    OBJC_TAG_NSString          = 2, 
    OBJC_TAG_NSNumber          = 3, 
    OBJC_TAG_NSIndexPath       = 4, 
    OBJC_TAG_NSManagedObjectID = 5, 
    OBJC_TAG_NSDate            = 6,

    // 60-bit reserved
    OBJC_TAG_RESERVED_7        = 7,

    // 52-bit payloads
    OBJC_TAG_Photos_1          = 8,
    OBJC_TAG_Photos_2          = 9,
    OBJC_TAG_Photos_3          = 10,
    OBJC_TAG_Photos_4          = 11,
    OBJC_TAG_XPC_1             = 12,
    OBJC_TAG_XPC_2             = 13,
    OBJC_TAG_XPC_3             = 14,
    OBJC_TAG_XPC_4             = 15,
    OBJC_TAG_NSColor           = 16,
    OBJC_TAG_UIColor           = 17,
    OBJC_TAG_CGColor           = 18,
    OBJC_TAG_NSIndexSet        = 19,

    OBJC_TAG_First60BitPayload = 0, 
    OBJC_TAG_Last60BitPayload  = 6, 
    OBJC_TAG_First52BitPayload = 8, 
    OBJC_TAG_Last52BitPayload  = 263, 

    OBJC_TAG_RESERVED_264      = 264
};
#if __has_feature(objc_fixed_enum)  &&  !defined(__cplusplus)
typedef enum objc_tag_index_t objc_tag_index_t;
#endif
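To connect the enum with the shifts from section 2.3, here is a minimal sketch that recovers the tag index and the payload from already de-obfuscated bits in the MSB layout. The `sk_` names are hypothetical; the logic follows the basic tag versus extended tag split described above and is an illustration, not the runtime's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* MSB-layout constants from section 2.1, renamed to mark them as a sketch. */
#define SK_TAG_MASK            (UINT64_C(1) << 63)
#define SK_TAG_INDEX_SHIFT     60
#define SK_TAG_INDEX_MASK      0x7
#define SK_PAYLOAD_LSHIFT      4
#define SK_PAYLOAD_RSHIFT      4
#define SK_EXT_INDEX_SHIFT     52
#define SK_EXT_INDEX_MASK      0xff
#define SK_EXT_PAYLOAD_LSHIFT  12
#define SK_EXT_PAYLOAD_RSHIFT  12
#define SK_FIRST_52BIT_TAG     8

/* Returns the tag index: 0..6 for basic tags, 8..263 for extended tags. */
static unsigned sk_get_tag(uint64_t bits)
{
    assert((bits & SK_TAG_MASK) == SK_TAG_MASK); /* must be a tagged pointer */
    unsigned basic = (unsigned)((bits >> SK_TAG_INDEX_SHIFT) & SK_TAG_INDEX_MASK);
    if (basic != SK_TAG_INDEX_MASK)
        return basic;
    /* Basic tag 7 is reserved: the real tag is in the next 8 bits. */
    unsigned ext = (unsigned)((bits >> SK_EXT_INDEX_SHIFT) & SK_EXT_INDEX_MASK);
    return ext + SK_FIRST_52BIT_TAG;
}

/* Shifts the flag and tag bits out on the left, then shifts the payload back down. */
static uint64_t sk_get_payload(uint64_t bits)
{
    assert((bits & SK_TAG_MASK) == SK_TAG_MASK);
    unsigned basic = (unsigned)((bits >> SK_TAG_INDEX_SHIFT) & SK_TAG_INDEX_MASK);
    if (basic == SK_TAG_INDEX_MASK)
        return (bits << SK_EXT_PAYLOAD_LSHIFT) >> SK_EXT_PAYLOAD_RSHIFT;
    return (bits << SK_PAYLOAD_LSHIFT) >> SK_PAYLOAD_RSHIFT;
}
```

For 0xa000000000000791 this gives tag 2 (OBJC_TAG_NSString) and payload 0x791. For a value such as 0xf0b0000000000001, the basic tag is 7, the extended field is 11, and the resulting tag index is 19 (OBJC_TAG_NSIndexSet).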

3. Extensions

3.1 String Encoding

Many blogs say that a string longer than 10 characters is no longer an NSTaggedPointerString, which is not accurate.

The Tagged Pointer payload has a fixed number of bits, but the number of bits used per character varies with the character encoding, so the maximum length varies as well.

In the following test, an 11-character string is still a Tagged Pointer:

NSString *test2 = [NSString stringWithFormat:@"%@", @"11111111111"];
NSLog(@"%p value:0x%lx %@ %@", test2, _objc_getTaggedPointerValue((__bridge const void *)test2), test2, object_getClass(test2));

Output result:

0xfa976f93575a8e75 value:0x7bdef7bdef7bdeb 11111111111 NSTaggedPointerString

If you are interested, the blog post cited at the end of this section explains how the system encodes Tagged Pointer strings when creating a string.

Thus we can see that the structure of the tagged pointer strings is:

  • If the length is between 0 and 7, store the string as raw eight-bit characters.
  • If the length is 8 or 9, store the string in a six-bit encoding, using the alphabet “eilotrm.apdnsIc ufkMShjTRxgC4013bDNvwyUL2O856P-B79AFKEWV_zGJ/HYX”.
  • If the length is 10 or 11, store the string in a five-bit encoding, using the alphabet “eilotrm.apdnsIc ufkMShjTRxgC4013”.

(Excerpt from Tagged Pointer Strings by Mike Ash.)
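The three rules can be checked against the values printed in this article. The sketch below is an illustration under assumptions, not Foundation's implementation: the `sk_` names are hypothetical, the length is assumed to occupy the low four bits of the payload (which matches the 0x791 and 0x7bdef7bdef7bdeb values shown earlier), and the character order within the payload is an assumption that does not affect the two examples used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Alphabets from the excerpt above. */
static const char SK_SIX_BIT[] =
    "eilotrm.apdnsIc ufkMShjTRxgC4013bDNvwyUL2O856P-B79AFKEWV_zGJ/HYX";
static const char SK_FIVE_BIT[] = "eilotrm.apdnsIc ufkMShjTRxgC4013";

/* Packs each character's alphabet index into `bits` bits, first character highest.
   Returns 0 if a character is not in the alphabet. */
static int sk_pack_indexed(const char *s, size_t len, const char *alphabet,
                           unsigned bits, uint64_t *chars)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < len; i++) {
        const char *hit = strchr(alphabet, s[i]);
        if (hit == NULL)
            return 0;
        acc = (acc << bits) | (uint64_t)(hit - alphabet);
    }
    *chars = acc;
    return 1;
}

/* Returns 1 and stores the payload, or returns 0 if `s` does not fit any rule. */
static int sk_string_payload(const char *s, uint64_t *payload)
{
    assert(s != NULL && payload != NULL);
    size_t len = strlen(s);
    uint64_t chars = 0;
    if (len <= 7) {
        /* Raw eight-bit characters; byte order is an assumption of this sketch. */
        for (size_t i = 0; i < len; i++)
            chars |= (uint64_t)(unsigned char)s[i] << (8 * i);
    } else if (len <= 9) {
        if (!sk_pack_indexed(s, len, SK_SIX_BIT, 6, &chars))
            return 0;
    } else if (len <= 11) {
        if (!sk_pack_indexed(s, len, SK_FIVE_BIT, 5, &chars))
            return 0;
    } else {
        return 0;
    }
    *payload = (chars << 4) | (uint64_t)len; /* length in the low four bits */
    return 1;
}
```

For "y" the sketch produces 0x791, and for "11111111111" it produces 0x7bdef7bdef7bdeb, the same value printed by the test above: the character 1 has index 30 (0b11110) in the five-bit alphabet, repeated 11 times, followed by the length 0xb. A 10-character string containing a character outside the five-bit alphabet, such as b, is rejected, which is why the length limit depends on the content.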

3.2 Optimization of Tagged Pointer in iOS 14

On Intel, the layout of a tagged pointer is as follows:

  • The lowest bit flags whether the pointer is a tagged pointer
  • A 3-bit tag identifies the class of the data
  • An 8-bit extended tag allows more tagged pointer types

On ARM, the layout of a tagged pointer is as follows:

iOS 13:

Why the change?

The change mainly optimizes objc_msgSend. With the Intel layout, tagged pointers and nil have to be tested in two separate branches. With the flag in the highest bit, a single signed comparison is enough to tell whether the object is an ordinary pointer:

if (ptrValue <= 0) // is tagged or nil

Anything else is an ordinary pointer.
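The difference between the two layouts can be sketched in plain C. The names below are hypothetical and the real fast path lives in hand-written assembly, so this only illustrates why the highest bit makes a single comparison sufficient. It assumes a two's complement `int64_t`, which holds on the platforms discussed here.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SK_HEAP_OBJECT, SK_NIL, SK_TAGGED } sk_kind;

/* MSB layout: tagged pointers have bit 63 set, so they are negative as int64_t. */
static sk_kind sk_classify_msb(uint64_t bits)
{
    if ((int64_t)bits > 0)
        return SK_HEAP_OBJECT;      /* common case, decided by one comparison */
    if (bits == 0)
        return SK_NIL;
    assert((bits >> 63) == 1);      /* negative implies the flag bit is set */
    return SK_TAGGED;
}

/* LSB layout: nil and the flag bit need two independent tests. */
static sk_kind sk_classify_lsb(uint64_t bits)
{
    if (bits == 0)
        return SK_NIL;
    if ((bits & 1) == 1)
        return SK_TAGGED;
    return SK_HEAP_OBJECT;
}
```

In the MSB version the ordinary-pointer case leaves after one branch, and the nil versus tagged distinction is only made on the slow path. In the LSB version an ordinary pointer has to pass both tests first.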

iOS 14:

The difference between iOS 14 and iOS 13 is that the tag moves to the lowest three bits; the tagged pointer flag remains the highest bit.

Why do this?

Apple’s explanation:

  • On ARM, dyld ignores the top 8 bits of pointers (this is due to ARM’s Top Byte Ignore feature).
  • As a result, the payload in the figure occupies the same bits as the payload of an ordinary pointer. That is, the payload bits of a tagged pointer can hold an ordinary pointer.
  • This lets a Tagged Pointer reference constant data in the binary, such as strings or other data structures, which reduces dirty memory.

All I can say is: Apple is awesome.