Strings and arrays are simple to use, but what about the underlying storage?

1. String

1.1. How much memory does a String variable take up?

Sample code:

var str1 = "0123456789"
print(MemoryLayout.size(ofValue: str1))
// Output: 16

print(Mems.memStr(ofVal: &str1))
/*
 Output:
 0x3736353433323130
 0xea00000000003938
 */

Conclusion: A String variable takes 16 bytes.
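
Mems.memStr in the sample above is a helper that this article does not define. A rough stand-in can be sketched with the standard library alone. The function name memoryWords is hypothetical, and the sketch assumes a little-endian 64-bit platform; the exact words it prints may differ between Swift versions.

```swift
// Hypothetical stand-in for Mems.memStr(ofVal:), built only on the standard library.
// Assumes a little-endian platform; returns one hex string per 8-byte word.
func memoryWords<T>(of value: inout T) -> [String] {
    return withUnsafeBytes(of: &value) { raw -> [String] in
        var words: [String] = []
        for offset in stride(from: 0, to: raw.count, by: 8) {
            // Assemble up to 8 bytes into one word, least significant byte first.
            var word: UInt64 = 0
            for i in 0..<min(8, raw.count - offset) {
                word |= UInt64(raw[offset + i]) << UInt64(8 * i)
            }
            let hex = String(word, radix: 16)
            words.append("0x" + String(repeating: "0", count: 16 - hex.count) + hex)
        }
        return words
    }
}

var sample = "0123456789"
print(MemoryLayout.size(ofValue: sample))               // size of the variable itself, in bytes
print(memoryWords(of: &sample).joined(separator: " "))  // the raw words of the variable
```

The helper only reads the bytes of the variable itself; it never follows any pointer stored in them.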

How are those 16 bytes stored?

1.2. Compilation analysis

What do the first and last eight bytes of the variable str1 store? In fact, they store the ASCII codes of the characters, one byte per character.

For example, the ASCII code of the character 0 is 0x30 in hexadecimal.

The first 8 bytes, 0x3736353433323130, split into individual bytes (read from the right, these are the characters 0 through 7):

0x37 36 35 34 33 32 31 30

The last 8 bytes, 0xea00000000003938, split into individual bytes:

0xea 00 00 00 00 00 39 38

What does 0xea stand for? Its two hexadecimal digits should be read separately: the low digit a is the string length (10), and the high digit e is a flag indicating that the string content is stored directly inside the variable.

That means the length of a string stored this way is at most 0xf (15), at which point the 16 bytes of the variable are completely full.
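
The reading above can be checked with a small decoder that works on the two dumped words. This is only a sketch of the layout as described in this article: decodeSmallString is a hypothetical helper, and it assumes the bytes are laid out least significant byte first.

```swift
// Decode the two 8-byte words of an inline string as dumped in the article.
// decodeSmallString is a hypothetical helper; it assumes a little-endian layout.
func decodeSmallString(_ low: UInt64, _ high: UInt64) -> (flag: UInt8, count: Int, text: String) {
    let top = UInt8(high >> 56)      // most significant byte, e.g. 0xea
    let flag = top >> 4              // upper digit, e.g. 0xe
    let count = Int(top & 0x0F)      // lower digit, e.g. 0xa = 10

    // Collect all 16 bytes in memory order (lowest byte of the first word first).
    var bytes: [UInt8] = []
    for word in [low, high] {
        for shift in stride(from: 0, to: 64, by: 8) {
            bytes.append(UInt8(truncatingIfNeeded: word >> shift))
        }
    }
    let text = String(decoding: bytes.prefix(count), as: UTF8.self)
    return (flag, count, text)
}

let decoded = decodeSmallString(0x3736353433323130, 0xea00000000003938)
print(decoded.flag, decoded.count, decoded.text)   // prints: 14 10 0123456789
```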

1.3. A string of 15 characters

Sample code:

var str1 = "0123456789ABCDE"
print(Mems.memStr(ofVal: &str1))
/* Output: 0x3736353433323130 0xEF45444342413938 */

With a 15-character string, the length digit has already reached f. This is similar to a tagged pointer in Objective-C: the content of the string is placed directly into the variable str1.

What happens with one more character?

1.4. A string of more than 15 characters

Sample code:

var str1 = "0123456789ABCDEF"
print(MemoryLayout.size(ofValue: str1))
// Output: 16
print(Mems.memStr(ofVal: &str1))
/*
 Output:
 0xd000000000000010
 0x80000001000075f0
 */

This output is obviously different from the previous one. Even as the length continues to grow, the 16 bytes of the variable hardly change, which means that the string content is not stored in those 16 bytes.

1.4.1. String memory address

Step 1: View the arguments passed to the string initializer

  • Real address of the string: rax = 0x100001cc2 + 0x594e = 0x100007610
  • rdi holds the real address of the string, 0x100007610
  • rsi holds the length of the string, 0x10

Step 2: Enter the String.init function

  • rsi (0x10) is compared with 0xf (a length comparison)
    • If the string length is less than 16, execution jumps to the address given by the jle instruction (the content is stored directly in the variable's memory)
    • Otherwise, execution continues
  • The function returns the address: rax = rdi (0x100007610) + 0x7fffffffffffffe0 = 0x80000001000075f0

Step 3: Function return value

  • The value returned by String.init in rdx is stored in the last 8 bytes of the string variable
  • The string length is stored in the first 8 bytes of the string variable

Step 4: View the contents stored at the real address

  • Compare them with the memory output of the sample code above, and the layout becomes clear

Extension: since the stored value is rax = rdi (0x100007610) + 0x7fffffffffffffe0 = 0x80000001000075f0, the real address of the string can be recovered as 0x00000001000075f0 + 0x20 = 0x100007610.
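
The address arithmetic above can be reproduced in a few lines. The constants come from this article's dump; the helper names and the 48-bit length mask are assumptions made for illustration, not the standard library's own code.

```swift
// Constant added to the real address, taken from the assembly analysis above.
let bias: UInt64 = 0x7fffffffffffffe0

// What ends up in the last 8 bytes of the variable: real address + bias.
func storedWord(forRealAddress address: UInt64) -> UInt64 {
    return address &+ bias
}

// Recover the real address: clear the top bit, then add 0x20.
func realAddress(fromStoredWord word: UInt64) -> UInt64 {
    return (word & 0x7fffffffffffffff) + 0x20
}

// Split the first 8 bytes into the flag digit and the length.
// Assumption: the length sits in the low 48 bits.
func flagAndLength(_ word: UInt64) -> (flag: UInt8, length: Int) {
    return (UInt8(word >> 60), Int(word & 0x0000ffffffffffff))
}

let stored = storedWord(forRealAddress: 0x100007610)
print(String(stored, radix: 16))                               // prints: 80000001000075f0
print(String(realAddress(fromStoredWord: stored), radix: 16))  // prints: 100007610
print(flagAndLength(0xd000000000000010))
```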

1.4.2. String storage area

Where is the actual address of the string stored in memory?

The string in the sample code above is stored in the constant area, although from the assembly code it looks as if it were in the global area.

We can inspect the Mach-O file to see which region the string is stored in (addresses may differ from run to run, but this one case is enough to show where the string lives).

As you can see, the string in the example is stored in the constant area (cString). The last 8 bytes of the string variable lead to the memory block at 00007610.

Note: on the Mac, the addresses seen at run time are virtual memory addresses that include the offset 0x100000000. For example, to find 0x100007610, look for the address 00007610 in the Mach-O file.
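
The conversion in the note is a single subtraction. A minimal sketch, assuming the 0x100000000 offset mentioned above (machOAddress is a hypothetical helper):

```swift
// Offset between run-time virtual addresses and Mach-O file addresses, as stated in the note.
let imageBase: UInt64 = 0x100000000

// Hypothetical helper: the address to look for in the Mach-O file.
// Only valid for addresses at or above imageBase.
func machOAddress(forVirtualAddress address: UInt64) -> UInt64 {
    return address - imageBase
}

print(String(machOAddress(forVirtualAddress: 0x100007610), radix: 16))   // prints: 7610
```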

1.5. Memory changes caused by string concatenation operations

1.5.1. A string of a maximum of 15 characters

var str1 = "0123456789"
print(Mems.memStr(ofVal: &str1)) // Output: 0x3736353433323130 0xeA00000000003938
str1.append("A")
print(Mems.memStr(ofVal: &str1)) // Output: 0x3736353433323130 0xeb00000000413938

According to the memory output, when the string content changes but the length does not exceed 15 characters, the content is still stored directly in the variable's memory.

1.5.2. A string of more than 15 characters

var str1 = "0123456789ABCDEF"
print(Mems.memStr(ofVal: &str1)) // Output: 0xD000000000000010 0x80000001000075e0
str1.append("G")
print(Mems.memStr(ofVal: &str1)) // Output: 0xf000000000000011 0x000000010043F8A0

String literals are stored in the constant area, which means that memory cannot be modified. So when you modify a string of more than 15 characters, a new block of memory is allocated in heap space to store the content.

Conclusion:

  • A string of at most 15 characters is stored directly in the variable's memory
  • When the length exceeds 15 characters, the content of the string is stored in the constant area
    • The last 8 bytes of the variable hold the memory address where the string content is actually stored
    • The first 8 bytes hold the length of the string and a flag (the flag indicates where the string is stored)
  • Whenever a concatenated string is longer than 15 characters, new heap memory is allocated for the string content.
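
These conclusions can be summarized as a small decision function. It models this article's observations only and is not how the standard library decides; the names are hypothetical, and the 15-character limit is assumed to count UTF-8 bytes, which matches the ASCII examples used here.

```swift
// A model of the conclusions above, not the standard library's actual logic.
enum PredictedStorage { case inlineInVariable, constantArea, heap }

func predictedStorage(utf8Count: Int, builtByMutation: Bool) -> PredictedStorage {
    if utf8Count <= 15 { return .inlineInVariable }   // fits in the 16-byte variable
    return builtByMutation ? .heap : .constantArea    // literal vs. modified content
}

var str = "0123456789ABCDEF"   // 16-character literal
print(predictedStorage(utf8Count: str.utf8.count, builtByMutation: false))   // prints: constantArea
str.append("G")
print(predictedStorage(utf8Count: str.utf8.count, builtByMutation: true))    // prints: heap
```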

2. Array

Array is officially defined as a struct (a value type):

public struct Array<Element>

2.1. How much memory does an Array variable take up?

The memory footprint of a structure is essentially the sum of the memory used by its stored properties (plus any alignment padding).

Example code 1:

struct Point {
    var x = 0, y = 0
}
var p = Point()
print(MemoryLayout.stride(ofValue: p))
// Output: 16

The structure in example code 1 above takes up 16 bytes of memory.
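
MemoryLayout reports three related numbers: size, stride, and alignment. The sketch below compares Point with a second struct, Flagged, which is a hypothetical type added only to show what padding does; the commented values assume a 64-bit platform.

```swift
struct Point {
    var x = 0, y = 0      // two Int properties, 8 bytes each on a 64-bit platform
}

// Hypothetical struct used only to show padding.
struct Flagged {
    var value = 0         // Int, 8 bytes
    var flag = false      // Bool, 1 byte
}

// size: bytes actually used; stride: distance between consecutive values in an array.
print(MemoryLayout<Point>.size, MemoryLayout<Point>.stride, MemoryLayout<Point>.alignment)       // 16 16 8
print(MemoryLayout<Flagged>.size, MemoryLayout<Flagged>.stride, MemoryLayout<Flagged>.alignment) // 9 16 8
```

For Point the size and the stride happen to be equal; for Flagged the stride is rounded up to the 8-byte alignment.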

Arrays are also structures. Is the memory footprint calculated in the same way as in the example above?

Example code 2:

var arr = [1, 2, 3, 4]
print(MemoryLayout.stride(ofValue: arr))
// Output: 8

Surprisingly, it takes only 8 bytes of memory, the size of an Int. So where is the content of the array stored?

2.2. Where is the data stored in the array?

Through assembly analysis, we can see that the elements of the array are stored in heap space, and the array variable's memory stores the address of that array object in heap space.
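
A quick way to see this without reading assembly is to compare the size of two array variables with very different element counts. This is a sketch using only standard-library calls; the sizes in the comments assume a 64-bit platform, and the printed address varies from run to run.

```swift
let small = [1, 2, 3, 4]
let large = Array(repeating: 0, count: 10_000)

// The variable has the same size no matter how many elements the array holds.
print(MemoryLayout.size(ofValue: small))   // 8 on a 64-bit platform
print(MemoryLayout.size(ofValue: large))   // 8 on a 64-bit platform

// The elements live elsewhere; this prints the address of the first element.
small.withUnsafeBufferPointer { buffer in
    print(buffer.baseAddress as Any)
}
```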

Example code 1:

var arr = [1, 2, 3, 4]
print(Mems.memStr(ofRef: arr))
/* Output: 0x00007fff8e5f54d8 0x0000000200000002 0x0000000000000004 0x0000000000000008 0x0000000000000001 0x0000000000000002 0x0000000000000003 0x0000000000000004 */

As you can see from the memory layout, the array elements start after the first 32 bytes. So what do those preceding bytes hold?

  • First 8 bytes: the memory address of the type information for the array's heap object
  • Second 8 bytes: the reference count of the array
  • Third 8 bytes: the number of elements in the array
  • Fourth 8 bytes: the capacity of the array (in these dumps the stored value is twice the number of element slots, for example 0x8 for 4 slots)
  • These are followed by the elements of the array
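
The field breakdown above can be applied mechanically to a dump. The sketch below parses the words printed by the example into a small model; ArrayDump and parseArrayDump are hypothetical names, and the field meanings are the ones described in this article, not an official layout.

```swift
// Hypothetical model of the dump described above; field names are illustrative.
struct ArrayDump {
    let typeInfo: UInt64        // first 8 bytes
    let refCount: UInt64        // second 8 bytes, raw value
    let count: Int              // third 8 bytes
    let capacityField: UInt64   // fourth 8 bytes, raw value
    let elements: [Int]         // the words that follow, limited to `count`
}

func parseArrayDump(_ words: [UInt64]) -> ArrayDump? {
    // Need the 4 header words plus `count` element words.
    guard words.count >= 4,
          let count = Int(exactly: words[2]),
          words.count - 4 >= count else { return nil }
    let elements = words[4..<(4 + count)].map { Int(truncatingIfNeeded: $0) }
    return ArrayDump(typeInfo: words[0], refCount: words[1], count: count,
                     capacityField: words[3], elements: elements)
}

let dump: [UInt64] = [0x00007fff8e5f54d8, 0x0000000200000002, 0x4, 0x8, 0x1, 0x2, 0x3, 0x4]
if let parsed = parseArrayDump(dump) {
    print(parsed.count, parsed.elements)   // prints: 4 [1, 2, 3, 4]
}
```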

When an append exceeds the current capacity, the storage is automatically expanded to twice the number of elements held before the append; in the dumps below, it grows from 4 to 8 to 16 element slots.

Example code 2:

var arr = [1, 2, 3, 4]
print(Mems.memStr(ofRef: arr))
/* Output: 0x00007fff8e5f54d8 0x0000000200000002 0x0000000000000004 0x0000000000000008 0x0000000000000001 0x0000000000000002 0x0000000000000003 0x0000000000000004 */

arr.append(5)
print(Mems.memStr(ofRef: arr))
/* Output: 0x00007fff8e5f54d8 0x0000000200000002 0x0000000000000005 0x0000000000000010 0x0000000000000001 0x0000000000000002 0x0000000000000003 0x0000000000000004 0x0000000000000005 0x0000000000000000 0x0000000000000000 0x0000000000000000 */
arr.append(6)
arr.append(7)
arr.append(8)
arr.append(9)
print(Mems.memStr(ofRef: arr))
/* Output: 0x00007fff8e5f54d8 0x0000000200000002 0x0000000000000009 0x0000000000000020 0x0000000000000001 0x0000000000000002 0x0000000000000003 0x0000000000000004 0x0000000000000005 0x0000000000000006 0x0000000000000007 0x0000000000000008 0x0000000000000009 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 */
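
The same growth can be watched through the public capacity property instead of a raw memory dump. This is a sketch for observation only: the exact capacities depend on the Swift version and platform, so no particular output is claimed.

```swift
var numbers = [1, 2, 3, 4]
var history: [(count: Int, capacity: Int)] = [(count: numbers.count, capacity: numbers.capacity)]

// Append 5 through 9 and record count and capacity after every append.
for value in 5...9 {
    numbers.append(value)
    history.append((count: numbers.count, capacity: numbers.capacity))
}

for entry in history {
    print("count: \(entry.count), capacity: \(entry.capacity)")
}
```

A jump in capacity between two lines marks an append that had to reallocate the storage.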

So although Array is declared as a structure, the variable itself only holds a reference to a heap object, which is where the elements actually live.
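
Even with its elements on the heap, an array still behaves like a value from the caller's point of view. A minimal sketch of that behavior, comparing the address of the first element before and after a mutation (bufferAddress is a hypothetical helper, and the addresses are only compared, never dereferenced):

```swift
// Address of the first element, used here as a stand-in for "which heap buffer".
func bufferAddress(_ array: [Int]) -> UnsafePointer<Int>? {
    return array.withUnsafeBufferPointer { $0.baseAddress }
}

let original = [1, 2, 3, 4]
var copy = original
let sharedBeforeMutation = bufferAddress(original) == bufferAddress(copy)

copy.append(5)   // the mutation gives `copy` its own buffer
let sharedAfterMutation = bufferAddress(original) == bufferAddress(copy)

print(sharedBeforeMutation, sharedAfterMutation)
print(original, copy)   // prints: [1, 2, 3, 4] [1, 2, 3, 4, 5]
```

The copy shares storage with the original until one of them is modified, so the original is never changed through the copy.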
