iOS App Startup Optimization (2): Code Size Performance Guidelines

Retired Document. Important: This document may not represent best practices for current development. Links to downloads and other resources may no longer be valid. Although it may not reflect current best practice, the material is still useful.

Introduction to Code Size Performance Guidelines

In terms of program performance, there is a clear correlation between memory utilization and efficiency. The more memory an application consumes, the less efficient it becomes. More memory means more memory allocation, more code, and more potential for paging activity.

This document focuses on reducing the size of executable code. Reducing your code footprint is not just a matter of enabling code optimization in the compiler, although that does help. You can also reduce your code footprint by organizing your code so that only the minimum set of required functions is in memory at any given time. You implement this optimization by profiling your code.

Reducing the amount of memory your application allocates is also important for reducing its memory footprint; that information is covered in Memory Usage Performance Guidelines in Performance Documentation.

Organization of This Document

This programming topic contains the following articles:

Overview of the Mach-O Executable Format describes how to use the organization of the Mach-O executable format to improve code efficiency.
Managing Code Size describes several compiler options that can be used to reduce the overall size of the executable.
Improving Locality of Reference describes how to profile and reorganize code to reduce the loading time of code segments.
Reducing Shared Memory Pages describes methods to reduce the size of the __DATA segment.
Minimizing Your Exported Symbols shows how to identify and eliminate unnecessary symbol information in your code.

Overview of the Mach-O Executable Format

Mach-O is the native executable format for binaries in OS X and is the preferred format for shipping code. The executable format determines the order in which the code and data in the binary file are read into memory. The ordering of code and data affects memory usage and paging activity, and therefore directly affects the performance of your program.

A Mach-O binary is organized into segments. Each segment contains one or more sections. Different types of code or data go into each section. Segments always start on a page boundary, but sections are not necessarily page-aligned. The size of a segment is the number of bytes in all the sections it contains, rounded up to the next virtual memory page boundary. A segment is therefore always a multiple of 4096 bytes (4 KB), with 4096 bytes being the minimum size.
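The rounding rule can be sketched in a few lines of C. This is an illustration only: the 4096-byte page size comes from the text above, and the names SEGMENT_PAGE_BYTES and segment_size are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Page size assumed from the text above (4096 bytes). */
#define SEGMENT_PAGE_BYTES ((size_t)4096)

/* Hypothetical helper: given the total bytes of all sections in a segment,
   return the segment size after rounding up to the next page boundary. */
size_t segment_size(size_t section_bytes)
{
    /* A segment with any content occupies at least one page. */
    assert(section_bytes > 0);
    return (section_bytes + SEGMENT_PAGE_BYTES - 1)
           / SEGMENT_PAGE_BYTES * SEGMENT_PAGE_BYTES;
}
```

Under these assumptions, 10000 bytes of section data yields a 12288-byte segment (three pages), and a single byte still costs a full 4096-byte page.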

The segments and sections of a Mach-O executable are named according to their intended use. The convention for segment names is to use all uppercase letters preceded by a double underscore (for example, __TEXT); the convention for section names is to use all lowercase letters preceded by a double underscore (for example, __text).

A Mach-O executable can contain several segments, but only two of them matter for performance: the __TEXT segment and the __DATA segment.

The __TEXT Segment: Read Only

The __TEXT segment is a read-only area containing executable code and constant data. By convention, the compiler tools create every executable with at least one read-only __TEXT segment. Because the segment is read-only, the kernel can map the __TEXT segment directly from the executable into memory just once. When the segment is mapped into memory, it can be shared among all processes interested in its contents. (This is primarily the case with frameworks and other shared libraries.) The read-only attribute also means that the pages that make up the __TEXT segment never have to be saved to backing store. If the kernel needs to free physical memory, it can discard one or more __TEXT pages and re-read them from disk when they are needed.

Table 1 lists some of the more important sections that can appear in the __TEXT segment. For a complete list of segments, see Mach-O Runtime Architecture.

Table 1 Major sections in the __TEXT segment

Section	Description
__text	The compiled machine code for the executable
__const	The general constant data for the executable
__cstring	Literal string constants (quoted strings in source code)
__picsymbol_stub	Position-independent code stub routines used by the dynamic linker (dyld)

The __DATA Segment: Read/Write

The __DATA segment contains the non-constant data for an executable. This segment is both readable and writable. Because it is writable, the __DATA segment of a framework or other shared library is logically copied for each process linking with the library. When memory pages are readable and writable, the kernel marks them copy-on-write. This technique defers copying a page until one of the processes sharing that page attempts to write to it. When that happens, the kernel creates a private copy of the page for that process.

The __DATA segment has a number of sections, some of which are used only by the dynamic linker. Table 2 lists some of the more important sections that can appear in the __DATA segment. For a complete list of segments, see Mach-O Runtime Architecture.

Table 2 Major sections of the __DATA segment

Section	Description
__data	Initialized global variables (for example, int a = 1; or static int a = 1;)
__const	Constant data needing relocation (for example, char * const p = "foo";)
__bss	Uninitialized static variables (for example, static int a;)
__common	Uninitialized external globals (for example, int a; outside function blocks)
__dyld	A placeholder section, used by the dynamic linker
__la_symbol_ptr	“Lazy” symbol pointers. Symbol pointers for each undefined function called by the executable
__nl_symbol_ptr	“Non-lazy” symbol pointers. Symbol pointers for each undefined data symbol referenced by the executable

Mach-O Performance Implications

The composition of the __TEXT and __DATA segments of a Mach-O executable has a direct bearing on performance. The techniques and goals for optimizing these segments are different. However, they share a common goal: greater efficiency in the use of memory.

Most of a typical Mach-O file consists of executable code, which occupies the __text section of the __TEXT segment. As described above in The __TEXT Segment: Read Only, the __TEXT segment is read-only and is mapped directly from the executable file. Thus, if the kernel needs to reclaim the physical memory occupied by some __text pages, it does not have to save those pages to backing store and page them in later. It only needs to free the memory and, when the code is referenced later, read it back in from disk. Although this is cheaper than swapping, because it involves one disk access instead of two, it can still be expensive, especially if many pages have to be recreated from disk.

One way to improve this situation is to improve your code’s locality of reference through procedure reordering, as described in Improving Locality of Reference. This technique groups methods and functions together based on the order in which they are executed, how often they are called, and how often they call one another. If pages in the __text section group functions logically in this way, they are less likely to be freed and read back in multiple times. For example, if you put all of your launch-time initialization functions on one or two pages, those pages do not have to be recreated after the initializations have occurred.

Unlike the __TEXT segment, the __DATA segment can be written to, and so the pages in the __DATA segment are not shareable. Non-constant global variables in frameworks can affect performance because each process that links with the framework gets its own copy of those variables. The main solution to this problem is to move as many of the non-constant global variables as possible into the __const section of the __TEXT segment by declaring them const. Reducing Shared Memory Pages describes this and related techniques. This is usually not a problem for applications, because the __DATA segment of an application is not shared with other applications.
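As a rough illustration of the const technique, the C sketch below declares the same data both ways. All names are hypothetical, and the section placement described in the comments is what the text and Table 2 lead one to expect; the actual placement depends on the compiler and its settings.

```c
#include <assert.h>
#include <string.h>

/* Writable global: lives in the __DATA segment, so each process that
   writes to its page ends up with a private copy. */
int retry_limits[3] = { 1, 5, 30 };

/* The same values declared const: eligible for a read-only section that
   can be shared between processes. */
const int retry_limits_ro[3] = { 1, 5, 30 };

/* A const character array holds its bytes directly. */
const char product_name[] = "MyApp";

/* A const pointer to a string is the "constant data needing relocation"
   case from Table 2, because the stored address may need fixing up at
   load time. */
const char *const product_name_ptr = "MyApp";
```

Both spellings give callers the same values; the difference is only in where the data is expected to land in the binary.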

The compiler stores different types of non-constant global data in different sections of the __DATA segment. These types of data are uninitialized static data and symbols that match the ANSI C notion of a “tentative definition” and are not declared extern. Uninitialized static data is in the __bss section of the __DATA segment. Tentative-definition symbols are in the __common section of the __DATA segment.

The ANSI C and C++ standards specify that the system must set uninitialized static variables to zero. (Other types of uninitialized data are left uninitialized.) Because uninitialized static variables and tentative-definition symbols are stored in separate sections, the system needs to treat them differently. However, when variables are in different sections, they are more likely to end up on different pages, so they can be paged in and out separately, making your code run slower. The solution to these problems, as described in Reducing Shared Memory Pages, is to consolidate the non-constant global data into one section of the __DATA segment.
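The three kinds of non-constant global data can be shown side by side in C. The variable names are invented for this sketch, and the section names in the comments repeat what the text and Table 2 say; modern compilers may place tentative definitions differently depending on their settings.

```c
#include <assert.h>

static int launch_count;   /* uninitialized static: __bss, per the text above */
int pending_events;        /* tentative definition: __common, per the text above */
int cache_enabled = 1;     /* initialized global: __data, per Table 2 */

/* Hypothetical check: returns 1 while every global above still holds its
   load-time value. The C standard guarantees that the two uninitialized
   globals start at zero. */
int globals_have_load_time_values(void)
{
    return launch_count == 0 && pending_events == 0 && cache_enabled == 1;
}
```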

Managing Code Size

The GCC compiler supports a variety of options for optimizing code. Most of these techniques will produce less code or faster code, depending on your needs. When you’re ready to release software, you should try out these techniques to see which ones work best for your code.

Compiler-Level Optimizations

When your project code is stable, you should start experimenting with basic GCC options for optimizing your code. The GCC compiler supports optimization options that allow you to choose whether to use smaller binary size, faster code, or faster build times.

For new projects, Xcode automatically disables optimizations for the Development build style and selects the “Fastest, Smallest” option for the Deployment build style. Code optimization of any kind results in slower build times because of the extra work involved in the optimization process. If your code is changing (as it does during the development cycle), you do not want optimizations enabled. As you near the end of the development cycle, the Deployment build style can give you an indication of the size of the finished product.

Table 1 lists the optimization levels available in Xcode. When you select one of these options, Xcode passes the appropriate flags to the GCC compiler for the given group or files. These options are available at the target level or as part of a build style. For information about build settings for your project, see Xcode Help.

Table 1 GCC Compiler Optimization Options

Xcode Setting	Description
None	The compiler does not attempt to optimize the code. Use this option during development when you are focused on solving logic errors and need to compile quickly. Do not use this option for shipping your executable.
Fast	The compiler performs simple optimizations to improve code performance while minimizing the impact on compile time. This option also uses more memory during compilation.
Faster	Performs nearly all supported optimizations that do not require a space-time trade-off. The compiler does not perform loop unrolling or function inlining with this option. This option increases both compilation time and the performance of the generated code.
Fastest	Performs all optimizations in an attempt to improve the speed of the generated code. This option can increase the size of the generated code because the compiler performs aggressive function inlining. This option is generally not recommended. For more information, see Avoid Excessive Function Inlining.
Fastest, smallest	Performs all optimizations that do not typically increase code size. This is the preferred option for shipping code because it gives the executable a smaller memory footprint.

As with any performance enhancement, do not make assumptions about which option will give you the best results. You should always measure the results of each optimization you try. For example, the “Fastest” option might generate extremely fast code for a particular module, but it usually does so at the expense of executable size. Any speed advantage you gain from the generated code is easily lost if the code needs to be paged in from disk at run time.

Additional Optimizations

In addition to code-level optimization, you can use additional techniques to organize code at module level. The following sections describe these techniques.

Dead Strip Your Code

For statically linked executables, dead-code stripping is the process of removing unreferenced code from the executable. The idea behind dead stripping is that if code is unreferenced, it cannot be used and is therefore not needed in the executable. Removing dead code reduces the size of your executable and can help reduce paging.

Starting with Xcode Tools version 1.5, the static linker (ld) supports dead stripping of executables. You can enable this feature directly from Xcode or by passing the appropriate command-line options to the static linker.

To enable dead-code stripping in Xcode, do the following:

Select your target.
Open the Inspector or Get Info window and select the Build tab.
In Linking Settings, enable the Dead Code Stripping option.

TARGETS -> Build Settings -> search Linking -> Dead Code Stripping set to YES/NO.

In Code Generation Settings, set the Level of Debug Symbols option to All Symbols.

TARGETS -> Build Settings -> search for All Symbols -> Strip Style set to All Symbols / Non-Global Symbols / Debugging Symbols (default is All Symbols).

To enable dead-code stripping from the command line, pass the -dead_strip option to ld. You should also pass the -gfull option to GCC to generate a complete set of debugging symbols for your code. The linker uses this extra debugging information to dead strip the executable.

Note: Using the “All Symbols” or -gfull option is recommended even if you do not plan to dead strip. Although the option produces larger intermediate files, it typically results in smaller executables because the linker is able to remove duplicate symbol information more effectively.

If you do not want to remove any unused functions, you should at least isolate them in a separate section of your __TEXT segment. Moving unused functions to a common section improves the locality of reference of your code and reduces the likelihood of their being loaded into memory. For more information on how to group functions in a common section, see Improving Locality of Reference.

Strip Symbol Information

Debugging symbols and dynamic-binding information can take up a lot of space and account for a significant portion of the size of the executable file. All unnecessary symbols should be removed before shipping code.

To remove debugging symbols from your executable, change the Xcode build-style setting to “Deployment” and rebuild the executable. You can also configure the generation of debugging symbols on a target-by-target basis if you prefer. For more information about build styles and target settings, see Xcode Help.

To remove dynamic-binding symbols from your executable manually, use the strip tool. This tool removes the symbol information that the dynamic linker normally uses to bind external symbols at run time. Removing the symbols for functions that you do not want bound dynamically reduces the size of your executable and reduces the number of symbols the dynamic linker must bind. Typically, you use this command without any options to remove non-external symbols, as shown in the following example:

% cd ~/MyApp/MyApp.app/Contents/MacOS
% strip MyApp

This command is equivalent to running strip with the -u and -r options. It removes any symbols marked as non-external but does not remove symbols marked as external.

An alternative to stripping out dynamic-binding symbols manually is to use an exports file to limit the symbols exported at build time. An exports file identifies the specific symbols available at run time from your executable. For more information on creating an exports file, see Minimizing Your Exported Symbols.

Eliminate C++ Exception Handling Overhead

When an exception is thrown, the C++ runtime library must be able to unwind the stack back to the point of the first matching catch block. To do this, the GCC compiler generates stack unwinding information for each function that may throw an exception. This unwinding information is stored in the executable and describes the objects on the stack. It makes it possible to call the destructors of those objects to clean them up when an exception is thrown.

Even if your code does not throw exceptions, the GCC compiler still generates stack unwinding information for C++ code by default. If you use exceptions extensively, this extra code can significantly increase the size of your executable.

Disabling Exceptions

You can disable exception handling in Xcode by disabling the “Enable C++ Exceptions” build option for your target. From the command line, pass the -fno-exceptions option to the compiler. This option removes the stack unwinding information for your functions. However, you must still remove any try, catch, and throw statements from your code.

Selectively Disabling Exceptions

If your code uses exceptions in some places but not everywhere, you can explicitly identify methods that do not need unwinding information by adding an empty exception specification to the method declaration. For example, in the following code, the compiler must generate stack unwinding information for my_function because my_other_function or a function it calls may throw an exception.

extern int my_other_function (int a, int b);
int my_function (int a, int b)
{
   return my_other_function (a, b);
}

However, if you know that my_other_function cannot throw an exception, you can signal to the compiler by including an empty exception specification (throw ()) in the function declaration. Therefore, you can declare the previous function as follows:

// throw () is the pre-C++11 spelling; C++11 and later write noexcept instead.
extern int my_other_function (int a, int b) throw ();
int my_function (int a, int b) throw ()
{
   return my_other_function (a, b);
}

Minimizing the Use of Exceptions

When you write your code, consider your use of exceptions carefully. Exceptions should be used to indicate exceptional circumstances; that is, they should be used to report problems that you did not anticipate. If you get an end-of-file error while reading from a file, you do not want to throw an exception, because this is a known type of error that can be handled easily. If you try to read from a file you know to be open and you are told that the file ID is invalid, you might want to throw an exception.

Avoid Excessive Function Inlining

Although inline functions can improve speed in some cases, they can also degrade performance on OS X if overused. Inline functions eliminate the overhead of calling functions, but do so by replacing each function call with a copy of the code. If inline functions are called frequently, this extra code can quickly add up, swelling the executable and causing paging problems.

When used properly, inline functions can save time and have a minimal impact on your code footprint. Remember that the code for an inline function should generally be very short and called from few places. If the time it takes to execute the code in a function is less than the time it takes to call the function, the function is a good candidate for inlining. In general, this means that an inline function should have no more than a few lines of code. You should also make sure that the function is called from as few places in your code as possible. Even a very short function can cause bloat if it is inlined in dozens or hundreds of places.
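The sketch below shows the kind of function the guideline above has in mind. It is illustrative only: clamp_to_byte and blend are hypothetical names, and whether the compiler actually inlines the call depends on the optimization level.

```c
#include <assert.h>

/* A one-line body: executing it plausibly costs less than a function call,
   which makes it a reasonable inlining candidate. */
static inline int clamp_to_byte(int value)
{
    return value < 0 ? 0 : (value > 255 ? 255 : value);
}

/* The only call site in this sketch, so inlining adds at most one copy of
   the body to the executable. */
int blend(int a, int b)
{
    return clamp_to_byte((a + b) / 2);
}
```

If clamp_to_byte were instead called from hundreds of places, each call site could receive its own copy of the body, which is the bloat the paragraph above warns about.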

You should also generally avoid GCC’s “Fastest” optimization level. At this level, the compiler aggressively tries to inline functions, even ones that are not marked inline. Unfortunately, doing so can significantly increase the size of your executable and cause far worse performance problems due to paging.

Build Frameworks as a Single Module

Most shared libraries do not need the module features of the Mach-O runtime. In addition, cross-module calls incur the same overhead as cross-library calls. You should therefore link all of your project’s intermediate object files into a single module.

To combine your object files, you must pass the -r option to ld during the link phase. If you use Xcode to build your code, this is done for you by default.

Improving Locality of Reference

An important way to improve your application’s performance is to reduce the number of virtual memory pages it uses at any given time. This set of pages is called the working set and consists of application code and run-time data. Reducing the amount of in-memory data is a function of your algorithms, but reducing the amount of in-memory code can be achieved through a process called scatter loading. This technique is also known as improving the locality of reference of your code.

Typically, the compiled code for methods and functions is organized by source file in the resulting binary. Scatter loading changes this organization and instead groups related methods and functions together, regardless of their original location. This process lets the kernel keep an active application’s most frequently referenced executable pages in the smallest memory space possible. This not only reduces the application’s footprint but also reduces the likelihood of those pages being paged out.

Important: You should generally wait until very late in your development cycle to scatter load your application. Code tends to move around during development, which invalidates earlier profiling results.

Profiling Code With Gprof

Given profiling data collected at run time, gprof produces an execution profile of a program. The effect of called routines is incorporated in the profile of each caller. The profile data is taken from the call graph profile file (gmon.out by default), which is created by a program that is compiled and linked with the -pg option. The symbol table in the executable is correlated with the call graph profile file. If more than one profile file is specified, the gprof output shows the sum of the profile information in the given profile files.

The gprof tool is useful for many purposes, including the following:

Situations where the Sampler application does not work well, such as command-line tools or applications that quit after a short time
Situations where you want a call graph that includes all the code that might be called in a given program, rather than a periodic sampling of calls
Situations where you want to change the link order of your code to optimize code locality

Generating Profiling Data

Before you can profile your application, you must set up your project to generate profiling information. To generate profiling information for an Xcode project, you must modify your target or build-style settings to include the “Generate profiling code” option (TARGETS -> Build Settings -> Build Options -> Generate profiling code: YES/NO). See Xcode Help for information about target and build-style settings.

The profiling code within the program generates a file named gmon.out that contains profiling information. (Normally, this file is in the current working directory.) To analyze the data in this file, copy it to the directory containing the executable before calling gprof, or just specify the path to gmon.out when you run gprof.

In addition to profiling your own code, you can find out how much time is spent in the Carbon and Cocoa frameworks by linking against their profile versions. To do this, add the DYLD_IMAGE_SUFFIX setting to your target or build style and set its value to _profile. The dynamic linker combines this suffix with the framework name to link against the profile version of the framework. To determine which frameworks support profiling, look at the frameworks themselves. For example, the Carbon library comes with both profile and debug versions.

Note: The profile and debug versions of libraries are installed as part of the developer tools package and may not be available on user systems. Make sure your shipping executable does not link against one of these libraries.

Generating Order Files

An order file contains an ordered sequence of lines, each consisting of a source file name and a symbol name, separated by a colon with no additional spaces. Each line represents a block to be placed in a section of the executable. If you modify an order file by hand, you must follow this format exactly so that the linker can process the file. If the object file name:symbol name pair is not exactly the name seen by the linker, the linker does its best to match the names against the objects being linked.

The lines in an order file for procedure reordering consist of an object filename and a procedure name (function, method, or other symbol). The order in which procedures are listed in the order file is the order in which they are linked into the __text section of the executable.
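To make the line format concrete, here is a small C sketch that builds one order-file line. The function format_order_line is a hypothetical helper written for this illustration (it is not part of gprof or the linker), and foo.o and _main are placeholder names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "object_file:symbol" into buf, with a single
   colon and no extra spaces. Returns 0 on success, or -1 if the line does
   not fit in a buffer of `size` bytes. */
int format_order_line(char *buf, size_t size,
                      const char *object_file, const char *symbol)
{
    size_t needed;

    assert(buf != NULL && object_file != NULL && symbol != NULL);

    /* Object name, one colon, symbol name, terminating NUL. */
    needed = strlen(object_file) + 1 + strlen(symbol) + 1;
    if (needed > size)
        return -1;

    snprintf(buf, size, "%s:%s", object_file, symbol);
    return 0;
}
```

Calling format_order_line(line, sizeof line, "foo.o", "_main") with a large enough buffer leaves the text foo.o:_main in line.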

To create an order file from the profiling data generated by a profiled program, run gprof with the -S option (see the man page for gprof). For example:

gprof -S MyApp.profile/MyApp gmon.out

The -S option generates four mutually exclusive order files:

File	Description
gmon.order	Ordering based on a “closest is best” analysis of the profiling call graph. Routines that call each other frequently are placed close together.
callf.order	Routines sorted by the number of calls made to each routine, largest numbers first.
callo.order	Routines sorted by the order in which they are called.
time.order	Routines sorted by the amount of CPU time spent, routines with the largest times first.
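As an illustration of the callf.order ordering (most-called routines first), the C sketch below sorts a small table of call counts. It is not how gprof is implemented; struct routine and sort_by_call_count are hypothetical names used only here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record: one routine and how many times it was called. */
struct routine {
    const char *name;
    unsigned long calls;
};

/* qsort comparator: larger call counts sort first. */
static int by_calls_descending(const void *lhs, const void *rhs)
{
    const struct routine *a = lhs;
    const struct routine *b = rhs;

    if (a->calls != b->calls)
        return a->calls < b->calls ? 1 : -1;
    return 0;
}

/* Reorders the array so the most frequently called routines come first. */
void sort_by_call_count(struct routine *routines, size_t count)
{
    if (routines == NULL || count == 0)
        return;
    qsort(routines, count, sizeof *routines, by_calls_descending);
}
```

The other three files follow the same idea with a different sort key or, for gmon.order, a call-graph analysis rather than a simple sort.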

You should try each of these files to see which, if any, gives the biggest performance improvement. See Using Pagestuff to Examine Pages on Disk for a discussion of how to measure the results of the ordering.

These order files contain only the procedures that were used during profiling. The linker keeps track of the missing procedures and links them in the default order after those listed in the order file. Static names for library functions are generated in the order file only if the project directory contains a file generated by the linker’s -whatsloaded option. For more information, see Creating a Default Order File.

The gprof -S option does not work with executables that have already been linked using an order file.

Fixing Up Your Order Files

After you generate your order files, you should look through them and make sure they are correct. There are many cases in which you need to edit order files manually, including the following:

Your executable contains assembly-language files.
You profiled a stripped executable.
Your executable contains files that were not compiled with the -g option.
Your project defines internal labels (commonly used with goto statements).
You want to preserve the order of routines in a particular object file.

If a symbol is defined in an assembly file, in a stripped executable, or in a file compiled without the -g option, gprof omits the source file name from the symbol’s entry in the order file. If your project uses such files, you must edit the order file manually and add the appropriate source filenames. Alternatively, you can remove the symbol references entirely to force the corresponding routines to be linked in the default order.

If your code contains internal labels, you must remove these labels from the order files; otherwise, the function that defines the label will be split apart during the link phase. You can prevent internal labels in assembly files from being included at all by prefixing them with the string L_. The assembler interprets symbols with this prefix as local to a particular function and strips them so that other tools, such as gprof, cannot access them. (Compare the L-prefixed assembly labels in objc-msg-arm64.s.)

To preserve the order of routines in a particular object file, use the special symbol .section_all. For example, if the object file foo.o comes from assembly source and you want to link all of its routines without reordering them, remove all existing references to foo.o and insert the following line in the order file:

foo.o:.section_all

This option is useful for object files compiled from assembly source or for which you do not have the source.

Linking with an Order File

After you generate an order file, you can link the program using the -sectorder and -e start options:

cc -o outputFile inputFile.o ... -sectorder __TEXT __text orderFile -e start

To use an order file in your Xcode project, change the “Other Linker Flags” option in the Deployment build style of your project. Add the text -sectorder __TEXT __text orderFile to this setting to specify your order file.

If any inputFile is a library rather than an object file, you may need to edit the order file before linking to replace all references to the object file with references to the corresponding library file. Again, the linker does its best to match the names in the order file with the objects it is linking.

With these options, the linker constructs the executable outputFile so that the contents of the __text section of the __TEXT segment are built from the __text sections of the input files. The linker arranges the routines in the input files in the order listed in orderFile.

As the linker processes the order file, it places procedures whose object-file and symbol-name pairs are not listed in the order file into the __text section of outputFile after the listed ones, linking them in the default order. Object-file and symbol-name pairs that are listed more than once always generate a warning, and the first occurrence of the pair is used.

By default, the linker prints a summary of the number of symbol names in the linked objects that are not in the order file, the number of symbol names in the order file that are not in the linked objects, and the number of symbol names it tried to match that were ambiguous. To request a detailed listing of these symbols, use the -sectorder_detail option.

The linker’s -e start option preserves the entry point of the executable. The symbol start (note the lack of a leading “_”) is defined in the C runtime shared library /usr/lib/crt1.o; it represents the first text address in your program when you link normally. When you reorder your procedures, you must use this option to fix up the entry point. An alternative is to make /usr/lib/crt1.o:start or /usr/lib/crt1.o:section_all the first line of your order file.

Limitations of gprof Order Files

The .order files generated by gprof contain only those functions that were called or sampled when the executable was run. For library functions to appear correctly in the order file, a whatsloaded file generated by the linker should exist in the working directory.

The -S option does not work with executables that have already been linked with an order file.

Generating the gmon.order file can take a long time – you can use the -x parameter to suppress it.

File names are missing for the following items:

Files compiled without the -g option
Routines generated from assembly-language source
Executables that have had their debugging symbols removed (as with the strip tool)

Profiling With the Monitor Functions

The file /usr/include/monitor.h declares a set of functions that you can use to profile specific sections of your code programmatically. You can use these functions to gather statistics for only certain sections of your code or for all of it. You can then use the gprof tool to build the call graph and other performance analysis data from the resulting file. Listing 1 shows how to use the monitor functions.

Listing 1 Using monitor functions

#include <monitor.h>
 
    /* To start profiling: */
    moninit();
    moncontrol(1);
 
    /* To stop, and dump to a file */
    moncontrol(0);
    monoutput("/tmp/myprofiledata.out");
    monreset();

Organizing Code at Compile Time

The GCC compiler lets you specify attributes on any function or variable you declare. The section attribute lets you tell GCC which segment and section a particular piece of code should be placed in.

Warning: Do not use the section attribute unless you understand the structure of Mach-O executables and know the rules for placing functions and data in their segments. Putting a function or global variable in the wrong section can break your program.

The section attribute takes several parameters that control where the resulting code is placed. At a minimum, you must specify segment and section names for the code to be placed. Other options are also available and are described in the GCC documentation.

The following listing shows how to use the section attribute on a function. In this case, the section attribute is added to the forward declaration of the function. The attribute tells the compiler to place the function in a particular __text section of the executable file.

void MyFunction (int a) __attribute__((section("__TEXT,__text.10")));

The following listing shows some examples of how to organize global variables using the section attribute.

extern const int x __attribute__((section("__TEXT,__my_const")));
const int x=2;
 
extern char foo_string[] __attribute__((section("__DATA,__my_data")));
char foo_string[] = "My text string\n";

For detailed information on specifying section attributes, see the GCC compiler documentation in /Developer/Documentation/DeveloperTools/gcc3.
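
As a sketch of how the section attribute might be wrapped for everyday use, the macro below groups a rarely used function into its own named section. The macro name COLD_CODE, the section name __cold_text, and both functions are hypothetical and chosen only for illustration. The attribute is applied only when building for Apple platforms (__APPLE__); on other targets the macro expands to nothing and the code compiles unchanged.

```c
#include <stdio.h>

/* Hypothetical helper macro: on Apple (Mach-O) targets, place a function
 * in a custom section of the __TEXT segment. The section name
 * "__cold_text" is an assumption made for this sketch. */
#if defined(__APPLE__)
#define COLD_CODE __attribute__((section("__TEXT,__cold_text")))
#else
#define COLD_CODE
#endif

/* The forward declaration carries the attribute, as in the listing above. */
int ReportFatalError(int code) COLD_CODE;
int AddSamples(int a, int b);

/* Rarely executed error path: grouped with other cold code. */
int ReportFatalError(int code)
{
    fprintf(stderr, "fatal error %d\n", code);
    return -code;
}

/* Frequently executed path: left in the default __text section. */
int AddSamples(int a, int b)
{
    return a + b;
}
```

Keeping the attribute behind a single macro makes it easy to turn the custom placement off again if it turns out to cause problems, which matters given the warning above.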

Reordering the __text Section

As described in Overview of the Mach-O Executable Format, the __TEXT segment holds the actual code and the read-only portions of your program. By convention, the compiler tools place procedures from your Mach-O object files (with the extension .o) in the __text section of the __TEXT segment.

When the program runs, pages from the __text section are loaded into memory on demand, as the routines on those pages are used. Code is linked into the __text section in the order in which it appears in a given source file, and source files are linked in the order in which they are listed on the linker command line (or in the order specified in Xcode). Thus, code from the first object file is linked from start to finish, followed by code from the second file, the third, and so on.

Loading code in source file declaration order is rarely optimal. For example, suppose that some methods or functions in your code are called repeatedly and others are rarely used. Reordering procedures to place commonly used code at the beginning of the __text section minimizes the average number of pages used by an application, thereby reducing paging activity.

As another example, assume that all objects defined by your code are initialized at the same time. Because the initialization routine for each class is defined in a separate source file, the initialization code is ordinarily scattered across the __text section. By reordering the initialization code for all classes so that it is contiguous, you reduce the number of pages that need to be read in, which improves initialization performance. The application then needs only a few pages that are full of initialization code, rather than a large number of pages that each contain a small piece of it.
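
The arithmetic behind this example can be sketched as follows. The function names and the numbers used in the note after the code (50 classes, 200-byte initialization routines, 4096-byte pages) are assumptions for illustration, not measurements.

```c
#include <stddef.h>

/* Pages needed when the routines are linked back to back, starting at a
 * page boundary (rounds up to whole pages). */
size_t PagesIfContiguous(size_t routineCount, size_t bytesPerRoutine, size_t pageSize)
{
    size_t totalBytes = routineCount * bytesPerRoutine;
    return (totalBytes + pageSize - 1) / pageSize;
}

/* Worst case when every routine sits on a different page and none of them
 * crosses a page boundary: one page per routine. */
size_t PagesIfScattered(size_t routineCount)
{
    return routineCount;
}
```

Under these assumptions, PagesIfContiguous(50, 200, 4096) gives 3 pages, while the scattered worst case touches 50.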

Reordering Procedures

Depending on the size and complexity of your application, you should adopt a strategy for sorting code to maximize the performance of your executable files. As with most performance tuning, the more time you spend measuring and reordering your programs, the more memory you save. It is easy to get a good first-cut ordering by running the application and ordering the routines by call frequency. The steps of this strategy are listed below and detailed in the following sections:

Build a profile version of the application. This step generates an executable file containing symbols used in the analysis and reordering process.
Run and use the application to create a set of profile data. Perform a series of functional tests, or have someone use the program during testing.

Important: For best results, concentrate on your most typical usage patterns. Avoid exercising every feature of your application, or your profile data may become diluted. For example, focus on launch time and on the time it takes to activate and deactivate your main window. Do not bring up ancillary windows.

Create order files. Order files list procedures in optimized order. The linker uses order files to reorder the procedures in the executable.
Run the linker using the order files. This creates an executable in which the procedures are linked into the __text section as specified in the order file.

For information on profiling your code and on generating and linking with an order file, see Profiling Code With gprof.

Procedure Reordering for Large Programs

For many programs, the ordering generated by the steps above brings a substantial improvement over unordered procedures. For a simple application with few features, such an ordering captures most of the gains to be had from procedure reordering. Larger applications and other large programs, however, can benefit greatly from additional analysis. While order files based on call frequency or the call graph are a good start, you can use your knowledge of the structure of your application to further reduce the virtual-memory working set.

Creating a Default Order File

If you want to reorder your application’s procedures using a technique other than those described above, you can skip the profiling steps and start with a default order file that lists all the routines of your application. Once you have a list of the routines in suitable form, you can rearrange the entries by hand or with a sorting technique of your choice. You can then use the resulting order file with the linker’s -sectorder option, as described in Linking with an Order File.

To create a default order file, first run the linker with the -whatsloaded option:

cc -o outputFile inputFile.o -whatsloaded > loadedFile

This creates a file, loadedFile, that lists the object files loaded into the executable, including any in frameworks or other libraries. The -whatsloaded option can also be used to make sure that the order files generated by gprof -S include the names of procedures in static libraries.

Using the file loadedFile, you can run nm with the -onjls options and the __TEXT __text arguments:

nm -onjls __TEXT __text `cat loadedFile` > orderFile

The contents of orderFile are the symbol table for the text section. Procedures are listed in the symbol table in their default link order. You can rearrange the entries in this file to change the order in which the procedures are linked, and then run the linker as described in Linking with an Order File.

Using pagestuff to Examine Pages on Disk

The pagestuff tool helps you measure the effectiveness of your procedure ordering by telling you which pages of the executable are likely to be loaded in memory at a given time. This section briefly describes how to use the tool. For more information, see the pagestuff man page.

The pagestuff tool prints the symbols on a particular page of executable code. The syntax of the command is as follows:

pagestuff filename [pageNumber | -a]

The output of pagestuff is a list of the procedures contained in filename on page pageNumber. To view all the pages of the file, use the -a option in place of the page number. This output lets you determine whether each page associated with the file in memory is optimized. If it is not, you can rearrange the entries in the order file and link the executable again to maximize performance. For example, move two related procedures together so that they are linked onto the same page. Perfecting the ordering may require several cycles of linking and tuning.
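
To make the page-level view concrete, the following sketch computes which page a routine starts on, whether two routines start on the same page, and how many pages a routine touches. The 4096-byte page size and all function names are assumptions for illustration; pagestuff itself reports the real layout of your executable.

```c
#include <stddef.h>

/* Assumed page size for this sketch; the real value is system-dependent. */
#define SKETCH_PAGE_SIZE 4096u

/* Zero-based index of the page that contains the given byte offset. */
size_t PageOfOffset(size_t offset)
{
    return offset / SKETCH_PAGE_SIZE;
}

/* Returns 1 if two routines, given their start offsets, start on the
 * same page, and 0 otherwise. */
int StartOnSamePage(size_t offsetA, size_t offsetB)
{
    return PageOfOffset(offsetA) == PageOfOffset(offsetB);
}

/* Number of pages touched by a routine of `size` bytes at `offset`. */
size_t PagesTouched(size_t offset, size_t size)
{
    if (size == 0)
        return 0;
    return PageOfOffset(offset + size - 1) - PageOfOffset(offset) + 1;
}
```

A routine that starts near the end of a page can touch two pages even when it is small, which is one reason an ordering may need several rounds of adjustment.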

Grouping Routines According to Usage

Why should you generate profile data for individual operations of your application? The strategy is based on the assumption that a large application has three general groups of routines:

Hot routines run during the most common usage of the application. These are often primitive routines that provide a foundation for the application’s features (for example, routines for accessing a document’s data structures) or routines that implement the core features of the application, such as the routines that handle typing in a word processor. These routines should be clustered together in the same set of pages.
Warm routines implement specific features of the application. They are usually associated with features that the user performs only occasionally (such as launching, printing, or importing graphics). Because these routines are used reasonably often, cluster them together in the same small set of pages so that they load quickly. However, because there are long periods when users do not access this functionality, these routines should not be placed in the hot category.
Cold routines are rarely used in the application. Cold routines implement obscure features or cover boundary or error cases. Group these routines together to avoid wasting space on a hot or warm page.

At any given time, you should expect most of the hot pages to be resident, and you should expect the warm pages to be resident for the features the user is currently using. Only very rarely should a cold page be resident.

To achieve this ideal ordering, gather a number of profile data sets. First, gather the hot routines. As described above, compile the application for profiling, launch it, and use the program. Using gprof -S, generate from the profile data a frequency-sorted order file called hot.order.

After creating the hot order file, create order files for features that users use occasionally, such as routines that run only when the application is launched. Printing, opening documents, importing images, and using various non-document windows and tools are other examples of features that users use occasionally but not continually, and they are good candidates for order files of their own. Naming each of these order files after the feature being profiled (for example, feature.order) is recommended.

Finally, to generate a list of all routines, build a “default” order file, default.order (as described in Reordering Procedures).

Once you have these order files, you can combine them using the code shown in Listing 2. You can use this listing to build a command-line utility that removes duplicate lines from the order files while preserving the order of the original data.

Listing 2 Code for Unique.c

//
// unique
//
// A command for combining files while removing
// duplicate lines of text. The order of other lines of text
// in the input files is preserved.
//
// Build using this command line:
//
// cc -ObjC -O -o unique -framework Foundation Unique.c
//
// Note that "unique" differs from the BSD command "uniq" in that
// "uniq" combines duplicate adjacent lines, while "unique" does not
// require duplicate lines to be adjacent. "unique" is also spelled
// correctly.
 
#import <stdio.h>
#import <string.h>
#import <Foundation/NSSet.h>
#import <Foundation/NSData.h>
 
#define kBufferSize 8*1024
 
void ProcessFile(FILE *fp)
{
    char buf[ kBufferSize ];
 
    static id theSet = nil;
 
    if( theSet == nil )
    {
        theSet = [[NSMutableSet alloc] init];
    }
 
    while( fgets(buf, kBufferSize, fp) )
    {
        id dataForString;
 
        dataForString = [[NSData alloc] initWithBytes:buf length:strlen(buf)];
 
        if( ! [theSet containsObject:dataForString] )
        {
            [theSet addObject:dataForString];
            fputs(buf, stdout);
        }

        [dataForString release];
    }
}

int main( int argc, char *argv[] )
{
    int     i;
    FILE *  theFile;
    int     status = 0;
 
    if( argc > 1 )
    {
        for( i = 1; i < argc; i++ )
        {
            if( (theFile = fopen( argv[i], "r" )) != NULL )
            {
                ProcessFile( theFile );
                fclose( theFile );
            }
            else
            {
                fprintf( stderr, "Could not open '%s'\n", argv[i] );
                status = 1;
                break;
            }
        }
    }
    else
    {
        ProcessFile( stdin );
    }
 
    return status;
}

Once built, use the program to generate the final order file with the following syntax:

unique hot.order feature1.order ... featureN.order default.order > final.order
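
Listing 2 depends on the Foundation framework. Where that is not available, the same idea can be sketched in plain C. This is not the original tool: the type LineSet and the functions below are hypothetical names, and the duplicate check is a simple linear search rather than a hashed set, which is slower on very large order files.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Growable list of the lines seen so far (linear search; a sketch only). */
typedef struct {
    char  **lines;
    size_t  count;
    size_t  capacity;
} LineSet;

/* Returns 1 if the line was already recorded, 0 otherwise. */
int LineSetContains(const LineSet *set, const char *line)
{
    size_t i;
    for (i = 0; i < set->count; i++) {
        if (strcmp(set->lines[i], line) == 0)
            return 1;
    }
    return 0;
}

/* Records a line. Returns 1 if it was added, 0 if it was a duplicate,
 * and -1 on allocation failure. */
int LineSetAdd(LineSet *set, const char *line)
{
    char  *copy;
    size_t length;

    if (LineSetContains(set, line))
        return 0;

    if (set->count == set->capacity) {
        size_t newCapacity = set->capacity ? set->capacity * 2 : 64;
        char **grown = realloc(set->lines, newCapacity * sizeof(char *));
        if (grown == NULL)
            return -1;
        set->lines = grown;
        set->capacity = newCapacity;
    }

    length = strlen(line) + 1;
    copy = malloc(length);
    if (copy == NULL)
        return -1;
    memcpy(copy, line, length);
    set->lines[set->count++] = copy;
    return 1;
}

/* Copies each line of `in` to `out` the first time it is seen. */
void CopyUniqueLines(FILE *in, FILE *out, LineSet *seen)
{
    char buf[8 * 1024];
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (LineSetAdd(seen, buf) == 1)
            fputs(buf, out);
    }
}
```

A main function would call CopyUniqueLines once per input file with a single shared LineSet, mirroring the way ProcessFile is used in Listing 2.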

Of course, the real test of the ordering is how much paging I/O is reduced. Run your application, use different features, and examine how well your ordering performs under different conditions. You can use the top tool (among others) to measure paging performance.

Finding That One Last Hot Routine

After reordering, you will usually have a region of pages containing cold routines that you expect to be used rarely, often at the end of the text ordering. However, one or two hot routines might slip through the cracks and land in this cold region. This is a costly mistake, because using one of these hot routines now requires an entire page to be resident, a page that is otherwise filled with cold routines that are unlikely to be used.

Check that the cold pages of your executable are not being paged in unexpectedly. Look for resident pages with high page offsets in the cold region of your application’s text segment. If there is an unwanted page, you need to find out which routine on that page is being called. One way to do this is to profile during the particular operation that touches the page, and then use the grep tool to search the profiler output for the routines that reside on that page. Alternatively, a quick way to identify where a page is being touched is to run the application under the gdb debugger and use the Mach call vm_protect to disallow all access to that page:

(gdb) p vm_protect(task_self(), startpage_addr, vm_page_size, FALSE, 0);

After the page protections are cleared, any access to that page causes a memory fault, which breaks the program in the debugger. At this point, you can simply look at the function call stack (using the bt command) to learn why the routine was being called.

Reordering Other Sections

You can use the linker’s -sectorder option to organize blocks in most sections of the executable. The sections that occasionally benefit from reordering are literal sections, such as the __cstring section of the __TEXT segment, and the __data section of the __DATA segment.

Reordering Literal Sections

Lines in the order file of literal sections can be generated most easily using ld and otool. For literal sections, otool creates an order file of a specific type for each type of literal section:

For C string literal sections, the order-file format is one literal C string per line (ANSI C escape sequences are allowed in the C string). For example, a line might look like this:

Hello world\n

For 4-byte literal sections, the order-file format is one 32-bit hexadecimal number with a leading 0x per line; the rest of the line is treated as a comment. For example, a line might look like this:

0x3f8ccccd (1.10000002384185790000e+00)

For 8-byte literal sections, each order-file line consists of two 32-bit hexadecimal numbers separated by white space, each with a leading 0x; the rest of the line is treated as a comment. For example, a line might look like this:

0x3ff00000 0x00000000 (1.00000000000000000000e+00)

For literal pointer sections, each line of the order file represents one pointer. A literal pointer is represented by the segment name, the section name of the literal it points to, and the literal itself, separated by colons with no extra white space. For example, a line might look like this:

__OBJC:__selector_strs:new

For all literal sections, each line of the order file is simply entered into the literal section and appears in the output file in the order given by the order file. No check is made to see whether the literal is actually present in the loaded objects.
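
The hexadecimal values in the 4-byte and 8-byte examples above are the raw bit patterns of the floating-point constants. The following sketch shows one way such lines might be produced. The function names are hypothetical, the code assumes IEEE 754 float and double, and the most-significant-word-first order for 8-byte literals is inferred from the example line above rather than taken from the linker documentation.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Writes the order-file entry for a 4-byte literal, e.g. "0x3f8ccccd".
 * Assumes float is a 32-bit IEEE 754 value. Returns snprintf's result. */
int FormatLiteral4(float value, char *out, size_t outSize)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);   /* copy out the raw bit pattern */
    return snprintf(out, outSize, "0x%08" PRIx32, bits);
}

/* Writes the order-file entry for an 8-byte literal as two 32-bit words,
 * most significant word first, e.g. "0x3ff00000 0x00000000".
 * Assumes double is a 64-bit IEEE 754 value. */
int FormatLiteral8(double value, char *out, size_t outSize)
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    return snprintf(out, outSize, "0x%08" PRIx32 " 0x%08" PRIx32,
                    (uint32_t)(bits >> 32), (uint32_t)(bits & 0xffffffffu));
}
```

In practice otool generates these lines for you, as described below; the sketch is only meant to show where the numbers come from.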

To reorder a literal section, first create a “whatsloaded” file using the ld -whatsloaded option, as described in Creating a Default Order File. Then run otool with the appropriate options, segment and section names, and filenames. The output of otool is a default order file for the specified section. For example, the following command produces an order file, cstring_order, that lists the default load order for the __cstring section of the __TEXT segment:

otool -X -v -s __TEXT __cstring `cat whatsloaded` > cstring_order

Once you have created the file cstring_order, you can edit it and rearrange its entries to optimize locality of reference. For example, you can place the literal strings your program uses most often (such as labels that appear in the user interface) at the beginning of the file. To produce the desired load order in the executable, use the following command:

cc -o hello hello.o -sectorder __TEXT __cstring  cstring_order

Reordering Data Sections

Currently, no tools exist to measure code references to data symbols. However, you might know your program’s data-referencing patterns and be able to achieve some savings by separating the data for seldom-used features from other data. One approach to reordering the __data section is to sort the data by size, so that small data items end up on as few pages as possible. For example, if a larger data item straddles two pages and two small data items share each of those pages, the larger item must be paged in whenever the smaller items are accessed. Reordering the data by size minimizes this sort of inefficiency. Because this data would normally need to be written to the virtual-memory backing store, this can be a major savings in some programs.
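
Sorting by size, as described above, can be sketched with the standard qsort function. The DataSymbol record and the function names are hypothetical; the symbol names and sizes would have to come from your own inspection of the object files.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record for one data symbol and its size in bytes. */
typedef struct {
    const char *name;
    size_t      size;
} DataSymbol;

/* qsort comparator: ascending by size. */
int CompareBySize(const void *a, const void *b)
{
    const DataSymbol *x = a;
    const DataSymbol *y = b;
    if (x->size < y->size) return -1;
    if (x->size > y->size) return 1;
    return 0;
}

/* Sorts the symbols so that the small items come first and pack together. */
void SortSymbolsBySize(DataSymbol *symbols, size_t count)
{
    qsort(symbols, count, sizeof symbols[0], CompareBySize);
}
```

Writing the sorted names out one per line, in the form your order file expects, then gives a starting point that you can refine by hand.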

To reorder the __data section, first create an order file that lists source files and symbols in the order in which you want them linked. Then link the program using the -sectorder command-line option:

cc -o outputFile inputFile.o ... -sectorder __DATA __data orderFile -e start

To use an order file with an Xcode project, modify the “Other Linker Flags” option in the Deployment build style of your project. Add the text -sectorder __DATA __data orderFile to this setting to specify your order file.

Reordering Assembly Language Code

Some additional guidelines to keep in mind when reordering routines written in assembly language:

temporary labels in assembly code

In hand-coded assembly code, watch out for branches to temporary labels that cross a non-temporary label. For example, suppose you use a label that begins with “L” or a d label (where d is a digit), as in this example:

foo: b 1f
     ...
bar: ...
1:   ...

The resulting program will not link or execute correctly, because only the symbols foo and bar make it into the object file’s symbol table. References to the temporary label 1 are compiled as offsets; as a result, no relocation entry is generated for the instruction b 1f. If the linker does not place the block associated with the symbol bar directly after the block associated with foo, the branch to 1f will not go to the correct place. Because there is no relocation entry, the linker does not know that it has to fix up the branch. The source-code change that fixes this problem is to change the label 1 to a non-temporary label (bar1, for example). You can avoid problems with object files that contain hand-coded assembly code by linking them whole, without reordering.

the pseudo-symbol .section_start

If the specified section in any input file has a non-zero size and there is no symbol whose value is the start of that section, the linker uses the pseudo-symbol .section_start as the symbol name associated with the first block in the section. The purpose of this symbol is to handle literal constants whose symbols do not persist into the object file. Because literal strings and floating-point constants are placed in literal sections, this is not a problem for the Apple compilers. You might see this symbol used by assembly-language programs or non-Apple compilers. However, you should not reorder such code; instead, link the file whole, without reordering (see Linking with an Order File).

Reducing Shared Memory Pages

As outlined in Overview of the Mach-O Executable Format, the data in the __DATA segment of a Mach-O binary is writable and therefore shareable (through copy-on-write). In low-memory situations, writable data increases the number of pages that might need to be written to disk, which slows down paging performance. For frameworks, writable data is shared initially but may be copied into the memory space of each process.

Reducing the amount of dynamic or non-constant data in an executable can have a significant impact on performance, especially for frameworks. The following sections show you how to reduce the size of the __DATA segment of your executable, and thereby the number of shared memory pages.

Declaring Data as const

The easiest way to make the __DATA segment smaller is to mark more of your data as constant. Most of the time, marking data as constant is easy. For example, if you never modify the elements of an array, you should include the const keyword in the array declaration, as shown here:

const int fibonacci_table[8] = {1, 1, 2, 3, 5, 8, 13, 21};

Remember to mark pointers as constant where appropriate. In the following example, the strings "a" and "b" are constant, but the array of pointers, foo, is not:

static const char *foo[] = {"a", "b"};
foo[1] = "c";       // OK: foo[1] is not constant.

To mark the entire declaration as constant, add the const keyword to the pointer as well. In the following example, both the array and its contents are constant:

static const char *const foo[] = {"a", "b"};
foo[1] = "c";       // NOT OK: foo[1] is constant.

Sometimes you may need to rewrite your code to separate out the constant data. For example, an array of structures in which only one field is declared const is stored in the __DATA segment, because the array as a whole is not declared const. To keep as much of that data constant as possible, split it into two parallel arrays, one marked constant and one not:

const char *const imageNames[100] = { "FooImage", /* ... */ };
NSImage *imageInstances[100] = { nil, /* ... */ };
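
For contrast with the parallel arrays above, the combined form that the paragraph describes might look like the following sketch. The structure name ImageEntry, its field names, and the array name images are hypothetical, and struct Image is a stand-in for NSImage so that the sketch is plain C.

```c
#include <stddef.h>

/* Stand-in for NSImage so that the sketch is plain C (hypothetical type). */
struct Image;

/* One array of structures. Only imageName points at constant data; the
 * array itself is not const, so all of it is stored as writable data. */
struct ImageEntry {
    const char   *imageName;
    struct Image *image;
};

struct ImageEntry images[100] = {
    { "FooImage", NULL },
    /* ... */
};
```

Because the image pointers are filled in at run time, the array cannot simply be declared const, which is why the names are moved into an array of their own.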

If an initialized data item contains pointers, the compiler cannot store the item in the __TEXT segment. The strings end up in the __cstring section of the __TEXT segment, but the rest of the data item (including the pointers to the strings) ends up in the __const section of the __DATA segment. In the following example, daytimeTable is split between the __TEXT and __DATA segments, even though it is constant:

struct daytime {
    const int value;
    const char *const name;
};
 
const struct daytime daytimeTable[] = {
    {1, "dawn"},
    {2, "day"},
    {3, "dusk"},
    {4, "night"}
};

To place the entire array in the __TEXT segment, you must rewrite this structure so that it uses a fixed-size char array instead of a string pointer, as shown in the following example:

struct daytime {
    const int value;
    const char name[6];
};
 
const struct daytime daytimeTable[] = {
    {1, {'d', 'a', 'w', 'n', '\0'}},
    {2, {'d', 'a', 'y', '\0'}},
    {3, {'d', 'u', 's', 'k', '\0'}},
    {4, {'n', 'i', 'g', 'h', 't', '\0'}}
};

Unfortunately, there is no good solution if the size of the string varies greatly, because that solution leaves a lot of unused space.
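
The cost of the fixed-size field can be estimated with a small helper. The following sketch repeats the structure from the example above, using string-literal initializers for brevity, and adds a hypothetical function, UnusedNameBytes, that counts the bytes in the name fields that hold neither characters nor the terminating '\0'.

```c
#include <stddef.h>
#include <string.h>

struct daytime {
    const int  value;
    const char name[6];   /* room for "night" plus the terminator */
};

/* String-literal initializers fill the char arrays directly. */
const struct daytime daytimeTable[] = {
    {1, "dawn"},
    {2, "day"},
    {3, "dusk"},
    {4, "night"}
};

/* Counts the bytes of the name fields that are pure padding. */
size_t UnusedNameBytes(const struct daytime *table, size_t count)
{
    size_t unused = 0;
    size_t i;
    for (i = 0; i < count; i++)
        unused += sizeof table[i].name - (strlen(table[i].name) + 1);
    return unused;
}
```

For this table the waste is small because the strings have similar lengths. A table in which one name is much longer than the rest would pay that length for every entry.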

The array is split between two segments because the compiler always stores constant strings in the __cstring section of the __TEXT segment. If the compiler stored the rest of the array in the __data section of the __DATA segment, the strings and the pointers to the strings could end up on different pages. If that happens, the system has to update the pointers to the strings with their new addresses, which it cannot do if the pointers are in the __TEXT segment, because the __TEXT segment is marked read-only. The pointers to the strings, and the rest of the array with them, must therefore be stored in the __const section of the __DATA segment. The __const section is reserved for data that is declared const but cannot be placed in the __TEXT segment.

Initializing Static Data

As indicated in Overview of the Mach-O Executable Format, the compiler stores uninitialized static data in the __bss section of the __DATA segment and stores initialized data in the __data section. If you have only a small amount of static data in the __bss section, you might want to consider moving it to the __data section instead. Storing data in two different sections increases the number of memory pages used by the executable, which increases the potential for paging.

The purpose of merging the __bss and __data sections is to reduce the number of memory pages your application uses. If moving data into the __data section increases the number of memory pages in that section, this technique brings no benefit. In fact, adding pages to the __data section increases the time spent reading in and initializing that data at launch time.

Suppose you declare the following static variables:

static int x;
static short conv_table[128];

To move these variables into the __data section of the executable’s __DATA segment, change the definitions to the following:

static int x = 0;
static short conv_table[128] = {0};

Avoiding Tentative-Definition Symbols

The compiler puts any duplicate symbols it encounters in the __common section of the __DATA segment (see Overview of the Mach-O Executable Format). The problem here is the same as with uninitialized static variables. If an executable’s non-constant global data is distributed among several sections, it is more likely that the data will be on different memory pages; consequently, the pages may have to be swapped in and out separately. The goal for the __common section is the same as for the __bss section: if it holds only a small amount of data, eliminate it from your executable.

A common source of tentative-definition symbols is the definition of a symbol in a header file. Typically, a header declares a symbol but does not include the definition of that symbol; instead, you provide the definition in an implementation file. Definitions that appear in a header file, however, cause that code or data to appear in every implementation file that includes the header. The solution to this problem is to make sure that header files contain only declarations, not definitions.

For functions, you would obviously declare the prototype of the function in the header file and put the definition of the function in the implementation file. Similar operations should be performed for global variables and data structures. Instead of defining a variable in a header file, define it in an implementation file and initialize it appropriately. The variable is then declared in the header file, preceded by the extern keyword. This technique localizes the variable definition to a file, while still allowing the variable to be accessed from other files.
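
The split between declaration and definition can be sketched as follows. Both parts are shown in one block for brevity; the file names, the variable gRequestCount, and the function NextRequestID are hypothetical.

```c
/* ---- RequestCounter.h (hypothetical header): declarations only ---- */
#ifndef REQUESTCOUNTER_H
#define REQUESTCOUNTER_H

extern int gRequestCount;        /* declaration: no storage allocated here */
int NextRequestID(void);         /* prototype only */

#endif /* REQUESTCOUNTER_H */

/* ---- RequestCounter.c (hypothetical implementation file) ---- */
/* In a real project this file would begin with: #include "RequestCounter.h" */

int gRequestCount = 0;           /* the single, initialized definition */

int NextRequestID(void)
{
    gRequestCount += 1;
    return gRequestCount;
}
```

Any other file that includes the header can read or update gRequestCount, but the storage for it exists in exactly one object file.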

You can also get tentative-definition symbols if you accidentally include the same header file twice. To make sure you do not do this, use preprocessor directives that prevent a file that has already been included from being included again. In your header file, you would have the following code:

#ifndef MYHEADER_H
#define MYHEADER_H
// Your header file declarations. . .
#endif

Then, if you want to include the header file, include it as follows:

#ifndef MYHEADER_H
#include "MyHeader.h"
#endif

Analyzing Mach-O Executables

You can use several tools to find out how much memory your non-constant data occupies. These tools report on various aspects of data usage.

While your application or framework is running, use the size and pagestuff tools to see how large the various data sections are and which symbols they contain. Things to look for include the following:

To find executables that contain a lot of non-constant data, look for files with large __data sections in the __DATA segment.
Check the __bss and __common sections for variables and symbols that can be removed or moved to the __data section.
To find data that is declared constant but that the compiler cannot treat as constant, look for executables or object files with a __const section in the __DATA segment.

Some of the larger consumers of memory in the __DATA segment are fixed-size global arrays that are initialized but not declared const. You can sometimes find these tables by searching your source code for “[] = {”.

You can also let the compiler help you find arrays that can be made constant. Put const in front of every initialized array that you suspect might be read-only, and recompile. If an array is not truly read-only, the code will not compile. Remove the offending const and try again.

Minimizing Your Exported Symbols

If your application or framework has a public interface, you should limit your exported symbols to those required by that interface. Exported symbols take up space in your executable file and should be kept to a minimum. Doing so not only reduces the size of your executable, it also reduces the amount of work done by the dynamic linker.

By default, Xcode exports all symbols from the project. You can use the following information to identify and remove symbols that you do not want exported.

Identifying Exported Symbols

To view the symbols exported by your application, use the nm tool. This tool reads an executable’s symbol table and displays the symbol information you request. You can view all symbols or just the symbols from a specific segment of your executable code. For example, to display only the externally available global symbols, specify the -g option on the command line.

To view detailed symbol information, run nm with the -m option. The output of this option tells you the type of each symbol and whether it is external or local (non-external). For example, to view detailed symbol information for the TextEdit application, use nm as follows:

% cd /Applications/TextEdit.app/Contents/MacOS
% nm -m TextEdit

A portion of the result output might look like this:

9005cea4 (prebound undefined [lazy bound]) external _abort (from libSystem)
9000a5c0 (prebound undefined [lazy bound]) external _atexit (from libSystem)
90009380 (prebound undefined [lazy bound]) external _calloc (from libSystem)
00018d14 (__DATA,__common) [referenced dynamically] external _catch_exception_raise
00018d18 (__DATA,__common) [referenced dynamically] external _catch_exception_raise_state
00018d1c (__DATA,__common) [referenced dynamically] external _catch_exception_raise_state_identity

In this mode, nm displays various information depending on the symbol. For functions and other code residing in the __TEXT segment, nm displays prebinding information and the originating library. For information in the __DATA segment, nm displays the specific section for the symbol and its linkage. For all symbols, nm indicates whether the symbol is external or local.

Limiting Your Exported Symbols

If you know which symbols you want to export from your project, you should create an exports file and add it to your project’s linker settings. An exports file is a plain-text file that contains the names of the symbols you want to make available to external callers. Each symbol must be listed on a line by itself. Leading and trailing white space is not considered part of the symbol name. Lines that begin with the # character are ignored.
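
The file format described above is simple enough to check with a few lines of code. The following sketch shows how a single line of an exports file might be interpreted. ParseExportLine is a hypothetical helper, not part of any linker tool, and its treatment of blank lines and of # preceded by white space is an assumption.

```c
#include <ctype.h>
#include <string.h>

/* Trims `line` in place and returns a pointer to the symbol name, or NULL
 * if the line carries no symbol. */
char *ParseExportLine(char *line)
{
    char *start = line;
    char *end;

    /* Leading white space is not part of the symbol name. */
    while (*start != '\0' && isspace((unsigned char)*start))
        start++;

    /* Lines that begin with '#' are ignored. Assumption: blank lines and
     * lines with an indented '#' are skipped as well. */
    if (*start == '\0' || *start == '#')
        return NULL;

    /* Trailing white space, including the newline, is dropped. */
    end = start + strlen(start);
    while (end > start && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';

    return start;
}
```

A helper like this could be used to validate an exports file before handing it to the linker, for example to detect the same symbol listed twice.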

To include your exports file in an Xcode project, modify the target or build-style settings of your project. Set the value of the “Exported symbols file” setting to the name of your exports file. Xcode passes the appropriate options to the static linker.

To export a list of symbols from the command line, add the -exported_symbols_list option to your linker command. Instead of exporting a specific list of symbols, you can also export all symbols and then restrict a specific list. To restrict a specific list of symbols, use the -unexported_symbols_list option in your linker command.

Note that symbols exported by the runtime libraries must be explicitly included in your exports file for your application to launch properly. To gather a list of these symbols, link your code without an exports file and then run the nm -m command from Terminal. From the resulting output, gather any symbols that are marked external and are not part of your code, and add them to your exports file.

Exports Using GCC 4.0

GCC 4.0 supports custom visibility attributes for individual symbols. In addition, the compiler provides compile-time flags that let you set the default visibility for all symbols in the files it compiles.

For information about using GCC 4’s new Symbol Visibility features, see “Controlling Symbol Visibility” in C++ Runtime Environment Programming Guide.
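
As a minimal sketch of the per-symbol approach, the macros below wrap the GCC visibility attribute. The macro names EXPORTED and INTERNAL and both functions are hypothetical. Compiling with the GCC flag -fvisibility=hidden would make hidden the default, so that only symbols marked EXPORTED remain visible.

```c
/* Hypothetical convenience macros built on GCC's visibility attribute. */
#define EXPORTED __attribute__((visibility("default")))
#define INTERNAL __attribute__((visibility("hidden")))

/* Part of the public interface: remains an exported symbol. */
EXPORTED int PublicAPIVersion(void);

/* Implementation detail: callable from other files in the same binary,
 * but not exported to clients of the library. */
INTERNAL int ComputeChecksum(const unsigned char *bytes, int count);

int ComputeChecksum(const unsigned char *bytes, int count)
{
    int sum = 0;
    int i;
    for (i = 0; i < count; i++)
        sum += bytes[i];
    return sum;
}

int PublicAPIVersion(void)
{
    return 1;
}
```

Marking symbols in the source this way keeps the export decision next to the declaration, instead of in a separate exports file that has to be kept in sync.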

Reference links

Code Size Performance Guidelines
Memory Usage Performance Guidelines