Contents

Apple framework learning (2) Metal
Introduction to Metal
1. Essentials
- 1.1 Basic tasks and concepts
- 1.2 Migrating OpenGL Code to Metal
- 1.3 Porting your Metal code to Apple silicon
2. GPUs
- 2.1 Obtaining a Default GPU
- 2.2 GPU selection in macOS
- 2.3 protocol MTLDevice
- 2.4 GPU features
3. Command Setup
- 3.1 Establishing a command structure
- 3.2 Prepare your Metal application to run in the background
- 3.3 protocol MTLCommandQueue
- 3.4 protocol MTLCommandBuffer
- 3.5 protocol MTLCommandEncoder
- 3.6 Counter Sampling
4. Parallel computing
5. Ray tracing
6. Apply colours to a drawing
7. Presentation
8. Shaders
9. Resources
10. Object sizing and positioning
11. Fill vectors and matrices
12. Time
13. Tools
14. GPU programming technology
15. Reference
16. Related Documentation

Apple framework learning (2) Metal

Introduction to Metal

Metal is a framework for rendering advanced 3D graphics and performing data-parallel computation on graphics processors.

Graphics processors (GPUs) are designed to render graphics quickly and to perform data-parallel computation. Use the Metal framework when you need to communicate directly with the GPUs available on a device. Applications that render complex scenes or perform advanced scientific calculations can use this capability to achieve maximum performance. These applications include:

Games that render complex 3D environments
Video processing applications such as Final Cut Pro
Data processing applications, such as those used to conduct scientific research

Metal works hand in hand with other frameworks that complement its functionality. Use MetalKit to simplify the task of getting your Metal content onto the screen. Use Metal Performance Shaders to implement custom rendering functions or to take advantage of a large library of existing functions.

Many high-level Apple frameworks are built on top of Metal to take advantage of its performance, including Core Image, SpriteKit, and SceneKit. Using one of these high-level frameworks shields you from the details of GPU programming, but writing custom Metal code lets you achieve the highest level of performance.

1. Essentials

1.1 Basic tasks and concepts

Familiarize yourself with Metal through a series of sample code projects.

1.2 Migrating OpenGL Code to Metal

Replace deprecated OpenGL code in your application with Metal.

1.3 Porting your Metal code to Apple silicon

Create a version of your Metal application that runs on both Apple silicon and Intel-based Macs.

2. GPUs

Access GPU devices at run time. The graphics processor is the basis of Metal development.

2.1 Obtaining a Default GPU

Select the system default GPU device on which you want to run Metal code.

To use the Metal framework, you first need to get a GPU device. All the objects that an application needs to interact with Metal come from an MTLDevice acquired at run time. iOS and tvOS devices have only one GPU, which you access by calling MTLCreateSystemDefaultDevice().
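A minimal sketch of that call in Swift is shown here. The helper name makeDefaultDevice is hypothetical and only wraps the real MTLCreateSystemDefaultDevice() function, which returns nil when no Metal-capable GPU is available.

```swift
import Metal

// Hypothetical helper: asks the system for its preferred GPU.
// MTLCreateSystemDefaultDevice() returns nil if Metal is unavailable.
func makeDefaultDevice() -> MTLDevice? {
    return MTLCreateSystemDefaultDevice()
}

if let device = makeDefaultDevice() {
    print("Using GPU: \(device.name)")
} else {
    print("Metal is not supported on this system")
}
```

Handling the nil case explicitly keeps the application from crashing in environments without a usable GPU, such as some virtual machines.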

2.2 GPU selection in macOS

Select one or more GPUs to run your Metal code by considering GPU capabilities, power, or performance characteristics.
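One way such a selection could look on macOS is sketched below. The policy (prefer a low-power GPU to save energy) and the function name pickLowPowerDevice are assumptions made for illustration; MTLCopyAllDevices() and the isLowPower, isRemovable, and isHeadless properties are the real Metal APIs used to inspect each device.

```swift
import Metal

#if os(macOS)
// Hypothetical policy: prefer an integrated (low-power) GPU,
// otherwise fall back to the system default device.
func pickLowPowerDevice() -> MTLDevice? {
    let devices = MTLCopyAllDevices()
    for device in devices {
        print("\(device.name): lowPower=\(device.isLowPower), " +
              "removable=\(device.isRemovable), headless=\(device.isHeadless)")
    }
    return devices.first(where: { $0.isLowPower }) ?? MTLCreateSystemDefaultDevice()
}
#else
// Other platforms expose a single GPU, so use the default device.
func pickLowPowerDevice() -> MTLDevice? {
    return MTLCreateSystemDefaultDevice()
}
#endif
```

A renderer that needs maximum throughput would likely invert this policy and prefer a device whose isLowPower property is false.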

2.3 protocol MTLDevice

The Metal interface to a GPU that you use to draw graphics or perform parallel computation.

The MTLDevice protocol defines the interface to the GPU. You can query an MTLDevice for the unique capabilities it offers your Metal application, and you use the MTLDevice to issue all of your Metal commands. Don't implement this protocol yourself; instead, request a GPU from the system at run time with MTLCreateSystemDefaultDevice() on iOS or tvOS, or use MTLCopyAllDevicesWithObserver(handler:) on macOS to get a list of the available MTLDevice objects. For a complete discussion of choosing the right GPU(s), see Getting the Default GPU.

The MTLDevice object is the go-to object for any operation in Metal, so all Metal objects the application interacts with come from the MTLDevice instance acquired at run time. The objects an MTLDevice creates are expensive but persistent; many of them are designed to be initialized once and reused throughout the life of the application. However, these objects are specific to the MTLDevice that created them. If you use multiple MTLDevice instances or want to switch from one MTLDevice to another, you need to create a separate set of objects for each MTLDevice.
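The "create once, reuse" advice can be sketched as a small holder type. GPUContext is a hypothetical name, not part of Metal; it simply keeps one device and one command queue created from that device for the lifetime of the application.

```swift
import Metal

// Hypothetical container for long-lived Metal objects.
// Everything stored here was created by the same MTLDevice.
final class GPUContext {
    let device: MTLDevice
    let commandQueue: MTLCommandQueue

    // Fails if no GPU is available or the queue cannot be created.
    init?(device: MTLDevice? = MTLCreateSystemDefaultDevice()) {
        guard let device = device,
              let queue = device.makeCommandQueue() else { return nil }
        self.device = device
        self.commandQueue = queue
    }
}
```

An application that works with several GPUs would create one GPUContext per device rather than sharing objects between them.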

2.4 GPU features

Find feature information for a specific GPU family.

Metal Feature Sets

Use the Metal Feature Set Tables to find feature availability based on Metal software version and GPU family. The availability of a feature in Metal is determined by the combination of the Metal software version and the GPU family that the GPU supports. The tables list feature availability, specific numerical limits, and pixel format support for the different GPU families.
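At run time, the same question can be asked of the device itself. The sketch below assumes macOS 10.15, iOS 13, or tvOS 13 and later, where MTLDevice offers supportsFamily(_:); the function name supportedFamilies and the short list of families it checks are illustrative choices only.

```swift
import Metal

// Returns the names of the checked GPU families that the device supports.
// Only a few families are tested here; MTLGPUFamily defines more.
@available(macOS 10.15, iOS 13.0, tvOS 13.0, *)
func supportedFamilies(of device: MTLDevice) -> [String] {
    let candidates: [(name: String, family: MTLGPUFamily)] = [
        ("apple4", .apple4),   // A11-class features
        ("mac2", .mac2),
        ("common3", .common3)
    ]
    return candidates
        .filter { device.supportsFamily($0.family) }
        .map { $0.name }
}

if #available(macOS 10.15, iOS 13.0, tvOS 13.0, *),
   let device = MTLCreateSystemDefaultDevice() {
    print("Supported families: \(supportedFamilies(of: device))")
}
```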

Apple GPU Families

About GPU Family 4: learn about the features of the A11, including raster order groups, tile shaders, and imageblocks.

GPU Family 4 describes the new features and performance improvements brought by the A11 chip and its Apple-designed graphics processing unit (GPU) architecture.

GPUs in iOS and tvOS devices implement a rendering technique called tile-based deferred rendering (TBDR) to optimize performance and power efficiency. In a traditional immediate-mode (IM) renderer, when a triangle is submitted to the GPU for processing, it is immediately rendered to device memory. The triangle is rasterized and processed by the fragment function even if it is later occluded by other primitives submitted to the GPU.

TBDR makes some important changes to the IM architecture, processing the scene only after all primitives have been submitted. The screen is divided into tiles that are processed separately. All geometry that intersects a tile is processed simultaneously, and occluded fragments are discarded before the rasterization and fragment shading stages. Tiles are rendered into fast local memory on the GPU and written out to device memory only after rendering is complete.

TBDR allows the vertex and fragment stages to run asynchronously, which provides a significant performance improvement over IM. While the fragment stage of one render pass is running, the hardware executes the vertex stage of a future render pass in parallel. The vertex stage typically makes heavy use of fixed-function hardware, while the fragment stage uses math and bandwidth. Fully overlapping them allows the device to use all the hardware blocks on the GPU at the same time.

The tile memory used by TBDR has three important characteristics. First, the bandwidth between the shader cores and tile memory is many times higher than the bandwidth between the GPU and device memory, and it scales in proportion to the number of shader cores. Second, the latency of accessing tile memory is many times lower than the latency of accessing device memory. Finally, tile memory consumes less energy than device memory.

On devices based on the A7 through A10, Metal does not explicitly describe this tile-based architecture; instead, you use it to provide hints to the underlying implementation. For example, load and store actions control which data is loaded into local memory and which data is written out to device memory. Similarly, memoryless buffers specify per-pixel intermediate data that is used only during the render pass; in practice, this data is stored in a tile in the GPU's fast local memory.
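The two hints mentioned above can be sketched with descriptors alone. The function names are hypothetical and the chosen actions are just one plausible setup: the color attachment is cleared and stored, while the depth attachment is cleared and then discarded, so it is a candidate for memoryless storage. Memoryless storage is an assumption that holds only on Apple GPUs (the enum case requires macOS 11 or iOS 10 and later).

```swift
import Metal

// Load and store actions tell a tile-based GPU what to bring into
// tile memory at the start of the pass and what to write back at the end.
func makeRenderPass(color: MTLTexture?, depth: MTLTexture?) -> MTLRenderPassDescriptor {
    let pass = MTLRenderPassDescriptor()
    pass.colorAttachments[0].texture = color
    pass.colorAttachments[0].loadAction = .clear      // nothing is loaded from device memory
    pass.colorAttachments[0].clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1)
    pass.colorAttachments[0].storeAction = .store     // final color is written to device memory
    pass.depthAttachment.texture = depth
    pass.depthAttachment.loadAction = .clear
    pass.depthAttachment.clearDepth = 1.0
    pass.depthAttachment.storeAction = .dontCare      // depth never leaves tile memory
    return pass
}

// Describes a depth target; with memoryless storage it exists only in tile memory.
@available(macOS 11.0, iOS 10.0, tvOS 10.0, *)
func makeDepthDescriptor(width: Int, height: Int, memoryless: Bool) -> MTLTextureDescriptor {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .depth32Float, width: width, height: height, mipmapped: false)
    descriptor.usage = .renderTarget
    descriptor.storageMode = memoryless ? .memoryless : .private
    return descriptor
}
```

A memoryless texture must not use a load action of .load or a store action of .store, which is why the depth attachment above uses .clear and .dontCare.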

Metal 2 on the A11 GPU

In the A11, the Apple-designed GPU offers several significant enhancements to TBDR. These features are provided through additional Metal 2 APIs and allow your applications and games to achieve new levels of performance and functionality.

These features include imageblocks, tile shading, raster order groups, and imageblock sample coverage control. Metal 2 on the A11 GPU also improves the performance of fragment discard.

Taken together, these features provide better control over memory layout and access to data stored in tiles, as well as more fine-grained synchronization to keep more work on the GPU. The end result is that you can perform more extensive calculations in a single render pass than ever before, and keep the calculations in fast local memory.

Metal 2 on the A11 also simplifies the implementation of techniques such as subsurface scattering, order-independent transparency, and tile-based lighting algorithms.

3. Command Setup

Build the infrastructure to execute your custom code on the GPU.

3.1 Establishing a command structure

Discover how Metal executes commands on the GPU.

3.2 Prepare your Metal application to run in the background

Prepare your application to move into the background by pausing future GPU use and making sure that previous work is scheduled.
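The "make sure prior work is scheduled" step could be sketched as follows. The function waitForScheduling and the idea of keeping an array of in-flight command buffers are assumptions for illustration; waitUntilScheduled() and the status property are the real MTLCommandBuffer APIs. Pausing future GPU work (for example, stopping the render loop) is left to the application.

```swift
import Metal

// Blocks until every committed command buffer has been scheduled.
// Buffers that are not in the committed state are skipped, because
// waiting on an uncommitted buffer would never return.
// Returns the number of buffers that were waited on.
func waitForScheduling(of inFlight: [MTLCommandBuffer]) -> Int {
    var waited = 0
    for commandBuffer in inFlight where commandBuffer.status == .committed {
        commandBuffer.waitUntilScheduled()
        waited += 1
    }
    return waited
}
```

An application would typically call a helper like this from its "did enter background" handler, after it has stopped encoding new work.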

3.3 protocol MTLCommandQueue

A queue that organizes command buffers for the GPU to execute.

3.4 protocol MTLCommandBuffer

A container that stores encoded commands for the GPU to execute.

3.5 protocol MTLCommandEncoder

An encoder that writes GPU commands into a command buffer.
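How the queue, the command buffer, and an encoder fit together can be sketched with a blit encoder, which needs no shader code. The function name fillBufferOnGPU is hypothetical; the calls it makes are standard Metal API. The sketch blocks with waitUntilCompleted() for simplicity, which a real application would usually replace with a completion handler.

```swift
import Metal

// Fills a GPU buffer with a byte value and reads the result back.
// Returns nil if Metal is unavailable or any object cannot be created.
func fillBufferOnGPU(byteCount: Int, value: UInt8) -> [UInt8]? {
    guard byteCount > 0,
          let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue(),                 // long-lived in real code
          let buffer = device.makeBuffer(length: byteCount, options: .storageModeShared),
          let commandBuffer = queue.makeCommandBuffer(),         // one per batch of work
          let blit = commandBuffer.makeBlitCommandEncoder() else { return nil }

    // The encoder writes commands into the command buffer.
    blit.fill(buffer: buffer, range: 0..<byteCount, value: value)
    blit.endEncoding()

    // Committing hands the command buffer to the queue for execution.
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    let pointer = buffer.contents().bindMemory(to: UInt8.self, capacity: byteCount)
    return Array(UnsafeBufferPointer(start: pointer, count: byteCount))
}
```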

3.6 Counter Sampling

Retrieve information about how the GPU executes your commands.
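Before sampling counters, an application can check what the device offers. The sketch below assumes macOS 11, iOS 14, or tvOS 14 and later; counterSamplingSummary is a hypothetical helper that reads the device's counterSets property and asks whether sampling at stage boundaries is supported.

```swift
import Metal

// Lists the names of the available counter sets and reports whether
// the device can sample counters at render or compute stage boundaries.
@available(macOS 11.0, iOS 14.0, tvOS 14.0, *)
func counterSamplingSummary(for device: MTLDevice) -> (sets: [String], stageBoundary: Bool) {
    let names = (device.counterSets ?? []).map { $0.name }
    let atStage = device.supportsCounterSampling(.atStageBoundary)
    return (names, atStage)
}

if #available(macOS 11.0, iOS 14.0, tvOS 14.0, *),
   let device = MTLCreateSystemDefaultDevice() {
    let summary = counterSamplingSummary(for: device)
    print("Counter sets: \(summary.sets), stage boundary sampling: \(summary.stageBoundary)")
}
```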

4. Parallel computing

Process arbitrary calculations in parallel on the GPU.

Processing a Texture in a Compute Function

Perform data-parallel computations on texture data.

Creating Threads and Threadgroups

Learn how Metal organizes compute-processing workloads.

Calculating Threadgroup and Grid Sizes

Calculate the optimum sizes for threadgroups and grids when dispatching compute-processing workloads.
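The arithmetic behind such a calculation can be sketched in plain Swift. The type DispatchSize and the function dispatchSizes are hypothetical; the two inputs correspond to the threadExecutionWidth and maxTotalThreadsPerThreadgroup properties of a compute pipeline state. The threadgroup count is rounded up so that the whole grid is covered, which means the kernel has to ignore threads that fall outside the grid.

```swift
// A 2D size, standing in for MTLSize so the math can be checked without a GPU.
struct DispatchSize: Equatable {
    var width: Int
    var height: Int
}

// Chooses a threadgroup as wide as the execution width and as tall as the
// per-threadgroup thread limit allows, then rounds the group count up.
func dispatchSizes(gridWidth: Int, gridHeight: Int,
                   threadExecutionWidth: Int,
                   maxTotalThreadsPerThreadgroup: Int)
    -> (threadsPerThreadgroup: DispatchSize, threadgroupsPerGrid: DispatchSize) {
    let w = threadExecutionWidth
    let h = max(1, maxTotalThreadsPerThreadgroup / w)
    let groups = DispatchSize(width: (gridWidth + w - 1) / w,
                              height: (gridHeight + h - 1) / h)
    return (DispatchSize(width: w, height: h), groups)
}

// Example: a 1024 x 768 texture on a GPU with an execution width of 32
// and at most 512 threads per threadgroup.
let sizes = dispatchSizes(gridWidth: 1024, gridHeight: 768,
                          threadExecutionWidth: 32,
                          maxTotalThreadsPerThreadgroup: 512)
print(sizes.threadsPerThreadgroup, sizes.threadgroupsPerGrid)
```

With those inputs each threadgroup is 32 x 16 threads and the grid needs 32 x 48 threadgroups.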

class MTLComputePipelineDescriptor

An object used to customize how a new compute pipeline state object is compiled.

protocol MTLComputePipelineState

An object that contains a compiled compute pipeline.

class MTLComputePassDescriptor

A configuration for a compute pass, used to create a compute command encoder.

protocol MTLComputeCommandEncoder

An object used to encode commands in a compute pass.
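The compute types listed above can be seen working together in a small end-to-end sketch. Everything named here (the add_arrays kernel, the addOnGPU function, the buffer indices) is made up for illustration, and the kernel is compiled from source at run time only to keep the example self-contained; a shipping application would normally precompile its shaders.

```swift
import Metal

// Metal Shading Language source for a kernel that adds two arrays.
let kernelSource = """
#include <metal_stdlib>
using namespace metal;

kernel void add_arrays(device const float *a [[buffer(0)]],
                       device const float *b [[buffer(1)]],
                       device float *result  [[buffer(2)]],
                       constant uint &count  [[buffer(3)]],
                       uint index [[thread_position_in_grid]]) {
    // Threads past the end of the data do nothing.
    if (index < count) { result[index] = a[index] + b[index]; }
}
"""

// Adds two equally sized arrays on the GPU; returns nil on any failure.
func addOnGPU(_ a: [Float], _ b: [Float]) -> [Float]? {
    let count = a.count
    let byteCount = count * MemoryLayout<Float>.stride
    guard count > 0, b.count == count,
          let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue(),
          let library = try? device.makeLibrary(source: kernelSource, options: nil),
          let function = library.makeFunction(name: "add_arrays"),
          let pipeline = try? device.makeComputePipelineState(function: function),
          let bufferA = device.makeBuffer(bytes: a, length: byteCount, options: .storageModeShared),
          let bufferB = device.makeBuffer(bytes: b, length: byteCount, options: .storageModeShared),
          let output = device.makeBuffer(length: byteCount, options: .storageModeShared),
          let commandBuffer = queue.makeCommandBuffer(),
          let encoder = commandBuffer.makeComputeCommandEncoder() else { return nil }

    var elementCount = UInt32(count)
    encoder.setComputePipelineState(pipeline)
    encoder.setBuffer(bufferA, offset: 0, index: 0)
    encoder.setBuffer(bufferB, offset: 0, index: 1)
    encoder.setBuffer(output, offset: 0, index: 2)
    encoder.setBytes(&elementCount, length: MemoryLayout<UInt32>.stride, index: 3)

    // One-dimensional dispatch, rounded up to whole threadgroups.
    let width = pipeline.threadExecutionWidth
    let threadsPerThreadgroup = MTLSize(width: width, height: 1, depth: 1)
    let threadgroups = MTLSize(width: (count + width - 1) / width, height: 1, depth: 1)
    encoder.dispatchThreadgroups(threadgroups, threadsPerThreadgroup: threadsPerThreadgroup)
    encoder.endEncoding()

    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    let pointer = output.contents().bindMemory(to: Float.self, capacity: count)
    return Array(UnsafeBufferPointer(start: pointer, count: count))
}
```

The pipeline state is the expensive object in this sketch, so real code would create it once and reuse it across dispatches.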

5. Ray tracing

Accelerating Ray Tracing Using Metal

Implement ray-traced rendering using GPU-based parallel processing.

protocol MTLAccelerationStructure

A collection of model data, organized to allow for GPU-accelerated intersection of rays with the model.

class MTLAccelerationStructureDescriptor

A base class for classes that define the configuration for a new acceleration structure.

class MTLAccelerationStructureGeometryDescriptor

A base class for descriptors that contain geometry data to convert into a ray-tracing acceleration structure.

class MTLAccelerationStructureBoundingBoxGeometryDescriptor

A description of a list of bounding boxes to turn into an acceleration structure.

class MTLAccelerationStructureTriangleGeometryDescriptor

A description of a list of triangle primitives to turn into an acceleration structure.

class MTLPrimitiveAccelerationStructureDescriptor

A description of an acceleration structure that contains geometry primitives.

class MTLInstanceAccelerationStructureDescriptor

A description of an acceleration structure built from instances of primitive acceleration structures.

struct MTLAccelerationStructureInstanceDescriptor

A description of an instance in an instanced geometry acceleration structure.

protocol MTLIntersectionFunctionTable

A table of visible functions that Metal calls to perform ray-tracing intersection tests.

class MTLIntersectionFunctionTableDescriptor

A specification of how to create an intersection function table.

class MTLIntersectionFunctionDescriptor

A description of a visible function that performs an intersection test.

protocol MTLAccelerationStructureCommandEncoder

An object used to encode commands that build or refit acceleration structures.
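Several of the types above cooperate when a primitive acceleration structure is built. The following is a rough sketch, assuming macOS 11, iOS 14, or tvOS 16 and later and a GPU that reports supportsRaytracing. The function name and the vertex layout (three packed Float values per vertex, three consecutive vertices per triangle, no index buffer) are illustrative assumptions.

```swift
import Metal

// Builds a primitive acceleration structure from a flat list of
// xyz coordinates. Returns nil if ray tracing is unsupported or a
// Metal object cannot be created.
@available(macOS 11.0, iOS 14.0, tvOS 16.0, *)
func buildTriangleAccelerationStructure(device: MTLDevice,
                                        queue: MTLCommandQueue,
                                        vertices: [Float]) -> MTLAccelerationStructure? {
    let floatsPerTriangle = 9
    guard device.supportsRaytracing,
          !vertices.isEmpty, vertices.count % floatsPerTriangle == 0,
          let vertexBuffer = device.makeBuffer(
              bytes: vertices,
              length: vertices.count * MemoryLayout<Float>.stride,
              options: .storageModeShared) else { return nil }

    // Describe the triangle geometry.
    let geometry = MTLAccelerationStructureTriangleGeometryDescriptor()
    geometry.vertexBuffer = vertexBuffer
    geometry.vertexStride = 3 * MemoryLayout<Float>.stride
    geometry.triangleCount = vertices.count / floatsPerTriangle

    // Describe the acceleration structure that contains the geometry.
    let descriptor = MTLPrimitiveAccelerationStructureDescriptor()
    descriptor.geometryDescriptors = [geometry]

    // Ask the device how much memory the structure and the build need.
    let sizes = device.accelerationStructureSizes(descriptor: descriptor)
    guard let structure = device.makeAccelerationStructure(size: sizes.accelerationStructureSize),
          let scratch = device.makeBuffer(length: max(sizes.buildScratchBufferSize, 16),
                                          options: .storageModePrivate),
          let commandBuffer = queue.makeCommandBuffer(),
          let encoder = commandBuffer.makeAccelerationStructureCommandEncoder() else { return nil }

    // Encode the build and wait for the GPU to finish it.
    encoder.build(accelerationStructure: structure, descriptor: descriptor,
                  scratchBuffer: scratch, scratchBufferOffset: 0)
    encoder.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
    return structure
}
```

The resulting structure would then be bound to a compute or render pass, where a shader intersects rays against it.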

Because space is limited, this article can only introduce some of our current work and thinking.

Metal also has some new directions to explore. If you are interested in iOS fundamentals, architecture design, system building, or interview preparation, you can send me a message to get the latest material. Comments and suggestions are welcome!

If you like iOS, follow me so we can learn and exchange ideas!

Original link: blog.csdn.net/kyl28288954…
