This series of articles is a comprehensive translation and study of MetalKit content on Metalkit.org.



When Apple announced Metal 2.2 at WWDC 2019 last week, it also shared some numbers:

  • Metal can now make 100 times more draw calls than OpenGL.
  • Metal is currently running on about 1.4 billion devices.
  • Metal can drive up to 56 TFLOPS of single-precision compute.

Note: To get 56 TFLOPS, you need a new Mac Pro with two Radeon Pro Vega II Duo cards (four GPUs in total). The Radeon Pro Vega II Duo is currently the most powerful graphics card in the world, delivering 28.3 TFLOPS of FP32 (single-precision) performance. This card is only available for the Mac Pro and uses Infinity Fabric Link to boost transfers between its two GPUs to 84 GB/s.

The Metal Shading Language is now at version 2.2 and the API at version 3. In Xcode 11 you can check the Metal version through the MTLSoftwareVersion enumeration, which is queried on the device.

Well, let’s look at some important additions to the Metal framework this year.

1. The iOS Simulator now supports Metal

Most frameworks are now Metal accelerated in the simulator: UIKit, SpriteKit, SceneKit, Core Animation, Core Image, MapKit, and so on. The simulator works with the A8 GPU feature level and higher. You can even run two simulators on two different targets at the same time.

iOS Metal calls are translated into macOS Metal calls, so you benefit from the Mac's underlying GPU hardware. From the Simulator menu, you can select which macOS GPU to use.

Metal performance in the simulator is still lower than on a real device, so production code should ultimately be profiled and optimized on the device. Another thing to keep in mind is that texture storage in the simulator must always be private. It is easy to cover both cases, however. When the texture is private, create a temporary shared buffer, initialize the texture data into that buffer, and then blit it to the private texture:

#if targetEnvironment(simulator)
// The simulator requires private texture storage.
textureDescriptor.storageMode = .private
#else
textureDescriptor.storageMode = .shared
#endif

let texture = device.makeTexture(descriptor: textureDescriptor)!
if texture.storageMode == .private {
    // Private textures are not CPU-accessible: stage the data in a shared buffer, then blit.
    let tmpBuffer = device.makeBuffer(length: textureSize,
                                      options: .storageModeShared)!
    initWithTextureData(buffer: tmpBuffer)
    blitData(fromBuffer: tmpBuffer, toTexture: texture)
} else {
    initWithTextureData(texture: texture)
}
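The blitData helper is not defined in the article. As a rough sketch only, it could be built on MTLBlitCommandEncoder as shown here. The extra `queue` and `bytesPerPixel` parameters, the Bool return value, and the assumption of tightly packed rows in a 4-byte format such as .rgba8Unorm are all mine, not the author's:

```swift
import Metal

/// Hypothetical stand-in for the article's `blitData(fromBuffer:toTexture:)`.
/// Assumes a 2D texture whose rows are tightly packed in `buffer`.
/// Returns true if the GPU finished the copy without error.
@discardableResult
func blitData(fromBuffer buffer: MTLBuffer,
              toTexture texture: MTLTexture,
              using queue: MTLCommandQueue,
              bytesPerPixel: Int = 4) -> Bool {
    let bytesPerRow = texture.width * bytesPerPixel
    guard let commandBuffer = queue.makeCommandBuffer(),
          let blit = commandBuffer.makeBlitCommandEncoder() else { return false }

    // Copy the whole image from the shared staging buffer into the texture.
    blit.copy(from: buffer,
              sourceOffset: 0,
              sourceBytesPerRow: bytesPerRow,
              sourceBytesPerImage: bytesPerRow * texture.height,
              sourceSize: MTLSize(width: texture.width,
                                  height: texture.height,
                                  depth: 1),
              to: texture,
              destinationSlice: 0,
              destinationLevel: 0,
              destinationOrigin: MTLOrigin(x: 0, y: 0, z: 0))
    blit.endEncoding()

    commandBuffer.commit()
    // Blocking keeps the sketch simple; real code would more likely use a completion handler.
    commandBuffer.waitUntilCompleted()
    return commandBuffer.status == .completed
}
```

Blocking on the command buffer is only acceptable for one-off setup work like this. The staging buffer can be released once the copy has completed.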

2. Simplified GPU families

The Metal Feature Set Tables documentation has also been updated for version 3, which replaces the old feature sets with the new GPU families:

  • The Apple family covers all Apple-designed GPUs (A-series GPUs).
  • The Mac family covers all macOS GPUs (Intel, AMD, Nvidia).
  • The Common family covers all devices and platforms.
  • The iPad Apps for Mac family covers iPadOS apps that run on macOS.

To determine whether Mac family 2 features are available:

// API names as presented at WWDC 2019.
if #available(macOS 10.15, iOS 13, tvOS 13, *) {
    if self.device.supportsVersion(.version3_0) {
        if self.device.supportsFamily(.familyMac2) {
            // enable Metal 3 features for the Mac family 2
        }
    } else {
        // enable Metal 2 features (fallback)
    }
} else {
    if self.device.supportsFeatureSet(.featureSet_macOS_GPUFamily2_v1) {
        // enable Metal 2 features (fallback)
    }
}
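The snippet above uses the names shown at WWDC 2019. In the SDKs that eventually shipped, as far as I can tell, the separate supportsVersion check is gone and the family cases are spelled .mac2, .apple3, .common2, and so on. A minimal sketch under that assumption follows. The FeatureTier enum and the featureTier(for:) function are made up for illustration and are not part of Metal:

```swift
import Metal

/// Hypothetical tiers for this sketch; not a Metal type.
enum FeatureTier {
    case macFamily2   // Mac family 2 features can be enabled
    case fallback     // stay on the older feature path
}

/// Picks a tier by asking the device which GPU family it supports.
func featureTier(for device: MTLDevice) -> FeatureTier {
    if #available(macOS 10.15, iOS 13, tvOS 13, *) {
        // supportsFamily(_:) takes an MTLGPUFamily case.
        return device.supportsFamily(.mac2) ? .macFamily2 : .fallback
    }
    // On older systems, fall back without probing the legacy feature sets.
    return .fallback
}
```

A caller could switch on the returned tier once at startup and pick its render path from there, instead of scattering availability checks through the renderer.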

Here are some of the most common techniques and the families that support them:

Feature | Family
Deferred shading | All families
Programmable blending | Apple 1 and newer
Tiled deferred/forward | Common 2 and newer
Tile shading | Apple 3 and newer
Visibility buffer | Mac 1 and newer
Argument buffers | All families
Indirect command buffers | Common 2 and newer

3. Ray tracing and compute

Ray tracing in Metal really started to get interesting last year, when the Metal Performance Shaders (MPS) API for ray tracing was introduced and the computation of ray-triangle intersections moved to the GPU. This year, two other expensive and important stages have also moved to the GPU: acceleration structure updates and image denoising.

Acceleration structure: through a technique called refitting, the acceleration structure is no longer rebuilt from scratch. Instead, the bounding boxes are moved to follow the geometry, which saves valuable processing time. This now happens entirely on the GPU.

Image denoising: a noise reduction filter based on image processing is cleverly used here. The idea is that each frame stores normal and depth information, which is then compared with the next frame to see whether certain pixels have become invalid. A pixel can become invalid if the object moved to a different position or if another object now occludes it.

The new MPSSVGF class implements the Spatiotemporal Variance-Guided Filtering (SVGF) denoising algorithm. Denoising on the GPU is now 1000 times faster than on the CPU.

Last year Metal introduced indirect command buffers (ICBs), a way to reduce CPU overhead and simplify command encoding by reusing commands, but they only worked for rendering. This year MTLIndirectComputeCommand joins MTLIndirectRenderCommand as a command type that can be encoded into an ICB.
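As a rough illustration of compute commands in an ICB, the sketch below creates an indirect command buffer that accepts dispatches and encodes a single one. The function name, the threadgroup sizes, and the availability annotation are my own assumptions. The pipeline is expected to come from an MTLComputePipelineDescriptor with supportIndirectCommandBuffers set to true:

```swift
import Metal

/// Sketch: build an ICB holding one compute dispatch.
/// `pipeline` and `data` are supplied by the caller; sizes are illustrative.
@available(macOS 11.0, iOS 13.0, *)
func makeComputeICB(device: MTLDevice,
                    pipeline: MTLComputePipelineState,
                    data: MTLBuffer) -> MTLIndirectCommandBuffer? {
    let descriptor = MTLIndirectCommandBufferDescriptor()
    descriptor.commandTypes = .concurrentDispatch   // compute commands rather than draws
    descriptor.inheritBuffers = false               // each command binds its own buffers
    descriptor.inheritPipelineState = false         // each command sets its own pipeline
    descriptor.maxKernelBufferBindCount = 1

    guard let icb = device.makeIndirectCommandBuffer(descriptor: descriptor,
                                                     maxCommandCount: 1,
                                                     options: []) else { return nil }

    // Encode the dispatch once; the ICB can then be executed on many frames.
    let command = icb.indirectComputeCommandAt(0)
    command.setComputePipelineState(pipeline)
    command.setKernelBuffer(data, offset: 0, at: 0)
    command.concurrentDispatchThreadgroups(
        MTLSize(width: 1, height: 1, depth: 1),
        threadsPerThreadgroup: MTLSize(width: 32, height: 1, depth: 1))
    return icb
}
```

The encoded ICB would then be run from a compute command encoder, typically with executeCommandsInBuffer(_:range:), after declaring the buffers it touches with useResource(_:usage:).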

4. Debugging and analysis tools

The GPU frame capture tool now has a Metal Memory Viewer that lets you examine textures, buffers, and heaps. It provides detailed information about their storage mode, type, and size.

For profiling, Instruments now has the Metal Resource Allocations instrument, which lets you inspect where resources are stored and gives information about resource usage and status for each device, display, or shader compiler.

5. Other new Metal 3 features

New features for iOS and tvOS:

  • Set the pipeline state in indirect command buffers
  • Get indirect command buffer ranges from buffers
  • 16-bit depth textures

New macOS features:

  • Rendering without render pass attachments
  • Command Buffer Timing
  • Convert between sRGB and non-sRGB texture views

Other new features:

  • Heaps support developer-driven placement
  • Heaps can track resources
  • Relaxed macOS blit alignment rules to match Apple GPUs
  • Improved resource usage
  • Predefined behavior for texture access
  • Custom texture swizzles
  • Texture sharing across processes
  • iOS texture bindings increased
  • iOS varying limits increased
  • ASTC 3D textures supported on the latest Apple GPUs
  • 3D BC textures supported on all Mac GPUs
  • The visibility buffer (also known as occlusion query) size increased to 256K
  • New MSL attributes [[primitive_id]] and [[barycentric_coord]]

For a complete list of new features, see the Metal API documentation site. New sample code is also available on Apple's website.

See you next time.