In recent years, as Internet of Everything technology has developed, Mesh technology has gradually emerged. As a networking technology, Mesh combines multiple access points into a single network that provides service. Compared with traditional WiFi networking, Mesh networking is more stable, faster, and more scalable. With its self-organizing, self-managing and self-healing characteristics, WiFi Mesh will also play an important role in the future interconnection of everything.

Targeting the emerging WiFi Mesh scenario, Baidu Security presented the talk “BadMesher: New Attack Surfaces of Wi-Fi Mesh Network” online at Black Hat Europe 2021. The talk discussed the new attack surfaces of Wi-Fi Mesh networks, described the design and implementation of MeshFuzzer, an automated vulnerability discovery tool, and demonstrated its use in real vulnerability hunting.

Walkthrough of the talk

Basic concepts

EasyMesh concept

EasyMesh is a standardized certification program launched by the Wi-Fi Alliance. Its development has gone through three stages:

FIG. 1 EasyMesh development process

In 2018, each vendor implemented Mesh technology independently and there was no unified standard, so devices from different vendors could not interoperate.

In 2019, the Wi-Fi Alliance released EasyMesh V1, which introduced the Onboarding and auto-config processes and used the 1905 control protocol to implement most of the Mesh control functions.

In 2020, the Wi-Fi Alliance released EasyMesh V2 and V3, which added more control features, especially security features such as authorization and integrity checks for control messages.

At present, dozens of manufacturers have passed EasyMesh certification, including MediaTek, Huawei and ZTE.

EasyMesh architecture

EasyMesh’s architecture is shown in Figure 2, which contains two key links and two key roles.

Figure 2 EasyMesh architecture diagram

Key links

1. Fronthaul link: the exposed WiFi link, that is, the SSID that mobile phones and other clients connect to normally

2. Backhaul link: the hidden WiFi link, that is, an SSID that does not show up in scans and is reserved for the Mesh itself

Key roles

1. Controller: the manager of the Mesh network, which sends control instructions to Agents to manage the network and achieve self-organization, self-management and self-healing

2. Agent: the executor of the Mesh network, which accepts the Controller’s control instructions, executes the tasks, and reports the results back to the Controller

A role here is a logical entity rather than a specific device. A device can act as a Controller, as an Agent, or as both at once.

Mesh network construction process

The whole Mesh network construction process is divided into the following two steps:

1. Onboarding

2. Discovery and Configuration

The Onboarding process

The Onboarding process helps a device that has not yet joined the Mesh to join it; we call such a device an Enrollee. The entire process is implemented through the 1905 Push Button Configuration protocol (1905 PBC), which has the following three features:

1. Both parties involved in joining the network must have their button pushed

2. It is implemented on top of WiFi Protected Setup (WPS)

3. It is based on TLV

As can be seen from Figure 3, 1905 PBC carries a dedicated marker in the Multi-AP Extension section, which identifies the SSID of the Backhaul. The Enrollee device can therefore obtain the access credentials of the Mesh link through 1905 PBC.

Figure 3 Multi-AP Extension

The entire Onboarding process is shown in Figure 4:

FIG. 4 Onboarding process

First, press the push button on both devices so that they enter pairing mode.

Second, the Enrollee device interacts with the Fronthaul SSID through 1905 PBC. After the M1-M8 exchange, the Existing Agent returns the Backhaul SSID and password to the Enrollee device, which then connects to the Backhaul SSID and joins the Mesh network.

The Onboarding process is now complete.

Discovery and Configuration

The overall process is shown in Figure 5:

Figure 5 Discovery and Configuration flows

After completing the Onboarding process, the Enrollee device needs to find the Controller in the Mesh to obtain the basic configuration of the current Mesh. This step uses the IEEE 1905.1a control protocol: the Enrollee device probes for a Controller by broadcasting an ‘AP Autoconfig Search’ packet, and a Controller present on the network replies with an ‘AP Autoconfig Response’. Once the Enrollee device has found the Controller, the Discovery process is complete.
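The Discovery exchange can be sketched at the byte level as follows. This is a minimal illustration and not the code from the talk: it assumes the standard 8-byte IEEE 1905.1 CMDU header (version, reserved, message type, message ID, fragment ID, flags), the message type values 0x0007 and 0x0008 for AP Autoconfig Search and Response, and that the response echoes the message ID of the search. The function names are hypothetical, and the TLVs a real search message carries are omitted.

```python
import struct

ETHERTYPE_1905 = 0x893A              # EtherType used by IEEE 1905.1 frames
MSG_AP_AUTOCONFIG_SEARCH = 0x0007    # assumed message type values
MSG_AP_AUTOCONFIG_RESPONSE = 0x0008
FLAG_LAST_FRAGMENT = 0x80
FLAG_RELAY = 0x40
END_OF_MESSAGE_TLV = b"\x00\x00\x00"  # type 0, length 0

def build_cmdu(message_type, message_id, tlvs=b"", relay=False):
    """Build a single-fragment CMDU: 8-byte header, TLVs, end-of-message TLV."""
    flags = FLAG_LAST_FRAGMENT | (FLAG_RELAY if relay else 0)
    header = struct.pack("!BBHHBB", 0, 0, message_type, message_id, 0, flags)
    return header + tlvs + END_OF_MESSAGE_TLV

def is_autoconfig_response(cmdu, expected_id):
    """Return True if cmdu is an AP Autoconfig Response for our search."""
    if len(cmdu) < 8:
        return False
    _, _, msg_type, msg_id, _, _ = struct.unpack_from("!BBHHBB", cmdu)
    return msg_type == MSG_AP_AUTOCONFIG_RESPONSE and msg_id == expected_id

# The Enrollee side: a relayed multicast search with an arbitrary message ID.
search = build_cmdu(MSG_AP_AUTOCONFIG_SEARCH, 0x1234, relay=True)
```

Discovery succeeds when a frame received on the 1905 EtherType makes `is_autoconfig_response` return True for the message ID used in the search.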

The Configuration process synchronizes the current Mesh configuration to the Enrollee device, such as the Mesh user name and password, the choice of communication channel, and the parameters that maintain network stability. Once the Enrollee device has obtained the basic configuration of the Mesh network, it truly joins the Mesh as an Agent, and the construction of the Mesh network is complete.

Control process of Mesh network

Maintaining and managing a Mesh network is a substantial task. IEEE 1905.1a is a protocol that sits between the physical layer and the network layer and defines control technology for wired and wireless links in the home network. In the Mesh scenario, IEEE 1905.1a serves as the carrier for multiple control protocols, such as device discovery, device configuration, and device management. The whole implementation is based on Type-Length-Value (TLV). Some EasyMesh control protocols are shown in Table 1:

Message                                           Function                  Message type value
1905 Topology Notification message                STA capability            0x0001
Multi-AP Policy Config Request message            Multi-AP configuration    0x8003
Unassociated STA Link Metrics Response message    Link metric collection    0x8010
Backhaul Steering Request message                 Backhaul optimization     0x8019
Client Disassociation Stats message               Data Element              0x8022
...

Table 1 Some EasyMesh control protocols

Take the “Multi-AP Policy Config Request message” as an example. As Figure 6 shows, its command word is 0x8003, and the Steering Policy it carries follows the basic TLV layout: in Figure 6 the Type is 0x89, the Length is 21, and the Value is the payload.

Figure 6 Multi-AP Policy Config Message
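The TLV layout of this example can be written out in a few lines of Python. The sketch assumes the usual IEEE 1905.1 encoding, a 1-byte type followed by a 2-byte big-endian length; the helper names and the all-zero payload are illustrative and are not taken from Figure 6.

```python
import struct

def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode one TLV: 1-byte type, 2-byte big-endian length, then the value."""
    return struct.pack("!BH", tlv_type, len(value)) + value

def decode_tlv(buf: bytes, offset: int = 0):
    """Decode one TLV at offset and return (type, value, next_offset)."""
    if len(buf) - offset < 3:
        raise ValueError("truncated TLV header")
    tlv_type, length = struct.unpack_from("!BH", buf, offset)
    start = offset + 3
    end = start + length
    if end > len(buf):
        raise ValueError("declared length exceeds remaining bytes")
    return tlv_type, buf[start:end], end

# A TLV shaped like the Steering Policy in the text: type 0x89, length 21.
# The 21 zero bytes are placeholder payload, not real policy data.
steering_policy = encode_tlv(0x89, bytes(21))
tlv_type, value, next_offset = decode_tlv(steering_policy)
```

The length check in `decode_tlv` is exactly the kind of validation whose absence the rest of this article exploits.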

Attack surface analysis

Having analyzed how the Mesh network is built and controlled, we now look at the actual attack surface. Two key protocols carry the attacks:

1. 1905 Push Button Configuration protocol

2. IEEE 1905.1a control protocol

There are two key attack surfaces:

1. Attack the network construction process

2. Attack the network control process

Attack the Mesh construction process

Attack the Existing Agent

Attacker: “Bad” Enrollee Agent

Victim: Existing Agent

Attack carrier: 1905 Push Button Configuration Protocol (M1, M3, M5, M7)

The entire attack flow is shown in Figure 7

Figure 7 Attacking Existing Agents

The attacker builds a malicious Enrollee device that, during the 1905 PBC exchange, sends malformed M1, M3, M5 and M7 packets to trigger TLV parsing vulnerabilities in the Existing Agent’s handling of M1, M3, M5 and M7.

Attack the Enrollee Agent

Attacker: “Bad” Existing Agent

Victim: Enrollee Agent

Attack carrier: 1905 Push Button Configuration Protocol (M2, M4, M6, M8)

The entire attack flow is shown in Figure 8

Figure 8 Attacking the Enrollee Agent

The attacker builds a malicious Existing Agent device that, during the 1905 PBC exchange, replies with malformed M2, M4, M6 and M8 packets to trigger TLV parsing vulnerabilities in the Enrollee device’s handling of M2, M4, M6 and M8.

Attack the control process of the Mesh network

Having analyzed the attack surface of Mesh construction, we now look at the attack surface of Mesh network control.

Attacker: “Bad” Existing Agent

Victims: Controller and other Existing Agents

Attack carrier: IEEE 1905.1a control protocol

An attacker can send malformed 1905 packets to trigger 1905 TLV parsing vulnerabilities in the Controller and in Existing Agents. Figure 9 shows the malicious packet we designed for “AP_AUTOCONFIGURATION_WSC_MESSAGE”: we set the length field of the SSID to 0xFF, although a valid SSID is at most 32 bytes long, and filled the entire SSID payload with 0xFF. As the captured packet in Figure 10 shows, the SSID on the wire consists entirely of the 0xFF filler, which is not what SSID parsing code expects.

Figure 9 Simulated sending of a malformed IEEE 1905.1a control packet

Figure 10. Actual IEEE 1905.1a control packet
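To make the malformed SSID concrete, here is a minimal sketch of such a field and of the check a parser needs in order to reject it. It is not the packet from Figure 9. It assumes the WSC attribute framing (2-byte type, 2-byte big-endian length), the SSID attribute ID 0x1045, and the 32-byte SSID limit of IEEE 802.11; the function names are hypothetical.

```python
import struct

ATTR_SSID = 0x1045   # WSC attribute ID for SSID (assumed framing: type, length, data)
MAX_SSID_LEN = 32    # IEEE 802.11 upper bound for an SSID, in bytes

def build_attr(attr_type: int, declared_len: int, payload: bytes) -> bytes:
    """Serialize an attribute; declared_len is written as given,
    even when it disagrees with what a real SSID could be."""
    return struct.pack("!HH", attr_type, declared_len) + payload

def read_ssid(attr: bytes) -> bytes:
    """Return the SSID, or raise ValueError for a field no real SSID can produce."""
    if len(attr) < 4:
        raise ValueError("truncated attribute header")
    attr_type, declared_len = struct.unpack_from("!HH", attr)
    if attr_type != ATTR_SSID:
        raise ValueError("not an SSID attribute")
    if declared_len > MAX_SSID_LEN:
        raise ValueError("SSID length exceeds 32 bytes")
    if declared_len > len(attr) - 4:
        raise ValueError("declared length exceeds available bytes")
    return attr[4:4 + declared_len]

# Length 0xFF and 0xFF bytes of 0xFF filler, as described in the text.
malformed = build_attr(ATTR_SSID, 0xFF, b"\xff" * 0xFF)
```

A parser that copies `declared_len` bytes into a fixed SSID buffer without the `MAX_SSID_LEN` check writes well past the end of that buffer.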

The automation tool MeshFuzzer

MeshFuzzer architecture

MeshFuzzer consists of two fuzzing subsystems, one for 1905 PBC and one for IEEE 1905.1a. The overall architecture is shown in Figure 11.

Figure 11 MeshFuzzer architecture

The upper part is the fuzzing subsystem we designed for 1905 PBC. It takes WPS exchanges captured between real devices as input, passes them through our TLV mutation system, and sends the result with our 802.1 packet sender. Meanwhile, a serial connection to the device monitors for crashes in real time.

The lower part is the fuzzing subsystem we designed for IEEE 1905.1a. We implemented most of the control protocol fields in EasyMesh. These also pass through our TLV mutation system and are sent with our 1905 packet sender, and crashes are detected by means of a dedicated 1905 packet.

Mutation strategy

Because both target protocols are built on TLV, a single mutation strategy can drive fuzzing for both efficiently.

Mutation strategy 1: mutate the length field. A length that is too long or too short triggers common memory corruption bugs in TLV parsing. For example, a length that is too short can lead to an out-of-bounds read or an integer overflow, and a length that is too long can lead to an out-of-bounds write. Figure 12 shows the effect of setting the length field too short in our actual test.

Mutation strategy 2: randomly add, delete and modify existing TLV blocks, which may lead to logic bugs that corrupt memory, such as double free and use-after-free (UAF). Figure 13 shows the effect of randomly adding TLV blocks in our actual test.

Figure 12 Length field set too short

Figure 13. Randomly added TLV blocks
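The two mutation strategies can be sketched as follows. This is an illustration and not MeshFuzzer’s code: it assumes 1905-style TLV framing (1-byte type, 2-byte big-endian length), and all function names and the choice of candidate lengths are hypothetical.

```python
import random
import struct

def split_tlvs(buf):
    """Split a well-formed TLV stream into (type, value) pairs."""
    tlvs, offset = [], 0
    while offset + 3 <= len(buf):
        tlv_type, length = struct.unpack_from("!BH", buf, offset)
        tlvs.append((tlv_type, buf[offset + 3:offset + 3 + length]))
        offset += 3 + length
    return tlvs

def join_tlvs(tlvs, length_override=None):
    """Serialize TLVs; length_override maps a TLV index to a false declared length."""
    length_override = length_override or {}
    out = bytearray()
    for index, (tlv_type, value) in enumerate(tlvs):
        declared = length_override.get(index, len(value)) & 0xFFFF
        out += struct.pack("!BH", tlv_type, declared) + value
    return bytes(out)

def mutate_length(tlvs, rng):
    """Strategy 1: declare a length that is too short or too long for one TLV."""
    index = rng.randrange(len(tlvs))
    real = len(tlvs[index][1])
    candidates = [0, max(real - 1, 0), real + 1, 0xFF, 0xFFFF]
    bad = rng.choice([c for c in candidates if c != real])
    return join_tlvs(tlvs, {index: bad})

def mutate_blocks(tlvs, rng):
    """Strategy 2: randomly add, delete, or modify a whole TLV block."""
    tlvs = list(tlvs)
    action = rng.choice(["add", "delete", "modify"])
    index = rng.randrange(len(tlvs))
    if action == "add":
        tlvs.insert(index, tlvs[rng.randrange(len(tlvs))])  # duplicate a block
    elif action == "delete" and len(tlvs) > 1:
        del tlvs[index]
    else:
        tlv_type, value = tlvs[index]
        tlvs[index] = (tlv_type, bytes(rng.randrange(256) for _ in value))
    return join_tlvs(tlvs)
```

Passing a seeded `random.Random` instance as `rng` makes a fuzzing run reproducible, which helps when a saved payload has to be regenerated for triage.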

Fuzzing the network construction process

Hardware and software selection

Hardware: an Ubuntu machine or a Raspberry Pi 4 sends the packets through a USB wireless network adapter.

Software: we chose to modify wpa_supplicant to build our fuzzer. Because wpa_supplicant already supports the 1905 PBC protocol, we can add our mutation strategies at different stages of its implementation and fuzz the Mesh construction stage efficiently and stably.

Figure 14 wpa_supplicant implementation code

Fuzzing the Existing Agent in practice

With the customized fuzzing tool above, we can simulate the whole 1905 PBC process and inject fuzzing payloads in the M1, M3, M5 and M7 phases. Figure 15 shows the crash log of an out-of-bounds write caused by TLV parsing in the M7 phase, captured during fuzzing. Figure 16 shows the actual packet we captured.

Figure 15. Out-of-bounds write in the M7 phase

Figure 16. Captured packet for the M7-phase out-of-bounds write

We monitor for crashes by pinging the target device and capturing crash logs through the serial port in real time.
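The ping half of this monitoring can be sketched as below. This is an assumption about how such a check might look, not the script used in the talk: it shells out to the Linux `ping` command (`-c` for count, `-W` for the reply timeout in seconds), the function name is hypothetical, and serial log capture is left out.

```python
import subprocess

def is_host_alive(host: str, timeout_s: int = 1, run=subprocess.run) -> bool:
    """Send one ICMP echo request and report whether a reply arrived.

    `run` is injectable so the check can be exercised without a network.
    """
    cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

In a fuzzing loop, a False result after a payload marks that payload as a crash candidate, and the serial log captured at the same moment shows what actually failed.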

Fuzzing the Enrollee Agent in practice

The other victim of the network construction process is the Enrollee that has not yet joined the network, so we build a malicious “Existing” Agent to fuzz the Enrollee. To make the Enrollee keep trying to join the Mesh, we wrote the script shown in Figure 17.

Figure 17 Script that keeps the Enrollee joining the Mesh

We injected fuzzing payloads in the M2, M4, M6 and M8 phases. Figure 18 shows an out-of-bounds write caused by TLV parsing in the M6 phase, triggered during fuzzing. Figure 19 shows the actual packet we captured.

Figure 18. Out-of-bounds write in the M8 phase

Figure 19. Captured packet for the M8-phase out-of-bounds write

Here too, we monitor for crashes by pinging the target device and capturing crash logs through the serial port in real time.

Fuzzing the network control process

Hardware and software selection

Hardware: we chose a MacBook Pro, because it handles sending 1905 packets better.

Software: we chose the existing open source library pyieee1905 and developed our custom protocol fields on top of it, which greatly reduces the work of building the fuzzer. We only need to implement the EasyMesh control protocols to fuzz the network control part.

Figure 20 pyieee1905

Monitoring module

Because most 1905 processing modules run as independent processes, we cannot capture their crashes directly through the serial port, nor can we tell whether the 1905 process is running by pinging the device. Instead, we use the 1905 Topology Query message provided by EasyMesh. This message lets the 1905 processes on different devices probe one another, so whether the device replies to it tells us whether its 1905 process is alive and working properly.

Figure 21 Topology Query Message

After every fuzzing payload, we send a 1905 Topology Query. If we get a reply, the 1905 daemon is working properly. If we do not, the 1905 daemon may have crashed; in that case we save the fuzzing payload we just sent to local storage and wait for the process to restart.

Figure 22 1905 crash monitoring and saving

Figure 23 actual crash
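The monitoring loop described above can be sketched as follows. This is not the code from Figure 22: the function and parameter names are hypothetical, and sending the payload and the Topology Query is left to two caller-supplied callables so that the loop itself stays independent of any packet library.

```python
import time
from pathlib import Path

def fuzz_with_liveness_check(payloads, send_payload, probe_alive, crash_dir,
                             max_restart_polls=30, poll_interval=1.0):
    """Send each payload, probe the 1905 daemon, and save payloads that silence it.

    send_payload(payload) transmits one fuzzing payload.
    probe_alive() sends a Topology Query and returns True if a reply arrived.
    """
    crash_dir = Path(crash_dir)
    crash_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, payload in enumerate(payloads):
        send_payload(payload)
        if probe_alive():
            continue  # daemon answered, so move on to the next payload
        # No reply: record the payload that was just sent.
        path = crash_dir / f"crash_{index:06d}.bin"
        path.write_bytes(payload)
        saved.append(path)
        # Wait for the daemon to come back before fuzzing continues.
        for _ in range(max_restart_polls):
            if probe_alive():
                break
            time.sleep(poll_interval)
    return saved
```

Saving the raw bytes rather than a description of the mutation keeps each crash replayable even if the mutation code changes later.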

The actual effect

Using MeshFuzzer, we found multiple memory corruption vulnerabilities caused by TLV parsing in the EasyMesh solution for MediaTek’s MT7915, as well as one security issue that violated the security design guidelines. In total we obtained 19 CVEs, as shown in Figure 24. MediaTek has since fixed all the issues and released security patches.

Figure 24 MT7915 security issues

Security recommendations

To deal with memory corruption vulnerabilities caused by TLV parsing, we recommend parsing each packet in its entirety, checking the type and length of every TLV one by one, and discarding the whole packet when a type or length check fails.

A good example is wpa_supplicant. As Figure 25 shows, it handles TLV packets in the sequence parsing -> distributing -> verifying -> processing.

Figure 25. Example of proper TLV processing
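The recommendation can be sketched in Python as follows. This is an illustration of the parse, validate, then process idea and not wpa_supplicant’s code: it assumes 1905-style TLV framing (1-byte type, 2-byte big-endian length), and the `EXPECTED_LENGTHS` table and all function names are hypothetical.

```python
import struct

# Hypothetical per-type length rules: type -> (min_len, max_len).
EXPECTED_LENGTHS = {
    0x01: (6, 6),     # e.g. a field holding a 6-byte MAC address
    0x89: (1, 512),   # e.g. a variable-length policy block
}

def parse_all(buf: bytes):
    """Parse the entire packet first; any framing error rejects all of it."""
    tlvs, offset = [], 0
    while offset < len(buf):
        if len(buf) - offset < 3:
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from("!BH", buf, offset)
        offset += 3
        if length > len(buf) - offset:
            raise ValueError("declared length exceeds remaining bytes")
        tlvs.append((tlv_type, buf[offset:offset + length]))
        offset += length
    return tlvs

def validate(tlvs):
    """Check every TLV's type and length before anything is processed."""
    for tlv_type, value in tlvs:
        if tlv_type not in EXPECTED_LENGTHS:
            raise ValueError("unexpected TLV type")
        low, high = EXPECTED_LENGTHS[tlv_type]
        if not low <= len(value) <= high:
            raise ValueError("TLV length out of range for its type")

def handle_packet(buf: bytes, handlers) -> bool:
    """Return True if the packet was processed, False if it was discarded."""
    try:
        tlvs = parse_all(buf)
        validate(tlvs)
    except ValueError:
        return False  # discard the whole packet, process nothing
    for tlv_type, value in tlvs:
        handler = handlers.get(tlv_type)
        if handler is not None:
            handler(value)
    return True
```

Because no handler runs until the whole packet has passed both stages, a malformed TLV late in a packet cannot leave the device in a half-updated state.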

To address violations of security design, the EasyMesh V3 standard contains a section describing the security capabilities of the 1905 protocol. For example, isolating the Backhaul and Fronthaul links requires verifying message integrity and encrypting 1905 packets. We recommend that vendors follow the EasyMesh standard and implement these 1905 security capabilities.

Conclusion

The talk can be summarized as follows:

1. We found multiple attack surfaces in WiFi Mesh. Attackers can target devices in the Mesh network during both the Mesh construction stage and the network control stage.

2. We developed MeshFuzzer, an automated vulnerability discovery tool that can automatically find security vulnerabilities introduced by manufacturers when they implement EasyMesh.

3. In practice, we found many security problems in the EasyMesh solution of the MT7915 chip, obtained 19 CVEs, and provided corresponding repair suggestions.
