In April 2021, researchers published an in-depth analysis of the Cobalt Strike penetration testing tool and of how some of its signature-evasion techniques failed to prevent detection. In this article, we take a closer look at Metasploit, a common framework that can interoperate with Cobalt Strike.

In this article, we will cover the following topics:

Import resolution of shellcode: how Metasploit shellcode locates functions from other DLLs, and how we can pre-compute these values to resolve any import from other payload variants;

Reversing the shellcode’s execution flow: this process turns out to be very simple;

Breaking Metasploit’s import resolution: a non-invasive deception technique (without hooks) that makes Metasploit reveal itself to antivirus (AV) software with high confidence.

For this analysis, we generated our own shellcode using Metasploit v6.0.30-dev. The malicious sample produced by the following command has the SHA256 hash 3792f355d1266459ed7c5615dac62c3a5aa63cf9e2c3c0f4ba036e6728763903 and can be found on VirusTotal, for readers willing to try it out for themselves.

msfvenom -p windows/shell_reverse_tcp -a x86 > shellcode.vir

Throughout the analysis, we renamed functions, variables, and offsets to reflect their role and improve clarity.

 

The preliminary analysis

In this section, we outline the initial logic followed to determine the next steps of the analysis (import resolution and execution flow analysis).

While a typical executable contains one or more entry points (exported functions, TLS callbacks, and so on), shellcode can be considered the most primitive code format, in which execution begins at the first byte.

Analyzing the generated shellcode from its first bytes reveals two operations:

From an analytical point of view, the first instruction at ① can be ignored. The CLD operation clears the direction flag, ensuring that string data is read forward rather than backward (for example, cmd versus dmc). The second operation, a call at ②, transfers execution to a function we named Main, which contains the shellcode’s main logic.

Disassembly of the shellcode’s call to the Main function

In the Main function, we observe additional calls, such as the four (③, ④, ⑤, and ⑥) highlighted in the figure below. These calls target an as-yet-unidentified function whose address is stored in the EBP register. To understand where this function is located, we need to step back and look at how the call instruction operates.

A disassembly of the Main function

The call instruction transfers execution to the target destination by performing two operations:

It pushes the return address (the memory address of the instruction following the call instruction) onto the stack. This address can later be used by the RET instruction to return execution from the called function (the callee) to the calling function (the caller);

It transfers execution to the target destination (the callee), just as a JMP instruction would.

Thus, the first POP instruction of the Main function (③) stores the caller’s return address in the EBP register. This return address is then called as a function at offsets 0x99, 0xA9, and 0xB8 (④, ⑤, and ⑥). This pattern, where a similar push precedes each call, tends to indicate that the return address stored in EBP points to a dynamic import resolution function.
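The return-address arithmetic can be checked with a few lines of Python. This is a minimal sketch, not the shellcode itself: the six prefix bytes, and in particular the displacement 0x82, are assumed example values. Only the instruction sizes are certain (CLD is one byte, a near relative call is one opcode byte plus a 4-byte displacement).

```python
import struct

# Assumed first six bytes of the shellcode: cld, then call rel32.
# The displacement (0x82) is an illustrative value, not taken from the sample.
prefix = bytes.fromhex("fc e8 82 00 00 00")

assert prefix[0] == 0xFC  # cld: 1 byte
assert prefix[1] == 0xE8  # call rel32: 1 opcode byte + 4 displacement bytes

call_offset = 1
# The call pushes the address of the instruction that follows it.
return_address = call_offset + 5
(rel32,) = struct.unpack_from("<i", prefix, call_offset + 1)
# The displacement is relative to the end of the call instruction.
call_target = return_address + rel32

print(hex(return_address))  # value the first pop in Main would place in EBP
print(hex(call_target))     # where Main would start under these assumptions
```

Under these assumptions, the value popped into EBP is the offset of the code located directly after the initial call, which is why every later call through EBP lands there.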

“Normal” executables (e.g., Portable Executables on Windows) contain the information needed so that, once loaded by the operating system (OS) loader, the code can call imported routines such as those of the Windows API (e.g., LoadLibraryA). To get this default behavior, an executable must follow a specific structure that the operating system can interpret. Because shellcode is bare code without that structure, the OS loader cannot resolve imported functions on its behalf; worse, the OS loader cannot “execute” a shellcode file at all. To work around this problem, shellcode usually performs “dynamic import resolution.”

One of the most common techniques for performing “dynamic import resolution” is to hash each available exported function and compare it with the hash of the required import. Since shellcode authors cannot always predict whether a particular DLL (e.g., ws2_32.dll for Windows Sockets) and its exports are already loaded, it is not uncommon for shellcode to load DLLs itself by first calling the LoadLibraryA function (or one of its alternatives). Relying on LoadLibraryA (or an alternative) before calling the exports of other DLLs is a stable approach, because these library-loading functions are part of kernel32.dll, one of the few DLLs that can be expected to be loaded into every process.

To verify our theory, we can search for all call instructions, as shown in the figure below (for example, using IDA’s Text option under the Search menu). Except for the first call to Main, all instances reference the EBP register. This observation, together with well-known constants that we will observe in the next section, supports our theory that the address stored in EBP is a pointer to the function performing dynamic import resolution.

All call instructions in the shellcode

The large number of calls through the EBP register suggests that it does hold a pointer to the import resolution function, which we now know is located right after the first call to Main.

Import resolution analysis

So far, we have noticed that the instructions following the initial call to Main play a crucial role, as we expect them to be the import resolution routine. Before looking at the shellcode’s logic, let’s analyze this resolution routine, because doing so will make the remaining calls easier to understand.

From import hash to function

The code immediately after the initial call to Main is where import resolution begins. To resolve these imports, the routine first locates the list of modules loaded into memory, because these modules contain the available exported functions.

To find these modules, a common shellcode technique is to interact with the Process Environment Block (PEB).

In computing, the process environment block (abbreviated PEB) is a data structure in the Windows NT operating system family. It is an opaque data structure used internally by the operating system and most of its fields are not used by anyone outside the operating system. The PEB contains data structures applied to the entire process, including the global context, startup parameters, the data structure of the program image loader, the base address of the program image, and synchronization objects used to provide mutual exclusion for the data structure of the entire process.

As shown in the figure below, to reach the PEB, the shellcode goes through the Thread Environment Block (TEB), which is directly accessible through a register (⑦). The TEB structure itself contains a pointer to the PEB (⑦). In the PEB, the shellcode locates the PEB_LDR_DATA structure (⑧), which in turn contains references to several doubly linked lists of modules. As observed at ⑨, the Metasploit shellcode uses one of these doubly linked lists (InMemoryOrderModuleList) to walk the LDR_DATA_TABLE_ENTRY structures that hold the loaded modules’ information.

Once the first module is identified, the shellcode retrieves the module’s name (BaseDllName.Buffer) at ⑩ and the buffer’s maximum length (BaseDllName.MaximumLength) at ⑪. The length is needed because the buffer is not guaranteed to be null-terminated.

Disassembly of initial module retrieval

It is worth stressing that, unlike the usual pointers (TEB.ProcessEnvironmentBlock, PEB.Ldr, etc.), the pointers of a doubly linked list point to the next list entry. This means that pointers in the list point to a non-zero offset inside the structure rather than to its start. So, while in the figure below LDR_DATA_TABLE_ENTRY has its BaseDllName attribute at offset 0x2C, from the point of view of the list entry the offset is 0x24 (0x2C - 0x08). This can be observed in the figure above, where an offset of 8 must be subtracted to access the two BaseDllName attributes at ⑩ and ⑪.
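The offset arithmetic can be reproduced with ctypes. The sketch below models only the leading fields of the 32-bit LDR_DATA_TABLE_ENTRY layout, with pointers represented as 32-bit integers; the Python class names are chosen here for illustration, and the trailing fields of the real structure are deliberately omitted.

```python
import ctypes

class ListEntry32(ctypes.Structure):
    # Doubly linked list node: forward and backward pointers (32-bit).
    _fields_ = [("Flink", ctypes.c_uint32), ("Blink", ctypes.c_uint32)]

class UnicodeString32(ctypes.Structure):
    _fields_ = [("Length", ctypes.c_uint16),
                ("MaximumLength", ctypes.c_uint16),
                ("Buffer", ctypes.c_uint32)]

class LdrDataTableEntry32(ctypes.Structure):
    # Leading fields only; enough to reach BaseDllName.
    _fields_ = [("InLoadOrderLinks", ListEntry32),
                ("InMemoryOrderLinks", ListEntry32),
                ("InInitializationOrderLinks", ListEntry32),
                ("DllBase", ctypes.c_uint32),
                ("EntryPoint", ctypes.c_uint32),
                ("SizeOfImage", ctypes.c_uint32),
                ("FullDllName", UnicodeString32),
                ("BaseDllName", UnicodeString32)]

# List pointers land on InMemoryOrderLinks, not on the start of the entry.
links = LdrDataTableEntry32.InMemoryOrderLinks.offset
name = LdrDataTableEntry32.BaseDllName.offset

relative_offsets = {
    "DllBase": LdrDataTableEntry32.DllBase.offset - links,
    "BaseDllName": name - links,
    "BaseDllName.MaximumLength": name + UnicodeString32.MaximumLength.offset - links,
    "BaseDllName.Buffer": name + UnicodeString32.Buffer.offset - links,
}

for field, offset in relative_offsets.items():
    print(f"{field}: {offset:#04x}")
```

Seen from the list entry, every field of interest is simply its structure offset minus 0x08, which is the adjustment described above.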

From TEB to BaseDllName

After retrieving the DLL name’s buffer and maximum length, the shellcode proceeds to generate a hash. To do this, it performs the following steps for each byte (treated as an ASCII character) within the maximum name length:

If the character is lowercase, it is changed to uppercase. This operation is performed based on the ASCII representation of the character, which means that if the value is 0x61 or higher (a or higher), 0x20 is subtracted to get into the uppercase range;

The generated hash (initially 0) is rotated right (ROR) by 13 bits (0x0D);

The uppercased character is added to the existing hash.

Schema of the hash loop for the first character (K) of kernel32.dll

Repeated rotations and additions on a fixed-size register (32 bits in the case of EDI) eventually cause characters to overlap. The combination of these repetitions and overlaps makes the operation irreversible, thus producing a 32-bit hash/checksum of the given name.

One interesting finding is that while BaseDllName in LDR_DATA_TABLE_ENTRY is Unicode-encoded (2 bytes per character), the code treats it as ASCII-encoded (1 byte per character) by using LODSB (see below).
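To make the byte-wise view concrete, the short sketch below shows what a byte-by-byte reader sees when it walks a UTF-16 little-endian name. The sample name is an assumption chosen for illustration.

```python
# A module name as stored in BaseDllName.Buffer (UTF-16, little-endian).
name = "KERNEL32.DLL"
raw = name.encode("utf-16-le")

# Reading one byte at a time yields each character followed by a zero byte.
first_bytes = list(raw[:4])
print([hex(b) for b in first_bytes])
print(len(name), "characters ->", len(raw), "bytes")
```

Each zero byte still goes through the rotate-and-add loop, so it contributes an extra rotation even though it adds nothing to the hash.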

Disassembly of module name hashing routines

The hash generation algorithm can be implemented in Python. Although we mentioned earlier that, according to Microsoft documentation, the BaseDllName buffer is not required to be null-terminated, extensive testing has shown that it always is in practice, so null termination can usually be assumed. This assumption is what makes the MaximumLength attribute a valid boundary, similar to the Length attribute. Therefore, a get_hash implementation can expect the data passed to it to be a Python bytes object generated from a null-terminated Unicode string.
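A minimal sketch of such a get_hash function follows. It mirrors the three steps listed earlier (uppercase, rotate right by 13, add); the helper name ror32 is introduced here for readability and is not taken from the shellcode.

```python
def ror32(value: int, bits: int) -> int:
    """Rotate a 32-bit value right by the given number of bits."""
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF

def get_hash(data: bytes) -> int:
    """Hash a module name given as null-terminated UTF-16-LE bytes."""
    result = 0  # the hash starts at 0
    for byte in data:
        # Values of 0x61 ('a') or higher are moved into the uppercase range.
        if byte >= 0x61:
            byte -= 0x20
        # Rotate the running hash right by 13 bits (0x0D)...
        result = ror32(result, 0x0D)
        # ...then add the character, staying within 32 bits like EDI.
        result = (result + byte) & 0xFFFFFFFF
    return result
```

Because the input is iterated byte by byte, the zero high bytes of the UTF-16 encoding and the terminating null character all take part in the rotations, just as they do in the LODSB loop.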

The get_hash function can then be used to calculate the hash value of kernel32.dll.
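As a usage sketch, the snippet below hashes the name of kernel32.dll. The routine is repeated in compact form only so that the snippet runs on its own, and the input is assumed to be the uppercase, null-terminated name encoded as UTF-16-LE.

```python
def get_hash(data: bytes) -> int:
    # Compact copy of the module-name hashing routine described above.
    result = 0
    for byte in data:
        byte = byte - 0x20 if byte >= 0x61 else byte              # uppercase
        result = ((result >> 13) | (result << 19)) & 0xFFFFFFFF   # ror 13
        result = (result + byte) & 0xFFFFFFFF                     # add
    return result

# Null-terminated Unicode string, as stored in BaseDllName.Buffer.
name = "KERNEL32.DLL\0".encode("utf-16-le")
dll_hash = get_hash(name)
print(hex(dll_hash))  # 0x92af16da
```

Since lowercase letters are folded to uppercase before being added, passing "kernel32.dll" in any letter case yields the same value.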

After generating the hash of the DLL name, the shellcode proceeds to identify all the exported functions. To do this, it first retrieves the DllBase property of LDR_DATA_TABLE_ENTRY, which points to the DLL’s in-memory address. From there, the IMAGE_EXPORT_DIRECTORY structure is located by walking the Portable Executable’s structures (⑭ and ⑮) and adding relative offsets to the DLL’s in-memory base address. This last structure contains the number of exported function names (⑰) and the table of pointers to those names (⑯).

Disassembly of the export retrieval

The structure of the above operation is illustrated below, where dotted lines represent addresses calculated by adding a relative offset to the DLL’s in-memory base address.
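The same walk can be sketched in Python with the struct module. The offsets used are those of the 32-bit PE layout (e_lfanew at 0x3C, the export data directory at 0x78 from the PE header, and NumberOfNames at 0x18 in IMAGE_EXPORT_DIRECTORY). The image below is a hypothetical, mostly empty buffer built for the example and assumed to be mapped at base 0, so relative offsets can be used directly as indexes.

```python
import struct

def find_export_directory(image: bytes) -> dict:
    """Locate the export directory of a 32-bit PE image mapped at offset 0."""
    # IMAGE_DOS_HEADER.e_lfanew: offset of the PE header.
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    # First data directory entry (exports): relative address of the directory.
    (export_rva,) = struct.unpack_from("<I", image, e_lfanew + 0x78)
    # NumberOfNames and the three table pointers sit side by side.
    names, functions, name_ptrs, ordinals = struct.unpack_from(
        "<IIII", image, export_rva + 0x18)
    return {
        "ExportDirectory": export_rva,
        "NumberOfNames": names,
        "AddressOfFunctions": functions,
        "AddressOfNames": name_ptrs,
        "AddressOfNameOrdinals": ordinals,
    }

# Hypothetical image: only the fields read above are filled in.
image = bytearray(0x200)
struct.pack_into("<I", image, 0x3C, 0x80)            # PE header at 0x80
struct.pack_into("<I", image, 0x80 + 0x78, 0x100)    # export directory at 0x100
struct.pack_into("<IIII", image, 0x100 + 0x18, 2, 0x140, 0x150, 0x160)

exports = find_export_directory(bytes(image))
print({key: hex(value) for key, value in exports.items()})
```

In a real process, each of these relative values would be added to DllBase before being dereferenced, which is what the dotted lines in the figure represent.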

From LDR_DATA_TABLE_ENTRY to IMAGE_EXPORT_DIRECTORY

Once the number of exported names and their pointers have been identified, the shellcode enumerates the table in descending order. Specifically, the number of names is used as a decrementing counter. For each exported function name, and as long as none matches, the shellcode runs a hashing routine (hash_export_name at ⑲) similar to the one we observed earlier, except that character case is preserved (hash_export_character).

The final hash is obtained at ⑳ by adding the freshly computed function hash (ExportHash) to the previously obtained module hash (DllHash). This sum is compared with the sought hash at ㉑ and, unless they match, the operation starts over with the next function.
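The combination of both hashes can be sketched as follows. The function names are chosen for illustration, and two details are assumptions of this sketch: the export name is hashed as ASCII bytes including its terminating null byte, and the module name is hashed as null-terminated UTF-16-LE.

```python
def ror32(value: int, bits: int) -> int:
    """Rotate a 32-bit value right by the given number of bits."""
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF

def hash_module_name(name: str) -> int:
    """DllHash: case-insensitive hash over the UTF-16-LE name (with null)."""
    result = 0
    for byte in (name + "\0").encode("utf-16-le"):
        if byte >= 0x61:
            byte -= 0x20
        result = (ror32(result, 13) + byte) & 0xFFFFFFFF
    return result

def hash_export_name(name: str) -> int:
    """ExportHash: same loop over ASCII bytes, but character case is kept."""
    result = 0
    for byte in (name + "\0").encode("ascii"):
        result = (ror32(result, 13) + byte) & 0xFFFFFFFF
    return result

def import_hash(module: str, function: str) -> int:
    """Final hash: ExportHash added to DllHash, truncated to 32 bits."""
    return (hash_module_name(module) + hash_export_name(function)) & 0xFFFFFFFF

print(hex(hash_export_name("LoadLibraryA")))             # 0x74776072
print(hex(import_hash("kernel32.dll", "LoadLibraryA")))  # 0x726774c
```

Because the final value is a plain 32-bit sum, the export part of any hash found in a payload can be recovered by subtracting the known module hash from it.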

Disassembly of the export name hashing

If none of the exported functions match, the routine retrieves the next module in the InMemoryOrderLinks doubly linked list and repeats the process until a match is found.

Disassembly of the loop to the next module

The traversal of this doubly linked list is illustrated below:

Traversal of InMemoryOrderModuleList

If a match is found, the shellcode proceeds to call the exported function. To retrieve its address from the previously identified IMAGE_EXPORT_DIRECTORY, the code first needs to map the function’s name to its ordinal (㉒), a sequential export number. Once the ordinal has been recovered from the AddressOfNameOrdinals table, the address is obtained by using the ordinal as an index into the AddressOfFunctions table (㉓).
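The two-step lookup can be sketched with plain Python lists standing in for the export tables. All names, ordinals, relative addresses, and the base address below are hypothetical values invented for the example.

```python
def resolve_address(dll_base, name_index, name_ordinals, function_rvas):
    """Map the index of a matched name to the export's in-memory address."""
    # Step 1: the name's position selects its ordinal (AddressOfNameOrdinals).
    ordinal = name_ordinals[name_index]
    # Step 2: the ordinal indexes the function table (AddressOfFunctions),
    # whose entries are relative to the DLL's base address.
    return dll_base + function_rvas[ordinal]

# Hypothetical export tables of a made-up DLL.
names = ["Alpha", "Beta", "Gamma"]        # AddressOfNames
name_ordinals = [2, 0, 1]                 # AddressOfNameOrdinals
function_rvas = [0x1100, 0x1200, 0x1000]  # AddressOfFunctions
dll_base = 0x10000000

address = resolve_address(dll_base, names.index("Beta"),
                          name_ordinals, function_rvas)
print(hex(address))  # 0x10001100
```

The indirection through ordinals exists because the name table and the function table are not necessarily stored in the same order.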

Disassembly of the import call

Finally, once the export’s address has been recovered, the shellcode simulates the behavior of a call by ensuring that the return address comes first on the stack (removing the hash it was searching for at ㉔), followed by all arguments, as required by the default Win32 API __stdcall calling convention (㉕). The code then performs a JMP at ㉖, transferring execution to the dynamically resolved import, which, upon returning, resumes execution right after the original call through EBP.
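The stack fix-up can be modeled with a Python list whose last element is the top of the stack. This is only a simulation of the idea: the return address, hash, and argument values are hypothetical, and the registers the shellcode uses for the shuffle are not modeled.

```python
# Hypothetical values, for illustration only.
RETURN_ADDRESS = 0x9B      # instruction following a 2-byte call at offset 0x99
IMPORT_HASH = 0x0726774C   # hash pushed right before the call
ARGUMENTS = [0x0019FF00]   # __stdcall arguments (a single made-up pointer)

stack = []  # the end of the list is the top of the stack

# Caller side: arguments (right to left), then the hash, then the call.
for argument in reversed(ARGUMENTS):
    stack.append(argument)
stack.append(IMPORT_HASH)
stack.append(RETURN_ADDRESS)  # pushed implicitly by the call instruction

# Resolver side, once the export's address is known.
return_address = stack.pop()  # take the return address off the stack
stack.pop()                   # discard the hash that was being searched
stack.append(return_address)  # put the return address back on top

print([hex(value) for value in stack])
```

After the fix-up, the stack looks exactly as if the caller had called the import directly, so a jump into the import is enough for it to return to the original call site.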

Overall, dynamic import resolution can be schematized as a pair of nested loops. The outer loop walks the modules in memory order (blue in the figure below) while, for each module, the inner loop iterates over the exported functions, looking for a match between the hash of the desired import and the available exports (red in the figure below).
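The nested loops can be sketched end to end in Python. The module list below is a hypothetical stand-in for the in-memory module list and export tables, and the hashing helpers follow the routines described earlier under the same assumptions (null-terminated names, uppercase folding for module names only).

```python
def ror32(value: int, bits: int) -> int:
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF

def hash_module_name(name: str) -> int:
    result = 0
    for byte in (name + "\0").encode("utf-16-le"):
        byte = byte - 0x20 if byte >= 0x61 else byte
        result = (ror32(result, 13) + byte) & 0xFFFFFFFF
    return result

def hash_export_name(name: str) -> int:
    result = 0
    for byte in (name + "\0").encode("ascii"):
        result = (ror32(result, 13) + byte) & 0xFFFFFFFF
    return result

# Hypothetical modules in memory order, each with a few export names.
LOADED_MODULES = [
    ("example.exe", []),
    ("ntdll.dll", ["NtClose", "RtlExitUserThread"]),
    ("kernel32.dll", ["ExitProcess", "LoadLibraryA", "WaitForSingleObject"]),
]

def resolve_import(wanted_hash: int):
    """Return (module, export) for the first matching hash, else None."""
    for module_name, export_names in LOADED_MODULES:  # outer loop: modules
        dll_hash = hash_module_name(module_name)
        # Inner loop: exports, enumerated from the last name to the first.
        for export_name in reversed(export_names):
            candidate = (dll_hash + hash_export_name(export_name)) & 0xFFFFFFFF
            if candidate == wanted_hash:
                return module_name, export_name
    return None  # the shellcode itself simply keeps walking the list

print(resolve_import(0x0726774C))  # ('kernel32.dll', 'LoadLibraryA')
```

Returning the names rather than an address keeps the sketch short; in the shellcode, a match leads to the ordinal lookup and the jump into the import.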

Import resolution flow
