Saar Amar
Intro
In iOS 16 / macOS 13, Apple added “guards” for certain types of allocations. This is an interesting change, and since I haven’t seen any technical writeup on the subject, I would like to share some details about it.
Funnily enough, this isn’t a reversing blogpost. I intended to post this a few months ago after reversing this mechanism, but I didn’t find the time. And thanks to the new macOS 13 / iOS 16 OSS drop (tweet, ref), we have source code to look at (tweet). I’m very happy Apple open-sources all of this, it’s super useful, thank you!
Credit to Proteas who bindiffed this change right away back in the day (tweet)!
Guarded allocations
Let me state the obvious right away: we are not talking about guard pages. Guard pages have existed for ~1000 years now; here we are talking about a modern change Apple added to the kmem_* subsystem.
Because we are talking about a security mitigation, we can expect the kernel to panic when specific validations fail. This means we can identify the relevant call to panic and make our way up the callstack. This is useful for reversing and makes everything easy. However, as I said, we have the XNU sources for this functionality, so in this blogpost we will go through the source code.
The panic flow
The relevant panic happens in the function __kmem_entry_validate_panic (osfmk/vm/vm_kern):
__abortlike
static void
__kmem_entry_validate_panic(
vm_map_t map,
vm_map_entry_t entry,
vm_offset_t addr,
vm_size_t size,
uint32_t flags,
kmem_guard_t guard)
{
const char *what = "???";
if (entry->vme_atomic != guard.kmg_atomic) {
what = "atomicity";
} else if (entry->is_sub_map != guard.kmg_submap) {
what = "objectness";
} else if (addr != entry->vme_start) {
what = "left bound";
} else if ((flags & KMF_GUESS_SIZE) == 0 && addr + size != entry->vme_end) {
what = "right bound";
#if __LP64__
} else if (guard.kmg_context != entry->vme_context) {
what = "guard";
#endif
}
panic("kmem(map=%p, addr=%p, size=%zd, flags=0x%x): "
"entry:%p %s mismatch guard(0x%08x)",
map, (void *)addr, size, flags, entry,
what, guard.kmg_context);
}
As you can see, this function is called when the decision to panic has already been made; it just neatly wraps the call to panic. It “resolves” all the details for the panic string, including the reason for panicking (the what variable) and the “guard”.
This function is called from two callsites, after a call to __kmem_entry_validate_guard returns false:
- kmem_entry_validate_guard
- kmem_size_guard
Let’s see which flows reach these two functions:
- The functions that call kmem_entry_validate_guard are: vm_map_delete, kmem_realloc_guard.
- The functions that call kmem_size_guard are: kfree_large, kern_os_realloc_external.
Ok, makes sense - we can see the new security checks (which we will elaborate on in a minute) are done before operations that interact with the allocation (free/delete mapping/reallocation/etc.).
Just to give a better picture, let’s look at the two functions that actually check and panic. They are very simple and straightforward:
void
kmem_entry_validate_guard(
vm_map_t map,
vm_map_entry_t entry,
vm_offset_t addr,
vm_size_t size,
kmem_guard_t guard)
{
if (!__kmem_entry_validate_guard(entry, addr, size, KMEM_NONE, guard)) {
__kmem_entry_validate_panic(map, entry, addr, size, KMEM_NONE, guard);
}
}
vm_size_t
kmem_size_guard(
vm_map_t map,
vm_offset_t addr,
kmem_guard_t guard)
{
kmem_flags_t flags = KMEM_GUESS_SIZE;
vm_map_entry_t entry;
vm_size_t size;
vm_map_lock_read(map);
if (!vm_map_lookup_entry(map, addr, &entry)) {
__kmem_entry_not_found_panic(map, addr);
}
if (!__kmem_entry_validate_guard(entry, addr, 0, flags, guard)) {
__kmem_entry_validate_panic(map, entry, addr, 0, flags, guard);
}
size = (vm_size_t)(entry->vme_end - entry->vme_start);
vm_map_unlock_read(map);
return size;
}
Ok, makes sense - pretty much what one would expect to see. Note that kmem_size_guard passes KMEM_GUESS_SIZE, so the right-bound check is skipped: the caller is asking for the entry’s size precisely because it doesn’t know it, and the size is then read back from the entry’s bounds. Now let’s see exactly which new information/security properties are enforced here.
Allocation validation
The kmem_* subsystem knows something went wrong by comparing two structures - the mapping entry associated with the allocation (vm_map_entry_t) and the new guard structure (kmem_guard_t).
To see what exactly is checked, we can simply look at __kmem_entry_validate_guard:
static bool
__kmem_entry_validate_guard(
vm_map_entry_t entry,
vm_offset_t addr,
vm_size_t size,
kmem_flags_t flags,
kmem_guard_t guard)
{
if (entry->vme_atomic != guard.kmg_atomic) {
return false;
}
if (!guard.kmg_atomic) {
return true;
}
if (entry->is_sub_map != guard.kmg_submap) {
return false;
}
if (addr != entry->vme_start) {
return false;
}
if ((flags & KMEM_GUESS_SIZE) == 0 && addr + size != entry->vme_end) {
return false;
}
#if __LP64__
if (!guard.kmg_submap && guard.kmg_context != entry->vme_context) {
return false;
}
#endif
return true;
}
And we would also like to see kmem_guard_t, along with its great documentation:
/*!
* @typedef kmem_guard_t
*
* @brief
* KMEM guards are used by the kmem_* subsystem to secure atomic allocations.
*
* @discussion
* This parameter is used to transmit the tag for the allocation.
*
* If @c kmg_atomic is set, then the other fields are also taken into account
* and will affect the allocation behavior for this allocation.
*
* @field kmg_tag The VM_KERN_MEMORY_* tag for this entry.
* @field kmg_type_hash Some hash related to the type of the allocation.
* @field kmg_atomic Whether the entry is atomic.
* @field kmg_submap Whether the entry is for a submap.
* @field kmg_context A use defined 30 bits that will be stored
* on the entry on allocation and checked
* on other operations.
*/
typedef struct {
uint16_t kmg_tag;
uint16_t kmg_type_hash;
uint32_t kmg_atomic : 1;
uint32_t kmg_submap : 1;
uint32_t kmg_context : 30;
} kmem_guard_t;
We have it all. This structure is used to describe allocations in the kmem_* subsystem. As the documentation suggests, this new functionality exists to secure atomic allocations:
"KMEM guards are used by the kmem_* subsystem to secure atomic allocations."
However, we don’t need (or want) to rely on documentation. We have code, and code is the most reliable thing we can ever have. To be fair, I mean binary code, not source code. Still, I’ll paste the source code here, and you’ll have to trust me (and the compiler) that this is what actually happens :P Indeed, you can see __kmem_entry_validate_guard returns true if guard.kmg_atomic is 0:
if (!guard.kmg_atomic) {
return true;
}
As we can see from __kmem_entry_validate_panic, the panic happens if there is any inconsistency between the guard and the mapping entry, which could be:
- atomicity (one is atomic, the other is not).
- “objectness” (one is a sub_map, the other is not).
- bounds - the arguments addr and size do not match the information in vm_map_entry_t.
- the “context”s are different.
Now, what is this “*_context”?
The context
In this blogpost I would like to focus on the last check:
if (!guard.kmg_submap && guard.kmg_context != entry->vme_context) {
return false;
}
This check compares the vme_context in the vm_map_entry_t structure associated with the allocation with the kmg_context in the kmem_guard_t. If they are different, the function returns false, and the caller will panic on a “guard mismatch”.
This context is the “guard” we are talking about. It’s a 30-bit value which XNU stores in the vme_context field of the vm_map_entry_t structure and checks on different operations. XNU gets this value from the kmg_context field of kmem_guard_t instances, which are created on the fly by callers of the kmem_* functionalities.
Setting vme_context
As we saw, the mapping entry is the structure that actually holds and keeps track of this context (in the vme_context field). There is only one place in XNU that sets this field directly - VME_OBJECT_SET. Of course, many callsites call this function, but let’s start with VME_OBJECT_SET itself:
static inline void
VME_OBJECT_SET(
vm_map_entry_t entry,
vm_object_t object,
bool atomic,
uint32_t context)
{
__builtin_assume(((vm_offset_t)object & 3) == 0);
entry->vme_atomic = atomic;
entry->is_sub_map = false;
#if __LP64__
if (atomic) {
entry->vme_context = context;
} else {
entry->vme_context = 0;
}
#else
(void)context;
#endif
...
It’s not like we need more evidence that the context exists only to protect atomic allocations - but we can see that if the allocation is not atomic, vme_context is set to 0. And if the allocation is atomic - we use the context argument.
For example, you can see that in kmem_realloc_guard, the context argument is kmg_context from the guard:
...
VME_OBJECT_SET(newentry, object, guard.kmg_atomic, guard.kmg_context);
VME_ALIAS_SET(newentry, guard.kmg_tag);
...
Now let’s see how XNU generates kmg_context.
Context computation
All the functions that set kmg_context in the guards (and actually calculate the hash instead of setting it to 0) use os_hash_kernel_pointer. This function computes a simple hash over a pointer, and that hash is the actual guard:
/*!
* @function os_hash_kernel_pointer
*
* @brief
* Hashes a pointer from a zone.
*
* @discussion
* This is a really cheap and fast hash that will behave well for pointers
* allocated by the kernel.
*
* This should be not used for untrusted pointer values from userspace,
* or cases when the pointer is somehow under the control of userspace.
*
* This hash function utilizes knowledge about the span of the kernel
* address space and inherent alignment of zalloc/kalloc.
*
* @param pointer
* The pointer to hash.
*
* @returns
* The hash for this pointer.
*/
static inline uint32_t
os_hash_kernel_pointer(const void *pointer)
{
uintptr_t key = (uintptr_t)pointer >> 4;
key *= 0x5052acdb;
return (uint32_t)key ^ __builtin_bswap32((uint32_t)key);
}
Please note that this function is always inlined and it’s fast.
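To make this concrete, here is a minimal userland sketch (mine, not XNU code) that replicates the hash and the implicit truncation to the 30-bit kmg_context/vme_context bitfields; the “owner” addresses are made up:
#include <stdint.h>
#include <stdio.h>

/* Userland replica of os_hash_kernel_pointer, for illustration only. */
static uint32_t
hash_kernel_pointer(uintptr_t pointer)
{
	uintptr_t key = pointer >> 4;
	key *= 0x5052acdb;
	return (uint32_t)key ^ __builtin_bswap32((uint32_t)key);
}

int
main(void)
{
	/* Two made-up, kernel-looking "owner" addresses (think of two
	 * different &this->data slots). */
	uintptr_t owner_a = 0xfffffe1000400010;
	uintptr_t owner_b = 0xfffffe1000800020;

	/* kmg_context/vme_context are 30-bit bitfields, so the top two
	 * bits of the hash are dropped on assignment. */
	uint32_t ctx_a = hash_kernel_pointer(owner_a) & 0x3fffffff;
	uint32_t ctx_b = hash_kernel_pointer(owner_b) & 0x3fffffff;

	/* Different owners (almost always) yield different contexts,
	 * which is what the guard check relies on. */
	printf("ctx_a = 0x%08x, ctx_b = 0x%08x\n", ctx_a, ctx_b);
	return 0;
}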
Bookkeeping
The vm_map_entry_t structures are used just as before (you can see the lookup is done by calling vm_map_lookup_entry - RB trees, classic stuff, etc.). This is part of XNU MM 101, and I’m not going to cover it here because it’s out of the scope of this blogpost.
Let’s consider the additional metadata we need to keep track of. The context is 30 bits wide - which means the bookkeeping requires an additional 30 bits per allocation. Apple stores it in the mapping entry - the vme_context field in vm_map_entry_t. This is an important change to a key structure.
Below is the diff in vm_map_entry (osfmk/vm/vm_map.h):
struct vm_map_entry {
- struct vm_map_links links; /* links to other entries */
+ struct vm_map_links links; /* links to other entries */
#define vme_prev links.prev
#define vme_next links.next
#define vme_start links.start
#define vme_end links.end
struct vm_map_store store;
- union vm_map_object vme_object; /* object I point to */
- vm_object_offset_t vme_offset; /* offset into object */
- unsigned int
- /* boolean_t */ is_shared:1, /* region is shared */
- /* boolean_t */ is_sub_map:1, /* Is "object" a submap? */
- /* boolean_t */ in_transition:1, /* Entry being changed */
- /* boolean_t */ needs_wakeup:1, /* Waiters on in_transition */
- /* vm_behavior_t */ behavior:2, /* user paging behavior hint */
+ union {
+ vm_offset_t vme_object_value;
+ struct {
+ vm_offset_t vme_atomic:1; /* entry cannot be split/coalesced */
+ vm_offset_t is_sub_map:1; /* Is "object" a submap? */
+ vm_offset_t vme_submap:VME_SUBMAP_BITS;
+ };
+#if __LP64__
+ struct {
+ uint32_t vme_ctx_atomic : 1;
+ uint32_t vme_ctx_is_sub_map : 1;
+ uint32_t vme_context : 30;
+ vm_page_object_t vme_object;
+ };
+#endif
+ };
And just as a reminder:
/*
* Types defined:
*
* vm_map_t the high-level address map data structure.
* vm_map_entry_t an entry in an address map.
* vm_map_version_t a timestamp of a map, for use with vm_map_lookup
* vm_map_copy_t represents memory copied from an address map,
* used for inter-map copy operations
*/
typedef struct vm_map_entry *vm_map_entry_t;
#define VM_MAP_ENTRY_NULL ((vm_map_entry_t) NULL)
Please note that Apple does not need to store a kmem_guard_t instance per allocation. The only relevant value to keep track of is the context (the 30-bit hash), and it is already stored in the mapping entry - there isn’t a reason to store it twice. And since the guard is derived from the owner of the allocation and can be built on the fly, there isn’t any reason to keep a whole kmem_guard_t instance per allocation.
Actually, it’s more than that. The fact that each caller builds the kmem_guard_t on the fly is important and helps, because it means we have two properties:
- It’s not attacker controlled. For example, the kmg_atomic field is set from a constant value in the code, which means it’s not subject to altering by the attacker.
- It’s derived on the fly, based on rules and control flow, per client.
We can consider these guards as a way for the callers to describe how they expect the allocation to look based on the control flow.
Example
The following logic in kfree_large calls kmem_size_guard while passing a dynamically generated guard, with constant kmg_atomic, kmg_tag, and kmg_type_hash, and a context generated from owner:
static void
kfree_large(
vm_offset_t addr,
vm_size_t size,
kmf_flags_t flags,
void *owner)
{
#if CONFIG_KERNEL_TBI && KASAN_TBI
if (flags & KMF_GUESS_SIZE) {
size = kmem_size_guard(kernel_map, VM_KERNEL_TBI_FILL(addr),
kalloc_guard(VM_KERN_MEMORY_NONE, 0, owner));
flags &= ~KMF_GUESS_SIZE;
}
addr = kasan_tbi_tag_large_free(addr, size);
#endif /* CONFIG_KERNEL_TBI && KASAN_TBI */
#if KASAN_KALLOC
/* TODO: quarantine for kasan large that works with guards */
kasan_poison_range(addr, size, ASAN_VALID);
#endif
size = kmem_free_guard(kernel_map, addr, size, flags,
kalloc_guard(VM_KERN_MEMORY_NONE, 0, owner));
counter_dec(&kalloc_large_count);
counter_add(&kalloc_large_total, -(uint64_t)size);
KALLOC_ZINFO_SFREE(size);
DTRACE_VM3(kfree, vm_size_t, size, vm_size_t, size, void*, addr);
}
Indeed, we can see the call to kalloc_guard, which could not be any simpler:
static kmem_guard_t
kalloc_guard(vm_tag_t tag, uint16_t type_hash, const void *owner)
{
kmem_guard_t guard = {
.kmg_atomic = true,
.kmg_tag = tag,
.kmg_type_hash = type_hash,
.kmg_context = os_hash_kernel_pointer(owner),
};
/*
* TODO: this use is really not sufficiently smart.
*/
return guard;
}
Additional fields
Before we keep going, it might be worth mentioning the other fields and values besides the *_context.
Tag?
Well, as you probably noticed, the function __kmem_entry_validate_guard checks all the fields of kmem_guard_t besides kmg_tag and kmg_type_hash. I’m intentionally not discussing these tags here because Apple mostly uses them for statistics/counting (if you are curious, these are vm_tag_t; you can see it in the source).
Since they are not used for security, I’m going to ignore these tags right now :)
Bounds?
Well, unlike vm_tag_t, bounds and size certainly have great value for security (it’s like saying water has value for life). That’s interesting, because the function kmem_realloc_guard has the following “TODO”:
/*
* Locate the entry:
* - wait for it to quiesce.
* - validate its guard,
* - learn its correct tag,
*/
again:
if (!vm_map_lookup_entry(map, oldaddr, &oldentry)) {
__kmem_entry_not_found_panic(map, oldaddr);
}
if ((flags & KMR_KOBJECT) && oldentry->in_transition) {
oldentry->needs_wakeup = true;
vm_map_entry_wait(map, THREAD_UNINT);
goto again;
}
kmem_entry_validate_guard(map, oldentry, oldaddr, oldsize, guard);
if (!__kmem_entry_validate_object(oldentry, ANYF(flags))) {
__kmem_entry_validate_object_panic(map, oldentry, ANYF(flags));
}
/*
* TODO: We should validate for non atomic entries that the range
* we are acting on is what we expect here.
*/
...
It seems we all agree about what should be checked, even for non-atomic entries :)
Security value
First of all, the threat model here is that the attacker does not have memory corruption yet - they have the ability to influence/control addr/size/etc., and they are looking to expand the set of primitives they hold. With these new validations in place, we get a really nice hardening against that: one can no longer use arbitrary free gadgets to mess with the kmem_* subsystem. Common attacks on kmem_* are:
- calling free with a huge size.
- calling free with a compromised/incorrect addr.
Such primitives let attackers deallocate ranges spanning several allocations, which they can then corrupt through use-after-free - and then the fun begins. There have been a lot of exploits using such techniques (you probably recall games with OSData, OSArray, and, in general, backing storages aliasing with regular allocations). These checks help the allocator verify the input it gets; the toy sketch below illustrates the bounds part.
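Here is a minimal toy model (mine, not XNU code) of the bounds portion of the check, showing why the classic “free with a huge size” primitive now trips the right-bound validation; the addresses and sizes are made up:
#include <stdint.h>
#include <stdio.h>

/* Toy stand-in for the relevant vm_map_entry_t fields. */
struct toy_entry {
	uintptr_t vme_start;
	uintptr_t vme_end;
};

/* Mirrors the left/right bound checks in __kmem_entry_validate_guard
 * (for an atomic entry, without KMEM_GUESS_SIZE). */
static const char *
check_bounds(const struct toy_entry *entry, uintptr_t addr, size_t size)
{
	if (addr != entry->vme_start) {
		return "panic: left bound mismatch";
	}
	if (addr + size != entry->vme_end) {
		return "panic: right bound mismatch";
	}
	return "ok";
}

int
main(void)
{
	/* A made-up 16K atomic entry. */
	struct toy_entry entry = { 0xfffffe1000400000, 0xfffffe1000404000 };

	/* Legitimate free: exact bounds. */
	printf("%s\n", check_bounds(&entry, 0xfffffe1000400000, 0x4000));

	/* "Free with a huge size": this would have released neighboring
	 * allocations as well - now it panics instead. */
	printf("%s\n", check_bounds(&entry, 0xfffffe1000400000, 0x40000));
	return 0;
}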
Please note that Apple generates the context from the storage of the pointer to the allocation (the “owner”), and not simply from the VA of the allocation itself. With this behavior in place, if an attacker has a primitive to free an arbitrary range, then in addition to matching the bounds, they now need to do it through the right “owner”.
Example
If you recall, in the 3rd part of my ipc_kmsg blogpost I wrote the following sentence: “I love coffee, whiskey, and IOSurface”. Well, to be fair, I should have added “OSData” to this list. To make it up to this structure, which plays a significant role in our lives, let’s look at it right now and see what this owner is in an OSData-related flow.
Here is the flow to kfree_large from OSData:
OSData::ensureCapacity -> krealloc_ext -> kfree_large -> kalloc_guard
And OSData::ensureCapacity looks as follows:
unsigned int
OSData::ensureCapacity(unsigned int newCapacity)
{
struct kalloc_result kr;
unsigned int finalCapacity;
if (newCapacity <= capacity) {
return capacity;
}
finalCapacity = (((newCapacity - 1) / capacityIncrement) + 1)
* capacityIncrement;
// integer overflow check
if (finalCapacity < newCapacity) {
return capacity;
}
kr = krealloc_ext((void *)KHEAP_DATA_BUFFERS, data, capacity, finalCapacity,
Z_VM_TAG_BT(Z_WAITOK_ZERO | Z_FULLSIZE | Z_MAY_COPYINMAP,
VM_KERN_MEMORY_LIBKERN), (void *)&this->data);
if (kr.addr) {
size_t delta = 0;
data = kr.addr;
delta -= capacity;
capacity = (uint32_t)MIN(kr.size, UINT32_MAX);
delta += capacity;
OSCONTAINER_ACCUMSIZE(delta);
}
return capacity;
}
Fantastic. The last argument to krealloc_ext is the owner, and we can see it is set to (void *)&this->data.
Now let’s discuss the threat model and see why all of that makes sense :)
The right kind of mitigations
This section is going to take a turn towards “generic memory safety mitigations”, but it’s important.
When we build mitigations, the first question should be “what is the threat model?”. Obviously, the threat model CANNOT be “the attacker has arbitrary r/w”. That contradicts one of the fundamental laws of physics, which reads “arbitrary r/w –> game over, always”. Therefore, the threat model is usually “an attacker has the ability to trigger a memory corruption of the following type…”, which can be UAF / OOB / straightforward type confusion / etc.
I have pretty strong (maybe too strong) feelings about how we should build mitigations. In my opinion, for a mitigation to have high ROI, it should target 1st order primitives rather than specific exploitation techniques (i.e., we should aim to kill bug classes). Or at least, we should aim to get as close to the 1st order primitive as possible. That’s precisely why I was so excited to see the following text in the best blogpost ever:
“Most kernel memory corruption exploits go through a similar progression:
vulnerability → constrained memory corruption → strong memory corruption → memory read/write → control flow integrity bypass → arbitrary code execution
The idea is that the attacker starts from the initial vulnerability and builds up stronger and stronger primitives before finally achieving their goal: the ability to read and write kernel memory, or execute arbitrary code in the kernel. It’s best to mitigate these attacks as early in the chain as possible, for two reasons. …”
Example - kalloc_type and dataPAC
Clearly, kalloc_type does precisely that. It’s probably one of the best modern mitigations currently in production. So many UAFs are not exploitable anymore because one can only corrupt an instance of type A with another instance of type A - let’s call it “same-type-confusion” - which highly limits the attacker. To be precise (yes, I’m annoying, but that’s important) - it’s not really “same-type-confusion”, because, unlike IsoHeap, kalloc_type doesn’t have 100% isolation. It isolates types based on signatures, so it’s really “same-signature-confusion”. We can keep referring to it as “same-type-confusion”; just please keep this in mind. It’s also worth mentioning that the signatures are built in a pretty wise and useful way (see SEAR’s blog for the details).
Some people asked me “why do you like dataPAC, if you always talk in favor of killing bug classes instead of protecting specific structures?”. Let’s consider one example. Clearly, even with kalloc_type in place, we can still do some nice UAF exploits, provided the vulnerable structure in question has some useful properties. For example, if the structure in question has:
- fields that specify count/length/size – we could corrupt them with another value, and convert the UAF into an OOB.
- unions – well, everything inside that union is up for type confusion now.
However - what if the pointers inside these structures are signed using dataPAC, with a type as the auxiliary? That’s right, dataPAC just moved way up the chain, and it now targets something much closer to the 1st order primitive. Apple actually created here a scenario of “this UAF is not exploitable”, even though the structure has unions, because you need a PAC bypass to create this confusion. A minimal sketch of what such a signed field can look like follows below.
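For illustration only, here is a sketch of what a data-key-signed field can look like using clang’s arm64e __ptrauth qualifier; the structure and the discriminator value are made up and are not from XNU:
#include <stdint.h>

#ifndef __has_feature
#define __has_feature(x) 0 /* portability shim for non-clang compilers */
#endif

#if __has_feature(ptrauth_qualifier)
#include <ptrauth.h>
#endif

/* Hypothetical structure: its internal pointer is signed with a data key,
 * address-discriminated, and mixed with a type-derived constant
 * (0x27c5 here is a made-up value). If a UAF lets this memory be reused
 * as a different type, the field no longer authenticates and dereferencing
 * it traps. */
struct my_buffer {
#if __has_feature(ptrauth_qualifier)
	void *__ptrauth(ptrauth_key_process_independent_data, 1, 0x27c5) data;
#else
	void *data;
#endif
	uint32_t length;
};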
That’s exactly what we should aim for when we build mitigations. The scenario of “hey, we have this new memory safety bug, but we can say that without further primitives, it’s not exploitable”.
What does kmem have to do with all of that?
I believe protecting allocator-related structures/metadata is quite important. That’s because the allocator can expose highly powerful exploitation primitives from highly restricted ones. Consider the attack of calling free with a huge size - from underflowing/modifying one integer, you get a huge, massive UAF on a ton of structures. That’s a huge leap. And since the cost of this mitigation is relatively low, it seems very worth it.
Going back to the threat model - indeed, the threat model here is that the attacker does not have memory corruption yet. They have the ability to influence/control addr/size/etc., and they are looking to expand the set of primitives they hold - and thanks to the allocator, they can create a powerful memory corruption with relatively low effort. This is exactly what the mitigation addresses.
I’m very happy to see all these efforts from Apple.
I hope you enjoyed this blogpost.
Thanks,
Saar Amar.