Saar Amar
Intro
This blogpost summarizes and compares some of the exciting approaches in our journey to memory safety. I hope this could become a reference to interested readers and colleagues.
Throughout this blogpost I’ll consider five security properties: spatial safety, temporal safety, type safety, definite initialization and concurrency safety. Instead of repeating stuff we all have said in the past, I think I can simply copy-paste the following text from (the amazing!) SEAR’s kalloc_type blogpost, which puts it as follows:
“Memory safety is a relatively well-understood problem space. The rest of this post assumes a familiarity with the taxonomy of memory safety:
- Temporal safety means that all memory accesses to an object occur during the lifetime of that object’s allocation, between when the object’s memory is allocated and when it is freed. An access to the object outside of this window is unsafe and called a Use-After-Free (UAF); double-free violations are a particular variant of UAF.
- Spatial safety notes that a memory allocation has a particular size, and it’s incorrect to access any memory outside of the intended bounds of the allocation. Violations of this property are called Out-of-Bounds (OOB) accesses.
- Type safety means that when a memory allocation represents a particular object with specific rules about how the object can be used, those rules can’t unexpectedly change — in other words, that the allocation is typed. Violations of this property are called type confusions.
- Definite initialization denotes that a program is responsible for properly initializing newly allocated memory before using it, as the allocation might otherwise contain unexpected data. Violations of this property often lead to issues called information disclosures, but can sometimes lead to more serious memory safety issues, such as type confusions or UAFs.
- Thread safety is concerned with how modern software concurrently accesses memory. If concurrent accesses to an allocation aren’t properly synchronized, then the objects contained in the allocation might reach an incorrect state and break their invariants. Violations of this property are typically called data races.”
I’ll detail some of the industry’s most interesting approaches/suggestions and discuss how they address these properties. I will compare some of the approaches and add references to already-published research on each one to help interested readers follow up as they like.
The right kind of mitigations
As I wrote in many of my blogposts and said in many presentations - I believe the right kind of mitigation targets the 1st order primitive: the root cause of the bug. This is important because of what most exploits look like. As also explicitly mentioned in the kalloc_type blogpost, attackers use a 1st order primitive to corrupt some target structure and expand the set of primitives they have. The more we let the exploit progress, the more primitives attackers have. Targeting late stages in the exploit process has very low ROI, since at late stages of the exploit, attackers have already gained so many primitives that they have a lot of freedom to bypass the mitigations.
An excellent example of that is CFI. We see so many bypasses of all forms of CFI. This is precisely why: when attackers face CFI, they usually already have arbitrary read/write. While certain forms of CFI will make bypasses more complicated, there are generally several generic ways to bypass them. And when an attacker armed with arbitrary read/write faces these mechanisms, it always ends in game over.
Therefore, this blogpost will focus on approaches that follow these lines: solutions that target 1st order primitives rather than specific exploitation techniques.
The blogpost is divided into three main sections:
- Hardware solutions: CHERI (Morello, CheriIoT), MTE
- Software mitigations: kalloc_type+dataPAC, AUTOSLAB, Firebloom, GuardedMemcpy, CastGuard, attack surface reduction
- Safe programming languages: Rust, Swift
Hardware solutions
I believe CHERI and MTE are thoroughly documented and common knowledge at this point. However, I absolutely can’t avoid mentioning both of these powerful hardware features in a blogpost about different possibilities for the future of memory safety.
CHERI
I’ve talked a lot about CHERI in the past three years. The TL;DR is that CHERI-ISA introduces a new entity to represent references to memory instead of pointers: capabilities. Capabilities are bounded, unforgeable references to memory. Here is some of the published security research on CHERI I had the honor to be a part of:
- MSRC paper: tweet, paper
- MSRC BlackHat USA 2021 talk: tweet, slides
- MSR and MSRC Morello’s blogpost: tweet, blogpost
- MSR, MSRC and Azure Silicon blogpost on “smallest CHERI”: tweet, blogpost, which I’m very excited about in the context of the IoT/small devices space.
While CHERI has a high adoption cost (entirely new ISA, more buses, tag management, requires a rebuild, etc.), it creates huge security value:
- Bounds are checked architecturally; therefore, spatial safety bugs are deterministically mitigated at the architectural level.
- The ISA gives us the ability to distinguish between pointers and integers while marking capabilities as valid/invalid, which means it is architecturally impossible to fake/forge/corrupt/modify pointers, at all.
- The ISA creates great building blocks for software mitigations, such as revocation - which deterministically kills UAFs.
A huge value of CHERI (in addition to the deterministic mitigations for memory safety) is compartmentalization. CHERI could change the way we build isolation models completely. With CHERI, we should be able to build isolation with stronger security properties and better perf.
MTE
There have been many research papers, blogs, and presentations about ARM MTE. And again, I don’t want to repeat highly documented materials. I highly encourage you to check out my BlueHatIL talk “Security Analysis of MTE Through Examples”. This talk covers everything about MTE - how it works, modes, examples, applications, security values, and potential bypasses/concerns. And of course, it has a fun demo.
The TL;DR is that MTE introduces a new type of memory to the ARM architecture (“Normal Tagged Memory”), which allows us to set a tag for every 0x10 bytes of contiguous physical memory. Each tag is 4 bits; therefore, we have 16 possible tag values. In addition, every pointer (i.e. VA) that points to a virtual address mapped with this new type of memory has to be a “tagged pointer” - which means we encode some information in the unused bits of the pointer itself. In the case of MTE, we simply set the tag in the MSB (bits 59-56).
Now, every time you dereference a VA which is mapped using this new type of memory (every time you do load/store), the CPU will compare both of these tags - the tag value from the MSB of the pointer and the allocation tag from the underlying physical memory. If they are different, the CPU will raise an exception (in the case of synchronous exceptions. Again, check out my talk :)).
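To make the mechanics concrete, here is a minimal sketch of the tag math (in Rust, purely for illustration - the constants match the description above, but the helper names are mine, and the real check is of course done by the CPU on every load/store, not in software):

const TAG_SHIFT: u32 = 56;  // logical address tag lives in bits 59-56
const TAG_MASK: u64 = 0xF;  // 4-bit tag, so 16 possible values
const GRANULE: u64 = 0x10;  // one allocation tag per 0x10 bytes of memory

// Encode a tag into the unused top bits of a pointer.
fn tag_pointer(va: u64, tag: u64) -> u64 {
    assert!(tag <= TAG_MASK);
    (va & !(TAG_MASK << TAG_SHIFT)) | (tag << TAG_SHIFT)
}

// What the CPU conceptually does on every load/store to tagged memory:
// compare the pointer's tag with the allocation tag of the target granule.
fn access_ok(ptr: u64, allocation_tag: impl Fn(u64) -> u64) -> bool {
    let va = ptr & !(TAG_MASK << TAG_SHIFT);  // strip the logical tag
    let granule_base = va & !(GRANULE - 1);
    ((ptr >> TAG_SHIFT) & TAG_MASK) == allocation_tag(granule_base)
}

fn main() {
    let tags = |_granule: u64| 0x7;  // pretend every granule is tagged 7
    assert!(access_ok(tag_pointer(0x4000, 0x7), tags));   // tags match: access proceeds
    assert!(!access_ok(tag_pointer(0x4000, 0x3), tags));  // mismatch: fault
}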
There is a lot of published security research you can check out:
- Google’s paper: paper
- Android’s memory safety docs: Arm Memory Tagging Extension
- Adopting ARM MTE in Android: tweet, blogpost
- MSRC paper: tweet, paper
- MSRC BlueHatIL 2022 talk: tweet, slides, repo
As I presented in my talk, the thing about MTE is that if you use it properly, i.e.:
- use synchronous exceptions (“precise mode”), which means the faulting instruction cannot retire and cause damage
- re-tag allocations on free
you get probabilistic mitigations for all the major memory safety bug-classes: spatial and temporal safety.
Note: If you enforce the property of adjacent allocations having different tags, you actually have a deterministic mitigation for strictly linear overflows. However, it might require special attention. For example: you always have to maintain the property of adjacent allocations having different tags. If you retag on free, you could solve this by adding locking (which hurts perf), or by modifying the allocator to guarantee you never free adjacent allocations from different threads. Either way, you must keep this in mind.
While MTE clearly introduces lower security value than CHERI, it has a great advantage: it has nearly 100% compatibility with existing code. It doesn’t even require a rebuild. Of course, you need to add all the support to your MM, allocators, etc., but all the rest could just remain as is.
While MTE has a probabilistic approach, unlike the deterministic approach CHERI has, the security value is still very high. Making about 2/3 of all the memory safety 1st order primitives crash with very high probability is fantastic. The main concern is that you can bypass MTE by leaking tags: unlike the physical tags, which are unaddressable, the tag values in all the tagged pointers are obviously writeable.
Now, there are a few special cases/exceptions to keep in mind. Because this blogpost is quite long, I’ll focus only on the software side of MTE and CHERI, and put ISA design aside. Let’s start:
Special case #1: Copying overlap
There is something important to keep in mind for the deterministic mitigation for strictly linear overflows. Because there is a lot of confusion around memcpy/memmove, some OSes actually unified both of these functions into one implementation. The implementation checks for overlap between src and dst w.r.t. length: `src + length > dst`. If there is an overlap, the function actually copies the content backward. This is highly important; otherwise, you could corrupt content you haven’t copied yet.
Now, this creates an interesting behavior - the deterministic mitigation becomes probabilistic, because the first byte you corrupt does not necessarily have a different tag (it’s not adjacent). Therefore, if one aims for a truly deterministic mitigation, they have to keep this in mind and take action. One simple way to address this is to manually check the tags in increasing order.
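A hypothetical sketch of such a check (names and shape are mine; this is not how any particular libc does it):

// Probe the destination's granules in increasing address order before doing a
// backward copy, so a strictly linear overflow still faults deterministically
// on the first granule with a mismatching tag.
unsafe fn probe_tags_ascending(dst: *const u8, len: usize) {
    const GRANULE: usize = 0x10;  // one MTE tag per 0x10 bytes
    let mut off = 0;
    while off < len {
        core::ptr::read_volatile(dst.add(off));  // the load triggers the tag check
        off += GRANULE;
    }
    if len > 0 {
        core::ptr::read_volatile(dst.add(len - 1));  // cover the tail granule
    }
}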
Special case #2: Intra-object corruption
Something important I mentioned in my BlueHatIL 2022 talk about MTE (see the link above) is the case of intra-object corruption. This is the case of having an OOB inside an allocation. For example, consider the case of having a fixed-size buffer inside a structure, and having OOB from this buffer forward/backward. If you make sure your OOB remains inside the allocation and outside the buffer (well, it has to be outside the buffer, otherwise it’s not OOB), MTE has no way of mitigating this; all memory lines inside an allocation must have the same tag. In my MTE slides I mention some real-world examples for this sub-bug-class.
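To make this sub-bug-class concrete, here is a minimal sketch (hypothetical structure; Rust’s `unsafe` stands in for a buggy C index computation):

#[repr(C)]
struct Packet {
    buf: [u8; 16],  // fixed-size buffer at offset 0
    len: u64,       // offset 16
    handler: u64,   // offset 24: think of a function pointer worth corrupting
}

// `idx` is attacker-controlled and exceeds the buffer, but the write never
// leaves the allocation: every granule of `Packet` carries the same MTE tag,
// so there is no mismatch to detect.
unsafe fn intra_object_oob(p: &mut Packet, idx: usize, val: u8) {
    p.buf.as_mut_ptr().add(idx).write(val);  // idx == 24 corrupts `handler`
}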
The reason I’m singling out MTE but not CHERI, is because CHERI actually has a way to address intra-object corruptions, using an LLVM flag. This is covered in the cheri-exercises repo, under subobject-bounds. One tiny exception is that this cannot be done for the first field in the structure, because it violates the C spec.
An amazing example (which I mentioned in my talk) is the famous great find by Tavis Ormandy in NSS. A straightforward, trivial buffer overflow, intra-object corruption, with attacker-controlled length and content (yes, it couldn’t get any better than that). MTE won’t impact this bug, while CHERI will mitigate it with subobject-bounds.
Note: For people who know they never want to cast from the element at offset 0 back to the structure, we could implement it for the first element and have it working perfectly, deterministically.
Special case #3: CPU side channels
We have seen so many CPU side channels in the past five years. Many are speculative-execution and timing based, but they don’t have to be. For example, we have seen the Hertzbleed attack, a family of frequency side channel attacks. These speculative execution variants just keep coming.
These bugs really concern me (and may lead to true stories such as this one). They are a nightmare for security models that rely on secrets.
Before we build mitigations (especially in hardware!), it’s very important to attack them from every possible angle and try to break them. There are no silver bullet mitigations, but it’s very important to be aware of all the gaps, bypasses, and exceptions and how they play along with the threat models we have in mind. For CHERI and MTE, it means the following:
- MTE: While the allocation tags in the underlying physical memory are unaddressable, all the tags’ values in pointers (which we store in the MSB of all the tagged pointers) are obviously writeable. The only thing that stops us from corrupting them is the lack of knowledge of the actual tags. However, if we could leak these tags, we could corrupt these pointers and bypass MTE (see my BlueHatIL talk about it).
- CHERI: Unlike MTE, CHERI capabilities do not rely on secrets. This means that you can disclose all of the fields of a capability without allowing an identical capability to be fabricated (you can’t forge pointers). It’s not a matter of secrets; it’s a matter of unaddressable tags testifying “Hey! This capability is invalid! Fault! Now!”.
Now, we can’t build information disclosure primitives on top of OOB/UAF, since those will be probabilistically mitigated. However - what about other ways of information disclosure, such as CPU side channels?
Consider the classic example of speculation: if you can read an address with a guess of the tag value in speculation, then you will either trap in speculation or read the value. This is likely to produce a timing side channel that can be used to probe MTE tag values, just as attempting to read an address in speculation can be used to probe whether there is a valid mapping at that address.
And this gets very concerning when we consider local attacks: JS, user->kernel LPEs, etc..
The key point is that guarded manipulation means that capabilities can flow only through valid paths (principle of intentionality) and side channels can therefore not be used to leak the capabilities themselves, only to leak their values (which are explicitly public and can be read through architectural means as well).
CPU side channels are useful to leak secrets, but they won’t impact capabilities and the memory safety properties they introduce.
Definite initialization
Neither CHERI nor MTE has any impact on initialization. And this is very important, because use of uninitialized memory has a concerningly powerful impact on bypasses:
- MTE: Again, since attackers can corrupt/fake pointers freely with MTE (as long as they know the tag), information disclosure primitives are a very troubling concern for bypasses (again, see my BlueHatIL talk about it). Besides speculative execution and CPU side channels, we need to make sure attackers won’t be able to leak tagged pointers via uninitialized memory.
- CHERI: attackers cannot corrupt/fake pointers with CHERI. Unlike MTE, this property does not rely on secrets - the architecture itself marks valid capabilities in the unaddressable tags. Therefore, any kind of information disclosure (memory or side-channel based) won’t let attackers corrupt/fake pointers.
However, you can still use tag-preserving gadgets and move existing capabilities from one place to another, without modifying them. You can still use valid capabilities in uninitialized memory, trigger type confusions, etc.
The good news is that most of the big companies already have auto-initialization features in place, and many compilers support that. Because I see uninitialized memory as a significant concern in the CHERI/MTE threat models, I assume we have that in place. And I feel good about that assumption because all the big companies have pulled off auto-initialization to some level. Therefore, I’ll consider both CHERI and MTE as “+ auto-init” from now on.
Note: I highly recommend checking out the following fantastic publications:
- “Zero-initialize objects of automatic storage duration”, by JF Bastien about auto initialization (tweet, doc)
- “Solving Uninitialized Stack Memory on Windows”, by Joe Bialek (tweet, blogpost)
- “Solving Uninitialized Kernel Pool Memory on Windows”, by Joe Bialek (tweet, blogpost)
- “Building Faster AMD64 Memset Routines”, by Joe Bialek (tweet, blogpost)
So, overall, here is the impact CHERI and MTE have on memory safety (if you are on a phone, the table probably doesn’t fit on the screen; you might need to scroll):
|  | CHERI-ISA + revocation + auto-init | MTE + auto-init |
| --- | --- | --- |
| spatial safety | Deterministic mitigation | Probabilistic mitigation (1), (2) |
| temporal safety | Deterministic mitigation | Probabilistic mitigation |
| type safety | No (3) | No |
| concurrency safety | No | No |
| definite init | Yes | Yes |
(1) Note: MTE could gain a deterministic mitigation for strictly linear overflows, by maintaining the property of setting different tags on adjacent allocations.
(2) Note: MTE can’t protect against intra-object corruptions.
(3) Note: CHERI-ISA actually does give some degree of type safety, but not in the broad, general sense: you get a by-design architectural ability to distinguish between integers and pointers. In addition, you could use sealing to seal vtables with a specific type, etc.
Software Solutions
There are some great examples of software mitigations that target 1st order primitives and therefore kill entire bug classes of vulnerabilities. This blogpost won’t feel right without mentioning the top leading examples from the past few years, so I would like to give a few examples:
kalloc_type (+ dataPAC), Apple
Yes, this is my favorite. In iOS 15 Apple introduced the first ever software mitigation for temporal safety: kalloc_type. Kalloc_type gives us type-based segregation for general purpose allocations within each size class. Instead of talking about it here I’ll simply refer you to their amazing blogpost about it. The TL;DR is that XNU memory management enforces the following properties:
- Once a VA has been used to serve an allocation of type A, it can only be used to serve allocations of types with the same signature, which drastically reduces the number of UAF reallocation candidates for any given type.
- Type signatures are generated such that we can distinguish between types with pointers/metadata etc., and avoid exploitable confusions. The signature scheme allows the segregation algorithm to reduce the number of pointer-data overlaps by encoding the following properties for each 8-byte granule of a type:
__options_decl(kt_granule_t, uint32_t, {
KT_GRANULE_PADDING = 0, /* Represents padding inside a record type */
KT_GRANULE_POINTER = 1, /* Represents a pointer type */
KT_GRANULE_DATA = 2, /* Represents a scalar type that is not a pointer */
KT_GRANULE_DUAL = 4, /* Currently unused */
KT_GRANULE_PAC = 8 /* Represents a pointer which is subject to PAC */
});
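To illustrate the idea (a hypothetical type of my own; the real signatures are emitted by the XNU build toolchain, and Rust is used here just for brevity):

// How a simple type maps to 8-byte granules under a kalloc_type-style scheme
// (granule values mirror kt_granule_t above). Hypothetical example.
#[repr(C)]
struct Example {
    refcount: u64,       // granule 0: KT_GRANULE_DATA
    next: *mut Example,  // granule 1: KT_GRANULE_POINTER
    flags: u64,          // granule 2: KT_GRANULE_DATA
}
// signature: [DATA, POINTER, DATA]. After free, this VA range may only be
// reused by types with a compatible granule sequence, so attacker-controlled
// data can't be made to overlap the `next` pointer via a UAF reallocation.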
Kalloc_type does have some weaknesses: same-signature confusions, pointer-to-pointer confusion attacks, etc. The most concerning/interesting one is probably same-type confusion. Consider the case of a structure with a union that wraps multiple different types of pointers. This is a present for exploit developers - the route from a UAF on such a structure to a type confusion is usually trivial.
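For example, a shape along these lines (hypothetical types) is exactly such a present:

struct FileObj { refcnt: u64 }
struct SockObj { proto: u64 }

// One signature, two pointer interpretations: with a UAF reallocated by
// another instance of the same type, the attacker can store through one arm
// of the union and use through the other - a ready-made type confusion.
#[repr(C)]
union Handle {
    file: *mut FileObj,
    sock: *mut SockObj,
}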
That being said, Apple has very good dataPAC instrumentation (we see more and more of it in kernelcaches). I actually wrote something about just that in the kmem_guard_t blogpost I posted a few months ago:
“Some people asked me “why do you like dataPAC, if you always talk in favor of killing bug classes instead of protecting specific structures?”. Let’s consider one example. Clearly, with kalloc_type, we could do some nice UAF exploits, provided the vulnerable structure in question has some nice properties. For example, if the structure in question has:
- fields that specify count/length/size – we could corrupt them with another value, and convert the UAF into an OOB.
- unions – well, everything inside that union is up for type confusion now.
However - what if the pointers inside these structures are signed using dataPAC, with a type as auxiliary? That’s right, dataPAC just got way up the chain, and it targets something much closer to the 1st order primitive. Apple actually created here a scenario of “this UAF is not exploitable”, even though the structure has unions, because you need a PAC bypass to create this confusion.
That’s exactly what we should aim for when we build mitigations. The scenario of “hey, we have this new memory safety bug, but we can say that without further primitives, it’s not exploitable”.”
That’s why the combination of kalloc_type with dataPAC is so powerful - it mitigates 1st order primitives and dramatically decreases the amount of exploitable temporal safety bugs.
Note: this is done without new hardware, and Apple found a way to make the perf/memory overhead costs work out. Some people like to say “unlike DRAM, which is expensive, virtual address space is free. Why are you so excited about it?”. This is a mistake. There are some costs to keep in mind:
- First of all, virtual address space is not free - it costs PTEs.
- Bookkeeping: if you want sequestering, you need metadata to keep track of that sequestering. This is 16 bytes per page (see `zone_page_metadata`, osfmk/kern/zalloc.c), which means it grows linearly with the number of pages, and it increases the cost of PTEs.
- TLB: we might affect TLB pressure.
Apple had a super impressive effort here, and it’s amazing that XNU has such powerful temporal safety these days (up to same-signature/type-confusions).
AUTOSLAB, grsecurity
AUTOSLAB has interesting security properties; one of them is type-based segregation for temporal safety using dedicated caches. Grsecurity independently implemented a very similar approach to kalloc_type in about the same timeframe (different implementation, same concept). See grsecurity’s great blogpost about it (tweet, blogpost). Grsecurity even discussed same-type confusion attacks in another blogpost (tweet, blogpost).
Firebloom, Apple
Up to this day, I don’t think I found something I love more than reverse engineering. I spent some time reversing Apple’s Firebloom compiler instrumentation in iBoot, and it was more fun than playing any video game.
TL;DR: Firebloom introduces something exceptionally powerful. It’s a compiler instrumentation that modifies pointers to carry more information than just virtual addresses. With Firebloom, pointers now carry metadata describing the allocations:
- exact bounds: lower bound, upper bound
- the type of the allocation, if there is one
The structure describing a reference to memory is defined as follows:
00000000 safe_allocation struc ; (sizeof=0x20, mappedto_1)
00000000 raw_ptr DCQ ? ;
00000008 lower_bound_ptr DCQ ? ;
00000010 upper_bound_ptr DCQ ? ;
00000018 type DCQ ? ;
00000020 safe_allocation ends
Then, the compiler adds the following checks:
- bounds checks: before dereferencing each pointer, the compiler adds checks for lower and upper bounds, to make sure there isn’t any OOB access. Basically, this is CHERI implemented in software. It introduces bounded references to memory, and every dereference has to be checked to be in bounds. Therefore, Firebloom deterministically kills all spatial safety bugs!
- type checks: the compiler adds type checks to avoid illegal casts, etc.
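Here is a rough model of the inserted bounds check (my own sketch in Rust, mirroring the reversed safe_allocation layout above; the real iBoot codegen is of course not Rust):

struct SafeAllocation {
    raw_ptr: *mut u8,
    lower_bound_ptr: *mut u8,
    upper_bound_ptr: *mut u8,
    type_desc: u64,
}

impl SafeAllocation {
    // Conceptually what the compiler emits before dereferencing
    // raw_ptr[offset..offset+size): abort unless the access is fully in bounds.
    fn check(&self, offset: usize, size: usize) {
        let start = self.raw_ptr as usize + offset;
        let end = start + size;
        if start < self.lower_bound_ptr as usize || end > self.upper_bound_ptr as usize {
            panic!("Firebloom: OOB access");  // iBoot would fault/panic here
        }
    }
}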
I know it sounds highly expensive (perf, memory, code size), and I explained why it makes perfect sense in the “Sum up / thoughts” section in my first blogpost. Copy-pasting myself is always fun:
“It’s great to see more work on memory safety, and it’s always great to have more new stuff to look into.
This change is interesting. It certainly helps mitigate some of the memory safety vulnerabilities; however - it’s quite expensive in a few different ways:
- memory overhead: these new pointers take 0x20 bytes of memory, instead of 0x8. Representations of references to memory that are protected this way consume 4x the memory.
- code size: clearly code size increases - more instructions to manage the new metadata, more branches, more checks, etc.
- perf: a lot of dereferences are now wrapped with more instructions (that load data from memory), which impacts performance.
I obviously didn’t measure these overheads between old/new versions of iBoot, so it’s all theoretical. But I believe it’s safe to assume this cost exists, and Apple found a way to make it work.
I know it sounds bad when I list it this way, but to be honest - iBoot is just the place for such a change. I would be highly surprised if Apple (or any other vendor) could pull off such an expensive change in the kernel, but iBoot is a very lightweight, contained environment. It has access to the entire DRAM, and it has a very limited and specific purpose. And it makes sense to protect the second stage bootloader, which is a critical part of the secure boot process.
This is a great example of another effort on Apple’s behalf, which improves security by mitigating a lot of 1st order primitives.”
I encourage you to read all about it in my reversing and documentation of iBoot Firebloom:
- Introduction to Firebloom (iBoot): tweet, blogpost
- Firebloom (iBoot) - the type descriptor: tweet, blogpost
GuardedMemcpy, snmalloc, Microsoft
Of all the mitigations mentioned here, GuardedMemcpy probably has the lowest impact:
- it targets specific runtime/library functions rather than all memory accesses
- it’s very elegant and efficient - very low perf overhead
Nonetheless, it kills 1st order primitives. So I decided to include it here.
I highly encourage you to read the documentation about snmalloc’s GuardedMemcpy (to be honest, I encourage you to read about snmalloc in general). The TL;DR is that we can implement very efficient logic to calculate `remaining_bytes`, which lets us add bounds checks to certain functions (such as `memcpy`, for example) and kill OOBs. I think the doc page explains it so well that I’ll simply copy-paste the relevant part here:
“All slab sizes are powers of two, and a given slab’s lowest address will be naturally aligned for the slab’s size. (For brevity, slabs are sometimes said to be “naturally aligned (at) powers of two”.) That is, if `x` is the start of a slab of size `2^n`, then `x % (2^n) == 0`. This means that a single mask can be used to find the offset into a slab. As the objects are laid out contiguously, we can also get the offset in the object with a modulus operation, that is, `remaining_bytes(p)` is effectively:

object_size - ((p % slab_size) % object_size)

Well, as anyone will tell you, division/modulus on a fast path is a non-starter. The first modulus is easy to deal with: we can replace `% slab_size` with a bit-wise mask. However, as `object_size` can be non-power-of-two values, we need to work a little harder.”
As you might guess, this process ends up with a link to Daniel Lemire’s article, More fun with fast remainders when the divisor is a constant.
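Putting it together, a sketch of the computation (plain `%` shown where snmalloc uses Lemire’s constant-divisor multiply; the function shape is mine, not snmalloc’s API):

// remaining_bytes(p): distance from p to the end of its object. Assumes
// naturally-aligned power-of-two slabs with objects laid out contiguously.
fn remaining_bytes(p: usize, slab_bits: u32, object_size: usize) -> usize {
    let offset_in_slab = p & ((1usize << slab_bits) - 1);  // `% slab_size` as a mask
    let offset_in_object = offset_in_slab % object_size;   // Lemire's trick in the real code
    object_size - offset_in_object
}

fn main() {
    // 0x4000-byte slab, 48-byte objects: a pointer 10 bytes into the fourth
    // object has 38 bytes remaining, so a 48-byte memcpy would be rejected.
    assert_eq!(remaining_bytes(0x10_0000 + 48 * 3 + 10, 14, 48), 38);
}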
Interesting bonus: the same technique was added to zalloc.c in XNU (tweet); see `copy_validate` (osfmk/arm64/copyio.c).
CastGuard, Microsoft
Type confusion is a very challenging problem. As I stated in my BlueHatIL 2022 MTE talk, when I say “type confusions”, I’m referring to vulnerabilities where the 1st order primitive is a pure, beautiful, straightforward type confusion. If you take some OOB/UAF and create a type confusion on top of that, mitigations that kill OOB/UAF at their root cause will break your exploit. However, if the root cause of the bug is a straightforward type confusion, this becomes a different story.
Type confusions come in many flavors, and while this bug class is less common than others, it is an important one since it bypasses most of the mitigations we discussed (MTE, CHERI, …). A really interesting mitigation called CastGuard challenges this bug class. I encourage you to check out Joe Bialek’s talk from BlackHat USA 2022 about it (tweet, slides). It covers great research and a great mitigation.
dataPAC for type confusions, Apple
In the Apple ecosystem, PAC has a few different purposes:
- iPAC, which is used for CFI (backward-edge and forward-edge). This is of course not in the scope of this blogpost because it doesn’t target 1st order primitives, and it’s old news.
- dataPAC, which I already mentioned in the context of kalloc_type, where I detailed how it moved up the exploit process and therefore helps kill 1st order primitives of temporal safety bugs.
In general, dataPAC reduces the number of possible exploitable type confusions between pointers. Type confusions come in many forms and create very powerful exploitation primitives. The interesting point is that dataPAC can help with addressing a lot of these primitives in contained environments, such as IOKit, Objective-C, etc..
For example, a trivial attack is to replace a vtable pointer of type A with a vtable pointer of type B, creating the scenario of calling methods of B instead of A, while operating on incorrect state/fields/registers. Another great example is the isa-ptr in Objective-C, which is now signed with PAC (for context, see a lot of the excellent research published by Samuel Groß (one, two)). In many cases, such behaviors (that go through vtables/isa-ptr) happen quite close to the 1st order primitive.
While dataPAC targets a specific set of structures (it’s not a wide mitigation you enable for every structure in one shot), it helps mitigate possible type confusions by signing pointers with auxiliaries. That’s very cool!
Attack surface reduction / sandboxing
Reducing attack surfaces across security boundaries has a very high ROI. It’s not always possible (in many cases you can’t give up sets of features) and it may require a lot of busy work (sandboxing/isolation), but when you can pull it off, it introduces very high security value. Less attack surface means fewer bugs. I would like to give a few examples of amazing moves in our industry. And yes, I know, we all have the same example in mind:
- In iOS 16 Apple announced an AMAZING new configuration/mitigation called Lockdown Mode. A serious, significant reduction in features and attack surfaces, in favor of better security. Luca (the one and only :)) of course mentioned this in his fantastic Hexacon keynote, and you can see it in this great Twitter thread about highlights from the keynote.
- In iOS 16 Apple moved the wifi stack out of the kernel to userspace (tweet, tweet).
Safe language solutions
With all due respect to mitigations, they don’t fix the actual problem. The approach of most memory safety mitigations is to detect the error at runtime and fastfail the process/system. In other words, the approach is to “convert memory corruption primitives into DoS”.
Note: there is a weird discussion in the world about “is triggering a crash the right thing to do when detecting memory corruption?”. This discussion is weird because the answer is trivially YES. It is significantly better (actually, it’s necessary) to crash rather than have someone corrupt memory and keep the execution running. Memory corruption means compromise of the system’s integrity/confidentiality. It means arbitrary code execution. Which means that among all the bad things the attacker can do, they can DoS your process/system. Actually, from the user’s point of view, intentionally crashing the process/system is usually the best thing they can hope for. Letting someone keep running after you know they corrupted memory is extremely irresponsible. If such a thing happened, it clearly means the code has a severe bug.
While fastfail is a really good reaction to memory corruption, there is something even better - let’s use safe programming languages that simply don’t allow bugs to exist to begin with. There is an obvious problem here - we built our industry upon unsafe programming languages, and it’s incredibly expensive to rewrite everything (it’s unrealistic). However - at the bare minimum, we could develop future code in a safe language (not “safer”, since anything is safer than C).
Rust
First of all, let me say this: Rust is amazing. It’s the right, wise thing to do.
There have been so many publications about Rust. I would just like to briefly cover how Rust addresses all the memory safety properties we talked about. I won’t cover everything, just the most important/unique ones.
Temporal safety
I especially like how Rust manages objects’ lifetimes. Rust isn’t a managed language (there is no GC), and it doesn’t let programmers decide when to free objects (which is good; humans make mistakes). Instead, Rust makes sure it knows all objects’ lifetimes at compile time, and adds all the destructions itself. If Rust can’t figure out the lifetime of some instance from your code (and the instance isn’t static, i.e. doesn’t live until the process/system terminates), it will tell you nicely: “hey, I’m sorry, but your code isn’t good enough. Please take a step back, understand what you want to do, well-define it, and come back to me. Good luck!”.
To quote the great book “Programming Rust” (by Jim Blandy && Jason Orendorff, page 104):
“This is the essence of the process Rust uses for all code. Bringing more language features into the picture - data structures and function calls, say - introduces new sorts of constraints, but the principle remains the same: first, understand the constraints arising from the way the program uses references; then, find lifetimes that satisfy them. This is not so different from the process C and C++ programmers impose on themselves; the difference is that Rust knows the rules, and enforces them.”
With such an amazing approach, Rust enforces 100% temporal safety by design. You can’t have UAFs/double frees/dangling pointers if the compiler can prove there isn’t an access to an object outside of its lifetime.
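For example, the canonical dangling reference simply doesn’t get past the compiler:

// error[E0106]: missing lifetime specifier - there is no lifetime that could
// make this sound, since `s` is dropped when the function returns.
fn dangling() -> &String {
    let s = String::from("hello");
    &s
}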
Let me drop here another example of how Rust makes sure you can’t break the rules. Rust has two famous traits: Drop and Copy:
- Drop: When a value’s owner goes away, we say that Rust drops the value. When Rust drops a value, it has to operate like a d’tor - free all the other values, heap allocations, and system resources the value owns. Drops happen under a variety of circumstances: when a value goes out of scope, when you truncate a vector, removing elements from its end, and so on. Or in other words - when it removes the value from the ownership tree somehow.
- Copy: When the programmer copies instances of types, Rust must decide whether to move or copy them. There is a significant difference - move means moving the value and leaving its source uninitialized, while copy means creating a shallow copy of the value, bit by bit.
Rust permits values to have the Copy marker only if a shallow bit-by-bit copy is all they need. Types that own any other resources (heap allocations/handles/fds/etc.) cannot implement Copy. Which leads us to the following basic rule (quoting Programming Rust, page 289): “Any type that implements the Drop trait cannot be Copy. Rust presumes that if a type needs special cleanup code, it must also require special copying code, and thus can’t be Copy.”
Indeed! If some value needs a special cleanup code (free/close/etc.), it doesn’t make any sense to shallow-copy it. This is one example of how dangling pointers and other variations of temporal safety fun begin. Therefore, saying “Drop cannot be Copy” makes a lot of sense. This is what you have with Rust. Rules and restrictions that make sense.
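A tiny example of the rule in action:

struct Handle {
    fd: i32,  // owns an OS resource, so it needs cleanup
}

impl Drop for Handle {
    fn drop(&mut self) {
        // close(self.fd) would go here
    }
}

// Adding #[derive(Clone, Copy)] to Handle fails the build with:
// error[E0184]: the trait `Copy` may not be implemented for this type;
// the type has a destructor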
Concurrency safety
For concurrency safety, Rust enforces a strict rule that every value has one (and only one) owner with mutable access at any given point in time. You can pass an immutable reference to concurrent threads, but you cannot do that if there is a mutable reference to the value. To quote from the great page Fearless Concurrency with Rust:
“Each reference is valid for a limited scope, which the compiler will automatically determine. References come in two flavors:
- Immutable references `&T`, which allow sharing but not mutation. There can be multiple `&T` references to the same value simultaneously, but the value cannot be mutated while those references are active.
- Mutable references `&mut T`, which allow mutation but not sharing. If there is an `&mut T` reference to a value, there can be no other active references at that time, but the value can be mutated.
Rust checks these rules at compile time; borrowing has no runtime overhead”.
You know what I want to say about it, so let’s say it together - Rust makes sense.
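Here it is in five lines:

fn main() {
    let mut v = vec![1, 2, 3];
    let r = &v[0];  // immutable borrow of `v` is live until the last use of `r`
    v.push(4);      // error[E0502]: cannot borrow `v` as mutable because it is
                    // also borrowed as immutable
    println!("{r}");
}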
Basically, I believe once you have unconstrained concurrent mutation in a language, it gets everywhere.
Definite initialization
That’s easy. Use of uninitialized memory simply cannot happen in Rust; the compiler won’t compile code that uses an uninitialized value. Everything is initialized, and there is no possible scenario to create an uninitialized value. For example, we talked about moving a value. In Rust, when you move a value, the source relinquishes ownership of the value to the destination and becomes uninitialized. From this point forward, the destination controls the value’s lifetime.
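A two-line demonstration:

fn main() {
    let flag = std::env::args().count() > 1;
    let x: i32;
    if flag {
        x = 1;
    }
    // error[E0381]: `x` may be uninitialized on the `else` path, so the
    // compiler rejects this use outright.
    println!("{x}");
}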
I have a lot to say about Rust - what the binary looks like, how the compiler works, perf, memory usage, etc.; however, this blogpost is not the place for that (it’s getting too long). Fortunately, there is a lot of amazing public information all over the internet.
Rust - status, information and community:
Many companies have started doing a lot around Rust: encouraging developers to use it, raising awareness of the many advantages this great language introduces, and releasing many free, open-source materials to help newcomers get familiar with the language and its ecosystem. Some great examples:
- Microsoft: “Take your first steps with Rust” (tweet, training)
- The Android team has open sourced their internal Rust Training (tweet, training)
- AWS: Sustainability with Rust
- Meta: A brief history of Rust at Facebook (tweet, blogpost)
- This Week In Rust
Of course, it’s not just trainings, tweets and Mastodon posts! There are fantastic, concrete steps towards Rust in production:
- Memory Safe Languages in Android 13 (tweet, blogpost). A great quote from this great blogpost: “As Android migrates away from C/C++ to Java/Kotlin/Rust, we expect the number of memory safety vulnerabilities to continue to fall.”
- Windows dwrite font parsing (tweet)
- Linux kernel: tweet
- Tales of the M1 GPU: Mastodon post, blogpost
Something that makes me very happy is that the AMAZING, OUTSTANDING Asahi Linux project adopts Rust as well! Asahi Linux is a fantastic project that publishes great content and builds a fantastic Linux distribution. See Lina’s great thread about her experience with Rust. Specifically, this tweet:
“All the concurrency bugs just vanish with Rust! Memory gets freed when it needs to be freed! Once you learn to make Rust work with you, I feel like it guides you into writing correct code, even beyond the language’s safety promises. It’s seriously magic!…”
In general, I highly recommend following up on that project. Some folks to follow for great content:
- marcan: @[email protected]
- Alyssa Rosenzweig: @[email protected]
- Asahi Lina: @[email protected]
- Sven Peter: @[email protected]
Swift
I didn’t talk much about Swift until now. The trigger for my interest in Swift, and the reason it’s part of this blogpost, is something mentioned at WWDC22 that I found very exciting. Quoting the following tweet by Jacques Fortier:
“I’m so proud of the Secure Enclave team’s work on iOS 16. The Secure Enclave is the foundation for a bunch of new security features announced this year and has major improvements under the hood this year. Including this bit of Swift news!…”
The link in the tweet is to the video: “What’s new in Swift”. At 5:01, the video reads:
“…Swift underwent some major changes this year. To make the standard library smaller for standalone, statically linked binaries, we dropped the dependency on an external Unicode support library, replacing it with a faster native implementation. Smaller, faster binaries are a huge benefit when running on event-driven server solutions. You get static linking on Linux by default to better support containerized deployments for the server. This size reduction makes Swift suitable for even restricted environments, which allowed us to use it in Apple's Secure Enclave Processor. Swift is useful from apps to servers all the way down to restricted processors; tying it all together is the package ecosystem…”
This is mind-blowing. It goes without saying that the Secure Enclave is a critical component, and having a safer language used for building parts of the TCB is very exciting. For me, it was surprising because while Swift seems like an excellent fit for apps and high-level programming, it looks less fit for low-level OS development. However, now that it has made its way into the SEP, I believe this blogpost should also discuss Swift’s security properties. They are common knowledge, but everything that goes into such a critical component as the SEP is worth talking about.
Stable ABI
Before I start being that annoying kid who talks about security, I’d like to point out that, unlike Rust, Swift has a stable ABI (see this amazing blogpost). That’s huge, and I would like to quote the fantastic author, Aria Beingessner, because it reflects what I’m feeling: “The result is something I find endlessly fascinating, because I think Swift has pushed the notion of ABI stability farther than any language without much compromise.”
Security properties
Security-wise, Swift has a ton of great properties like Rust, such as:
- It is a type safe language
- It enforces spatial safety
- It covers definite initialization perfectly (see Initialization in the Swift book)
- For lifetimes, Swift uses automatic reference counting (ARC)
In addition, incorrect operations, such as reading a value while modifying it, are forbidden in Swift. A classic case can be seen in the following example:
var numbers = [3, 2, 1]
numbers.removeAll(where: { number in
number == numbers.count
})
Swift will prevent such behavior because there is an obvious overlap between accesses to the array, and modifying the array requires exclusive access.
Because I talked in detail about how such properties are achieved at the language level, I won’t repeat myself (this blogpost is too long as it is). Swift does things differently but achieves the same properties and guarantees. Instead of repeating myself, I’ll focus on the main important difference: concurrency safety.
While Swift built the building blocks for developers to safely write concurrent code (using Structured Concurrency), the key question is “are there other ways to use concurrency?”. If you can use other, less safe means to achieve concurrency, then safety might break, and we depend on the developer. This is where problems begin. To read more, see the Concurrency page.
The important point here is as follows: Swift is not memory safe in the presence of unstructured concurrency. However, it’s important to be fair and note that you shouldn’t be able to trigger memory corruptions in the presence of structured concurrency.
Note: Specifically for the SEP, this is probably not a big deal. But for OS development, it’s critical.
Example - unstructured concurrency breaks memory safety
Unstructured concurrency makes it possible to introduce data races. Let’s see an example that creates a UAF on an ARC object:
import Foundation

typealias A = String

// B is plain, attacker-controlled data with no ARC-managed fields
struct B {
    var x: Int64
    var y: Int64
    var z: Int64
    var w: Int64
}

// An enum whose payload is either an ARC-managed object (A) or raw data (B)
enum TaggedType {
    case A_type(A)
    case B_type(B)
}

var resource = TaggedType.A_type( A("") )

let backgroundQueue = DispatchQueue(label: "com.app.queue", attributes: .concurrent)

// Two unsynchronized writers race on `resource`: one keeps storing an ARC
// String, the other keeps overwriting the same storage with 0x41... raw data.
backgroundQueue.async {
    while(true) { resource = TaggedType.A_type( A( String(repeating:"C", count: 0x20) ) )}
}

backgroundQueue.async {
    while(true) { resource = TaggedType.B_type( B( x: 0x4141414141414141, y: 0x4141414141414141, z: 0x4141414141414141, w: 0x4141414141414141) )}
}

while(true) {}
Build and run:
saaramar@Saars-Air swift % lldb ./type_poc
(lldb) target create "./type_poc"
Current executable set to '/Users/saaramar/Documents/projects/swift/type_poc' (arm64).
(lldb) run
Process 85466 launched: '/Users/saaramar/Documents/projects/swift/type_poc' (arm64)
Process 85466 stopped
* thread #3, queue = 'com.app.queue', stop reason = EXC_BAD_ACCESS (code=1, address=0x141414141414140)
frame #0: 0x00000001a12f09b8 libobjc.A.dylib`objc_release + 8
libobjc.A.dylib`objc_release:
-> 0x1a12f09b8 <+8>: ldr x8, [x0]
0x1a12f09bc <+12>: and x9, x8, #0x7ffffffffff8
0x1a12f09c0 <+16>: ldr x10, [x9, #0x20]
0x1a12f09c4 <+20>: tbz w10, #0x2, 0x1a12f0a24 ; <+116>
Target 0: (type_poc) stopped.
(lldb)
Indeed, memory safety is broken. We introduced a UAF and triggered a segfault in objc_release, because we replaced the object between the initialization and the free. As you can see, this is the isa-ptr/vtable fetch sequence, which means we could replace this type with another type, plant an isa-ptr of another type there, and get a powerful type confusion.
Note: It’s worth mentioning that while it’s disabled by default, the compiler flag `-strict-concurrency=complete` deals with this behavior and refuses to build this code.
Swift 6?
It’s important to mention that Swift has refined its concurrency safety model. As mentioned in the “What’s new in Swift” video, Apple set a new goal for Swift 6: to advance from memory safety to thread safety. It’s also mentioned in the interesting announcement post “On the road to Swift 6”:
“What will differentiate Swift 6 from the Swift 5.x releases will be a significant change in the capabilities of the language. At this point, that change is improved concurrency support, and further progress towards the memory ownership model as outlined in John McCall’s ownership manifesto. These are major changes to the language that will take discussion, investigation, and time to implement. Instead of announcing a specific timeline for “Swift 6”, the plan is for the community to be a part of seeing these efforts progress, with focused efforts and goals, and we release Swift 6 when those efforts culminate. It will be an exciting journey, and I am proud to be part of this community that will make it happen.”
I’m looking forward to seeing how this model will look and how it will affect Swift programming.
Safety in low level
It’s obvious that for new code (and for a lot of existing “security sensitive” components) safe languages are a far better choice than anything else. It literally means “let’s have no memory safety bugs”. And a safe systems language such as Rust is going to be a fantastic fit for a lot of performance-critical code. However, there are some cases where Rust’s security properties cannot be enforced, and these are the low-level core parts of our OSes.
The core parts of OSes (bootloaders, MM, etc.) involve doing unsafe things. A classic example is memory allocators. The allocator has to construct a notion of objects out of a flat address range. Safe Rust can’t express these things and unsafe Rust doesn’t give us significant benefits over modern C++ code that uses the type system to convey security properties.
These cases are interesting - on the one hand, they are a tiny part of the codebase. On the other hand, these tiny parts usually run with high privileges and dramatically impact the overall system. In such cases, MTE/CHERI play pretty nicely - they help ensure that whatever bugs we have in these areas are killed at their root cause (probabilistically/deterministically).
This is exactly why MSR, MSRC and Azure Silicon pushed for this AMAZING project of CheriIoT (tweet, blogpost): scaling CHERI down to RISC-V32E, the smallest RISC-V core specification. I’m very excited about this project, and I hope that once we open-source the ISA and the prototype, more folks across the industry can join.
Huge impact on the IoT space
I’m very excited about this work. Scaling CHERI down to small cores could be a life-changer for the IoT and embedded ecosystems. It’s pretty depressing that these ecosystems are built upon a massive set of different codebases written in unsafe programming languages and (in most cases) with zero mitigations. These CHERI RV32E cores could be a fantastic solution - get the new hardware, rebuild your codebase with the new toolchain, and get powerful security properties!
The new memory-safe microcontroller design deterministically kills spatial and temporal safety bugs and enables a compartmentalized RTOS.
Sum up
Post memory safety days
I hope this blogpost helps summarize some of the exciting approaches for tackling memory safety.
Many people like to ask “what will be the future after memory safety is achieved?”. Well, even in a perfect world, where everyone builds everything in memory-safe languages which hit all the properties we defined, there would still be:
- unsafe blocks
- logic bugs and design flaws
However, while looking forward to such a future, I’m afraid we have a long journey until we get there.
Credit
Huge thanks to David Chisnall for the great feedback! The next round of beers is on me :)
Thanks,
Saar Amar (@[email protected])