CVE-2022-32250: Linux Kernel Exploit with mqueue

⚠️ [ ORIGIN SOURCE ]
https://blog.theori.io/research/CVE-2022-32250-linux-kernel-lpe-2022/
📅 [ Archival Date ]
Oct 29, 2022 3:11 PM
🏷️ [ Tags ]
CVE-2022-32250, LINUX, LPE
✍️ [ Author ]

Theori

💣 [ PoC / Exploit ]
https://crash.link/cve-2022-32250

Background

Netfilter is a framework in the Linux kernel for implementing various networking-related tasks with user-defined handlers. Netfilter provides various functions for packet filtering, network address translation and port translation, and packet logging. Netfilter represents a set of hooks that allow other kernel modules to register callback functions in the kernel’s networking stack.

nftables is a component of Netfilter that filters or reroutes packets according to user-defined rules. nftables supports sets, which make it easier to use multiple IP addresses, port numbers, etc. in a single rule. Sets are written with braces when defining rules (e.g., {22, 80, 443}), and set types include ipv4_addr, ipv6_addr, ether_addr, inet_proto, inet_service, and mark.

nftables uses tables, chains, rules, and expressions to store and process instructions. Tables contain several chains and are linked to protocols such as IP and IP6. Chains contain several rules and specify the type of network traffic to be processed. Rules contain several expressions, and the traffic received by a chain is evaluated against the rules inside it. Expressions evaluate whether the input satisfies a set of conditions. (See How-The-Tables-Have-Turned-CVE-2022-1015-1016.)

Root Cause Analysis

The PoC is based on: oss-security - Linux Kernel use-after-free write in netfilter

CVE-2022-32250 is a use-after-free vulnerability in the Netfilter subsystem. The vulnerability occurs when a new nft set is added with the NFT_MSG_NEWSET command. When lookup and dynset expressions are processed, a freed chunk remains in the set->bindings list because of an incorrect NFT_EXPR_STATEFUL check. This results in a use-after-free write.

The vulnerable path starts in nft_expr_init. nft_expr_init calls nf_tables_expr_parse and allocates memory for an expr. Afterwards, nf_tables_newexpr initializes the expr.

Since the structure differs depending on the type of expr, the type-specific structure is stored in data[]. For example, a lookup expression stores a struct nft_lookup in data[].

struct nft_expr {
    const struct nft_expr_ops   *ops;
    unsigned char           data[]
        __attribute__((aligned(__alignof__(u64))));
};

struct nft_lookup {
    struct nft_set * set;
    u8 sreg;
    u8 dreg;
    bool invert;
    struct nft_set_binding binding;
};

struct nft_set_binding {
    struct list_head list;
    const struct nft_chain * chain;
    u32 flags;
};

nf_tables_newexpr calls ops->init through the ops pointer of the expr. For a lookup expression, ops->init is nft_lookup_init.

nft_lookup_init calls nf_tables_bind_set. nf_tables_bind_set links the expr's binding into set->bindings at [1].

When nft_expr_init completes, it returns to its caller, nft_set_elem_expr_alloc. If expr->ops->type->flags does not include NFT_EXPR_STATEFUL, execution jumps to err_set_elem_expr and calls nft_expr_destroy to discard the expression that was just added to the set.

Because nf_tables_bind_set has already linked the expr to the set, nft_expr_destroy first calls nf_tables_expr_destroy to tear it down.

void nft_expr_destroy(const struct nft_ctx *ctx, struct nft_expr *expr)
{
    nf_tables_expr_destroy(ctx, expr);
    kfree(expr);
}

nf_tables_expr_destroy calls expr->ops->destroy. For a lookup expression, ops->destroy is nft_lookup_destroy.

static void nf_tables_expr_destroy(const struct nft_ctx *ctx,
                   struct nft_expr *expr)
{
    const struct nft_expr_type *type = expr->ops->type;

    if (expr->ops->destroy)
        expr->ops->destroy(ctx, expr);
    module_put(type->owner);
}

nft_lookup_destroy calls nf_tables_destroy_set, which tries to destroy the set. However, set->bindings still contains the binding of the expr that was just linked, so list_empty(&set->bindings) is false and nft_set_destroy is not called. The set therefore survives, and the expr is never unlinked from the set's bindings list.

static void nft_lookup_destroy(const struct nft_ctx *ctx,
                   const struct nft_expr *expr)
{
    struct nft_lookup *priv = nft_expr_priv(expr);

    nf_tables_destroy_set(ctx, priv->set);
}

void nf_tables_destroy_set(const struct nft_ctx *ctx, struct nft_set *set)
{
    if (list_empty(&set->bindings) && nft_set_is_anonymous(set))
        nft_set_destroy(ctx, set);
}

Although the expr remains in the list, nft_expr_destroy frees it at [2]. If we then bind another expr to the same set, a use-after-free write occurs.

void nft_expr_destroy(const struct nft_ctx *ctx, struct nft_expr *expr)
{
    nf_tables_expr_destroy(ctx, expr);
    kfree(expr);                            // [2] expr is freed while still linked in set->bindings
}
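The sequence above can be modelled in user space. The following is a minimal sketch, not kernel code: toy_set, toy_lookup, and the toy_* helpers are hypothetical stand-ins, and a `freed` flag replaces the real kfree so the model itself has no undefined behaviour. It shows that binding a second expr writes into the list node of the first, already "freed" one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal doubly linked list, modelled on the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;   /* writes into the current tail entry */
    h->prev = n;
}

/* Hypothetical stand-ins for nft_set and nft_lookup (fields trimmed). */
struct toy_set { struct list_head bindings; };
struct toy_lookup {
    struct list_head binding;
    bool freed;          /* marks the object as "kfree'd" in this model */
};

/* Bind: link the expr into the set, as nf_tables_bind_set is described to do. */
static void toy_bind(struct toy_set *s, struct toy_lookup *e)
{
    list_add_tail(&e->binding, &s->bindings);
}

/* Destroy: the set is not empty, so nothing is unlinked; the expr is freed. */
static void toy_destroy(struct toy_lookup *e)
{
    e->freed = true;     /* no list_del: the set still points at e */
}

/* Returns true if binding a second expr wrote into the "freed" first one. */
static bool toy_uaf_write_happens(void)
{
    struct toy_set set;
    struct toy_lookup first = { .freed = false };
    struct toy_lookup second = { .freed = false };

    list_init(&set.bindings);
    toy_bind(&set, &first);
    toy_destroy(&first);
    toy_bind(&set, &second);   /* the tail entry is still &first.binding */

    return first.freed && first.binding.next == &second.binding;
}
```

Calling toy_uaf_write_happens() from a small main shows the stale write: the first object's binding.next now holds the address of the second object's binding.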

Exploitation

The exploitation is performed on Ubuntu 22.04 with a 5.15.0-27-generic kernel.

The exploit has three main steps:

  1. Leak the heap address using struct user_key_payload.
  2. Leak a kernel text address using mqueue to bypass KASLR.
  3. Overwrite modprobe_path.

Leak Heap Address

In Linux kernel exploitation, struct msg_msg is a widely used primitive, since modifying its size field can give both arbitrary read and arbitrary write.

On the 5.15.0-27-generic kernel, struct msg_msg is allocated with GFP_KERNEL_ACCOUNT, while struct nft_lookup is allocated with GFP_KERNEL. If the GFP_KERNEL_ACCOUNT flag is set, the kernel uses the kmalloc-cg-xx slabs. With plain GFP_KERNEL, the kernel uses the kmalloc-xx slabs. The two objects therefore never share a slab.

For this reason, we used the user keyring to leak information (see CVE-2022-34918-LPE-PoC).

The user keyring uses struct user_key_payload, which has the following structure. We can control the size of struct user_key_payload by changing datalen.

struct user_key_payload {
    struct rcu_head rcu;
    unsigned short  datalen;
    char        data[] __aligned(__alignof__(u64));
};

struct callback_head {
    struct callback_head *next;
    void (*func)(struct callback_head *head);
} __attribute__((aligned(sizeof(void *))));
#define rcu_head callback_head

struct user_key_payload is allocated in user_preparse.

int user_preparse(struct key_preparsed_payload *prep)
{
    struct user_key_payload *upayload;
    size_t datalen = prep->datalen;

    if (datalen <= 0 || datalen > 32767 || !prep->data)
        return -EINVAL;

    upayload = kmalloc(sizeof(*upayload) + datalen, GFP_KERNEL);
    if (!upayload)
        return -ENOMEM;

    ...

    return 0;
}
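As a rough illustration of how datalen selects the slab, the sketch below computes the kmalloc request size made by user_preparse. The 0x18-byte header size follows from the structure shown above on a 64-bit build. The helper names are hypothetical, and the 33..64 byte range for the 64-byte cache is an assumption about the usual x86-64 kmalloc size classes.

```c
#include <stdbool.h>
#include <stddef.h>

/* sizeof(struct user_key_payload) on 64-bit: rcu (0x10) + datalen, padded to 0x18. */
#define PAYLOAD_HEADER_SIZE 0x18u

/* Total kmalloc request made by user_preparse for a given datalen. */
static size_t payload_alloc_size(size_t datalen)
{
    return PAYLOAD_HEADER_SIZE + datalen;
}

/* Assumption: the 64-byte cache serves requests of 33..64 bytes. */
static bool lands_in_kmalloc_64(size_t datalen)
{
    size_t total = payload_alloc_size(datalen);

    /* Same datalen bounds that user_preparse enforces. */
    if (datalen == 0 || datalen > 32767)
        return false;
    return total > 32 && total <= 64;
}
```

Under these assumptions, a datalen between 9 and 40 bytes puts the payload in the same size class as the freed lookup expression.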

struct user_key_payload is made to overlap the UAF'd chunk created as described previously. Its data field starts at offset 0x18, and user space can both set and read back its contents. The UAF chunk (struct nft_lookup) also contains a linked-list node, binding. If the UAF is triggered twice, the address of the second UAF chunk is written into the list node of the chunk where the UAF first occurred. In other words, the UAF write stores the second chunk's address in nft_lookup->binding.next of the first chunk, which corresponds to user_key_payload->data[0:8]. Consequently, a heap address can be leaked by reading user_key_payload->data.

struct user_key_payload {
    struct rcu_head rcu;        /* offset 0x00, size 0x10 */
    unsigned short  datalen;    /* offset 0x10, size 0x02 */
    /*          padding         */
    char    data[] __aligned(__alignof__(u64)); /* offset 0x18 */
};
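The offsets in the listing can be checked with user-space mirrors of the two structures. This is a sketch assuming a 64-bit GCC or Clang build. The `_m` types and the helper names are hypothetical, and the leak helper only shows how the first eight bytes of a payload read back from the key would be interpreted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* User-space mirrors of the kernel structures quoted above (64-bit assumed). */
struct callback_head_m {
    struct callback_head_m *next;
    void (*func)(struct callback_head_m *head);
};

struct user_key_payload_m {
    struct callback_head_m rcu;
    unsigned short datalen;
    char data[] __attribute__((aligned(__alignof__(uint64_t))));
};

/* Offset of data[] inside the payload; expected to be 0x18. */
static size_t payload_data_offset(void)
{
    return offsetof(struct user_key_payload_m, data);
}

/* Interpret the first 8 bytes of the payload data as a 64-bit word. */
static uint64_t leaked_word(const unsigned char *data)
{
    uint64_t v;

    memcpy(&v, data, sizeof(v));
    return v;
}

/* Heuristic: kernel pointers on x86-64 have the top 16 bits set. */
static int looks_like_kernel_pointer(uint64_t v)
{
    return (v >> 48) == 0xffff;
}
```

Because data[] and binding.next share offset 0x18, the word returned by leaked_word is the stale list pointer written by the second bind.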

Leak KASLR

msg_msg is widely used to exploit the Linux kernel. However, this vulnerability can only write to offset 0x18 of an object, so it is hard to exploit with the existing msg_msg techniques.

mqueue, on the other hand, has functions in the Linux kernel source that are suitable for exploitation. mqueue keeps its messages in struct posix_msg_tree_node objects, which are allocated at [3].

struct posix_msg_tree_node is allocated in kmalloc-64 and is defined as follows. The size of struct rb_node is 0x18, so msg_list.next is at offset 0x18.

struct posix_msg_tree_node {
    struct rb_node      rb_node;
    struct list_head    msg_list;
    int         priority;
};

struct rb_node {
    unsigned long  __rb_parent_color;
    struct rb_node *rb_right;
    struct rb_node *rb_left;
} __attribute__((aligned(sizeof(long))));
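To make the offset match explicit, the sketch below mirrors the relevant structures in user space and compares the offset of the UAF write (binding.next inside the lookup expression) with the offset of msg_list inside the tree node. It assumes a 64-bit Linux build with GCC or Clang. The `_m` types and the helper names are hypothetical, and pointer fields are reduced to void pointers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct list_head_m { struct list_head_m *next, *prev; };

/* Mirror of the mqueue tree node and its rb_node. */
struct rb_node_m {
    unsigned long parent_color;
    struct rb_node_m *rb_right;
    struct rb_node_m *rb_left;
};

struct posix_msg_tree_node_m {
    struct rb_node_m rb_node;
    struct list_head_m msg_list;
    int priority;
};

/* Mirror of the lookup expression shown in the root cause analysis. */
struct nft_set_binding_m {
    struct list_head_m list;
    const void *chain;
    uint32_t flags;
};

struct nft_lookup_m {
    void *set;
    uint8_t sreg;
    uint8_t dreg;
    bool invert;
    struct nft_set_binding_m binding;
};

struct nft_expr_m {
    const void *ops;
    unsigned char data[] __attribute__((aligned(__alignof__(uint64_t))));
};

/* Offset, from the start of the expr chunk, that the stale list write hits. */
static size_t uaf_write_offset(void)
{
    return offsetof(struct nft_expr_m, data) +
           offsetof(struct nft_lookup_m, binding) +
           offsetof(struct nft_set_binding_m, list);
}

/* Offset of msg_list (and therefore msg_list.next) in the tree node. */
static size_t tree_node_list_offset(void)
{
    return offsetof(struct posix_msg_tree_node_m, msg_list);
}

/* Size of a lookup expression: header plus private data. */
static size_t lookup_expr_size(void)
{
    return sizeof(struct nft_expr_m) + sizeof(struct nft_lookup_m);
}
```

With these mirrors both offsets come out as 0x18, and both objects (0x38 and 0x30 bytes) fit a 64-byte slab object.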

In the figure below, UAF 1 and UAF 2 are the first and second use-after-free’d struct nft_expr, respectively. Each struct nft_expr contains struct nft_lookup in nft_expr->data.

If we allocate UAF 1 and UAF 2 in a particular order, they will be connected as shown below.

  1. Create the UAF 1 chunk by triggering the vulnerability.
  2. Reclaim UAF 1 with a struct posix_msg_tree_node by using the do_mq_timedsend function.
  3. Trigger the vulnerability again to create UAF 2, which writes (UAF 1)->binding.next = &(UAF 2)->binding.
  4. Reclaim UAF 2 with a struct user_key_payload by using the keyctl function.

In this case, msg_list.next of the struct posix_msg_tree_node becomes UAF 2 + 0x18, and msg_list is treated as the head of a struct msg_msg list. The data[] of the user_key_payload in UAF 2 therefore overlaps a struct msg_msg. With data[] indexed in 8-byte words, it is organized as follows:

data[0] = m_list->next   /   data[1] = m_list->prev   /   data[2] = m_type   /   data[3] = m_ts

struct msg_msg {
    struct list_head m_list;
    long m_type;
    size_t m_ts;        /* message text size */
    struct msg_msgseg *next;
    void *security;
    /* the actual message follows immediately */
};
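The word mapping above follows directly from the structure layout, because the fake msg_msg starts exactly where user_key_payload.data starts. A minimal sketch that derives the indices, assuming a 64-bit build (the `_m` mirror types and the helper are hypothetical):

```c
#include <stddef.h>

struct list_head_m { struct list_head_m *next, *prev; };

/* User-space mirror of struct msg_msg (pointer targets reduced to void). */
struct msg_msg_m {
    struct list_head_m m_list;
    long m_type;
    size_t m_ts;
    void *next;
    void *security;
};

/*
 * The fake msg_msg begins at data[0], so a field at byte offset `off`
 * inside msg_msg is controlled through 8-byte word off / 8 of data[].
 */
static size_t data_word_index(size_t msg_field_offset)
{
    return msg_field_offset / sizeof(void *);
}
```

For example, data_word_index(offsetof(struct msg_msg_m, m_ts)) evaluates to 3, matching the data[3] = m_ts entry above.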

Accordingly, the fields of struct msg_msg can be controlled. However, copy_to_user cannot copy more than the slab object size (currently kmalloc-64). Therefore, KASLR is leaked only when a suitable structure is allocated directly below UAF 2.

Previously, struct percpu_ref_data was used for KASLR leaks in kmalloc-64. Unfortunately, the free_msg function calls kfree(msg_msg->security). If a struct percpu_ref_data is allocated below UAF 2, the kernel crashes in kfree(count), because the count assigned by the io_uring functions is 0x8000000000000001. Therefore, we use struct user_key_payload again.

struct percpu_ref_data {
    atomic_long_t       count;
    percpu_ref_func_t   *release;
    percpu_ref_func_t   *confirm_switch;
    bool            force_atomic:1;
    bool            allow_reinit:1;
    struct rcu_head     rcu;
    struct percpu_ref   *ref;
};

struct user_key_payload contains a struct rcu_head. struct rcu_head is used to defer work until the current RCU read-side critical sections have finished. When they have finished, a callback function (rcu->func()) is called. The next pointer of an rcu_head stores the address of another rcu_head.

Therefore, if a user_key_payload chunk sits directly below UAF 2, a KASLR leak can be performed.

struct user_key_payload {
    struct rcu_head rcu;
    unsigned short  datalen;
    char        data[] __aligned(__alignof__(u64));
};

struct callback_head {
    struct callback_head *next;
    void (*func)(struct callback_head *head);
} __attribute__((aligned(sizeof(void *))));
#define rcu_head callback_head

The function do_mq_timedreceive is used to read the struct msg_msg inside the mqueue. do_mq_timedreceive calls msg_get to get struct msg_msg [4]. msg_get refers to the leaf node to get the first struct msg_msg [5] and calls list_del [6] to unlink the first struct msg_msg from the linked list.

Afterward, store_msg executes copy_to_user to send the data [7]. Then, free_msg frees struct msg_msg and msg_msg->security [8]. The KASLR leak is possible because the copied message data contains a function address: rcu->func.
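The reason the neighbouring chunk matters can be worked out from the layouts alone. The sketch below assumes the fake msg_msg starts at UAF 2 + 0x18 and that objects in this slab are 0x40 bytes apart. It computes where msg_msg fields land relative to the chunk below UAF 2. The constants, `_m` mirrors, and helper name are hypothetical.

```c
#include <stddef.h>

struct list_head_m { struct list_head_m *next, *prev; };

struct msg_msg_m {
    struct list_head_m m_list;
    long m_type;
    size_t m_ts;
    void *next;
    void *security;
};

struct callback_head_m {
    struct callback_head_m *next;
    void (*func)(struct callback_head_m *head);
};

#define CHUNK_SIZE     0x40L   /* kmalloc-64 object size */
#define FAKE_MSG_START 0x18L   /* offset of user_key_payload.data in UAF 2 */

/*
 * Offset of a fake-msg_msg byte relative to the start of the chunk
 * below UAF 2. A negative result means the byte is still inside UAF 2.
 */
static long offset_in_next_chunk(size_t msg_offset)
{
    return FAKE_MSG_START + (long)msg_offset - CHUNK_SIZE;
}
```

Under these assumptions, msg_msg->security lands on offset 0 of the neighbour (count for percpu_ref_data, rcu.next for user_key_payload), and the message text, which follows the 0x30-byte header, starts at offset 8, where rcu.func is stored.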

Overwrite modprobe_path

In the msg_get function, list_del unlinks struct msg_msg from the linked list. The unlink performs two writes: msg->prev->next = msg->next and msg->next->prev = msg->prev.

Kernel heap addresses have the form 0xffff????XXXXXXXX, with a heap base of 0xffff????00000000. Any value 0xffff???? followed by four bytes of our input is therefore itself a writable heap address, which means both writes of list_del succeed and we can write 0xffff????(our input) at any kernel address.

modprobe_path initially contains /sbin/modprobe, so if data[0] and data[1] are set to (modprobe_path + 0x1 - 0x8) and 0xffff????2f706d74, list_del writes 0xffff????2f706d74 to modprobe_path + 0x1 (0x2f706d74 is the byte sequence /pmt, which is stored as tmp/ in little-endian memory). We already know ???? from the heap leak.

As a result, modprobe_path can be changed into /tmp/????\xff\xffprobe.
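The resulting bytes can be reproduced in user space. The following sketch models only the next->prev = prev write of list_del on a local copy of the string. The helper names and the example value 0x8880 for the leaked ???? bits are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Store a 64-bit value in little-endian byte order, as x86-64 would. */
static void store_le64(unsigned char *dst, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (unsigned char)(v >> (8 * i));
}

/*
 * Model of list_del's "next->prev = prev" with next = target - 8:
 * prev lives at offset 8 of a list_head, so the write lands on target.
 */
static void unlink_write(unsigned char *target, uint64_t prev)
{
    store_le64(target, prev);
}

/* Apply the write to a local copy of modprobe_path. */
static void overwrite_modprobe_path(char *path, uint16_t heap_bits)
{
    /* 0xffff????2f706d74, with ???? taken from the heap leak. */
    uint64_t prev = 0xffff000000000000ULL |
                    ((uint64_t)heap_bits << 32) |
                    0x2f706d74ULL;

    unlink_write((unsigned char *)path + 1, prev);
}
```

Starting from a buffer holding "/sbin/modprobe" and calling overwrite_modprobe_path(buf, 0x8880) leaves "/tmp/" followed by the bytes 0x80 0x88 0xff 0xff and "probe", which is the path shape described above.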

Full exploit code is available at our GitHub repo. Note that it is intended only for educational and research purposes, and you may not use it to cause any harm or damage.

Patch

The fix is for nft_expr_init to check expr_info.ops->type->flags and to allocate the expr chunk only when the NFT_EXPR_STATEFUL flag is set.
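The idea of the fix, checking the flag before anything is allocated or bound, can be sketched as follows. This is a toy model and not the actual kernel patch: the toy_* names, the flag value, and the allocation counter are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_EXPR_STATEFUL 0x1u   /* hypothetical flag value for this sketch */

struct toy_expr_type { unsigned int flags; };
struct toy_expr { const struct toy_expr_type *type; };

static int toy_allocations;      /* counts how many exprs were allocated */

/*
 * Returns 0 and sets *out on success. Non-stateful types are rejected
 * with -EOPNOTSUPP before any allocation, so there is never a bound
 * expr that needs to be destroyed afterwards.
 */
static int toy_expr_init(const struct toy_expr_type *type,
                         struct toy_expr **out)
{
    struct toy_expr *e;

    if (!(type->flags & TOY_EXPR_STATEFUL))
        return -EOPNOTSUPP;

    e = malloc(sizeof(*e));
    if (!e)
        return -ENOMEM;

    e->type = type;
    toy_allocations++;
    *out = e;
    return 0;
}
```

In this model a lookup-like type (flags == 0) is rejected with toy_allocations still at zero, while a stateful type is allocated as before.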

Conclusion

In this post, we have shown the process of exploiting CVE-2022-32250. We were able to leak KASLR and overwrite modprobe_path by utilizing the mqueue functions, and as a result, we successfully gained root privileges in Ubuntu 22.04.

Reference

  • How-The-Tables-Have-Turned-CVE-2022-1015-1016
  • oss-security - Linux Kernel use-after-free write in netfilter
  • randorisec/CVE-2022-34918-LPE-PoC