Maddle Stone, Ned Williamson and Nathan Wachholz
Two security patch for libxml2 on macOS Ventura 13.0.1, iOS 16.1.1 and iPadOS 16.1.1
Released November 9, 2022
libxml2
Available for: macOS Ventura, iPhone 8 and later, iPad Pro (all models), iPad Air 3rd generation and later, iPad 5th generation and later, and iPad mini 5th generation and later
Impact: A remote user may be able to cause unexpected app termination or arbitrary code execution
Description: An integer overflow was addressed through improved input validation.
CVE-2022-40303: Maddie Stone of Google Project Zero
libxml2
Available for: macOS Ventura, iPhone 8 and later, iPad Pro (all models), iPad Air 3rd generation and later, iPad 5th generation and later, and iPad mini 5th generation and later
Impact: A remote user may be able to cause unexpected app termination or arbitrary code execution
Description: This issue was addressed with improved checks.
CVE-2022-40304: Ned Williamson and Nathan Wachholz of Google Project Zero
[CVE-2022-40303] Integer overflow in xmlParseNameComplex
Libxml2 is vulnerable to an integer overflow in xmlParseNameComplex when an attribute list has a very long name (name is >= 2**32 characters).
static const xmlChar *xmlParseNameComplex(xmlParserCtxtPtr ctxt) {
int len = 0, l;
[...]
return (xmlDictLookup(ctxt->dict, ctxt->input->cur - len, len));
}If the name is greater than or equal to 2**32 characters, then len overflows. The calculation for the second argument to xmlDictLookup (ctxt->input->cur - len) will point to an address outside of the buffer such as adding 0x80000000 to cur.
Exploiting this issue using static XML requires that the XML_PARSE_HUGE flag is used to disable hardcoded parser limits. Though similar to Felix’s report (CVE-2022-29824) it may be possible to trigger without the flag using XSLT or xpath though I didn’t look into this.
Note: XML_PARSE_HUGE looks very brittle in general. Signed 32-bit integers are widely used as sizes/offsets throughout the codebase, a lot of the helper functions don’t handle inputs larger than 4GB correctly and fuzzers won’t trigger these edge cases. Maybe that flag should include a security warning? Some security critical projects like xmlsec enable it by default (https://github.com/lsh123/xmlsec/commit/3786af10953630cd2bb2b57ce31c575f025048a8) which seems risky.
Proof of Concept:
$ python3 -c 'print("<!DOCTYPE doc [\n<!ATTLIST src " + "a"*(0x80000000) + " IDREF #IMPLIED>")' > /tmp/name_big.xml
$ ./xmllint --huge /tmp/name_big.xml
[CVE-2022-40304] Double-free when parsing default attributes
Test Case
<!DOCTYPE A SYSTEM ""
[
<!ENTITY ENT_A SYSTEM "" NDATA A>
<!ENTITY ENT_B "&ENT_A;&ENT_B;">
<!ATTLIST A C CDATA "">
<!ATTLIST A D CDATA "">
<!ATTLIST A E CDATA "">
<!ATTLIST A F CDATA "">
<!ATTLIST A G CDATA "&ENT_B;">
]>Reproducing the Issue
Simply run xmllint on the provided test case.
The bucket used in the hash function is affected by the random seed value used in hash randomization, therefore the test case may need to be run multiple times (by xmllint or any other parser using libxml2) to see the crash due to the system time being used to select the random seed.
Explanation
libxml2 uses a custom hash table implementation throughout its codebase. The hash table maintains a buffer where strings representing dictionary keys are stored allowing for a consistent ownership model. This functionality is exposed in some functions, for example, xmlDictLookup.
/**
* xmlDictLookup:
* @dict: the dictionary
* @name: the name of the userdata
* @len: the length of the name, if -1 it is recomputed
*
* Add the @name to the dictionary @dict if not present.
*
* Returns the internal copy of the name or NULL in case of internal error
*/
const xmlChar *xmlDictLookup(xmlDictPtr dict, const xmlChar *name, int len) {There's a problem when handling the nested entities while parsing the last attribute list:
When parsing attribute declaration lists as used in the testcase, xmlParseAttributeListDecl will parse attribute default declarations [2] (e.g. #REQUIRED, #IMPLIED, #FIXED) before adding the defaulted attribute for the specified element [3].
The issue is that xmlParseName [1] returns into elemName the internal copy of the string for the parsed name, in this case A. xmlParseDefaultDecl [2] will try to resolve the nested entity references for the last ATTLIST entry in the provided testcase, but due to a reference loop it will hit the depth limit and fail.
When failing, ent->content[0] is set to 0, meaning the string pointed to by ent->content is set to the empty string. ent->content is set earlier in a call to xmlCreateEntity. The pointer is set to the result of an xmlDictLookup call. Recall that this returns the internal instance of the A string.
Therefore we are now in a state where a hash table’s internal string representation for the empty string is duplicated. This leads to issues later on when xmlAddDefAttrs is called [3].
For reference, you can see that the entity was initialized here and the content pointer was indeed set to an internal string reference.
Now when xmlAddDefAttrs is called, elemName which originally pointed to A now points to the empty string without its pointer value having been modified. This becomes a memory corruption issue as shown below:
xmlHashLookup2 will try to lookup the empty string entry first using the pointer value, and if it fails will use the string comparison function to look for the entry. xmlHashUpdateEntry2 on the other hand will only use the pointer value if the internal dict containing string copies is present. Because there are now two internal strings which contain the empty string, in the lookup case it will match if the lookup happens to land in the correct bucket and in the update case it will never match because the pointers don’t match. The bucket used in the hash function is affected by the random seed value used in hash randomization, therefore the test case may need to be run multiple times (by xmllint or any other parser using libxml2) to see the crash.
The existing default attributes list is sometimes found by the first lookup call and never found in the insert call. In the case where the lookup finds the existing default attributes, the realloc call can free it (with more than 4 attributes as in this testcase) but will not remove the pointer during the update. In the case where the lookup does not find the default attributes, it will not be freed and instead will be leaked when the new allocation is made.
Issues
Clearing ent->content[0] can break dictionary string semantics.
The behavior of Lookup/Insert is different for dictionaries containing internal dict references which is exacerbated by the corruption of an internal string noted above.
Reccomended Fix
Either: (1) avoid modifying ent->content by looking up the empty string in the dict if needed instead of modifying the existing value or (2) make the semantics consistent between lookup and insert. I believe (1) is a more fundamental fix for the problem as the lookup/insert semantics aren’t inherently a problem.
Additionally, we recommend fuzzing with the hash list randomization on but using a non time-based seed drawn from the test case.
Crash Log
Line numbers in this testcase may not match your version as I applied clang-format before my review/fuzzing efforts.