Checkmk RCE Chain

📅 [ Archival Date ]
Nov 5, 2022 2:31 PM
🏷️ [ Tags ]
CheckmkRCE
✍️ [ Author ]

BY STEFAN SCHILLER

Checkmk is a modern IT infrastructure monitoring solution developed in Python and C++. According to the vendor’s website, more than 2,000 customers rely on Checkmk. Due to its purpose, Checkmk is a central component usually deployed at a privileged position in a company’s network. This makes it a high-profile target for threat actors.

In our effort to help secure the open-source world, we decided to look at the open-source edition of Checkmk, which is based on a Nagios monitoring core and seamlessly integrates NagVis to visualize status data on maps and diagrams. During our research, we identified multiple vulnerabilities in Checkmk and its NagVis integration, which can be chained together by an unauthenticated, remote attacker to fully take over the server running a vulnerable version of Checkmk.

In this first article, in a series of three, we start by getting an overview of all identified vulnerabilities and a basic understanding of the Checkmk architecture. Furthermore, we determine the disastrous impact of chaining the identified vulnerabilities together. We also dive deep into the technical details of the first two vulnerabilities, which pave the way for an unauthenticated attacker to gain remote code execution.

Impact

We discovered multiple vulnerabilities in Checkmk and its NagVis integration with the following CVSS scores assigned by the vendor:

  • CVSS 9.1: Code Injection in watolib’s auth.php
  • CVSS 9.1: Arbitrary File Read in NagVis
  • CVSS 6.8: Line Feed Injection in ajax_graph_images.py
  • CVSS 5.0: Server-Side Request Forgery in agent-receiver

These vulnerabilities can be chained together by an unauthenticated, remote attacker to gain code execution on the server running Checkmk version 2.1.0p10 and lower.

We verified the exploitation for the open-source Raw Edition by leveraging a specific feature of its monitoring core. It is likely that an attacker can use similar techniques to exploit a server running one of the Enterprise Editions.

All of these issues are fixed with Checkmk version 2.1.0p12. We strongly recommend updating any instance with a version before this release.

Technical Details

In this section, we start by looking at the basic architecture of Checkmk and its components. Based on this, we outline how the identified vulnerabilities can be chained together by an attacker and deep dive into the technical details of the first two vulnerabilities, which are the beginning of a full chain to gain unauthenticated, remote code execution.

Background

Checkmk is an IT infrastructure monitoring solution similar to Zabbix or Icinga. The configuration and monitoring of servers, networks, applications, etc., is done via a web interface. This user-facing component is developed in Python and is called Checkmk GUI.

In order to retrieve additional information from the monitored systems, it is possible to deploy a monitoring agent on these systems. The component responsible for registering agents and receiving data from these agents is called the agent-receiver.

The following picture outlines the basic architecture of Checkmk:

image

Checkmk exposes two ports on the external network interface by default:

  • TCP port 80: actual web interface
  • TCP port 8000: agent-receiver

The first component of the web interface is an Apache web server running on TCP port 80, which serves as a reverse proxy. It is possible to run multiple Checkmk instances on a single host. These instances are called monitoring sites or simply sites. For each site, a dedicated, internal Apache server is spawned. The purpose of the outer reverse proxy is to map requests for a specific site to the corresponding internal Apache server dedicated to the requested site. In the picture above, the site monitoring is mapped to the Apache server running on TCP port 5000. From the outside, this Apache server can only be reached via the reverse proxy because it only listens on localhost.

The site-dedicated Apache server forwards requests to either the actual Checkmk GUI, a Python WSGI application, or via FCGI to a PHP wrapper in order to integrate the NagVis PHP component.

The heart of Checkmk is the monitoring core, which is responsible for initiating checks, collecting data, detecting state changes, and providing information to the GUI. While the Checkmk Enterprise Editions have their own monitoring core, the open-source Raw Edition uses a Nagios monitoring core. To retrieve data from it, the core provides an interface called Livestatus, which is implemented as a C++ Nagios broker module called livestatus.o. This interface uses a proprietary protocol called Livestatus Query Language (LQL), which is similar to both HTTP and SQL. For example, a query to retrieve the name and IP address of all monitored hosts that are in the DOWN (1) or UNREACH (2) state looks like this:

image

The response may look like this:

[["router3","192.168.0.2"],["ldapserver","10.0.0.3"]]

More advanced queries can be built by using additional headers. Whenever the GUI needs some data from the core, it sends an LQL query to it, and the core responds with the requested data.
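As a rough illustration of this structure, the following Python sketch assembles an LQL query for the example above. The header names (Columns, Filter, Or, OutputFormat) follow the public Livestatus documentation. The helper build_lql_query is a hypothetical name, and the query shown in the original figure may differ in detail.

```python
def build_lql_query(table: str, columns: list[str], headers: list[str]) -> str:
    """Assemble an LQL query: a GET line, one header per line, then an empty line."""
    lines = [f"GET {table}", "Columns: " + " ".join(columns), *headers]
    # A query is terminated by an empty line, i.e. two consecutive line feeds.
    return "\n".join(lines) + "\n\n"


# Hosts that are DOWN (1) or UNREACH (2); "Or: 2" combines the two preceding filters.
query = build_lql_query(
    "hosts",
    ["name", "address"],
    ["Filter: state = 1", "Filter: state = 2", "Or: 2", "OutputFormat: json"],
)
print(query)
```

The request line plus "Header: value" lines resemble HTTP, while tables, columns, and filters resemble SQL.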

The second component directly reachable via the external interface is the agent-receiver. The agent-receiver is a FastAPI web server listening on TCP port 8000, which provides different routes for registering agents and collecting data from these agents.

With this basic understanding of Checkmk’s components, let’s see how an unauthenticated attacker would be able to chain the identified code vulnerabilities together in order to gain remote code execution.

Exploitation Chain

Some of the identified vulnerabilities have limited practical impact on their own. However, a malicious attacker can chain them together to achieve remote code execution.

The following picture summarizes what abilities the exploitation of an individual vulnerability yields and how an attacker can build on this ability to leverage the following vulnerability to further increase control, which finally results in unauthenticated, remote code execution:

image

The exploitation chain starts with a Server-Side Request Forgery in the agent-receiver (1), which can be leveraged by an attacker to access an endpoint only reachable from localhost. This endpoint is vulnerable to a Line Feed Injection (2). This gives an attacker the ability to forge arbitrary LQL queries, which are used by the Checkmk GUI to retrieve data from the monitoring core. An attacker can take advantage of this ability to delete arbitrary files, which can further be leveraged to bypass the authentication mechanism in the NagVis component.

Once an attacker has gained access to the NagVis component, an authenticated Arbitrary File Read vulnerability (3) in NagVis can be leveraged to read a special Checkmk configuration file called automation.secret. With access to the contents of this file, an attacker can gain access to the Checkmk GUI in the context of the automation user. This access can further be turned into remote code execution by exploiting a Code Injection vulnerability (4) in a Checkmk GUI subcomponent called watolib, which generates a file named auth.php required for the NagVis integration.

After this rough overview of the exploitation chain, let’s dive into the technical details of the first two code vulnerabilities:

image

Server-Side Request Forgery in agent-receiver

The Checkmk agent-receiver is a FastAPI web server, which is by default exposed on TCP port 8000. Most of the provided endpoints forward requests to the Checkmk REST API, which is part of the Checkmk GUI exposed on TCP port 80.

The endpoint called /register_with_hostname expects a POST request with credentials provided via HTTP Basic authentication as well as the two JSON-encoded parameters uuid and host_name in the body. The endpoint handler itself only verifies that any credentials are provided and that the two parameters are present.

In order to retrieve and validate the host configuration of the host identified by the host_name parameter, the function host_configuration is called:

checkmk/agent-receiver/agent-receiver/endpoints.py

@agent_receiver_app.post(
   "/register_with_hostname",
   status_code=HTTP_204_NO_CONTENT,
)
async def register_with_hostname(
   *,
   credentials: HTTPBasicCredentials = Depends(security),
   registration_body: RegistrationWithHNBody,
) -> Response:
   _validate_registration_request(
       host_configuration(
           credentials,
           registration_body.host_name,
       )
   )

The host_configuration function forwards the request to the Checkmk REST API by calling the function _forward_get. The user-provided parameter host_name is appended to the target URL without any sanitization or encoding:

checkmk/agent-receiver/agent-receiver/checkmk_rest_api.py

def host_configuration(
   credentials: HTTPBasicCredentials,
   host_name: str,
) -> HostConfiguration:
   if (
       response := _forward_get(
           f"objects/host_config_internal/{host_name}",
           credentials,
       )
   ).status_code == HTTPStatus.OK:

This lack of sanitization and encoding leads to a limited Server-Side Request Forgery (SSRF) vulnerability.
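To illustrate why the missing encoding matters, the following sketch mimics the f-string above and shows how "../" sequences in host_name change the effective path once dot segments are resolved. The base URL and site name are assumptions made for illustration, and the sketch does not model which component (HTTP client or web server) performs the normalization in a real deployment.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed base URL of the REST API; the real value depends on the site configuration.
BASE_URL = "http://localhost/mysite/check_mk/api/1.0"


def forwarded_url(host_name: str) -> str:
    """Mimics the vulnerable f-string: host_name is inserted without encoding."""
    return f"{BASE_URL}/objects/host_config_internal/{host_name}"


def effective_path(url: str) -> str:
    """Resolve '.' and '..' segments of the URL path, as a URL normalizer would."""
    return posixpath.normpath(urlsplit(url).path)


benign = forwarded_url("server1")
crafted = forwarded_url("../../../../ajax_graph_images.py?host=x")

print(effective_path(benign))   # path stays below host_config_internal/
print(effective_path(crafted))  # path now points at a different GUI endpoint
```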

At first, the impact of this vulnerability does not seem to be very high because the SSRF is limited to a GET request to the hostname and port of the Checkmk GUI, and an attacker cannot even read the response. However, this gives an attacker the essential ability to exploit a second vulnerability. Let’s have a look at it.

Line Feed Injection in ajax_graph_images.py

The Checkmk GUI only provides a minimal number of unauthenticated endpoints. This greatly reduces the attack surface. One of the unauthenticated endpoints is called /ajax_graph_images.py, whose endpoint handler is implemented in the function ajax_graph_images_for_notifications. The purpose of this endpoint is to generate an image with performance data for a given host or service.

Although this endpoint can be accessed without authentication, access is restricted to requests that originate from localhost (127.0.0.1 or ::1):

checkmk/cmk/gui/plugins/metrics/graph_images.py

def ajax_graph_images_for_notifications(
   resolve_combined_single_metric_spec: Callable[
       [CombinedGraphSpec], Sequence[CombinedGraphMetricSpec]
   ],
) -> None:
   """Registered as `noauth:ajax_graph_images`."""
   if request.remote_ip not in ["127.0.0.1", "::1"]:
       raise MKUnauthenticatedException(
           _("You are not allowed to access this page (%s).") % request.remote_ip
       )

   with SuperUserContext():
       _answer_graph_image_request(resolve_combined_single_metric_spec)

After verifying that the request originates from localhost, the function _answer_graph_image_request is called. This function validates that a host GET parameter is provided and then calls get_graph_data_from_livestatus:

checkmk/cmk/gui/plugins/metrics/graph_images.py

def _answer_graph_image_request(
   resolve_combined_single_metric_spec: Callable[
       [CombinedGraphSpec], Sequence[CombinedGraphMetricSpec]
   ],
) -> None:
   try:
       host_name = request.get_ascii_input_mandatory("host")
       if not host_name:
           raise MKGeneralException(_('Missing mandatory "host" parameter'))
       ...
       try:
           row = get_graph_data_from_livestatus(site, host_name, service_description)

The function get_graph_data_from_livestatus retrieves performance data for the given host via the Livestatus Query Language (LQL) interface. When inspecting all invoked functions within the call stack, the _ensure_connected function caught our attention:

checkmk/cmk/gui/sites.py

def _ensure_connected(user: Optional[LoggedInUser], force_authuser: Optional[UserId]) -> None:
   ...
   if force_authuser is None:
       request_force_authuser = request.get_str_input("force_authuser")
       force_authuser = UserId(request_force_authuser) if request_force_authuser else None
   ...
   _set_livestatus_auth(user, force_authuser)

Although this is an internal function that is part of the code responsible for querying the LQL interface, it accesses a GET parameter called force_authuser. Further inspecting the call stack reveals that this GET parameter is inserted into the AuthUser header of the LQL query without any sanitization:

image

The AuthUser header is used to restrict the response to data that the specified user is allowed to see. However, this is not essential for our considerations. The important aspect is that the above AuthUser string contains the value of the GET parameter force_authuser and this string is inserted into the final LQL query sent to the monitoring core. Since the GET parameter force_authuser is not sanitized, it is also possible to insert line feed characters (0x0a) into the LQL query.

Usually, an external attacker cannot reach the vulnerable endpoint /ajax_graph_images.py because it is restricted to localhost. When combined with the SSRF vulnerability in the agent-receiver, this restriction no longer holds. The SSRF can, for example, be used to trigger a request with the following GET parameter:

force_authuser=foo

This request results in the following LQL query sent to the core:

image

By using a line feed character in the force_authuser parameter, additional headers can be injected into the LQL query:

force_authuser=foo%0aFooHeader%3a%201337

The resulting LQL query contains the additional header:

image

The ability to inject a whole new query in order to use other tables or commands would increase the attack surface even more. An attacker could try to add two line feed characters and insert a new query after these:

force_authuser=foo%0a%0aGET%20services

However, the LQL interface terminates the connection by default if two subsequent line feed characters are read, which form the end of a single query. Thus the second query is not evaluated:

image

This behavior can be altered by leveraging the KeepAlive header. When this header is set to on, the connection will be kept alive. This way whole new LQL queries can be injected:

force_authuser=foo%0aKeepAlive:%20on%0a%0aATTACKER_QUERY%0a%0aGET%20notexisting

This results in three distinct LQL queries, which are processed separately.

Query 1:

image

Query 2:

ATTACKER_QUERY

Query 3:

image

The second query can be fully controlled by an attacker.
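The splitting behavior can be reproduced offline with a small Python sketch. The query template below is hypothetical (the real query and its header order are not reproduced here), and build_query and split_queries are illustrative names rather than Checkmk functions.

```python
from urllib.parse import unquote

# Hypothetical query template; the real query and header order are not reproduced here.
QUERY_TEMPLATE = "GET hosts\nAuthUser: {auth_user}\nColumns: name\n\n"


def build_query(force_authuser: str) -> str:
    """Inserts the raw parameter value into the AuthUser header without sanitization."""
    return QUERY_TEMPLATE.format(auth_user=force_authuser)


def split_queries(raw: str) -> list[str]:
    """An empty line ends a query, so split the stream on double line feeds."""
    return [q for q in raw.split("\n\n") if q]


param = unquote("foo%0aKeepAlive:%20on%0a%0aATTACKER_QUERY%0a%0aGET%20notexisting")
for number, query in enumerate(split_queries(build_query(param)), start=1):
    print(f"Query {number}:\n{query}\n")
```

In this sketch, the trailing GET notexisting absorbs the header that followed AuthUser in the template, so the attacker-controlled query in the middle stays intact.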

With this ability, an attacker has literally made it to the core of Checkmk. Within the next article of this series, we will explore the LQL interface as a new attack surface and see how some minor differences in a developer’s implementation can prevent or enable an attacker to bypass authentication mechanisms.

Patch

Checkmk patched the limited SSRF in the agent-receiver in version 2.1.0p12 (commit). Following our recommendation, the endpoint handler for /register_with_hostname now URL-encodes the host_name parameter before inserting it into the URL:

checkmk/agent-receiver/agent-receiver/checkmk_rest_api.py

from urllib.parse import quote
...

def _url_encode_hostname(host_name: str) -> str:
    ...
    return quote(host_name, safe="")  # '/' is not "safe" here
...

def host_configuration(...):
   ...
       response := _forward_get(
           f"objects/host_config_internal/{_url_encode_hostname(host_name)}", ...)
   ...

This prevents an attacker from accessing other endpoints than the intended one when the request is forwarded to the Checkmk REST API.
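The effect of quote with safe="" can be checked in isolation. The crafted value below is only an illustrative input, not a payload taken from the original research.

```python
from urllib.parse import quote

# Illustrative input containing path separators and a query delimiter.
crafted = "../../../../ajax_graph_images.py?host=x"

# With safe="", the characters "/", "?" and "=" are percent-encoded as well.
encoded = quote(crafted, safe="")
print(encoded)
```

Because no literal slash or question mark remains, the value can no longer introduce additional path segments or a query string when it is inserted into the URL.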

The Line Feed Injection vulnerability was also patched with version 2.1.0p12 (commit) by validating the value provided for the AuthUser header:

checkmk/livestatus/api/python/livestatus.py

# Pattern for allowed UserId values
validate_user_id_regex = re.compile(r"^[\w_][-\w.@_]*$")
...
   # Set user to be used in certain authorization domain
   def set_auth_user(self, domain: str, user: UserId) -> None:
       # Prevent setting AuthUser to values that would be rejected later. See Werk 14384.
       # Empty value is allowed and used to delete from auth_users dict.
       if user and validate_user_id_regex.match(user) is None:
           raise ValueError("Invalid user ID")

Also, an additional check for injected line feed characters was introduced:

checkmk/livestatus/api/python/livestatus.py

   def build_query(self, query_obj: Query, add_headers: str) -> str:
       # Prevent injection of further livestatus commands inside AuthUser header.
       if "\n" in self.auth_header[:-1]:
           raise MKLivestatusQueryError("Refusing to build query with invalid AuthUser header.")

These patches effectively prevent an attacker from injecting line feed characters in the force_authuser parameter.
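The interplay of both checks can be explored with a short sketch. The regular expression is copied from the patch above. The rendering of the header as "AuthUser: <user>" followed by a line feed is an assumption, and the two helper functions are hypothetical names.

```python
import re

# Pattern copied from the patch shown above.
validate_user_id_regex = re.compile(r"^[\w_][-\w.@_]*$")


def is_valid_user_id(user: str) -> bool:
    """First check: the user ID must match the allowed pattern."""
    return validate_user_id_regex.match(user) is not None


def header_is_safe(user: str) -> bool:
    """Second check: no line feed except the one terminating the header."""
    # Assumption: the header is rendered as "AuthUser: <user>\n".
    auth_header = f"AuthUser: {user}\n"
    return "\n" not in auth_header[:-1]


for candidate in ["cmkadmin", "foo\nKeepAlive: on", "foo\n"]:
    print(repr(candidate), is_valid_user_id(candidate), header_is_safe(candidate))
```

In Python, "$" also matches just before a single trailing line feed, so the value "foo\n" passes the pattern on its own. Under the stated assumption about the header format, the second check rejects it.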

Timeline

Date          Action
2022-08-22    We report all issues to Checkmk.
2022-08-23    Vendor confirms all issues.
2022-09-15    Vendor releases patched version 2.1.0p12.

Summary

In this first article in a series of three, we briefly introduced the Checkmk architecture and outlined the vulnerabilities we identified including the serious impact of chaining these together. We also did a technical deep dive into the first two vulnerabilities, which enable an external attacker to send arbitrary LQL queries to the monitoring core.

The root cause of most vulnerabilities is the lack of sanitization of user-controlled data. This is also true for both of the vulnerabilities we looked at. The Line Feed Injection vulnerability is somewhat hard to spot because the user-controlled data is accessed by a function deep down in the call stack and not directly in the endpoint handler. This is generally a bad pattern and should be avoided.

In the next article in this series, we will have a more detailed look at the LQL interface and derive the impact of an attacker’s ability to forge arbitrary queries. We will also look at Checkmk’s NagVis integration and how the aforementioned ability can be leveraged to bypass the authentication of NagVis due to some specific implementation details.

Finally, we would like to thank the Checkmk team very much for quickly responding to our report, handling each issue with absolute transparency, and providing a comprehensive patch for all reported vulnerabilities.

Part 2

This is the second of three articles in the Checkmk - Remote Code Execution by Chaining Multiple Bugs series (first article). The series of articles outlines the results of our effort to help secure the open-source world and better understand real-world vulnerabilities by auditing the open-source edition of Checkmk. Our research resulted in the discovery of multiple vulnerabilities in Checkmk and its NagVis integration, which can be chained together by an unauthenticated, remote attacker to fully take over the server running a vulnerable version of Checkmk.

In the first article of the series, we started by getting an overview of all identified vulnerabilities and got a basic understanding of the Checkmk architecture. Furthermore, we determined the severe impact of chaining the identified vulnerabilities together. We also deep-dived into the technical details of the first two vulnerabilities.

In this second article, we will have a more detailed look at the LQL interface and derive the impact of an attacker’s ability to forge arbitrary queries. We will then look at Checkmk’s NagVis integration and how some minor implementation differences between Checkmk and NagVis enable an attacker to bypass the NagVis authentication.

Technical Details

We start this section by briefly recapping the vulnerabilities and exploitation chain. After this, we focus on the LQL interface and outline how an attacker can leverage it to exfiltrate monitoring data and bypass the NagVis authentication.

Exploitation Chain

As a reminder, the following picture summarizes the exploitation chain enabling an unauthenticated attacker to gain remote code execution:

image

In the first article, we covered the first two vulnerabilities: a Server-Side Request Forgery in the agent-receiver (1) as well as a Line Feed Injection (2), which can be exploited by an unauthenticated attacker to forge arbitrary LQL queries. Before an attacker can further leverage the Arbitrary File Read vulnerability (3) followed by the Code Injection (4) vulnerability, authenticated access to NagVis is required.

Within this article, we unveil the impact of an attacker’s ability to forge arbitrary LQL queries. We start by determining how an attacker can exfiltrate monitoring data. After this, we describe how the LQL interface can be leveraged to delete arbitrary files and furthermore bypass the NagVis authentication:

image

Monitoring Data Exfiltration

The LQL interface is mainly used to retrieve data from the monitoring core. This data consists for example of internal hostnames and IP addresses of monitored hosts, running services, contact persons, and their email addresses. Although this data is not highly sensitive, it can be useful for an attacker to mount further attacks. Thus an attacker might be interested in retrieving this data.

Blind Data Exfiltration

Although an attacker is able to forge arbitrary LQL queries by leveraging the two vulnerabilities we covered so far, the attacker cannot read the response. The vulnerable endpoint /ajax_graph_images.py does not directly output the retrieved data, and the SSRF used to request this endpoint does not return the response either. Thus the attacker is dealing with a blind LQL injection.

This scenario can be compared with a blind SQL injection. Attackers typically use a time-based approach to exploit this vulnerability. For example, the following SQL query could be used to determine if the first character of the first name in the table users is 'a':

SELECT IF(SUBSTR((SELECT name FROM users LIMIT 1),1,1)='a', SLEEP(5), 0);

If the condition is satisfied, the call to SLEEP(5) delays the response of the query by five seconds. By iterating over each possible character and measuring the time the response takes, the first character can be determined. This process can be repeated with the second character and so forth until the whole username is exfiltrated.
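The character-by-character loop can be sketched in Python as follows. This is a generic illustration, not the tooling used in the research: the oracle is simulated so that the sketch runs offline, whereas a real time-based oracle would report whether the response took longer than a chosen threshold. The function name extract_value is hypothetical.

```python
import string
from typing import Callable


def extract_value(oracle: Callable[[str], bool], alphabet: str, max_len: int = 64) -> str:
    """Recover a hidden string one character at a time.

    oracle(prefix) must return True if the hidden value starts with prefix.
    """
    recovered = ""
    while len(recovered) < max_len:
        for char in alphabet:
            if oracle(recovered + char):
                recovered += char
                break
        else:
            # No character extended the prefix: the value is complete.
            return recovered
    return recovered


# Simulated target; a real oracle would measure the response time instead.
hidden = "alice"
print(extract_value(lambda prefix: hidden.startswith(prefix), string.ascii_lowercase))
```

Each character costs at most one request per alphabet entry, so the total number of requests grows linearly with the length of the value.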

LQL Blind Data Exfiltration

An attacker can use a similar approach to blindly retrieve data from the LQL interface by using time delays. LQL supports time delays because some data should only be retrieved once a specific condition is satisfied. For example, the disk usage of a host should be reported when the CPU load of this host exceeds a specific threshold.

The headers required to use time delays are prefixed with Wait. The relevant headers for our considerations are these:

  • WaitObject: Name identifying the object for which a condition should be satisfied.
  • WaitCondition: Condition, which should be satisfied.
  • WaitTimeout: Limit in milliseconds after which the query will be executed even if the condition was not satisfied.

The WaitObject header is required, which means that an attacker has to know the name of the object whose data the attacker wants to retrieve. The easiest but also noisiest approach an attacker may use is a word list attack. By using the following query, an attacker could determine if a host with the name ldap exists:

image

If there is no host with the name ldap, the query returns immediately. If the host exists, the condition is never satisfied, and the query times out after 2000 ms, which confirms the existence of the host.
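A probe of this kind could be assembled and evaluated as sketched below. The three Wait headers are the ones listed above. The concrete condition "state = 999" is an assumption standing in for any condition that can never be satisfied, and both function names as well as the tolerance value are hypothetical.

```python
def build_host_probe(host_name: str, timeout_ms: int = 2000) -> str:
    """Build a query that only stalls if the named host exists."""
    return (
        "GET hosts\n"
        f"WaitObject: {host_name}\n"
        # Assumed never-true condition; any condition that cannot be met would do.
        "WaitCondition: state = 999\n"
        f"WaitTimeout: {timeout_ms}\n"
        "\n"
    )


def host_exists(elapsed_ms: float, timeout_ms: int = 2000, tolerance_ms: int = 200) -> bool:
    """Interpret a measured response time: a stall near the timeout implies existence."""
    return elapsed_ms >= timeout_ms - tolerance_ms


print(build_host_probe("ldap"))
print(host_exists(2010.0), host_exists(35.0))
```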

A more efficient way to determine the name of monitored hosts is to use the hostgroups table. By default, each host is added to the default host group check_mk. This is the name of the hostgroups object within this table and can thus be used for the WaitObject header. The table contains a column called members, which contains all hostnames within this host group. For example, a request to this table may look like this:

image

The response contains the name of all hosts:

server1, server2, router1, router2

By setting the WaitCondition on this column and using a regular expression, all hostnames can be exfiltrated character by character. The following query determines whether there is a hostname that begins with "serv":

image

Once all hostnames have been exfiltrated, an attacker can use these names for the WaitObject header on the hosts table in order to retrieve all data from a given host, for example, the IP address:

image

Also, the name of the contact responsible for the host can be exfiltrated:

image

After having retrieved the name of a contact, further information about this contact can be retrieved via the contacts table:

image

The fact that the values in a column of one table often contain the names of objects in another table makes it possible to gradually exfiltrate the whole data set.

The following video illustrates how the two vulnerabilities detailed in the first article are used by an unauthenticated, remote attacker to exfiltrate monitoring data from a vulnerable Checkmk server:

After this quick look at the possibilities of data exfiltration, let’s continue with the exploitation chain by determining how an attacker can gain access to Checkmk’s NagVis component:

NagVis Authentication Bypass

The LQL interface can not only be used to retrieve data but also to send external commands to the monitoring core by issuing a COMMAND request. Although the term command might suggest immediate code execution, the abilities are very limited.

Nagios External Commands

The documented commands are supported by the open-source Raw Edition as well as the Enterprise Editions. These commands can for example be used to enable or disable checks and notifications. Since the open-source Raw Edition uses a Nagios monitoring core, there are a few additional commands listed in the Nagios documentation. Nevertheless, sensitive commands like CMD_CHANGE_HOST_CHECK_COMMAND, which alter the command executed to perform host checks, were disabled for security reasons back in 2008.

One additional Nagios command, which is still enabled, is called PROCESS_FILE. The format of this command is structured like this:

PROCESS_FILE;<file_name>;<delete>

Issuing this command directs the Nagios core to read the file specified by <file_name> and execute each line in the file as an external command. This does not increase the attack surface per se because there is no difference from directly issuing an external command. However, if the second parameter <delete> is non-zero, the file will be deleted after it has been processed. The deletion of the file does not depend on its contents. Even if the file does not contain any valid external command, it will be deleted: this command gives an attacker an arbitrary file deletion primitive. In order to understand how this can be leveraged by an attacker, let’s have a look at how Checkmk’s authentication mechanism works.
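For reference, the command format above can be wrapped in a Livestatus COMMAND request as sketched here. The "COMMAND [<timestamp>] <command>" line follows the public Livestatus documentation. The file path is a made-up example and build_process_file_command is a hypothetical helper name.

```python
import time
from typing import Optional


def build_process_file_command(
    file_name: str, delete: bool, timestamp: Optional[int] = None
) -> str:
    """Build an LQL COMMAND line carrying a PROCESS_FILE external command."""
    ts = int(time.time()) if timestamp is None else timestamp
    # External command syntax: [<timestamp>] <COMMAND>;<arg1>;<arg2>
    return f"COMMAND [{ts}] PROCESS_FILE;{file_name};{1 if delete else 0}\n"


# Hypothetical path, chosen only for illustration.
print(build_process_file_command("/tmp/example.cmd", delete=True, timestamp=1667658660))
```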

Checkmk Authentication Mechanism

After a successful login, a session cookie is created, which identifies the user. This cookie is structured like this:

<username>:<session_id>:<hash>

For example, a cookie for the cmkadmin user may look like this:

image

The hash at the end of the cookie is created by _generate_auth_hash, which calls _generate_hash:

checkmk/cmk/gui/login.py

def _generate_auth_hash(username: UserId, session_id: str) -> str:
   return _generate_hash(username, username + session_id)

def _generate_hash(username: UserId, value: str) -> str:
   """Generates a hash to be added into the cookie value"""
   secret = _load_secret()
   serial = _load_serial(username)
   return sha256((value + str(serial) + secret).encode()).hexdigest()

Accordingly, the hash is calculated like this:

hash = SHA256(<username><session_id><serial><secret>)
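The formula can be restated as a standalone Python sketch. All input values below are made up, and the function names are illustrative rather than the ones used in login.py.

```python
from hashlib import sha256


def generate_auth_hash(username: str, session_id: str, serial: int, secret: str) -> str:
    """SHA256(<username><session_id><serial><secret>), hex-encoded."""
    return sha256((username + session_id + str(serial) + secret).encode()).hexdigest()


def build_cookie_value(username: str, session_id: str, serial: int, secret: str) -> str:
    """Cookie layout: <username>:<session_id>:<hash>."""
    cookie_hash = generate_auth_hash(username, session_id, serial, secret)
    return f"{username}:{session_id}:{cookie_hash}"


# All values below are made up for illustration.
print(build_cookie_value("cmkadmin", "session-1234", 0, "not-the-real-secret"))
```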

To verify a cookie, the Checkmk GUI recalculates the hash and compares it with the hash from the cookie:

checkmk/cmk/gui/login.py

def check_parsed_auth_cookie(username: UserId, session_id: str, cookie_hash: str) -> None:
   ...
   if cookie_hash != _generate_auth_hash(username, session_id):
       raise MKAuthException(_("Invalid credentials"))

An attacker who wants to forge a valid cookie needs to know all four values from the hash calculation. The username and session_id are part of the cookie itself and are thus known. The serial value of a user is initialized with 0 and incremented by one each time the user’s password is changed or the user account gets locked. Thus an attacker can simply test successive values starting with 0. The last value, called secret, is retrieved via the _load_secret function:

checkmk/cmk/gui/login.py

def _load_secret() -> str:
   ...
   secret_path = htpasswd_path.parent.joinpath("auth.secret")

   secret = ""
   if secret_path.exists():
       with secret_path.open(encoding="utf-8") as f:
           secret = f.read().strip()
   ...
   if secret == "" or len(secret) == 32:
       secret = _generate_secret()
       with secret_path.open("w", encoding="utf-8") as f:
           f.write(secret)

   return secret

The secret value is read from a file called auth.secret. If the content of this file is empty or only 32 bytes in length, a new secret is generated and written to the file. The _generate_secret function returns 256 random characters:

checkmk/cmk/gui/login.py

def _generate_secret() -> str:
   return utils.get_random_string(256)

This value is unknown to an attacker and cannot easily be guessed. Without this value it is not possible to forge a valid session cookie:

image

There are two important aspects to highlight here:

  1. _load_secret always returns 256 random characters, even if the auth.secret file was not present or was not read properly.
  2. The auth.secret file is recreated if it is not present.

Leveraging Arbitrary File Deletion

An attacker could try to force the secret value to be empty and thus known. However, if the attacker uses the arbitrary file deletion primitive to delete the auth.secret file, it is recreated on the fly, and the secret is populated with a new value unknown to the attacker. Thus the ability to delete arbitrary files does not seem to enable an attacker to bypass the authentication of the Checkmk GUI.

When getting a basic overview of the Checkmk architecture in the first article of this series, we outlined that Checkmk integrates the NagVis PHP component. This integration is seamless from an authentication point of view, meaning that a user authenticated to the Checkmk GUI can also access the NagVis component. In order to make this possible, the NagVis class CoreLogonMultisite verifies the session cookie within the checkAuthCookie function:

nagvis/share/nagvis/htdocs/server/core/classes/CoreLogonMultisite.php

private function checkAuthCookie($cookieName) {
    ...
    list($username, $sessionId, $cookieHash) = explode(':', $cookieValue, 3);
    ...
    $users = $this->loadAuthFile($this->serialsPath);
    ...
    $user_secret = $users[$username];
    ...
    $hash = $this->generateHash($username, $sessionId, (string) $user_secret);
    ...
    // Validate the hash
    if ($cookieHash != $hash) {
        throw new Exception();
    }
    ...
    return $username;
}

At first, the cookie is separated into its three components: $username, $sessionId, and $cookieHash. The $user_secret value read via the loadAuthFile function is the serial value we have already encountered. The function generateHash is used to calculate the hash with the given parameters. If the calculated hash matches the hash from the cookie, the user is assumed to be authenticated. Advanced readers may have noticed a type juggling vulnerability here, which we reported additionally. Its exploitation is far more laborious and its presence is not relevant for our considerations. So let’s continue with the generateHash function, which is similar to its Checkmk GUI Python equivalent:

nagvis/share/nagvis/htdocs/server/core/classes/CoreLogonMultisite.php

private function generateHash($username, $session_id, $user_secret) {
    $secret = $this->loadSecret();
    return hash("sha256", $username . $session_id. $user_secret . $secret);
}
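
To make the role of the individual inputs concrete, the hash construction can be mirrored in a few lines of Python. This is only a sketch: the function name generate_hash, the UTF-8 encoding, and the sample values are assumptions chosen for illustration and are not taken from NagVis or Checkmk.

```python
import hashlib


def generate_hash(username: str, session_id: str, user_secret: str, secret: str) -> str:
    """Sketch of the PHP generateHash: SHA-256 over the concatenated inputs."""
    data = username + session_id + user_secret + secret
    # PHP's hash("sha256", ...) returns lowercase hex, as does hexdigest().
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


# Hypothetical sample values, not taken from a real installation.
with_secret = generate_hash("alice", "session-1", "3", "site-secret")
empty_secret = generate_hash("alice", "session-1", "3", "")

# With an empty secret, the hash depends only on the three remaining inputs.
assert empty_secret == hashlib.sha256(b"alicesession-13").hexdigest()
assert with_secret != empty_secret
```

This is why the value returned by loadSecret matters: once it is empty, it no longer contributes anything to the hash.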

However, the implementation of the loadSecret function it calls is less complex than its Python equivalent:

nagvis/share/nagvis/htdocs/server/core/classes/CoreLogonMultisite.php

private function loadSecret() {
    return trim(file_get_contents($this->secretPath));
}

The function reads the $secret value from the auth.secret file, but it neither handles file reading errors nor recreates the file if it is not present.

The goal of an attacker would be to make the $secret value empty and thus known. Let’s determine what happens if file_get_contents is called on a non-existent file:

php > var_dump(file_get_contents('/tmp/not.existing'));
PHP Warning:  file_get_contents(/tmp/not.existing): Failed to open stream: No such file or directory in php shell code on line 1
bool(false)

A warning is raised and the function returns false. Due to the error handlers NagVis employs, this warning triggers an exception, which prevents further code from being executed. Thus, simply deleting the auth.secret file does not yield an empty $secret value.

Winning The File Race

However, an attacker can leverage an important characteristic of the _load_secret function in the Checkmk GUI. This function recreates the auth.secret file with a new secret value if the file does not exist. The creation of the file (open) and the writing of the new secret value to it (write) are two distinct operations. If the loadSecret PHP function calls file_get_contents right after the auth.secret file was recreated, but before the new secret value has been written, file_get_contents simply operates on an existing but empty file, and an empty string is returned:

image

(1) At first, an attacker can leverage the SSRF and LF Injection vulnerabilities to trigger an LQL query with the PROCESS_FILE command to delete the auth.secret file. After this, the attacker can quickly trigger two requests: (2) one request to the Checkmk GUI to recreate the auth.secret file and (3) another request to NagVis with a forged cookie assuming an empty $secret value. If the resulting file_get_contents call in NagVis is executed at the right time, the $secret value is empty, and access to NagVis is granted. If the attempt fails, the process can simply be repeated.
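
The window between open and write can be reproduced locally without any of the components involved. The following Python sketch is an assumption-laden illustration: the helper names two_step_write and atomic_write and the sample value are hypothetical, and the write-to-temporary-file-then-rename variant is shown as one common way to close such a window, not as the fix applied by the vendor.

```python
import os
import tempfile
from pathlib import Path
from typing import Tuple


def two_step_write(secret_path: Path, value: str) -> Tuple[str, str]:
    """Create the file first and write later; return what a reader sees
    inside the window and after the write has completed."""
    writer = open(secret_path, "w")  # the file now exists, but is still empty
    seen_in_window = secret_path.read_text().strip()
    writer.write(value)
    writer.close()
    seen_after = secret_path.read_text().strip()
    return seen_in_window, seen_after


def atomic_write(secret_path: Path, value: str) -> Tuple[bool, str]:
    """Write to a temporary file in the same directory, then rename it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=secret_path.parent)
    with os.fdopen(fd, "w") as tmp_file:
        tmp_file.write(value)
    # Until the rename, no file exists under the final name at all.
    visible_before_rename = secret_path.exists()
    os.replace(tmp_name, secret_path)
    return visible_before_rename, secret_path.read_text().strip()


with tempfile.TemporaryDirectory() as tmp_dir:
    print(two_step_write(Path(tmp_dir) / "auth.secret", "new-secret"))
    print(atomic_write(Path(tmp_dir) / "auth.secret.atomic", "new-secret"))
```

In the first variant, a reader that arrives inside the window gets an empty string instead of an error, which is exactly the state the race aims for. In the second variant, a reader either finds no file or a fully written one.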

The mere ability of an unauthenticated attacker to delete arbitrary files thus leads to an authentication bypass, even without an additional vulnerability. Although the attack may require a few attempts, it can reliably be exploited to gain access to NagVis. The more fail-safe implementation in the Checkmk GUI itself prevents the same attack there. Nevertheless, with access to NagVis, an attacker has crossed another security boundary, and the exposed attack surface increases further.

Timeline

Date         Action
2022-08-22   We report all issues to Checkmk.
2022-08-23   Vendor confirms all issues.
2022-09-15   Vendor releases patched version 2.1.0p12.

Summary

In this second article in a series of three, we outlined the impact of an attacker’s ability to forge arbitrary LQL queries. Firstly, a time-based approach could be used to exfiltrate data from the monitoring core, which can be useful to mount further attacks. Furthermore, an attacker can use the PROCESS_FILE command to delete arbitrary files and leverage this to bypass the authentication of NagVis. This is achieved by making two simultaneous requests, which results in an empty secret value if the single file operations are executed in a specific order.

The NagVis authentication bypass is only possible because an attacker already has the ability to delete arbitrary files. Nevertheless, the slightly different implementations in NagVis and the Checkmk GUI make a great difference. Since the Checkmk GUI implementation assures that the secret value cannot be empty, the outlined technique does not work here. This approach follows a defense-in-depth mindset and should generally be applied. It prevents an attacker from easily escalating privileges once an initial security boundary is breached.

The next article in this series will continue where we left off here: an attacker has gained access to the NagVis component exposing a new attack surface. This allows the attacker to exploit an authenticated, arbitrary file read vulnerability in NagVis, which can be used to gain access to the Checkmk GUI itself. At last, we take a detailed look at an authenticated code injection vulnerability in Checkmk, which can, at this point, be exploited by the initially unauthenticated attacker to gain remote code execution.

We would like to thank the Checkmk team very much for quickly responding to our report, handling each issue with absolute transparency, and providing a comprehensive patch for all reported vulnerabilities.

Part 3

image

This is the third and last article in the Checkmk - Remote Code Execution by Chaining Multiple Bugs series (first article, second article). Within the series of articles, we take a detailed look at multiple vulnerabilities we identified in Checkmk and its NagVis integration, which can be chained together by an unauthenticated, remote attacker to fully take over the server running a vulnerable version of Checkmk.

In the last article, we evaluated the ability of an attacker to forge arbitrary LQL queries. This allows the attacker to exfiltrate monitoring data and issue external Nagios commands, which can be leveraged to delete arbitrary files. We demonstrated that this ability can be combined with a file race condition to bypass the authentication of the NagVis component.

In this third and last article, we complete our deep dive into the technical details of the vulnerability chain. At this point, the attacker has gained access to the NagVis component. Based on this, we will outline how the attacker can escalate this access to the Checkmk GUI itself by exploiting an authenticated file read vulnerability in NagVis.

At last, we take a detailed look at an authenticated code injection vulnerability in Checkmk, which forms the final step to remote code execution.

Technical Details

We start this section by briefly recapping the vulnerabilities and exploitation chain. After this, we look at the arbitrary file read vulnerability in NagVis and the code injection vulnerability in Checkmk.

Exploitation Chain

As a reminder, the following picture summarizes the exploitation chain enabling an unauthenticated attacker to gain remote code execution:

image

In the last two articles, we covered the first two vulnerabilities (1, 2) and an arbitrary file deletion, which can be exploited by an unauthenticated attacker to gain access to the NagVis component. Within this article, we determine how an attacker can escalate to the Checkmk automation user by exploiting an authenticated arbitrary file read in NagVis (3). With access to the Checkmk automation user, an attacker can ultimately gain code execution by exploiting a code injection vulnerability in Checkmk’s watolib (4):

image

Arbitrary File Read in NagVis

After an attacker has gained access to NagVis, the exposed attack surface is greatly increased because authenticated endpoints can now be accessed. For one of these endpoints, our automatic scan with SonarCloud discovered an interesting path injection vulnerability.

The endpoint is implemented in the CoreModGeneral class. This class offers different actions which an authenticated user can trigger. One of these actions is called getHoverUrl:

share/nagvis/htdocs/server/core/classes/CoreModGeneral.php

class CoreModGeneral extends CoreModule {
   ...
   public function handleAction() {
       $sReturn = '';

       if($this->offersAction($this->sAction)) {
           switch($this->sAction) {
               ...
               case 'getHoverUrl':
                   $sReturn = $this->getHoverUrl();
               break;
           ...

Within the getHoverUrl method, getCustomOptions is called to retrieve user-provided GET and POST parameters. In this case, the parameter url is retrieved, which is supposed to be an array containing URLs. For each provided URL, a new NagVisHoverUrl object is created. The response, which is stored in $arrReturn, contains the requested URL (url) as well as the string representation of the NagVisHoverUrl object (code):

share/nagvis/htdocs/server/core/classes/CoreModGeneral.php

   private function getHoverUrl() {
       $arrReturn = Array();

       // Parse view specific uri params
       $aOpts = $this->getCustomOptions(Array('url' => MATCH_STRING_URL));

       foreach($aOpts['url'] AS $sUrl) {
           $OBJ = new NagVisHoverUrl($this->CORE, $sUrl);
           $arrReturn[] = Array('url' => $sUrl, 'code' => $OBJ->__toString());
       }

       $result = json_encode($arrReturn);
       ...
       return $result;
   }

Within the constructor of the NagVisHoverUrl class, the method readHoverUrl is called.

This method uses file_get_contents to retrieve the requested URL:

share/nagvis/htdocs/server/core/classes/NagVisHoverUrl.php

   private function readHoverUrl() {
       ...
       if(!$content = file_get_contents($this->url)) {
           throw new NagVisException(l('couldNotGetHoverUrl', Array('URL' => $this->url)));
       }
       ...
       $this->code = $content;
   }

Since an authenticated user can fully control the URLs provided, the getHoverUrl action can be used to read arbitrary files by using the file:/// scheme.

This vulnerability further increases the attacker’s ability to read arbitrary files accessible by the webserver user. The impact depends on the presence of accessible files with sensitive content. Unfortunately, for automation users, these files exist.

Checkmk Automation Users

Checkmk provides two types of user accounts: normal users and automation users. A normal user has a regular password and can log in to the GUI. An automation user can be used as a convenient way to automate certain activities that would normally be done via the GUI. Instead of a regular password, an automation user is authenticated by an automation secret. This secret can usually not be used to log in to the GUI but is provided as an additional GET parameter to the accessed endpoint.

The default automation user is called automation and is preconfigured with a random secret. The hash of this secret and the hash of regular passwords are by default stored in an htpasswd file:

image

However, the secret is additionally stored in a plaintext file called automation.secret:

image

Since the file contains the plaintext secret, an attacker can leverage the aforementioned arbitrary file read vulnerability to retrieve it without having to crack the hash stored in the htpasswd file.

Although this secret can be used to access authenticated endpoints, it cannot be used to log in to the GUI. Let’s have a look at the corresponding code. When a user logs in, the function check_credentials is called:

checkmk/cmk/gui/userdb/htpasswd.py

   def check_credentials(self, user_id: UserId, password: str) -> CheckCredentialsResult:
       ...
       if self._is_automation_user(user_id):
           raise MKUserError(None, _("Automation user rejected"))
       ...

As we can see, the function _is_automation_user checks if the provided user_id corresponds to an automation user. If that is the case, an error is raised, and the GUI login fails. This is what the _is_automation_user function looks like:

checkmk/cmk/gui/userdb/htpasswd.py

   def _is_automation_user(self, user_id: UserId) -> bool:
       return Path(cmk.utils.paths.var_dir, "web", str(user_id), "automation.secret").is_file()

Accordingly, the presence of the automation.secret file is used in order to determine if the user is an automation user.

By leveraging the Linefeed Injection vulnerability and the Nagios PROCESS_FILE command outlined in the second article, an attacker has not only the ability to read arbitrary files but also to delete them. This means that the attacker can delete the automation.secret file after reading it. Since the login process verifies the provided credentials via the htpasswd file and the automation.secret file is not present, the automation user is assumed to be a normal user, and access to the GUI is granted:

image
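
The effect of tying the user type to the presence of a file can be reproduced in isolation. The sketch below is a simplified stand-in, not Checkmk code: is_automation_user takes the base directory as a parameter instead of reading cmk.utils.paths.var_dir, and the helper classify_before_and_after_deletion as well as the placeholder secret are hypothetical.

```python
import tempfile
from pathlib import Path


def is_automation_user(var_dir: Path, user_id: str) -> bool:
    """Simplified stand-in for the check: the user type is derived solely
    from whether automation.secret exists in the user's directory."""
    return Path(var_dir, "web", user_id, "automation.secret").is_file()


def classify_before_and_after_deletion(var_dir: Path, user_id: str):
    """Create the secret file, classify the user, delete the file, classify again."""
    secret_file = Path(var_dir, "web", user_id, "automation.secret")
    secret_file.parent.mkdir(parents=True)
    secret_file.write_text("placeholder-secret\n")
    before = is_automation_user(var_dir, user_id)
    secret_file.unlink()
    after = is_automation_user(var_dir, user_id)
    return before, after


with tempfile.TemporaryDirectory() as tmp_dir:
    print(classify_before_and_after_deletion(Path(tmp_dir), "automation"))
```

The same user is classified differently before and after the deletion, although nothing in the credential store has changed.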

After the successful login, an attacker can exploit an authenticated code injection vulnerability.

Code Injection in watolib’s auth.php

In order to seamlessly integrate NagVis into Checkmk, a file called auth.php is generated, which contains information about users, roles, and groups present in the Checkmk GUI. This file is updated when the corresponding data changes (e.g., user settings) by a function called _create_auth_file. This function loads the required data and calls _create_php_file:

checkmk/cmk/gui/watolib/auth_php.py

def _create_auth_file(callee, users=None):
   if users is None:
       users = userdb.load_users()
   ...
   _create_php_file(callee, users, get_role_permissions(), groups)

Within _create_php_file the content of the auth.php file is created and written to disk. In order to format the user data, the function _format_php is called:

checkmk/cmk/gui/watolib/auth_php.py

def _create_php_file(callee, users, role_permissions, groups):
   # Do not change WATO internal objects
   nagvis_users = copy.deepcopy(users)
   ...
   content = """<?php
// Created by Multisite UserDB Hook (%s)
global $mk_users, $mk_roles, $mk_groups;
$mk_users   = %s;
...
?>
""" % (
       callee,
       _format_php(nagvis_users),
       ...
   )

   store.makedirs(_auth_php().parent)
   store.save_text_to_file(_auth_php(), content)

The function _format_php converts the given data into the corresponding PHP representation. Data of type str is inserted into a single-quoted string. Single quotes within the data itself are escaped by prepending a backslash (\) to prevent the data from breaking out of the string context:

checkmk/cmk/gui/watolib/auth_php.py

def _format_php(data, lvl=1):
   s = ""
   ...
   elif isinstance(data, str):
       s += "'%s'" % data.replace("'", "\\'")
   ...

The replacement does not take into account that the data itself can contain a backslash followed by a single quote (\'). When this sequence is encountered, a backslash is prepended to the single quote, but the backslash already present in the data escapes that newly added backslash (\\'). The single quote is therefore left unescaped and terminates the string. This way, the string context can be escaped, and arbitrary PHP code can be injected into the file.
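
The difference between a benign and a hostile value can be checked with a small scanner that follows the PHP rule for single-quoted strings, in which a backslash only escapes another backslash or a single quote. This is a minimal sketch: naive_escape mirrors only the str branch shown above, while closing_quote_index and the harmless marker payload are hypothetical helpers for illustration.

```python
def naive_escape(data: str) -> str:
    """Mirror of the vulnerable str branch: only single quotes are escaped."""
    return "'%s'" % data.replace("'", "\\'")


def closing_quote_index(literal: str) -> int:
    """Return the index of the quote that closes a PHP single-quoted literal,
    or -1 if the literal is never closed."""
    i = 1  # skip the opening quote
    while i < len(literal):
        ch = literal[i]
        if ch == "\\" and i + 1 < len(literal) and literal[i + 1] in ("\\", "'"):
            i += 2  # escaped backslash or escaped quote
            continue
        if ch == "'":
            return i
        i += 1
    return -1


benign = naive_escape("it's")
hostile = naive_escape("a\\'; /* injected */ //")

# The benign literal is closed by its final character ...
assert closing_quote_index(benign) == len(benign) - 1
# ... while the hostile one is closed early, so the rest is parsed as PHP code.
assert closing_quote_index(hostile) == 4
print(hostile[: closing_quote_index(hostile) + 1])
```

Everything after the early closing quote leaves the string context, which is what turns an escaping mistake into code injection.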

An attacker can exploit the vulnerability after authenticating with the default automation user and then changing the profile settings. After the auth.php file is automatically updated, it contains the attacker-injected PHP code. The attacker now only needs to access the NagVis component, which includes the auth.php file and executes the injected code.

Patch

The arbitrary file read vulnerability was patched in NagVis 1.9.34 by limiting the requested scheme to http and https. This NagVis version was integrated into Checkmk version 2.1.0p11:

nagvis/share/nagvis/htdocs/server/core/classes/NagVisHoverUrl.php

  private function readHoverUrl() {
      ...
      $aUrl = parse_url($this->url);
      if(!isset($aUrl['scheme']) || $aUrl['scheme'] == '' || ($aUrl['scheme'] != 'http' && $aUrl['scheme'] != 'https'))
          throw new NagVisException(l('problemReadingUrl', Array('URL' => $this->url, 'MSG' => l('Not allowed url'))));
      ...
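
The idea behind the patch, an allowlist of URL schemes, can be sketched in Python as follows. The function name is_allowed_hover_url and the constant ALLOWED_SCHEMES are hypothetical, the sample URLs are placeholders, and the sketch uses Python's urllib.parse rather than PHP's parse_url, so details of the parsing may differ from the NagVis code.

```python
from urllib.parse import urlsplit

# Only these schemes are accepted; everything else is rejected.
ALLOWED_SCHEMES = {"http", "https"}


def is_allowed_hover_url(url: str) -> bool:
    """Accept a URL only if it carries an explicitly allowed scheme."""
    scheme = urlsplit(url).scheme.lower()
    return scheme in ALLOWED_SCHEMES


for candidate in (
    "https://example.com/hover.html",
    "file:///etc/passwd",
    "/etc/passwd",
):
    print(candidate, is_allowed_hover_url(candidate))
```

An allowlist rejects file:/// as well as any scheme nobody thought of, and it also rejects values without a scheme, which would otherwise be treated as local paths.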

The code injection vulnerability was patched with Checkmk version 2.1.0p11 by escaping both single-quote characters and backslash characters (commit):

checkmk/cmk/gui/watolib/utils.py

def format_php(data: object, lvl: int = 1) -> str:
   """Format a python object for php"""
   s = ""
   ...
   elif isinstance(data, str):
       s += "'%s'" % re.sub(r"('|\\)", r"\\\1", data)
   ...
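
Whether an escaping routine is complete can be checked with a round trip: the escaped literal must decode back to the original value and must not end early. The sketch below reuses the regular expression from the patch in a function called format_php_str, while decode_php_single_quoted is a hypothetical helper that follows the PHP rule for single-quoted strings (a backslash only escapes a backslash or a single quote).

```python
import re


def format_php_str(data: str) -> str:
    """str branch of the patched function: escape quotes and backslashes."""
    return "'%s'" % re.sub(r"('|\\)", r"\\\1", data)


def decode_php_single_quoted(literal: str) -> str:
    """Decode a PHP single-quoted literal; raise if it closes before its end."""
    if len(literal) < 2 or literal[0] != "'":
        raise ValueError("not a single-quoted literal")
    out = []
    i = 1
    while i < len(literal):
        ch = literal[i]
        if ch == "\\" and i + 1 < len(literal) and literal[i + 1] in ("\\", "'"):
            out.append(literal[i + 1])
            i += 2
        elif ch == "'":
            if i != len(literal) - 1:
                raise ValueError("literal closed early at index %d" % i)
            return "".join(out)
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated literal")


# Every sample, including the backslash-quote sequence, survives the round trip.
for sample in ("it's", "a\\'; /* injected */ //", "back\\slash", "\\", "''"):
    assert decode_php_single_quoted(format_php_str(sample)) == sample
```

Because the backslash itself is now escaped, attacker-supplied backslashes can no longer neutralize the backslash that protects a single quote.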

Timeline

Date         Action
2022-08-22   We report all issues to Checkmk.
2022-08-23   Vendor confirms all issues.
2022-08-29   NagVis patched version 1.9.34 is released.
2022-08-30   Checkmk version 2.1.0p11 is released containing NagVis 1.9.34.

Summary

In this last article in the series, we detailed an authenticated, arbitrary file read vulnerability in NagVis, which enables an attacker to gain access to the Checkmk automation user. We further took a look at how Checkmk identifies automation users. This revealed that an attacker could leverage the arbitrary file deletion once more to gain access to the Checkmk GUI. This access can further be leveraged to exploit a code injection vulnerability in Checkmk’s watolib.

The arbitrary file read vulnerability is caused by a missing validation of the URL scheme. The impact of this vulnerability is greatly increased because the automation secret is stored in plaintext. Whether it be a file or a database, sensitive values, which can directly be used by an attacker to gain more privileges, should not be stored in plaintext. These sensitive values can for example be passwords, authentication tokens, or password reset tokens.

Dynamic code generation, like creating PHP files, can be very dangerous and should be avoided if possible. There is no built-in method that escapes values in the context of code generation for another language. Thus a custom implementation is required, and some cases can easily be missed. The outlined code injection vulnerability showed that a single mistake in the escaping implementation directly leads to code execution.

Series Wrap-Up

This article completes the Checkmk - Remote Code Execution by Chaining Multiple Bugs series. The series showcased how an attacker successively gained more abilities and access by chaining one vulnerability after another.

In general, web applications have become more secure in the past few years. Vulnerabilities instantly leading to remote code execution are far less common. This requires attackers to leverage less impactful vulnerabilities and chain them together. These chains are often only possible because the security precautions tend to be lower the higher the level of authentication.

The assumption that an attacker lacks a particular ability is dangerous and can quickly lead to a domino effect when an initial security boundary is breached. It is essential to apply security on all layers. Even one seemingly unimportant, additional security check can mitigate one link in an exploit chain and thus break the whole chain.