Executive Summary
Unit 42 researchers examine several malware samples that incorporate Cobalt Strike components, and discuss some of the ways that we catch these samples by analyzing artifacts from the deltas in process memory at key points of execution. We will also discuss the evasion tactics used by these threats, and other issues that make their analysis problematic.
Cobalt Strike is a clear example of the type of evasive malware that has been a thorn in the side of detection engines for many years. It is one of the most well-known adversary simulation frameworks for red team operations. However, it’s not only popular among red teams, but it is also abused by many threat actors for malicious purposes.
Although the toolkit is sold only to trusted entities for conducting realistic security tests, source code leaks have allowed its components to find their way into the arsenals of malicious actors ranging from ransomware groups to state actors. Malware authors abusing Cobalt Strike even played a role in the infamous SolarWinds incident in 2020.
Overview of Cobalt Strike
The main driver for the proliferation of Cobalt Strike is that it is very good at what it does. It was designed from the ground up to help red teams armor their payloads to stay ahead of security vendors, and it regularly introduces new evasion techniques to try to maintain this edge.
One of the main advantages of Cobalt Strike is that it mainly operates in memory once the initial loader is executed. This poses a problem for detection when the payload is statically armored, exists only in memory and refuses to execute in an analysis environment. It is a challenge for many security software products, as scanning memory is anything but easy.
In many cases, Cobalt Strike is a natural choice for gaining an initial foothold in a targeted network. A threat actor can use a builder with numerous deployment and obfuscation options to create the final payload based on a customizable template.
This payload is typically embedded into a file loader in encrypted or encoded form. When the file loader is executed by a victim, it decrypts/decodes the payload into memory and runs it. As the payload is present in memory in its original form, it can be detected easily due to some specific characteristics.
As malware researchers, we often see potentially interesting malicious samples that turn out to just be loaders for Cobalt Strike. It’s also often unclear if a loader was created by a red team or a real malicious actor, thus making attribution even more challenging.
In the next few sections, we take a closer look at three different Cobalt Strike loaders that were detected out of the box by a new hypervisor-based sandbox we designed to allow us to analyze artifacts in memory. Each sample loads a different implant type: an SMB beacon, an HTTPS beacon and a stager beacon. We dubbed these Cobalt Strike loaders KoboldLoader, MagnetLoader and LithiumLoader. We will also discuss some of the methods we can use to detect these payloads.
KoboldLoader SMB Beacon
The sample we’re looking at was detected during a customer incident.
SHA256: 7ccf0bbd0350e7dbe91706279d1a7704fe72dcec74257d4dc35852fcc65ba292
This 64-bit KoboldLoader executable uses various known tricks to try to bypass sandboxes and to make the analysis process more time consuming.
To bypass sandboxes that hook only high-level user mode functions, it solely calls native API functions. To make the analyst's life harder, it dynamically resolves the functions by hash instead of using plain text strings. The malware contains code to call the following functions:
NtCreateSection
NtMapViewOfSection
NtCreateFile (unused)
NtAllocateVirtualMemory (unused)
RtlCreateProcessParameters
RtlCreateUserProcess
RtlCreateUserThread
RtlExitUserProcess
The malware creates two separate tables of function hash/address pairs. The first table contains one pair for every native function, while the second table contains pairs only for the Nt* functions.
For the Rtl* functions that it uses, the malware loops through the first table and searches for the function hash to get the function address. For the Nt* functions that it uses, it loops through the second table and increments a counter variable on each iteration.
When the hash is found, the counter value corresponds to the system call number of the native function, and the malware passes it to a custom syscall stub. This effectively bypasses many sandboxes, even those that hook the lower-level native functions instead of the high-level ones.
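The two-table lookup described above can be sketched in Python as a toy simulation. This is only an illustration for analysts, not the sample's code: the hash function (djb2), the export names and the addresses are hypothetical stand-ins, and the sketch assumes the second table is ordered by stub address so that a stub's position matches its system call number.

```python
from typing import Dict, List, Optional, Tuple

Table = List[Tuple[int, int]]  # (name hash, address) pairs


def djb2(name: str) -> int:
    """Hypothetical stand-in hash; the sample's real algorithm is not given."""
    h = 5381
    for ch in name.encode("ascii"):
        h = (h * 33 + ch) & 0xFFFFFFFF
    return h


def build_tables(exports: Dict[str, int]) -> Tuple[Table, Table]:
    """Build the table of all native functions and the Nt*-only table."""
    all_native = [(djb2(name), addr) for name, addr in exports.items()]
    # Assumption: sorting the Nt* stubs by address yields syscall-number order.
    nt_sorted = sorted((addr, name) for name, addr in exports.items()
                       if name.startswith("Nt"))
    nt_only = [(djb2(name), addr) for addr, name in nt_sorted]
    return all_native, nt_only


def resolve_address(table: Table, wanted: int) -> Optional[int]:
    """Rtl* path: search the first table for the hash, return the address."""
    for name_hash, addr in table:
        if name_hash == wanted:
            return addr
    return None


def resolve_syscall_number(nt_table: Table, wanted: int) -> Optional[int]:
    """Nt* path: count entries until the hash matches; the count is the number."""
    counter = 0
    for name_hash, _addr in nt_table:
        if name_hash == wanted:
            return counter
        counter += 1
    return None


# Toy export list with made-up addresses.
EXPORTS = {
    "NtCreateSection": 0x1040,
    "NtMapViewOfSection": 0x1020,
    "NtAllocateVirtualMemory": 0x1000,
    "RtlCreateUserThread": 0x2000,
}
ALL_NATIVE, NT_ONLY = build_tables(EXPORTS)
```

Because no function name appears in plain text and the system call is issued from a custom stub, neither string scanning nor user-mode hooks on the native functions would observe the call in this model.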
The overall loader functionality is relatively simple and uses mapping injection to run the payload. It spawns a child process of the Windows tool sethc.exe, creates a new section and maps the decrypted Cobalt Strike beacon loader into it. The final execution of the Cobalt Strike loader, which in turn loads an SMB beacon, happens by calling RtlCreateUserThread.
You can find the decrypted beacon configuration data in the Appendix section.
In-Memory Evasion
With our new hypervisor-based sandbox, we were able to detect the decrypted Cobalt Strike SMB beacon in memory. This beacon loader even uses some in-memory evasion features that create a strange sort of chimeric file. While it’s actually a DLL, the “MZ” magic bytes and the subsequent DOS header are overwritten with a small loader shellcode, as shown in Figure 1.
Figure 1. Disassembled Cobalt Strike beacon loader shellcode.
The shellcode loader jumps to the exported function DllCanUnloadNow, which prepares the SMB beacon module in memory. To do this, it first loads the Windows pla.dll library and zeroes out a chunk of bytes inside its code section (.text). It then writes the beacon file into this blob and fixes the import address table, thus creating an executable memory module.
During the analysis of the file, we identified which in-memory evasion features were used, as shown in Table 1.
Evasion feature | Description | Used in our sample |
allocator | Set how beacon's ReflectiveLoader allocates memory for the agent. Options are: HeapAlloc, MapViewOfFile and VirtualAlloc. | No |
cleanup | Ask beacon to attempt to free memory associated with the reflective DLL package that initialized it. | Yes |
magic_mz_x64 | Override the first bytes (MZ header included) of beacon's x64 reflective DLL. Valid x64 instructions are required. Follow instructions that change CPU state with instructions that undo the change. | Yes |
magic_pe | Override the PE character marker used by beacon's ReflectiveLoader with another value. | No |
module_x64 | Ask the x64 reflective loader to load the specified library and overwrite its space instead of allocating memory with VirtualAlloc. | Yes |
obfuscate | Obfuscate the reflective DLL’s import table, overwrite unused header content, and ask ReflectiveLoader to copy beacon to new memory without its DLL headers. | Yes |
sleep_mask | Obfuscate beacon and its heap, in-memory, prior to sleeping. | No |
smartinject | Use embedded function pointer hints to bootstrap beacon agent without walking kernel32 Export Address Table (EAT). | No |
stomppe | Ask ReflectiveLoader to stomp MZ, PE and e_lfanew values after it loads beacon payload. | No |
userwx | Ask ReflectiveLoader to use or avoid read, write or execute (RWX) permissions for Beacon DLL in memory. | No |
Table 1. Cobalt Strike evasion techniques that were used.
To sum up, the beacon loader and the beacon itself are the same file. Parts of the PE header are used for a shellcode that jumps to an exported function, which in turn creates a module of itself inside a Windows DLL. Finally, the shellcode jumps to the entry point of the beacon module to execute it in memory.
As discussed, we cannot reliably detect the beacon of our KoboldLoader sample unless we can peer inside memory during execution.
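As one illustration of what peering inside memory can mean in practice, the following Python sketch scans a raw memory dump for page-aligned PE images whose “MZ” magic bytes have been replaced. It is a simplified heuristic, not our sandbox's implementation. It assumes that the e_lfanew field at offset 0x3C and the PE signature are still intact, which matches a sample where magic_pe and stomppe are not used, and the function name and the 0x400 bound on e_lfanew are hypothetical choices.

```python
import struct
from typing import List, Tuple

PAGE = 0x1000
PE_SIG = b"PE\x00\x00"
MACHINES = {0x014C, 0x8664}  # IMAGE_FILE_MACHINE_I386, IMAGE_FILE_MACHINE_AMD64


def find_pe_candidates(dump: bytes) -> List[Tuple[int, bool]]:
    """Return (base_offset, has_mz) for each page-aligned PE image candidate.

    A candidate with has_mz == False still has a valid NT header, but its
    first two bytes are no longer "MZ", as with an overwritten DOS header.
    """
    hits = []
    for base in range(0, len(dump) - 0x40 + 1, PAGE):
        # e_lfanew (offset of the NT headers) lives at 0x3C in the DOS header.
        (e_lfanew,) = struct.unpack_from("<I", dump, base + 0x3C)
        nt = base + e_lfanew
        if e_lfanew == 0 or e_lfanew > 0x400 or nt + 6 > len(dump):
            continue
        if dump[nt:nt + 4] != PE_SIG:
            continue
        (machine,) = struct.unpack_from("<H", dump, nt + 4)
        if machine not in MACHINES:
            continue
        hits.append((base, dump[base:base + 2] == b"MZ"))
    return hits
```

A hit with has_mz set to False is only a starting point for triage. A loader that also replaces the PE marker or removes the headers after loading would need a different heuristic.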
MagnetLoader
The second loader we will look into is a 64-bit DLL that imitates a legitimate library.
SHA256: 6c328aa7e0903702358de31a388026652e82920109e7d34bb25acdc88f07a5e0
This MagnetLoader sample tries to look like the Windows file mscms.dll by using the following similar features:
- The same file description
- An export table with many of the same function names
- Almost identical resources
- A very similar mutex
These features are also shown in Figure 2, where the malware file is contrasted with the valid mscms.dll.
Figure 2. Comparison of file description, export table and resources of MagnetLoader (left) and mscms.dll (right) as seen with EXE Explorer.
MagnetLoader not only tries to mimic the legitimate Windows library statically, but also at runtime.
All of the exported functions of MagnetLoader internally call the same main malware routine. When one of them is called, the DLL entry point is run first. In the entry point, the malware loads the original mscms.dll and it resolves all the functions it fakes.
The addresses of these original functions are stored, and each one is called after its fake counterpart has executed. Thus, whenever an exported function of MagnetLoader is called, it runs the main malware routine and afterward calls the original function in mscms.dll.
The main malware routine is relatively simple. It first creates a mutex named SM0:220:304:WilStaging_02_p1h that looks very similar to the original one created by mscms.dll.
The Cobalt Strike beacon loader is decrypted into a memory buffer and executed with the help of a known trick. Instead of calling the beacon loader directly, the loader uses the Windows API function EnumChildWindows to run it.
This function takes three parameters, one of which is a callback function. Malware can abuse this parameter to call an address indirectly via the callback and thus conceal the execution flow.
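From a detection point of view, this trick leaves a useful artifact: the callback pointer refers to memory that does not belong to any loaded module. The following Python sketch shows the idea on toy data. The ApiCall and Region types, the trace format and the image_backed flag are hypothetical and stand in for whatever an instrumentation layer actually records. The argument positions follow the documented signatures of the two APIs listed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Zero-based position of the callback argument for some callback-taking APIs.
CALLBACK_ARG_INDEX: Dict[str, int] = {
    "EnumChildWindows": 1,  # (hWndParent, lpEnumFunc, lParam)
    "EnumSystemGeoID": 2,   # (GeoClass, ParentGeoId, lpGeoEnumProc)
}


@dataclass
class Region:
    start: int
    size: int
    image_backed: bool  # True if the region is mapped from a module on disk


@dataclass
class ApiCall:
    name: str
    args: List[int]


def find_region(addr: int, regions: List[Region]) -> Optional[Region]:
    """Return the region containing addr, or None if it is unmapped/unknown."""
    for region in regions:
        if region.start <= addr < region.start + region.size:
            return region
    return None


def suspicious_callbacks(calls: List[ApiCall],
                         regions: List[Region]) -> List[ApiCall]:
    """Flag calls whose callback points outside image-backed memory."""
    flagged = []
    for call in calls:
        idx = CALLBACK_ARG_INDEX.get(call.name)
        if idx is None or idx >= len(call.args):
            continue  # not a tracked API, or the trace entry is truncated
        region = find_region(call.args[idx], regions)
        if region is None or not region.image_backed:
            flagged.append(call)
    return flagged
```

Legitimate software can also place callbacks in dynamically generated code (for example, JIT compilers), so a hit like this is better treated as one signal among several than as a verdict.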
You can also find the decrypted beacon configuration data in the Appendix section.
LithiumLoader
This last Cobalt Strike sample is part of a DLL side-loading chain where a custom installer for a type of security software was used. DLL side-loading is a technique that hijacks a legitimate application to run a separate, malicious DLL.
SHA256: 8129bd45466c2676b248c08bb0efcd9ccc8b684abf3435e290fcf4739c0a439f
This 32-bit LithiumLoader DLL is part of a custom attacker-created Fortinet VPN installation package submitted to VirusTotal as FortiClientVPN_windows.exe (SHA256: a1239c93d43d657056e60f6694a73d9ae0fb304cb6c1b47ee2b38376ec21c786).
The FortiVPN.exe file is not malicious or compromised. The attackers included it because it is signed, which helps the package evade antivirus detection.
The installer is a self-extracting RAR archive that contains the following files:
File name | Description |
FortiVPN.exe | Legit signed FortiClient VPN Online installer v7.0.1.83 |
GUP.exe | Legit signed WinGup for Notepad++ tool v5.2.1.0 |
gup.xml | WinGup config file |
libcurl.dll | LithiumLoader |
Table 2a. FortiClientVPN_windows.exe file contents.
The self-extracting script commands are as follows:
Table 2b. List of self-extracting script commands.
When the installer is run, all files are silently dropped to the local %AppData% folder and both executable files are started. While the FortiClient VPN installer executes, the WinGup tool side-loads the libcurl.dll LithiumLoader malware. WinGup does so because it statically imports several functions from libcurl.dll, normally a legitimate copy of the libcurl library, as shown in Figure 3.
Figure 3. Import address table of WinGup.exe.
This threat also tries to add the %AppData% folder path to the exclusion list in Windows Defender via PowerShell.
On the startup of GUP.exe, the malicious libcurl.dll file is loaded into the process space because GUP.exe statically imports the functions shown in Figure 3, above. While all four libcurl functions are run, only curl_easy_cleanup contains a malicious routine. The routine was injected while compiling a new version of the library, so we’re not dealing with a patched version of the legitimate DLL. This is a cleaner solution: it avoids breaking the code that follows the inserted malicious routine, a problem often seen in other malware.
The curl_easy_cleanup function normally calls only one subroutine (Curl_close) and has no return value (as shown in its source code on GitHub). The altered function is shown in Figure 4.
Figure 4. Modified curl_easy_cleanup export function of libcurl.dll.
The load_shellcode function decrypts the shellcode by XORing it with the key 0xA, as shown in Figure 5.
Figure 5. Shellcode loader function load_shellcode().
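For analysts who want to recover the stager statically, a single-byte XOR is trivial to reverse. The following Python helper is a minimal sketch that assumes the encrypted blob has already been carved from the DLL. The function name xor_decode is hypothetical, and only the key 0xA comes from the sample.

```python
def xor_decode(data: bytes, key: int = 0x0A) -> bytes:
    """XOR every byte with a single-byte key (0x0A in this sample)."""
    return bytes(b ^ key for b in data)
```

Because XOR is its own inverse, calling the same function on the decoded bytes returns the original encrypted blob.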
This function runs the Cobalt Strike stager shellcode indirectly via EnumSystemGeoID instead of directly jumping to it. This Windows API function has three parameters, the last of which is a callback function that LithiumLoader abuses.
The Cobalt Strike stager shellcode is borrowed from Metasploit and is the reverse HTTP shell payload, which uses the following API functions:
LoadLibrary
InternetOpenA
InternetConnectA
HttpOpenRequestA
InternetSetOptionA
HttpSendRequestA
GetDesktopWindow
InternetErrorDlg
VirtualAllocStub
InternetReadFile
The shellcode connects to the IP address of a university in Thailand.
LithiumLoader Detection Issues
At the time of writing this analysis, the Cobalt Strike beacon payload was no longer available. Without a payload or any actionable information in the execution report of API calls, it’s often challenging for a sandbox to determine whether the sample is malicious. This sample doesn’t have any functionality that can be classified as malicious per se.
Catching Cobalt Strike Through Analyzing Its Memory
In all three of these examples there are some common detection challenges. These samples do not execute in normal sandbox environments. But as we discussed, there is a wealth of information that we can use for detection if we look inside memory during execution, like function pointers, decoded stages of the loader, and other artifacts.
For many years now, it has been standard practice for sandbox systems to instrument and observe the activity of executing programs. If our team has learned anything over the years, it’s that this alone is not enough for highly evasive malware. This is why we’ve been working hard the past few years on figuring out how we can add more thorough processing for this type of highly evasive malware.
One of the key lessons we’ve learned is that accurate detection of highly evasive malware requires looking at memory as samples execute, in addition to monitoring system API calls, to get a better understanding of what’s happening.
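The idea of analyzing deltas in process memory at key points of execution can be illustrated with a small Python sketch. It is a conceptual model only and does not describe our sandbox's implementation. The Page type, the snapshot format (a mapping from page base address to page contents and protection) and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Page:
    data: bytes
    executable: bool


Snapshot = Dict[int, Page]  # page base address -> page


def executable_delta(before: Snapshot, after: Snapshot) -> List[int]:
    """Return base addresses of executable pages that are new, changed,
    or newly made executable in `after` compared to `before`."""
    changed = []
    for base, page in sorted(after.items()):
        if not page.executable:
            continue
        old = before.get(base)
        if old is None or not old.executable or old.data != page.data:
            changed.append(base)
    return changed
```

Pages returned by such a comparison are natural candidates for further inspection, for example scanning them for decoded loader stages, function pointers or configuration data.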