Do, a debugger, you often use
Re, a reverse engineer
Mi, a name I call myself
Anyways…
By now, you must be very thankful I reminded you of this famous song; I am sure it will be stuck in your head the rest of the day. You’re welcome!
Confused about how this relates to malware analysis?
In the world of malware and reversing, there are tools, scripts, and methods we use to investigate the relationships between malware families, detect new versions, and understand differences across malware samples. A great example of why we do this is to check whether detection logic we have already written still works on a newer variant of the malware. What did the adversary change in the code, and will that affect the protection we provide to our customers? One approach is to compare older and newer samples to figure out which components are present in both. In some cases, we get ambitious and take a large number of samples that are classified as, for example, ransomware, and search for common denominators in the code to help identify new samples that fit the same classification.
Another classic method of code comparison is binary diffing. Using BinDiff in conjunction with IDA Pro, we create a database for each of the two malware samples and compare them. An overall similarity score is generated, which shows how much the samples have in common, as shown in Figure 1.
Figure 1: BinDiff example
In Figure 1, the BinDiff dashboard shows that the two samples have a similarity of 52 percent. Drilling further down into this dashboard, the differences and similarities are split into several categories. It is also possible to visualize individual functions, for example, to compare them and spot the exact differences. Adversaries tend to reuse code and optimize certain parts in newer releases. The important question is: where do these similarities exist? In the ‘generic’ code elements, or in the components that were used to create this malware?
There are many other methods to compare malware, such as fuzzy hashing or extracting code blocks and then comparing them. Recently, in our blog on DPRK ransomware families, we used graph technology and Hilbert curve mapping to discover the similarities shown in Figure 2.
Figure 2: Hilbert Curve mapping
We have frequently used code comparisons and visualizations, but would it be possible to compare malware samples using a more abstract technique? What about sound?
A novel technique
Music, sound, and especially analog synths have always been fascinating topics for me. As a veteran of the Dutch Navy, I had the pleasure of spending a week on a submarine listening to sonar sounds, watching frequency waterfalls, and filtering out ships versus animals; it was a phenomenal experience. Combining all of this with the desire to investigate and invent new comparison methods, I wondered: would it be possible to compare malware samples based on their sound?
To start, we took two samples of the Conti ransomware for Linux, one released in May and the other in June. From a BinDiff and code comparison perspective, there were minimal differences and an overall similarity score of 99.8 percent. Would our experiment with sound show the same?
First, we had to convert each sample into an audio file so it could be played and used for frequency analysis. We used the ‘cat’ command on the Linux command line and piped the file into an audio player to generate the sound:
>> cat malwarefile.bin | mplayer -cache 1024 -quiet -rawaudio samplesize=1:channels=1:rate=8000 -demuxer rawaudio -
This produces noisy audio, similar to a dial-up modem trying to connect to the Internet over a telephone line. We routed the audio from the headset jack to a mixer, recorded the sound, and exported it as both .WAV (waveform) and .MP3 audio files.
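As an alternative to recording the player output through a mixer, the same bytes can be wrapped directly in a WAV container. This is not the method used above, only a minimal sketch using Python's standard `wave` module; it assumes that 8-bit unsigned mono PCM at 8000 Hz corresponds to the `samplesize=1:channels=1:rate=8000` settings, and the function names `bytes_to_wav` and `file_to_wav` are made up for illustration.

```python
import wave


def bytes_to_wav(data: bytes, wav_path: str, rate: int = 8000) -> int:
    """Wrap raw bytes in a WAV file as 8-bit unsigned mono PCM.

    Returns the number of audio frames written (one frame per input byte).
    """
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(1)     # mono, mirroring channels=1
        wav.setsampwidth(1)     # one byte per sample, mirroring samplesize=1
        wav.setframerate(rate)  # samples per second, mirroring rate=8000
        wav.writeframes(data)
    return len(data)


def file_to_wav(sample_path: str, wav_path: str, rate: int = 8000) -> int:
    """Read a file as raw bytes (it is never executed) and convert it."""
    with open(sample_path, "rb") as handle:
        return bytes_to_wav(handle.read(), wav_path, rate)
```

With these settings every byte becomes one sample, so the length of the audio in seconds is simply the file size in bytes divided by 8000. A file written this way skips the analog path through the headset jack and mixer, so it will not sound exactly like the recordings described here.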
Conti May Recording
Conti June Recording
Loading the two .WAV files generated from the Conti ransomware Linux variants into the audio-analysis tools we used (Audacity and Sonic LineUp), we saw the spectrogram in Figure 3:
Figure 3: The sound of Conti
Studying the above picture, one can spot that while the two recordings are almost identical, there is a minor difference at the end of the sound profile of the June sample. This matches what was discovered during binary code analysis.
Now, onto the next experiment. From our investigation on DPRK ransomware families, we discovered that the VHD ransomware sample and the BEAF sample had some code similarities but also a lot of differences. Would it be possible to show that with sound as well?
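Because the playback settings turn each byte into one sample at 8000 samples per second, a position on the time axis maps back to a byte offset in the file. The sketch below is a byte-level stand-in for what the spectrogram shows, not the analysis Audacity or Sonic LineUp performs: it compares two samples window by window and reports the time ranges where they differ. The function name `differing_windows` and the window size of 800 bytes (0.1 seconds) are arbitrary choices for illustration.

```python
def differing_windows(a: bytes, b: bytes, rate: int = 8000, window: int = 800):
    """Return (start_seconds, end_seconds) for each window where a and b differ.

    The comparison is purely positional: byte i of a is compared with byte i
    of b, so an insertion early in one file marks everything after it as
    different.
    """
    spans = []
    longest = max(len(a), len(b))
    for start in range(0, longest, window):
        if a[start:start + window] != b[start:start + window]:
            end = min(start + window, longest)
            spans.append((start / rate, end / rate))
    return spans


# Two one-second "recordings" that differ only in the last 400 bytes.
older = bytes(8000)
newer = bytes(7600) + b"\x01" * 400
print(differing_windows(older, newer))  # [(0.9, 1.0)]
```

For two nearly identical builds, a result concentrated in the final windows would line up with a difference that appears only at the end of the sound profile.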
Again, we created the required audio files for both samples. Since the binary malware samples were different in file size, the length of recording would be different for each sample. This was reflected when we loaded our audio files into Audacity.
Figure 4: DPRK Ransomware sound
The first thing we see is the difference in length, but the similarities in the waveforms also stand out. Using Sonic LineUp, we lined up the waveforms and frequency spectra as closely as possible.
Figure 5: Spectrum diffing
The differences show up not only visually, but also when we conduct a frequency plot-spectrum analysis of the two audio samples, as seen in Figure 6.
Figure 6: Frequency plot spectrum
As one example of an identifiable difference, the VHD plot spectrum displays a lot more activity in the ranges above 7000 Hz than the BEAF one.
Converting malware samples to sound to compare them was an interesting and worthwhile learning experience. It also showed that what we found with traditional code analysis or visual analysis could be seen in audio analysis as well. Honestly, we did not expect this fun experiment to confirm what we observed with traditional code similarity and comparison methods. It was remarkable to discover that in the Conti case, where the traditional approach showed minimal changes, the sound conversion and frequency analysis demonstrated the same findings.
Wouldn’t it be nice if, instead of asking for ransom, the ransomware gangs started making music from their code for us to enjoy on Spotify or Apple Music? We gave it a try and created the first version of Conti ransomware gang Techno, combining the generated audio of the Conti code with some tracks in GarageBand. Check it out and tell us what you think!
I’m curious whether there’s more creative talent out there that can convert code to audio and make some nice music with it. Upload your mix and share it on our SoundCloud channel so we can add it to the playlist.
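An observation like "more activity above 7000 Hz" can also be turned into a number. The following is a rough sketch, not the plot-spectrum analysis Audacity performs: it computes a naive discrete Fourier transform of one block of samples and reports the fraction of spectral energy at or above a cutoff frequency. It assumes the recording has already been decoded into a list of numeric samples; the names `power_spectrum` and `energy_fraction_above` are invented for this example, and the 44100 Hz rate in the usage lines is an assumption, since the recording rate is not stated here.

```python
import cmath
import math


def power_spectrum(samples, rate):
    """Naive DFT of one block: list of (frequency_hz, power) up to Nyquist.

    Needs at least two samples. Runs in O(n^2), so keep blocks small.
    """
    n = len(samples)
    mean = sum(samples) / n
    # Remove the DC offset and apply a Hann window to limit spectral leakage.
    windowed = [
        (s - mean) * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(samples)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        step = -2j * math.pi * k / n
        coeff = sum(x * cmath.exp(step * i) for i, x in enumerate(windowed))
        spectrum.append((k * rate / n, abs(coeff) ** 2))
    return spectrum


def energy_fraction_above(samples, rate, cutoff_hz):
    """Fraction of the block's spectral energy at or above cutoff_hz."""
    spectrum = power_spectrum(samples, rate)
    total = sum(power for _, power in spectrum)
    if total == 0:
        return 0.0
    high = sum(power for freq, power in spectrum if freq >= cutoff_hz)
    return high / total


# Usage with synthetic tones (assumed 44100 Hz sample rate, 512-sample block).
rate, n = 44100, 512
tone_9k = [math.sin(2 * math.pi * 9000 * i / rate) for i in range(n)]
tone_1k = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(n)]
print(energy_fraction_above(tone_9k, rate, 7000) > 0.99)  # True
print(energy_fraction_above(tone_1k, rate, 7000) < 0.01)  # True
```

Averaging this fraction over many blocks of each recording would give a single figure per sample to compare. A 7000 Hz cutoff only makes sense for audio sampled above 14000 Hz, which holds for a typical mixer recording but not for the raw 8000 Hz byte stream.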