Hardware Trojans Under a Microscope

⚠️ [ ORIGIN SOURCE ]

https://ryancor.medium.com/hardware-trojans-under-a-microscope-bf542acbcc29

📅 [ Archival Date ]

Nov 4, 2022 7:43 PM

🏷️ [ Tags ]

RATHardware

✍️ [ Author ]

Ryan Cornateanu

💣 [ PoC / Exploit ]

https://crash.link/hwt

Introduction

While the security industry generally focuses on software cyber attacks, we can’t forget the security impact of lower-level hardware flaws, such as those that affect semiconductors. The surface for silicon-level attacks has widened over the past several years; as integrated circuit (IC) fabrication evolves for increasingly advanced microelectronics, the risk of flaws creeping into these complex systems also increases.

IC Design Flow

This article gives an overview and background of Hardware Trojans, including netlists, die preparation, electron microscope images, and circuit testing. We will also make our own physical layout design of a Hardware Trojan and analyze it using KLayout.

RTL Design vs Netlist

Before diving into Hardware Trojans, let’s review how silicon chips are designed so we can later discover Trojans through reverse engineering. We can start at the behavioral level of the process. Register-transfer level (RTL) design describes a circuit as signals flowing between registers, together with the logic operations performed on those signals.

Circuit Design of a D Flip-Flop

Designing RTL became a lot easier in 1983, when the Department of Defense began developing VHDL, a hardware description language (HDL) intended to document the behavior of application-specific integrated circuit (ASIC) designs. Verilog was introduced at roughly the same time, and the two are still the most popular HDLs today. These languages were created as a specialized computer vernacular to program digital logic circuits. If we take the above RTL diagram and convert it to Verilog code, it would look like this:

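module dff_toggle (input clk, output reg Q);
 wire D;
 // D is driven by the inverted output, so Q toggles on each rising edge.
 // There is no reset, so Q is unknown (X) in simulation until it is initialized.
 assign D = ~Q;
 always @(posedge clk)
 begin
  Q <= D;
 end
endmodule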

This simple snippet of code can be synthesized to the logic gate design discussed. Logic synthesis is the automated process of converting an HDL such as Verilog or VHDL into an optimized gate-level description. When reverse engineering silicon die images, the first goal is to work in the opposite direction and translate the imaged layers into a transistor netlist. For something like a two-input XOR gate, drawn with individual transistors rather than RTL component symbols, we arrive at this schematic:

XOR Gate Netlist

Simply put, it only took six transistors to make this complementary metal-oxide-semiconductor (CMOS) netlist design. From here on in our reverse engineering journey, we would convert this netlist into an RTL schematic, then into an HDL, as seen in the images and code snippets above.

Getting Silicon Die Images

In order to spot Hardware Trojans you’ll need to have the bare die on hand, along with a few other preparations to the chip that I’ll discuss later in this post. One part I won’t touch on is the process of decapsulation and delayering of integrated circuits, as I have already written a paper on this. It is still necessary to understand how these tiny silicon dies are obtained and how they are imaged.

Defining Hardware Trojans

A Hardware Trojan can come in the shape of malicious circuitry that impairs the function or trustworthiness of an electronic system (i.e., an integrated circuit). Circuits are susceptible to this type of attack at a physical layout level or gate level. Specifically, gate-level Trojans are functional modifications or parametric deviations from the intended purpose of a component and can be achieved by the addition or subtraction of gates. Difficult-to-detect Trojans could be used to weaken the security of a cryptoprocessor by:

Reducing the effective entropy of a random number generator.
Yielding information on the internal operation of the processor to an attacker.
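To put the first point in numbers, here is a rough sketch of how pinning some output bits shrinks the space an attacker has to search. The 128-bit width and the 96 pinned bits are illustrative assumptions of mine, not figures from any of the referenced papers, and the function names are hypothetical.

```python
def effective_entropy_bits(total_bits: int, fixed_bits: int) -> int:
    """Entropy (in bits) of an RNG output when `fixed_bits` of its
    `total_bits` are forced to attacker-known constants and the
    remaining bits stay uniformly random."""
    if not 0 <= fixed_bits <= total_bits:
        raise ValueError("fixed_bits must be between 0 and total_bits")
    return total_bits - fixed_bits


def search_space(total_bits: int, fixed_bits: int) -> int:
    """Number of candidate outputs an attacker still has to try."""
    return 2 ** effective_entropy_bits(total_bits, fixed_bits)


# Hypothetical numbers: a 128-bit RNG where a Trojan pins 96 bits.
healthy = search_space(128, 0)
weakened = search_space(128, 96)
print(f"healthy RNG:  2^{healthy.bit_length() - 1} candidates")
print(f"weakened RNG: 2^{weakened.bit_length() - 1} candidates")
```

The output still looks like a full-width random value, which is what makes this kind of weakening hard to notice from the outside.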

Below you will find an example of a threshold voltage-triggered Hardware Trojan (HTVth) in a combinational circuit (2-input NAND gate).

Image from ResearchGate

According to the paper Ingress of Threshold Voltage-Triggered Hardware Trojan in the Modern FPGA Fabric – Detection Methodology and Mitigation, the essential design targets for HTVth circuits are as follows:

The transfer task of a Trojan circuit must be linear.
Sensitivity to temperature and threshold voltage changes should be remarkably high.
The change in the output should be fairly high for a change in the input.
Insignificant temporal degradation and tolerance to process variations should be preserved.

Trojan circuits need to maintain an equally low power consumption without compromising the effectiveness of the triggered payload. The circuit shown above draws no extra current while in a dormant state, which makes it difficult to detect. Methods like power signature analysis would not be able to tell confidently that this specific NAND gate has a Trojan.

In this example, the attacker can easily control the output of the 2-input NAND gate. With A = 1 and B = 0, A NAND B = 1, so we expect a high-voltage output. Once the inserted Trojan is triggered, the gate produces the opposite of the intended result. The Hardware Trojan (HWT) trigger could just as easily modify certain input pins or bit-flip expected results. This could help an attacker gain access to embedded credentials, keys, code, and much more.
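The behavior described above can be sketched as a small truth-table model. This is a purely logical approximation: the `triggered` flag is a hypothetical stand-in for the analog threshold-voltage condition, which is not modeled physically here.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    """Trojan-free 2-input NAND."""
    return 1 - (a & b)


def trojaned_nand(a: int, b: int, triggered: int) -> int:
    """NAND whose output is inverted while the Trojan is triggered.

    `triggered` is a hypothetical flag standing in for the analog
    threshold-voltage trigger; 0 means the Trojan is dormant.
    """
    return nand(a, b) ^ triggered


print(" A B | clean | dormant | triggered")
for a, b in product((0, 1), repeat=2):
    clean = nand(a, b)
    dormant = trojaned_nand(a, b, 0)
    active = trojaned_nand(a, b, 1)
    print(f" {a} {b} |   {clean}   |    {dormant}    |     {active}")
```

While dormant, the infected gate matches the clean one on every input, so functional testing alone has nothing to latch onto.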

Inserting Trojans

There are many ways to insert Trojans because there are many different types of Trojans. One type has the ability to place fixed inputs feeding into two flip-flops in such a way that the chip would still pass built-in self-test (BIST) checks. This could make bypassing a debugging security mechanism easier. A second type includes the ability to make side-channel attacks more accessible. The third type, layout-stage Trojans, can be injected at a number of points, specifically if multiple third-party IP cores are combined in a single design. In principle, a single design could fall victim to multiple Trojans created by different groups of adversaries.

RTL insertion is one of the more popular methods for injecting Hardware Trojans. The Trojan itself can be a system-level object that is converted to rogue RTL, or it could be delivered during ASIC layout. Additionally, a Trojan can be introduced during mask preparation, by modifying the mask data of a circuit that is already in production. This is achieved by using resolution enhancement technologies (RET), such as optical proximity correction (OPC), which corrects for the wave-like behavior of light when etching the nanoscale features of the most modern integrated circuits. More information can be found in the Wikipedia article on tape-out.

Mask alteration could, for example, change a dopant’s polarity within a transistor, n-type for p-type or vice versa. Researchers from the Horst Görtz Institute for IT Security claimed it would be extremely difficult to detect, especially if carried out post-signoff. A foundry could certainly make the changes immediately before mask production.

Resized p-dopant used to affect the behavior of an AOI cell (from “Hardware Trojan attacks and countermeasures”)

In the case above, which was presented by Becker et al., an inverter could be rigged to provide the wrong output. For example, its output can be held at Vdd at all times by switching the p-dopant mask for an n-type version, or its strength can be altered by reducing the width of the p-channel metal-oxide-semiconductor (PMOS) transistor. In the image above, the p-dopant encompasses the entire Vdd region, but in the Trojan schematic (to the right) it only covers half of the transistors on the Vdd side.

Trojans that affect more than one path have a higher chance of being detected. With low transition probabilities, Trojans are unlikely to affect the circuit’s power consumption. And if the nets are chosen from non-critical paths without shared segments, it would be extremely difficult to detect Trojans using delay-based techniques. This testing flow was applied to an Ethernet MAC 10GE circuit, and the chip was synthesized at a 90 nm technology node. It was given a comparator Trojan which monitors the wires of the data bus. The logic of the comparators, as shown below, switches as the data bus changes and therefore consumes a significant amount of power.

A comparator Trojan

As you can see, inserting Trojans at an RTL or even gate level is quite complex, and can have a lot of consequences if not done properly, such as a major increase of power consumption or faulty behavior.

Side-channel signal analysis and power transition probability analysis were found to give the most accurate results, because they can expose the power anomalies introduced by the operation of the comparators. This was shown to be better than techniques like delay analysis and structural analysis (which found no testable faults in the circuit).
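To get a feel for why nets with low transition probabilities make attractive trigger points, here is a back-of-the-envelope sketch. It assumes an AND-style trigger whose inputs are independent and uniformly random, and that consecutive input vectors are independent of each other. Those are simplifying assumptions of mine, not the model used in the study.

```python
def and_output_one_probability(input_one_probs):
    """P(output = 1) of an AND gate with statistically independent inputs."""
    p = 1.0
    for p_in in input_one_probs:
        p *= p_in
    return p


def rising_transition_probability(p_one: float) -> float:
    """P(0 -> 1 switch) between two consecutive, independent input vectors."""
    return (1.0 - p_one) * p_one


for width in (2, 8, 16, 32):
    p_one = and_output_one_probability([0.5] * width)
    p_rise = rising_transition_probability(p_one)
    print(f"{width:2d}-input trigger: P(1) = {p_one:.3e}, P(0->1) = {p_rise:.3e}")
```

The wider the trigger condition, the less often its output net toggles, so the trigger itself adds almost nothing to dynamic power. In the comparator example it is the comparator logic following the data bus that gives the Trojan away.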

If we look at Trojans from a layout perspective, the game changes a bit. Most circuit designs contain a lot of unused space, or even areas that are chemical-mechanically polished (CMP). Trojan cells that are placed in an area that is not used by the design have more impact on the circuit’s power consumption than all the other methods listed above.

A flow was developed to partition the circuit layout and establish practical locations for Trojan cells taking up unused spaces. The distribution of circuit cells and white spaces across the circuit layout is acquired, and the possible locations for cell placement are then determined.

Sample Preparation, Microscope Images, & Testing

Optical reverse engineering techniques can reveal a lot to a researcher. Basic microscopes such as a compound, stereo, or even metallurgical microscope can be used to detect Trojans, but the industry standard leans towards scanning electron microscopes (SEM) because ICs keep getting smaller, and more advanced capabilities are necessary to research them further. Even preparing a chip so that hidden Trojans show up takes some work on the doped regions, which requires the use of a chemical etchant.

As seen in the zoomed-in picture below, the gate of the original design (a CMOS inverter) is modified by applying a different dopant polarity to discrete parts of the gate’s active area.

SEM image of a Trojan in a 130-nm chip (from “The State-of-the-Art in IC Reverse Engineering”)

A study conducted by the University of North Carolina showed that this approach can fix the output of transistors to a specific value, or change the strength of transistors in a similar way.

The University of Florida conducted similar research on detecting Trojans using different beam voltages on a custom CMOS chip. By using this technique they were able to use rapid SEM imaging, image processing, and computer vision algorithms to detect insertion, deletion, and modification changes between a Trojan-free chip and an IC Under Authentication (IUA).

As discussed above, preparatory work is needed before imaging because the electrons cannot penetrate a thick layer of silicon dioxide (SiO₂) substrate. A commonly used method for removing SiO₂ is mechanical polishing, as it is more easily controlled than something like dumping a chip in hydrofluoric acid, as seen in my last paper.

To maintain even substrate thinning, researchers used a VarioMill. This product uses laser chemical etching, five-axis adaptive CNC micro-machining, and VIS/IR spectrometry to thin backsides within 1 μm thickness.

The goal here is to take SEM images of the IUA and compare them to the golden IC. The SEM needs certain parameters set in order to take an effective image of the IUA, such as beam voltage, field of view, dwell time (speed), and resolution. With these settings, the microscope can be programmed to image the entire IC.
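The golden-versus-IUA comparison can be sketched as a tile-by-tile difference. This is a deliberately naive illustration and not the researchers’ pipeline: it assumes both images are already aligned, equally sized grayscale arrays, and the tile size and threshold are arbitrary values I picked for the toy data.

```python
def tile_differences(golden, iua, tile=4, threshold=20.0):
    """Return (row, col, score) for tiles whose mean absolute pixel
    difference between the golden image and the IUA exceeds `threshold`.

    Both images are equally sized 2D lists of grayscale values (0-255)
    and are assumed to be aligned already.
    """
    same_shape = len(golden) == len(iua) and all(
        len(g_row) == len(i_row) for g_row, i_row in zip(golden, iua)
    )
    if not same_shape:
        raise ValueError("images must have identical dimensions")

    height, width = len(golden), len(golden[0])
    flagged = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            total, count = 0, 0
            for y in range(top, min(top + tile, height)):
                for x in range(left, min(left + tile, width)):
                    total += abs(golden[y][x] - iua[y][x])
                    count += 1
            score = total / count
            if score > threshold:
                flagged.append((top, left, score))
    return flagged


# Synthetic 8x8 "images": the IUA has one bright 2x2 patch the golden IC lacks.
golden = [[100] * 8 for _ in range(8)]
iua = [row[:] for row in golden]
for y in (4, 5):
    for x in (4, 5):
        iua[y][x] = 220

for top, left, score in tile_differences(golden, iua):
    print(f"suspicious tile at row {top}, col {left}: mean |diff| = {score:.1f}")
```

Real SEM data would need registration, noise filtering, and contrast normalization before a comparison like this means anything, which is where the image processing and computer vision steps come in.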

SEM images variations with different Beam Voltages (from “Detecting Hardware Trojans”)

Changes were made by the researchers at the doping level in order to simulate the activity of a Trojan, and the results were then imaged with an SEM. In some of these examples, an attacker could potentially change a NAND gate to a NOR gate, essentially implementing malicious logic of their own, through methods such as modifying an active region, camouflaging cells on the first metal layer, or inserting/deleting logic on an active region (e.g., adding an inverter).

NAND/NOR (Top) Inverter (Bottom) (from “Detecting Hardware Trojans”)

Modern microcontrollers and CPUs range from 10,000 to billions of gates, and analysis will require many images from a microscope. This is why automation is key when it comes to collecting information for both nominal circuits and circuits with Trojans.

Other methods have also shown promise, particularly path delay testing. In Yale’s experiment, the researchers inserted a 2-bit comparator-based Trojan near the input of the DES core of a chip. Path delays are used to generate fingerprints of nominal chips, and every path in the netlist is included as part of the trace. Each fingerprint represents one aspect of the total characteristics of a genuine design. The chips are then validated by comparing their path delay parameters to the fingerprints, which is fairly significant when it comes to tiny Trojan circuits. Ultimately, only those chips whose sample points are in or near the intersection of all given sets in the entire fingerprint space are considered genuine. If one or several of a chip’s sample points were far away from that intersection, the chip was considered to have a Trojan inserted.
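As a much-simplified sketch of the fingerprinting idea, the snippet below keeps an allowed delay interval per path instead of the multi-dimensional fingerprint space used in the study. The delays, the margin, and the function names are all made up for illustration.

```python
def build_fingerprint(golden_delays, margin=0.05):
    """Per-path (low, high) delay bounds learned from Trojan-free chips.

    `golden_delays` is a list of chips, each a list of path delays in ns.
    `margin` widens each interval to tolerate noise (an assumed value).
    """
    bounds = []
    for path_samples in zip(*golden_delays):
        low, high = min(path_samples), max(path_samples)
        slack = (high - low) * margin
        bounds.append((low - slack, high + slack))
    return bounds


def suspicious_paths(chip_delays, fingerprint):
    """Indices of paths whose measured delay falls outside the fingerprint."""
    return [
        index
        for index, (delay, (low, high)) in enumerate(zip(chip_delays, fingerprint))
        if not low <= delay <= high
    ]


# Made-up delays (ns) for three paths measured on four golden chips.
golden = [
    [1.00, 2.10, 0.80],
    [1.02, 2.05, 0.82],
    [0.98, 2.12, 0.79],
    [1.01, 2.08, 0.81],
]
fingerprint = build_fingerprint(golden)

genuine_chip = [1.00, 2.09, 0.80]
trojan_chip = [1.00, 2.30, 0.80]  # extra load slows down path 1

print("genuine:", suspicious_paths(genuine_chip, fingerprint))
print("trojan: ", suspicious_paths(trojan_chip, fingerprint))
```

A chip passes only if every path stays inside its interval, which loosely mirrors the "in or near the intersection" rule. Process variation on real silicon makes choosing those bounds the hard part.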

We will now cover a section where a Hardware Trojan is created by using open source Electronic Design Automation (EDA) tools to show what it looks like on a custom design level.

ASIC Design of our own HWT

Now that we’ve had the chance to learn about Hardware Trojans from many different resources, I figured it would be fun to create our own from RTL to GDSII, where we can essentially view our chip using 3D rendering software like GDS3D. The beauty of designing our own infected chip is that we get to see how to use open source tools to do it, as well as being able to look at an integrated circuit without needing a microscope as we can magnify as much as we want using software.

Let’s go over the gate-level design of a normal 4-input AND-OR logic circuit side by side with the same design but with a Trojan inserted. We will convert this into Verilog code, then use either QFlow with OSU’s PDK or OpenLane with the SkyWater 130 nm PDK to produce the GDSII files, so we can go over the ASIC design as if we were looking at it through a zoomed-in lens.

Injected Trojan design (from “Exposing Hardware Trojans in Zero-Knowledge”)

According to Dimitris Mouris, Charles Gouert, and Nektarios Georgios Tsoutsos, the gates are labeled by evaluation order (first G1, then G2, etc.). The execution trace is shown in the rows of the tables, which give the output of the four State Machine (SM) registers, defined by the variables r0 through r3. The highlighted and underlined values in the tables show which simulation variable was overwritten after the evaluation of the gate. Diagram (a) shows a circuit that outputs 1 when all four inputs are set to high. Diagram (b) shows the same circuit as (a) after being injected with an example Trojan that is only activated when all inputs are set to 1.

We have two ways of converting the gate-level logic to Verilog, either by hand or by using the Digital tool. We can draw up schematics and have the software convert to an HDL of our choice. I’ll go with the lazier option B.

module non_active_hwt (
 A,
 B,
 C,
 D,
 Y
);
 input A;
 input B;
 input C;
 input D;
 output Y;
 assign Y = ((((A & B) | (C & D)) | (C & D)) & D);
endmodule

The generated Verilog code is pretty simple for the design that does not contain a Trojan. The output of Y is dependent on four inputs that go through a logical AND in 3 groups while 2 groups get OR’d for an output to get AND’d one more time by input D. Now let’s take a look at the code I generated with the Hardware Trojan injected.

module active_hwt (
 A,
 B,
 C,
 D,
 Y
);
 input A;
 input B;
 input C;
 input D;
 output Y;
 assign Y = ((((A & B) & (C & D)) ^ (((A & B) | (C & D)) | (C & D))) & D);
endmodule

This is fairly similar to the code with no Trojan, except this code has an XOR operation along with an extra AND that combines A & B with C & D. We can use EDA Playground to simulate the wave patterns using a simple test bench for both. If you aren’t familiar with writing test benches, you can look at my code to see how they are written up. Essentially we are testing out the inputs of A, B, C, & D to see how difficult it would be to notice a difference from a waveform perspective. Remember, our waveform should only show Trojan activation when we have 4'b1111; all other input and output signals should mirror each other.

No HWT Inserted (Top) HWT Inserted (Bottom)

Both of them look very similar in terms of how Y gets switched on and off, but if you look closely at the two waveforms, you’ll see one difference. With no Hardware Trojan, when A through D are all switched to 1, the output of Y is 1 as expected. When we test the same inputs with the HWT-inserted code, the output of Y is 0. So we found the trigger of the Trojan, and apart from it both RTL designs behave identically. You can see from line 54 of our test bench code that if we were to forget the test of all inputs being set to an active state, the generated waveforms of both circuits would have been identical.
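With only four inputs we don’t have to eyeball waveforms at all. The two `assign` expressions can be transcribed into Python and compared over all 16 input combinations. This is my own cross-check of the Verilog above, not part of the original test bench.

```python
from itertools import product


def non_active_hwt(a, b, c, d):
    """Python transcription of the Trojan-free `assign Y` expression."""
    return (((a & b) | (c & d)) | (c & d)) & d


def active_hwt(a, b, c, d):
    """Python transcription of the Trojan-infected `assign Y` expression."""
    return (((a & b) & (c & d)) ^ (((a & b) | (c & d)) | (c & d))) & d


# Collect every input vector on which the two designs disagree.
mismatches = [
    bits
    for bits in product((0, 1), repeat=4)
    if non_active_hwt(*bits) != active_hwt(*bits)
]

for a, b, c, d in mismatches:
    clean = non_active_hwt(a, b, c, d)
    infected = active_hwt(a, b, c, d)
    print(f"A={a} B={b} C={c} D={d}: clean Y={clean}, infected Y={infected}")
```

The exhaustive sweep confirms what the waveforms suggest: the designs disagree on a single input vector. On a real design with hundreds of inputs this brute-force approach stops being feasible, which is exactly what a rare trigger condition relies on.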

Let’s take this to pre-fabrication and see how easy it would be to spot the differences between both ASIC designs. We can do this by going through the installation process of QFlow, where all the dependencies like yosys, netgen, graywolf, qrouter, and magic will be installed. Using the QFlow GUI, you can direct it to the Verilog file that you would like to set for synthesis preparation.

QFlow GUI

You’ll have several options in the technology dropdown. I decided to go with the latest PDK, which is OSU018, meaning the chip is going to be designed at a 180 nm scale. After running through synthesis, placement, STA, routing, etc., you’ll end up with a .gds file in the /layout/ directory. Using KLayout, we can open the GDSII file.

Place and Route Design of Non-Hardware Trojan Injection

If you take a look at this simple IC design, it is broken down into a few simple logic gates. The first one is an AOI21X1, a 3-input AND-OR-Invert gate with the function Y=!((A&B)|C).

Substrate of the AOI21X1

When stripping down the layers, our design matches the function of an AOI and contains 6 transistors (PMOS and NMOS). When we compare this to the design of the AOI21X1 in the HWT-inserted design, they look exactly alike. In fact both designs contain BUFX2 (non-inverting buffer [Y=A]), INVX1 (inverter [Y=!A]), and a NAND2X1 (two-input NAND gate [Y=!(A&B)]). Let’s take a look at what standard cells our infected design has added.

Place and Route Design of Hardware Trojan Injection

As we can see, this is not as stealthy as it was on the RTL simulation level or even inserting Trojans on a dopant level because now there seems to be a couple more gates. Our design shows that the extra cell that was added was a NAND3X1 next to the original NAND2X1. On top of that an extra 2-input NAND gate was added near our non-inverting buffer.
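For reference, the standard cells named so far can be modeled as plain Boolean functions. The functions for AOI21X1, NAND2X1, INVX1, and BUFX2 are the ones quoted above. For NAND3X1 I am assuming the usual 3-input NAND, Y=!(A&B&C). The `CELLS` table and `truth_table` helper are hypothetical names for this sketch.

```python
from itertools import product

# cell name -> (number of inputs, Boolean function)
CELLS = {
    "AOI21X1": (3, lambda a, b, c: 1 - ((a & b) | c)),  # Y = !((A&B)|C)
    "NAND2X1": (2, lambda a, b: 1 - (a & b)),           # Y = !(A&B)
    "NAND3X1": (3, lambda a, b, c: 1 - (a & b & c)),    # assumed: Y = !(A&B&C)
    "INVX1": (1, lambda a: 1 - a),                      # Y = !A
    "BUFX2": (1, lambda a: a),                          # Y = A
}


def truth_table(name):
    """Return [(inputs, output), ...] for every input combination of a cell."""
    n_inputs, func = CELLS[name]
    return [(bits, func(*bits)) for bits in product((0, 1), repeat=n_inputs)]


for bits, y in truth_table("AOI21X1"):
    print(f"A={bits[0]} B={bits[1]} C={bits[2]} -> Y={y}")
```

When reverse engineering a layout, matching a recovered transistor netlist against a small library of truth tables like this is a quick way to label each cell.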

Substrate of both NAND gates

On the left side we have 4 transistors making up our 2-input NAND gate, and the output Y of this gate is connected to input A of the inserted 3-input NAND gate (on the right side), which is affecting the output of this gate. This extra gate consists of 6 transistors, totaling an extra 10 transistors to the infected IC design. So where does it all connect to affect the actual physical output pin?

Physical Output of Y

Adding the layers back in, we can see that the top metal (green) layer shows two physical pins, input D and output Y. The extra two NAND gates that are inserted in this design will ultimately change the output of this pin when the specific trigger is set, which in our case is when all physical input pins are set to 1. When comparing our Verilog code to our IC design, you can see the similarities between the two and why it would be so difficult to catch these at some points of pre-fabrication. We can take a look at the 3D model to see a representation of what it would look like on a wafer.

3D rendered design layout (Top: Non-HWT, Bottom: HWT)

If it weren’t for the extra fillers (the two large empty orange and yellow areas) that were automatically placed and routed by QFlow in the bottom HWT design, the two layouts would look nearly identical, because the two extra gates discussed earlier don’t add a lot to the design.
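To put "doesn’t add a lot" into numbers, here is the transistor arithmetic for the two extra cells. It relies on the standard static CMOS structure of a NAND gate (one PMOS and one NMOS per input), and the helper name is my own.

```python
def static_cmos_nand_transistors(n_inputs: int) -> int:
    """A static CMOS NAND gate uses one PMOS and one NMOS per input."""
    return 2 * n_inputs


# Extra cells found in the infected layout: cell name -> input count.
extra_cells = {"NAND2X1": 2, "NAND3X1": 3}
extra = {
    name: static_cmos_nand_transistors(n_inputs)
    for name, n_inputs in extra_cells.items()
}

for name, count in extra.items():
    print(f"{name}: {count} transistors")
print(f"Trojan overhead: {sum(extra.values())} transistors")
```

Ten transistors stand out in a design this small, but the same overhead would be a rounding error on a chip with millions of gates.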

You might be wondering in what scenario this would be malicious. Imagine that when these four input pins are all set HIGH, the expected outcome is a HIGH output, which locks the CPU in a non-debug state. After our attack alters the design, setting all pins HIGH produces a LOW output instead. This places the CPU in a debug state that would allow attackers or adversaries to learn private information and IP about your SoC (System on Chip) that wasn’t intended to be public.

Conclusion

In this paper we went over a fraction of the ways one could insert Trojans, we reviewed methods to detect them, and we designed our own from scratch using the tools I had at my disposal. With this research, we can truly see what an infected chip would look like on the supply chain side.

Here is the repository that contains all the HDL and design files I made during the research phase of place & routing my Hardware Trojan IC.

Thank you for following along! I hope you enjoyed it as much as I did. If you have any questions on this article, please DM me at my Instagram: @hackersclub or Twitter: @ringoware

Happy Hunting :)

A special thanks to one of the reviewers of this paper, Matthew Venn and his Zero to ASIC course that inspired me to get into chip design work.