Data [In]security: Lessons Learned from Intel’s CPU Design Flaw

Since news of two major processor vulnerabilities broke last week, tech companies have been scrambling to roll out patches, and organizations are facing the possibility that those patches could render their systems inoperable. The security threats, known as Meltdown and Spectre, have flooded the tech news scene and caused a PR nightmare for chip manufacturer Intel.

First, let’s start with the basics. The Meltdown vulnerability allows user programs that don’t normally have access to operating system kernel memory to gain access to it. This opens the door for sensitive items to be stolen, like user passwords or security certificates. Spectre, on the other hand, allows an attacker who controls one application to read memory belonging to another application. Both vulnerabilities are very serious. While there will be a patch for Meltdown, your computer systems will likely take a performance hit. For Spectre, there doesn’t seem to be a patch coming out any time soon; in fact, computer hardware will most likely have to be replaced for this vulnerability to be fully resolved.

System Memory 101

Next, let’s talk about how these vulnerabilities came to be. As system memory has grown larger and larger over the years, it has become increasingly difficult to address all of that memory by its physical addresses. Not only that, but there are security implications in simply having memory addresses known to the world; in fact, ASLR (Address Space Layout Randomization) was designed to randomize where things sit in memory, making it difficult for attackers to break into systems. So, what was done? CPUs started using virtual memory addresses. However, the computer needed a very fast place to hold a lookup table that maps virtual memory addresses to physical memory addresses. You might think that holding that lookup table in main memory would be fast enough, but it certainly isn’t. To make the lookups faster, a small piece of that table is held in a part of the CPU’s memory called the “cache.”

The proper term for the virtual-to-physical memory address mapping is the “page table.” The entire page table is too big to hold in the CPU’s cache, so a subset of the page table, called the Translation Lookaside Buffer (TLB), is what’s actually held there. (If you’re interested in more information, here is an article from the University of Wisconsin-Madison.)
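
To make the idea concrete, here’s a minimal sketch in C of how an address translation might check a small TLB before falling back to the full page table. The structure names, sizes, and the page_table_walk helper are hypothetical and for illustration only; real hardware does all of this in silicon, not software.

    #include <stdint.h>
    #include <stdbool.h>

    #define TLB_ENTRIES 64          /* tiny, fast cache held close to the CPU  */
    #define PAGE_SHIFT  12          /* 4 KB pages: low 12 bits are the offset  */

    /* One cached virtual-to-physical mapping. */
    struct tlb_entry {
        uint64_t virt_page;         /* virtual page number                     */
        uint64_t phys_page;         /* physical page number it maps to         */
        bool     valid;
    };

    static struct tlb_entry tlb[TLB_ENTRIES];

    /* Stand-in for the slow path: walking the full page table in main memory.
     * (Hypothetical stub; a real walk reads multi-level tables in RAM.)       */
    static uint64_t page_table_walk(uint64_t virt_page)
    {
        return virt_page + 0x1000;  /* pretend mapping, for illustration only  */
    }

    /* Translate a virtual address, preferring the fast TLB over the page table. */
    uint64_t translate(uint64_t virt_addr)
    {
        uint64_t virt_page = virt_addr >> PAGE_SHIFT;
        uint64_t offset    = virt_addr & ((1u << PAGE_SHIFT) - 1);

        /* Fast path: the mapping is already cached in the TLB. */
        for (int i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].virt_page == virt_page)
                return (tlb[i].phys_page << PAGE_SHIFT) | offset;
        }

        /* Slow path: walk the page table, then cache the result for next time. */
        uint64_t phys_page = page_table_walk(virt_page);
        tlb[virt_page % TLB_ENTRIES] = (struct tlb_entry){ virt_page, phys_page, true };
        return (phys_page << PAGE_SHIFT) | offset;
    }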

In Intel’s architecture, half of this small piece of memory on the processor is reserved for the “kernel” (the brain of the OS, if you will), and the other half is reserved for user programs. When a new program is executed, only half of the cache is emptied, the half for the program, while the kernel addresses stay in place.

Why was this choice made? For performance: it’s faster to reload only half of a piece of memory than all of it. In the TLB example below, there’s metadata that states which memory space belongs to the kernel and which belongs to the user. In simplified form, a TLB table would look something like this:

Table 1.1: Simplified TLB entries, each mapping a virtual address to a physical address and carrying a ring flag (0 for kernel, 3 for user).

The kernel or user flags in the table are a 0 or a 3 because CPUs use the concept of a “ring” to denote who owns the information: Ring 0 is kernel memory and Ring 3 is the user’s memory space. There are other rings as well, but this is all we need to know for the purposes of this conversation.
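
As a rough illustration of that idea (again, hypothetical C structures rather than Intel’s actual hardware layout), each cached translation might carry a ring flag, and a switch to a new program would clear only the ring-3 entries:

    #include <stdint.h>
    #include <stdbool.h>

    #define TLB_ENTRIES 64

    /* Hypothetical TLB entry tagged with the privilege ring that owns it. */
    struct tlb_entry {
        uint64_t virt_page;   /* virtual page number                       */
        uint64_t phys_page;   /* physical page it maps to                  */
        uint8_t  ring;        /* 0 = kernel, 3 = user program              */
        bool     valid;
    };

    static struct tlb_entry tlb[TLB_ENTRIES];

    /* When a new user program starts, drop only its half of the cache:
     * ring-3 entries are invalidated, ring-0 (kernel) entries stay put.   */
    void flush_user_entries(void)
    {
        for (int i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].ring == 3)
                tlb[i].valid = false;
        }
    }

The performance win comes from keeping the kernel entries warm across program switches; the risk, as described above, is that those kernel-tagged entries sit right next to user entries in the same structure.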

So, how did this become a problem? Because of something called “speculative execution.” Basically, the CPU executes a piece of code before the program actually asks it to, much like a calculator that starts adding two plus two the moment you type it, before you press equals. The full explanation of the vulnerability isn’t published as of today; however, from Google’s posting, something happens when speculative execution occurs, and it involves the translation lookaside buffers. I’m not an electrical engineer, but my guess is that there’s a problem with the ring metadata when the CPU performs speculative execution.
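
To give a feel for what speculative execution looks like from the software side, here is the bounds-check-bypass pattern published in the public Spectre research, heavily simplified. The array names and sizes are illustrative, and this sketch is not a working exploit on its own:

    #include <stdint.h>
    #include <stddef.h>

    uint8_t array1[16];              /* data the code is allowed to read           */
    uint8_t array2[256 * 4096];      /* probe array usable as a cache side channel */
    size_t  array1_size = 16;
    uint8_t temp;                    /* keeps the compiler from dropping the read  */

    /* The CPU may speculatively run the body before the bounds check resolves.
     * If x is out of bounds, the speculative load still pulls value-dependent
     * data into the cache, where careful timing measurements can recover it,
     * even though the out-of-bounds result is discarded once the CPU notices
     * the check failed.                                                          */
    void victim_function(size_t x)
    {
        if (x < array1_size) {
            uint8_t value = array1[x];      /* may execute speculatively          */
            temp &= array2[value * 4096];   /* leaves a value-dependent cache     */
        }                                   /* footprint an attacker can measure  */
    }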

Essentially, if a piece of the user’s code gets flagged with Ring 0 metadata, it’s possible that code could access anything else flagged with Ring 0 metadata. To add a bit of context, actual data is not held in the Translation Lookaside Buffer; memory addresses, however, are stored there. A malicious piece of code could read those physical memory addresses and then proceed to read the data at those addresses. This understanding of the exploit comes from a Ph.D. researcher in Amsterdam who has written a proof of concept (POC) for the vulnerability. (Here is a link to the original Twitter conversation.)

So, now that you have a rudimentary understanding of where the vulnerability exists, where do we go from here? Well, sometimes, in order to move forward, you need to look back.

Hindsight is 20/20

Let’s take a trip back in time to 1993, the year Intel released its Pentium class of CPUs. Intel invested heavily in the design of these processors, and their performance improved year after year. But at some point before 1995, a design decision had to be made. When the designers at Intel looked at the known outcomes, here are some questions they may have asked themselves:

Does the processor share the Translation Lookaside Buffer between the kernel and user addresses?

Pros:

  • Inexpensive way to improve performance
  • Minimizes chip costs due to shared cache
  • Less design complexity because of a single cache design

Cons (known or unknown):

  • Potential for an unknown security vulnerability

Does the Translation Lookaside Buffer flush completely every time a new program needs to access data?

Pros (known or unknown):

  • Potentially more secure

Cons:

  • Definite performance impact

Does the processor have physically separate caches for kernel and user Translation Lookaside Buffer entries?

Pros (known or unknown):

  • Potentially more secure

Cons:

  • Increase in costs to produce chips

When the designers at Intel looked at the known outcomes, what could be the rational choice? They didn’t know pre-1995 that one day an individual would develop an exploit for this design pattern. There are many people already calling for lawsuits against Intel, and while I believe companies should take responsibility for their actions, people, or products, this isn’t one of those cases; Intel made a valid design decision based on the data at hand.

Where Do We Go from Here?

So, why is AHEAD talking about this? We don’t manufacture CPUs, and we have no skin in this fight other than helping and supporting our customers as best we can through the process. As the Security and Compliance Solutions Principal at AHEAD, I spend a lot of time thinking about building security programs and securable infrastructures. One of the first things I talk to customers about is how to design their network for proper data isolation. Although I’m no longer a network engineer, I started my career in networking, and as I worked through the Cisco certifications, one of the main design patterns hammered into us was logical or physical segmentation of the network. Data should always be isolated based on sensitivity. Always. And unfortunately, the lack of that isolation is one of the biggest design flaws I see when working with customers.

With that said, Intel made a perfectly valid design choice given the information available to them at the time. They likely had no idea that their design pattern would be exploited more than 20 years later, with lawsuits now being filed at a breakneck pace.

A few questions come to mind as a result of last week’s events:

  • How many companies do you know that have properly segmented networks based on the data that lives in those subnets?
  • How many companies do you know that have at least a stateful packet inspection firewall in between those subnets?
  • How many companies do you know that are monitoring the traffic between those networks and can identify things like port scanning and data exfiltration?
  • How many companies do you know that even have a working data classification program?

I’d imagine the answer would be “not a lot” for the bulk of those questions. To reiterate, Intel made a design choice based on the best data it had, and the best data we have today tells us that nearly every company on the planet is likely exposed to Meltdown and Spectre. So, how is your network segmented and monitored? And how are your system processes segmented and monitored?

Again, AHEAD doesn’t have any stake in system hardware production—our main concern is helping our clients navigate the new security terrain resulting from this vulnerability. If you’d welcome a second set of eyes to review your security architecture, we have a team of experts whose personal passion it is to do just that.

Contact us to get started, and follow us on LinkedIn and Twitter for more tech news.