Silicon Sleuthing: Finding a ancient bugfix on the 8086

Few CPUs have had the long-lasting influence that the 8086 did. It is hard to believe that when your modern desktop computer boots, it probably thinks it is an 8086 from 1978 until some software gooses it into a more modern state. When [Ken] was examining an 8086 die, however, he noticed that part of the die didn’t look like the rest. Turns out, Intel had a bug in the original version of the 8086. In those days you couldn’t patch the microcode. It was more like a PC board — you had to change the layout and make a new one to fix it.

The affected area is the Group Decode ROM. The area is responsible for categorizing instructions based on the type of decoding they require. While it is marked as a ROM, it is more of a programmable logic array. The bug was pretty intense. If an interrupt followed either a MOV SS or POP SS instruction, havoc ensues.

The bug was a simple mistake for a designer to make. Suppose you want to change the stack pointer register entirely. You have to load the stack segment register (SS) and the stack pointer (SP). The problem is, loading both of these isn’t an atomic operation. That is, it takes two different instructions, one for each register. If an interrupt occurs after loading SS but before loading SP, then the interrupt context will wind up at some incorrect memory location.

You could fix this in several ways. The way Intel did it was to make the two instructions that could modify SS hold off interrupts for one more instruction cycle. This would allow you to load SP before an interrupt could occur. Essentially, it makes the MOV SS or POP SS instructions protect the next instruction so you can code an atomic operation. The truth is, the “fix” stalls interrupts for any segment register load. [Ken] notes that this wasn’t necessary and later implementations on newer processors only stall the interrupt for SS.

You might think you could have pushed this off to the programmer. You could, for example, insist that stack pointer changes occur with interrupts disabled. The problem is the 8086 has a nonmaskable interrupt that uses the stack, and software can’t stop it.

Failure analysis back the 1970s and 1980s was fun. An optical microscope would do most of it. If you had a SEM with an EDS attachment, you could do nearly everything. An Auger and a SIMS would put you in the world-class lab status. These days with device geometry several orders of magnitude smaller, dice upside down in the package, and dozens of layers — we aren’t sure how you’d do all this on a modern chip.

We love [Ken’s] CPU teardowns. He works on bigger things, too.

SysLog.gr

Silicon Sleuthing: Finding a ancient bugfix on the 8086

Categories

Archives