Nothing Special   »   [go: up one dir, main page]

The UCA now supports 8087 FPUs

Early in the development process, “UCA” meant “Universal CPU Analyzer”. Then I thought it could also be used to test non-CPU like FPUs, Bitslicers or RAM chips and I finally changed the name for “Universal Chip Analyzer”. The 8087 FPU is the first supported IC that’s not a CPU or MCU. Released in 1980, it’s a much more complex chip than its companion 8086. While the later is built with 29.000 transistors, the 8087 integrates 50% more of them for a total of roughly 45.000! It handles various floating-point arithmetic operations (additions, multiplication, square root, etc.) as well as transcendental functions from exponential to trigonometric calculations. The 8087 was the very first FPU to implement the draft of what was to become the initial IEEE 754 standard (circa 1985).

Building an adapter for the 8087 starting with the iAPX-86 code already done was quite easy. Emulating the CPU with the FPGA was technically feasible, but this would have limited the complexity of the x86/87 ASM code able to run. Fortunately, 8086s are still widely available for cheap and every collectors have spares.

The UCA 8087 FPU Adapter requires any 8086 with a rated speed of 10 MHz of more (the fastest 8087 is clocked at 10 MHz). While the standard 8086/8088 UCA Shield configures the CPU in the simplified “MIN” mode, this adapter requires the “MAX” mode with additional bus decoding stages. The original Intel 8288 Bus Controller had been translated in Verilog HDL and implemented in the FPGA. After some tuning, everything was running properly :

An option to automatically subtract the power consumption of the 8086 (to show only the one from the 8087) will be added later . Target frequencies are 4, 6, 8 and 10 MHz.

 

The UCA now supports Intel 80186 & 80188

The Intel 80186 is one of the lesser known early x86 CPUs. In February 1982, 4 years after the 8086’s introduction, Intel released its successor, the 80286 (or “286”). Simultaneously, Intel also quietly released the 80186 to target different markets. While the 286 is a generic microprocessor like the 8086 was, but based on a new microarchitecture, the 186 could be considered as the first x86-based microcontroller. The difference between a microprocessor (CPU) and a microcontroller (MCU) is the level of integration inside the chip. A microprocessor requires a lot of support components (memory controller, bus arbitration logic, etc.) and is primarily used to build computers. A microcontroller integrates many of these components along with a (less powerful) microprocessor and is used for embedded purposes.

The 80186 integrates an enhanced 8086 CPU with a 16-bit bus and many support components: a clock generator, various controllers (DMA, Interrupt, bus, etc.), programmable timers, wait-state generator, chip-select logic, and even more. All these features greatly reduce the overall component count and the complexity of the board. Here is the original 186’s block diagram:

The 80186 is basically a hybrid concept that has been used in embedded applications as a microcontroller, but also as a CPU to build cheap 8086-class computers. For example, it was at the heart of the Tandy 2000 PC released in 1983, but also buried inside the Intel 14.4EX Modem to compute complex algorithms. They later used the 80188, an even cheaper offshoot almost identical to the 80186 but based on an external 8-bit bus (like the 8088). As 8086-class CPUs, both the 80186 and the 80188 can be linked to the 8087 FPU, but this association was almost never found in real-world products. Original 80188/80186s were built on Intel’s HMOS 3 µm process at 6 MHz, 8 MHz and 10 MHz. They came in 3 different packages: PGA-68, leadless ceramic (CLCC-68) and leadless plastic (PLCC-68). The Universal Chip Analyzer is now able to test and run code on all these CPUs:

UCA testing an original A80188 (PGA-68) at 8 MHz

In 1987, Intel released the 80C188 and 80C186, built on Intel’s 1.5 µm CMOS process. Clock speeds reached 16 MHz and power consumption was vastly reduced. Some features were also added: a power-save mode, a refresh controller to handle RAM refresh cycle without external components and a FPU interface to support the newly released 80C187 (support for the old 8087 was dropped). The uncommon 80C187 is essentially a 80387 repackaged into a DIP-40 or PLCC-44 package. The UCA is able to test and detect 80C186 and 80C188 in various packages:

UCA testing a Intel A80C186-16 (PGA-68) at 16 MHz

In 1991 (the 486 was available at that date), Intel released the improved “XL” variant. Thanks to the CMOS 1 µm process, the 80C186XL and 80C188XL were able to reach up to 25 MHz at a lower power consumption. They now use a static design (able to be clocked down to DC for even more power reduction) while the 80C18x were based on a dynamic design (with a minimum clock frequency needed to retain internal register values). The UCA can also test all members of the “XL” family and even detect their stepping (A-/B- or C-step) :

UCA testing a R80C188XL-25 (CLCC-68) at 20 MHz

The maximum frequency for the UCA is 20 MHz because 186/188 requires a clock doubled input and I wanted to avoid an external PLL to keep cost low (the 186 adapter is a simple 2-layer PCB). Adding  support for it to reach 25 MHz (or much more) is trivial but that will almost double the BOM price for the adapter (from ~$10 to ~$20).

Intel also released the 80C186EB (5V) and 80L186EV (3V) in 1990 and the 80C186/188EA & 80C186/188EC one year later (also available in ‘L’ version).  The 80C186EA in PLCC-68 package is very close to the 80C186XL. The main differences are some more advanced power saving modes and TTL-level inputs compatibility (while the XL requires CMOS-level inputs). I’m still looking for one, but they should work fine on the UCA. The “EB” line adds an improved chip-select unit, two UART for serial communication and 16 GPIO. While electrically able to run on the UCA, they come in a bigger PLCC-84 and PGA-88 packages and don’t fit physically. The “EC” line adds even more GPIOs and is only available in SMD QFP-100 packaging. Designing an adapter for EB and EC 186/188s is not planned at this time.

Stay tuned for another big UCA milestone in the next few days!

PS: PLCCs 80188/186 are also supported!

JTAG Support Added to the UCA 486 Adapter

While developing the 486 Adapter for the Universal Chip Analyzer, I was worrying about how to distinguish between early CPUs from AMD and Intel (the ones without CPUID instruction support). There is no way to distinguish them because they’re basically the exact same chip: same microcode, same architecture, same power consumption, etc. AMD used the Intel’s die for its whole early 486 line and only the external packaging was different. Thus, no BIOS nor any software detection tool can distinguish between an early AMD Am486DX2-66NV8T and an Intel 486 DX2-66. Both even share the same ID set in EDX register at boot.

I carefully read the datasheets and finally found a small difference between them. It’s located in the JTAG controller, embedded in all AMD 486s and Intel 486s starting with the DX-50. The JTAG controller is used as an internal test tool since the late 80s, standardized in 1990 as IEEE 1149.1 (“Standard Test Access Port and Boundary-Scan Architecture“). It’s now an industry-standard feature present in all complex ICs for debugging purposes. JTAG was commonly used in the 90s to remotely sense the state of all hardware pins with the ability to toggle them individually between 0, 1 and High-Z (floating).

The JTAG controller is generally totally isolated from the CPU: you can’t access any of the internal test features nor test registers from the code running on the CPU. (Some years ago, Intel added a feature to access JTAG from USB, which caused some serious vulnerabilities). Back in the 90s, JTAG access had to be done from dedicated CPU pins called the TAP (Test Access Port). The TAP uses 3 input pins (TMS for Chip Select, TCK for Clock and TDI for Data Input) and one output pin (TDO for Data Output). JTAG has been designed to daisy chain many ICs (boundary-scan).

The basic early JTAG implementation in 486s supports 5 instructions:

    • (0000b) EXTEST – Arbitrary setting of pins on the CPU to a given state (0, 1, Z)
    • (0001b) SAMPLE – Poll and report the status of all CPU pins.
    • (0010b) IDCODE – Used for chip identification
    • (1000b) RUNBIST – Launch the internal self-test, built-in on all CPUs since the 386s
    • (1111b) BYPASS – Connect TDI with TDO to bypass the chip (when talking with another IC in the chain)

According to Intel’s datasheet, the IDCODE instruction reports a 32-bit register with the following format:

The Manufacturer Identity is a 11-bit value linked to the chip manufacturer:  0x09 for Intel and 0x01 for AMD. That’s how you can distinguish between an Intel and AMD 486. JTAG is not available on Cyrix, TI and UMC 486s, but these CPUs don’t use the Intel Microcode and they have other identification methods. Accessing the IDCODE register to distinguish AMD and Intel 486s requires specific hardware. Due to limitations in I/O lines available from the FPGA and the tiny space available on the PCB, I chose to add an extremely tiny ATMega328P-MN (0.5 mm pitch!) on the 486 Adapter to access the JTAG port:

The code for bit-banging JTAG commands and communicating with the JTAG controller was quickly written, thanks to this blog that published a nice proof-of-concept many years ago. I then added the link between the FPGA and the outside world to grab the JTAG data from the Windows companion tool. I took the opportunity to rewrite almost all the communication stack between the Universal Chip Analyzer, its integrated MCU and the FPGA.  Let’s try with some real-world 486s!

* AMD Am486DX2-66NV8T

The JTAG IDCODE register reported (0x00432003) strictly follows Intel’s datasheet:

      • Bit[0] = 1 (JTAG constant)
      • Bit[11:1] = 0x01 (AMD’s Manufacturer ID)
      • Bit[27:12] = 0x0432 (Part Number = CPUID Family/Model/Revision)
      • Bit[31:28] = 0 (Revision not set)

As expected, the Part Number filed by AMD is the same as the value reported in the DX register just after boot. All Am486s I tested follow this scheme. I noted that the value reported on the JTAG IDCODE register changes with features activated (2x or 3x multiplier, WT or WB cache mode) just like the CPUID value.

 * Intel 486DX2-66

Here is the most interesting part. For some reason, Intel does not follow its own public datasheet on most of its CPUs. Many Intel’s 486-era datasheets show the JTAG bit order as previously described, but the real value returned by many CPUs I tested often reports a totally different organization (only described properly on a printed Intel Datasheet I own).

 

The raw JTAG IDCODE register value reported on an early i486 DX2-66 (SX626) is 0x00432013 as expected, but a late one (SX955) returns another encoding: 0x40285013. It decodes as follows:

      • Bit[0] = 1 (JTAG constant)
      • Bit[11:1] = 0x09 (Intel’s Manufacturer ID)
      • Bit[16:12] = 0x05 (Proprietary Model Code)
      • Bit[20:17] = 0x04 (CPUID Family, 0x04 = 486)
      • Bit[26:21] = 0x01 (Intel Architecture Type, 1 = x86)
      • Bit[27] : 0x00 (Core Voltage – 1 = 3.3V / 0 = 5V)
      • Bit[31:28] = 0x04 (Proprietary Revision Code)

The Model Code reported in the 5-bit field in bits 16:12 is different than the 4-bit “Model” code read in DX at reset. Here is what I noted:

      • 0x01 = 486 DX
      • 0x02 = 486 SX
      • 0x05 = 486 SX2 or DX2
      • 0x07 = 486 DX2 w/ WT Cache
      • 0x08 = 486 DX4

Support for JTAG is definitely an interesting feature to dive deeper in the 486 architecture. As for today and as far I know, the Universal Chip Analyzer is the only hardware or software tool to distinguish between an Am486 and an Intel 486.

More UCA news soon!

 

The Universal Chip Analyzer now supports Intel 286

Another milestone – albeit not the hardest one – has been reached! The 80286 is the 2nd gen 16-bit x86 CPU introduced by Intel in 1982. The most important improvement was the use of a separated data and address bus. Its predecessor, the famous Intel 8086, used the same pins to send address and then data. The lack of that slow time-multiplexed bus on the 286 allowed a major performance boost, sometimes more than 100% at a similar clock speed. The microarchitecture also evolved with a more advanced (dedicated) address calculation unit and a faster multiplier. The 80286 was also able to support up to 16 MB of RAM, thanks to its 24-bit address bus.

The 80286 also introduced the protected mode, designed to allow much more advanced memory management, with the ability to build multi-user systems using multitasking applications. Unfortunately, due to several limitations in that first implementation, along with several hardware errata found in earlier stepping, protected mode wasn’t really used by software developers on the 286. Intel only solved all these issues with the 80386. The Intel 80286 was initially released at 4, 6 and 8 MHz on nMOS 1.5 µm process. Later released reached 12.5 MHz in 1 µm CMOS process. Several other companies produced CPU fully based on Intel’s 286 microcode like AMD, Siemens and Harris, with speed up to 25 MHz!

UCA 286 Adapter testing LCC (left) & PLCC (right) 286s

Three common 68-pin packages were used for the vast majority of the 80286s ever produced: the original ceramic PGA, a leadless LCC (also ceramic) and a plastic PLCC. On the picture above, you can see an AMD R80286-8 (LCC) and a Harris CS80C286-25 (PLCC). The Universal Chip Analyzer is able to test all three packages just by plugging the related socket on the PGA DIP Socket. Frequencies available (by DIP Switches or software) are 4, 8, 10, 12.5, 16 and 20 MHz.

Why not 25 MHz? Because the 80286 requires a clock-doubled input and feeding a 50 MHz clock to get the 25 MHz core frequency would have required an external PLL. Not a big deal, but there is only one rare 286-class CPU that supports this frequency (the Harris/Intersil CS80C286-25 pictured above) and its timings is not fully compliant with the 286 specifications. Designing a special UCA adapter just for this chip is trivial, but quite useless because the 286 Adapter is already able to test it at 20 MHz. Speaking about “high” frequencies, using the right socket is crucial  

The UCA 286 Adapter is fitted with a high-quality DIP Socket. Directly plugging a PGA 286 CPU is possible but not convenient for testing multiple CPUs in a row.  I was able to secure some ZIF Sockets for 68-pin PGA like the blue one pictured here (from AMP) and also some 3M LCC sockets complete with top cap. About PLCCs, I first tried some cheap socket from eBay. That was a disaster: contact pins were too weak and bent after 2-3 insertions. Worst, the maximum frequency allowed was 8-10 MHz. Replacing these crappy sockets with other ones from Foxconn or 3M solved all the issues. I also bought some awesome Yamaichi test Socket for PLCC (on the right), but unfortunately, they use a specific pinout. As the 286 Adapter uses a simple 2-layer PCB, I will consider designing a specific PCB just for them.  

With the hardware finished, I’ll later tune the software-testing code to see if I can detect various stepping, and maybe also the manufacturer.  

 

The UCA 386 Adapter supports Ti & Cyrix 486s

Adding support for Cyrix & TI 486s was supposed to be a matter of hours. It finally took almost one month and gave me many headaches. I almost burned everything to the ground several times in rage, begged for help from FPGA’s gurus who told me what I’m trying to achieve was like squaring the circle, but I did not give up. Let’s try to explain why it was so hard.

— always(@TLDR; Technical stuff) —

FPGAs are synchronous beasts used to create finite states machines: almost everything inside a FPGA is synchronized to a clock signal. Each time the clock is ticking, the HDL code analyzes inputs and sets a pre-defined state (that itself defines registers, outputs, the next state, …). To add support for a CPU, you must read the datasheet and write some HDL code that will provide the correct outputs (from the FPGA to the CPU) within the required timings. All these timings are linked to the base clock. A synchronization between the CPU and the FPGA is crucial. For all other CPUs I’ve worked on for the UCA, the FPGA provides the base clock to the CPU. Both the FPGA and the CPU are sharing the same clock and synchronization is easy. But 386s require a clock-doubled input (80 MHz for a 386DX-40 MHz) that I’m not able to provide directly from the FPGA because the 3.3-to-5 volt translators are too slow. So I use an external clock-doubler PLL, but doing so prevents the FPGA from having access to the CPU clock. That’s the root of all issues I had.

Fortunately, using an external phase-locked loop (PLL) means the clock input phase is synchronized with the clock-doubled output signal: the rising edge of both clocks occurs at the same time.  Knowing the transmission delays added by the voltage converters at a given frequency, you can still synchronize your FPGA with the CPU without having access to the base clock. That works fine as long as you don’t change the frequency. But that was too easy: I want to be able to switch frequency on-the-fly and within a large range (from 12.5 MHz to 40 MHz to cover all 386s). That’s still possible if you build many bitfiles (compiled HDL “FPGA firmware”), one for each frequency. Nah! I want to use the same bitfile for everything, including support for both microarchitectures (Cyrix & Intel) despite the different timing’s requirements. That’s hell but I almost succeeded.

The actual firmware is not perfect but I’m quite happy with it because it works as expected in most cases. The remaining issue is a hole between ~21 and ~28 MHz where the FPGA can’t reliably catch the required inputs from the CPU at the rising or falling edge of the clock. My Logic Analyzer is unfortunately too slow to solve this but it’s not a big deal. The HDL code works fine at 12.5 MHz, 16 MHz, 20 MHz, 33 MHz and 40 MHz. The only “retail” frequency I’m not able to do is 25 MHz. I built another bitfile for this frequency only and I’ll hope to find a way to merge everything in the same bitfile later. To avoid losing my mind, I’ll wait to have enough money to buy a faster logic analyzer (like the lovely DSLogic U3Pro32) to work on this again.

— End —

But here it is: the UCA supports all Cyrix-based 386 like the 486DLC. Here are the ones I used for the test:

Cyrix 486DLC & DRx2Unlike 386-class CPUs from AMD, which are based on Intel’s microcode and are exact clones, the Cx486DLC introduced in June 1992 uses a custom microarchitecture built from scratch by Cyrix. While still using the 32-bit 386 bus, they come with 486-class features like an embedded L1 cache and some new instructions. The Cyrix 486DLC is not a perfect pin-to-pin replacement for Intel 386s as timings are a bit different and cache control lines must be handled by the chipset. Compatibility issues are well known with many – especially older – motherboards. The original 486DLC was available at 25 MHz, 33 MHz and 40 MHz. All of these were manufactured by Texas Instruments on the 0.8µm CHMOS node. Ti also launched their own, rebranded 486DLC chips, which were exactly the same except for the marking. Please notice the vicious 90° rotation between printings and pin 1 on the Ti486DLC. Fortunately, the Universal Chip Analyzer have strong short-circuit protection built-in…

Cyrix also later released a special, clock-doubled version called the 486DRx². It was available at 16/32, 20/40 MHz, 25/50 MHz and even 33/66 MHz. This later one was the fastest PGA132 CPU ever released.

Cyrix 486DLC-40 & Cyrix 486DRx²-25/50 Tested on the UCA
Cyrix 486DLC-40 & Cyrix 486DRx²-25/50 Tested on the UCA

The original Cyrix 486DLC exists with two steppings: the earliest one with CPUID 0x420 and a later one with CPUID 0x421. The proprietary “DIR” identification registers available on Cyrix’s CPU is only available on newer CPUs. None of the 486DLC tested have them. The 486DRx² is the only one to have DIR registers and reports itself as Model = 0x07. The UCA happily tested the 486DLC at 40 MHz and was even able to overclock my 486DRx2 25/50 at 33/66 MHz for a short time. Cyrix 486s run hot and deserve a proper heatsink. Power consumption is as high as a 486 DX2 and can go as high as 4 watts (4 times higher than a later Intel 386 DX-33)!

Much later in the development process, I feel confident enough to try a blank 486DLC Engineering sample I got many years ago.

This ES is not a clock-doubled CPU like the DRx² and was able to run properly at 33 MHz. CPUID is 0x421 and – surprise! – it has DIR registers, identifying itself at Model = 0x01 (the expected value for a Cyrix 486DLC) and stepping 0x22, with seems to match the handwritten value (2/2) marked on top. The DRx2 25/50 tested above comes with stepping 0x21, so this ES seems newer. I don’t know at this point if any 486DLCs were released commercially with this stepping – or even if any retail 486DLCs have DIR registers enabled.

Let’s now talk about the Ti 486SXL. After having simply renamed the Cyrix 486DLC to Ti 486DLC, Texas Instruments released a new, reworked core they called the “486SXL”. It was available with PGA132 (386) and PGA168 (486) pinouts. Two models were released for PGA132 Socket: the TI 486 SXL40 and the TI 486 SXL2-50. Here they are:

They come with two major differences compared to the Cyrix 486DLC. First, TI boosted the L1 cache from 1 KB to 8 KB (same size as the Intel 486). Then, the clock-doubling feature (also available on the SXL-40 despite its name) is not always activated by default like on the DRx². It must be enabled after boot by software. You basically have to mess with internal proprietary registers to enable the clock doubling mode.

Very few 386 motherboards support the Ti 486SXL but the UCA happily tested it with and without clock-doubling. Just for fun, I ran some benchmarks on all 386s now supported by the Universal Chip Analyzer. The code is not really well-tuned and is only based on some register manipulations and a lot of math integer operations (add, sub, imult and idiv). Here are the results:

386-class CPUs benchmark

Intel 386s appear as the slowest of them all. AMD 386s performances are exactly the same as expected but their famous 40 MHz model offers a 20% boost versus the Intel 386 DX-33. Cyrix 486DLC are much faster. When introduced, they claimed “up to 2x faster than 386DX at same clock frequency”. Our test showed a ~50% improvement between the Intel 386DX-33 and the Cyrix 486DLC-33. The 486DLC-40 is ~80% faster than the fastest Intel 386.

Anyway, the most impressive performance come from the DRx²: the rare 33/66 MHz version is actually ~7x faster than the original Intel 386 DX released at 12.5 MHz in 1986! Results from the TI486SXL show it’s entirely based on the Cyrix 486DLC core with no tuning at all on the microarchitecture. The effect of the increased 8 KB cache is invisible because the UCA has an extremely fast RAM without any wait-states (similar to the L1 cache). Anyway, even real-world applications don’t benefit from a big gain (no more than 3-5% at best).

Stay tuned for more exciting news from the UCA!

 

The UCA 386 Adapter now supports Intel RapidCAD

The elusive Intel RapidCAD Engineering CoProcessor is a weird and rare 2-chip set designed to upgrade 386 computers. It has been released in February 1992 for $499 and sold as a coprocessor. Technically, the RapidCAD is a 486DX assembled inside a 132-pin ceramic package that plugs into a standard 386 Socket. It features an integrated FPU but Intel removed the 8KB L1 cache and the 486-specific instructions. A second chip (RapidCAD-2) plugs into the 387 Socket, is only needed to provide the #FERR signal used to handle FPU exceptions.

This early sample has been assembled in April 1992 with dies from December 1991. The RapidCAD is able to work at any frequencies from 16 to 33 MHz. The lack of L1 cache and the slower 386 bus used does not provide a significant boost in Integer performances, but the FPU is the fastest available for 386s. The Universal Chip Analyzer is now able to fully test RapidCAD up to 33 MHz.

For some reasons, my sample was unable to run at 12.5 MHz, but works fine from 16 to 33 MHz. It’s probably due to the modification on the internal PLL needed to adapt a 486 CPU (1x clock signal expected) to a 386 Socket (2x clock required). PLLs often have limited top/bottom frequency lock range.

The reported CPUID is 0x340 and the power consumption is quite high (~2W typical in INT, ~2.5W in FPU) for a 386. I ran some INT benchmark only at 33 MHz and I got a score of 105.7 while a standard Intel 386DX-33 (or Am386DX-33) got 99.6. That’s only a 6% increase. The RapidCAD is much faster on FPU, being up to 70% faster than an Intel 387.

The Odd Story of Factory-Downgraded 486s

Counterfeits CPU were very common in the mid-90s. The worst period was between 1993 (just after the launch of the Intel 486 DX2) and 1998 (when the Pentium II started to be multiplier-locked). It was extremely easy for tricksters to remove the original marking and reprint another one with a higher frequency rating. Many DX4-75 were remarked to DX4-100, and even more Pentium 133/150 were remarked as Pentium 166 or 200s.

Genuine factory-remarked CPUs also exist, but they’re generally uncommon. The most well-known example is the double-sigma (ΣΣ) sign added on early 386s after they had been tested bug-free from the infamous 32-bit multiplier bug. Some rare Intel 486 SX were also later remarked with a higher speed grade. Here are two of them:

As for all factory-remarks, the addition is quite obvious. Intel probably binned twice these CPUs again at the request of a big customer (IBM?) and added the second rating later. Today’s story about factory-remarks is much more unusual because it concerns standard models.

Am486DX4-100SV8B (remarked 5×86)

After I published this analysis some weeks ago, a reader told me he had a strange Am486DX4-100 that seemed to be a AMD 5×86. After a careful look at the printings that looked 100% genuine at first sight, he was kind enough to lend it to me for further investigation with the UCA. Here it is:

The “9626” date code tells us it was manufactured in late June or early July 1996, which is quite late for a Am486DX4. I immediately noticed the 25544 package code, only used for the 350 nm die. This die was the basis of all Am486DX5 and Am5x86. The “C” stepping was also unusual as the Am5x86 is based on the A-step (from November 1995) or B-Step (from March 1997). A “C” Stepping build in 1996 is incoherent with the 5×86 line, but very coherent with the 486DX4 (later 486DX4 in the latest “C” Stepping were built on the 25498 package in May/June 1996). So it was time for a test on the Universal Chip Analyzer:

 

WOW! There is no doubt: this CPU is really based on the standard 350 nm die with a fully enabled 16 KB Write-Back L1 cache and a working 4x multiplier. Actually, it can even be overclocked easily to 133 MHz. All specs, including power consumption and CPUID (0x4F4), make it indistinguishable from an AMD 5×86. This CPU can of course also work with a 3x multiplier like an AMD 486DX4-100 (CPUID drops to 0x494).

After some research, it seems that all CPUs based on the 25544/C package are marked as 486DX4-100SV8B while being really DX5 SV16B (5×86). AMD produced them for quite some time between February 1996 and March 1997. They probably stopped the production of the old 500 nm die in early ’96 but still had some demand from customers for DX4s, so they just used the new 350 nm die and marked these CPUs as DX4-100s. As long as you use the default x3 multiplier, they behave exactly like the old one … except for the cache size.

Has Intel also done such weird things? I could have sworn no way. I was wrong…

Intel 486DX2-66 SK080 (remarked DX4)

The same reader also sends me a DX2-66 that could be “really a DX4-100”. That sounded odd and really unlikely to me because Intel has a strict policy on S-Spec. Intel DX4s also have a specific CPUID to help distinguish them from DX2s by software. Unlike AMD 486s, this CPUID does NOT change with the multiplier used, so it’s strange to have a DX2 with a DX4’s CPUID. Here is the original CPU:

Everything looks genuine here. SK080 is one of the least common S-Spec for Intel DX2s. The only other S-Spec beginning with “SK” is the extremely rare SK058. The SK080 is a 3.3V SL-Enhanced part which seems to have been produced only between WW18’94 (May 1994) and WW48’94 (November 1994). Let’s plug in into the UCA:

Awesome! This is really a DX4 factory-downgraded to DX2-66. The 0x480 CPUID leaves no doubt about the original die used here. The usual power consumption and the ability to work fine at 3.3V at 100 MHz let me think it’s probably a DX4-100. With the multiplier set at 2x, the SK080 also works at 2×33 MHz as expected for a CPU marked as a DX2-66. To be 100% sure, I was able to find another sample to confirm these findings.

The UCA 386 Adapter now supports AMD & IBM 386s

 IBM 386

As like all previous microprocessors, Intel licensed the i386 design to third parties. AMD was the only one legally allowed to sell Intel-based 386s to customers (as bare CPUs), but IBM was granted the right to produce some Intel 386s for its own use. They don’t look like a standard ceramic CPUs : IBM used a plastic substrate and a metal cover to protect the die and help with thermal dissipation. Here is how they look like.

IBM 386If the packaging is different, the internal die is the same as on Intel 386s. 7 different IBM part-numbers are actually known: 23F7189 (?? MHz), 32G6633 (25 MHz), 51F0352 (20 MHz), 51F1783 (20? MHz), 51F1784 (20 MHz), 51F1797 (25 MHz) and 63F7615 (25 MHz).I was able to put my hand on a 51F1784 and the later 63F7615. I tested both on the Universal Chip Analyzer. There is no “Pin 1” mark so I had to guess where is pin 1. Fortunately, the UCA has  strong over-current and short-circuit protection. Let’s start with the 63F7615 :

This one is able to work fine up to 33 MHz, with a CPUID set at 0x305, similar to Intel 386 based on the D0-stepping. I don’t know for sure the real rated frequency, but it’s probably only 25 MHz. The other one (51F1784ESD) is not able to work at 33 MHz and not even at 25 MHz. The actual (early) UCA firmware only has 16/25/33/40 MHz, so I can only confirm that that my 51F1784ESD works at 16 MHz but not at 25 MHz. According to various sources, it’s probably rated at 20 MHz.

AMD Am386

AMD also produced a lot of Am386DX at 20, 25, 33 and 40 MHz. While the microcode is 100% from Intel, the manufacturing process is different and they had lower power consumption (thanks to the 0.8µm process used by AMD instead of Intel’s 1µm CMOS-IV on the latest i386s).

Let’s start with the standard Am386 DX/DXL. I tested one Am386 DX/DXL-25 “B-Step”, one Am386 DX/DXL-33 “D-Step” and another Am386 DX/DXL-40 “C-Step”. All came in the 23936 package from Kyocera.

The UCA tool is not yet able to detect them as AMD, but I’m working on a new algorithm based on power consumption to distinguish them from Intel 386s. The B-Step identifies itself as 0x305, the same CPUID used on Intel’s 386 D0-Step. The DXL-25 was able to work up to 33 MHz. Both C- and D-Step have a CPUID set at 0x308, like the later Intel 386s (D1 step and up).

The last CPU to try was the Am386DE-33, an uncommon embedded model. Like the Am386DXL, it uses a fully-static design, meaning it can be clocked down to DC (0 Hz) while retaining all its internal registers content. The biggest difference between the usual Am386DX/DXL and the Am386DE is the disabled Paging Unit in protected mode on the latter. Bit 31 of CR0 (used to enable paging) is reserved on Am386DE. Another difference only available on the Am386DE is the ability to work at its rated frequency with a much lower voltage (down to 3.0V). And it works fine:

At 3.3V, the power needed drops by a huge margin, from 1.1 Watt to as low as 461 mW (0.46W). That’s a -60% power reduction!

[Guide] Am486 Die & Packaging

After weeks spent to test A LOT of AMD 486 with the Universal Chip Analyzer, messing with a gas torch  to decap some of them and speaking with a former AMD engineer that worked on them back in the 90s, I’m happy to publish here all the information I was able to get! Here it is:

The Ultimate AMD 486 Die & Packaging Guide

 

PS: If you have any more information about AMD 486, please leave a comment. Thanks!

The Universal Chip Analyzer now supports Intel 386 !

Another milestone has been reached. A new iconic CPU family is now supported by the UCA: the Intel 80386, the very first 32-bit x86 microprocessor! The i386 was originally released in 1985 at 12.5 MHz and 16 MHz. It added a 3-stage instruction pipeline and an on-chip MMU (Memory Management Unit) able to address up to 4 GB of RAM. A giant capacity for that time. The original Intel 386 – then renamed 386DX – comes in a PGA132 package and its frequency was later upgraded to 20 MHz, 25 MHz and finally 33 MHz. Later clones from AMD & Cyrix – not yet supported by the UCA – were also released at frequencies up to 40 MHz.

The design of the UCA 386 Adapter was challenging because of the high frequencies involved. While all 486s work with a standard clock input, 386s require a clock-doubled signal with a strong voltage swing (CMOS) between 0 and +5V. Generating frequencies up to 80 MHz (for a 386DX-40) with these requirement was not possible with the UCA architecture, so the UCA 386 Adapter includes an external clock-doubling PLL. For the first try, I used a NB3N511 clock multiplier from On Semiconductor but I was unable to reliably run 386s at more than 20 MHz. The UCA is based on a FPGA and timings are crucial.  With the NB3N511, I was unable to successfully match timings because of the lack of phase synchronization between the input clock generated by the FPGA and the doubled frequency fed to the CPU. So I needed another PLL, with 0-delay between input and output.

After some research, I gave the ICS570A a try and it worked perfectly fine. I was able to sync the internal FPGA logic to the clock-doubled CPU signal up to 40/80 MHz, without having direct access to that signal. Some high-quality ceramic decoupling caps were also mandatory to achieve the highest frequencies. I had a bad surprise while looking at ZIF socket for the 386 Adapter: for some reasons, PGA132 ZIF Socket are extremely expensive and I was unable to source them at a decent price. I bought some at ~$25 each, but if you can help me find more at a lower price, please drop me an email!

The 4 standard test frequencies for 386 are set to 12.5 MHz, 25 MHz, 33 MHz and 40 MHz. Right now, only Intel 386 are supported, but I’m working on AMD, Cyrix, etc. clones and I’m confident the UCA will support them all very soon. Here are the i386 I have for testing:

  1. The first one is an early 386 clocked at 16 MHz and produced in 1986. The ΣΣ sign engraved shows that it has been tested free of the infamous 32-bit multiplier bug (more on this on a later post). According to this source, the S40344 S-Spec is a B1 stepping. This is confirmed by the CPUID displayed by the UCA Analyzer tool: 0x303. At 12.5 MHz, this CPU requires about 188 mA at 5V (a bit less than 1W).

  1. The second one is very similar to the first one, except the rated speed at 20 MHz. Still the same B1-Stepping, same CPUID, same ΣΣ, same power consumption, same everything. It also can’t be overclocked at 25 MHz.
    .
  2. The third one is also rated at 20 MHz but don’t have any S-Spec. It has been assembled in November 1988, more than one year after the previous one. The CPUID is different at 0x305, which indicates a D0 stepping. Surprisingly, the power consumption is 10% higher, at ~212 mA for 5V at 12.5 MHz. Maybe Intel added some logic to solve the numerous erratas in previous stepping, maybe it’s just sample variation. Anyway, the D0 stepping still uses the CHMOS III (1.5 µm) process. This chip can be overclocked at 25 MHz

  1. The fourth one is a 16 MHz marked SX236 and build with the more advanced CHMOS IV process. The CPUID is 0x308, which is used by Intel for D1, D2 and E Stepping. The last stepping is usually marked on the chip, so we can guess it’s a D1 or D2 stepping. Power consumption drops from 212 mA to 147 mA, thanks to the 1 µm process (instead of 1.5 µm). Unfortunately, it can’t work at 25 MHz.
    .
  2. The fifth one is a A80386DX-33 made in 1992 with s-spec SX366. It’s the faster clock speed Intel released for a 386DX. The CPUID is still set at 0x308 but the stepping is clearly more advanced: the power consumption drops to 126 mA while using the same CHMOS IV process than the previous one. This particular CPU requires 126 mA at 12.5 MHz, 206 mA at 25 MHz and up to 261 mA at 33 MHz. It can’t be overclocked to 40 MHz.

  1. The last one is much newer than the others. It was manufactured in 2000 and uses the E-Stepping, as stated by the last letter of the lot code. Aside from this, all specs look identical to the SX366. Measures are also the same and it doesn’t work at 40 MHz.

Stay tuned for more exciting news about the UCA!