It has nothing to do with how many bits the CPU uses to process data or address memory. The latest Intel Core i7 and AMD Phenom II are also x86 CPUs for the same reason.
Extensions (partial list): SSE, SSE2, SSE3, SSSE3, SSE4, SSE4.2, SSE5, AES-NI, CLMUL, RDRAND, SHA, MPX, SGX, XOP, F16C, ADX, BMI, FMA, AVX, AVX2, AVX512, VT-x, AMD-V, TSX, ASF. Open: partly. The 80486 processor has been on the market for more than 30 years and so cannot be subject to patent claims.
The x86 architectures were based on the Intel 8086 microprocessor chip, initially released in 1978.

(Image: Intel Core 2 Duo, an example of an x86-compatible, 64-bit multicore processor.)
(Image: AMD Athlon (early version), a technically different but fully compatible x86 implementation.)

Many additions and extensions have been added to the x86 instruction set over the years, almost consistently with full backward compatibility. The architecture has been implemented in processors from Intel, Cyrix, AMD, VIA Technologies and many other companies; there are also open implementations, such as the Zet SoC platform (currently inactive).
Nevertheless, of those, only Intel, AMD, VIA Technologies, and DM&P Electronics hold x86 architectural licenses, and of these, only the first two are actively producing modern 64-bit designs. Today, however, x86 usually also implies binary compatibility with the 32-bit instruction set of the 80386.
A few years after the introduction of the 8086 and 8088, Intel added some complexity to its naming scheme and terminology as the “iAPX” of the ambitious but ill-fated Intel iAPX 432 processor was tried on the more successful 8086 family of chips, applied as a kind of system-level prefix. There were also the terms iRMX (for operating systems), iSBC (for single-board computers), and iSBX (for multimodule boards based on the 8086 architecture), all together under the heading Microsystem 80.
Modern x86 is relatively uncommon in embedded systems, however, and small low-power applications (using tiny batteries) and low-cost microprocessor markets, such as home appliances and toys, lack significant x86 presence. Simple 8- and 16-bit architectures are common here, although the x86-compatible VIA C7, VIA Nano, AMD's Geode, Athlon Neo and Intel Atom are examples of 32- and 64-bit designs used in some relatively low-power and low-cost segments.
There have been several attempts, including by Intel, to end the market dominance of the “inelegant” x86 architecture, which was derived directly from the first simple 8-bit microprocessors. However, the continuous refinement of x86 microarchitectures, circuitry and semiconductor manufacturing has made it hard to replace x86 in many segments.
AMD's 64-bit extension of x86 (which Intel eventually responded to with a compatible design) and the scalability of x86 chips in the form of modern multi-core CPUs underline x86 as an example of how continuous refinement of established industry standards can resist the competition from completely new architectures. Each line item in the chronology is characterized by significantly improved or commercially successful processor microarchitecture designs.
The most recent chronology entries are the addition of Vector Neural Network Instructions and, under software emulation on ARM64, Windows 10 on ARM64 (2017): a cooperation between Microsoft and Qualcomm brought Windows 10 onto the ARM64 platform, with x86 applications supported by the CHPE emulator starting from version 1709 (16299.15).

(Image: Am386, released by AMD in 1991.)

At various times, companies such as IBM, NEC, AMD, TI, STM, Fujitsu, OKI, Siemens, Cyrix, Intersil, C&T, NexGen, UMC, and DM&P started to design or manufacture x86 processors (CPUs) intended for personal computers and embedded systems. For the personal computer market, real quantities started to appear around 1990 with i386 and i486 compatible processors, often named similarly to Intel's original chips.
Other companies that designed or manufactured x86 or x87 processors include ITT Corporation, National Semiconductor, ULSI System Technology, and Weitek. Following the fully pipelined i486, Intel introduced the Pentium brand name (which, unlike numbers, could be trademarked) for their new set of superscalar x86 designs.
With the x86 naming scheme now legally cleared, other x86 vendors had to choose different names for their x86-compatible products, and initially some chose to continue with variations of the numbering scheme: IBM partnered with Cyrix to produce the 5x86 and then the very efficient 6x86 (M1) and 6x86MX (MII) lines of Cyrix designs, which were the first x86 microprocessors implementing register renaming to enable speculative execution. AMD meanwhile designed and manufactured the advanced but delayed 5k86 (K5), which, internally, was closely based on AMD's earlier 29K RISC design; similar to NexGen's Nx586, it used a strategy in which dedicated pipeline stages decode x86 instructions into uniform and easily handled micro-operations, a method that has remained the basis for most x86 designs to this day.
The 6x86 was also affected by a few minor compatibility problems, the Nx586 lacked a floating-point unit (FPU) and (the then crucial) pin-compatibility, while the K5 had somewhat disappointing performance when it was (eventually) introduced. AMD later managed to grow into a serious contender with the K6 set of processors, which gave way to the very successful Athlon and Opteron.
VIA Technologies' energy-efficient C3 and C7 processors, which were designed by the Centaur company, have been sold for many years. Centaur's newest design, the VIA Nano, is their first processor with superscalar and speculative execution.
The instruction set architecture has twice been extended to a larger word size. In 1985, Intel released the 32-bit 80386 (later known as i386), which gradually replaced the earlier 16-bit chips in computers (although typically not in embedded systems) during the following years; this extended programming model was originally referred to as the i386 architecture (like its first implementation), but Intel later dubbed it IA-32 when introducing its (unrelated) IA-64 architecture.
The x86 architecture is a variable-instruction-length, primarily CISC design with emphasis on backward compatibility. The instruction set is not typical CISC, however, but basically an extended version of the simple eight-bit 8008 and 8080 architectures.
Memory access to unaligned addresses is allowed for all valid word sizes. The largest native size for integer arithmetic and memory addresses (or offsets) is 16, 32 or 64 bits depending on architecture generation (newer processors include direct support for smaller integers as well).
Multiple scalar values can be handled simultaneously via the SIMD unit present in later generations, as described below. Immediate addressing offsets and immediate data may be expressed as 8-bit quantities for the frequently occurring cases or contexts where a range of -128 to 127 is enough.
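The 8-bit immediate form described above can be illustrated with a small Python model. This is only a sketch of the arithmetic, not of any real assembler or decoder; the function names `sign_extend_imm8` and `fits_in_imm8` are hypothetical, and the default 32-bit operand width is an assumption chosen for illustration.

```python
def sign_extend_imm8(byte: int, width: int = 32) -> int:
    """Sign-extend one immediate byte to `width` bits (result is an unsigned bit pattern)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("an imm8 must fit in one byte")
    # Bit 7 set means the byte encodes a negative value in two's complement.
    value = byte - 0x100 if byte & 0x80 else byte
    return value & ((1 << width) - 1)


def fits_in_imm8(value: int) -> bool:
    """True when a signed constant can use the short one-byte immediate form."""
    return -128 <= value <= 127


if __name__ == "__main__":
    print(hex(sign_extend_imm8(0x7F)))  # largest positive imm8
    print(hex(sign_extend_imm8(0x80)))  # most negative imm8, widened to 32 bits
    print(fits_in_imm8(127), fits_in_imm8(128))
```

A constant outside the range would need a full-width immediate instead, which is why the short form matters for code size.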
To further conserve encoding space, most registers are expressed in opcodes using three or four bits, the latter via an opcode prefix in 64-bit mode, while at most one operand to an instruction can be a memory location. Among other factors, this contributes to a code size that rivals eight-bit machines and enables efficient use of instruction cache memory.
The relatively small number of general registers (also inherited from its 8-bit ancestors) has made register-relative addressing (using small immediate offsets) an important method of accessing operands, especially on the stack. A dedicated floating-point processor with 80-bit internal registers, the 8087, was developed for the original 8086.
The presence of wide SIMD registers means that existing x86 processors can load or store up to 128 bits of memory data in a single instruction and also perform bitwise operations (although not integer arithmetic) on full 128-bit quantities in parallel. Intel's Sandy Bridge processors added the Advanced Vector Extensions (AVX) instructions, widening the SIMD registers to 256 bits.
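The difference between a full-width bitwise operation and lane-wise integer arithmetic can be sketched with Python integers standing in for a 128-bit register. This is a toy model under stated assumptions: the 16-bit lane width and the helper names `pack`, `unpack`, and `packed_add` are chosen for illustration and do not correspond to any particular instruction.

```python
LANE_BITS = 16                      # assumed lane width for this sketch
LANES = 128 // LANE_BITS            # eight lanes in a 128-bit register
LANE_MASK = (1 << LANE_BITS) - 1
MASK_128 = (1 << 128) - 1


def pack(lanes):
    """Pack a list of lane values (lane 0 lowest) into one 128-bit integer."""
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & LANE_MASK) << (i * LANE_BITS)
    return value


def unpack(value):
    """Split a 128-bit integer back into its lanes."""
    return [(value >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add(a, b):
    """Add lane by lane; a carry out of one lane is discarded, not propagated."""
    return pack([(x + y) & LANE_MASK for x, y in zip(unpack(a), unpack(b))])


if __name__ == "__main__":
    a = pack([0xFFFF, 1, 2, 3, 4, 5, 6, 7])
    b = pack([1] * LANES)
    print(unpack(a ^ b))               # bitwise: no interaction between bits
    print(unpack(packed_add(a, b)))    # lane-wise add: lane 0 wraps to 0
    print(unpack((a + b) & MASK_128))  # full-width add: carry leaks into lane 1
```

XOR needs no knowledge of lane boundaries because no bit depends on its neighbour, whereas a single 128-bit addition would have to propagate a carry across the whole register.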
During execution, current x86 processors employ a few extra decoding steps to split most instructions into smaller pieces called micro-operations. These modern x86 designs are thus pipelined, superscalar, and also capable of out-of-order and speculative execution (via branch prediction, register renaming, and memory dependence prediction), which means they may execute multiple (partial or complete) x86 instructions simultaneously, and not necessarily in the same order as given in the instruction stream.
Some Intel CPUs (Xeon Foster MP, some Pentium 4, and some Nehalem and later Intel Core processors) and AMD CPUs (starting from Zen) are also capable of simultaneous multithreading with two threads per core (Xeon Phi has four threads per core). However, traditional microcode (used since the 1950s) also inherently shares many of the same properties; the new method differs mainly in that the translation to micro-operations now occurs asynchronously.
Not having to synchronize the execution units with the decode steps opens up possibilities for more analysis of the (buffered) code stream, and therefore permits detection of operations that can be performed in parallel, simultaneously feeding more than one execution unit. The latest processors also do the opposite when appropriate; they combine certain x86 sequences (such as a compare followed by a conditional jump) into a more complex micro-op which fits the execution model better and thus can be executed faster or with fewer machine resources involved.
Intel followed this approach with the Execution Trace Cache feature in their NetBurst microarchitecture (for Pentium 4 processors) and later in the Decoded Stream Buffer (for Core-branded processors since Sandy Bridge). Transmeta used a completely different method in their Crusoe x86-compatible CPUs.
Transmeta argued that their approach allows for more power-efficient designs since the CPU can forgo the complicated decode step of more traditional x86 implementations. Minicomputers during the late 1970s were running up against the 16-bit 64 KB address limit, as memory had become cheaper.
Some minicomputers like the PDP-11 used complex bank-switching schemes, or, in the case of Digital's VAX, redesigned much more expensive processors which could directly handle 32-bit addressing and data. The original 8086, developed from the simple 8080 microprocessor and primarily aiming at very small and inexpensive computers and other specialized devices, instead adopted simple segment registers which increased the memory address width by only 4 bits.
In practice, on the x86 it was (is) a much-criticized implementation which greatly complicated many common programming tasks and compilers. However, the architecture soon allowed linear 32-bit addressing (starting with the 80386 in late 1985), but major actors (such as Microsoft) took several years to convert their 16-bit based systems.
Data and code could be managed within “near” 16-bit segments within 64 KB portions of the total 1 MB address space, or a compiler could operate in a “far” mode using 32-bit segment:offset pairs reaching (only) 1 MB. While that would also prove to be quite limiting by the mid-1980s, it was working for the emerging PC market, and made it very simple to translate software from the older 8008, 8080, 8085, and Z80 to the newer processor.
In real mode, segmentation is achieved by shifting the segment address left by 4 bits and adding an offset to obtain a final 20-bit address. Thus, the total address space in real mode is 2^20 bytes, or 1 MB, quite an impressive figure for 1978.
All memory addresses consist of both a segment and offset; every type of access (code, data, or stack) has a default segment register associated with it (for data the register is usually DS, for code it is CS, and for stack it is SS). In this scheme, two different segment/offset pairs can point at a single absolute location.
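The real-mode calculation described above can be sketched in a few lines of Python. The function name `real_mode_address` is hypothetical, and the wrap to 20 bits models an original 8086 with 20 address lines, which is an assumption of this sketch; later processors can behave differently at the 1 MB boundary.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Physical address for segment:offset, as (segment << 4) + offset, wrapped to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    # Shifting left by 4 multiplies the segment by 16 (one "paragraph").
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    # Two different segment:offset pairs that name the same physical byte.
    print(hex(real_mode_address(0x1234, 0x0005)))
    print(hex(real_mode_address(0x1000, 0x2345)))
    # Highest segment plus a small offset wraps around on a 20-bit bus.
    print(hex(real_mode_address(0xFFFF, 0x0010)))
```

Because segments start every 16 bytes but span 64 KB, many pairs overlap, which is what allows two different segment/offset pairs to point at a single absolute location.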
Offsets referring to locations inside the segment are combined with the physical address of the beginning of the segment to get the physical address corresponding to that offset. The segmented nature can make programming and compiler design difficult because the use of near and far pointers affects performance.
The stack grows toward numerically lower addresses, with SS:SP pointing to the most recently pushed item. Four segment registers (CS, DS, SS and ES) are used to form a memory address.
Finally, the instruction pointer (IP) points to the next instruction that will be fetched from memory and then executed; this register cannot be directly accessed (read or written) by a program. In the Intel 80286, to support protected mode, three special registers hold descriptor table addresses (GDTR, LDTR, IDTR), and a fourth task register (TR) is used for task switching.
The nomenclature represented this by prefixing an E (for “extended”) to the register names in x86 assembly language. With a greater number of registers, instructions and operands, the machine code format was expanded.
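The way the narrower register names overlay the wider ones can be sketched with bit masks. This is an illustrative model of the accumulator only; the function name `accumulator_views` is hypothetical, and it includes the 64-bit RAX name alongside the E-prefixed 32-bit name.

```python
def accumulator_views(rax: int) -> dict:
    """Return the sub-register values that alias the low bits of a 64-bit accumulator."""
    rax &= (1 << 64) - 1
    return {
        "RAX": rax,                 # full 64 bits
        "EAX": rax & 0xFFFFFFFF,    # low 32 bits (the "extended" AX)
        "AX": rax & 0xFFFF,         # low 16 bits
        "AH": (rax >> 8) & 0xFF,    # bits 8..15, high byte of AX
        "AL": rax & 0xFF,           # bits 0..7, low byte of AX
    }


if __name__ == "__main__":
    for name, value in accumulator_views(0x1122334455667788).items():
        print(f"{name:>3} = {value:#x}")
```

The same layout applies to the other original general registers that have byte halves (BX, CX, DX).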
With the 80486 and all subsequent x86 models, the floating-point processing unit (FPU) is integrated on-chip. With the Pentium III, Intel added a 32-bit Streaming SIMD Extensions (SSE) control/status register (MXCSR) and eight 128-bit SSE floating-point registers (XMM0 to XMM7).
In addition, an addressing mode was added to allow memory references relative to RIP (the instruction pointer), to ease the implementation of position-independent code, used in shared libraries in some operating systems. AVX-512 has eight extra 64-bit mask registers for selecting elements in a ZMM register.
AL/AH/AX/EAX/RAX: Accumulator
BL/BH/BX/EBX/RBX: Base index (for use with arrays)
CL/CH/CX/ECX/RCX: Counter (for use with loops and strings)
DL/DH/DX/EDX/RDX: Extend the precision of the accumulator (e.g. combine 32-bit EAX and EDX for 64-bit integer operations in 32-bit code)
SI/ESI/RSI: Source index for string operations

Some instructions compile and execute more efficiently when using these registers for their designed purpose.
For example, using AL as an accumulator and adding an immediate byte value to it produces the efficient add to AL opcode of 04h, whilst using the BL register produces the generic and longer add to register opcode of 80C3h. Another example is double precision division and multiplication that works specifically with the AX and DX registers.
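The two encodings mentioned above can be reproduced with a tiny encoder. This is a sketch for the single case of adding an immediate byte to an 8-bit register in the legacy (non-64-bit-prefixed) encoding; the function name `encode_add_reg8_imm8` and the lookup table are hypothetical helpers, not part of any assembler.

```python
# Register numbers used in the ModRM byte for the 8-bit registers.
REG8 = {"AL": 0, "CL": 1, "DL": 2, "BL": 3, "AH": 4, "CH": 5, "DH": 6, "BH": 7}


def encode_add_reg8_imm8(reg: str, imm: int) -> bytes:
    """Encode `ADD reg, imm8`, preferring the short accumulator form when reg is AL."""
    imm &= 0xFF
    if reg == "AL":
        # Short form: opcode 04h followed directly by the immediate byte.
        return bytes([0x04, imm])
    # Generic form: opcode 80h, then a ModRM byte with mod=11 (register),
    # opcode extension /0 (ADD) in the middle field, and the register number.
    modrm = 0xC0 | (0 << 3) | REG8[reg]
    return bytes([0x80, modrm, imm])


if __name__ == "__main__":
    print(encode_add_reg8_imm8("AL", 5).hex())  # two bytes
    print(encode_add_reg8_imm8("BL", 5).hex())  # three bytes, starting 80 c3
```

The accumulator form saves one byte per instruction because the register is implied by the opcode and no ModRM byte is needed.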
Some special instructions lost priority in the hardware design and became slower than equivalent small code sequences. On the IBM PC platform, direct software access to the IBM BIOS routines is available only in real mode, since the BIOS is written for real mode.
It is technically possible to use up to 256 KB of memory for code and data, with up to 64 KB for code, by setting all four segment registers once and then only using 16-bit offsets (optionally with default-segment override prefixes) to address memory. However, this puts substantial restrictions on the way data can be addressed and memory operands can be combined, and it violates the architectural intent of the Intel designers, which is for separate data items (e.g. arrays, structures, code units) to be contained in separate segments and addressed by their own segment addresses, in new programs that are not ported from earlier 8-bit processors with 16-bit address spaces. Each segment can be assigned one of four ring levels used for hardware-based computer security.
Because offsets are 16 bits, segments are still limited to 64 KB each in 80286 protected modes. Actual memory operations using protected mode segments are not slowed much because the 80286 and later have hardware to check the offset against the segment limit in parallel with instruction execution.
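The limit check described above can be modelled in software as follows. This is a simplified sketch, not the processor's actual descriptor format: the class `Segment286`, the exception `ProtectionFault`, and the 24-bit base width are assumptions made for illustration, and only ordinary (expand-up) segments are modelled.

```python
from dataclasses import dataclass


class ProtectionFault(Exception):
    """Raised when an access falls outside the segment limit (hypothetical name)."""


@dataclass
class Segment286:
    base: int   # start of the segment in physical memory (assumed 24-bit here)
    limit: int  # last valid offset, at most 0xFFFF because offsets are 16 bits

    def translate(self, offset: int, size: int = 1) -> int:
        """Check an access of `size` bytes at `offset`, then return its physical address."""
        last_byte = offset + size - 1
        if offset < 0 or size < 1 or last_byte > self.limit:
            raise ProtectionFault(f"offset {offset:#x} (+{size}) exceeds limit {self.limit:#x}")
        return (self.base + offset) & 0xFFFFFF


if __name__ == "__main__":
    seg = Segment286(base=0x123456, limit=0x0FFF)
    print(hex(seg.translate(0x0010)))
    try:
        seg.translate(0x1000)
    except ProtectionFault as fault:
        print("fault:", fault)
```

In hardware the comparison runs alongside the address calculation, which is why the check costs little; the sequential code here is only for clarity.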
Protected mode on the 80386 can operate with paging either enabled or disabled; the segmentation mechanism is always active and generates virtual addresses that are then mapped by the paging mechanism if it is enabled. Linux, 386BSD and Windows NT were developed for the 386 because it was the first Intel architecture CPU to support paging and 32-bit segment offsets.
Upon power-on (booting), the processor initializes in real mode, and then begins executing instructions. Operating system boot code, which might be stored in ROM, may place the processor into protected mode to enable paging and other features.
In the mid-1990s, it was obvious that the 32-bit address space of the x86 architecture was limiting its performance in applications requiring large data sets. A 32-bit address space would allow the processor to directly address only 4 GB of data, a size surpassed by applications such as video processing and database engines.
In 1999, AMD published a (nearly) complete specification for a 64-bit extension of the x86 architecture, which they called x86-64, with the stated intention of producing it. That design is currently used in almost all x86 processors, with some exceptions intended for embedded systems.
The success of the AMD64 line of processors, coupled with the lukewarm reception of the IA-64 architecture, forced Intel to release its own implementation of the AMD64 instruction set. Intel had previously implemented support for AMD64 but opted not to enable it in hopes that AMD would not bring AMD64 to market before Itanium's new IA-64 instruction set was widely adopted.
In their literature and product version names, Microsoft and Sun refer to AMD64/Intel 64 collectively as x64 in the Windows and Solaris operating systems. This does not affect actual binary backward compatibility (which would execute legacy code in other modes that retain support for those instructions), but it changes the way assemblers and compilers for new code have to work.
This was the first time that a major extension of the x86 architecture was initiated and originated by a manufacturer other than Intel. Early x86 processors could be extended with floating-point hardware in the form of a series of floating-point numerical coprocessors with names like 8087, 80287 and 80387, abbreviated x87.
This was also known as the NPX (Numeric Processor eXtension), an apt name since the coprocessors, while used mainly for floating-point calculations, also performed integer operations on both binary and decimal formats. The operations include arithmetic and transcendental functions, including trigonometric and exponential functions, and instructions that load common constants (such as 0; 1; e, the base of the natural logarithm; log2(10); and log10(2)) into one of the stack registers.
(The x86 CPU keeps running while the x87 coprocessor calculates, and the x87 sets a signal to the x86 when it is finished or interrupts the x86 if it needs attention because of an error.) The MMX instruction set was developed from a similar concept first used on the Intel i860.
MMX is typically used for video processing (in multimedia applications, for instance). MMX added 8 new registers to the architecture, known as MM0 through MM7 (henceforth referred to as MMn).
Unlike the FP stack, these MMn registers were fixed, not relative, and therefore they were randomly accessible. The instruction set did not adopt the stack-like semantics so that existing operating systems could still correctly save and restore the register state when multitasking without modifications.
3DNow! was designed to be the natural evolution of MMX from integers to floating point. Thus, no special modifications are required to be made to operating systems which would otherwise not know about them.
In 1999, Intel introduced the Streaming SIMD Extensions (SSE) instruction set, following in 2000 with SSE2. The first addition allowed offloading of basic floating-point operations from the x87 stack and the second made MMX almost obsolete and allowed the instructions to be realistically targeted by conventional compilers.
However, the downside was that operating systems had to be aware of this new set of instructions in order to save their register states. SSE is a SIMD instruction set that works only on floating-point values, like 3DNow!.
SSE2 introduced the capability to pack double-precision numbers too, which 3DNow! could not do. AVX introduced the VEX coding scheme to accommodate its larger registers, plus a few instructions to permute elements.
AVX2 did not introduce extra registers, but was notable for the addition of masking, gather, and shuffle instructions. AVX-512 features yet another expansion, to 32 512-bit ZMM registers, and a new EVEX scheme.
Unlike its predecessors, which were monolithic extensions, it is divided into many subsets that specific models of CPUs can choose to implement. Physical Address Extension or PAE was first added in the Intel Pentium Pro, and later by AMD in the Athlon processors, to allow up to 64 GB of RAM to be addressed.
Without PAE, physical RAM in 32-bit protected mode is usually limited to 4 GB. Although the initial implementations on 32-bit processors theoretically supported up to 64 GB of RAM, chipset and other platform limitations often restricted what could actually be used.
x86-64 processors define page table structures that theoretically allow up to 52 bits of physical address, although again, chipset and other platform concerns (like the number of DIMM slots available, and the maximum RAM possible per DIMM) prevent such a large physical address space from being realized. PAE mode does not affect the width of linear or virtual addresses.
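The sizes quoted above follow directly from the number of physical address bits. The sketch below does the arithmetic; the helper names are hypothetical, and the 36-bit figure is inferred here from the 64 GB limit rather than stated in the text.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 1 << address_bits


def human_size(nbytes: int) -> str:
    """Format a power-of-two byte count using binary units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    index = 0
    while nbytes >= 1024 and nbytes % 1024 == 0 and index < len(units) - 1:
        nbytes //= 1024
        index += 1
    return f"{nbytes} {units[index]}"


if __name__ == "__main__":
    for label, bits in [("real mode", 20), ("32-bit without PAE", 32),
                        ("PAE (assumed 36 bits)", 36), ("x86-64 page tables", 52)]:
        print(f"{label:>24}: {bits} bits -> {human_size(addressable_bytes(bits))}")
```

These are architectural ceilings only; as noted above, real platforms usually support far less physical memory.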
In supercomputer clusters (as tracked by TOP500 data and visualized on the diagram above, last updated 2013), the appearance of 64-bit extensions for the x86 architecture enabled 64-bit x86 processors by AMD and Intel (olive-drab with small open circles, and red with small open circles, in the diagram, respectively) to replace most RISC processor architectures previously used in such systems (including PA-RISC, SPARC, Alpha, and others), as well as 32-bit x86 (green on the diagram), even though Intel initially tried unsuccessfully to replace x86 with a new incompatible 64-bit architecture in the Itanium processor. The main non-x86 architecture still used, as of 2014, in supercomputing clusters is the Power ISA used by IBM POWER microprocessors (blue with diamond tiling in the diagram), with SPARC as a distant second.

By the 2000s, the limits of 32-bit x86 processors in memory addressing were an obstacle to their use in high-performance computing clusters and powerful desktop workstations.
The aged 32-bit x86 was competing with much more advanced 64-bit RISC architectures which could address much more memory. However, Intel felt that it was the right time to make a bold step and use the transition to 64-bit desktop computers for a transition away from the x86 architecture in general, an experiment which ultimately failed.
In 2001, Intel attempted to introduce a non-x86 64-bit architecture named IA-64 in its Itanium processor, initially aiming for the high-performance computing market, hoping that it would eventually replace the 32-bit x86. While IA-64 was incompatible with x86, the Itanium processor did provide emulation abilities for translating x86 instructions into IA-64, but this affected the performance of x86 programs so badly that it was rarely, if ever, actually useful to users: programmers had to rewrite x86 programs for the IA-64 architecture, or their performance on Itanium would be orders of magnitude worse than on a true x86 processor.
The market rejected the Itanium processor since it broke backward compatibility and preferred to continue using x86 chips, and very few programs were rewritten for IA-64. AMD decided to take another path toward 64-bit memory addressing, making sure backward compatibility would not suffer.
In April 2003, AMD released the first x86 processor with 64-bit general-purpose registers, the Opteron, capable of addressing much more than 4 GB of virtual memory using the new x86-64 extension (also known as AMD64 or x64). The market responded positively, adopting the 64-bit AMD processors for both high-performance applications and business or home computers.
As a result, the Itanium processor with its IA-64 instruction set is rarely used and x86, through its x86-64 incarnation, is still the dominant CPU architecture in non-embedded computers. x86-64 also introduced the NX bit, which offers some protection against security bugs caused by buffer overruns.
As a result of AMD's 64-bit contribution to the x86 lineage and its subsequent acceptance by Intel, the 64-bit RISC architectures ceased to be a threat to the x86 ecosystem and almost disappeared from the workstation market. x86-64 began to be utilized in powerful supercomputers (in its AMD Opteron and Intel Xeon incarnations), a market which was previously the natural habitat for 64-bit RISC designs (such as the IBM POWER microprocessors or SPARC processors).
The great leap toward 64-bit computing and the maintenance of backward compatibility with 32-bit and 16-bit software enabled the x86 architecture to become an extremely flexible platform today, with x86 chips being utilized from small low-power systems (for example, Intel Quark and Intel Atom) to fast gaming desktop computers (for example, Intel Core i7 and AMD FX / Ryzen), and even dominating large supercomputing clusters, effectively leaving only the ARM 32-bit and 64-bit RISC architecture as a competitor in the smartphone and tablet market. The introduction of the AMD-V and Intel VT-x instruction sets in 2005 allowed x86 processors to meet the Popek and Goldberg virtualization requirements.
^ Intel abandoned its x86 naming scheme with the P5 Pentium during 1993 (as numbers could not be trademarked).
^ Such a system also contained the usual mix of standard 7400 series support components, including multiplexers, buffers, and glue logic.
^ Late 1981 to early 1984, approximately.
^ The embedded processor market is populated by more than 25 different architectures, which, due to the price sensitivity, low power, and hardware simplicity requirements, outnumber the x86.
^ The NEC V20 and V30 also provided the older 8080 instruction set, allowing PCs equipped with these microprocessors to operate CP/M applications at full speed (i.e., without the need to simulate an 8080 by software).
^ Some companies started as fabbed manufacturers and later became fabless designers, one such example being AMD.
^ It had a slower FPU however, which is slightly ironic as Cyrix started out as a designer of fast floating-point units for x86 processors.
^ That is because integer arithmetic generates carry between subsequent bits (unlike simple bitwise operations).
^ Two MSRs of particular interest are SYSENTER_EIP_MSR and SYSENTER_ESP_MSR, introduced on the Pentium® II processor, which store the address of the kernel mode system service handler and corresponding kernel stack pointer.
^ “AMD Discloses New Technologies At Microprocessor Forum” (Press release). “Time and again, processor architects have looked at the inelegant x86 architecture and declared it cannot be stretched to accommodate the latest innovations,” said Nathan Brookwood, principal analyst, Insight 64.
The FSTSW AX form of the instruction is used primarily in conditional branching...
^ Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture (PDF).
AMD Athlon™ Processor x86 Code Optimization Guide (Revision K ed.).
Once touted by Intel as a replacement for the x86 product line, expectations for Itanium have been throttled well back.
“IBM WebSphere Application Server 64-bit Performance Demystified” (PDF). Figures 5, 6 and 7 also show the 32-bit version of WAS runs applications at full native hardware performance on the POWER and x86-64 platforms.
Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, USA, 2006.