CN104239273B - Microprocessor and its execution method - Google Patents
- Publication number
- CN104239273B (CN201410431656.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Power Sources (AREA)
Abstract
The present invention provides a microprocessor and a method performed by it. The microprocessor includes a plurality of processing cores, a service unit, and a memory accessed by the service unit and the plurality of processing cores. At least one of the processing cores is configured to write a patch to the memory, where the patch includes one or more instructions that, after being written to the memory by the at least one processing core, are fetched from the memory and executed by the service unit. The present invention consumes less power.
Description
Technical field
The present invention relates to microprocessors, and in particular to a patch mechanism for a service processor.
Background
The proliferation of multi-core microprocessors is primarily due to the performance advantages they offer. It has been made possible mainly by the rapid shrinking of semiconductor device geometries, which increases transistor density. The presence of multiple cores in a microprocessor creates a need for one core to communicate with the other cores in order to accomplish various functions, such as power management, cache memory management, debugging, and configuration related to the multiple cores.
Traditionally, architectural programs running on multi-core processors (for example, operating systems or application programs) have communicated using semaphores located in system memory that is architecturally addressable by all the cores. This may suffice for many purposes, but it may not provide the speed, precision, and/or system-level transparency required for others.
Summary of the invention
The present invention provides a microprocessor. The microprocessor includes a plurality of processing cores, a service unit, and a memory accessed by the service unit and the plurality of processing cores. At least one of the processing cores is configured to write a patch to the memory, where the patch includes one or more instructions that, after being written to the memory by the at least one processing core, are fetched from the memory and executed by the service unit.
The present invention provides a method performed by a microprocessor, where the microprocessor has a plurality of processing cores, a service processing unit, and a memory accessed by the service unit and the plurality of processing cores. The method includes: writing, by at least one of the processing cores, a patch to the memory, where the patch includes one or more instructions; fetching, by the service unit, the one or more instructions of the patch from the memory after the at least one processing core has written the patch to the memory; and executing, by the service unit, the fetched instructions of the patch.
The present invention provides a computer program product encoded in at least one non-transitory computer-usable medium for use with a computing device. The computer program product includes computer-usable program code that specifies a microprocessor. The computer-usable program code includes: first program code specifying a plurality of processing cores; second program code specifying a service unit; and third program code specifying a memory accessed by the service unit and the plurality of processing cores. At least one of the processing cores is configured to write a patch to the memory. The patch includes one or more instructions that, after being written to the memory by the at least one processing core, are fetched from the memory and executed by the service unit.
The present invention consumes less power.
Brief description of the drawings
Fig. 1 is a block diagram showing a multi-core microprocessor.
Fig. 2 is a block diagram showing a control word, a status word, and a configuration word.
Fig. 3 is a flowchart showing operation of the control unit.
Fig. 4 is a block diagram showing a microprocessor according to another embodiment.
Fig. 5 is a flowchart showing operation of a microprocessor to dump debug information.
Fig. 6 is a timing diagram showing an example of operation of the microprocessor according to the flowchart of Fig. 5.
Figs. 7A-7B are a flowchart showing a microprocessor performing a cross-core cache control operation.
Fig. 8 is a timing diagram showing an example of operation of the microprocessor according to the flowchart of Figs. 7A-7B.
Fig. 9 is a flowchart showing operation of a microprocessor to enter a low-power package C-state.
Fig. 10 is a timing diagram showing an example of operation of the microprocessor according to the flowchart of Fig. 9.
Fig. 11 is a flowchart showing operation of a microprocessor to enter a low-power package C-state according to another embodiment of the present invention.
Fig. 12 is a timing diagram showing an example of operation of the microprocessor according to the flowchart of Fig. 11.
Fig. 13 is a timing diagram showing another example of operation of the microprocessor according to the flowchart of Fig. 11.
Fig. 14 is a flowchart showing dynamic reconfiguration of a microprocessor.
Fig. 15 is a flowchart showing dynamic reconfiguration of a microprocessor according to another embodiment.
Fig. 16 is a timing diagram showing an example of operation of the microprocessor according to the flowchart of Fig. 15.
Fig. 17 is a block diagram showing the hardware semaphore 118 of Fig. 1.
Fig. 18 is a flowchart showing operation when a core 102 reads the hardware semaphore 118.
Fig. 19 is a flowchart showing operation when a core writes the hardware semaphore.
Fig. 20 is a flowchart showing operation when the microprocessor uses the hardware semaphore to perform an operation that requires exclusive ownership of a resource.
Fig. 21 is a timing diagram showing an example of operation in which cores issue non-sleeping synchronization requests according to the flowchart of Fig. 3.
Fig. 22 is a flowchart showing a process for configuring a microprocessor.
Fig. 23 is a flowchart showing a process for configuring a microprocessor according to another embodiment.
Fig. 24 is a block diagram showing a multi-core microprocessor according to another embodiment.
Fig. 25 is a block diagram showing the structure of a microcode patch.
Figs. 26A-26B are a flowchart showing operation of the microprocessor of Fig. 24 to propagate the microcode patch of Fig. 25 to multiple cores of the microprocessor.
Fig. 27 is a timing diagram showing an example of operation of a microprocessor according to the flowchart of Figs. 26A-26B.
Fig. 28 is a block diagram showing a multi-core microprocessor according to another embodiment.
Figs. 29A-29B are a flowchart showing operation of the microprocessor of Fig. 28 to propagate a microcode patch to multiple cores of the microprocessor according to another embodiment.
Fig. 30 is a flowchart showing operation of the microprocessor of Fig. 24 to patch code of a service processor.
Fig. 31 is a block diagram showing a multi-core microprocessor according to another embodiment.
Fig. 32 is a flowchart showing operation of the microprocessor of Fig. 31 to propagate an MTRR update to multiple cores of the microprocessor.
The reference symbols in the drawings are briefly described as follows:
100: multi-core microprocessor; 102A, 102B, 102N: core A, core B, core N; 103: non-core; 104: control unit; 106: status register; 108A, 108B, 108C, 108D, 108N: sync registers; 108E, 108F, 108G, 108H: shadow sync registers; 114: fuses; 116: private random access memory; 118: hardware semaphore; 119: shared cache memory; 122A, 122B, 122N: clock signals; 124A, 124B, 124N: interrupt signals; 126A, 126B, 126N: data signals; 128A, 128B, 128N: power control signals;
202: control word; 204: wake events; 206: sync control; 208: power gate; 212: sleep; 214: selective wake; 222: S; 224: C; 226: sync condition or C-state; 228: core set; 232: force sync; 234: selective sync kill; 236: core disable; 242: status word; 244: wake events; 246: lowest common C-state; 248: error code; 252: configuration word; 254-0 to 254-7: enable; 256: local core number; 258: die number;
302, 304, 305, 306, 312, 314, 316, 318, 322, 326, 328, 332, 334, 336: steps;
402A, 402B: inter-die bus unit A, inter-die bus unit B; 404: inter-die bus; 406A, 406B: die A, die B;
502, 504, 505, 508, 514, 516, 518, 524, 526, 528, 532: steps;
702, 704, 706, 708, 714, 716, 717, 718, 724, 726, 727, 728, 744, 746, 747, 748, 749, 752: steps;
902, 904, 906, 907, 908, 909, 914, 916, 919, 921, 924: steps;
1102, 1104, 1106, 1108, 1109, 1121, 1124, 1132, 1134, 1136, 1137: steps;
1402, 1404, 1406, 1408, 1412, 1414, 1416, 1417, 1418, 1422, 1424, 1426: steps;
1502, 1504, 1506, 1508, 1517, 1518, 1522, 1524, 1526, 1532: steps;
1702: owned bit; 1704: owner bits; 1706: state machine;
1802, 1804, 1806, 1808: steps;
1902, 1904, 1906, 1908, 1912, 1914, 1916, 1918: steps;
2002, 2004, 2006, 2008: steps;
2202, 2203, 2204, 2205, 2206, 2208, 2212, 2214, 2216, 2218, 2222, 2224: steps;
2302, 2304, 2305, 2306, 2312, 2315, 2318, 2324: steps;
2404: core microcode read-only memory; 2408: non-core microcode patch random access memory; 2423: service unit; 2425: non-core microcode read-only memory; 2439: patch content-addressable memory; 2497: service unit start address register; 2499: core random access memory;
2500: microcode patch; 2502: header; 2504: immediate patch; 2506: checksum; 2508: CAM data; 2512: core PRAM patch; 2514: checksum; 2516: RAM patch; 2518: non-core PRAM patch; 2522: checksum;
2602, 2604, 2606, 2608, 2611, 2612, 2614, 2616, 2618, 2621, 2622, 2624, 2626, 2628, 2631, 2632, 2634, 2652: steps;
2808: core patch RAM; 2912, 2916, 2922, 2932: steps; 3002, 3004, 3006: steps; 3102: memory type range registers;
3202, 3204, 3206, 3208, 3211, 3212, 3214, 3216, 3218, 3252: steps.
Detailed description
The preferred embodiments of the present invention are described below. Each embodiment illustrates the principles of the present invention and is not intended to limit it. The scope of the present invention is defined by the claims.
Referring to Fig. 1, a block diagram of a multi-core microprocessor 100 is shown. The microprocessor 100 includes a plurality of processing cores, denoted 102A, 102B through 102N, which are referred to collectively as processing cores 102, or simply cores 102, and individually as processing core 102, or simply core 102. Preferably, each core 102 includes one or more pipelines of functional units (not shown), including an instruction cache, an instruction translation unit or instruction decoder that preferably includes a microcode unit, a register renaming unit, reservation stations, cache memories, execution units, a memory subsystem, and a retire unit that includes a reorder buffer. Preferably, the cores 102 have a superscalar, out-of-order execution microarchitecture. In one embodiment, the microprocessor 100 is an x86 architecture microprocessor, although in other embodiments the microprocessor 100 conforms to another instruction set architecture.
The microprocessor 100 also includes a non-core 103, which is coupled to the cores 102 and distinct from them. The non-core 103 includes a control unit 104, fuses 114, a private random access memory (PRAM) 116, and a shared cache memory 119, for example a level-2 (L2) and/or level-3 (L3) cache memory shared by the cores 102. Each core 102 is configured to read data from and write data to the non-core 103 over a respective address/data bus 126, which gives the core 102 a non-architectural address space (also referred to as a private or micro-architectural address space) onto the shared resources of the non-core 103. The PRAM 116 is private, or non-architectural, in the sense that it is not in the architectural user program address space of the microprocessor 100. In one embodiment, the non-core 103 includes arbitration logic that arbitrates requests by the cores 102 for access to the resources of the non-core 103.
Each fuse 114 is an electrical device that may be blown or not blown. When a fuse 114 is not blown, it has low impedance and readily conducts current; when a fuse 114 is blown, it has high impedance and does not readily conduct current. A sensing circuit is associated with each fuse 114 to evaluate the fuse 114, that is, to sense whether the fuse 114 conducts a high current or low voltage (not blown, for example logical zero, or clear) or a low current or high voltage (blown, for example logical one, or set). A fuse 114 may be blown during manufacture of the microprocessor 100 and, in some embodiments, an unblown fuse 114 may be blown after manufacture of the microprocessor 100. Preferably, blowing a fuse 114 is irreversible. One example of a fuse 114 is a polysilicon fuse, which may be blown by applying a sufficiently high voltage across the device. Another example of a fuse 114 is a nickel-chromium fuse, which may be blown using a laser. Preferably, the sensing circuits sense the fuses 114 at power-up and provide their evaluations to corresponding bits in a holding register of the microprocessor 100. When the microprocessor 100 is released from reset, the cores 102 (for example, their microcode) read the holding register to determine the values of the sensed fuses 114. In one embodiment, before the microprocessor 100 is released from reset, updated values may be scanned into the holding register through a boundary scan input, such as a Joint Test Action Group (JTAG) input, to effectively update the values of the fuses 114. This is useful for testing and/or debugging purposes, particularly in the embodiments described below with respect to Figs. 22 and 23.
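As an illustration only, the fuse-sensing path described above can be modeled in a few lines of Python. The class name, the 8-bit width, and the rule that overrides are rejected after reset release are assumptions made for this sketch; the description does not give the register layout.

```python
class HoldingRegister:
    """Hypothetical model of the holding register that latches fuse values."""

    def __init__(self, blown_fuses, width=8):
        self.width = width
        self.reset_released = False
        self.value = 0
        for index in blown_fuses:
            if not 0 <= index < width:
                raise ValueError("fuse index out of range")
            self.value |= 1 << index  # a blown fuse reads as logic one (set)

    def scan_in(self, new_value):
        """Override the latched values, e.g. through a boundary scan input."""
        if self.reset_released:
            # Assumption: this sketch only models overrides before reset release.
            raise RuntimeError("reset already released")
        self.value = new_value & ((1 << self.width) - 1)

    def release_reset(self):
        self.reset_released = True

    def fuse_is_blown(self, index):
        """What core microcode would see for one fuse after reset release."""
        return bool((self.value >> index) & 1)


register = HoldingRegister(blown_fuses=[0, 3])
register.scan_in(0b0001)   # test/debug override: treat fuse 3 as not blown
register.release_reset()
```

In this sketch the scanned-in value replaces the sensed value, so microcode reading the register after reset sees fuse 3 as not blown even though the physical fuse is.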
In addition, in one embodiment, the microprocessor 100 includes a distinct local Advanced Programmable Interrupt Controller (APIC) (not shown) associated with each core 102. In one embodiment, the local APICs conform architecturally to the description of a local APIC in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A, May 2012, by Intel Corporation of Santa Clara, California, particularly Section 10.4. In particular, each local APIC includes an APIC ID and an APIC base register that includes a bootstrap processor (BSP) flag, whose generation and use are described in more detail below, particularly with respect to the embodiments of Figs. 14 through 16.
The control unit 104 comprises hardware, software, or a combination of hardware and software. The control unit 104 includes a hardware semaphore 118 (described in detail below with respect to Figs. 17 through 20), a status register 106, a configuration register 112, and a respective sync register 108 for each core 102. Preferably, each of these entities of the non-core 103 is addressable by each core 102 at a distinct address within the non-architectural address space, which enables the microcode of the cores 102 to read and write them.
Each sync register 108 is writable by its corresponding core 102. The status register 106 is readable by each core 102. The configuration register 112 is readable and indirectly writable by each core 102 (via the core disable bit 236 of Fig. 2, as described below). The control unit 104 may also include interrupt logic (not shown) that generates a respective interrupt signal (INTR) 124 to each core 102, which the control unit 104 asserts to interrupt the corresponding core 102. The interrupt sources in response to which the control unit 104 generates an interrupt signal 124 to a core 102 may include external interrupt sources (for example, the x86 architecture INTR, SMI, and NMI interrupt sources) or bus events (for example, assertion or de-assertion of the x86 architecture bus signal STPCLK). In addition, each core 102 may send an inter-core interrupt signal 124 to each of the other cores 102 by writing to the control unit 104. Preferably, unless otherwise stated, the inter-core interrupts described herein are non-architectural inter-core interrupts requested by the microcode of a core 102 via a microinstruction, as distinct from conventional architectural inter-core interrupts requested by system software via an architectural instruction. Finally, the control unit 104 may generate an interrupt signal 124 to a core 102 (a sync interrupt) when a synchronization condition has occurred, as described below (see, for example, Fig. 21 and block 334 of Fig. 3). The control unit 104 also generates a respective clock signal (CLOCK) 122 to each core 102, which the control unit 104 may selectively turn off, effectively putting the corresponding core 102 to sleep, and turn back on to wake the core 102 up. The control unit 104 also generates a respective core power control signal (PWR) 128 to each core 102, which selectively controls whether the corresponding core 102 receives power. Thus, the control unit 104 may selectively put a core 102 into a deeper sleep state via the corresponding power control signal 128 by removing power from the core 102, and may restore power to the core 102 to wake it up.
A core 102 may write to its corresponding sync register 108 with the synchronization bit set (see the S bit 222 of Fig. 2), an operation referred to as a synchronization request. As described in more detail below, in one embodiment the synchronization request asks the control unit 104 to put the core 102 to sleep and to wake it up when a synchronization condition occurs and/or when a specified wake event occurs. A synchronization condition occurs when all the enabled cores 102 in the microprocessor 100 (see the enable bits 254 of Fig. 2), or a specified subset of the enabled cores 102 (see the core set field 228 of Fig. 2), have written the same synchronization condition to their respective sync registers 108. The synchronization condition is specified by a combination of the C bit 224, the sync condition or C-state field 226, and the core set field 228 of Fig. 2, with the S bit 222 set, as described in more detail below. In response to the occurrence of a synchronization condition, the control unit 104 simultaneously wakes up all the cores 102 that are waiting for the synchronization condition, that is, that have requested it. In another embodiment described below, a core 102 may request that only the last core 102 to write the synchronization request be awakened (see the selective wake bit 214 of Fig. 2). In yet another embodiment, the synchronization request does not ask for the core 102 to be put to sleep; instead, it asks the control unit 104 to interrupt the core 102 when the synchronization condition occurs, as described in more detail below, particularly with respect to Figs. 3 and 21.
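The rule for when a synchronization condition occurs can be sketched as a small predicate. This is a simplified model under stated assumptions: the dictionary that stands in for the sync registers 108, and the use of `None` for "no pending request", are hypothetical conveniences, and only the non-C-state case (same condition value required) is modeled.

```python
def sync_condition_occurred(sync_registers, core_set):
    """Return True when every core in core_set has a pending request
    for the same synchronization condition.

    sync_registers maps a core id to the condition value that core wrote
    with its S bit set, or to None if it has no pending request.
    """
    conditions = [sync_registers.get(core) for core in core_set]
    if any(condition is None for condition in conditions):
        return False  # at least one core has not requested synchronization yet
    return len(set(conditions)) == 1


# Core 1 has not written yet, so cores 0 and 1 are not synchronized.
waiting = sync_condition_occurred({0: 5, 1: None}, core_set=[0, 1])
# Once core 1 writes the same condition, the condition occurs.
done = sync_condition_occurred({0: 5, 1: 5}, core_set=[0, 1])
```

A core outside the specified core set does not affect the result, which mirrors the role of the core set field 228.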
Preferably, when the control unit 104 detects that a synchronization condition has occurred (because the last core 102 has written its synchronization request to its sync register 108), the control unit 104 puts the last core 102 to sleep, for example by turning off the clock signal 122 to the last-writing core 102, and then simultaneously wakes up all the cores 102, for example by turning on the clock signals 122 to all the cores 102. In this way, all the cores 102 are awakened, that is, have their clock signals 122 turned on, on exactly the same clock cycle. This is particularly beneficial for certain operations, such as debugging (see the embodiment of Fig. 5), for which it is advantageous to wake up the cores 102 on exactly the same clock cycle. In one embodiment, the non-core 103 includes a single phase-locked loop (PLL) that generates the clock signals 122 provided to the cores 102. In other embodiments, the microprocessor 100 includes multiple PLLs that generate the clock signals 122 provided to the cores 102.
Control, status, and configuration words
Referring to Fig. 2, a block diagram of a control word 202, a status word 242, and a configuration word 252 is shown. A core 102 writes a value of the control word 202 to its sync register 108 in the control unit 104 of Fig. 1 to make an atomic request to go to sleep and/or to synchronize with all the other cores 102 in the microprocessor 100 or with a specified subset of them. A core 102 reads a value of the status word 242 provided by the status register 106 of the control unit 104 to determine the status information described herein. A core 102 reads a value of the configuration word 252 provided by the configuration register 112 of the control unit 104 and uses the value as described below.
The control word 202 includes a wake events field 204, a sync control field 206, and a power gate (PG) bit 208. The sync control field 206 includes various bits or subfields that control the sleeping of the core 102 and/or its synchronization with other cores 102. The sync control field 206 includes a sleep bit 212, a selective wake (SEL WAKE) bit 214, an S bit 222, a C bit 224, a sync condition or C-state field 226, a core set field 228, a force sync bit 232, a selective sync kill bit 234, and a core disable bit 236. The status word 242 includes a wake events field 244, a lowest common C-state field 246, and an error code field 248. The configuration word 252 includes an enable bit 254 for each core 102 of the microprocessor 100, a local core number field 256, and a die number field 258.
The wake events field 204 of the control word 202 includes multiple bits corresponding to different events. If the core 102 sets a bit in the wake events field 204, the control unit 104 wakes up the core 102 (for example, by turning on the clock signal 122 to the core 102) when the event corresponding to the bit occurs. One wake event occurs when the core 102 has synchronized with all the other cores specified in the core set field 228. In one embodiment, the core set field 228 may specify all the cores 102 in the microprocessor 100; all the cores 102 that share a cache memory (for example, an L2 cache and/or L3 cache) with the instant core 102; all the cores 102 on the same semiconductor die as the instant core 102 (see Fig. 4 for an example of an embodiment of a multi-die, multi-core microprocessor 100); or all the cores 102 on the semiconductor die other than that of the instant core 102. A set of cores 102 that share a cache memory may be referred to as a slice. Other examples of wake events include, but are not limited to, an x86 INTR, SMI, or NMI, assertion or de-assertion of STPCLK, and an inter-core interrupt. When a core 102 is awakened, it may read the wake events field 244 of the status word 242 to determine the active wake events.
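The wake events field behaves like a bitmask: a core is awakened when an event occurs whose bit the core has set. A minimal sketch follows, assuming a hypothetical bit assignment (the description lists the events but not their bit positions).

```python
from enum import IntFlag


class WakeEvent(IntFlag):
    # Bit positions are hypothetical; only the event names come from the text.
    SYNC = 1
    INTR = 2
    SMI = 4
    NMI = 8
    STPCLK_ASSERT = 16
    STPCLK_DEASSERT = 32
    INTER_CORE_INTERRUPT = 64


def should_wake(requested, occurred):
    """True if any event that occurred is one the core asked to be woken by."""
    return bool(requested & occurred)


# The core asked to be woken by a synchronization condition or an NMI.
requested = WakeEvent.SYNC | WakeEvent.NMI
woken_by_nmi = should_wake(requested, WakeEvent.NMI)
woken_by_smi = should_wake(requested, WakeEvent.SMI)
```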
If the core 102 sets the PG bit 208, the control unit 104 removes power from the core 102 (for example, via the power control signal 128) after putting the core 102 to sleep. When the control unit 104 subsequently restores power to the core 102, the control unit 104 clears the PG bit 208. Use of the PG bit 208 is described in more detail below with respect to Figs. 11 through 13.
If the core 102 sets the sleep bit 212 or the selective wake bit 214, the control unit 104 puts the core 102 to sleep after the core 102 writes the sync register 108, using the wake events specified in the wake events field 204. The sleep bit 212 and the selective wake bit 214 are mutually exclusive. The difference between them lies in the action the control unit 104 takes when a synchronization condition occurs. If a core 102 sets the sleep bit 212, the control unit 104 wakes up all the cores 102 when a synchronization condition occurs. Conversely, if a core 102 sets the selective wake bit 214, the control unit 104 wakes up only the core 102 that last wrote the synchronization condition to its sync register 108 when the synchronization condition occurs.
If the core 102 sets neither the sleep bit 212 nor the selective wake bit 214, the control unit 104 does not put the core 102 to sleep and therefore does not wake it up when a synchronization condition occurs. However, the control unit 104 still sets the bit in the wake events field 204 that indicates a synchronization condition is active, so the core 102 can detect that the synchronization condition has occurred. Many of the wake events that may be specified in the wake events field 204 may also be sources of an interrupt signal generated by the control unit 104 to the core 102. However, the microcode of the core 102 may mask the interrupt sources if desired. If so, when the core 102 is awakened, the microcode may read the status register 106 to determine whether a synchronization condition, a wake event, or both have occurred.
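The three request styles (sleep bit 212 set, selective wake bit 214 set, or neither) differ only in what happens when the synchronization condition occurs. The sketch below summarizes that difference. It assumes, for simplicity, that every participating core used the same style; the enum and function names are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    SLEEP = "sleep"          # sleep bit 212 set
    SEL_WAKE = "sel_wake"    # selective wake bit 214 set
    NO_SLEEP = "no_sleep"    # neither bit set


def on_sync_condition(mode, cores, last_writer):
    """Return (cores_to_wake, cores_to_notify) for one synchronization.

    cores_to_notify are cores that never slept and only have the
    synchronization event flagged for them to detect.
    """
    if mode is Mode.SLEEP:
        return set(cores), set()
    if mode is Mode.SEL_WAKE:
        return {last_writer}, set()
    return set(), set(cores)


wake_all = on_sync_condition(Mode.SLEEP, [0, 1, 2], last_writer=2)
wake_last = on_sync_condition(Mode.SEL_WAKE, [0, 1, 2], last_writer=2)
notify_only = on_sync_condition(Mode.NO_SLEEP, [0, 1, 2], last_writer=2)
```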
If the core 102 sets the S bit 222, it requests the control unit 104 to synchronize on a synchronization condition. The synchronization condition is specified by some combination of the C bit 224, the sync condition or C-state field 226, and the core set field 228. If the C bit 224 is set, the C-state field 226 specifies a C-state value; if the C bit 224 is clear, the sync condition field 226 specifies a non-C-state synchronization condition. Preferably, the values of the sync condition or C-state field 226 comprise a bounded set of non-negative integers. In one embodiment, the sync condition or C-state field 226 is four bits. When the C bit 224 is clear, a synchronization condition occurs when all the cores 102 in the specified core set 228 have written their respective sync registers 108 with the S bit 222 set and the same value in the sync condition field 226. In one embodiment, each value of the sync condition field 226 corresponds to a unique synchronization condition, such as the various synchronization conditions of the exemplary embodiments described below. When the C bit 224 is set, a synchronization condition occurs when all the cores 102 in the specified core set 228 have written their respective sync registers 108 with the S bit 222 set, regardless of whether they have written the same value in the C-state field 226. In this case, the control unit 104 posts the lowest value written to the C-state field 226 to the lowest common C-state field 246 of the status register 106, where it may be read by a core 102, for example by the master core 102 at block 908 or by the last-writing/selectively awakened core 102 at block 1108. In one embodiment, if the core 102 specifies a predetermined value in the sync condition field 226 (for example, all bits set), this instructs the control unit 104 to match the instant core 102 with any sync condition field 226 value specified by the other cores 102.
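When the C bit 224 is set, the cores need not agree on a value; the control unit posts the lowest one. A minimal sketch of that rule, assuming a hypothetical dictionary that maps each core to the C-state value it wrote (a core with no pending request is simply absent):

```python
def lowest_common_c_state(requests, core_set):
    """Return the value to post in field 246, or None if the
    synchronization condition has not occurred yet.

    requests maps a core id to the C-state value it wrote with both
    the S bit and the C bit set.
    """
    if not core_set or not all(core in requests for core in core_set):
        return None  # some core in the set has not requested yet
    return min(requests[core] for core in core_set)


# Core 0 asked for C-state 5 and core 1 for C-state 3: the lower value wins.
posted = lowest_common_c_state({0: 5, 1: 3}, core_set=[0, 1])
# Core 1 has not written yet, so nothing is posted.
pending = lowest_common_c_state({0: 5}, core_set=[0, 1])
```

Taking the minimum means the package never enters a deeper state than the least aggressive core requested.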
If a core 102 sets the force sync bit 232, the control unit 104 forces all pending synchronization requests to match immediately.
Normally, if any core 102 is awakened because of a wake event specified in the wake events field 204, the control unit 104 kills all pending synchronization requests by clearing the S bit 222 in the sync registers 108. However, if a core 102 sets the selective sync kill bit 234, the control unit 104 kills the pending synchronization request only of the core 102 that is awakened by the wake event (that is, not by the occurrence of a synchronization condition).
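The effect of the selective sync kill bit 234 can be shown with a small set operation. This is a sketch only; representing the pending requests as a set of core ids is an assumption of the example.

```python
def kill_sync_requests(pending, woken_core, sel_sync_kill):
    """Return the set of cores whose synchronization request is still
    pending after woken_core is awakened by a (non-sync) wake event.

    pending is the set of cores whose S bit is currently set.
    """
    if sel_sync_kill:
        # Only the awakened core's own request is killed.
        return set(pending) - {woken_core}
    # Default behavior: every pending request is killed.
    return set()


after_default = kill_sync_requests({0, 1, 2}, woken_core=1, sel_sync_kill=False)
after_selective = kill_sync_requests({0, 1, 2}, woken_core=1, sel_sync_kill=True)
```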
If two or more cores 102 request synchronization on different sync conditions, the control unit 104 treats this as a deadlock condition. Two or more cores 102 request synchronization on different sync conditions when each writes to its respective sync register 108 a set S bit 222, a clear C bit 224, and a different value in the sync condition field 226. For example, if one core 102 writes a set S bit 222, a clear C bit 224, and a sync condition 226 value of 7 to its sync register 108, and another core 102 writes a set S bit 222, a clear C bit 224, and a sync condition 226 value of 9 to its sync register 108, the control unit 104 treats this as a deadlock condition. Likewise, if one core 102 writes a clear C bit 224 to its sync register 108 and another core 102 writes a set C bit 224 to its sync register 108, the control unit 104 treats this as a deadlock condition. In response to a deadlock condition, the control unit 104 kills all pending synchronization requests and wakes all sleeping cores 102. The control unit 104 also posts a value in the error code field 248 of the status register 106, which the cores 102 can read to determine the cause of the deadlock and take appropriate action. In one embodiment, the error code 248 indicates the sync condition written by each core 102, which enables each core to decide whether to continue with its intended course of action or to defer to another core 102. For example, if one core 102 writes a sync condition to perform a power management operation (for example, executing an x86 MWAIT instruction) and another core 102 writes a sync condition to perform a cache management operation (for example, an x86 WBINVD instruction), the core 102 that intended to execute the MWAIT instruction cancels it and defers to the core 102 executing the WBINVD instruction, because MWAIT is an optional operation whereas WBINVD is a mandatory one. As another example, if one core 102 writes a sync condition to perform a debug operation (for example, dumping debug state) and another core 102 writes a sync condition to perform a cache management operation (for example, a WBINVD instruction), the core 102 that intended to perform the WBINVD defers to the core 102 performing the debug dump by saving its WBINVD state, waiting for the debug dump to occur, and then restoring the WBINVD state and executing the WBINVD instruction.
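The two deadlock rules above can be sketched as a simple check. The `SyncRequest` type and the `is_deadlock` function are hypothetical names introduced for illustration. The sketch also assumes that a set C bit 224 means field 226 carries a C-state, so that cores which all set the C bit never deadlock on differing values.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SyncRequest:
    """Hypothetical model of the fields a core writes to its sync register 108."""
    s_bit: bool      # S bit 222: synchronization requested
    c_bit: bool      # C bit 224: assumed to mark a C-state request
    condition: int   # sync condition or C-state field 226


def is_deadlock(requests: Iterable[SyncRequest]) -> bool:
    """Return True if the pending requests can never match one another."""
    pending = [r for r in requests if r.s_bit]
    # Rule 2: one core writes a clear C bit while another writes a set C bit.
    if len({r.c_bit for r in pending}) > 1:
        return True
    # Rule 1: C bit clear everywhere, but the sync condition values differ.
    plain = [r for r in pending if not r.c_bit]
    return len({r.condition for r in plain}) > 1
```

For instance, requests with sync conditions 7 and 9 (both with the C bit clear) are reported as a deadlock, matching the example in the text.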
The die number field 258 is zero in a single-die embodiment. In a multi-die embodiment (for example, Fig. 4), the die number field 258 indicates on which die the core 102 reading the configuration register 112 resides. For example, in a two-die embodiment, the dies are designated 0 and 1, and the die number field 258 has a value of 0 or 1. In one embodiment, fuses 114 are selectively blown to designate a die as 0 or 1.
The local core number field 256 indicates the core number, local to its die, of the core 102 that is reading the configuration register 112. Preferably, although a single configuration register 112 is shared by all the cores 102, the control unit 104 knows which core 102 is reading the configuration register 112 and provides the correct value in the local core number field 256 according to the reader. This enables the microcode of a core 102 to know its local core number among the other cores 102 on the same die. In one embodiment, a multiplexer in the uncore 103 portion of the microprocessor 100 selects the appropriate value to return in the local core number field 256 of the configuration word 252 based on which core 102 is reading the configuration register 112. In one embodiment, selectively blown fuses 114 operate together with the multiplexer to return the value of the local core number field 256. Preferably, the value of the local core number field 256 is fixed regardless of which cores 102 on the die are enabled, as indicated by the enable bits 254 described below. That is, the value of the local core number field 256 remains fixed even if one or more cores 102 of the die are disabled. In addition, the microcode of a core 102 computes the global core number of the core 102, a configuration-related value whose use is described in more detail below. The global core number indicates the core number of the core 102 global to the microprocessor 100. A core 102 computes its global core number using the value of the die number field 258. For example, in an embodiment in which the microprocessor 100 includes eight cores 102 evenly divided between two dies having die values 0 and 1, the local core number field 256 returns a value of 0, 1, 2, or 3 on each die; a core on the die whose die value is 1 adds 4 to the value returned in the local core number field 256 to compute its global core number.
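The global core number computation in the eight-core, two-die example can be written out as a short function. The function name and the `cores_per_die` parameter are illustrative assumptions; the text only gives the specific case of adding 4 on die 1.

```python
def global_core_number(die_number: int, local_core_number: int,
                       cores_per_die: int = 4) -> int:
    """Combine die number field 258 and local core number field 256.

    Generalizes the example in the text (die 1 adds 4) to
    die_number * cores_per_die + local_core_number.
    """
    if not 0 <= local_core_number < cores_per_die:
        raise ValueError("local core number out of range for this die")
    return die_number * cores_per_die + local_core_number
```

Under this sketch, the core with local number 3 on die 0 has global core number 3, and the core with local number 0 on die 1 has global core number 4.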
Each core 102 of the microprocessor 100 has a corresponding enable bit 254 in the configuration word 252 that indicates whether the core 102 is enabled or disabled. In Fig. 2, the enable bits 254 are denoted individually as enable bit 254-x, where x is the global core number of the corresponding core 102. The example of Fig. 2 assumes eight cores 102 in the microprocessor 100. In the examples of Fig. 2 and Fig. 4, enable bit 254-0 indicates whether the core 102 with global core number 0 (for example, core A) is enabled, enable bit 254-1 indicates whether the core 102 with global core number 1 (for example, core B) is enabled, enable bit 254-2 indicates whether the core 102 with global core number 2 (for example, core C) is enabled, and so on. Therefore, by knowing its global core number, the microcode of a core 102 can determine from the configuration word 252 which cores 102 of the microprocessor 100 are disabled and which are enabled. Preferably, an enable bit 254 is set if its core 102 is enabled and clear if the core 102 is disabled. When the microprocessor 100 is reset, hardware automatically populates the enable bits 254. Preferably, the hardware populates the enable bits 254 based on fuses 114 that are selectively blown when the microprocessor 100 is manufactured to indicate whether a given core 102 is enabled or disabled. For example, if a given core 102 is tested and found to be faulty, a fuse 114 may be blown to clear the enable bit 254 of that core 102. In one embodiment, a blown fuse 114 that indicates a core 102 is disabled also prevents the clock signal from being provided to the disabled core 102. Each core 102 can also clear its own enable bit 254 by writing the core disable bit 236 to its sync register 108, as described in more detail below with respect to Figs. 14 to 16. Preferably, clearing the enable bit 254 does not prevent the core 102 from executing instructions but does update the configuration register 112; the core 102 must set a different bit (not shown) to prevent itself from executing instructions, for example, to have its power removed and/or its clock signal turned off. In a multi-die configuration of the microprocessor 100 (for example, Fig. 4), the configuration register 112 includes an enable bit 254 for every core 102 in the microprocessor 100, that is, not only for the cores 102 of the local die but also for the cores 102 of the remote die. Preferably, in a multi-die configuration of the microprocessor 100, when a core 102 writes to its sync register 108, the value written is propagated to the corresponding shadow sync register 108 of that core 102 on the other die (see Fig. 4); if the core disable bit 236 is set, this causes an update to be propagated to the remote die's configuration register 112, so that the configuration registers 112 of the local and remote dies hold the same value.
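How microcode might interpret the enable bits can be sketched as follows. The bit layout is an assumption made only for illustration: enable bit 254-x is placed at bit position x of an integer standing in for the configuration word 252, and the function names are hypothetical.

```python
from typing import List


def enabled_cores(config_word: int, num_cores: int = 8) -> List[int]:
    """List the global core numbers whose enable bit 254-x is set.

    Assumes (for illustration) that enable bit 254-x sits at bit position x.
    """
    return [n for n in range(num_cores) if (config_word >> n) & 1]


def clear_enable_bit(config_word: int, global_core_number: int) -> int:
    """Return the configuration word with one core's enable bit cleared."""
    return config_word & ~(1 << global_core_number)
```

Starting from a word with all eight bits set, clearing the bit for global core number 2 leaves cores 0, 1, and 3 through 7 reported as enabled.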
In one embodiment, the configuration register 112 cannot be written directly by a core 102. However, a write by a core 102 to the configuration register 112 causes the local enable bit 254 values to be propagated to the configuration register 112 of the other die in a multi-die microprocessor 100, for example, as described with respect to block 1406 of Fig. 14.
Control unit
Referring to Fig. 3, a flowchart describing the operation of the control unit 104 is shown. Flow begins at block 302. In block 302, a core 102 writes a synchronization request, that is, writes a control word 202 to its sync register 108, and the synchronization request is received by the control unit 104. In a multi-die configuration of the microprocessor 100 (see, for example, Fig. 4), when a shadow sync register 108 of a control unit 104 receives a propagated sync register 108 value transmitted from the other die 406, the control unit 104 operates effectively according to Fig. 3, that is, as if it had received a synchronization request (block 302) from one of its local cores 102, except that the control unit 104 puts cores 102 to sleep (for example, block 314), wakes them (blocks 306, 328, or 336), interrupts them (block 334), or blocks their wake events (block 326) only for cores 102 on its local die 406, and populates only its local status register 106 (block 318). Flow proceeds to block 304.
In block 304, the control unit 104 examines the sync condition written in block 302 to determine whether a deadlock condition has occurred, as described above with respect to Fig. 2. If so, flow proceeds to block 306; otherwise, flow proceeds to decision block 312.
In block 305, the control unit 104 detects the occurrence of a wake event specified in the wake events field 204 of one of the sync registers 108 (other than the occurrence of a sync condition, which is detected in block 316). As described below with respect to block 326, the control unit 104 may automatically block wake events. The control unit 104 may detect the wake event as an event asynchronous to the writing of a synchronization request in block 302. Flow also proceeds from block 305 to block 306.
In block 306, the control unit 104 populates the status register 106, kills the pending synchronization requests, and wakes any sleeping cores 102. As described above, waking a sleeping core 102 may include restoring its power. The cores 102 can then read the status register 106, in particular the error code 248, to determine the cause of the deadlock and handle it according to the relative priorities of the conflicting synchronization requests, as described above. In addition, the control unit 104 kills all pending synchronization requests (for example, clears the S bit 222 in the sync register 108 of each core 102), unless block 306 was reached from block 305 and the selective sync kill bit 234 is set, in which case the control unit 104 kills only the pending synchronization request of the core 102 woken by the wake event. If block 306 was reached from block 305, the core 102 can read the wake events field 244 to determine which wake event occurred. In addition, if the wake event is an unmasked interrupt source, the control unit 104 generates an interrupt request to the core 102 via the interrupt signal 124. Flow ends at block 306.
In decision block 312, the control unit 104 determines whether the sleep bit 212 or the selective wake bit 214 is set. If so, flow proceeds to block 314; otherwise, flow proceeds to decision block 316.
In block 314, the control unit 104 puts the core 102 to sleep. As described above, putting a core 102 to sleep may include removing its power. In one embodiment, as an optimization, even if the PG bit 208 is set, the control unit 104 does not remove power from the core 102 in block 314 if this is the last-writing core 102 (that is, the write causes the sync condition to occur) and the selective wake bit 214 is set, because the control unit 104 will immediately wake the last-writing core 102 back up in block 328. In one embodiment, the control unit 104 comprises synchronization logic and sleep logic that are separate from, but in communication with, each other; furthermore, the synchronization logic and the sleep logic each include a portion of the sync register 108. Advantageously, the write to the synchronization logic portion of the sync register 108 and the write to the sleep logic portion of the sync register 108 are atomic, that is, indivisible. In other words, if the write to one portion occurs, the writes to both the synchronization logic portion and the sleep logic portion are guaranteed to occur. Preferably, the pipeline of the core 102 stalls, not allowing any further writes to occur, until the writes to both portions of the sync register 108 are guaranteed to have occurred. An advantage of writing a synchronization request and immediately going to sleep is that the core 102 (for example, its microcode) need not keep running to determine whether the sync condition has occurred. This is beneficial because it saves power and does not consume other resources, such as bus and/or memory bandwidth. Notably, in order to go to sleep without requesting synchronization with the other cores 102 (for example, blocks 924 and 1124), a core 102 may write to its sync register 108 a clear S bit 222 and a set sleep bit 212, referred to herein as a sleep request; in this case, the control unit 104 wakes the core 102 (for example, block 306) if an unmasked wake event specified in the wake events field 204 occurs (for example, block 305), but does not look for the occurrence of a sync condition for this core 102 (for example, block 316). Flow proceeds to decision block 316.
In decision block 316, the control unit 104 determines whether a sync condition has occurred. If so, flow proceeds to block 318. As described above, a sync condition can occur only when the S bit 222 is set. In one embodiment, the control unit 104 uses the enable bits 254 of Fig. 2, which indicate which cores 102 of the microprocessor 100 are enabled and which are disabled. The control unit 104 considers only the enabled cores 102 when determining whether a sync condition has occurred. A core 102 may be disabled because it was tested and found defective at production time, in which case a fuse is blown to make the core 102 inoperable and to indicate that the core 102 is disabled. A core 102 may be disabled because software requested that the core 102 be disabled (for example, see Fig. 15). For example, at a user's request, the BIOS writes a model specific register (MSR) to request that the core 102 be disabled; in response, the core 102 disables itself (for example, via the core disable bit 236) and notifies the other cores 102 to read the configuration register 112, from which the other cores 102 determine that the core 102 is disabled. A core 102 may also be disabled via a microcode patch (for example, see Fig. 14), which may be generated by blowing fuses 114 and/or loaded from system memory (such as a FLASH memory). In addition to determining whether a sync condition has occurred, the control unit 104 examines the force-sync bit 232. If it is set, flow proceeds to block 318. If the force-sync bit 232 is clear and a sync condition has not occurred, flow ends at block 316.
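The decision made in block 316 can be modeled as follows. The encoding is an assumption made for illustration: each core's pending request is represented by the sync condition value it wrote, or `None` if its S bit 222 is clear, and the function name is hypothetical.

```python
from typing import Dict, Optional, Set


def sync_condition_occurred(requests: Dict[int, Optional[int]],
                            enabled: Set[int],
                            force_sync: bool = False) -> bool:
    """Model decision block 316 under a hypothetical request encoding."""
    if force_sync:
        # Force-sync bit 232 set: flow proceeds to block 318 regardless.
        return True
    # Disabled cores are ignored; every enabled core must have a pending request.
    values = [requests.get(core) for core in enabled]
    if not values or any(v is None for v in values):
        return False
    # All enabled cores must have requested the same sync condition.
    return len(set(values)) == 1
```

In this sketch a disabled core that never writes a request does not hold up the others, which mirrors the statement that only enabled cores are considered.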
In block 318, the control unit 104 populates the status register 106. Specifically, if the sync condition that occurred is that all the cores 102 requested a C-state synchronization, the control unit 104 populates the lowest common C-state field 246, as described above. Flow proceeds to decision block 322.
In decision block 322, the control unit 104 examines the selective wake (SEL WAKE) bit 214. If the bit is set, flow proceeds to block 326; otherwise, flow proceeds to decision block 332.
In block 326, the control unit 104 blocks all wake events for all the cores 102 other than the instant core, where the instant core is the core 102 that was last to write its synchronization request to its sync register 108 in block 302, thereby causing the sync condition to occur. In one embodiment, the logic of the control unit 104 simply Boolean ANDs the wake condition with a signal that is false if wake events are to be blocked and true otherwise. Uses of blocking all wake events for all the other cores are described in more detail below, particularly with respect to Figs. 11 to 13. Flow proceeds to block 328.
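The Boolean AND gating mentioned for block 326 can be sketched in a few lines. The function name and the dictionary of raw wake conditions are illustrative assumptions, not part of the original description.

```python
from typing import Dict


def gate_wake_events(raw_wake: Dict[int, bool], instant_core: int) -> Dict[int, bool]:
    """Model block 326: block wake events for every core except the instant core."""
    gated = {}
    for core, wake in raw_wake.items():
        allow = (core == instant_core)   # False means wake events are blocked
        gated[core] = wake and allow     # Boolean AND of wake condition and allow signal
    return gated
```

If cores 0 and 1 both have a wake condition asserted and core 1 is the instant core, only core 1's wake condition passes through the gate.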
In block 328, the control unit 104 wakes only the instant core 102 and not the other cores that requested the synchronization. In addition, the control unit 104 kills the pending synchronization request of the instant core 102 by clearing its S bit 222, but does not kill the pending synchronization requests of the other cores 102, that is, it leaves the S bits 222 of the other cores 102 set. Advantageously, therefore, if the instant core 102 writes another synchronization request after it is woken, it will again cause a sync condition to occur (assuming the synchronization requests of the other cores 102 have not been killed), an example of which is described below with respect to Figs. 12 and 13. Flow ends at block 328.
In decision block 332, the control unit 104 examines the sleep bit 212. If the bit is set, flow proceeds to block 336; otherwise, flow proceeds to block 334.
In block 334, the control unit 104 sends an interrupt signal (a sync interrupt) to all the cores 102. The timing diagram of Fig. 21 illustrates an example of a non-sleeping synchronization request. Each core 102 can read the wake events field 244 and detect that the occurrence of a sync condition was the cause of the interrupt. Flow reaches block 334 in the case where the cores 102 elected not to go to sleep when they wrote their synchronization requests. Although this case does not give the cores 102 the same benefits as going to sleep (for example, simultaneous wake-up), it has the potential advantage of allowing the cores 102 to continue processing instructions while waiting for the last core 102 to write its synchronization request, in situations where simultaneous wake-up is not needed. Flow ends at block 334.
In block 336, the control unit 104 wakes all the cores 102 simultaneously. In one embodiment, the control unit 104 turns on the clock signals 122 to all the cores 102 on precisely the same clock cycle. In another embodiment, the control unit 104 turns on the clock signals 122 to the cores 102 in a staggered fashion. That is, the control unit 104 introduces a delay of a predetermined number of clock cycles (for example, on the order of ten or one hundred clocks) between turning on the clock signal 122 to each core. Nevertheless, staggered turning on of the clock signals 122 is considered simultaneous in the present invention. Staggering the turning on of the clock signals 122 is beneficial for reducing the likelihood of a power consumption spike when all the cores 102 are woken. In yet another embodiment, to reduce the likelihood of a power consumption spike, the control unit 104 turns on the clock signals 122 to all the cores 102 on the same clock cycle, but does so in a stuttering, or throttled, fashion by initially providing the clock signals 122 at a reduced frequency and ramping up the frequency to the target frequency. In one embodiment, synchronization requests are issued as a result of the execution of microcode instructions of the core 102, and the microcode is designed such that, for at least some sync condition values, the location in the microcode that specifies the sync condition value is unique. For example, only one place in the microcode contains a sync x request, only one place in the microcode contains a sync y request, and so on. In these cases, simultaneous wake-up is beneficial because all the cores 102 are woken at the same place, which may enable microcode designers to design more efficient and bug-free code. Furthermore, simultaneous wake-up may be particularly beneficial for debugging purposes when attempting to reproduce and fix bugs that appear because of the interaction of multiple cores but do not appear when a single core is running. Figs. 5 and 6 show such an example. In addition, the control unit 104 kills all pending synchronization requests (for example, clears the S bit 222 in the sync register 108 of each core 102). Flow ends at block 336.
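The difference between the same-cycle and staggered wake-up embodiments can be made concrete with a small scheduling sketch. The function name and the `stagger` parameter are assumptions introduced for illustration. The throttled-frequency embodiment is not modeled here.

```python
from typing import Dict, Sequence


def clock_enable_cycles(cores: Sequence[int], stagger: int = 0) -> Dict[int, int]:
    """Return the cycle, relative to wake-up, at which each core's clock 122 turns on.

    stagger is the predetermined number of clock cycles inserted between cores;
    a value of 0 models turning on all clocks on the same cycle.
    """
    if stagger < 0:
        raise ValueError("stagger must be non-negative")
    return {core: index * stagger for index, core in enumerate(cores)}
```

With a stagger of ten cycles, three cores would have their clocks turned on at cycles 0, 10, and 20, which spreads out the current demand compared with enabling all three at cycle 0.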
One advantage of the embodiments described herein is that they may significantly reduce the amount of microcode in a microprocessor because, rather than looping or performing other checks to synchronize operations among multiple cores, the microcode in each core can simply write a synchronization request, go to sleep, and know that all the cores will wake at the same place in the microcode. Microcode uses of the synchronization request mechanism are described below.
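The flow of Fig. 3 described in this section can be summarized in a compact model. This is a simplified sketch, not the actual control logic: the function name, the keyword arguments, and the action strings are all hypothetical, and the asynchronous wake-event path of block 305 is omitted.

```python
from typing import List


def control_unit_actions(*, deadlock: bool, sleep: bool, sel_wake: bool,
                         sync_occurred: bool, force_sync: bool = False) -> List[str]:
    """Return the actions taken after a sync request is written (block 302)."""
    if deadlock:                                  # block 304 -> block 306
        return ["post status", "kill all pending requests", "wake sleeping cores"]
    actions: List[str] = []
    if sleep or sel_wake:                         # decision block 312 -> block 314
        actions.append("put core to sleep")
    if not (sync_occurred or force_sync):         # decision block 316: flow ends
        return actions
    actions.append("post status")                 # block 318
    if sel_wake:                                  # decision block 322 -> blocks 326, 328
        actions += ["block wake events of other cores",
                    "wake instant core only",
                    "kill instant core's request"]
    elif sleep:                                   # decision block 332 -> block 336
        actions += ["wake all cores simultaneously", "kill all pending requests"]
    else:                                         # block 334
        actions.append("interrupt all cores")
    return actions
```

The model makes the three outcomes of a completed synchronization easy to compare: selective wake of the instant core, simultaneous wake of all cores, or a sync interrupt to cores that chose not to sleep.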
Multi-die microprocessor
Referring to Fig. 4, a block diagram showing another embodiment of the microprocessor 100 is shown. The microprocessor 100 of Fig. 4 is similar in many respects to the microprocessor 100 of Fig. 1 in that it is a multi-core processor and its cores 102 are similar. However, the embodiment of Fig. 4 is a multi-die configuration. That is, the microprocessor 100 includes multiple semiconductor dies 406 that are mounted in a common package and communicate with one another via an inter-die bus 404. The embodiment of Fig. 4 includes two dies 406, denoted die A 406A and die B 406B, coupled by the inter-die bus 404. In addition, each die 406 includes an inter-die bus unit 402 that interfaces the respective die 406 to the inter-die bus 404. Furthermore, each die 406 includes, in its uncore 103, a control unit 104 coupled to its respective cores 102 and inter-die bus unit 402. In the embodiment of Fig. 4, die A 406A includes four cores 102 (core A 102A, core B 102B, core C 102C, and core D 102D) coupled to a control unit A 104A, which is coupled to an inter-die bus unit A 402A; similarly, die B 406B includes four cores 102 (core E 102E, core F 102F, core G 102G, and core H 102H) coupled to a control unit B 104B, which is coupled to an inter-die bus unit B 402B. Finally, each control unit 104 includes not only a sync register 108 for each core on its own die 406, but also a sync register 108 for each core on the other die 406, the latter being the shadow registers shown in Fig. 4. Therefore, each control unit in the embodiment of Fig. 4 includes eight sync registers 108, denoted 108A, 108B, 108C, 108D, 108E, 108F, 108G, and 108H. In control unit A 104A, sync registers 108E, 108F, 108G, and 108H are shadow registers, whereas in control unit B 104B, sync registers 108A, 108B, 108C, and 108D are shadow registers.
When a core 102 writes a value to its sync register 108, the control unit 104 on the die 406 of the core 102 writes the value, via the inter-die bus unit 402 and the inter-die bus 404, to the corresponding shadow register 108 on the other die 406. In addition, if the core disable bit 236 is set in the value propagated to the shadow sync register 108, the control unit 104 also updates the corresponding enable bit 254 in the configuration register 112. In this way, the occurrence of a sync condition (including a trans-die sync condition) can be detected even when the core configuration of the microprocessor 100 changes dynamically (for example, Figs. 14 to 16). In one embodiment, the inter-die bus 404 is a relatively low-speed bus, the propagation may take a predetermined number of core clock cycles on the order of 100, and each control unit 104 includes a state machine that takes a predetermined amount of time to detect the occurrence of a sync condition and turn on the clock signals to all the cores 102 on its respective die 406. Preferably, after the control unit 104 on the local die 406 (that is, the die 406 that includes the writing core 102) begins writing the value to the other die 406 (for example, is granted the inter-die bus 404), it is configured to delay updating the local sync register until a predetermined amount of time has elapsed (for example, the sum of the propagation time and the time for the state machine to detect the occurrence of a sync condition). In this manner, the control units 104 on both dies detect the occurrence of a sync condition at the same time and turn on the clock signals to all the cores 102 on both dies 406 at the same time. This may be particularly beneficial for debugging purposes when attempting to reproduce and fix bugs that appear only because of the interaction of multiple cores and do not appear when a single core is running. Figs. 5 and 6 describe embodiments that may take advantage of this capability.
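The mirroring of sync register writes into the shadow registers of the other die can be sketched with two linked objects. The class and method names are hypothetical, and the sketch models only the data movement: the inter-die bus latency and the delayed local update described above are deliberately left out.

```python
from typing import List, Optional


class ControlUnitModel:
    """Hypothetical model of one die's control unit 104 with eight sync registers 108."""

    def __init__(self, num_cores: int = 8) -> None:
        self.sync_regs: List[int] = [0] * num_cores
        self.enable_bits: List[bool] = [True] * num_cores
        self.peer: Optional["ControlUnitModel"] = None

    def write_sync(self, core: int, value: int, core_disable: bool = False,
                   propagated: bool = False) -> None:
        self.sync_regs[core] = value
        if core_disable:
            # Core disable bit 236 set: clear the matching enable bit 254.
            self.enable_bits[core] = False
        if self.peer is not None and not propagated:
            # Mirror the write into the shadow register on the other die.
            self.peer.write_sync(core, value, core_disable, propagated=True)
```

A usage example: after linking two models as peers, a write to register 2 on die A appears in register 2 on die B, and a write with the core disable bit set clears the same enable bit on both dies, so both configuration registers stay in agreement.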
Debugging operations
The cores 102 of the microprocessor 100 are configured to perform individual debug operations, such as breakpoints on instruction executions and data accesses. In addition, the microprocessor 100 is configured to perform debug operations that are trans-core, that is, that involve more than one core 102 of the microprocessor 100.
Referring to Fig. 5, a flowchart showing operation of the microprocessor 100 to dump debug information is shown. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description to collectively dump the state of the microprocessor 100. More specifically, Fig. 5 describes the operation of one core that receives the request to dump debug information, whose flow begins at block 502, and the operation of the other cores 102, whose flow begins at block 532.
In block 502, one of the cores 102 receives a request to dump debug information. Preferably, the debug information includes the state of the core 102 or a subset thereof. Preferably, the debug information is dumped to system memory or to an external bus that can be monitored by debug equipment, such as a logic analyzer. In response to the request, the core 102 sends a debug dump message to the other cores 102 and sends them an inter-core interrupt. Preferably, the core 102 traps to microcode in response to the request to dump debug information (in block 502) or in response to the interrupt (in block 532) and remains in microcode, during which time interrupts are disabled (that is, the microcode does not allow itself to be interrupted), until block 528. In one embodiment, the core 102 takes interrupts only when it is asleep and at architectural instruction boundaries. In one embodiment, the various inter-core messages described herein (such as the message of block 502 and others, such as those of blocks 702, 1502, 2606, and 3206) are sent and received via the sync condition or C-state field 226 of the sync register 108 control word. In other embodiments, the inter-core messages are sent and received via the uncore dedicated random access memory 116. Flow proceeds from block 502 to block 504.
In block 532, one of the other cores 102 (that is, a core 102 other than the core 102 that received the debug dump request in block 502) is interrupted and receives the debug dump message as a result of the inter-core interrupt and message sent in block 502. As noted above, although the flow of block 532 is described from the perspective of a single core 102, each of the other cores 102 (that is, each core 102 other than the one of block 502) is interrupted and receives the message in block 532 and performs the steps of blocks 504 to 528. Flow proceeds from block 532 to block 504.
In block 504, the core 102 writes a synchronization request with sync condition 1 (denoted SYNC 1 in Fig. 5) to its sync register 108. As a result, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 506.
In block 506, the core 102 is woken by the control unit 104 when all the cores have written SYNC 1. Flow proceeds to block 508.
In block 508, the core 102 dumps its state to memory. Flow proceeds to block 514.
In block 514, the core 102 writes a SYNC 2, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 516.
In block 516, the core 102 is woken by the control unit 104 when all the cores have written SYNC 2. Flow proceeds to block 518.
In block 518, the core 102 saves the memory address at which it dumped the debug information in block 508 and sets a flag, which persist through a reset, and then resets itself. The reset microcode of the core 102 detects the flag and reloads the state of the core from the saved memory address. Flow proceeds to block 524.
In block 524, the core 102 writes a SYNC 3, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 526.
In block 526, the core 102 is woken by the control unit 104 when all the cores have written SYNC 3. Flow proceeds to block 528.
In block 528, the core 102 comes out of reset and, based on the state reloaded in block 518, begins fetching architectural (for example, x86) instructions. Flow ends at block 528.
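The three-phase rendezvous of blocks 504 to 528 resembles a reusable barrier. The sketch below is a software analogy only, using Python threads and `threading.Barrier` in place of the hardware sync registers; it does not provide the clock-level simultaneity of the control unit 104, and the names `PHASES` and `run_debug_dump` are hypothetical.

```python
import threading
from typing import List, Tuple

# Work done after each of SYNC 1, SYNC 2, and SYNC 3 respectively.
PHASES = ("dump state", "reset and reload state", "resume fetching")


def run_debug_dump(num_cores: int = 3) -> List[Tuple[str, int]]:
    """Run the three-phase flow with one thread per simulated core."""
    barrier = threading.Barrier(num_cores)
    log: List[Tuple[str, int]] = []
    log_lock = threading.Lock()

    def core(core_id: int) -> None:
        for phase in PHASES:
            # Analogous to writing SYNC n and sleeping until all cores have written it.
            barrier.wait()
            with log_lock:
                log.append((phase, core_id))

    threads = [threading.Thread(target=core, args=(i,)) for i in range(num_cores)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log
```

Because no thread can pass the second barrier until every thread has finished its first phase, all state dumps are logged before any core resets, and all resets before any core resumes, which is the ordering the flow of Fig. 5 relies on.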
Referring to Fig. 6, a timing diagram showing an example of operation of the microprocessor 100 according to the flowchart of Fig. 5 is shown. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1, and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102. In the timing diagram, the sequence of events proceeds as described below.
Core 0 receives a debugging dump request, and transmits a debugging dump information and interrupting information to 2 (each party of core 1 and core
Block 502) in response.The core 0 is then written to a SYNC 1, and enters sleep state (each square 504).
Each core 1 and core 2 are finally by being interrupted and reading its information (each square 532) in its current task.As sound
It answers, each core 1 and core 2 are written a SYNC 1 and enter sleep state (each square 504).As shown, each core write-in
The time of SYNC 1 may be different, for example, since the instruction is carrying out when the interruption is established.
When all cores have been written into SYNC 1, control unit 104 wakes up all cores (each square 506) simultaneously.Each core
Then its state of dump is written a SYNC 2 and enters sleep state (each square 514) to memory (each square 508).
Need the time quantum of the dump state may be different;Therefore, may be different in the time of each core write-in SYNC 2, as shown in the figure.
When all the cores have written SYNC 2, the control unit 104 wakes all of them simultaneously (per block 516). Each core then resets itself and reloads its state from memory (per block 518), writes a SYNC 3 and goes to sleep (per block 524). As shown, the time needed to reset and reload the state may vary; consequently, the cores may write SYNC 3 at different times.
When all the cores have written SYNC 3, the control unit 104 wakes all of them simultaneously (per block 526). Each core then begins fetching architectural instructions at the point where it was interrupted (per block 528).
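The three-phase sequence of Fig. 6 can be sketched in software as a barrier that releases every core only when the last one has written the same sync request. The following Python model is a minimal illustration only: the class `ControlUnit`, the method `write_sync`, and the arrival orders are hypothetical names and values, not part of the described hardware.

```python
from dataclasses import dataclass, field


@dataclass
class ControlUnit:
    """Toy model of control unit 104 (all names here are illustrative)."""
    num_cores: int
    pending: dict = field(default_factory=dict)  # core id -> SYNC number written
    log: list = field(default_factory=list)      # ordered trace of events

    def write_sync(self, core: int, sync_no: int) -> bool:
        """Record that a core wrote SYNC <sync_no> and went to sleep.

        Returns True when this write completes the sync condition, i.e. every
        core has written the same SYNC number and all cores are woken together.
        """
        self.pending[core] = sync_no
        self.log.append(f"core {core}: write SYNC {sync_no}, sleep")
        everyone_wrote = len(self.pending) == self.num_cores
        if everyone_wrote and set(self.pending.values()) == {sync_no}:
            self.pending.clear()
            self.log.append(f"control unit: wake all cores (SYNC {sync_no})")
            return True
        return False


def debug_dump(arrival_orders):
    """Run SYNC 1, 2 and 3; each entry lists the cores in arrival order."""
    cu = ControlUnit(num_cores=3)
    for sync_no, order in zip((1, 2, 3), arrival_orders):
        woke = [cu.write_sync(core, sync_no) for core in order]
        if woke != [False, False, True]:
            raise RuntimeError("only the last writer should complete the condition")
    return cu.log


# Cores arrive in a different (assumed) order in each phase.
trace = debug_dump([(0, 2, 1), (1, 0, 2), (2, 1, 0)])
```

Whatever the arrival order, the wake-up entry appears in the trace only after the third write of each phase, which mirrors the simultaneous wake-up the timing diagram shows.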
The conventional solution for synchronizing operations among multiple processors is to use software semaphores. However, a drawback of the conventional solution is that it cannot provide clock-level synchronization. An advantage of the embodiments described herein is that the control unit 104 can turn on the clock signals 122 to all the cores 102 simultaneously.
Using the method described above, an engineer debugging the microprocessor 100 may configure one of the cores 102 to generate checkpoints periodically, that is, to generate debug dump requests, for example after a predetermined number of instructions have been executed. While the microprocessor 100 runs, the engineer captures all activity on the external bus of the microprocessor 100 in a log file. The portion of the log near the time a bug is observed can then be provided to a software simulator that simulates the microprocessor 100 to help the engineer debug. The simulator simulates the execution of instructions by each core 102 and uses the log information to simulate the transactions on the external bus of the microprocessor 100. In one embodiment, the simulators for all the cores 102 are started simultaneously from a reset point. It is therefore highly advantageous that all the cores 102 of the microprocessor 100 actually come out of reset at the same time (e.g., after SYNC 2). In addition, because a core 102 waits to dump its state until all the other cores 102 have stopped their current tasks (e.g., after SYNC 1), the state dump by one core 102 does not interfere with the code and/or hardware being debugged on the other cores 102 (e.g., through shared memory bus or cache interactions), which may increase the likelihood of reproducing the bug and determining its cause. Similarly, because a core 102 waits to begin fetching architectural instructions until all the cores 102 have finished reloading their state (e.g., after SYNC 3), the state reload by one core 102 does not interfere with the code and/or hardware being debugged on the other cores 102, which may increase the likelihood of reproducing the bug and determining its cause.
These benefits provide an advantage over prior methods, such as that of U.S. Patent No. 8,370,684, which is hereby incorporated by reference in its entirety for all purposes and which does not enjoy the benefits obtainable from the synchronization requests made by the cores.
Cache control operations
The cores 102 of the microprocessor 100 are configured to perform individual cache control operations, such as operations on local cache memories, i.e., caches that are not shared by two or more cores 102. In addition, the microprocessor 100 is configured to perform cross-core (trans-core) cache control operations, i.e., operations that involve more than one core 102 of the microprocessor 100, for example because they involve a shared cache memory 119.
Referring to Figs. 7A~7B, a flowchart shows the operation of the microprocessor 100 to perform a cross-core cache control operation. The embodiment of Figs. 7A~7B describes how the microprocessor 100 executes an x86 architecture Write Back and Invalidate Cache (WBINVD) instruction. A WBINVD instruction instructs the core 102 executing it to write back all modified lines in the cache memories of the microprocessor 100 to system memory and to invalidate, or flush, those cache memories. The WBINVD instruction also instructs the core 102 to issue special bus cycles that direct any cache memories external to the microprocessor 100 to write back their modified data and invalidate it. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to this description so that together the cores write back the modified cache lines and invalidate the cache memories of the microprocessor 100. More specifically, Figs. 7A~7B describe the operation in which one core encounters the WBINVD instruction, whose flow begins at block 702, while the flow of the other cores 102 begins at block 752.
In block 702, one of the cores 102 encounters a WBINVD instruction. In response, the core 102 sends a WBINVD instruction message to the other cores 102 and sends them an inter-core interrupt. Preferably, until flow proceeds to block 748/749, the core 102 remains in microcode with interrupts disabled (i.e., the microcode does not allow itself to be interrupted), whether it entered the microcode in response to the WBINVD instruction (at block 702) or in response to the interrupt (at block 752). Flow proceeds from block 702 to block 704.
In block 752, one of the other cores 102 (i.e., a core other than the core 102 that encountered the WBINVD instruction at block 702) is interrupted by the inter-core interrupt sent at block 702 and receives the WBINVD instruction message. As noted above, although the flow at block 752 is described from the perspective of a single core 102, each of the other cores 102 (i.e., every core 102 other than the one at block 702) is interrupted and receives the message at block 752 and performs the steps of blocks 704 through 749. Flow proceeds from block 752 to block 704.
In block 704, the core 102 writes a synchronization request with sync condition 4 (denoted SYNC 4 in Figs. 7A~7B) to its sync register 108. As a result, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 706.
In block 706, the control unit 104 wakes the core 102 once all the cores 102 have written a SYNC 4. Flow proceeds to block 708.
In block 708, the core 102 writes back and invalidates its local cache memories, for example the level-1 (L1) cache memories that the core 102 does not share with other cores 102. Flow proceeds to block 714.
In block 714, the core 102 writes a SYNC 5, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 716.
In block 716, the control unit 104 wakes the core 102 once all the cores 102 have written a SYNC 5. Flow proceeds to decision block 717.
In decision block 717, the core 102 determines whether it is the core 102 that encountered the WBINVD instruction at block 702 (as opposed to a core 102 that received the WBINVD instruction message at block 752). If so, flow proceeds to block 718; otherwise, flow proceeds to block 724.
In block 718, the core 102 writes back and invalidates the shared cache memory 119. In one embodiment, the microprocessor 100 comprises multiple dies in which the cores 102 of each die, but not all the cores 102 of the microprocessor 100, share a cache memory, as described above. In that embodiment, intermediate operations (not shown) similar to blocks 717 through 726 are performed, in which one of the cores 102 of each die writes back and invalidates that die's shared cache memory while the other core(s) of the die go back to sleep, similar to block 724, to wait until the shared cache memory has been invalidated. Flow proceeds to block 724.
In block 724, the core 102 writes a SYNC 6, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 726.
In block 726, the control unit 104 wakes the core 102 once all the cores 102 have written a SYNC 6. Flow proceeds to decision block 727.
In decision block 727, the core 102 determines whether it is the core 102 that encountered the WBINVD instruction at block 702 (as opposed to a core 102 that received the WBINVD instruction message at block 752). If so, flow proceeds to block 728; otherwise, flow proceeds to block 744.
In block 728, the core 102 issues the special bus cycles that cause the external caches to be written back and invalidated. Flow proceeds to block 744.
In block 744, the core 102 writes a SYNC 13, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 746.
In block 746, the control unit 104 wakes the core 102 once all the cores 102 have written a SYNC 13. Flow proceeds to decision block 747.
In decision block 747, the core 102 determines whether it is the core 102 that encountered the WBINVD instruction at block 702 (as opposed to a core 102 that received the WBINVD instruction message at block 752). If so, flow proceeds to block 748; otherwise, flow proceeds to block 749.
In block 748, the core 102 completes the WBINVD instruction, which includes retiring the WBINVD instruction and may include relinquishing ownership of a hardware semaphore (see Fig. 20). Flow ends at block 748.
In block 749, the core 102 resumes the task it was performing before it was interrupted at block 752. Flow ends at block 749.
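The per-core flow of Figs. 7A~7B can be summarized as two step lists that share the same sync points. The sketch below is illustrative only; the function names `wbinvd_steps` and `sync_points` and the step strings are hypothetical, and the model records the order of the steps rather than performing any cache operation.

```python
def wbinvd_steps(is_initiator: bool) -> list:
    """List the steps one core performs, in the order of Figs. 7A~7B.

    is_initiator is True for the core that encountered WBINVD (block 702)
    and False for a core that was interrupted (block 752).
    """
    steps = []
    if is_initiator:
        steps.append("block 702: send WBINVD message and inter-core interrupt")
    else:
        steps.append("block 752: take interrupt, receive WBINVD message")
    steps.append("SYNC 4")
    steps.append("block 708: write back and invalidate local caches")
    steps.append("SYNC 5")
    if is_initiator:
        steps.append("block 718: write back and invalidate shared cache 119")
    steps.append("SYNC 6")
    if is_initiator:
        steps.append("block 728: issue special bus cycles for external caches")
    steps.append("SYNC 13")
    steps.append("block 748: retire WBINVD" if is_initiator
                 else "block 749: resume interrupted task")
    return steps


def sync_points(steps):
    """Keep only the sync requests, which every core must write."""
    return [s for s in steps if s.startswith("SYNC")]


initiator = wbinvd_steps(True)
other = wbinvd_steps(False)
```

Both lists contain SYNC 4, SYNC 5, SYNC 6 and SYNC 13 in the same order; only the work between the sync points differs, which is why the cores may reach each sync point at different times.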
Referring to Fig. 8, a timing diagram shows the operation of the microprocessor 100 according to the flowchart of Figs. 7A~7B. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102.
Core 0 encounters a WBINVD instruction and, in response, sends a WBINVD instruction message and interrupts core 1 and core 2 (per block 702). Core 0 then writes a SYNC 4 and goes to sleep (per block 704).
Core 1 and core 2 are each eventually interrupted from their current task and read the message (per block 752). In response, core 1 and core 2 each write a SYNC 4 and go to sleep (per block 704). As shown, the cores may write SYNC 4 at different times.
When all the cores have written SYNC 4, the control unit 104 wakes all of them simultaneously (per block 706). Each core then writes back and invalidates its own local cache memories (per block 708), writes a SYNC 5 and goes to sleep (per block 714). The time needed to write back and invalidate the cache memories may vary; consequently, the cores may write SYNC 5 at different times, as shown.
When all the cores have written SYNC 5, the control unit 104 wakes all of them simultaneously (per block 716). Only the core that encountered the WBINVD instruction writes back and invalidates the shared cache memory 119 (per block 718), and all the cores write a SYNC 6 and go to sleep (per block 724). Because only one core writes back and invalidates the shared cache memory 119, the cores may write SYNC 6 at different times.
When all the cores have written SYNC 6, the control unit 104 wakes all of them simultaneously (per block 726). Only the core that encountered the WBINVD instruction completes the WBINVD instruction (per block 748); all the other cores resume the processing they were performing before the interrupt.
It should be understood that although an embodiment has been described in which the cache control instruction is an x86 WBINVD instruction, other embodiments are contemplated in which synchronization requests are used to perform other cache control instructions. For example, the microprocessor 100 may perform similar actions to execute an x86 INVD instruction, simply invalidating the caches without writing back the cached data (at blocks 708 and 718). As a further example, the cache control instruction may come from an instruction set architecture other than x86.
Power management operations
Each core 102 of the microprocessor 100 is configured to perform individual power reduction actions, such as, but not limited to, ceasing to execute instructions, requesting the control unit 104 to stop sending clock signals to the core 102, requesting the control unit 104 to remove power from the core 102, writing back and invalidating the local (i.e., non-shared) cache memories of the core 102, and saving the state of the core 102 to a memory external to the core, such as the private random access memory (PRAM) 116. When a core 102 has performed one or more core-specific power reduction actions, it has entered a "core" C-state (also referred to as a core idle state or core sleep state). In one embodiment, the C-state values may correspond roughly to the well-known Advanced Configuration and Power Interface (ACPI) specification processor states, but may also include finer granularity. Typically, a core 102 enters a core C-state in response to a request from the operating system. For example, the x86 architecture monitor wait (MWAIT) instruction is a power management instruction that provides a hint, namely a target C-state, to the core 102 executing the instruction to allow the microprocessor 100 to enter an optimized state, such as a lower power consumption state. In the case of an MWAIT instruction, the target C-states are proprietary rather than ACPI C-states. Core C-state 0 (C0) corresponds to the running state of the core 102, and increasingly larger C-state values correspond to increasingly less active or responsive states (e.g., the C1, C2 and C3 states). An increasingly less responsive or active state is a configuration or operating state that saves more power relative to a more active or responsive state, or that is for some reason relatively less responsive (e.g., has a longer wake-up latency or is less fully enabled). Examples of power saving actions a core 102 may take are stopping the execution of instructions, stopping its clock signals, lowering its voltage, and/or removing power from part of the core (e.g., functional units and/or local caches) or from the entire core.
In addition, the microprocessor 100 is configured to perform cross-core power reduction actions. Cross-core power reduction actions involve or affect multiple cores 102 of the microprocessor 100. For example, the shared cache memory 119 may be relatively large and consume a significant amount of power. Therefore, significant power savings may be achieved by removing the clock signal and/or power to the shared cache memory 119. However, in order to remove the clock signal and/or power to the shared cache memory 119, all the cores 102 sharing the cache memory must agree, so that data coherency is maintained. Embodiments are contemplated in which the microprocessor 100 includes other shared power-related resources, such as shared clock and power sources. In one embodiment, the microprocessor 100 is coupled to a system chipset that includes a memory controller, peripheral controllers and/or a power management controller. In other embodiments, one or more of the controllers are integrated into the microprocessor 100. System power savings may be achieved by the microprocessor 100 notifying a controller so that the controller can take power saving actions. For example, the microprocessor 100 may notify the controller that the caches of the microprocessor have been invalidated and turned off so that they need not be snooped.
In addition to the notion of a core C-state, the microprocessor 100 as a whole has a "package" C-state (also referred to as a package idle state or package sleep state). The package C-state corresponds to the lowest (i.e., highest power consuming) common core C-state of the cores 102 (see, e.g., field 246 of Fig. 2 and block 318 of Fig. 3). However, the package C-state involves the microprocessor 100 performing one or more cross-core power reduction actions in addition to the core-specific power reduction actions. Examples of cross-core power saving actions associated with package C-states include turning off a phase-locked loop (PLL) that generates clock signals, flushing the shared cache memory 119 and stopping its clock and/or power, and enabling memory/peripheral controllers to refrain from snooping the local and shared cache memories of the microprocessor 100. Other examples are changing the voltage, frequency and/or bus clock ratio, reducing the size of a cache memory, such as the shared cache memory 119, and running the shared cache memory 119 at half speed.
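Since the package C-state corresponds to the lowest common core C-state, it can be expressed as the minimum of the C-state values requested by the cores. The following Python sketch shows that relation only; the function name `package_c_state` and the sample values are hypothetical, and the actual encoding of field 246 is not modeled.

```python
def package_c_state(core_c_states):
    """Return the lowest common C-state among the cores.

    core_c_states holds one requested C-state value per core (0 = running).
    The lowest value is the shallowest state every core has agreed to, so
    it bounds how deep the package as a whole may sleep.
    """
    if not core_c_states:
        raise ValueError("at least one core C-state is required")
    return min(core_c_states)


# Illustrative values: three cores requesting C4, C3 and C2.
example = package_c_state([4, 3, 2])
```

A single core that remains in C0 therefore keeps the whole package at C0, regardless of how deep the other cores have requested to sleep.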
In many cases, the operating system is effectively limited to executing instructions on individual cores 102 and can therefore put individual cores to sleep (e.g., into a core C-state), but it has no means of directly putting the microprocessor 100 to sleep (e.g., into a package C-state). Advantageously, embodiments are described in which the cores 102 of the microprocessor 100 work cooperatively, with the help of the control unit 104, to detect when all the cores 102 have entered a core C-state and are ready for cross-core power saving actions to take place.
Referring to Fig. 9, a flowchart shows the operation of the microprocessor 100 to enter a low-power package C-state. The embodiment of Fig. 9 is described using the example of MWAIT instructions being executed while the microprocessor 100 is coupled to a chipset. However, it should be understood that in other embodiments the operating system uses other power management instructions, and the main core 102 communicates with controllers that are integrated into the microprocessor 100 and that use a handshake protocol different from the one described.
The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 may encounter an MWAIT instruction and operates according to this description so that together the cores cause the microprocessor 100 to enter an optimized state. Flow begins at block 902.
In block 902, a core 102 encounters an MWAIT instruction that specifies a target C-state, denoted Cx in Fig. 9, where x is a non-negative integer. Flow proceeds to block 904.
In block 904, the core 102 writes to its sync register 108 a synchronization request with the C bit 224 set and a C-state field 226 value of x (denoted SYNC Cx in Fig. 9). In addition, the synchronization request specifies in its wake events field 204 that the core 102 is to be awakened on all wake events. As a result, the control unit 104 puts the core 102 to sleep. Preferably, before it writes the SYNC Cx, the core 102 writes back and invalidates its local cache memories. Flow proceeds to block 906.
In block 906, the control unit 104 wakes the core 102 once all the cores 102 have written a SYNC Cx. As noted above, the values of x written by the other cores 102 may differ, and the control unit 104 posts the lowest common C-state value to the lowest common C-state field 246 of the status word 242 of the status register 106 (per block 318). Before block 906, while the core 102 is asleep, it may be awakened by a wake event, such as an interrupt (e.g., blocks 305 and 306). More specifically, there is no guarantee that the operating system will execute an MWAIT instruction on all the cores 102, which is what allows the microprocessor 100 to perform the power saving actions associated with a package C-state, before a wake event (e.g., an interrupt) is directed to one of the cores 102 and effectively cancels its MWAIT instruction. However, once the core 102 is awakened at block 906, the core 102 (indeed, all the cores 102) is still executing microcode as a result of the MWAIT instruction (of block 902) and remains in microcode, with interrupts disabled (i.e., the microcode does not allow itself to be interrupted), until block 924. In other words, as long as fewer than all the cores 102 have received an MWAIT instruction to go to sleep, individual cores 102 may be asleep, but the microprocessor 100 as a package does not indicate to the chipset that it is ready to enter a package sleep state. However, once all the cores 102 have agreed to enter a package sleep state, which is effectively indicated by the occurrence of the sync condition at block 906, the main core 102 is allowed to complete a package sleep state handshake protocol with the chipset (e.g., blocks 908, 909 and 921 below) without being interrupted and without any of the other cores 102 being interrupted. Flow proceeds to decision block 907.
In decision block 907, the core 102 determines whether it is the main core 102 of the microprocessor 100. Preferably, a core 102 is the main core 102 if it determined at reset time that it is the bootstrap processor (BSP). If the core is the main core, flow proceeds to block 908; otherwise, flow proceeds to block 914.
In block 908, the main core 102 writes back and invalidates the shared cache memory 119 and then communicates with the chipset so that it can take appropriate actions to reduce power consumption. For example, the memory controller and/or peripheral controllers can refrain from snooping the local and shared cache memories of the microprocessor 100, because they all remain invalid while the microprocessor 100 is in the package C-state. As another example, the chipset may signal the microprocessor 100 to cause the microprocessor 100 to take power saving actions (e.g., by asserting the x86-style STPCLK, SLP, DPSLP, NAP and VRDSLP signals, as described below). Preferably, the core 102 communicates the power management information based on the value of the lowest common C-state field 246. In one embodiment, the core 102 issues an I/O read bus cycle to an I/O address that provides the chipset with the relevant power management information, for example the package C-state value. Flow proceeds to block 909.
In block 909, the main core 102 waits for the chipset to assert the STPCLK signal. Preferably, if the STPCLK signal has not been asserted after a predetermined number of clock cycles, the control unit 104 detects this condition and, after terminating its in-progress synchronization requests, wakes all the cores 102 and indicates the error in the error code field 248. Flow proceeds to block 914.
In block 914, the core 102 writes a SYNC 14. In one embodiment, the synchronization request specifies in its wake events field 204 that the core 102 is not to be awakened on any wake event. As a result, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 916.
In block 916, the control unit 104 wakes the core 102 once all the cores 102 have written a SYNC 14. Flow proceeds to decision block 919.
In decision block 919, the core 102 determines whether it is the main core 102 of the microprocessor 100. If so, flow proceeds to block 921; otherwise, flow proceeds to block 924.
In block 921, the main core 102 issues a stop grant cycle on the bus of the microprocessor 100 to notify the chipset that it may take cross-core (i.e., package-wide) power saving actions that concern the microprocessor 100 as a whole, such as refraining from snooping the cache memories of the microprocessor 100, removing the bus clock (e.g., the x86-style BCLK) to the microprocessor 100, and asserting other signals on the bus (e.g., the x86-style SLP, DPSLP, NAP and VRDSLP) that cause the microprocessor 100 to remove clocks and/or power from various portions of the microprocessor 100. Although the embodiments described herein involve a handshake protocol between the microprocessor 100 and the chipset that consists of an I/O read (at block 908), the assertion of STPCLK (at block 909) and the issuing of a stop grant cycle (at block 921), a protocol historically associated with x86 architecture systems, it should be understood that other embodiments are contemplated that involve systems with other instruction set architectures and different protocols, but that likewise save power, improve performance and/or reduce complexity. Flow proceeds to block 924.
In block 924, the core 102 writes a sleep request (i.e., a request with the sleep bit 212 set and the S bit 222 clear) to its sync register 108. In addition, the request specifies in its wake events field 204 that the core 102 is to be awakened only on the wake event of the de-assertion of STPCLK. As a result, the control unit 104 puts the core 102 to sleep. Flow ends at block 924.
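The flow of Fig. 9 can be summarized as a per-core list of steps in which the three sleep points each carry a different wake-event setting. The sketch below is a simplified illustration: `Step`, `package_sleep_steps` and `sleep_points` are hypothetical names, the wake-event descriptions are plain strings rather than the actual encoding of field 204, and the "no wake events" setting for SYNC 14 reflects the one embodiment mentioned for block 914.

```python
from typing import NamedTuple, Optional


class Step(NamedTuple):
    action: str
    wake_on: Optional[str]  # None for steps that are not sleep points


def package_sleep_steps(is_main: bool, x: int) -> list:
    """List the steps one core performs in Fig. 9 for target C-state x."""
    steps = [Step(f"SYNC C{x}", "all wake events")]                   # block 904
    if is_main:
        steps.append(Step("flush shared cache 119, I/O read", None))  # block 908
        steps.append(Step("wait for STPCLK assertion", None))         # block 909
    steps.append(Step("SYNC 14", "no wake events"))                   # block 914
    if is_main:
        steps.append(Step("stop grant cycle", None))                  # block 921
    steps.append(Step("sleep request", "STPCLK de-assertion only"))   # block 924
    return steps


def sleep_points(steps):
    """Keep only the steps at which the core goes to sleep."""
    return [s for s in steps if s.wake_on is not None]


main_core = package_sleep_steps(is_main=True, x=4)
other_core = package_sleep_steps(is_main=False, x=4)
```

In this model a non-main core does nothing except sleep at the three sync points, while the main core carries out the handshake with the chipset between them.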
Referring to Fig. 10, a timing diagram shows an example of the operation of the microprocessor 100 according to the flowchart of Fig. 9. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102.
Core 0 encounters an MWAIT instruction specifying C-state 4 (MWAIT C4) (per block 902). Core 0 then writes a SYNC C4 and goes to sleep (per block 904). Core 1 encounters an MWAIT instruction specifying C-state 3 (MWAIT C3) (per block 902). Core 1 then writes a SYNC C3 and goes to sleep (per block 904). Core 2 encounters an MWAIT instruction specifying C-state 2 (MWAIT C2) (per block 902). Core 2 then writes a SYNC C2 and goes to sleep (per block 904). As shown, the cores may write their SYNC Cx at different times. Indeed, one or more of the cores may not encounter an MWAIT instruction at all before some other event, such as an interrupt, occurs.
When all the cores have written a SYNC Cx, the control unit 104 wakes all of them simultaneously (per block 906). The main core then issues the I/O read bus cycle (per block 908) and waits for the assertion of STPCLK (per block 909). All the cores write a SYNC 14 and go to sleep (per block 914). Because only the main core flushes the shared cache memory 119, issues the I/O read bus cycle and waits for the assertion of STPCLK, the cores may write SYNC 14 at different times, as shown. Indeed, the main core may write its SYNC 14 on the order of hundreds of microseconds after the other cores.
When all the cores have written SYNC 14, the control unit 104 wakes all of them simultaneously (per block 916). Only the main core issues the stop grant cycle (per block 921). All the cores write a sleep request that waits for the de-assertion of the STPCLK signal (~STPCLK) and go to sleep (per block 924). Because only the main core issues the stop grant cycle, the cores may write their sleep requests at different times, as shown.
When the STPCLK signal is de-asserted, the control unit 104 wakes all the cores.
As can be observed from Fig. 10, core 1 and core 2 are advantageously able to sleep for a significant portion of the time while core 0 performs the handshake protocol. It should be noted, however, that the time required to wake the microprocessor 100 from a package sleep state is generally proportional to how long the sleep state is (i.e., how great its power savings are). Therefore, where the package sleep state is relatively long (or even where an individual core 102 sleeps for a relatively long time), it may be desirable to further reduce the occurrence of wake-ups and/or the time required to wake up in connection with the handshake protocol. Fig. 11 describes an embodiment in which a single core 102 handles the handshake protocol while the other cores 102 remain asleep. In addition, according to the embodiment of Fig. 11, further power savings may be obtained by reducing the number of cores 102 that are awakened in response to a wake event.
Referring to Fig. 11, a flowchart shows the operation of the microprocessor 100 to enter a low-power package C-state according to another embodiment of the present invention. The embodiment of Fig. 11 is described using the example of MWAIT instructions being executed while the microprocessor 100 is coupled to a chipset. However, it should be understood that in other embodiments the operating system uses other power management instructions, and the last-synchronizing core 102 communicates with controllers that are integrated into the microprocessor 100 and that use a handshake protocol different from the one described.
The embodiment of Fig. 11 is similar in some respects to the embodiment of Fig. 9. However, the embodiment of Fig. 11 is designed to save potentially more power in an environment in which the operating system requests that the microprocessor 100 enter very low power states and tolerates the latency associated with them. More specifically, the embodiment of Fig. 11 facilitates gating power to the cores and, when necessary, such as to handle an interrupt, waking only one of the cores. Embodiments are contemplated in which the microprocessor 100 supports operation in both the mode of Fig. 9 and the mode of Fig. 11. In addition, the mode may be configurable, whether at manufacturing time (e.g., via the fuses 114), via software control, and/or automatically determined by the microprocessor 100 according to the particular C-state specified by the MWAIT instruction. Flow begins at block 1102.
In block 1102, the core 102 encounters an MWAIT instruction (MWAIT Cx) that specifies a target C-state, denoted Cx in Fig. 11. Flow proceeds to block 1104.
In block 1104, the core 102 writes to its sync register 108 a synchronization request with the C bit 224 set and a C-state field 226 value of x (denoted SYNC Cx in Fig. 11). The synchronization request also sets the selective wake-up (SEL WAKE) bit 214 and the PG bit 208. In addition, the synchronization request specifies in its wake events field 204 that the core 102 is to be awakened on all wake events except the assertion of STPCLK and the de-assertion of STPCLK (~STPCLK). (Preferably, there are other wake events, such as AP startup, for which the synchronization request specifies that the core 102 is not to be awakened.) As a result, the control unit 104 puts the core 102 to sleep, which includes removing power from the core 102 because the PG bit 208 is set. In addition, before writing the synchronization request, the core 102 writes back and invalidates its local cache memories and saves its state (preferably to the private random access memory (PRAM) 116). When the core 102 is subsequently awakened (e.g., at block 1137, 1132 or 1106), the core 102 restores its state (e.g., from the PRAM 116). As described above, particularly with respect to Fig. 3, when the last core 102 writes a synchronization request with the selective wake-up bit 214 set, the control unit 104 automatically blocks all wake events for all the cores 102 other than the last-writing core 102 (per block 326). Flow proceeds to block 1106.
At block 1106, when all the cores 102 have written a SYNC Cx, the control unit 104 wakes up the last-writing core 102. As described above, the control unit 104 keeps the S bits 222 of the other cores 102 set, even though it wakes up the last-writing core 102 and clears its S bit 222. Prior to block 1106, while the core 102 is asleep, it may be awakened by a wakeup event, such as an interrupt. However, once the core 102 is awakened at block 1106, it is still executing microcode in response to the MWAIT instruction (block 1102) and remains in microcode, with interrupts disabled (that is, the microcode does not allow itself to be interrupted), until block 1124. In other words, until all the cores 102 have received an MWAIT instruction to go to sleep, only individual cores 102 sleep, and the microprocessor 100 as a package does not indicate to the chipset that it is ready to enter a package sleep state. However, once all the cores 102 have agreed to enter a package sleep state, as indicated by the occurrence of the sync condition at block 1106, the core 102 awakened at block 1106 (the last-writing core 102, which caused the sync condition to occur) is allowed to complete the package sleep state handshake protocol with the chipset (for example, blocks 1108, 1109 and 1121 below) without being interrupted and without any of the other cores 102 being interrupted. Flow proceeds to block 1108.
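The selective-wakeup behaviour of steps 1104 and 1106 can be illustrated with a small software model. The following Python sketch is illustrative only and is not the described hardware: the class SyncControl, its fields and its method names are hypothetical, and the model assumes that every core sets the selective wakeup bit 214 in its request.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SyncControl:
    """Toy model of how control unit 104 tracks SYNC Cx requests."""
    num_cores: int
    pending: Set[int] = field(default_factory=set)  # cores with S bit 222 set
    blocked: Set[int] = field(default_factory=set)  # cores with wakeups blocked

    def write_sync(self, core: int) -> Optional[int]:
        """Record a SYNC Cx from `core`; return the core to wake, if any."""
        self.pending.add(core)
        if len(self.pending) < self.num_cores:
            return None  # not every core has written yet
        # Sync condition: block wakeups for everyone but the last writer.
        self.blocked = self.pending - {core}
        # Wake only the last writer and clear its S bit; the others stay set.
        self.pending.discard(core)
        return core

    def unblock_wakeups(self) -> None:
        """Model of step 1134: the awake core unblocks the other cores."""
        self.blocked.clear()
```

In this model, with three cores, the first two writes return None and the third write wakes only the last writer while the other two requests stay pending. A later rewrite by the same core therefore immediately satisfies the sync condition again.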
At block 1108, the core 102 writes back and invalidates the shared cache memory 119 and then communicates to the chipset that it may take appropriate actions to reduce power consumption. Flow proceeds to block 1109.
At block 1109, the core 102 waits for the chipset to assert the STPCLK signal. Preferably, if the STPCLK signal has not been asserted after a predetermined number of clock cycles, the control unit 104 detects this condition, kills the pending sync requests, wakes up all the cores 102, and indicates the error in the error code field 248. Flow proceeds to block 1121.
At block 1121, the core 102 issues a stop grant cycle on the bus to the chipset. Flow proceeds to block 1124.
At block 1124, the core 102 writes a sleep request to its sync register 108, that is, a request with the sleep bit 212 set, the S bit 222 clear and the PG bit 208 set. Additionally, the request specifies in its wakeup events field 204 that the core 102 is to be awakened only on the wakeup event of deassertion of STPCLK. In response, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 1132.
At block 1132, the control unit 104 detects the deassertion of STPCLK and wakes up the core 102. It should be noted that, before waking up the core 102, the control unit 104 also un-power-gates the core 102. Advantageously, at this point the core 102 is the only core running, which gives the core 102 the opportunity to perform any actions that must be performed while no other core 102 is running. Flow proceeds to block 1134.
At block 1134, the core 102 writes to a register (not shown) of the control unit 104 to unblock the wakeup events of each of the other cores 102, as specified in the wakeup events field 204 of its respective sync register 108. Flow proceeds to block 1136.
At block 1136, the core 102 handles any pending wakeup events directed to it. For example, in one embodiment, the system that includes the microprocessor 100 allows both directed interrupts (for example, interrupts directed to a particular core of the microprocessor 100) and non-directed interrupts (for example, interrupts that may be handled by any core 102 of the microprocessor 100, as the microprocessor 100 chooses). An example of a non-directed interrupt is what is commonly referred to as a "low priority interrupt". In one embodiment, the microprocessor 100 preferably directs non-directed interrupts to the single core 102 that was awakened at block 1132 on the deassertion of STPCLK, since it is already awake and can handle the interrupt, in the hope that the other cores 102 have no pending wakeup events and can therefore remain asleep and power-gated. Flow returns to block 1104.
When the wakeup events are unblocked at block 1134, if no wakeup events are pending for the cores 102 other than the core 102 awakened at block 1132, then those cores 102 advantageously remain asleep and power-gated per block 1104. However, if a wakeup event is pending for one of those cores 102 when the wakeup events are unblocked at block 1134, then that core 102 is un-power-gated and awakened by the control unit 104. In this case, a different flow begins at block 1137 of Figure 11.
At block 1137, after the wakeup events are unblocked at block 1134, another core 102 (that is, a core 102 other than the core 102 that unblocked the wakeup events at block 1134) is awakened. The other core 102 handles any pending wakeup events directed to it, for example, by servicing an interrupt. Flow proceeds from block 1137 to block 1104.
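The routing policy of steps 1136 and 1137 can be summarized as: a non-directed interrupt goes to the core that is already awake, and only a directed interrupt forces another core to be awakened. The Python sketch below is a simplified illustration under that reading; the function names and the dictionary representation of pending interrupts are hypothetical.

```python
from typing import Dict, Optional, Set


def route_interrupts(awake_core: int,
                     pending: Dict[str, Optional[int]]) -> Dict[str, int]:
    """Map each pending interrupt to the core that services it.

    `pending` maps an interrupt name to its target core, or to None when the
    interrupt is non-directed and may be handled by any core.
    """
    return {name: awake_core if target is None else target
            for name, target in pending.items()}


def extra_wakeups(awake_core: int, routing: Dict[str, int]) -> Set[int]:
    """Cores other than the already-awake core that must be awakened."""
    return set(routing.values()) - {awake_core}
```

With only non-directed interrupts pending, extra_wakeups returns an empty set, so every other core can stay asleep and power-gated.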
Referring now to Figure 12, a timing diagram illustrating an example of operation of the microprocessor 100 according to the flowchart of Figure 11 is shown. In the example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102.
Core 0 encounters an MWAIT instruction that specifies C-state 7 (MWAIT C7) (per block 1102). In the example, C-state 7 permits power gating. Core 0 then writes a SYNC C7 with the selective wakeup bit 214 set (denoted "selective wakeup" in Figure 12) and the PG bit 208 set, and goes to sleep and is power-gated (per block 1104). Core 1 encounters an MWAIT instruction that specifies C-state 7 (per block 1102). Core 1 then writes a SYNC C7 with the selective wakeup bit 214 set and the PG bit 208 set, and goes to sleep and is power-gated (per block 1104). Core 2 encounters an MWAIT instruction that specifies C-state 7 (per block 1102). Core 2 then writes a SYNC C7 with the selective wakeup bit 214 set and the PG bit 208 set, and goes to sleep and is power-gated (per block 1104). (However, in the optimization embodiment described with respect to block 314, the last-writing core may not be power-gated.) As shown, the times at which the cores write their SYNC C7 may differ.
When the last-writing core writes its SYNC C7 with the selective wakeup bit 214 set, the control unit 104 blocks all wakeup events for all cores other than the last-writing core (per block 326), which is core 2 in the example of Figure 12. Additionally, the control unit 104 wakes up only the last-writing core (per block 1106), which may save power because the other cores remain asleep and power-gated while core 2 performs the handshake protocol with the chipset. Core 2 then issues an I/O read bus cycle (per block 1108) and waits for the assertion of STPCLK (per block 1109). In response to STPCLK, core 2 issues a stop grant cycle (per block 1121), writes a sleep request with the PG bit 208 set that waits on the deassertion of STPCLK, and goes to sleep and is power-gated (per block 1124). The cores may remain asleep and power-gated for a relatively long time.
When STPCLK is deasserted, the control unit 104 wakes up only core 2 (per block 1132). In the example of Figure 12, the chipset deasserts STPCLK in response to receiving a non-directed interrupt, which it forwards to the microprocessor 100. The microprocessor 100 directs the non-directed interrupt to core 2, which may save power because the other cores remain asleep and power-gated. Core 2 unblocks the wakeup events of the other cores (per block 1134) and services the non-directed interrupt (per block 1136). Core 2 then again writes a SYNC C7 with the selective wakeup bit 214 set and the PG bit 208 set, and goes to sleep and is power-gated (per block 1104).
When core 2 writes the SYNC C7 with the selective wakeup bit 214 set and the PG bit 208 set, the sync requests of the other cores are still pending (that is, the S bits 222 of the other cores were not cleared by the wakeup of core 2); therefore, the control unit 104 blocks the wakeup events of all cores other than core 2, the last-writing core (per block 326). Additionally, the control unit 104 wakes up only core 2 (per block 1106). Core 2 then issues an I/O read bus cycle (per block 1108) and waits for the assertion of STPCLK (per block 1109). In response to STPCLK, core 2 issues a stop grant cycle (per block 1121), writes a sleep request with the PG bit 208 set that waits on the deassertion of STPCLK, and goes to sleep and is power-gated (per block 1124).
When STPCLK is deasserted, the control unit 104 wakes up only core 2 (per block 1132). In the example of Figure 12, STPCLK is deasserted because of another non-directed interrupt. Therefore, the microprocessor 100 directs the interrupt to core 2, which may save power. Core 2 again unblocks the wakeup events of the other cores (per block 1134) and services the non-directed interrupt (per block 1136). Core 2 then again writes a SYNC C7 with the selective wakeup bit 214 set and the PG bit 208 set, and goes to sleep and is power-gated (per block 1104).
This cycle may continue for a considerable time, namely as long as only non-directed interrupts are generated. Figure 13 shows an example of handling an interrupt that is directed to a core other than the last-writing core.
As may be observed by comparing Figure 10 and Figure 12, the embodiment of Figure 12 has the advantage that once the cores 102 initially go to sleep (after writing SYNC C7 in the example of Figure 12), only one core 102 is awakened again to perform the handshake protocol with the chipset while the other cores 102 remain asleep, which may be a significant advantage if the cores 102 remain asleep for a relatively long time. The power savings may be quite significant, particularly in cases in which the operating system recognizes that the workload on the system is small enough to be handled by a single core 102.
Furthermore, advantageously, as long as no wakeup events are directed to the other cores 102, only one core 102 is awakened again (to service non-directed events, such as a low priority interrupt). Again, this may be a significant advantage if the cores 102 remain asleep for a relatively long time. The power savings may be significant, particularly when there is no significant workload on the system other than relatively infrequent non-directed interrupts, such as USB interrupts. Further, even if a wakeup event is directed to another core 102 (for example, an interrupt that the operating system directs to a single core 102, such as an operating system timer interrupt), the embodiments may advantageously dynamically switch the single core 102 that performs the package sleep state protocol and services non-directed wakeup events, as shown in Figure 13, so that the benefit of waking up only a single core 102 is still enjoyed.
Referring now to Figure 13, a timing diagram illustrating an example of operation of the microprocessor 100 according to the flowchart of Figure 11 is shown. The example of Figure 13 is similar in many respects to the example of Figure 12. However, at the first instance of the deassertion of STPCLK, the interrupt is directed to core 1 (rather than being a non-directed interrupt as in the example of Figure 12). Therefore, the control unit 104 wakes up core 2 (per block 1132) and then wakes up core 1 after core 2 unblocks the wakeup events (per block 1134). Core 2 then again writes a SYNC C7 with the selective wakeup bit 214 set and the PG bit 208 set, and goes to sleep and is power-gated (per block 1104).
Core 1 services the directed interrupt (per block 1137). Core 1 then again writes a SYNC C7 with the selective wakeup bit 214 set and the PG bit 208 set, and goes to sleep and is power-gated (per block 1104). In the example, core 2 writes its SYNC C7 before core 1 writes its SYNC C7. Therefore, although core 0 still has its S bit 222 set from when it wrote its initial SYNC C7, the S bit 222 of core 1 was cleared when it was awakened. Consequently, when core 2 writes its SYNC C7 after unblocking the wakeup events, it is not the last core to write a SYNC C7 request; rather, core 1 becomes the last core to write a SYNC C7 request.
When core 1 writes the SYNC C7 with the selective wakeup bit 214 set and the PG bit 208 set, the sync request of core 0 is still pending (that is, it was not cleared by the wakeups of core 1 and core 2) and core 2 (in the example) has already written its SYNC C7 request; therefore, the control unit 104 blocks the wakeup events of all cores other than core 1, the last-writing core (per block 326). Additionally, the control unit 104 wakes up only core 1 (per block 1106). Core 1 then issues an I/O read bus cycle (per block 1108) and waits for the assertion of STPCLK (per block 1109). In response to STPCLK, core 1 issues a stop grant cycle (per block 1121), writes a sleep request with the PG bit 208 set that waits on the deassertion of STPCLK, and goes to sleep and is power-gated (per block 1124).
When STPCLK is deasserted, the control unit 104 wakes up only core 1 (per block 1132). In the example of Figure 13, STPCLK is deasserted because of a non-directed interrupt; therefore, the microprocessor 100 directs the non-directed interrupt to core 1, which may save power. The cycle of core 1 handling non-directed interrupts may continue for a considerable time, namely as long as only non-directed interrupts are generated. In this manner, the microprocessor 100 may advantageously save power by directing non-directed interrupts to the core 102 to which the most recent interrupt was directed, as shown in the example of Figure 13 by the switch to a different core. Core 1 again unblocks the wakeup events of the other cores (per block 1134) and services the non-directed interrupt (per block 1136). Core 1 then again writes a SYNC C7 with the selective wakeup bit 214 set and the PG bit 208 set, and goes to sleep and is power-gated (per block 1104).
It should be understood that although embodiments have been described in which the power management instruction is an x86 MWAIT instruction, other embodiments are contemplated in which sync requests are used to perform power management instructions. For example, the microprocessor 100 may perform similar operations in response to a read from a set of predetermined I/O port addresses associated with different C-states. As another example, the power management instruction may be from an instruction set architecture other than the x86 architecture.
Dynamic reconfiguration of a multi-core processor
Each core 102 of the microprocessor 100 generates configuration-related values based on the configuration of the cores 102 of the microprocessor 100. Preferably, microcode of each core 102 generates, saves and uses the configuration-related values. Embodiments are described in which the generation of the configuration-related values may advantageously be dynamic, as described below. Examples of configuration-related values include, but are not limited to, the following.
Each core 102 generates a global core number, described above with respect to Figure 2. In contrast to the local core number 256, which is the core number of a core 102 relative only to the cores 102 residing on its die 406, the global core number is the core number of the core 102 relative to all the cores 102 of the microprocessor 100. In one embodiment, the core 102 generates its global core number as the sum of its local core number 256 and the product of its die number 258 and the number of cores 102 per die, as follows:
Global core number = (die number × number of cores per die) + local core number.
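The formula above can be written as a one-line function. In this Python sketch the parameter names are assumptions chosen for illustration: die_number stands for the value 258, local_core_number for the value 256, and cores_per_die for the number of cores 102 on each die.

```python
def global_core_number(die_number: int, cores_per_die: int,
                       local_core_number: int) -> int:
    """Core number relative to all cores of the microprocessor."""
    return die_number * cores_per_die + local_core_number
```

For a configuration with two dies of four cores each, the core with local number 2 on die 1 receives global number 6, and every core on die 0 keeps its local number.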
Each core 102 also generates a virtual core number. The virtual core number is the global core number minus the number of disabled cores 102 that have a global core number lower than the global core number of the instant core 102. Thus, when all the cores 102 of the microprocessor 100 are enabled, the global core number and the virtual core number are the same. However, if one or more of the cores 102 are disabled (for example, because they are defective), the virtual core number of a core 102 may differ from its global core number. In one embodiment, each core 102 populates the APIC ID field of its corresponding APIC ID register with its virtual core number. However, according to other embodiments (for example, Figures 22 and 23), this is not the case. Furthermore, in one embodiment, the operating system may update the APIC ID in the APIC ID register.
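The virtual numbering rule just described (subtract the count of disabled cores that have a lower number) can be sketched as follows. The function name and the list representation of the enable bits 254, indexed by the number of each core across the whole microprocessor, are hypothetical choices made for this illustration.

```python
from typing import Sequence


def virtual_core_number(core_number: int, enabled: Sequence[bool]) -> int:
    """Core number minus the count of disabled cores numbered below it.

    `enabled[i]` is True when the core numbered i (across the whole
    microprocessor) is enabled.
    """
    disabled_below = sum(1 for is_enabled in enabled[:core_number]
                         if not is_enabled)
    return core_number - disabled_below
```

When every core is enabled the function returns its input unchanged. When core 1 of four is disabled, cores 2 and 3 receive the values 1 and 2, so the enabled cores remain contiguously numbered from 0.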
Each core 102 also generates a BSP flag, which indicates whether the core 102 is the BSP. In one embodiment, normally (for example, when the "all cores BSP" feature of Figure 23 is disabled) one core 102 designates itself the bootstrap processor (BSP) and each of the other cores 102 designates itself an application processor (AP). After reset, an AP core 102 initializes itself and then goes to sleep waiting for the BSP to tell it to begin fetching and executing instructions. In contrast, after initializing itself, the BSP core 102 immediately begins fetching and executing instructions of the system firmware, for example, BIOS boot code, which initializes the system (for example, verifies that the system memory and peripherals are working properly and initializes and/or configures them) and boots the operating system, that is, loads the operating system (for example, from disk) and transfers control to it. Before booting the operating system, the BSP determines the system configuration (for example, the number of cores 102 or logical processors in the system) and saves it in memory so that the operating system may read it after it is started. After the operating system is booted, it instructs the AP cores 102 to begin fetching and executing operating system instructions. In one embodiment, normally (for example, when the "modify BSP" and "all cores BSP" features of Figures 22 and 23, respectively, are disabled) a core 102 designates itself the BSP if its virtual core number is 0, and all the other cores 102 designate themselves AP cores 102. Preferably, a core 102 populates the BSP flag bit of the APIC base address register of its corresponding APIC with its BSP flag configuration-related value. According to one embodiment, as described above, the BSP is the master core 102 of blocks 907 and 919 that performs the package sleep state handshake protocol of Figure 9.
Each core 102 also generates an APIC base value for populating its APIC base register. The APIC base address is generated based on the APIC ID of the core 102. In one embodiment, the operating system may update the APIC base address in the APIC base address register.
Each core 102 also generates a die master indicator, which indicates whether the core 102 is the master core 102 of the die 406 that includes the core 102.
Each core 102 also generates a chip master indicator, which indicates whether the core 102 is the master core of the chip that includes the instant core 102, assuming the microprocessor 100 is configured with chips, as described in detail above.
Each core 102 computes the configuration-related values and operates using them so that the system that includes the microprocessor 100 operates properly. For example, the system directs interrupt requests to the cores 102 based on their associated APIC IDs. The APIC ID determines which interrupt requests a core 102 should respond to. More specifically, each interrupt request includes a destination identifier, and a core 102 responds to an interrupt request only if the destination identifier matches the APIC ID of the core 102 (or if the interrupt request identifier is a special value that indicates the request is for all the cores 102). As another example, each core 102 must know whether it is the BSP so that it executes the initial BIOS code and boots the operating system and, in one embodiment, performs the package sleep state handshake protocol described with respect to Figure 9. Embodiments are described below (see Figures 22 and 23) in which the BSP flag and the APIC ID may be modified from their normal values for specific purposes, such as testing and/or debugging.
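The interrupt matching rule described above reduces to a single comparison per core. The Python sketch below illustrates it; the function name is hypothetical, and the broadcast value 0xFF is an assumption made for the example rather than a value stated in this description.

```python
BROADCAST_ID = 0xFF  # hypothetical "all cores" destination value


def core_responds(apic_id: int, destination_id: int) -> bool:
    """True when a core with `apic_id` should respond to the request."""
    return destination_id == BROADCAST_ID or destination_id == apic_id
```

Because the match is made against the APIC ID, which in one embodiment is populated from the virtual core number, regenerating that number after a core is disabled changes which requests each remaining core answers.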
Referring now to Figure 14, a flowchart illustrating dynamic reconfiguration of the microprocessor 100 is shown. The description of Figure 14 refers to the multi-die microprocessor 100 of Figure 4, which includes two dies 406 and eight cores 102. However, it should be understood that the dynamic reconfiguration described may be used with a microprocessor 100 having a different configuration, namely with more than two dies or with a single die, and with more or fewer than eight cores 102 but at least two cores 102. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that collectively the cores dynamically reconfigure the microprocessor 100. Flow begins at block 1402.
At block 1402, the microprocessor 100 is reset, and hardware of the microprocessor 100 populates the configuration register 112 of each core 102 with appropriate values based on the number of available cores 102 and the die number of the die on which the core 102 resides. In one embodiment, the local core number 256 and the die number 258 are hardwired. As described above, the hardware may determine whether a core 102 is enabled or disabled from the blown or unblown state of the fuses 114. Flow proceeds to block 1404.
At block 1404, the core 102 reads the configuration word 252 from the configuration register 112. The core 102 then generates its configuration-related values based on the value of the configuration word 252 it has read (the value populated at block 1402). In the case of a multi-die microprocessor 100 configuration, the configuration-related values generated at block 1404 do not take into account the cores 102 of the other die 406. However, the configuration-related values generated at blocks 1414 and 1424 (and at block 1524 of Figure 15) do take into account the cores 102 of the other die 406, as described below. Flow proceeds to block 1406.
At block 1406, the core 102 causes the enable bit 254 values of the local cores 102 in its local configuration register 112 to be propagated to the corresponding enable bits 254 of the configuration register 112 of the remote die 406. For example, with respect to the configuration of Figure 4, a core 102 on die A 406A causes the enable bits 254 associated with cores A, B, C and D (the local cores) in the configuration register 112 on die A 406A (the local die) to be propagated to the enable bits 254 associated with cores A, B, C and D in the configuration register 112 on die B 406B (the remote die). Conversely, a core 102 on die B 406B causes the enable bits 254 associated with cores E, F, G and H (the local cores) in the configuration register 112 on die B 406B (the local die) to be propagated to the enable bits 254 associated with cores E, F, G and H in the configuration register 112 on die A 406A (the remote die). In one embodiment, the core 102 propagates the values to the other die 406 by writing to its local configuration register 112. Preferably, the write by the core 102 to the local configuration register 112 causes no change to the local configuration register 112 but causes the local control unit 104 to propagate the local enable bit 254 values to the remote die 406. Flow proceeds to block 1408.
At block 1408, the core 102 writes to its sync register 108 a sync request with a sync condition value of 8 (denoted SYNC 8 in Figure 14). In response, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 1412.
At block 1412, the control unit 104 wakes up the core 102 when all the available cores 102 in the core set specified by the core set field 228 have written a SYNC 8. It should be noted that, in the case of a multi-die 406 microprocessor 100 configuration, the sync condition occurrence may be a multi-die sync condition occurrence. That is, the control unit 104 waits to wake up the core 102 (or, in the case in which the core 102 did not set the sleep bit 212 and therefore did not go to sleep, to interrupt it) until all the cores 102 specified in the core set field 228 (which may include cores 102 on both dies 406) have written their sync requests. Flow proceeds to block 1414.
At block 1414, the core 102 again reads the configuration register 112 and generates its configuration-related values based on the new value of the configuration word 252, which now includes the correct values of the enable bits 254 propagated from the remote die. Flow proceeds to decision block 1416.
At decision block 1416, the core 102 determines whether it should disable itself. In one embodiment, the core 102 decides that it should disable itself because its microcode, during its reset processing (prior to decision block 1416), read a fuse 114 that was blown to indicate that the core 102 should disable itself. The fuse 114 may be blown during or after manufacture of the microprocessor 100. In another embodiment, updated fuse 114 values may be scanned into holding registers, as described above, and the scanned values indicate that the core 102 should be disabled. Figure 15 describes another embodiment in which the core 102 determines in a different manner that it should be disabled. If the core 102 determines that it should be disabled, flow proceeds to block 1417; otherwise, flow proceeds to block 1418.
At block 1417, the core 102 writes the disable core bit 236 to remove itself from the list of available cores 102, that is, to clear its corresponding enable bit 254 in the configuration word 252 of the configuration register 112. Thereafter, the core 102 prevents itself from executing any more instructions, preferably by setting one or more bits that turn off its clock signals and remove its power. Flow ends at block 1417.
At block 1418, the core 102 writes to its sync register 108 a sync request with a sync condition value of 9 (denoted SYNC 9 in Figure 14). In response, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 1422.
At block 1422, the control unit 104 wakes up the core 102 when all the enabled cores 102 have written a SYNC 9. Again, in the case of a multi-die 406 microprocessor 100 configuration, the sync condition occurrence may be a multi-die sync condition occurrence, based on the updated values in the configuration register 112. Furthermore, when the control unit 104 determines whether a sync condition has occurred, it excludes from consideration any core 102 that disabled itself at block 1417. More specifically, consider the case in which all the other cores 102 (other than the core 102 that disables itself) write a SYNC 9 before the self-disabling core 102 writes to its sync register 108 at block 1417; then, when the self-disabling core 102 writes its sync register 108 with the disable core bit 236 set at block 1417, the control unit 104 detects the occurrence of the sync condition (at block 316). The control unit 104 no longer considers the disabled core 102 when determining whether the sync condition has occurred because the enable bit 254 of the disabled core 102 is clear. That is, the control unit 104 determines that the sync condition has occurred because all the enabled cores 102, which do not include the disabled core 102, have written a SYNC 9, regardless of whether the disabled core 102 has written a SYNC 9. Flow proceeds to block 1424.
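The way a cleared enable bit 254 drops a core out of the synchronization check can be expressed as a filter over the cores. The Python sketch below is an illustration only; the function name and the list representation of the enable bits and of the per-core "has written SYNC 9" state are hypothetical.

```python
from typing import Sequence


def sync_condition_met(enable_bits: Sequence[bool],
                       wrote_sync: Sequence[bool]) -> bool:
    """True when every enabled core has written its sync request."""
    return all(wrote for enabled, wrote in zip(enable_bits, wrote_sync)
               if enabled)
```

In the scenario described above, two of three cores have written SYNC 9 and the third has not, so the check fails. Once the third core clears its own enable bit, the same check succeeds without that core ever writing SYNC 9.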
At block 1424, the core 102 again reads the configuration register 112; if a core 102 was disabled by the operation of another core 102 at block 1417, the new value of the configuration word 252 reflects the disabled core 102. The core 102 then again generates its configuration-related values based on the new value of the configuration word 252, in a manner similar to block 1414. The presence of a disabled core 102 may cause some of the configuration-related values to differ from those generated at block 1414. For example, as described above, the virtual core number, APIC ID, BSP flag, APIC base address, die master indicator and chip master indicator may change because of the presence of a disabled core 102. In one embodiment, after generating the configuration-related values, one of the cores 102 (for example, the BSP) writes some of the configuration-related values that are global to all the cores 102 of the microprocessor 100 to the uncore private RAM 116 so that they may subsequently be read by all the cores 102. For example, in one embodiment, the cores 102 read the global configuration-related values in order to execute an architectural instruction (for example, the x86 CPUID instruction) that requests global information about the microprocessor 100, such as the number of cores 102 of the microprocessor 100. Flow proceeds to block 1426.
At block 1426, the core 102 comes out of reset and begins fetching architectural instructions. Flow ends at block 1426.
Referring to Figure 15, a flowchart shows dynamic reconfiguration of microprocessor 100 according to another embodiment. The description of Figure 15 uses the multi-crystal microprocessor 100 of Fig. 4 as reference, which includes two crystals 406 and eight cores 102. However, it should be understood that the described dynamic reconfiguration can be used with a microprocessor 100 having a different configuration, that is, with more than two crystals or a single crystal, and with more or fewer than eight cores 102 but at least two cores 102. The operation is described from the perspective of a single core, but each core 102 of microprocessor 100 operates according to the description so that the cores collectively reconfigure microprocessor 100 dynamically. More specifically, Figure 15 describes the operation of the one core 102 that encounters the core deactivation instruction, whose flow starts at square 1502, and the operation of the other cores 102, whose flow starts at square 1532.
In square 1502, one of cores 102 encounters an instruction that instructs core 102 to deactivate itself. In one embodiment, the instruction is an x86 WRMSR instruction. In response, core 102 sends a reconfiguration message to the other cores 102 and sends them an inter-core interrupt. Preferably, core 102 traps to microcode in response to the instruction to deactivate itself (in square 1502) or in response to the interrupt (in square 1532), and remains in microcode until square 1526; during this time interrupts are disabled (that is, the microcode does not allow itself to be interrupted). Process proceeds from square 1502 to square 1504.
In square 1532, one of the other cores 102 (that is, a core 102 other than the core 102 that encountered the deactivation instruction in square 1502) is interrupted by the inter-core interrupt sent in square 1502 and receives the reconfiguration message. As described above, although the flow at square 1532 is described from the perspective of a single core 102, each of the other cores 102 (that is, every core 102 not at square 1502) is interrupted and receives the message in square 1532 and performs the steps in squares 1504 to 1526. Process proceeds from square 1532 to square 1504.
In square 1504, core 102 writes a synchronization request with a synchronous situation value of 10 (denoted SYNC 10 in Figure 15) to its synchronous buffer 108. Consequently, control unit 104 puts core 102 to sleep. Process proceeds to square 1506.
In square 1506, core 102 is awakened by control unit 104 when all available cores 102 have written a SYNC 10. Note that in a multi-crystal configuration of microprocessor 100, the occurrence of the synchronous situation may be a multi-crystal occurrence. That is, control unit 104 waits to wake up core 102 (or to interrupt it, in the case where cores 102 have not elected to sleep) until every core 102 that is specified in core set field 228 (which may include cores 102 on both crystals 406) and that is enabled (as indicated by its enable bit 254) has written its synchronization request. Process proceeds to decision block 1508.
In decision block 1508, core 102 judges whether it is the core 102 that was instructed in square 1502 to deactivate itself. If so, process proceeds to square 1517; otherwise, process proceeds to square 1518.
In square 1517, core 102 writes the deactivate-core bit 236 to remove itself from the list of available cores 102, that is, to clear its corresponding enable bit 254 in configuration word 252 of configuration buffer 112. Thereafter, core 102 prevents itself from executing any more instructions, preferably by writing one or more bits that turn off its clock signals and remove its power. Process ends at square 1517.
In square 1518, core 102 writes a synchronization request with a synchronous situation value of 11 (denoted SYNC 11 in Figure 15) to its synchronous buffer 108. Consequently, control unit 104 puts core 102 to sleep. Process proceeds to square 1522.
In square 1522, core 102 is awakened by control unit 104 when all enabled cores 102 have written a SYNC 11. Again, in a multi-crystal configuration of microprocessor 100, the occurrence of the synchronous situation may be a multi-crystal occurrence based on the updated value in configuration buffer 112. Furthermore, when control unit 104 determines whether a synchronous situation has occurred, it excludes from consideration the core 102 that deactivated itself in square 1517. More specifically, consider the case in which all the other cores 102 (that is, all except the core 102 deactivating itself) write a SYNC 11 before the self-deactivating core 102 writes synchronous buffer 108 in square 1517. Control unit 104 then detects the occurrence of the synchronous situation (in square 316) at the moment the self-deactivating core 102 writes synchronous buffer 108 in square 1517, because the enable bit 254 of the deactivated core 102 is now clear and control unit 104 no longer considers that core (see Figure 16). That is, control unit 104 judges that the synchronous situation has occurred because all enabled cores 102 have written a SYNC 11, regardless of whether the deactivated core 102 has written a SYNC 11. Process proceeds to square 1524.
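The rule that a deactivated core is excluded from the synchronous-situation check can be illustrated with a minimal sketch. The function name `sync_condition_occurred` and the list-based representation of enable bits 254 and S bits 222 are assumptions made for illustration only; the description does not specify how control unit 104 implements the check in hardware.

```python
def sync_condition_occurred(enable_bits, s_bits):
    """Return True when every *enabled* core has its S bit set.

    Hypothetical model: enable_bits[i] stands for enable bit 254 of core i,
    s_bits[i] for S bit 222 of core i. Deactivated cores are ignored.
    """
    return all(s for enabled, s in zip(enable_bits, s_bits) if enabled)


# Three cores: cores 0 and 2 have written SYNC 11, core 1 has not.
enable_bits = [True, True, True]
s_bits = [True, False, True]
before = sync_condition_occurred(enable_bits, s_bits)

# Core 1 deactivates itself (square 1517): its enable bit is cleared,
# so the pending request of core 1 no longer blocks the other cores.
enable_bits[1] = False
after = sync_condition_occurred(enable_bits, s_bits)
```

In this sketch the condition becomes true as a side effect of clearing the enable bit, which mirrors the case described for square 1522 and Figure 16.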
In square 1524, core 102 reads configuration buffer 112 again, and the new value of configuration word 252 reflects the core 102 that was deactivated in square 1517. Core 102 then regenerates its configuration-related values from the new value of configuration word 252. Preferably, the deactivation instruction in square 1502 is executed by system firmware (for example, BIOS setup), and after core 102 is deactivated the system firmware reboots the system, for example after square 1526. During the reboot, microprocessor 100 may operate differently than it did before the configuration-related values were generated in square 1524. For example, the BSP during the reboot may be a different core 102 than before the configuration-related values were generated. As another example, the system configuration information (for example, the number of cores 102 and logical processors in the system) that the BSP determines before booting the operating system and stores in memory for the operating system to read may differ. As another example, the APIC IDs of the cores 102 still in use may differ from those before the configuration-related values were generated, in which case the operating system directs interrupt requests, and cores 102 respond to interrupt requests, differently than before. As yet another example, the master core 102 that performs the package sleep state handshake protocol in squares 907 and 919 of Fig. 9 may be a different core 102 than before the configuration-related values were generated. Process proceeds to square 1526.
In square 1526, core 102 resumes the task it was performing before it was interrupted in square 1532. Process ends at square 1526.
The dynamic reconfiguration of microprocessor 100 described herein can be used in various applications. For example, dynamic reconfiguration may be used for testing and/or simulation during development of microprocessor 100, and/or for field testing. In addition, a user may want to know the performance and/or the amount of power consumed when a particular application runs on only a subset of cores 102. In one embodiment, after a core 102 is deactivated, its clocks can be stopped and/or its power removed so that it consumes essentially no power. Furthermore, in a high-reliability system, each core 102 may periodically check the other cores 102; if a core 102 is found to be faulty, the non-faulty cores can deactivate the faulty core 102 and cause the remaining cores 102 to perform a dynamic reconfiguration as described above. In such an embodiment, control word 202 may include an additional field that lets the writing core 102 specify the core 102 to be deactivated, and the operation described in Figure 15 is modified so that in square 1517 a core can deactivate a core 102 other than itself.
Referring to Figure 16, a timing diagram shows an example of the operation of microprocessor 100 according to the flowchart of Figure 15. In this example, microprocessor 100 is configured with three cores 102, denoted core 0, core 1, and core 2, as shown. However, it should be understood that in other embodiments microprocessor 100 may include a different number of cores 102 and may be a single-crystal or multi-crystal microprocessor 100. In this timing diagram, time advances downward.
Core 1 encounters an instruction to deactivate itself and, in response, sends a reconfiguration message and an interrupt to core 0 and core 2 (per square 1502). Core 1 then writes a SYNC 10 and is put to sleep (per square 1504).
Core 0 and core 2 are each eventually interrupted from their current tasks and read the message (per square 1532). In response, core 0 and core 2 each write a SYNC 10 and are put to sleep (per square 1504). As shown, the time at which each core writes its SYNC 10 may differ, for example because of the latency of the instruction that is executing when the interrupt is asserted.
When all cores 102 have written SYNC 10, control unit 104 wakes them all up simultaneously (per square 1506). Core 0 and core 2 then determine that they are not deactivating themselves (per decision block 1508), write a SYNC 11, and are put to sleep (per square 1518). Core 1, however, determines that it is deactivating itself, so it writes its deactivate-core bit 236 (per square 1517). In this example, core 1 writes its deactivate-core bit 236 after core 0 and core 2 have written their respective SYNC 11, as shown. Nevertheless, control unit 104 detects the occurrence of the synchronous situation, because it determines that the S bit 222 is set for every core 102 whose enable bit 254 is set. That is, even though the S bit 222 of core 1 is not set, the enable bit 254 of core 1 was cleared when core 1 wrote its synchronous buffer 108 in square 1517.
When all available cores have written SYNC 11, control unit 104 wakes them all up simultaneously (per square 1522). As described above, in the case of a multi-crystal microprocessor 100, when core 1 writes its deactivate-core bit 236 and the local control unit 104 responsively clears the local enable bit 254 of core 1, the local control unit 104 also propagates the local enable bits 254 to the remote crystal 406. Consequently, the remote control unit 104 also detects the occurrence of the synchronous situation and simultaneously wakes up all available cores of its crystal 406. Core 0 and core 2 then generate their configuration-related values based on the updated value of configuration buffer 112 (per square 1524) and resume the activity they were performing before the interrupt (per square 1526).
Hardware semaphore (HARDWARE SEMAPHORE)
Referring to Figure 17, a block diagram shows hardware semaphore 118 of Fig. 1. Hardware semaphore 118 includes an owned bit 1702, owner bits 1704, and a state machine 1706 that updates owned bit 1702 and owner bits 1704 in response to reads and writes of hardware semaphore 118 by cores 102. Preferably, in order to uniquely identify which core 102 currently owns hardware semaphore 118, the number of owner bits 1704 is the base-2 logarithm of the number of cores 102 in the configuration of microprocessor 100. In another embodiment, owner bits 1704 include one respective bit for each core 102 of microprocessor 100. Note that although one set of owned bit 1702, owner bits 1704, and state machine 1706 is described as implementing a single hardware semaphore 118, microprocessor 100 may include multiple hardware semaphores 118, each including the set of hardware described above. Preferably, the microcode running on each core 102 reads and writes hardware semaphore 118 to obtain ownership of a resource shared by cores 102, in order to perform operations that require exclusive access to the shared resource; examples are described below. The microcode may associate each of the multiple hardware semaphores 118 with ownership of a different shared resource of microprocessor 100. Preferably, hardware semaphore 118 is read and written by cores 102 at a predetermined address within a non-architectural address space of cores 102. The non-architectural address space is accessible only by the microcode of a core 102 and cannot be accessed directly by user code (for example, x86 architecture program instructions). The operation of state machine 1706 to update owned bit 1702 and owner bits 1704 of hardware semaphore 118 is described with respect to Figures 18 and 19, and uses of hardware semaphore 118 are described thereafter.
Referring to Figure 18, a flowchart shows the operation of hardware semaphore 118 when it is read by a core 102. Process starts at square 1802.
In square 1802, a core 102, denoted core x, reads hardware semaphore 118. As described above, the microcode of core 102 preferably reads the predetermined address at which hardware semaphore 118 resides in the non-architectural address space. Process proceeds to decision block 1804.
In decision block 1804, state machine 1706 examines owner bits 1704 to determine whether core x is the owner of hardware semaphore 118. If so, process proceeds to square 1808; otherwise, process proceeds to square 1806.
In square 1806, hardware semaphore 118 returns a value of zero to the reading core 102 to indicate that core 102 does not own hardware semaphore 118. Process ends at square 1806.
In square 1808, hardware semaphore 118 returns a value of one to the reading core 102 to indicate that core 102 owns hardware semaphore 118. Process ends at square 1808.
As described above, microprocessor 100 may include multiple hardware semaphores 118. In one embodiment, microprocessor 100 includes 16 hardware semaphores 118, and when a core 102 reads the predetermined address it receives a 16-bit data value. Each bit corresponds to a different one of the 16 hardware semaphores 118 and indicates whether the core 102 reading the predetermined address owns the corresponding hardware semaphore 118.
Referring to Figure 19, a flowchart shows the operation of hardware semaphore 118 when it is written by a core 102. Process starts at square 1902.
In square 1902, a core 102, denoted core x, writes hardware semaphore 118, for example at the predetermined non-architectural address described above. Process proceeds to decision block 1904.
In decision block 1904, state machine 1706 examines owned bit 1702 to determine whether hardware semaphore 118 is owned by any core 102 or is free (not owned). If it is owned, process proceeds to decision block 1914; otherwise, process proceeds to decision block 1906.
In decision block 1906, state machine 1706 examines the value written. If the value is 1, which indicates that core 102 wants to obtain ownership of hardware semaphore 118, process proceeds to square 1908. If the value is 0, which indicates that core 102 wants to relinquish ownership of hardware semaphore 118, process proceeds to square 1912.
In square 1908, state machine 1706 updates owned bit 1702 to 1 and sets owner bits 1704 to indicate that core x now owns hardware semaphore 118. Process ends at square 1908.
In square 1912, hardware semaphore 118 updates neither owned bit 1702 nor owner bits 1704. Process ends at square 1912.
In decision block 1914, state machine 1706 examines owner bits 1704 to determine whether core x is the owner of hardware semaphore 118. If so, process proceeds to decision block 1916; otherwise, process proceeds to square 1912.
In decision block 1916, state machine 1706 examines the value written. If the value is 1, which indicates that core 102 wants to obtain ownership of hardware semaphore 118, process proceeds to square 1912 (no update occurs because core 102 already owns hardware semaphore 118, as determined in decision block 1914). If the value is 0, which indicates that core 102 wants to relinquish ownership of hardware semaphore 118, process proceeds to square 1918.
In square 1918, state machine 1706 updates owned bit 1702 to zero to indicate that no core 102 now owns hardware semaphore 118. Process ends at square 1918.
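The read flow of Figure 18 and the write flow of Figure 19 can be summarized in a short software model. This is a minimal sketch, not the hardware design: the class name `HardwareSemaphore` and its attributes are hypothetical, and treating a read as "owned and owner equals the reader" is an assumption made so that stale owner bits cannot report ownership of a free semaphore.

```python
class HardwareSemaphore:
    """Hypothetical software model of owned bit 1702 and owner bits 1704."""

    def __init__(self):
        self.owned = False   # owned bit 1702
        self.owner = None    # owner bits 1704 (core number)

    def read(self, core):
        # Squares 1804-1808: return 1 only to the owning core.
        return 1 if self.owned and self.owner == core else 0

    def write(self, core, value):
        if not self.owned:
            # Decision block 1906: a free semaphore is taken by writing 1.
            if value == 1:
                self.owned = True      # square 1908
                self.owner = core
            # Writing 0 to a free semaphore changes nothing (square 1912).
        elif self.owner == core and value == 0:
            # Decision blocks 1914/1916: only the owner can release.
            self.owned = False         # square 1918
        # All other cases fall through to square 1912 (no update).


sem = HardwareSemaphore()
sem.write(0, 1)           # core 0 acquires
sem.write(1, 1)           # core 1 tries, but the semaphore is owned
state = (sem.read(0), sem.read(1))
```

Because a write never reports success directly, a core in this model learns whether it obtained ownership only by reading the semaphore back afterwards.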
As described above, in one embodiment microprocessor 100 includes 16 hardware semaphores 118. When a core 102 writes the predetermined address, it writes a 16-bit data value. Each bit corresponds to a different one of the 16 hardware semaphores 118 and indicates whether the core 102 writing the predetermined address requests ownership of the corresponding hardware semaphore 118 (value one) or relinquishes it (value zero).
In one embodiment, arbitration logic arbitrates among requests by cores 102 to access hardware semaphore 118, so that reads and writes of hardware semaphore 118 by cores 102 are serialized. In one embodiment, the arbitration logic uses a round-robin fairness algorithm among cores 102 for access to hardware semaphore 118.
Referring to Figure 20, a flowchart shows the operation of microprocessor 100 when it uses hardware semaphore 118 to perform an action that requires exclusive ownership of a resource. More specifically, hardware semaphore 118 is used to ensure that only one core 102 at a time writes back and invalidates shared cache memory 119 in a situation where two or more cores 102 have each encountered an instruction to write back and invalidate shared cache memory 119. The operation is described from the perspective of a single core, but each core 102 of microprocessor 100 operates according to the description so that, collectively, while one core 102 performs a write-back-and-invalidate operation, the other cores 102 do not. That is, the operation of Figure 20 ensures that WBINVD instruction processing is serialized. In one embodiment, the operation of Figure 20 can be performed in a microprocessor 100 that executes the WBINVD instruction according to the embodiment of Figures 7A~7B. Process starts at square 2002.
In square 2002, a core 102 encounters a cache control instruction, such as a WBINVD instruction. Process proceeds to square 2004.
In square 2004, core 102 writes a 1 to the WBINVD hardware semaphore 118. In one embodiment, the microcode has allocated one of hardware semaphores 118 to the WBINVD operation. Core 102 then reads the WBINVD hardware semaphore 118 to determine whether it obtained ownership. Process proceeds to decision block 2006.
In decision block 2006, if core 102 determines that it obtained ownership of the WBINVD hardware semaphore 118, process proceeds to square 2008; otherwise, process returns to square 2004 to try again to obtain ownership. Note that while the microcode of the instant core 102 loops through squares 2004 to 2006, it will eventually be interrupted by the core 102 that owns the WBINVD hardware semaphore 118, because that core 102 is executing a WBINVD instruction according to Figures 7A~7B and sends an interrupt to the instant core 102 in square 702. Preferably, on each pass through the loop, the microcode of the instant core 102 checks an interrupt status buffer to see whether one of the other cores 102 (for example, the core 102 that owns the WBINVD hardware semaphore 118) has sent an interrupt to the instant core 102. The instant core 102 then performs the operation of Figures 7A~7B and, in square 749, resumes operation according to Figure 20 to attempt to obtain ownership of hardware semaphore 118 in order to execute its own WBINVD instruction.
In square 2008, core 102 has obtained ownership, and process proceeds to square 702 of Figures 7A~7B to execute the WBINVD instruction. As part of the WBINVD instruction operation, in square 748 of Figures 7A~7B, core 102 writes a zero to the WBINVD hardware semaphore 118 to relinquish its ownership. Process ends at square 2008.
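The acquire loop of squares 2004 to 2008 (write 1, read back, retry until owned, write 0 when done) can be sketched with threads standing in for cores. Everything here is an illustrative assumption: the class `HardwareSemaphore`, the function `run_wbinvd`, and the use of a `threading.Lock` to stand in for the arbitration logic. The interrupt handling of Figures 7A~7B is not modeled.

```python
import threading


class HardwareSemaphore:
    """Hypothetical model; the lock stands in for the arbitration logic."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def write(self, core, value):
        with self._lock:
            if self._owner is None and value == 1:
                self._owner = core
            elif self._owner == core and value == 0:
                self._owner = None

    def read(self, core):
        with self._lock:
            return 1 if self._owner == core else 0


def run_wbinvd(sem, core, log):
    # Squares 2004/2006: request ownership, read back, loop until owned.
    while True:
        sem.write(core, 1)
        if sem.read(core) == 1:
            break
        # Real microcode would also check the interrupt status here.
    log.append(("start", core))   # exclusive section (stand-in for WBINVD)
    log.append(("end", core))
    sem.write(core, 0)            # square 748: relinquish ownership


sem = HardwareSemaphore()
log = []
threads = [threading.Thread(target=run_wbinvd, args=(sem, c, log)) for c in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each core appends its "start" and "end" entries only while it owns the semaphore, the entries of different cores never interleave in `log`, which is the serialization property the flow is meant to provide.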
An operation similar to that described for Figure 20 can be performed by the microcode to obtain exclusive ownership of other shared resources. Other resources of which a core 102 can obtain exclusive ownership through a hardware semaphore 118 are the buffers of non-core 103 that are shared by cores 102. In one embodiment, the non-core 103 buffers include a control buffer that contains a respective field for each core 102. Each field controls an aspect of the operation of its core 102. Because the fields are located in the same buffer, when a core 102 wants to update its own field without changing the fields of the other cores 102, core 102 must read the control buffer, modify the value read, and then write the modified value back to the control buffer. For example, microprocessor 100 may include a non-core 103 performance control buffer (Performance Control Register, PCR) that controls the bus clock ratio of cores 102. To update its bus clock ratio, a given core 102 must read, modify, and write back the PCR. Therefore, in one embodiment, the microcode is configured to perform an effectively atomic read/modify/write of the PCR by doing so only while core 102 owns the hardware semaphore 118 associated with the PCR. The bus clock ratio determines the clock frequency of an individual core 102 as a multiple of the frequency of the clock supplied to microprocessor 100 via an external bus.
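The read/modify/write of a per-core field in a shared control buffer can be illustrated as follows. This is only a sketch: the field width `FIELD_BITS`, the packing of core i into bits starting at `i * FIELD_BITS`, and the function name `update_field` are hypothetical, since the description does not give the layout of the PCR. In the described embodiment the whole sequence would run only while the core owns the semaphore associated with the PCR.

```python
FIELD_BITS = 4  # hypothetical width of each per-core field


def update_field(register, core, new_value):
    """Return the register value with only the field of `core` replaced."""
    shift = core * FIELD_BITS
    mask = ((1 << FIELD_BITS) - 1) << shift
    # Clear the core's own field, keep every other field, insert new value.
    return (register & ~mask) | ((new_value << shift) & mask)


# Fields for cores 3..0 are 4, 3, 2, 1 (read step).
pcr = 0x4321
# Core 1 changes its own field to 0xA (modify step), then writes back.
pcr = update_field(pcr, 1, 0xA)
```

Without exclusive ownership, two cores could both read the old value and the later write-back would silently discard the earlier core's update, which is the hazard the semaphore is there to avoid.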
Another such resource is a Trusted Platform Module (TPM). In one embodiment, microprocessor 100 implements a TPM in microcode that runs on cores 102. At any given instant, the microcode running on one and only one of cores 102 implements the TPM. However, the core 102 that implements the TPM may change over time. By using the hardware semaphore 118 associated with the TPM, the microcode of cores 102 ensures that only one core 102 implements the TPM at a time. More specifically, the core 102 currently implementing the TPM writes the TPM state to dedicated random access memory 116 before it gives up implementing the TPM, and the core 102 that takes over implementing the TPM reads the TPM state from dedicated random access memory 116. The microcode of each core 102 is configured so that when core 102 wants to become the core 102 implementing the TPM, it first obtains ownership of the TPM hardware semaphore 118 before it reads the TPM state from dedicated random access memory 116 and begins implementing the TPM. In one embodiment, the TPM substantially conforms to a TPM specification published by the Trusted Computing Group, such as the ISO/IEC 11889 specification.
As described above, the conventional solution to resource contention among multiple processors is a software semaphore in system memory. Potential advantages of hardware semaphore 118 described herein are that it may avoid generating additional traffic on the external memory bus and that it may be faster to access than system memory.
Interrupting, non-sleep synchronization request
Referring to Figure 21, a timing diagram shows an example of the operation of cores 102 issuing non-sleep synchronization requests according to the flowchart of Fig. 3. In this example, microprocessor 100 is configured with three cores 102, denoted core 0, core 1, and core 2, as shown. However, it should be understood that in other embodiments microprocessor 100 may include a different number of cores 102.
Core 0 writes a SYNC 14 in which neither sleep bit 212 nor selective wake-up bit 214 is set (that is, a non-sleep synchronization request). Consequently, control unit 104 allows core 0 to keep running (per the "No" branch of decision block 312).
Core 1 eventually also writes a non-sleep SYNC 14, and control unit 104 allows core 1 to keep running. Finally, core 2 writes a non-sleep SYNC 14. As shown, the time at which each core writes its SYNC 14 may differ.
When all cores have written the non-sleep SYNC 14, control unit 104 simultaneously sends a synchronization interrupt to each of core 0, core 1, and core 2. Each core then receives the synchronization interrupt and services it (unless the synchronization interrupt is masked, in which case the microcode typically polls for the synchronization interrupt).
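The behavior in Figure 21 (each core keeps running after its request, and all cores are interrupted once the last request arrives) can be sketched as a simple model. The class `ControlUnit`, the method `write_sync`, and the `interrupted` list are hypothetical names; sleep bit 212, selective wake-up bit 214, and interrupt masking are not modeled.

```python
class ControlUnit:
    """Hypothetical model of non-sleep synchronization requests."""

    def __init__(self, num_cores):
        self.written = [False] * num_cores
        self.interrupted = []   # cores that received the sync interrupt

    def write_sync(self, core):
        # The writing core is not put to sleep; it simply records its request.
        self.written[core] = True
        if all(self.written):
            # Synchronous situation occurred: interrupt every core at once.
            self.interrupted = list(range(len(self.written)))
            self.written = [False] * len(self.written)
        return "running"


cu = ControlUnit(3)
status0 = cu.write_sync(0)
status1 = cu.write_sync(1)
pending = list(cu.interrupted)   # no interrupt yet, core 2 has not written
status2 = cu.write_sync(2)
delivered = list(cu.interrupted)
```

The difference from the sleeping requests in Figures 15 and 16 is that the notification arrives as an interrupt to running cores rather than as a wake-up.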
Bootstrap processor designation
In one embodiment, as described above, normally (that is, when the "all cores BSP" function of Figure 23 is disabled) one core 102 designates itself as the bootstrap processor (BSP) and performs special tasks, such as booting the operating system. In one embodiment, normally (that is, when the "modify BSP" and "all cores BSP" functions of Figures 22 and 23, respectively, are disabled) the core 102 whose virtual core number is zero is the BSP by default.
However, the inventors have observed that it may be advantageous in some uses for the BSP to be designated in a different manner; embodiments are described below. For example, much of the testing of a microprocessor 100 part, particularly manufacturing testing, is performed by booting the operating system and running program code to ensure that the part works properly. Because the BSP core 102 performs system initialization and boots the operating system, the BSP core 102 may be exercised in ways that the AP cores are not. In addition, it has been observed that even in multithreaded operating environments the BSP usually bears a larger share of the processing load than the APs; therefore, the AP cores 102 may not be tested as thoroughly as the BSP core 102. Finally, there may be certain actions that only the BSP core 102 performs on behalf of microprocessor 100 as a whole, such as the package sleep state handshake protocol described with respect to Fig. 9.
Therefore, embodiments are described in which any core 102 can be designated as the BSP. In one embodiment, during testing of microprocessor 100, the tests are run N times, where N is the number of cores 102 in microprocessor 100, and in each run microprocessor 100 is reconfigured so that a different core 102 is the BSP. This can advantageously provide better test coverage during manufacturing and can also advantageously expose bugs in microprocessor 100 during its design process. Another advantage is that each core 102 can have a different APIC ID in different runs and therefore respond to different interrupt requests, which can provide wider test coverage.
Referring to Figure 22, a flowchart shows a process for configuring microprocessor 100. The description of Figure 22 refers to the multi-crystal microprocessor 100 of Fig. 4, which includes two crystals 406 and eight cores 102. However, it should be understood that the dynamic reconfiguration described here can be used with a microprocessor 100 having a different configuration, that is, with more than two crystals or a single crystal, and with more or fewer than eight cores 102 but at least two cores 102. The operation is described from the perspective of a single core, but each core 102 of microprocessor 100 operates according to the description so that the cores collectively reconfigure microprocessor 100 dynamically. Process starts at square 2202.
In square 2202, microprocessor 100 is reset and performs the initial part of its initialization, preferably in a manner similar to that described above for Figure 14. However, the generation of the configuration-related values, as in square 1424 of Figure 14, and in particular of the APIC ID and the BSP flag, is performed in the manner described for squares 2203 to 2204. Process proceeds to square 2203.
In square 2203, core 102 generates its virtual core number, preferably as described for Figure 14. Process proceeds to decision block 2204.
In decision block 2204, core 102 samples an indicator to determine whether a function is enabled. The function is referred to herein as the "modify BSP" function. In one embodiment, blowing a fuse 114 enables the modify BSP function. Preferably, during testing, the fuse 114 of the modify BSP function is not blown; instead, a true value is scanned into the holding buffer bit associated with the modify BSP function fuse 114, as described above for Fig. 1, to enable the modify BSP function. In this way, the modify BSP function is not permanently enabled in the microprocessor 100 part, but is disabled again after power-up. Preferably, the operations in squares 2203 to 2214 are performed by the microcode of core 102. If the modify BSP function is enabled, process proceeds to square 2205; otherwise, process proceeds to square 2206.
In square 2205, core 102 modifies the virtual core number generated in square 2203. In one embodiment, core 102 sets the virtual core number to the result of a rotate function of the virtual core number generated in square 2203 and a rotate amount, as follows:
virtual core number = rotate(rotate amount, virtual core number).
In one embodiment, the rotate function rotates the virtual core numbers among cores 102 by the rotate amount. The rotate amount is a value blown into fuses 114 or, preferably, scanned into the holding buffer during testing. Table 1 shows the virtual core number of each core 102 for an example configuration in which the number of crystals 406 is two, the number of cores 102 per crystal 406 is four, and all cores 102 are enabled. The ordered pair (crystal number 258, local core number 256) of each core is shown in the left column, and each rotate amount is shown in the top row. In this way, a tester can cause cores 102 to generate any valid value as their virtual core number and, consequently, as their APIC ID. Although one embodiment for modifying the virtual core number is described, other embodiments are contemplated. For example, the rotation direction may be the opposite of that shown in Table 1. Process proceeds to square 2206.
Table 1
(crystal number, local core number) \ rotate amount | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
(0,0) | 0 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
(0,1) | 1 | 0 | 7 | 6 | 5 | 4 | 3 | 2 |
(0,2) | 2 | 1 | 0 | 7 | 6 | 5 | 4 | 3 |
(0,3) | 3 | 2 | 1 | 0 | 7 | 6 | 5 | 4 |
(1,0) | 4 | 3 | 2 | 1 | 0 | 7 | 6 | 5 |
(1,1) | 5 | 4 | 3 | 2 | 1 | 0 | 7 | 6 |
(1,2) | 6 | 5 | 4 | 3 | 2 | 1 | 0 | 7 |
(1,3) | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
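The rotation shown in Table 1 can be reproduced with modular arithmetic. The following Python sketch is
illustrative only: the function names are hypothetical, the default virtual core number is assumed to
be die number x cores per die + local core number (which matches the zero-rotation column of the
example configuration), and the subtraction direction is inferred from Table 1 rather than stated in
the description.

```python
# Illustrative sketch; names are hypothetical and not taken from the patent.
CORES_PER_DIE = 4   # example configuration of Table 1
NUM_DIES = 2
TOTAL_CORES = CORES_PER_DIE * NUM_DIES


def default_virtual_core_number(die_number, local_core_number):
    """Assumed default numbering when every core is enabled."""
    return die_number * CORES_PER_DIE + local_core_number


def rotate(rotate_amount, virtual_core_number, total_cores=TOTAL_CORES):
    """Rotate a virtual core number; the direction is inferred from Table 1."""
    return (virtual_core_number - rotate_amount) % total_cores


def table_row(die_number, local_core_number):
    """One row of Table 1: the virtual core number for each rotate amount."""
    base = default_virtual_core_number(die_number, local_core_number)
    return [rotate(amount, base) for amount in range(TOTAL_CORES)]


if __name__ == "__main__":
    for die in range(NUM_DIES):
        for core in range(CORES_PER_DIE):
            print((die, core), table_row(die, core))
```

With a rotate amount of 3, for instance, core (0,3) obtains virtual core number 0 and would therefore
become the BSP under the flow of Fig. 22.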
At block 2206, the core 102 populates its local APIC ID register with the default virtual core number
generated at block 2203 or with that number as modified at block 2205. In one embodiment, the APIC ID
register can be read by the core 102 from itself (e.g., by BIOS and/or the operating system) at memory
address 0x0FEE00020. In another embodiment, however, the APIC ID register can be read by the core 102
at MSR address 0x802. Flow proceeds to decision block 2208.
At decision block 2208, the core 102 determines whether the APIC ID it populated at block 2206 is zero.
If so, flow proceeds to block 2212; otherwise, flow proceeds to block 2214.
At block 2212, the core 102 sets its BSP flag to true to indicate that the core 102 is the BSP. In one
embodiment, the BSP flag is a bit of the x86 APIC base register (IA32_APIC_BASE MSR) of the core 102.
Flow proceeds to decision block 2216.
At block 2214, the core 102 sets its BSP flag to false to indicate that the core 102 is not the BSP,
i.e., that it is an AP. Flow proceeds to decision block 2216.
At decision block 2216, the core 102 determines whether it is the BSP, i.e., whether it designated
itself the BSP core 102 at block 2212 rather than an AP core 102 at block 2214. If so, flow proceeds
to block 2218; otherwise, flow proceeds to block 2222.
At block 2218, the core 102 begins fetching and executing system initialization firmware (e.g., BSP
BIOS bootstrap code). This may include instructions related to the BSP flag and the APIC ID, e.g.,
instructions that read the APIC ID register or the APIC base register, in which case the core 102
returns the values written at blocks 2206 and 2212/2214. It may also include acting as the sole core
102 of the microprocessor 100 that performs operations on behalf of the microprocessor 100 as a whole,
such as the package sleep state handshake protocol described with respect to Fig. 9. Preferably, the
BSP core 102 begins fetching and executing the system initialization firmware at an architecturally
defined reset vector. For example, in the x86 architecture the reset vector points to address
0xFFFFFFF0. Preferably, executing the system initialization firmware includes bootstrapping the
operating system, e.g., loading the operating system and transferring control to it. Flow proceeds to
block 2224.
At block 2222, the core 102 halts itself and waits for a startup sequence from the BSP before it begins
fetching and executing instructions. In one embodiment, the startup sequence received from the BSP
includes an interrupt vector to AP system initialization firmware (e.g., AP BIOS code). This may
include instructions related to the BSP flag and the APIC ID, in which case the core 102 returns the
values written at blocks 2206 and 2212/2214. Flow proceeds to block 2224.
At block 2224, as the core 102 executes instructions, it receives and responds to interrupt requests
based on the APIC ID written to its APIC ID register at block 2206. Flow ends at block 2224.
As described above, according to one embodiment the core 102 whose virtual core number is zero is the
BSP by default. However, the inventors have observed that there may be situations in which it is
advantageous to designate all cores 102 as the BSP; such embodiments are described below. For example,
the developers of the microprocessor 100 may have invested a significant amount of time and cost in
developing a large body of tests designed to run single-threaded on a single core, and the developers
would like to use these single-core tests to test the multi-core microprocessor 100. For example, the
tests may run under the old and well-known DOS operating system in x86 real mode.
These tests could be run on each core 102 in a serial fashion, using the modify-BSP feature described
with respect to Fig. 22 and/or by blowing fuses or scanning into the holding register to modify the
fuse values so that all cores 102 are disabled except the one core 102 under test. However, the
inventors have recognized that this takes more time than running the tests on all cores 102
simultaneously (e.g., approximately four times as long in the case of a four-core microprocessor 100).
Furthermore, the time required to test each individual microprocessor 100 part is valuable,
particularly when hundreds of thousands or more microprocessor 100 parts are manufactured, and
particularly when many of the tests run on very expensive test equipment.
In addition, there may be speed paths in the logic of the microprocessor 100 that are stressed more
heavily when more than one core 102 (or all cores 102) is running at the same time, because more heat
is generated and/or more power is drawn. Running the tests in the serial fashion may not generate this
additional stress and therefore may not expose those speed paths.
Therefore, embodiments are described in which all cores 102 can be dynamically designated the BSP core
102 so that all cores 102 can execute a test simultaneously.
Referring now to Fig. 23, a flowchart illustrating a process for configuring the microprocessor 100
according to another embodiment is shown. The description of Fig. 23 refers to the multi-die
microprocessor 100 of Fig. 4, which includes two dies 406 and eight cores 102. However, it should be
understood that the dynamic reconfiguration described herein can be used with a microprocessor 100
that has a different configuration, i.e., more than two dies or a single die, and more or fewer than
eight cores 102 but at least two cores 102. The operation is described from the perspective of a
single core, but each core 102 of the microprocessor 100 operates according to the description so that
the cores collectively and dynamically reconfigure the microprocessor 100. Flow begins at block 2302.
At block 2302, the microprocessor 100 is reset and performs an initial portion of its initialization,
preferably in a manner similar to that described above with respect to Fig. 14. However, the
generation of the configuration-related values, as at block 1424 of Fig. 14, and in particular of the
APIC ID and the BSP flag, is performed in the manner described at blocks 2304 through 2312. Flow
proceeds to decision block 2304.
At decision block 2304, the core 102 detects whether a feature is enabled. The feature is referred to
herein as the "all cores BSP" feature. Preferably, blowing a fuse 114 enables the all cores BSP
feature. Preferably, during testing, the fuse 114 for the all cores BSP feature is not blown; instead,
a true value is scanned into the holding register bit associated with that fuse 114, as described
above with respect to Fig. 1, so that the all cores BSP feature is enabled. In this fashion, the all
cores BSP feature is not permanently enabled in the microprocessor 100 part, but is disabled after
power-up. Preferably, the operations at blocks 2304 through 2312 are performed by microcode of the
core 102. If the all cores BSP feature is enabled, flow proceeds to block 2305; otherwise, flow
proceeds to block 2203 of Fig. 22.
At block 2305, the core 102 sets its virtual core number to zero, regardless of its local core number
256 and die number 258. Flow proceeds to block 2306.
At block 2306, the core 102 populates its local APIC ID register with the virtual core number set to
zero at block 2305. Flow proceeds to block 2312.
At block 2312, the core 102 sets its BSP flag to true to indicate that the core 102 is the BSP,
regardless of its local core number 256 and die number 258. Flow proceeds to block 2315.
At block 2315, whenever a core 102 performs a memory access request, the microprocessor 100 modifies
the upper address bits of the memory access request address differently for each core 102, so that
each core 102 accesses its own separate memory space. That is, the microprocessor 100 modifies the
upper address bits according to which core 102 generated the memory access request, so that the upper
address bits have a unique value for each core 102. In one embodiment, the microprocessor 100 modifies
the upper address bits as specified by values blown into the fuses 114. In another embodiment, the
microprocessor 100 modifies the upper address bits based on the local core number 256 and die number
258 of the core 102. For example, in an embodiment in which the microprocessor 100 has four cores, the
microprocessor 100 modifies the upper two bits of the memory address and generates a unique value on
the upper two bits for each core 102. In effect, the memory space addressable by the microprocessor
100 is divided into N sub-spaces, where N is the number of cores 102. The test programs are developed
so that they restrict themselves to addresses in the lowest of the N sub-spaces. For example, assume
the microprocessor 100 can address 64 GB of memory and includes four cores 102. The tests are
developed to access only the lowest 8 GB of memory. When core 0 executes an instruction that accesses
memory address A (in the lowest 8 GB of memory), the microprocessor 100 generates address A
(unmodified) on the memory bus; when core 1 executes an instruction that accesses the same memory
address A, the microprocessor 100 generates address A+8 GB on the memory bus; when core 2 executes an
instruction that accesses the same memory address A, the microprocessor 100 generates address A+16 GB
on the memory bus; and when core 3 executes an instruction that accesses the same memory address A,
the microprocessor 100 generates address A+32 GB on the memory bus. In this fashion, advantageously,
the cores 102 do not collide with one another in their memory accesses, which allows the tests to
execute correctly. Preferably, the single-threaded tests are executed on a standalone test machine
that can test the microprocessor 100 in isolation. The developers of the microprocessor 100 develop
test data that the test machine supplies to the microprocessor 100; conversely, they develop result
data that the test machine compares against the data written by the microprocessor 100 during memory
write accesses, to ensure that the microprocessor 100 writes the correct data. In one embodiment, the
shared cache memory 119 (i.e., the highest-level cache, which generates the addresses used in external
bus transactions) is the portion of the microprocessor 100 that is configured to modify the upper
address bits when the all cores BSP feature is enabled. Flow proceeds to block 2318.
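The per-core address modification of block 2315 can be sketched as a simple offset. The Python model
below is a hedged illustration, not the hardware implementation: the function name and parameters are
hypothetical, and it assumes a uniform stride of one sub-space per core. That assumption reproduces
the offsets given above for cores 0 through 2; the offset stated for core 3 does not follow a uniform
8 GB stride and is therefore not reproduced here.

```python
# Illustrative model of block 2315; names and the uniform stride are assumptions.
GIB = 1 << 30


def steer_address(core_index, address, subspace_size=8 * GIB):
    """Map an address in the lowest sub-space into the sub-space owned by core_index."""
    if not 0 <= address < subspace_size:
        raise ValueError("test programs must stay within the lowest sub-space")
    # Because the address lies below subspace_size, adding the offset is the
    # same as overwriting the upper address bits with a per-core value.
    return address + core_index * subspace_size


if __name__ == "__main__":
    a = 0x1000
    for core in range(3):
        print(core, hex(steer_address(core, a)))
```

Because every core issues the same address A but the bus sees a different address per core, identical
single-threaded tests can run on all cores at once without overwriting each other's data.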
At block 2318, the core 102 begins fetching and executing system initialization firmware (e.g., BSP
BIOS bootstrap code). This may include instructions related to the BSP flag and the APIC ID, e.g.,
instructions that read the APIC ID register or the APIC base register, in which case the core 102
returns the zero value written at block 2306. Preferably, the BSP core 102 begins fetching and
executing the system initialization firmware at an architecturally defined reset vector. For example,
in the x86 architecture the reset vector points to address 0xFFFFFFF0. Preferably, executing the
system initialization firmware includes bootstrapping the operating system, e.g., loading the
operating system and transferring control to it. Flow proceeds to block 2324.
At block 2324, as the core 102 executes instructions, it receives and responds to interrupt requests
based on the APIC ID value of zero written to its APIC ID register at block 2306. Flow ends at block
2324.
Although an embodiment in which all cores 102 are designated the BSP has been described with respect
to Fig. 23, other embodiments are contemplated in which multiple, but fewer than all, of the cores 102
are designated the BSP.
Although the embodiments are described in the context of an x86-style system, in which each core 102
uses a local APIC and there is a relationship between the local APIC ID and the BSP designation, it
should be understood that the designation of the bootstrap processor is not limited to x86
embodiments, but may be used in systems with different system architectures.
Propagation of Microcode Patches to Multiple Cores
As observed previously, many important functions may be performed primarily by the microcode of a
microprocessor, and in particular functions that require correct communication and coordination among
the microcode instances executing on the multiple cores of the microprocessor. Because of the
complexity of microcode, there is a significant probability that bugs will exist in the microcode that
need to be fixed. This may be accomplished with a microcode patch in which new microcode instructions
replace the old microcode instructions that cause the bug. That is, the microprocessor includes
special hardware that facilitates microcode patching. Generally, it is desirable to apply the
microcode patch to all cores of the microprocessor. Conventionally, the patch is applied by executing
an architectural instruction separately on each core. However, the conventional approach may be
problematic.
First, the patch may relate to inter-core communication that uses the microcode instances (e.g., core
synchronization, hardware semaphore use) or to features that require microcode inter-core
communication (e.g., cross-core debug requests, cache control operations or power management, or
dynamic multi-core microprocessor configuration). Executing the architectural patch application
instruction separately on each core may create a window of time in which the microcode patch has been
applied to some cores but not to others (or a previous patch is applied to some cores and the new
patch to others). This may cause inter-core communication failures and incorrect operation of the
microprocessor. Other problems, both foreseeable and unforeseeable, may also arise if not all cores of
the microprocessor use the same microcode patch.
Second, the architecture of the microprocessor specifies many features that may be supported by some
instances of the microprocessor and not by others. During operation, the microprocessor can
communicate to system software which particular features it supports. For example, in the case of an
x86 architecture microprocessor, the x86 CPUID instruction can be executed by system software to
determine the supported feature set. However, the instruction that determines the feature set (e.g.,
CPUID) is executed separately on each core of the microprocessor. In some cases, a feature may be
disabled because of a bug that existed at the time the microprocessor was released. A microcode patch
that fixes the bug may be developed later, so that the feature can be enabled after the patch is
applied. However, if the patch is applied in the conventional fashion (i.e., by executing the
apply-patch instruction separately on each core), different cores may report different feature sets at
a given point in time, depending on whether the patch has been applied to the core. This may be
problematic, particularly when system software (such as the operating system, e.g., to facilitate
thread migration between cores) expects all cores of the microprocessor to have the same feature set.
In particular, it has been observed that some system software obtains the feature set of only one core
and assumes that the other cores have the same feature set.
Third, the microcode instance of each core controls and/or communicates with uncore resources shared
by the cores (e.g., synchronization-related hardware, hardware semaphores, shared PRAM, shared cache or
service unit). Consequently, it may generally be problematic for the microcode of two different cores
to control or communicate with an uncore resource in two different ways at the same time because one
of the cores has the microcode patch applied and the other does not (or the two cores have different
microcode patches applied).
Finally, the microcode patch hardware in the microprocessor may be such that applying the patch in the
conventional manner causes the patch operation of one core to interfere with the patch application of
other cores, e.g., if portions of the patch hardware are shared among the cores.
Preferably, embodiments are described that apply a microcode patch to a multi-core microprocessor in
an atomic manner at the architectural instruction level, in order to solve the problems described
herein. First, the patch is applied to the entire microprocessor 100 in response to the execution of a
single architectural instruction on a single core 102. That is, the embodiments do not require system
software to execute an apply-microcode-patch instruction (described below) on each core 102. More
specifically, the single core 102 that encounters the apply-microcode-patch instruction sends messages
to, and interrupts, the other cores 102 to cause their microcode instances to apply their portion of
the patch, and all the microcode instances cooperate with one another so that the microcode patch is
applied to the microcode patch hardware of each core 102 and to the shared patch hardware of the
microprocessor 100 while interrupts are disabled on all cores 102. Second, the microcode instances
that run on all cores 102 and implement the atomic patch application mechanism cooperate with one
another so that, once all cores 102 of the microprocessor 100 have agreed to apply the patch, they
refrain from executing any architectural instruction (other than the one apply-microcode-patch
instruction) until all cores 102 have finished doing so. That is, no core 102 executes architectural
instructions while any core 102 is applying the microcode patch. Furthermore, in a preferred
embodiment, all cores 102 reach the same place in the microcode that applies the patch with interrupts
disabled, and thereafter the cores 102 execute only the microcode instructions that apply the patch
until all cores 102 of the microprocessor 100 confirm that the patch has been applied. That is, while
any core 102 of the microprocessor 100 is applying the patch, no core 102 executes microcode
instructions other than those that apply the microcode patch.
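The rendezvous at the heart of the atomic patch mechanism can be sketched with two barriers. The
Python model below is an assumption-laden illustration, not the patent's microcode: threads stand in
for the per-core microcode instances, the function and variable names are hypothetical, and the
message and interrupt that the initiating core sends to the other cores are not modeled. It shows only
that no core resumes until every core has applied the patch.

```python
import threading


def apply_patch_atomically(num_cores, patch):
    """Model: every core applies the patch before any core resumes normal execution."""
    core_patch_hw = [None] * num_cores      # stands in for each core's patch hardware
    events = []                             # ordered trace of what happened
    events_lock = threading.Lock()
    all_agreed = threading.Barrier(num_cores)    # all cores reached the patch routine
    all_applied = threading.Barrier(num_cores)   # all cores finished applying

    def microcode_instance(core_id):
        # Interrupts are assumed to be disabled for the whole routine.
        all_agreed.wait()
        core_patch_hw[core_id] = patch
        with events_lock:
            events.append(("applied", core_id))
        all_applied.wait()
        with events_lock:
            events.append(("resumed", core_id))  # architectural execution may continue

    threads = [threading.Thread(target=microcode_instance, args=(i,))
               for i in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return core_patch_hw, events


if __name__ == "__main__":
    hw, trace = apply_patch_atomically(4, "patch-v2")
    print(hw)
    print(trace)
```

In the trace, every "applied" entry precedes every "resumed" entry, which is the property the
description calls atomic at the architectural instruction level.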
Referring now to Fig. 24, a block diagram illustrating a multi-core microprocessor 100 according to
another embodiment is shown. The microprocessor 100 is similar in many respects to the microprocessor
100 of Fig. 1. However, the microprocessor 100 of Fig. 24 also includes, in its uncore 103, a service
processing unit (SPU) 2423, an SPU start address register 2497, an uncore microcode read-only memory
(ROM) 2425 and an uncore microcode patch random access memory (RAM) 2408. In addition, each core 102
includes a core PRAM 2499, a patch content-addressable memory (CAM) 2439 and a core microcode ROM
2404.
Microcode comprises microcode instructions. The microcode instructions are non-architectural
instructions stored in one or more memories of the microprocessor 100 (e.g., the uncore microcode ROM
2425, the uncore microcode patch RAM 2408 and/or the core microcode ROM 2404). They are fetched by a
core 102 based on a fetch address held in a non-architectural micro-program counter (micro-PC) and are
used by the core 102 to implement the instructions of the instruction set architecture of the
microprocessor 100. Preferably, the microcode instructions are translated by a microtranslator into
microinstructions that are executed by the execution units of the core 102; in another embodiment, the
microcode instructions are executed directly by the execution units, in which case the microcode
instructions are microinstructions. That the microcode instructions are non-architectural means that
they are not instructions of the instruction set architecture (ISA) of the microprocessor 100, but are
instead encoded according to an instruction set distinct from the architectural instruction set. The
non-architectural micro-PC is not defined by the ISA of the microprocessor 100 and is distinct from the
architecturally defined program counter of the core 102. The microcode implements some or all of the
instructions of the ISA of the microprocessor as follows. In response to decoding an ISA instruction
that is implemented in microcode, the core 102 transfers control to a microcode routine associated
with the ISA instruction. The microcode routine comprises microcode instructions. The execution units
execute the microcode instructions or, according to the preferred embodiment, the microcode
instructions are further translated into microinstructions that are executed by the execution units.
The results of the execution of the microcode instructions (or of the microinstructions into which
they are translated) by the execution units are the results defined by the ISA instruction. Thus, the
collective execution of the microcode routine associated with the ISA instruction (or of the
microinstructions translated from the routine's instructions) by the execution units "implements" the
ISA instruction. That is, the collective execution performs the operation specified by the ISA
instruction on the inputs specified by the ISA instruction to produce the result defined by the ISA
instruction. In addition, microcode instructions may be executed (or translated into microinstructions
that are executed) when the microprocessor is reset, in order to configure the microprocessor.
The core microcode ROM 2404 holds microcode executed by the particular core 102 that contains it. The
uncore microcode ROM 2425 also holds microcode executed by the cores 102. However, in contrast to the
core ROM 2404, the uncore ROM 2425 is shared by the cores 102. Preferably, because the access time of
the uncore ROM 2425 is greater than that of the core ROM 2404, the uncore ROM 2425 holds microcode
routines that require less performance and/or are executed less frequently. In addition, the uncore
ROM 2425 holds code that is fetched and executed by the SPU 2423.
The uncore microcode patch RAM 2408 is also shared by the cores 102 and holds microcode instructions
executed by the cores 102. The patch CAM 2439 holds patch addresses, which it outputs to a
microsequencer in response to a microcode fetch address when the fetch address matches the contents of
one of the entries in the patch CAM 2439. In that case, the microsequencer outputs the patch address
as the microcode fetch address, rather than the next sequential fetch address (or the target address
in the case of a branch-type instruction), and in response the uncore patch RAM 2408 outputs a patch
microcode instruction. The patch microcode instruction is thus fetched from the uncore patch RAM 2408
instead of the microcode instruction that would otherwise have been fetched from the uncore ROM 2425
or the core ROM 2404, for example because that microcode instruction and/or the microcode instructions
that follow it are the source of a bug. The patch microcode instruction therefore effectively
replaces, or patches, the undesired microcode instruction that resides in the core ROM 2404 or the
uncore microcode ROM 2425 at the original microcode fetch address. Preferably, the patch CAM 2439 and
the patch RAM 2408 are loaded in response to an architectural instruction included in system software,
such as the BIOS or the operating system running on the microprocessor 100.
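The patch CAM lookup described above can be modeled as an address redirection performed before each
microcode fetch. The following Python sketch is illustrative only: the class and function names are
hypothetical, dictionaries stand in for the ROM, the patch RAM and the CAM entries, and the addresses
and instruction strings are invented for the example.

```python
# Illustrative model of the patch CAM redirection; all names and values are hypothetical.
class PatchCam:
    """Maps a microcode fetch address to a patch address when an entry matches."""

    def __init__(self):
        self._entries = {}

    def load(self, fetch_address, patch_address):
        self._entries[fetch_address] = patch_address

    def lookup(self, fetch_address):
        # Returns None on a miss, the patch address on a hit.
        return self._entries.get(fetch_address)


def fetch_microcode(fetch_address, rom, patch_ram, patch_cam):
    """Fetch from the patch RAM on a CAM hit, otherwise from the microcode ROM."""
    patch_address = patch_cam.lookup(fetch_address)
    if patch_address is not None:
        return patch_ram[patch_address]
    return rom[fetch_address]


if __name__ == "__main__":
    rom = {0x100: "old-uop-a", 0x101: "buggy-uop", 0x102: "old-uop-c"}
    patch_ram = {0x10: "fixed-uop"}
    cam = PatchCam()
    cam.load(0x101, 0x10)
    for address in sorted(rom):
        print(hex(address), fetch_microcode(address, rom, patch_ram, cam))
```

Only the address that hits in the CAM is redirected; all other fetches continue to come from the ROM
unchanged.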
Among other things, the uncore PRAM 116 is used by the microcode to store values used by the
microcode. Some of these values are effectively constants: they are immediate values stored in the
core microcode ROM 2404 or the uncore microcode ROM 2425, or values blown into the fuses 114 when the
microprocessor 100 was manufactured, which the microcode writes to the uncore PRAM 116 when the
microprocessor 100 is reset and which are not modified during operation of the microprocessor 100,
except possibly via a patch or in response to the execution of an instruction that explicitly modifies
the value (e.g., a WRMSR instruction). Advantageously, these values can be modified via the patch
mechanism described herein without changing the core microcode ROM 2404 or the uncore microcode ROM
2425, which may be very costly, and without requiring one or more unblown fuses 114.
In addition, the uncore PRAM 116 is used to hold patch code that is fetched and executed by the SPU
2423, as described herein.
The core PRAM 2499, like the uncore PRAM 116, is private, or non-architectural, in the sense that it
is not in the architectural user program address space of the microprocessor 100. However, unlike the
uncore PRAM 116, each core PRAM 2499 is read by its respective core 102 alone and is not shared with
the other cores 102. Like the uncore PRAM 116, the core PRAM 2499 is also used by the microcode to
store values used by the microcode. Advantageously, these values can be modified via the patch
mechanism described herein without changing the core microcode ROM 2404 or the uncore microcode ROM
2425.
The SPU 2423 comprises a stored-program processor that is an adjunct to, and distinct from, each of
the cores 102. Whereas the cores 102 are architecturally capable of executing instructions of the ISA
of the cores 102 (e.g., x86 ISA instructions), the SPU 2423 is not architecturally capable of doing
so. Thus, for example, the operating system cannot run on the SPU 2423, nor can the operating system
schedule programs of the ISA of the cores 102 (e.g., x86 ISA programs) to run on the SPU 2423. In
other words, the SPU 2423 is not a system resource managed by the operating system. Rather, the SPU
2423 performs operations used to debug the microprocessor 100. In addition, the SPU 2423 may assist in
measuring the performance of the cores 102 and in other functions. Preferably, the SPU 2423 is smaller
and less complex than the cores 102 and consumes less power (e.g., in one embodiment, the SPU 2423
includes built-in clock gating). In one embodiment, the SPU 2423 comprises a FORTH CPU core.
Asynchronous events may occur together with the instructions executed by the cores 102 that debugging
may be unable to handle well. Advantageously, however, the SPU 2423 can be commanded by a core 102 to
detect such events and, in response to detecting an event, to perform actions such as creating a log
or modifying various aspects of the behavior of the cores 102 and/or of the external bus interface of
the microprocessor 100. The SPU 2423 can provide the log information to the user, and it can also
interact with the tracer to request that the tracer provide the log information or perform other
actions. In one embodiment, the SPU 2423 has access to the control registers of the memory subsystem
and of the programmable interrupt controller of each core 102, and to the control registers of the
shared cache memory 119.
Examples of events that the SPU 2423 can detect include the following: (1) a core 102 is hung, e.g.,
the core 102 has not retired any instruction for a programmable number of clock cycles; (2) a core 102
loads data from an uncacheable region of memory; (3) a temperature change occurs in the microprocessor
100; (4) the operating system requests a change in the bus clock ratio of the microprocessor 100
and/or a change in the voltage level of the microprocessor 100; (5) the microprocessor 100, of its own
accord, changes the voltage level and/or the bus clock ratio, e.g., to save power or improve
performance; (6) an internal timer of a core 102 expires; (7) a cache snoop hits a modified cache
line, causing the cache line to be written back to memory; (8) the temperature, voltage or bus clock
ratio of the microprocessor 100 goes outside a respective range; (9) an external trigger signal is
asserted by a user on an external pin of the microprocessor 100.
Advantageously, because the SPU 2423 runs its code 132 independently of the cores 102, it does not have
the same limitations as tracer microcode executing on the cores 102. Thus, the SPU 2423 can detect, or
be notified of, events independently of the instruction execution boundaries of the cores 102 and
without disrupting the state of the cores 102.
The SPU 2423 has its own code that it executes. The SPU 2423 can fetch its code from the uncore
microcode ROM 2425 or from the uncore PRAM 116. That is, preferably, the SPU 2423 shares the uncore
ROM 2425 and the uncore PRAM 116 with the microcode that runs on the cores 102. The SPU 2423 uses the
uncore PRAM 116 to store its data, including the log. In one embodiment, the SPU 2423 also includes
its own serial port interface, through which it can transmit the log to an external device.
Advantageously, the SPU 2423 can also instruct the tracer running on a core 102 to store the log
information from the uncore PRAM 116 to system memory.
The SPU 2423 communicates with the cores 102 through status and control registers. The SPU status register includes a bit corresponding to each of the events described above that the SPU 2423 can detect. To notify the SPU 2423 of an event, the core 102 sets the bit in the SPU status register that corresponds to the event. Some event bits are set by hardware of the microprocessor 100 and some are set by microcode of the cores 102. The SPU 2423 reads the status register to determine the list of events that have occurred. A control register includes a bit corresponding to each action that the SPU 2423 should take in response to detecting one of the events specified in the status register. That is, a set of action bits exists in the control register for each possible event in the status register. In one embodiment, there are 16 action bits per event. In one embodiment, when the status register is written to indicate an event, the SPU 2423 is interrupted, in response to which the SPU 2423 reads the status register to determine which events have occurred. Advantageously, this saves power by reducing the need for the SPU 2423 to poll the status register. The status and control registers can also be read and written by user programs that execute instructions such as the RDMSR and WRMSR instructions.
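By way of a non-limiting illustration, the relationship between the event bits of the status register and the per-event action bits of the control register may be sketched as follows. The sketch assumes, purely for illustration, that the 16 action bits of event n occupy bits 16n through 16n+15 of a single control value; the text does not specify the register layout, and the function names are hypothetical.

```python
# Hypothetical layout: event n owns control bits [16n, 16n + 15].
ACTION_BITS_PER_EVENT = 16  # "16 action bits per event" in one embodiment


def pending_events(status: int, num_events: int) -> list:
    """Return the indices of the event bits that are set in the status register."""
    return [e for e in range(num_events) if status & (1 << e)]


def actions_for_event(control: int, event: int) -> list:
    """Return the action numbers enabled in the control register for one event."""
    mask = (1 << ACTION_BITS_PER_EVENT) - 1
    field = (control >> (event * ACTION_BITS_PER_EVENT)) & mask
    return [a for a in range(ACTION_BITS_PER_EVENT) if field & (1 << a)]


def service_interrupt(status: int, control: int, num_events: int) -> dict:
    """On a status-register interrupt, map each pending event to its enabled actions."""
    return {e: actions_for_event(control, e)
            for e in pending_events(status, num_events)}
```

For instance, with events 0 and 2 pending and the control value enabling actions 0 and 1 for event 0 and action 2 for event 2, `service_interrupt(0b101, 0b11 | (0b100 << 32), 4)` yields a mapping of event 0 to actions 0 and 1 and of event 2 to action 2.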
The set of actions that the SPU 2423 can perform in response to detecting an event includes the following.
(1) Write the log information to the non-core PRAM 116. For each log-writing action, multiple action bits exist to allow the programmer to specify that only a particular subset of the log information should be written.
(2) Write the log information from the non-core PRAM 116 to the serial port interface.
(3) Write to one of the control registers to set an event for the tracer. That is, the SPU 2423 can interrupt a core 102 and cause the tracer microcode to be invoked to perform a set of actions associated with the event. The actions may be specified by the user beforehand. In one embodiment, when the SPU 2423 writes the control register to set the event, this causes the core 102 to take a machine check exception, and the machine check exception handler checks whether the tracer is enabled. If so, the machine check exception handler transfers control to the tracer. The tracer reads the control register and, if the events set in the control register are events that the user has enabled for the tracer, the tracer performs the actions that the user previously associated with the events. For example, the SPU 2423 can set an event that causes the tracer to write the log information stored in the non-core PRAM 116 to system memory.
(4) Write to a control register to cause the microcode to branch to a microcode address specified by the SPU 2423. This is particularly helpful if the microcode is in an infinite loop such that the tracer cannot perform any meaningful action while the core 102 is still executing and retiring instructions, which means that the processor-hung event will not occur.
(5) Write to a control register to cause a core 102 to reset. As mentioned above, the SPU 2423 can detect that a core 102 is hung (i.e., has not retired any instruction for some programmable amount of time) and reset the core 102. The reset microcode can check whether the reset was initiated by the SPU 2423 and, if so, advantageously write the log information out to system memory before the log information is cleared in the course of initializing the core 102.
(6) Log events continuously. In this mode, rather than waiting to be interrupted by an event, the SPU 2423 spins in a loop that checks the status register and continuously logs the information associated with the indicated events to the non-core PRAM 116, and may optionally also write the log information to the serial port interface.
(7) Write to a control register to stop a core 102 from issuing requests to the shared cache memory 119 and/or to stop the shared cache memory 119 from acknowledging requests to the cores 102. This is particularly useful for debugging design errors related to the memory subsystem, such as page tablewalk hardware errors, and the errors may even be fixed during operation of the microprocessor 100, for example by patching the SPU 2423 code, as described below.
(8) Write to control registers of an external bus interface controller of the microprocessor 100 to perform transactions on the external system bus, such as special cycles or memory read/write cycles.
(9) Write to a control register of the programmable interrupt controller of a core 102, for example, to generate an interrupt to another core 102, to simulate an I/O device to the cores 102, or to fix an error in the interrupt controller.
(10) Write to a control register of the shared cache memory 119 to control its size, for example, to disable or enable different ways of the associative shared cache memory 119.
(11) Write to control registers of the various functional units of the cores 102 to configure different performance features, such as branch prediction and data prefetch algorithms.
As described below, the SPU 2423 code can advantageously be patched, so that even after the design of the microprocessor 100 has been completed and the microprocessor 100 has been manufactured, the SPU 2423 can perform actions such as those described herein to remedy design defects or to perform other functions.
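By way of a non-limiting illustration of the hang detection mentioned in action (5), the following sketch declares a core hung when its retired-instruction count has not changed for a programmable number of sampling ticks. The class name, the sampling scheme and the use of a retired-instruction counter are assumptions made for the example; the text only states that a core is treated as hung when it has not retired any instruction for some programmable amount of time.

```python
class HangWatchdog:
    """Hypothetical model of SPU hang detection for a single core."""

    def __init__(self, timeout_ticks: int):
        self.timeout_ticks = timeout_ticks  # programmable amount of time
        self.last_retired = None            # last observed retired-instruction count
        self.idle_ticks = 0                 # consecutive ticks without progress

    def tick(self, retired_count: int) -> bool:
        """Sample the counter once; return True when the core should be reset."""
        if retired_count != self.last_retired:
            # The core retired at least one instruction since the last sample.
            self.last_retired = retired_count
            self.idle_ticks = 0
            return False
        self.idle_ticks += 1
        return self.idle_ticks >= self.timeout_ticks
```

A core that is stuck in a microcode infinite loop but still retires instructions keeps changing the counter, so this watchdog never fires for it, which is why action (4) is described as a separate remedy.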
The SPU start address register 2497 holds the address at which the SPU 2423 begins fetching instructions when it comes out of reset. The SPU start address register 2497 is written by the cores 102. The address may be located in the non-core PRAM 116 or in the non-core microcode ROM 2425.
Referring to Figure 25, a block diagram illustrating the structure of a microcode patch 2500 according to one embodiment of the present invention is shown. In the embodiment of Figure 25, the microcode patch 2500 includes the following portions: a header 2502; an immediate patch 2504; a checksum 2506 of the immediate patch 2504; CAM data 2508; a core PRAM patch 2512; a checksum 2514 of the CAM data 2508 and the core PRAM patch 2512; a RAM patch 2516; a non-core PRAM patch 2518; and a checksum 2522 of the core PRAM patch 2512 and the RAM patch 2516. The checksums 2506/2514/2522 enable the microprocessor 100 to verify the integrity of the respective portions of the patch after they have been loaded into the microprocessor 100. Preferably, the microcode patch 2500 is read from system memory and/or from a non-volatile system, for example, a ROM or FLASH memory that holds a system BIOS or extensible firmware. The header 2502 describes each portion of the patch 2500, such as its size, the location in its respective patch-related memory into which the portion is to be loaded, and a valid flag that indicates whether the portion contains a valid patch that should be applied to the microprocessor 100.
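By way of a non-limiting illustration, the patch portions and the checksum verification performed after loading may be modeled as follows. The checksum algorithm is not specified in the text; the byte-wise sum modulo 2^32 used here is an assumption, as are the field and method names. The sketch models only the two checksums whose verification is described in the flow (the checksum 2506 and the checksum 2514) and omits the header 2502.

```python
from dataclasses import dataclass


def checksum32(*parts: bytes) -> int:
    """Assumed checksum: byte-wise sum over all parts, modulo 2**32."""
    return sum(sum(part) for part in parts) & 0xFFFFFFFF


@dataclass
class MicrocodePatch:
    immediate_patch: bytes       # immediate patch 2504
    checksum_2506: int           # covers the immediate patch 2504
    cam_data: bytes              # CAM data 2508
    core_pram_patch: bytes       # core PRAM patch 2512
    checksum_2514: int           # covers CAM data 2508 and core PRAM patch 2512
    ram_patch: bytes             # RAM patch 2516
    non_core_pram_patch: bytes   # non-core PRAM patch 2518

    def immediate_ok(self) -> bool:
        """Recompute the checksum of the loaded immediate patch and compare."""
        return checksum32(self.immediate_patch) == self.checksum_2506

    def cam_and_core_pram_ok(self) -> bool:
        """Recompute the checksum of the CAM data and core PRAM patch and compare."""
        return checksum32(self.cam_data, self.core_pram_patch) == self.checksum_2514
```

A loader built on this model would refuse to continue with a portion whose recomputed checksum does not match the stored value.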
The immediate patch 2504 includes code (i.e., instructions, preferably microcode instructions) to be loaded into the non-core microcode patch RAM 2408 of Figure 24 (e.g., at block 2612 of Figures 26A-26B) and then executed by each of the cores 102 (e.g., at block 2616 of Figures 26A-26B). The patch 2500 also specifies the address in the patch RAM 2408 at which the immediate patch 2504 is to be loaded. Preferably, the immediate patch 2504 code modifies default values that were written by the reset microcode, such as values written to configuration registers that affect the configuration of the microprocessor 100. After the immediate patch 2504 has been executed by each of the cores 102 out of the patch RAM 2408, it is not executed again. Furthermore, the subsequent loading of the RAM patch 2516 into the patch RAM 2408 (e.g., at block 2632 of Figures 26A-26B) may overwrite the immediate patch 2504 in the patch RAM 2408.
The RAM patch 2516 includes patch microcode instructions to be executed in place of the microcode instructions in the core ROM 2404 or the non-core ROM 2425 that need to be patched. The RAM patch 2516 also includes the address of the location in the patch RAM 2408 into which the patch microcode instructions are to be written when the patch 2500 is applied (e.g., at block 2632 of Figures 26A-26B). The CAM data 2508 is loaded into the patch CAM 2439 of each core 102 (e.g., at block 2626 of Figures 26A-26B). As described above from the perspective of the operation of the patch CAM 2439, the CAM data 2508 includes one or more entries, each of which includes a pair of microcode fetch addresses. The first address is the address of the microcode instruction to be patched and is the content that is matched against the fetch address. The second address points to the location in the patch RAM 2408 that holds the patch microcode instruction to be executed in place of the patched microcode instruction.
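By way of a non-limiting illustration, the redirection performed by the patch CAM 2439 on a microcode fetch may be sketched as follows. Dictionaries stand in for the microcode ROM, the patch CAM and the patch RAM; these data structures and the function name are assumptions made for the example.

```python
def fetch_microcode(fetch_addr, microcode_rom, patch_cam, patch_ram):
    """Return the instruction to execute for one microcode fetch address.

    patch_cam maps a patched ROM address (first address of a CAM entry) to a
    patch RAM address (second address of the entry). On a hit, the patch RAM
    instruction is executed in place of the ROM instruction.
    """
    patch_addr = patch_cam.get(fetch_addr)
    if patch_addr is None:
        return microcode_rom[fetch_addr]   # miss: fetch from the ROM as usual
    return patch_ram[patch_addr]           # hit: substitute the patch instruction
```

With an empty patch CAM every fetch falls through to the ROM, which matches the observation in the flow below that no hits occur while the cores run unpatched microcode.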
Unlike the immediate patch 2504, the RAM patch 2516 remains in the patch RAM 2408 and (together with the operation of the patch CAM 2439 according to the CAM data 2508) continues to function to patch the core microcode ROM 2404 and/or the non-core microcode ROM 2425 until it is changed by another patch 2500 or the microprocessor 100 is reset.
The core PRAM patch 2512 includes data to be written to the core PRAM 2499 of each core 102 and the address within the core PRAM 2499 to which each item of the data is to be written (e.g., at block 2626 of Figures 26A-26B). The non-core PRAM patch 2518 includes data to be written to the non-core PRAM 116 and the address within the non-core PRAM 116 to which each item of the data is to be written (e.g., at block 2632 of Figures 26A-26B).
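By way of a non-limiting illustration, applying a PRAM patch that consists of data items and their destination addresses may be sketched as follows. The item width is not stated in the text; one byte per item is assumed here, and the function name is hypothetical.

```python
def apply_pram_patch(pram: bytearray, items) -> None:
    """Write each (address, value) item of a PRAM patch into the PRAM in place."""
    for address, value in items:
        pram[address] = value  # assumed: each item is a single byte
```

The same routine could serve for the core PRAM 2499 of each core 102 and for the non-core PRAM 116, since both patches are described as address and data pairs.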
Referring to Figures 26A-26B, a flowchart illustrating operation of the microprocessor 100 of Figure 24 to propagate a microcode patch 2500 of Figure 25 to the multiple cores 102 of the microprocessor 100 is shown. The operation is described from the perspective of a single core, but each of the cores 102 of the microprocessor 100 operates according to the description so that the cores collectively propagate the microcode patch to all of the cores 102 of the microprocessor 100. More specifically, Figures 26A-26B describe the operation of the core that encounters the instruction to apply the patch to the microcode, whose flow begins at block 2602, and the operation of the other cores 102, whose flow begins at block 2652. It should be understood that multiple patches 2500 may be applied to the microprocessor 100 at different times during operation of the microprocessor 100. For example, a first patch 2500 may be applied according to the atomic embodiments described herein when the system that includes the microprocessor 100 is booted, such as during BIOS initialization, and a second patch 2500 may be applied after the operating system is running, which is particularly useful for debugging the microprocessor 100.
At block 2602, one of the cores 102 encounters an instruction that instructs it to apply a microcode patch to the microprocessor 100. Preferably, the microcode patch is similar to the microcode patch described above. In one embodiment, the apply-microcode-patch instruction is an x86 WRMSR instruction. In response to the apply-microcode-patch instruction, the core 102 disables interrupts and traps to the microcode that implements the apply-microcode-patch instruction. It should be understood that the system software that includes the apply-microcode-patch instruction may include a sequence of multiple instructions in preparation for the application of the microcode patch. Preferably, however, it is in response to a single architectural instruction of the sequence that the microcode patch is propagated to all of the cores in an atomic fashion at the architectural instruction level. That is, once interrupts are disabled on the first core 102 (i.e., the core 102 that encounters the apply-microcode-patch instruction at block 2602), interrupts remain disabled while the implementing microcode propagates the microcode patch and applies it to all of the cores 102 of the microprocessor 100 (e.g., until after block 2652); furthermore, once interrupts are disabled on the other cores 102 (e.g., at block 2652), they remain disabled until the microcode patch has been applied to all of the cores 102 of the microprocessor 100 (e.g., until after block 2634). Thus, advantageously, the microcode patch is propagated and applied to all of the cores 102 of the microprocessor 100 in an atomic fashion at the architectural instruction level. Flow proceeds to block 2604.
At block 2604, the core 102 obtains ownership of the hardware semaphore 118 of Figure 1. Preferably, the microprocessor 100 includes a hardware semaphore 118 associated with patching microcode. Preferably, the core 102 obtains ownership of the hardware semaphore 118 in a manner similar to that described above with respect to Figure 20, more specifically with respect to blocks 2004 and 2006. The hardware semaphore 118 is used because it is possible that, while one of the cores 102 is applying a patch 2500 in response to encountering an apply-microcode-patch instruction, a second core 102 encounters an apply-microcode-patch instruction and in response begins to apply a second patch 2500, which could result in incorrect execution, for example, due to misapplication of the first patch 2500. Flow proceeds to block 2606.
At block 2606, the core 102 sends a patch message to the other cores 102 and sends an inter-core interrupt to the other cores 102. Preferably, a core 102 traps to microcode in response to the apply-microcode-patch instruction (at block 2602) or in response to the interrupt (at block 2652) and remains in microcode, during which time interrupts are disabled (i.e., the microcode does not allow itself to be interrupted), until block 2634. Flow proceeds from block 2606 to block 2608.
At block 2652, one of the other cores 102 (i.e., a core 102 other than the core 102 that encountered the apply-microcode-patch instruction at block 2602) is interrupted and receives the patch message as a result of the inter-core interrupt sent at block 2606. In one embodiment, the core 102 takes the interrupt at the next architectural instruction boundary (e.g., at the next x86 instruction boundary). In response to the interrupt, the core 102 disables interrupts and traps to the microcode that handles the patch message. As described above, although the flow at block 2652 is described from the perspective of a single core 102, each of the other cores 102 (i.e., each core 102 other than the core 102 of block 2602) is interrupted and receives the message at block 2652, and performs the steps of blocks 2608 through 2634. Flow proceeds from block 2652 to block 2608.
At block 2608, the core 102 writes a sync request with a sync condition of 21 (denoted SYNC 21 in Figures 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all of the cores 102 have written a SYNC 21. Flow proceeds to decision block 2611.
At decision block 2611, the core 102 determines whether it is the core 102 that encountered the apply-microcode-patch instruction at block 2602 (as opposed to a core 102 that received the patch message at block 2652). If so, flow proceeds to block 2612; otherwise, flow proceeds to block 2614.
At block 2612, the core 102 loads the immediate patch 2504 portion of the microcode patch 2500 into the non-core patch RAM 2408. Additionally, the core 102 generates a checksum of the loaded immediate patch 2504 and verifies that it matches the checksum 2506. Preferably, the core 102 also sends information to the other cores 102 that indicates the length of the immediate patch 2504 and the location in the non-core patch RAM 2408 at which the immediate patch 2504 was loaded. Advantageously, because all of the cores 102 are known to be executing the same microcode that implements the application of the microcode patch, if a previous RAM patch 2516 is present in the non-core patch RAM 2408, it is safe to overwrite the non-core patch RAM 2408 with the new patch, since there will be no hits in the patch CAM 2439 during this time (assuming the microcode that implements the application of the microcode patch is not itself patched). In an alternate embodiment, the core 102 loads the immediate patch 2504 into the non-core PRAM 116, and before the immediate patch 2504 is executed at block 2616, the core 102 copies the immediate patch 2504 from the non-core PRAM 116 to the non-core patch RAM 2408. Preferably, the core 102 loads the immediate patch into a portion of the non-core PRAM 116 that is reserved for this purpose, i.e., a portion of the non-core PRAM 116 that is not used for other purposes, such as holding values used by the microcode (e.g., the core 102 state, TPM state or active microcode constants described above), and that is not a portion of the non-core PRAM 116 that may be patched (e.g., at block 2632), so that any previous non-core PRAM patch 2518 is not clobbered. In one embodiment, the loading into the non-core PRAM 116 and the copying from the non-core PRAM 116 are performed in multiple stages in order to reduce the required size of the reserved portion. Flow proceeds to block 2614.
At block 2614, the core 102 writes a sync request with a sync condition of 22 (denoted SYNC 22 in Figures 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all of the cores 102 have written a SYNC 22. Flow proceeds to block 2616.
At block 2616, the core 102 executes the immediate patch 2504 out of the non-core patch RAM 2408. As described above, in one embodiment the core 102 copies the immediate patch 2504 from the non-core PRAM 116 to the non-core patch RAM 2408 before executing the immediate patch 2504. Flow proceeds to block 2618.
At block 2618, the core 102 writes a sync request with a sync condition of 23 (denoted SYNC 23 in Figures 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all of the cores 102 have written a SYNC 23. Flow proceeds to decision block 2621.
At decision block 2621, the core 102 determines whether it is the core 102 that encountered the apply-microcode-patch instruction at block 2602 (as opposed to a core 102 that received the patch message at block 2652). If so, flow proceeds to block 2622; otherwise, flow proceeds to block 2624.
At block 2622, the core 102 loads the CAM data 2508 and the core PRAM patch 2512 into the non-core PRAM 116. Additionally, the core 102 generates a checksum of the loaded CAM data 2508 and core PRAM patch 2512 and verifies that it matches the checksum 2514. Preferably, the core 102 also sends information to the other cores 102 that indicates the length of the CAM data 2508 and the core PRAM patch 2512 and the location in the non-core PRAM 116 at which the CAM data 2508 and the core PRAM patch 2512 were loaded. Preferably, the core 102 loads the CAM data 2508 and the core PRAM patch 2512 into a reserved portion of the non-core PRAM 116, so that any previous non-core PRAM patch 2518 is not clobbered, in a manner similar to that described with respect to block 2612. Flow proceeds to block 2624.
At block 2624, the core 102 writes a sync request with a sync condition of 24 (denoted SYNC 24 in Figures 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all of the cores 102 have written a SYNC 24. Flow proceeds to block 2626.
At block 2626, the core 102 loads the CAM data 2508 from the non-core PRAM 116 into its patch CAM 2439. Additionally, the core 102 loads the core PRAM patch 2512 from the non-core PRAM 116 into its core PRAM 2499. Advantageously, because all of the cores 102 are known to be executing the same microcode that implements the application of the microcode patch, it is safe to load the CAM data 2508 into the patch CAM 2439 even though the corresponding RAM patch 2516 has not yet been written into the non-core patch RAM 2408 (which will occur at block 2632), since there will be no hits in the patch CAM 2439 during this time (assuming the microcode that implements the application of the microcode patch is not itself patched). Furthermore, because all of the cores 102 are known to be executing the same microcode that implements the application of the microcode patch, and interrupts will not be enabled on any of the cores 102 until the patch 2500 has been propagated to all of the cores 102, any updates to the core PRAM 2499 performed by the core PRAM patch 2512, including updates that change values that may affect the operation of the core 102 (e.g., feature settings), are guaranteed not to be architecturally visible until the patch 2500 has been propagated to all of the cores 102. Flow proceeds to block 2628.
At block 2628, the core 102 writes a sync request with a sync condition of 25 (denoted SYNC 25 in Figures 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all of the cores 102 have written a SYNC 25. Flow proceeds to decision block 2631.
At decision block 2631, the core 102 determines whether it is the core 102 that encountered the apply-microcode-patch instruction at block 2602 (as opposed to a core 102 that received the patch message at block 2652). If so, flow proceeds to block 2632; otherwise, flow proceeds to block 2634.
At block 2632, the core 102 loads the RAM patch 2516 into the non-core patch RAM 2408. Additionally, the core 102 loads the non-core PRAM patch 2518 into the non-core PRAM 116. In one embodiment, the non-core PRAM patch 2518 includes code to be executed by the SPU 2423. In one embodiment, the non-core PRAM patch 2518 includes updates to values used by the microcode, as described above. In one embodiment, the non-core PRAM patch 2518 includes both SPU 2423 code and updates to values used by the microcode. Advantageously, because all of the cores 102 are known to be executing the same microcode that implements the application of the microcode patch, and more specifically because the patch CAM 2439 of every core 102 has already been loaded with the new CAM data 2508 (e.g., at block 2626), loading the RAM patch 2516 is safe, since there will be no hits in the patch CAM 2439 during this time (assuming the microcode that implements the application of the microcode patch is not itself patched). Furthermore, because all of the cores 102 are known to be executing the same microcode that implements the application of the microcode patch, and interrupts will not be enabled on any of the cores 102 until the patch 2500 has been propagated to all of the cores 102, any updates to the non-core PRAM 116 performed by the non-core PRAM patch 2518, including updates that change values that may affect the operation of the cores 102 (e.g., feature settings), are guaranteed not to be architecturally visible until the patch 2500 has been propagated to all of the cores 102. Flow proceeds to block 2634.
At block 2634, the core 102 writes a sync request with a sync condition of 26 (denoted SYNC 26 in Figures 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all of the cores 102 have written a SYNC 26. Flow ends at block 2634.
After block 2634, if code for the SPU 2423 was loaded into the non-core PRAM 116, the patching core 102 then also causes the SPU 2423 to begin executing the code, as described with respect to Figure 30. In addition, after block 2634, the patching core 102 releases the hardware semaphore 118 that it obtained at block 2604. Furthermore, after block 2634, the core 102 re-enables interrupts.
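By way of a non-limiting illustration, the ordering enforced by the sync conditions 21 through 26 may be simulated in software as follows. In the sketch, one thread stands in for each core 102, a reusable `threading.Barrier` stands in for the sync registers 108 together with the control unit 104 (every thread blocks until all have reached the same sync point), and a `threading.Lock` stands in for the hardware semaphore 118. The dictionary keys, the function name and the use of plain Python objects for the memories are assumptions made for the example; checksums, messages and interrupts are not modeled.

```python
import threading


def propagate_patch(num_cores, patch, initiator=0):
    """Simulate blocks 2604 through 2634; returns (non_core, cores) state."""
    non_core = {"patch_ram": None, "pram_staging": None, "pram": None}
    cores = [{"immediate_ran": None, "patch_cam": None, "core_pram": None}
             for _ in range(num_cores)]
    barrier = threading.Barrier(num_cores)  # sync registers 108 + control unit 104
    semaphore = threading.Lock()            # hardware semaphore 118

    def sync(_condition):
        barrier.wait()                      # "sleep" until all cores have written SYNC n

    def core_flow(core_id):
        is_initiator = core_id == initiator
        me = cores[core_id]
        sync(21)                                              # block 2608
        if is_initiator:                                      # block 2612
            non_core["patch_ram"] = patch["immediate"]
        sync(22)                                              # block 2614
        me["immediate_ran"] = non_core["patch_ram"]           # block 2616
        sync(23)                                              # block 2618
        if is_initiator:                                      # block 2622
            non_core["pram_staging"] = (patch["cam"], patch["core_pram"])
        sync(24)                                              # block 2624
        me["patch_cam"], me["core_pram"] = non_core["pram_staging"]  # block 2626
        sync(25)                                              # block 2628
        if is_initiator:                                      # block 2632
            non_core["patch_ram"] = patch["ram"]
            non_core["pram"] = patch["non_core_pram"]
        sync(26)                                              # block 2634

    with semaphore:  # obtained at block 2604, released after block 2634
        threads = [threading.Thread(target=core_flow, args=(i,))
                   for i in range(num_cores)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return non_core, cores
```

Because the RAM patch is written only after every thread has passed the sync point 25, no simulated core can observe the patch RAM being overwritten while it is still executing the immediate patch, which mirrors the safety argument given for blocks 2612 and 2632.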
Referring to Figure 27, a timing diagram illustrating an example of the operation of a microprocessor according to the flowchart of Figures 26A-26B is shown. In this example, a microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102. In the timing diagram, the events proceed in time as described below.
Core 0 receives a request to patch the microcode (per block 2602) and in response obtains the hardware semaphore 118 (per block 2604). Core 0 then sends a microcode patch message and an interrupt to core 1 and core 2 (per block 2606). Core 0 then writes a SYNC 21 and goes to sleep (per block 2608).
Core 1 and core 2 are each eventually interrupted from their current tasks and read the message (per block 2652). In response, core 1 and core 2 each write a SYNC 21 and go to sleep (per block 2608). As shown, the time at which each core writes the SYNC 21 may differ, for example, due to the latency of the instruction that is executing when the interrupt is asserted.
When all of the cores have written the SYNC 21, the control unit 104 wakes up all of the cores simultaneously (per block 2608). Core 0 then loads the immediate patch 2504 into the non-core PRAM 116 (per block 2612), writes a SYNC 22, and goes to sleep (per block 2614). Core 1 and core 2 each write a SYNC 22 and go to sleep (per block 2614).
When all of the cores have written the SYNC 22, the control unit 104 wakes up all of the cores simultaneously (per block 2614). Each core executes the immediate patch 2504 (per block 2616), writes a SYNC 23, and goes to sleep (per block 2618).
When all of the cores have written the SYNC 23, the control unit 104 wakes up all of the cores simultaneously (per block 2618). Core 0 then loads the CAM data 2508 and the core PRAM patch 2512 into the non-core PRAM 116 (per block 2622), writes a SYNC 24, and goes to sleep (per block 2624).
When all of the cores have written the SYNC 24, the control unit 104 wakes up all of the cores simultaneously (per block 2624). Each core then loads its patch CAM 2439 with the CAM data 2508 and loads its core PRAM 2499 with the core PRAM patch 2512 (per block 2626), writes a SYNC 25, and goes to sleep (per block 2628).
When all of the cores have written the SYNC 25, the control unit 104 wakes up all of the cores simultaneously (per block 2628). Core 0 then loads the RAM patch 2516 into the non-core patch RAM 2408, loads the non-core PRAM patch 2518 into the non-core PRAM 116, writes a SYNC 26, and goes to sleep (per block 2634).
When all of the cores have written the SYNC 26, the control unit 104 wakes up all of the cores simultaneously (per block 2634). As described above, if code for the SPU 2423 was loaded into the non-core PRAM 116 at block 2632, the patching core 102 then also causes the SPU 2423 to begin executing the code, as described below with respect to Figure 30.
Referring to Figure 28, a block diagram illustrating a multi-core microprocessor 100 according to an alternate embodiment is shown. The microprocessor 100 is similar in many respects to the microprocessor 100 of Figure 24. However, the microprocessor 100 of Figure 28 does not include a non-core patch RAM; instead, each core 102 includes a core patch RAM 2808 that serves a function similar to that of the non-core patch RAM 2408 of Figure 24. However, the core patch RAM 2808 in each core 102 is private to its respective core 102 and is not shared with the other cores 102.
Referring to Figures 29A-29B, a flowchart illustrating operation of the microprocessor 100 of Figure 28 to propagate a microcode patch to the multiple cores 102 of the microprocessor 100 according to an alternate embodiment is shown. In the alternate embodiment of Figure 28 and Figures 29A-29B, the patch 2500 of Figure 25 may be modified such that the checksum 2514 follows the RAM patch 2516, rather than the core PRAM patch 2512, and enables the microprocessor 100 to verify the integrity of the CAM data 2508, the core PRAM patch 2512 and the RAM patch 2516 after they have been loaded into the microprocessor 100 (e.g., at block 2922 of Figures 29A-29B). The flowchart of Figures 29A-29B is similar in many respects to the flowchart of Figures 26A-26B, and like-numbered blocks are similar. However, block 2912 replaces block 2612, block 2916 replaces block 2616, block 2922 replaces block 2622, block 2926 replaces block 2626, and block 2932 replaces block 2632. At block 2912, the core 102 loads the immediate patch 2504 into the non-core PRAM 116 (rather than into a non-core patch RAM). At block 2916, before executing the immediate patch 2504, the core 102 copies the immediate patch 2504 from the non-core PRAM 116 to its core patch RAM 2808. At block 2922, the core 102 loads the RAM patch 2516 into the non-core PRAM 116 in addition to the CAM data 2508 and the core PRAM patch 2512. At block 2926, in addition to loading the CAM data 2508 from the non-core PRAM 116 into its patch CAM 2439 and loading the core PRAM patch 2512 from the non-core PRAM 116 into its core PRAM 2499, the core 102 also loads the RAM patch 2516 from the non-core PRAM 116 into its patch RAM 2808. At block 2932, unlike block 2632 of Figures 26A-26B, the core 102 does not load the RAM patch 2516 into a non-core patch RAM.
As may be observed from the embodiments described above, the atomic propagation of the microcode patch 2500 to each of the relevant memories 2439/2499/2808 of the cores 102 of the microprocessor 100 and to the relevant non-core memories 2408/116 is advantageously performed in a manner that ensures the integrity and efficacy of the patch 2500, even in the presence of multiple concurrently executing cores 102 that share resources and that might otherwise, if the patch were applied in a conventional manner, clobber portions of one another's patches.
Patching Service Processor Code
Referring to Figure 30, a flowchart illustrating operation of the microprocessor 100 of Figure 24 to patch code of a service processor is shown. Flow begins at block 3002.
At block 3002, the core 102 loads code to be executed by the SPU 2423 into the non-core PRAM 116 at a patch address specified by a patch, as described above with respect to block 2632 of Figures 26A-26B. Flow proceeds to block 3004.
At block 3004, the core 102 controls the SPU 2423 to execute the code at the patch address, i.e., the address in the non-core PRAM 116 at which the SPU 2423 code was written at block 3002. In one embodiment, the SPU 2423 is configured to fetch its reset vector (i.e., the address at which the SPU 2423 begins fetching instructions after coming out of reset) from the start address register 2497, and the core 102 writes the patch address into the start address register 2497 and then writes to a control register that causes the SPU 2423 to be reset. Flow proceeds to block 3006.
In square 3006, which starts in the patch address extraction procedure code (for example, extracting its first finger
Enable), for example, the address of 2423 procedure code of SPU into non-core PRAM 116 is written in square 3002.In general, it is resident
Execution one is jumped (jump) to residing in non-core ROM by 2423 Hotfix code of SPU in non-core PRAM 116
2423 procedure code of SPU in 2425.Process ends at square 3006.
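The three steps 3002 through 3006 can be illustrated with a small software model. This is only a sketch under stated assumptions: the `ServiceProcessor` class, the toy instruction tuples, and the `ROM_BASE`/`PRAM_BASE` addresses are hypothetical and do not come from the specification. It models just the ordering described above: write the patch, write the start address register, reset the SPU, fetch from the patch address, then jump into ROM code.

```python
from dataclasses import dataclass, field

ROM_BASE = 0x0000    # hypothetical base address of the SPU's ROM code
PRAM_BASE = 0x8000   # hypothetical base address of the shared patch RAM


@dataclass
class ServiceProcessor:
    """Toy model of an SPU that fetches from a ROM or from a shared patch RAM."""
    rom: dict                       # address -> (opcode, argument)
    pram: dict                      # address -> (opcode, argument), written by a core
    start_address: int = ROM_BASE   # models the start address register
    trace: list = field(default_factory=list)

    def fetch(self, addr):
        # Addresses at or above PRAM_BASE map to the patch RAM, others to the ROM.
        mem = self.pram if addr >= PRAM_BASE else self.rom
        return mem[addr]

    def reset(self):
        """After reset, start fetching at the address held in the start address register."""
        pc = self.start_address
        while True:
            op, arg = self.fetch(pc)
            self.trace.append((pc, op))
            if op == "jump":
                pc = arg
            elif op == "halt":
                return
            else:
                pc += 1


def core_patches_spu(spu, patch_address, patch_code):
    # Step 3002: the core loads the SPU code at the patch address.
    for offset, insn in enumerate(patch_code):
        spu.pram[patch_address + offset] = insn
    # Step 3004: the core writes the start address register, then resets the SPU.
    spu.start_address = patch_address
    # Step 3006: the SPU begins fetching at the patch address.
    spu.reset()


rom = {ROM_BASE: ("work", None), ROM_BASE + 1: ("halt", None)}
spu = ServiceProcessor(rom=rom, pram={})
patch = [("fixup", None), ("jump", ROM_BASE)]   # the patch ends by jumping into ROM code
core_patches_spu(spu, PRAM_BASE, patch)
print([op for _, op in spu.trace])
```

The trace shows the patch instructions running first and the ROM code running afterwards, which mirrors the typical jump from patch code into ROM code described above.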
The ability to patch the SPU 2423 code may be particularly useful. For example, the SPU 2423 may be used for performance testing that is largely transient; that is, it may be undesirable to make the performance-testing SPU 2423 code a permanent part of the microprocessor 100, but only part of development parts, for example, and not of production parts. As another example, the SPU 2423 may be used to find and/or fix bugs. As a further example, the SPU 2423 may be used to configure the microprocessor 100.
Atomic propagation of updates to per-core-instantiated architecturally visible storage resources
Referring to Figure 31, a block diagram illustrates a multi-core microprocessor 100 according to another embodiment. The microprocessor 100 is similar in many respects to the microprocessor 100 of Figure 24. However, each core 102 of the microprocessor 100 of Figure 31 further includes architecturally visible memory type range registers (MTRRs) 3102. That is, each core 102 instantiates the architecturally visible MTRRs 3102, even though system software expects the MTRRs 3102 to be consistent across all the cores 102 (as described in more detail below). The MTRRs 3102 are one example of per-core-instantiated architecturally visible storage resources; other embodiments of such resources are described below. (Although not shown in the figure, each core 102 also includes the core PRAM 2499, the core microcode ROM 2404, and the patch CAM 2439 of Figure 24 and, in one embodiment, the core microcode patch RAM 2808 of Figure 28.)
The MTRRs 3102 provide a way for system software to associate a memory type with multiple different physical address ranges in the system memory address space of the microprocessor 100. Examples of the different memory types include strong uncacheable, uncacheable, write-combining, write-through, write-back, and write-protected. Each MTRR 3102 specifies (explicitly or implicitly) a memory range and its memory type. The collective values of the MTRRs 3102 define a memory map that specifies the memory type of the different memory ranges. In one embodiment, the MTRRs 3102 are similar to those described in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3: System Programming Guide, September 2013, particularly in Section 11.11, which is incorporated herein by reference and forms part of this specification.
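As a rough illustration of how the collective register values define a memory map, the sketch below looks up the memory type of a physical address. It is a simplified model, not the x86 encoding: the `RangeEntry` type, the explicit start/end representation, the example address ranges, and the default type are all assumptions made for illustration, and the ranges are assumed not to overlap.

```python
from typing import NamedTuple


class RangeEntry(NamedTuple):
    """One memory range and its memory type (a simplified stand-in for one MTRR)."""
    start: int      # first physical address in the range
    end: int        # last physical address in the range (inclusive)
    mem_type: str   # e.g. "write-back", "uncacheable", "write-combining"


def memory_type(entries, addr, default="uncacheable"):
    """Return the memory type that the collective entries assign to addr.

    Assumes the ranges do not overlap; an address outside every range
    receives the (hypothetical) default type.
    """
    for entry in entries:
        if entry.start <= addr <= entry.end:
            return entry.mem_type
    return default


memory_map = [
    RangeEntry(0x0000_0000, 0x7FFF_FFFF, "write-back"),       # ordinary RAM
    RangeEntry(0xD000_0000, 0xDFFF_FFFF, "write-combining"),  # e.g. a frame buffer
]

print(memory_type(memory_map, 0x0010_0000))
print(memory_type(memory_map, 0xD000_1000))
print(memory_type(memory_map, 0xF000_0000))
```

If two cores held different copies of `memory_map`, the same address could resolve to different memory types depending on which core accessed it, which is the inconsistency the embodiments aim to prevent.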
It is desirable for the memory map defined by the MTRRs 3102 to be identical on all the cores 102 of the microprocessor 100, so that software running on the microprocessor 100 has a consistent view of memory. However, conventional processors have no hardware support for maintaining the consistency of the MTRRs among the cores of a multi-core processor. As the note at the bottom of page 11-20 of Volume 3 of the Intel manual cited above states, "The P6 and more recent processor families provide no hardware support for maintaining this consistency [of MTRR values]." Consequently, system software is responsible for maintaining MTRR consistency across cores. Section 11.11.8 of the Intel manual cited above describes an algorithm by which system software maintains this consistency and which involves each core of the multi-core processor updating its own MTRRs, i.e., all the cores execute instructions to update their respective MTRRs.
In contrast, embodiments are described herein in which system software may update the respective instance of an MTRR 3102 on one of the cores 102, and that core 102 advantageously propagates the update, in an atomic fashion, to the respective instance of the MTRR 3102 on all the cores 102 of the microprocessor 100 (in a manner similar to the way the microcode patch is applied in the embodiments of Figures 24 through 30 described above). This provides a means of maintaining consistency of the MTRRs 3102 of the different cores 102 at the architectural instruction level.
Referring to Figure 32, a flowchart illustrates the operation of the microprocessor 100 of Figure 31 to propagate an MTRR 3102 update to the multiple cores 102 of the microprocessor 100. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively propagate the MTRR 3102 update to all the cores 102 of the microprocessor 100. More specifically, Figure 32 describes the operation of the core that encounters the instruction to update the MTRR 3102, whose flow begins at block 3202, and the operation of the other cores 102, whose flow begins at block 3252.
At block 3202, one of the cores 102 encounters an instruction that instructs the core to update its MTRR 3102. That is, the MTRR update instruction includes an MTRR 3102 identifier and an update value to be written to the MTRR 3102. In one embodiment, the MTRR update instruction is an x86 WRMSR instruction, which specifies the update value in the EAX and EDX registers and the MTRR 3102 identifier in the ECX register, the identifier being an MSR address within the MSR address space of the core 102. In response to the MTRR update instruction, the core 102 disables interrupts and traps to the microcode that implements the MTRR update instruction. It should be understood that the system software that includes the MTRR update instruction may include a sequence of multiple instructions in preparation for the update of the MTRR 3102. Preferably, however, it is in response to a single architectural instruction of the sequence that the MTRRs 3102 of all the cores 102 are updated in an atomic fashion at the architectural instruction level. That is, once interrupts are disabled on the first core 102 (i.e., the core 102 that encounters the MTRR update instruction at block 3202), they remain disabled while the executing microcode propagates the new MTRR 3102 value to all the cores 102 of the microprocessor 100 (i.e., until after block 3218). Furthermore, once interrupts are disabled on the other cores 102 (i.e., at block 3252), they remain disabled until the MTRRs 3102 of all the cores 102 of the microprocessor 100 have been updated (i.e., until after block 2634). Thus, advantageously, the new MTRR 3102 value is propagated to all the cores 102 of the microprocessor 100 in an atomic fashion at the architectural instruction level. Flow proceeds to block 3204.
At block 3204, the core 102 obtains ownership of the hardware semaphore 118 of Figure 1. Preferably, the microprocessor 100 includes a hardware semaphore 118 associated with the MTRRs 3102. Preferably, the core 102 obtains ownership of the hardware semaphore 118 in a manner similar to that described above with respect to Figure 20, more specifically blocks 2004 and 2006. The hardware semaphore 118 is used because it is possible that, while one of the cores 102 is performing an MTRR 3102 update in response to encountering an MTRR update instruction, a second core 102 encounters an MTRR update instruction and in response begins to update the MTRR 3102, which could result in incorrect execution. Flow proceeds to block 3206.
At block 3206, the core 102 sends an MTRR update message to the other cores 102 and sends the other cores 102 an inter-core interrupt. Preferably, the core 102 traps to microcode in response to the MTRR update instruction (at block 3202) or in response to the interrupt (at block 3252) and remains in microcode until block 3218, during which time interrupts are disabled (i.e., the microcode does not allow itself to be interrupted). Flow proceeds to block 3208.
At block 3252, one of the other cores 102 (i.e., a core 102 other than the core 102 that encountered the MTRR update instruction at block 3202) is interrupted by the inter-core interrupt sent at block 3206 and receives the MTRR update message. In one embodiment, the core 102 takes the interrupt at the next architectural instruction boundary (e.g., at the next x86 instruction boundary). In response to the interrupt, the core 102 disables interrupts and traps to the microcode that handles the MTRR update message. As described above, although the flow at block 3252 is described from the perspective of a single core 102, each of the other cores 102 (i.e., every core 102 other than the one at block 3202) is interrupted and receives the message at block 3252 and performs the steps of blocks 3208 through 3234. Flow proceeds from block 3252 to block 3208.
At block 3208, the core 102 writes a sync request with a sync condition of 31 (denoted SYNC 31 in Figure 32) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all the cores 102 have written a SYNC 31. Flow proceeds to decision block 3211.
At decision block 3211, the core 102 determines whether it is the core 102 that encountered the MTRR update instruction at block 3202 (as opposed to a core 102 that received the MTRR update message at block 3252). If so, flow proceeds to block 3212; otherwise, flow proceeds to block 3214.
At block 3212, the core 102 loads into the non-core PRAM 116 the MTRR identifier specified by the MTRR update instruction and the MTRR update value with which the MTRR is to be updated, so that they are visible to all the other cores 102. In the case of an x86 embodiment, the MTRRs 3102 include: (1) fixed range MTRRs, each comprising a single 64-bit MSR updated by a single WRMSR instruction, and (2) variable range MTRRs, each comprising two 64-bit MSRs, each of which is written by a different WRMSR instruction, i.e., the two WRMSR instructions specify different MSR addresses. For the variable range MTRRs, one of the MSRs (the PHYSBASE register) includes a base address of the memory range and a type field that specifies the memory type, and the other MSR (the PHYSMASK register) includes a valid bit and a mask field that sets the mask of the range. Preferably, the MTRR update value that the core 102 loads into the non-core PRAM 116 is as follows.
1. If the identified MSR is the PHYSMASK register, the core 102 loads into the non-core PRAM 116 a 128-bit update value that includes both the new 64-bit value specified by the WRMSR instruction (which includes the valid bit and the mask value) and the current value of the PHYSBASE register (which includes the base value and the type value).
2. If the identified MSR is the PHYSBASE register:
a. If the valid bit in the PHYSMASK register is currently set, the core 102 loads into the non-core PRAM 116 a 128-bit update value that includes both the new 64-bit value specified by the WRMSR instruction (which includes the base value and the type value) and the current value of the PHYSMASK register (which includes the valid bit and the mask value).
b. If the valid bit in the PHYSMASK register is currently clear, the core 102 loads into the non-core PRAM 116 a 64-bit update value that includes only the new 64-bit value specified by the WRMSR instruction (which includes the base value and the type value).
In addition, the core 102 sets a flag in the non-core PRAM 116 if the update value written is a 128-bit value, and clears the flag if the update value is a 64-bit value. Flow proceeds from block 3212 to block 3214.
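The selection between a 128-bit and a 64-bit update value in step 3212, and the way a receiving core would consume it in step 3216, can be sketched as follows. The `UpdateRecord` structure and the function names are hypothetical, and only the variable range case is modeled. The valid-bit position (bit 11) is the one x86 uses for the PHYSMASK MSRs and is treated here as an assumption of the sketch rather than something the specification states.

```python
from dataclasses import dataclass
from typing import Optional

PHYSMASK_VALID = 1 << 11   # assumption: x86 position of the PHYSMASK valid bit


@dataclass
class UpdateRecord:
    """What the updating core publishes in the shared RAM (hypothetical layout)."""
    msr_id: str                      # "PHYSBASE" or "PHYSMASK"
    new_value: int                   # 64-bit value given by the update instruction
    companion_value: Optional[int]   # current value of the paired MSR, if included
    flag_128: bool                   # set for a 128-bit update, clear for 64-bit


def build_update(msr_id, new_value, cur_physbase, cur_physmask):
    """Build the update record for a variable range MTRR write (step 3212)."""
    if msr_id == "PHYSMASK":
        # Case 1: new mask value plus the current base value, 128 bits in total.
        return UpdateRecord(msr_id, new_value, cur_physbase, True)
    if msr_id == "PHYSBASE":
        if cur_physmask & PHYSMASK_VALID:
            # Case 2a: valid bit set, so include the current mask value as well.
            return UpdateRecord(msr_id, new_value, cur_physmask, True)
        # Case 2b: valid bit clear, so only the new 64-bit base value is sent.
        return UpdateRecord(msr_id, new_value, None, False)
    raise ValueError("only variable range MSRs are modeled in this sketch")


def apply_update(regs, rec):
    """Apply a published record to one core's registers (step 3216)."""
    regs[rec.msr_id] = rec.new_value
    if rec.flag_128:
        other = "PHYSBASE" if rec.msr_id == "PHYSMASK" else "PHYSMASK"
        regs[other] = rec.companion_value


writer = {"PHYSBASE": 0x8000_0006, "PHYSMASK": 0xF_C000_0800}
reader = {"PHYSBASE": 0, "PHYSMASK": 0}
rec = build_update("PHYSBASE", 0x9000_0006, writer["PHYSBASE"], writer["PHYSMASK"])
apply_update(reader, rec)
print(rec.flag_128, hex(reader["PHYSBASE"]), hex(reader["PHYSMASK"]))
```

Sending both halves whenever the range is (or is becoming) valid means a receiving core never ends up with a base value from one update paired with a mask value from another.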
At block 3214, the core 102 writes a sync request with a sync condition of 32 (denoted SYNC 32 in Figure 32) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all the cores 102 have written a SYNC 32. Flow proceeds to block 3216.
At block 3216, the core 102 reads from the non-core PRAM 116 the MTRR 3102 identifier and the MTRR update value written at block 3212, and updates its identified MTRR 3102 with the update value. Advantageously, the propagation of the MTRR update value is performed in an atomic fashion, such that any update to the MTRR 3102 that could affect the operation of the respective cores 102 is guaranteed not to be architecturally visible until the update value has been propagated to the MTRR 3102 of all the cores 102. This is because all the cores 102 are known to be executing the same microcode that implements the MTRR update instruction, and interrupts remain disabled on every core 102 until the update value has been propagated to the respective MTRR 3102 of all the cores 102. As described above with respect to block 3212 in the present embodiment, if the flag was set at block 3212, the core 102 also updates the PHYSMASK or PHYSBASE register (in addition to the specified MSR); otherwise, if the flag is clear, the core 102 updates only the specified MSR. Flow proceeds to block 3218.
At block 3218, the core 102 writes a sync request with a sync condition of 33 (denoted SYNC 33 in Figure 32) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all the cores 102 have written a SYNC 33. Flow ends at block 3218.
After block 3218, the core 102 that obtained the hardware semaphore 118 at block 3204 releases it. Furthermore, after block 3218, the core 102 re-enables interrupts.
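The overall flow of Figure 32 can be approximated in software with threads standing in for cores. This is a loose sketch under several assumptions: a `threading.Lock` stands in for the hardware semaphore 118, a reusable `threading.Barrier` stands in for the sync registers 108 and the control unit 104 (SYNC 31, 32, and 33), a dictionary stands in for the non-core PRAM 116, and starting a thread stands in for the update message and inter-core interrupt. The `Chip` class and the function names are hypothetical.

```python
import threading


class Chip:
    """Toy multi-core chip; every field is a software stand-in, not real hardware."""

    def __init__(self, num_cores):
        self.semaphore = threading.Lock()          # stands in for hardware semaphore 118
        self.sync = threading.Barrier(num_cores)   # stands in for sync registers + control unit
        self.shared_ram = {}                       # stands in for the non-core PRAM 116
        self.mtrr = [dict() for _ in range(num_cores)]   # per-core MTRR instances


def rendezvous_and_update(chip, core_id, is_initiator, msr_id=None, value=None):
    """Steps 3208 through 3218, run by every core with interrupts assumed disabled."""
    chip.sync.wait()                               # SYNC 31: all cores have arrived
    if is_initiator:
        chip.shared_ram["update"] = (msr_id, value)    # step 3212: publish the update
    chip.sync.wait()                               # SYNC 32: the update is now visible
    ident, new_value = chip.shared_ram["update"]
    chip.mtrr[core_id][ident] = new_value          # step 3216: update the local MTRR
    chip.sync.wait()                               # SYNC 33: every core has updated


def update_mtrr(chip, core_id, msr_id, value):
    """Steps 3202 through 3206 on the core that encounters the update instruction."""
    with chip.semaphore:                           # step 3204, released after step 3218
        others = [
            threading.Thread(target=rendezvous_and_update, args=(chip, c, False))
            for c in range(len(chip.mtrr)) if c != core_id
        ]
        for t in others:                           # step 3206: message + interrupt
            t.start()
        rendezvous_and_update(chip, core_id, True, msr_id, value)
        for t in others:
            t.join()


chip = Chip(num_cores=4)
update_mtrr(chip, core_id=2, msr_id=0x200, value=0x6)
print(chip.mtrr)
```

No core leaves the final rendezvous until every core has written its local copy, so in this model there is no point at which running code on one core could observe the new value while another core still holds the old one.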
As can be seen from Figures 31 and 32, system software running on the microprocessor 100 of Figure 31 may advantageously execute an MTRR update instruction on a single core 102 of the microprocessor 100 to update the specified MTRR 3102 of all the cores 102 of the microprocessor 100, rather than executing an MTRR update instruction individually on each core 102, which may benefit system integrity.
One particular MTRR 3102 instantiated in each core 102 is a system management range register (SMRR) 3102. The memory range specified by the SMRR 3102 is referred to as the SMRAM region because it holds code and data associated with operation in system management mode (SMM), such as a system management interrupt (SMI) handler. When code running on a core 102 attempts to access the SMRAM region, the core 102 allows the access only if the core 102 is running in SMM; otherwise, the core 102 ignores writes to the SMRAM region and returns a fixed value for each byte read from the SMRAM region. In addition, if a core 102 running in SMM attempts to execute code outside the SMRAM region, the core 102 raises a machine check exception. Furthermore, the core 102 allows code to write the SMRR 3102 only when the core 102 is running in SMM. This helps protect the SMM code and data in the SMRAM region. In one embodiment, the SMRR 3102 is similar to that described in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3: System Programming Guide, September 2013, particularly in Sections 11.11.2.4 and 34.4.2.1, which is incorporated herein by reference and forms part of this specification.
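The access rules for the SMRAM region can be summarized in a small behavioral model. It is a sketch only: the `Core` class, the dictionary used as memory, and the particular fixed read value are assumptions (the text says only that a fixed value is returned), and refusing a non-SMM write to the SMRR by returning `False` is likewise an illustrative choice.

```python
from dataclasses import dataclass

FIXED_READ_BYTE = 0xFF   # assumption: the text says only "a fixed value"


class MachineCheck(Exception):
    """Models the machine check raised when SMM code is fetched outside SMRAM."""


@dataclass
class Core:
    smram_start: int
    smram_end: int        # inclusive
    in_smm: bool = False

    def _in_smram(self, addr):
        return self.smram_start <= addr <= self.smram_end

    def read_byte(self, memory, addr):
        # Outside SMM, reads of the SMRAM region return a fixed value.
        if self._in_smram(addr) and not self.in_smm:
            return FIXED_READ_BYTE
        return memory.get(addr, 0)

    def write_byte(self, memory, addr, value):
        # Outside SMM, writes to the SMRAM region are ignored.
        if self._in_smram(addr) and not self.in_smm:
            return
        memory[addr] = value

    def fetch_code(self, addr):
        # In SMM, executing code outside the SMRAM region is a machine check.
        if self.in_smm and not self._in_smram(addr):
            raise MachineCheck(hex(addr))

    def write_smrr(self, start, end):
        # The SMRR itself may only be written while in SMM.
        if not self.in_smm:
            return False
        self.smram_start, self.smram_end = start, end
        return True


memory = {0x3000_0010: 0x42}
core = Core(0x3000_0000, 0x3000_FFFF)
core.write_byte(memory, 0x3000_0010, 0x99)           # ignored: not in SMM
print(hex(core.read_byte(memory, 0x3000_0010)))      # fixed value, not the real byte
core.in_smm = True
print(hex(core.read_byte(memory, 0x3000_0010)))      # real byte, unchanged by the write
```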
Typically, each core 102 has its own instance of SMM code and data in memory. It is desirable to protect the SMM code and data of each core 102 not only from code running on that core, but also from code running on the other cores 102. To accomplish this using the SMRRs 3102, system software typically places the multiple SMM code and data instances in adjacent blocks of memory. That is, the SMRAM region is a single contiguous memory region that includes all the SMM code and data instances. If the SMRRs 3102 of all the cores 102 of the microprocessor 100 hold a value that specifies the entirety of this single contiguous memory region, code running in non-SMM on one core is prevented from updating the SMM code and data instance of another core 102. If there is a window of time during which the SMRR 3102 values differ among the cores 102, i.e., the SMRRs 3102 on different cores 102 of the microprocessor 100 hold different values, any of which specifies less than the entire single contiguous memory region that includes all the SMM code and data instances, the system may be vulnerable to a security attack, which may be serious given the nature of SMM. Embodiments that atomically propagate updates to the SMRRs 3102 may therefore be particularly advantageous.
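The vulnerability window can be made concrete with a small check that reports which cores' SMRR values fail to cover which SMM instances. The function name, the inclusive `(start, end)` tuples, and the example addresses are assumptions made for illustration.

```python
def uncovered_instances(smrr_by_core, smm_instances):
    """Return (core, instance) index pairs where a core's SMRR does not cover an instance.

    smrr_by_core:  list of (start, end) ranges, one per core, inclusive
    smm_instances: list of (start, end) ranges, one SMM code/data instance per core
    """
    gaps = []
    for core, (s_start, s_end) in enumerate(smrr_by_core):
        for inst, (i_start, i_end) in enumerate(smm_instances):
            covered = s_start <= i_start and i_end <= s_end
            if not covered:
                # Non-SMM code on this core could touch this instance.
                gaps.append((core, inst))
    return gaps


instances = [(0x1000, 0x1FFF), (0x2000, 0x2FFF)]   # adjacent per-core SMM instances
whole_region = (0x1000, 0x2FFF)                    # single contiguous SMRAM region

# All cores agree on the whole region: nothing is exposed.
print(uncovered_instances([whole_region, whole_region], instances))

# Mid-update window: core 1 still holds a stale, smaller range.
print(uncovered_instances([whole_region, (0x2000, 0x2FFF)], instances))
```

In the second call, core 1's stale value leaves core 0's instance outside its protected range, which is the kind of transient gap that an atomic update is intended to rule out.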
In addition, other embodiments are contemplated in which updates to other per-core-instantiated architecturally visible storage resources of the microprocessor 100 are propagated in an atomic fashion similar to that described above. For example, in one embodiment, each core 102 instantiates certain bit fields of the x86 IA32_MISC_ENABLE MSR, and a WRMSR executed on one core 102 is propagated to all the cores 102 of the microprocessor 100 in a manner similar to that described above. Furthermore, embodiments are contemplated in which the execution on one core 102 of a WRMSR to other MSRs that are instantiated on all the cores 102 of the microprocessor 100, whether architectural or proprietary and whether current or future, is propagated to all the cores 102 of the microprocessor 100 in a manner similar to that described above.
In addition, although embodiments are described in which the per-core-instantiated architecturally visible storage resources are MTRRs, other embodiments are contemplated in which the per-core-instantiated resources are resources of instruction set architectures other than the x86 ISA, or are resources other than MTRRs. For example, resources other than MTRRs include CPUID values and MSRs that report capabilities, such as Vectored Multimedia eXtensions (VMX) capabilities.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Those skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, and the scope of protection of the invention is therefore defined by the claims of this application. For example, software can enable the function, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods described herein. This can be accomplished through the use of general programming languages (e.g., C, C++) or hardware description languages (HDL), including Verilog HDL, VHDL, and so on. Such software can be embodied as program code in a tangible medium, such as any machine-readable (e.g., computer-readable) storage medium, for example a semiconductor memory, magnetic disk, hard disk, or optical disc (e.g., CD-ROM, DVD-ROM), wherein, when the program code is loaded into and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention. The methods and apparatus of the invention may also be transmitted in the form of program code over a transmission medium, such as electrical wire or cable, optical fiber, or any other form of transmission, wherein, when the program code is received, loaded into, and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code combines with the processor to provide a unique apparatus that operates analogously to application-specific logic circuits. The apparatus and methods described herein may be included in a semiconductor intellectual property core, such as a microprocessor core (e.g., embodied in HDL), and transformed into hardware in the production of integrated circuits. In addition, the apparatus and methods described herein may be embodied as a combination of hardware and software. The scope of protection of the invention is therefore defined by the claims of this application.
Finally, those skilled in the art can, on the basis of the concepts and specific embodiments disclosed herein, make minor changes and refinements to achieve the same purposes as the present invention without departing from its spirit and scope.
Claims (20)
1. A microprocessor, characterized by comprising:
a plurality of processing cores;
a service unit for performing operations to debug the microprocessor; and
a first memory accessible by the service unit and by the plurality of processing cores,
wherein at least one processing core of the plurality of processing cores is configured to write a patch to the first memory, wherein the patch includes one or more instructions, and wherein, after the at least one processing core has written the one or more instructions to the first memory, the one or more instructions are fetched from the first memory and executed by the service unit.
2. The microprocessor according to claim 1, characterized in that the plurality of processing cores have an instruction set architecture, wherein the service unit is not architecturally visible to execute instructions of the instruction set architecture of the plurality of processing cores.
3. The microprocessor according to claim 1, characterized by further comprising:
a second memory configured to hold instructions to be fetched from the second memory and executed by the service unit, wherein the second memory is a read-only memory.
4. The microprocessor according to claim 3, characterized by further comprising:
a register configured to hold an address, wherein the register is writable by the at least one processing core of the plurality of processing cores,
wherein the service unit is configured to: after the service unit is reset, fetch from the first memory or the second memory, at the address specified in the register, and execute a first instruction of the service unit.
5. The microprocessor according to claim 1, characterized in that the first memory is further configured to hold data read and written by microcode executed by all of the plurality of processing cores.
6. The microprocessor according to claim 1, characterized by further comprising:
a cache memory shared by the plurality of processing cores,
wherein the service unit is configured to write a control register to stop one or more processing cores of the plurality of processing cores from issuing requests to the shared cache memory.
7. The microprocessor according to claim 1, characterized by further comprising:
a cache memory shared by the plurality of processing cores,
wherein the service unit is configured to write a control register to stop the shared cache memory from acknowledging requests to the shared cache memory generated by the plurality of processing cores.
8. The microprocessor according to claim 1, characterized by further comprising:
a cache memory shared by the plurality of processing cores,
wherein the service unit is configured to write a control register to adjust the size of the shared cache memory.
9. The microprocessor according to claim 1, characterized by further comprising:
a bus interface configured to interface with a system bus external to the microprocessor,
wherein the service unit is configured to write a control register to control the bus interface to perform transactions on the system bus.
10. The microprocessor according to claim 1, characterized by further comprising:
a programmable interrupt controller for controlling interrupt requests to the microprocessor,
wherein the service unit is configured to write a control register to control the programmable interrupt controller to emulate an input/output device for the plurality of processing cores.
11. The microprocessor according to claim 1, characterized in that:
each processing core of the plurality of processing cores includes a plurality of functional units; and
the service unit is configured to write a control register to configure a performance characteristic of one or more functional units of the plurality of functional units in the plurality of processing cores, wherein the performance characteristic is one or more of the following: a branch prediction algorithm and a data prefetch algorithm.
12. A method performed by a microprocessor, characterized in that the microprocessor has a plurality of processing cores, a service unit for performing operations to debug the microprocessor, and a memory accessible by the service unit and by the plurality of processing cores, the method comprising:
writing, by at least one processing core of the plurality of processing cores, a patch to the memory, wherein the patch includes one or more instructions;
after the at least one processing core has written the patch to the memory, fetching, by the service unit, the one or more instructions of the patch from the memory; and
executing, by the service unit, the fetched instructions of the patch.
13. The method according to claim 12, characterized in that the plurality of processing cores have an instruction set architecture, wherein the service unit is not architecturally visible to execute instructions of the instruction set architecture of the plurality of processing cores.
14. The method according to claim 12, characterized by further comprising:
reading data from the memory or writing data to the memory by microcode executed by each processing core of the plurality of processing cores.
15. The method according to claim 12, characterized by further comprising:
writing, by the service unit, a control register to stop one or more processing cores of the plurality of processing cores from issuing requests to a cache memory of the microprocessor, wherein the cache memory is shared by the plurality of processing cores.
16. The method according to claim 12, characterized by further comprising:
writing, by the service unit, a control register to stop a cache memory of the microprocessor from acknowledging requests to the cache memory generated by the plurality of processing cores, wherein the cache memory is shared by the plurality of processing cores.
17. The method according to claim 12, characterized by further comprising:
writing, by the service unit, a control register to adjust the size of a cache memory of the microprocessor, wherein the cache memory is shared by the plurality of processing cores.
18. The method according to claim 12, characterized in that the microprocessor further includes a bus interface configured to interface with a system bus external to the microprocessor, the method further comprising:
writing, by the service unit, a control register to control the bus interface to perform transactions on the system bus.
19. The method according to claim 12, characterized in that the microprocessor further includes a programmable interrupt controller for controlling interrupt requests to the microprocessor, the method further comprising:
writing, by the service unit, a control register to control the programmable interrupt controller to emulate an input/output device for the plurality of processing cores.
20. The method according to claim 12, characterized in that each processing core of the plurality of processing cores includes a plurality of functional units, the method further comprising:
writing, by the service unit, a control register to configure a performance characteristic of one or more functional units of the plurality of functional units of the plurality of processing cores, wherein the performance characteristic is one or more of the following: a branch prediction algorithm and a data prefetch algorithm.
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361871206P | 2013-08-28 | 2013-08-28 | |
US61/871,206 | 2013-08-28 | ||
US201361916338P | 2013-12-16 | 2013-12-16 | |
US61/916,338 | 2013-12-16 | ||
US14/281,758 | 2014-05-19 | ||
US14/281,758 US9471133B2 (en) | 2013-08-28 | 2014-05-19 | Service processor patch mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104239273A CN104239273A (en) | 2014-12-24 |
CN104239273B true CN104239273B (en) | 2019-08-06 |
Family
ID=52227371
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410431656.9A Active CN104239273B (en) | 2013-08-28 | 2014-08-28 | Microprocessor and its execution method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104239273B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101286116A (en) * | 2007-07-24 | 2008-10-15 | 威盛电子股份有限公司 | Device and method for prosecuting real time microcode repairing |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7290081B2 (en) * | 2002-05-14 | 2007-10-30 | Stmicroelectronics, Inc. | Apparatus and method for implementing a ROM patch using a lockable cache |
US20070088939A1 (en) * | 2005-10-17 | 2007-04-19 | Dan Baumberger | Automatic and dynamic loading of instruction set architecture extensions |
US7836442B2 (en) * | 2007-03-15 | 2010-11-16 | Lenovo (Singapore) Pte. Ltd. | Out-of-band patch management system |
EP2195741A4 (en) * | 2007-10-04 | 2011-01-05 | Openpeak Inc | Firmware image update and management |
US8296528B2 (en) * | 2008-11-03 | 2012-10-23 | Intel Corporation | Methods and systems for microcode patching |
CN101561764B (en) * | 2009-05-18 | 2012-05-23 | 华为技术有限公司 | Patching method and patching device under multi-core environment |
Also Published As
Publication number | Publication date |
---|---|
CN104239273A (en) | 2014-12-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104462004B (en) | The method of microprocessor and its internuclear synchronous operation of processing | |
CN104216680B (en) | Microprocessor and its execution method | |
TWI637316B (en) | Dynamic reconfiguration of multi-core processor | |
CN104331388B (en) | Microprocessor and the method for the internuclear synchronization of processing in microprocessor | |
CN104216679B (en) | Microprocessor and its execution method | |
CN104360727B (en) | Microprocessor and the method for using its power saving | |
CN104238997B (en) | Microprocessor and its execution method | |
CN104239275B (en) | Multi-core microprocessor and its relocation method | |
CN104239274B (en) | Microprocessor and its configuration method | |
CN104331387B (en) | Microprocessor and its configuration method | |
CN104239273B (en) | Microprocessor and its execution method | |
CN104239272B (en) | Microprocessor and its operating method | |
CN104216861B (en) | Microprocessor and the in the microprocessor method of synchronization process core |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||