
US20050138450A1 - Apparatus and method for power performance monitors for low-power program tuning - Google Patents

Apparatus and method for power performance monitors for low-power program tuning

Info

Publication number
US20050138450A1
US20050138450A1 US10/741,002 US74100203A US2005138450A1 US 20050138450 A1 US20050138450 A1 US 20050138450A1 US 74100203 A US74100203 A US 74100203A US 2005138450 A1 US2005138450 A1 US 2005138450A1
Authority
US
United States
Prior art keywords
power consumption
micro
power
functional unit
instruction sequences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US10/741,002
Other versions
US7287173B2 (en)
Inventor
Cheng-Hsueh Hsieh
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp
Assigned to INTEL CORPORATION. Assignment of assignors interest (see document for details). Assignors: HSIEH, CHENG-HSUEH
Priority to US10/741,002 (patent US7287173B2)
Priority to CN201611199215.6A (patent CN106598691B)
Priority to PCT/US2004/040136 (patent WO2005066774A1)
Priority to CN2004800361038A (patent CN1890636B)
Priority to CN201010571004.7A (patent CN102063323B)
Priority to DE112004002506T (patent DE112004002506B4)
Priority to TW093137665A (patent TWI301573B)
Publication of US20050138450A1
Publication of US7287173B2
Application granted
Legal status: Active (current)
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/443 Optimisation
    • G06F8/4432 Reducing the energy consumption
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • One or more embodiments of the invention relate generally to the field of low-power programming. More particularly, one or more of the embodiments of the invention relate to a method and apparatus for power performance monitors for low-power program tuning.
  • IA Intel® Architecture
  • Current Intel® Architecture (IA) Processor Families (IA-32 and IA-64) provide various performance monitors to record information, such as cache misses, branch mispredictions, retired instructions, and the like, with very little overhead to the executing program. Compilers can also install operating system drivers to record various performance monitor information. In addition, the performance monitoring information is used for the next program compilation to speed up the code based on a period of typical use. In the past, performance monitors have helped both programmers and compilers to refine generated program code without resorting to traditional probing code, which causes substantial overhead or alters program characteristics enough to render measured statistics unusable.
  • FIG. 1 is a block diagram illustrating a computer system, including a power optimization compiler, in accordance with one embodiment.
  • FIG. 2 is a block diagram illustrating a micro-architecture, as depicted in FIG. 1 , configured to compute power consumption levels required to execute the instructions of an application program, in accordance with one embodiment.
  • FIG. 3 is a block diagram further illustrating a functional unit and a micro-operation of FIG. 2 , in accordance with one embodiment.
  • FIG. 4 is a block diagram further illustrating a run-time optimizer of the compiler of FIG. 1 , to identify instruction sequences of an application program having an excess power consumption level, in accordance with one embodiment.
  • FIG. 5 is a flowchart illustrating a method for recompiling an application program to reduce power consumption levels of identified instruction sequences having an excess power consumption level, in accordance with one embodiment.
  • FIG. 6 is a flowchart illustrating a method for computing power consumption levels of instructions of an application program, in accordance with one embodiment.
  • FIG. 7 is a flowchart illustrating a method for updating power consumption fields of processed micro-operations by functional units of a micro-architecture, in accordance with one embodiment.
  • FIG. 8 is a flowchart illustrating a method for incrementing a power consumption field of one or more identified micro-operations by a determined power consumption level, in accordance with one embodiment.
  • FIG. 9 is a flowchart illustrating a method for updating a power history buffer entry according to a value of each executed micro-operation's power consumption field, in accordance with one embodiment.
  • FIG. 10 is a flowchart illustrating a method for identifying instruction sequences of an application program having an excess power consumption level, in accordance with one embodiment.
  • FIG. 11 is a flowchart illustrating a method for recompiling an application program to reduce power consumption levels of one or more identified instruction sequences.
  • FIG. 12 is a block diagram illustrating various design representations or formats for simulation, emulation and fabrication of a design using the disclosed techniques.
  • In one embodiment, the method includes the computation of power consumption levels of instructions of an application. Once consumption levels are computed, instruction sequences of the application are identified that exhibit an excess power consumption level. For the identified instruction sequences, the application program is recompiled to reduce power consumption levels of one or more of the identified instruction sequences.
  • In situations where power consumption by an instruction sequence cannot be reduced, utilization of the functional units required to execute the instruction sequence is monitored. Hence, the instruction sequence may be executed during periods of time when utilization of the functional units is below a predetermined level. In one embodiment, power consumption levels of the instructions of an application program are reduced, and in addition to the power consumption reduction, utilization of functional units may be reduced to prevent overheating.
  • FIG. 1 is a block diagram illustrating a computer system 100 including a processor 110 having micro-architecture 200 , in accordance with one embodiment of the invention.
  • computer system 100 includes a power optimization compiler 300 to recompile an application program to reduce power consumption levels of one or more identified instruction sequences having an excess power consumption level, in accordance with one embodiment.
  • Computer system 100 comprises a processor system bus (front side bus (FSB)) 102 for communicating information between the processor (CPU) 110 and a chipset 180 coupled together via FSB 102 .
  • FSB front side bus
  • Chipset 180 is used in a manner well-known to those skilled in the art to describe collectively the various devices coupled to CPU 110 to perform desired system functionality.
  • Chipset 180 is comprised of a memory controller or memory controller hub (MCH) 120 , as well as an input/output (I/O) controller or I/O controller hub (ICH) 130 .
  • I/O bus 125 couples MCH 120 to ICH 130 .
  • Memory controller 120 of chipset 180 is coupled to main memory 140 and one or more graphics devices or graphics controller 160 .
  • Main memory 140 is volatile memory, including but not limited to, random access memory (RAM), static RAM (SRAM), double data rate (DDR) synchronous dynamic RAM (SDRAM), Rambus dynamic RAM (RDRAM), or the like.
  • RAM random access memory
  • SRAM static RAM
  • DDR double data rate
  • SDRAM synchronous dynamic RAM
  • RDRAM Rambus dynamic RAM
  • HDD hard disk drive devices
  • I/O controller 130. As illustrated, CPU 110 includes micro-architecture 200 to compute power consumption levels required to execute the instructions of an application program, in accordance with one embodiment of the invention, as further illustrated in FIG. 2.
  • system 100 may be a portable device that includes a self-contained power supply (source) 104, such as a battery.
  • source a self-contained power supply
  • system 100 may be a non-portable device, such as, for example, a desktop computer or a server computer not including optional source 104 .
  • micro-architecture 200 includes power consumption meters (PM) to assist compiler 300 in pinpointing portions of an application program that consume more power than remaining portions of the program, as illustrated in FIG. 2 .
  • compiler 300 determines power consumption information associated with instructions (sequences) at sampled program counters. Compiler 300 can use this information to identify which part of the application program consumes the most power and switch to alternative algorithms or optimization strategies to achieve lower power consumption.
  • a run-time analyzer receives constant feedback to recompile the program to achieve lower power consumption for a given performance goal. Hence, compiler 300 can strike a good balance between performance and power consumed while providing a specified throughput.
  • micro-architecture 200 is configured to perform dynamic execution.
  • dynamic execution refers to the use of front-end logic 202 to fetch the next instructions according to program order and prepare the instructions for subsequent execution in the system pipeline.
  • An instruction fetch unit (IFU) (not shown) of front-end logic 202 fetches macro-instructions via bus interface unit (BIU) 270.
  • BIU bus interface unit
  • uOPs micro-operations
  • An instruction decoder (ID) 210 (210-1, 210-2) decodes the macro-instruction into one or more uOPs, which are provided to an instruction decoder queue (IDQ) (not shown) and subsequently to out-of-order (OOO) core 220.
  • the front-end logic 202 supplies a high bandwidth stream of decoded instructions to OOO core 220 , which directs execution (the actual completion) of the instructions.
  • front-end logic 202 may utilize highly accurate branch prediction logic (not shown) in order to speculate where a program is going to execute next, or as referred to herein, dynamic execution.
  • uOPs are scheduled to avoid stalling when following delayed instructions. In other words, uOPs are executed in an “out-of-order” execution fashion when required to ensure the most efficient use of available processor resources.
  • reservation station (RS) 230 of OOO core 220 receives decoded uOPs 212 from front-end logic 202 .
  • uOPs 212 received by RS 230 remain in RS 230 to await arrival of referenced source operands.
  • RS 230 schedules the execution of the respective uOP within one or more execution units, including arithmetic logic units (ALU) 240 (A 1 240 - 1 , A 2 240 - 2 ) to handle simple arithmetic and logic operations.
  • ALU arithmetic logic units
  • RS 230 schedules execution of floating point instructions within floating point unit (FP) pipeline 250.
  • FP 250 and ALUs 240 are collectively referred to as execution units.
  • execution units use a memory unit pipeline (M), which uses a bus and memory subsystem block (BIU) 270 to execute received uOPs. Subsequently, executed uOPs are received by retirement unit (RT) 280 . In one embodiment, RT 280 receives the completion status of executed uOPs from execution units 240 - 250 and processes the results to commit (retire) a proper architectural state according to the program order.
  • As used herein, the term functional units (FUs) refers to the various components (210-280) of front-end logic 202 and OOO core 220 used to schedule and execute uOPs.
  • micro-architecture 200 includes a respective power meter (PM) 290 ( 290 - 1 , . . . , 290 - 9 ), which is coupled to the various FUs ( 210 - 280 ) of micro-architecture 200 .
  • PM 290 are configured to measure power consumed by an attached FU ( 210 - 280 ) during a program cycle.
  • an FU ( 210 - 280 ) such as, for example a decoder (ID 210 ), includes an attached PM (e.g., 290 ) configured to measure power consumed by ID 210 during, for example, a program cycle.
  • FUs ( 210 - 280 ) communicate with an attached PM 290 to receive power consumption values measured by the attached PM 290 during a program cycle.
  • FUs ( 210 - 280 ) enable the measurement of power consumption levels consumed by instructions of an application program during execution.
  • PMs, as described herein, are placed on die, and directly coupled to respective FUs of the micro-architecture to provide a virtually exact measurement of power consumed by the respective FU during a program cycle.
  • uOP 212 includes a program counter (PC) field 214 , as well as a power consumption value (PCV) field 216 . Accordingly, during initial decoding of a received macro instruction, a program counter value associated with the macro instruction 204 is placed within PC field 214 of each uOP decoded from macro instruction 204 . However, certain complex macro instructions may require decoding into multiple uOPs.
  • PC program counter
  • PCV power consumption value
  • the various FUs ( 210 - 280 ) query an attached power meter 290 to determine a power consumption value consumed by the respective FUs ( 210 - 280 ) to process one or more uOPs during a program cycle.
  • FUs repeat querying of their respective PM 290 to receive a power consumption value.
  • the power consumption value is used to increment PCV field 216 of uOP 212 .
  • PCV field 216 contains a power consumption value representing a summation of the power consumed by each FU ( 210 - 280 ) required to process the respective uOP during execution.
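  • The PC and PCV fields described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented hardware: the names MicroOp, decode and charge, and the power readings, are hypothetical; only the two fields (PC field 214 and PCV field 216) come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MicroOp:
    pc: int           # PC field 214: program counter of the originating macro-instruction
    pcv: float = 0.0  # PCV field 216: running sum of power attributed to this uOP


def decode(pc: int, uop_count: int) -> List[MicroOp]:
    """Decode one macro-instruction into uop_count uOPs that all carry its PC."""
    return [MicroOp(pc=pc) for _ in range(uop_count)]


def charge(uop: MicroOp, fu_power: float) -> None:
    """An FU adds the power it spent on this uOP to the uOP's PCV field."""
    uop.pcv += fu_power


# A complex macro-instruction at PC 0x400 decoded into two uOPs.
uops = decode(pc=0x400, uop_count=2)

# Hypothetical per-FU readings for the first uOP as it moves through three FUs.
for fu_power in (1.5, 0.5, 2.0):
    charge(uops[0], fu_power)
```

  • Because every uOP decoded from one macro-instruction carries the same PC value, the per-uOP sums can later be folded back into a single figure for that macro-instruction.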
  • RT 280 updates an internal power history buffer PHB prior to retirement of each received, executed uOP.
  • RT 280 determines a program counter value according to PC field 214 of the executed uOP 212 and updates an entry within internal PHB corresponding to the PC value of executed uOP 212 .
  • internal PHB may be implemented within, for example, hardware registers of micro-architecture 200 (not shown).
  • internal PHB includes a fixed number of entries.
  • generation of new entries by RT 280 within the internal PHB may cause flushing of least recently updated entries of the internal PHB (“PHB overflow event”).
  • the internal PHB may be implemented within one or more 128-bit registers to avoid PHB overflow events.
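  • The retirement-time update of the internal PHB, including a PHB overflow event, can be sketched as follows. This is a software model under assumptions: the class name PowerHistoryBuffer, its capacity parameter, and the use of an ordered map to model "least recently updated" flushing are hypothetical and are not taken from the text.

```python
from collections import OrderedDict


class PowerHistoryBuffer:
    """Software model of a fixed-size internal PHB keyed by program counter."""

    def __init__(self, capacity):
        self.capacity = capacity       # fixed number of entries
        self.entries = OrderedDict()   # PC -> accumulated PCV, oldest update first
        self.overflow_events = 0

    def retire(self, pc, pcv):
        """Fold the PCV of a retiring uOP into the entry for its PC."""
        if pc in self.entries:
            self.entries[pc] += pcv
            self.entries.move_to_end(pc)  # mark as most recently updated
            return
        if len(self.entries) >= self.capacity:
            # PHB overflow event: flush the least recently updated entry.
            self.entries.popitem(last=False)
            self.overflow_events += 1
        self.entries[pc] = pcv


phb = PowerHistoryBuffer(capacity=2)
phb.retire(0x10, 1.0)
phb.retire(0x14, 2.0)
phb.retire(0x10, 0.5)   # updates the existing entry for 0x10
phb.retire(0x18, 3.0)   # buffer full: 0x14 is the least recently updated entry
```

  • In this model the flushed data is simply lost, which is why a reader such as a driver would want to sample the buffer before overflow events become frequent.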
  • a run-time analyzer (RTA) 350 may utilize an operating system (OS) driver to periodically issue an interrupt in order to read values from the internal PHB to form PHB 380 , as illustrated in FIG. 4 .
  • PHB 380 includes PC column 382 , as well as power consumption value (PCV) column 384 .
  • PCV column 384 will represent power consumed by the respective macro instruction.
  • RTA 350 periodically issues an interrupt to read values from the internal PHB to update PHB 380 and may use the various program power consumption information to identify instruction sequences of the application program having an excess power consumption level.
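  • The periodic read of the internal PHB into PHB 380 can be sketched as a merge of snapshots. This is a minimal sketch, assuming the internal PHB can be read as a mapping from PC to accumulated PCV; the function name drain and the clear flag are hypothetical, and the interrupt mechanism itself is not modeled.

```python
from collections import Counter


def drain(internal_phb, phb_380, clear=True):
    """Merge one snapshot of the internal PHB into the software-side table."""
    for pc, pcv in internal_phb.items():
        phb_380[pc] += pcv       # PC column 382 -> PCV column 384
    if clear:
        internal_phb.clear()     # optionally reset the internal buffer after the read


phb_380 = Counter()

# First periodic read.
internal = {0x10: 1.5}
drain(internal, phb_380)

# The buffer refills between interrupts, then is read again.
internal.update({0x10: 2.0, 0x20: 1.0})
drain(internal, phb_380)
```

  • Accumulating across reads lets the software-side table grow beyond the fixed number of entries available in the internal buffer.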
  • FU 400 is illustrated, which may be used as an FU ( 210 - 280 ) of micro-architecture 200 of FIG. 2 , in one embodiment.
  • FU 400 includes an average power meter (APM) field (register) 410 , as well as a utilization (U) field (register) 420 .
  • APM average power meter
  • U utilization
  • FU 400 updates APM register 410 according to a measured power consumption value divided by a number of uOPs processed by FU 400 during the program cycle (“power consumption per cycle value”).
  • FU 400 accumulates the power consumption per cycle value to generate an average power amount consumed by FU 400 within APM register 410 .
  • each FU may also track its utilization (i.e., cycles doing real work (non-idle cycles) divided by total cycles) within, for example, U register 420, as illustrated with reference to FIG. 3.
  • U register 420 is implemented using two registers. For example, a first register could be used to contain a count of the total cycles. In addition, a second register could be used to contain a count of the total non-idle cycles. In one embodiment, the first and second registers are 128-bit registers.
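  • The two-register form of the utilization field can be sketched as below. The sketch assumes utilization is read as the fraction of cycles that are non-idle, so that a low value indicates a mostly idle FU; the class name UtilizationRegister and its methods are hypothetical. Python integers are unbounded, so the 128-bit width is not modeled.

```python
class UtilizationRegister:
    """Two counters standing in for U register 420."""

    def __init__(self):
        self.total_cycles = 0      # first register: all cycles
        self.non_idle_cycles = 0   # second register: cycles doing real work

    def tick(self, busy):
        """Record one program cycle."""
        self.total_cycles += 1
        if busy:
            self.non_idle_cycles += 1

    def utilization(self):
        """Fraction of cycles spent doing real work (assumed definition)."""
        if self.total_cycles == 0:
            return 0.0
        return self.non_idle_cycles / self.total_cycles


u = UtilizationRegister()
for busy in (True, False, False, False):
    u.tick(busy)
```

  • Keeping two raw counters in the FU and leaving the division to software keeps the per-cycle work down to increments.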
  • an operating system (OS) driver, for example, as directed by RTA 350, can read the various U registers 420, as well as APM register 410 of FU 400, to identify typical power usage for each FU 210-280.
  • RTA 350 maintains power consumption table (PCT) 360 according to APM register 410 and U register 420 of FU 210 - 280 .
  • PCT 360 as well as PHB 380 are implemented by using one or more registers, such as 128-bit registers, that are accessible by both RTA 350 as well as OS drivers.
  • RTA 350 is implemented as a software component, but may be implemented as a hardware component, depending on the desired implementation.
  • compiler 300 may utilize information from APM registers 410, as well as information from U registers 420, in order to identify instruction sequences that are executed by FUs having an APM value in excess of a predetermined FU power consumption level. In one embodiment, such identified “high power instruction sequences” may be replaced with alternative instruction sequences that utilize FUs having an average power consumption level less than a predetermined FU power consumption level. In a further embodiment, when alternate instruction sequences are not available, compiler 300 delays issuing of identified high power instruction sequences and limits issuing of such instruction sequences to FUs during identified low utilization times, according to U registers 420. Procedural methods for implementing embodiments of the invention are now described.
  • FIG. 5 is a flowchart illustrating a method 500 for recompiling an application program to reduce power consumption levels of one or more identified instruction sequences having an excess power consumption level, in accordance with one embodiment, as described with reference to FIGS. 1-4 .
  • power consumption levels are computed for the instructions of an application program.
  • instruction sequences of the application program are identified having an excess power consumption level.
  • instructions having an excess power consumption level may include instructions that belong to critical power path instruction sequences, such as instruction sequences that are part of a critical path that exhibits an excess power consumption level.
  • identified instruction sequences having an excess power consumption level may belong to high power level instruction sequences.
  • high power level instruction sequences include instructions executed by functional units (FUs) of a micro-architecture having a power consumption level in excess of the predetermined FU power consumption level.
  • FU functional units
  • the application program is recompiled to reduce power consumption levels of one or more of the identified instruction sequences. In an alternate embodiment, the application program is recompiled to reduce overheating of the various functional units of a processor micro-architecture.
  • FIG. 6 is a flowchart illustrating a method 510 for computing power consumption levels of instructions of process block 502 of FIG. 5 , in accordance with one embodiment.
  • micro-instructions uOPs
  • FU micro-architecture functional units
  • each FU updates a power consumption field of each uOP processed by the respective FU with a power consumption level of the respective FU required to process the respective uOP.
  • the various FU ( 210 - 280 ) update, for example, PCV field 216 of uOP 212 .
  • a power history buffer (PHB) entry is updated according to the value of each executed micro-operation's power consumption field.
  • PHB power history buffer
  • a value of PCV field 216 of each uOP 212 is updated within an internal PHB of RT 280 .
  • a run-time analyzer (RTA) 350 updates PHB 380 according to the internal PHB of RT 280 , as illustrated with reference to FIG. 4 .
  • FIG. 7 is a flowchart illustrating a method 520 for updating the power consumption field of process block 514 of FIG. 6 , in accordance with one embodiment.
  • a power consumption level of an FU is determined for the program cycle. In one embodiment, determination of the power consumption level of an FU is performed by querying a power consumption meter (PM) coupled to the FU, for example, as illustrated with reference to FIG. 2 . Once queried, PM 290 returns a power consumption level measured by the PM 290 during the program cycle.
  • PM power consumption meter
  • FU ( 210 - 280 ) keeps track of various uOPs processed during a program cycle.
  • a value of a power consumption field of the one or more identified uOPs is incremented by the determined power consumption level.
  • process blocks 522 - 526 are repeated for each program cycle.
  • micro-architecture 200 of FIG. 2 represents a processor pipeline, which performs parallel execution of various uOPs.
  • methods 510 and 520 are performed in parallel by each FU ( 210 - 280 ) of micro-architecture 200 .
  • FIG. 8 is a flowchart illustrating a method 528 for incrementing the PCV field of process block 526 of FIG. 7 .
  • the determined power consumption level is divided by a count of identified uOPs processed during the program cycle to form a power consumption per cycle value.
  • the PCV field of the identified uOPs is incremented by the power consumption per cycle value.
  • an average power consumption meter (APM) of the FU is incremented by the power consumption per cycle value.
  • a utilization register (meter) of the FU is incremented for the program cycle.
  • each APM 410 contains a total power consumed divided by a total number of uOPs processed for the respective FU (e.g. 210 - 280 ).
  • a first 128-bit integer register containing a sum of total power consumed for the respective FU, is incremented by the amount of power each new uOP consumes when processed by the respective FU ( 210 - 280 ).
  • a second 128-bit integer register containing a sum of the total number of uOPs processed by the respective FU, is incremented for each new uOP processed by the respective FU.
  • RTA 350 or an OS driver samples the first and second 128-bit registers and divides the sampled values off-line in order to store an APM value for the respective FU within PCT 360 (see FIG. 4 ).
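  • The per-cycle split of an FU's measured power across the uOPs it processed, together with the two running sums behind the APM value, can be sketched as follows. This is a model under assumptions: the names FunctionalUnit, process_cycle and sample_apm, the dictionary form of a uOP, and the sample power values are all hypothetical.

```python
class FunctionalUnit:
    """Model of one FU with the two running sums behind its APM value."""

    def __init__(self, name):
        self.name = name
        self.total_power = 0.0  # first register: sum of power consumed
        self.total_uops = 0     # second register: number of uOPs processed

    def process_cycle(self, uops, measured_power):
        """Split the power measured this cycle evenly across the uOPs processed."""
        if not uops:
            return
        per_uop = measured_power / len(uops)  # "power consumption per cycle value"
        for uop in uops:
            uop["pcv"] += per_uop             # increment the uOP's PCV field
            self.total_power += per_uop
            self.total_uops += 1


def sample_apm(fu):
    """Off-line division performed by the sampler, not by the FU itself."""
    if fu.total_uops == 0:
        return 0.0
    return fu.total_power / fu.total_uops


alu = FunctionalUnit("ALU")
uops = [{"pc": 0x10, "pcv": 0.0}, {"pc": 0x14, "pcv": 0.0}]
alu.process_cycle(uops, measured_power=3.0)      # two uOPs share 3.0
alu.process_cycle(uops[:1], measured_power=1.0)  # one uOP takes all of 1.0
apm = sample_apm(alu)
```

  • Deferring the division to the sampler means the FU only ever adds to its registers, which matches the description of dividing the sampled values off-line.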
  • FIG. 9 is a flowchart illustrating a method 542 for updating the internal PHB of process block 540 of FIG. 6 , in accordance with one embodiment.
  • an executed uOP is received.
  • a program counter value associated with the executed uOP is identified according to a PC field of the executed uOP.
  • a power consumption level associated with the executed uOP is identified according to the PCV field of the executed uOP.
  • At process block 550, an entry within the internal PHB corresponding to the PC field value of the executed uOP is incremented by a value of the PCV field of the executed uOP.
  • process blocks 544 - 550 are repeated for each received executed uOP.
  • method 542 describes operations performed by RT 280 , for example, as illustrated with reference to FIG. 4 , for updating an internal PHB which is read by RTA 350 to form PHB 380 .
  • FIG. 10 is a flowchart illustrating a method 562 for identifying instruction sequences having an excess power consumption level of process block 560 of FIG. 5 , in accordance with one embodiment.
  • the internal PHB is periodically queried to identify power consumption levels of the instructions of the application program.
  • CPU 110 may issue an interrupt at predetermined cycles or occurrences of an internal PHB overflow event to enable a driver to record contents of the internal PHB, which may optionally be cleared.
  • APM values and U values from APM register 410 and U register 420 are read by RTA 350 ( FIG. 4 ). As described herein, querying of the internal PHB is performed since the internal PHB includes a limited number of entries when implemented using registers.
  • instructions having a power consumption level exceeding a predetermined power consumption level are detected.
  • critical path power instructions are identified from the detected instructions, as instructions that fall within a frequently-executed instruction path having a high power consumption level.
  • application program critical paths may be identified using conventional techniques. Once identified, the critical paths may be analyzed to determine a power consumption level consumed by the critical paths. For critical paths having an excess power consumption level, such critical paths are identified as critical power path instruction sequences.
  • high power level instruction sequences are identified from the detected instructions as instruction sequences executed by FUs having an average power consumption level greater than a predetermined FU power consumption level.
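  • The three identification steps above (threshold detection, critical power path selection, and high power level selection) can be sketched as simple filters. This is a sketch under assumptions: the function names, the mapping from PC to executing FU, the thresholds, and the sample data are hypothetical, and the frequently executed path is taken as given rather than computed.

```python
def detect_excess(phb, pc_threshold):
    """PCs whose accumulated power exceeds the predetermined level."""
    return {pc for pc, pcv in phb.items() if pcv > pc_threshold}


def critical_power_path(hot_path, excess_pcs):
    """Detected PCs that fall within a frequently executed path."""
    return [pc for pc in hot_path if pc in excess_pcs]


def high_power_sequences(fu_of_pc, apm_by_fu, excess_pcs, fu_threshold):
    """Detected PCs executed by FUs whose average power exceeds the FU threshold."""
    return sorted(pc for pc in excess_pcs
                  if apm_by_fu[fu_of_pc[pc]] > fu_threshold)


phb = {0x10: 9.0, 0x14: 1.0, 0x18: 7.5}            # PC -> accumulated PCV
excess = detect_excess(phb, pc_threshold=5.0)

critical = critical_power_path(hot_path=[0x10, 0x14], excess_pcs=excess)

high = high_power_sequences(
    fu_of_pc={0x10: "FP", 0x14: "ALU", 0x18: "ALU"},
    apm_by_fu={"FP": 3.0, "ALU": 1.0},
    excess_pcs=excess,
    fu_threshold=2.0,
)
```

  • The two classifications are independent: an instruction can be on a critical power path, be executed by a high-power FU, both, or neither.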
  • FIG. 11 is a flowchart illustrating a method 582 for recompiling the application program to reduce power consumed by one or more identified instruction sequences having an excess power consumption level of process block 580 of FIG. 5 .
  • identified critical power path instruction sequences are replaced with alternate instruction sequences to reduce power consumption levels by using the alternate instruction sequences.
  • high power level instruction sequences are redistributed to utilize FUs with a lower average power consumption level.
  • a run-time optimizer may identify an idle time of one or more FUs used to execute identified high power level instruction sequences. In one embodiment, this may be performed by accessing, for example, a utilization register of the respective FU. As such, at process block 590, when high power level instruction sequences cannot be redistributed to FUs having a lower average power consumption level, in one embodiment, the compiler may issue high power instruction sequences to one or more FUs when utilization of the FUs is low to prevent FU overheating.
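  • The order of fallbacks during recompilation (replace, then redistribute, then issue during low utilization) can be sketched as one decision function. This merges the separate process blocks into a single hypothetical routine for illustration; the name plan_sequence, its parameters, the sequence and FU names, and the thresholds are all assumptions.

```python
def plan_sequence(seq, alternates, candidate_fus, apm_by_fu,
                  utilization_by_fu, fu_power_limit, low_utilization):
    """Pick a power-reduction action for one identified instruction sequence."""
    # 1. Prefer an alternate, lower-power instruction sequence when one exists.
    if seq in alternates:
        return ("replace", alternates[seq])
    # 2. Otherwise move the work to the candidate FU with the lowest average power,
    #    provided that FU is under the predetermined FU power consumption level.
    cooler = [fu for fu in candidate_fus if apm_by_fu[fu] < fu_power_limit]
    if cooler:
        return ("redistribute", min(cooler, key=lambda fu: apm_by_fu[fu]))
    # 3. Otherwise issue only to FUs whose utilization is currently low.
    idle = [fu for fu in candidate_fus if utilization_by_fu[fu] < low_utilization]
    return ("issue_when_idle", idle)


apm = {"FP": 3.0, "ALU1": 1.0, "ALU2": 2.5}
util = {"FP": 0.2, "ALU1": 0.9, "ALU2": 0.1}

a = plan_sequence("mul_loop", {"mul_loop": "shift_add_loop"}, ["FP"],
                  apm, util, fu_power_limit=2.0, low_utilization=0.5)
b = plan_sequence("dot", {}, ["FP", "ALU1"],
                  apm, util, fu_power_limit=2.0, low_utilization=0.5)
c = plan_sequence("sqrt", {}, ["FP"],
                  apm, util, fu_power_limit=2.0, low_utilization=0.5)
```

  • The third branch trades latency for lower peak load: the sequence still runs on a high-power FU, but only when that FU has been mostly idle.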
  • compiler 300 may use the (PC, PCV) two-tuple sampled over time to identify application programs that have an excess power consumption level.
  • an instruction is identified as a high power level instruction when uOPs decoded from the instruction are executed by FUs that exhibit an above-average power consumption level.
  • identified instruction sequences, which exhibit an excess power consumption level may include instructions which are decoded into a plurality of uOPs for execution and/or instruction sequences falling into critical paths of the application program. Accordingly, in one embodiment, compiler 300 may replace such instruction sequences with alternative instruction sequences, which consume less power at the expense of slightly decreased performance, while meeting an overall performance goal.
  • compiler 300 may sample APM registers by using an OS driver to identify FUs that consume less power. Hence, during recompiling of the application program, compiler 300 may distribute identified instruction sequences having an excess power consumption level to minimize total power consumption of the application program.
  • utilization levels of FUs of the micro-architecture are used to issue instructions during identified idle periods of the various FUs. Hence, in one embodiment, the compiler may utilize a dynamic approach to prevent FUs from overheating by issuing identified high-power instruction sequences when identified utilization of an FU is low.
  • a PHB is used by a compiler to identify program portions that consume an inordinate amount of power by querying APM registers, as well as U registers, to assist the compiler in implementing different optimization strategies with a different mix of functional units.
  • Although embodiments described herein are directed to a micro-architecture of a processor, the embodiments may be applied to other units, such as storage, computer graphics devices and I/O, such as peripheral interconnect devices.
  • power meters may be attached to, and sampled at, similar functional units of attached program components.
  • the OS can sample power consumed in external units and schedule tasks accordingly to prevent program system components from overheating.
  • FIG. 12 is a block diagram illustrating various representations or formats for simulation, emulation and fabrication of a design using the disclosed techniques.
  • Data representing a design may represent the design in a number of manners.
  • the hardware may be represented using a hardware description language, or another functional description language, which essentially provides a computerized model of how the designed hardware is expected to perform.
  • the hardware model 610 may be stored in a storage medium 600 , such as a computer memory, so that the model may be simulated using simulation software 620 that applies a particular test suite 630 to the hardware model to determine if it indeed functions as intended.
  • the simulation software is not recorded, captured or contained in the medium.
  • the data may be stored in any form of a machine readable medium.
  • An optical or electrical wave 660 modulated or otherwise generated to transport such information, a memory 650 or a magnetic or optical storage 640 , such as a disk, may be the machine readable medium. Any of these media may carry the design information.
  • the term “carry” e.g., a machine readable medium carrying information
  • the set of bits describing the design or a particular part of the design are (when embodied in a machine readable medium, such as a carrier or storage medium) an article that may be sold in and of itself, or used by others for further design or fabrication.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Power Sources (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

In some embodiments, a method and apparatus for power performance monitors for low-power program tuning are described. In one embodiment, the method includes the computation of power consumption levels of instructions of an application. Once consumption levels are computed, instruction sequences of the application are identified that exhibit an excess power consumption level. For the identified instruction sequences, the application program is recompiled to reduce power consumption levels of one or more of the identified instruction sequences. Other embodiments are described and claimed.

Description

    FIELD OF THE INVENTION
  • One or more embodiments of the invention relate generally to the field of low-power programming. More particularly, one or more of the embodiments of the invention relate to a method and apparatus for power performance monitors for low-power program tuning.
  • BACKGROUND OF THE INVENTION
  • A vast amount of research and system architecture design efforts are directed to increasing data throughput within computer systems. Technologies, such as data pipeline, out-of-order execution and the like, enable advanced architectures in processing with significantly higher clock rates to achieve world class performance. Furthermore, this research, as well as architecture redesign, has enabled the mobile market for laptop computers, hand-held devices, personal digital assistants (PDAs), and the like.
  • Unfortunately, such mobile platforms may be limited to a run-time dictated by the life of a battery used by the respective mobile platform when another power source is unavailable. Depending on the complexity of the mobile platform, power resources from an attached battery may be depleted within a relatively short amount of time. Furthermore, inclusion of technologies, such as data pipeline, out-of-order execution and the like within a mobile platform generally results in the consumption of inordinate amounts of power during execution. Hence, high performance mobile platforms may not provide a user with a sufficient amount of mobile operation time.
  • Current Intel® Architecture (IA) Processor Families (IA-32 and IA-64) provide various performance monitors to record information, such as cache misses, branch mispredictions, retired instructions, and the like, with very little overhead to the executing program. Compilers can also install operating system drivers to record various performance monitor information. In addition, the performance monitoring information is used for the next program compilation to speed up the code based on a period of typical use. In the past, performance monitors have helped both programmers and compilers to refine generated program code without resorting to traditional probing code that causes substantial overhead or alters program characteristics to render measured statistics unusable.
  • Unfortunately, in the area of low-power programming, performance monitors for pinpointing portions of an application program that consume more power than remaining portions of the program do not exist. Conventional compilers cannot collect power consumption information of a processor without help from the processor. Hence, without adequate tools, researchers often rely on some low power principles in order to promote their programming or computing strategies as requiring low power. Such practices often present inaccurate accounts of what really happens in the processor. Researchers often correlate low power to performance. Consequently, most performance enhancing operations that achieve the same throughput with less time are erroneously labeled as low-power technologies.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The various embodiments of the present invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, and in which:
  • FIG. 1 is a block diagram illustrating a computer system, including a power optimization compiler, in accordance with one embodiment.
  • FIG. 2 is a block diagram illustrating a micro-architecture, as depicted in FIG. 1, configured to compute power consumption levels required to execute the instructions of an application program, in accordance with one embodiment.
  • FIG. 3 is a block diagram further illustrating a functional unit and a micro-operation of FIG. 2, in accordance with one embodiment.
  • FIG. 4 is a block diagram further illustrating a run-time optimizer of the compiler of FIG. 1, to identify instruction sequences of an application program having an excess power consumption level, in accordance with one embodiment.
  • FIG. 5 is a flowchart illustrating a method for recompiling an application program to reduce power consumption levels of identified instruction sequences having an excess power consumption level, in accordance with one embodiment.
  • FIG. 6 is a flowchart illustrating a method for computing power consumption levels of instructions of an application program, in accordance with one embodiment.
  • FIG. 7 is a flowchart illustrating a method for updating power consumption fields of processed micro-operations by functional units of a micro-architecture, in accordance with one embodiment.
  • FIG. 8 is a flowchart illustrating a method for incrementing a power consumption field of one or more identified micro-operations by a determined power consumption level, in accordance with one embodiment.
  • FIG. 9 is a flowchart illustrating a method for updating a power history buffer entry according to a value of each executed micro-operation's power consumption field, in accordance with one embodiment.
  • FIG. 10 is a flowchart illustrating a method for identifying instruction sequences of an application program having an excess power consumption level, in accordance with one embodiment.
  • FIG. 11 is a flowchart illustrating a method for recompiling an application program to reduce power consumption levels of one or more identified instruction sequences.
  • FIG. 12 is a block diagram illustrating various design representations or formats for simulation, emulation and fabrication of a design using the disclosed techniques.
  • DETAILED DESCRIPTION
  • A method and apparatus for power performance monitors for low-power program tuning are described. In one embodiment, the method includes the computation of power consumption levels of instructions of an application. Once consumption levels are computed, instruction sequences of the application are identified that exhibit an excess power consumption level. For the identified instruction sequences, the application program is recompiled to reduce power consumption levels of one or more of the identified instruction sequences.
  • In one embodiment, in situations where power consumption by an instruction sequence cannot be reduced, utilization of functional units required to execute the instruction sequence is monitored. Hence, the instruction sequence may be executed during periods of time when utilization of the functional units is below a predetermined level. In one embodiment, power consumption levels of the instructions of an application program are reduced, and in addition to the power consumption reduction, utilization of functional units may be reduced to prevent overheating.
  • System
  • FIG. 1 is a block diagram illustrating a computer system 100 including a processor 110 having micro-architecture 200, in accordance with one embodiment of the invention. In one embodiment, computer system 100 includes a power optimization compiler 300 to recompile an application program to reduce power consumption levels of one or more identified instruction sequences having an excess power consumption level, in accordance with one embodiment. Computer system 100 comprises a processor system bus (front side bus (FSB)) 102 for communicating information between the processor (CPU) 110 and a chipset 180 coupled together via FSB 102.
  • As described herein, the term “chipset” is used in a manner well-known to those skilled in the art to describe collectively the various devices coupled to CPU 110 to perform desired system functionality. Chipset 180 is comprised of a memory controller or memory controller hub (MCH) 120, as well as an input/output (I/O) controller or I/O controller hub (ICH) 130. In one embodiment, I/O bus 125 couples MCH 120 to ICH 130. Memory controller 120 of chipset 180 is coupled to main memory 140 and one or more graphics devices or graphics controller 160.
  • In one embodiment, main memory 140 is volatile memory, including but not limited to, random access memory (RAM), static RAM (SRAM), double data rate (DDR) synchronous dynamic RAM (SDRAM), Rambus dynamic RAM (RDRAM), or the like. In addition, hard disk drive devices (HDD) 150, as well as one or more I/O devices 170 (170-1, . . . , 170-N) are coupled to I/O controller 130 of chipset 180. As illustrated, CPU 110 includes micro-architecture 200 to compute power consumption levels required to execute the instructions of an application program, in accordance with one embodiment of the invention, as illustrated in FIG. 2.
  • It should be understood that embodiments of the invention may be used in any apparatus having a processor. Although embodiments of system 100 are not limited in this respect, system 100 may be a portable device that includes a self-contained power supply (source) 104, such as a battery. A non-exhaustive list of examples of such portable devices includes laptop and notebook computers, mobile telephones, personal digital assistants (PDAs), and the like. Alternatively, system 100 may be a non-portable device, such as, for example, a desktop computer or a server computer not including optional source 104.
  • Accordingly, in one embodiment, micro-architecture 200 includes power consumption meters (PM) to assist compiler 300 in pinpointing portions of an application program that consume more power than remaining portions of the program, as illustrated in FIG. 2. In one embodiment, compiler 300 determines power consumption information associated with instructions (sequences) at sampled program counters. Compiler 300 can use this information to identify which part of the application program consumes the most power and switch to alternative algorithms or optimization strategies to achieve less power consumption. In one embodiment, a run-time analyzer receives constant feedback to recompile the program to achieve lower power consumption for a given performance goal. Hence, compiler 300 can strike a good balance between performance and power consumed while providing a specified throughput.
  • Representatively, micro-architecture 200 is configured to perform dynamic execution. As described herein, “dynamic execution” refers to the use of front-end logic 202 to fetch the next instructions according to program order and prepare the instructions for subsequent execution in the system pipeline. Accordingly, an instruction fetch unit (IFU) (not shown) of front-end logic 202 fetches macro-instructions via bus interface unit (BIU) 270. Once the instructions are fetched, the instructions are decoded into basic operations, referred to herein as “micro-operations” (uOPs). In response to received macro-instruction 204, an instruction decoder (ID) 210 (210-1, 210-2) decodes the macro-instruction into one or more uOPs, which are provided to an instruction decoder queue (IDQ) (not shown) and subsequently to out-of-order (OOO) core 220.
  • In effect, the front-end logic 202 supplies a high bandwidth stream of decoded instructions to OOO core 220, which directs execution (the actual completion) of the instructions. In order to execute the instructions in the most efficient manner, front-end logic 202 may utilize highly accurate branch prediction logic (not shown) in order to speculate where a program is going to execute next, or as referred to herein, dynamic execution. Once received, uOPs are scheduled to avoid stalling when following delayed instructions. In other words, uOPs are executed in an “out-of-order” execution fashion when required to ensure the most efficient use of available processor resources.
  • Representatively, reservation station (RS) 230 of OOO core 220 receives decoded uOPs 212 from front-end logic 202. In one embodiment, uOPs 212 received by RS 230 remain in RS 230 to await arrival of referenced source operands. Once source operands of the respective uOP are received, RS 230 schedules the execution of the respective uOP within one or more execution units, including arithmetic logic units (ALU) 240 (A1 240-1, A2 240-2) to handle simple arithmetic and logic operations. Likewise, RS 230 schedules execution of floating point instructions within floating point unit (FP) pipeline 250. As described herein, FP 250 and ALUs 240 are collectively referred to as execution units.
  • In one embodiment, execution units use a memory unit pipeline (M), which uses a bus and memory subsystem block (BIU) 270 to execute received uOPs. Subsequently, executed uOPs are received by retirement unit (RT) 280. In one embodiment, RT 280 receives the completion status of executed uOPs from execution units 240-250 and processes the results to commit (retire) a proper architectural state according to the program order. As described herein, the term “functional units” (FUs) refers to the various components (210-280) of front-end logic 202 and OOO core 220 used to schedule and execute uOPs.
  • However, in contrast to conventional micro-architectures, micro-architecture 200 includes power meters (PMs) 290 (290-1, . . . , 290-9), each coupled to a respective FU (210-280) of micro-architecture 200. In one embodiment, each PM 290 is configured to measure power consumed by its attached FU (210-280) during a program cycle. In one embodiment, an FU (210-280), such as, for example, a decoder (ID 210), includes an attached PM (e.g., 290) configured to measure power consumed by ID 210 during, for example, a program cycle.
  • In one embodiment, FUs (210-280) communicate with an attached PM 290 to receive power consumption values measured by the attached PM 290 during a program cycle. In one embodiment, FUs (210-280) enable the measurement of power consumption levels consumed by instructions of an application program during execution. Representatively, PMs, as described herein, are placed on die, and directly coupled to respective FUs of the micro-architecture to provide a virtually exact measurement of power consumed by the respective FU during a program cycle.
  • In one embodiment, as illustrated with reference to FIG. 3, uOP 212 includes a program counter (PC) field 214, as well as a power consumption value (PCV) field 216. Accordingly, during initial decoding of a received macro instruction, a program counter value associated with the macro instruction 204 is placed within PC field 214 of each uOP decoded from macro instruction 204. However, certain complex macro instructions may require decoding into multiple uOPs. Accordingly, as a uOP transitions between the various FUs (210-280) of micro-architecture 200, the various FUs (210-280), in one embodiment, query an attached power meter 290 to determine a power consumption value consumed by the respective FUs (210-280) to process one or more uOPs during a program cycle.
  • In one embodiment, FUs (210-280) repeat querying of their respective PM 290 to receive a power consumption value. In one embodiment, the power consumption value is used to increment PCV field 216 of uOP 212. Hence, once uOP 212 is executed and reaches RT 280, PCV field 216 contains a power consumption value representing a summation of the power consumed by each FU (210-280) required to process the respective uOP during execution. Accordingly, in one embodiment, RT 280 updates an internal power history buffer PHB prior to retirement of each received, executed uOP.
  • Hence, in one embodiment, RT 280 determines a program counter value according to PC field 214 of the executed uOP 212 and updates an entry within internal PHB corresponding to the PC value of executed uOP 212. In one embodiment, internal PHB may be implemented within, for example, hardware registers of micro-architecture 200 (not shown). In one embodiment, internal PHB includes a fixed number of entries. Hence, generation of new entries by RT 280 within the internal PHB may cause flushing of least recently updated entries of the internal PHB (“PHB overflow event”). In one embodiment, the internal PHB may be implemented within one or more 128-bit registers to avoid PHB overflow events.
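  • The retirement-time bookkeeping described above can be sketched in software as follows. This is only an illustrative model, not the hardware design: the class name `PowerHistoryBuffer`, the `retire` method and the `overflow_events` counter are hypothetical names, and an ordered dictionary stands in for the fixed set of hardware registers. The sketch assumes that "least recently updated" means the entry whose PC was incremented longest ago.

```python
from collections import OrderedDict


class PowerHistoryBuffer:
    """Fixed-capacity map from program counter (PC) to accumulated power."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.entries = OrderedDict()  # least recently updated entry first
        self.overflow_events = 0      # hypothetical counter of flushes

    def retire(self, pc, pcv):
        """Add a retiring uOP's PCV field to the entry for its PC field."""
        if pc in self.entries:
            self.entries[pc] += pcv
            self.entries.move_to_end(pc)  # now the most recently updated
            return
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # flush least recently updated
            self.overflow_events += 1
        self.entries[pc] = pcv


phb = PowerHistoryBuffer(capacity=2)
phb.retire(0x400, 3)
phb.retire(0x404, 5)
phb.retire(0x400, 2)  # updates 0x400, so 0x404 becomes the oldest entry
phb.retire(0x408, 7)  # buffer is full: the entry for 0x404 is flushed
```

  • In this model a flushed entry is simply lost, which is why a larger buffer or more frequent reads by a driver would reduce the information dropped on overflow.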
  • In one embodiment, a run-time analyzer (RTA) 350 may utilize an operating system (OS) driver to periodically issue an interrupt in order to read values from the internal PHB to form PHB 380, as illustrated in FIG. 4. Representatively, PHB 380 includes PC column 382, as well as power consumption value (PCV) column 384. Hence, once each uOP decoded from a received macro instruction is executed within micro-architecture 200, PCV column 384 will represent power consumed by the respective macro instruction. In one embodiment, RTA 350 periodically issues an interrupt to read values from the internal PHB to update PHB 380 and may use the various program power consumption information to identify instruction sequences of the application program having an excess power consumption level.
  • Referring again to FIG. 3, FU 400 is illustrated, which may be used as an FU (210-280) of micro-architecture 200 of FIG. 2, in one embodiment. Representatively, in one embodiment, FU 400 includes an average power meter (APM) field (register) 410, as well as a utilization (U) field (register) 420. In one embodiment, FU 400 updates APM register 410 according to a measured power consumption value divided by a number of uOPs processed by FU 400 during the program cycle (“power consumption per cycle value”). In one embodiment, FU 400 accumulates the power consumption per cycle value to generate an average power amount consumed by FU 400 within APM register 410.
  • In a further embodiment, in addition to keeping track of average power consumed, each FU (210-280) may also track its utilization (i.e., cycles doing real work (non-idle cycles) divided by total cycles) within, for example, U register 420, as illustrated with reference to FIG. 3. In one embodiment, U register 420 is implemented using two registers. For example, a first register could be used to contain a count of the total cycles. In addition, a second register could be used to contain a count of the total non-idle cycles. In one embodiment, the first and second registers are 128-bit registers.
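  • A minimal sketch of the two-counter arrangement follows. The class and method names (`UtilizationCounters`, `tick`, `utilization`) are hypothetical, and the sketch assumes utilization is read as the fraction of cycles that were non-idle, so that a lower value indicates a less busy functional unit.

```python
class UtilizationCounters:
    """Two counters standing in for the pair of registers behind a U register."""

    def __init__(self):
        self.total_cycles = 0     # first register: every cycle observed
        self.non_idle_cycles = 0  # second register: cycles doing real work

    def tick(self, did_work):
        """Record one cycle; did_work says whether the FU processed a uOP."""
        self.total_cycles += 1
        if did_work:
            self.non_idle_cycles += 1

    def utilization(self):
        """Fraction of observed cycles that were non-idle (0.0 if none seen)."""
        if self.total_cycles == 0:
            return 0.0
        return self.non_idle_cycles / self.total_cycles


u = UtilizationCounters()
for busy in [True, False, True, True]:
    u.tick(busy)
print(u.utilization())  # three of four cycles were non-idle
```

  • Keeping the two raw counts separate, rather than storing a ratio, leaves the division to whoever samples the registers.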
  • Hence, in one embodiment, as illustrated in FIG. 4, an operating system (OS) driver, for example, as directed by RTA 350 can read the various U registers 420, as well as APM register 410 of FU 400 for identifying typical power usage for each FU 210-280. Representatively, RTA 350 maintains power consumption table (PCT) 360 according to APM register 410 and U register 420 of FU 210-280. In one embodiment PCT 360 as well as PHB 380 are implemented by using one or more registers, such as 128-bit registers, that are accessible by both RTA 350 as well as OS drivers. Representatively, RTA 350 is implemented as a software component, but may be implemented as hardware component, depending on the desired implementation.
  • In one embodiment, compiler 300 may utilize information from APM registers 410, as well as information from U registers 420, in order to identify instruction sequences that are executed by FUs having an APM value in excess of a predetermined FU power consumption level. In one embodiment, such identified “high power instruction sequences” may be replaced with alternative instruction sequences to utilize FUs having an average power consumption level, less than a predetermined FU power consumption level. In a further embodiment, when alternate instruction sequences are not available, compiler 300 delays issuing of identified high power instruction sequences and limits issuing of such instruction sequences to FUs during identified low utilization times, according to U registers 420. Procedural methods for implementing embodiments of the invention are now described.
  • Operation
  • FIG. 5 is a flowchart illustrating a method 500 for recompiling an application program to reduce power consumption levels of one or more identified instruction sequences having an excess power consumption level, in accordance with one embodiment, as described with reference to FIGS. 1-4. At process block 502, power consumption levels are computed for the instructions of an application program. At process block 560, instruction sequences of the application program are identified having an excess power consumption level. As described herein, instructions having an excess power consumption level may include instructions that belong to critical power path instruction sequences, such as instruction sequences that are part of a critical path that exhibits an excess power consumption level.
  • In one embodiment, identified instruction sequences having an excess power consumption level may belong to high power level instruction sequences. As described herein, high power level instruction sequences include instructions executed by functional units (FUs) of a micro-architecture having a power consumption level in excess of the predetermined FU power consumption level. Once instruction sequences are identified, at process block 580, the application program is recompiled to reduce power consumption levels of one or more of the identified instruction sequences. In an alternate embodiment, the application program is recompiled to reduce overheating of the various functional units of a processor micro-architecture.
  • FIG. 6 is a flowchart illustrating a method 510 for computing power consumption levels of instructions of process block 502 of FIG. 5, in accordance with one embodiment. At process block 512, micro-operations (uOPs) decoded from instructions of the application program are executed within one or more micro-architecture functional units (FUs). At process block 514, each FU updates a power consumption field of each uOP processed by the respective FU with a power consumption level of the respective FU required to process the respective uOP.
  • Hence, as illustrated with reference to FIGS. 2 and 3, the various FUs (210-280) update, for example, PCV field 216 of uOP 212. Once updated, at process block 540, a power history buffer (PHB) entry is updated according to the value of each executed micro-operation's power consumption field. Hence, in one embodiment, as illustrated with reference to FIG. 2, prior to retirement by RT 280, a value of PCV field 216 of each uOP 212 is updated within an internal PHB of RT 280. In one embodiment, a run-time analyzer (RTA) 350 updates PHB 380 according to the internal PHB of RT 280, as illustrated with reference to FIG. 4.
  • FIG. 7 is a flowchart illustrating a method 520 for updating the power consumption field of process block 514 of FIG. 6, in accordance with one embodiment. At process block 522, a power consumption level of an FU is determined for the program cycle. In one embodiment, determination of the power consumption level of an FU is performed by querying a power consumption meter (PM) coupled to the FU, for example, as illustrated with reference to FIG. 2. Once queried, PM 290 returns a power consumption level measured by the PM 290 during the program cycle. At process block 524, one or more uOPs processed by the FU during the program cycle are identified.
  • Hence, as illustrated with reference to FIG. 2, each FU (210-280) keeps track of the various uOPs processed during a program cycle. At process block 526, a value of a power consumption field of the one or more identified uOPs is incremented by the determined power consumption level. Hence, at process block 538, process blocks 522-526 are repeated for each program cycle. As described herein, micro-architecture 200 of FIG. 2 represents a processor pipeline, which performs parallel execution of various uOPs. Hence, in one embodiment, methods 510 and 520 are performed in parallel by each FU (210-280) of micro-architecture 200.
  • FIG. 8 is a flowchart illustrating a method 528 for incrementing the PCV field of process block 526 of FIG. 7. At process block 530, the determined power consumption level is divided by a count of identified uOPs processed during the program cycle to form a power consumption per cycle value. Once formed, at process block 532, the PCV field of the identified uOPs is incremented by the power consumption per cycle value. At process block 534, an average power consumption meter (APM) of the FU is incremented by the power consumption per cycle value. At process block 536, a utilization register (meter) of the FU is incremented for the program cycle.
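  • The per-cycle attribution of process blocks 530-536 can be modeled as below. This is a software sketch under stated assumptions, not the hardware mechanism: `MicroOp`, `FunctionalUnit` and `run_cycle` are hypothetical names, the measured power is passed in as a plain number rather than read from a power meter, and idle cycles are assumed to charge nothing and to leave the utilization count unchanged.

```python
from dataclasses import dataclass


@dataclass
class MicroOp:
    pc: int           # PC field: program counter of the parent instruction
    pcv: float = 0.0  # PCV field: power accumulated so far


class FunctionalUnit:
    def __init__(self, name):
        self.name = name
        self.apm = 0.0        # running sum of per-cycle shares (APM register)
        self.busy_cycles = 0  # stand-in for the utilization register

    def run_cycle(self, measured_power, uops):
        """Charge one cycle's measured power to the uOPs processed in it."""
        if not uops:
            return  # idle cycle: nothing to attribute (assumption)
        share = measured_power / len(uops)  # block 530
        for uop in uops:
            uop.pcv += share                # block 532
        self.apm += share                   # block 534
        self.busy_cycles += 1               # block 536


a = MicroOp(pc=0x10)
b = MicroOp(pc=0x14)
decoder = FunctionalUnit("ID")
alu = FunctionalUnit("ALU")
decoder.run_cycle(4.0, [a, b])  # both uOPs share the decoder's 4.0 units
alu.run_cycle(3.0, [a])         # only uOP a passes through the ALU
```

  • After these two cycles, uOP `a` carries the sum of its shares from both units, which is the quantity the retirement unit would later add to the power history buffer entry for its PC.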
  • In one embodiment, each APM 410 contains a total power consumed divided by a total number of uOPs processed for the respective FU (e.g. 210-280). In an alternate embodiment, a first 128-bit integer register, containing a sum of total power consumed for the respective FU, is incremented by the amount of power each new uOP consumes when processed by the respective FU (210-280). In addition, a second 128-bit integer register, containing a sum of the total number of uOPs processed by the respective FU, is incremented for each new uOP processed by the respective FU. Representatively, RTA 350 or an OS driver samples the first and second 128-bit registers and divides the sampled values off-line in order to store an APM value for the respective FU within PCT 360 (see FIG. 4).
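  • The alternate embodiment with two integer registers might be modeled as follows. The names `ApmRegisters`, `record`, `sample` and `average_power` are hypothetical, power is assumed to be expressed in integer units, and the masking merely imitates how a 128-bit register would wrap.

```python
MASK_128 = (1 << 128) - 1  # values wrap as they would in a 128-bit register


class ApmRegisters:
    def __init__(self):
        self.total_power = 0  # first register: sum of power consumed
        self.total_uops = 0   # second register: count of uOPs processed

    def record(self, power_units):
        """Called once per uOP processed; power_units is an integer amount."""
        self.total_power = (self.total_power + power_units) & MASK_128
        self.total_uops = (self.total_uops + 1) & MASK_128

    def sample(self):
        """Return a snapshot of both registers, as a driver read might."""
        return self.total_power, self.total_uops


def average_power(sample):
    """Off-line division of a sampled (power, count) pair."""
    power, count = sample
    return power / count if count else 0.0


regs = ApmRegisters()
for p in (3, 5, 4):
    regs.record(p)
apm_value = average_power(regs.sample())  # 12 units over 3 uOPs
```

  • Deferring the division to the sampler keeps the per-uOP work in the functional unit down to two additions.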
  • FIG. 9 is a flowchart illustrating a method 542 for updating the internal PHB of process block 540 of FIG. 6, in accordance with one embodiment. At process block 544, an executed uOP is received. Once received, at process block 546, a program counter value associated with the executed uOP is identified according to a PC field of the executed uOP. At process block 548, a power consumption level associated with the executed uOP is identified according to the PCV field of the executed uOP.
  • At process block 550, an entry within the internal PHB corresponding to the PC field value of the executed uOP is incremented by a value of the PCV field of the executed uOP. At process block 552, process blocks 544-550 are repeated for each received executed uOP. In one embodiment, method 542 describes operations performed by RT 280, for example, as illustrated with reference to FIG. 4, for updating an internal PHB which is read by RTA 350 to form PHB 380.
  • FIG. 10 is a flowchart illustrating a method 562 for identifying instruction sequences having an excess power consumption level of process block 560 of FIG. 5, in accordance with one embodiment. At process block 564, the internal PHB is periodically queried to identify power consumption levels of the instructions of the application program. In one embodiment, CPU 110 may issue an interrupt at predetermined cycles or occurrences of an internal PHB overflow event to enable a driver to record contents of the internal PHB, which may optionally be cleared. In one embodiment, APM values and U values from APM register 410 and U register 420 are read by RTA 350 (FIG. 4). As described herein, querying of the internal PHB is performed since the internal PHB includes a limited number of entries when implemented using registers.
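  • One way to picture the periodic query is as a merge of each internal PHB snapshot into the larger software-side table. The function name `drain` and the dictionaries used here are hypothetical, and the sketch assumes the internal buffer is cleared after every read so that no power value is counted twice.

```python
def drain(internal_phb, phb):
    """Merge an internal PHB snapshot (PC -> PCV) into the software PHB.

    The internal buffer is cleared afterwards (an assumption of this sketch),
    so each accumulated value is added to the software table exactly once.
    """
    for pc, pcv in internal_phb.items():
        phb[pc] = phb.get(pc, 0) + pcv
    internal_phb.clear()
    return phb


phb_380 = {}
internal = {0x400: 5, 0x404: 2}
drain(internal, phb_380)                 # first interrupt
internal.update({0x400: 1, 0x408: 9})    # more uOPs retire
drain(internal, phb_380)                 # second interrupt
```

  • If the internal buffer were left uncleared instead, the merge would need to overwrite rather than add.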
  • At process block 566, instructions having a power consumption level exceeding a predetermined power consumption level are detected. At process block 568, critical path power instructions are identified from the detected instructions, as instructions that fall within a frequently-executed instruction path having a high power consumption level. In one embodiment, application program critical paths may be identified using conventional techniques. Once identified, the critical paths may be analyzed to determine a power consumption level consumed by the critical paths. For critical paths having an excess power consumption level, such critical paths are identified as critical power path instruction sequences. At process block 570, high power level instruction sequences are identified from the detected instructions as instruction sequences executed by FUs having an average power consumption level greater than a predetermined FU power consumption level.
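  • The three identification steps can be sketched as simple filters over the sampled data. All function names, thresholds and sample values below are hypothetical; in particular, the sketch takes the critical paths as given input, since the text leaves their discovery to conventional techniques.

```python
def excess_power_pcs(phb, threshold):
    """PCs whose accumulated power exceeds the threshold (block 566)."""
    return {pc for pc, pcv in phb.items() if pcv > threshold}


def critical_power_paths(paths, phb, path_threshold):
    """Critical paths whose summed power exceeds path_threshold (block 568).

    `paths` maps a path name to the list of PCs on that path.
    """
    result = {}
    for name, pcs in paths.items():
        total = sum(phb.get(pc, 0) for pc in pcs)
        if total > path_threshold:
            result[name] = total
    return result


def high_power_fus(pct, fu_threshold):
    """FUs whose average power exceeds fu_threshold (used by block 570)."""
    return sorted(fu for fu, apm in pct.items() if apm > fu_threshold)


phb = {0x400: 12, 0x404: 3, 0x408: 9, 0x40C: 1}
paths = {"loop": [0x400, 0x404, 0x408], "exit": [0x40C]}
pct = {"ALU1": 2.5, "ALU2": 1.0, "FP": 4.0}

hot_pcs = excess_power_pcs(phb, 8)
hot_paths = critical_power_paths(paths, phb, 20)
hot_fus = high_power_fus(pct, 2.0)
```

  • Instruction sequences that execute on any unit in `hot_fus` would then be candidates for the high power level category.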
  • FIG. 11 is a flowchart illustrating a method 582 for recompiling the application program to reduce power consumed by one or more identified instruction sequences having an excess power consumption level of process block 580 of FIG. 5. At process block 584, identified critical power path instruction sequences are replaced with alternate instruction sequences to reduce power consumption levels by using the alternate instruction sequences. At process block 586, high power level instruction sequences are redistributed to utilize FUs with a lower average power consumption level.
  • At process block 588, a run-time optimizer may identify an idle time of one or more FUs used to execute identified high power level instruction sequences. In one embodiment, this may be performed by accessing, for example, a utilization register of the respective FU. As such, at process block 590, when high power level instruction sequences cannot be redistributed to FUs having a lower average power consumption level, in one embodiment, the compiler may issue high power instruction sequences to one or more FUs when utilization of the FUs is low to prevent FU overheating.
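  • The choices made in process blocks 584-590 can be summarized as a small decision function. This is an illustrative sketch only: `plan_sequence`, its parameters and the sample data are hypothetical, and for brevity it applies the replace, redistribute and defer options to any identified sequence in that order, whereas the method above applies replacement to critical power path sequences and redistribution to high power level sequences.

```python
def plan_sequence(seq, alternates, fu_apm, fu_util, fu_limit, util_limit):
    """Pick an action for one identified instruction sequence.

    seq:        dict with 'name', 'fu' (FU currently used) and
                'candidates' (other FUs able to run the sequence)
    alternates: name -> lower-power replacement sequence, if one exists
    fu_apm:     FU -> average power; fu_util: FU -> utilization in [0, 1]
    """
    if seq["name"] in alternates:                        # block 584
        return ("replace", alternates[seq["name"]])
    cooler = [fu for fu in seq["candidates"]
              if fu_apm[fu] < fu_limit and fu_apm[fu] < fu_apm[seq["fu"]]]
    if cooler:                                           # block 586
        return ("redistribute", min(cooler, key=lambda fu: fu_apm[fu]))
    if fu_util[seq["fu"]] < util_limit:                  # blocks 588-590
        return ("issue_now", seq["fu"])
    return ("defer", seq["fu"])


fu_apm = {"ALU1": 3.0, "ALU2": 1.5, "FP": 4.0}
fu_util = {"ALU1": 0.9, "ALU2": 0.4, "FP": 0.2}
alternates = {"mul_by_8": "shift_left_3"}


def plan(seq):
    return plan_sequence(seq, alternates, fu_apm, fu_util,
                         fu_limit=2.0, util_limit=0.5)


r1 = plan({"name": "mul_by_8", "fu": "ALU1", "candidates": ["ALU2"]})
r2 = plan({"name": "add_chain", "fu": "ALU1", "candidates": ["ALU2"]})
r3 = plan({"name": "fdiv", "fu": "FP", "candidates": []})
r4 = plan({"name": "cmp_chain", "fu": "ALU1", "candidates": []})
```

  • A sequence that returns `"defer"` would be held back until a later sample shows the unit's utilization has dropped below the limit.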
  • Accordingly, as described herein, for instructions, which require additional time to complete execution, or for instructions that are on a critical path, such instructions will generally exhibit high power consumption levels, as determined from PHB 380 (FIG. 4). Hence, in one embodiment, techniques may be used to correlate the frequency of execution to adjust power information in post-processing, for example, by RTA 350, as required for applications. Accordingly, as described herein, compiler 300 may use the two tuple field (PC, PCV) sampled over time to identify application programs that have an excess power consumption level.
  • In one embodiment, an instruction is identified as a high power level instruction when uOPs decoded from the instruction are executed by FUs that exhibit an above-average power consumption level. In addition, identified instruction sequences, which exhibit an excess power consumption level, may include instructions which are decoded into a plurality of uOPs for execution and/or instruction sequences falling into critical paths of the application program. Accordingly, in one embodiment, compiler 300 may replace such instruction sequences with alternative instruction sequences, which consume less power at the expense of slightly decreased performance, while meeting an overall performance goal.
  • In an alternative embodiment, compiler 300 may sample APM registers by using an OS driver to identify FUs that consume less power. Hence, during recompiling of the application program, compiler 300 may distribute identified instruction sequences having an excess power consumption level to minimize total power consumption of the application program. In a further embodiment, utilization levels of FUs of the micro-architecture are used to issue instructions during identified idle periods of the various FUs. Hence, in one embodiment, the compiler may utilize a dynamic approach to prevent FUs from overheating by issuing identified high-power instruction sequences when identified utilization of an FU is low.
  • Accordingly, in one embodiment, a PHB is used by a compiler to identify program portions that consume an inordinate amount of power by querying APM registers, as well as U registers, to assist the compiler in implementing different optimization strategies with a different mix of functional units. Although embodiments described herein are directed to a micro-architecture of a processor, the embodiments described herein may be applied to other units, such as storage, computer graphics devices and I/O, such as peripheral interconnect devices. Hence, in one embodiment, power meters may be attached and sampled to similar functional units of attached program components. Hence, the OS can sample power consumed in external units and schedule tasks accordingly to prevent program system components from overheating.
  • FIG. 12 is a block diagram illustrating various representations or formats for simulation, emulation and fabrication of a design using the disclosed techniques. Data representing a design may represent the design in a number of manners. First, as is useful in simulations, the hardware may be represented using a hardware description language, or another functional description language, which essentially provides a computerized model of how the designed hardware is expected to perform. The hardware model 610 may be stored in a storage medium 600, such as a computer memory, so that the model may be simulated using simulation software 620 that applies a particular test suite 630 to the hardware model to determine if it indeed functions as intended. In some embodiments, the simulation software is not recorded, captured or contained in the medium.
  • In any representation of the design, the data may be stored in any form of a machine readable medium. An optical or electrical wave 660 modulated or otherwise generated to transport such information, a memory 650 or a magnetic or optical storage 640, such as a disk, may be the machine readable medium. Any of these media may carry the design information. The term “carry” (e.g., a machine readable medium carrying information) thus covers information stored on a storage device or information encoded or modulated into or onto a carrier wave. The set of bits describing the design or a particular part of the design is (when embodied in a machine readable medium, such as a carrier or storage medium) an article that may be sold in and of itself, or used by others for further design or fabrication.
  • It is to be understood that even though numerous characteristics and advantages of various embodiments are set forth in the foregoing description, together with details of the structure and function of various embodiments of the invention, this disclosure is illustrative only. In some cases, certain subassemblies are only described in detail with one such embodiment. Nevertheless, it is recognized and intended that such subassemblies may be used in other embodiments of the invention. Changes may be made in detail, especially in matters of structure and management of parts, within the principles of the embodiments of the present invention to the full extent indicated by the broad general meaning of the terms in which the appended claims are expressed.
  • Having disclosed exemplary embodiments and the best mode, modifications and variations may be made to the disclosed embodiments while remaining within the scope of the embodiments of the invention as defined by the following claims.
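The compiler-side selection described in the preceding paragraphs (flagging instruction sequences with excess power consumption, then steering them toward functional units that consume less power or are lightly used) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the disclosure: the `FunctionalUnit` record, the `find_excess_power_pcs` and `pick_functional_unit` helpers, the threshold values, and the sample readings standing in for PHB entries, APM registers and U registers.

```python
from dataclasses import dataclass


@dataclass
class FunctionalUnit:
    name: str
    avg_power: float     # hypothetical APM register reading
    utilization: float   # hypothetical U register reading, 0.0 to 1.0


def find_excess_power_pcs(phb, threshold):
    """Return program counters whose accumulated power exceeds the threshold."""
    return sorted(pc for pc, power in phb.items() if power > threshold)


def pick_functional_unit(units, max_utilization=0.5):
    """Prefer lightly used units, then take the lowest average power among them."""
    idle = [u for u in units if u.utilization <= max_utilization]
    candidates = idle or units  # fall back to all units if none is lightly used
    return min(candidates, key=lambda u: u.avg_power)


# Illustrative sample data only; units of power are arbitrary.
phb = {0x400: 12.0, 0x404: 95.5, 0x408: 7.25, 0x40C: 140.0}
units = [
    FunctionalUnit("alu0", avg_power=3.0, utilization=0.9),
    FunctionalUnit("alu1", avg_power=4.5, utilization=0.2),
    FunctionalUnit("fpu0", avg_power=8.0, utilization=0.1),
]

hot_pcs = find_excess_power_pcs(phb, threshold=90.0)
target = pick_functional_unit(units)
```

In this sketch the busy unit `alu0` is skipped even though it has the lowest average power, which reflects one possible way of combining the two criteria mentioned above; a real compiler could weigh them differently.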

Claims (30)

1. A method comprising:
computing power consumption levels of instructions of an application program;
identifying instruction sequences of the application program having an excess power consumption level; and
recompiling the application program to reduce power consumption levels of one or more of the identified instruction sequences.
2. The method of claim 1, wherein computing power comprises:
executing micro-operations decoded from the instructions of the application program within one or more micro-architecture functional units;
updating, by each functional unit, a power consumption field of each micro-operation processed by the respective functional unit with a power consumption level of the respective functional unit required to process the respective micro-operation during micro-operation execution; and
prior to retiring each executed micro-operation, updating a power history buffer entry according to a value of each executed micro-operation's power consumption field.
3. The method of claim 2, wherein updating a power consumption field comprises:
(i) determining a power consumption level of a functional unit for a program cycle;
(ii) identifying one or more micro-operations processed during the program cycle;
(iii) incrementing a value of a power consumption field of the one or more identified micro-operations by the determined power consumption level; and
(iv) repeating (i)-(iii) for each program cycle by each of one or more functional units.
4. The method of claim 3, wherein determining the power consumption level comprises:
querying a power consumption meter coupled to the functional unit; and
receiving a power consumption level measured by the power consumption meter during the program cycle.
5. The method of claim 3, wherein prior to incrementing, the method comprises:
dividing the determined power consumption level by a count of identified micro-operations during the program cycle to form a power consumption per cycle value;
incrementing the power consumption field of the identified micro-operations by the power consumption per cycle value;
updating an average power consumption meter of the functional unit according to the power consumption per cycle value; and
updating a utilization meter of the functional unit for the program cycle.
6. The method of claim 3, wherein updating the power history buffer comprises:
(i) receiving an executed micro-operation;
(ii) identifying a program counter value associated with the executed micro-operation according to a program counter field of the executed micro-operation;
(iii) identifying a power consumption level associated with the executed micro-operation according to a power consumption field of the executed micro-operation;
(iv) incrementing an entry within the power history buffer corresponding to the identified program counter of the executed micro-operation by a value of the identified power consumption level of the executed micro-operation; and
(v) repeating (i)-(iv) for each received, executed micro-operation.
7. The method of claim 1, wherein identifying further comprises:
periodically querying a power history buffer to identify power consumed levels of the instructions of the application program;
detecting instructions having a power consumption level exceeding a predetermined power consumption level;
identifying, from the detected instructions, critical power path instruction sequences as instructions falling within a frequently executed instruction path having a high power consumption level; and
identifying, from the detected instructions, high power level instruction sequences as instruction sequences executed by functional units having an average power consumption level greater than a predetermined functional unit power consumption level.
8. The method of claim 1, wherein recompiling comprises:
replacing identified critical power path instruction sequences with alternate instruction sequences to reduce program power consumption levels by using the alternate instruction sequences.
9. The method of claim 1, wherein recompiling further comprises:
redistributing high power level instruction sequences to utilize functional units with a lower, average power consumption level.
10. The method of claim 1, further comprising:
sampling an idle time of one or more functional units used to execute identified high power level instruction sequences; and
issuing the high power instruction sequences to one or more functional units when utilization of the functional units is low to prohibit functional unit overheating.
11. An article of manufacture including a machine readable medium having stored thereon instructions which may be used to program a system to perform a method, comprising:
computing power consumption levels of instructions of an application program;
identifying instruction sequences of the application program having an excess power consumption level; and
recompiling the application program to reduce power consumption levels of one or more of the identified instruction sequences.
12. The article of manufacture of claim 11, wherein computing power comprises:
executing micro-operations decoded from the instructions of the application program within one or more micro-architecture functional units;
updating, by each functional unit, a power consumption field of each micro-operation processed by the respective functional unit with a power consumption level of the respective functional unit required to process the respective micro-operation during micro-operation execution; and
prior to retiring each executed micro-operation, updating a power history buffer entry according to a value of each executed micro-operation's power consumption field.
13. The article of manufacture of claim 12, wherein updating a power consumption field comprises:
(i) determining a power consumption level of a functional unit for a program cycle;
(ii) identifying one or more micro-operations processed during the program cycle;
(iii) incrementing a value of a power consumption field of the one or more identified micro-operations by the determined power consumption level; and
(iv) repeating (i)-(iii) for each program cycle by each of one or more functional units.
14. The article of manufacture of claim 13, wherein determining the power consumption level comprises:
querying a power consumption meter coupled to the functional unit; and
receiving a power consumption level measured by the power consumption meter during the program cycle.
15. The article of manufacture of claim 13, wherein prior to incrementing, the method comprises:
dividing the determined power consumption level by a count of identified micro-operations during the program cycle to form a power consumption per cycle value;
incrementing the power consumption field of the identified micro-operations by the power consumption per cycle value;
updating an average power consumption meter of the functional unit according to the power consumption per cycle value; and
updating a utilization meter of the functional unit for the program cycle.
16. The article of manufacture of claim 13, wherein updating the power history buffer comprises:
(i) receiving an executed micro-operation;
(ii) identifying a program counter value associated with the executed micro-operation according to a program counter field of the executed micro-operation;
(iii) identifying a power consumption level associated with the executed micro-operation according to a power consumption field of the executed micro-operation;
(iv) incrementing an entry within the power history buffer corresponding to the identified program counter of the executed micro-operation by a value of the identified power consumption level of the executed micro-operation; and
(v) repeating (i)-(iv) for each received, executed micro-operation.
17. The article of manufacture of claim 11, wherein identifying further comprises:
periodically querying a power history buffer to identify power consumed levels of the instructions of the application program;
detecting instructions having a power consumption level exceeding a predetermined power consumption level;
identifying, from the detected instructions, critical power path instruction sequences as instructions falling within a frequently executed instruction path having a high power consumption level; and
identifying, from the detected instructions, high power level instruction sequences as instruction sequences executed by functional units having an average power consumption level greater than a predetermined functional unit power consumption level.
18. The article of manufacture of claim 11, wherein recompiling comprises:
replacing identified critical power path instruction sequences with alternate instruction sequences to achieve reduced instruction power consumption levels by using the alternate instruction sequences.
19. The article of manufacture of claim 11, wherein recompiling further comprises:
redistributing high power level instruction sequences to utilize functional units with a lower, average power consumption level.
20. The article of manufacture of claim 11, further comprising:
sampling an idle time of one or more functional units used to execute identified high power level instruction sequences; and
issuing the high power instruction sequences to one or more functional units when utilization of the functional units is low to prohibit functional unit overheating.
21. An apparatus comprising:
at least one functional unit to execute micro-operations decoded from instructions of an application program, the functional unit to compute power consumption levels of instructions of an application program; and
a memory coupled to the functional unit, the memory including a compiler to recompile the application program to reduce power consumption levels of an instruction sequence identified as having an excess power consumption level.
22. The apparatus of claim 21, wherein the at least one functional unit further comprises:
a retirement unit to update a power history buffer entry according to a value of a power consumption field of each executed micro-operation.
23. The apparatus of claim 21, wherein the at least one functional unit is to increment a power consumption field of each micro-operation by a power consumption level required to process the respective micro-operation.
24. The apparatus of claim 21, further comprising:
a power meter coupled to the at least one functional unit, the power meter to measure power consumed by the at least one functional unit during a program cycle.
25. The apparatus of claim 21, wherein the at least one functional unit further comprises:
an average power consumption meter, the at least one functional unit to update the average power consumption meter according to a power consumption per cycle value.
26. A system comprising:
a self-contained power source;
a processor coupled to the power source, the processor comprising:
at least one functional unit to execute micro-operations decoded from instructions of an application program, the functional unit to compute power consumption levels of instructions of an application program; and
a memory coupled to the processor, the memory including a compiler to recompile the application program to reduce power consumption levels of an instruction sequence identified as having an excess power consumption level.
27. The system of claim 26, wherein the at least one functional unit further comprises:
a retirement unit to update a power history buffer entry according to a value of a power consumption field of each executed micro-operation.
28. The system of claim 26, wherein the at least one functional unit is to increment a power consumption field of each micro-operation by a power consumption level required to process the respective micro-operation.
29. The system of claim 26, further comprising:
a memory controller coupled between the processor and the memory; and
an input/output controller coupled to the memory controller.
30. The system of claim 26, further comprising:
logic built on a same die as the at least one functional unit and coupled to the at least one functional unit to measure power consumed by the at least one functional unit during a program cycle; and
an integrated circuit package containing the die in which the logic, the at least one functional unit and the processor are built.
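
The per-cycle accounting recited in claims 3, 5 and 6 can be pictured with a short software model. This is a sketch under stated assumptions rather than the claimed hardware: the `MicroOp` record, the `charge_cycle` and `retire` helpers, and the numeric power values are hypothetical, and the average power and utilization meters of claim 5 are left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MicroOp:
    pc: int              # program counter of the parent instruction
    power: float = 0.0   # power consumption field carried by the micro-operation


def charge_cycle(fu_power, uops_in_flight):
    """Split one cycle's measured functional-unit power evenly across its uOPs."""
    if not uops_in_flight:
        return 0.0
    share = fu_power / len(uops_in_flight)
    for uop in uops_in_flight:
        uop.power += share
    return share


def retire(phb, uop):
    """Add the uOP's accumulated power to the history entry for its PC."""
    phb[uop.pc] += uop.power


# Power history buffer modeled as a mapping from program counter to power.
phb = defaultdict(float)
a, b = MicroOp(pc=0x400), MicroOp(pc=0x404)

charge_cycle(6.0, [a, b])   # cycle 1: both uOPs share 6.0 units
charge_cycle(2.0, [b])      # cycle 2: only b is still executing

retire(phb, a)
retire(phb, b)
```

After both micro-operations retire, the entry for each program counter holds the sum of the per-cycle shares that its micro-operation collected while executing.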
US10/741,002 2003-12-19 2003-12-19 Method for computing power consumption levels of instruction and recompiling the program to reduce the excess power consumption Active 2025-03-29 US7287173B2 (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
US10/741,002 US7287173B2 (en) 2003-12-19 2003-12-19 Method for computing power consumption levels of instruction and recompiling the program to reduce the excess power consumption
CN201010571004.7A CN102063323B (en) 2003-12-19 2004-12-01 Apparatus and method for the power-performance monitor of low-power program adjustment
PCT/US2004/040136 WO2005066774A1 (en) 2003-12-19 2004-12-01 An apparatus and method for power performance monitors for low-power program tuning
CN2004800361038A CN1890636B (en) 2003-12-19 2004-12-01 Apparatus and method for power performance monitors for low-power program tuning
CN201611199215.6A CN106598691B (en) 2003-12-19 2004-12-01 Apparatus and method for power performance monitor for low power program adjustment
DE112004002506T DE112004002506B4 (en) 2003-12-19 2004-12-01 Apparatus and method for energy performance monitors for program adaptation to low power consumption
TW093137665A TWI301573B (en) 2003-12-19 2004-12-06 An apparatus and method for power performance monitors for low-power program tuning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/741,002 US7287173B2 (en) 2003-12-19 2003-12-19 Method for computing power consumption levels of instruction and recompiling the program to reduce the excess power consumption

Publications (2)

Publication Number Publication Date
US20050138450A1 (en) 2005-06-23
US7287173B2 US7287173B2 (en) 2007-10-23

Family

ID=34678022

Family Applications (1)

Application Number Title Priority Date Filing Date
US10/741,002 Active 2025-03-29 US7287173B2 (en) 2003-12-19 2003-12-19 Method for computing power consumption levels of instruction and recompiling the program to reduce the excess power consumption

Country Status (5)

Country Link
US (1) US7287173B2 (en)
CN (3) CN106598691B (en)
DE (1) DE112004002506B4 (en)
TW (1) TWI301573B (en)
WO (1) WO2005066774A1 (en)

Cited By (52)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080046546A1 (en) * 2006-08-18 2008-02-21 Parmar Pankaj N EFI based mechanism to export platform management capabilities to the OS
US20080129341A1 (en) * 2006-12-01 2008-06-05 Matsushita Electric Industrial Co., Ltd. Semiconductor apparatus
US20090019264A1 (en) * 2007-07-11 2009-01-15 Correale Jr Anthony Adaptive execution cycle control method for enhanced instruction throughput
US20090019265A1 (en) * 2007-07-11 2009-01-15 Correale Jr Anthony Adaptive execution frequency control method for enhanced instruction throughput
US20090044032A1 (en) * 2007-08-09 2009-02-12 Timothy Chainer Method, Apparatus and Computer Program Product Providing Instruction Monitoring for Reduction of Energy Usage
JP2009037608A (en) * 2007-07-11 2009-02-19 Internatl Business Mach Corp <Ibm> Method, system and processor for controlling adaptive performance cycle for enhanced instruction throughput
US20090049277A1 (en) * 2007-08-14 2009-02-19 Oki Electric Industry Co., Ltd. Semiconductor integrated circuit device
US20090070607A1 (en) * 2007-09-11 2009-03-12 Kevin Safford Methods and apparatuses for reducing step loads of processors
US20090313615A1 (en) * 2008-06-16 2009-12-17 International Business Machines Corporation Policy-based program optimization to minimize environmental impact of software execution
US20100210240A1 (en) * 2009-02-17 2010-08-19 Flexilis, Inc. System and method for remotely securing or recovering a mobile device
US7853812B2 (en) 2007-02-07 2010-12-14 International Business Machines Corporation Reducing power usage in a software application
US20110022870A1 (en) * 2009-07-21 2011-01-27 Microsoft Corporation Component power monitoring and workload optimization
US20110047033A1 (en) * 2009-02-17 2011-02-24 Lookout, Inc. System and method for mobile device replacement
US20110145920A1 (en) * 2008-10-21 2011-06-16 Lookout, Inc System and method for adverse mobile application identification
US20110213995A1 (en) * 2007-08-09 2011-09-01 International Business Machines Corporation Method, Apparatus And Computer Program Product Providing Instruction Monitoring For Reduction Of Energy Usage
US20130042122A1 (en) * 2009-08-14 2013-02-14 Google Inc. Providing a user with feedback regarding power consumption in battery-operated electronic devices
US20130173935A1 (en) * 2012-01-04 2013-07-04 Ho Yang Power control method and apparatus for array processor
US8505095B2 (en) 2008-10-21 2013-08-06 Lookout, Inc. System and method for monitoring and analyzing multiple interfaces and multiple protocols
US8510843B2 (en) 2008-10-21 2013-08-13 Lookout, Inc. Security status and information display system
US8533844B2 (en) 2008-10-21 2013-09-10 Lookout, Inc. System and method for security data collection and analysis
US8544095B2 (en) 2008-10-21 2013-09-24 Lookout, Inc. System and method for server-coupled application re-analysis
US8561144B2 (en) 2008-10-21 2013-10-15 Lookout, Inc. Enforcing security based on a security state assessment of a mobile device
US8655307B1 (en) 2012-10-26 2014-02-18 Lookout, Inc. System and method for developing, updating, and using user device behavioral context models to modify user, device, and application state, settings and behavior for enhanced user security
US8683593B2 (en) 2008-10-21 2014-03-25 Lookout, Inc. Server-assisted analysis of data for a mobile device
US8738765B2 (en) 2011-06-14 2014-05-27 Lookout, Inc. Mobile device DNS optimization
US8788881B2 (en) 2011-08-17 2014-07-22 Lookout, Inc. System and method for mobile device push communications
US8855599B2 (en) 2012-12-31 2014-10-07 Lookout, Inc. Method and apparatus for auxiliary communications with mobile communications device
US8855601B2 (en) 2009-02-17 2014-10-07 Lookout, Inc. System and method for remotely-initiated audio communication
US9042876B2 (en) 2009-02-17 2015-05-26 Lookout, Inc. System and method for uploading location information based on device movement
US9043919B2 (en) 2008-10-21 2015-05-26 Lookout, Inc. Crawling multiple markets and correlating
CN104679657A (en) * 2015-03-16 2015-06-03 广州市久邦数码科技有限公司 Testing method for dynamically adjusting application program functions
EP2490103A3 (en) * 2006-08-31 2015-06-24 ATI Technologies ULC Video decoder and/or battery-powered device with reduced power consumption and methods thereof
US9208215B2 (en) 2012-12-27 2015-12-08 Lookout, Inc. User classification based on data gathered from a computing device
US9215074B2 (en) 2012-06-05 2015-12-15 Lookout, Inc. Expressing intent to control behavior of application components
US9235704B2 (en) 2008-10-21 2016-01-12 Lookout, Inc. System and method for a scanning API
US9307412B2 (en) 2013-04-24 2016-04-05 Lookout, Inc. Method and system for evaluating security for an interactive service operation by a mobile device
US9367680B2 (en) 2008-10-21 2016-06-14 Lookout, Inc. System and method for mobile communication device application advisement
US9374369B2 (en) 2012-12-28 2016-06-21 Lookout, Inc. Multi-factor authentication and comprehensive login system for client-server networks
US9424409B2 (en) 2013-01-10 2016-08-23 Lookout, Inc. Method and system for protecting privacy and enhancing security on an electronic device
US20160306414A1 (en) * 2011-06-30 2016-10-20 International Business Machines Corporation Software-centric power management
US9589129B2 (en) 2012-06-05 2017-03-07 Lookout, Inc. Determining source of side-loaded software
US9642008B2 (en) 2013-10-25 2017-05-02 Lookout, Inc. System and method for creating and assigning a policy for a mobile communications device based on personal data
US9753796B2 (en) 2013-12-06 2017-09-05 Lookout, Inc. Distributed monitoring, evaluation, and response for multiple devices
US9779253B2 (en) 2008-10-21 2017-10-03 Lookout, Inc. Methods and systems for sharing risk responses to improve the functioning of mobile communications devices
US9852416B2 (en) 2013-03-14 2017-12-26 Lookout, Inc. System and method for authorizing a payment transaction
US9955352B2 (en) 2009-02-17 2018-04-24 Lookout, Inc. Methods and systems for addressing mobile communications devices that are lost or stolen but not yet reported as such
US10122747B2 (en) 2013-12-06 2018-11-06 Lookout, Inc. Response generation after distributed monitoring and evaluation of multiple devices
US10218697B2 (en) 2017-06-09 2019-02-26 Lookout, Inc. Use of device risk evaluation to manage access to services
US10440053B2 (en) 2016-05-31 2019-10-08 Lookout, Inc. Methods and systems for detecting and preventing network connection compromise
US10540494B2 (en) 2015-05-01 2020-01-21 Lookout, Inc. Determining source of side-loaded software using an administrator server
US20200041577A1 (en) * 2018-08-03 2020-02-06 Advanced Micro Devices, Inc. Linear, low-latency power supply monitor
US10699273B2 (en) 2013-03-14 2020-06-30 Lookout, Inc. System and method for authorizing payment transaction based on device locations

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050160474A1 (en) * 2004-01-15 2005-07-21 Fujitsu Limited Information processing device and program
US7898545B1 (en) * 2004-12-14 2011-03-01 Nvidia Corporation Apparatus, system, and method for integrated heterogeneous processors
US7466316B1 (en) 2004-12-14 2008-12-16 Nvidia Corporation Apparatus, system, and method for distributing work to integrated heterogeneous processors
JP2007109085A (en) * 2005-10-14 2007-04-26 Sony Computer Entertainment Inc Method, apparatus and system for controlling heat generation
US7376532B2 (en) * 2005-11-29 2008-05-20 International Business Machines Corporation Maximal temperature logging
US7512530B2 (en) * 2005-11-29 2009-03-31 International Business Machines Corporation Generation of software thermal profiles for applications in a simulated environment
US7386414B2 (en) * 2005-11-29 2008-06-10 International Business Machines Corporation Generation of hardware thermal profiles for a set of processors
US20070124618A1 (en) * 2005-11-29 2007-05-31 Aguilar Maximino Jr Optimizing power and performance using software and hardware thermal profiles
US7512513B2 (en) * 2005-11-29 2009-03-31 International Business Machines Corporation Thermal throttling control for testing of real-time software
US7681053B2 (en) * 2005-11-29 2010-03-16 International Business Machines Corporation Thermal throttle control with minimal impact to interrupt latency
US7721128B2 (en) * 2005-11-29 2010-05-18 International Business Machines Corporation Implementation of thermal throttling logic
US7603576B2 (en) * 2005-11-29 2009-10-13 International Business Machines Corporation Hysteresis in thermal throttling
US7698089B2 (en) * 2005-11-29 2010-04-13 International Business Machines Corporation Generation of software thermal profiles executed on a set of processors using processor activity
US7848901B2 (en) * 2005-11-29 2010-12-07 International Business Machines Corporation Tracing thermal data via performance monitoring
US7460932B2 (en) * 2005-11-29 2008-12-02 International Business Machines Corporation Support of deep power savings mode and partial good in a thermal management system
US20070260894A1 (en) * 2006-05-03 2007-11-08 Aguilar Maximino Jr Optimizing thermal performance using feed-back directed optimization
US7596430B2 (en) * 2006-05-03 2009-09-29 International Business Machines Corporation Selection of processor cores for optimal thermal performance
US8037893B2 (en) * 2006-05-03 2011-10-18 International Business Machines Corporation Optimizing thermal performance using thermal flow analysis
US7552346B2 (en) * 2006-05-03 2009-06-23 International Business Machines Corporation Dynamically adapting software for reducing a thermal state of a processor core based on its thermal index
US8027798B2 (en) * 2007-11-08 2011-09-27 International Business Machines Corporation Digital thermal sensor test implementation without using main core voltage supply
US20110078655A1 (en) * 2009-09-30 2011-03-31 International Business Machines Corporation Creating functional equivalent code segments of a computer software program with lower energy footprints
US8549330B2 (en) 2009-12-18 2013-10-01 International Business Machines Corporation Dynamic energy management
US8904208B2 (en) * 2011-11-04 2014-12-02 International Business Machines Corporation Run-time task-level dynamic energy management
JP5790431B2 (en) * 2011-11-18 2015-10-07 富士通株式会社 Design support apparatus, design support method, and design support program
US9087095B2 (en) 2012-06-21 2015-07-21 International Business Machines Corporation Processing columns in a database accelerator while preserving row-based architecture
CN104834562B (en) * 2015-04-30 2018-12-18 上海新储集成电路有限公司 A kind of operation method of isomeric data center and the data center
US10114649B2 (en) * 2015-05-26 2018-10-30 International Business Machines Corporation Thermal availability based instruction assignment for execution
US10248554B2 (en) 2016-11-14 2019-04-02 International Business Machines Corporation Embedding profile tests into profile driven feedback generated binaries
US10884485B2 (en) * 2018-12-11 2021-01-05 Groq, Inc. Power optimization in an artificial intelligence processor
CN113792352B (en) * 2021-08-18 2024-06-21 中山大学 Instruction scheduling optimization method, system, device and medium for power consumption balance

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5996083A (en) * 1995-08-11 1999-11-30 Hewlett-Packard Company Microprocessor having software controllable power consumption
US6219796B1 (en) * 1997-12-23 2001-04-17 Texas Instruments Incorporated Power reduction for processors by software control of functional units
US6256743B1 (en) * 1992-03-31 2001-07-03 Seiko Epson Corporation Selective power-down for high performance CPU/system
US6477654B1 (en) * 1999-04-06 2002-11-05 International Business Machines Corporation Managing VT for reduced power using power setting commands in the instruction stream
US6564328B1 (en) * 1999-12-23 2003-05-13 Intel Corporation Microprocessor with digital power throttle
US20030126476A1 (en) * 2002-01-02 2003-07-03 Greene Michael A. Instruction scheduling based on power estimation
US6625740B1 (en) * 2000-01-13 2003-09-23 Cirrus Logic, Inc. Dynamically activating and deactivating selected circuit blocks of a data processing integrated circuit during execution of instructions according to power code bits appended to selected instructions
US20040268159A1 (en) * 2003-06-30 2004-12-30 Microsoft Corporation Power profiling
US7155617B2 (en) * 2002-08-01 2006-12-26 Texas Instruments Incorporated Methods and systems for performing dynamic power management via frequency and voltage scaling

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6163764A (en) * 1998-10-12 2000-12-19 Intel Corporation Emulation of an instruction set on an instruction set architecture transition
US6633987B2 (en) * 2000-03-24 2003-10-14 Intel Corporation Method and apparatus to implement the ACPI(advanced configuration and power interface) C3 state in a RDRAM based system
KR100711914B1 (en) * 2001-09-15 2007-04-27 엘지전자 주식회사 An apparatus for power saving of USB hub

Cited By (114)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080046546A1 (en) * 2006-08-18 2008-02-21 Parmar Pankaj N EFI based mechanism to export platform management capabilities to the OS
EP2490103A3 (en) * 2006-08-31 2015-06-24 ATI Technologies ULC Video decoder and/or battery-powered device with reduced power consumption and methods thereof
US20080129341A1 (en) * 2006-12-01 2008-06-05 Matsushita Electric Industrial Co., Ltd. Semiconductor apparatus
US7853812B2 (en) 2007-02-07 2010-12-14 International Business Machines Corporation Reducing power usage in a software application
US20090019264A1 (en) * 2007-07-11 2009-01-15 Correale Jr Anthony Adaptive execution cycle control method for enhanced instruction throughput
US20090019265A1 (en) * 2007-07-11 2009-01-15 Correale Jr Anthony Adaptive execution frequency control method for enhanced instruction throughput
JP2009037608A (en) * 2007-07-11 2009-02-19 Internatl Business Mach Corp <Ibm> Method, system and processor for controlling adaptive performance cycle for enhanced instruction throughput
US7779237B2 (en) * 2007-07-11 2010-08-17 International Business Machines Corporation Adaptive execution frequency control method for enhanced instruction throughput
US7937568B2 (en) * 2007-07-11 2011-05-03 International Business Machines Corporation Adaptive execution cycle control method for enhanced instruction throughput
US20090044032A1 (en) * 2007-08-09 2009-02-12 Timothy Chainer Method, Apparatus and Computer Program Product Providing Instruction Monitoring for Reduction of Energy Usage
US20110213995A1 (en) * 2007-08-09 2011-09-01 International Business Machines Corporation Method, Apparatus And Computer Program Product Providing Instruction Monitoring For Reduction Of Energy Usage
US20090049277A1 (en) * 2007-08-14 2009-02-19 Oki Electric Industry Co., Ltd. Semiconductor integrated circuit device
US8886979B2 (en) 2007-09-11 2014-11-11 Intel Corporation Methods and apparatuses for reducing step loads of processors
US20090070607A1 (en) * 2007-09-11 2009-03-12 Kevin Safford Methods and apparatuses for reducing step loads of processors
US8479029B2 (en) 2007-09-11 2013-07-02 Intel Corporation Methods and apparatuses for reducing step loads of processors
US7992017B2 (en) * 2007-09-11 2011-08-02 Intel Corporation Methods and apparatuses for reducing step loads of processors
US20090313615A1 (en) * 2008-06-16 2009-12-17 International Business Machines Corporation Policy-based program optimization to minimize environmental impact of software execution
US8495605B2 (en) * 2008-06-16 2013-07-23 International Business Machines Corporation Policy-based program optimization to minimize environmental impact of software execution
US8505095B2 (en) 2008-10-21 2013-08-06 Lookout, Inc. System and method for monitoring and analyzing multiple interfaces and multiple protocols
US8745739B2 (en) 2008-10-21 2014-06-03 Lookout, Inc. System and method for server-coupled application re-analysis to obtain characterization assessment
US9860263B2 (en) 2008-10-21 2018-01-02 Lookout, Inc. System and method for assessing data objects on mobile communications devices
US20110145920A1 (en) * 2008-10-21 2011-06-16 Lookout, Inc System and method for adverse mobile application identification
US9781148B2 (en) 2008-10-21 2017-10-03 Lookout, Inc. Methods and systems for sharing risk responses between collections of mobile communications devices
US10417432B2 (en) 2008-10-21 2019-09-17 Lookout, Inc. Methods and systems for blocking potentially harmful communications to improve the functioning of an electronic device
US10509910B2 (en) 2008-10-21 2019-12-17 Lookout, Inc. Methods and systems for granting access to services based on a security state that varies with the severity of security events
US8510843B2 (en) 2008-10-21 2013-08-13 Lookout, Inc. Security status and information display system
US8533844B2 (en) 2008-10-21 2013-09-10 Lookout, Inc. System and method for security data collection and analysis
US9779253B2 (en) 2008-10-21 2017-10-03 Lookout, Inc. Methods and systems for sharing risk responses to improve the functioning of mobile communications devices
US8544095B2 (en) 2008-10-21 2013-09-24 Lookout, Inc. System and method for server-coupled application re-analysis
US8561144B2 (en) 2008-10-21 2013-10-15 Lookout, Inc. Enforcing security based on a security state assessment of a mobile device
US9740852B2 (en) 2008-10-21 2017-08-22 Lookout, Inc. System and method for assessing an application to be installed on a mobile communications device
US9563749B2 (en) 2008-10-21 2017-02-07 Lookout, Inc. Comparing applications and assessing differences
US9407640B2 (en) 2008-10-21 2016-08-02 Lookout, Inc. Assessing a security state of a mobile communications device to determine access to specific tasks
US8683593B2 (en) 2008-10-21 2014-03-25 Lookout, Inc. Server-assisted analysis of data for a mobile device
US9367680B2 (en) 2008-10-21 2016-06-14 Lookout, Inc. System and method for mobile communication device application advisement
US9996697B2 (en) 2008-10-21 2018-06-12 Lookout, Inc. Methods and systems for blocking the installation of an application to improve the functioning of a mobile communications device
US8752176B2 (en) 2008-10-21 2014-06-10 Lookout, Inc. System and method for server-coupled application re-analysis to obtain trust, distribution and ratings assessment
US9344431B2 (en) 2008-10-21 2016-05-17 Lookout, Inc. System and method for assessing an application based on data from multiple devices
US9294500B2 (en) 2008-10-21 2016-03-22 Lookout, Inc. System and method for creating and applying categorization-based policy to secure a mobile communications device from access to certain data objects
US8826441B2 (en) 2008-10-21 2014-09-02 Lookout, Inc. Event-based security state assessment and display for mobile devices
US9235704B2 (en) 2008-10-21 2016-01-12 Lookout, Inc. System and method for a scanning API
US9223973B2 (en) 2008-10-21 2015-12-29 Lookout, Inc. System and method for attack and malware prevention
US10509911B2 (en) 2008-10-21 2019-12-17 Lookout, Inc. Methods and systems for conditionally granting access to services based on the security state of the device requesting access
US8875289B2 (en) 2008-10-21 2014-10-28 Lookout, Inc. System and method for preventing malware on a mobile communication device
US8881292B2 (en) 2008-10-21 2014-11-04 Lookout, Inc. Evaluating whether data is safe or malicious
US11080407B2 (en) 2008-10-21 2021-08-03 Lookout, Inc. Methods and systems for analyzing data after initial analyses by known good and known bad security components
US9100389B2 (en) 2008-10-21 2015-08-04 Lookout, Inc. Assessing an application based on application data associated with the application
US8984628B2 (en) * 2008-10-21 2015-03-17 Lookout, Inc. System and method for adverse mobile application identification
US8997181B2 (en) 2008-10-21 2015-03-31 Lookout, Inc. Assessing the security state of a mobile communications device
US9065846B2 (en) 2008-10-21 2015-06-23 Lookout, Inc. Analyzing data gathered through different protocols
US9043919B2 (en) 2008-10-21 2015-05-26 Lookout, Inc. Crawling multiple markets and correlating
US9167550B2 (en) 2009-02-17 2015-10-20 Lookout, Inc. Systems and methods for applying a security policy to a device based on location
US20110047033A1 (en) * 2009-02-17 2011-02-24 Lookout, Inc. System and method for mobile device replacement
US9042876B2 (en) 2009-02-17 2015-05-26 Lookout, Inc. System and method for uploading location information based on device movement
US9100925B2 (en) 2009-02-17 2015-08-04 Lookout, Inc. Systems and methods for displaying location information of a device
US8929874B2 (en) 2009-02-17 2015-01-06 Lookout, Inc. Systems and methods for remotely controlling a lost mobile communications device
US8855601B2 (en) 2009-02-17 2014-10-07 Lookout, Inc. System and method for remotely-initiated audio communication
US9179434B2 (en) 2009-02-17 2015-11-03 Lookout, Inc. Systems and methods for locking and disabling a device in response to a request
US9955352B2 (en) 2009-02-17 2018-04-24 Lookout, Inc. Methods and systems for addressing mobile communications devices that are lost or stolen but not yet reported as such
US8467768B2 (en) 2009-02-17 2013-06-18 Lookout, Inc. System and method for remotely securing or recovering a mobile device
US20100210240A1 (en) * 2009-02-17 2010-08-19 Flexilis, Inc. System and method for remotely securing or recovering a mobile device
US9232491B2 (en) 2009-02-17 2016-01-05 Lookout, Inc. Mobile device geolocation
US8825007B2 (en) 2009-02-17 2014-09-02 Lookout, Inc. Systems and methods for applying a security policy to a device based on a comparison of locations
US8538815B2 (en) 2009-02-17 2013-09-17 Lookout, Inc. System and method for mobile device replacement
US10623960B2 (en) 2009-02-17 2020-04-14 Lookout, Inc. Methods and systems for enhancing electronic device security by causing the device to go into a mode for lost or stolen devices
US8774788B2 (en) 2009-02-17 2014-07-08 Lookout, Inc. Systems and methods for transmitting a communication based on a device leaving or entering an area
US8635109B2 (en) 2009-02-17 2014-01-21 Lookout, Inc. System and method for providing offers for mobile devices
US10419936B2 (en) 2009-02-17 2019-09-17 Lookout, Inc. Methods and systems for causing mobile communications devices to emit sounds with encoded information
US9569643B2 (en) 2009-02-17 2017-02-14 Lookout, Inc. Method for detecting a security event on a portable electronic device and establishing audio transmission with a client computer
US8682400B2 (en) 2009-02-17 2014-03-25 Lookout, Inc. Systems and methods for device broadcast of location information when battery is low
WO2011011452A3 (en) * 2009-07-21 2011-04-28 Microsoft Corporation Component power monitoring and workload optimization
US20110022870A1 (en) * 2009-07-21 2011-01-27 Microsoft Corporation Component power monitoring and workload optimization
US9880920B2 (en) * 2009-08-14 2018-01-30 Google Llc Providing a user with feedback regarding power consumption in battery-operated electronic devices
US20130042122A1 (en) * 2009-08-14 2013-02-14 Google Inc. Providing a user with feedback regarding power consumption in battery-operated electronic devices
US8738765B2 (en) 2011-06-14 2014-05-27 Lookout, Inc. Mobile device DNS optimization
US20160306414A1 (en) * 2011-06-30 2016-10-20 International Business Machines Corporation Software-centric power management
US10181118B2 (en) 2011-08-17 2019-01-15 Lookout, Inc. Mobile communications device payment method utilizing location information
US8788881B2 (en) 2011-08-17 2014-07-22 Lookout, Inc. System and method for mobile device push communications
US20130173935A1 (en) * 2012-01-04 2013-07-04 Ho Yang Power control method and apparatus for array processor
US9992025B2 (en) 2012-06-05 2018-06-05 Lookout, Inc. Monitoring installed applications on user devices
US10256979B2 (en) 2012-06-05 2019-04-09 Lookout, Inc. Assessing application authenticity and performing an action in response to an evaluation result
US9589129B2 (en) 2012-06-05 2017-03-07 Lookout, Inc. Determining source of side-loaded software
US9407443B2 (en) 2012-06-05 2016-08-02 Lookout, Inc. Component analysis of software applications on computing devices
US9215074B2 (en) 2012-06-05 2015-12-15 Lookout, Inc. Expressing intent to control behavior of application components
US10419222B2 (en) 2012-06-05 2019-09-17 Lookout, Inc. Monitoring for fraudulent or harmful behavior in applications being installed on user devices
US9940454B2 (en) 2012-06-05 2018-04-10 Lookout, Inc. Determining source of side-loaded software using signature of authorship
US11336458B2 (en) 2012-06-05 2022-05-17 Lookout, Inc. Evaluating authenticity of applications based on assessing user device context for increased security
US9408143B2 (en) 2012-10-26 2016-08-02 Lookout, Inc. System and method for using context models to control operation of a mobile communications device
US8655307B1 (en) 2012-10-26 2014-02-18 Lookout, Inc. System and method for developing, updating, and using user device behavioral context models to modify user, device, and application state, settings and behavior for enhanced user security
US9769749B2 (en) 2012-10-26 2017-09-19 Lookout, Inc. Modifying mobile device settings for resource conservation
US9208215B2 (en) 2012-12-27 2015-12-08 Lookout, Inc. User classification based on data gathered from a computing device
US9374369B2 (en) 2012-12-28 2016-06-21 Lookout, Inc. Multi-factor authentication and comprehensive login system for client-server networks
US8855599B2 (en) 2012-12-31 2014-10-07 Lookout, Inc. Method and apparatus for auxiliary communications with mobile communications device
US9424409B2 (en) 2013-01-10 2016-08-23 Lookout, Inc. Method and system for protecting privacy and enhancing security on an electronic device
US9852416B2 (en) 2013-03-14 2017-12-26 Lookout, Inc. System and method for authorizing a payment transaction
US10699273B2 (en) 2013-03-14 2020-06-30 Lookout, Inc. System and method for authorizing payment transaction based on device locations
US9307412B2 (en) 2013-04-24 2016-04-05 Lookout, Inc. Method and system for evaluating security for an interactive service operation by a mobile device
US9642008B2 (en) 2013-10-25 2017-05-02 Lookout, Inc. System and method for creating and assigning a policy for a mobile communications device based on personal data
US10452862B2 (en) 2013-10-25 2019-10-22 Lookout, Inc. System and method for creating a policy for managing personal data on a mobile communications device
US10990696B2 (en) 2013-10-25 2021-04-27 Lookout, Inc. Methods and systems for detecting attempts to access personal information on mobile communications devices
US9753796B2 (en) 2013-12-06 2017-09-05 Lookout, Inc. Distributed monitoring, evaluation, and response for multiple devices
US10122747B2 (en) 2013-12-06 2018-11-06 Lookout, Inc. Response generation after distributed monitoring and evaluation of multiple devices
US10742676B2 (en) 2013-12-06 2020-08-11 Lookout, Inc. Distributed monitoring and evaluation of multiple devices
CN104679657A (en) * 2015-03-16 2015-06-03 广州市久邦数码科技有限公司 Testing method for dynamically adjusting application program functions
US10540494B2 (en) 2015-05-01 2020-01-21 Lookout, Inc. Determining source of side-loaded software using an administrator server
US11259183B2 (en) 2015-05-01 2022-02-22 Lookout, Inc. Determining a security state designation for a computing device based on a source of software
US12120519B2 (en) 2015-05-01 2024-10-15 Lookout, Inc. Determining a security state based on communication with an authenticity server
US10440053B2 (en) 2016-05-31 2019-10-08 Lookout, Inc. Methods and systems for detecting and preventing network connection compromise
US11683340B2 (en) 2016-05-31 2023-06-20 Lookout, Inc. Methods and systems for preventing a false report of a compromised network connection
US11038876B2 (en) 2017-06-09 2021-06-15 Lookout, Inc. Managing access to services based on fingerprint matching
US10218697B2 (en) 2017-06-09 2019-02-26 Lookout, Inc. Use of device risk evaluation to manage access to services
US12081540B2 (en) 2017-06-09 2024-09-03 Lookout, Inc. Configuring access to a network service based on a security state of a mobile device
US20200041577A1 (en) * 2018-08-03 2020-02-06 Advanced Micro Devices, Inc. Linear, low-latency power supply monitor
US11237220B2 (en) * 2018-08-03 2022-02-01 Advanced Micro Devices, Inc. Linear, low-latency power supply monitor

Also Published As

Publication number Publication date
CN106598691A (en) 2017-04-26
CN102063323A (en) 2011-05-18
CN1890636B (en) 2011-03-02
CN1890636A (en) 2007-01-03
DE112004002506T5 (en) 2006-11-02
TW200527199A (en) 2005-08-16
CN106598691B (en) 2020-06-05
DE112004002506B4 (en) 2011-06-09
TWI301573B (en) 2008-10-01
CN102063323B (en) 2017-03-01
US7287173B2 (en) 2007-10-23
WO2005066774A1 (en) 2005-07-21

Similar Documents

Publication Publication Date Title
US7287173B2 (en) Method for computing power consumption levels of instruction and recompiling the program to reduce the excess power consumption
Lukefahr et al. Composite cores: Pushing heterogeneity into a core
Bircher et al. Runtime identification of microprocessor energy saving opportunities
Haj-Yihia et al. Fine-grain power breakdown of modern out-of-order cores and its implications on skylake-based systems
Gochman et al. The Intel® Pentium® M Processor: Microarchitecture and Performance.
US7194643B2 (en) Apparatus and method for an energy efficient clustered micro-architecture
US8250395B2 (en) Dynamic voltage and frequency scaling (DVFS) control for simultaneous multi-threading (SMT) processors
Van den Steen et al. Analytical processor performance and power modeling using micro-architecture independent characteristics
US8650413B2 (en) On-chip power proxy based architecture
US7802241B2 (en) Method for estimating processor energy usage
CN101246447B (en) Method and apparatus for measuring pipeline stalls in a microprocessor
US8219833B2 (en) Two-level guarded predictive power gating
Yasin et al. A metric-guided method for discovering impactful features and architectural insights for skylake-based processors
Ratković et al. An overview of architecture-level power-and energy-efficient design techniques
Bircher et al. Effective use of performance monitoring counters for run-time prediction of power
Jarus et al. Top-Down Characterization Approximation based on performance counters architecture for AMD processors
US20230195593A1 (en) System, Method And Apparatus For High Level Microarchitecture Event Performance Monitoring Using Fixed Counters
Owahid et al. Wasted dynamic power and correlation to instruction set architecture for CPU throttling
Mehta et al. Fetch halting on critical load misses
Allam et al. An efficient CPI stack counter architecture for superscalar processors
Goel Per-core power estimation and power aware scheduling strategies for CMPs
Owahid et al. RTL level instruction profiling for CPU throttling to reduce wasted dynamic power
Ozen et al. The return of power gating: smart leakage energy reductions in modern out-of-order processor architectures
Eyerman et al. Extending the performance analysis tool box: Multi-stage CPI stacks and FLOPS stacks
Haj-Yahya et al. Power modeling at high-performance computing processors

Legal Events

Date Code Title Description
AS Assignment
Owner name: INTEL CORPORATION, CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HSIEH, CHENG-HSUEH;REEL/FRAME:014828/0096
Effective date: 20031219

STCF Information on status: patent grant
Free format text: PATENTED CASE

CC Certificate of correction

FPAY Fee payment
Year of fee payment: 4

FPAY Fee payment
Year of fee payment: 8

MAFP Maintenance fee payment
Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
Year of fee payment: 12