5.1 Inputs for ILP Model
Figure 4 shows an overview block diagram of the proposed ILP framework.
The first step is to compile each benchmark with the TI compiler to generate assembly code. Profiling begins once we have an .asm (assembly) file for each benchmark. Profiling can be done manually or with profilers such as VTune, Valgrind, gperf, and gprof. We profiled manually, taking the .asm file as input and inspecting its branch, load, and store instructions. For each function and global variable, we count the total number of loads and stores, taking the jump instructions into account. During profiling and characterization, we also consider the branch instructions and their behavior in the generated assembly code. Parsing each benchmark manually also tells us how many functions and variables a given application contains.
From the datasheets of the MSP430FR6989 and MSP430F5529 boards, we obtain the sizes of SRAM, flash, and FRAM. We recorded the number of CPU cycles required to execute each application on these boards, in both stable and unstable power scenarios.
We also conducted experiments to determine the energy required for each read and write request to SRAM and FRAM. We refer to this as a one-time characterization, meaning that these experiments have to be run only once for a specific MSP430-based microcontroller. The results are required to calculate the total system energy. We created one software procedure that reads the contents of the SRAM address space in a loop and writes them to the FRAM address space, and a second procedure that does the reverse, reading from FRAM and writing to SRAM. From this analysis, we calculate the energy per read/write to SRAM/FRAM. As shown in Figure 4, the following information must be provided to the ILP solver:
– Memory address ranges for SRAM, flash, and FRAM.
– Number of CPU cycles required to run a specific application.
– Number of loads and stores for each function and variable (as determined from the .asm file).
– Number of functions and variables in a given application.
– Energy required for each read/write to SRAM/FRAM.
– Frequency of power failures that occur during the execution of an application and the duration of each power failure.
The items listed above are the inputs to the ILP solver. We used ILPSolve 5.5 [5] as our ILP solver throughout this work. The ILPSolve IDE is also described in [5], which explains in detail how to enter the ILP formulation into the IDE to obtain the optimized solution. With all of these inputs in hand, we formulate the proposed ILP model that supports intermittent computing.
5.2 ILP Formulation for Data Mapping for Intermittent Computing
We present the ILP formulation for the memory mapping problem stated in Definition 4.1. We divide this formulation into two parts: one for global variables and one for functions.
For Global Variables: Let the number of global variables in a program be ‘G’. Let the number of reads and writes to variable ‘i’ be \(r_i\) and \(w_i\), respectively. We divided FRAM’s 128 KB into two regions, \(FRAM_n\) and \(FRAM_b\): the \(FRAM_n\) region has 125 KB, and the \(FRAM_b\) region has 3 KB.
We have two memory regions, represented as \(Mem_j\) as shown in equation 5: when j = 1, the memory region is SRAM, and when j = 2, it is \(FRAM_{n}\).
Let the SRAM/FRAM sizes be \(Size(Mem_j)\) as shown in equation 6: when j = 1, this is the SRAM size in bytes, and when j = 2, it is the \(FRAM_{n}\) size in bytes.
Let the energy required for each read and write to \(Mem_j\) be \(E_{r\_j}\) and \(E_{w\_j}\), respectively. Let the number of CPU cycles required to execute a global variable \(v_i\) be \(NC_{v_i}\), where \(\forall i \in [1, G]\). We gathered the per-read/write energy for SRAM/FRAM through the one-time characterization and the cycle counts through static profiling.
We define a binary variable (BV), \(I_{j}\left(v_{i}\right)\), which indicates whether variable \(v_i\) is allocated to memory region \(j\). If \(I_{j}\left(v_{i}\right)\) = 1, the variable \(v_i\) is allocated to region \(j\); \(I_{j}\left(v_{i}\right)\) = 0 indicates that it is not. \(I_{j}\left(v_{i}\right)\), where \(\forall j \in [1, Mem_j], \forall i \in [1, G]\), is defined as shown in equation 7.
Constraints: There are two constraints: one on the BV \(I_{j}\left(v_{i}\right)\) and one on memory size. A variable \(v_i\) is always allocated to exactly one memory region, which means that \(v_i\) is allocated to either SRAM or FRAM but not both. This constraint is defined in equation 8.
The other constraint concerns memory sizes. The total \(Size(v_i)\) of the variables \(v_i\), \(\forall i \in [1, G]\), allocated to a memory region must not exceed \(Size(Mem_j)\). This constraint is defined in equation 9.
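Equations 5 to 9 are referenced above, but their bodies do not appear in this text. One form that is consistent with the definitions given, offered as an assumption rather than as the original typesetting, is:

\[
\begin{aligned}
Mem_j &= \begin{cases} SRAM, & j = 1 \\ FRAM_n, & j = 2 \end{cases} \\
I_j(v_i) &= \begin{cases} 1, & \text{if } v_i \text{ is allocated to } Mem_j \\ 0, & \text{otherwise} \end{cases} \\
\sum_{j=1}^{2} I_j(v_i) &= 1, \qquad \forall i \in [1, G] \\
\sum_{i=1}^{G} I_j(v_i)\, Size(v_i) &\le Size(Mem_j), \qquad \forall j \in \{1, 2\}
\end{aligned}
\]

The third line forces exactly one region per variable, and the fourth bounds the number of bytes placed in each region.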
Objective 4.1: The objective in mapping the global variables of a program to either SRAM or FRAM is to reduce EDP and improve system performance. \(E_{global}\), defined in equation 10, is the energy required to allocate global variables to either SRAM or FRAM and execute them from their respective memory regions. \(EDP_{global}\), defined in equation 11, is the energy-delay product required to allocate global variables to either SRAM or FRAM.
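To make the global-variable objective concrete, the following sketch replaces the ILP solver with an exhaustive search over all \(2^G\) placements, which is only practical for very small G. It is an illustration under stated assumptions, not the authors' formulation: the type and function names are hypothetical, the energy of a variable placed in region j is assumed to be \(r_i E_{r\_j} + w_i E_{w\_j}\), and its EDP contribution is assumed to be that energy multiplied by \(NC_{v_i}\).

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

#define NUM_MEM  2    /* index 0: SRAM, index 1: FRAM_n */
#define MAX_VARS 16   /* keeps the 2^G search small */

typedef struct {
    unsigned reads;    /* r_i */
    unsigned writes;   /* w_i */
    unsigned size;     /* Size(v_i) in bytes */
    unsigned cycles;   /* NC_{v_i} */
} GlobalVar;

typedef struct {
    double e_read;     /* energy per read,  E_{r_j} */
    double e_write;    /* energy per write, E_{w_j} */
    unsigned size;     /* Size(Mem_j) in bytes */
} MemRegion;

/* Try every placement; bit i of the mask selects the region of v_i.
 * Stores the best mask in *best_mask and returns its EDP, or returns
 * -1.0 if no placement satisfies the size constraint. */
double best_global_placement(const GlobalVar *v, size_t g,
                             const MemRegion mem[NUM_MEM],
                             unsigned *best_mask)
{
    double best = DBL_MAX;
    int found = 0;
    unsigned mask;

    assert(g <= MAX_VARS);
    for (mask = 0; mask < (1u << g); mask++) {
        unsigned used[NUM_MEM] = {0, 0};
        double edp = 0.0;
        size_t i;
        for (i = 0; i < g; i++) {
            unsigned j = (mask >> i) & 1u;   /* exactly one region per variable */
            double e = v[i].reads * mem[j].e_read
                     + v[i].writes * mem[j].e_write;
            used[j] += v[i].size;
            edp += e * v[i].cycles;          /* assumed EDP contribution */
        }
        /* Memory size constraint for both regions. */
        if (used[0] <= mem[0].size && used[1] <= mem[1].size && edp < best) {
            best = edp;
            *best_mask = mask;
            found = 1;
        }
    }
    return found ? best : -1.0;
}
```

With a 4-byte SRAM and two 4-byte variables, for example, only one variable fits in SRAM, and the search keeps the read-heavy one there when SRAM reads are cheaper than FRAM reads.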
For Functions: Let the number of functions in a program be ‘\(N_f\)’. Let the number of reads and writes to the \(i{\rm {th}}\) function be \(r(F_i)\) and \(w(F_i)\), where \(\forall i \in [1, N_f]\). The functions consist of procedural parameters, local variables, and return variables. Internally, the code/data of functions are divided into text, data, and stack sections. We map at least one of these three sections to either the SRAM or FRAM region, i.e., \(Mem_j\). \(Sec_k(i)\) denotes section ‘k’ of the \(i{\rm {th}}\) function as shown in equation 12: k = 1 refers to the text section, k = 2 to the data section, and k = 3 to the stack section of the \(i{\rm {th}}\) function.
We define a BV, \(I_{j}\left(Sec_{k}(i) \right)\), which indicates whether section \(Sec_k\) of the \(i{\rm {th}}\) function is allocated to memory region \(j\). If \(I_{j}\left(Sec_{k}(i) \right)\) = 1, the section \(Sec_k(i)\) is allocated to region \(j\); \(I_{j}\left(Sec_{k}(i) \right)\) = 0 indicates that it is not. \(I_{j}\left(Sec_{k}(i) \right)\), where \(\forall j \in [1, Mem_j]\), \(\forall i \in [1, N_f]\), \(\forall k \in [1, Sec_k(i)]\), is defined as shown in equation 13.
Constraints: There are two constraints: one on the BV \(I_{j}\left(Sec_{k}(i) \right)\) and one on memory size. A section \(Sec_k\) of the \(i{\rm {th}}\) function is always allocated to exactly one memory region, which means that it is allocated to either SRAM or FRAM but not both. This constraint is defined in equation 14.
The other constraint concerns memory sizes. The total \(Size(F_i)\) of the sections \(Sec_{k}(i)\), \(\forall k \in [1, Sec_k(i)]\), \(\forall i \in [1, N_f]\), allocated to a memory region \(j\), \(\forall j \in [1, Mem_j]\), must not exceed \(Size(Mem_j)\). This constraint is defined in equation 15.
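Equations 14 and 15 are referenced above without their bodies. A form consistent with the surrounding definitions, given here as an assumption, is shown below. It uses a per-section size \(Size(Sec_k(i))\), whereas the text writes \(Size(F_i)\); the per-section reading is our interpretation.

\[
\begin{aligned}
\sum_{j=1}^{2} I_j\left(Sec_k(i)\right) &= 1, \qquad \forall i \in [1, N_f],\ \forall k \in \{1, 2, 3\} \\
\sum_{i=1}^{N_f} \sum_{k=1}^{3} I_j\left(Sec_k(i)\right) Size\left(Sec_k(i)\right) &\le Size(Mem_j), \qquad \forall j \in \{1, 2\}
\end{aligned}
\]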
Objective 4.2: The objective in mapping the sections of these functions to either SRAM or FRAM is to minimize EDP and improve system performance. \(E_{func}\), defined in equation 16, is the energy required to allocate functions to either SRAM or FRAM, where \(M_{c_i}\) is the number of times the \(i{\rm {th}}\) function is called. \(EDP_{func}\), defined in equation 17, is the energy-delay product required to allocate all functions to either SRAM or FRAM, where \(NC_{F_i}\) is the number of CPU cycles required to execute a function \(F_i\).
The total system EDP, \(EDP_{system}\), is the sum of \(EDP_{global}\) and \(EDP_{func}\), as shown in equation 18.
Our objective function is shown in equation 19. Our main objective is to minimize the system’s EDP by choosing the optimal placement.
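The bodies of equations 18 and 19 are not reproduced in this text, but they follow from the two statements above. Written out, with the exact typesetting being an assumption:

\[
EDP_{system} = EDP_{global} + EDP_{func}
\]

\[
\text{minimize } EDP_{system} \quad \text{subject to the constraints in equations 8, 9, 14, and 15}
\]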
5.3 Implementing Mapping Technique in MSP430FR6989
Once we obtain the placement information from \(ILP\_solver\), we map the respective variables and the sections of a function to either SRAM or FRAM. We modify the linker script accordingly to map the sections or variables to either SRAM or FRAM. In our proposed mapping policy, placing global variables is straightforward, i.e., mapping the respective variable to either SRAM or FRAM based on the ILP decision.
We observed that, through the linker script, we can map the whole stack section of each function to either SRAM or FRAM. We analyzed the mappings of the stack section for each function by modifying the linker script. We used built-in attributes to differentiate the mappings between SRAM and FRAM; for example, the built-in attribute \(\_\_attribute\_\_((ramfunc))\) maps a function to SRAM. To place the stack section in SRAM, we replace the default setting in the linker script with “.stack: {} > RAM (HIGH)”. To place the stack section in FRAM, we replace the default setting with “.stack: {} > FRAM”.
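As an illustration of the attribute mentioned above, the sketch below marks one function for execution from SRAM. The macro `RUN_FROM_SRAM` and the function `checksum16` are hypothetical names introduced here; the attribute is applied only when the TI compiler is detected, so the same file still builds with a host compiler for testing.

```c
#include <stdint.h>

/* Apply the attribute only when building with the TI compiler, so the
 * sketch still compiles with a host compiler. */
#if defined(__TI_COMPILER_VERSION__)
#define RUN_FROM_SRAM __attribute__((ramfunc))
#else
#define RUN_FROM_SRAM
#endif

/* Hypothetical function chosen by the placement step to run from SRAM. */
RUN_FROM_SRAM
uint16_t checksum16(const uint16_t *data, uint16_t n)
{
    uint16_t sum = 0;
    uint16_t i;
    for (i = 0; i < n; i++) {
        sum = (uint16_t)(sum + data[i]);   /* wraps modulo 2^16 */
    }
    return sum;
}
```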
Similarly, we observed that placing the text section in either SRAM or FRAM affects EDP. The majority of accesses to the text section are reads, and the energy consumed per read access differs between SRAM and FRAM. Table 3 shows that FRAM consumes approximately 2x more read energy than SRAM. Thus, we analyzed each application to map the text section based on the free space available. If enough space is available in SRAM, we place the text section in SRAM; otherwise, we place it in FRAM. We included the following four lines in our linker script to check the above condition and map the text section.
(1) \(\#ifndef \_\_LARGE\_CODE\_MODEL\_\_\)
We modified the linker script to map the data section by using the built-in compiler directives. We followed the three steps below.
(1) Allocate a new memory block, for instance, \(NEW\_DATASECTION\). We can declare the start address and size of the data section in the linker script.
(2) Define a segment (.Localvars) that is stored in this memory block (\(NEW\_DATASECTION\)).
(3) Use #pragma \(DATA\_SECTION (funct\_name, seg\_name)\) in the program to define functions in this segment, where \(funct\_name\) is the function name and \(seg\_name\) is the name of the created segment. For instance, #pragma \(DATA\_SECTION (func\_1, .Localvars)\)
Once the different sections are created, we can allocate them to either SRAM or FRAM based on the ILP decisions. For instance, placing “\(NEW\_DATASECTION\): {} > FRAM” in the linker script maps \(NEW\_DATASECTION\) to FRAM.
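On the C side, step (3) could look like the sketch below. It is an assumption-laden illustration: `func_1_locals` and `LOCALVARS_LEN` are hypothetical names, the pragma is attached to a data object that `func_1` uses, and the section name is written as a quoted string, which is the form the TI compiler documentation gives for C. The pragma is guarded so the file also builds with a host compiler.

```c
#include <stdint.h>

#define LOCALVARS_LEN 8

/* Place the buffer in the .Localvars segment when building with the TI
 * compiler; a host compiler skips the pragma and uses default placement. */
#if defined(__TI_COMPILER_VERSION__)
#pragma DATA_SECTION(func_1_locals, ".Localvars")
#endif
uint16_t func_1_locals[LOCALVARS_LEN];

/* Hypothetical function whose working data lives in the new segment.
 * Fills the buffer with seed, seed+1, ... and returns the sum. */
uint16_t func_1(uint16_t seed)
{
    uint16_t sum = 0;
    uint16_t i;
    for (i = 0; i < LOCALVARS_LEN; i++) {
        func_1_locals[i] = (uint16_t)(seed + i);
        sum = (uint16_t)(sum + func_1_locals[i]);
    }
    return sum;
}
```

Where the segment finally resides (SRAM or FRAM) is then decided only by the linker script line, so the C source does not change when the ILP decision changes.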
5.4 Support for Intermittent Computing
When the power is stable, everything works properly. Because of the static allocation scheme, all functions/variables are mapped to SRAM/FRAM once, at the start. During a power failure, SRAM and registers lose all of their contents, including the mapping information. When power is restored, we do not know which functions/variables were allocated to SRAM before the failure. As a result, we must either restart the execution from the beginning or end up with incorrect results. Restarting the application consumes extra energy and time, making the system inefficient in terms of energy consumption and performance.
We propose a backup strategy for frequent power failures. FRAM is divided into \(FRAM_n\) and \(FRAM_b\) as shown in Figure 3. \(FRAM_n\) has a size of 125 KB and is used for regular mappings. \(FRAM_b\) has a size of 3 KB and serves as a backup region (BR) during power failures. During a power failure, we back up all register and SRAM contents to \(FRAM_b\). Whenever power is restored, we restore the register and SRAM contents from \(FRAM_b\) and resume the application execution. The proposed backup strategy avoids the extra energy consumed by restarting and makes the system more energy efficient.
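The backup and restore steps can be sketched as two copies guarded by a validity marker. This is a host-side illustration, not the authors' implementation: the structure layout, the names, the 2 KB SRAM size, and the marker value are assumptions, and the marker itself is a design choice added here so that an interrupted backup is not mistaken for a complete one. Registers are stored in 32 bits each, and the whole structure stays below the 3 KB of \(FRAM_b\).

```c
#include <stdint.h>
#include <string.h>

#define SRAM_BYTES   2048u    /* assumed size of the SRAM to checkpoint */
#define NUM_REGS     16u      /* CPU registers R0..R15 */
#define BACKUP_VALID 0xA5A5u  /* hypothetical "backup complete" marker */

/* Layout of the backup region (stand-in for FRAM_b). */
typedef struct {
    uint16_t valid;
    uint32_t regs[NUM_REGS];
    uint8_t  sram[SRAM_BYTES];
} BackupRegion;

/* Called when a power failure is detected. */
void backup_state(BackupRegion *br, const uint32_t regs[NUM_REGS],
                  const uint8_t *sram)
{
    br->valid = 0;                          /* invalidate while copying */
    memcpy(br->regs, regs, sizeof br->regs);
    memcpy(br->sram, sram, SRAM_BYTES);
    br->valid = BACKUP_VALID;               /* commit last */
}

/* Called at power-up. Returns 1 if state was restored, 0 if there is no
 * complete backup and the application must start from the beginning. */
int restore_state(const BackupRegion *br, uint32_t regs[NUM_REGS],
                  uint8_t *sram)
{
    if (br->valid != BACKUP_VALID) {
        return 0;
    }
    memcpy(regs, br->regs, sizeof br->regs);
    memcpy(sram, br->sram, SRAM_BYTES);
    return 1;
}
```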
5.4.1 Implementation Details of Flash-based Programming for Intermittent Computing
The MSP430F5529 has SRAM and Flash as its main memory. During Flash programming, SRAM is the only on-chip memory from which the CPU can read code to execute the application. Whenever we want to use only SRAM for mapping the application, we need to copy the Flash program function onto the stack. Whenever we switch from SRAM to Flash, we need to restore the stack pointer and point the program counter register to the Flash memory region.
During a power failure, we must perform the backup operation that copies the SRAM data to the Flash memory region. For the backup operation, we made some changes to the built-in MSP430 functions void Flash_wb(char *Data_ptr, char byte) and void Flash_ww(int *Data_ptr, int word), where Flash_wb() writes a byte to the Flash memory region and Flash_ww() writes a word to the Flash memory region.
Whenever power comes back, we must restore the contents from the Flash-based backup region to the SRAM memory region. We used the built-in ctpl() functions to copy from Flash to SRAM. After restoring, we need to clear the Flash-based backup region; for this, we made changes to the built-in function void Flash_clr(int *Data_ptr), which clears the Flash data.
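The need to clear the backup region after each restore follows from how Flash behaves: an erase sets every bit to 1, and programming can only change bits from 1 to 0. The host-side simulation below illustrates this with an array standing in for one Flash segment. It does not touch the Flash controller and is not the Flash_wb()/Flash_clr() code; all `sim_` names and the 512-byte segment size are assumptions made for the sketch.

```c
#include <stdint.h>
#include <string.h>

#define SEG_BYTES 512u   /* assumed erase-segment size */

/* Erase: every bit in the segment returns to 1. */
void sim_flash_erase(uint8_t seg[SEG_BYTES])
{
    memset(seg, 0xFF, SEG_BYTES);
}

/* Program: a write can only clear bits, so the stored value is the
 * bitwise AND of the old contents and the new value. */
void sim_flash_write_byte(uint8_t *cell, uint8_t value)
{
    *cell = (uint8_t)(*cell & value);
}

/* Back up n bytes into the segment. Returns 1 if the read-back matches
 * the source, 0 if the segment was too small or not erased beforehand. */
int sim_flash_backup(uint8_t seg[SEG_BYTES], const uint8_t *src, size_t n)
{
    size_t i;
    if (n > SEG_BYTES) {
        return 0;
    }
    for (i = 0; i < n; i++) {
        sim_flash_write_byte(&seg[i], src[i]);
    }
    return memcmp(seg, src, n) == 0;
}
```

A second backup into a segment that was not erased generally reads back corrupted data, which is why the clear step must run before the next power failure.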