SA-CIRCA: Self-Adaptive Control for Mission-Critical Systems

AFRL-IF-RS-TR-2002-207 Final Technical Report August 2002 SA-CIRCA: SELF-ADAPTIVE CONTROL FOR MISSION-CRITICAL SYSTEMS Honeywell Technology Center Sponsored by Defense Advanced Research Projects Agency DARPA Order No. G428 APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the U.S. Government. AIR FORCE RESEARCH LABORATORY INFORMATION DIRECTORATE ROME RESEARCH SITE ROME, NEW YORK This report has been reviewed by the Air Force Research Laboratory, Information Directorate, Public Affairs Office (IFOIPA) and is releasable to the National Technical Information Service (NTIS). At NTIS it will be releasable to the general public, including foreign nations. AFRL-IF-RS-TR-2002-207 has been reviewed and is approved for publication APPROVED: JOHN J. CROWTER Project Engineer ^^W«g^ <^' L^C>^^7<y FOR THE DIRECTOR: JAMES A COLLINS Acting Technical Advisor Information Technology Division Information Directorate Form Approved OMB No. 074-0188 REPORT DOCUMENTATION PAGE Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing this collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503 1. AGENCY USE ONLY (Leave blank) 2. REPORT DATE 3. REPORT TYPE AND DATES COVERED Aug 02 Final Jun 98 – Nov 99 4. TITLE AND SUBTITLE 5. FUNDING NUMBERS SA-CIRCA: SELF-ADAPTIVE CONTROL FOR MISSION-CRITICAL SYSTEMS C PE PR TA WU 6. AUTHOR(S) David J. Musliner, Robert P. Goldman, Michael J. Pelican, Kurt D. Krebsbach, Edmund H. Dunfee, Kang G. Shiu, Haksun Li and Ella Atkins 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) - F30602-98-C-0212 - 62301E/62702F/63728F - G428 - 01 - 01 8. PERFORMING ORGANIZATION REPORT NUMBER Honeywell Technology Center 3660 Technology Drive Minneapolis, MN 55418 9. SPONSORING / MONITORING AGENCY NAME(S) AND ADDRESS(ES) Defense Advanced Research Projects Agency 3701 North Fairfax Drive Arlington, VA 22203-1714 10. SPONSORING / MONITORING AGENCY REPORT NUMBER AFRL/IFTD 525 Brooks Rd Rome, NY 13441-4505 AFRL-IF-RS-TR-2002-207 11. SUPPLEMENTARY NOTES AFRL Project Engineer: John J. Crowter, IFTB, 315-330-1459, crowterj@rl.af.mil 12a. DISTRIBUTION / AVAILABILITY STATEMENT 12b. DISTRIBUTION CODE Approved for public release; distribution unlimited. 13. ABSTRACT (Maximum 200 Words) The goal of this effort was to begin extending the Cooperative Intelligent Real-Time Control Architecture (CIRCA) with abilities to automatically monitor its own performance and adapt in real-time, forming Self-Adaptive CIRCA (SA-CIRCA). CIRCA is a coarse-grain architecture designed to control autonomous systems which require both intelligence, deliberative planning activity and highly reliable, hard-real-time reactions to safety threats. CIRCA allows systems to provide performance guarantees that ensure they will remain safe and accomplish mission-critical goals while also intelligently pursuing long-term, non-critical goals. The SA-CIRCA project took several steps towards extending this architecture with the ability to reason accurately about its own real-time behavior, and adapt that behavior in response to performance feedback. Due to a change in the direction of this research, the SA-CIRCA project was only partially funded. As a result, the development of the architecture and demonstrations was not completed. Major issues investigated during this project include formally verifying real-time control plans, dynamically decomposing long-term plans into sub-goals, and building real-time control plans using probabilistic information to reason about most-likely states first. The primary technical products of this research project are two versions of CIRCA’s controller-synthesis (or planning) algorithm. The first version automatically generates reactive control plans and verifies their correctness using formal model-checking methods. The second version does not use model checking to verify its plans, but uses a novel form of probabilistic reasoning to restrict its planning effort to the most-likely future system state. 14. SUBJECT TERMS 15. NUMBER OF PAGES Mission-Critical Systems, Self-Adaptive Systems, unmanned Autonomous Vehicles, Mission Planning, Machine Learning, Artificial Intelligence 16. PRICE CODE 17. SECURITY CLASSIFICATION OF REPORT 18. SECURITY CLASSIFICATION OF THIS PAGE UNCLASSIFIED UNCLASSIFIED NSN 7540-01-280-5500 19. SECURITY CLASSIFICATION OF ABSTRACT UNCLASSIFIED 48 20. LIMITATION OF ABSTRACT UL Standard Form 298 (Rev. 2-89) Prescribed by ANSI Std. Z39-18 298-102 TABLE OF CONTENTS LIST OF FIGURES II PREFACE III SUMMARY 1 1. Introduction 1.1. The SA-CIRCA Architecture 1.2. Progress 2 2 3 2. Overview of CIRCA 3 3. Verifying State Space Plans 3.1. The CIRCA SSP 3.2. Model Checking for Plan Verification 3.3. Multi-Model Verification 3.4. RelatedWork 7 7 11 15 16 4. The Probabilistic State Space Planner 17 5. The Adaptive Mission Planner 5.1. Requirements for the AMP 5.2. Interface to the CSM 5.3. Potential Starting Points 5.4. Using SIPE-2 for the AMP 5.5.. Using Prodigy for the AMP 18 18 19 21 22 22 6. Demonstrations 6.1. The UAVSimulation 6.2. Demonstrations of the Architecture 6.3. Demonstrations of the PSSP 24 24 26 26 7. Conclusions 28 REFERENCES 31 Appendix A. CIRCA-II and the Probabilistic State Space Planner 32 i LIST OF FIGURES 1. The SA-CIRCA architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2. The simulated Puma robot arm domain. . . . . . . . . . . . . . . . . . . . . 4 3. Example transition descriptions given to CIRCA’s planner. . . . . . . . . . . 5 4. Sample output from the TAP compiler. . . . . . . . . . . . . . . . . . . . . . 6 5. Summary of the CIRCA state space planning process. . . . . . . . . . . . . . 7 6. A simple domain description for a UAV threatened by radar-guided missiles. 9 7. Simple UAV controller for evading radar-guided SAM threats. . . . . . . . . 10 8. The input model and clock zone analysis for the example UAV domain. . . . 14 9. Conceptual view of multiple reactive controllers. . . . . . . . . . . . . . . . . 19 10. Comparing existing planners against AMP requirements, we ﬁnd no perfect match. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 The demonstration simulation illustrates an SA-CIRCA-controlled aircraft responding to attacks with evasive maneuvers, ﬂares, chaﬀ, and counterattacks. 25 12. The PSSP probability rate functions for the demonstration domain. . . . . . 27 13. The PSSP state space when only IR missile threats are handled. . . . . . . . 28 14. The PSSP state space when only radar missile threats are handled. . . . . . 29 15. The PSSP state space when radar missile threats are handled in one plan, and a contingency plan is built for IR missile threats. . . . . . . . . . . . . . 29 11. ii PREFACE The work reported here was conducted by the Honeywell Technology Center, Minneapolis, MN, during the period June, 1998 through December 1999 under Rome Air Force Research Laboratory contract F30602-98-C-0212. Mr. John J. Crowter was the AFRL project oﬃcer for the contract. iii SA-CIRCA: Self-Adaptive Control for Mission-Critical Systems SUMMARY This is the ﬁnal report for the Defense Advanced Research Projects Agency (DARPA) contract F30602-98-C-0212 entitled “SA-CIRCA: Self-Adaptive Control for Mission-Critical Systems.” The goal of this contract eﬀort was to begin extending the Cooperative Intelligent Real-Time Control Architecture (CIRCA) with abilities to automatically monitor its own performance and adapt in real-time, forming Self-Adaptive CIRCA (SA-CIRCA). CIRCA is a coarse-grain architecture designed to control autonomous systems which require both intelligent, deliberative planning activity and highly reliable, hard-real-time reactions to safety threats. CIRCA allows systems to provide performance guarantees that ensure they will remain safe and accomplish mission-critical goals while also intelligently pursuing long-term, non-critical goals. The SA-CIRCA project took several steps towards extending this architecture with the ability to reason accurately about its own real-time behavior, and adapt that behavior in response to performance feedback. The SA-CIRCA project was only partially funded, so the development of the architecture and demonstrations was not completed. Major issues investigated during this project include formally verifying real-time control plans, dynamically decomposing long-term plans into subgoals, and building real-time control plans using probabilistic information to reason about most-likely states ﬁrst. The primary technical products of this research project are two versions of CIRCA’s controller-synthesis (or planning) algorithm. The ﬁrst version, developed by Honeywell, automatically generates reactive control plans and veriﬁes their correctness using formal model-checking methods. The second version, developed by the University of Michigan on subcontract, does not use model checking to verify its plans, but uses a novel form of probabilistic reasoning to restrict its planning eﬀort to the most-likely future system states. 1 Learning Evaluators Environment Evaluator Generator Goals Adaptive Mission Planner Domain Model Code Module Synthesizer Code Modules subgoals,partial domains Controller Synthesis Module Local Subgoals controller code plan Scheduler State Space Planner Real-Time Subsystem feedback data Local Domain Model schedule Figure 1. The SA-CIRCA architecture. 1. Introduction This is the ﬁnal report for the Defense Advanced Research Projects Agency (DARPA) contract F30602-98-C-0212 entitled “SA-CIRCA: Self-Adaptive Control for Mission-Critical Systems.” The goal of this contract eﬀort was to begin extending the Cooperative Intelligent Real-Time Control Architecture (CIRCA) with abilities to automatically monitor its own performance and adapt in real-time, forming Self-Adaptive CIRCA (SA-CIRCA). CIRCA is a coarse-grain architecture designed to control autonomous systems which require both intelligent, deliberative planning activity and highly reliable, hard-real-time reactions to safety threats. CIRCA allows systems to provide performance guarantees that ensure they will remain safe and accomplish mission-critical goals while also intelligently pursuing long-term, non-critical goals. The SA-CIRCA project took several steps towards extending this architecture with the ability to reason accurately about its own real-time behavior, and adapt that behavior in response to performance feedback. 1.1. The SA-CIRCA Architecture As illustrated in Figure 1, SA-CIRCA was designed to meet the requirements for real-time self-adaptive control software operating autonomously and safely for extended periods of time in dynamic environments. Building on the proven real-time planning and control capabilities of the original CIRCA architecture (lightly shaded in Figure 1), SA-CIRCA was designed to provide revolutionary self-modeling, performance evaluation, and adaptation functions through several key new technology developments (dark shading): • An Adaptive Mission Planner that manages long-term mission planning and adaptation, developing mission plans while reasoning about performance evaluation, 2 evaluator feedback, resource restrictions, and dynamic goals. • An Evaluator Generator that analyzes plans and monitors system goals, and builds sentinel processes that monitor system activities and the environment to recognize threats to goal achievement and opportunities to optimize performance. • Performance Evaluators that provide descriptions of system failures, or operations/capabilities that would improve system performance. These descriptions can be used to pull existing code from a library, or as inputs to a Code Module Synthesizer. These new capabilities were designed to add self-evaluation and long-term self-adaptive behavior to the existing CIRCA architecture, which provides: • A Real-Time Subsystem (RTS) that predictably executes real-time control plans. • A State Space Planner and Scheduler that cooperate to reason about internal models of the world and dynamically program the RTS with a planned set of reactions guaranteed to keep the system safe. These control plans are called TAP (Test Action Pair) schedules. Together, the SSP and Scheduler are components of the Controller Synthesis Module (CSM). 1.2. Progress The SA-CIRCA project was only partially funded, so the development of all the planned architecture features and demonstrations was not completed. Major issues investigated during this project include formally verifying real-time control plans, using probabilistic reasoning to control the complexity of control plan synthesis, and dynamically decomposing long-term plans into subgoals. This research project has yielded two primary technical products: • An integrated controller-synthesis (or planning) algorithm that automatically generates reactive control plans and veriﬁes their correctness using formal model-checking methods (see Section 3). • An alternative controller-synthesis (or planning) algorithm that uses probabilistic information to plan for the most-probable contingencies ﬁrst (see Section 4). In addition, we conducted preliminary investigations into the design of SA-CIRCA’s top level component, the Adaptive Mission Planner, using two existing AI planning systems (see Section 5). Section 6 describes the simulated UAV demonstrations we developed to illustrate the behavior of SA-CIRCA controllers in real-time, mission-critical environments. 2. Overview of CIRCA CIRCA is designed to support both hard real-time response guarantees and unrestricted Artiﬁcial Intelligence (AI) methods that can guide those real-time responses. Figure 1 3 Figure 2. The simulated Puma robot arm domain. illustrates the architecture, in which the AMP and CSM reason about high-level problems that require their powerful but potentially unbounded planning methods, while a separate Real-Time Subsystem (RTS) reactively executes the automatically-generated plans and enforces guaranteed response times. The AMP and CSM modules cooperate to develop executable reaction plans that will assure system safety and attempt to achieve system goals when interpreted by the RTS. CIRCA has been applied to real-time planning and control problems in several domains including mobile robotics and simulated unmanned aircraft (UAVs). A UAV example will be discussed in detail in Section 3. To introduce the key CIRCA concepts in this section, we draw examples from the domain illustrated by Figure 2, in which CIRCA controls a simulated Puma robot arm that must pack parts arriving on a conveyor belt into a nearby box. The parts can have several shapes (e.g., square, rectangle, triangle), each of which requires a diﬀerent packing strategy. The control system may not initially know how to pack all of the possible types of parts— it may have to perform some computation to derive an appropriate box-packing strategy. The robot arm is also responsible for reacting to an emergency alert light. If the light goes on, the system must push the button next to the light before a ﬁxed deadline. In this domain, CIRCA’s planning and execution subsystems operate in parallel. The CSM reasons about an internal model of the world and dynamically programs the RTS with a planned set of reactions. While the RTS is executing those reactions, ensuring that the system avoids failure, the AMP and CSM are able to continue executing heuristic planning methods to ﬁnd the next appropriate set of reactions. For example, the AMP may derive a new box-packing algorithm that can handle a new type of arriving part. The derivation of this new algorithm does not need to meet a hard deadline, because the reactions concurrently executing on the RTS will continue handling all arriving parts, just stacking 4 EVENT emergency-alert PRECONDS: ((emergency nil)) POSTCONDS: ((emergency T)) ;; Emergency light goes on TEMPORAL emergency-failure PRECONDS: ((emergency T)) POSTCONDS: ((failure T)) MIN-DELAY: 30 [seconds] ;; Fail if don’t attend to ;; light by deadline ACTION push-emergency-button PRECONDS: ((part-in-gripper nil)) POSTCONDS: ((emergency nil) (robot-position over-button)) WORST-CASE-EXEC-TIME: 2.0 [seconds] Figure 3. Example transition descriptions given to CIRCA’s planner. unfamiliar ones on a nearby table temporarily. When the new box-packing algorithm has been developed and integrated with additional reactions that prevent failure, the new schedule of reactions can be downloaded to the RTS. CIRCA’s State Space Planner builds reaction plans based on a world model and a set of formally-deﬁned safety conditions that must be satisﬁed by feasible plans [11]. To describe a domain to CIRCA, the user inputs a set of transition descriptions that implicitly deﬁne the set of reachable states. For example, Figure 3 illustrates several transitions used in the Puma domain. These transitions are of three types: Action transitions represent actions performed by the RTS. Temporal transitions represent the progression of time and continuous processes. Event transitions represent world occurrences as instantaneous state changes. The SSP plans by generating a nondeterministic ﬁnite automaton (NFA) from these transition descriptions. The SSP assigns to each reachable state either an action transition or no-op. Actions are selected to preempt transitions that lead to failure states and to drive the system towards states that satisfy as many goal propositions as possible. A planned action preempts a temporal transition when the action will deﬁnitely occur before the temporal transition could possibly occur. The assignment of actions determines the topology of the NFA (and so the set of reachable states): preemption of temporal transitions removes edges and assignment of actions adds them. System safety is guaranteed by planning action transitions that preempt all transitions to failure, making the failure state unreachable [11]. It is this ability to build plans that guarantee the correctness and timeliness of safety-preserving reactions that makes CIRCA suited to mission-critical applications in hard real-time domains. The NFA is translated into a memoryless controller for downloading to the RTS. This is 5 #<TAP 10> Tests Acts Max-per Runtime #<TAP 9> Tests : : : : (AND (PART_IN_GRIPPER NIL) (EMERGENCY T)) push_emergency_button 9984774 2520010 : (AND (PART_IN_GRIPPER NIL) (EMERGENCY NIL) (PART_ON_CONVEYOR T) (NOT (TYPE_OF_CONVEYOR_PART SQUARE))) Acts : pickup_unknown_part_from_conveyor Max-per : 12029856 Runtime : 3540010 #<TAP 8> Tests : (AND (TYPE_OF_CONVEYOR_PART SQUARE) (PART_IN_GRIPPER NIL) (EMERGENCY NIL)) Acts : pickup_known_part_from_conveyor Max-per : 12029856 Runtime : 3520010 Figure 4. Sample output from the TAP compiler. done through a two-step process. First, the action assignments in the NFA are compiled into a set of Test-Action Pairs (TAPs). The tests are a set of boolean expressions that distinguish between states where a particular action is and is not to be executed. Each TAP’s test expression is derived by examining all of the planned actions and ﬁnding a logical expression that distinguishes between the states in which the current TAP’s action is planned and the states in which other actions are planned. Some sample TAPs for the Puma domain are given in Figure 4. Eventually, the TAPs will be downloaded to the RTS to be executed. The RTS will loop over the set of TAPs, checking each test expression and executing the corresponding action if the test is satisﬁed. The tests consist only of sensing the agent’s environment, rather than checking any internal memory, so the RTS is asynchronous and memoryless. However, before the TAPs can be downloaded, they must be assembled into a loop that will meet all of the planned deadlines, captured as constraints on the maximum period of the TAPs (see Figure 4). This second phase of the translation process is done by the scheduler in the CSM. In this phase, CIRCA’s scheduler veriﬁes that all actions in the TAP loop will be executed quickly enough to preempt the transitions that the planner has determined need preempting. The tests and actions that the RTS can execute as part of its TAPs have associated worst-case execution times that are used to verify the schedule. If the scheduling 6 Goals Transition Descriptions PLANNER NFA Temporal Constraints TAP Compiler TAPs SCHEDULER Verified TAP Schedule Figure 5. Summary of the CIRCA state space planning process. does not succeed, the SSP will backtrack to revise the NFA, leading to a new set of TAPs and another scheduling attempt. The planning process is summarized in Figure 5. 3. 3.1. Verifying State Space Plans The CIRCA SSP The CIRCA SSP automatically synthesizes timed discrete-event controllers for hard real-time applications. The input to the SSP is a description of a control problem in the form of environment dynamics (including uncontrollable processes and threats to system safety), actions available to the controller, and goals to be realized. The SSP returns a controller that is guaranteed to maintain the safety of the controlled system. The controller speciﬁes what action should be taken for each reachable system state. The controller provides safety guarantees by meeting the timing requirements of the control problem; these timing requirements are inferred from the model of the uncontrollable processes that threaten the system. To determine that these timing requirements are met, our algorithm consults a model-checker for real-time automata. This model-checking is done on an incremental basis, as the controller is built. For example, Figure 6 contains the transition descriptions for a simple UAV control problem. The transitions describe a problem in which a UAV is attempting to follow a normal ﬂight path (hence the *goals* statement). However, at any time during its ﬂight, the UAV might be tracked by enemy radar. Some time after the initial tracking, a surface-to-air missile (SAM) may be launched. If no countermeasures are taken, that SAM may destroy the UAV after at least a certain minimum amount of time has passed (e.g., the minimum ﬂight time of the missile). The UAV has available to it some evasive 7 maneuvers that will cause the SAM to miss the UAV, if the UAV initiates its maneuvers quickly enough. Also, since the maneuvers divert the UAV from its nominal trajectory, the UAV should end its evasive behavior whenever possible. Figure 7 shows the state space resulting from a simple timed controller design that will preserve the safety of the UAV. In the initial state, labeled “State 17” and shown as a shaded oval, the UAV is on its normal trajectory and has no indication of a radar-guided missile tracking it. This is a desirable state, so the controller will make no eﬀort to leave it. However, at any time, a radar threat could occur, moving the system into state 16. The controller will react to this threat by taking evasive action, and maintaining the evasive maneuvers until the missile has been avoided (i.e., until the system has entered state 24). At this time the threat has been neutralized, and the system is free to return to its normal ﬂight path. This controller was automatically generated by CIRCA, and the state diagram was generated from CIRCA data structures by the daVinci program [5]. There are several important aspects to note about this example state space model, or ﬁnite automaton. First, note that the automaton contains loops: the UAV may be threatened by more than one missile, and will remain in (or re-start) evasive maneuvers as long as it is threatened. Second, observe that time is not an explicit part of the state representation. This is critical to the compact representation of looping plans; if we included time in the state representation, then loops would not occur and persistent reactive control against an unpredictable or adversarial world would explode the state space. Instead, our automaton neatly encodes the continously-reactive behavior of the UAV in a compact, eﬃcient, and automatically-generated form. Of course, the transitions do have temporal semantics, as described in Figure 6. The SSP’s temporal model was carefully designed to support reasoning about system safety with only a minimal amount of temporal information, thus limiting the complexity of the automata model. We associate with each transition a set of bounds on the time (∆) which the system must dwell in the transition’s source state before the transition could possibly occur. The model includes four diﬀerent types of transitions: Temporal Transitions — Drawn as double arrows, temporal transitions represent uncertain processes that may lead to change, but only after at least some minimum amount of time has passed (∆ ≥ min∆). The only temporal transitions in our simple UAV example lead to failure, and are not shown in Figure 7 because the safety-preserving controller design makes failure unreachable. Event Transitions — Drawn as single arrows, event transitions represent instantaneous transitions that are out of our control, and may happen any time their preconditions are satisﬁed. They are essentially the same as temporal transitions with a min∆ of zero. Action Transitions — Drawn as dashed arrows, action transitions represent processes that are guaranteed to occur before the system has dwelled a certain amount of time in the source state. That is, action transitions will deﬁnitely occur before ∆ reaches 8 (setf *goals* ’((path normal))) ;; Radar-guided missile threats can occur at any time. (make-instance ’event :name "radar_threat" :preconds ’((radar_missile_tracking F)) :postconds ’((radar_missile_tracking T))) ;; You die if don’t defeat a threat by 1200 time units. (make-instance ’temporal :name "radar_threat_kills_you" :preconds ’((radar_missile_tracking T)) :postconds ’((failure T)) :min-delay 1200) ;; It takes no more than 10 time units to start evasives. (make-instance ’action :name "begin_evasive" :preconds ’((path normal)) :postconds ’((path evasive)) :max-delay 10) ;; We defeat missile in between 250 and 400 time units. (make-instance ’reliable-temporal :name "evade_radar_missile" :preconds ’((radar_missile_tracking T) (path evasive)) :postconds ’((radar_missile_tracking F)) :delay (make-range 250 400)) ;; It takes no more than 10 time units to end evasives. (make-instance ’action :name "end_evasive" :preconds ’((path evasive)) :postconds ’((path normal)) :max-delay 10) Figure 6. A simple domain description for a UAV threatened by radar-guided missiles. 9 State 23 (PATH EVASIVE) (RADAR_MISSILE_TRACKING T) evade_radar_missile State 24 (PATH EVASIVE) (RADAR_MISSILE_TRACKING F) end_evasive radar_threat State 17 (PATH NORMAL) (RADAR_MISSILE_TRACKING F) radar_threat temporal action event reliable temporal State 16 (PATH NORMAL) (RADAR_MISSILE_TRACKING T) begin_evasive Figure 7. Simple UAV controller for evading radar-guided SAM threats. 10 an upper bound max∆. Reliable Temporal Transitions — Drawn as bold single arrows, reliable temporal transitions represent processes that are guaranteed to occur, if given enough time. They have both lower and upper bounds on the dwell time the system must stay in the source state before the reliable temporal transition will occur (min∆ < ∆ < max∆). Using this information, the SSP reasons about one key temporal relationship: preemption. A transition t is preempted iﬀ some other transition u from the same state must deﬁnitely occur before t could possibly occur. In other words, t is preempted iﬀ max∆(u) < min∆(t). In our UAV example, the radar_threat_kills_you transition is preempted in state 16 by the action transition begin_evasive. Preemption is the key temporal relationship in CIRCA models because it allows the SSP to build discrete event controllers that make certain parts of the potential system state space unreachable. By making all potential failure states unreachable, the SSP can build plans (controllers) that are guaranteed to keep the system safe, while also pursuing other less-critical goals. The goal of plan veriﬁcation, discussed in the next section, is to prove that the preemptions CIRCA has planned will in fact hold true for all possible future world “trajectories” (i.e., paths through the reachable states). Note that the begin_evasive action does not actually disable the radar_threat_kills_you transition: it simply begins the process of defeating the threat, which is represented by the reliable temporal transition evade_radar_missile. So if we were to draw the temporal transitions to failure (TTFs) in the graph of Figure 7, we’d see that the radar_threat_kills_you TTF is actually preempted out of both state 16 and the subsequent state 23. This is called a dependent temporal chain, because the amount of time left to preempt the TTF in state 23 is not the original minimum dwell time (as it was in state 16), but the original min∆ minus however much time the system may have dwelled in state 16 before transitioning to state 23. Since CIRCA reasons about worst-case circumstances, we can assume that that dwell time, in the worst case, is equivalent to the upper bound dwell time (max∆) imposed by the planned action begin_evasive, so that the new min∆ in state 23 is actually 1200 − 10 = 1190. Thus our temporal model is actually non-Markovian: the temporal semantics of the TTF out of state 23 depend on the path the system takes to get there. Naturally, this complicates the process of reasoning about the temporal model, and motivates our use of model checking to verify the required TTF-preemption properties. 3.2. Model Checking for Plan Veriﬁcation In order to verify that the CIRCA SSP’s plans are safe, we must project what will happen when they are executed. We must determine whether the actions we have planned do, in fact, preempt all possible transitions to failure. To do so, we use techniques developed in the computer-aided veriﬁcation research community; speciﬁcally we use techniques for 11 verifying properties of timed automata [1]. A naive algorithm for CIRCA plan veriﬁcation is easy to propose: start at the initial state(s), ﬁnd all the possible successor states, and repeat. If you ever enter a failure state, the veriﬁcation has failed. The problem with this algorithm is hidden in the deﬁnition of system state. To determine the possible successor states, we must know how long transitions have been enabled. For example, to determine at state 23 whether radar_threat_kills_you happens before or after evade_radar_missile, we must know whether the former transition has been active for 1200 time units before the latter has been active for 400 (see Figure 6). Imagine that each transition has associated with it a timer, or “clock.” When the transition is enabled, that clock is reset to zero and started. When the transition is disabled, that clock is turned oﬀ. Whenever that clock goes over the lower bound on the corresponding transition, the transition may occur; the transition is guaranteed to occur before the upper bound on the transition (unless some other transition intervenes). Thus we can characterize the full state of the controlled system by the full set of feature values and a vector of artiﬁcial clock values. For example: flight_path = evasive radar_missile_tracking = true clock(evade_radar_missile) = 40 clock(radar_threat_kills_you) = 700 By comparing this state against the problem deﬁnition given in Figure 6, you may readily see that this state is safe. radar_threat_kills_you cannot take place for 500 more time units, by which time evade_radar_missile will have preempted it. Unfortunately, the veriﬁcation problem, as naively framed, is not practically solvable. Since the clocks are integer-valued,1 the set of system states is inﬁnitely large. However, the set of interesting values is less than inﬁnite, since there are only a ﬁnite number of decisions that need be made. For example, all values of clock(radar_threat_kills_you) that are over 1200 are equivalent. However, the number of relevant states may still be very large. 3.2.1. Timed Automata Representation Fortunately, researchers in computer-aided veriﬁcation have found ways to compactly represent states like this for a class of ﬁnite state machines called timed automata [1]. Timed automata diﬀer in a few minor ways from SSP state machines, but SSP state machines can be translated into timed automata. Timed automata states are composed of a location (corresponding to an SSP state, or feature vector) and a clock-interpretation, or vector of clock values. All of the clocks increment synchronously, but can be independently reset to zero by selected transitions. Transitions themselves are instantaneous. Temporal constraints in timed automata take two forms: transition guard expressions that must be true to enable a transition, and state invariant expressions that must be true all the time 1 Although time is continuous, it may be discretized without loss of accuracy for any veriﬁcation problem. 12 the system remains in a particular state. Mapping an SSP state space model into a timed automaton is a fairly simple matter of assigning diﬀerent clocks to diﬀerent CIRCA transitions and translating the CIRCA transition timing constraints into timed automaton clock constraints. Once this translation is complete, the timed automaton model can be passed to our model-checking code, the Real-Time Analysis (RTA) module, to determine whether failure is reachable and, if so, what path of transitions leads to failure (to guide CIRCA’s intelligent backjumping). Figure 8a illustrates the RTA timed automaton that corresponds to CIRCA’s solution to the UAV example of Figure 7. Brieﬂy, the automatic translation process involves mapping each type of SSP state space transition, as follows: Temporal Transitions — Temporal transitions require the system to dwell in a state for a certain amount of time before the transition may occur. This corresponds exactly to a transition guard expression in RTA. Thus temporal edges are each assigned a clock, and have guard expressions constraining the value of that clock to be greater than the temporal transition’s minimum delay. The clock is reset by all edges entering the source state of the temporal edge, if that edge does not come from a state in which the same temporal is enabled. Event Transitions — Because event transitions can occur at any time, they have no associated clocks and are simply unrestricted edges in the RTA graph. Action Transitions — Recall that action transitions place an upper bound on the time the system may dwell in the transition’s source state before it necessarily will move to the transition’s sink state. In our RTA model, this corresponds to an upper bound state invariant expression. Each instance of an action transition (action edge) is assigned a new clock. The clock is reset by all edges entering the source state of the action edge, if that edge does not come from a state in which the same action is enabled. The action edge itself has no guarding clock constraints; instead, the action edge’s upper bound is expressed as an invariant in the edge’s source state. Reliable Temporal Transitions — Reliable temporals combine the lower-bound and upper-bound timing constraints of temporals and actions, so their RTA mapping uses transition guards to represent the lower bounds and state invariants to represent the upper bounds. 3.2.2. Eﬃcient Model Checking The critical concept for taming the complexity of timed automaton veriﬁcation is an equivalence relation (“region equivalence”) between system states [1]. This equivalence relation makes use of the intuition that all values for a given clock are equivalent above a maximum value (the largest constant the clock is ever compared to). Furthermore, since we are only concerned with the reachability of various states, the actual values of diﬀerent clocks in a state are not as important as their relative values. Because the clocks are all notionally incremented at the same rate, the relationships between the clock values upon entry to a state is suﬃcient to determine which outgoing transitions are possible: a clock 13 RTA−State 0 Location 0 = Invar: () #(0 1) #(0 1) #(0 2) #(0 1) #(0 2) #(0 2) #(0 2) #(0 2) #(0 2) #(0 2) RTA−Location 0 NIL Invariant: () #(0 #(0 #(0 #(0 #(0 1) 2) 1) 2) 2) #(0 #(0 #(0 #(0 #(0 1) 2) 2) 1) 2) #(0 #(0 #(0 #(0 #(0 1) 2) 2) 2) 1) #(0 #(0 #(0 #(0 #(0 1) 1) 1) 1) 1) initial transition Guard: () Resets: (1 2 3 4) RTA−State 1 Location 1 = Invar: () #(0 1) #(0 1) #(0 1) #(0 1) #(0 1) #(0 1) #(0 1) #(0 1) #(0 1) #(0 1) initial transition Guard: () Resets: (1 2 3 4) RTA−Location 1 SSP−State 17 (PATH NORMAL) (RADAR_MISSILE_TRACKING F) Invariant: () SSP−State NIL SSP−State 17 #(0 #(0 #(0 #(0 #(0 1) 1) 1) 1) 1) #(0 #(0 #(0 #(0 #(0 1) 1) 1) 1) 1) radar_threat Guard: () Resets: (4 3) RTA−State 2 Location 5 = SSP−State 16 Invar: (c(4) <= 10) #(0 1) #(0 1) #(0 1) #(0 1) #(0 1) #(1201 0) #(0 1) #(0 1) #(1201 0) #(1201 0) #(1201 0) #(0 1) #(0 1) #(1201 0) #(1201 0) #(0 1) #(0 1) #(0 1) #(0 1) #(0 1) #(0 1) #(0 1) #(0 1) #(0 1) #(0 1) radar_threat Guard: () Resets: (4 3) begin_evasive Guard: () Resets: (2) RTA−Location 5 SSP−State 16 (PATH NORMAL) (RADAR_MISSILE_TRACKING T) Invariant: (c(4) <= 10) begin_evasive Guard: () Resets: (2) RTA−State 3 Location 3 = SSP−State 23 Invar: (c(2) <= 400) #(0 1) #(0 1) #(0 1) #(0 1) #(0 1) #(1201 0) #(0 1) #(1201 0) #(1201 0) #(1201 0) #(0 1) #(0 1) #(0 1) #(0 1) #(0 1) #(10 1) #(0 1) #(10 1) #(0 1) #(0 1) #(10 1) #(0 1) #(10 1) #(0 1) #(0 1) radar_threat_kills_you Guard: (c(3) >= 1200) Resets: NIL evade_radar_missile Guard: (c(2) >= 250) Resets: (1) RTA−Location 3 SSP−State 23 (PATH EVASIVE) (RADAR_MISSILE_TRACKING T) Invariant: (c(2) <= 400) evade_radar_missile Guard: (c(2) >= 250) Resets: (1) RTA−State 4 Location 4 = SSP−State 24 Invar: (c(1) <= 10) #(0 1) #(0 1) #(−250 1) #(−250 1) #(−250 1) #(0 1) #(0 1) #(−250 1) #(−250 1) #(−250 1) #(400 1) #(400 1) #(0 1) #(0 1) #(0 1) #(410 1) #(410 1) #(10 1) #(0 1) #(0 1) #(410 1) #(410 1) #(10 1) #(0 1) #(0 1) radar_threat_kills_you Guard: (c(3) >= 1200) Resets: NIL RTA−Location 4 SSP−State 24 (PATH EVASIVE) (RADAR_MISSILE_TRACKING F) Invariant: (c(1) <= 10) radar_threat Guard: () Resets: (3 2) RTA−State 5 Location 3 = SSP−State 23 Invar: (c(2) <= 400) #(0 1) #(0 1) #(0 1) #(0 1) #(−250 1) #(10 1) #(0 1) #(10 1) #(10 1) #(−250 1) #(0 1) #(0 1) #(0 1) #(0 1) #(−250 1) #(0 1) #(0 1) #(0 1) #(0 1) #(−250 1) #(420 1) #(410 1) #(420 1) #(420 1) #(0 1) RTA−Location 2 FAILURE Invariant: () radar_threat Guard: () Resets: (3 2) end_evasive Guard: () Resets: NIL end_evasive Guard: () Resets: NIL evade_radar_missile Guard: (c(2) >= 250) Resets: (1) (a) Input ﬁnite automata. (b) Clock-zone expansion. Figure 8. The input model and clock zone analysis for the example UAV domain. 14 that is behind another cannot catch up (within a state). Based on this equivalence relation, it can be shown that any timed automaton (SSP plan) has only a ﬁnite number of states.2 Therefore, the problem of determining reachability (SSP plan veriﬁcation) is decidable. A further optimization is possible, to make veriﬁcation practical. The key intuition behind this optimization is that all reachability questions hinge on pairwise comparisons between clock values. In order to determine whether or not one transition can occur, we compare a single clock against a constant. To determine whether one transition occurs before another, we only need to determine which will reach its associated constant ﬁrst. To answer this question, we only need to know the diﬀerence between pairs of clock values (since the clock values increase at the same rate). Therefore, we can compactly represent clock regions using a diﬀerence-bound matrix [4] whose entries represent bounds on the diﬀerence between pairs of clocks and between single clocks and a dummy clock whose value is always zero. Diﬀerence-bound matrices have two advantages. First, they provide a compact representation for equivalence classes of clock-states in timed automata. Second, they also have a canonical form, derived using any standard all-pairs shortest-path algorithm [4]. Putting the associated diﬀerence-bound matrices into canonical form makes it easy to determine when two automaton states are equivalent. Recognizing equivalent states is, in turn, necessary in order for reachability search to terminate. Figure 8b illustrates the reachability veriﬁcation of the SSP plan given in Figure 6, optimized by the use of diﬀerence-bound matrices. Space limits preclude us from describing the diﬀerence bound notation in detail. However, a simple examination of Figure 8b shows one notable aspect of the RTA veriﬁcation: there are two RTA states (3 and 5) that correspond to the SSP state 23. That is, the RTA algorithm has recognized the distinction between the two routes into SSP state 23 (see Figure 7) as being a temporally signiﬁcant diﬀerence. The temporal transition to failure from state 23 will have diﬀerent amounts of time left on its clock depending on whether we enter from state 16, where it was already enabled, or state 24, where it was not enabled (see Figure 8a, RTA-locations 5 and 4). Thus the RTA algorithm is unrolling the important paths through dependent temporal chains, checking reachability of failure by removing the original non-Markovian temporal semantics. 3.3. Multi-Model Veriﬁcation The original model-checking interface captured the connectivity and temporal restrictions of the SSP’s state space graph and sent it Kronos for reachability veriﬁcation, to make sure that failure is not reachable. One weakness with this approach is that it relies on our SSP implementation to reason eﬀectively about the “execution semantics” of the actual RTS TAP plans that will be generated by the SSP. That is, we relied on our SSP code to correctly map the SSP state space model into TAPs: we were verifying the SSP state space 2 More precisely, there are only a ﬁnite number of state equivalence classes, and state equivalence classes are suﬃcient to determine reachability. 15 model, which was then converted into TAPs after the veriﬁcation. The alternative is to actually map TAPs directly into Kronos simulation/veriﬁcation models, so that Kronos can reason about the interactions of the planned TAPs directly with the SSP’s world model, and essentially verify that the SSP has correctly built both its state space model and the TAP plan. This approach uses multiple concurrent state machine models to represent all exogeneous transitions and TAPs. This “multi-model” analysis approach will verify that the TAP-based RTS implementation of a CIRCA plan will actually support all the performance guarantees that the SSP believes it will. This approach also oﬀers the opportunity to take advantage of Kronos’ eﬃcient state-space cross-product behavior: the model checker is good at deriving minimal cross-products that avoid enumerating unnecessary system states. Unfortunately, once we implemented the necessary interfaces translating the CIRCA TAP plans into multi-model veriﬁcation problems, we encountered signiﬁcant scalability problems with Kronos. When it detects a veriﬁcation failure (because some failure state is reachable in a tentative SSP plan), Kronos is unable to eﬃciently produce an example state-space path leading from the start state to failure. Apparently because Kronos does some pre-enumeration of the combinatorially-explosive cross product of multiple state machines, it does not scale well on these problems. As a result, we also be began extending our customized RTA veriﬁer to the multi-model case. The multi-model RTA (MMRTA) system has been prototyped and passed initial tests, but we suspended the remaining work (prior to ANTS funding) to focus on the system demonstrations and ﬁnal report. The remaining MMRTA work, which is not too complicated, is to automate the translation of a CIRCA SSP veriﬁcation problem into the multi-model RTA input format. Note that the MMRTA will inherently reason about “ghosting,” which is our term for actions that were planned for some state A, but occur in some diﬀerent state B, because of an exogenous transition that moves the world from A to B after the RTS has sensed A, but before the RTS has executed the action. The MMRTA will automatically reason about these situations since it will expand the state space including possible orderings of TAP activities and exogenous transitions. While we expect MMRTA to inherently detect problem situations where ghosted actions lead to failures, it will take some additional work to modify the SSP to use that information in guiding its search/backjumping. 3.4. Related Work The CIRCA SSP is a reaction planner, and thus has much in common with work on reactive systems in AI and control theory. CIRCA is unusual in two ways: it automatically synthesizes, or plans, its reactions, and it provides performance guarantees through the methods of hard real time. In independently-developed work, Asarin, Maler, Pneuli and Sifakis [2, 9] developed a game-theoretic method of synthesizing real-time controllers. They view the problem as trying to “force a win” against the environment, by guaranteeing that all traces of system 16 execution will pass through (avoid) a set of desirable (undesirable) states. Their method is very similar to ours, but their work stopped at the development of the algorithm and derivation of complexity bounds; it was never implemented. Our work concentrates on the implementation aspects of this problem and on making it computationally practical. Kabanza, et. al. have developed work [6, 7] very similar to ours in intention. Their early work (fully presented in [7]) is similar to the original CIRCA State Space Planner work, but does not take into account metric temporal information. Later work [6], extends the original framework by incorporating metric time, but does so by eﬀectively imposing a system-wide clock and progressing the controller one “tick” at a time. In control problems with widely varying time constants, this approach will lead to an explosion of states; we have adopted model-checking techniques that minimize this state explosion. Markov Decision Processes and Partially-Observable Markov Decision Processes provide a theoretical basis for planning and action that is similar to discrete control theory, but they place the accent on uncertainty [3]. CIRCA simply represents uncertainty through nondeterminism: CIRCA transitions may have alternative outcomes; uncontrollable events may or may not occur; etc. The SSP techniques discussed in this paper do not attempt to reason about quantiﬁed measures of uncertainty, they make the worst-case assumption: “anything bad that can happen will happen.” However, there has been some preliminary work on developing a probability model for CIRCA, to permit principled model-pruning decisions [8]. 4. The Probabilistic State Space Planner When faced with overconstrained problems, the CSM described above is unable to ﬁnd a safety-preserving plan, and it returns failure to the AMP. At that time, the AMP is designed to relax one or of the constraints on the CSM, perhaps by redesigning the problem conﬁguration to include a faster action transition, or modifying a goal. Our University of Michigan colleagues have been working on a new version of the SSP that uses probabilistic information to guide the system in considering the most-probable states ﬁrst. As a result, the Probabilistic State Space Planner (PSSP) is able to build partial plans that are only probabilistically safe, because they only consider those states whose likelihood is above a given threshold. This type of partial plan is useful in situations where the domain is highly challenging, and the CSM may have only a limited amount of computation time to come up with a suitable control plan for the next phase of CIRCA operations. Or, the RTS itself may have limited abilities to sense and react to the world, and these may preclude the CSM from generating a complete real-time controller, and require some performance tradeoﬀs. In any case, by explicitly pruning least-probable areas of the state space, the PSSP approach allows CIRCA to optimize its allocation of reactive resources. Moreover, if additional planning-time resources are available and the RTS is the limiting factor, the system can build contingency plans to handle pruned areas of the state space and swap those plans in when the pruned, less-likely situations arise. Michigan’s progress on this system are described in the attached technical paper, 17 Appendix ??. The results of their work are illustrated in the demonstrations described in Section 6. 5. The Adaptive Mission Planner Our investigation into the CIRCA Adaptive Mission Planner (AMP) pursued several research branches before ending prematurely due to an unexpected budget reduction. We began by clarifying the role of the AMP, and designing the problems it must solve to control and guide the CSM. Since the AMP requirements are similar to existing AI planners (much more so than the CSM), we explored the possibility of using an existing planner, with enhancements, to ﬁll the AMP role. As described below, we investigated the SIPE-2 and Prodigy planners in depth, and began prototyping experiments. 5.1. Requirements for the AMP Based on prior experience with the CIRCA architecture and our scenario designs for the UCAV demonstration domain, we developed a set of general functional requirements for the AMP. We want the AMP to: Project — The system must be able to reason about potential future states of the world, to be a true projective planner (rather than simply reacting to the current world state). Search — The system must maintain an explicit representation of planning decisions for possible re-evaluation and change. Build Phases for the CSM — The primary AMP responsibility is to divide up the overall problem state space into a series of overlapping regions of competency, or phases. The CSM will then build real-time reactive control plans for each phase. Modify Phases when the CSM Fails — If the CSM fails to generate a safe controller for a given subproblem, the AMP must modify its plans to reduce the complexity of that phase. Incorporate Changes During Execution — Because the AMP is the long-term planner in CIRCA, it must be able to handle deviations from its coarse plan as the situation progresses. Set up Execution-Time Sentinels/Monitors — The AMP must be able to automatically monitor its own performance to detect deviations from the plan. Plan for Domain Resources — The AMP must build plans that account for domain resources such as fuel. Manage Reasoning Resources — The AMP must control the CIRCA reasoning process itself, so that the CSM builds controllers in a timely fashion; if the phase problems sent to the CSM are badly formed, the CSM may never return success or failure. The AMP must monitor CSM performance and adjust its plans to meet the soft real-time deadlines imposed by the domain. Incorporate New Operators — The SA-CIRCA architecture calls for the ability to synthesize new code modules in response to changing performance results; the AMP 18 START REGIONS OF REACTIVE PLAN COMPETENCE (STABLE REGIONS) ENTIRE STATE SPACE GOAL Figure 9. Conceptual view of multiple reactive controllers. must be able to reason about new, dynamically-created domain operators that can be used to improve performance. Optional functional requirements include abilities to: Plan for Multiple Agents — Distributed applications of CIRCA will require the AMP to communicate and coordinate with other agents to eﬀectively manage roles, responsibilities, and closely-coordinated behaviors. Direct Execution — The AMP should direct the overall execution of real-time control plans by managing the plan cache in the RTS; this requires interleaving execution and planning. Synthesize New Operators — In the full SA-CIRCA design, one way the AMP may respond to performance deviations is to actually design and synthesize new operators that give it missing domain capabilities. 5.2. Interface to the CSM While developing these requirements, it became clear that there is a fundamental mismatch between the output of a standard planner (a plan), and the input we require for the CSM. To understand this problem, recall that the AMP is notionally responsible for dividing up the overall problem state space into a series of overlapping regions of competency, as shown in Figure 9. The CSM will then build real-time reactive control plans for each region of competency. If the CSM is unable to build a feasible plan for a particular region, then the AMP must reorganize/replan to develop a diﬀerent breakdown of the state space, or possibly give up on some of its goals. In essence, the AMP is responsible for breaking up a large problem into smaller problems for the CSM to solve, and adjusting that breakup to compensate for diﬃculties the CSM may encounter. The interface between the AMP and CSM is non-obvious because the CSM requires as input a “problem conﬁguration” consisting of a set of temporal and event transitions 19 (describing the domain events and processes), a set of action transitions (describing control actions that can be taken), a set of starting states, and a set of goal states (or conditions). In contrast, a traditional planner’s output is a plan consisting of a set of primitive operators, possibly with associated parameter bindings. We want the AMP to be able to search for suitable sequences of problem conﬁgurations, potentially varying any and all aspects of the conﬁgurations (e.g., adding new action transitions when automatic code synthesis provides custom-built primitives, removing temporal transitions with low probabilities, varying action transition delay parameters, etc.). How can we map an AMP’s operator/parameter plan formulation into the complex structure of a CSM problem conﬁguration? The unique, very powerful approach we are currently pursuing involves extending the domain representation of the AMP to include explicit representation of the problem conﬁguration elements. That is, rather than have the AMP only think about the domain, we have the AMP also think explicitly, in its search space, about the CSM problem conﬁgurations. The AMP will plan with both domain operators and “conﬁguration operators” that modify the problems sent to the CSM. For example, high-level UCAV domain activities include aircraft trajectories, missile threats, etc. These will be reﬂected in the AMP representation as “domain operators.” The AMP will use these domain operators to build a coarse-grain plan through the domain, specifying, for example, that the mission will consist of an ingress phase, an attack phase, and an egress phase. The AMP will use the CSM to develop more detailed controllers to invoke during each of these phases. To task the CSM with problem conﬁgurations for each of these phases, the AMP must also decide, for example, which of several reactive evasive-maneuver actions should be made available to the CSM for use in a particular phase. The AMP will have “conﬁguration operators” that can add or remove those action transitions from the problem conﬁguration sent to the CSM. The conﬁguration operators will not directly aﬀect the AMP’s domain model, because they are below the level of detail that the AMP is projecting the world. However, by changing the details of the problem conﬁgurations sent to the CSM, these conﬁguration operators may have signiﬁcant impact on the CSM’s results. Now suppose the CSM is unable to produce a safe controller for the given problem conﬁguration, and the AMP must make some change. Because the conﬁguration operators that built the CSM problem conﬁguration are treated just like domain operators, the AMP’s search machinery works on them just like other operators. So the AMP can search over the set of possible problem conﬁgurations by searching over its conﬁguration operator choices. In other words, the conﬁguration operators bring the CSM problem conﬁgurations under the direct control of the AMP search machinery, which is exactly what we want: the AMP can now reason explicitly about how it is breaking up larger problems into smaller ones, and how the smaller problems are structured. 20 Feature Projection Search Build CSM Phases Modify on CSM Failure Handle Execution Failures Set up Monitors Plan for Domain Resources Manage Reasoning Resources Incorporate New Operators Multi Agent Planning Direct Execution Synthesize New Operators SIPE 2 Prodigy √ √ √ √ √ √ √ √ √ √ PRS MACBeth √ √ √ √ √ √ √ √ √ √ √ √ √ Figure 10. Comparing existing planners against AMP requirements, we ﬁnd no perfect match. 5.3. Potential Starting Points Since the AMP requirements are similar to existing AI planners, we explored the possibility of using an existing planner, with enhancements, to ﬁll the AMP role. Based on our knowledge of the planning literature, we selected several diﬀerent larger-scale planning systems (that have been relatively well-developed and used) to evaluate against the AMP requirements. Figure 10 summarizes our comparison against SIPE-2, Prodigy, PRS, and MACBeth. Based on this comparison, we tried to build initial AMP functionality in SIPE-2, as described in the next section. When this eﬀort failed, we moved on to Prodigy, and the project budget cut occurred shortly after our investigation into Prodigy was concluded. PRS was considered, but because it is fundamentally a reactive system and not a projective planner, it is most useful for “hand-coding” or programming AMP behaviors. PRS does not work by building a plan of future actions and revising it as execution occurs; instead, PRS repeatedly selects only the next action to take. This makes it very ﬂexible and responsive to changing circumstances, but subject to the short-sightedness of reactivity. For the AMP, this means PRS is useful for quickly building known functions and behaviors (as was done in the original CIRCA implementation [10]), but not as a long-term solution capable of model-based adaptivity. MACBeth is a constraint-based planner using hierarchical task networks, developed at Honeywell for tasking and control of robot and UAV systems. MACBeth does not have the long history of development and applications shared by the other systems we evaluated, 21 but it has powerful resource reasoning based on constraints. Major problems with MACBeth include its lack of interleaved execution and monitoring, and the diﬃculty of incorporating new planning operators in the code. 5.4. Using SIPE-2 for the AMP SRI’s SIPE-2 planner is a well-established system that appears to provide many of the features we require, and it has been used in several commercial real-world applications. The government funded SIPE’s development, and Rome Labs already has rights to the software. We obtained object-code versions of the system directly from SIPE-2’s primary developer, Dr. David Wilkins at SRI. SIPE-2 is an HTN-style planner, meaning that it builds plans by joining together pieces of hierarchical task networks. The documentation indicates that SIPE-2 supports reasoning about domain resources and parallel actions (nonlinear plans). The planner works by repeatedly selecting problems with the current partial plan and resolving them, using code modules called critics. These plan critics check the global constraint network for inconsistencies, identifying resource conﬂicts and problematic interactions between unordered actions. The planner can represent context-dependent eﬀects, supports interactive or automated search for solutions, and has some replanning functionality. The system also includes a GUI and a few small demonstration domains. Unfortunately, while the outward appearance gives a good match to our needs, we found it impractical to work with SIPE-2. Initially, we encountered numerous small but fatal bugs in the GUI, interaction metaphors, and demonstration scripts. We engaged in several iterations of bug-reports and patches with SRI, but the SIPE-2 user interface is particularly clumsy and the complexity of the system’s behavior is a challenge. SRI personnel including Dr. David Wilkins and Dr. Karen Myers were very helpful in getting us up to speed. Once we got the system working on its example domains, we implemented very preliminary UAV-like domains and generated simple plans in these domains. In one of the SRI demonstration domains, we were also able to get the replanning functions to operate, although only on a very simple replanning problem. Eﬀorts to employ similar replanning in our UAV domains were futile. Expected SIPE-2 functions such as resource reasoning, replanning, and even heuristic search guidance are undocumented, more diﬃcult to use than expected, or possibly even unusable without the planner’s source code. In addition, it became clear that the SIPE-2 API is not suﬃcient for our needs, being largely undocumented. Brittleness in the system made it virtually impossible to develop domains that varied signiﬁcantly from the examples included with the distribution. Finally, after several months of eﬀort, we concluded that it was not feasible to use SIPE-2 without the source code, and we abandoned the eﬀort. 5.5. Using Prodigy for the AMP We re-examined our AMP requirements and decided to investigate the Prodigy planner from CMU to see if it would be easier to use for an AMP prototype. Prodigy is freely 22 available for research, and we already had the system, source code, and experts on-site. Prodigy is a goal regression planner that backchains from set of initial goals, building plans that always consist of a totally ordered sequence of operators. Prodigy operates in a nonlinear fashion, choosing which goal or subgoal to work on based on a variable heuristic formed from control rules. Prodigy can consider operators with limited types of conditional eﬀects, and it supports the notion of “applying” an operator to provide more complex forms of forward operator inference that cannot be regressed over. Preliminary results with Prodigy were much better than with SIPE-2: we implemented simple domains, debugged several problems with the planner’s source code in fairly short order, and identiﬁed several required enhancements that were only feasible because we had the source code. Positive aspects of Prodigy include: • • • • • Simple operational concept, well documented. Source code available, fairly well written. Useful directed backjumping and replanning. Simple support for interleaved planning and execution. In-house HTC expertise (Dr. Karen Haigh). In the process of building and testing a small UAV demonstration domain for Prodigy, we encountered several problems that indicate major weaknesses, for our purposes. These include: • • • • No true parallel actions. Little or no eﬀective resource reasoning. No way to handle goals of avoidance (e.g., don’t run out of fuel). No direct control of search backtracking. 5.5.1. Goals of Avoidance One of the key aspects of the UAV planning domain that we had diﬃculty encoding into Prodigy is the notion that the system must not produce plans that use up more fuel than the aircraft has. We’d like to express a “goal of avoidance:” always avoid the situation where fuel is zero or less. Note this is diﬀerent from standard goals of achievement (“get to the goal”), even with negation. Standard achievement goals are not true during some early part of the plan, and the plan makes them true. The plan is valid whether the goal is true or not. With a goal of avoidance, however, we must produce plans in which the undesirable situation never occurs: we must never run out of fuel. If we simply tell Prodigy that we’d like to have fuel greater than zero, it will happily run the fuel below zero and then try to add a new operator that will restore the goal (perhaps by refueling). Of course, it is not satisfactory to have a plan that runs out of fuel, and then tries to refuel. We need a new type of goal, for which any violation is treated as a planning failure and leads to backtracking/replanning. Implementing this type of goal requires fundamental changes to the planning behavior of Prodigy: it must check all of these goals of avoidance 23 each time it projects a new state, and initiate backtracking if any are violated. We made this change to the Prodigy code, and successfully demonstrated the new behavior. As a result, we can now properly handle certain types of avoidance goals. 5.5.2. Motivating the Use of Problem Conﬁguration Operators Perhaps the most interesting and planner-independent observation made during our prototyping eﬀorts was the challenge of “motivating” the Prodigy planner to include the appropriate CSM problem-conﬁguration operators into the mission plan. Prodigy plans actions that achieve a subgoal; the problem with conﬁguration operators is that they do not directly achieve any subgoal, but rather conﬁgure the description of the problem for the CSM to examine. For example, in a phase of the mission where a heat-seeking missile is a likely threat, we have several conﬁguration operators that should be included in the AMP plan so that, when that segment of the AMP plan is passed to the CSM for controller synthesis, the CSM accounts for that missile threat. We’d essentially like all of the applicable conﬁguration operators to be applied by default, and then have the AMP’s search mechanisms explicitly decide which conﬁguration operators to remove if necessary. For example, if the CSM is unable to build a feasible plan with the missile threat accounted for, the AMP can either replan the route or decide to not worry much about the missile threat, omitting those conﬁguration operators from the plan and thus “assuaging” the CSM’s conservative worries. We have not yet solved the problem of motivating the addition of conﬁguration operators; it appears that this functionality should also be added to the core planning behavior of the algorithm, but the exact deﬁnition and structure of the modiﬁcation is not yet complete. We face the diﬃculty that making a very large planning system like Prodigy do new things is very challenging. 6. Demonstrations As noted above, we guided the SA-CIRCA project design and development tasks by aiming towards scenarios in which SA-CIRCA controls a UAV engaged in combat missions. We hoped to obtain a ready-made simulation of UAV systems from Wright Patterson AFB, but licensing issues made it impossible for them to deliver the software to us. As a result, we were forced to enhance an existing CIRCA-compatible ﬂight simulation to provide simple combat-oriented simulation functions. 6.1. The UAV Simulation Leveraging prior work on ﬂying UAVs with CIRCA (both at Michigan and Honeywell), we began with a demonstration system capable of simulating a simple F-16 aircraft model ﬂying over high-resolution terrain. The simulation only ran on high-powered Silicon Graphics computers. The CIRCA controller was interfaced at a fairly high level, providing waypoints and other discrete commands (e.g., deploy ﬂaps) to the simulated aircraft. The simulation itself provided a simple autopilot to ﬂy towards waypoints, and an auto-throttle function to maintain appropriate speeds. 24 Figure 11. The demonstration simulation illustrates an SA-CIRCA-controlled aircraft responding to attacks with evasive maneuvers, ﬂares, chaﬀ, and counterattacks. For the SA-CIRCA demonstrations, the simulator was extensively modiﬁed to enable: • Deployment of chaﬀ and ﬂares. • Automated evasive maneuvers. • Ground-launched IR-guided and radar-guided missiles that track the aircraft and are distracted by ﬂares or chaﬀ, respectively. • Smoke trails for aircraft and missiles, to provide persistent trajectory display. • Projected future waypoint/path display. • Ground-based threat installations (SAM sites and IR launch sites). • Air-launched missiles that track ground targets. • Low-resolution graphics display to run on Michigan’s slower machines. • A rebuilt CIRCA command interface to allow easy expansion and provide access to the new control functions (e.g., ﬂare deployment). Figure 11 shows a screengrab from one simulation run, in which the UAV has been threatened by a surface-to-air missile (seen as a black line curving towards the aircraft) and it is responding with ﬂares (one of which is seen falling below the ﬂight path) and a counter-strike missile (seen arcing down onto the surface installation). 25 6.2. Demonstrations of the Architecture We encoded SA-CIRCA domain models for several variant UCAV demo scenarios, involving radar-guided and IR-guided missile threats and several actions that the aircraft must take in response to those threats (chaﬀ, ﬂares, evasive maneuvers), as well as target strike goals. For example, Figure 6 (on page 9) illustrates one fairly simple domain encoding in which only radar-guided missiles and evasive maneuvers are considered. Using these domains, we generated a variety of TAP controllers capable of responding to diﬀerent spectrums of threats. These controllers were tested in the simulation, and several of the runs were videotaped for presentation at the program review in July, 1999. Demonstrations showed the controllers responding in a timely fashion to threats, and also communicating with high-level replanning functions when non-critical goals were threatened. For example, in one scenario the aircraft responds to a missile threat by evasive maneuvers and chaﬀ, but the resulting divergence from the planned trajectory means that the aircraft will miss its assigned “time on target” if it resumes the prior path. So, recognizing that this goal is threatened, the system invokes a path planner on-the-ﬂy and derives a new trajectory that follows a signiﬁcantly diﬀerent path (actually passing through a diﬀerent river valley) to make up for the lost time. Our Michigan teammates used the same simulator, but modiﬁed the domain models to incorporate probabilistic information suited to their demonstration scenarios, described below. 6.3. Demonstrations of the PSSP The goal of the PSSP demonstrations was to show that the probability information can be used, in overconstrained domains, to shed “the right stuﬀ:” the lowest-probability states and contingencies. And, to show that explicit knowledge of what has been shed can be used to build separate contingency plans that detect and react when the improbable shed states actually occur. The underlying assumption is that the RTS isn’t capable of making all guarantees. To keep the demo/simulation simple, there are only 2 temporal transitions to failure (TTFs), associated with the two types of missiles. More realistically, we would assume that our plane would have many other failure modes to preempt. Here we assume that there are enough other important TTFs being addressed that the remaining resources are insuﬃcient to guarantee reactions for both radar-guided and IR-guided missiles. Figure 12 shows the probability rate functions used for the PSSP demos. Both IR and radar threats are equally likely from the states in which they are enabled. However, IR threats are enabled only when the aircraft is at low altitude, and radar threats only when it is at high altitude. Furthermore, IR threats are more likely to kill you if they occur. However, the aircraft is modeled as only occasionally (and, in fact, non-volitionally) moving to low altitude, and then always returning to high altitude using the climb 26 Figure 12. The PSSP probability rate functions for the demonstration domain. operator. These probability rate functions were deﬁned to set up the key question: if you can only guarantee reaction to one type of missile, which should it be? One simple approach to trying to answer this question is to look at the probabilities associated with the TTFs and ignore the least likely. In SA-CIRCA, this would correspond to having the AMP not even inform the CSM of some TTFs. There are arguments for and against this approach. The story we stress in this case is that simply looking at the TTFs (the probability of them occurring in a state in which they are enabled) in the knowledge base would lead us to believe that IR missiles are more likely to lead to failure when they are used. An approach that does not model the state space probabilities would thus decide to guarantee against IR missiles instead of radar missiles. This leads to the world model and controller design shown in Figure 13. In this case, only IR missiles are handled because responses could not be scheduled for everything. Of course, if radar-guided missiles are encountered, the plane is frequently destroyed, because the CSM has not even considered their existence. It turns out, however, that in this domain IR missiles are not actually encountered that often, because the plane is much less likely to be at low altitude. When the PSSP is given the full set of TTFs and generates the state space, it discovers that the likelihood of failing is actually greater for radar missiles: even though they are less likely to lead to failure in a state in which they are enabled than IR missiles are in a state where they are enabled, the probability of being in a state with radar missiles enabled is much higher. The resulting controller and world model (state space) are shown in Figure 14. Qualitatively, then, we might say that, in a particular mission, the probability of 27 Figure 13. The PSSP state space when only IR missile threats are handled. encountering no missiles is relatively low, and the probability of encountering only IR missiles is medium. The probability of encountering only radar missiles is high. The probability of encountering both kinds of missiles is low-to-medium. Using the probabilities, the PSSP determines that the no-missile/radar-missile combination is the more likely. So in the above, the plan formed handles the no-missile/IR-missile cases. Using the probability reasoning, the chances of survival are increased. However, the aircraft may still encounter both IR and radar missiles, and now it is not prepared for the IR type. So we actually allow the PSSP to build plans that explicitly handle regions of the state space that were pruned for earlier controller designs, and cache these new controllers as contingency plans. Figure 15 shows the resulting world model when the PSSP builds the same nominal plan as above, but also builds a detection TAP for recognizing the unhandled (IR missile) case, and pre-builds a contingency plan to handle this case. Note that the contingency plan itself “deadends” in a state where the IR-missile threat has been defeated. The contingency plan has its own detection TAP for reaching a deadend state, which in turn triggers the dispatcher to retrieve a suitable plan - which in this case is simply going back to the nominal plan. This series of tests illustrates the increasing robustness in the system thanks to modeling state probabilities and detecting/reacting-to unhandled states. 7. Conclusions The SA-CIRCA research has resulted in the development of two novel planning systems (controller synthesis algorithms). The ﬁrst combines AI search techniques and heuristics with formal model-checking methods to produce guaranteed, veriﬁed real-time control plans with useful formal properties. The second system uses probabilities to optimally 28 Figure 14. The PSSP state space when only radar missile threats are handled. Figure 15. The PSSP state space when radar missile threats are handled in one plan, and a contingency plan is built for IR missile threats. 29 allocate planning eﬀort to most-probable threats, and develop contingency plans to handle secondary threats that are less likely. These planning techniques will form key elements of future autonomous, self-evaluating, self-adaptive control systems that operate robustly in mission-critical environments. 30 References [1] R. Alur, “Timed Automata,” in NATO-ASI Summer School on Veriﬁcation of Digital and Hybrid Systems, 1998. [2] E. Asarin, O. Maler, and A. Pneuli, “Symbolic controller synthesis for discrete and timed systems,” in Proceedings of Hybrid Systems II, P. Antsaklis, W. Kohn, A. Nerode, and S. Sastry, editors, Springer Verlag, 1995. [3] C. Boutilier, T. Dean, and S. Hanks, “Decision-Theoretic Planning: Structural Assumptions and Computational Leverage,” Journal of Artiﬁcial Intelligence Research, vol. 11, pp. 1–94, July 1999. [4] D. L. Dill, “Timing Assumptions and Veriﬁcation of Finite-State Concurrent Systems,” in Automatic Veriﬁcation Methods for Finite State Systems, J. Sifakis, editor, pp. 197–212, Springer Verlag, Berlin, June 1989. Proceedings of the International Workshop. [5] M. Fröhlich and M. Werner, “Demonstration of the interactive Graph Visualization System daVinci,” in Proceedings of DIMACS Workshop on Graph Drawing ’94, R. Tamassia and I. Tollis, editors. Springer Verlag, 1995. [6] F. Kabanza, “On the Synthesis of Situation Control Rules under Exogenous Events,” in Theories of Action, Planning, and Robot Control: Bridging the Gap, C. Baral, editor, number WS-96-07, pp. 86–94. AAAI Press, 1996. [7] F. Kabanza, M. Barbeau, and R. St.-Denis, “Planning Control Rules for Reactive Agents,” Artiﬁcial Intelligence, vol. 95, no. 1, pp. 67–113, August 1997. [8] H. Li, E. Atkins, E. Durfee, and K. Shin, “Resource Allocation for a Limited Real-Time Agent Using a Temporal Probabilistic World Model,” forthcoming, December 1999. [9] O. Maler, A. Pneuli, and J. Sifakis, “On the synthesis of Discrete Controllers for Timed Systems,” in STACS 95: Theoretical Aspects of Computer Science, E. W. Mayr and C. Puech, editors, pp. 229–242, Springer Verlag, 1995. [10] D. J. Musliner, CIRCA: The Cooperative Intelligent Real-Time Control Architecture, PhD thesis, University of Michigan, Ann Arbor, 1993. Available as University of Maryland Computer Science Technical Report CS-TR-3157. [11] D. J. Musliner, E. H. Durfee, and K. G. Shin, “World Modeling for the Dynamic Construction of Real-Time Control Plans,” Artiﬁcial Intelligence, vol. 74, no. 1, pp. 83–127, March 1995. 31 Appendix A. CIRCA-II and the Probabilistic State Space Planner 32 Resource Allocation for a Limited Real-Time Agent Using a Temporal Probabilistic World Model Haksun Li, Ella Atkins*, Edmund Durfee, Kang Shin EECS Dept., University of Michigan, Ann Arbor MI *Aerospace Engineering Dept., Univ. of Maryland, College Park MD {haksunli, durfee, kgshin} @umich.edu, atkins@eng.umd.edu environmental dynamics to intervene. In the context of events that can lead to catastrophe, this simply means that the agent should (re)act fast enough to preempt1 such catastrophic events. In this paper, we present a framework for modeling the dynamics of time-dependent event probabilities and an algorithm for computing state probabilities. We thus can prioritize the states of the world by their state probabilities, so that the system can devote its resources to the most probable eventualities over the less probable ones. To ground our discussion, we describe our timedependent state-transition probability model in the context of CIRCA-II (Section 2). In Section 3, we describe how we model state transitions with a discrete model, and contrast this with a continuous time model in Section 4. We detail the state probability computation in Section 5 and contrast its performance to previous versions of CIRCA. In section 6, we discuss how a chain of state transitions complicates the problem and how we handle them. Section 7 describes our stochastic simulation to verify the correctness of the theory in practice. Section 8 concludes our works and suggests directions for development. Abstract In a dynamic, unpredictable world, a resource-limited agent cannot be ready to recognize and respond fast enough to every possible situation, and thus should prepare for situations that are more likely to occur, and that are more severe. The likelihood of encountering a particular world state, however, is dependent not only on the choices of what (re)actions the agent periodically considers, but also on how frequently it considers them. We have developed and tested a framework for modeling the dynamics of time-dependent event probabilities and an algorithm for computing state probabilities, so that our resource-limited agent can devote its resources to the most probable eventualities over the less probable ones. 1. Introduction An agent with limited perceptual, computational, and actuator resources must carefully allocate these resources to do the best job it can in monitoring aspects of the world state and acting appropriately to achieve its goals (and maintain its safety) in a complex, dynamic world. If the resource-limited agent allocates more of its resources (or some of its resources more frequently) to monitoring for some states (or state features), then it will be less capable of tracking others successfully. Designing an effective schedule of monitoring for and responding to activities with the available resources, therefore, is a complex optimization problem. Although some clever techniques, such as indexing, hashing, or grouping states together into action classes may alleviate the problem of allocating limited resources to some extent (Musliner, et al. 1993), the fundamental problem is that monitoring for and responding to all conceivable contingencies by a resource-limited agent in a complex and dynamic domain is not realistic. It is therefore reasonable to have the agent prioritize the use of its resources to prepare for those situations that are most likely to occur. Determining the likelihood of encountering a particular world state, however, can be challenging because the likelihood is dependent not only on the choices of what (re)actions the agent should perform, but also by its choices of how frequently it checks whether to perform them. By definition, a dynamic environment is one in which the state can change via events outside of the agent's control. Thus, the sooner an agent detects and responds to a situation, the less opportunity there is for the 2. Background As AI systems are applied in increasingly dynamic and complex domains, it becomes more difficult to make assurances about how well they will perform. Techniques have been under development to support the construction of probabilistic plans (Kushmerick, Hanks, and Weld 1994) for applications where the actions taken by an agent can have uncertain outcomes. Planners based on Markov Decision Processes (MDPs) incorporate representations of uncertainty about how actions and events may move the world among states. MDP planners generate plans (or policies) about what action (if any) an agent should perform for each observable state so as to maximize the agent's expected performance. MDP planning (Dean, Kaelbling, Kirman, and Nicholson 1993) is most straightforward under assumptions such as: that states are fully observable (the agent executing the plan can know exactly what state it is in); that event models are implicit (the uncertainties about how events beyond the agent's 1 Preemption thus means that the agent’s action occurs before the undesirable event, thus forestalling that event. 33 control affect the states it can reach via an action it takes are folded into the state transition probabilities associated with the action); and that a transition probability is stationary (dependent only upon the state and not upon timing associated with the state). In the terminology of (Boutilier, Dean, and Hanks 1998), these assumptions hold for a stationary, Fully Observable MDP with implicit event models. Our efforts in developing AI-based agents for realtime applications have forced us to relax some of these assumptions in ways different from typical relaxations (such as for Partially Observable MDPs). In a real-time application, not surprisingly, a core concern is "how long" something will take. Making real-time guarantees means that the agent must take some actions before catastrophic events could occur. That is, between the time when a state arises that heralds an impending event to be avoided (e.g., a slow-moving car enters your lane ahead of you), and the earliest time that that event (e.g., collision) could occur after the heralding state is entered, a real-time agent must be assured of taking an action (e.g., swerving) that averts the undesirable event. Because an agent is possibly moving non-deterministically among many states, it must be prepared to recognize and react fast enough to a number of heralding states. Given limited sensory and processing resources, the set of "recognition-reactions" must be carefully selected and scheduled to ensure each is checked frequently enough to preempt failure. The Cooperative Intelligent Real-time Control Architecture (CIRCA-II) is an example of a system that has been developed for selecting, scheduling, and executing recognition-reactions in a resource-limited system. CIRCA-II extends the original CIRCA architecture (Musliner et al, 1993, 1995) to use probabilities for making resource allocation decisions and to cache plans for foreseeable but low-probability states. The CIRCA-II architecture is shown in Figure 2-1. Here, we concentrate only on the relevant CIRCA-II features to explain how we compute state probabilities in a real-time environment. More complete treatments of the overall CIRCA-II architecture are presented elsewhere (Atkins 1999). Planner Knowledge Base Environment initial state(s), subgoals, state transitions State-space Planner Real-time Plan Executor TAP plans state feedback Plan Cache TAP plans state feedback Real-Time Subsystem (RTS) Scheduler Database resource usage, real-time constraints, fault list TAP lists schedules, utilizations Resource Scheduler Artificial Intelligence Subsystem (AIS) CIRCA -- The Cooperative Intelligent Real-time Control Architecture. Figure 2-1 There are two main subsystems in CIRCA-II, the Artificial Intelligence Subsystem (AIS) and the Real-Time Subsystem (RTS). Inside the AIS are the probabilistic planner and the scheduler. The planner has a representation of world states and transitions among them in terms of a state space diagram. If there is any transition leading to a failure state (failure transition) from a state, the planner will plan a guaranteed action such that the failure transition is preempted. Otherwise, it may plan a ‘if-time’ action, which is only executed should resources allow, to get the system closer to the goal state. A design goal of CIRCA-II's State-Space Planner is that it uses a compact representation of the information needed to generate realtime temporal control plans. Specifically, CIRCA-II does not generate more than one state for any state in the state diagram even if it can be reached via multiple paths from the initial states. Therefore, the state diagram is not necessarily a tree-structured diagram  cyclic paths may be introduced into it. In other words, we do not keep track of the history, and we plan a single action for a group of situations with the same state features (which is represented only by one state in the state diagram), regardless of how that state is reached. The plan developed is thus a recognition-reaction plan. It is precisely this characteristic that allows CIRCA-II to formulate a cyclic schedule in which actions are planned for states (no matter when those states are encountered) so that undesirable transitions cannot occur. 34 II has mechanisms to detect it and replan or retrieve a contingency plan, as are discussed elsewhere (Atkins 1999). Furthermore, because event probabilities depend on which actions are scheduled and how frequently, implicit event models become untenable because of the large space of potential actions and each possible delay for each action. CIRCA-II therefore explicitly models events. As pointed out by (Boutilier, Dean, and Hanks 1998), specifying a transition function for an action and one or more exogenous events is generally not easy. In what follows, we describe and evaluate our strategy for doing this. Therefore, CIRCA-II's planning could be viewed as a means for formulating an approximately optimal plan for a (partially) non-stationary, FOMDP with explicit event models. An action planned by the planner together with the test part for state recognition is called a Test-Action-Pair (TAP). The set of TAPs, generated by the planner, is then passed onto the scheduler to find a feasible schedule for all the TAPs. The RTS is a predictable, reactive system that repeatedly loops over the TAP schedule, executing each TAP’s test expression (which encapsulates all the relevant sensing actions) and executing its action if the test returns true. Since the scheduler is working with a RTS with limited resources, it could be the case that not all of the requested TAPs can be scheduled. When this occurs, the planning system could identify and try out alternative TAP combinations in the hope of finding one that works. Often, however, the RTS simply may be incapable of guaranteeing any combination of TAPs that completely assures that all critical real-time responses are met. In such cases, it is up to the planner to find a schedulable set of TAPs that assures that as many of the most critical responses are met as possible. In this paper, we will concentrate on the planner, which employs the new probabilistic temporal transition framework, and plans selectively based on state probabilities given the resource constraints of the RTS. Here we summarize the algorithm of the planner, which is the setting for the discussion in the later sections. The planner, during each planning cycle, selects the most probable state and “expands” it by applying all enabled event transitions (whose conditions are met in the state) and computing their time-dependent transition probabilities. Depending on the possible consequences of these events, the planner may select an action for the state, and the action's resulting state will be added to the set of possible successor states of the action. The probabilities of each of these states are computed, and the planning cycle loops. If one of these states has already been generated in an earlier planning cycle, its probability (and the probabilities of its successors, if any have yet been generated) must be updated. The planner continues in this cyclic manner until all reachable states have been investigated, or until only the states whose probabilities fall below some threshold parameter remain. If one of the low probability "unplanned" states is actually reached, CIRCA- 3. Temporal Transitions The CIRCA-II world model is constructed from a set of state features and state transitions included as part of the planner knowledge base. A state in the world model is created dynamically by applying a transition to its parent state during planning. There are two types of transitions. Action transitions are explicitly controlled by the plan executor in the RTS, and thus only occur when selected during planning. Events outside the system’s control are modeled as temporal transitions, either “innocuous” temporal transitions (labeled tt) or “malicious” temporal transitions to system failure (labeled ttf). CIRCA-II’s primary objective is to avoid system failure. It considers goal achievement only when the safety requirements are met. When expanding a state, CIRCA-II bases its action selection primarily on failure avoidance when a ttf is present, only considering proximity to the goal when more than one action is capable of avoiding failure. Alternatively, if no ttf is present, CIRCA-II selects the best action, called a ‘if-time’ action, to achieve the system goals, as described in (Musliner 1993). If no action is required for failure avoidance or needed for goal achievement, CIRCA-II selects no action. As mentioned in the previous sections, we have to model the likelihood of a transition as a function of time to capture the dynamic nature of the world. We will detail in what follows our framework for modeling temporal transitions. For a temporal transition ttj, the user specifies what is called a “probability rate” function Prate(ttj,th) over a time interval [th,th+1) (e.g., seconds), given that transition ttj has been continuously active for h time steps. For example, if a fair coin is flipped once per second, the “probability rate” function for the transition from “heads” to “tails” has a constant value of 0.5 (50%) over each second, regardless of how much time has passed, given that the state is “heads” after the flips so far. Figure 3-1a shows the probability rate function for this coin flip example. Many realistic state transitions can be easily defined in terms of “probability rate”. Figure 3-1b describes when an engine is first put into service, its “failure rate” decreases during a break-in period. Afterwards, the rate is very small during the normal operation period. When the engine nears the end of its life (e.g., 2000 hours for a small propeller-driven engine), the failure rate increases until the engine is considered “unsafe” and must be retired. 35 Pr(tt,t) Pr(tt,t) Pr(tt,t) 0.5 t a) Heads-to-tails tt. break-in t b) Engine failure tt. min∆ max∆ t c) Minimum delay ttf. Figure 3-1: Temporal transition probability rate functions. Figure 3-1c shows a transition probability rate function for a temporal transition with a minimum delay (min∆) before it is possible and a maximum delay (max∆) after which the transition cannot occur. CIRCA-II must select an action to preempt this ttf provided such an action can be scheduled before time min∆. Action transitions, like temporal transitions, are also described by probability rate functions, such that the conditional probabilities of the transitions from a state can be computed. Equation 3-1 (where i stands for the i-th time interval [ti, ti+1)) shows the probability rate function we adopt for guaranteed actions. The important point to note is that the action has an associated maximum period (maxp) in which it must execute to preempt its ttf(s). 1   Pr( ac ,t ) = ( maxp − i ) i  0  (0 ≤ i < maxp )  (i ≥ maxp)  in our discrete case we are considering time intervals. Hence, there is a non-zero probability that two (or more) transitions could fire over the same time interval. Because we only want to allow one transition to fire (presumably, one would come before the others in the time interval), we have to divide the probability of a combination of transitions among the separate transitions. Rather than dividing it equally (treating each transition in the combination as being equally likely to occur first in the interval), our intuitions (borne out in Section 4) are that more probable transitions jump in front more often. Thus, Equation 3-2 is essentially saying that the conditional probability in time interval [th,th+1) of a transition from state sk is proportional to the relative logarithm of one minus its likelihood among all other transitions, given that the system is still in state sk in the time interval. We will explain in Section 4 the significance of this heuristic. Currently, Prate(none, th, sk) is calculated using 3-3. Prate ( none ,t h ,sk ) = ∏ (1 − Prate (trans j ,hh ,sk )) (3-3) ∀transr ∈trans ( sk ) Equations 3-2 and 3-3 are heuristics to estimate the conditional probabilities from the unconditional probabilities given the compact representation that does not capture conditional dependency information among the temporal transitions. Two assumptions are made in 3-2 and 3-3 to model the conditional dependencies. First, we are assuming that the probability of nothing happening is calculated as if all temporal transitions were independent. Second, we are assuming also that the conditional probabilities of the transitions in each time step are proportional to their unconditional probabilities. As a simple example, let tt1 and tt2 be two temporal transitions which have unconditional probabilities of 0.8 and 0.4 respectively at each time step. From our equations, the conditional probabilities in any time interval are as shown in Figure 3-3. Equation 3-5 tells us that the transition probabilities are 0.759 for tt1 and 0.241 for tt2. (3-1) For ‘if-time’ (best-effort) actions, which are executed in slack time slots produced when guaranteed actions do not require their worst-case execution time (Musliner 1993), the probability rate function is for now a constant value that is only a heuristic guess and can be assigned by the user. In future work, we plan to incorporate a function similar to that for guaranteed actions to more accurately estimate the timing information associated with ‘if-time’ action transitions. The probability rate function captures the probability of an event occurring independently of other events over any time interval. To model the dependency among a set of tts that match a state, we can approximate the conditional probability rate function Pcond(transj, th, sk) for transition transj from state sk during time interval [th,th+1) heuristically as given by Equation 3-2. ln (1 − Prate (trans j , t h , sk )) (1 − Prate (none, t h , sk )) Pcond (trans j , t h , sk ) ≈ (3-2) ∑ ln(1 − Prate (trans r , th , sk )) ∀trans r ∈ trans ( s k ) Prate(transj,th,sk) is the unconditional probability of transition transj from state sk during time interval [th,th+1). In a continuous time model, the probability of two transitions firing simultaneously is vanishingly small, but 36 values of their probability rate functions in that time interval. This ensures that the probability of following multiple transitions at the same time interval is zero, because (unless it is modeled as an explicit transition) the probability of two transitions occurring at exactly the same time point is zero. We can compare the results from our heuristic in the discrete time interval case with the results one could get using a continuous time model. Our analysis is two-folded. We will first show the case in which there are only temporal transitions in a state. Then, we will consider the case in which a state has a guaranteed action. To simplify our analysis, we begin by assuming the simplest case in which all probability rate functions for temporal transitions are constant, as in the example in Figure 3-3. If there are only temporal transitions in a state, we can pose the question mathematically: let tt1, tt2, …, ttn be n different independent temporal transitions which have constant rates, r1, r2, … rn. Ti is the time that transition tti fires. T = min{T1, T2, … Tn}. What is the probability that P{Ti = T}? When a temporal transition has a constant rate, the probability of it firing in any time interval [t, t+∆t) is the same as any other time interval. Another way of saying this is that the probability of a transition firing after time t+s, given that time s has elapsed is the same as the probability of the transition firing after time t: P(tt1) = 0.8 P(tt1) = 0.668 P(tt2) = 0.2112 P(tt2) = 0.4 P(none) =0.12 The unconditional probabilities of tt1 and tt2. The conditional probabilities of tt1 and tt2. Figure 3-3 In general, the problem of calculating conditional probabilities from only unconditional probabilities is difficult. Our heuristic, based on the two biases we have chosen, is only one of the many ways to do the estimation. In Section 4, we will consider this example using continuous-time models to justify our heuristic. To compute the state probability, CIRCA-II calculates the transition probability for transj from state sk to another using the (overall) cumulative probability Pcum(transj, sk). This describes the relative probability of leaving state sk via transj, and is equal to the sum of Pcond(transj, th, sk) * P(sk, th) over time. P(sk, th) is the probability of the system still being in state sk in time interval th, and it is recursively calculated using 3-4, where P(sk, 0) = 1.0. (3-4) P (s , t ) = P (s , t )P (none, t , s ) k h k h −1 rate h −1 ( ) The only real-valued functions satisfying f(s + t) = f(s)f(t) are of the form f(t) = e-λt. Hence, the distribution of Ti must be an exponential distribution with parameter λ. To summarize, we have: • The probability of tti firing after time t is e −λt . • The cumulative probability function of tti, i.e., the probability of tti firing before time t, is 1 − e − λt . • The probability density function of tti is λe − λt . For independent transitions: k Equation 3-5 computes the transition probability from the initial time interval, (t0, t1), up to the “converged” time interval (tn, tn+1) when either the likelihood of being in state sk is below a preset threshold ε (i.e., P(sk,tn) < ε) or after which all the transition probability rates are negligible (Pcond(transj, ti, sk) < ε), where i > n. P(Ti > t ) = P (T1 > t ,T2 > t ,...,Tn > t ) = P (T1 > t )P(T2 > t )...P(Tn > t ) = e − λ1t e − λ2 t ...e − λn t = e − (λ1 + λ2 + ...λn )t (3-5) Pcum (trans j , sk ) = ) ( P Ti ≥ s + t | Ti ≥ s = P Ti ≥ t To calculate the probability that tti fires first: ∞ ∑ Pcond (trans j ,th ,sk ) P (sk ,th ) h =0 4. Heuristic Justification Equation 3-2 describes how we approximate the relative likelihood of all the temporal transitions in a time interval. The heuristic divides the probability of transitioning out from a state in a particular time interval among the alternative transitions in proportion to the 37 preempted. To go through the same analysis above, we will have to have the probability density function for a guaranteed action, ac, which has a period P. According to our definition of a guaranteed action transition, and as in Section 3 using the approximation2 that the action is equally likely to fire at any time within P, we have the following: • The cumulative probability function of ac, i.e., the probability of ac firing before time t is t , if 0 ≤ t ≤ P P 0 , if t > P • ∞ P (Ti = T ) = ∫ P (T1 > t ,...,Ti −1 > t ,Ti +1 > t ,...,Tn > t )dP{Ti = t} Taking the derivate of the cumulative probability function, the probability density function of ac is 1 , if 0 ≤ t ≤ P P 0 , if t > P 0 ∞ = ∫ e −λ1t ...e −λi −1t e −λi +1t ...e −λnt λi e −λi t dt • 0 The probability of ac firing after time t is ∞ = ∫ e −(λ1 + ...+λi −1 + λi +1 + ...λn )t λi e −λi t dt 0 = λi We can derive the probability rate function from these equations. The probability of an ac firing in the time interval (t1, t2) given that it has not fired before t1 is (4 - 1) λ1 + ... + λn P(t1 ≤ T ≤ t 2 | t1 < T ) = Therefore, the relative likelihood of tti is proportional to λi among those of all temporal transitions. We are now ready to connect λ to rate r in the discrete case. Let us assume that in the discrete case, the size of a time interval is 1. So, the probability of tti firing after t, i.e, t time intervals, is (1 – r)t. Equating this with the continuous time counterpart above, we get: (1 − r )t = e − λ t 1 − r = e = P(Ti = T ) = = = t 2 − t1 P − t1 P ∫ P(T > t ,..., Ti − 1 > t , Ti + 1 > t ,..., Tn > t , ac > t )dP{Ti = t} P  P − t  −λi t ...e − λ i − 1 t e − λ i + 1 t ...e − λ n t  λi e dt  P  1 0 −λ = ∫ (e − λ1 t 0 λ = − ln (1 − r ) Therefore, it is possible to convert from the discrete rate, r, to the continuous rate, λ, and vice versa. Their relationships are: r = 1 − e −λ λ = − ln (1 − r ) If we substitute r into the (4-1), we get P(Ti = T ) = P(t1 ≤ T ≤ t 2 ) P(t1 ≤ T ) As t2 moves toward P, keeping the time interval (difference between t1 and t2) unchanged, the probability rate goes to 1 as we would expect. As in the analysis above, to calculate the probability that tti fires first: − λ r = 1 − e P −t , if 0 ≤ t ≤ P P 0 , if t > P )  P−t = ∫ e − (λ 1 + ... + λ i − 1 + λ i + 1 + ...λ n )t λi e − λ i t   dt  P  0 P P  1 P = λi  ∫ e − Λt dt − ∫ te − Λt dt  P0 0  = λi Λ (1 − e ) + λ − ΛP i Λ e − PΛ − λi PΛ 2 (1 − e ) − ΛP n , where Λ = ∑ λ i i =1 This answer is much more complicated than the logarithm heuristic we had before because of the added consideration of a guaranteed action. In fact, when P → ∞ (an infinite period means that the probability of the action firing in a finite time approaches zero), this expression becomes λi , which is exactly what we had before. λi λ1 + ... + λ n − ln(1 − ri ) − ln(1 − r1 ) + ... + − ln(1 − rn ) ln(1 − ri ) ln(1 − r1 ) + ... + ln(1 − rn ) Λ This is essentially the logarithm proportionality heuristic in Equation 3-2, and using this formulation we get the same transition probabilities as in Figure 3-2. λtt1 is 1.609 and λtt2 is 0.511. The transition probabilities are therefore 0.759 for tt1 and 0.241 for tt2. This agrees with what is calculated by Equation 3-5. Now, let us consider a case that involves a nonconstant function. Often in CIRCA's state space diagram, a state may have a guaranteed action if a ttf has to be We will also want to find the probability of the action, ac, being the first transition to fire. 2 This is only an approximation because it ignores lag time introduced by sensing, and also the “jump” a transition might get if it is part of a dependent chain (see Section 5). 38 problem with arbitrary rate functions in a continuous time domain can be complicated and costly. On the contrary, our discrete approximation does not increase much in complexity even if the probability rate functions are arbitrary. If we can prove what our results above suggest -- that the discrete approximation can give arbitrarily accurate results by discretizing more finely regardless of the shapes of the transition functions, then our approximation provides a simple, yet effective and powerful method to compute transition probabilities from domain-dependent transition functions. As will be discussed in the later sections, from these transition probabilities, we can compute the state probabilities reasonably well. P P(Ti = T ) = ∫ P(T1 ≥ t ,..., Tn ≥ t )dP{Tac = t} 0 P = ∫ e −λ1t ...e −λnt 0 P = ∫ e −(λ1 +...+λn )t 0 = 1 dt P 1 dt P 1 − e − PΛ PΛ n , where Λ = ∑ λ i i =1 We have done a sanity check to verify that the sum of all the probabilities of any transitions firing first indeed equals 1. (This can be worked out by the reader.) To verify how well our heuristic approximation in the discrete case works when a state also has a guaranteed action, we consider the following example: a state has tt1 of constant probability rate 0.1, tt2 of 0.6, and an action with period 5. In general, the more finely we discretize the time intervals, the better results we get. Here is a summary of our calculations. For the action transition: 5. State Probabilities With our framework for computing transition probabilities, we can estimate the probability of the system ever entering a particular state in order to decide whether to allocate resources to responding to that state. After computing the transition probabilities, as described in number of time section 3, we can construct a state diagram, in which each intervals time each discretized true value as node corresponds to exactly one state. The state diagram (discretitime calculation value: error rate can be represented by a matrix Mij as in Figure 5-1. M[i][j] computed in (per- is the transition probability from node i to node j (we will zation interval action transition continuous time level) represents centage) probability domain use “node” and “state” interchangeably). Note that, given 5 1 sec 0.200162756 0.194578 2.870137 our characterization so far, the transition probabilities 10 0.5 sec 0.195908691 0.194578 0.683835 satisfy the Markov property. That is, once we have 100 0.05 sec 0.194591233 0.194578 0.006751 computed the probability of transitioning from one state to For tt1 transition: another based on the various time-dependent probability number of rate functions, we can form the state transition diagram. discretized time Given the diagram, we are now ready to compute the state calculation true value as intervals time each probabilities. We present a theorem from (Kemeny and value: tt1 computed in (discretitime Snell 1960): transition continuous time error rate (perzation interval The probabilities of going from any nodes in the probability domain centage) level) represents transient set (a transient node is one which has outgoing 5 1 sec 0.0824 0.083059 0.79281 transitions, i.e., tts, ttfs, and/or actions) to any nodes in the 10 0.5 sec 0.0829 0.083059 0.19083 absorbing set (an absorbing node is one which has no 100 0.05 sec 0.08306 0.083059 0.001806 outgoing transitions) are given in the matrix: For tt2 transition: (5-1) number of P = NR, where N = (1 − Q) −1 time intervals time each true value as R is the matrix indicating the probabilities of going time simulation value: (discreticomputed in from the transient nodes (row indices) to the absorbing tt2 transition continuous time error rate (per- nodes (column indices) in one step. Q is the matrix interval zation probability level) represents domain centage) indicating the probabilities of going from the transient 5 1 sec 0.71735 0.722361 0.69366 nodes (row indices) to other transient nodes (column 10 0.5 sec 0.721167 0.722361 0.16525 indices) in one step. 100 0.05 sec 0.722349 0.722361 0.00162 Again, as suggested by the data, the more we discretize the time intervals, the more closely our discrete approximation matches the continuous time model even when some transitions are not constant rate functions. Future work will examine the limits of our discrete approximation, such as how well it works with more arbitrary functions. Techniques for solving a general 39 1 ……. n probabilities from all the initial state nodes to T is the state probability of T. 1 Identity 0 Algorithm 5-1: Computing state probabilities ……. To see how this algorithm improves CIRCA-II’s performance, we will compare the results of computing the state probability by the previous CIRCA-II method without an update mechanism (Atkins 1996), and this new CIRCAII formulation using the example in Figure 5-2. This example illustrates the situation of an aircraft trying to maintain its altitude (perhaps in turbulence) and flying to FIX2. State 1 has two transitions, tt2 and ac1, which have constant probability rate functions of 0.9 and 0.5 respectively. The transition probabilities are 0.6429 and 0.3571 respectively as computed by Equation 3-2 and 3-3. The previous CIRCA-II would predict that the likelihood of reaching state 2 and state 4, the destination, are only 0.3571 (Atkins 1996), because it ignored the contribution of ac2 which was only known to the planner after expanding state 1. However, the probabilities should be 1.0 as computed by the new formulation, algorithm 5-1. As the flight is able to restore a safe altitude at all times, it can fly to its destination with certainty. This is something that the old CIRCA-II would not conclude! Q R transient states to transient states transient states to absorbing states n Figure 5-1 Both matrices, R and Q, are readily available from the transition matrix Mij. To compute the state probabilities of all absorbing nodes in the state diagram, we only need to apply 5-1 once. However, to compute the probability of a transient node, we make it into an absorbing node by truncating all its outgoing transitions. Since we are only concerned with the probability of ever visiting a node, all paths coming out from the node will not further affect the state probability even if the path subsequently enters into the node again, i.e. a cycle. Unfortunately, N and R will thus be different in each computation of the state probability of each transient node. The cost of computing the state probabilities for transient states will therefore be higher than that for absorbing states because equation 5-1 has to be used for each transient state probability. Here we outline the algorithm of computing the state probabilities of all nodes in a state diagram. Let G be a state diagram with n nodes, i initial state nodes, p absorbing nodes, and q transient nodes. Initial states are assumed transient. Therefore p + q = n. 1. 2. 3. 4. 5. 6. tt2 has the prob. rate function of 0.9 at all time intervals. ac1 has the prob. rate function of 0.5 at all time intervals INITIAL: INITIAL: Loc==FIX1 FIX1 Loc Alt==High High Alt Heading==Undef Undef Heading 11 ac1 Loc==FIX1 FIX1 Loc Alt==High High Alt Heading==FIX2 FIX2 Heading 22 tt1 Loc==FIX2 FIX2 Loc Alt==High High Alt Heading==Undef Undef Heading 44 tt1 = fly-to-FIX2 tt2 = lost-altitude tt2 ac2 ttf1 = crash ac1 = begin-to-fly-to-FIX2 Index the nodes of G such that the indices of the absorbing nodes are less than those of the transient nodes. Construct the transition matrix M for G for this particular indexing. Compute R and Q from M. R = M[p+1 … n][1 … p]. Q = M[p+1 … n][p+1 … n]. Compute N = (1 – Q)-1. Compute P = NR. P is a q * p matrix giving the probability Pij from any transient node i to any absorbing node j. Therefore, the sum of the probabilities from all the initial state nodes to an absorbing node is the state probability of the absorbing node. At this point, we have computed all the state probabilities for all absorbing nodes. For each transient node T, do: i. Truncate all outgoing transitions of T. ii. Index the nodes in G such that the indices of the absorbing nodes are less than those of the transient nodes. T has index 1. Construct a transition matrix M for G for this particular indexing. iii. Compute R and Q from M. R = M[p+2 … n][1 … p+1]. Q = M[p+2 … n][p+2 … n]. iv. Compute N = (1 – Q)-1. v. Compute P = NR. vi. P is an (q – 1)* (p + 1) matrix giving the probability PiT from any transient node i to the absorbing node T. Therefore, the sum of the Loc==FIX1 FIX1 Loc Alt==Low Low Alt Heading==Undef Undef Heading 33 ttf1 Failure Failure 5 ac2 = climb 5 CIRCA Aircraft Simulation State-space with Cyclic Structure. Loc = Location along the route of flight among the various “FIX” values (a “FIX” is a navigation term). Alt = Altitude of the aircraft (with High or Low discrete values) Figure 5-2 Algorithm 5-1 can accurately compute the state probabilities given that the transition probabilities are accurate. We described in Section 3 a means of approximating these probabilities, but as we will see in the next section, sometimes the calculations can be more problematic. 6. Dependent Temporal Transitions Although a temporal transition can fire in any of the states, the probability rate functions of the same temporal transition may differ across states depending on how they are reached. When a temporal transition is triggered across a sequence of states, its rate function in a later state must consider the time spent in prior states. The time-dependent probability rate functions of the event in each of the states must take into consideration the time spent across multiple states, such as tt1 in Figure 6-1. We call such temporal 40 transitions dependent temporal transitions (labelled dtt) because their probability rate functions depend on their parent states. For example, suppose the probability of engine failure, tt1, increases with time. Then the probability of tt1 occurring is higher in FIX2 than in FIX1 because tt1 is already "enabled" in FIX1 before the flight enters FIX2. INITIAL: INITIAL: Loc = FIX1 Loc = FIX1 Status = Norm Status = Norm Heading Undef Heading ==Undef ac1 Loc==FIX1 FIX1 Loc Status = Norm Status = Norm Heading = FIX2 Heading = FIX2 tt1 tt2 Loc==FIX2 FIX2 Loc Status = Norm Status = Norm Heading = Undef Heading = Undef tt1 Loc = FIX1 Loc = FIX1 Status==Emer Emer Status Heading = Undef Heading = Undef ttf1 Loc==FIX2 FIX2 Loc Status = Norm Status = Norm Heading = FIX3 Heading = FIX3 tt1 Loc = FIX1 Loc = FIX1 Status==Emer Emer Status Heading =FIX2 Heading =FIX2 ac3 ac2 ac3 ttf1 Loc = FIX2 Loc = FIX2 Status==Emer Emer Status Heading = FIX3 Heading = FIX3 ttf1 ac3 ac3 tt1 Pdtt(tti ,th ,sk ) = Loc = FIX3 Loc = FIX3 Status==Emer Emer Status Heading = Undef Heading = Undef     1  × P(sp ) Pcum(transj ,sk )  Pinitial(sk ) + ∑ transj   ∀( p, j ) ∋ (sp  →sk )   ttf1 ac3 tt1 = engine-failure Loc = LAND Loc = LAND Status==Emer Emer Status Heading = Undef Heading = Undef Failure Failure GOAL: GOAL: Loc==FIX3 FIX3 Loc Status = Norm Status = Norm Heading = Undef Heading = Undef tt1 Loc = FIX2 Loc = FIX2 Status==Emer Emer Status Heading = Undef Heading = Undef ttf1 tt3 transition transj from parent state sp. If there is no tti in sp, then the probability rate function is “unshifted”, i.e. it is simply the unmodified probability rate function. Otherwise, it is the sum of all probability rate functions “shifted” by all possible time delays in sp weighted by their relative probabilities. The probability rate function Pdtt(tti, th, sk) for a dependent temporal transition tti from state sk in the h-th time interval [th, th+1), is:    P ( s )P ( tt ,t ) + ∑transj P(sp ) Pcum(transj ,sk )Pdtt(tti ,th ,sp ,transj )  initial k rate i h ∀( p, j ) ∋ (sp →sk )   tt2 = fly-to-FIX2 tt3 = fly-to-FIX3 ttf1 = crash ac1 = begin-to-fly-to-FIX2 CIRCA Aircraft Simulation State-space with Tree Structure ac2 = begin-to- fly-to-FIX3 and a Dependent Temporal Transition ac3 = emergency-land where P(s) is the state probability of state s. Equation 6-3 is essentially saying that Pdtt(tt, th, sk) is a weighted average of all possible “shifted” dtts from all parents, and it is justified in Theorem 6.1. Theorem 6-1: If the states which have the dependent temporal transitions comprise an acyclic graph, then the probability rate function estimated by Equation 6-3 is the best in terms of minimizing the mean square error at any given time interval. Proof: In full paper. Complication arises when we try to apply the formulation to world models with cyclic structures. Figure 6-1 Suppose state sk has a dtt, tti. If we knew from which parent sp the system entered sk via transj, and the time tg it spent in sp, then we could model the probability rate function of the dtt, tti, using Equation 6-1. (6-1) Pdtt tti ,t h , s p ,trans j = Prate ( tti ,t g + t h , s p ) ( ) However, this piece of information is not available during planning. In general, there are multiple parents to state sk, and the system can stay in any of the parent states for a variable amount of time. Each combination of parent state sp and time spent in sp, tg, gives rise to a particular “shifted” probability rate function for the dtt, as described in Equation 6-1. However, we can take the average of these probability rate functions by weighting them by the probabilities of the time tg spent in sp, i.e., Pcond(transj, tg, Sp). The sum of these weights is simply the cumulative probability of transj from sp to sk, i.e. Pcum(transj, sp). Equation 6-2 and 6-3 describe how CIRCA-II models a dtt based on the probability rate functions of the dtt in the parent states. INITIAL: INITIAL: Loc = FIX1 Loc = FIX1 Fuel==Yes Yes Fuel ac1 tt1 Loc==FIX1 FIX1 Loc Fuel==No No Fuel ttf1 Failure Failure ac2 Loc==FIX2 FIX2 Loc Fuel==Yes Yes Fuel tt1 Loc==FIX2 FIX2 Loc Fuel==No No Fuel tt1 = out-of-fuel ttf1 = crash ac1 = fly-to-FIX2 ac2 = fly-to-FIX1 ttf1 CIRCA Aircraft Simulation State-space (Holding Pattern) with a Cycle and a Dependent Temporal Transition for All States in the Cycle when tti ∉ { trans(s p ) }, Figure 6-2 As in Figure 6-2, to accurately model the dependent temporal transition tt1 from the initial state using Equation 6-3, it requires the probability rate function of the other tt1, which, unfortunately depends in turn on the tt1 in the initial state. A reasonable heuristic to compute the probability rate functions for the dtts that are in every state in a cycle is to traverse the cycle, applying equation 6-3, until the computation converges. when tti ∈ { trans (s p ) }. 7. Evaluation (6-2) Pdtt (tti ,th , s p ,trans j ) = Prate (tti ,th )     ∞ P (trans ,t , s )P (tt ,t + t , s ) j g p rate i g h p  ∑ cond Pcum (trans j , s p ) t g =0 (6-3) Although the theory points out the expected error for modeling a dtt probability rate function is the variance of Ptrue(dtt, t, sk), which is domain dependent, we would like to see how well our heuristic works in practice. We have Pdtt(tti,th,sp, transj) is the "shifted" probability rate function to reflect the effects of dependent temporal transition (dtt), given that the current state sk is reached via 41 generated a set of state diagrams with various temporal transitions. We find the state probabilities of the states using a stochastic method, a random walk in the state diagram. Then, we compare the results of the simulations with the state probabilities computed by the set of equations described earlier. As the probabilities calculated by the two methods are reasonably close, we are confident that the theory provides a reasonable framework to model dtt probability rate functions. Over the 20+ experiments we ran, each for 50000 to 100000 trials, the errors ranged from about 1% to 8%. We have noticed that the errors tend to be larger for periodic functions (in which the probabilities rise and fall periodically). Qualitatively speaking, since we are essentially taking an average of all the "delayed" probability rate functions, the error between the predicted value and the "observed" value is therefore very sensitive to the shape of the "true" probability rate function. Details of our experiments are in the full paper. We have also evaluated the backward compatibility of our new model to the non-probabilistic CIRCA originally developed by Musliner (Musliner, 1993). Space limitations preclude a full description here (see the full paper for more) but by describing temporal (and event) transitions as having a step-shaped probability rate function reaching a value strictly between 0 and 1, and a (guaranteed) action transition of having a step that reaches probability 1, our model generates the same “reachable state” space where reachable states have probability greater than zero. 9. Acknowledgements The authors gratefully acknowledge the detailed comments and directions suggested by their Honeywell colleagues, Robert Goldman and David Musliner. 10. References E. M. Atkins, E. H. Durfee, and K. G. Shin, “Plan Development using Local Probabilistic Models,” in Uncertainty in Artificial Intelligence: Proceedings of the Twelfth Conference, August 1996. E. M. Atkins, E. H. Durfee, K. G. Shin, “Detecting and Reacting to Unplanned-for World States,” Proceedings of AAAI, pp. 571-576, July 1997. E. M. Atkins, "Plan Generation and Hard Real-Time Execution with Application to Safe, Autonomous Flight", Ph.D. Thesis, The University Of Michigan, 1999. C. Boutilier, T. Dean, and S. Hanks, “Planning Under Uncertainty: Structural Assumptions and Computational Leverage,” EWSP, 1995 T. Dean, L. P. Kaelbling, J. Kirman, and A. Nicholson, “Planning with Deadlines in Stochastic Domains,” Proceedings of AAAI, pp. 574-579, July 1993. J. G. Kemeny, and J. L. Snell, “Finite Markov Chanis,” p.52-52, 1960. N. K. Kushmerick, S. Hanks, D. Weld, “An Algorithm for Probabilistic Least-Commitment Planning,” Proceedings Of AAAI, pp. 1073-1078, July 1994. M. L. Littman, T. L. Dean, and L. P. Kaelbling, “On the Complexity of Solving Markov Decision Problems,” Proceedings of UAI-95, August 1995. G. F. Lawler, “Introduction to Stochastic Processes,” pp. 55-57, 1995. D. J. Musliner, “CIRCA: The Cooperative Intelligent Real-Time Control Architecture,” Ph.D. Thesis, The University Of Michigan, 1993. D. J. Musliner, E. H. Durfee, and K. G. Shin, “World Modeling for the Dynamic Construction of Real-Time Control Plans,” Artificial Intelligence, vol. 74, pp. 83-127, 1995. 8. Conclusions and Future Work A framework for modeling the probabilistic nature of external events as functions of time has been developed. In contrast with traditional AI planners which concentrate on “what to do”, CIRCA-II emphasizes also on “when to do it” such that the probability of system failure is below a threshold. Moreover, we also present an algorithm to order the states by their likelihood so that the system can cut off the unlikely events from consideration in case of insufficient resources as illustrated in the previous section. CIRCA-II is also a planner such that each state is only represented by one node in the state diagram regardless of the multiple paths leading to the state. Unlike a traditional AI planner, CIRCA-II either ignores or aggregates the information about how it reaches a state; this has significant benefits in devising a real-time control schedule, but also is a weakness. To plan only a single action for a group of states, CIRCA-II is considering only the worst-case temporal transitions (or, in the case of dependent temporal transitions, the worst-case temporal transition chains). There is a class of problems that CIRCA-II cannot find the solutions for because of its overly conservative philosophy. So, CIRCA-II is sound but not complete. There is ongoing research to address this dilemma. 42 View publication stats

Log In

SA-CIRCA: Self-Adaptive Control for Mission-Critical Systems

Free related PDFsRelated papers

Free related PDFsRelated papers

Related topics