A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP.

A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP

Feb 8, 2022 · We study the risk-sensitive exponential cost MDP formulation and develop a trajectory-based gradient algorithm to find the stationary point of the cost.

Scholarly articles for A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP.

scholar.google.com › citations

… algorithm for the risk-sensitive exponential cost MDP
Moharrami · Cited by 11

A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP

pubsonline.informs.org › moor.2022.0139

Mar 11, 2024 · We study the risk-sensitive exponential cost Markov decision process (MDP) formulation and develop a trajectory-based gradient algorithm to ...

A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP

www.semanticscholar.org › paper

A trajectory-based gradient algorithm is developed to minimize the smooth truncated estimation of the risk-sensitive cost, and conditions are derived under which ...

[PDF] A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP

arxiv.org › pdf

Aug 29, 2022 · We study the risk-sensitive exponential cost MDP formulation and develop a trajectory-based gradient algorithm to find the stationary point of ...

A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP

www.researchgate.net › publication › 35...

We study the risk-sensitive exponential cost MDP formulation and develop a trajectory-based gradient algorithm to find the stationary point of the cost ...

A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP

www.researchgate.net › publication › 37...

We study the risk-sensitive exponential cost Markov decision process (MDP) formulation and develop a trajectory-based gradient algorithm to find the ...

[PDF] Modified Policy Iteration for Exponential Cost Risk Sensitive ...

proceedings.mlr.press › ...

Abstract. Modified policy iteration (MPI), also known as optimistic policy iteration, is at the core of many reinforcement learning algorithms.

Average Reward MDPs and Reinforcement Learning

yashaswinimurthy.web.illinois.edu › Pubs

Main Contribution: Developing a trajectory-based gradient algorithm for risk-sensitive exponential cost MDPs, introducing a truncated and smooth cost ...

Publications - Yashaswini Murthy

yashaswinimurthy.web.illinois.edu › Publ...

A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP. Mehrdad Moharrami, Yashaswini Murthy, Arghyadip Roy, R. Srikant. Math of Operations ...

Papers with Code - Yashaswini Murthy

paperswithcode.com › author › yashaswi...

We study the risk-sensitive exponential cost MDP formulation and develop a trajectory-based gradient algorithm to find the stationary point of the cost ...