Correction: International Journal of Machine Learning and Cybernetics https://doi.org/10.1007/s13042-022-01713-5
Unfortunately, in Section 3.3, paragraphs 6, 7 and 8, the variables \(\mathcalligra{l}\) and \(\mathcalligra{l}'\) were published incorrectly in the online version of the article. The corrected paragraphs are given below.
The overall algorithm for Landmark Based Guidance (LBG) is given in Algorithm 1. The algorithm assumes that the set of landmarks \(\mathcal {L}\) is given. It requires as input the learning rate \(\alpha _{v}\) and the discount rate \(\gamma _{v}\) for the Landmark-SMDP value iteration. The algorithm starts by initializing the value function V of the Landmark-SMDP, the time index t, the previous and current landmark variables \(\mathcalligra{l}\) and \(\mathcalligra{l}'\), the previous and current time variables \(\tau\) and \(\tau'\), which record the times at which \(\mathcalligra{l}\) and \(\mathcalligra{l}'\) are seen, and the short history H (Lines 1–3). If the initial estimated state is decided to be a landmark, \(\mathcalligra{l}\) and \(\tau\) are initialized with the estimated state and the current time, respectively (Lines 4–5). Then comes the familiar observation-action loop, where the agent interacts with its environment and observes transitions between the estimated states \(x_{t}\) and \(x_{t+1}\). Meanwhile, the algorithm records each transition in H so that, whenever a previously observed landmark exists, the discounted sum of rewards used in Line 18 can be calculated.
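To make the bookkeeping above concrete, the following is a minimal Python sketch of the initialization step (Lines 1–5), not the authors' implementation; the function name initialize_lbg, the estimate_state callable, and the choice of data structures are assumptions for illustration.

```python
from collections import defaultdict

def initialize_lbg(initial_obs, landmarks, estimate_state):
    """Initialize the Landmark-SMDP bookkeeping (Lines 1-5 of Algorithm 1, assumed form)."""
    V = defaultdict(float)        # value function V of the Landmark-SMDP
    t = 0                         # time index
    l = l_prime = None            # previous / current landmark variables
    tau = tau_prime = 0           # times at which l and l' were seen
    H = []                        # short history of (reward, time) pairs

    x0 = estimate_state(initial_obs)
    if x0 in landmarks:           # the initial estimated state may itself be a landmark
        l, tau = x0, t
    return V, t, l, l_prime, tau, tau_prime, H, x0
```

The returned variables would then be carried through the observation-action loop, with each observed transition appended to H.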
Since arriving at a landmark is the precondition for providing guiding rewards through the abstract model, the algorithm checks whether or not the agent has arrived at a landmark. If it has, the current landmark \(\mathcalligra{l}'\) and the current landmark time \(\tau'\) are set (Lines 13–14). If a landmark has been observed previously, it is possible to provide the additional reward, which is calculated in Line 16.
Following the internal reward calculation, the algorithm computes the sum of discounted rewards gathered between the previous landmark \(\mathcalligra{l}\) and the current landmark \(\mathcalligra{l}'\) using H, and makes a value update on \(\mathcalligra{l}\) (Line 18), where n denotes the number of steps taken between the two landmarks (Line 17).
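Complementing the initialization sketch above, the hedged sketch below illustrates how the landmark-arrival branch (Lines 13–14 and 17–18) might look. The helper name on_landmark_arrival is hypothetical, H is assumed to store (reward, time) pairs, and the value update is written in the standard SMDP form \(V(\mathcalligra{l}) \leftarrow V(\mathcalligra{l}) + \alpha_{v}\,(R + \gamma_{v}^{n} V(\mathcalligra{l}') - V(\mathcalligra{l}))\); the exact internal guiding reward of Line 16 is not reproduced here, since its expression is not given in this correction.

```python
def on_landmark_arrival(x_next, t_next, l, tau, V, H, alpha_v, gamma_v):
    """Handle arrival at landmark x_next at time t_next (Lines 13-18, assumed form)."""
    l_prime, tau_prime = x_next, t_next       # set the current landmark and its time (Lines 13-14)
    if l is not None:                          # a landmark was observed previously
        # Discounted sum of rewards gathered between l and l', read off from H;
        # each entry of H is assumed to be a (reward, time) pair.
        R = sum(gamma_v ** (s - tau) * r for r, s in H if tau <= s < tau_prime)
        n = tau_prime - tau                    # number of steps between the two landmarks (Line 17)
        # Value update on the previous landmark l (Line 18), assumed SMDP form.
        V[l] += alpha_v * (R + gamma_v ** n * V[l_prime] - V[l])
    return l_prime, tau_prime                  # l' and tau' become the previous landmark and time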
The original article has been corrected.
Cite this article
Demir, A., Çilden, E. & Polat, F. Correction: Landmark based guidance for reinforcement learning agents under partial observability. Int. J. Mach. Learn. & Cyber. 14, 1565 (2023). https://doi.org/10.1007/s13042-022-01763-9