Correction: International Journal of Machine Learning and Cybernetics https://doi.org/10.1007/s13042-022-01713-5
Unfortunately, in Section 3.3, paragraphs 6, 7 and 8, the variables \(\mathcalligra{l}\) and \(\mathcalligra{l}'\) were published incorrectly in the online version of the article. The corrected paragraphs are given below.
The overall algorithm for Landmark Based Guidance (LBG) is given in Algorithm 1. The algorithm assumes that the set of landmarks \(\mathcal {L}\) is given. It requires as input the learning rate \(\alpha _{v}\) and the discount rate \(\gamma _{v}\) for the Landmark-SMDP value iteration. The algorithm starts by initializing the value function V of the Landmark-SMDP, the time index t, the previous and current landmark variables \(\mathcalligra{l}\) and \(\mathcalligra{l}'\), the previous and current time variables \(\tau\) and \(\tau'\), which record the times at which \(\mathcalligra{l}\) and \(\mathcalligra{l}'\) are seen, and the short history H (Lines 1–3). If the initial estimated state is decided to be a landmark, \(\mathcalligra{l}\) and \(\tau\) are initialized with the estimated state and the current time, respectively (Lines 4–5). Then comes the familiar observation-action loop, where the agent interacts with its environment and observes transitions between the estimated states \(x_{t}\) and \(x_{t+1}\). Meanwhile, the algorithm records each transition in H so that, whenever a previously observed landmark exists, the discounted sum of rewards used in Line 18 can be calculated.
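To make the bookkeeping above concrete, the following is a minimal Python sketch of the initialization step (Lines 1–5), not the authors' implementation; the function name initialize_lbg, the estimate_state callable, and the choice of data structures are assumptions for illustration.

```python
from collections import defaultdict

def initialize_lbg(initial_obs, landmarks, estimate_state):
    """Initialize the Landmark-SMDP bookkeeping (Lines 1-5 of Algorithm 1, assumed form)."""
    V = defaultdict(float)        # value function V of the Landmark-SMDP
    t = 0                         # time index
    l = l_prime = None            # previous / current landmark variables
    tau = tau_prime = 0           # times at which l and l' were seen
    H = []                        # short history of (reward, time) pairs

    x0 = estimate_state(initial_obs)
    if x0 in landmarks:           # the initial estimated state may itself be a landmark
        l, tau = x0, t
    return V, t, l, l_prime, tau, tau_prime, H, x0
```

The returned variables would then be carried through the observation-action loop, with each observed transition appended to H.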
Since arriving at a landmark is the precondition for providing guiding rewards through the abstract model, the algorithm checks whether or not the agent has arrived at a landmark. If it has, the current landmark \(\mathcalligra{l}'\) and the current landmark time \(\tau'\) are set (Lines 13–14). If a landmark has been observed previously, it is possible to provide the additional reward, which is calculated in Line 16.
Following the internal reward calculation, the algorithm computes the sum of discounted rewards gathered between the previous landmark \(\mathcalligra{l}\) and the current landmark \(\mathcalligra{l}'\) using H, and makes a value update on \(\mathcalligra{l}\) (Line 18), where n denotes the number of steps taken between the two landmarks (Line 17).
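Complementing the initialization sketch above, the hedged sketch below illustrates how the landmark-arrival branch (Lines 13–14 and 17–18) might look. The helper name on_landmark_arrival is hypothetical, H is assumed to store (reward, time) pairs, and the value update is written in the standard SMDP form \(V(\mathcalligra{l}) \leftarrow V(\mathcalligra{l}) + \alpha_{v}\,(R + \gamma_{v}^{n} V(\mathcalligra{l}') - V(\mathcalligra{l}))\); the exact internal guiding reward of Line 16 is not reproduced here, since its expression is not given in this correction.

```python
def on_landmark_arrival(x_next, t_next, l, tau, V, H, alpha_v, gamma_v):
    """Handle arrival at landmark x_next at time t_next (Lines 13-18, assumed form)."""
    l_prime, tau_prime = x_next, t_next       # set the current landmark and its time (Lines 13-14)
    if l is not None:                          # a landmark was observed previously
        # Discounted sum of rewards gathered between l and l', read off from H;
        # each entry of H is assumed to be a (reward, time) pair.
        R = sum(gamma_v ** (s - tau) * r for r, s in H if tau <= s < tau_prime)
        n = tau_prime - tau                    # number of steps between the two landmarks (Line 17)
        # Value update on the previous landmark l (Line 18), assumed SMDP form.
        V[l] += alpha_v * (R + gamma_v ** n * V[l_prime] - V[l])
    return l_prime, tau_prime                  # l' and tau' become the previous landmark and time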
The original article has been corrected.
Cite this article
Demir, A., Çilden, E. & Polat, F. Correction: Landmark based guidance for reinforcement learning agents under partial observability. Int. J. Mach. Learn. & Cyber. 14, 1565 (2023). https://doi.org/10.1007/s13042-022-01763-9