Generating three binary addition algorithms using reinforcement programming

Authors:

Spencer White,

Tony Martinez,

George Rudolph

ACMSE '10: Proceedings of the 48th annual ACM Southeast Conference

Article No.: 46, Pages 1 - 6

https://doi.org/10.1145/1900008.1900072

Published: 15 April 2010

Abstract

Reinforcement Programming (RP) is a new technique for automatically generating a computer program using reinforcement learning methods. This paper describes how RP learned to generate code for three binary addition problems: simulating a full adder circuit, incrementing a binary number, and adding two binary numbers. Each problem is presented as an extension of the previous one, which provides an introduction to the practical application of RP. Each solution uses a dynamic, episodic form of the delayed Q-Learning algorithm. "Dynamic" means that the algorithm grows the policy during learning and prunes it before the policy is translated to source code. This differs from Q-Learning models that use fixed-size tables or neural network function approximators to store the q-values associated with (state, action) pairs. The states, actions, rewards, other parameters, and experimental results are presented for each of the three problems.
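
To make the contrast with fixed-size tables concrete, the following is a minimal sketch of a q-value store that grows as (state, action) pairs are visited and is pruned to a greedy policy afterwards. It is not the authors' implementation: the class name `DynamicQTable`, its methods, the parameters `alpha` and `gamma`, and the pruning rule (keep one greedy action per visited state) are all assumptions made for illustration, and the update shown is the standard one-step Q-Learning rule rather than the delayed variant the abstract mentions.

```python
class DynamicQTable:
    """Hypothetical sparse q-value store: entries exist only once visited."""

    def __init__(self, actions, alpha=0.5, gamma=0.9):
        self.actions = list(actions)  # assumed finite action set
        self.alpha = alpha            # learning rate (assumed value)
        self.gamma = gamma            # discount factor (assumed value)
        self.q = {}                   # (state, action) -> q-value

    def value(self, state, action):
        # Unvisited pairs default to 0.0 without being stored.
        return self.q.get((state, action), 0.0)

    def best_action(self, state):
        # Greedy action; ties go to the earliest action in the list.
        return max(self.actions, key=lambda a: self.value(state, a))

    def update(self, state, action, reward, next_state, terminal):
        # Standard one-step Q-Learning update; the table grows here.
        target = reward
        if not terminal:
            target += self.gamma * max(
                self.value(next_state, a) for a in self.actions
            )
        old = self.value(state, action)
        self.q[(state, action)] = old + self.alpha * (target - old)

    def prune(self):
        # Assumed pruning rule: keep one greedy action per visited state.
        states = {s for (s, _) in self.q}
        return {s: self.best_action(s) for s in states}
```

The table starts empty and stores only the pairs that were actually updated, so its size follows the states the learner encounters. The mapping returned by `prune()` has one action per visited state, which is the kind of compact policy that could then be translated into branching source code.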

Cited By

S. White, T. Martinez, and G. Rudolph (2012). Reinforcement Programming. Computational Intelligence, 28:2, pages 176-208. DOI: 10.1111/j.1467-8640.2012.00413.x. Online publication date: 1-May-2012.
https://dl.acm.org/doi/10.1111/j.1467-8640.2012.00413.x
S. Rana, M. Crowe, and C. Fyfe (2012). Reinforcement Programming for function approximation. 2012 12th UK Workshop on Computational Intelligence (UKCI), pages 1-5. DOI: 10.1109/UKCI.2012.6335777. Online publication date: Sep-2012.
https://doi.org/10.1109/UKCI.2012.6335777

Index Terms

Generating three binary addition algorithms using reinforcement programming
1. Computing methodologies
  1. Machine learning

Published In

ACMSE '10: Proceedings of the 48th annual ACM Southeast Conference

April 2010

488 pages

ISBN:9781450300643

DOI:10.1145/1900008

Conference Chair: H. Conrad Cunningham (University of Mississippi)

Program Chairs: Paul Ruth, Nicholas A. Kraft

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ACM SE '10

ACM SE '10: ACM Southeast Regional Conference

April 15 - 17, 2010

Oxford, Mississippi

Acceptance Rates

ACMSE '10 Paper Acceptance Rate 48 of 94 submissions, 51%;

Overall Acceptance Rate 502 of 1,023 submissions, 49%
