
Project3-Arc1


Introduction

This research discusses the use of reinforcement learning and convolutional neural networks (CNNs) to train agents (computer algorithms) to play video games, using the hit mobile game 2048 as its example. It describes how reinforcement learning is used to train the neural network and how agents interact with their environment using non-cooperative Game Theory principles. The article also examines different neural network architectures as a factor in maximizing the score across multiple games of 2048.

Arc 1

The entirety of this research is based on the implementation of neural networks. A neural network is a computational model inspired by the structure and function of the human brain. It consists of interconnected nodes, or neurons, that are organized into layers and use a set of mathematical functions to process information. By learning from data, neural networks can be trained to recognize patterns and make predictions, and they have found applications in various fields, including computer vision, natural language processing, and speech recognition. Reinforcement learning is the intended method used to train the neural network. Reinforcement learning is a method of teaching a computer program, known as an agent, to learn on its own by interacting with its environment. It is analogous to instructing a robot to do something without explicitly telling it what to do. The agent learns by attempting different tasks in its surroundings and receives rewards or penalties according to its performance. The agent learns from its mistakes in the same way that organisms do, and seeks to do better on the next attempt. The agent improves at attaining its goal over time, similarly to how humans improve at a game the more they play it.
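As a concrete illustration of this reward-driven loop, the following is a minimal sketch of tabular Q-learning, one of the simplest reinforcement learning rules. The hyperparameters (alpha, gamma, epsilon) and function names are illustrative, not drawn from this research:

```python
import random
from collections import defaultdict

# Q-table mapping (state, action) pairs to estimated long-term reward.
Q = defaultdict(float)
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # illustrative hyperparameters

def choose_action(state, actions):
    # Explore occasionally; otherwise exploit the best-known action.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    # Nudge the estimate toward the observed reward plus the best future value.
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```

Repeated over many episodes, the rewards and penalties accumulated in the table steer the agent toward better actions, mirroring the trial-and-error learning described above.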

Reinforcement learning relies on the principles of Game Theory to function. According to Dr. Başar1, Game Theory, in the context of machine learning, describes an agent’s interaction with its non-cooperative environment, in which the agent tries to minimize a loss function2 or maximize a utility function3 to ultimately approach the most optimal solution for a given scenario (Zhang, B. J.-F., 2020). Combining reinforcement learning with Game Theory means agents use action-generating algorithms coupled with the data they collect through observations. The environment in turn responds to these actions through the ‘rewards’ the agents receive during the decision-making process. Reinforcement learning itself is modeled on behaviors found in nature. The general principle was first modeled by Edward Thorndike4 in an experiment that placed cats inside puzzle boxes. While exploring their environment, the cats would step on a lever by chance, opening an escape route (a positive reward in the context of machine learning). The cats would then begin to correlate the lever with a positive reward and increase their speed and efficiency at escaping. Today, deep neural networks are used as the “agents” in reinforcement learning. These models are the ones processing observations, generating reward values, and ultimately finding the optimal solution to the task in their environment (Knight, 2017).

1 Dr. Tamer Başar is a Professor at the University of Illinois at Urbana-Champaign whose research focuses on topics including applications of control and game theory in economics and mean-field game theory (Başar, 2020).
2 A loss function maps the error or “cost” of an agent's interactions with its environment.
3 A utility function assigns “utility” values to the agent's actions within its environment. A greater outputted utility indicates further progress towards the agent completing its goal.
4 Edward Thorndike (1874-1949) was a psychologist who spent most of his time researching and developing reinforcement theory and behavior analysis at Columbia University.

Fig. 1 Examples of feature extraction on a 4 x 4 matrix

Video games are a very popular tool for training reinforcement learning algorithms, because each action the agent takes generates new state data from the game, resulting in a dynamic dataset that can train the agent until the environment resets. A prime example of such a game is 2048, a single-player stochastic sliding puzzle game. 2048 is played on a 4 by 4 grid and begins with two randomly placed tiles (the initial state). Each tile on the puzzle is either empty or numbered with a power of two (2, 4, 8, 16, 32, 64, etc.). The player selects a direction (up, down, left, or right) to slide the puzzle. Tiles of the same number slid together combine to form the next value in the 2^n sequence. As seen in Figure 1, there are multiple methods of feature extraction. By creating tuples, graduate students at National Chiao Tung University were able to submit a series of neural networks with win rates ranging from 85% to 99% (Guei, et al., 2018).
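To make the sliding rule concrete, here is a minimal sketch of a single left move on one row, following the standard 2048 rules described above. The function name is illustrative, and a full implementation would also spawn a random 2 or 4 tile after each move:

```python
def merge_row_left(row):
    """Slide one row of tiles left and merge equal neighbors (0 = empty cell).

    Example: [2, 2, 4, 0] -> [4, 4, 0, 0].
    """
    tiles = [v for v in row if v != 0]       # slide: drop the empty cells
    merged, i = [], 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)      # equal tiles combine into the next power of two
            i += 2                           # each tile merges at most once per move
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged))  # pad back to the original width
```

Applying this function to each row (or to each column, for vertical moves) yields the full board transition that the agent observes after every action.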



Fig. 2 Example of Max Pooling

The types of neural networks used in video games with a visual state, 2048 included, are called Convolutional Neural Networks (CNNs). The network architecture of CNNs was initially inspired by the connections of neurons and synapses in the brain. The first part of the CNN is the convolution layer. This serves as a feature extraction5 layer, using kernels to identify features in the images, such as edges. After every convolution operation, the output data is transformed using an activation function, typically ReLU6, which is defined as y = max(0, x), i.e., y = 0 for x < 0 and y = x for x ≥ 0. The next type of layer is a pooling layer, which is used to reduce the number of parameters from previous layers. There are three common types of pooling methods: max, min, and average pooling. Max pooling, in most cases, performs better than the other two methods. Using this method, pooling is done by taking the maximum value in each window, as shown in Figure 2. Lastly, a fully connected layer is implemented, which connects directly to the output layer (Yamashita, 613).
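The following short sketch shows these two operations, assuming NumPy; the 4 x 4 feature map is made-up data, chosen so the 2 x 2 max-pooling step mirrors the kind of reduction shown in Figure 2:

```python
import numpy as np

def relu(x):
    # ReLU activation: y = max(0, x), applied element-wise.
    return np.maximum(0, x)

def max_pool_2x2(x):
    # Split a (2h, 2w) array into 2 x 2 blocks and keep each block's maximum.
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x.reshape(h, 2, w, 2).max(axis=(1, 3))

feature_map = np.array([[ 1, -3,  2,  4],
                        [ 5,  6, -1,  0],
                        [ 3,  2,  1,  8],
                        [-2,  4,  7,  6]])
print(max_pool_2x2(relu(feature_map)))
# [[6 4]
#  [4 8]]
```

Note how ReLU zeroes out the negative activations first, and pooling then quarters the number of values passed to the next layer.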

For every neural network, data is used to minimize the cost function or maximize the utility function, respectively. For a traditional CNN, a dataset of images with a variable number of outputs is used. The goal of the CNN is to predict the label(s) based on the image. Image preprocessing methods, such as skewing7, cropping8, and greyscaling9, may also be used to train the model, as the data retains the same dimensionality.

5 Feature extraction reduces the dimensions of the data by creating new features from previously existing information, patterns, features, etc.
6 Rectified Linear Unit (ReLU) is an activation function that is a piecewise linear function.
7 Skewing refers to slanting an image, typically changing its viewpoint through mathematical operations.
8 Cropping is the removal of sections in an image.
9 Grayscaling is the process of converting an image from a color space to a one-dimensional matrix consisting of shades of grey.

In reinforcement learning, there is not a static dataset. Instead, a new image is generated each time the agent makes an action. After the action is performed, a new state representation of 2048 is returned to the agent, from which it again draws inferences (Dangtongdee, 2018).
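As an illustration of the preprocessing methods mentioned above, here is a small sketch of center-cropping and grayscaling, assuming NumPy; the luminance weights are a common convention (ITU-R BT.601), not something specified in this research:

```python
import numpy as np

def center_crop(img, size):
    # Remove an equal border on each side so the result is size x size.
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def grayscale(img):
    # Weighted sum of the RGB channels: (H, W, 3) -> (H, W).
    return img @ np.array([0.299, 0.587, 0.114])

rgb = np.random.rand(32, 32, 3)              # stand-in for a game frame
processed = grayscale(center_crop(rgb, 28))
print(processed.shape)                       # (28, 28)
```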

To integrate reinforcement learning with the CNN, a type of model called a Deep Q-Network (DQN) is used. A DQN model has the same architecture as the CNN in that it contains convolutional, pooling, and fully connected layers. The output layer of the DQN, however, is of size n, where n is the total number of possible actions that the agent can take. The output layer returns Q-values, which are scalars (as the DQN is a regression algorithm10 rather than a classification algorithm11). The action with the highest normalized scalar value becomes the action that the agent takes in the environment (Levine, 2017).
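The following is a minimal sketch of such a network, assuming PyTorch. The layer sizes are illustrative, pooling is omitted because the 4 x 4 board is already small, and the four outputs correspond to 2048's four slide directions:

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Convolutional Q-network: board state in, one Q-value per action out."""
    def __init__(self, n_actions=4):          # 2048 has 4 actions: up, down, left, right
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=2), nn.ReLU(),   # convolution: feature extraction
            nn.Conv2d(16, 32, kernel_size=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 2 * 2, 64), nn.ReLU(),         # fully connected layer
            nn.Linear(64, n_actions),                     # one scalar Q-value per action
        )

    def forward(self, x):
        return self.net(x)   # regression outputs, not class probabilities

state = torch.randn(1, 1, 4, 4)               # a 4 x 4 board as a one-channel image
action = DQN()(state).argmax(dim=1).item()    # greedy policy: pick the largest Q-value
```

The final `argmax` is where the regression outputs become a decision: the index of the largest Q-value is the move the agent performs.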

OpenAI Gym12 is a commonly used tool to implement reinforcement learning environments in Python13. Using this library, a user can load environments, usually video games simulated through Python, and allow agents to interact with them. The environment returns pixel-state14 or RAM data15, and the agent then uses this data to return an action, which is passed back into the OpenAI Gym environment. From this action, a new information state is returned, and the process repeats (McElwee, 3).
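A minimal sketch of this loop, assuming the classic four-value step API of earlier Gym releases (newer Gym and Gymnasium versions return five values and a seed-aware reset); the CartPole environment and the random policy are stand-ins, since 2048 would require a custom environment:

```python
import gym

env = gym.make("CartPole-v1")    # stand-in environment; 2048 needs a custom one
state = env.reset()              # initial observation from the environment

for _ in range(1000):
    action = env.action_space.sample()             # stand-in for the agent's policy
    state, reward, done, info = env.step(action)   # new state + reward for the action
    if done:
        state = env.reset()      # episode over: reset the environment and continue
env.close()
```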

10 Regression algorithms estimate a function from the input variables to continuous output variables.
11 Classification algorithms estimate a function from the input variables to discrete output variables.
12 OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms.
13 Python is a high-level programming language that will be used to implement reinforcement learning algorithms.
14 Pixel state data is the pixel values of the current frame in the environment (usually a video game).
15 RAM data will output the RAM values of the Atari machine at the current step in the environment.

The question this research aims to answer is, “Will a naive approach to a Deep Neural Network achieve a higher in-game score than a naive approach to a Convolutional Neural Network after multiple trials in the video game 2048?” A naive approach is defined as a simple method of implementing a neural network to interact with the raw data of the environment. The networks will receive no prior information, algorithms, indications, or other forms of help established to make assessing and learning the workings of the 2048 environment easier. This ensures that the results of the neural network are due solely to the network itself, and not to any other functions or data-preprocessing methods. The hypothesis that this research aims to test is, “The Convolutional Neural Network will outperform the traditional Neural Network, as it has historically performed better on matrix observations, while the Neural Network is intended for single-dimensional observations.”

Works Cited

Başar, Tamer. “Tamer Başar: Introduction.” Tamer Basar, University of Illinois Urbana-
Champaign, 20 Feb. 2014, http://tamerbasar.csl.illinois.edu/.

Dangtongdee, K. “Plant Identification Using Tensorflow.” California Polytechnic State University Computer Engineering Department, 2018, pp. 1–17, https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=1271&context=cpesp.

Guei, Hung, et al. “Using 2048-like Games as a Pedagogical Tool for Reinforcement Learning.”
ICGA Journal, vol. 40, no. 3, 1 May 2019, pp. 281–293, https://doi.org/10.3233/icg-180062.

Levine, Zachariah. “Learning 2048 with Deep Reinforcement Learning.” David R. Cheriton
School of Computer Science, University of Waterloo, 3 Mar. 2017,
https://cs.uwaterloo.ca/~mli/zalevine-dqn-2048.pdf.

Knight, Will. “Reinforcement Learning.” MIT Technology Review, 17 Sept. 2021,
https://www.technologyreview.com/technology/reinforcement-learning/.

McElwee, Steven, et al. “Deep Learning for Prioritizing and Responding to Intrusion Detection
Alerts.” MILCOM 2017 - 2017 IEEE Military Communications Conference (MILCOM), 11 Dec.
2017, https://doi.org/10.1109/milcom.2017.8170757.

Yamashita, Rikiya, et al. “Convolutional Neural Networks: An Overview and Application in
Radiology.” Insights into Imaging, vol. 9, no. 4, 22 June 2018, pp. 611–629,
https://doi.org/10.1007/s13244-018-0639-9.

Yang, Yanwei, et al. “Application of Scikit and Keras Libraries for the Classification of Iron Ore
Data Acquired by Laser-Induced Breakdown Spectroscopy (LIBS).” Sensors, vol. 20, no. 5, Mar.
2020, p. 1393. Crossref, https://doi.org/10.3390/s20051393.
