IJARCCE
ISSN (O) 2278-1021, ISSN (P) 2319-5940
International Journal of Advanced Research in Computer and Communication Engineering
Impact Factor 7.39Vol. 11, Issue 3, March 2022
DOI: 10.17148/IJARCCE.2022.11339
Stock Market Prediction using Machine Learning
Algorithm
Akankshya Rout1, Ayush Kumar Bar2, Satya Priya Saha3, Dr. Avijit Kumar Chaudhuri4
1,2,3
4
UG – Computer Science and Engineering, Techno Engineering college Banipur, Kolkata, West Bengal
Professor, Computer Science and Engineering, Techno Engineering college Banipur, Kolkata, West Bengal
Abstract: Stock market price data is huge and it changes every second. As it is a complex system in which people either
make money or lose all their savings, hence it is important to understand the stock market. In the era of big and dynamic
data, machine learning for predicting stock market prices and trends has become even more popular than ever. In this
paper, we tried to predict the trend of the stock market. A model with a supervised machine learning algorithm is used to
predict prices. We collected data of every company from the beginning from Yahoo finance and proposed comprehensive
customization of RNN Machine Learning based models which are known as LSTM for predicting price trends of stock
markets.
The proposed solution is comprehensive as it includes pre-processing of the stock market dataset, utilization of multiple
feature engineering techniques, combined with an RNN based system for stock market price trend prediction.
In the yearly forecasting model, historical prices have been trained and achieved an accuracy of 84.0%.
We conducted comprehensive evaluations on frequently used machine learning models and concluded that our proposed
solution outperforms due to the comprehensive feature engineering that we built.
Through our detailed design and evaluated prediction term lengths, feature engineering and data pre-processing methods,
this work will help investors to invest in the stock by comparing stocks of different enterprises periodically, hence
resulting in less risk. Also, it will contribute to the financial and technical domains of the stock analysis research
community.
Keywords: Stock Market, Machine Learning, LSTM, RNN, Forecast, Feature Engineering
1.
INTRODUCTION
Stock market prediction and analysis are one of the difficult tasks to do. There are various reasons for this, including the
unsteady nature of the market and various dependent and independent variables that impact the worth of a specific stock
in the market.
Nowadays various types of Machine Learning Algorithms have started taking place to analyse stock market data.
In synopsis, Machine Learning Algorithms are broadly used by numerous associations in Stock market
prediction. Therefore, this paper will also help in understanding the stock market and growth of the different enterprises
1.1.
Indian Stock Market Overview
According to the World Bank, the Indian economy is the third-largest in terms of purchasing power parity, which can still
grow within the predictable future. That said, the country's booming economy is probably going to experience many ups
and downs, as well as movements in its stock exchange, which can considerably impact its growth.
So, let's perceive how the stock exchange affects India' economy.
The markets get their unsteady nature from the value fluctuations of individual stocks. As prices increase or decrease,
market volatility influences businesses and shoppers. Throughout a bull phase, the stock prices go up. Additionally, it
helps the economy grow positively. Likewise, consumer defrayment conjointly rises as people become more optimistic
concerning the market and purchase more products and services. So, businesses supplying these products and services
begin to supply and sell more.
In every country there is at least one stock exchange market, where the stocks/shares of listed corporate sectors or
enterprises purchase or sell. When an enterprise initially registers itself in any stock exchange to become a public
company, the promoter groups sell a substantial number of shares to the public as per Government standards. When the
promoter organization unloads the considerable number of shares to public retail investors, then at that point, those could
be exchanged in the secondary market, i.e., stock exchange.
The two most efficacious stock exchanges are BSE (Bombay Stock Exchange) and NSE (National Stock Exchange) in
India. According to [1] in the financial year 2020, a total of over 7,400 companies were listed in the NSE and BSE across
India. Both the exchanges have comparative trading and market opening and shutting time which helps individual
investors to participate in the share market conveniently.
© IJARCCE
This work is licensed under a Creative Commons Attribution 4.0 International License
218
IJARCCE
ISSN (O) 2278-1021, ISSN (P) 2319-5940
International Journal of Advanced Research in Computer and Communication Engineering
Impact Factor 7.39Vol. 11, Issue 3, March 2022
DOI: 10.17148/IJARCCE.2022.11339
According to [2] India's benchmark 10-year bond yield rose to 6.73%, up 7 basis points from its previous close and its
highest level since Dec 13, 2019. Until now in January, foreign investors have dumped $2.2 billion of Indian shares after
having bought a net $3.76 billion in 2021. They had bought $23.29 billion worth shares in 2020 and $14.23 billion in
2019. They are still net buyers of $575.35 million worth debt so far this month after having sold $3.66 billion in 2021,
which clearly shows that investing in Indian stocks is greatly profitable nowadays.
Stock market prediction and analysis are one of the difficult tasks to do. There are various reasons for this, including the
unsteady nature of the market and various dependent and independent variables that impact the worth of a specific stock
in the market.
Nowadays various types of Machine Learning Algorithms have started taking place to analyse stock market data.
In synopsis, Machine Learning Algorithms are broadly used by numerous associations in Stock market
prediction. Therefore, this paper will also help in understanding the stock market and growth of the different enterprises
2.
LITERATURE REVIEW
The prime objective of our research was to predict the future stock prices of Indian companies which helps in the growth
of the stock market. As we studied the trends further, we came across a machine learning approach to predict and forecast
future stock prices to help Indian investors.
After going through [3][6][7] research papers, articles and journals our key findings included the use of RNN’s LSTM to
overcome the challenges faced during model training. In many journals such as [2] which they had discovered correlation
between” public sentiment” and” market sentiment” with an accuracy of 75.56%.
To train long term data, LSTM was a really good option because LSTM requires only about half of DNN and more
parameters than CNN. LSTM’s nature of being the slowest to train comes with its advantage for being able to take a look
at longer sequences of inputs without expanding the network size.
3.
LSTM ARCHITECTURE
Long Short-Term Memory Network is a high level RNN, a sequential network, that permits data to persevere. It is
equipped for taking care of the disappearing gradient issue looked at by RNN. An intermittent neural network, otherwise
called RNN is utilized for steady memory.
For example, suppose while watching a video you recall the past scene or while reading a book you realize what occurred
in the prior part. Correspondingly RNNs work, they recollect the past data and use it for handling the current info. The
shortcoming of RNN is, they cannot recall long-term conditions because of disappearing gradients. LSTMs are expressly
intended to keep away from long-term reliance issues.
At an advanced level LSTM works a lot like a RNN cell. Here is the inside functioning of the LSTM network. Similar to
RNN, an LSTM also contains a hidden state or short-term memory in which H(t-1) represents the hidden state of the
previous timestamp and Ht is the hidden state of the current timestamp. Furthermore, LSTM also contains a cell state
represented by C(t-1) for previous and C(t) for current timestamps respectively.
The LSTM comprises three sections, as shown in the picture below and each part performs an individual function.
Fig. 1. LSTM Gate
•
Forget Gate: - It picks whether the data coming from the previous timestamp is to be recollected or is insignificant
and can be neglected.
© IJARCCE
This work is licensed under a Creative Commons Attribution 4.0 International License
219
IJARCCE
ISSN (O) 2278-1021, ISSN (P) 2319-5940
International Journal of Advanced Research in Computer and Communication Engineering
Impact Factor 7.39Vol. 11, Issue 3, March 2022
DOI: 10.17148/IJARCCE.2022.11339
Equation for Forget Gate: ft = σ(xt ∗ Uf + Ht−1 ∗ Wf )
Here,
•
xt =Input at the current timestamp
•
Uf =weight associated with the input
•
Ht−1 =The hidden state at the previous timestamp
•
Wf =It is the weight matrix associated with the hidden state
After multiplying ft with sigmoid function, it will make ft a number somewhere in the range of 0 and 1. This ft is
subsequently multiplied with the cell state of the previous timestamp as shown below.
Ct−1 ∗ ft = 0 ...if ft = 0 omit everything
Ct−1 ∗ ft = 0 ...if ft = 1 omit nothing
•
Input Gate: - Here, the cell tries to read new information from the input to this cell.
It is used to evaluate the significance of the new information carried by the input.
Equation for Input gate: it = σ(xt ∗ Ui + Ht−1 ∗ Wi
Here,
•
•
•
•
xt = Input at the current timestamp t
Ui = weight matrix of input
Ht−1 = A hidden state at the previous timestamp
Wi = Weight matrix of input associated with hidden state
New information: Now the new information that should have been passed to the cell state is a function of a hidden state at the previous
timestamp t − 1 and input x at timestamp t. Here, the activation function is tanh. Because of the tanh function, the worth
of new information will be between - 1 and 1. If Nt is negative the information will get subtracted from the cell state else
if it is positive the information will get added to the cell state at the current timestamp.
Nt = tanh(xt + Uc + Ht−1 ∗ Wc )(new information)
Nonetheless, the Nt won't be added straight to the cell state. Hence, the equation is
Ct = ft ∗ Ct−1 + it ∗ Nt (updating cell state)
Where Ct−1 is the cell state of the current timestamp.
Output Gate: - the cell transmits the updated information from the current timestamp to the next timestamp.
Equation for Output gate: ot = (xt + Uo + Ht−1 ∗ Wo )
Its value will also lie somewhere in the range of 0 and 1 because of this sigmoid function. Presently to figure out the
current state we will utilize ot and the tanh of the updated cell state. As shown below.
Ht = ot ∗ tanh (Ct )
After summarizing it turned out, the hidden state is a function of long-term memory (Ct ) and the current output. To get
the output of the current timestamp implement the Adam activation on the hidden state Ht .
Output=adam(Ht )
© IJARCCE
This work is licensed under a Creative Commons Attribution 4.0 International License
220
IJARCCE
ISSN (O) 2278-1021, ISSN (P) 2319-5940
International Journal of Advanced Research in Computer and Communication Engineering
Impact Factor 7.39Vol. 11, Issue 3, March 2022
DOI: 10.17148/IJARCCE.2022.11339
4.
METHODOLOGY
The motivation behind our framework is to predict the future share values of various enterprises and compute the future
growth of the enterprises in different periods of time. Then, at that point, we dissect the prediction error for each enterprise
in various sectors. Based on that we can easily compare the future stock prices of different companies for a short period
of time.
We initially anticipate the future shutting price of 4 unique enterprises from a few pre-chosen areas with the help of
LSTM. Two unique models have been built to predict stock market trends. First model anticipates the stock market trend
for
the upcoming day (Daily prediction model) by considering all available data on a daily basis as input. Second model
predicts
the stock market trend for the upcoming week/month by considering available data on yearly or monthly basis.
One of the statistical arguments considered is the relationship between trend of a day and closing price of stock traded on
the same day.
The forecast will be done on historical data and the future anticipation will be done for 1 month, 6 months & 1 years. In
these three different time spans (1 month, 6 months & 1 year), we calculate the growth of those companies. Then by
analysing the variation of shutting price for each time period, we compare the companies which have maximum growth,
i.e., less error for the particular sector.
4.1.
Proposed Method
Fig. 2. Block Diagram
4.2.
Implementation Steps
Step1: Raw Stock price dataset:
Day-wise past stock prices of selected enterprises are gathered from the BSEINDIA official site. Then it is cleaned using
data cleaning methods and removing the null and empty spaces.
Step2: Pre-processing:
This includes the following steps:
Data sampling: Reducing a part of data to use specific data specially for numerical data
Data cleaning: Removing the missing and null values.
Data Transformation: Normalizing the data to reduce bias with bigger numbers.
Data Segregation: Segregation of data into train and test set for evaluation
Step3: Feature Selection:
Here we are selecting the features (i.e., Date and Close) that are to be fed into the neural network.
Step4: Train the NN model:
The Neural Network is trained using the pre-processed training dataset. Proposed LSTM model consists of a sequential
input layer followed by 1 LSTM layer and then a dense layer with activation.
Step5: Output Generation
© IJARCCE
This work is licensed under a Creative Commons Attribution 4.0 International License
221
IJARCCE
ISSN (O) 2278-1021, ISSN (P) 2319-5940
International Journal of Advanced Research in Computer and Communication Engineering
Impact Factor 7.39Vol. 11, Issue 3, March 2022
DOI: 10.17148/IJARCCE.2022.11339
4.3.
Pseudocode
model = Sequential ()
model.add (LSTM (units=50, return_sequences=True,input_shape=(x_train.shape[1],1)))
model.add (LSTM (units=50, return_sequences=False))
model.add (Dense (units=25))
model.add (Dense (units=1))
model.compile (optimizer='adam', loss='mean_squared_error')
model.fit (x_train, y_train,validation_data=(x_test, y_test), epochs=20,verbose=0)
5.
RESULTS AND DISCUSSION
The recommended LSTM based model is carried out with Python. In the given Table 1 the Accuracy, Misclassification,
Precision, Sensitivity, Specificity values for
various organizations belong to the IT Sector based on the recorded information of the 1 Year is shown.
Table.1.
Accuracy
Misclassification
Precision
Sensitivity
Specificity
Amazon
0.84
0.152
0.55
0.55
0.91
Google
0.601
0.397
0.32
0.32
0.72
Microsoft
0.605
0.394
0.32
0.32
0.72
Apple
0.685
0.315
0.37
0.37
0.79
As it very well may be seen, the Accuracy for each enterprise is between 60-80% which shows that the LSTM model is
good at distinguishing relationships and patterns between variables in a dataset input and training. Consequently, the
model is capable of generalizing the ‘unseen’ data, and thus better predictions and insights can be produced.
Table.2.
1 month
6 month
1 year
Amazon
0.64
0.64
0.84
Google
0.63
0.63
0.601
Microsoft
0.64
0.63
0.605
Apple
0.65
0.64
0.685
From the above table 2 Accuracy table is given for 1 month, 6 month and 1 year for the 4 IT sectors.
Fig.3. Prediction on test data
© IJARCCE
This work is licensed under a Creative Commons Attribution 4.0 International License
222
IJARCCE
ISSN (O) 2278-1021, ISSN (P) 2319-5940
International Journal of Advanced Research in Computer and Communication Engineering
Impact Factor 7.39Vol. 11, Issue 3, March 2022
DOI: 10.17148/IJARCCE.2022.11339
6.
CONCLUSION AND FUTURE SCOPE
As observed in the past few years, investing has become very popular in recent years. Investing in the stock market is
beneficial with high returns in the short term as well as long term period. But it also carries the risk of losing investment.
We have many prediction models available in the market which can predict the stock price trend on a weekly basis. So,
we tried to propose a similar kind of model with some different parameters with a good amount of accuracy.
This model can help in getting a better idea of highs and lows during purchasing or selling of stocks.
We can use this model for future studies and prediction of prices of crypto-currencies.
7.
REFERENCES
[1] https://economictimes.indiatimes.com/
[2] https://www.irjet.net/archives/V5/i3/IRJET-V5I3788.pdf
[3]https://www.researchgate.net/publication/321503983_Stock_price_prediction_using_LSTM_RNN_and_CNNsliding_window_model
[4] https://iopscience.iop.org/article/10.1088/1757-899X/790/1/012109/pdf
[5] https://www.ijcrt.org/papers/IJCRT2102617.pdf
[6]https://www.researchgate.net/publication/327967988_Predicting_Stock_Prices_Using_LSTM
[7]https://www.researchgate.net/publication/348390803_Stock_Price_Prediction_Using_LSTM
© IJARCCE
This work is licensed under a Creative Commons Attribution 4.0 International License
223