Stock Fundamentals Data Set Forecast

In this repository, I attempt to predict stock prices using the Stock Fundamentals data set (http://www.usfundamentals.com/). I chose this data set because it is easily accessible yet relatively unclean (e.g. it has missing values), which lets me exercise my data science skills and apply what I have learned about machine learning so far.

This is my first end-to-end machine learning personal project.

I have opted not to use Jupyter notebooks out of personal preference, but they are a tool I would like to explore in future personal projects.

Scripts (in order of production):

exploratory_data_analysis.py - explores a simple part of this data set, the latest quarterly snapshot (latest-snapshot-quarterly.csv), to get a feel for the data and how to process and analyse it. quarterly_snapshot_feature_importance.png and quarterly_snapshot_hist.png are two plots generated by this script.
predict_stock_per_company.py - uses the quarterly information for each company to try to predict the current stock value for that company.
predict_stock_per_indicator.py - uses the quarterly information for each indicator to predict the current stock value given information about that indicator.
predict_stock_from_indicators.py (the most mature prediction model) - uses quarterly information from 10+ common indicators to predict stock value, using the average EarningsPerShareDiluted across all quarters as the target. It reaches an acceptable level of fit (test R² = 0.669), but the error still seems large, likely due to outliers (there are some very large positive earnings values). Companies that were extreme outliers for the target were removed because of how they distort predictions. This means the model is likely weak at predicting the rare very positive or very negative values, but better across the wider range of more modest values. XGBoost was chosen to model the data because tree ensembles are fairly robust to outliers, and gradient boosted trees tend to overfit less and offer more hyper-parameter control than random forests, while still being quite fast.
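
The target construction and outlier removal described for predict_stock_from_indicators.py can be sketched roughly as below. This is a minimal, standard-library illustration and not the code from the script: the function names, the company tickers and the 1.5 * IQR cut-off are all assumptions, since this README does not say how outliers were identified. The r_squared helper shows how a test R² such as the one quoted above is computed from true and predicted values.

```python
from statistics import mean, quantiles


def average_target(quarterly_values):
    """Average the non-missing quarterly values for one company (None = missing)."""
    present = [v for v in quarterly_values if v is not None]
    return mean(present) if present else None


def drop_target_outliers(targets, k=1.5):
    """Keep companies whose target lies within k * IQR of the quartiles.

    The IQR rule is an assumed criterion, chosen only for illustration.
    """
    values = [t for t in targets.values() if t is not None]
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return {
        name: t
        for name, t in targets.items()
        if t is not None and low <= t <= high
    }


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical per-company quarterly EarningsPerShareDiluted values.
quarterly_eps = {
    "AAA": [0.4, 0.6, None],
    "BBB": [0.6, 0.6, 0.6],
    "CCC": [0.7, 0.9, 0.8],
    "DDD": [0.9, None, 0.9],
    "EEE": [1.0, 1.0, 1.0],
    "FFF": [1.0, 1.2, 1.1],
    "GGG": [1.2, 1.2, 1.2],
    "HHH": [1.4, 1.6, 1.5],
    "BIG": [250.0, 300.0, 350.0],  # extreme positive outlier
}

targets = {name: average_target(vals) for name, vals in quarterly_eps.items()}
kept = drop_target_outliers(targets)
print(sorted(kept))
```

Filtering on the target in this way trades accuracy on rare extreme values for a better fit on the bulk of companies, which matches the limitation noted above.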

