DSI 2023 Data Science Roadmap
Andrew Jones, the founder of Data Science Infinity, is one of the best-known Data
Science instructors in the world – helping countless students move ahead of the
competition and into amazing roles in the field. His 15+ year career includes time
at global tech giants Amazon & Sony PlayStation. He holds 5 patents, has
interviewed hundreds of candidates, and has authored a book on Data Science
recruitment.
What’s inside
Python vs. R
Machine Learning: Don’t try to learn every algorithm. Focus on these instead
Think Data Science is just about technical skills? Far from it…
Communication is vital!
When demand is high, those who have the right skills & tools
can expect to be valued highly.
Salary insights company PayScale calculated that the median salary for a Data
Scientist in the US is $103k (and remember that is the median, so half of Data
Scientists are indeed earning more than that!)
Many fail on this journey.
You will succeed.
There is an almost endless number of skills & tools to potentially focus on learning – it can
be overwhelming.
To be successful, you need to put your time & energy into learning the skills that hiring
managers genuinely need & want.
As an idea of what you need, make sure you build up confidence with:
• Combining data using INNER JOIN, LEFT JOIN, and CROSS JOIN
As a bonus, learn how to link multiple queries into one using either
temporary tables or CTEs (common table expressions). These come in really handy
in interviews, as they help you cleanly break the question down into multiple parts!
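To make the joins and the CTE idea concrete, here is a minimal sketch that runs SQL from Python using the built-in sqlite3 module. The table names, column names, and values are all hypothetical, made up purely for illustration, and exact syntax can differ slightly between database systems.

```python
import sqlite3

# Hypothetical tables and values, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cleo');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# INNER JOIN keeps only customers that have at least one order.
inner_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOIN keeps every customer; those without orders get NULL (None in Python).
left_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

# CROSS JOIN pairs every customer with every order (3 x 3 rows here).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM customers CROSS JOIN orders"
).fetchone()[0]

# A CTE names the first step (total spend per customer) so the final
# SELECT can build on it, breaking the question into two parts.
cte_rows = conn.execute("""
    WITH customer_totals AS (
        SELECT customer_id, SUM(amount) AS total_spend
        FROM orders
        GROUP BY customer_id
    )
    SELECT c.name, t.total_spend
    FROM customer_totals AS t
    INNER JOIN customers AS c ON c.customer_id = t.customer_id
    WHERE t.total_spend > 45
""").fetchall()

print(inner_rows, left_rows, cross_count, cte_rows, sep="\n")
```

Notice how the CTE lets you say "first work out the totals, then filter them", which is exactly the kind of step-by-step thinking you can talk through in an interview.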
Both are great programming languages – but which should
you learn?
I would advise strongly against trying to learn both languages at the start of
your career. Some courses look to teach both simultaneously and this can
make the whole learning process much harder. Pick one, and focus on that!
>> Pandas
Pandas is a Python library used for data manipulation & helping Data Scientists understand &
explore the data they’re working with.
Get to grips with: importing data, creating DataFrames, accessing specific rows/columns/cells,
sorting data, joining & merging data, aggregating data, and dealing with missing values.
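As a rough illustration of those Pandas skills, here is a short sketch that touches each one. The data is hypothetical and read from an in-memory string so the example is self-contained; with a real file you would typically pass a file path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical CSV content (note the missing amount on the second row).
csv_text = "customer_id,amount\n1,50\n1,\n2,40\n3,10\n"

# Importing data: read_csv accepts a file path or any file-like object.
orders = pd.read_csv(io.StringIO(csv_text))

# Creating a DataFrame directly from a dictionary of columns.
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "region": ["North", "South", "North"]}
)

# Dealing with missing values: here the missing amount is filled with 0.
orders["amount"] = orders["amount"].fillna(0.0)

# Accessing specific rows and columns with .loc (boolean mask + column label).
large_orders = orders.loc[orders["amount"] > 20, "amount"]

# Joining & merging: attach each customer's region to their orders.
merged = orders.merge(customers, on="customer_id", how="left")

# Aggregating and sorting: total spend per region, largest first.
region_totals = (
    merged.groupby("region")["amount"].sum().sort_values(ascending=False)
)

print(region_totals)
```

Filling missing values with 0 is just one choice made for this sketch; in a real project the right approach depends on what the missing value actually means.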
>> Numpy
Numpy is a Python library used for fast mathematical processes on data stored in array format.
It is also often utilized for storing & manipulating image data!
Get familiar with creating & manipulating arrays of differing dimensions as well as applying
mathematical operations on the data stored within!
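Here is a small sketch of what that looks like in practice: arrays of one, two, and three dimensions, plus a few mathematical operations. All the values are hypothetical, and the tiny "image" is just an array of zeros standing in for real pixel data.

```python
import numpy as np

# Arrays of differing dimensions.
vector = np.array([1.0, 2.0, 3.0])        # 1-D, shape (3,)
matrix = np.arange(6.0).reshape(2, 3)     # 2-D, shape (2, 3): [[0, 1, 2], [3, 4, 5]]

# Element-wise maths: every value is doubled.
scaled = matrix * 2

# Broadcasting: the vector is added to each row of the matrix.
shifted = matrix + vector

# Aggregating along an axis: the mean of each column.
column_means = matrix.mean(axis=0)

# Matrix-vector product.
product = matrix @ vector

# Image data is often stored as a (height, width, channels) array.
# Hypothetical 4x4 RGB image: start black, then set the red channel to full.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:, :, 0] = 255

print(shifted)
print(column_means)
print(product)
print(image.shape)
```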
Get familiar with creating different chart types, utilizing subplots, formatting plot features,
colors & styles, and adding text to aid the interpretability of the plot for the viewer!
>> Scikit-Learn
Scikit-Learn is the most popular ML library in Python, containing dozens of algorithms for
Machine Learning as well as a whole host of data preprocessing techniques.
>> Streamlit
Streamlit is an amazing library that allows you to easily turn your code & projects into
interactive web apps. This is perfect for bringing your learning journey to life and for
showcasing your projects when applying for roles!
There is no shortage of people in this field who will sneer at newcomers who
don’t know X, Y, or Z (I call them “gatekeepers”) but the reality is as follows:
But you DO NOT need to spend a year reading dusty textbooks before you’re
allowed to progress, or build things, or touch anything, or before you’re
allowed to land your first role.
• Types of Data
• Statistical Distributions
• Hypothesis Tests
• Confidence Intervals
A great way to do this is to learn as you start applying things like Machine
Learning algorithms. It's so much more fun (and productive) learning while
testing and modifying things, and seeing what changes, and why!
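As one example of learning by modifying things, here is a minimal sketch of a confidence interval for a mean using only Python's standard library. The sample values are hypothetical, and the function uses the normal approximation (z = 1.96 for roughly 95%), which is an assumption of this sketch; for small samples a t-distribution is usually the better choice.

```python
import math
import statistics

# Hypothetical sample, e.g. minutes spent on a website per visit.
sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0]


def mean_confidence_interval(values, z=1.96):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean - z * standard_error, mean + z * standard_error


low, high = mean_confidence_interval(sample)

# Modify something and see what changes: with four times as much data
# (the same values repeated), the interval becomes narrower.
low_big, high_big = mean_confidence_interval(sample * 4)

print(f"8 observations:  ({low:.2f}, {high:.2f})")
print(f"32 observations: ({low_big:.2f}, {high_big:.2f})")
```

Try changing `z` or the sample values and watch how the interval responds. That kind of tinkering tends to make the theory stick.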
Many new Data Scientists fall into the trap of learning as many ML algorithms as they can.
This is detrimental to your progress and your career prospects. The following
algorithms & concepts are used to solve 90%+ of business problems that require ML - get a
deep understanding of these, and you’ll position yourself ahead of other candidates.
Start by getting a good understanding of the difference between Supervised Learning vs.
Unsupervised Learning. Simply put, these are two areas within Machine Learning with
slightly different end goals, but it’s important to understand what each is, and which
algorithms might be useful for tasks that fall into each area.
Once you’ve got to grips with that, you want to learn how the following algorithms
work, and how to apply them in practice (these are the ones hiring managers said were
most commonly used to add value in their teams):
>> K-Nearest-Neighbours
Often simply referred to as KNN, this type of algorithm uses the distances between data points
to understand what its prediction should be.
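To show the idea of "using distances to make a prediction", here is a bare-bones KNN classifier in plain Python. It is a teaching sketch rather than a production implementation: the function name, the data points, and the labels are all hypothetical, and in practice you would normally reach for a library such as Scikit-Learn.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label by majority vote among the k closest training points."""
    # Pair each training point's distance to the query with its label,
    # then sort so the closest points come first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical customers described by two features, labelled by spend level.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels = ["low", "low", "low", "high", "high", "high"]

prediction = knn_predict(points, labels, query=(2.0, 2.0), k=3)
print(prediction)
```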
>> K-Means
Often used as a clustering algorithm to group together data points based on distance, to create
useful products such as customer segmentations!
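Here is a minimal K-Means sketch in plain Python, to show the two repeating steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The customer data and starting centroids are hypothetical, and the starting centroids are fixed by hand so the result is repeatable (real implementations usually pick them randomly, and features would normally be scaled first).

```python
import math


def k_means(points, initial_centroids, iterations=10):
    """Group points around centroids by repeating assign and update steps."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster
            else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (annual spend, number of visits).
customers = [(100, 2), (120, 3), (110, 2), (900, 20), (950, 22), (880, 19)]

centroids, clusters = k_means(customers, initial_centroids=[customers[0], customers[3]])
print(centroids)
print(clusters)
```

The two resulting groups are a tiny version of a customer segmentation: one cluster of low-spend, low-visit customers and one of high-spend, high-visit customers.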
Bonus Algorithms
>> Association Rule Learning
An approach that discovers the strength of relationships between different data points. Think
of the “customers who purchased product A are likely to also purchase product B” you see on
sites like Amazon!
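The "strength of relationship" is commonly measured with three numbers: support, confidence, and lift. Here is a minimal sketch that calculates them for a single rule. The baskets and the function name are hypothetical, and the sketch assumes both products appear at least once in the data.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent.

    Assumes both items appear in at least one transaction.
    """
    total = len(transactions)
    count_a = sum(1 for basket in transactions if antecedent in basket)
    count_b = sum(1 for basket in transactions if consequent in basket)
    count_both = sum(
        1 for basket in transactions if antecedent in basket and consequent in basket
    )

    support = count_both / total             # how often A and B appear together
    confidence = count_both / count_a        # of baskets with A, the share that also have B
    lift = confidence / (count_b / total)    # above 1 means A makes B more likely
    return support, confidence, lift


# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]

support, confidence, lift = rule_metrics(baskets, "bread", "butter")
print(f"support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```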
Feature Scaling
Don’t skip the foundational skills - concepts like Linear & Logistic Regression
make up a large part of Deep Learning, so you want to know these intimately first.
Deep Learning is being used to solve some very cool problems in the field these days, but
99% of Data Science tasks that even need something along the lines of Machine Learning can
be solved using the list we discussed earlier. Get the core skills first and then take on Deep
Learning!
>> GitHub
GitHub is a version-control platform used by virtually all Data Science teams, as it provides a
suite of tools for managing code, collaborating with others, and sharing work with the world!
>> Tableau
Tableau is a powerful data visualization tool that allows users to easily connect, visualize, and
share data insights. It enables users to create interactive and dynamic dashboards, reports,
and charts without requiring programming skills.
Tableau supports various data sources including spreadsheets, databases, cloud services,
and big data. With Tableau, users can explore, analyze, and communicate insights from data,
helping organizations to make data-driven decisions.
>> ML Pipelines
These are simply what tie together all the steps from data ingestion, to preprocessing &
cleaning, to prediction, into one standalone object.
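As a rough sketch of what this can look like, here is a small Scikit-Learn `Pipeline` that chains filling missing values, feature scaling, and a model into one object. The data, the feature meanings, and the choice of steps are hypothetical, picked only to illustrate the idea.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: [age, monthly spend], with one missing spend value.
X_train = np.array(
    [[25, 20.0], [30, np.nan], [45, 200.0], [50, 220.0], [23, 15.0], [48, 210.0]]
)
y_train = np.array([0, 0, 1, 1, 0, 1])  # 1 = signed up to a loyalty scheme

# Each step is a (name, object) pair; data flows through them in order.
pipeline = Pipeline(
    steps=[
        ("impute", SimpleImputer(strategy="median")),  # fill missing values
        ("scale", StandardScaler()),                   # put features on one scale
        ("model", LogisticRegression()),               # make the prediction
    ]
)

# Fitting the pipeline fits every step; predicting applies them all again,
# so new data with missing values is handled in exactly the same way.
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(np.array([[24, 18.0], [47, np.nan]]))
print(predictions)
```

Because the preprocessing lives inside the same object as the model, there is only one thing to save, share, or deploy.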
>> Deployment
The process of placing the trained model pipeline where it can be used for prediction, e.g. in a
web app sitting behind a website. To start with, get familiar with GitHub, and build something
simple with the Python library Streamlit.
Understanding where the data comes from, and knowing how to best work with Data Engineers
is certainly useful, but you don’t need to cover both skill sets!
A good Data Scientist knows a lot of technical concepts. A great Data Scientist can simplify
these down in a way that gets everyone in the business on board.
As Data Scientists we're here to solve problems, not introduce new ones...we're here to
enhance, and accelerate business decision-making - not get in the way of it!
If you're transitioning from another field, these softer skills can genuinely put you above the
competition - so you're in a strong position to become a great Data Scientist!
Something I say all the time to the aspiring Data Scientists in Data Science Infinity is “No one is
going to pay you just to be good at coding, or just to be good at maths, or just to know a lot of
machine learning algorithms...but they will pay you, and they'll pay you extremely well, to add
tangible value to their business or to the end-user”
Learning to Earning
Now you know a bit more about the skills you need for this exciting, future-proof & lucrative
field - the next step is to move into a great role.
This is often easier said than done as competition for roles is high.
Let me give you some inside knowledge from my time as an interviewer - to help you move
ahead of the pack, and into that role you want!
A portfolio of projects can be an excellent way to showcase your skills when you’re early in
your Data Science or Analytics career.
But I want to quickly bust a myth about portfolio projects (based on my experience
interviewing & screening hundreds of candidates at companies such as Amazon &
Sony).
Hiring Managers & Recruiters have very little time to get into the depths of your projects so
you must make it quick and easy for them to see your value, and the types of tasks you have
the ability to solve.
There are no right or wrong projects for a Data Science portfolio - it really all comes down to
how well they are written up!
Make it easy for the reader! Right at the top, showcase the highlights from
the full write-up
2. Concept Overview:
4. Application:
Explain what you would do to improve this, or what you would do if you had more time!
From my research with Data Science leaders, hiring managers, and recruiters from various
companies around the world while creating Data Science Infinity, 61% said they do ask
candidates to complete a coding or technical test, 29% said they sometimes do, and the
remaining 10% said they do not.
[Chart: “Do you give candidates a coding test?” Yes 61%, Sometimes 29%, Never 10%]
Those numbers suggest that it is very important to be prepared for something like this
when interviewing!
For most entry-level Data Science or Data Analyst positions the test will simply be based on
SQL, or potentially Python.
The level of difficulty will vary - and often it will start with basic questions and move up to
more complex ones.
If the role is focused more on insights & analytics, then the test will probably be less complex
than for a role focused more on data engineering or software engineering.
Get as much information as you can about what it will entail - and then practice. There is
nothing better for this type of test than being in the right mindset, and this comes from
consistent practice.
When completing the test, if you don’t know the answer, or the exact syntax - don’t worry.
Put what you think should happen in words. You can still get a lot of kudos for knowing what
steps to follow, or what considerations are important.
The simplest way to nail questions in these interviews is to prepare in the same way you’d want
to answer. Here is a high-level framework from Data Science Infinity called the CRAIG System.
For each project, make sure you are well versed on:
C = Context
Give context around the business problem and why it needed to be solved – this pulls the interviewer
into your story
R = Roles
What was your role in the project – quick and easy!
A = Action
The specific actions you took from inception to conclusion. Refine this to the most succinct &
compelling narrative but ensure you keep auxiliary context up your sleeve, for example, “why you
chose solution C over solution A and B”. A good interviewer *will* ask this!
I = Impact
What was the result of your work? Super important – but often missed or underemphasized. Use
tangible figures, for example “drove $x sales” or “saved y hours”
G = Growth
Ask yourself “If I could have started the project again, what would I do differently?”
This sort of thinking around your career and the projects within it, can be so much more
impactful than you might think.
It shows you have an awareness of business impact, it shows you have an understanding of the
nuance of what you do from a technical point of view as you’re thinking about why different
solutions would work more effectively, and in general, it just shows a growth mindset, that
you’re always looking to build, and to improve. And, trust me, that is a lethal combination!
“I did a lot of research before choosing DSI, I asked other students and their experience had
been really good. It was definitely worth it - I feel so confident in Data Science now!”
Lovepreet
If you have any questions about Data Science or indeed Data Science
Infinity, please do reach out using the links below - I’m here to help!
Andrew
Data Science Infinity focuses on the journey and obsesses over the results of students. It was
created by former Amazon & PlayStation Data Scientist Andrew Jones.
Learn the skills & tools that will get you hired
The curriculum of 300+ tutorials, downloadable resources, and quizzes is based upon input
from hundreds of hiring managers & recruiters in the field. The curriculum grows & evolves over
time - and therefore you as a Data Scientist do as well.
SQL · Python · Statistics · Tableau · A/B Testing · GitHub · Data Preparation & Cleaning
Machine Learning · Deep Learning · Project Best Practices · Interviewing & Application Tips & Inside Knowledge
Be part of a community
Join a community of equally invested peers also chasing success
“DSI gave me the confidence to apply for & land my amazing new
role! The support provided for technical questions & tackling the
hiring process is phenomenal”
Qasem
“DSI is the best program. I learned the right skills and built a mindset
for success. I felt confident in interviews and I’ve now landed an
amazing DS role!”
Umar
“The course has such high-quality content. You get your ROI even
from the first module”
Donabel
“DSI is the best program I’ve been part of. It’s given me so much
confidence to move forward and land a great job in the field”
Ankit
“DSI truly surpasses the competition, it’s amazing. You get lifelong
access, all the key knowledge is there, complicated topics are
made accessible, and there is so much hands-on learning to put into
practice in the real world”
Fabrizio