DS Curriculum 2024

DATA SCIENCE CURRICULUM
Leveraging Data Science for Social Good
Table of Contents
01 About Sabudh

02 Data Science

03 Courses

i. Python Programming

ii. Machine Learning

iii. Deep Learning

iv. Natural Language Processing & Recommender Systems

v. Data Structures & Algorithms

vi. Dataiku

vii. Passion Project

04 Certifications

About Sabudh
Sabudh Foundation was founded by Dr. Sarabjot Singh Anand in 2018
along with Dr. Sukhjit Sehra. It empowers students to apply Data
Science to real-world challenges and to collaborate with citizens,
governments, and NGOs to drive tangible societal impact.

Sabudh recognizes that real data scientists need not only to deliver
on customer requirements but also to keep growing their knowledge
base in this fast-evolving field.

Data Science [6 Months] [48 hrs/week]

Program Learning Objectives


The program aims to provide learners with a comprehensive skill set
across various domains. The Python course covers a broad spectrum of
programming skills, from fundamentals like control structures to
advanced topics such as web scraping and GUI implementation, along
with software testing and debugging practices. Participants will also
acquire proficiency in data acquisition, cleaning, analysis, and
visualization, fostering data-driven decision-making. Additionally,
they'll gain expertise in machine learning and deep learning concepts
and their practical application, including advanced algorithms and
training strategies. The Natural Language Processing & Recommender
Systems course focuses on understanding the interactions between
computers and human language, alongside proficiency in data
preprocessing, feature engineering, and recommendation systems.

The Dataiku certification course guides IT professionals through
progressive mastery of the platform, covering machine learning
models, data pipelines, code integrations, and MLOps practices. The
Data Structures & Algorithms course builds an understanding of data
representation concepts and efficient computation methods. Finally,
the Passion Project equips students with research methodology skills
to define problems, conduct literature reviews, apply research
methods, analyze data, and communicate findings effectively.

01 Python Programming [6 weeks]

This course provides a solid foundation in Python programming,
emphasizing practical skills essential for software development and
data analysis. The learning objectives of this course will be
accomplished through the following topics:

Python Installation and Setting up IDE
Python Basics: Problem-solving, Formatted Output, Comments & Docstrings, Identifiers, Keywords
Python Data Types: Type Casts and Operators
Python Functions: Function Calls, Different Forms of Arguments, Scope and Lifetime of Variables, Use of the Global Keyword, Types of Functions (built-in and user-defined)
Control Flow Statements: if/else, for Loops, while Loops, Transfer Statements (break, continue, pass)
Python Data Structures: Overview of Built-in and User-defined Data Structures
File Handling: Basic File Operations in Python
NumPy: Arrays, Basic Operators, Universal Functions, Shape Manipulation, Stacking Arrays, Linear Algebra Operations
Pandas: Data Manipulation, Analysis, Merging Data, Basic Visualization with Pandas
Matplotlib: Introduction to the Library, State-based vs Object-oriented Interfaces, Creating Various Types of Plots
Seaborn: Advanced Data Visualization Techniques
Web Scraping: Using BeautifulSoup and Selenium for Scraping Data
Object-Oriented Programming: Basic and Advanced OOP Concepts in Python
Sklearn: Introduction to Machine Learning with Scikit-Learn
SciPy: Overview, Stats Module, Algebra Functions, Handling Sparse Data, Image Processing with SciPy
APIs: Working with REST APIs, JSON Parsing, Security and Authentication in APIs
Flask: Building and Deploying Web Applications with Flask
Dockerization: Basics of Containerizing Python Applications with Docker
Asynchronous Tasks: Using Celery and Redis for Handling Background Tasks
Working with SQL Basics: Integrating SQL Databases with Python
GUI Programming: Introduction to GUI Development with Streamlit and Gradio
Testing and Debugging: Techniques and Best Practices for Testing and Debugging Python Code

02 Machine Learning [12 weeks]

This course provides a comprehensive introduction to data science,
core machine learning techniques, and the mathematical foundations
essential for understanding and implementing machine learning
algorithms. The learning objectives of this course will be
accomplished through the following topics:

Introduction to Data Science and Machine Learning
Probability and Statistics for Machine Learning
Supervised Learning: Linear Regression, Logistic Regression, Naive Bayes, Support Vector Machines, Decision Trees, k-Nearest Neighbors (k-NN) Algorithm, Ensemble Learning Methods (Bagging, Boosting, and Random Forest)
Model Selection, Building and Evaluation
Feature Engineering and Pre-processing
Model Optimization: Hyperparameter Tuning, Regularization, Grid Search Cross-Validation
Unsupervised Learning: Clustering (K-Means, Hierarchical, DBSCAN, Expectation-Maximization (EM) Algorithm, and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH))
Dimensionality Reduction: Principal Component Analysis, Singular Value Decomposition, Matrix Factorization, Multidimensional Scaling, t-distributed Stochastic Neighbor Embedding (t-SNE)
Time Series Forecasting

03 Data Structures and Algorithms [20 weeks]

This course progresses from fundamental concepts to more advanced
topics in DSA, ensuring a smooth learning curve. Each topic is
accompanied by practice problems and exercises to reinforce learning
and problem-solving skills. The learning objectives of this course
will be accomplished through the following topics:

Introduction to Data Structures and Algorithms
Basics of Algorithm Analysis: Time Complexity Analysis (Big O notation, time complexity classes), Space Complexity Analysis
Introduction to Arrays: Basics of Arrays and Their Implementation, Basic Operations on Arrays
Introduction to Lists: Linked Lists, Singly Linked Lists, Doubly Linked Lists, and Circular Linked Lists
Stacks & Queues: Implementation using Arrays and Linked Lists
Recursion
Trees: Binary Trees and Binary Search Trees (BST), Operations on Binary Search Trees (insertion, deletion, searching)
Tree Traversal Algorithms: Depth-first Traversal (pre-order, in-order, post-order), Breadth-first Traversal (level-order)
Graphs: Graph Representation (adjacency matrix, adjacency list), Graph Traversal Algorithms (depth-first search, breadth-first search)
Graph Theory: Graph Properties (connectedness, cycles, degrees), Graph Algorithms (shortest path, minimum spanning tree)
Tries: Introduction to Tries (prefix trees), Operations on Tries, Applications of Tries

04 Dataiku [6 weeks]

Dataiku certifications are highly regarded in the data science and
analytics industry. This course covers the fundamental concepts and
functionalities of Dataiku DSS and guides you through the process of
designing and building end-to-end data pipelines. The learning
objectives of this course will be accomplished through the following
certificates:

Core Designer Certificate: Dataiku Datasets and Visual Recipes
ML Practitioner Certificate: Visual Machine Learning and Interactive Statistics
Advanced Designer Certificate: Variables, Data Pipelines, and Scenarios
Developer Certificate: Code Recipes, Webapps, and the APIs

05 Deep Learning [12 weeks]

This module provides a comprehensive journey through deep learning,
progressing from foundational concepts to advanced techniques. It also
covers practical applications so that learners gain hands-on
experience. The learning objectives of this course will be
accomplished through the following topics:

Introduction to Deep Learning: From Logistic Regression to Neural Networks, Basics of Neural Networks, Perceptron Model and Activation Functions, Forward Propagation and Backpropagation, Introduction to TensorFlow for Gradient Computation
Multilayer Perceptrons (MLPs): Structure and Architecture of MLPs, Training MLPs using TensorFlow, Activation Functions for MLPs, Applications of MLPs
Convolutional Neural Networks (CNNs) and Digital Image Analysis: CNN Architectures (e.g., VGG, ResNet), Training CNNs for Image Classification, Digital Image Analysis Tasks (e.g., object detection, segmentation)
Object Detection: Overview, Evolution and Strategies, Single Shot, Few-Shot Learning
Recurrent Neural Networks (RNNs): RNN Architectures (e.g., LSTM, GRU), Applications of RNNs (e.g., sequence prediction, natural language processing)
Generative Models: Generative Adversarial Networks (GANs), Vector Autoregressive (VAR) Models, Attention Models and Transformers
Sequence-to-Sequence Models: Encoder-Decoder Models
Autoencoders
Speech Data Processing

06 Natural Language Processing & Recommender Systems [13 weeks]

This course provides a thorough journey through the fundamentals and
applications of Natural Language Processing (NLP), spanning basic
text processing techniques to advanced topic modeling methods. It
also offers practical insights into content recommendation systems
and dimensionality reduction techniques, deepening learners'
understanding of real-world NLP challenges and solutions. The
learning objectives of this course will be accomplished through the
following topics:

Natural Language Processing (NLP): Fundamentals of NLP, Basic Text Processing Techniques, Tokenization, Lemmatization, and Stemming, Part-of-Speech Tagging and Named Entity Recognition
Syntactic Vectorization
Latent Semantic Analysis
Multivariate Bernoulli and Multinomial Naive Bayes for Text Classification
Latent Variables in NLP
Advanced Topic Modeling Techniques: Latent Dirichlet Allocation, Probabilistic Modeling
Text Vectorization
Content Recommendation: Introduction to Recommender Systems, Content-Based Recommendation Techniques, TF-IDF and Cosine Similarity for Content-Based Recommendation
Documents as Vectors
PCA and Singular Value Decomposition

07 Passion Project [25 weeks]

We provide opportunities to delve into passion projects with a focus on
creating positive social change. Through these projects, learners will
not only enhance their technical skills but also gain valuable
experience in applying data science methodologies to address
real-world social issues.

Bird Call
Sat Sri Akal Chatbot
Smart Glasses
Medical NLP
Depression Detection
Bus Route Optimization
Hateful Meme Detector
Solar Power Prediction
Road Scene Analysis
Long Document Keyphrase Generation
AI for Video Analysis
AI for Book Analysis
Intelligent Document Processing
AI Smart Search Documents
ML Approaches for Quality Assessment of the Map Data
Speech To Text Model for Indian Languages
Socio-Economic & Political Analysis through Bollywood Songs
The ECG Monitoring & Decision Support System
Text-to-Speech (Punjabi Language)
PII Information Extraction from Text
Raag Identification & Understanding
Medical Review/Diagnosis Prediction
Sentiment Analysis for News Covered in Print Media
Object Detection & Classification using Satellite Imagery
Structure Prediction in Proteins using Amino Acid Properties
Parsing & Information Retrieval from Documents Using LLMs
Using Deep Learning for Image Processing in Pathology & Radiology
Document Analysis using Large Language Models (LLMs)
Detecting Pronunciation Errors for Automatic Correction of Speech-Based Answers

Program Outcomes
Upon completion of this program, students will achieve:
Mastery of Python programming skills, with proficiency in software testing and debugging
Ability to acquire, clean, analyze, and visualize data, enabling informed decision-making through data-driven insights
Expertise in machine learning and deep learning concepts, including the practical application of advanced algorithms and training strategies
Understanding of natural language processing and recommender systems, with skills in data preprocessing, feature engineering, and recommendation algorithms
Proficiency in using the Dataiku platform for machine learning model development, data pipelines, code integrations, and MLOps practices
Understanding of efficient computation methods through the Data Structures & Algorithms course
Research methodology skills, including problem definition, literature review, research methods application, data analysis, and effective communication of findings, developed through the Passion Project

Certifications
Upon successful completion of Sabudh’s Internship Program, you will
be awarded a “Certificate of Completion” certifying you as JOB-
READY in Data Science. This certificate will validate your proficiency in
the field and enhance your employability.

Additionally, participants may enroll in individual courses. The
Passion Project component, however, is open only to students
enrolled in the full-time 6-month program.

Overall, the portfolio you build during the program will serve as a
showcase of your skills and capabilities to prospective employers.
Beyond demonstrating your technical knowledge, it will highlight your
critical thinking skills and your ability to apply data science
methodologies to solve real-world problems.

THANK YOU

To know more, visit https://sabudh.org/

Contact Number: 8837662054
