Syllabus DS&E 22 23 4Y
The syllabus applies to students admitted in the academic year 2022-23 and thereafter under the four-
year curriculum.
Each course offered shall be classified as either an introductory level course or an advanced level course.
A Discipline Core course is a compulsory course which a candidate must pass in the manner provided
for in the Regulations.
A Discipline Elective course refers to any technical course offered for the fulfillment of the curriculum
requirements of the degree of BEng in Data Science and Engineering that is not classified as a Discipline
Core course.
Curriculum
Elective Courses
Students are required to complete 72 credits of elective course(s) offered by any department, except
Common Core Courses.
University Requirements
Students are required to complete:
a) 12 credits in English language enhancement, including 6 credits in “CAES1000 Core
University English” and 6 credits in “CAES9542 Technical English for Computer Science”;
b) 6 credits in Chinese language enhancement course “CENG9001 Practical Chinese for
Engineering Students”;
c) 36 credits of courses in the Common Core Curriculum, comprising at least one and not more
than two courses from each Area of Inquiry with not more than 24 credits of courses being
selected within one academic year except where candidates are required to make up for failed
credits; and
d) non-credit bearing courses as required by the University.
Capstone Experience
Students are required to complete the 6-credit “COMP3522 Real-life data science” and the 6-credit
“COMP4501 Data science in discipline project” or “COMP4502 Final year project” to fulfill the
capstone experience requirement for the degree of BEng in Data Science and Engineering.
Internship
Students are required to complete the non-credit bearing internship “COMP3510 Internship”, which
normally takes place after their third year of study.
The details of the distribution of the above course categories are as follows:
The curriculum of BEng(DS&E) comprises 240 credits of courses with the following structure:
Capstone Experience and Internship (12 credits)
*Internship
+
Capstone Experience
At least 72 credits of courses offered by any department, except Common Core Courses.
Students are encouraged to pursue minor programme(s) related to application of data science.
Recommended minor programmes: Finance, Economics, Marketing, Politics and Public Administration,
Journalism and Media Studies, Social Data Science, Neuroscience, General Linguistics, Genetics and
Genomics, Urban Studies, Urban Infrastructure Informatics, Industrial Engineering and Logistics
Management, Earth Sciences, Environmental Science, Molecular Biology and Biotechnology.
Summary of curriculum structure of BEng in Data Science and Engineering
FIRST YEAR
SECOND YEAR
THIRD YEAR
Capstone Experience (6 credits)
COMP3522 Real-life data science 6
Internship (0 credit)
COMP3510 Internship 0
FOURTH YEAR
COURSE DESCRIPTIONS
Candidates will be required to do the coursework in the respective courses selected. Not all courses
are offered every semester.
This course introduces fundamental concepts of engineering business; business models and financing;
SWOT and market analysis; engineering entrepreneurship and innovation; system design, integration,
and operation; product design and realization; and engineering sustainability. The course also involves
hands-on projects in which students work in groups to experience methods and techniques for the
development of engineering business ideas and plans, products, or services.
This is an introductory course designed for first-year engineering students to learn about computer
programming. Students will acquire basic Python programming skills, including syntax, identifiers,
control statements, functions, recursion, strings, lists, dictionaries, tuples and files. Searching and
sorting algorithms, such as sequential search, binary search, bubble sort, insertion sort and selection
sort, will also be covered.
This course covers intermediate to advanced computer programming topics on various technologies and
tools that are useful for software development. Topics include Linux shell commands, shell scripts,
C/C++ programming, separate compilation techniques, and version control. This is a self-learning
course; there will be no lecture and students will be provided with self-study materials. Students are
required to complete milestone-based self-assessment tasks during the course. This course is designed
for students who are interested in Computer Science / Computer Engineering.
This course is aimed at students with Core Mathematics plus Module 1 or Core Mathematics plus Module
2 background and provides them with basic knowledge of calculus and some linear algebra that can be
applied in various disciplines. It is expected to be followed by courses such as MATH2012, MATH2101,
MATH2102, MATH2211, and MATH2241. Topics include: Functions; graphs; inverse functions;
Limits; continuity and differentiability; Mean value theorem; Taylor's theorem; implicit differentiation;
L'Hopital's rule; Higher order derivatives; maxima and minima; graph sketching; Radian, calculus of
trigonometric functions; Definite and indefinite integrals; integration by substitutions; integration by
parts; integration by partial fractions; Complex numbers, polar form, de Moivre's formula; Applications:
Solving simple ordinary differential equations; Basic matrix and vector (of orders 2 and 3) operations,
determinants of 2x2 or 3x3 matrices.
Please refer to the University Language Enhancement Courses in the syllabus for the degree of BEng
for details.
Running alongside Computer Science, Financial Technology, Data Science related final-year / capstone
project courses, this one-semester, 6-credit course will build and consolidate students’ ability to
compose technical reports, and make technical oral presentations. The focus of this course is on helping
students to report on the progress of their Final Year Project in an effective, professional manner in both
written and oral communication. Topics include accessing, abstracting, analyzing, organizing and
summarizing information; making effective grammatical and lexical choices; technical report writing;
and technical presentations. Assessment is wholly by coursework.
Successful completion of 36 credits of courses in the Common Core Curriculum, comprising at least
one and not more than two courses from each Area of Inquiry with not more than 24 credits of courses
being selected within one academic year except where candidates are required to make up for failed
credits:
Arrays, linked lists, trees and graphs; stacks and queues; symbol tables; priority queues, balanced trees;
sorting algorithms; complexity analysis.
The course introduces basic concepts and methodology of data science. The goal of this course is to
provide students with an overview and practical experience of the entire data analysis process. Topics
include: data source and data acquisition, data preparation and manipulation, exploratory data analysis,
statistical and predictive analysis, data visualization and communication.
This course provides students with a solid foundation in calculus of several variables and linear algebra,
which they will need in the study of mathematics related subjects. Topics include: Vectors and Matrices:
Vectors in space, dot product and cross product, determinants (with geometric interpretations); Partial
Derivatives: Functions of several variables, partial derivatives, extreme values and Lagrange multipliers,
Taylor's formula; Multiple Integrals: Double and triple integrals, substitution in multiple integrals;
Matrix Algebra: Matrix addition and multiplication, system of linear equations as a matrix equation;
Vector Spaces: The Euclidean spaces as vector spaces, its subspaces, span of vectors, linear
independence, basis and dimension; Eigenvalues and Eigenvectors: Diagonalization and computing
powers; Numerical Methods: Bisection method and Newton's method for finding roots of equations,
Simpson's rule and Trapezoidal rule for numerical integration.
The discipline of statistics is concerned with situations in which uncertainty and variability play an essential
role and forms an important descriptive and analytical tool in many practical problems. Against a
background of motivating problems this course develops relevant probability models for the description of
such uncertainty and variability. Topics include: Sample spaces; Operations of events; Probability and
probability laws; Conditional probability; Independence; Discrete random variables; Cumulative
distribution function (cdf); Probability mass function (pmf); Bernoulli, binomial, geometric, and Poisson
distributions; Continuous random variables; Cumulative distribution function (cdf); Probability density
function (pdf); Exponential, Gamma, and normal distributions; Functions of a random variable; Joint
distributions; Marginal distributions; Independent random variables; Functions of jointly distributed
random variables; Expected value; Variance and standard deviation; Covariance and correlation.
This course builds on STAT2601, introducing further the concepts and methods of statistics. Emphasis
is on the two major areas of statistical analysis: estimation and hypothesis testing. Through the
disciplines of statistical modelling, inference and decision making, students will be equipped with both
quantitative skills and qualitative perceptions essential for making rigorous statistical analysis of real-
life data. Topics include: Overview: random sample; sampling distributions of statistics; moment
generating function; large-sample theory: laws of large numbers and Central Limit Theorem; likelihood;
sufficiency; factorisation criterion; Estimation: estimator; bias; mean squared error; standard error;
consistency; Fisher information; Cramer-Rao Lower Bound; efficiency; method of moments; maximum
likelihood estimator; Hypothesis testing: types of hypotheses; test statistics; p-value; size; power;
likelihood ratio test; Neyman-Pearson Lemma; generalized likelihood ratio test; Pearson chi-squared
test; Wald tests; Confidence interval: confidence level; confidence limits; equal-tailed interval;
construction based on hypothesis tests.
Prerequisite: STAT2601
Mutually exclusive with: STAT3902
Assessment: 25% continuous assessment, 75% examination
This course studies the principles, design, administration, and implementation of database management
systems. Topics include: entity-relationship model, relational model, relational algebra, database
design and normalization, database query languages, indexing schemes, integrity and concurrency
control.
This course introduces algorithms, tools, practices, and applications of machine learning. Topics include
core methods such as supervised learning (classification and regression), unsupervised learning
(clustering, principal component analysis), Bayesian estimation, neural networks; common practices in
data pre-processing, hyper-parameter tuning, and model evaluation; tools/libraries/APIs such as scikit-
learn, Theano/Keras, and multi/many-core CPU/GPU programming.
The primary objective of this course is to explore the legal and ethical challenges and ramifications in the
modern practice of data science. Using a case-based approach, students will analyse contemporary
controversies from techno-legal and ethical perspectives. The focus is on data privacy and the regulation
of using data in specific areas of law. Topics include basic privacy protection techniques, such as
encryption and data anonymization, data privacy laws, open data policy, data protection process and
technology, issues in the usage of sensitive personal data and public data.
This is an introductory course on the subject of artificial intelligence. Topics include: intelligent agents;
search techniques for problem solving; knowledge representation; logical inference; reasoning under
uncertainty; statistical models and machine learning.
This course introduces the principles, mathematical models and applications of computer vision. Topics
include: image processing techniques, feature extraction techniques, imaging models and camera
calibration techniques, stereo vision, and motion analysis.
COMP3323. Advanced database systems (6 credits)
The course will study some advanced topics and techniques in database systems, with a focus on the
system and algorithmic aspects. It will also survey the recent development and progress in selected
areas. Topics include: query optimization, spatial-spatiotemporal data management, multimedia and
time-series data management, information retrieval and XML, data mining.
Prerequisite: COMP3278
Mutually exclusive with: FITE3010
Assessment: 50% continuous assessment, 50% examination
An introduction to algorithms and applications of deep learning. The course helps students get hands-
on experience of building deep learning models to solve practical tasks including image recognition,
image generation, reinforcement learning, and language translation. Topics include: machine learning
theory; optimization in deep learning; convolutional neural networks; recurrent neural networks;
generative adversarial networks; reinforcement learning; self-driving vehicle.
The goal of the course is for students to be grounded in basic bioinformatics concepts, algorithms, tools,
and databases. Students will leave the course with hands-on bioinformatics analysis experience
and be empowered to conduct independent bioinformatics analyses. We will study: 1) algorithms,
especially those for sequence alignment and assembly, which comprise the foundation of the rapid
development of bioinformatics and DNA sequencing; 2) the leading bioinformatics tools for comparing
and analyzing genomes starting from raw sequencing data; 3) the functions and organization of a few
essential bioinformatics databases and how they support various types of bioinformatics analysis.
This course introduces the principles, mechanisms and implementation of cyber security and data
protection. Knowledge of both attack and defense is covered. Topics include: notions and terms of
cyber security; network and Internet security; introduction to encryption: classic and modern
encryption technologies; authentication methods; access control methods; cyber attacks and defenses
(e.g. malware, DDoS).
COMP3361. Natural language processing (6 credits)
Natural language processing (NLP) is the study of human language from a computational perspective.
The course will be focusing on machine learning and corpus-based methods and algorithms. We will
cover syntactic, semantic and discourse processing models. We will describe the use of these methods
and models in applications including syntactic parsing, information extraction, statistical machine
translation, dialogue systems, and summarization. This course starts with language models (LMs),
which are both front and center in natural language processing (NLP), and then introduces key machine
learning (ML) ideas that students should grasp (e.g. feature-based models, log-linear models and then
the neural models). We will land on modern generic meaning representation methods (e.g. BERT/GPT-
3) and the idea of pretraining / finetuning.
This course comprises two main components: students first acquire the basic know-how of the state-of-
the-art AI technologies, platforms and tools (e.g., TensorFlow, PyTorch, scikit-learn) via example-
based modules in a self-paced learning mode. Students will then identify a creative or practical data-
driven application and implement an AI-powered solution for the application as the course project.
Students will be able to experience a complete AI experimentation and evaluation cycle throughout the
project.
Prerequisite: COMP3314
Mutually exclusive with: COMP3359
Assessment: 100% continuous assessment
This course provides an overview and covers the fundamentals of scientific and numerical computing.
It focuses on topics in numerical analysis and computation, with discussions on applications of scientific
computing.
The objective of this course is to study the design and implementation of Big Data systems. Topics
include: data analytics pipelines, data processing framework, distributed and parallel data systems,
network attached storage, data storage virtualization, query language support, data center architecture,
fault tolerance, and recovery.
This course introduces basic concepts, technologies, and applications of the Internet of Things (IoT),
with a focus on data analytics. The course covers a range of enabling techniques in sensing, computing,
analytics, learning for IoT and connects them to exciting applications in smart homes, healthcare,
security, etc. The lectures cover the pipeline of data generation, data acquisition, data transportation,
data analysis and learning, and data applications, with various topics from the fundamentals (e.g., signal
processing, statistical analysis, machine learning) to real-world systems. Billions of things are
connected today, and this course helps students to understand how IoT will evolve into AIoT (Artificial
Intelligence of Things).
Prerequisite: COMP2119
Assessment: 60% continuous assessment, 40% examination
Data science is an emerging area. The primary objective of this course is to introduce
new developments in this area, including but not limited to advanced computational
techniques, latest advances in technologies related to data science, and challenging
R&D problems. Selected topics in data science that are of current interest will
be discussed. Topics may vary from year to year.
This course aims to give an overview of the basic principles and techniques for visualization and
visual analytics. In particular, topics including human visual perception, color and visualization
techniques for various data kinds (e.g., spatial, geospatial and multivariate data, graphs and networks,
text and document) will be covered. The use of interactive visual interface to facilitate analytical
reasoning will also be discussed. Students will use practical tools and apply visualization principles
and techniques to perform visual data analysis on large datasets.
Prerequisite: COMP2119 or COMP2502 or ELEC2543 or FITE2000
Assessment: 50% continuous assessment, 50% examination
This course introduces basic theories of blockchain and distributed ledger, which includes basic
cryptography, public key cryptosystem, distributed computing and consensus protocols. Financial
applications of blockchain and distributed ledger will be discussed.
The goal of the course is to study the main methods used today for data mining and on-line analytical
processing. Topics include Big Data Architecture, Data Mining Algorithms, Classification, and
Clustering.
Do Google and Facebook understand us better than we know ourselves? Are we being reduced to lab
rats every time we go online? Can we extract information from electronic health records to prevent
diseases or even suicide? Is the impartially designed algorithm for predicting an individual’s probability
of recidivism truly fair for sentencing individuals who have committed crimes? When big data analytics
are routinely applied to nudging our daily lives, the ability to audit the algorithms adopted by these
analytics becomes crucial.
The course will focus on elaborating the core principles of a variety of techniques adopted when
predicting future phenomena through the lens of big data. We will use a case study approach to provide
an in-depth understanding of how predictions are made using various big data analytics. Students will
be guided to develop a rich contextual understanding of consequences associated with applications of
big data in different scenarios. The goal of this course is to inspire the students to think creatively and
critically about how big data analytics can be used to make scientific discoveries and do social
good. Meanwhile, they will also learn to identify potential prejudices embedded in poorly designed
algorithms and be able to stand up against the abuse of big data.
The analysis of variability is mainly concerned with locating the sources of the variability. Many
statistical techniques investigate these sources through the use of 'linear' models. This course presents
the theory and practice of these models. Topics include: Simple linear regression: least squares method,
analysis of variance, coefficient of determination, hypothesis tests and confidence intervals for
regression parameters, prediction; Multiple linear regression: least squares method, analysis of variance,
coefficient of determination, reduced vs full models, hypothesis tests and confidence intervals for
regression parameters, prediction, polynomial regression; One-way classification models: one-way
ANOVA, analysis of treatment effects, contrasts; Two-way classification models: interactions, two-
way ANOVA for balanced data structures, analysis of treatment effects, contrasts, randomised complete
block design; Universal approach to linear modelling: dummy variables, 'multiple linear regression'
representation of one-way and two-way (unbalanced) models, ANCOVA models, concomitant
variables; Regression diagnostics: leverage, residual plot, normal probability plot, outlier, studentized
residual, influential observation, Cook's distance, multicollinearity, model transformation.
Prerequisite: STAT2602
Mutually exclusive with: STAT3907
Assessment: 25% continuous assessment, 75% examination
_________________________________________________________________________________
Machine learning is the study of computer algorithms that build models of observed data in order to
make predictions or decisions. Statistical machine learning emphasizes the importance of statistical
theory and methodology in the algorithmic development. This course provides a comprehensive and
practical coverage of essential machine learning concepts and a variety of learning algorithms under
supervised and unsupervised settings. The course materials are presented with lots of examples and
reproducible codes. Topics include: Data science, data exploration, generalized linear models, variable
selection, basis expansion, regularization, cross-validation, tree-based methods, kernel methods, neural
networks, dimension reduction, principal component analysis, cluster analysis, stochastic optimization,
interpretable machine learning.
Prerequisites: STAT2602, or (STAT1603 and any University level 2 course) or STAT3902; and
STAT3600 or STAT3907
Mutually exclusive with: STAT4904
Assessment: 100% continuous assessment
_________________________________________________________________________________
Building on prior coursework in statistical methods and modeling, students will get a deeper
understanding of the entire process of data analysis. The course aims to develop skills of model
selection and hypotheses formulation so that questions of interest can be properly formulated and
answered. An important element deals with model review and improvement, when one's first attempt
does not adequately fit the data. Students will learn how to explore the data, to build reliable models,
and to communicate the results of data analysis to a variety of audiences. Topics include: Descriptive
statistics, presentation and visualization of data; Simple statistical analyses for the one-sample and two-
sample case using parametric and nonparametric methods; Regression analyses: model fitting; variable
selection and model diagnostic checking; Analysis of Variance (ANOVA): one-way, two-way and higher-
way ANOVA; Covariance analysis; Categorical and count data: binary logistic regression, Poisson
regression. Real data sets will be presented for modelling and analysis using statistical software for
gaining hands-on experience.
A time series consists of a set of observations on a random variable taken over time. Time series arise
naturally in climatology, economics, environment studies, finance and many other disciplines. The
observations in a time series are usually correlated; the course establishes a framework to discuss this.
This course distinguishes different types of time series, investigates various representations for the
processes and studies the relative merits of different forecasting procedures. Students will analyse real
time-series data on the computer. Topics include: Stationarity and the autocorrelation functions; linear
stationary models; linear non-stationary models; model identification; estimation and diagnostic
checking; seasonal models and forecasting methods for time series.
Prerequisite: STAT3600
Mutually exclusive with: STAT3614, STAT3907
Assessment: 40% continuous assessment, 60% examination
_________________________________________________________________________________
In many designed experiments or observational studies, the researchers are dealing with multivariate
data, where each observation is a set of measurements taken on the same individual. These
measurements are often correlated. The correlation prevents the use of univariate statistics to draw
inferences. This course develops the statistical methods for analysing multivariate data through
examples in various fields of application and hands-on experience with the statistical software SAS.
Topics include: Problems with multivariate data. Multivariate normality and transforms. Mean
structure for one sample. Tests of covariance matrix. Correlations: Simple, partial, multiple and
canonical. Multivariate regression. Principal components analysis. Factor analysis. Problems for
means of several samples. Multivariate analysis of variance. Discriminant analysis. Classification.
Multivariate linear model.
The course consists of two components: internship and professionalism. Internship requires students to
spend a minimum of four weeks employed, full-time, as IT interns or trainees. During this period, they
are engaged in work of direct relevance to their programme of study. The Internship provides students
with practical, real-world experience and represents a valuable complement to their academic training.
Professionalism exposes students to social and professional issues in computing. Students need to
understand their professional roles when working as data science professionals as well as the
responsibility that they will bear. They also need to develop the ability to ask serious questions about
the social impact of data science and engineering and to evaluate proposed answers to those questions.
Topics include: intellectual property, privacy, social context of computing, risks, safety and security
concerns for data science professionals, professional and ethical responsibilities, and continuing
professional development.
In this course, students will learn data science step by step through real analytics examples: data mining,
modelling, Tableau visualization and more. Unlike many classes where everything works just the way it
should and the training is smooth sailing, this course will give students a data science odyssey through
experiencing the pains a data scientist goes through on a daily basis: corrupt data, anomalies,
irregularities, etc. Upon completing this course, the students will enhance their data wrangling skills
and learn how to 1) model their data, 2) curve-fit their data, and 3) communicate their findings.
The students will develop a good understanding of Tableau, SQL, SSIS, and Gretl that give them a safe
ride in data lakes. With no final exam, the students will be given practical exercises that prepare them
to be at the helm for real-world challenges.
Prerequisite: ENGG1330
Assessment: 100% continuous assessment
Students will work in groups or individually on a data science capstone project with a domain
focus. Students are required to identify a data-intensive problem in a specific
application domain, and to implement a data-driven solution for the problem. Students will undergo a
complete data science project life cycle, from problem understanding, data collection, data exploration
to data modelling, analysis and interpretation, and finally deliver a data science solution.
COMP4502. Final year project (6 credits)
During the final year of their studies, students, individually or in groups, undertake full end-to-end
development of a substantial project, taking it from initial concept through to final delivery. Topics
range from applied technologies to assignments on basic research in relation to data science and
engineering. In case of a team project, significant contribution is required from each member and
students are assessed individually. Strict standards of quality will be enforced throughout the project
development.