Assignment 2 Specification SWE5204 Advanced Databases and Big Data

Assignment 2 Specification

Advanced Databases and Big Data
Course/Program BEng Software Engineering & BSc (Hons) Computing
Module Name SWE5204: Advanced Databases and Big Data
Assessment Number 2 of 2
Assessment Type (and weighting) Project output (50% of overall mark)
Assessment Name Data Science and Big Data
Issue Date 19 November 2021
Assessment Submission Date
Assessment item 1 (Assignment 2)
Due Date 05 January 2022 by 23:55
Weight 50%

Learning Outcomes Assessed

LO3: Apply appropriate database concepts and techniques to solve given problems.

LO4: Demonstrate the application of appropriate Big Data tools for advanced analytics.


BoltFlix is an on-demand movie company that operates all over the world via the internet.
The company has generated a huge amount of data on movie ratings over the years. They
want to write an article analysing movie ratings by the critics and audiences. The report
should also justify the budget of the movies.
You have joined BoltFlix recently as a Junior Data Scientist. They want you to analyse movie
data for the years 2007 to 2011. Since they have never done any data analysis on the data,
they do not know exactly what they want to see or need. They have asked you to look at the
data and tell a story about the data.

Part 1: Solving a Data Science Problem

Exploring the Data:

You must analyse the data carefully by running some simple tests in R. You have set up the
following tasks, which have been agreed with your team leader. For each task, you must
write code and generate at least one graph.

1. Explore your dataset (using str, nrow etc.) and explain your understanding
2. How does genre impact the budget of a movie?
3. Is there any relationship between the critic ratings and the budget?
4. Is there any relationship between the audience ratings and the budget?
5. Show how the correlation between audience and critic ratings has evolved throughout
the years by movie genre. (Request from the CEO)

In your report, you must add the code, graphs and explanation for each of the tasks.
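
As an illustration of the kind of exploration the tasks above call for, the following is a minimal sketch in base R. The data frame `movies` and all of its column names and values are hypothetical stand-ins, because the brief does not list the real columns; replace them with the data set you are given.

```r
# Hypothetical stand-in for the BoltFlix ratings data. The column names
# and values are assumptions made for illustration only.
movies <- data.frame(
  Film           = c("A", "B", "C", "D", "E", "F"),
  Genre          = factor(c("Action", "Action", "Comedy",
                            "Comedy", "Drama", "Drama")),
  CriticRating   = c(40, 65, 55, 70, 85, 60),
  AudienceRating = c(50, 70, 60, 65, 80, 75),
  BudgetMillions = c(120, 150, 30, 45, 20, 25),
  Year           = c(2007, 2008, 2007, 2009, 2008, 2009)
)

# Task 1: inspect the structure and size of the data set
str(movies)
nrow(movies)
summary(movies)

# Task 2: mean budget per genre
budget_by_genre <- aggregate(BudgetMillions ~ Genre, data = movies, FUN = mean)
print(budget_by_genre)

# Tasks 3 and 4: Pearson correlation between each rating and the budget
critic_cor   <- cor(movies$CriticRating, movies$BudgetMillions)
audience_cor <- cor(movies$AudienceRating, movies$BudgetMillions)
print(c(critic = critic_cor, audience = audience_cor))

# One graph per task, for example the spread of budgets within each genre
boxplot(BudgetMillions ~ Genre, data = movies,
        xlab = "Genre", ylab = "Budget (millions)")
```

A correlation coefficient alone does not establish a relationship, so each number should be read alongside its graph and explained in the report.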

Advanced Analytics:
Once you have completed the above tasks, your manager gives you an extended Movie data
set. The dataset contains more columns than the previous one. Using this new data set, you
should complete the following tasks.

1. You are given the following graph, but the R code that produced it has been lost. You
need to recreate the graph by writing R code. You must use the Grammar of Graphics to
recreate the graph. You must also explain your code and display the output at each step.
2. Write R code to find the days of the week on which the most and the fewest movies were
released, compared with other days. (Provide the graph with code in the report)
3. Identify whether the profit of a movie depends on any of the features in this data set.
4. Use ggplot and a boxplot to identify whether there are any anomalies in the data.
5. Report any further insights you can draw from this data set.
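
As a rough sketch of how the day-of-week task above might be approached, the code below counts releases per weekday. The `releases` data frame, its column name and its dates are hypothetical; the extended data set's real columns are not given in this brief. Base R graphics are used here only so the sketch runs without extra packages, whereas the brief itself asks for ggplot in tasks 1 and 4.

```r
# Hypothetical release dates, assumed for illustration only
releases <- data.frame(
  ReleaseDate = as.Date(c("2008-07-18", "2008-07-25", "2009-12-18",
                          "2010-06-16", "2010-11-19", "2009-05-29",
                          "2008-02-14"))
)

# as.POSIXlt()$wday gives 0 (Sunday) to 6 (Saturday) regardless of locale,
# so the labels below do not depend on the system language
day_labels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
releases$DayOfWeek <- factor(
  day_labels[as.POSIXlt(releases$ReleaseDate)$wday + 1],
  levels = day_labels
)

# Count releases per weekday, keeping days with zero releases
day_counts <- table(releases$DayOfWeek)
print(day_counts)

# Day with the most releases, and every day tied for the fewest
most_common  <- names(day_counts)[which.max(day_counts)]
least_common <- names(day_counts)[day_counts == min(day_counts)]

# Bar chart of the counts for the report
barplot(day_counts, xlab = "Day of release", ylab = "Number of movies",
        las = 2)
```

Keeping all seven weekdays as factor levels means days with no releases still appear in the table and the chart, which matters when identifying the least common day.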

Note: You must provide code, graphs and appropriate explanations for each of the sub-tasks.
You must also copy and paste your R code into your report. Screenshots will not be accepted.

Demonstration: You must demonstrate your data science solution via Zoom. The date and
time of the demonstration will be published via Moodle in due course.

Important: No demonstration means Zero marks.

Part 2: Big Data Tools and Techniques

Evaluate appropriate Big Data technologies for BoltFlix to develop a database solution. Since
the company wants to analyse both structured and unstructured data in real-time to check
the performance of movie recommendations, they need a system that can deploy effective
data analytics. Analyse big data analysis and visualisation techniques that influence the
organisation’s decision-making in a cost-effective database solution.

Write a report of 2500 words to inform the company management about the technologies
available and how they would fit the company's new database solution. The report should
identify and compare various Big Data visual tools and techniques, and suggest three
suitable visual tools and/or techniques to meet the company's future needs.

Word Count (Part 2): The report should have a word count of 2500 words.

Expected Number of Sources: The report should have at least 10 references, of which 3
should be relevant peer-reviewed journal/conference papers.

Secondary Research Requirements:

Secondary research support is expected and should be correctly cited using Harvard
Referencing for both in-text citations and Reference Structure. For further details please see

Submission: You must submit Parts 1 and 2 in a single (MS Word) document through the
appropriate Moodle Turnitin link by 23:55 on 05 January 2022.

A percentage mark will be provided based on General Assessment Guidelines for Written
Assessments. Grading is as follows:
A: 70 - 100%
B: 60 - 69%
C: 50 - 59%
D: 40 - 49%
Marks below 40% will be classed as fail.

Specific Assessment Criteria:

• Have analysed, understood and implemented database systems for a specific
• Have provided domain-specific solutions and provided a clear logical conclusion
• Have provided a significant review of current and state-of-the-art big data tools and
• Have demonstrated the use of a range of current and quality secondary research
Note: This assignment will also be assessed by using the General Assessment Guidelines
for Written Assessments Level HE5.

Guidelines for the Preparation and Submission of Written Assessments

1. Written assessments should be word-processed in Arial or Calibri Light font size 12.
There should be double-spacing and each page should be numbered.

2. There should be a title page identifying the programme name, module title, assessment
title, your student number, your marking tutor and the date of submission.

3. You should include a word-count at the end of the assessment (excluding references,
figures, tables and appendices).
Where a word limit is specified, the following penalty system applies:
• Up to 10% over the specified word length = no penalty
• 10 – 20% over the specified indicative word length = 5 marks subtracted (but if the
assessment would normally gain a pass mark, then the final mark to be no lower than
the pass mark for the assessment).
• More than 20% over the indicative word length = if the assessment would normally gain
a pass mark or more, then the final mark will be capped at the pass mark for the

4. All written work should be referenced using the standard University of Bolton
referencing style – see:

5. Unless otherwise notified by your Module Tutor, electronic copies of assignments should
be saved as Word documents and uploaded into Turnitin via the Moodle class area. If you
experience problems in uploading your work, then you must send an electronic copy of
your assessment to your Module Tutor via email BEFORE the due date/time.

6. Please note that when you submit your work to Moodle, it will automatically be checked
for matches against other electronic information. The individual percentage text
matches may be used as evidence in an academic misconduct investigation (see Section

Late work will be subject to the following penalties:

• Up to 7 calendar days late = 10 marks subtracted but if the assignment would normally
gain a pass mark, then the final mark to be no lower than the pass mark for the
• More than 7 calendar days late = This will be counted as non-submission and no marks
will be recorded.

Late submission of assessments on refer and those which are graded Pass/Fail only, is not
permitted. Students may request an extension to the original published deadline date as
described below.
In the case of exceptional and unforeseen circumstances, an extension of up to 14 days
after the assessment deadline may be granted. This must be agreed by your Programme
Leader, following a discussion with the Module Tutor. You should complete an Extension
Request Form available from your Tutor and attach documentary evidence of your
circumstances, prior to the published submission deadline.

Extensions over 14 calendar days should be requested using the Mitigating Circumstances
procedure, with the exception of extensions for individual projects and artefacts which,
at the discretion of the Programme Leader, may be longer than 14 days.
Requests for extensions which take a submission date past the end of the module
(normally week 15) must be made using the Mitigating Circumstances procedures.

Some students with registered disabilities will be eligible for revised submission deadlines.
Revised submission deadlines do not require the completion of an Extension Request Form.

Please note that the failure of data storage systems is not considered to be a valid reason
for an extension. It is therefore important that you keep multiple copies of your work on
different storage devices before submitting it.

Academic Misconduct
Academic misconduct may be defined as any attempt by a student to gain an unfair advantage
in any assessment. This includes plagiarism, collusion and commissioning (contract cheating),
amongst other offences. In order to avoid these types of academic misconduct, you should
ensure that all your work is your own and that sources are attributed using the correct
referencing techniques. You can also check originality through Turnitin.

Please note that penalties apply if academic misconduct is proven. See the following link for
further details:
General Assessment Guidelines for Written Assessments Level HE5

85-100% (Class I)
Relevance: Directly relevant to title. Expertly addresses the assumptions of the title and/or the requirements of the brief.
Knowledge: Demonstrates an exceptional knowledge/understanding of theory and practice for this level through the identification and critical analysis of the most important issues and themes.
Argument/Analysis: Makes exceptional use of appropriate arguments and/or theoretical models. Demonstrates some distinctive or independent thinking. Presents an exceptional critical analysis of the material resulting in clear, logical and original conclusions.
Structure: Coherently articulated and logically structured. An appropriate format is used.
Presentation: The presentational style & layout is correct for the type of assignment. Effective inclusion of figures, tables, plates (FTP).
Written English: An exceptionally well written answer with standard spelling and grammar. Style is clear, resourceful and academic.
Research/Referencing: Sources accurately cited in the text. A wide range of contemporary and relevant references cited in the reference list in the correct style.

70-84% (Class I)
Relevance: Directly relevant to title. Addresses the assumptions of the title and/or the requirements of the brief.
Knowledge: Demonstrates an excellent knowledge/understanding of theory and practice for this level through the identification and analysis of the most important issues and themes.
Argument/Analysis: Makes creative use of appropriate arguments and/or theoretical models. Presents an excellent analysis of the material resulting in clear, logical conclusions.
Structure: Coherently articulated and logically structured. An appropriate format is used.
Presentation: The presentational style & layout is correct for the type of assignment. Effective inclusion of figures, tables, plates (FTP).
Written English: An excellently written answer with standard spelling and grammar. Style is clear, resourceful and academic.
Research/Referencing: Sources accurately cited in the text. A range of contemporary and relevant references cited in the reference list in the correct style.

60-69% (Class II/i, Very Good)
Relevance: Directly relevant to title. Addresses most of the assumptions of the title and/or the requirements of the brief.
Knowledge: Demonstrates a very good knowledge/understanding of theory and practice for this level through the identification and analysis of key issues.
Argument/Analysis: Uses sound arguments or theoretical models. Presents a clear and valid analysis of the material in the main with clear, logical conclusions.
Structure: Logically constructed in the main. An appropriate format is used.
Presentation: The presentational style & layout is correct for the type of assignment. Effective inclusion of FTP.
Written English: A very well written answer with standard spelling and grammar. Style is clear and academic.
Research/Referencing: Sources accurately cited in the text and a range of appropriate references cited in reference list in the correct style.

50-59% (Class II/ii)
Relevance: Generally addresses the title/brief, but sometimes considers irrelevant issues.
Knowledge: Demonstrates a good knowledge/understanding of theory and practice for this level through the identification and analysis of some key issues.
Argument/Analysis: Presents largely coherent arguments. Evidence of attempted analysis, with some descriptive or narrative passages. Conclusions are fairly clear and logical.
Structure: For the most part coherently articulated and logically structured. An acceptable format is used.
Presentation: The presentational style & layout is correct for the type of assignment. Inclusion of FTP but lacks selectivity.
Written English: Competently written with minor lapses in spelling and grammar. Style is readable and academic in the main.
Research/Referencing: Most sources accurately cited in the text and an appropriate reference list is provided which is largely in the correct style.

40-49% (Class III)
Relevance: Some degree of irrelevance to the title/brief. Superficial consideration of the issues.
Knowledge: Demonstrates an adequate knowledge/understanding of theory and practice for this level. An attempt is made to analyse key issues.
Argument/Analysis: Presents basic arguments, but focus and consistency lacking in places. Issues are vaguely stated. Descriptive or narrative passages evident which lack clear purpose. Conclusions are not always clear or logical.
Structure: Adequate attempt at articulation and logical structure. An acceptable format is used.
Presentation: The presentational style & layout is largely correct for the type of assignment. Inappropriate use of FTP or not used where clearly needed to aid understanding.
Written English: Generally competently written although intermittent lapses in grammar and spelling pose obstacles for the reader. Style limits communication and is non-academic in a number of places.
Research/Referencing: Some relevant sources cited. Some weaknesses in referencing technique.

39%
Relevance: Significant degree of irrelevance to the title/brief. Only most obvious issues are addressed at a superficial level and in unchallenging terms.
Knowledge: Demonstrates weaknesses in knowledge of theory and practice for this level, with poor understanding of key issues.
Argument/Analysis: Limited argument, which is descriptive or narrative in style with little evidence of analysis. Conclusions are neither clear nor logical.
Structure: Poorly structured. Lack of articulation. Format deficient.
Presentation: For the type of assignment the presentational style &/or layout is lacking. FTP ignored in text or not used where clearly needed.
Written English: Deficiencies in spelling and grammar makes reading difficult. Simplistic or repetitious style impairs clarity. Style is non-academic.
Research/Referencing: Limited sources and weak referencing.

<34%
Relevance: Relevance to the title/brief is intermittent or missing. The topic is reduced to its vaguest and least challenging terms.
Knowledge: Demonstrates a lack of basic knowledge of either theory or practice for this level, with little evidence of understanding.
Argument/Analysis: Inadequate arguments and no analysis. Conclusions are sparse.
Structure: Unstructured. Lack of articulation. Format deficient.
Presentation: For the type of assignment the presentational style &/or layout is lacking. FTP as above.
Written English: Poorly written with numerous deficiencies in grammar, spelling and expression. Style is non-academic.
Research/Referencing: An absence of academic sources and poor referencing technique.

