Big Data Infrastructure, Data Visualisation and Challenges: Ramanathan Venkatraman Sitalakshmi Venkatraman
providers offer resources in the cloud environment for organisations to reap the
benefits of their business data.

Popular Big Data technology providers offer great processing potential and management
of large quantities of data through platforms such as Apache Hadoop and Google
MapReduce. Big Data technology is growing, and the vision for its use is that relevant,
large and fast-growing data can be captured from any source and analysed to give the
organisation useful insight towards its overall goals. Businesses look towards using
Big Data to gain competitive advantages and to help them achieve business goals such as
increasing revenues, improving customer satisfaction and enhancing productivity [14].
As businesses continue to store large volumes of data, they look towards more
sophisticated tools to mine and analyse data in a meaningful way. Organisations are
starting to realise that Big Data is more about business transformation and making the
change to exploit data. Big Data allows businesses to gain a deeper understanding of
the dynamics of their business by analysing and visualising big data and integrating
the results with traditional information so as to get a new perspective on their
day-to-day operations. However, Big Data poses various challenges [15][16]. The first
and foremost challenge is the lack of scalability due to poor infrastructure
management, whether the infrastructure is on-premise or in the cloud [4]. Organisations
do not want to maintain and pay for substantially more Big Data infrastructure while it
is currently underutilized. However, if the infrastructure does not increase in size as
the business data grows, they will not be able to gain any value from Big Data. Another
issue is that many applications and data analytics software tools do not make use of
optimal data transformation, efficient analysis and appropriate data visualisation
[3][6]. If the quality of data is lost while performing data transformation, then
whatever infrastructure is used will not meet the organisation’s Big Data needs. Above
all these challenges, security and privacy is the most compelling issue, since any data
hosted by a third party can raise questions about the security and privacy of an
organisation’s confidential information [17][18].

With new technology emerging for Big Data, organisations must be prepared to face
challenges in supporting their dependence on Big Data due to the high costs involved,
technological complexities, data availability, privacy and integrity concerns. Using
Big Data infrastructure without understanding these issues may not necessarily be the
right way for any organisation, as Big Data forms an essential component of management
decision making that requires new capabilities, as well as organisational and cultural
change [19]. The next section describes the industry standards and tools in Big Data
infrastructure that can help organisations create an implementation plan.

3. BIG DATA INFRASTRUCTURE
The first and foremost requirement of an organisation before plunging into the Big Data
landscape is to understand the infrastructural tools and technologies: what they are,
how they operate and what each is best used for. Some of the popular technologies for
Big Data architecture are described below:

3.1 Hadoop
Hadoop is a readily available open source framework that uses a cost-effective
programming model to allow distributed processing of big datasets by efficiently
breaking them up and distributing the smaller parts for parallel or concurrent
processing and analysis. Hadoop permits distributed parallel processing of gigantic
amounts of data through economical, industry-standard servers that store and process
the data. An HDFS storage layer is used, and the MapReduce component executes a variety
of analytic functions to analyse the data efficiently. Hadoop uses YARN for cluster
management and for scheduling applications for the user. In-depth analysis of data
using machine learning algorithms can be done using Spark on top of HDFS. However, with
such open source technologies there are additional risks of ongoing maintenance and
support.

3.2 NoSQL
NoSQL stands for ‘Not Only SQL’ and refers to non-relational database technologies such
as Cassandra, Neo4j, Redis, and MongoDB, which are also effective and economic choices
for Big Data infrastructure. NoSQL databases are better tailored to handle dynamic and
semi-structured data with low latency. While NoSQL is better suited for operational and
analytical tasks that process selective, criteria-based data in real time, Hadoop is
more often employed for harnessing all data and for in-depth analysis with high
throughput. Since Hadoop and NoSQL have different advantages and purposes, they can be
used together, as in the case of HBase. However, security is one of the major concerns
of NoSQL.

3.3 In-Memory Database (IMDB)
An IMDB, also known as a main memory database system (MMDB), is popularly used in
high-volume environments where response time is critical. Since data resides in the
memory of the system rather than in disk storage, data access and processing are very
fast. Hence, IMDBs have become popular in recent years for handling High-Performance
Computing (HPC) and Big Data applications. Related to this is the in-memory data grid
(IMDG), a real-time analytics engine that produces real-time changes to data, providing
smart grid features. Using such technologies in Big Data applications, huge quantities
of data for processing can be stored in-memory, while the original and persistent data
could be residing on an external disk.

3.4 Massively Parallel Processing (MPP)
MPP technology is a form of collective processing of massive amounts of data using
several processors working on different parts of the same program. Each processor takes
up different threads of the program and has its own operating system and memory. A
messaging interface is necessary to organise and manage the thread handling of the
different processes involved in the MPP architecture. Many MPP technologies have
partnerships with other major players among the Big Data technology providers. Hence,
MPP technologies also have a crossover with other Big Data technologies.

3.5 Cloud Computing
The big Big Data infrastructure providers offer cloud computing that covers a range of
products, technologies and services, helping various organisations jump-start their Big
Data ventures. All the resources and applications are hosted in the cloud, which is
considered to have minimal cost implications as organisations can pay based on the
infrastructure, platform or software services used. Amazon, Microsoft, Oracle and IBM
are some of the big players offering cost-effective Big Data architectures in the
cloud. While cloud computing can deliver data insights seamlessly for organisations to
benefit from, security and privacy issues of confidential and sensitive data are of
great concern.

The rapid technological developments in Big Data could overwhelm traditional computing
frameworks in businesses. Hence, the National Institute of Standards and Technology
(NIST) has
provided a high-level conceptual framework as shown in Figure 1 [20]. The purpose of
this framework is to serve as a reference model to facilitate understanding of the
operational intricacies, design structures and requirements in Big Data. The advantage
of the model is that it can be adopted by any organization as it is not tied to any
specific vendor products, services, or reference implementation.

4. DATA VISUALISATION
With the fast developments in Big Data technologies and application solutions,
organisations are gaining meaningful data insights that can transform their businesses
by utilising the large volumes of data for efficient decision-making and management.
Organisations can use different analytical strategies such as predictive analytics to
reveal patterns and support effective decision-making. However, Big Data can add value
only if the following key elements are planned well:
• data collection,
• data storage,
• data analysis, and
• data visualisation/output.
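
The four elements listed above can be sketched as a minimal end-to-end pipeline. The Python example below is an illustration only, under stated assumptions: the function names (collect, store, analyse, visualise), the `readings` table and the sample records are all hypothetical and invented for this sketch. An in-memory SQLite database stands in for a Big Data storage layer, and a text bar chart stands in for a dashboard.

```python
import sqlite3

# Hypothetical sample records (month, year, count), invented for illustration only.
SAMPLE_RECORDS = [
    ("Jun", 2016, 120), ("Jul", 2016, 180), ("Aug", 2016, 260),
    ("Jun", 2017, 200), ("Jul", 2017, 340), ("Aug", 2017, 510),
]


def collect():
    """Data collection: a real system would read from sensors, logs or APIs."""
    return list(SAMPLE_RECORDS)


def store(records):
    """Data storage: load the records into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (month TEXT, year INTEGER, count INTEGER)")
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def analyse(conn):
    """Data analysis: aggregate the stored counts into a total per year."""
    rows = conn.execute(
        "SELECT year, SUM(count) FROM readings GROUP BY year ORDER BY year"
    ).fetchall()
    return dict(rows)


def visualise(totals, scale=50):
    """Data visualisation/output: one text bar per year, one '#' per `scale` units."""
    lines = []
    for year, total in totals.items():
        lines.append(f"{year} {'#' * (total // scale)} {total}")
    return "\n".join(lines)


if __name__ == "__main__":
    connection = store(collect())
    print(visualise(analyse(connection)))
```

Each stage is kept as a separate function so that any one of them could be swapped for a production component (for example, a distributed store in place of SQLite) without changing the others. If any one stage is poorly planned, the final output loses its value regardless of how well the other stages are built.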
decision-making (based on varying parameters and different criteria for analysis from
the chosen models). The output of the analysis must be visually comprehensible.

Data visualisation is the key to the success of Big Data. Unless the final output is in
a form acceptable to the people who need the data to be analysed, the whole Big Data
venture is of no value. Huge reports or complicated graphics that few people understand
will result in no meaningful decision-making or actions. There are various
visualisation tools, including management dashboards and commercial data visualisation
platforms, that output attractive charts and graphs for clear and concise communication
in order to gain data insights.

Figure 4 gives a simple data visualisation chart showing, from data collected over 6
years, that the peak flu season in Australia did not occur until August. However, the
chart also shows a dramatic increase in the number of patients affected by flu in 2017
compared to previous years. Hence, hospitals across Australia can plan to increase
their workforce accordingly.

Figure 4. Data visualisation of time series data of flu patients.

Figure 5. Data visualisation of movie genres prediction.

Figure 5 shows a visualisation of the prediction of movie genres for box office hits in
a particular year using a forecasting model. It shows that while action movies have an
increasing trend, the horror genre fares worst, showing the steepest decreasing trend.

Another example in Figure 6 provides rich data about influenza demographics in a
country. Various sensitive information and decision parameters behind the data
visualisation are used by the data analytics model to provide big data insights that
aid drill-down analysis and decision-making. However, they are just a click away for
anyone who has access to the visual tools and artefacts. Hence, even visual security is
an important concern in Big Data.

Figure 6. Data insights - multiple views/decision parameters.

5. CONCLUSIONS AND FUTURE WORK
Big Data plays a critical role in the industry and continues to grow exponentially. The
data-driven business revolution adds new levels of complexity for analysing data to
match the velocity at which data is generated from diverse sources. Big Data technology
developments have driven the transformation of organisations that look to leverage Big
Data for competitive advantage and to help achieve business goals. However,
understanding Big Data technology, and modelling data visualisation with the data
captured and analysed in a meaningful and intelligent way, are important for the
planning and management of Big Data in an organization. This paper provided guidelines
for Big Data infrastructure using the NIST framework and showed the importance of data
visualisation for effective decision-making using illustrations from industry
scenarios.

This paper has made a modest initial step to bring out the opportunities and challenges
of Big Data. Future work will focus on security and privacy concerns, in particular
with reference to the proliferation of IoT and blockchain technologies.

6. REFERENCES
[1] Frizzo-Barker J, Chow-White PA, Mozafari M, Ha D. An empirical study of the rise of big data in business scholarship. International Journal of Information Management 36(3), (2016), 403–413.
[2] Chen M. et al., Big Data: A Survey, Mobile Networks and Applications, 19(2), (2014), 171–209.
[3] Gorodov E. Y. and Gubarev V. V. Analytical review of data visualization methods in application to big data. Journal of Electrical and Computer Engineering (4), (2013), 1–7.
[4] Tian W. and Zhao Y., Big data technologies and cloud computing, Optimized Cloud Resource Management and Scheduling Theory and Practice, (2015), 17–49.
[5] McNeely CL, Hahm J. The big (data) bang: policy, prospects, and challenges. Review of Policy Research 31(4), (2014), 304–310.
[6] Gandomi A, Haider M. Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management 35(2), (2015), 137–144.
[7] Xindong W., Xingquan Z., Gong-Qing W., Wei, D. Data Mining with Big Data, IEEE Transactions on Knowledge and Data Engineering, 26(1), (2014), 97–107.
[8] Chang, V. A. A model to compare cloud and non-cloud storage of Big Data, Future Generation Computer Systems, 57, (2016), 56–76.
[9] Goli-Malekabadi, Z. Sargolzaei-Javan, M. Akbari, M. K. An effective model for store and retrieve big health data in cloud computing, Computer Methods and Programs in Biomedicine, 132, (2016), 75–82.
[10] Kumar, N. Vasilakos, A. V. and Rodrigues, J. J. A multi-tenant cloud-based DC nano grid for self-sustained smart buildings in smart cities, IEEE Communications Magazine, 55(3), (2017), 14–21.
[11] Laney, D. 3D Data Management: Controlling Data Volume Velocity and Variety, Tech. rep. META Group, (2001).
[12] Gronwald, K.-D. Big Data Analytics, In: Integrated Business Information Systems A Holistic View of the Linked Business Process Chain ERP-SCM-CRM-BI-Big Data, (2017), 127–157.
[13] Huang T., Lan L., Fang X., An P., Min J., and Wang F., Promises and challenges of big data computing in health sciences, Big Data Research, 2(1), (2015), 2–11.
[14] Kshetri N. The emerging role of Big Data in key development issues: Opportunities, challenges, and concerns. Big Data & Society 1(2), (2014), 1–20.
[15] Jing, P. A new model of data protection on cloud storage, Journal of Networks, 9(3), (2014), 666–671.
[16] Nelson B, Olovsson T. Security and privacy for big data: A systematic literature review. In: Big Data (Big Data), 2016 IEEE International Conference on, IEEE, (2016), 3693–3702.
[17] Li-chuan M., Qing-qi P., Hao L., Hong-ning L. Survey of Security Issues in Big Data, Radio Communications Technology, 41(1), (2015), 1–7.
[18] Deng-Guo F., Min Z., Hao L. Big Data Security and Privacy Protection, Chinese Journal of Computers, 37(1), (2014), 246–258.
[19] Jin X., Wah B., Cheng X., and Wang Y., Significance and challenges of big data research, Big Data Research, 2, (2015), 59–64.
[20] NIST, Big Data Interoperability Framework: Volume 6, Reference Architecture, NIST, USA (2018).
[21] Subashini S. and Kavitha V., A survey on security issues in service delivery models of cloud computing, Journal of Network and Computer Applications, 34(1), (2011), 1–11.
[22] Cheng H., Wang W., and Rong C., Privacy protection beyond encryption for cloud big data, in Proceedings of the 2nd International Conference on Information Technology and Electronic Commerce, (ICITEC ’14), (2014), 188–191, IEEE, Dalian, China.