Big Data Infrastructure, Data Visualisation and Challenges

Ramanathan Venkatraman
Institute of Systems Science
Singapore
rvenkat.iss@gmail.com

Sitalakshmi Venkatraman
Melbourne Polytechnic
Melbourne
sitavenkat@mp.edu.au

ABSTRACT
The importance of Big Data is being realised worldwide with the advancement of information technologies, leveraging the capabilities of virtualization and cloud computing. Big Data infrastructure and the use of its tools and applications will significantly transform the data centers of businesses in the next decade. Data analytics is evolving with the new real-time capability of Big Data solutions to provide business intelligence for timely and effective decision making. However, Big Data poses various challenges related to infrastructure and resource constraints, and other issues including security and privacy. This paper takes an initial step in recognizing the value of creating Big Data infrastructure for delivering high-performance and scalable business intelligence in an organization. It presents the state-of-the-art tools and technologies for Big Data infrastructure and the NIST framework. The advantages of data visualisation are illustrated through industry case scenarios. The Big Data trends and challenges are also discussed. Overall, this paper contributes valuable insights into the Big Data journey of an organization to enable a scalable infrastructure for achieving mission-critical decision-making through data visualisation.

CCS Concepts
• Information systems➝Data management systems➝Database management system engines

Keywords
Big Data; Big Data infrastructure; cloud; NoSQL database; Hadoop; data visualisation.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
BDIOT 2019, August 22–24, 2019, Melbourne, VIC, Australia
© 2019 Association for Computing Machinery.
Copyright 2019 ACM 978-1-4503-7246-6/19/08…$15.00
DOI: https://doi.org/10.1145/3361758.3361768

1. INTRODUCTION
With various advancements in technology, the world is changing dramatically as businesses experience an unprecedented data explosion [1][2]. With the rise of the Internet of Things (IoT), the increased enterprise aggregation of huge data and user statistics collected from diverse geographic locations, sensors and other sources of Big Data could be used effectively for real-time data analytics and visualisation [3]. From the increase in data growth to the way it is structured and used, businesses look towards leveraging Big Data to make accurate business decisions using data visualisation. Hence, Big Data has become a recent area of strategic investment for businesses by providing extremely powerful business intelligence when data is properly synthesized, analyzed and visualised. Though the growing global Internet population using new technologies for personal communication is driving the Big Data trend, businesses are limited in making full use of the benefits of Big Data by their resource constraints, cost and flexibility [4]. Big Data presents many challenges in terms of infrastructure and security. However, because of the business opportunities it offers, industries do not consider all the challenges and risks before venturing into Big Data applications [5][6]. This forms the key motivation of this research paper: to provide insights into adopting the right Big Data infrastructure for business intelligence in organisations.

Big Data technology and services involve a variety of hardware/software resources, tools and techniques such as NoSQL databases, Hadoop and MapReduce systems, virtualization, cloud platforms and related software as well as analytics solutions [7][8]. These will impact an organisation's information technology (IT) related to server, storage, and networking infrastructure that are specifically designed to leverage and optimize the business services in different industry applications [9][10]. Since the infrastructure forms the cornerstone of Big Data architecture for the successful use of data analytics in businesses, in this paper we closely examine the infrastructural platforms, the analytical tools for data visualisation and the challenges such as security to better understand the Big Data landscape. The innovation of this research is that it explores Big Data infrastructures with the objective of achieving successful data visualisation.

The rest of the paper is organized as follows. Section 2 provides the background information on the Big Data concepts, trends and challenges. In Section 3, we describe the prominent Big Data infrastructure technologies. Section 4 presents data visualisation with illustrations from industry case scenarios. Finally, Section 5 gives the conclusion and future work.

2. TRENDS AND CHALLENGES
Big Data is a term that refers to datasets that are huge (volume), have a wide-ranging, compound structure (variety), and are constantly produced and dynamically changing (velocity) [11]. Occasionally, the data also perishes at the same high speed at which it is produced. Infrastructural technology is considered the basis of the Big Data ecosystem for the storage, analytics and visualisation of data [12]. Big Data has become a hot topic for businesses in this digital world as they face growing challenges in dealing with large volumes of structured and unstructured data that are complex to process using traditional database and software methods. The growth of Big Data exceeds the capabilities of traditional IT infrastructures and represents a largely undeveloped area of computing and data management problems in different industry applications [9][10][13]. Big Data has the potential to help companies grow, improve business operations, and make faster and more intelligent decisions. Hence, many Big Data technology
providers offer resources in the cloud environment for organisations to reap the benefits of their business data.

Popular Big Data technology providers offer great processing potential and management of large quantities of data through platforms such as Apache Hadoop and Google MapReduce. Big Data technology is growing, and the vision for its use is that relevant, large and fast-growing data can be captured from any source and analysed to help the organisation gain useful insight towards its overall goal. Businesses look towards using Big Data to gain competitive advantages and to help them achieve business goals such as increasing revenues, improving customer satisfaction, and enhancing productivity [14]. As businesses continue to store large volumes of data, they look towards more sophisticated tools to mine and analyse data in a meaningful way. Organisations are starting to realise that Big Data is more about business transformation and making the change to exploit data. Big Data allows businesses to gain a deeper understanding of the dynamics of their business by analysing and visualising Big Data and integrating the results with traditional information so as to get a new perspective on their day-to-day operations.

However, Big Data poses various challenges [15][16]. The first and foremost challenge is the lack of scalability due to poor infrastructure management, whether the infrastructure is on-premise or on the cloud [4]. Organisations do not want to maintain and pay for substantially more Big Data infrastructure when it is currently underutilized. However, if the infrastructure does not increase in size as the business data grows, they will not be able to gain any value from Big Data. Another issue is that many applications and data analytics software tools do not make use of optimal data transformation, efficient analysis and appropriate data visualisation [3][6]. If the quality of data is lost while performing data transformation, then whatever infrastructure is used, it will not meet the organisation's Big Data needs. Above all these challenges, the security and privacy issue is the most compelling one, since any data hosted by a third party can raise questions about the security and privacy of an organisation's confidential information [17][18].

With new technology emerging for Big Data, organisations must be prepared to face challenges in supporting their dependence on Big Data due to the high costs involved, technological complexities, data availability, privacy and integrity concerns. Using Big Data infrastructure without understanding these issues may not necessarily be the right way for any organisation, as Big Data forms an essential component of management decision making that requires new capabilities, as well as organisational and cultural change [19]. The next section describes the industry standards and tools in Big Data infrastructure that can help organisations create an implementation plan.

3. BIG DATA INFRASTRUCTURE
The first and foremost requirement for an organisation before plunging into the Big Data landscape is to understand the infrastructural tools and technologies: what they are, how they operate and what each is best used for. Some of the popular technologies for Big Data architecture are described below:

3.1 Hadoop
Hadoop is a readily available open-source framework that uses a cost-effective programming model to allow distributed processing of big datasets by efficiently breaking them up and distributing the smaller parts for parallel or concurrent processing and analysis. Hadoop permits distributed parallel processing of gigantic amounts of data through economical, industry-standard servers that store and process the data. An HDFS storage layer is used, and the MapReduce component executes a variety of analytic functions to analyse the data efficiently. Hadoop uses YARN for cluster management and for scheduling user applications. In-depth analysis of data using machine learning algorithms could be done using Spark on top of HDFS. However, with such open-source technologies there are additional risks of ongoing maintenance and support.

3.2 NoSQL
NoSQL stands for 'Not Only SQL' and refers to non-relational database technologies such as Cassandra, Neo4j, Redis, and MongoDB, which are also effective and economic choices for Big Data infrastructure. NoSQL databases are better tailored to handle dynamic and semi-structured data with low latency. While NoSQL is better suited for operational and analytical tasks that process selective, criteria-based data in real time, Hadoop is more often employed for harnessing all data and for in-depth analysis with high throughput. Since Hadoop and NoSQL have different advantages and purposes, both can be used simultaneously, as in the case of HBase. However, security is one of the major concerns of NoSQL.

3.3 In-Memory Database (IMDB)
An IMDB, also known as a main memory database system (MMDB), is popularly used in high-volume environments where response time is very critical. Since data resides in the memory of the system rather than in disk storage, data access and processing are very fast. Hence, IMDBs have become popular in recent years for handling High-Performance Computing (HPC) and Big Data applications. Related to this is the in-memory data grid (IMDG), which is the real-time analytics engine that produces real-time changes to data, providing smart grid features. Using such technologies in Big Data applications, huge quantities of data for processing can be stored in memory, while the original and persistent data could reside on an external disk.

3.4 Massively Parallel Processing (MPP)
MPP technology is a form of collective processing of massive amounts of data using several processors working on different parts of the same program. Each processor takes up different threads of the program and uses its own operating system and memory. A messaging interface is necessary to organize and manage the thread handling of the different processes involved in the MPP architecture. Many MPP technologies have partnerships with other major players among the Big Data technology providers. Hence, MPP technologies also have a crossover with other Big Data technologies.

3.5 Cloud Computing
Major Big Data infrastructure providers offer cloud computing that covers a range of products, technologies and services, allowing various organisations to jump-start their Big Data ventures. All the resources and applications are hosted in the cloud, which is considered to have minimal cost implications as organisations can pay based on the infrastructure, platform or software services used. Amazon, Microsoft, Oracle and IBM are some of the big players offering cost-effective Big Data architectures in the cloud. While cloud computing can deliver data insights seamlessly for organisations to benefit from, security and privacy issues of confidential and sensitive data are of great concern.

The rapid technological developments in Big Data could overwhelm traditional computing frameworks in businesses. Hence, the National Institute of Standards and Technology (NIST) has
provided a high-level conceptual framework as shown in Figure 1 [20]. The purpose of this framework is to serve as a reference model to facilitate understanding of the operational intricacies, design structures and requirements in Big Data. The advantage of the model is that it can be adopted by any organization as it is not tied to any specific vendor products, services, or reference implementation.

Figure 1. NIST Big Data Reference Architecture

Figure 2. Big Data requirement use case template (NIST).

The NIST framework is useful for organisations to plan for Big Data and the scalability of their resources, as they would want to make the most of what they have and not hastily commit to significant investments in new technology that can lead to risks. They could also make use of the Big Data use case template provided by NIST, as shown in Figure 2, in order to understand their requirements.

4. DATA VISUALISATION
With the fast developments in Big Data technologies and application solutions, organisations are gaining meaningful data insights that can transform their businesses by utilising the large volumes of data for efficient decision-making and management. Organisations can use different analytical strategies such as predictive analytics to reveal patterns and support effective decision making. However, Big Data can add value only if the following key elements are planned well:
• data collection,
• data storage,
• data analysis, and
• data visualisation/output.

Figure 3. Typical Model for Big Data Visualisation.

Figure 3 provides a typical model for developing data visualisation with Big Data.

An organisation's data collection consists of each and every transactional record of business operation, which would include history of sales, marketing promotions, emails, customer database, feedback, social media, and any data required for monitoring and measuring to facilitate decision-making. Some data are already available in local storage, some could be from the cloud and some would be accumulated or processed data. Data could be captured using Internet of Things (IoT) devices and sensors, customer apps, websites and social media profiles. However, the collection of certain data types and formats requires special data storage.

The data storage of an organization houses the data gathered from various sources. It includes traditional servers, data warehouses, data lakes, and other distributed/cloud-based storage systems. With cloud-based storage, organisations do not require physical systems on-site; it is flexible and cost-effective, saving on maintenance and information security costs [21]. It is also considerably cheaper than investing in expensive dedicated systems and data warehouses. Organisations could choose in-house data storage for confidential data and cloud storage for data that requires low privacy in order to minimize the impact of possible security breaches and privacy risks in the cloud [22].

The data analysis element consists of three main steps: 1. data preparation (identifying, cleaning and transforming the data into the required format for analysis); 2. analytical model development (employing the relevant model classified under predictive, descriptive and prescriptive analytical models) and 3. insights for
decision-making (based on varying parameters and different criteria for analysis from the chosen models). The output of the analysis must be visually comprehensible.

Data visualisation is the key to the success of Big Data. Unless the final output is in a form acceptable to the people who need the data to be analysed, the whole Big Data venture is of no value. Huge reports or complicated graphics that few people understand will result in no meaningful decision-making or actions. There are various visualisation tools, including management dashboards and commercial data visualisation platforms, that output attractive charts and graphs for clear and concise communication in order to gain data insights.

Figure 4 gives a simple data visualisation chart which shows, from data collected over 6 years, that the peak flu season in Australia did not occur until August. However, the chart also shows a dramatic increase in the number of patients affected by flu in 2017 compared to previous years. Hence, hospitals across Australia can plan to increase their workforce accordingly.

Figure 4. Data visualisation of time series data of flu patients.

Figure 5. Data visualisation of movie genres prediction.

Figure 5 shows a visualisation of the prediction of movie genres for box office hits in a particular year using a forecasting model. It shows that while action movies have an increasing trend, the horror genre performs worst, showing the steepest decreasing trend.

Another example in Figure 6 provides rich data about influenza demographics in a country. Various sensitive information and decision parameters behind the data visualisation are used by the data analytics model to provide Big Data insights to aid in various drill-down analyses and decision-making. However, they are just a click away for anyone who has access to the visual tools and artefacts. Hence, even visual security is an important concern in Big Data.

Figure 6. Data insights - multiple views/decision parameters.

5. CONCLUSIONS AND FUTURE WORK
Big Data plays a critical role in industry and continues to grow exponentially. The data-driven business revolution adds new levels of complexity for analysing data to match the velocity at which data is generated from diverse sources. Big Data technology developments have driven the transformation of organisations that look to leverage Big Data for competitive advantage and to help achieve business goals. However, understanding Big Data technology, and modelling data visualisation with the data captured and analysed in a meaningful and intelligent way, are important for the planning and management of Big Data in an organization. This paper provided guidelines for Big Data infrastructure using the NIST framework and showed the importance of data visualisation for effective decision-making using illustrations from industry scenarios.

This paper has made a modest initial step to bring out the opportunities and challenges of Big Data. Future work would focus on the security and privacy concerns, in particular with reference to the proliferation of IoT and blockchain technologies.

6. REFERENCES
[1] Frizzo-Barker J, Chow-White PA, Mozafari M, Ha D. An empirical study of the rise of big data in business scholarship. International Journal of Information Management 36(3), (2016), 403–413.
[2] Chen M. et al., Big Data: A Survey, Mobile Networks and Applications, 19(2), (2014), 171–209.
[3] Gorodov E. Y. and Gubarev V. V. Analytical review of data visualization methods in application to big data. Journal of Electrical and Computer Engineering (4), (2013), 1–7.
[4] Tian W. and Zhao Y., Big data technologies and cloud computing, Optimized Cloud Resource Management and Scheduling Theory and Practice, (2015), 17–49.
[5] McNeely CL, Hahm, J. The big (data) bang: policy, prospects, and challenges. Review of Policy Research 31(4), (2014), 304–310.
[6] Gandomi A, Haider M. Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management 35(2), (2015), 137–144.
[7] Xindong W., Xingquan Z., Gong-Qing W., Wei, D. Data Mining with Big Data, IEEE Transactions on Knowledge and Data Engineering, 26(1), (2014), 97–107.
[8] Chang, V. A. A model to compare cloud and non-cloud storage of Big Data, Future Generation Computer Systems, 57, (2016), 56–76.
[9] Goli-Malekabadi, Z. Sargolzaei-Javan, M. Akbari, M. K. An effective model for store and retrieve big health data in cloud computing, Computer Methods and Programs in Biomedicine, 132, (2016), 75–82.
[10] Kumar, N. Vasilakos, A. V. and Rodrigues, J. J. A multi-tenant cloud-based DC nano grid for self-sustained smart buildings in smart cities, IEEE Communications Magazine, 55(3), (2017), 14–21.
[11] Laney, D. 3D Data Management: Controlling Data Volume Velocity and Variety, Tech. rep. META Group, (2001).
[12] Gronwald, K.-D. Big Data Analytics, In: Integrated Business Information Systems A Holistic View of the Linked Business Process Chain ERP-SCM-CRM-BI-Big Data, (2017), 127–157.
[13] Huang T., Lan L., Fang X., An P., Min J., and Wang F., Promises and challenges of big data computing in health sciences, Big Data Research, 2(1), (2015), 2–11.
[14] Kshetri N. The emerging role of Big Data in key development issues: Opportunities, challenges, and concerns. Big Data & Society 1(2), (2014), 1–20.
[15] Jing, P. A new model of data protection on cloud storage, Journal of Networks, 9(3), (2014), 666–671.
[16] Nelson B, Olovsson T. Security and privacy for big data: A systematic literature review. In: Big Data (Big Data), 2016 IEEE International Conference on, IEEE, (2016), 3693–3702.
[17] Li-chuan M., Qing-qi P., Hao L., Hong-ning L. Survey of Security Issues in Big Data, Radio Communications Technology, 41(1), (2015), 1–7.
[18] Deng-Guo F., Min Z., Hao L. Big Data Security and Privacy Protection, Chinese Journal of Computers, 37(1), (2014), 246–258.
[19] Jin X., Wah B., Cheng X., and Wang Y., Significance and challenges of big data research, Big Data Research, 2, (2015), 59–64.
[20] NIST, Big Data Interoperability Framework: Volume 6, Reference Architecture, NIST, USA (2018).
[21] Subashini S. and Kavitha V., A survey on security issues in service delivery models of cloud computing, Journal of Network and Computer Applications, 34(1), (2011), 1–11.
[22] Cheng H., Wang W., and Rong C., Privacy protection beyond encryption for cloud big data, in Proceedings of the 2nd International Conference on Information Technology and Electronic Commerce, (ICITEC '14), (2014), 188–191, IEEE, Dalian, China.
