
Future Generation Computer Systems 63 (2016) 96–99

Contents lists available at ScienceDirect

Future Generation Computer Systems


journal homepage: www.elsevier.com/locate/fgcs

Editorial

Modeling and Management of Big Data: Challenges and opportunities


David Gil a,∗, Il-Yeol Song b

a Lucentia Research Group, Computing Technology and Data Processing, University of Alicante, Spain
b College of Computing and Informatics, Drexel University, Philadelphia, PA 19104, USA

highlights
• Objectives of the third International Workshop on Modeling and Management of Big Data (MoBiD’14).
• Summary of the selected papers.
• Conceptual modeling in the big data era.
• Expectations on these topics for this and the next editions of this workshop.

article info

Article history:
Received 14 June 2015
Received in revised form 28 July 2015
Accepted 31 July 2015
Available online 24 August 2015

Keywords:
Conceptual modeling Big Data
Ecosystem
Integrate & analyze & visualize

abstract

The term Big Data denotes huge-volume, complex, rapidly growing datasets with numerous, autonomous and independent sources. In these new circumstances Big Data brings many attractive opportunities; however, good opportunities are always followed by challenges, such as modelling, new paradigms and novel architectures that require original approaches to address data complexities. The purpose of this special issue on Modeling and Management of Big Data is to discuss research and experience in modelling and to develop as well as deploy systems and techniques to deal with Big Data. A summary of the selected papers is presented, followed by a conceptual modelling proposal for Big Data. Big Data creates new requirements based on complexities in data capture, data storage, data analysis and data visualization. These concerns are discussed in detail in this study and proposals are recommended for specific areas of future research.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

The aim of the International Workshop on Modeling and Management of Big Data is to bring together researchers, developers and practitioners to discuss research issues and experience in modeling, developing and deploying systems and techniques to deal with Big Data. The third International Workshop on Modeling and Management of Big Data (MoBiD’14), held in Atlanta, October 27–30, 2014, was a continuation and evolution of the previous workshops: the International Workshop on Modeling for Data-Intensive Computing (MoDIC’12), held in Florence, Italy, October 15–18, 2012, and MoBiD’13, held in Hong Kong, November 11–13, 2013.

MoBiD’14 was presented with the aim of attracting papers on the latest and best proposals for modeling and managing Big Data in this new era of the data-driven paradigm. This new conceptualization of big data applications, incorporating both internal and external Big Data, requires new models and methods to accomplish their conceptual modeling phase. Papers focusing on the application and the use of conceptual modeling approaches for Big Data, MapReduce, Hadoop and its ecosystems, big data analytics, social networking, cloud computing, security and privacy, data science, etc. were highly encouraged. The workshop has therefore been an international forum for researchers and practitioners who are interested in the different facets related to the use of conceptual modeling approaches for the development of the next generation of applications based on Big Data. In our view, several key themes of the Big Data trend include (i) using a cloud for large-scale external and internal data; (ii) providing easy-to-use but powerful services to access/manage/analyze the big data in the cloud; (iii) defining a problem-solving space and developing an architecture for a big data environment to conceptualize goals, tasks, and problem-solving methods to apply to domains; and (iv) managing big data and analyzing them to discover business values.
∗ Corresponding editor.
E-mail address: dgil@dtic.ua.es (D. Gil).
http://dx.doi.org/10.1016/j.future.2015.07.019
0167-739X/© 2015 Elsevier B.V. All rights reserved.

2. Papers

MoBiD’14 attracted papers from 9 different countries distributed all over the world: France, Greece, India, Japan, Kenya, Korea, Spain, United Kingdom and USA. We finally received 14 papers, of which the Program Committee selected 5, making an acceptance rate of 35%. In the following, we summarize these selected papers:

The first paper, “From Business Intelligence to Semantic Data Stream Management” by Marie-Aude Aufaure, Raja Chiky, Olivier Curé, Houda Khrouf and Gabriel Kepeklian [1], introduces recent work on Real-Time Business Intelligence that utilizes semantic data stream management. This paper addresses the new tendencies of real-time systems that are continuously generating data to be analyzed, processed, and stored. The authors also present underlying approaches to continuous queries and data summarization.

The second paper, entitled “Design Science Research Contribution to Business Intelligence in the Cloud – A Systematic Literature Review” by Odette Sangupamba Mwilu, Nicolas Prat and Isabelle Comyn-Wattiau [2], deals with the new opportunities for business intelligence (BI) and analytics offered by Cloud computing and big data. The authors propose a typology of artifacts potentially produced by researchers in design science. Then, after analyzing the state of the art through that typology, they use it to sketch opportunities for new research to improve BI and analytics capabilities in the cloud and from big data.

The third paper, “A Data Quality in Use Model for Big Data” by Jorge Merino, Ismael Caballero, Bibiano Rivas, Manuel Serrano and Mario Piattini [3], is a position paper that proposes the 3Cs model, which is composed of three data quality dimensions for assessing the quality-in-use of big datasets: Contextual Consistency, Operational Consistency and Temporal Consistency. The motivation is that data quality research lacks a quality-in-use model adapted to big data.

The fourth paper, “A Hybrid Integrated Architecture for Energy Consumption Prediction” by Alejandro Maté, Jesus Peral, Antonio Ferrández, David Gil and Juan Trujillo [4], explores the opportunities of using ICT (Information and Communication Technologies) as an enabling technology to reduce energy consumption in cities. It proposes a multidimensional hybrid architecture that makes use of current energy data and external information (with unstructured data sources) to improve knowledge acquisition and allow managers to make better decisions.

The last paper of this Special Issue, entitled “Benchmarking Performance for Migrating a Relational Application to a Parallel Implementation” by Krishna Karthik Gadiraju, Manik Verma, Karen C. Davis, and Paul G. Talaga [5], investigates the impact of scaling up the data sizes for several benchmarking queries. The authors illustrate what kind of performance results an organization could expect when it migrates current applications to big data environments. They measure the speedup for query execution for all dataset sizes resulting from the scale-up. They conclude that Hive loads the large datasets faster than MySQL, while it is marginally slower than MySQL when loading the smaller datasets.

3. Conceptual modeling in the big data era

Our experience, together with the novelty and the new trends that have developed during the last three years, leads us to summarize our thoughts and expectations on these topics for this and the next editions of this workshop.

Big data is a very broad term that is often most easily understood by means of a graphical representation (Fig. 1), which draws attention not only to the word “Big” but especially to the fact that “big data” expresses the difficulty of dealing with data along different dimensions.

Fig. 1. Defining big data with 3 V’s and moving towards 5 V’s.

Currently, the term Big Data appears in a great many scenarios. Scientists, business executives, practitioners of media and advertising, and governments alike regularly meet difficulties with large data sets in areas including Internet search, finance and business informatics [6,7].

There are many domains whose data management needs have exploded. For example, we can discuss the data management challenges of E-commerce along three dimensions: volume, velocity and variety.

• On Volume: “The lower cost of e-channels enables an enterprise to offer its goods or services to more individuals or trading partners. The explosion of the data to be collected in e-commerce are even up to 10x of the quantity of data about an individual transaction, thereby significantly increasing the overall volume of data to be managed.”
• On Velocity: “E-commerce has also increased point-of-interaction (POI) speed, and consequently the pace data used to support interactions and generated by interactions.”
• On Variety: “No greater barrier to effective data management will exist than the variety of incompatible data formats, non-aligned data structures, and inconsistent data semantics.”

Where does big data come from? (i) “data exhaust” from customers; (ii) new and pervasive sensors; (iii) the ability to “keep everything” [8,9].

The authors of [10] indicate that, with the significant advances in Information and Communications Technology (ICT) over the last half century, there is an increasingly perceived vision that computing will one day be the 5th utility (after water, electricity, gas, and telephony). They define Cloud computing and provide an architecture for creating Clouds with market-oriented resource allocation by leveraging technologies such as Virtual Machines (VMs). The proliferation of devices in a communicating-actuating network creates the Internet of Things (IoT) [11]. The authors of [12] analyse the challenges and requirements for next-generation Big Data services and present a solution designed to support next-generation Big Data applications.

Regarding the difficulty of managing Big Data, it has been stated that Big Data is any data that is expensive to manage and hard to extract value from [13]. Among the Vs shown in Fig. 1, in this brief summary we will focus on the V of Variety. This leads us to the main topic of the workshop, which is conceptual modeling of Big Data.
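As a minimal illustration of the Variety dimension discussed above, the following Python sketch maps two differently shaped e-commerce records onto one shared conceptual entity. The two sources, their field names and the Order entity are hypothetical and are not taken from any of the cited works; the sketch only suggests why a common conceptual model is needed before heterogeneous data can be analyzed together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Shared conceptual entity that both (hypothetical) sources map onto."""
    order_id: str
    customer: str
    total_eur: float


def from_web_shop(record: dict) -> Order:
    # Hypothetical source A: flat record, amount given in euros as a string.
    return Order(
        order_id=str(record["id"]),
        customer=record["customer_name"],
        total_eur=float(record["amount"]),
    )


def from_partner_feed(record: dict) -> Order:
    # Hypothetical source B: nested record, amount given in euro cents.
    return Order(
        order_id=record["ref"],
        customer=record["buyer"]["name"],
        total_eur=record["total_cents"] / 100,
    )


# Two sources with incompatible formats, structures and units.
web_shop_rows = [{"id": 17, "customer_name": "Ana", "amount": "19.90"}]
partner_rows = [{"ref": "P-3", "buyer": {"name": "Li"}, "total_cents": 4500}]

# After mapping, both sources can be queried through the same model.
orders = [from_web_shop(r) for r in web_shop_rows]
orders += [from_partner_feed(r) for r in partner_rows]
total = sum(o.total_eur for o in orders)
```

Each source needs its own mapping function, while every downstream query (here, the sum of order totals) is written once against the shared entity.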
With the help of various conceptual modeling techniques, such as ontologies (Web Ontology Language, OWL), semantics, RDF schemas, the SPARQL language, etc., it could be possible to formulate novel integration architectures. Such a framework will make it possible to take advantage of data integration on the Web [14]. Embley and Liddle explain in their paper entitled “Big Data Conceptual Modeling to the Rescue” [15] several layers of this conceptualization, including examples as well as various techniques for this goal (Fig. 2).

Fig. 2. Conceptualization.

The intersection of the terms “Conceptual Modeling” and “Big Data” still appears as one of the challenges in big data. We keep finding Volume too big, Variety too many, Velocity too fast, and Veracity too uncertain.

Chen’s article [16] illustrates the historical groundings of conceptual modeling, whose original contributions (including the mapping to databases) aimed to organize data well for search and business transaction processing.

For instance, the conceptualization of the Web includes semantic search as well as keyword search and world-wide knowledge sharing. Examples include DBpedia, conceptual graphs (like Google’s Knowledge Graph, Yahoo!’s Web of Objects, Facebook’s Graph Search, Microsoft’s/Bing’s Satori Knowledge Base), Metaweb and FamilySearch.

The World-Wide Web provides access to millions of data tables with high-quality content, formatted either as HTML tables, HTML lists, or other structured formats, or stored in on-line data management services. These tables contain data about virtually every domain of interest to mankind. Several research projects aim at enabling search over these data sets and, ultimately, the ability to answer queries and to combine data from multiple sources. In this context, one of the main goals is to achieve a structured data ecosystem, as shown in Fig. 3.

Fig. 3. The goal is a structured data ecosystem.

Halevy and other researchers have contributed enormously to the search and integration of structured data on the Web [17–20]. There exist tools such as Fusion Tables, which are easy to use. They are database systems that are integrated with the Web [21].

Another idea is to use WebTables, which basically means discovering a (structured) needle in an (unstructured) haystack. The challenges are (i) finding the good tables on the Web; (ii) understanding their semantics; and (iii) understanding users’ intentions. The needle in the haystack is the set of high-quality HTML tables, and finding it is made harder by the fact that the semantics are very often embedded in the surrounding text.

Some mechanisms to address this situation are:

• The People’s Ontology [Open Information Extraction]. Mine a database of entities and classes from the Web [22,23].
• Recovering Table Semantics [24].
• Recovering Binary Relationships [24].
• Attribute Correlations [25].
• Synonym Discovery [26].

4. Conclusions

The objective of this special issue was to conduct a survey of the recognised techniques to deal with big data. The results indicate that the current strategies, methodologies and architectural models are not adequate for solving problems arising from this data complexity. In conclusion, while much work in this area has been accomplished, there is still much to be done in order to manage big data efficiently and exploit the opportunities hidden in the data.

As has been stated, Big Data brings many attractive opportunities along with some challenges, involving several issues such as complexity in data capture, storage, analysis and visualization. Future research should focus on the critical aspects, which are the challenges to be addressed for the successful and efficient modeling and management of big data. These challenges [7,27,28] can be divided into several steps, as shown in Fig. 4:

(i) Data capture. Many data are recorded from diverse data-generating sources.
(ii) Cleaning and storage. The objective in this case is to store the data in a structured form suitable for analysis.
(iii) Data integration and aggregation. Besides the former challenge, the reality is that very often the variety of the sources makes it hard to deal with the information [29]. In this context, NoSQL databases, also called Not Only SQL, are a current approach for large and distributed data management and database design [30].
(iv) Query processing, data modeling and analysis. Data mining is a set of techniques for extracting valuable information from data, and it supports querying, modeling and analysing data. These techniques include clustering analysis, classification, regression and association rule learning. However, Big Data mining is more challenging than traditional data mining. In [9] the current status of data mining is analysed, since new mining techniques are necessary due to the volume, variability, and velocity of those data.
(v) Interpretation and visualization. Choosing proper data representation tools is crucial when we try to visualize Big Data, due to the complexity of the data [31,32].

Fig. 4. Challenges to better address the opportunities of big data.

Acknowledgments

We would like to thank all the authors who revised and extended their papers for this special issue, and the reviewers for their hard work in reviewing these extended papers twice and providing critical and useful comments that helped the authors improve their papers. All of them have contributed to making this a high-quality special issue. We hope the readers will enjoy reading this issue and find the content beneficial to their research. Finally, we would also like to express our gratitude to the FGCS editorial staff of Elsevier, in particular to Prof. Peter Sloot, Hilda Xu and Balu Kavitha, for all their patience and availability during this process.
References

[1] M.-A. Aufaure, R. Chiky, O. Curé, H. Khrouf, G. Kepeklian, From Business Intelligence to semantic data stream management, Future Gener. Comput. Syst. 63 (2016) 100–107.
[2] O. Sangupamba Mwilu, N. Prat, I. Comyn-Wattiau, Design science research contribution to business intelligence in the cloud – a systematic literature review, Future Gener. Comput. Syst. 63 (2016) 108–122.
[3] J. Merino, I. Caballero, B. Rivas, M. Serrano, M. Piattini, A Data Quality in Use model for Big Data, Future Gener. Comput. Syst. 63 (2016) 123–130.
[4] A. Maté, J. Peral, A. Ferrández, D. Gil, J. Trujillo, A hybrid integrated architecture for energy consumption prediction, Future Gener. Comput. Syst. 63 (2016) 131–147.
[5] K.K. Gadiraju, M. Verma, K.C. Davis, P.G. Talaga, Benchmarking performance for migrating a relational application to a parallel implementation, Future Gener. Comput. Syst. 63 (2016) 148–156.
[6] K. Cukier, Data, data everywhere: a special report on managing information, in: Economist Newspaper, 2010.
[7] A. Labrinidis, H. Jagadish, Challenges and opportunities with big data, Proc. VLDB Endow. 5 (12) (2012) 2032–2033.
[8] D. Laney, 3-D data management: Controlling data volume, velocity and variety, META Group Original Research Note, 2001.
[9] W. Fan, A. Bifet, Mining big data: current status, and forecast to the future, ACM SIGKDD Explorations Newsletter 14 (2) (2013) 1–5.
[10] R. Buyya, C.S. Yeo, S. Venugopal, J. Broberg, I. Brandic, Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility, Future Gener. Comput. Syst. 25 (6) (2009) 599–616.
[11] J. Gubbi, R. Buyya, S. Marusic, M. Palaniswami, Internet of things (IoT): a vision, architectural elements, and future directions, Future Gener. Comput. Syst. 29 (7) (2013) 1645–1660.
[12] C. Dobre, F. Xhafa, Intelligent services for big data science, Future Gener. Comput. Syst. 37 (2014) 267–281.
[13] M.J. Franklin, Making sense of big data with the Berkeley data analytics stack, in: SSDBM, 2013, p. 1.
[14] A. Doan, A. Halevy, Z. Ives, Principles of Data Integration, Elsevier, 2012.
[15] D.W. Embley, S.W. Liddle, Big data conceptual modeling to the rescue, in: Conceptual Modeling, Springer, 2013, pp. 1–8.
[16] P.P.-S. Chen, The entity-relationship model toward a unified view of data, ACM Trans. Database Syst. (TODS) 1 (1) (1976) 9–36.
[17] A. Halevy, Best-effort modeling of structured data on the web, in: Conceptual Modeling–ER 2011, Springer, 2011, p. 32.
[18] A. Halevy, A. Rajaraman, J. Ordille, Data integration: the teenage years, in: Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB Endowment, 2006, pp. 9–16.
[19] M.J. Cafarella, A. Halevy, J. Madhavan, Structured data on the web, Commun. ACM 54 (2) (2011) 72–79.
[20] M.J. Cafarella, A. Halevy, D.Z. Wang, E. Wu, Y. Zhang, Webtables: exploring the power of tables on the web, Proceedings of the VLDB Endowment 1 (1) (2008) 538–549.
[21] H. Gonzalez, A.Y. Halevy, C.S. Jensen, A. Langen, J. Madhavan, R. Shapley, W. Shen, J. Goldberg-Kidon, Google fusion tables: web-centered data management and collaboration, in: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, ACM, 2010, pp. 1061–1066.
[22] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z. Ives, Dbpedia: A Nucleus for a Web of Open Data, Springer, 2007.
[23] H. Alani, S. Kim, D.E. Millard, M.J. Weal, W. Hall, P.H. Lewis, N.R. Shadbolt, Automatic ontology-based knowledge extraction from web documents, IEEE Intel. Syst. 18 (1) (2003) 14–21.
[24] P. Venetis, A. Halevy, J. Madhavan, M. Paşca, W. Shen, F. Wu, G. Miao, C. Wu, Recovering semantics of tables on the web, Proc. VLDB Endow. 4 (9) (2011) 528–538.
[25] A. Halevy, P. Norvig, F. Pereira, The unreasonable effectiveness of data, IEEE Intel. Syst. 24 (2) (2009) 8–12.
[26] B. He, K.C.-C. Chang, Statistical schema matching across web query interfaces, in: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, ACM, 2003, pp. 217–228.
[27] S. Kaisler, F. Armour, J.A. Espinosa, W. Money, Big data: Issues and challenges moving forward, in: System Sciences (HICSS), 2013 46th Hawaii International Conference, IEEE, 2013, pp. 995–1004.
[28] C.P. Chen, C.-Y. Zhang, Data-intensive applications, challenges, techniques and technologies: A survey on big data, Inform. Sci. 275 (2014) 314–347.
[29] D. Petcu, G. Macariu, S. Panica, C. Craciun, Portable cloud applications – from theory to practice, Future Gener. Comput. Syst. 29 (6) (2013) 1417–1430.
[30] J. Han, E. Haihong, G. Le, J. Du, Survey on nosql database, in: Pervasive Computing and Applications (ICPCA), 2011 6th International Conference, IEEE, 2011, pp. 363–366.
[31] H. Chen, R.H. Chiang, V.C. Storey, Business intelligence and analytics: From big data to big impact, MIS Quart. 36 (4) (2012) 1165–1188.
[32] A. Graves, Techniques to reduce cluttering of rdf visualizations, Future Gener. Comput. Syst. 53 (2015) 152–156.

David Gil is an associate professor at the Department of Computing Technology and Data Processing at the University of Alicante, Spain. David received a Ph.D. in Computer Science from the University of Alicante (Spain) in 2008. His research interests include Applications of Artificial Intelligence, data mining, data warehouses, multidimensional databases, OLAP, design with UML, and MDA. He has published papers in high quality international conferences such as IJCNN, SAC, HEALTHINF, DCAI, SCAI, SAIS, etc. He has also published papers in highly cited international journals such as Expert Systems With Applications and Applied Soft Computing. Dr. Gil has served as a Program Committee member of several conferences and workshops such as DAWAK, ARES and CAiSE. He is a reviewer for several journals such as Neurocomputing, Expert Systems and Soft Computing. He is also involved in the organization of several international workshops (MoDIC’12, MoBiD’13–14).

Dr. Il-Yeol Song is a professor in the College of Computing and Informatics of Drexel University and Director of the Ph.D. Program in Information Studies in his college. He served as Deputy Director of the NSF-sponsored research center on Visual & Decision Informatics (CVDI) between 2012 and 2014. He is also an affiliated professor of the Computer Science Department of KAIST, Korea. He is an ACM Distinguished Scientist and an ER Fellow. He is the recipient of the 2015 Peter P. Chen Award in Conceptual Modeling. His research interests include conceptual modeling, data warehousing & OLAP, big data management & analytics, CRM, object-oriented analysis & design, healthcare informatics, and smart health. Dr. Song has published over 190 peer-reviewed papers and co-edited 22 proceedings. He is a co-Editor-in-Chief of the Journal of Computing Science and Engineering (JCSE) and is an editorial board member of DKE, JDM, IJEBR, and JDFSL. He won the Best Paper Award at IEEE CIBCB 2004. He won 14 research awards from competitions of the annual Drexel Research Days. He also won four teaching awards from Drexel, including the most prestigious Lindback Distinguished Teaching Award. Dr. Song served as the Steering Committee chair of the ER conference between 2010 and 2012. He is a steering committee member of the ER, DOLAP, BigComp, and ADFSL conferences. He served as a program/general chair of over 20 international conferences/workshops including DOLAP’98–14, CIKM’99, ER’03, FP-UML’06, DaWaK’07–’08, DESRIST’09, CIKM’09, MoDiC’12, and MoBiD’13–’15.
