Modeling and Management of Big Data Challenges and Opportunities
Editorial
highlights
• Objectives of the third International Workshop on Modeling and Management of Big Data (MoBiD’14).
• Summary of the selected papers.
• Conceptual modeling in the big data era.
• Expectations for these topics in this and the next editions of this workshop.
Fig. 1. Defining big data with 3 V’s and moving towards 5 V’s.
papers and the Program Committee has selected 5 papers, making an acceptance rate of 35%. In the following, we summarize these selected papers:
The first paper, ‘‘From Business Intelligence to Semantic Data Stream Management’’ by Marie-Aude Aufaure, Raja Chiky, Olivier Curé, Houda Khrouf and Gabriel Kepeklian [1], introduces recent work on Real-Time Business Intelligence that utilizes semantic data stream management. This paper addresses the new tendencies of real-time systems that continuously generate data to be analyzed, processed, and stored. The authors also present underlying approaches to continuous queries and data summarization.
The second paper, entitled ‘‘Design Science Research Contribution to Business Intelligence in the Cloud — A Systematic Literature Review’’ by Odette Sangupamba Mwilu, Nicolas Prat and Isabelle Comyn-Wattiau [2], deals with the new opportunities for business intelligence (BI) and analytics offered by Cloud computing and big data. The authors propose a typology of artifacts potentially produced by researchers in design science. Then, after analyzing the state of the art through that typology, they use it to sketch opportunities for new research to improve BI and analytics capabilities in the cloud and from big data.
The third paper, ‘‘A Data Quality in Use Model for Big Data’’ by Jorge Merino, Ismael Caballero, Bibiano Rivas, Manuel Serrano and Mario Piattini [3], is a position paper that proposes the 3Cs model, which is composed of three data quality dimensions for assessing the quality-in-use of big datasets: Contextual Consistency, Operational Consistency and Temporal Consistency. The motivation is that data quality currently lacks a quality-in-use model adapted to big data.
The fourth paper, ‘‘A Hybrid Integrated Architecture for Energy Consumption Prediction’’ by Alejandro Maté, Jesus Peral, Antonio Ferrández, David Gil and Juan Trujillo [4], explores the opportunities of using ICT (Information and Communication Technologies) as an enabling technology to reduce energy consumption in cities. It proposes a multidimensional hybrid architecture that makes use of current energy data and external information (including unstructured data sources) to improve knowledge acquisition and allow managers to make better decisions.
The last paper of this Special Issue, entitled ‘‘Benchmarking Performance for Migrating a Relational Application to a Parallel Implementation’’ by Krishna Karthik Gadiraju, Manik Verma, Karen C. Davis and Paul G. Talaga [5], investigates the impact of scaling up the data sizes for several benchmarking queries. The authors illustrate what kind of performance results an organization could expect when migrating current applications to big data environments. They measure the query-execution speedup for all dataset sizes resulting from the scale-up, and conclude that Hive loads the large datasets faster than MySQL, while it is marginally slower than MySQL when loading the smaller datasets.
3. Conceptual modeling in the big data era
The experience, in conjunction with the novelty and the new trends developed during the last three years, leads us to summarize our thoughts and expectations on these topics for this and the next editions of this workshop.
Big data is a very broad term that is often most easily understood by means of a graphical representation (Fig. 1). Such a representation draws attention not only to the word ‘‘Big’’ but, above all, to the fact that ‘‘big data’’ expresses the difficulty of dealing with data along several different dimensions.
There are currently many scenarios in which the term Big Data appears. Scientists, business executives, media and advertising practitioners, and governments alike regularly meet difficulties with large data sets in areas including Internet search, finance and business informatics [6,7].
There are many domains whose data management needs have exploded. For example, the data management challenges of E-commerce can be discussed along three dimensions: volume, velocity and variety.
• On Volume: ‘‘The lower cost of e-channels enables an enterprise to offer its goods or services to more individuals or trading partners. The explosion of the data to be collected in e-commerce are even up to 10x of the quantity of data about an individual transaction, thereby significantly increasing the overall volume of data to be managed.’’
• On Velocity: ‘‘E-commerce has also increased point-of-interaction (POI) speed, and consequently the pace data used to support interactions and generated by interactions.’’
• On Variety: ‘‘No greater barrier to effective data management will exist than the variety of incompatible data formats, non-aligned data structures, and inconsistent data semantics.’’
Where does big data come from? Mainly from (i) ‘‘data exhaust’’ from customers; (ii) new and pervasive sensors; and (iii) the ability to ‘‘keep everything’’ [8,9].
In [10] it is noted that, with the significant advances in Information and Communications Technology (ICT) over the last half century, there is an increasingly perceived vision that computing will one day be the 5th utility (after water, electricity, gas, and telephony). The authors define Cloud computing and provide an architecture for creating Clouds with market-oriented resource allocation by leveraging technologies such as Virtual Machines (VMs). The proliferation of devices in a communicating-actuating network creates the Internet of Things (IoT) [11]. In [12] the authors analyse the challenges and requirements for next-generation Big Data services and present a solution designed to support next-generation Big Data applications.
Regarding the difficulty of managing Big Data, it has been stated that Big Data is any data that is expensive to manage and hard to extract value from [13]. Among the Vs shown in Fig. 1, in this brief summary we focus on the V of Variety. This leads us to the main topic of the workshop, which is the conceptual modeling of Big Data.
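To make the Variety dimension more concrete, the following minimal Python sketch illustrates one common way of reconciling heterogeneous records: the field names and date formats of each source are mapped explicitly onto a single conceptual entity. The `Order` entity, the source names `web_shop` and `partner_feed`, and all field names and formats are hypothetical illustrations chosen for this sketch; they are not taken from the workshop papers.

```python
from dataclasses import dataclass
from datetime import date, datetime


# Hypothetical unified conceptual entity for an e-commerce order.
@dataclass(frozen=True)
class Order:
    order_id: str
    customer: str
    amount_eur: float
    order_date: date


# Hypothetical per-source correspondences: source field -> unified attribute
# (non-aligned data structures).
MAPPINGS = {
    "web_shop": {"id": "order_id", "client": "customer",
                 "total": "amount_eur", "ts": "order_date"},
    "partner_feed": {"OrderNo": "order_id", "Buyer": "customer",
                     "Amount": "amount_eur", "Date": "order_date"},
}

# Hypothetical per-source date formats (inconsistent data semantics).
DATE_FORMATS = {"web_shop": "%Y-%m-%d", "partner_feed": "%d/%m/%Y"}


def to_order(source: str, record: dict) -> Order:
    """Rename the source fields, then normalize types and formats."""
    mapping = MAPPINGS[source]
    values = {target: record[field] for field, target in mapping.items()}
    return Order(
        order_id=str(values["order_id"]),
        customer=str(values["customer"]).strip(),
        amount_eur=float(values["amount_eur"]),
        order_date=datetime.strptime(values["order_date"],
                                     DATE_FORMATS[source]).date(),
    )


# Two records describing the same kind of fact in incompatible shapes.
records = [
    ("web_shop", {"id": 17, "client": "Ana ", "total": "19.90",
                  "ts": "2014-10-27"}),
    ("partner_feed", {"OrderNo": "P-3", "Buyer": "Bo", "Amount": 5,
                      "Date": "27/10/2014"}),
]
orders = [to_order(src, rec) for src, rec in records]
```

Both records end up as `Order` values with the same calendar date, even though the sources spell it differently. Keeping the correspondences as data rather than code means that adding a source only requires a new mapping entry; at big data scale, such hand-written mappings would typically need to be assisted by automatic schema matching.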
98 D. Gil, I.-Y. Song / Future Generation Computer Systems 63 (2016) 96–99
4. Conclusions
References
[1] M.-A. Aufaure, R. Chiky, O. Curé, H. Khrouf, G. Kepeklian, From Business Intelligence to semantic data stream management, Future Gener. Comput. Syst. 63 (2016) 100–107.
[2] O. Sangupamba Mwilu, N. Prat, I. Comyn-Wattiau, Design science research contribution to business intelligence in the cloud — a systematic literature review, Future Gener. Comput. Syst. 63 (2016) 108–122.
[3] J. Merino, I. Caballero, B. Rivas, M. Serrano, M. Piattini, A Data Quality in Use model for Big Data, Future Gener. Comput. Syst. 63 (2016) 123–130.
[4] A. Maté, J. Peral, A. Ferrández, D. Gil, J. Trujillo, A hybrid integrated architecture for energy consumption prediction, Future Gener. Comput. Syst. 63 (2016) 131–147.
[5] K.K. Gadiraju, M. Verma, K.C. Davis, P.G. Talaga, Benchmarking performance for migrating a relational application to a parallel implementation, Future Gener. Comput. Syst. 63 (2016) 148–156.
[6] K. Cukier, Data, data everywhere: a special report on managing information, in: Economist Newspaper, 2010.
[7] A. Labrinidis, H. Jagadish, Challenges and opportunities with big data, Proc. VLDB Endow. 5 (12) (2012) 2032–2033.
[8] D. Laney, 3-D data management: Controlling data volume, velocity and variety, META Group Original Research Note, 2001.
[9] W. Fan, A. Bifet, Mining big data: current status, and forecast to the future, ACM SIGKDD Explorations Newsletter 14 (2) (2013) 1–5.
[10] R. Buyya, C.S. Yeo, S. Venugopal, J. Broberg, I. Brandic, Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility, Future Gener. Comput. Syst. 25 (6) (2009) 599–616.
[11] J. Gubbi, R. Buyya, S. Marusic, M. Palaniswami, Internet of Things (IoT): a vision, architectural elements, and future directions, Future Gener. Comput. Syst. 29 (7) (2013) 1645–1660.
[12] C. Dobre, F. Xhafa, Intelligent services for big data science, Future Gener. Comput. Syst. 37 (2014) 267–281.
[13] M.J. Franklin, Making sense of big data with the Berkeley data analytics stack, in: SSDBM, 2013, p. 1.
[14] A. Doan, A. Halevy, Z. Ives, Principles of Data Integration, Elsevier, 2012.
[15] D.W. Embley, S.W. Liddle, Big data conceptual modeling to the rescue, in: Conceptual Modeling, Springer, 2013, pp. 1–8.
[16] P.P.-S. Chen, The entity-relationship model toward a unified view of data, ACM Trans. Database Syst. (TODS) 1 (1) (1976) 9–36.
[17] A. Halevy, Best-effort modeling of structured data on the web, in: Conceptual Modeling–ER 2011, Springer, 2011, p. 32.
[18] A. Halevy, A. Rajaraman, J. Ordille, Data integration: the teenage years, in: Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB Endowment, 2006, pp. 9–16.
[19] M.J. Cafarella, A. Halevy, J. Madhavan, Structured data on the web, Commun. ACM 54 (2) (2011) 72–79.
[20] M.J. Cafarella, A. Halevy, D.Z. Wang, E. Wu, Y. Zhang, WebTables: exploring the power of tables on the web, Proc. VLDB Endow. 1 (1) (2008) 538–549.
[21] H. Gonzalez, A.Y. Halevy, C.S. Jensen, A. Langen, J. Madhavan, R. Shapley, W. Shen, J. Goldberg-Kidon, Google Fusion Tables: web-centered data management and collaboration, in: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, ACM, 2010, pp. 1061–1066.
[22] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z. Ives, DBpedia: A nucleus for a web of open data, Springer, 2007.
[23] H. Alani, S. Kim, D.E. Millard, M.J. Weal, W. Hall, P.H. Lewis, N.R. Shadbolt, Automatic ontology-based knowledge extraction from web documents, IEEE Intell. Syst. 18 (1) (2003) 14–21.
[24] P. Venetis, A. Halevy, J. Madhavan, M. Paşca, W. Shen, F. Wu, G. Miao, C. Wu, Recovering semantics of tables on the web, Proc. VLDB Endow. 4 (9) (2011) 528–538.
[25] A. Halevy, P. Norvig, F. Pereira, The unreasonable effectiveness of data, IEEE Intell. Syst. 24 (2) (2009) 8–12.
[26] B. He, K.C.-C. Chang, Statistical schema matching across web query interfaces, in: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, ACM, 2003, pp. 217–228.
[27] S. Kaisler, F. Armour, J.A. Espinosa, W. Money, Big data: Issues and challenges moving forward, in: System Sciences (HICSS), 2013 46th Hawaii International Conference on, IEEE, 2013, pp. 995–1004.
[28] C.P. Chen, C.-Y. Zhang, Data-intensive applications, challenges, techniques and technologies: A survey on big data, Inform. Sci. 275 (2014) 314–347.
[29] D. Petcu, G. Macariu, S. Panica, C. Craciun, Portable cloud applications: from theory to practice, Future Gener. Comput. Syst. 29 (6) (2013) 1417–1430.
[30] J. Han, E. Haihong, G. Le, J. Du, Survey on NoSQL database, in: Pervasive Computing and Applications (ICPCA), 2011 6th International Conference on, IEEE, 2011, pp. 363–366.
[31] H. Chen, R.H. Chiang, V.C. Storey, Business intelligence and analytics: From big data to big impact, MIS Quart. 36 (4) (2012) 1165–1188.
[32] A. Graves, Techniques to reduce cluttering of RDF visualizations, Future Gener. Comput. Syst. 53 (2015) 152–156.
David Gil is an associate professor at the Department of Computing Technology and Data Processing at the University of Alicante, Spain. David received a Ph.D. in Computer Science from the University of Alicante (Spain) in 2008. His research interests include applications of Artificial Intelligence, data mining, data warehouses, multidimensional databases, OLAP, design with UML, and MDA. He has published papers in high-quality international conferences such as IJCNN, SAC, HEALTHINF, DCAI, SCAI and SAIS. He has also published papers in highly cited international journals such as Expert Systems With Applications and Applied Soft Computing. Dr. Gil has served as a Program Committee member of several conferences and workshops such as DAWAK, ARES and CAiSE. He is a reviewer for several journals such as Neurocomputing, Expert Systems and Soft Computing. He is also involved in the organization of several international workshops (MoDIC’12, MoBiD’13–14).
Dr. Il-Yeol Song is a professor in the College of Computing and Informatics of Drexel University and Director of the Ph.D. Program in Information Studies in his college. He served as Deputy Director of the NSF-sponsored research center on Visual & Decision Informatics (CVDI) between 2012 and 2014. He is also an affiliated professor of the Computer Science Department of KAIST, Korea. He is an ACM Distinguished Scientist and an ER Fellow. He is the recipient of the 2015 Peter P. Chen Award in Conceptual Modeling. His research interests include conceptual modeling, data warehousing & OLAP, big data management & analytics, CRM, object-oriented analysis & design, healthcare informatics, and smart health. Dr. Song has published over 190 peer-reviewed papers and co-edited 22 proceedings. He is a co-Editor-in-Chief of the Journal of Computing Science and Engineering (JCSE) and an editorial board member of DKE, JDM, IJEBR, and JDFSL. He won the Best Paper Award in the IEEE CIBCB 2004. He won 14 research awards from competitions of the annual Drexel Research Days. He also won four teaching awards from Drexel, including the most prestigious Lindback Distinguished Teaching Award. Dr. Song served as the Steering Committee chair of the ER conference between 2010 and 2012. He is a steering committee member of the ER, DOLAP, BigComp, and ADFSL conferences. He has served as a program/general chair of over 20 international conferences/workshops including DOLAP’98–14, CIKM’99, ER’03, FP-UML’06, DaWaK’07–’08, DESRIST’09, CIKM’09, MoDiC’12, and MoBiD’13–’15.