
Software Engineering For Big Data Projects: Domains, Methodologies and Gaps


2016 IEEE International Conference on Big Data (Big Data)

Software Engineering for Big Data Projects: Domains, Methodologies and Gaps

Vijay Dipti Kumar
David R. Cheriton School of Computer Science
University of Waterloo
Waterloo, Canada
vdkumar@uwaterloo.ca

Paulo Alencar
David R. Cheriton School of Computer Science
University of Waterloo
Waterloo, Canada
palencar@cs.uwaterloo.ca

Abstract: Context: Big data has become the new buzzword in the information and communication technology industry. Researchers and major corporations are looking into big data applications to extract the maximum value from the data available to them. However, developing and maintaining stable and scalable big data applications is still a distant milestone. Objective: To look at existing research on how software engineering concepts, namely the phases of the software development project life cycle (SDPLC), can help build better big data application projects. Method: A literature survey was performed. A manual search covered the papers returned by search engines, resulting in approximately 2,000 papers being searched and 170 papers selected for review. Results: The search results helped in identifying data rich application projects that have the potential to utilize big data successfully. The review helped in exploring SDPLC phases in the context of big data applications and performing a gap analysis of the phases that have yet to see detailed research efforts but deserve attention.

Keywords: Big data, software engineering, application projects, domains, methodologies, gaps, literature review.

I. INTRODUCTION

There have been several success stories of big data being used by technology giants to dominate their competitors in areas such as social media, search engines, e-commerce and video streaming services. A few of the popular players and leaders in these fields leveraging big data are Facebook, LinkedIn, Twitter, Google, Amazon, and Netflix. The success of these big data use cases has spurred numerous companies to take an interest in developing big data applications, to extract maximum value from all the data that is becoming available.

According to a Gartner survey¹, 64% of the 720 respondents had invested or planned to invest in big data applications in 2013. However, less than 8% had actually deployed at the time of the survey. As shown in a case study of optimizing the manufacturing process of digital displays [1], it is possible to enhance process efficiency using big data. Different sectors such as healthcare, trading, agriculture, tourism and politics, in which big data is being used to stakeholders’ advantage, are detailed in [2]. Enriching customer experience using big data has been exemplified by the “People you may know” option offered by Facebook and LinkedIn, movie recommendations by Netflix, and even the “Customers Who Bought This Item Also Bought” service provided by Amazon.

¹ Big Data Gartner survey: http://www.gartner.com/newsroom/id/2593815

A. Big Data and Software Engineering

Big data has been characterised by 4Vs: Volume, Velocity, Variety and Veracity. Volume implies the data explosion that has come to mark the last decade in the world of computing. Velocity is the constraint that demands real time processing of the data available, the failure of which would result in its loss or obsolescence. Variety means that the data format could be structured, semi-structured or unstructured, or that the data could come from multiple sources, and applications must be capable of handling these sources. Finally, veracity implies that the available data, historical or even real-time streaming data, might have to be cleaned prior to processing to ensure its usefulness to the application.

The whole process of developing software is fraught with errors which arise due to changes in requirements or the environment, or due to communication issues between the stakeholders. This is true for software development for ordinary applications even today, despite the fact that we have been developing software for more than 20 years. When factors such as the 4Vs are taken into account, the complexities involved in developing software for big data application projects only increase.

Despite the complexities involved in following the SDPLC phases in building a big data application, these phases and concepts must be leveraged to build robust and scalable big data application projects. It would be to the advantage of the stakeholders and developers involved in building a big data application that the best practices and methodologies laid down by the software engineering research community be applied to build systems that are fault tolerant and capable of handling even more data than envisioned at the time of their creation.

B. Motivation

There have been numerous reviews on big data itself. One discusses the state of the art in architecture and large scale data analysis platforms [168], one is a comprehensive big data survey [169], another describes the related technologies and the acquisition and applications of big data [170] and one even lists the different definitions of big data [171]. The guest editor’s introduction to one of the issues of the IEEE Software magazine [172] discussed the software engineering challenges in building data-intensive, or big data software

978-1-4673-9005-7/16/$31.00 ©2016 IEEE


TABLE I. APPLICATION DOMAIN CATEGORIES & CLASSIFICATION

Application Domain | Papers | Count
Information Technology | [1], [3–99] | 98
Healthcare | [100–112] | 13
Geospatial Data Processing/Geographic Information Systems | [4], [101], [113–122] | 12
Infrastructure | [123–133] | 11
Transport | [76], [123], [128], [131], [134–139] | 10
Retail/Tourism/Commerce | [36], [76], [101], [140–144] | 8
Social Networks | [7], [115], [116], [145–148] | 7
Environmental Monitoring/Conservation | [4], [85], [120], [149–151] | 6
Manufacturing | [152–154] | 3
Meteorology | [118], [155], [156] | 3
Cyber Physical Systems | [157–159] | 3
Law & Order/Criminal Investigation/Forensic Analysis | [23], [160], [161] | 3
Agriculture | [162], [163] | 2
Banking and Financial Industry | [164], [165] | 2
Military | [161], [166] | 2
Aviation Industry | [159] | 1
Astronomy | [167] | 1
National Security | [161] | 1
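
The Count column of Table I can be cross-checked by expanding each citation range and counting the entries. The following is a minimal sketch in Python, assuming that a range such as [3–99] is inclusive at both ends. The function name count_citations is hypothetical and is not part of the study.

```python
import re


def count_citations(cell: str) -> int:
    """Count the papers named in a citation cell such as '[1], [3–99]'."""
    total = 0
    # Each bracket holds either one number or a range 'a–b'
    # (en dash or plain hyphen); the second group is '' for single numbers.
    for start, end in re.findall(r"\[(\d+)(?:\s*[–-]\s*(\d+))?\]", cell):
        total += (int(end) - int(start) + 1) if end else 1
    return total


if __name__ == "__main__":
    # Rows copied from Table I: citation cell and the reported count.
    rows = {
        "Information Technology": ("[1], [3–99]", 98),
        "Transport": ("[76], [123], [128], [131], [134–139]", 10),
        "Agriculture": ("[162], [163]", 2),
    }
    for domain, (cell, reported) in rows.items():
        print(domain, count_citations(cell), reported)
```

Applied to the rows shown, the computed values agree with the reported counts, which suggests that each count is the number of distinct papers cited for the domain.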

systems. However, no comprehensive study reviewing existing software engineering research methodologies for enabling the development of big data application projects was found.

C. Goals and Contributions

The main goal of this study was to look into the existing research on applying software engineering to big data application projects. The results from this literature survey help identify the domains that have been studied in detail for big data application development. They also helped recognize the SDPLC phases that were most commonly being utilized in developing big data applications. Additional results include pinpointing the domains that have the potential to deploy big data application projects with advantageous outcomes and have seen early research efforts in the field, but remain underexplored and deserve more attention from researchers.

II. RESEARCH METHOD

For conducting the literature review, popular academic search engines, namely Scopus, Web of Science and IEEE Xplore Digital Library, were targeted. The search was performed using the Command option under the Advanced Search section of these search engines. A combination of keywords related to SDPLC phases was selected from the standard software engineering textbook Software Engineering: A Practitioner’s Approach [175] to understand which subfields were popular among researchers.

In the context of software engineering, the search terms used were architecture, evolution, process, quality, reuse, specification, requirement(s) engineering, design, “domain modeling”, testing, verification, validation, maintenance, quality, analysis, framework, process, and patterns.

The idea behind formulating the queries was to search for papers that combined topics about software engineering and big data. We have adopted an extensive search and selection process to identify a set of studies that is as complete as possible. Our search process involved automatic and manual searching. An example of the pattern of queries used in our study is as follows:

“big data” AND (engineering OR requirement OR specification OR design OR architecture OR analysis OR testing OR verification OR validation OR maintenance OR framework OR quality OR design OR evolution OR patterns OR process OR reuse OR “domain modeling”)

The abstract, introduction and conclusion of each search result were examined in detail to ensure the papers were eligible to be included in the study. Reproducibility was not given priority because a manual search was the main method applied.

Admittedly, the quality of our results depends on the search queries used and the efficiency of the search engines. Numerous publications from relevant conferences related to big data did not show up in the search results because of the SDPLC specific search terms used in the queries. For example, the use of big data in the field of advertising [176] was not covered because that paper references no specific SDPLC phase used in this study, despite the fact that it dealt with an interesting application domain, namely advertising.
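
The query pattern given in this section can also be assembled programmatically, which may help when the same term list has to be submitted to several search engines. The following is a minimal sketch in Python, assuming straight double quotes for multi-word phrases and the AND/OR operators shown in the pattern. The exact command syntax of Scopus, Web of Science and IEEE Xplore differs and is not reproduced here, and the names SDPLC_TERMS and build_query are hypothetical.

```python
# Terms copied from the example query pattern, in the order given there
# (the pattern lists "design" twice).
SDPLC_TERMS = [
    "engineering", "requirement", "specification", "design", "architecture",
    "analysis", "testing", "verification", "validation", "maintenance",
    "framework", "quality", "design", "evolution", "patterns", "process",
    "reuse", "domain modeling",
]


def quote_phrase(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(topic: str, terms: list[str]) -> str:
    """Combine a topic with a list of alternative terms into one boolean query."""
    unique_terms = list(dict.fromkeys(terms))  # drop repeats, keep first-seen order
    alternatives = " OR ".join(quote_phrase(t) for t in unique_terms)
    return f"{quote_phrase(topic)} AND ({alternatives})"


if __name__ == "__main__":
    print(build_query("big data", SDPLC_TERMS))
```

Removing the repeated term does not change which papers match a boolean OR, so the generated string should be equivalent to the pattern used in the study, under the stated assumptions about syntax.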

TABLE II. SDPLC PHASES

Software Engineering Subfield | Papers | Count
Requirements | [17], [18], [21], [24], [26], [29], [55], [70], [74], [81], [98], [136], [142], [150], [153], [159] | 16
Design | [1], [5], [13], [20], [22], [26], [30], [38], [42], [44], [47–49], [55], [57], [60], [65–67], [72], [75], [77], [84], [85], [96], [110], [123], [131], [144], [157], [159] | 31
Framework | [7], [8], [11], [14–16], [25], [34], [39], [41], [45], [53], [58], [59], [62], [63], [69–71], [76], [80], [83], [88], [90], [91], [94], [100], [101], [103], [107], [109], [113], [114], [116], [122], [124–126], [132], [134], [139], [143], [145], [152], [155], [157], [158], [162], [164], [167], [173] | 51
Architecture | [1], [3–6], [9], [12], [17], [22], [23], [27], [29–33], [35], [40], [43], [50], [52], [54], [56], [61], [64], [68], [73], [78], [79], [82], [85], [89], [92], [93], [95], [97], [99], [101], [102], [104–106], [110–112], [117], [119], [120], [127–129], [133], [137], [138], [141], [146–149], [151], [156], [160], [161], [162], [163], [165], [166], [174] | 68
Testing | [28], [39], [43], [86], [51], [55], [79], [81], [109], [173] | 10
Validation/Verification | [46], [109] | 2
Maintenance | [55], [166] | 2
Quality Assurance | [14], [28], [87], [98], [109], [147] | 6
Domain Specific Languages/Ontology | [4], [19], [41], [53], [72], [94], [108], [121], [127], [141], [146], [149], [160] | 13

A. Research Questions

The main research questions that were addressed through the study were:

RQ1. Which application domains have received attention for the development of big data application projects and which domains require more attention?
RQ2. Which SDPLC phases were used to enable big data applications and which fields need more research efforts?

B. Classification Criteria

The following were the main criteria taken into account when analyzing and categorizing each paper. Under each criterion, multiple categories were identified. The first criterion categorized the papers on the basis of the application domain addressed by the paper and the second criterion identified the SDPLC phase studied.

C1. Which application domain does the paper belong to?
C2. Which SDPLC phase is utilized in the paper?

III. LIMITATIONS

This literature review was conducted through a manual search of the results of targeted queries in popular search engines like Scopus, Web of Science and IEEE Xplore Digital Library. Since the main method for reviewing and selecting/discarding the papers was a manual process, reproducibility of this study was not taken into consideration.

Admittedly, there is room for improvement and opportunity to include more relevant papers. More papers may have been added to these search engines after the queries for this study were run. Any relevant papers so missed would potentially be due to a matter of timing rather than oversight.

IV. RESULTS

The main purpose of this literature review, in addition to looking into the existing research on applying software engineering to big data, was to perform a gap analysis of the research to date. The gap analysis helped identify the application domains and the SDPLC phases in the context of big data that have not yet received much attention from researchers but have huge potential.

The application domains identified through this study are illustrated in Table I and the classification of the SDPLC phases for big data addressed in each paper is listed in Table II.

A. Study Analysis

RQ1. Which application domains have received attention for the development of big data application projects and which domains require more attention?

Table I lists all the application domains found by analyzing the papers in this study. The papers that proposed new methods or customized versions of existing technology but did not mention a specific domain, such as healthcare, military, or infrastructure, were classified into the “Information Technology” category. Out of the 170 papers selected in this review, the number of papers dealing with this category was 98, as can be seen in the column “Count” of Table I. Nearly 58% of the analyzed papers (98 of 170) focused on the Information Technology domain, which indicates that most of the papers are focusing on topics that would directly affect the world of computing. The primary observation was that researchers focused on improving existing technologies to better suit current requirements.

Application domains such as healthcare and the banking and financial industry are commonly believed to be data rich. The healthcare industry is a source of huge amounts of data drawn from the electronic medical records of patients. Data from hospitals, clinics, medical governing bodies and even insurance providers can be mined to study disease affliction rates, patterns and susceptibility trends. Analyzing medical data can also help in developing innovative treatment methods, customizing more effective and economical treatment plans, or even dispensing medication to groups of patients who suffer from similar afflictions and have identical medical histories of symptoms and reactions.

The banking and financial industry has access to the monetary blueprint of the world. The tremendous amount of transactional data that commercial banks handle on a daily basis can be used to better understand the spending patterns of their customers. Credit card usage, mobile banking application usage patterns, and the mortgage and credit history of customers can give banks greater insight into the needs of their customers and help tailor their products accordingly, thereby enhancing customer experience and satisfaction.

Other data rich domains that have seen early research are infrastructure, transport and manufacturing. Metropolitan authorities responsible for traffic management and building infrastructure facilities can use current as well as historical data to build smart cities and green buildings that use minimum amounts of energy and water for heating and maintenance.

The aviation industry and the military also have huge potential for using big data for better performance and maintenance of equipment, aircraft and vehicles. Interestingly, Geographic Information Systems (GIS) have attracted the attention of big data researchers. This domain can benefit further from the data available from location services that come built into several mobile and Internet applications used today.

Global warming is causing serious damage to our environment and wildlife, and meteorologists can use big data from the global weather sensors to make more accurate weather predictions and provide timely natural disaster alerts.

RQ2. Which SDPLC phases were used to enable big data applications and which fields need more research efforts?

Table II provides a breakdown of the SDPLC phases that were addressed by each paper. Numerous papers dealt with methodologies involving software architectures and frameworks and a good number dealt with design. This does not come as a surprise, since building big data applications with the latest technological developments would primarily deal with the design and architecture of a system. Thus, we see a lot of research that specifically deals with designs and software architectures or frameworks, which act as templates for building similar software systems.

SDPLC phases such as maintenance, validation, verification and quality assurance of big data applications have not seen much research. Only two papers were identified that explicitly reported research on the topic of verification [46], [109] and were classified under “Validation/Verification” in Table II. Verification and validation processes ensure that the software system has been designed and will function according to the requirements and design foundations laid down before development.

Similarly, only two papers were found that directly dealt with “Maintenance” [55], [166]. Maintenance of software systems is what keeps the software application running smoothly even after years of deployment. These areas are extremely important to ensure that once a big data application is built, it continues to meet its goals and satisfy its stakeholders’ expectations without compromising performance and remains scalable in the future when even more data could be leveraged.

The requirements subfield fares comparatively better. This could be expected: although requirements gathering for big data applications is a complex process because of the 4V properties of the data involved, application data and processing requirements need to be analyzed, to some extent, before the applications are built. Conducting requirements research can provide pointers to new adopters of big data technology, and help to establish benchmarks and standard practices.

B. Classification Procedure

The papers were classified on the basis of the classification criteria C1 and C2 mentioned in an earlier section. Each paper was analyzed and categorized on the basis of application domain and SDPLC phase. For example, if a paper discussed the application or implementation of big data in the healthcare industry then its application domain was classified as “Healthcare”. If a paper proposed a domain specific language or described an ontology technique, it was categorized under “Domain Specific Languages/Ontology”.

Papers that did not mention any application domain explicitly were classified under “Information Technology”. Several papers provided a framework approach for their system as well as an architectural implementation. However, if the author(s) of these papers identified the contribution of the paper as a framework-oriented method then the paper was designated the category “Framework” and not “Architecture”.

V. DISCUSSION

As illustrated by the results of this review in Tables I and II, there are noticeable differences in the amount of research attention received by the different application domains and SDPLC phases. The Information Technology application domain received much more research attention when compared to other important and promising data rich domains such as Healthcare and the Banking and Financial Industry.

There is a lot of potential for making technological advances using big data in domains identified by this review such as Aviation, Infrastructure, Transport and Environmental Monitoring/Conservation. These domains and many others have not yet witnessed much work from researchers and need to be prioritized. More focused research utilizing big data applications has the ability to transform the software management and development of these domains.

Research on software requirements methodologies for big data applications would help reduce the chances of system errors, project failures and unsatisfied stakeholder expectations. The technologies and environment around big data continue to evolve constantly and big data applications designed today have to deal with these unknown but inevitable changes, which makes the need for requirements research even more urgent.

Similarly, more research needs to be conducted into enhancing the existing methods and practices and developing novel methodologies for maintenance, testing, validation/verification and quality assurance of big data applications.
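
The gap analysis discussed here rests on simple proportions of the 170 selected papers. The sketch below, in Python, recomputes the share of a category and lists the SDPLC phases of Table II that fall under a cut-off. The 5% threshold and the function names share and underexplored are assumptions made for illustration; the study itself does not define a numeric cut-off.

```python
TOTAL_PAPERS = 170  # papers selected for review, as stated in the study

# Counts copied from Table II (SDPLC phase -> number of papers).
PHASE_COUNTS = {
    "Requirements": 16,
    "Design": 31,
    "Framework": 51,
    "Architecture": 68,
    "Testing": 10,
    "Validation/Verification": 2,
    "Maintenance": 2,
    "Quality Assurance": 6,
    "Domain Specific Languages/Ontology": 13,
}


def share(count: int, total: int = TOTAL_PAPERS) -> float:
    """Percentage of the reviewed papers that fall into one category."""
    return 100.0 * count / total


def underexplored(counts: dict[str, int], threshold_pct: float = 5.0) -> list[str]:
    """Categories whose share is below the (hypothetical) threshold, rarest first."""
    below = [name for name, c in counts.items() if share(c) < threshold_pct]
    return sorted(below, key=counts.get)


if __name__ == "__main__":
    for phase in underexplored(PHASE_COUNTS):
        print(f"{phase}: {share(PHASE_COUNTS[phase]):.1f}% of papers")
```

Because one paper can be classified under several phases, the shares are not expected to sum to 100%. With the same helper, share(98) evaluates to about 57.6, the proportion of papers classified under Information Technology in Table I.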

The high stakes and risks involved, such as system unpredictability and project failure in big data applications, can be mitigated by conducting thorough system testing, validation and verification and laying the foundations for good software quality assurance practices.

VI. CONCLUSION & FUTURE WORK

This is most likely the first comprehensive study of existing software engineering research in the context of big data applications. The purpose of this literature review was to understand how software engineering research to date was enabling big data application projects and to focus on research that highlights the use of the SDPLC phases in building robust, scalable big data applications.

A gap analysis was performed to identify the more popular application domains among researchers in this field. This analysis also aimed at revealing the main SDPLC phases that have seen significant research efforts and the domains and phases that need more research attention in the future.

This paper aims to provide perspective to future researchers looking into big data applications from a software engineering point of view. It can help potential researchers identify promising but underexplored application domains and focus on using specific software engineering methodologies to develop better big data applications. More research in these areas should motivate and help big data application developers and project managers to contribute more time, effort and resources in the different SDPLC phases of big data application development.

Future work encompasses widening the net to look for more papers. More research may have been published after the initial search for this study was performed. By widening the search, more promising application domains can be identified that could benefit from utilizing big data applications.

With respect to the search method, the next step is to supplement the existing manual search with an automated search to make this review reproducible by other researchers. This would also help in widening the range covered by the search to get more relevant results and in avoiding false positives.

REFERENCES

[1] P. Pääkkönen and D. Pakkala, “Reference architecture and classification of technologies, products and services for big data systems,” Big Data Research, 2015.
[2] F. Z. Benjelloun, A. A. Lahcen, and S. Belfkih, “An overview of big data opportunities, applications and tools,” in Intelligent Systems and Computer Vision, 2015.
[3] M. Vanauer, C. Bohle, and B. Hellingrath, “Guiding the introduction of big data in organizations: A methodology with business- and data-driven ideation and enterprise architecture management-based implementation,” in System Sciences, Hawaii Intl. Conf. on. IEEE, 2015.
[4] M. Krämer and I. Senner, “A modular software architecture for processing of big geospatial data in the cloud,” Computers & Graphics, 2015.
[5] G. Chen, S. Wu, and Y. Wang, “The evolvement of big data systems: From the perspective of an information security application,” Big Data Research, 2015.
[6] C. Esposito, M. Ficco, F. Palmieri, and A. Castiglione, “A knowledge-based platform for big data analytics based on publish/subscribe services and stream processing,” Knowledge-Based Systems, 2015.
[7] A. Vinay, V. S. Shekhar, J. Rituparna, T. Aggrawal, K. B. Murthy, and S. Natarajan, “Cloud based big data analytics framework for face recognition in social networks using machine learning,” Procedia Computer Science, 2015.
[8] S. Meng, W. Dou, X. Zhang, and J. Chen, “KASR: A Keyword-Aware Service Recommendation method on MapReduce for big data applications,” Parallel and Distributed Systems, IEEE Transactions on, 2014.
[9] S. O. Fadiya, S. Saydam, and V. V. Zira, “Advancing big data for humanitarian needs,” Procedia Engineering, 2014.
[10] P. O’Sullivan, G. Thompson, and A. Clifford, “Applying data models to big data architectures,” IBM Journal of Research and Development, 2014.
[11] O. Belo, A. Cuzzocrea, and B. Oliveira, “Modeling and supporting ETL processes via a pattern-oriented, task-reusable framework,” in Tools with Artificial Intelligence, Intl. Conf. on. IEEE, 2014.
[12] J. Chen, J. Ma, N. Zhong, Y. Yao, J. Liu, R. Huang, W. Li, Z. Huang, Y. Gao, and J. Cao, “WaaS: Wisdom as a Service,” Intelligent Systems, IEEE, 2014.
[13] C. Ordonez, S. Maabout, D. S. Matusevich, and W. Cabrera, “Extending ER models to capture database transformations to build data sets for data mining,” Data & Knowledge Engineering, 2014.
[14] G. Casale, D. Ardagna, M. Artac, F. Barbier, E. D. Nitto, A. Henry, G. Iuhasz, C. Joubert, J. Merseguer, V. I. Munteanu, J. F. Pérez, D. Petcu, M. Rossi, C. Sheridan, I. Spais, and D. Vladušič, “DICE: Quality-driven development of data-intensive cloud applications,” in Proceedings of the 7th Intl. Workshop on Modeling in Software Engineering. IEEE Press, 2015.
[15] A. Rajbhoj, V. Kulkarni, and N. Bellarykar, “Early experience with model-driven development of MapReduce based big data application,” in 21st Asia-Pacific Software Engineering Conference, 2014.
[16] C. C. Douglas, “An open framework for dynamic big-data-driven Application Systems (DBDDAS) development,” Procedia Computer Science, 2014.
[17] H. Demirkan and D. Delen, “Leveraging the capabilities of service-oriented decision support systems: Putting analytics and big data in cloud,” Decision Support Systems, 2013.
[18] F. Yang-Turner, L. Lau, and V. Dimitrova, “A model-driven prototype evaluation to elicit requirements for a sensemaking support tool,” in Software Engineering Conf., Asia-Pacific. IEEE, 2012.
[19] D. Breuker, “Towards model-driven engineering for big data analytics–An exploratory analysis of domain-specific languages for machine learning,” in System Sciences, Hawaii Intl. Conf. on. IEEE, 2014.
[20] P. M. Marín-Ortega, V. Dmitriyev, M. Abilov, and J. M. Gómez, “ELTA: New approach in designing business intelligence solutions in era of big data,” Procedia Technology, 2014.
[21] H. Eridaputra, B. Hendradjaya, and W. D. Sunindyo, “Modeling the requirements for big data application using goal oriented approach,” in Data and Software Engineering, Intl. Conf. on. IEEE, 2014.
[22] C. Li, L. Huang, and L. Chen, “Breeze graph grammar: A graph grammar approach for modeling the software architecture of big data-oriented software systems,” Software: Practice and Experience, 2014.
[23] J. Dajda and G. Dobrowolski, “Architecture dedicated to data integration,” in Intelligent Information and Database Systems. Springer, 2015.
[24] R. Girardi and L. B. Marinho, “A domain model of web recommender systems based on usage mining and collaborative filtering,” Requirements Engineering, 2007.
[25] Y. Jararweh, M. Jarrah, Z. Alshara, M. N. Alsaleh, M. Al-Ayyoub et al., “CloudExp: A comprehensive cloud computing experimental framework,” Simulation Modelling Practice and Theory, 2014.
[26] D. N. Jutla, P. Bodorik, and S. Ali, “Engineering privacy for big data apps with the Unified Modeling Language,” in Big Data, Intl. Congress on. IEEE, 2013.
[27] H.-M. Chen, R. Kazman, S. Haziyev, and O. Hrytsay, “Big data system development: An embedded case study with a global outsourcing firm,” in Proceedings of the First Intl. Workshop on BIG Data Software Engineering. IEEE Press, 2015.
[28] H. M. Sneed and K. Erdoes, “Testing big data (Assuring the quality of large databases),” in Software Testing, Verification and Validation Workshops, 8th Intl. Conf. on. IEEE, 2015.
[29] C. Cecchinel, M. Jimenez, S. Mosser, and M. Riveill, “An architecture to support the collection of big data in the Internet of Things,” in Services, World Congress on. IEEE, 2014.
[30] K. M. Anderson, “Embrace the challenges: Software engineering in a big data world,” in Proceedings of the First Intl. Workshop on BIG Data Software Engineering. IEEE Press, 2015.
[31] I. Gorton and J. Klein, “Distribution, Data, Deployment: Software architecture convergence in big data systems,” IEEE Software, vol. 32, no. 3, pp. 78–85, May 2015.
[32] A. Zimmermann, M. Pretz, G. Zimmermann, D. G. Firesmith, I. Petrov, and E. El-Sheikh, “Towards service-oriented enterprise architectures for big data applications in the cloud,” in Enterprise Distributed Object Computing Conference Workshops, Intl. IEEE, 2013.
[33] M. A. Martínez-Prieto, C. E. Cuesta, M. Arias, and J. D. Fernández, “The SOLID architecture for real-time management of big semantic data,” Future Generation Computer Systems, 2015.
[34] H.-L. Truong and S. Dustdar, “Sustainability data and analytics in cloud-based M2M systems,” in Big Data and Internet of Things: A Roadmap for Smart Environments. Springer, 2014.
[35] F. Bonomi, R. Milito, P. Natarajan, and J. Zhu, “Fog computing: A platform for Internet of Things and analytics,” in Big Data and Internet of Things: A Roadmap for Smart Environments. Springer, 2014.
[36] B. T. Hazen, C. A. Boone, J. D. Ezell, and L. A. Jones-Farmer, “Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications,” Intl. Journal of Production Economics, 2014.
[37] S. Ceri, T. Palpanas, E. D. Valle, D. Pedreschi, J.-C. Freytag, and R. Trasarti, “Towards mega-modeling: A walk through data analysis experiences,” ACM SIGMOD Record, 2013.
[38] V. Chang and M. Ramachandran, “A proposed case for the cloud software engineering in security,” in Proceedings of the Intl. Workshop on Emerging Software as a Service and Analytics. SciTePress, 2014.
[39] N. Li, A. Escalona, Y. Guo, and J. Offutt, “A scalable big data test framework,” in Software Testing, Verification and Validation, Intl. Conf. on. IEEE, 2015.
[40] M. Villari, A. Celesti, M. Fazio, and A. Puliafito, “Alljoyn Lambda: An architecture for the management of smart environments in IoT,” in Smart Computing Workshops, Intl. Conf. on. IEEE, 2014.
[41] L. M. Pham, A. Tchana, D. Donsez, V. Zurczak, P.-Y. Gibello, and N. de Palma, “An adaptable framework to deploy complex applications onto multi-cloud platforms,” in Computing & Communication Technologies-Research, Innovation, and Vision for the Future, Intl. Conf. on. IEEE, 2015.
[42] S. B. Elagib, A. R. Najeeb, A. H. Hashim, and R. F. Olanrewaju, “Big data analysis solutions using MapReduce framework,” in Computer and Communication Engineering, Intl. Conf. on. IEEE, 2014.
[43] P. Yongpisanpop, H. Hata, and K. Matsumoto, “Bugarium: 3D interaction for supporting large-scale bug repositories analysis,” in Companion Proceedings of the Intl. Conf. on Software Engineering. ACM, 2014, pp. 500–503.
[44] E. Begoli and J. Horey, “Design principles for effective knowledge discovery from big data,” in Software Architecture and European Conference on Software Architecture, Joint Working IEEE/IFIP Conf. on, 2012.
[45] Y. Huai, R. Lee, S. Zhang, C. H. Xia, and X. Zhang, “DOT: A matrix model for analyzing, optimizing and deploying software for big data analytics in distributed systems,” in Proceedings of the ACM Symposium on Cloud Computing, 2011.
[46] M. Camilli, “Formal verification problems in a big data world: Towards a mighty synergy,” in Companion Proceedings of the Intl. Conf. on Software Engineering. ACM, 2014.
[47] F. Shull, “Getting an intuition for big data,” Software, IEEE, vol. 30, no. 4, pp. 3–6, 2013.
[48] S. Bazargani, J. Brinkley, and N. Tabrizi, “Implementing conceptual search capability in a cloud-based feed aggregator,” in Innovative Computing Technology, Intl. Conf. on. IEEE, 2013.
[49] W. Sun, F. Li, W. Guo, Y. Jin, and W. Hu, “Store, schedule and switch - A new data delivery model in the big data era,” in Transparent Optical Networks, Intl. Conf. on. IEEE, 2013.
[50] S. Shukla and G. Sadashivappa, “A distributed randomization framework for privacy preservation in big data,” in IT in Business, Industry and Government, Conference on. IEEE, 2014.
[51] Z. Liu, “Research of performance test technology for big data applications,” in Information and Automation, IEEE Intl. Conf. on, 2014.
[52] C. Wang, X. Li, and X. Zhou, “SODA: software defined fpga based accelerators for big data,” in Proceedings of

the Design, Automation & Test in Europe Conference & Exhibition. EDA Consortium, 2015.
[53] M. G. Al Zamil and S. Samarah, “The application of semantic-based classification on big data,” in Information and Communication Systems, Intl. Conf. on. IEEE, 2014.
[54] R. Agrawal, A. Imran, C. Seay, and J. Walker, “A layer based architecture for provenance in big data,” in Big Data, IEEE Intl. Conf. on, 2014.
[55] N. H. Madhavji, A. Miranskyy, and K. Kontogiannis, “Big picture of big data software engineering: With example research challenges,” in Proceedings of the Intl. Workshop on BIG Data Software Engineering. IEEE Press, 2015.
[56] K. Kanoun, M. Ruggiero, D. Atienza, and M. Van Der Schaar, “Low power and scalable many-core architecture for big-data stream computing,” in VLSI, IEEE Computer Society Annual Symposium on, 2014.
[57] I. Mytilinis, D. Tsoumakos, V. Kantere, A. Nanos, and N. Koziris, “I/O performance modeling for big data applications over cloud infrastructures,” in Cloud Engineering, IEEE Intl. Conf. on, 2015.
[58] Y. Li, K. Wang, Q. Guo, X. Li, X. Zhang, G. Chen, T. Liu, and J. Li, “Breaking the boundary for whole-system performance optimization of big data,” in Proceedings of the Intl. Symposium on Low Power Electronics and Design. IEEE Press, 2013.
[59] K. Holley, G. Sivakumar, and K. Kannan, “Enrichment patterns for big data,” in Big Data, IEEE Intl. Congress on, 2014.
[60] I. Gorton, J. Klein, and A. Nurgaliev, “Architecture knowledge for evaluating scalable databases,” DTIC Document, Tech. Rep., 2015.
[61] E. Xinhua, J. Han, Y. Wang, and L. Liu, “Big Data-as-a-Service: Definition and architecture,” in Communication Technology, IEEE Intl. Conf. on, 2013.
[62] L. Xu, M. Li, and A. R. Butt, “GERBIL: MPI + YARN,” in Cluster, Cloud and Grid Computing, IEEE/ACM Intl. Symposium on, 2015.
[63] N. Mishra, C.-C. Lin, and H.-T. Chang, “A cognitive oriented framework for IoT big-data management prospective,” in Communication Problem-Solving, IEEE Intl. Conf. on. IEEE, 2014.
[64] E.-E. Durham, A. Rosen, R. W. Harrison et al., “A model architecture for big data applications using relational databases,” in Big Data, IEEE Intl. Conf. on, 2014.
[65] A. Chebotko, A. Kashlev, and S. Lu, “A big data modeling methodology for Apache Cassandra,” in Big Data, IEEE Intl. Congress on, 2015.
[66] R. J. Nowling and J. Vyas, “A domain-driven, generative data model for big pet store,” in Big Data and Cloud Computing, IEEE Intl. Conf. on, 2014.
[67] A. Ochian, G. Suciu, O. Fratu, and V. Suciu, “Big data search for environmental telemetry,” in Communications and Networking, IEEE Intl. Black Sea Conf. on, 2014.
[68] A. Desai and K. Nagegowda, “Advanced control distributed processing architecture (ACDPA) using SDN and Hadoop for identifying the flow characteristics and setting the quality of service (QoS) in the network,” in Advance Computing Conf., IEEE Intl., 2015.
[69] M. Habib ur Rehman, C. S. Liew, and T. Y. Wah, “UniMiner: Towards a unified framework for data mining,” in Information and Communication Technologies, Fourth World Congress on. IEEE, 2014.
[70] D. Tracey and C. Sreenan, “A holistic architecture for the internet of things, sensing services and big data,” in Cluster, Cloud and Grid Computing, IEEE/ACM Intl. Symposium on, 2013.
[71] G. Kousiouris, G. Vafiadis, and T. Varvarigou, “Enabling proactive data management in virtualized hadoop clusters based on predicted data activity patterns,” in P2P, Parallel, Grid, Cloud and Internet Computing, Eighth Intl. Conf. on. IEEE, 2013.
[72] B. T. Kumara, I. Paik, J. Zhang, T. Siriweera, and K. R. Koswatte, “Ontology-based workflow generation for intelligent big data analytics,” in Web Services, IEEE Intl. Conf. on, 2015.
[73] M. Westerlund, U. Hedlund, G. Pulkkis, and K.-M. Björk, “A generalized scalable software architecture for analyzing temporally structured big data in the cloud,” in New Perspectives in Information Systems and Technologies, Volume 1. Springer, 2014.
[74] H. S. Lamba and S. K. Dubey, “Analysis of requirements for big data adoption to maximize IT business value,” in Reliability, Infocom Technologies and Optimization (Trends and Future Directions), 2015 4th International Conference on, 2015.
[75] H. M. Chen, R. Kazman, and S. Haziyev, “Strategic prototyping for developing big data systems,” IEEE Software, 2016.
[76] W. Zhang, L. Xu, Z. Li, Q. Lu, and Y. Liu, “A deep-intelligence framework for online video processing,” IEEE Software, 2016.
[77] M. M. Bersani, F. Marconi, D. A. Tamburri, P. Jamshidi, and A. Nodari, “Continuous architecting of stream-based systems,” in 13th Working IEEE/IFIP Conf. on Software Architecture, 2016.
[78] A. Zimmermann, B. Gonen, R. Schmidt, E. El-Sheikh, S. Bagui, and N. Wilde, “Adaptable enterprise architectures for software evolution of SmartLife ecosystems,” in Intl. Enterprise Distributed Object Computing Conf. Workshops and Demonstrations. IEEE, 2014.
[79] B. Li, M. Grechanik, and D. Poshyvanyk, “Sanitizing and minimizing databases for software application test outsourcing,” in Intl. Conf. on Software Testing, Verification and Validation. IEEE, 2014.
[80] C. L. Wu, T. C. Chiang, L. C. Fu, and Y. C. Zeng, “Nonparametric discovery of contexts and preferences in smart home environments,” in Systems, Man, and Cybernetics, IEEE Intl. Conf. on, 2015.
[81] M. Thangaraj and S. Anuradha, “State of art in testing for big data,” in IEEE Intl. Conf. on Computational Intelligence and Computing Research, 2015.
[82] M. O. Gökalp, A. Koçyigit, and P. E. Eren, “A cloud based architecture for distributed real time processing of continuous queries,” in Euromicro Conf. on Software Engineering and Advanced Applications, 2015.
[83] D. Wu, L. Zhu, X. Xu, S. Sakr, D. Sun, and Q. Lu, “Building pipelines for heterogeneous execution environments for big data processing,” IEEE Software, 2016.
[84] D. G. Tesfagiorgish and L. JunYi, “Big data transformation testing based on data reverse engineering,” in Intl. Conf. on Ubiquitous Intelligence and Computing, Intl.

Conf. on Autonomic and Trusted Computing, Intl. Conf. on Scalable Computing and Communications (UIC-ATC-ScalCom), 2015.
[85] K. Taneja, Q. Zhu, D. Duggan, and T. Tung, “Linked enterprise data model and its use in real time analytics and context-driven data discovery,” in IEEE Intl. Conf. on Mobile Services, 2015.
[86] Z. Liu, “Research of performance test technology for big data applications,” in Information and Automation, IEEE Intl. Conf. on, 2014.
[87] H. Zhou, J. G. Lou, H. Zhang, H. Lin, H. Lin, and T. Qin, “An empirical study on quality issues of production big data platform,” in IEEE/ACM Intl. Conf. on Software Engineering, 2015.
[88] A. Samuel, M. I. Sarfraz, H. Haseeb, S. Basalamah, and A. Ghafoor, “A framework for composition and enforcement of privacy-aware and context-driven authorization mechanism for multimedia big data,” IEEE Transactions on Multimedia, 2015.
[89] A. Doyle, G. Katz, K. Summers, C. Ackermann, I. Zavorin, Z. Lim, S. Muthiah, L. Zhao, C. T. Lu, P. Butler, R. P. Khandpur, Y. Fayed, and N. Ramakrishnan, “The EMBERS architecture for streaming predictive analytics,” in Big Data, IEEE Intl. Conf. on, 2014.
[90] F. Shen, “A pervasive framework for real-time activity patterns of mobile users,” in Pervasive Computing and Communication Workshops, IEEE Intl. Conf. on, 2015.
[91] S. Yang, W. Yu, Y. Hu, K. Wang, J. Wang, and S. Li, “An automatic discovery framework of cross-source data inconsistency for web big data,” in Intl. Conf. on Advanced Cloud and Big Data, 2015.
[92] H. M. Chen, R. Kazman, and S. Haziyev, “Agile big data analytics for web-based systems: An architecture-centric approach,” IEEE Transactions on Big Data, 2016.
[93] S. Singh and Y. Liu, “A cloud service architecture for analyzing big monitoring data,” Tsinghua Science and Technology, pp. 55–70, 2016.
[94] N. Wilder, J. M. Smith, and A. Mockus, “Exploring a framework for identity and attribute linking across heterogeneous data systems,” in Proceedings of the 2nd Intl. Workshop on BIG Data Software Engineering. ACM, 2016.
[95] J. Klein, I. Gorton, L. Alhmoud, J. Gao, C. Gemici, R. Kapoor, P. Nair, and V. Saravagi, “Model-driven observability for big data storage,” in 13th Working IEEE/IFIP Conf. on Software Architecture, 2016.
[96] G. Gousios, D. Safaric, and J. Visser, “Streaming software analytics,” in Proceedings of the 2nd Intl. Workshop on BIG Data Software Engineering. ACM, 2016.
[97] M. Guerriero, S. Tajfar, D. A. Tamburri, and E. Di Nitto, “Towards a model-driven design tool for big data architectures,” in Proceedings of the 2nd Intl. Workshop on BIG Data Software Engineering. ACM, 2016.
[98] I. Noorwali, D. Arruda, and N. H. Madhavji, “Understanding quality requirements in the context of big data systems,” in Proceedings of the 2nd Intl. Workshop on BIG Data Software Engineering. ACM, 2016.
[99] S. Marchal, X. Jiang, R. State, and T. Engel, “A big data architecture for large scale security monitoring,” in IEEE Intl. Congress on Big Data, June 2014, pp. 56–63.
[100] F. Zhang, J. Cao, S. U. Khan, K. Li, and K. Hwang, “A task-level adaptive MapReduce framework for real-time streaming data in healthcare applications,” Future Generation Computer Systems, 2015.
[101] M. Zhang, H. Wang, Y. Lu, T. Li, Y. Guang, C. Liu, E. Edrosa, H. Li, and N. Rishe, “TerraFly Geocloud: An online spatial data analysis and visualization system,” ACM Transactions on Intelligent System and Technology, 2015.
[102] Z. Xu, X. Wei, X. Luo, Y. Liu, L. Mei, C. Hu, and L. Chen, “Knowle: A semantic link network based system for organizing large scale online news events,” Future Generation Computer Systems, 2015.
[103] T. Shah, F. Rabhi, and P. Ray, “Investigating an ontology-based approach for big data analysis of inter-dependent medical and oral health conditions,” Cluster Computing, 2014.
[104] Q. Yao, Y. Tian, P.-F. Li, L.-L. Tian, Y.-M. Qian, and J.-S. Li, “Design and Development of a Medical Big Data Processing System based on Hadoop,” Journal of medical systems, 2015.
[105] M. A. Saleem, Y.-K. Lee, and S. Lee, “Trajectory patterns mining towards lifecare provisioning,” Wireless Personal Communications, 2014.
[106] S. J. Rysavy, D. Bromley, and V. Daggett, “DIVE: A graph-based visual-analytics framework for big data,” IEEE Computer Graphics and Applications, 2014.
[107] A. Naseer, B. Y. Alkazemi, and E. U. Waraich, “A big data approach for proactive healthcare monitoring of chronic patients,” in Intl. Conf. on Ubiquitous and Future Networks (ICUFN), 2016.
[108] K. Gai, M. Qiu, L. C. Chen, and M. Liu, “Electronic health record error prevention approach using ontology in big data,” in High Performance Computing and Communications, Intl. Symposium on Cyberspace Safety and Security, Intl. Conf. on Embedded Software and Systems, Intl. Conf. on. IEEE, 2015.
[109] J. Ding, D. Zhang, and X. H. Hu, “A framework for ensuring the quality of a big data service,” in IEEE Intl. Conf. on Services Computing, 2016.
[110] R. Šendelj, I. Ognjanović, E. Ammenwerth, and W. Hackl, “Towards semantically enabled development of service-oriented architectures for integration of socio-medical data,” in Mediterranean Conference on Embedded Computing, 2016.
[111] K. Kaur and R. Rani, “A smart polyglot solution for big data in healthcare,” IT Professional, 2015.
[112] A. Forkan, I. Khalil, A. Ibaida, and Z. Tari, “BDCaM: Big data for context-aware monitoring - A Personalized Knowledge Discovery Framework for Assisted Healthcare,” IEEE Transactions on Cloud Computing, 2015.
[113] R. Giachetta, “A framework for processing large scale geospatial and remote sensing data in MapReduce environment,” Computers & Graphics, 2015.
[114] M. Müller, L. Bernard, and D. Kadner, “Moving code–sharing geoprocessing logic on the web,” ISPRS Journal of Photogrammetry and Remote Sensing, 2013.
[115] T. Shelton, A. Poorthuis, M. Graham, and M. Zook, “Mapping the data shadows of Hurricane Sandy: Uncovering the sociospatial dimensions of big data,” Geoforum, 2014.
[116] G. Cao, S. Wang, M. Hwang, A. Padmanabhan, Z. Zhang, and K. Soltani, “A scalable framework for

spatiotemporal analysis of location-based social media data,” Computers, Environment and Urban Systems, 2015.
[117] M. Akmal, I. Allison, and H. González-Vélez, “Assembling cloud-based geographic information systems: A pragmatic approach using off-the-shelf components,” Cloud Computing with e-Science Applications, 2015.
[118] J. Alder and S. Hostetler, “Web based visualization of large climate data sets,” Environmental Modelling & Software, 2015.
[119] M. Deng and L. Di, “Building open environments to meet big data challenges in Earth sciences,” in Big Data: Techniques and Technologies in Geoinformatics. CRC Press, 2013.
[120] S. Fang, L. Da Xu, Y. Zhu, J. Ahati, H. Pei, J. Yan, and Z. Liu, “An integrated system for regional environmental monitoring and management based on internet of things,” Industrial Informatics, IEEE Transactions on, 2014.
[121] C. Ledur, D. Griebler, I. Manssour, and L. G. Fernandes, “Towards a domain-specific language for geospatial data visualization maps with big data sets,” in IEEE/ACS Intl. Conf. of Computer Systems and Applications, 2015.
[122] J. Anderson, R. Soden, K. M. Anderson, M. Kogan, and L. Palen, “EPIC-OSM: A software framework for OpenStreetMap Data Analytics,” in Hawaii Intl. Conf. on System Sciences, 2016.
[123] N. Pelekis, Y. Theodoridis, and D. Janssens, “On the management and analysis of our lifesteps,” SIGKDD Explor. Newsl., 2014.
[124] Y. Zhang, M. Chen, S. Mao, L. Hu, and V. Leung, “CAP: Community Activity Prediction based on big data analysis,” Network, IEEE, 2014.
[125] S. D’Oca and T. Hong, “Occupancy schedules learning process through a data mining framework,” Energy and Buildings, 2015.
[126] I. Widjaja, P. Russo, C. Pettit, R. Sinnott, and M. Tomko, “Modeling coordinated multiple views of heterogeneous data cubes for urban visual analytics,” Intl. Journal of Digital Earth, 2014.
[127] D. Bonino and G. Procaccianti, “Exploiting semantic technologies in smart environments and grids: Emerging roles and case studies,” Science of Computer Programming, 2014.
[128] C. Dobre and F. Xhafa, “Intelligent services for big data science,” Future Generation Computer Systems, 2014.
[129] W. Q. Wang, X. Zhang, J. Zhang, and H. B. Lim, “Smart traffic cloud: An infrastructure for traffic applications,” in Parallel and Distributed Systems, Intl. Conf. on. IEEE, 2012.
[130] P. A. Mathew, L. N. Dunn, M. D. Sohn, A. Mercado, C. Custudio, and T. Walter, “Big-data for building energy performance: Lessons from assembling a very large national database of building energy use,” Applied Energy, 2015.
[131] F. J. Wu, X. Zhang, and H. B. Lim, “A cooperative sensing and mining system for transportation activity survey,” in IEEE Wireless Communications and Networking Conference, 2014.
[132] W. Shi, Y. Zhu, J. Zhang, X. Tao, G. Sheng, Y. Lian, G. Wang, and Y. Chen, “Improving power grid monitoring data quality: An efficient machine learning framework for missing data prediction,” in High Performance Computing and Communications, Intl. Symposium on Cyberspace Safety and Security, Intl. Conf. on Embedded Software and Systems, Intl. Conf. on. IEEE, 2015.
[133] B. Cheng, S. Longo, F. Cirillo, M. Bauer, and E. Kovacs, “Building a big data platform for smart cities: Experience and lessons from Santander,” in IEEE Intl. Congress on Big Data, June 2015, pp. 592–599.
[134] H. Li, D. Parikh, Q. He, B. Qian, Z. Li, D. Fang, and A. Hampapur, “Improving rail network velocity: A machine learning approach to predictive maintenance,” Transportation Research Part C: Emerging Technologies, 2014.
[135] A. Thaduri, D. Galar, and U. Kumar, “Railway assets: A potential domain for big data analytics,” Procedia Computer Science, 2015.
[136] C. D. Cottrill and S. Derrible, “Leveraging big data for the development of transport sustainability indicators,” Journal of Urban Technology, 2015.
[137] S. Kwoczek, S. D. Martino, T. Rustemeyer, and W. Nejdl, “An architecture to process massive vehicular traffic data,” in Intl. Conf. on P2P, Parallel, Grid, Cloud and Internet Computing, 2015.
[138] R. O. Sinnott, L. Morandini, and S. Wu, “SMASH: A cloud-based architecture for big data processing and visualization of traffic data,” in IEEE Intl. Conf. on Data Science and Data Intensive Systems, 2015.
[139] J. Yang and J. Ma, “A big-data processing framework for uncertainties in transportation data,” in Fuzzy Systems, IEEE Intl. Conf. on, Aug 2015, pp. 1–6.
[140] M. Yesudas, G. Menon, and V. Ramamurthy, “Intelligent operational dashboards for smarter commerce using big data,” IBM Journal of Research and Development, 2014.
[141] L. Shi, F. Lin, T. Yang, J. Qi, W. Ma, and S. Xu, “Context-based ontology-driven recommendation strategies for tourism in ubiquitous computing,” Wireless Personal Communications, 2014.
[142] R. P. Kinsley and J. Portenoy, “Perspectives of emerging museum professionals on the role of big data in museums,” in System Sciences, Hawaii Intl. Conf. on. IEEE, 2015.
[143] L. Deng, J. Gao, and C. Vuppalapati, “Building a big data analytics service framework for mobile advertising and marketing,” in Big Data Computing Service and Applications, IEEE Intl. Conf. on, 2015.
[144] G. Suciu, C. Dobre, V. Suciu, G. Todoran, A. Vulpe, and A. Apostu, “Cloud computing for extracting price knowledge from big data,” in Complex, Intelligent, and Software Intensive Systems, Intl. Conf. on. IEEE, 2015.
[145] Q. Huang and C. Xu, “A data-driven framework for archiving and exploring social media data,” Annals of GIS, 2014.
[146] K. Tao, C. Hauff, G. J. Houben, F. Abel, and G. Wachsmuth, “Facilitating Twitter data analytics: Platform, language and functionality,” in Big Data, Intl. Conf. on. IEEE, 2014.
[147] A. Immonen, P. Pääkkönen, and E. Ovaska, “Evaluating the quality of social media data in big data architecture,” IEEE Access, 2015.
[148] I. D. Addo, D. Do, R. Ge, and S. I. Ahamed, “A reference architecture for social media intelligence applications in the cloud,” in Computer Software and

Applications Conference. IEEE, 2015.
[149] E. W. Patton, P. Seyed, P. Wang, L. Fu, F. J. Dein, R. S. Bristol, and D. L. McGuinness, “SemantEco: A semantically powered modular architecture for integrating distributed environmental and ecological data,” Future Generation Computer Systems, 2014.
[150] C. Beal and J. Flynn, “Toward the digital water age: Survey and case studies of Australian water utility smart-metering programs,” Utilities Policy, 2015.
[151] E. Moguel, J. C. Preciado, F. Sanchez-Figueroa, M. Preciado, J. Hernandez et al., “Multilayer big data architecture for remote sensing in eolic parks,” Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2015.
[152] D. Dutta and I. Bose, “Managing a big data project: The case of Ramco Cements limited,” Intl. Journal of Production Economics, 2015.
[153] N. Kushiro, S. Matsuda, and K. Takahara, “Model oriented system design on big-data,” Procedia Computer Science, 2014.
[154] J. Lee, H.-A. Kao, and S. Yang, “Service innovation and smart analytics for industry 4.0 and big data environment,” Procedia CIRP, 2014.
[155] C. Li, Y. Liu, R. Li, and H. Zhang, “Research and application of one-key publishing technologies for meteorological service products,” in IEEE Intl. Conf. on Big Data Analysis, 2016.
[156] M. J. Diván, Y. B. Saibene, M. D. L. Á. Martín, M. L. Belmonte, G. Lafuente, and J. Caldera, “Towards a data processing architecture for the weather radar of the INTA Anguil,” in Intl. Workshop on Data Mining with Industrial Applications, 2015.
[157] L. Zhang, “A framework to model big data driven complex cyber physical control systems,” in Automation and Computing, Intl. Conf. on. IEEE, 2014.
[158] C. Cecchinel, S. Mosser, and P. Collet, “Software development support for shared sensing infrastructures: A generative and dynamic approach,” in Software Reuse for Dynamic Systems in the Cloud and Beyond. Springer, 2014.
[159] L. Zhang, “Designing big data driven cyber physical systems based on AADL,” in Systems, Man and Cybernetics, IEEE Intl. Conf. on, 2014.
[160] M. Rahmes, G. Lemieux, K. Fox, and C. Casseus, “Multi-disciplinary ontological geo-analytical incident modeling,” in IEEE Consumer Communications and Networking Conference, 2015.
[161] J. Klein, R. Buglak, D. Blockow, T. Wuttke, and B. Cooper, “A reference architecture for big data systems in the National Security domain,” in Proceedings of the 2nd Intl. Workshop on Big Data Software Engineering. ACM, 2016.
[162] W. Akio Goya, M. Risse de Andrade, A. Carvalho Zucchi, N. Mimura Gonzalez, R. de Fatima Pereira, K. Langona, T. C. Melo de Brito Carvalho, J.-E. Mangs, and A. Sefidcon, “The use of distributed processing and cloud computing in agricultural decision-making support systems,” in Cloud Computing, Intl. Conf. on. IEEE, 2014.
[163] R. Dutta, A. Morshed, J. Aryal, C. D’este, and A. Das, “Development of an intelligent environmental knowledge system for sustainable agricultural decision support,” Environmental Modelling & Software, 2014.
[164] N. Sun, J. Morris, J. Xu, X. Zhu, and M. Xie, “iCARE: A framework for big data-based banking customer analytics,” IBM Journal of Research and Development, 2014.
[165] C. Restrepo-Arango, A. Henao-Chaparro, and C. Jiménez-Guarín, “Using the web to monitor a customized unified financial portfolio,” in Advances in Conceptual Modeling. Springer, 2012.
[166] D. Ning, P. Chen, G. Yuan, J. Xu, and L. Xu, “Research on Warship Communication Operation and Maintenance Management Based on Big Data,” in Cloud Computing and Big Data, Intl. Conf. on. IEEE, 2014.
[167] E. Sciacca, C. Pistagna, U. Becciani, A. Costa, P. Massimino, S. Riggi, F. Vitello, M. Bandieramonte, and M. Krokos, “Towards a big data exploration framework for astronomical archives,” in High Performance Computing & Simulation, Intl. Conf. on. IEEE, 2014.
[168] E. Begoli, “A short survey on the state of the art in architectures and platforms for large scale data analysis and knowledge discovery from data,” in Proceedings of the WICSA/ECSA Companion Volume. ACM, 2012.
[169] M. Chen, S. Mao, and Y. Liu, “Big data: A survey,” Mobile Networks and Applications, 2014.
[170] C. P. Chen and C.-Y. Zhang, “Data-intensive applications, challenges, techniques and technologies: A survey on big data,” Information Sciences, 2014.
[171] J. S. Ward and A. Barker, “Undefined by data: A survey of big data definitions,” arXiv preprint arXiv:1309.5821, 2013.
[172] I. Gorton, A. B. Bener, and A. Mockus, “Software Engineering for Big Data Systems,” IEEE Software, 2016.
[173] K. S. Yim, “Norming to Performing: Failure analysis and deployment automation of big data software developed by highly iterative models,” in Software Reliability Engineering, Intl. Symposium on. IEEE, 2014.
[174] C. E. Cuesta, M. A. Martínez-Prieto, and J. D. Fernández, “Towards an architecture for managing big semantic data in real-time,” in Software Architecture. Springer, 2013.
[175] R. S. Pressman, Software Engineering: A Practitioner’s Approach. Palgrave Macmillan, 2005.
[176] J. S. Saltz and I. Shamshurin, “Exploring the process of doing data science via an ethnographic study of a media advertising company,” in Big Data, IEEE Intl. Conf. on, 2015.

