Abstract
Most people have become “big data” producers in their daily lives. Our desires, opinions, sentiments and social links, as well as our mobile phone calls and GPS tracks, leave traces of our behaviours. Transforming these data into knowledge and value is a complex task of data science. This paper shows how the SoBigData Research Infrastructure supports data science towards the new frontiers of big data exploitation. Our research infrastructure serves a large community of social sensing and social mining researchers, and it reduces the gap between existing research centres at the European level. SoBigData integrates resources and creates an infrastructure where data and methods can be shared among text miners, visual analytics researchers, socio-economic scientists, network scientists, political scientists and humanities researchers. The main concepts related to the SoBigData Research Infrastructure are presented. These concepts support virtual and transnational (on-site) access to the resources. Creating and supporting research communities is considered vital for the success of our research infrastructure, as is contributing to the training of the new generation of data scientists. Furthermore, this paper introduces the concept of exploratories and shows their role in promoting the use of our research infrastructure. The exploratories presented in this paper also represent a set of real applications in the context of social mining. Finally, special attention is given to legal and ethical aspects: everything in SoBigData is supervised by an ethical and legal framework.
Notes
The IEEE Glossary defines interoperability as “the ability of two or more systems or components to exchange information and to use the information that has been exchanged” [1].
Virtual and Transnational access (and their key performance indicators) are formally defined by the European Community INFRAIA-1-2014-2015 call (https://goo.gl/E6Cyze).
The description of the project consortium is available at the following link: http://project.sobigdata.eu/consortium.
Access for users not working in an EU or associated country is limited to 20% of the total amount of units of access provided under the grant.
The list of international experts inside the SoBigData advisory board is available at the following link: http://project.sobigdata.eu/management-bodies/project-advisory-board.
The VREs [5] are web-based, community-oriented, comprehensive, flexible, and secure working environments. They are conceived and tailored to satisfy the needs of a designated community. Generally, they offer: (i) a rich array of services for data discovery and access, (ii) a data analytics platform, (iii) collaboration-oriented facilities enabling scientists.
The SoBigData e-Infra is powered by D4Science [6].
The Master in Big Data of the University of Pisa is an annual course for training data scientists (http://masterbigdata.it/en).
The Ph.D. in Data Science aims to educate a new generation of researchers who combine their disciplinary competences with those of a data scientist (http://phd.sns.it/it/data-science/).
The updated list and description of all dissemination and training events within SoBigData are available at the following link: http://www.sobigdata.eu/events/.
The description of these initiatives (called Tuscan Big Data Challenge) is available at the following link: http://www.sobigdata.eu/blog/tuscan-big-data-challenge-20172018.
The number of available resources is growing thanks to the collaboration between the original partners, new organizations and users.
From May 2018, the new General Data Protection Regulation (GDPR) shall apply, replacing the Data Protection Directive (DPD) and its national implementations.
The list of exploratories available in SoBigData is reported at the following link: http://www.sobigdata.eu/exploratories/.
Espresso is an Italian newspaper published by Gruppo Editoriale l’Espresso.
An important Spanish newspaper published by PRISA.
The walk-throughs are currently freely available and published at http://sobigdata.ee.
The SoBigData project is developing a language and an execution platform for representing scientific processes in highly heterogeneous e-Infrastructures in terms of so-called hybrid workflows. Currently, SoBigData workflows can express sequences of two kinds of actions: manually executable actions, which offer a formal and high-level description of reasoning, a protocol, or a procedure, and machine-executable actions, which enable the fully automated execution of one (or more) web services [20].
The GATE platform provides end-to-end text processing solutions. The latest version of the GATE platform is available at cloud.gate.ac.uk.
Mímir is a DBMS used by the GATE infrastructure for collecting documents with information stored as annotations.
Through the SoBigData Gateway, users can access all the information required to register and upload a dataset or a method (https://sobigdata.d4science.org/group/sobigdata-gateway).
References
Geraci, A.: IEEE Standard Computer Dictionary: Compilation of IEEE Standard Computer Glossaries. IEEE Press, Piscataway (1991)
Greenwood, R., Landier, A., Thesmar, D.: Vulnerable banks. J. Financ. Econ. 115(3), 471–485 (2015)
Maynard, D., Greenwood, M., Roberts, I., Windsor, G., Bontcheva, K.: Real-time social media analytics through semantic annotation and linked open data. In: Proc. of 2015 ACM Web Science, Oxford, United Kingdom (Jul. 2015)
Maynard, D., Bontcheva, K.: Understanding climate change tweets: an open-source toolkit for social media analysis. In: Proc. of EnviroInfo 2015, Copenhagen (Sep. 2015)
Candela, L., Castelli, D., Pagano, P.: Virtual research environments: an overview and a research agenda. Data Sci. J. 12, GRDI75–GRDI81 (2013). https://doi.org/10.2481/dsj.GRDI-013
Candela, L., Castelli, D., Manzi, A., Pagano, P.: Realising virtual research environments by hybrid data infrastructures: the D4Science experience. In: International Symposium on Grids and Clouds (ISGC), Proceedings of Science PoS(ISGC2014) (2014)
Trasarti, R., Guidotti, R., Monreale, A., Giannotti, F.: MyWay: location prediction via mobility profiling. Inf. Syst. 64, 350–367 (2017). https://doi.org/10.1016/j.is.2015.11.002
Nanni, M., Trasarti, R., Monreale, A., Grossi, V., Pedreschi, D.: Driving profiles computation and monitoring for car insurance CRM. ACM TIST 8(1), 14:1–14:26 (2016). https://doi.org/10.1145/2912148
Moise, I., Gaere, E., Merz, R., Koch, S., Pournaras, E.: Tracking language mobility in the twitter landscape. In: IEEE International Conference on Data Mining Workshops (ICDMW) (2016). https://doi.org/10.1109/ICDMW.2016.0099
Mazzarisi, P., Lillo, F.: Methods for Reconstructing Interbank Networks from Limited Information: A Comparison. In: Abergel, F., et al. (eds.) Econophysics and Sociophysics: Recent Progress and Future Directions. New Economic Windows. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-47705-3_15
Garimella, K., De Francisci Morales, G., Gionis, A., Mathioudakis, M.: Reducing controversy by connecting opposing views. In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, WSDM ’17 (2017). https://doi.org/10.1145/3018661.3018703
Grossi, V., Romei, R., Ruggieri, S.: A case study in sequential pattern mining for IT-operational risk. In: Machine Learning and Knowledge Discovery in Databases, European Conference (ECML/PKDD), pp. 424–439 (2008)
Coletto, M., Esuli, A., Lucchese, C., Muntean, C., Nardini, F.M., Perego, R., Renso, C.: Sentiment-enhanced multidimensional analysis of online social networks: perception of the mediterranean refugees crisis. In: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 1270–1277 (2016)
Bontcheva, K., Rout, D.P.: Making sense of social media streams through semantics: a survey. Seman. Web 5(5), 373–403 (2014)
Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: The paradigm-shift of social spambots: evidence, theories, and tools for the arms race. CoRR abs/1701.03017 (2017)
Cresci, S., Tesconi, M., Cimino, A., Dell’Orletta, F.: A linguistically-driven approach to cross-event damage assessment of natural disasters from social media messages. WWW (Companion Volume), 1195–1200 (2015)
Trasarti, R., Guidotti, R., Monreale, A., Giannotti, F.: MyWay: location prediction via mobility profiling. Inf. Syst. 64, 350–367 (2017)
Nanni, M., Trasarti, R., Monreale, A., Grossi, V., Pedreschi, D.: Driving profiles computation and monitoring for car insurance CRM. ACM TIST 8(1), 14:1–14:26 (2016)
Guidotti, R., Trasarti, R., Nanni, M., Giannotti, F.: Towards user-centric data management: individual mobility analytics for collective services. In: MobiGIS, pp. 80–83 (2015)
Candela, L., Manghi, P., Giannotti, F., Grossi, V., Trasarti, R.: HyWare: a HYbrid Workflow lAnguage for Research E-infrastructures. D-Lib Magazine 23(1/2) (2017)
Grossi, V., Rapisarda, B., Romano, V.: Fact sheets aimed at different stakeholders, SoBigData project deliverable. https://goo.gl/aCC8Le (2015)
Hänold, S., Forgó, N., van den Hoven, J., Mahieu, R., van Putten, D.: Legal and ethical framework for SoBigData 1, SoBigData project deliverable. https://goo.gl/NUiWhR (2016)
Hänold, S., Forgó, N., van den Hoven, J., Mahieu, R., van Putten, D., Lishchuck, I.: Legal and ethical framework for SoBigData 2, SoBigData project deliverable. https://goo.gl/5MLkzN (2017)
Acknowledgements
This work has been supported by the EC H2020 INFRAIA-1-2014-2015—Project “SoBigData Research Infrastructure: Social Mining & Big Data Ecosystem” (Grant Agreement No. 654024). This work would not have been possible without the contributions of all the people involved in the SoBigData project.
Cite this article
Grossi, V., Rapisarda, B., Giannotti, F. et al. Data science at SoBigData: the European research infrastructure for social mining and big data analytics. Int J Data Sci Anal 6, 205–216 (2018). https://doi.org/10.1007/s41060-018-0126-x
DOI: https://doi.org/10.1007/s41060-018-0126-x