Data science at SoBigData: the European research infrastructure for social mining and big data analytics

  • Regular Paper
International Journal of Data Science and Analytics

Abstract

Most people have become “big data” producers in their daily lives. Our desires, opinions, sentiments and social links, as well as our mobile phone calls and GPS tracks, leave traces of our behaviour. Transforming these data into knowledge and value is a complex data science task. This paper shows how the SoBigData Research Infrastructure supports data science towards the new frontiers of big data exploitation. Our research infrastructure serves a large community of social sensing and social mining researchers, and it narrows the gap between the research centres that already exist at the European level. SoBigData integrates resources and creates an infrastructure where text miners, visual analytics researchers, socio-economic scientists, network scientists, political scientists and humanities researchers can share data and methods. The main concepts of the SoBigData Research Infrastructure are presented; these concepts support both virtual and transnational (on-site) access to the resources. Creating and supporting research communities is considered vital to the success of our research infrastructure, as is helping to train the new generation of data scientists. Furthermore, this paper introduces the concept of the exploratory and shows the role of exploratories in promoting the use of our research infrastructure. The exploratories presented in this paper also represent a set of real applications in the context of social mining. Finally, special attention is given to legal and ethical aspects: everything in SoBigData is supervised by an ethical and legal framework.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Notes

  1. The IEEE Glossary defines interoperability as “the ability of two or more systems or components to exchange information and to use the information that has been exchanged” [1].

  2. The formal definition of virtual and transnational access (and of their key performance indicators) is given in the European Commission's INFRAIA-1-2014-2015 call (https://goo.gl/E6Cyze).

  3. The description of the project consortium is available at the following link: http://project.sobigdata.eu/consortium.

  4. Access by users not working in an EU or associated country is limited to 20% of the total units of access provided under the grant.

  5. The list of international experts inside the SoBigData advisory board is available at the following link: http://project.sobigdata.eu/management-bodies/project-advisory-board.

  6. VREs [5] are web-based, community-oriented, comprehensive, flexible and secure working environments, conceived and tailored to satisfy the needs of a designated community. Generally, they offer (i) a rich array of services for data discovery and access, (ii) a data analytics platform, and (iii) collaboration-oriented facilities enabling scientists to work together.

  7. The SoBigData e-Infra is powered by D4Science [6].

  8. The Master in Big Data of the University of Pisa is a one-year course that trains data scientists (http://masterbigdata.it/en).

  9. The Ph.D. in Data Science aims to educate the new generation of researchers who combine their disciplinary competences with those of a data scientist (http://phd.sns.it/it/data-science/).

  10. The updated list and description of all SoBigData dissemination and training events are available at the following link: http://www.sobigdata.eu/events/.

  11. The description of these initiatives (called Tuscan Big Data Challenge) is available at the following link: http://www.sobigdata.eu/blog/tuscan-big-data-challenge-20172018.

  12. The number of available resources is growing thanks to the collaboration between the original partners, new organizations and users.

  13. From May 2018 the new General Data Protection Regulation (GDPR) shall apply replacing the Data Protection Directive (DPD) and its national implementations.

  14. The following link lists the exploratories available in SoBigData: http://www.sobigdata.eu/exploratories/.

  15. Espresso is an Italian newspaper published by Gruppo Editoriale l’Espresso.

  16. An important Spanish newspaper published by PRISA.

  17. The walk-throughs are currently freely available and published at http://sobigdata.ee.

  18. The SoBigData project is developing a language and an execution platform for representing scientific processes in highly heterogeneous e-Infrastructures as so-called hybrid workflows. Currently, SoBigData workflows can express sequences of manually executable actions, which offer a formal, high-level description of a reasoning process, protocol or procedure, and machine-executable actions, which enable the fully automated execution of one or more web services [20].

  19. The GATE platform provides end-to-end text processing solutions. The latest version of the GATE platform is available at cloud.gate.ac.uk.

  20. Mímir is a DBMS used by the GATE infrastructure to store documents together with the information encoded in their annotations.

  21. Through the SoBigData Gateway, users can access all the information required to register and upload a dataset or a method (https://sobigdata.d4science.org/group/sobigdata-gateway).
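Note 4 states a simple quantitative rule. As a minimal sketch, the check below computes the largest number of access units that may go to users outside the EU and associated countries. The function names and the round-down rule are assumptions for illustration only; they are not taken from the SoBigData grant text.

```python
# Cap stated in Note 4: non-EU users may receive at most 20% of the units.
NON_EU_SHARE_PERCENT = 20


def max_non_eu_units(total_units: int) -> int:
    """Hypothetical helper: largest non-EU allocation, rounding down
    (the rounding rule is an assumption)."""
    return total_units * NON_EU_SHARE_PERCENT // 100


def allocation_within_cap(total_units: int, non_eu_units: int) -> bool:
    """Hypothetical helper: True if a proposed non-EU allocation respects the cap."""
    return 0 <= non_eu_units <= max_non_eu_units(total_units)


print(max_non_eu_units(1000))             # 200
print(allocation_within_cap(1000, 201))   # False
```

Integer arithmetic is used so that the cap never exceeds 20% through floating-point rounding.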
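Note 18 describes hybrid workflows that mix manually executable and machine-executable actions. The Python sketch below models that idea under stated assumptions: the classes `ManualAction`, `MachineAction` and `HybridWorkflow` are hypothetical, a plain callable stands in for a web-service invocation, and nothing here reproduces the actual HyWare syntax or the SoBigData execution platform [20].

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class ManualAction:
    """A step a researcher performs by hand; the workflow only describes it."""
    name: str
    description: str


@dataclass
class MachineAction:
    """A fully automated step; the callable stands in for a web service (hypothetical)."""
    name: str
    service: Callable[[Dict[str, object]], Dict[str, object]]


Action = Union[ManualAction, MachineAction]


@dataclass
class HybridWorkflow:
    actions: List[Action] = field(default_factory=list)

    def run(self, context: Dict[str, object],
            confirm: Callable[[ManualAction], bool]) -> List[str]:
        """Execute the actions in order and return a log of what happened.

        Machine actions merge their result into the shared context; manual
        actions are passed to `confirm`, which returns True once the researcher
        reports the step as done. The run stops at the first unconfirmed step.
        """
        log: List[str] = []
        for action in self.actions:
            if isinstance(action, MachineAction):
                context.update(action.service(context))
                log.append(f"auto:{action.name}")
            elif confirm(action):
                log.append(f"manual:{action.name}")
            else:
                log.append(f"halted:{action.name}")
                break
        return log


# Usage: one manual step followed by one automated step.
wf = HybridWorkflow([
    ManualAction("select-dataset", "Pick a dataset from the catalogue"),
    MachineAction("count-records", lambda ctx: {"n": len(ctx["records"])}),
])
ctx = {"records": [1, 2, 3]}
print(wf.run(ctx, confirm=lambda a: True))  # ['manual:select-dataset', 'auto:count-records']
print(ctx["n"])                             # 3
```

Keeping manual steps as first-class entries, rather than as comments, is what lets a single sequence both document a protocol and automate its executable parts.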

References

  1. Geraci, A.: IEEE Standard Computer Dictionary: Compilation of IEEE Standard Computer Glossaries. IEEE Press, Piscataway (1991)

  2. Greenwood, R., Landier, A., Thesmar, D.: Vulnerable banks. J. Financ. Econ. 115(3), 471–485 (2015)

  3. Maynard, D., Greenwood, M., Roberts, I., Windsor, G., Bontcheva, K.: Real-time social media analytics through semantic annotation and linked open data. In: Proc. of 2015 ACM Web Science, Oxford, United Kingdom (Jul. 2015)

  4. Maynard, D., Bontcheva, K.: Understanding climate change tweets: an open-source toolkit for social media analysis. In: Proc. of EnviroInfo 2015, Copenhagen (Sep. 2015)

  5. Candela, L., Castelli, D., Pagano, P.: Virtual research environments: an overview and a research agenda. Data Sci. J. 12, GRDI75–GRDI81 (2013). https://doi.org/10.2481/dsj.GRDI-013

  6. Candela, L., Castelli, D., Manzi, A., Pagano, P.: Realising virtual research environments by hybrid data infrastructures: the D4Science experience. In: International Symposium on Grids and Clouds (ISGC), Proceedings of Science PoS(ISGC2014) (2014)

  7. Trasarti, R., Guidotti, R., Monreale, A., Giannotti, F.: MyWay: location prediction via mobility profiling. Inf. Syst. 64, 350–367 (2017). https://doi.org/10.1016/j.is.2015.11.002

  8. Nanni, M., Trasarti, R., Monreale, A., Grossi, V., Pedreschi, D.: Driving profiles computation and monitoring for car insurance CRM. ACM TIST 8(1), 14:1–14:26 (2016). https://doi.org/10.1145/2912148

  9. Moise, I., Gaere, E., Merz, R., Koch, S., Pournaras, E.: Tracking language mobility in the twitter landscape. In: IEEE International Conference on Data Mining Workshops (ICDMW) (2016). https://doi.org/10.1109/ICDMW.2016.0099

  10. Mazzarisi, P., Lillo, F.: Methods for Reconstructing Interbank Networks from Limited Information: A Comparison. In: Abergel, F., et al. (eds.) Econophysics and Sociophysics: Recent Progress and Future Directions. New Economic Windows. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-47705-3_15

  11. Garimella, K., De Francisci Morales, G., Gionis, A., Mathioudakis, M.: Reducing controversy by connecting opposing views. In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, WSDM ’17 (2017). https://doi.org/10.1145/3018661.3018703

  12. Grossi, V., Romei, R., Ruggieri, S.: A case study in sequential pattern mining for IT-operational risk. In: Machine Learning and Knowledge Discovery in Databases, European Conference (ECML/PKDD), pp. 424–439 (2008)

  13. Coletto, M., Esuli, A., Lucchese, C., Muntean, C., Nardini, F.M., Perego, R., Renso, C.: Sentiment-enhanced multidimensional analysis of online social networks: perception of the Mediterranean refugees crisis. In: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 1270–1277 (2016)

  14. Bontcheva, K., Rout, D.P.: Making sense of social media streams through semantics: a survey. Seman. Web 5(5), 373–403 (2014)

  15. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: The paradigm-shift of social spambots: evidence, theories, and tools for the arms race. CoRR abs/1701.03017 (2017)

  16. Cresci, S., Tesconi, M., Cimino, A., Dell’Orletta, F.: A linguistically-driven approach to cross-event damage assessment of natural disasters from social media messages. In: WWW (Companion Volume), pp. 1195–1200 (2015)

  17. Trasarti, R., Guidotti, R., Monreale, A., Giannotti, F.: MyWay: location prediction via mobility profiling. Inf. Syst. 64, 350–367 (2017)

  18. Nanni, M., Trasarti, R., Monreale, A., Grossi, V., Pedreschi, D.: Driving profiles computation and monitoring for car insurance CRM. ACM TIST 8(1), 14:1–14:26 (2016)

  19. Guidotti, R., Trasarti, R., Nanni, M., Giannotti, F.: Towards user-centric data management: individual mobility analytics for collective services. In: MobiGIS, pp. 80–83 (2015)

  20. Candela, L., Manghi, P., Giannotti, F., Grossi, V., Trasarti, R.: HyWare: a HYbrid Workflow lAnguage for Research E-infrastructures, D-Lib Magazine 23(1/2) (2017)

  21. Grossi, V., Rapisarda, B., Romano, V.: Fact sheets aimed at different stakeholders, SoBigData project deliverable. https://goo.gl/aCC8Le (2015)

  22. Hänold, S., Forgó, N., van den Hoven, J., Mahieu, R., van Putten, D.: Legal and ethical framework for SoBigData 1, SoBigData project deliverable. https://goo.gl/NUiWhR (2016)

  23. Hänold, S., Forgó, N., van den Hoven, J., Mahieu, R., van Putten, D., Lishchuck, I.: Legal and ethical framework for SoBigData 2, SoBigData project deliverable. https://goo.gl/5MLkzN (2017)

Acknowledgements

This work has been supported by the EC H2020 INFRAIA-1-2014-2015—Project “SoBigData Research Infrastructure: Social Mining & Big Data Ecosystem” (Grant Agreement No. 654024). This work would not have been possible without the contributions of all the people involved in the SoBigData project.

Author information

Corresponding author

Correspondence to Valerio Grossi.

About this article

Cite this article

Grossi, V., Rapisarda, B., Giannotti, F. et al. Data science at SoBigData: the European research infrastructure for social mining and big data analytics. Int J Data Sci Anal 6, 205–216 (2018). https://doi.org/10.1007/s41060-018-0126-x

  • DOI: https://doi.org/10.1007/s41060-018-0126-x