The Partnership of Citizen Science and Machine Learning: Benefits, Risks, and Future Challenges for Engagement, Data Collection, and Data Quality
Figure 1. Relationship between artificial intelligence, machine learning, and deep learning.
Figure 2. A taxonomy showing the integration of machine learning and citizen science based on the three citizen science steps of engagement, data collection, and data quality.
Figure 3. Screenshot from the Braindr application [73], where citizen scientists are asked to label MRI images by selecting pass or fail.
Figure 4. Benefits and risks of combining citizen science and machine learning.
Abstract
1. Introduction
- What are some examples of successful citizen science projects where ML is integrated?
- What ML techniques have been used in these projects?
- What citizen science tasks have been affected by ML in such projects?
- What are the benefits and risks of integrating ML in citizen science for practitioners and citizen scientists?
- What are the possible future challenges that might arise as a result of combining ML and citizen science?
- What are the gaps and limitations of including ML in citizen science?
2. Types of Machine Learning and Applications
- Supervised learning: In supervised learning, the training data are labeled, and the task is to map the input (independent variables) to the output (dependent variables). The two typical types of supervised learning are classification, where the output variable is categorical, and regression, where the output variable is continuous [16]. The most widely known supervised learning algorithms are k-nearest neighbors (KNN), linear regression, logistic regression, support vector machines (SVMs), decision trees, random forest (RF), and neural networks (NN).
- Unsupervised learning: In unsupervised learning, the training data are not labeled, and the goal is to identify structures and patterns in the data [16]. Typical types of unsupervised learning include clustering (grouping similar input data), dimension reduction (extracting meaningful features from the data), and association (exploring the data to discover relationships between attributes) [16]. Among the best-known unsupervised learning algorithms are k-means, one-class SVM, hierarchical cluster analysis (HCA), and principal component analysis (PCA).
- Reinforcement learning: In reinforcement learning, the learning algorithm, also called the agent, observes the environment and learns through a system of rewards and punishments. Reinforcement learning is commonly used in robotics, such as walking robots and self-driving vehicles, as well as in real-time decision making and game AI [16].
- Deep learning, a subset of ML (See Figure 1 for the relationship between AI, ML, and deep learning), is concerned with algorithms known as artificial neural networks that attempt to simulate the structure and functions of a biological brain [17]. Since there is a significant body of literature on AI and ML algorithms, we briefly discuss some of the common AI, ML, and deep learning techniques applied largely in scientific projects:
- Computer vision (CV): CV is an interdisciplinary scientific field that aims at developing techniques so that computers can identify and understand the contents of digital images and videos. In other words, CV aims at enabling computers to identify elements in images just as humans would. Advances in artificial neural networks and deep learning have had a great impact on CV, which in some cases outperforms humans at identifying objects [18]. Popular applications of CV include self-driving cars and face recognition [16]. Moreover, starting in 2020 with the COVID-19 pandemic, CV has been applied to monitoring and detecting social distancing among people [19]. CV has also been commonly used in species identification, with Pl@ntNet [20] and iNaturalist being two well-known citizen science examples. A class of deep learning models commonly used in CV is the convolutional neural network (CNN).
- Natural language processing (NLP): NLP is a subfield of linguistics, computer science and AI that deals with human–computer interactions through the use of natural language, which means that NLP aims to enable computers to read and understand human language [21]. The mechanism involves the machine capturing the human’s words (text or audio), processing the words and preparing a response, and returning the produced response (in the form of audio or text) to the human. Language translation applications such as Google Translate or DeepL [22], as well as personal assistant applications (e.g., Siri or Alexa), are common uses of NLP in people’s daily lives.
- Acoustic identification: Acoustic identification is a technique based on pattern recognition and signal analysis, where the acoustic data are processed and features are extracted and classified. Main applications of acoustic identification are in species detection [23]. For example, BirdNet [24] is an application to identify bird species based on the bird song.
- Automated reasoning: Automated reasoning is a branch of AI that seeks to train machines to solve problems using logical reasoning [25]. In other words, in automated reasoning, the computer is given knowledge and can generate new knowledge from it, which it then uses to make rational decisions. Automated reasoning is mainly used to assess if something is true or false or whether an event will occur or not.
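To make the supervised-learning category above concrete, here is a minimal, self-contained k-nearest-neighbors classifier (one of the algorithms listed); the two-dimensional points and class labels are toy data invented purely for illustration.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled points."""
    # train: list of ((x, y), label) pairs; distance is plain Euclidean
    by_dist = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

# Toy labeled data: two clusters standing in for two classes
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.8), "B"), ((4.9, 5.1), "B")]

print(knn_predict(train, (1.1, 0.9)))  # nearest neighbors are all "A"
print(knn_predict(train, (5.1, 5.0)))  # nearest neighbors are all "B"
```

Because KNN needs the labeled examples at prediction time, it also illustrates why labeled training data, often produced by citizen scientists, are central to supervised learning.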
3. The Influence of ML on Citizen Science Steps
- Defining the problem: Exploring the problem that needs to be solved by answering questions, such as why this issue is important, who the stakeholders are, and what will be achieved.
- Designing the project: Identifying the objectives, allocating the necessary resources (funding, team members, equipment, etc.), and defining the project planning.
- Building a community: Encouraging the general public to participate in the project and sustaining their engagement by establishing a trusting relationship with the volunteers.
- Data collection, quality assurance, and analysis: Designing data collection tools, training volunteers, determining how to store data, filtering and cleaning collected data, analyzing data to detect trends, and sharing data with participants or other practitioners.
- Sustaining and improving the project: Maintaining project funding by searching for different sources of funding, and sustaining participation by communicating with volunteers and receiving/giving feedback from/to them.
3.1. ML for Engaging the Public and Sustaining Participation
- Automatic community search: Traditional approaches such as word-of-mouth, social media posts, direct emails, workshops, etc., while beneficial for building a community, can be time consuming or require financial resources (for instance, for organizing workshops or placing ads in newsletters). Antoniou et al. [37] have proposed a guidance tool that provides information to volunteers so that they can find the VGI (volunteered geographic information) project of their choice based on their motivations and interests. To automate this approach, ML algorithms can be used to find and classify potential target participants based on their interests and to introduce a project to them accordingly. Several studies have applied ML algorithms to extract relevant information from social media (e.g., Twitter or Instagram) posts, such as where images were taken, what type of content an image contains, or what topic is most discussed in textual posts [38,39]. Similar approaches can be adapted to citizen science by employing ML techniques such as CV and NLP to identify people's interests from social media posts and link them to relevant citizen science projects. Furthermore, to the best of our knowledge, the use of ML for user profiling to create a recommendation system [40,41], in which citizen science projects are recommended to people based on their sociodemographic details, has not yet been used as a way to engage people in citizen science projects. Moreover, chatbots, which have been applied in a few studies [42,43], are a potential approach to engaging and sustaining participation; they may also serve as a real-time guide for participants.
- Automatic feedback to participants: As discussed in some studies, participants may become discouraged if they do not receive feedback on their contributions [44,45]. Moreover, due to massive amounts of data, it is time-consuming to provide feedback to all participants, or often, feedback from experts is provided after a long time has passed [45,46]. In order to inform participants regarding the quality of their contributions and to update them regarding the project advancements, automatic informative and user-based feedback can be generated using ML algorithms [47]. The participants can be informed about the quality of their contribution and how they can enhance it and can learn from the feedback provided (e.g., learning about biodiversity through feedback regarding species habitat characteristics). Thus, human–computer interaction through machine-generated feedback can be a strategy for increasing and sustaining participation in citizen science projects.
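As a rough sketch of the interest-matching idea above: a production system would use trained NLP models and recommendation algorithms, but the core step, scoring how well a person's posts match each project, can be illustrated with simple word overlap (Jaccard similarity). All project names, descriptions, and posts below are hypothetical.

```python
def tokenize(text):
    """Lowercase and split text into a set of words."""
    return set(text.lower().split())

def match_projects(user_posts, projects):
    """Rank projects by Jaccard word overlap with a user's posts."""
    user_words = tokenize(" ".join(user_posts))
    scored = []
    for name, description in projects.items():
        proj_words = tokenize(description)
        overlap = len(user_words & proj_words) / len(user_words | proj_words)
        scored.append((overlap, name))
    return [name for _, name in sorted(scored, reverse=True)]

# Hypothetical project descriptions and social media posts
projects = {
    "BirdWatch": "record bird species songs and sightings",
    "SkyMap": "classify galaxy images from telescope surveys",
}
posts = ["saw a rare bird today", "love listening to bird songs"]
print(match_projects(posts, projects)[0])  # BirdWatch ranks first
```

The same ranking structure carries over when the word sets are replaced by embeddings from a trained language model.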
3.2. ML for Data Collection
- Machines as sensors (adapted from citizens as sensors): The integration of ML in the first form of data collection, crowdsourcing, can be performed using AI-based tools, such as AI-based cameras. A well-known example in ecological studies is the use of camera traps to automatically capture images of species [53]. Moreover, sensors integrated with ML techniques can automatically record measurements such as noise recording [54] or air pollution [55].
- Machine thinking (adapted from volunteer thinking): For the second form of data collection, where cognition is involved, ML algorithms can learn to automate certain tasks, such as object detection in images/videos, which is the most common technique, or more complex tasks, such as automated prediction of protein structures using deep learning [56].
3.3. ML for Data Validation
- Automatic data quality assurance: The static comparison of the contributed data with reference datasets has been used in biodiversity citizen science projects to perform automated filtering of unusual observations [45]. However, rather than comparing the submitted data with the historical records, the ML algorithms could be used to perform real-time validation and confirmation of the newly contributed data. For example, species distribution models can be used to validate the spatial accuracy of biodiversity observations, or a CNN algorithm can be used to validate images labeled by the participants.
- Classification of participants' levels of expertise: The level of expertise and experience varies among participants in citizen science projects. For example, in biodiversity monitoring projects such as eBird [57] or iNaturalist, some participants contribute observations casually, while others are very involved and experienced and may even be considered expert volunteers who not only contribute data but also verify others' observations [58]. Thus, contributors' previous records can be fed into ML algorithms to classify the participants (e.g., by assigning them scores based on their level of expertise), and newly contributed data can be validated based on that classification.
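The expertise-weighted validation idea above can be sketched as follows; the scoring rule, weighting formula, and acceptance threshold are illustrative assumptions, not methods taken from any of the cited projects.

```python
def expertise_score(history):
    """Score a contributor by the fraction of past records experts confirmed."""
    if not history:
        return 0.0
    return sum(1 for ok in history if ok) / len(history)

def validate(observation_confidence, contributor_history, threshold=0.7):
    """Accept a new record automatically only when the model confidence,
    weighted by contributor expertise, clears a threshold; otherwise
    route the record to expert review."""
    weight = expertise_score(contributor_history)
    combined = observation_confidence * (0.5 + 0.5 * weight)
    return "accept" if combined >= threshold else "expert review"

expert_history = [True] * 9 + [False]        # 90% of past records confirmed
novice_history = [True, False, False, True]  # 50% of past records confirmed
print(validate(0.8, expert_history))  # combined 0.76 -> accept
print(validate(0.8, novice_history))  # combined 0.60 -> expert review
```

In a deployed system, the history-based score would typically come from a trained classifier rather than a fixed fraction, but the routing logic is the same.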
4. Use Cases
- Camera trap projects: when it comes to the combination of ML and citizen science in biodiversity research, one of the most common approaches is the use of camera traps, where cameras are installed in nature to take photos of species, and the photos are then labeled by citizen scientists to feed and train ML algorithms [11,59]. Depending on the project, citizen scientists may be involved in one or all of the activities of camera placement, submission of images, and labeling and classification of images/videos from camera traps [59]. MammalWeb [60], eMammal [61], and WildBook [62] are three examples of projects focused on camera trap data, and depending on the projects' goals, they invite volunteers to either collect or classify images (Table 1). The use of contributed images to train CNN algorithms for automatic wildlife identification has led to software packages such as the R package MLWIC (Machine Learning for Wildlife Image Classification) [63], which can be useful for environmental studies, particularly for ecologists. Another approach to integrating human and machine intelligence in camera trap projects is to invite volunteers to observe species images and confirm machine-predicted labels in each image [11]. This approach reduces the time required for labeling images while maintaining high-quality classification, with human intelligence reserved for verification and for identifying challenging species that are difficult for machines to classify.
- Species identification based on images and metadata: the majority of species identification projects use only images to train ML algorithms [64]. However, the identification of some species only with images and in the absence of other metadata is very complex both for humans and machines, and only human experts are able to distinguish among various images. Including metadata such as the spatial and temporal distribution or the ability of observers to identify species can increase ML predictive performance and provide more confidence in species identification. One example in this case is a study performed by Terry et al. [5] to identify ladybirds using both images and metadata such as location, date, and observer’s expertise (Table 1). Another example is the eBird project [65], where a probabilistic model has been developed to classify observers as experts and novices, taking into account their experience in making contributions (Table 1). Another project, BeeWatch, invites citizen scientists to identify bumblebee species in images [66], and it employs natural language generation (NLG) to provide volunteers with real-time feedback (Table 1). Experiments conducted by the BeeWatch researchers with project participants revealed that the automatically generated feedback improved the participants’ learning and increased their engagement [66].
- Marine life identification: unlike for other species, marine life identification combining ML and citizen science has rarely been discussed [67]. In an article by Langenkämper et al. [67], the authors focused on combining ML and citizen science for the annotation of marine life images. Citizen scientists are asked to annotate the images (digitize a bounding box around the species in the image); however, volunteers may miss a species (false negative), annotate a species that is not present in the image (false positive), or place the bounding box incorrectly. Despite these possible annotation errors, the authors conclude that merging citizen science with ML in marine life studies has considerable promise, provided that citizen scientists receive sufficient training prior to image annotation (Table 1).
- Automatic wildlife counts from aerial images: estimating wildlife abundance is an important aspect of biodiversity conservation studies. One approach is to count the species in aerial images; done entirely manually, however, this is an extremely time-consuming and labor-intensive process. A study focused on counting wildebeests in aerial images [68] has shown promising results in obtaining accurate counts by combining citizen science and deep learning (Table 1). In this study, the counting is done by both citizen scientists and machines (a trained CNN algorithm), and while the results indicate that the machine is faster and more accurate than humans, the authors state that the citizen scientists' contributions are essential for providing training data to feed the algorithm.
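The human–machine division of labor described for camera trap projects, where the machine handles confident predictions and volunteers verify uncertain ones, can be sketched as a simple confidence-based triage. The prediction tuples and the threshold below are hypothetical, not taken from the cited studies.

```python
def triage(predictions, auto_threshold=0.9):
    """Split machine predictions: confident labels are accepted automatically,
    uncertain ones are queued for citizen scientists to verify."""
    accepted, review_queue = [], []
    for image_id, label, confidence in predictions:
        if confidence >= auto_threshold:
            accepted.append((image_id, label))
        else:
            review_queue.append((image_id, label))
    return accepted, review_queue

# Hypothetical CNN outputs: (image id, predicted species, confidence)
preds = [("img1", "wildebeest", 0.97),
         ("img2", "zebra", 0.55),
         ("img3", "empty", 0.99)]
accepted, queue = triage(preds)
print(len(accepted), len(queue))  # 2 images auto-accepted, 1 for volunteers
```

Volunteer verdicts on the queued images can then be fed back as fresh training labels, closing the loop the use cases describe.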
Table 1. Examples of use cases combining citizen science and machine learning.

| Science Field | Use Case Example | Impact on Citizen Science Task | Machine Learning Technique | Brief Objective |
|---|---|---|---|---|
| Environmental science | Wildlife species identification using camera traps [11,59,60,61,62] | Automate data collection | Supervised learning: computer vision and the use of CNN | Train CNN algorithms on volunteer-labeled camera trap images for automatic wildlife identification |
| Environmental science | Ladybird identification based on images and metadata [5] | Automate data validation | Supervised learning: NN for metadata only, CNN (transfer learning) for images only, and a combined model for metadata and images | Train ML algorithms to automatically identify ladybird species using images along with structured metadata (date, location, and citizen scientists' experience) |
| Environmental science | eBird, using observers' expertise to verify contributions [65,74] | Automate data validation | Probabilistic models and automated reasoning based on observers' previous contributions | Classify citizen scientists as experts or novices to improve identification of new species, and pass rare species detection to expert observers |
| Environmental science | BeeWatch, identification of bumblebees [66] | Generate automatic feedback | Natural language generation (NLG) | Automatically generate feedback aimed at improving participants' ability to identify bumblebees and increasing their engagement |
| Environmental science | Marine life identification [67] | Automate data collection and validation | Supervised learning: computer vision and CNN (transfer learning) | Improve marine species identification by combining citizen scientists and deep learning |
| Environmental science | Automatic species count from aerial images [68] | Automate data validation | Supervised learning: computer vision and CNN | Combine citizen scientists and deep learning to improve wildlife counting in aerial images for conservation purposes |
| Neuroscience | Braindr [26] | Automate data collection | Supervised learning: computer vision and CNN | Amplify expert-labeled MRI images with the help of citizen scientists, then use the amplified labels to train an algorithm to replicate the experts' labeling task |
| Astronomy | Galaxy Zoo [70] | Automate data collection | Supervised learning: computer vision and CNN | Classify galaxy images by training an ML algorithm on citizen scientists' input |
| Astronomy | Milky Way [72] | Automate data collection | Supervised learning: random forest algorithm | |
5. Benefits and Risks
5.1. Engagement
- Benefits: As mentioned earlier, one of the benefits of AI for community building in citizen science projects is to encourage engagement by targeting the potential volunteers through social media. Another important factor in citizen science is the impact of the interaction with and feedback to the participants on the basis of their contributions [76,77]. Thus, the use of ML in citizen science in providing automated feedback to the participants might promote engagement through human–computer interaction and result in sustaining participation. Furthermore, the intelligently generated feedback can provide participants with useful knowledge about the research subject, allowing them to learn while contributing, which can be another factor in increasing participation (e.g., BeeWatch project). Another potential benefit of combining ML and citizen science is that it encourages interdisciplinary engagement among volunteers and researchers, which can lead to collaborations from several scientific fields [9]. Finally, automating certain simple tasks allows volunteers to concentrate on more complicated ones, such as identifying common species from camera trap images using CNN and leaving the identification of the unusual species to volunteers. However, there is another side to the task automation, which is discussed in the risk section.
- Risks: The use of ML in citizen science could result in the automation of most tasks, which may demotivate participants because they are being fully or partially replaced by machines. As mentioned in the use cases, in most projects, citizen science data are used to train ML algorithms, after which the tasks can be performed entirely by machines, effectively replacing humans. While task automation is said to free citizen scientists to concentrate on more challenging tasks, some participants contribute to citizen science projects to fill their spare time with activities that make them feel good, such as helping science or spending time in nature (see [36]), which are not inherently challenging. For example, in the sMapShot project [52] (a citizen science project for georeferencing historical images), there is strong competition among participants of higher age groups, and the incentive system plays an important role in motivating them; if the computer performs the task more efficiently, motivation, and with it participation, is expected to decline. One solution is to allow participants of all activity levels to contribute to the tasks that interest them, even if those tasks can be fully automated; such contributions remain helpful for retraining the algorithms and improving their performance. Another recommendation is to incorporate new forms of contribution to fill the gap caused by automated tasks. A further potential risk is the overestimation of AI in citizen science projects, such as trusting model predictions over expert volunteers, which could disengage participants [8].
5.2. Data Quality
- Benefits: The use of ML in citizen science will speed up the process of big data validation, reducing the workload of manual data quality assurance for experts [46,47]. Prescreening and filtering data (for example, removing empty images or low-quality images in camera trap projects), flagging erroneous observations, and submitting only flagged observations for expert verification will save a lot of time and allow the experts to concentrate on the scientific aspects of the project rather than the manual filtering of all data. Furthermore, the generation of real-time informative and user-centered feedback for participants with information about their contributions will improve the participants’ knowledge on the subject, their proficiency, and, as a result, the quality of data they contribute over time. Another finding from the BeeWatch project concerning the impact of feedback on volunteers was that NLG feedback resulted in increased learning, and the identification accuracy was higher for those who received informative feedback than for those who only received confirmation of correct identification [66].
- Risks: Although the benefits of automatic filtering and validation have been discussed, the efficiency and reliability of automated validation and feedback depend heavily on the data used to train the ML algorithms. For example, if the training data are biased in some way, such as spatially or temporally, the automated data validation based on the trained model is also biased and could provide participants and experts with misleading information [9]. In addition to bias, it is critical that the training data are of gold-standard quality and validated by experts, since the trained model will be used to verify new data; if the input data are uncertain, the model will produce false detections [9], such as failing to identify a species (a false negative) or incorrectly detecting an abnormal shape in an MRI image (a false positive). Machine intelligence should not be overestimated relative to human intelligence: when participants receive machine-generated feedback on a contribution, the decision to modify or retain it should rest with the participants, and human experts should make the final confirmation. Likewise, a model trained on data from a specific region is not necessarily applicable in other areas, and transferring it can result in misevaluation and the generation of misleading information. Finally, training algorithms on small datasets (such as rare species, see [12]) or multitype datasets (such as a mix of images and metadata, see [5]), and learning how to tune the algorithms' parameters to achieve the desired performance, are significant challenges that must be considered before performing automated data validation in citizen science projects.
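The prescreening idea above, flagging unusual observations so that only those go to expert verification, can be sketched with a simple spatial plausibility check. A real system would use species distribution models rather than a bounding box, and the coordinates and record IDs here are invented.

```python
def flag_unusual(observations, known_range):
    """Flag records outside a species' known bounding box so that only
    flagged records are sent for expert verification (prescreening,
    not automatic rejection)."""
    (lat_min, lat_max), (lon_min, lon_max) = known_range
    flagged = []
    for obs_id, lat, lon in observations:
        if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
            flagged.append(obs_id)
    return flagged

# Hypothetical known range (lat bounds, lon bounds) and incoming records
known_range = ((40.0, 50.0), (-5.0, 10.0))
records = [("r1", 45.0, 2.0), ("r2", 60.0, 2.0), ("r3", 44.5, 8.0)]
print(flag_unusual(records, known_range))  # only r2 needs expert review
```

Note that this sketch also exposes the bias risk discussed above: if the "known range" itself comes from spatially biased training data, legitimate range expansions will be flagged and could be wrongly discouraged.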
5.3. Ethics
- Benefits: The use of machine learning (ML) can be advantageous in filtering sensitive information from citizen science data, such as human faces or license plates in images. Furthermore, ML can be used to detect illegal actions, such as illegal animal trades, by sentiment analysis using information posted on social media platforms such as Twitter [78].
- Risks: One major concern of integrating ML in citizen science is the use of data collected from participants for other commercial reasons, which may go against the participants' wishes and result in their disengagement from the project. Thus, it is critical to be transparent and communicate effectively with participants on how their inputs are being used in the algorithms, rather than simply creating a black-box project in which the participants' function is limited to producing data and feeding the algorithms [8,9,10]. As discussed in [8], technology giants like Google and Facebook offer target-oriented advertisement services by selling personal information, which can be a danger for the future of AI-based services used in citizen science projects, as it may lead to a lack of confidence on the part of participants to freely share their contributions and personal information. Another ethical issue that may emerge from ML-based citizen science projects is the sharing of sensitive data that may be deceptive or result in geoprivacy violations, such as predicting the location of endangered species or inferring participants' activity from the history of their contributions.
6. Future Challenges and Conclusions
- (1) One potential challenge is to explore the integration of ML in biodiversity citizen science projects for rare species identification, for instance, by using approaches such as few-shot learning [79]. In contrast to common ML algorithms, few-shot learning requires a very minimal amount of data to train the model, and it is primarily utilized in computer vision [80], a particular case being one-shot learning for face recognition [81].
- (2) The focus of the use of ML in citizen science is currently more on automatic identification and less on user engagement; thus, exploring the use of ML to increase engagement and sustain participation remains an area for future investigation. For instance, one potential approach to be explored is the use of gamified AI in citizen science to attract more volunteers and sustain participation [10,82].
- (3) While the impact of machine-generated feedback on sustaining participation has been discussed, one possible future challenge is to determine whether generating feedback that simulates more human responses, rather than repetitive generated feedback, can increase engagement.
- (4) Training participants has been shown to improve data quality; however, providing training is not always simple and requires both human and financial resources. A possible suggestion would be to use AI to provide training prior to data collection; although this has been achieved in the case of feedback (for example, in the BeeWatch project [66]), AI could provide training in a variety of ways, such as through interactive courses entirely managed by AI.
- (5) Participants are more motivated to contribute to a project if there have been prior contributions or if other participants provide competition; however, very large numbers of contributions can make participants feel less motivated and assume they have little to add to the project. One theory is that people in older age groups can become demotivated if there are too many contributions. One role for AI may be to consider user demographics and balance how much contribution data each user sees.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Shoham, Y.; Perrault, R.; Brynjolfsson, E.; Openai, J.C.; Manyika, J.; Niebles, J.C.; Lyons, T.; Etchemendy, J.; Grosz, B.; Bauer, Z. The AI Index 2018 Annual Report; AI Index Steering Committee, Human-Centered AI Initiative, Stanford University: Stanford, CA, USA, 2018. [Google Scholar]
- Shinde, P.P.; Shah, S. A Review of Machine Learning and Deep Learning Applications. In Proceedings of the 2018 4th International Conference on Computing, Communication Control and Automation, ICCUBEA, Pune, India, 16–18 August 2018; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2018. [Google Scholar]
- Popenici, S.A.D.; Kerr, S. Exploring the impact of artificial intelligence on teaching and learning in higher education. Res. Pract. Technol. Enhanc. Learn. 2017, 12, 22. [Google Scholar] [CrossRef]
- Rzanny, M.; Seeland, M.; Wäldchen, J.; Mäder, P. Acquiring and preprocessing leaf images for automated plant identification: Understanding the tradeoff between effort and information gain. Plant Methods 2017, 13, 97. [Google Scholar] [CrossRef] [Green Version]
- Terry, J.C.D.; Roy, H.E.; August, T.A. Thinking like a naturalist: Enhancing computer vision of citizen science images by harnessing contextual data. Methods Ecol. Evol. 2020, 11, 303–315.
- Hecker, S.; Bonney, R.; Haklay, M.; Hölker, F.; Hofer, H.; Goebel, C.; Gold, M.; Makuch, Z.; Ponti, M.; Richter, A.; et al. Innovation in Citizen Science—Perspectives on Science-Policy Advances. Citiz. Sci. Theory Pract. 2018, 3, 4.
- Wright, D.E.; Fortson, L.; Lintott, C.; Laraia, M.; Walmsley, M. Help Me to Help You. ACM Trans. Soc. Comput. 2019, 2, 1–20.
- Ceccaroni, L.; Bibby, J.; Roger, E.; Flemons, P.; Michael, K.; Fagan, L.; Oliver, J.L. Opportunities and Risks for Citizen Science in the Age of Artificial Intelligence. Citiz. Sci. Theory Pract. 2019, 4, 29.
- McClure, E.C.; Sievers, M.; Brown, C.J.; Buelow, C.A.; Ditria, E.M.; Hayes, M.A.; Pearson, R.M.; Tulloch, V.J.D.; Unsworth, R.K.F.; Connolly, R.M. Artificial Intelligence Meets Citizen Science to Supercharge Ecological Monitoring. Patterns 2020, 1, 100109.
- Franzen, M.; Kloetzer, L.; Ponti, M.; Trojan, J.; Vicens, J. Machine Learning in Citizen Science: Promises and Implications. In The Science of Citizen Science; Springer: Cham, Switzerland, 2021.
- Willi, M.; Pitman, R.T.; Cardoso, A.W.; Locke, C.; Swanson, A.; Boyer, A.; Veldthuis, M.; Fortson, L. Identifying animal species in camera trap images using deep learning and citizen science. Methods Ecol. Evol. 2019, 10, 80–91.
- Norouzzadeh, M.S.; Nguyen, A.; Kosmala, M.; Swanson, A.; Palmer, M.S.; Packer, C.; Clune, J. Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proc. Natl. Acad. Sci. USA 2018, 115, E5716–E5725.
- iNaturalist. Available online: https://www.inaturalist.org/ (accessed on 20 May 2021).
- Ueda, K. iNaturalist. Available online: https://www.inaturalist.org/blog/31806-a-new-vision-model (accessed on 26 May 2021).
- Van Horn, G.; Mac Aodha, O.; Song, Y.; Cui, Y.; Sun, C.; Shepard, A.; Adam, H.; Perona, P.; Belongie, S. The iNaturalist Species Classification and Detection Dataset. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8769–8778.
- Géron, A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems; O’Reilly Media: Sebastopol, CA, USA, 2019.
- Paeglis, A.; Strumfs, B.; Mezale, D.; Fridrihsone, I. A Review on Machine Learning and Deep Learning Techniques Applied to Liquid Biopsy. In Liquid Biopsy; IntechOpen: London, UK, 2019.
- Borji, A.; Itti, L. Human vs. computer in scene and object recognition. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 113–120.
- Saponara, S.; Elhanashi, A.; Gagliardi, A. Implementing a real-time, AI-based, people detection and social distancing measuring system for Covid-19. J. Real-Time Image Process. 2021, 1–11.
- Pl@ntNet. Available online: https://identify.plantnet.org/ (accessed on 28 May 2021).
- Chowdhury, G.G. Natural language processing. Annu. Rev. Inf. Sci. Technol. 2003, 37, 51–89.
- DeepL. Available online: https://www.deepl.com/translator (accessed on 5 July 2021).
- Stowell, D.; Petrusková, T.; Šálek, M.; Linhart, P. Automatic acoustic identification of individuals in multiple species: Improving identification across recording conditions. J. R. Soc. Interface 2019, 16, 20180940.
- BirdNET. Available online: https://birdnet.cornell.edu/ (accessed on 26 May 2021).
- Robinson, A.J.A.; Voronkov, A. Handbook of Automated Reasoning; Elsevier: Amsterdam, The Netherlands, 2001.
- Keshavan, A.; Yeatman, J.D.; Rokem, A. Combining citizen science and deep learning to amplify expertise in neuroimaging. Front. Neuroinform. 2019, 13, 29.
- Joppa, L.N. The Case for Technology Investments in the Environment. Nature 2017, 552, 325–328.
- Mac Aodha, O.; Gibb, R.; Barlow, K.E.; Browning, E.; Firman, M.; Freeman, R.; Harder, B.; Kinsey, L.; Mead, G.R.; Newson, S.E.; et al. Bat detective—Deep learning tools for bat acoustic signal detection. PLoS Comput. Biol. 2018, 14, e1005995.
- Parham, J.; Stewart, C.; Crall, J.; Rubenstein, D.; Holmberg, J.; Berger-Wolf, T. An Animal Detection Pipeline for Identification. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2018; pp. 1075–1083.
- Deng, D.P.; Chuang, T.R.; Shao, K.T.; Mai, G.S.; Lin, T.E.; Lemmens, R.; Hsu, C.H.; Lin, H.H.; Kraak, M.J. Using social media for collaborative species identification and occurrence: Issues, methods, and tools. In Proceedings of the 1st ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered Geographic Information, Redondo Beach, CA, USA, 6 November 2012; ACM Press: New York, NY, USA, 2012; pp. 22–29.
- Joshi, S.; Randall, N.; Chiplunkar, S.; Wattimena, T.; Stavrianakis, K. ‘We’—A Robotic System to Extend Social Impact of Community Gardens. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, Chicago, IL, USA, 5–8 March 2018; IEEE Computer Society: Washington, DC, USA, 2018; pp. 349–350.
- Bonney, R.; Ballard, H.; Jordan, R.; McCallie, E.; Phillips, T.; Shirk, J.; Wilderman, C. Public Participation in Scientific Research: Defining the Field and Assessing Its Potential for Informal Science Education; A CAISE Inquiry Group Report; Center for Advancement of Informal Science Education (CAISE): Washington, DC, USA, 2009.
- Bonney, R.; Cooper, C.B.; Dickinson, J.; Kelling, S.; Phillips, T.; Rosenberg, K.V.; Shirk, J. Citizen Science: A Developing Tool for Expanding Science Knowledge and Scientific Literacy. Bioscience 2009, 59, 977–984.
- CitizenScience.gov. Basic Steps for Your Project Planning. Available online: https://www.citizenscience.gov/toolkit/howto (accessed on 20 May 2021).
- Rotman, D.; Hammock, J.; Preece, J.; Hansen, D.; Boston, C. Motivations Affecting Initial and Long-Term Participation in Citizen Science Projects in Three Countries. In iConference 2014 Proceedings; iSchools: Grandville, MI, USA, 2014.
- Lotfian, M.; Ingensand, J.; Brovelli, M.A. A Framework for Classifying Participant Motivation that Considers the Typology of Citizen Science Projects. ISPRS Int. J. Geo-Inf. 2020, 9, 704.
- Antoniou, V.; Fonte, C.; Minghini, M.; See, L.; Skopeliti, A. Developing a Guidance Tool for VGI Contributors. 2016. Available online: https://core.ac.uk/download/pdf/80335283.pdf (accessed on 30 May 2021).
- Devaraj, A.; Murthy, D.; Dontula, A. Machine-learning methods for identifying social media-based requests for urgent help during hurricanes. Int. J. Disaster Risk Reduct. 2020, 51, 101757.
- Park, J.; Krishna, R.; Khadpe, P.; Fei-Fei, L.; Bernstein, M. AI-Based Request Augmentation to Increase Crowdsourcing Participation. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, Stevenson, WA, USA, 28–30 October 2019.
- Kanoje, S.; Mukhopadhyay, D.; Girase, S. User Profiling for University Recommender System Using Automatic Information Retrieval. Phys. Procedia 2016, 78, 5–12.
- Barnard, T.C. User Profiling Using Machine Learning. Ph.D. Thesis, University of Southampton, Southampton, UK, 2012.
- Schade, S.; Manzoni, M.; Fullerton, K.T. Activity Report on Citizen Science—Discoveries from a Five Year Journey; Publications Office of the European Union: Luxembourg, 2020.
- Tinati, R.; Simperl, E.; Luczak-Roesch, M. To help or hinder: Real-time chat in citizen science. In Proceedings of the Eleventh International AAAI Conference on Web and Social Media; The AAAI Press: Palo Alto, CA, USA, 2017; pp. 270–279.
- Ingensand, J.; Nappez, M.; Joost, S.; Widmer, I.; Ertz, O.; Rappo, D. The urbangene project experience from a crowdsourced mapping campaign. In Proceedings of the 2015 1st International Conference on Geographical Information Systems Theory, Applications and Management (GISTAM), Barcelona, Spain, 28–30 April 2015; pp. 178–184.
- Kelling, S.; Yu, J.; Gerbracht, J.; Wong, W.K. Emergent filters: Automated data verification in a large-scale citizen science project. In Proceedings of the 2011 IEEE Seventh International Conference on e-Science Workshops, Stockholm, Sweden, 5–8 December 2011; pp. 20–27.
- Bonter, D.N.; Cooper, C.B. Data validation in citizen science: A case study from Project FeederWatch. Front. Ecol. Environ. 2012, 10, 305–307.
- Lotfian, M.; Ingensand, J.; Ertz, O.; Oulevay, S.; Chassin, T. Auto-filtering validation in citizen science biodiversity monitoring: A case study. Proc. Int. Cartogr. Assoc. 2019, 2, 78.
- Haklay, M. Citizen Science and Volunteered Geographic Information: Overview and Typology of Participation. In Crowdsourcing Geographic Knowledge: Volunteered Geographic Information (VGI) in Theory and Practice; Springer: Dordrecht, The Netherlands, 2013; pp. 105–122.
- Guillaume, G.; Can, A.; Petit, G.; Fortin, N.; Palominos, S.; Gauvreau, B.; Bocher, E.; Picaut, J. Noise mapping based on participative measurements. Noise Mapp. 2016, 3, 140–156.
- Yadav, P.; Charalampidis, I.; Cohen, J.; Darlington, J.; Grey, F. A Collaborative Citizen Science Platform for Real-Time Volunteer Computing and Games. IEEE Trans. Comput. Soc. Syst. 2018, 5, 9–19.
- Cooper, S.; Khatib, F.; Treuille, A.; Barbero, J.; Lee, J.; Beenen, M.; Leaver-Fay, A.; Baker, D.; Popović, Z.; Foldit Players. Predicting protein structures with a multiplayer online game. Nature 2010, 466, 756–760.
- Produit, T.; Ingensand, J. 3D Georeferencing of historical photos by volunteers. Lect. Notes Geoinf. Cartogr. 2018, 113–128.
- Wiggers, K. Google’s AI Can Identify Wildlife from Trap-Camera Footage with Up to 98.6% Accuracy. Available online: https://venturebeat.com/2019/12/17/googles-ai-can-identify-wildlife-from-trap-camera-footage-with-up-to-98-6-accuracy/ (accessed on 30 May 2021).
- Monti, L.; Vincenzi, M.; Mirri, S.; Pau, G.; Salomoni, P. RaveGuard: A Noise Monitoring Platform Using Low-End Microphones and Machine Learning. Sensors 2020, 20, 5583.
- Le, D.V.; Tham, C.K. Machine learning (ML)-based air quality monitoring using vehicular sensor networks. In Proceedings of the 2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS), Shenzhen, China, 15–17 December 2017; IEEE Computer Society: Washington, DC, USA, 2018; pp. 65–72.
- Panou, D.; Reczko, M. DeepFoldit-A Deep Reinforcement Learning Neural Network Folding Proteins. arXiv 2020, arXiv:2011.03442.
- eBird. Available online: https://ebird.org/home (accessed on 28 May 2021).
- Kelling, S.; Johnston, A.; Hochachka, W.M.; Iliff, M.; Fink, D.; Gerbracht, J.; Lagoze, C.; La Sorte, F.A.; Moore, T.; Wiggins, A.; et al. Can observation skills of citizen scientists be estimated using species accumulation curves? PLoS ONE 2015, 10, e0139600.
- Green, S.E.; Rees, J.P.; Stephens, P.A.; Hill, R.A.; Giordano, A.J. Innovations in Camera Trapping Technology and Approaches: The Integration of Citizen Science and Artificial Intelligence. Animals 2020, 10, 132.
- Hsing, P.Y.; Bradley, S.; Kent, V.T.; Hill, R.A.; Smith, G.C.; Whittingham, M.J.; Cokill, J.; Crawley, D.; Stephens, P.A. Economical crowdsourcing for camera trap image classification. Remote Sens. Ecol. Conserv. 2018, 4, 361–374.
- McShea, W.J.; Forrester, T.; Costello, R.; He, Z.; Kays, R. Volunteer-run cameras as distributed sensors for macrosystem mammal research. Landsc. Ecol. 2016, 31, 55–66.
- Berger-Wolf, T.Y.; Rubenstein, D.I.; Stewart, C.V.; Holmberg, J.A.; Parham, J.; Menon, S. Wildbook: Crowdsourcing, computer vision, and data science for conservation. arXiv 2017, arXiv:1710.08880.
- Tabak, M.A.; Norouzzadeh, M.S.; Wolfson, D.W.; Sweeney, S.J.; Vercauteren, K.C.; Snow, N.P.; Halseth, J.M.; Di Salvo, P.A.; Lewis, J.S.; White, M.D.; et al. Machine learning to classify animal species in camera trap images: Applications in ecology. Methods Ecol. Evol. 2019, 10, 585–590.
- Weinstein, B.G. A computer vision for animal ecology. J. Anim. Ecol. 2018, 87, 533–545.
- Yu, J.; Wong, W.K.; Hutchinson, R.A. Modeling experts and novices in citizen science data for species distribution modeling. In Proceedings of the 2010 IEEE International Conference on Data Mining, Sydney, Australia, 13–17 December 2010; pp. 1157–1162.
- Van der Wal, R.; Sharma, N.; Mellish, C.; Robinson, A.; Siddharthan, A. The role of automated feedback in training and retaining biological recorders for citizen science. Conserv. Biol. 2016, 30, 550–561.
- Langenkämper, D.; Simon-Lledó, E.; Hosking, B.; Jones, D.O.B.; Nattkemper, T.W. On the impact of Citizen Science-derived data quality on deep learning based classification in marine images. PLoS ONE 2019, 14, e0218086.
- Torney, C.J.; Lloyd-Jones, D.J.; Chevallier, M.; Moyer, D.C.; Maliti, H.T.; Mwita, M.; Kohi, E.M.; Hopcraft, G.C. A comparison of deep learning and citizen science techniques for counting wildlife in aerial survey images. Methods Ecol. Evol. 2019, 10, 779–787.
- Lintott, C.J.; Schawinski, K.; Slosar, A.; Land, K.; Bamford, S.; Thomas, D.; Raddick, M.J.; Nichol, R.C.; Szalay, A.; Andreescu, D.; et al. Galaxy Zoo: Morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey. Mon. Not. R. Astron. Soc. 2008, 389, 1179–1189.
- Jimenez, M.; Torres, M.T.; John, R.; Triguero, I. Galaxy image classification based on citizen science data: A comparative study. IEEE Access 2020, 8, 47232–47246.
- Kendrew, S.; Simpson, R.; Bressert, E.; Povich, M.S.; Sherman, R.; Lintott, C.J.; Robitaille, T.P.; Schawinski, K.; Wolf-Chase, G. The milky way project: A statistical study of massive star formation associated with infrared bubbles. Astrophys. J. 2012, 755, 71.
- Beaumont, C.N.; Goodman, A.A.; Kendrew, S.; Williams, J.P.; Simpson, R. The milky way project: Leveraging citizen science and machine learning to detect interstellar bubbles. Astrophys. J. Suppl. Ser. 2014, 214, 3.
- Braindr. Available online: https://braindr.us/ (accessed on 20 May 2021).
- Johnston, A.; Fink, D.; Hochachka, W.M.; Kelling, S. Estimates of observer expertise improve species distributions from citizen science data. Methods Ecol. Evol. 2018, 9, 88–97.
- Pettibone, L.; Vohland, K.; Ziegler, D. Understanding the (inter)disciplinary and institutional diversity of citizen science: A survey of current practice in Germany and Austria. PLoS ONE 2017, 12, e0178778.
- Tang, J.; Zhou, X.; Yu, M. Designing feedback information to encourage users’ participation performances in citizen science projects. Proc. Assoc. Inf. Sci. Technol. 2019, 56, 486–490.
- Zhou, X.; Tang, J.; Zhao, Y.; Wang, T. Effects of feedback design and dispositional goal orientations on volunteer performance in citizen science projects. Comput. Hum. Behav. 2020, 107, 106266.
- Di Minin, E.; Fink, C.; Hiippala, T.; Tenkanen, H. A framework for investigating illegal wildlife trade on social media with machine learning. Conserv. Biol. 2019, 33, 210–213.
- Wang, Y.X.; Girshick, R.; Hebert, M.; Hariharan, B. Low-Shot Learning from Imaginary Data. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7278–7286.
- Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a Few Examples: A Survey on Few-shot Learning. ACM Comput. Surv. 2020, 53, 1–34.
- Chanda, S.; Chakrapani GV, A.; Brun, A.; Hast, A.; Pal, U.; Doermann, D. Face recognition—A one-shot learning perspective. In Proceedings of the 2019 15th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), Sorrento, Italy, 26–29 November 2019; pp. 113–119.
- Bowser, A.; Hansen, D.; He, Y.; Boston, C.; Reid, M.; Gunnell, L.; Preece, J. Using gamification to inspire new citizen science volunteers. In Proceedings of the First International Conference on Gameful Design, Research, and Applications; Association for Computing Machinery: New York, NY, USA, 2013; pp. 18–25.
- Wu, Y.; Wang, Y.; Zhang, S.; Ogai, H. Deep 3D Object Detection Networks Using LiDAR Data: A Review. IEEE Sens. J. 2021, 21, 1152–1171.
- Engels, G.; Aranjuelo, N.; Arganda-Carreras, I.; Nieto, M.; Otaegui, O. 3D object detection from LiDAR data using distance dependent feature extraction. In Proceedings of the 6th International Conference on Vehicle Technology and Intelligent Transport Systems, Online, 2–4 May 2020; pp. 289–300.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lotfian, M.; Ingensand, J.; Brovelli, M.A. The Partnership of Citizen Science and Machine Learning: Benefits, Risks, and Future Challenges for Engagement, Data Collection, and Data Quality. Sustainability 2021, 13, 8087. https://doi.org/10.3390/su13148087