Abstract
The abundance of mined, predicted and uncertain biological data warrants massive, efficient and scalable curation efforts. The human expertise required for any successful curation enterprise is often economically prohibitive, especially for speculative end-user queries that may ultimately not bear fruit. The challenge, therefore, is to devise a low-cost engine capable of delivering fast but tentative annotation and curation of a set of data items that experts can later validate authoritatively with a significantly smaller investment. The aim is thus to make a large volume of predicted data available for use as early as possible, with an acceptable degree of confidence in its accuracy, while curation continues. In this paper, we present a novel approach to the annotation and curation of biological database contents using crowd computing. The technical contribution lies in the identification and management of the trust of mechanical turks, and in support for ad hoc declarative queries, both of which are leveraged to enable reliable analytics over noisy predicted interactions. While the proposed approach and the CrowdCure system are designed for the curation of literature-mined protein-protein interaction data, they are amenable to substantial generalization.
Notes
It is often believed that biases also exist owing to the sparseness of data in high-throughput PPI data collection techniques. For example, mass-spectrometry proteomics is known to be biased toward detecting large, abundant or sticky proteins [56].
By no means do we preclude the possibility of inserting ground data into the tables using traditional database operations such as INSERT or UPDATE for subsequent crowd curation.
The alternative is to separate mining from curation by decoupling the USING ON clause from the SOURCE BEFORE clause. This alternative would allow importing data from other sources and initiating curation as a distinct step.
Turks may opt out from any assignment even though they remain registered.
In the current edition of CrowdCure, we do not allow data queries to list the assigned curators or their votes (the source vector) for users to view; this can be enabled in a later version if there is a need. We are still undecided on this choice.
Figure 13 shows a better annotation possibility using the EverMap annotator, to which we currently do not have unrestricted access but which we hope to offer in the future. Using its AutoBookmark plug-in for Acrobat, EverMap can highlight a large set of keywords, each in a distinct color, for better differentiation. However, we need an even better annotator that can pinpoint the relevant sentences, as shown in Fig. 10, and export only the relevant parts of the document for provenance.
Available at http://www.jbc.org/content/278/4/2388.long.
Available at http://www.jbc.org/content/278/4/2388.full.pdf.
While it is reasonable, and expected in CrowdCure, that curators at a higher level will have more expertise and be more reliable, CureQL leaves this choice to the users. One way to ensure an increasingly credible curator hierarchy is to accept only queries that select such hierarchies, i.e., we could test that no curator level contains an expert who is less reliable than anyone in a lower stratum.
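The hierarchy test described in this note can be sketched as follows. This is only an illustrative sketch, not CrowdCure or CureQL code: the function name `is_credible_hierarchy`, the representation of a hierarchy as a list of levels ordered from lowest to highest, and the use of a single numeric reliability score per curator are all assumptions made for the example.

```python
from typing import Sequence


def is_credible_hierarchy(levels: Sequence[Sequence[float]]) -> bool:
    """Check that no curator is less reliable than any curator in a lower level.

    `levels[0]` is the lowest level; each level holds the (hypothetical)
    reliability scores of the curators assigned to it.
    """
    highest_below = float("-inf")  # best reliability seen in all lower levels
    for level in levels:
        if not level:
            continue  # an empty level imposes no constraint
        if min(level) < highest_below:
            return False  # someone here is less reliable than a lower-level curator
        highest_below = max(highest_below, max(level))
    return True


if __name__ == "__main__":
    print(is_credible_hierarchy([[0.5, 0.6], [0.6, 0.8], [0.9]]))
    print(is_credible_hierarchy([[0.5, 0.7], [0.6]]))
```

Under these assumptions a query would be rejected whenever the function returns False. Ties are accepted, since the note only rules out experts who are strictly less reliable than someone below them.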
References
Abekawa, T., Aizawa, A.: Sidenoter: scholarly paper browsing system based on PDF restructuring and text annotation. In: COLING (Demos), pp. 136–140. ACL (2016)
Alagar, V.S., Sadri, F., Said, J.N.: An extended relational model for managing uncertain information. In: DEXA, Workshop, pp. 257–266 (1995)
Alagar, V.S., Sadri, F., Said, J.N.: Semantics of an extended relational model for managing uncertain information. In: CIKM, pp. 234–240 (1995)
Alex, B., Grover, C., Haddow, B., Kabadjov, M., Klein, E., Matthews, M., Roebuck, S., Tobin, R., Wang, X.: Assisted curation: does text mining really help? In: Biocomputing 2008, Proceedings of the Pacific Symposium, Kohala Coast, Hawaii, USA, 4–8 January 2008, pp. 556–567 (2008)
Alonso, O., Marshall, C.C., Najork, M.A.: A human-centered framework for ensuring reliability on crowdsourced labeling tasks. In: Human Computation and Crowdsourcing: Works in Progress and Demonstration Abstracts, An Adjunct to the Proceedings of the First AAAI Conference on Human Computation and Crowdsourcing, 7–9 November, Palm Springs, CA, USA (2013)
Antony, A., Basetty, S., Hartanto, S., Palakal, M.J.: Computational approach to biological validation of protein–protein interactions discovered using literature mining. In: ACM SAC, Fortaleza, Ceara, Brazil, 16–20 March, pp. 1302–1306 (2008)
Askalidis, G., Stoddard, G.: A theoretical analysis of crowdsourced content curation. In: Workshop on Social Computing and User Generated Content (2013)
Attrill, H., Falls, K., Goodman, J.L., Millburn, G.H., Antonazzo, G., Rey, A.J., Marygold, S.J., the FlyBase consortium: FlyBase: establishing a gene group resource for drosophila melanogaster. Nucleic Acids Res. 44(D1), D786–D792 (2016)
Bhaskar, P., Buzzi, M., Geraci, F., Pellegrini, M.: From literature to knowledge: exploiting PubMed to answer biomedical questions in natural language. In: ITBAM, Spain, 3–4 September, pp. 3–15 (2015)
BlueBeam: https://www.bluebeam.com/us/products/revu/search.asp. Accessed 24 June 2017
Bozzon, A., Brambilla, M., Ceri, S., Silvestri, M., Vesci, G.: Choosing the right crowd: expert finding in social networks. In: Joint EDBT/ICDT Conferences, Genoa, Italy, 18–22 March, pp. 637–648 (2013)
Breitkreutz, B.-J., Stark, C., Reguly, T., Boucher, L., Breitkreutz, A., Livstone, M., Oughtred, R., Lackner, D.H., Bähler, J., Wood, V., Dolinski, K., Tyers, M.: The BioGRID interaction database: 2008 update. Nucleic Acids Res. 36, D637–D640 (2008)
Budescu, D.V., Chen, E.: Identifying expertise to extract the wisdom of crowds. Manag. Sci. 61(2), 267–280 (2015)
Burger, J.D., Doughty, E., Khare, R., Wei, C., Mishra, R., Aberdeen, J.S., Tresner-Kirsch, D., Wellner, B., Kann, M.G., Lu, Z., Hirschman, L.: Hybrid curation of gene-mutation relations combining automated extraction and crowdsourcing. Database (2014). doi:10.1093/database/bau094
Cao, D., Xiao, N., Xu, Q., Chen, A.F.: Rcpi: R/bioconductor package to generate various descriptors of proteins, compounds and their interactions. Bioinformatics 31(2), 279–281 (2015)
Chen, H., Sharp, B.M.: Content-rich biological network constructed by mining PubMed abstracts. BMC Bioinform. 5(1), 1–13 (2004)
Cooper, S., Khatib, F., Makedon, I., Lü, H., Barbero, J., Baker, D., Fogarty, J., Popovic, Z., Players, F.: Analysis of social gameplay macros in the FoldIt cookbook. In: Foundations of Digital Games, FDG’11, Bordeaux, France, June 28–July 1, pp. 9–14 (2011)
Crescenzi, V., Merialdo, P., Qiu, D.: Crowdsourcing large scale wrapper inference. Distrib. Parallel Databases 33(1), 95–122 (2015)
Cusick, M.E., Yu, H., Smolyar, A., Venkatesan, K., Carvunis, A.-R., Simonis, N., Rual, J.-F., Borick, H., Braun, P., Dreze, M., Vandenhaute, J., Galli, M., Yazaki, J., Hill, D.E., Ecker, J.R., Roth, F.P., Vidal, M.: Literature-curated protein interaction datasets. Nat. Methods 6(1), 39–46 (2009)
Dalvi, N.N., Suciu, D.: Efficient query evaluation on probabilistic databases. In: VLDB, Toronto, Canada, August 31–September 3, pp. 864–875 (2004)
Dalvi, N.N., Suciu, D.: Management of probabilistic data: foundations and challenges. In: PODS, pp. 1–12 (2007)
Davis, A.P., Wiegers, T.C., Roberts, P.M., King, B.L., Lay, J.M., Lennon-Hopkins, K., Sciaky, D., Johnson, R.J., Keating, H., Greene, N., Hernandez, R., McConnell, K.J., Enayetallah, A., Mattingly, C.J.: A CTD-Pfizer collaboration: manual curation of 88,000 scientific articles text mined for drug-disease and drug-phenotype interactions. Database (2013). doi:10.1093/database/bat080
Difallah, D.E., Demartini, G., Cudré-Mauroux, P.: Pick-a-Crowd: tell me what you like, and I’ll tell you what to do. In: Proceedings of the 22nd International World Wide Web Conference, WWW ’13, Rio de Janeiro, Brazil, 13–17 May, pp. 367–374 (2013)
EverMap: https://www.evermap.com/HighlightText.asp. Accessed 24 June 2017
Fabregat, A., Sidiropoulos, K., Garapati, P., Gillespie, M., Hausmann, K., Haw, R., Jassal, B., Jupe, S., Korninger, F., McKay, S., Matthews, L., May, B., Milacic, M., Rothfels, K., Shamovsky, V., Webber, M., Weiser, J., Williams, M., Wu, G., Stein, L., Hermjakob, H., D’Eustachio, P.: The reactome pathway knowledgebase. Nucleic Acids Res. 44(D1), D481–D487 (2016)
Fourches, D., Muratov, E.N., Tropsha, A.: Trust, but verify: on the importance of chemical structure curation in cheminformatics and QSAR modeling research. J. Chem. Inf. Model. 50(7), 1189–1204 (2010)
Franklin, M.J., Kossmann, D., Kraska, T., Ramesh, S., Xin, R.: CrowdDB: answering queries with crowdsourcing. In: ACM SIGMOD, Athens, Greece, 12–16 June, pp. 61–72 (2011)
Galland, A., Abiteboul, S., Marian, A., Senellart, P.: Corroborating information from disagreeing views. In: WSDM, New York, 4–6 February, pp. 131–140 (2010)
Gama-Castro, S., Rinaldi, F., López-Fuentes, A., Balderas-Martínez, Y.I., Clematide, S., Ellendorff, T.R., Santos-Zavaleta, A., Marques-Madeira, H., Collado-Vides, J.: Assisted curation of regulatory interactions and growth conditions of OxyR in E. coli K-12. Database (2014). doi:10.1093/database/bau049
Goodspeed, R., Spanring, C., Reardon, T.: Crowdsourcing as data sharing: a regional web-based real estate development database. In: ICEGOV, NY, USA, 22–25 October, pp. 460–463 (2012)
Hirschman, L., Fort, K., Boué, S., Kyrpides, N., Dogan, R.I., Cohen, K.B.: Crowdsourcing and curation: perspectives from biology and natural language processing. Database (2016). doi:10.1093/database/baw115
Jacquin, T., Fambon, O., Chidlovskii, B.: A web-based document harmonization and annotation chain: from PDF to RDF. In: ACM Symposium on Document Engineering, pp. 225–226. ACM (2005)
Jamieson, D.G., Roberts, P.M., Robertson, D.L., Sidders, B., Nenadic, G.: Cataloging the biomedical world of pain through semi-automated curation of molecular interactions. Database (2013). doi:10.1093/database/bat033
Jamil, H.M., Sadri, F.: Recognizing credible experts in inaccurate databases. In: Proceedings of the 8th International Symposium on Methodologies for Intelligent Systems, ISMIS ’94, Charlotte, North Carolina, USA, 16–19 October, pp. 46–55 (1994)
Joseph, T., Saipradeep, V.G., Kotte, S., Rao, A., Srinivasan, R.: Plugin for concept-assisted search and navigation on PubMed. In: IEEE BIBM, Washington, DC, USA, 9–12 November, pp. 1712–1714 (2015)
Kalathur, R.K.R., Pinto, J.P., Hernández-Prieto, M.A., Machado, R.S.R., Almeida, D., Chaurasia, G., Futschik, M.E.: UniHI 7: an enhanced database for retrieval and interactive analysis of human molecular interaction networks. Nucleic Acids Res. 42(Database–Issue), 408–414 (2014)
Kamar, E., Kapoor, A., Horvitz, E.: Identifying and accounting for task-dependent bias in crowdsourcing. In: AAAI HCOMP, 8–11 November, San Diego, CA, pp. 92–101 (2015)
Kanehisa, M., Araki, M., Goto, S., Hattori, M., Hirakawa, M., Itoh, M., Katayama, T., Kawashima, S., Okuda, S., Tokimatsu, T., Yamanishi, Y.: KEGG for linking genomes to life and the environment. Nucleic Acids Res. 36(Database–Issue), 480–484 (2008)
Karp, P.D.: Crowd-sourcing and author submission as alternatives to professional curation. Database (2016). doi:10.1093/database/baw149
Kazemi, L., Shahabi, C., Chen, L.: GeoTruCrowd: trustworthy query answering with spatial crowdsourcing. In: SIGSPATIAL, Orlando, FL, 5–8 November, pp. 304–313 (2013)
Keseler, I.M., Skrzypek, M., Weerasinghe, D., Chen, A.Y., Fulcher, C., Li, G.-W., Lemmer, K.C., Mladinich, K.M., Chow, E.D, Sherlock, G., Karp, P.D.: Curation accuracy of model organism databases. Database (2014). doi:10.1093/database/bau058
Khare, R., Burger, J.D., Aberdeen, J.S., Tresner-Kirsch, D., Corrales, T.J., Hirschman, L., Lu, Z.: Scaling drug indication curation through crowdsourcing. Database (2015). doi:10.1093/database/bav016
Kifer, M., Li, A.: On the semantics of rule-based expert systems with uncertainty. In: ICDT, pp. 102–117 (1988)
Kim, S., Islamaj Dogan, R., Chatr-Aryamontri, A., Chang, C.S., Oughtred, R., Rust, J., Batista-Navarro, R., Carter, J., Ananiadou, S., Matos, S., Santos, A., Campos, D., Oliveira, J.L., Singh, O., Jonnagaddala, J., Dai, H.-J., Su, E.C.-Y., Chang, Y.-C., Su, Y.-C., Chu, C.-H., Chen, C.C., Hsu, W.-L., Peng, Y., Arighi, C., Wu, C.H., Vijay-Shanker, K., Aydin, F., Hüsünbeyi, Z.M., Özgür, A., Shin, S.-Y., Kwon, D., Dolinski, K., Tyers, M., Wilbur, W.J., Comeau, D.C.: BioCreative V BioC track overview: collaborative biocurator assistant task for BioGRID. Database (2016). doi:10.1093/database/baw121
Kostakos, V.: Is the crowd’s wisdom biased? A quantitative analysis of three online communities. In: IEEE CSE, Vancouver, BC, Canada, 29–31 August, pp. 251–255 (2009)
Krallinger, M., Leitner, F., Rodriguez-Penagos, C., Valencia, A.: Overview of the protein–protein interaction annotation extraction task of BioCreative II. Genome Biol. 9(Suppl 2), S4 (2008)
Krallinger, M., Vazquez, M., Leitner, F., Salgado, D., Aryamontri, A.C., Winter, A., Perfetto, L., Briganti, L., Licata, L., Iannuccelli, M., Castagnoli, L., Cesareni, G., Tyers, M., Schneider, G., Rinaldi, F., Leaman, R., Gonzalez, G., Matos, S., Kim, S., Wilbur, W., Rocha, L., Shatkay, H., Tendulkar, A., Agarwal, S., Liu, F., Wang, X., Rak, R., Noto, K., Elkan, C., Lu, Z.: The protein–protein interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text. BMC Bioinform. 12(Suppl 8), S3 (2011)
Kuperstein, I., Cohen, D.P.A., Pook, S., Viara, E., Calzone, L., Barillot, E., Zinovyev, A.: NaviCell: a web-based environment for navigation, curation and maintenance of large molecular interaction maps. BMC Syst. Biol. 7, 100 (2013)
Kwon, D., Kim, S., Shin, S., Chatr-aryamontri, A., Wilbur, W.J.: Assisting manual literature curation for protein–protein interactions using BioQRator. Database (2014). doi:10.1093/database/bau067
Lakshmanan, L.V.S., Shiri, N.: A parametric approach to deductive databases with uncertainty. IEEE Trans. Knowl. Data Eng. 13(4), 554–570 (2001)
Li, F., Jagadish, H.V.: Understanding natural language queries over relational databases. SIGMOD Rec. 45(1), 6–13 (2016)
Liu, W., Laulederkind, S.J.F., Hayman, G.T., Wang, S.-J., Nigam, R., Smith, J.R., De Pons, J., Dwinell, M.R., Shimoyama, M.: Ontomate: a text-mining tool aiding curation at the rat genome database. Database (2015). doi:10.1093/database/bau129
Lofi, C., Maarry, K.E., Balke, W.: Skyline queries in crowd-enabled databases. In: Joint EDBT/ICDT Conferences, Genoa, Italy, 18–22 March, pp. 465–476 (2013)
Mallory, E.K., Zhang, C., Ré, C., Altman, R.B.: Large-scale extraction of gene interactions from full-text literature using deepdive. Bioinformatics 32(1), 106–113 (2016)
Marcus, A., Wu, E., Madden, S., Miller, R.C.: Crowdsourced databases: query processing with people. In: Biennial Innovative Data Systems Research Conference, Asilomar, CA, USA, 9–12 January, pp. 211–214 (2011)
Mazloom, A.R., Dannenfelser, R., Clark, N.R., Grigoryan, A.V., Linder, K.M., Cardozo, T.J., Bond, J.C., Boran, A.D.W., Iyengar, R., Malovannaya, A., Lanz, R.B., Ma’ayan, A.: Recovering protein–protein and domain-domain interactions from aggregation of ip-ms proteomics of coregulator complexes. PLOS Comput. Biol. 7(12), 1–10 (2011)
McDowall, M.D., Scott, M.S., Barton, G.J.: PIPs: human protein–protein interaction prediction database. Nucleic Acids Res. 37(suppl 1), D651–D656 (2009)
Mehla, J., Caufield, J.H., Uetz, P.: Mapping protein–protein interactions using yeast two-hybrid assays. Cold Spring Harb. Protoc. 5, 2015 (2015)
Moal, I.H., Jiménez-García, B., Fernández-Recio, J.: CCharPPI web server: computational characterization of protein–protein interactions from structure. Bioinformatics 31(1), 123–125 (2015)
Mou, X., Jamil, H.M., Ma, X.: Visflow: A visual database integration and workflow querying system. In: Proceedings of the 33rd International Conference on Data Engineering, ICDE 2017, San Diego, CA, USA, 19–22 April, pp. 1421–1422 (2017)
Mou, X., Jamil, H.M., Rinker, R.: Visual orchestration and autonomous execution of distributed and heterogeneous computational biology pipelines. In: IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2016, Shenzhen, China, 15–18 December, pp. 752–757 (2016)
Mou, X., Jamil, H.M., Rinker, R.: Implementing computational biology pipelines using visflow. Int. J. Data Min. Bioinform. 17(2), 115–131 (2017)
Murali, T., Pacifico, S., Yu, J., Guest, S., Roberts, G.G., Finley Jr., R.L.: DroID 2011: a comprehensive, integrated resource for protein, transcription factor, RNA and gene interactions for drosophila. Nucleic Acids Res. 39(suppl 1), D736–D743 (2011)
Nakatsu, R.T., Iacovou, C.L.: An investigation of user interface features of crowdsourcing applications. In: HCI International, Crete, Greece, 22–27 June, pp. 410–418 (2014)
Park, H., Widom, J.: CrowdFill: collecting structured data from the crowd. In: ACM SIGMOD, Snowbird, UT, USA, 22–27 June, pp. 577–588 (2014)
PDF Annotator: https://www.pdfannotator.com/en/. Accessed 24 June 2017
PDF Editor: https://pdf.iskysoft.com/edit-pdf/highlight-pdf-mac.html. Accessed 24 June 2017
Peng, J., Liu, Q., Ihler, A., Berger, B.: Crowdsourcing for structured labeling with applications to protein folding. In: Proceedings of the Machine Learning Meets Crowdsourcing Workshop, ICML (2013)
Peng, W., Wang, J., Cai, J., Chen, L., Li, M., Wu, F.-X.: Improving protein function prediction using domain and protein complexes in ppi networks. BMC Syst. Biol. 8(1), 1–13 (2014)
Perkel, J.M.: Annotating the scholarly web. Nature 528(7580), 153–154 (2015)
Pochampally, R., Sarma, A.D., Dong, X.L., Meliou, A., Srivastava, D.: Fusing data with correlations. In: SIGMOD, Snowbird, UT, USA, 22–27 June, pp. 433–444 (2014)
Powley, B., Dale, R., Anisimoff, I.: Enriching a document collection by integrating information extraction and PDF annotation. In: DRR, SPIE Proceedings, vol. 7247, p. 724707. SPIE (2009)
Rahmanian, B., Davis, J.G.: User interface design for crowdsourcing systems. In: AVI, Como, Italy, 27–29 May, pp. 405–408 (2014)
Raja, K., Subramani, S., Natarajan, J.: PPInterFinder—a mining tool for extracting causal relations on human proteins from literature. Database (2013). doi:10.1093/database/bas052
Ramanath, R., Choudhury, M., Bali, K., Roy, R.S.: Crowd prefers the middle path: a new IAA metric for crowdsourcing reveals turker biases in query segmentation. In: ACL, 4-9 August, Sofia, Bulgaria, Volume 1: Long Papers, pp. 1713–1722 (2013)
Roberts, R.J., Varmus, H.E., Ashburner, M., Brown, P.O., Eisen, M.B., Khosla, C., Kirschner, M., Nusse, R., Scott, M., Wold, B.: Building a “GenBank” of the published literature. Science 291(5512), 2318–2319 (2001)
Rodriguez-Esteban, R.: Biocuration with insufficient resources and fixed timelines. In: Biocuration, Geneva, Switzerland, 10–14 April, Oral presentation (2016)
Rogstadius, J., Vukovic, M., Teixeira, C.A., Kostakos, V., Karapanos, E., Laredo, J.: CrisisTracker: Crowdsourced social media curation for disaster awareness. IBM J. Res. Dev. 57(5), 1–4 (2013)
Sadri, F.: Modeling uncertainty in databases. In: ICDE, pp. 122–131 (1991)
Sadri, F.: On the foundations of probabilistic information integration. In: CIKM, Maui, HI, USA, October 29–November 2, pp. 882–891 (2012)
Sadri, F.: Reliability of answers to queries in relational databases. IEEE Trans. Knowl. Data Eng. 3(2), 245–251 (1991)
Sadri, F.: Aggregate operations in the information source tracking method. Theor. Comput. Sci. 133(2), 421–442 (1994)
Sadri, F.: Information source tracking method: efficiency issues. IEEE Trans. Knowl. Data Eng. 7(6), 947–954 (1995)
Sadri, F.: Integrity constraints in the information source tracking method. IEEE Trans. Knowl. Data Eng. 7(1), 106–119 (1995)
Sarjant, S., Legg, C., Stannett, M., Willcock, D.: Crowd-sourcing ontology content and curation: the massive ontology interface. In: FOIS, Rio de Janeiro, Brazil, 22–25 September, pp. 251–260 (2014)
Sevimoglu, T., Arga, K.Y.: The role of protein interaction networks in systems biomedicine. Comput. Struct. Biotechnol. J. 11(18), 22–27 (2014)
Shakarian, P., Parker, A., Simari, G.I., Subrahmanian, V.S.: Annotated probabilistic temporal logic. ACM Trans. Comput. Log. 12(2), 14 (2011)
Subramani, S., Kalpana, R., Monickaraj, P.M., Natarajan, J.: HPIminer: a text mining system for building and visualizing human protein interaction networks and pathways. J. Biomed. Inform. 54, 121–131 (2015)
Suter, B., Zhang, X., Pesce, C.G., Mendelsohn, A.R., Dinesh-Kumar, S.P., Mao, J.-H.H.: Next-generation sequencing for binary protein–protein interactions. Front. Genet. 6, 346 (2015)
Szklarczyk, D., Franceschini, A., Wyder, S., Forslund, K., Heller, D., Huerta-Cepas, J., Simonovic, M., Roth, A., Santos, A., Tsafou, K.P., Kuhn, M., Bork, P., Jensen, L.J., von Mering, C.: STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43(D1), D447–D452 (2015)
Takis, J., Islam, A.Q.M.S., Lange, C., Auer, S.: Crowdsourced semantic annotation of scientific publications and tabular data in PDF. In: SEMANTICS, pp. 1–8. ACM (2015)
Tastan, O., Qi, Y., Carbonell, J.G., Klein-Seetharaman, J.: Refining Literature Curated Protein Interactions Using Expert Opinions, pp. 318–329. World Scientific, Singapore (2014)
Thomas, P., Starlinger, J., Vowinkel, A., Arzt, S., Leser, U.: GeneView: a comprehensive semantic search engine for PubMed. Nucleic Acids Res. 40(W1), W585–W591 (2012)
Turinsky, A.L., Razick, S., Turner, B., Donaldson, I.M., Wodak, S.J.: Literature curation of protein interactions: measuring agreement across major public databases. Database (2010). doi:10.1093/database/baq026
U. of Washington: FoldIt: Solve Puzzles for Science. https://fold.it/portal/. Accessed 14 September 2016
U. of Washington: Play FoldIt: Games for Change. http://www.gamesforchange.org/play/foldit/. Accessed 14 September 2016
Vasilescu, J., Figeys, D.: Mapping protein–protein interactions by mass spectrometry. Curr. Opin. Biotechnol. 17(4), 394–399 (2006)
Wang, H., Ganapathiraju, M.K.: Evaluation of protein–protein interaction predictors with noisy partially labeled data sets. CoRR, abs/1509.05742 (2015)
Wang, P.: The Scientist in Us All: How crowdsourcing in science is changing the world. http://yalescientific.org/thescope/2016/04/the-scientist-in-us-all-how-crowdsourcing-in-science-is-changing-the-world/ (2016). Accessed 6 September 2016
Wang, Z., Clark, N.R., Ma’ayan, A.: Dynamics of the discovery process of protein–protein interactions from low content studies. BMC Syst. Biol. 9, 26 (2015)
Xie, S., Hu, Q., Zhang, J., Gao, J., Fan, W., Yu, P.S.: Robust crowd bias correction via dual knowledge transfer from multiple overlapping sources. In: IEEE International Conference on Big Data, CA, USA, October 29–November 1, pp. 815–820 (2015)
Zadeh, L.A.: Knowledge representation in fuzzy logic. IEEE Trans. Knowl. Data Eng. 1(1), 89–100 (1989)
Zhang, Y., Lin, H., Yang, Z., Wang, J.: Integrating experimental and literature protein–protein interaction data for protein complex prediction. BMC Genom. 16(S–2), S4 (2015)
Acknowledgements
The current prototype of CrowdCure was implemented by Xin Mou. The authors gratefully acknowledge his contributions and his help with many of the investigations into tool choices.
Cite this article
Jamil, H.M., Sadri, F. Crowd enabled curation and querying of large and noisy text mined protein interaction data. Distrib Parallel Databases 36, 9–45 (2018). https://doi.org/10.1007/s10619-017-7209-x
DOI: https://doi.org/10.1007/s10619-017-7209-x