Crowd enabled curation and querying of large and noisy text mined protein interaction data

Hasan M. Jamil¹ &
Fereidoon Sadri²

Abstract

The abundance of mined, predicted and uncertain biological data warrants massive, efficient and scalable curation efforts. The human expertise required for any successful curation enterprise is often economically prohibitive, especially for speculative end user queries that may ultimately not bear fruit. The challenge therefore lies in devising a low-cost engine capable of delivering fast but tentative annotation and curation of a set of data items that experts can later validate authoritatively at a significantly smaller investment. The aim is thus to make a large volume of predicted data available for use as early as possible, with an acceptable degree of confidence in its accuracy, while curation continues. In this paper, we present a novel approach to annotation and curation of biological database contents using crowd computing. The technical contribution lies in the identification and management of trust of mechanical turks, and in support for ad hoc declarative queries, both of which are leveraged to enable reliable analytics using noisy predicted interactions. While the proposed approach and the CrowdCure system are designed for literature-mined protein-protein interaction data curation, they are amenable to substantial generalization.

Notes

Young gamers playing FoldIt [95] discovered, in three weeks, the structure of an HIV retrovirus enzyme whose configuration had remained a puzzle for more than a decade [96].
It is often believed that biases also exist due to the sparseness of data in high-throughput PPI data collection techniques. For example, mass-spectrometry proteomics is known to be biased toward detecting large, abundant or sticky proteins [56].
By no means do we preclude the possibility of inserting ground data into the tables using traditional database operations such as INSERT or UPDATE for subsequent crowd curation.
The alternative choice is to separate mining from curation by decoupling the USING ON clause from the SOURCE BEFORE clause. This alternative would allow importing data from other sources and initiating curation as a distinct step.
Turks may opt out of any assignment even though they remain registered.
In the current edition of CrowdCure, we do not allow data queries to list the assigned curators or their votes (the source vector) for users to view. This can be enabled in a later version if there is a need; we are still undecided on this choice.
Figure 13 shows a better annotation possibility using the EverMap annotator, to which we currently do not have unrestricted access but which we hope to offer in the future. Using its AutoBookmark plug-in for Acrobat, EverMap is able to highlight a large set of keywords, each in a distinct color, for better differentiation. However, we need an even better annotator that can pinpoint the relevant sentences, as shown in Fig. 10, and export only the relevant parts of the document for provenance.
Available at http://www.jbc.org/content/278/4/2388.long.
Available at http://www.jbc.org/content/278/4/2388.full.pdf.
While it is reasonable, and expected in CrowdCure, that curators at a higher level will have more expertise and be more reliable, CureQL leaves this choice to the users. One way to ensure an increasingly credible curator hierarchy is to accept only queries that select such curator hierarchies, i.e., we could test that no curator level contains an expert who is less reliable than anyone in a lower stratum.
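The hierarchy test described in the last note can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CrowdCure or CureQL implementation: the function name `is_monotone_hierarchy` and the representation of each curator level as a list of numeric reliability scores (lowest level first) are hypothetical.

```python
from typing import Sequence


def is_monotone_hierarchy(levels: Sequence[Sequence[float]]) -> bool:
    """Check that no level contains a curator less reliable than any
    curator in a lower level.

    `levels[0]` is the lowest stratum; each inner sequence holds the
    (hypothetical) reliability scores of the curators at that level.
    Empty levels are skipped. Equal scores across levels are accepted.
    """
    best_below = float("-inf")  # highest reliability seen in lower levels
    for level in levels:
        if not level:
            continue
        if min(level) < best_below:
            return False  # someone here is less reliable than a lower curator
        best_below = max(best_below, max(level))
    return True
```

For example, `is_monotone_hierarchy([[0.5, 0.6], [0.6, 0.8], [0.9]])` accepts the hierarchy, whereas `[[0.5, 0.7], [0.6, 0.8]]` is rejected because a curator with reliability 0.6 sits above one with reliability 0.7. Treating ties as acceptable is a design choice of this sketch.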

References

Abekawa, T., Aizawa, A.: Sidenoter: scholarly paper browsing system based on PDF restructuring and text annotation. In: COLING (Demos), pp. 136–140. ACL (2016)
Alagar, V.S., Sadri, F., Said, J.N.: An extended relational model for managing uncertain information. In: DEXA, Workshop, pp. 257–266 (1995)
Alagar, V.S., Sadri, F., Said, J.N.: Semantics of an extended relational model for managing uncertain information. In: CIKM, pp. 234–240 (1995)
Alex, B., Grover, C., Haddow, B., Kabadjov, M., Klein, E., Matthews, M., Roebuck, S., Tobin, R., Wang, X.: Assisted curation: does text mining really help? In: Biocomputing 2008, Proceedings of the Pacific Symposium, Kohala Coast, Hawaii, USA, 4–8 January 2008, pp. 556–567 (2008)
Alonso, O., Marshall, C.C., Najork, M.A.: A human-centered framework for ensuring reliability on crowdsourced labeling tasks. In: Human Computation and Crowdsourcing: Works in Progress and Demonstration Abstracts, An Adjunct to the Proceedings of the First AAAI Conference on Human Computation and Crowdsourcing, 7–9 November, Palm Springs, CA, USA (2013)
Antony, A., Basetty, S., Hartanto, S., Palakal, M.J.: Computational approach to biological validation of protein–protein interactions discovered using literature mining. In: ACM SAC, Fortaleza, Ceara, Brazil, 16–20 March, pp. 1302–1306 (2008)
Askalidis, G., Stoddard, G.: A theoretical analysis of crowdsourced content curation. In: Workshop on Social Computing and User Generated Content (2013)
Attrill, H., Falls, K., Goodman, J.L., Millburn, G.H., Antonazzo, G., Rey, A.J., Marygold, S.J., the FlyBase consortium: FlyBase: establishing a gene group resource for drosophila melanogaster. Nucleic Acids Res. 44(D1), D786–D792 (2016)
Bhaskar, P., Buzzi, M., Geraci, F., Pellegrini, M.: From literature to knowledge: exploiting PubMed to answer biomedical questions in natural language. In: ITBAM, Spain, 3–4 September, pp. 3–15 (2015)
BlueBeam: https://www.bluebeam.com/us/products/revu/search.asp. Accessed 24 June 2017
Bozzon, A., Brambilla, M., Ceri, S., Silvestri, M., Vesci, G.: Choosing the right crowd: expert finding in social networks. In: Joint EDBT/ICDT Conferences, Genoa, Italy, 18–22 March, pp. 637–648 (2013)
Breitkreutz, B.-J., Stark, C., Reguly, T., Boucher, L., Breitkreutz, A., Livstone, M., Oughtred, R., Lackner, D.H., Bähler, J., Wood, V., Dolinski, K., Tyers, M.: The BioGRID interaction database: 2008 update. NAR 36, D637–D640 (2008)
Budescu, D.V., Chen, E.: Identifying expertise to extract the wisdom of crowds. Manag. Sci. 61(2), 267–280 (2015)
Burger, J.D., Doughty, E., Khare, R., Wei, C., Mishra, R., Aberdeen, J.S., Tresner-Kirsch, D., Wellner, B., Kann, M.G., Lu, Z., Hirschman, L.: Hybrid curation of gene-mutation relations combining automated extraction and crowdsourcing. Database (2014). doi:10.1093/database/bau094
Cao, D., Xiao, N., Xu, Q., Chen, A.F.: Rcpi: R/bioconductor package to generate various descriptors of proteins, compounds and their interactions. Bioinformatics 31(2), 279–281 (2015)
Chen, H., Sharp, B.M.: Content-rich biological network constructed by mining PubMed abstracts. BMC Bioinform. 5(1), 1–13 (2004)
Cooper, S., Khatib, F., Makedon, I., Lü, H., Barbero, J., Baker, D., Fogarty, J., Popovic, Z., Players, F.: Analysis of social gameplay macros in the FoldIt cookbook. In: Foundations of Digital Games, FDG’11, Bordeaux, France, June 28–July 1, pp. 9–14 (2011)
Crescenzi, V., Merialdo, P., Qiu, D.: Crowdsourcing large scale wrapper inference. Distrib. Parallel Databases 33(1), 95–122 (2015)
Cusick, M.E., Yu, H., Smolyar, A., Venkatesan, K., Carvunis, A.-R., Simonis, N., Rual, J.-F., Borick, H., Braun, P., Dreze, M., Vandenhaute, J., Galli, M., Yazaki, J., Hill, D.E., Ecker, J.R., Roth, F.P., Vidal, M.: Literature-curated protein interaction datasets. Nat. Methods 6(1), 39–46 (2009)
Dalvi, N.N., Suciu, D.: Efficient query evaluation on probabilistic databases. In: VLDB, Toronto, Canada, August 31–September 3, pp. 864–875 (2004)
Dalvi, N.N., Suciu, D.: Management of probabilistic data: foundations and challenges. In: PODS, pp. 1–12 (2007)
Davis, A.P., Wiegers, T.C., Roberts, P.M., King, B.L., Lay, J.M., Lennon-Hopkins, K., Sciaky, D., Johnson, R.J., Keating, H., Greene, N., Hernandez, R., McConnell, K.J., Enayetallah, A., Mattingly, C.J.: A CTD-Pfizer collaboration: manual curation of 88,000 scientific articles text mined for drug-disease and drug-phenotype interactions. Database (2013). doi:10.1093/database/bat080
Difallah, D.E., Demartini, G., Cudré-Mauroux, P.: Pick-a-Crowd: tell me what you like, and I’ll tell you what to do. In: Proceedings of the 22nd International World Wide Web Conference, WWW ’13, Rio de Janeiro, Brazil, 13–17 May, pp. 367–374 (2013)
EverMap: https://www.evermap.com/HighlightText.asp. Accessed 24 June 2017
Fabregat, A., Sidiropoulos, K., Garapati, P., Gillespie, M., Hausmann, K., Haw, R., Jassal, B., Jupe, S., Korninger, F., McKay, S., Matthews, L., May, B., Milacic, M., Rothfels, K., Shamovsky, V., Webber, M., Weiser, J., Williams, M., Wu, G., Stein, L., Hermjakob, H., D’Eustachio, P.: The reactome pathway knowledgebase. Nucleic Acids Res. 44(D1), D481–D487 (2016)
Fourches, D., Muratov, E.N., Tropsha, A.: Trust, but verify: on the importance of chemical structure curation in cheminformatics and QSAR modeling research. J. Chem. Inf. Model. 50(7), 1189–1204 (2010)
Franklin, M.J., Kossmann, D., Kraska, T., Ramesh, S., Xin, R.: CrowdDB: answering queries with crowdsourcing. In: ACM SIGMOD, Athens, Greece, 12–16 June, pp. 61–72 (2011)
Galland, A., Abiteboul, S., Marian, A., Senellart, P.: Corroborating information from disagreeing views. In: WSDM, New York, 4–6 February, pp. 131–140 (2010)
Gama-Castro, S., Rinaldi, F., López-Fuentes, A., Balderas-Martínez, Y.I., Clematide, S., Ellendorff, T.R., Santos-Zavaleta, A., Marques-Madeira, H., Collado-Vides, J.: Assisted curation of regulatory interactions and growth conditions of OxyR in E. coli K-12. Database (2014). doi:10.1093/database/bau049
Goodspeed, R., Spanring, C., Reardon, T.: Crowdsourcing as data sharing: a regional web-based real estate development database. In: ICEGOV, NY, USA, 22–25 October, pp. 460–463 (2012)
Hirschman, L., Fort, K., Boué, S., Kyrpides, N., Dogan, R.I., Cohen, K.B.: Crowdsourcing and curation: perspectives from biology and natural language processing. Database (2016). doi:10.1093/database/baw115
Jacquin, T., Fambon, O., Chidlovskii, B.: A web-based document harmonization and annotation chain: from PDF to RDF. In: ACM Symposium on Document Engineering, pp. 225–226. ACM (2005)
Jamieson, D.G., Roberts, P.M., Robertson, D.L., Sidders, B., Nenadic, G.: Cataloging the biomedical world of pain through semi-automated curation of molecular interactions. Database (2013). doi:10.1093/database/bat033
Jamil, H.M., Sadri, F.: Recognizing credible experts in inaccurate databases. In: Proceedings of the 8th International Symposium on Methodologies for Intelligent Systems, ISMIS ’94, Charlotte, North Carolina, USA, 16–19 October, pp. 46–55 (1994)
Joseph, T., Saipradeep, V.G., Kotte, S., Rao, A., Srinivasan, R.: Plugin for concept-assisted search and navigation on PubMed. In: IEEE BIBM, Washington, DC, USA, 9–12 November, pp. 1712–1714 (2015)
Kalathur, R.K.R., Pinto, J.P., Hernández-Prieto, M.A., Machado, R.S.R., Almeida, D., Chaurasia, G., Futschik, M.E.: UniHI 7: an enhanced database for retrieval and interactive analysis of human molecular interaction networks. Nucleic Acids Res. 42(Database–Issue), 408–414 (2014)
Kamar, E., Kapoor, A., Horvitz, E.: Identifying and accounting for task-dependent bias in crowdsourcing. In: AAAI HCOMP, 8–11 November, San Diego, CA, pp. 92–101 (2015)
Kanehisa, M., Araki, M., Goto, S., Hattori, M., Hirakawa, M., Itoh, M., Katayama, T., Kawashima, S., Okuda, S., Tokimatsu, T., Yamanishi, Y.: KEGG for linking genomes to life and the environment. NAR 36(Database–Issue), 480–484 (2008)
Karp, P.D.: Crowd-sourcing and author submission as alternatives to professional curation. Database (2016). doi:10.1093/database/baw149
Kazemi, L., Shahabi, C., Chen, L.: GeoTruCrowd: trustworthy query answering with spatial crowdsourcing. In: SIGSPATIAL, Orlando, FL, 5–8 November, pp. 304–313 (2013)
Keseler, I.M., Skrzypek, M., Weerasinghe, D., Chen, A.Y., Fulcher, C., Li, G.-W., Lemmer, K.C., Mladinich, K.M., Chow, E.D., Sherlock, G., Karp, P.D.: Curation accuracy of model organism databases. Database (2014). doi:10.1093/database/bau058
Khare, R., Burger, J.D., Aberdeen, J.S., Tresner-Kirsch, D., Corrales, T.J., Hirschman, L., Lu, Z.: Scaling drug indication curation through crowdsourcing. Database (2015). doi:10.1093/database/bav016
Kifer, M., Li, A.: On the semantics of rule-based expert systems with uncertainty. In: ICDT, pp. 102–117 (1988)
Kim, S., Islamaj Dogan, R., Chatr-Aryamontri, A., Chang, C.S., Oughtred, R., Rust, J., Batista-Navarro, R., Carter, J., Ananiadou, S., Matos, S., Santos, A., Campos, D., Oliveira, J.L., Singh, O., Jonnagaddala, J., Dai, H.-J., Su, E.C.-Y., Chang, Y.-C., Su, Y.-C., Chu, C.-H., Chen, C.C., Hsu, W.-L., Peng, Y., Arighi, C., Wu, C.H., Vijay-Shanker, K., Aydin, F., Hüsünbeyi, Z.M., Özgür, A., Shin, S.-Y., Kwon, D., Dolinski, K., Tyers, M., Wilbur, W.J., Comeau, D.C.: BioCreative V BioC track overview: collaborative biocurator assistant task for BioGRID. Database (2016). doi:10.1093/database/baw121
Kostakos, V.: Is the crowd’s wisdom biased? A quantitative analysis of three online communities. In: IEEE CSE, Vancouver, BC, Canada, 29–31 August, pp. 251–255 (2009)
Krallinger, M., Leitner, F., Rodriguez-Penagos, C., Valencia, A.: Overview of the protein–protein interaction annotation extraction task of BioCreative II. Genome Biol 9(Suppl 2), S4 (2008)
Krallinger, M., Vazquez, M., Leitner, F., Salgado, D., Aryamontri, A.C., Winter, A., Perfetto, L., Briganti, L., Licata, L., Iannuccelli, M., Castagnoli, L., Cesareni, G., Tyers, M., Schneider, G., Rinaldi, F., Leaman, R., Gonzalez, G., Matos, S., Kim, S., Wilbur, W., Rocha, L., Shatkay, H., Tendulkar, A., Agarwal, S., Liu, F., Wang, X., Rak, R., Noto, K., Elkan, C., Lu, Z.: The protein–protein interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text. BMC Bioinform. 12(Suppl 8), S3 (2011)
Kuperstein, I., Cohen, D.P.A., Pook, S., Viara, E., Calzone, L., Barillot, E., Zinovyev, A.: NaviCell: a web-based environment for navigation, curation and maintenance of large molecular interaction maps. BMC Syst. Biol. 7, 100 (2013)
Kwon, D., Kim, S., Shin, S., Chatr-aryamontri, A., Wilbur, W.J.: Assisting manual literature curation for protein–protein interactions using BioQRator. Database (2014). doi:10.1093/database/bau067
Lakshmanan, L.V.S., Shiri, N.: A parametric approach to deductive databases with uncertainty. IEEE Trans. Knowl. Data Eng. 13(4), 554–570 (2001)
Li, F., Jagadish, H.V.: Understanding natural language queries over relational databases. SIGMOD Rec. 45(1), 6–13 (2016)
Liu, W., Laulederkind, S.J.F., Hayman, G.T., Wang, S.-J., Nigam, R., Smith, J.R., De Pons, J., Dwinell, M.R., Shimoyama, M.: Ontomate: a text-mining tool aiding curation at the rat genome database. Database (2015). doi:10.1093/database/bau129
Lofi, C., Maarry, K.E., Balke, W.: Skyline queries in crowd-enabled databases. In: Joint EDBT/ICDT Conferences, Genoa, Italy, 18–22 March, pp. 465–476 (2013)
Mallory, E.K., Zhang, C., Ré, C., Altman, R.B.: Large-scale extraction of gene interactions from full-text literature using deepdive. Bioinformatics 32(1), 106–113 (2016)
Marcus, A., Wu, E., Madden, S., Miller, R.C.: Crowdsourced databases: query processing with people. In: Biennial Innovative Data Systems Research Conference, Asilomar, CA, USA, 9–12 January, pp. 211–214 (2011)
Mazloom, A.R., Dannenfelser, R., Clark, N.R., Grigoryan, A.V., Linder, K.M., Cardozo, T.J., Bond, J.C., Boran, A.D.W., Iyengar, R., Malovannaya, A., Lanz, R.B., Ma’ayan, A.: Recovering protein–protein and domain-domain interactions from aggregation of ip-ms proteomics of coregulator complexes. PLOS Comput. Biol. 7(12), 1–10 (2011)
McDowall, M.D., Scott, M.S., Barton, G.J.: PIPs: human protein–protein interaction prediction database. NAR 37(suppl 1), D651–D656 (2009)
Mehla, J., Caufield, J.H., Uetz, P.: Mapping protein–protein interactions using yeast two-hybrid assays. Cold Spring Harb. Protoc. 5, 2015 (2015)
Moal, I.H., Jiménez-García, B., Fernández-Recio, J.: CCharPPI web server: computational characterization of protein–protein interactions from structure. Bioinformatics 31(1), 123–125 (2015)
Mou, X., Jamil, H.M., Ma, X.: Visflow: A visual database integration and workflow querying system. In: Proceedings of the 33rd International Conference on Data Engineering, ICDE 2017, San Diego, CA, USA, 19–22 April, pp. 1421–1422 (2017)
Mou, X., Jamil, H.M., Rinker, R.: Visual orchestration and autonomous execution of distributed and heterogeneous computational biology pipelines. In: IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2016, Shenzhen, China, 15–18 December, pp. 752–757 (2016)
Mou, X., Jamil, H.M., Rinker, R.: Implementing computational biology pipelines using visflow. Int. J. Data Min. Bioinform. 17(2), 115–131 (2017)
Murali, T., Pacifico, S., Yu, J., Guest, S., Roberts, G.G., Finley Jr., R.L.: DroID 2011: a comprehensive, integrated resource for protein, transcription factor, RNA and gene interactions for drosophila. Nucleic Acids Res. 39(suppl 1), D736–D743 (2011)
Nakatsu, R.T., Iacovou, C.L.: An investigation of user interface features of crowdsourcing applications. In: HCI International, Crete, Greece, 22–27 June, pp. 410–418 (2014)
Park, H., Widom, J.: CrowdFill: collecting structured data from the crowd. In: ACM SIGMOD, Snowbird, UT, USA, 22–27 June, pp. 577–588 (2014)
PDF Annotator: https://www.pdfannotator.com/en/. Accessed 24 June 2017
PDF Editor: https://pdf.iskysoft.com/edit-pdf/highlight-pdf-mac.html. Accessed 24 June 2017
Peng, J., Liu, Q., Ihler, A., Berger, B.: Crowdsourcing for structured labeling with applications to protein folding. In: Proceedings of the Machine Learning Meets Crowdsourcing Workshop, ICML (2013)
Peng, W., Wang, J., Cai, J., Chen, L., Li, M., Wu, F.-X.: Improving protein function prediction using domain and protein complexes in ppi networks. BMC Syst. Biol. 8(1), 1–13 (2014)
Perkel, J.M.: Annotating the scholarly web. Nature 528(7580), 153–154 (2015)
Pochampally, R., Sarma, A.D., Dong, X.L., Meliou, A., Srivastava, D.: Fusing data with correlations. In: SIGMOD, Snowbird, UT, USA, 22–27 June, pp. 433–444 (2014)
Powley, B., Dale, R., Anisimoff, I.: Enriching a document collection by integrating information extraction and PDF annotation. In: DRR, SPIE Proceedings, vol. 7247, p. 724707. SPIE (2009)
Rahmanian, B., Davis, J.G.: User interface design for crowdsourcing systems. In: AVI, Como, Italy, 27–29 May, pp. 405–408 (2014)
Raja, K., Subramani, S., Natarajan, J.: PPInterFinder—a mining tool for extracting causal relations on human proteins from literature. Database (2013). doi:10.1093/database/bas052
Ramanath, R., Choudhury, M., Bali, K., Roy, R.S.: Crowd prefers the middle path: a new IAA metric for crowdsourcing reveals turker biases in query segmentation. In: ACL, 4–9 August, Sofia, Bulgaria, Volume 1: Long Papers, pp. 1713–1722 (2013)
Roberts, R.J., Varmus, H.E., Ashburner, M., Brown, P.O., Eisen, M.B., Khosla, C., Kirschner, M., Nusse, R., Scott, M., Wold, B.: Building a “GenBank” of the published literature. Science 291(5512), 2318–2319 (2001)
Rodriguez-Esteban, R.: Biocuration with insufficient resources and fixed timelines. In: Biocuration, Geneva, Switzerland, 10–14 April, Oral presentation (2016)
Rogstadius, J., Vukovic, M., Teixeira, C.A., Kostakos, V., Karapanos, E., Laredo, J.: CrisisTracker: Crowdsourced social media curation for disaster awareness. IBM J. Res. Dev. 57(5), 1–4 (2013)
Sadri, F.: Modeling uncertainty in databases. In: ICDE, pp. 122–131 (1991)
Sadri, F.: On the foundations of probabilistic information integration. In: CIKM, Maui, HI, USA, October 29–November 2, pp. 882–891 (2012)
Sadri, F.: Reliability of answers to queries in relational databases. IEEE Trans. Knowl. Data Eng. 3(2), 245–251 (1991)
Sadri, F.: Aggregate operations in the information source tracking method. Theor. Comput. Sci. 133(2), 421–442 (1994)
Sadri, F.: Information source tracking method: efficiency issues. IEEE Trans. Knowl. Data Eng. 7(6), 947–954 (1995)
Sadri, F.: Integrity constraints in the information source tracking method. IEEE Trans. Knowl. Data Eng. 7(1), 106–119 (1995)
Sarjant, S., Legg, C., Stannett, M., Willcock, D.: Crowd-sourcing ontology content and curation: the massive ontology interface. In: FOIS, Rio de Janeiro, Brazil, 22–25 September, pp. 251–260 (2014)
Sevimoglu, T., Arga, K.Y.: The role of protein interaction networks in systems biomedicine. Comput. Struct. Biotechnol. J. 11(18), 22–27 (2014)
Shakarian, P., Parker, A., Simari, G.I., Subrahmanian, V.S.: Annotated probabilistic temporal logic. ACM Trans. Comput. Log. 12(2), 14 (2011)
Subramani, S., Kalpana, R., Monickaraj, P.M., Natarajan, J.: HPIminer: a text mining system for building and visualizing human protein interaction networks and pathways. J. Biomed. Inform. 54, 121–131 (2015)
Suter, B., Zhang, X., Pesce, C.G., Mendelsohn, A.R., Dinesh-Kumar, S.P., Mao, J.-H.H.: Next-generation sequencing for binary protein–protein interactions. Front. Genet. 6, 346 (2015)
Szklarczyk, D., Franceschini, A., Wyder, S., Forslund, K., Heller, D., Huerta-Cepas, J., Simonovic, M., Roth, A., Santos, A., Tsafou, K.P., Kuhn, M., Bork, P., Jensen, L.J., von Mering, C.: STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43(D1), D447–D452 (2015)
Takis, J., Islam, A.Q.M.S., Lange, C., Auer, S.: Crowdsourced semantic annotation of scientific publications and tabular data in PDF. In: SEMANTICS, pp. 1–8. ACM (2015)
Tastan, O., Qi, Y., Carbonell, J.G., Klein-Seetharaman, J.: Refining Literature Curated Protein Interactions Using Expert Opinions, pp. 318–329. World Scientific, Singapore (2014)
Thomas, P., Starlinger, J., Vowinkel, A., Arzt, S., Leser, U.: GeneView: a comprehensive semantic search engine for PubMed. Nucleic Acids Res. 40(W1), W585–W591 (2012)
Turinsky, A.L., Razick, S., Turner, B., Donaldson, I.M., Wodak, S.J.: Literature curation of protein interactions: measuring agreement across major public databases. Database (2010). doi:10.1093/database/baq026
U. of Washington: FoldIt: Solve Puzzles for Science. https://fold.it/portal/. Accessed 14 September 2016
U. of Washington: Play FoldIt: Games for Change. http://www.gamesforchange.org/play/foldit/. Accessed 14 September 2016
Vasilescu, J., Figeys, D.: Mapping protein–protein interactions by mass spectrometry. Curr. Opin. Biotechnol. 17(4), 394–399 (2006)
Wang, H., Ganapathiraju, M.K.: Evaluation of protein–protein interaction predictors with noisy partially labeled data sets. CoRR, abs/1509.05742 (2015)
Wang, P.: The Scientist in Us All: How crowdsourcing in science is changing the world. http://yalescientific.org/thescope/2016/04/the-scientist-in-us-all-how-crowdsourcing-in-science-is-changing-the-world/ (2016). Accessed 6 September 2016
Wang, Z., Clark, N.R., Ma’ayan, A.: Dynamics of the discovery process of protein–protein interactions from low content studies. BMC Syst. Biol. 9, 26 (2015)
Xie, S., Hu, Q., Zhang, J., Gao, J., Fan, W., Yu, P.S.: Robust crowd bias correction via dual knowledge transfer from multiple overlapping sources. In: IEEE International Conference on Big Data, CA, USA, October 29–November 1, pp. 815–820 (2015)
Zadeh, L.A.: Knowledge representation in fuzzy logic. IEEE Trans. Knowl. Data Eng. 1(1), 89–100 (1989)
Zhang, Y., Lin, H., Yang, Z., Wang, J.: Integrating experimental and literature protein–protein interaction data for protein complex prediction. BMC Genom. 16(S–2), S4 (2015)

Acknowledgements

The current prototype of CrowdCure was implemented by Xin Mou. The authors gratefully acknowledge his contributions, including his help with many of the investigations into tool choices.

Author information

Authors and Affiliations

Department of Computer Science, University of Idaho, JEB Room 236, 875 Perimeter Drive, Moscow, ID, 83844-1010, USA
Hasan M. Jamil
Department of Computer Science, University of North Carolina at Greensboro, 156 Petty Building, Greensboro, NC, 27402, USA
Fereidoon Sadri

Corresponding author

Correspondence to Hasan M. Jamil.

About this article

Cite this article

Jamil, H.M., Sadri, F. Crowd enabled curation and querying of large and noisy text mined protein interaction data. Distrib Parallel Databases 36, 9–45 (2018). https://doi.org/10.1007/s10619-017-7209-x

Published: 04 October 2017
Issue Date: March 2018
DOI: https://doi.org/10.1007/s10619-017-7209-x
