Crowd enabled curation and querying of large and noisy text mined protein interaction data

Distributed and Parallel Databases

Abstract

The abundance of mined, predicted and uncertain biological data warrants massive, efficient and scalable curation efforts. The human expertise required for any successful curation enterprise is often economically prohibitive, especially for speculative end-user queries that may ultimately not bear fruit. The challenge, therefore, is to devise a low-cost engine capable of delivering fast but tentative annotation and curation of a set of data items that experts can later validate authoritatively at a significantly smaller cost. The aim is thus to make a large volume of predicted data available for use as early as possible, with an acceptable degree of confidence in its accuracy, while curation continues. In this paper, we present a novel approach to the annotation and curation of biological database contents using crowd computing. The technical contribution lies in identifying and managing the trustworthiness of mechanical turks and in supporting ad hoc declarative queries, both of which are leveraged to enable reliable analytics over noisy predicted interactions. While the proposed approach and the CrowdCure system are designed for curating literature-mined protein-protein interaction data, they are amenable to substantial generalization.

Notes

  1. Playing FoldIt [95], young gamers discovered in three weeks the structure of a retroviral enzyme related to HIV, whose configuration had remained a puzzle for more than a decade [96].

  2. It is often believed that biases also arise from data sparseness in high-throughput PPI data collection techniques. For example, mass-spectrometry proteomics is known to be biased toward detecting large, abundant or sticky proteins [56].

  3. By no means do we preclude the possibility of inserting ground data into the tables using traditional database operations such as INSERT or UPDATE for subsequent crowd curation.

  4. An alternative is to separate mining from curation by decoupling the USING ON clause from the SOURCE BEFORE clause. This would allow importing data from other sources and initiating curation as a distinct step.

  5. Turks may opt out of any assignment while remaining registered.

  6. In the current edition of CrowdCure, we do not allow data queries to list the assigned curators or their votes (the source vector) for users to view; this can be enabled in a later version if the need arises. We remain undecided on this choice.
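Note 6 calls the assigned curators and their votes on an item its source vector. This page does not describe how CrowdCure turns such votes into a confidence value, so the following is only a hypothetical sketch of one common approach, a reliability-weighted vote. The names `Vote` and `weighted_confidence`, the [0, 1] reliability scale and the neutral default weight of 0.5 are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    curator: str   # hypothetical curator identifier
    accept: bool   # True if the curator judged the interaction valid


def weighted_confidence(votes: list[Vote], reliability: dict[str, float]) -> float:
    """Return the reliability-weighted fraction of 'accept' votes.

    Assumes reliability scores lie in [0, 1]; curators missing from the
    map get a neutral weight of 0.5 (an illustrative assumption).
    """
    total = 0.0
    accepted = 0.0
    for v in votes:
        w = reliability.get(v.curator, 0.5)
        total += w
        if v.accept:
            accepted += w
    # With no votes (or only zero-weight votes) there is no evidence either way.
    return accepted / total if total > 0 else 0.5


votes = [Vote("c1", True), Vote("c2", True), Vote("c3", False)]
reliability = {"c1": 0.9, "c2": 0.6, "c3": 0.3}
print(weighted_confidence(votes, reliability))
```

Here the two accepting curators carry a combined weight of 1.5 out of 1.8, so the tentative confidence is about 0.83, whereas a plain majority count would give 2/3.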

  7. Figure 13 shows a better annotation possibility using the EverMap annotator, to which we currently do not have unrestricted access but which we hope to offer in the future. Using its AutoBookmark plug-in for Acrobat, EverMap can highlight a large set of keywords, each in a distinct color, for better differentiation. However, we need a still better annotator that can pinpoint the relevant sentences, as shown in Fig. 10, and export only the relevant parts of the document for provenance.

  8. Available at http://www.jbc.org/content/278/4/2388.long.

  9. Available at http://www.jbc.org/content/278/4/2388.full.pdf.

  10. While it is reasonable, and expected in CrowdCure, that curators at a higher level will be more expert and more reliable, CureQL leaves this choice to the users. One way to ensure an increasingly credible curator hierarchy is to accept only queries that select such hierarchies; that is, we could test that no curator level contains an expert less reliable than any curator in a lower stratum.
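The test described in Note 10 can be checked mechanically. The sketch below is a hypothetical illustration, not CureQL's actual implementation: it assumes a hierarchy is given as a list of strata ordered from lowest to highest, each holding the numeric reliability scores of its curators, and the function name `is_credible_hierarchy` is invented for this example. A hierarchy is accepted only if every curator's score is at least the highest score found in any lower stratum.

```python
def is_credible_hierarchy(levels: list[list[float]]) -> bool:
    """Check that no stratum holds a curator less reliable than anyone below it.

    levels[0] is the lowest stratum; each inner list holds the (assumed)
    reliability scores of the curators at that level.
    """
    best_below = float("-inf")  # highest reliability seen in lower strata so far
    for level in levels:
        if not level:
            continue  # an empty stratum imposes no constraint
        if min(level) < best_below:
            return False  # some curator here is less reliable than one below
        best_below = max(best_below, max(level))
    return True


print(is_credible_hierarchy([[0.2, 0.4], [0.5, 0.7], [0.9]]))  # increasing strata
print(is_credible_hierarchy([[0.2, 0.6], [0.5, 0.9]]))         # 0.5 sits above 0.6
```

Tracking only the running maximum of the lower strata keeps the check linear in the number of curators, rather than comparing every pair of levels.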

References

  1. Abekawa, T., Aizawa, A.: Sidenoter: scholarly paper browsing system based on PDF restructuring and text annotation. In: COLING (Demos), pp. 136–140. ACL (2016)

  2. Alagar, V.S., Sadri, F., Said, J.N.: An extended relational model for managing uncertain information. In: DEXA, Workshop, pp. 257–266 (1995)

  3. Alagar, V.S., Sadri, F., Said, J.N.: Semantics of an extended relational model for managing uncertain information. In: CIKM, pp. 234–240 (1995)

  4. Alex, B., Grover, C., Haddow, B., Kabadjor, M., Klein, E., Matthews, M., Roebuck, S., Tobin, R., Wang, X.: Assisted curation: does text mining really help? In: Biocomputing 2008, Proceedings of the Pacific Symposium, Kohala Coast, Hawaii, USA, 4–8 January 2008, pp. 556–567 (2008)

  5. Alonso, O., Marshall, C.C., Najork, M.A.: A human-centered framework for ensuring reliability on crowdsourced labeling tasks. In: Human Computation and Crowdsourcing: Works in Progress and Demonstration Abstracts, An Adjunct to the Proceedings of the First AAAI Conference on Human Computation and Crowdsourcing, 7–9 November, Palm Springs, CA, USA (2013)

  6. Antony, A., Basetty, S., Hartanto, S., Palakal, M.J.: Computational approach to biological validation of protein–protein interactions discovered using literature mining. In: ACM SAC, Fortaleza, Ceara, Brazil, 16–20 March, pp. 1302–1306 (2008)

  7. Askalidis, G., Stoddard, G.: A theoretical analysis of crowdsourced content curation. In: Workshop on Social Computing and User Generated Content (2013)

  8. Attrill, H., Falls, K., Goodman, J.L., Millburn, G.H., Antonazzo, G., Rey, A.J., Marygold, S.J., the FlyBase consortium: FlyBase: establishing a gene group resource for Drosophila melanogaster. Nucleic Acids Res. 44(D1), D786–D792 (2016)

  9. Bhaskar, P., Buzzi, M., Geraci, F., Pellegrini, M.: From literature to knowledge: exploiting PubMed to answer biomedical questions in natural language. In: ITBAM, Spain, 3–4 September, pp. 3–15 (2015)

  10. BlueBeam: https://www.bluebeam.com/us/products/revu/search.asp. Accessed 24 June 2017

  11. Bozzon, A., Brambilla, M., Ceri, S., Silvestri, M., Vesci, G.: Choosing the right crowd: expert finding in social networks. In: Joint EDBT/ICDT Conferences, Genoa, Italy, 18–22 March, pp. 637–648 (2013)

  12. Breitkreutz, B.-J., Stark, C., Reguly, T., Boucher, L., Breitkreutz, A., Livstone, M., Oughtred, R., Lackner, D.H., Bähler, J., Wood, V., Dolinski, K., Tyers, M.: The BioGRID interaction database: 2008 update. Nucleic Acids Res. 36, D637–D640 (2008)

  13. Budescu, D.V., Chen, E.: Identifying expertise to extract the wisdom of crowds. Manag. Sci. 61(2), 267–280 (2015)

  14. Burger, J.D., Doughty, E., Khare, R., Wei, C., Mishra, R., Aberdeen, J.S., Tresner-Kirsch, D., Wellner, B., Kann, M.G., Lu, Z., Hirschman, L.: Hybrid curation of gene-mutation relations combining automated extraction and crowdsourcing. Database (2014). doi:10.1093/database/bau094

  15. Cao, D., Xiao, N., Xu, Q., Chen, A.F.: Rcpi: R/Bioconductor package to generate various descriptors of proteins, compounds and their interactions. Bioinformatics 31(2), 279–281 (2015)

  16. Chen, H., Sharp, B.M.: Content-rich biological network constructed by mining PubMed abstracts. BMC Bioinform. 5(1), 1–13 (2004)

  17. Cooper, S., Khatib, F., Makedon, I., Lü, H., Barbero, J., Baker, D., Fogarty, J., Popovic, Z., Players, F.: Analysis of social gameplay macros in the FoldIt cookbook. In: Foundations of Digital Games, FDG’11, Bordeaux, France, June 28–July 1, pp. 9–14 (2011)

  18. Crescenzi, V., Merialdo, P., Qiu, D.: Crowdsourcing large scale wrapper inference. Distrib. Parallel Databases 33(1), 95–122 (2015)

  19. Cusick, M.E., Yu, H., Smolyar, A., Venkatesan, K., Carvunis, A.-R., Simonis, N., Rual, J.-F., Borick, H., Braun, P., Dreze, M., Vandenhaute, J., Galli, M., Yazaki, J., Hill, D.E., Ecker, J.R., Roth, F.P., Vidal, M.: Literature-curated protein interaction datasets. Nat. Methods 6(1), 39–46 (2009)

  20. Dalvi, N.N., Suciu, D.: Efficient query evaluation on probabilistic databases. In: VLDB, Toronto, Canada, August 31–September 3, pp. 864–875 (2004)

  21. Dalvi, N.N., Suciu, D.: Management of probabilistic data: foundations and challenges. In: PODS, pp. 1–12 (2007)

  22. Davis, A.P., Wiegers, T.C., Roberts, P.M., King, B.L., Lay, J.M., Lennon-Hopkins, K., Sciaky, D., Johnson, R.J., Keating, H., Greene, N., Hernandez, R., McConnell, K.J., Enayetallah, A., Mattingly, C.J.: A CTD-Pfizer collaboration: manual curation of 88,000 scientific articles text mined for drug-disease and drug-phenotype interactions. Database (2013). doi:10.1093/database/bat080

  23. Difallah, D.E., Demartini, G., Cudré-Mauroux, P.: Pick-a-Crowd: tell me what you like, and I’ll tell you what to do. In: Proceedings of the 22nd International World Wide Web Conference, WWW ’13, Rio de Janeiro, Brazil, 13–17 May, pp. 367–374 (2013)

  24. EverMap: https://www.evermap.com/HighlightText.asp. Accessed 24 June 2017

  25. Fabregat, A., Sidiropoulos, K., Garapati, P., Gillespie, M., Hausmann, K., Haw, R., Jassal, B., Jupe, S., Korninger, F., McKay, S., Matthews, L., May, B., Milacic, M., Rothfels, K., Shamovsky, V., Webber, M., Weiser, J., Williams, M., Wu, G., Stein, L., Hermjakob, H., D’Eustachio, P.: The reactome pathway knowledgebase. Nucleic Acids Res. 44(D1), D481–D487 (2016)

  26. Fourches, D., Muratov, E.N., Tropsha, A.: Trust, but verify: on the importance of chemical structure curation in cheminformatics and QSAR modeling research. J. Chem. Inf. Model. 50(7), 1189–1204 (2010)

  27. Franklin, M.J., Kossmann, D., Kraska, T., Ramesh, S., Xin, R.: CrowdDB: answering queries with crowdsourcing. In: ACM SIGMOD, Athens, Greece, 12–16 June, pp. 61–72 (2011)

  28. Galland, A., Abiteboul, S., Marian, A., Senellart, P.: Corroborating information from disagreeing views. In: WSDM, New York, 4–6 February, pp. 131–140 (2010)

  29. Gama-Castro, S., Rinaldi, F., López-Fuentes, A., Balderas-Martínez, Y.I., Clematide, S., Ellendorff, T.R., Santos-Zavaleta, A., Marques-Madeira, H., Collado-Vides, J.: Assisted curation of regulatory interactions and growth conditions of OxyR in E. coli K-12. Database (2014). doi:10.1093/database/bau049

  30. Goodspeed, R., Spanring, C., Reardon, T.: Crowdsourcing as data sharing: a regional web-based real estate development database. In: ICEGOV, NY, USA, 22–25 October, pp. 460–463 (2012)

  31. Hirschman, L., Fort, K., Boué, S., Kyrpides, N., Dogan, R.I., Cohen, K.B.: Crowdsourcing and curation: perspectives from biology and natural language processing. Database (2016). doi:10.1093/database/baw115

  32. Jacquin, T., Fambon, O., Chidlovskii, B.: A web-based document harmonization and annotation chain: from PDF to RDF. In: ACM Symposium on Document Engineering, pp. 225–226. ACM (2005)

  33. Jamieson, D.G., Roberts, P.M., Robertson, D.L., Sidders, B., Nenadic, G.: Cataloging the biomedical world of pain through semi-automated curation of molecular interactions. Database (2013). doi:10.1093/database/bat033

  34. Jamil, H.M., Sadri, F.: Recognizing credible experts in inaccurate databases. In: Proceedings of the 8th International Symposium on Methodologies for Intelligent Systems, ISMIS ’94, Charlotte, North Carolina, USA, 16–19 October, pp. 46–55 (1994)

  35. Joseph, T., Saipradeep, V.G., Kotte, S., Rao, A., Srinivasan, R.: Plugin for concept-assisted search and navigation on PubMed. In: IEEE BIBM, Washington, DC, USA, 9–12 November, pp. 1712–1714 (2015)

  36. Kalathur, R.K.R., Pinto, J.P., Hernández-Prieto, M.A., Machado, R.S.R., Almeida, D., Chaurasia, G., Futschik, M.E.: UniHI 7: an enhanced database for retrieval and interactive analysis of human molecular interaction networks. Nucleic Acids Res. 42(Database–Issue), 408–414 (2014)

  37. Kamar, E., Kapoor, A., Horvitz, E.: Identifying and accounting for task-dependent bias in crowdsourcing. In: AAAI HCOMP, 8–11 November, San Diego, CA, pp. 92–101 (2015)

  38. Kanehisa, M., Araki, M., Goto, S., Hattori, M., Hirakawa, M., Itoh, M., Katayama, T., Kawashima, S., Okuda, S., Tokimatsu, T., Yamanishi, Y.: KEGG for linking genomes to life and the environment. Nucleic Acids Res. 36(Database–Issue), 480–484 (2008)

  39. Karp, P.D.: Crowd-sourcing and author submission as alternatives to professional curation. Database (2016). doi:10.1093/database/baw149

  40. Kazemi, L., Shahabi, C., Chen, L.: GeoTruCrowd: trustworthy query answering with spatial crowdsourcing. In: SIGSPATIAL, Orlando, FL, 5–8 November, pp. 304–313 (2013)

  41. Keseler, I.M., Skrzypek, M., Weerasinghe, D., Chen, A.Y., Fulcher, C., Li, G.-W., Lemmer, K.C., Mladinich, K.M., Chow, E.D., Sherlock, G., Karp, P.D.: Curation accuracy of model organism databases. Database (2014). doi:10.1093/database/bau058

  42. Khare, R., Burger, J.D., Aberdeen, J.S., Tresner-Kirsch, D., Corrales, T.J., Hirschman, L., Lu, Z.: Scaling drug indication curation through crowdsourcing. Database (2015). doi:10.1093/database/bav016

  43. Kifer, M., Li, A.: On the semantics of rule-based expert systems with uncertainty. In: ICDT, pp. 102–117 (1988)

  44. Kim, S., Islamaj Dogan, R., Chatr-Aryamontri, A., Chang, C.S., Oughtred, R., Rust, J., Batista-Navarro, R., Carter, J., Ananiadou, S., Matos, S., Santos, A., Campos, D., Oliveira, J.L., Singh, O., Jonnagaddala, J., Dai, H.-J., Su, E.C.-Y., Chang, Y.-C., Su, Y.-C., Chu, C.-H., Chen, C.C., Hsu, W.-L., Peng, Y., Arighi, C., Wu, C.H., Vijay-Shanker, K., Aydin, F., Hüsünbeyi, Z.M., Özgür, A., Shin, S.-Y., Kwon, D., Dolinski, K., Tyers, M., Wilbur, W.J., Comeau, D.C.: BioCreative V BioC track overview: collaborative biocurator assistant task for BioGRID. Database (2016). doi:10.1093/database/baw121

  45. Kostakos, V.: Is the crowd’s wisdom biased? A quantitative analysis of three online communities. In: IEEE CSE, Vancouver, BC, Canada, 29–31 August, pp. 251–255 (2009)

  46. Krallinger, M., Leitner, F., Rodriguez-Penagos, C., Valencia, A.: Overview of the protein–protein interaction annotation extraction task of BioCreative II. Genome Biol 9(Suppl 2), S4 (2008)

  47. Krallinger, M., Vazquez, M., Leitner, F., Salgado, D., Aryamontri, A.C., Winter, A., Perfetto, L., Briganti, L., Licata, L., Iannuccelli, M., Castagnoli, L., Cesareni, G., Tyers, M., Schneider, G., Rinaldi, F., Leaman, R., Gonzalez, G., Matos, S., Kim, S., Wilbur, W., Rocha, L., Shatkay, H., Tendulkar, A., Agarwal, S., Liu, F., Wang, X., Rak, R., Noto, K., Elkan, C., Lu, Z.: The protein–protein interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text. BMC Bioinform. 12(Suppl 8), S3 (2011)

  48. Kuperstein, I., Cohen, D.P.A., Pook, S., Viara, E., Calzone, L., Barillot, E., Zinovyev, A.: NaviCell: a web-based environment for navigation, curation and maintenance of large molecular interaction maps. BMC Syst. Biol. 7, 100 (2013)

  49. Kwon, D., Kim, S., Shin, S., Chatr-aryamontri, A., Wilbur, W.J.: Assisting manual literature curation for protein–protein interactions using BioQRator. Database (2014). doi:10.1093/database/bau067

  50. Lakshmanan, L.V.S., Shiri, N.: A parametric approach to deductive databases with uncertainty. IEEE Trans. Knowl. Data Eng. 13(4), 554–570 (2001)

  51. Li, F., Jagadish, H.V.: Understanding natural language queries over relational databases. SIGMOD Rec. 45(1), 6–13 (2016)

  52. Liu, W., Laulederkind, S.J.F., Hayman, G.T., Wang, S.-J., Nigam, R., Smith, J.R., De Pons, J., Dwinell, M.R., Shimoyama, M.: OntoMate: a text-mining tool aiding curation at the Rat Genome Database. Database (2015). doi:10.1093/database/bau129

  53. Lofi, C., Maarry, K.E., Balke, W.: Skyline queries in crowd-enabled databases. In: Joint EDBT/ICDT Conferences, Genoa, Italy, 18–22 March, pp. 465–476 (2013)

  54. Mallory, E.K., Zhang, C., Ré, C., Altman, R.B.: Large-scale extraction of gene interactions from full-text literature using DeepDive. Bioinformatics 32(1), 106–113 (2016)

  55. Marcus, A., Wu, E., Madden, S., Miller, R.C.: Crowdsourced databases: query processing with people. In: Biennial Innovative Data Systems Research Conference, Asilomar, CA, USA, 9–12 January, pp. 211–214 (2011)

  56. Mazloom, A.R., Dannenfelser, R., Clark, N.R., Grigoryan, A.V., Linder, K.M., Cardozo, T.J., Bond, J.C., Boran, A.D.W., Iyengar, R., Malovannaya, A., Lanz, R.B., Ma’ayan, A.: Recovering protein–protein and domain-domain interactions from aggregation of IP-MS proteomics of coregulator complexes. PLOS Comput. Biol. 7(12), 1–10 (2011)

  57. McDowall, M.D., Scott, M.S., Barton, G.J.: PIPs: human protein–protein interaction prediction database. Nucleic Acids Res. 37(suppl 1), D651–D656 (2009)

  58. Mehla, J., Caufield, J.H., Uetz, P.: Mapping protein–protein interactions using yeast two-hybrid assays. Cold Spring Harb. Protoc. 5, 2015 (2015)

  59. Moal, I.H., Jiménez-García, B., Fernández-Recio, J.: CCharPPI web server: computational characterization of protein–protein interactions from structure. Bioinformatics 31(1), 123–125 (2015)

  60. Mou, X., Jamil, H.M., Ma, X.: VisFlow: a visual database integration and workflow querying system. In: Proceedings of the 33rd International Conference on Data Engineering, ICDE 2017, San Diego, CA, USA, 19–22 April, pp. 1421–1422 (2017)

  61. Mou, X., Jamil, H.M., Rinker, R.: Visual orchestration and autonomous execution of distributed and heterogeneous computational biology pipelines. In: IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2016, Shenzhen, China, 15–18 December, pp. 752–757 (2016)

  62. Mou, X., Jamil, H.M., Rinker, R.: Implementing computational biology pipelines using VisFlow. Int. J. Data Min. Bioinform. 17(2), 115–131 (2017)

  63. Murali, T., Pacifico, S., Yu, J., Guest, S., Roberts, G.G., Finley Jr., R.L.: DroID 2011: a comprehensive, integrated resource for protein, transcription factor, RNA and gene interactions for Drosophila. Nucleic Acids Res. 39(suppl 1), D736–D743 (2011)

  64. Nakatsu, R.T., Iacovou, C.L.: An investigation of user interface features of crowdsourcing applications. In: HCI International, Crete, Greece, 22–27 June, pp. 410–418 (2014)

  65. Park, H., Widom, J.: CrowdFill: collecting structured data from the crowd. In: ACM SIGMOD, Snowbird, UT, USA, 22–27 June, pp. 577–588 (2014)

  66. PDF Annotator: https://www.pdfannotator.com/en/. Accessed 24 June 2017

  67. PDF Editor: https://pdf.iskysoft.com/edit-pdf/highlight-pdf-mac.html. Accessed 24 June 2017

  68. Peng, J., Liu, Q., Ihler, A., Berger, B.: Crowdsourcing for structured labeling with applications to protein folding. In: Proceedings of the Machine Learning Meets Crowdsourcing Workshop, ICML (2013)

  69. Peng, W., Wang, J., Cai, J., Chen, L., Li, M., Wu, F.-X.: Improving protein function prediction using domain and protein complexes in PPI networks. BMC Syst. Biol. 8(1), 1–13 (2014)

  70. Perkel, J.M.: Annotating the scholarly web. Nature 528(7580), 153–154 (2015)

  71. Pochampally, R., Sarma, A.D., Dong, X.L., Meliou, A., Srivastava, D.: Fusing data with correlations. In: SIGMOD, Snowbird, UT, USA, 22–27 June, pp. 433–444 (2014)

  72. Powley, B., Dale, R., Anisimoff, I.: Enriching a document collection by integrating information extraction and PDF annotation. In: DRR, SPIE Proceedings, vol. 7247, p. 724707. SPIE (2009)

  73. Rahmanian, B., Davis, J.G.: User interface design for crowdsourcing systems. In: AVI, Como, Italy, 27–29 May, pp. 405–408 (2014)

  74. Raja, K., Subramani, S., Natarajan, J.: PPInterFinder—a mining tool for extracting causal relations on human proteins from literature. Database (2013). doi:10.1093/database/bas052

  75. Ramanath, R., Choudhury, M., Bali, K., Roy, R.S.: Crowd prefers the middle path: a new IAA metric for crowdsourcing reveals turker biases in query segmentation. In: ACL, 4–9 August, Sofia, Bulgaria, Volume 1: Long Papers, pp. 1713–1722 (2013)

  76. Roberts, R.J., Varmus, H.E., Ashburner, M., Brown, P.O., Eisen, M.B., Khosla, C., Kirschner, M., Nusse, R., Scott, M., Wold, B.: Building a “GenBank” of the published literature. Science 291(5512), 2318–2319 (2001)

  77. Rodriguez-Esteban, R.: Biocuration with insufficient resources and fixed timelines. In: Biocuration, Geneva, Switzerland, 10–14 April, Oral presentation (2016)

  78. Rogstadius, J., Vukovic, M., Teixeira, C.A., Kostakos, V., Karapanos, E., Laredo, J.: CrisisTracker: Crowdsourced social media curation for disaster awareness. IBM J. Res. Dev. 57(5), 1–4 (2013)

  79. Sadri, F.: Modeling uncertainty in databases. In: ICDE, pp. 122–131 (1991)

  80. Sadri, F.: On the foundations of probabilistic information integration. In: CIKM, Maui, HI, USA, October 29–November 2, pp. 882–891 (2012)

  81. Sadri, F.: Reliability of answers to queries in relational databases. IEEE Trans. Knowl. Data Eng. 3(2), 245–251 (1991)

  82. Sadri, F.: Aggregate operations in the information source tracking method. Theor. Comput. Sci. 133(2), 421–442 (1994)

  83. Sadri, F.: Information source tracking method: efficiency issues. IEEE Trans. Knowl. Data Eng. 7(6), 947–954 (1995)

  84. Sadri, F.: Integrity constraints in the information source tracking method. IEEE Trans. Knowl. Data Eng. 7(1), 106–119 (1995)

  85. Sarjant, S., Legg, C., Stannett, M., Willcock, D.: Crowd-sourcing ontology content and curation: the massive ontology interface. In: FOIS, Rio de Janeiro, Brazil, 22–25 September, pp. 251–260 (2014)

  86. Sevimoglu, T., Arga, K.Y.: The role of protein interaction networks in systems biomedicine. Comput. Struct. Biotechnol. J. 11(18), 22–27 (2014)

  87. Shakarian, P., Parker, A., Simari, G.I., Subrahmanian, V.S.: Annotated probabilistic temporal logic. ACM Trans. Comput. Log. 12(2), 14 (2011)

  88. Subramani, S., Kalpana, R., Monickaraj, P.M., Natarajan, J.: HPIminer: a text mining system for building and visualizing human protein interaction networks and pathways. J. Biomed. Inform. 54, 121–131 (2015)

  89. Suter, B., Zhang, X., Pesce, C.G., Mendelsohn, A.R., Dinesh-Kumar, S.P., Mao, J.-H.H.: Next-generation sequencing for binary protein–protein interactions. Front. Genet. 6, 346 (2015)

  90. Szklarczyk, D., Franceschini, A., Wyder, S., Forslund, K., Heller, D., Huerta-Cepas, J., Simonovic, M., Roth, A., Santos, A., Tsafou, K.P., Kuhn, M., Bork, P., Jensen, L.J., von Mering, C.: STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43(D1), D447–D452 (2015)

  91. Takis, J., Islam, A.Q.M.S., Lange, C., Auer, S.: Crowdsourced semantic annotation of scientific publications and tabular data in PDF. In: SEMANTICS, pp. 1–8. ACM (2015)

  92. Tastan, O., Qi, Y., Carbonell, J.G., Klein-Seetharaman, J.: Refining literature curated protein interactions using expert opinions, pp. 318–329. World Scientific, Singapore (2014)

  93. Thomas, P., Starlinger, J., Vowinkel, A., Arzt, S., Leser, U.: GeneView: a comprehensive semantic search engine for PubMed. Nucleic Acids Res. 40(W1), W585–W591 (2012)

  94. Turinsky, A.L., Razick, S., Turner, B., Donaldson, I.M., Wodak, S.J.: Literature curation of protein interactions: measuring agreement across major public databases. Database (2010). doi:10.1093/database/baq026

  95. U. of Washington: FoldIt: Solve Puzzles for Science. https://fold.it/portal/. Accessed 14 September 2016

  96. U. of Washington: Play FoldIt: Games for Change. http://www.gamesforchange.org/play/foldit/. Accessed 14 September 2016

  97. Vasilescu, J., Figeys, D.: Mapping protein–protein interactions by mass spectrometry. Curr. Opin. Biotechnol. 17(4), 394–399 (2006)

  98. Wang, H., Ganapathiraju, M.K.: Evaluation of protein–protein interaction predictors with noisy partially labeled data sets. CoRR, abs/1509.05742 (2015)

  99. Wang, P.: The Scientist in Us All: How crowdsourcing in science is changing the world. http://yalescientific.org/thescope/2016/04/the-scientist-in-us-all-how-crowdsourcing-in-science-is-changing-the-world/ (2016). Accessed 6 September 2016

  100. Wang, Z., Clark, N.R., Ma’ayan, A.: Dynamics of the discovery process of protein–protein interactions from low content studies. BMC Syst. Biol. 9, 26 (2015)

  101. Xie, S., Hu, Q., Zhang, J., Gao, J., Fan, W., Yu, P.S.: Robust crowd bias correction via dual knowledge transfer from multiple overlapping sources. In: IEEE International Conference on Big Data, CA, USA, October 29–November 1, pp. 815–820 (2015)

  102. Zadeh, L.A.: Knowledge representation in fuzzy logic. IEEE Trans. Knowl. Data Eng. 1(1), 89–100 (1989)

  103. Zhang, Y., Lin, H., Yang, Z., Wang, J.: Integrating experimental and literature protein–protein interaction data for protein complex prediction. BMC Genom. 16(S–2), S4 (2015)

Acknowledgements

The current prototype of CrowdCure was implemented by Xin Mou. The authors gratefully acknowledge his contributions, including his help with many of the investigations into tool choices.

Author information

Correspondence to Hasan M. Jamil.

About this article

Cite this article

Jamil, H.M., Sadri, F. Crowd enabled curation and querying of large and noisy text mined protein interaction data. Distrib Parallel Databases 36, 9–45 (2018). https://doi.org/10.1007/s10619-017-7209-x

  • DOI: https://doi.org/10.1007/s10619-017-7209-x
