Abstract
The emergence of a large number of bioinformatics datasets on the Internet has created a need for flexible and efficient approaches to integrating information from multiple bioinformatics data sources and services. In this paper, we present our approach to automatically generating composition plans for web services, optimizing those plans, and executing them efficiently. While data integration techniques have been applied to the bioinformatics domain, the focus has been on answering specific user queries. In contrast, we focus on automatically generating parameterized integration plans that can be hosted as web services that respond to a range of inputs. In addition, we present two novel techniques that improve the execution time of the generated plans by reducing the number of requests to the existing data sources and by executing the generated plan more efficiently. The first optimization technique, called tuple-level filtering, analyzes the source/service descriptions in order to automatically insert filtering conditions into the composition plans, which results in fewer requests to the component web services. To ensure that the filtering conditions can be evaluated, this technique may add sensing operations to the integration plan. The savings due to filtering significantly exceed the cost of the sensing operations. The second optimization technique consists of mapping the integration plans into programs that can be executed by a dataflow-style, streaming execution engine. We use real-world bioinformatics web services to show experimentally that (1) our automatic composition techniques can efficiently generate parameterized plans that integrate data from large numbers of existing services and (2) our optimization techniques can significantly reduce the response time of the generated integration plans.
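The idea behind tuple-level filtering can be illustrated with a small sketch. This is not the authors' implementation; it is a toy model under stated assumptions: a hypothetical service whose description says it holds only human proteins, a hypothetical sensing operation that looks up a protein's organism, and made-up identifiers and data. The filter derived from the service description skips requests that could not return anything, while the answers stay the same.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical data: a service that, per its description, holds only human proteins.
HUMAN_PROTEINS: Dict[str, str] = {"P1": "kinase", "P3": "receptor"}
# Hypothetical lookup table backing the sensing operation.
ORGANISM_OF: Dict[str, str] = {"P1": "human", "P2": "mouse", "P3": "human", "P4": "yeast"}


class HumanProteinService:
    """Stand-in for a remote web service; counts the requests it receives."""

    def __init__(self) -> None:
        self.calls = 0

    def fetch(self, protein_id: str) -> List[Tuple[str, str]]:
        self.calls += 1
        function = HUMAN_PROTEINS.get(protein_id)
        return [] if function is None else [(protein_id, function)]


class OrganismSensor:
    """Stand-in for a sensing operation that supplies the attribute the filter needs."""

    def __init__(self) -> None:
        self.calls = 0

    def organism(self, protein_id: str) -> str:
        self.calls += 1
        return ORGANISM_OF[protein_id]


def run_plan(
    protein_ids: List[str],
    service: HumanProteinService,
    sensor: Optional[OrganismSensor] = None,
) -> List[Tuple[str, str]]:
    """Run the plan; when a sensor is given, apply the filter before each request."""
    results: List[Tuple[str, str]] = []
    for pid in protein_ids:
        # Filter derived from the (assumed) service description: organism == "human".
        if sensor is not None and sensor.organism(pid) != "human":
            continue  # the service cannot return anything for this tuple
        results.extend(service.fetch(pid))
    return results
```

Calling `run_plan` once without a sensor and once with one returns the same tuples, and the `calls` counters on the service and the sensor show how many service requests the filter avoided and how many sensing lookups it cost. Whether that trade pays off in practice depends on the relative cost of the two kinds of request, which this sketch does not model.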
Cite this article
Thakkar, S., Ambite, J.L. & Knoblock, C.A. Composing, optimizing, and executing plans for bioinformatics web services. The VLDB Journal 14, 330–353 (2005). https://doi.org/10.1007/s00778-005-0158-4
DOI: https://doi.org/10.1007/s00778-005-0158-4