Barely more than two decades after the birth of the World Wide Web, the Global Information Infrastructure is a daily reality. Despite the many applications in all domains of our societies (e-business, e-commerce, e-learning, e-science, and e-government, for instance), and despite the tremendous advances by engineers and scientists, the seamless integration of information and services remains a major challenge. The current shared vision for the future is one of a semantically rich, information- and service-oriented architecture for global information systems. This vision lies at the convergence of progress in technologies such as XML, Web services, RDF, and OWL; in multimedia, multimodal, and multilingual information retrieval; and in distributed, mobile, and ubiquitous computing.

Semantic Information Management (SIM) can be identified as the discipline that studies the integration and management of multi-modal information from distributed, heterogeneous, and autonomous sources based on its meaning. SIM is deemed necessary because data is increasingly shared between businesses and between applications. This data may be written and stored using different standards and formats. Hence, we need a platform to manage the data so that it can be universally understood.

SIM as a research area presents a variety of challenges. The first challenge is the identification of a suitable meta-model that can ensure the quality of the integrated data. The next challenge is coping with the ever-increasing number of standards adopted by enterprises. To comply with these standards, enterprises have to represent their data in particular ways, and the challenge of integrating it with data from other enterprises therefore grows. XML has become a standard data format widely used in many organizations, leading to a growing need to exchange and integrate multiple XML data sources across different application systems; indeed, XML is regarded as the de-facto standard for data exchange in enterprises. Integrating these sources requires integrating their schemas. The main challenge is schema integration in which the semantics are maintained, as sketched below.
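As a toy illustration of semantics-preserving integration, consider two XML sources that use different tags for the same concepts. The following minimal sketch, with invented tag names and an invented shared vocabulary (not taken from any paper in this issue), normalizes both sources into one canonical schema before merging:

```python
# Minimal sketch: map heterogeneous XML tags onto a shared vocabulary,
# then merge the normalized documents. Tags and mapping are illustrative.

import xml.etree.ElementTree as ET

SOURCE_A = "<customer><name>Ada</name><tel>555-0001</tel></customer>"
SOURCE_B = "<client><fullName>Bob</fullName><phone>555-0002</phone></client>"

# Shared vocabulary: source tag -> canonical tag.
MAPPING = {"customer": "person", "client": "person",
           "name": "name", "fullName": "name",
           "tel": "phone", "phone": "phone"}

def normalize(xml_text: str) -> ET.Element:
    """Rewrite a source document into the shared vocabulary."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        el.tag = MAPPING.get(el.tag, el.tag)
    return root

merged = ET.Element("people")
merged.extend([normalize(SOURCE_A), normalize(SOURCE_B)])
print(ET.tostring(merged, encoding="unicode"))
# -> <people><person><name>Ada</name><phone>555-0001</phone></person>...</people>
```

Real schema integration must of course resolve structural and semantic conflicts far beyond tag renaming; the sketch only shows why a shared vocabulary is the pivot of the problem.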

In line with the previous challenge, many enterprises these days have to comply with various governmental and professional regulations when managing their data. This makes the integration of data from multi-discipline enterprises and across differently regulated regions even more difficult. Another challenge is the ability to trace integrated information back to its original source and format; the issue of forward and backward compatibility is far more complex here than in the traditional data integration research domain. The issue becomes more complex still in a social network setting. Social networks consist mainly of groups of inter-connected people and have played an important role in changing the way people interact with each other. Social network analysis focuses on studying patterns of communication and information exchange between people, which may influence not only the individuals who adopt them but also the societies and organizations that enclose them. Since social networks are heterogeneous, SIM in social networks poses an even greater challenge: each social group may have its own information semantics, and preserving semantics in a global information exchange is therefore a complex problem.

The availability of cloud computing makes SIM even more challenging. Cloud computing has increasingly been regarded as a revolution in computing practice through the concept of utility computing, whereby computing services are provided as utilities. The cloud provides a wide range of services, from computing power and storage to various software applications. SIM in the cloud will provide Information as a Service, a totally novel concept. The main challenge will be to manage such information so that its semantics are preserved. This can be achieved through ontologies in the cloud, where the ontology becomes one of the core cloud services supporting SIM.

Another infrastructural aspect of SIM is pervasive, ubiquitous, and mobile computing, where people can access information anytime and anywhere using portable, battery-powered wireless computers. These portable computers communicate with a central stationary server via a wireless channel. Mobile and ubiquitous computing offers SIM a totally new perspective, in which most information is mobile and location-dependent. The main challenge for SIM is to track and maintain information about moving objects, which has large implications for how information is managed. Given the inherent limitations of a wireless and mobile environment, such as intermittent wireless connectivity, information loss, and multiple service providers, it is critical that the semantics of information about moving objects be preserved across different platforms and service providers.

In this special issue, we present selected papers from the 12th International Conference on Information Integration and Web-based Applications and Services (iiWAS’2010), held in Paris, France from the 8th to the 10th of November 2010. The papers have been extended significantly from their conference versions to include a thorough literature review and more advanced results. In addition, we also received several direct submissions in response to our call for papers. After the review process, we are pleased to include seven papers in this issue: three are extended versions of iiWAS 2010 conference papers and the other four are direct submissions. The articles in this special issue address some of the challenges mentioned above in different settings and domains.

1 Collaborative learning in the clouds

Cloud computing, as indicated in the first paper (Mousannif et al. 2013), is an emerging computing service paradigm that is expected to provide more and more computing services. In this paper, the authors demonstrate its benefits for organizations and particularly for educational establishments, which increasingly suffer from under-funding due to the global economic crisis. The authors give several examples of world-renowned educational institutions that have embraced cloud computing for learning, teaching, or research purposes. They describe an effort to build a private cloud inside their university and present some of its offerings.

Several extensions to this private cloud are planned. One major limitation of the current implementation is that course coordinators must themselves create, suspend, or delete VMs on behalf of their students. This can be a heavy burden for course coordinators, mainly because the number and nature of the projects assigned to students change every semester, making it difficult to predict the number of VMs to be created as well as their required characteristics. Therefore, the VM creation/deletion service should be extended to students while ensuring a certain level of performance of the virtual infrastructure. One challenge in doing this is that the infrastructure appears infinitely available when using VMs, while the underlying physical hardware is still limited. A way to lead students to use the infrastructure economically and reasonably will be required; one simple possibility is a per-student quota, as sketched below.
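One way such a student-facing service could guard the physical hardware is a quota check in front of the provisioning call. The following sketch is purely hypothetical: the quota limits, the StudentQuota class, and the create_vm() stub are invented here, and a real deployment would call the private cloud's own provisioning API instead.

```python
# Hypothetical sketch: extend VM creation to students under a per-student
# quota so the physical infrastructure is not over-committed.

class QuotaExceeded(Exception):
    pass

class StudentQuota:
    def __init__(self, max_vms: int = 2, max_vcpus: int = 4):
        self.max_vms, self.max_vcpus = max_vms, max_vcpus
        self.vms, self.vcpus = 0, 0

    def request_vm(self, vcpus: int) -> str:
        if self.vms + 1 > self.max_vms or self.vcpus + vcpus > self.max_vcpus:
            raise QuotaExceeded("per-student limit reached; release a VM first")
        self.vms += 1
        self.vcpus += vcpus
        return create_vm(vcpus)       # hypothetical provisioning call

def create_vm(vcpus: int) -> str:    # stub standing in for the cloud API
    return f"vm-{vcpus}cpu"

quota = StudentQuota()
print(quota.request_vm(2))   # vm-2cpu
quota.request_vm(2)
# quota.request_vm(1)        # would raise QuotaExceeded
```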

It has also been noticed that PC labs and servers in universities are under-utilized during the night and semester breaks. Another feature that needs to be added is outside access to the cloud during these periods, allowing researchers and postgraduates to run experiments that involve a great deal of processing and computation, such as simulations.

2 Intelligent decision making

Web-based decision support systems (Web-DSS) have the potential to increase productivity and speed up the decision-making process regardless of geographic limitations. Semantic Web-DSS and defeasible-logic-based implementations of Web-DSS, along with their limitations, have been the subject of various research and development activities in the past decade, and several approaches and solutions have been published during this time. The second paper (Janjua and Hussain 2013) in this special issue aims to use semantic information and knowledge integration to assist intelligent decision support systems. To achieve this goal, two frameworks that use argumentation schemes are proposed to enable semantic information integration and knowledge integration. In addition to the conceptual framework, the authors propose a formal syntax. To validate the results, the paper also demonstrates a prototype application.

The authors give a detailed explanation of the current state of the art in this area and present a use case on information and knowledge integration in the enterprise. One of the drawbacks of Web-DSS is the inability to represent, reason about, and integrate incomplete and inconsistent information for information integration purposes. The authors propose Web@IDSS to address these challenges. They elaborate on the proposed conceptual framework for semantic knowledge integration, discuss algorithms, and define the formal syntax and semantics for knowledge integration. The paper also covers the implementation and use of the prototype; the prototype application built to validate the results leaves a positive impression.
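To give a feel for the defeasible flavour of such reasoning, the following minimal sketch derives conclusions from inconsistent evidence by letting a stronger rule defeat a weaker one. The rule syntax, priorities, and example literals are invented here; this is not the Web@IDSS formalism defined in the paper.

```python
# Minimal sketch of defeasible reasoning over possibly inconsistent facts,
# in the spirit of argumentation-based Web-DSS.

from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    name: str
    premises: frozenset   # literals that must already be derived
    conclusion: str       # a literal such as "ship" or its negation "~ship"
    priority: int = 0     # on conflict, the higher-priority conclusion wins

def negate(lit: str) -> str:
    return lit[1:] if lit.startswith("~") else "~" + lit

def defeasible_closure(facts: set, rules: list) -> set:
    """Fire rules to a fixed point, letting stronger rules defeat weaker ones."""
    derived = dict.fromkeys(facts, float("inf"))  # facts are indefeasible
    changed = True
    while changed:
        changed = False
        for r in rules:
            if not r.premises <= derived.keys():
                continue
            contrary = negate(r.conclusion)
            if contrary in derived and derived[contrary] >= r.priority:
                continue  # defeated by an equal or stronger contrary conclusion
            if derived.get(r.conclusion, -1) < r.priority:
                derived[r.conclusion] = r.priority
                derived.pop(contrary, None)   # retract the defeated literal
                changed = True
    return set(derived)

rules = [
    Rule("r1", frozenset({"order_paid"}), "ship", priority=1),
    Rule("r2", frozenset({"fraud_alert"}), "~ship", priority=2),
]
# Inconsistent evidence: both rules fire, but r2 defeats r1, so "~ship" survives.
print(defeasible_closure({"order_paid", "fraud_alert"}, rules))
```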

Looking ahead, enterprise environments are becoming increasingly complex, competitive, and dynamic. Business policies change dynamically and frequently to keep pace with the competitive nature of business environments. However, the actual processes carried out in day-to-day business are not always in consonance with the new business policies. This situation is more pronounced when managing dynamic processes whose environment changes rapidly. It demands an enterprise business process modelling methodology that automatically builds models and executes task-specific models in response to user queries. Such an approach should be flexible enough to support e-Collaboration in business process modelling among different participants, addressing new challenges such as business process mergers. To address this challenge, a policy-centric information system is to be designed and developed by extending the argumentation-based intelligent decision-making techniques proposed in this paper with a new graphical language that represents the different process constructs and their linkages in a process model.

In the past decade or so, numerous machine-learning methods have been used to automatically learn and recognize complex patterns and make intelligent decisions based on enterprise data. A common attribute of these machine-learning methods is that their operation is constrained by the amount of input data. However, the scale of enterprise data has increased many-fold (leading to the concept of Big Data), in many cases rendering the underlying machine-learning algorithms either incapable of managing such large and ever-increasing data, or too slow for decision making. It is necessary to enhance the current generation of machine-learning techniques with the argumentation formalisms described in this paper, so that arguments from experts are considered during the mining of enterprise data. Such work will lay the foundations for performing large-scale analytics on big data in an enterprise, and such an approach would make use of cloud platforms.

3 Complex social networks

The next paper (Sorkhoh et al. 2013) proposes an algorithm to compute cycles, used for storing information, in complex networks such as social networks. The proposed algorithm is based on the Belief Propagation (BP) algorithm. Simulations show that the enhanced BP algorithm is superior to the original BP algorithm in terms of execution time, without sacrificing the accuracy of the cycle distribution in the networks.

The paper describes computational methods for estimating the cycle distribution in complex networks, in order to understand the properties and structure of an existing network, evaluate it, support its redesign, and enhance its performance. Using mathematical models of networks (with a Gaussian distribution of cycles), the authors propose an algorithm based on a statistical mechanics concept, Belief Propagation (BP), to count the cycle distribution in a variety of complex network models without enumerating the cycles themselves. The basic algorithm has two problems: double-counting of cycles and the unpredictable length of cycles. The proposed approach addresses these problems in three steps: random network generation (with nodes connected according to an input design probability of connection), generation of scale-free data by formula in order to relate it to BP, and the use of a mathematical model to collect enough data for counting. As a result the authors obtain a universal BP model in which only a reasonable set of points is used in the counting process. Further enhancement of the algorithm can be achieved by segmenting the distribution of cycles, evaluating each segment independently, and combining the results.
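To make the non-enumerative idea concrete, the sketch below computes a related quantity, the number of closed walks of length k, from the trace of powers of the adjacency matrix. Closed walks over-count simple cycles because they may revisit nodes, so this is only a cheap spectral proxy for cycle structure, not the authors' BP algorithm.

```python
# Baseline sketch: trace(A^k) counts closed walks of length k, a quantity
# related to (but coarser than) the simple-cycle counts that BP estimates
# without enumeration. Purely illustrative.

import numpy as np

def closed_walk_counts(adj: np.ndarray, max_len: int) -> list:
    """Return [trace(A^k) for k = 1..max_len] for adjacency matrix A."""
    counts, power = [], np.eye(len(adj))
    for _ in range(max_len):
        power = power @ adj
        counts.append(int(np.trace(power)))
    return counts

# A 4-node ring: no odd closed walks (bipartite), and trace(A^4) = 32
# includes the one simple 4-cycle traversed from every node in both
# directions plus back-and-forth walks.
ring = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
print(closed_walk_counts(ring, 4))   # [0, 8, 0, 32]
```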

Counting periodic orbits in complex networks helps in understanding the properties and structure of such networks and is used to identify their strengths and weaknesses. The results can be used to redesign those networks to enhance their performance and minimize bottlenecks within the network.

Periodic orbits are a microscopic property that is not widely studied, due to the computational challenges of counting the orbits in a graph. In the future, however, they could provide a two-dimensional framework for evaluating complex networks, as compared to the current one-dimensional degree-based characterization approaches. Computing periodic orbits is without doubt more challenging, so faster and more efficient periodic orbit computation is desirable. An interesting algorithm based on statistical mechanics concepts is the BP algorithm; what is remarkable about it is its accuracy in counting the periodic orbit distribution without enumerating the orbits. However, the algorithm suffers from ambiguities and unpredictable output because of its high dependence on randomness and convergence criteria. Further work can be conducted to improve the performance of the algorithm and to restructure it for better convergence. Currently, the algorithm only computes the periodic orbit distribution for the whole network; it needs to be extended to compute the distribution per node. New enhancements will also be necessary to improve its accuracy on networks that are partially disconnected. A parallel version of the algorithm can be developed for graphics processing units (GPUs) to benefit from those high-performance processing units; in addition, memory coalescing can be used to maximize global memory bandwidth by minimizing the number of bus transactions for memory accesses. Further enhancement can be achieved using distributed/grid systems: orbits can be segmented and evaluated in separate, independent segments.

The algorithms developed can be applied to social networks as an example of complex networks. Such analysis can be used to study different aspects of these networks, such as the nature of the relations and communities people create, the detection of spam in trend detectors and tagging patterns, the prevention of unwanted spam in bookmarking systems, and the dynamic properties and popularity of online content in social media.

4 Active XML

Recent developments in data-driven and service-integrated web applications have given rise to diverse tools that exploit the computing power of XML, Web services, and P2P architectures. Active XML (AXML) takes up this challenge: it is a powerful extension of XML that deals with dynamic XML content from heterogeneous data sources on a very large scale via web services, and it manages intensional XML data.

The next paper (Phan et al. 2013) investigates Active XML, which can be described as an XML document with embedded intensional data in the form of service calls (see the sketch below). As a new concept, the original AXML proposal is still at an immature stage, and the authors propose improvements to the representation of AXML and to its query evaluation. For evaluation, the authors qualitatively compare the existing AXML representation with the proposed one, and use a cost model to justify the improvement in query evaluation. Besides the immaturity of AXML as an obstacle to wide acceptance, this work focuses on two current issues of AXML systems: its representation and its query processing. It proposes a formal representation that improves query evaluation in AXML, identifies deficiencies in the existing AXML representation and in querying AXML data, and suggests various improvements to the evaluation of queries on AXML data. The proposed techniques can be applied to the management of intensional data and to AXML query evaluation, improving the overall performance of AXML systems. The proposed algorithms are compared with existing algorithms in a performance evaluation.
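The following sketch illustrates the general AXML idea of materialization: an XML document embeds an intensional node (a service call) that is replaced by the data the call returns. The element name "sc", its "service" attribute, and the stubbed call_service() are illustrative assumptions, not the exact AXML syntax or any real service.

```python
# Minimal sketch of AXML-style materialization of an embedded service call.

import xml.etree.ElementTree as ET

DOC = """<forecast city="Paris">
  <sc service="weather.getTemperature"/>
</forecast>"""

def call_service(city: str) -> ET.Element:
    """Stub standing in for an actual web service invocation."""
    result = ET.Element("temperature", unit="C")
    result.text = "18"
    return result

def materialize(root: ET.Element) -> ET.Element:
    """Replace each embedded service-call element with the data it returns."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "sc":
                parent.remove(child)
                parent.append(call_service(root.get("city")))
    return root

print(ET.tostring(materialize(ET.fromstring(DOC)), encoding="unicode"))
# -> <forecast city="Paris"> ... <temperature unit="C">18</temperature></forecast>
```

Deciding when to materialize such calls, and how to represent documents so queries can be evaluated over a mix of extensional and intensional data, is exactly the design space the paper addresses.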

XML, Web services, and Peer-to-Peer architectures are more and more widely employed, and they remain interesting research areas because of their applicability and power. AXML is one direction that combines the computational capabilities of these technologies to manage and process XML data in the manner of distributed database systems. However, AXML is still in its early stages, and current AXML prototypes and algorithms are not yet truly efficient. It is therefore necessary to study and improve several aspects of AXML, including AXML data representation, algorithms for exchanging AXML data, and algorithms for processing queries against AXML data.

Representations, data exchange, and algorithms for querying AXML data can be considered foundations for AXML systems. Building on the results of this research, several new areas should be studied to develop AXML systems further, such as security, concurrency control, and data caching, as well as algorithms to optimize and manage peers in AXML systems.

AXML systems also open possibilities for research on data models in domains such as e-Government and cloud computing. AXML further introduces a new direction for data integration, combining static data with dynamic data provided by anonymous sources over the Internet. Moreover, research in AXML is an important driver of research in XML, Web services, Peer-to-Peer architectures, distributed DBMSs, Service-Oriented Architectures, and XML query optimization.

5 Global warming

Global warming and greenhouse gases have been among the most discussed topics of the last two decades. The USA, the EU, and major developing countries (Brazil, China, India, and South Africa) have pledged to cut their carbon intensity (the amount of greenhouse gas emitted per unit of output) by 2020. Greenhouse gas mitigation measures will significantly change industries, the housing sector, and the energy infrastructure. An important question that emerges is how to change industries and infrastructures with minimal loss, and which sectors of the economy should be improved in order to cut emissions and meet these commitments.

The paper (Nguyen et al. 2013) describes a tool called the Mitigation Efforts Calculator (MEC), developed to compare the greenhouse gas mitigation proposals of various countries for the year 2020. The conceptual model and architecture of the tool are described, and its use is demonstrated, including how it evaluates the cost curve associated with a given greenhouse gas mitigation policy. The paper introduces the GAINS (Greenhouse gas-Air pollution Interactions and Synergies) methodology, which can answer questions such as which use of technologies saves the most emissions within a given budget. A key feature of the GAINS approach is that it is open for stakeholders to review the data and develop their own data sets, which requires a large-scale database. The authors propose an MEC system architecture consisting of a data management layer, an optimization layer, and a layer representing the four trading frameworks of the MEC tool. This architecture enables users to compare results across countries, generated simultaneously from several cost curves.

For each trading framework, the authors have developed algorithms that solve the system in real time on the basis of cost curve information. The MEC lets users select one of four trading regimes, each defined by its trading rules and each representing a possible outcome of negotiation. The authors present a solution to the underlying optimization problem, a linear program with several tens of thousands of variables, examine significant scenarios for trading emission reduction units between countries, and illustrate how the interactive MEC has been implemented. A toy version of this optimization is sketched below.
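The following sketch shows the flavour of such a linear program at toy scale: minimize total abatement cost across countries subject to a joint reduction target. The country costs, caps, and target are made up, and each country's cost curve is collapsed to a single marginal cost, whereas the real MEC solves an LP with tens of thousands of variables over full cost curves.

```python
# Toy emissions-trading LP: cheapest abatement is bought first until the
# joint target is met. Numbers are invented for illustration.

from scipy.optimize import linprog

costs = [30.0, 55.0, 80.0]     # marginal abatement cost per country (EUR/t)
caps = [100.0, 150.0, 200.0]   # maximum feasible abatement per country (Mt)
target = 250.0                 # joint reduction target (Mt)

# minimize costs @ x  subject to  sum(x) >= target, 0 <= x_i <= cap_i;
# ">= target" is rewritten as  -sum(x) <= -target  for linprog's A_ub form.
res = linprog(c=costs,
              A_ub=[[-1.0] * len(costs)], b_ub=[-target],
              bounds=list(zip([0.0] * len(costs), caps)),
              method="highs")
print(res.x, res.fun)   # cheapest countries abate first: [100. 150. 0.], 11250.0
```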

This has implications for the comparability of the various proposals for GHG reductions currently under discussion in the international climate policy arena. In this context, the MEC framework has been introduced to aid negotiators as the basic background of an assessment for the year 2020. The authors show how the MEC translates complex information from multiple cost curves into coherent information about expected trading volumes and prices under alternative emission trading regimes. The MEC thus proves to be a useful vehicle for discussing future greenhouse gas mitigation efforts.

In the near future, data warehouse and semantic technologies, such as the representation of data combinations and constraints, should be used to enhance the efficiency and agility of the GAINS/MEC system, which is the only tool freely available on the Internet that covers all Annex I countries in sufficient depth. Moreover, the mathematical optimization and data mining algorithms need to be adapted for multidimensional analysis of integrated data from heterogeneous sources. It is hoped that the MEC will continue to improve the transparency of strategic decision-making in the international context, on the basis of scientific analysis with multiple levels of information requirements.

Due to limited resources and time, this exercise could only address a limited set of issues. The MEC can be expanded in the following directions. (a) Geographical and sectoral coverage: for a global analysis, the estimation of mitigation potentials needs to be extended to (at least the major) non-Annex I countries, as well as to the Land Use, Land-Use Change and Forestry (LULUCF) sector. (b) Computational efficiency: data warehousing methodology, especially business intelligence, should be used to enhance the efficiency and agility of the GAINS/MEC system; mathematical optimization and data mining algorithms will be adapted for multidimensional analysis of integrated data from heterogeneous sources, improving the transparency of strategic decision-making in the international context. (c) Plug-in for alternative cost curves: to obtain robust information on mitigation potentials and associated costs in absolute terms, results from more than one model should be used. This will provide a range of estimates that reflects the uncertainties due to different assumptions and assessment methods, and a checklist of differences between models helps to identify the reasons for divergent results. In this context, an international model comparison that extends the analysis to the global carbon market could provide a wealth of policy-relevant information. The MEC could easily be expanded to use and compare cost curve information from alternative models.

In addition, to fulfil global and region-specific requirements, it is important to design and build a multidimensional data model based on cloud intelligent services for calculating emission control costs for air pollutants and greenhouse gases. In this approach, a multidimensional data model is proposed to calculate emissions and emission reduction costs on top of cloud intelligent services. To this end, global scientific data from multiple international emission inventories have to be integrated into a global data warehouse (DWH); a class of on-cloud regional DWHs, each of which can be seen as a virtual subset extracted from the global DWH, can then be specified and built.

6 Indexing moving objects

The next paper (Alamri et al. 2013) proposes an indexing technique for moving objects that enables efficient processing of a new class of queries, namely direction and velocity (DV) queries. The work is motivated by the inability of existing spatio-temporal indexing structures to support DV queries, a task that becomes even more challenging when the frequency of update queries is high. The data structures in the TPR-tree and its successors are based on the space domain alone and do not consider the distribution of moving objects with respect to direction and velocity. In this work the authors construct the required data structure, based on the TPR*-tree, over the spatial, temporal, and velocity domains. New dimensions are added to store the related information: a direction-bucket structure holds objects that share the same direction (velocity being speed in some direction), and auxiliary table structures provide a lookup table for bottom-up update queries and a velocity access table that identifies tree nodes that are congruent in velocity. Algorithms for the insertion, deletion, and update of indexed objects are also devised. Extensive experiments indicate that the resulting DV-TPR*-tree is more robust and efficient for DV queries than the TPR-tree and its successors; in particular, DV query performance is nearly four times better. The direction-bucket idea is sketched below.
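The essence of the direction-bucket idea can be sketched in a few lines: quantize each object's heading into a small number of sectors so a DV query scans only the matching bucket instead of the whole index. The bucket count, the fields, and the query shape below are illustrative assumptions, not the DV-TPR*-tree layout from the paper.

```python
# Sketch of direction buckets for direction-and-velocity (DV) queries.

from collections import defaultdict
from dataclasses import dataclass

@dataclass
class MovingObject:
    oid: int
    heading_deg: float   # direction of travel in degrees, 0..360
    speed: float

N_BUCKETS = 8            # 45-degree sectors; the paper's layout may differ

def bucket_of(heading_deg: float) -> int:
    return int(heading_deg % 360 // (360 / N_BUCKETS))

buckets = defaultdict(list)
for obj in (MovingObject(1, 10, 3.0), MovingObject(2, 95, 7.5),
            MovingObject(3, 12, 9.0)):
    buckets[bucket_of(obj.heading_deg)].append(obj)

def dv_query(heading_deg: float, min_speed: float) -> list:
    """Objects heading the queried way (same bucket) at >= min_speed.
    A real index would also probe neighbouring buckets near sector borders."""
    return [o for o in buckets[bucket_of(heading_deg)] if o.speed >= min_speed]

print(dv_query(heading_deg=5, min_speed=5.0))   # only object 3 qualifies
```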

Several relevant research directions exist. One is to provide an analytical model for the direction and velocity data structure that estimates the number of disk accesses needed to answer different types of queries. It is also of interest to adapt the paper’s proposal to indoor positioning technologies such as Wi-Fi, and to employ multiple positioning technologies in the same indoor space so that queries can return more accurate answers; the aim of such a data structure is to model moving objects in indoor environments based on actual distance. This will also help in exploring how the data structure supports different kinds of queries, including spatial queries such as range and kNN queries, and temporal (historical) queries. Furthermore, investigating the different movement patterns of moving objects can yield knowledge about their queries and thereby guide the design of efficient data structures for them. Indexing objects that move according to Lévy flights will be an active area for this specific pattern. Indexing moving clusters is another important, common pattern, where moving objects need to be clustered, especially objects that move along a similar path and stay close during their journey (e.g. troops, cars, and aircraft). Indexing and querying such patterns gives easy access to the data and resolves the associated queries better and faster.

7 Semantic-based transaction model

The authors of the next paper (Li et al. 2013) propose a new semantics-based transaction model to monitor and evaluate the efficiency of heterogeneous Web service usage. The model is claimed to be useful in cloud environments, which are typically formed by heterogeneous software systems, most of them implemented with Web service technologies. In the proposed model, the components of the Web services are defined as corresponding state transitions and semantic transitions, which help users and developers compose and decompose the services in the most efficient manner (a minimal illustration follows below). The authors use a set of experiments to test the proposed model, showing improved throughput and reduced transaction delay.
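As a minimal illustration of modelling a Web service transaction as state transitions, the sketch below enforces a small transition table. The state names and the table itself are illustrative assumptions in the spirit of the paper's state transitions, not the model it defines.

```python
# Sketch: a Web service transaction as an explicit state machine.

ALLOWED = {
    ("active", "prepare"): "prepared",
    ("prepared", "commit"): "committed",
    ("prepared", "abort"): "aborted",
    ("active", "abort"): "aborted",
}

class ServiceTransaction:
    def __init__(self):
        self.state = "active"

    def fire(self, event: str) -> str:
        nxt = ALLOWED.get((self.state, event))
        if nxt is None:
            raise ValueError(f"illegal transition {event!r} from {self.state!r}")
        self.state = nxt
        return nxt

tx = ServiceTransaction()
tx.fire("prepare")
print(tx.fire("commit"))   # committed
```

Making the legal transitions explicit in this way is what lets a composition engine check, before invoking a service, whether a step is admissible in the current transaction state.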

This research provides directions for both academia and practitioners. For academia, it highlights the important differences between transaction management in database environments and in Web service environments; the semantics-based transaction model proposed in this work can also be applied to various scheduling and composition problems in Web service environments. For practitioners, the model is accompanied by a CASE tool with which developers can remodel their processes and integrate them with other Web services for further efficiency.