ubi:analytics Leveraging Linked Data Mining and Analytics with Weka and R libraries
In recent years, the volume of business data being produced, both structured and unstructured, has grown steadily. However, these data do not by themselves yield new insights, a better understanding of the business, or new knowledge. To gain insight, the available data must be appropriately represented and processed, both statistically and analytically, in order to produce advanced analytics.
Two challenges arguably have to be addressed in order to realize advanced analysis and evidence-based decision making that optimize results for small-scale businesses and organizations: (i) the design of advanced but user-friendly analytics tools that can be easily integrated into the daily business processes of organizations and enterprises and (ii) the adoption of techniques that permit the production and consumption of combined datasets that were previously locked in disparate sources and can now be appropriately interlinked.
Taking these challenges into account, UBITECH has designed an approach that exploits linked data principles to produce added-value business analytics. In the proposed approach, the production of linked data analytics has two aspects: (i) the interlinking of datasets before the analysis is carried out, with the aim of preparing datasets that can lead to unexpected and unexplored insights that were not previously possible, and (ii) the interlinking of the analytic results with the analysed input datasets, which enriches the information in the input datasets in a clean and straightforward way.
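To make the first aspect more concrete, the following is a minimal sketch of interlinking two datasets before analysis. It is not UBITECH's implementation: the URIs, the field names, the `same_as` link list and the `interlink` helper are all hypothetical, and plain Python dictionaries stand in for RDF resources.

```python
# Two hypothetical datasets, each keyed by its own resource URIs.
sales = {
    "http://example.org/sales/acme": {"revenue": 120.0},
    "http://example.org/sales/globex": {"revenue": 80.0},
}
registry = {
    "http://example.org/registry/001": {"employees": 10},
    "http://example.org/registry/002": {"employees": 4},
}

# Hypothetical identity links (in the spirit of owl:sameAs) between the two sources.
same_as = [
    ("http://example.org/sales/acme", "http://example.org/registry/001"),
    ("http://example.org/sales/globex", "http://example.org/registry/002"),
]


def interlink(left, right, links):
    """Merge the attributes of linked records into one combined row per link."""
    combined = []
    for left_uri, right_uri in links:
        # Skip links whose endpoints are missing from either dataset.
        if left_uri in left and right_uri in right:
            row = {"left": left_uri, "right": right_uri}
            row.update(left[left_uri])
            row.update(right[right_uri])
            combined.append(row)
    return combined


combined = interlink(sales, registry, same_as)

# An indicator that neither source could provide on its own.
revenue_per_employee = {
    row["left"]: row["revenue"] / row["employees"] for row in combined
}
```

The point of the sketch is only that the combined rows support an analysis (revenue per employee) that was not possible while the two sources were isolated.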
Specifically, UBITECH provides a library of basic and robust data analytics functionality, implemented as a set of supported algorithms. It enables enterprises to apply and share analytic methods on linked data in order to discover and communicate meaningful new patterns that were unattainable or hidden in the previously isolated data structures. The business analytics and data mining component is based on an extensible and modular architecture that allows algorithms to be integrated on request. The component itself is developed with open-source software, and the integrated algorithms come from open-source analytics projects. The supported algorithms aim to cover a variety of cross-sectorial studies and business needs. To achieve this, the supported algorithms are organized into categories, with a limited set of algorithms integrated per category, as shown in the table below.
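One common way to obtain this kind of extensibility is a registry in which each algorithm is registered under a category and looked up by name. The sketch below illustrates that pattern only; the `register` and `run` helpers, the "descriptive" category and the two toy algorithms are assumptions made for illustration and are not taken from the actual component or its algorithm table.

```python
from collections import defaultdict
from statistics import mean

# category -> {algorithm name -> callable}; hypothetical structure.
_REGISTRY = defaultdict(dict)


def register(category, name):
    """Decorator that plugs an algorithm into the registry under a category."""
    def decorator(func):
        _REGISTRY[category][name] = func
        return func
    return decorator


@register("descriptive", "mean")
def column_mean(values):
    return mean(values)


@register("descriptive", "range")
def column_range(values):
    return max(values) - min(values)


def run(category, name, values):
    """Look up a registered algorithm and apply it to a list of numbers."""
    try:
        algorithm = _REGISTRY[category][name]
    except KeyError:
        raise ValueError(
            f"no algorithm {name!r} registered under {category!r}"
        ) from None
    return algorithm(values)
```

With this layout, adding an algorithm is a matter of writing one decorated function; in a real deployment the function body would presumably delegate to a Weka or R routine rather than to the Python standard library.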
To support linked data analytics, UBITECH is designing an ontology that represents the interlinking between input and output datasets, as well as the overall analytic process at a conceptual level. The main benefits of using such an ontology include the consolidation of a unified schema of terms and content relationships that describe how business analytics are carried out, the mapping of the acquired information to this schema, the use of queries to flexibly investigate the available content, and the capacity to maintain quality and trace changes made over time. In addition to the design and specification of the ontology, an interlinking policy is implemented for the creation of linked data business analytics. Different types of interlinking are supported, taking into account the peculiarities of each type of algorithm. These types can be classified as “one-to-one relationship” and “many-to-one relationship” interlinking.
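The two interlinking types can be sketched with plain subject-predicate-object triples. This is one plausible reading, not the ontology itself: the `EX` namespace, the `derivedFrom` predicate and both helper functions are hypothetical, and the interpretation of "many-to-one" as many input records linked to a single result (for example an aggregate or a cluster) is an assumption.

```python
EX = "http://example.org/analytics/"  # hypothetical namespace
DERIVED_FROM = EX + "derivedFrom"     # hypothetical predicate


def one_to_one(inputs, outputs):
    """Link each result to the single input record it was computed for."""
    if len(inputs) != len(outputs):
        raise ValueError("one-to-one interlinking needs one result per input")
    return [(out, DERIVED_FROM, inp) for inp, out in zip(inputs, outputs)]


def many_to_one(inputs, output):
    """Link one result to every input record that contributed to it."""
    return [(output, DERIVED_FROM, inp) for inp in inputs]


records = [EX + "record/1", EX + "record/2", EX + "record/3"]

# E.g. a per-record prediction: one result resource per input record.
per_record = one_to_one(records, [EX + "prediction/1",
                                  EX + "prediction/2",
                                  EX + "prediction/3"])

# E.g. a single cluster or aggregate computed over all three records.
aggregate = many_to_one(records, EX + "cluster/A")
```

Because the results point back at the input resources, a query over the triples can start from an input record and reach every analytic result derived from it, which is the enrichment effect described above.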
Based on the existing implementation, it can be argued that the proposed approach helps enterprises improve the way they manage and process data, in ways not available before. It gives them the potential to produce advanced knowledge by leveraging the power of linked data analytics, to acquire a significant competitive advantage in the decision-making process, and to increase their overall effectiveness.