Nowadays, high volumes of high-value data (e.g., semantic web data) can be generated and published at a high velocity. A collection of these data can be viewed as a big, interlinked, dynamic graph structure of linked resources. Embedded in them is implicit, previously unknown, and potentially useful knowledge. Hence, efficient knowledge discovery algorithms for mining frequent subgraphs from these dynamic, streaming graph-structured data are in demand. Some existing algorithms require very large memory space to discover frequent subgraphs; others discover collections of frequently co-occurring edges (which may be disjoint). In contrast, in this paper we propose algorithms that use limited memory space to discover collections of frequently co-occurring connected edges. Evaluation results show the effectiveness of our algorithms in frequent subgraph mining.
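The distinction between disjoint and connected collections of edges can be made concrete with a small check. The following minimal sketch uses a hypothetical helper, not the authors' streaming algorithm: it applies a union-find structure to test whether a collection of edges forms a single connected subgraph, the property that separates the two kinds of result described above.

```python
from typing import Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def is_connected_edge_set(edges: Iterable[Edge]) -> bool:
    """Return True if the given edges form one connected subgraph.

    Hypothetical helper for illustration; an empty edge set is treated
    as not connected.
    """
    parent = {}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    edge_list = list(edges)
    if not edge_list:
        return False
    for u, v in edge_list:
        parent.setdefault(u, u)
        parent.setdefault(v, v)
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components

    # Connected exactly when every vertex ends up under a single root.
    return len({find(v) for v in parent}) == 1


print(is_connected_edge_set([("a", "b"), ("b", "c")]))  # True
print(is_connected_edge_set([("a", "b"), ("c", "d")]))  # False
```

Both example collections contain two edges, but only the first would count as a connected pattern.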
We present the PowerSetViewer visualization system for lattice-based mining of powersets. Searching for items within the powerset of a universe occurs in many large dataset knowledge discovery contexts. Using a spatial layout based on a powerset provides a unified visual framework at three different levels: data mining on the filtered dataset, browsing the entire dataset, and comparing multiple datasets sharing the same alphabet. The features of our system allow users to find appropriate parameter settings for data mining algorithms through lightweight visual experimentation showing partial results. We use dynamic constrained frequent-set mining as a concrete case study to showcase the utility of the system. The key challenge for spatial layouts based on powerset structure is in handling large alphabets, since the size of the powerset grows exponentially with the size of the alphabet. We present scalable algorithms for enumerating and displaying datasets containing between 1.5 and 7...
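The exponential growth mentioned above is easy to see by enumerating a powerset level by level, as in a subset lattice ordered by cardinality. This is a minimal sketch for a toy alphabet and does not reproduce PowerSetViewer's layout or its scalable enumeration algorithms; the function name is hypothetical.

```python
from itertools import combinations
from typing import Iterator, Sequence, Tuple


def powerset_by_level(alphabet: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every subset of `alphabet`, level by level (by subset size).

    Level k of the subset lattice holds the C(n, k) subsets of size k;
    within a level, subsets follow the lexicographic order of `combinations`.
    """
    for k in range(len(alphabet) + 1):
        yield from combinations(alphabet, k)


subsets = list(powerset_by_level("abc"))
print(len(subsets))  # 8, i.e. 2**3
```

Each additional symbol doubles the number of subsets, which is the scaling problem the abstract identifies for large alphabets.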
2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2019
High volumes of valuable data and information can be easily collected in the current era of big data. Social networks, in which an incredible number of people from different social strata take part, are rich and constant sources of big data. Hence, social networks are of interest in many research topics. In social networks, users (or social entities) are often linked by ‘following’ relationships. As social networks grow, some famous user accounts (or social entities) might be followed by a large number of the same users. We call such groups of famous users frequently followed groups, which some researchers (or businesses) may be interested in investigating. However, discovering these frequently followed groups can be difficult and challenging because the ‘following’ data in social networks are usually very big but sparse (a huge number of users leads to big ‘following’ data, but each user is likely to follow only a small number of other users). As a resul...
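Because each user follows only a few accounts, support counting can be driven by each user's short followee list instead of by every possible group of accounts. A minimal sketch, assuming the ‘following’ data is a mapping from each user to the set of accounts they follow, and that a group is frequently followed when at least `min_support` users follow all of its members; it handles only groups of two, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def frequently_followed_pairs(
    following: Dict[str, Set[str]], min_support: int
) -> List[Tuple[Tuple[str, str], int]]:
    """Count pairs of accounts followed together and keep the frequent ones.

    Only pairs that actually occur in some user's followee list are
    counted, so the work scales with the sparse data rather than with
    the number of all possible account pairs.
    """
    counts: Counter = Counter()
    for followees in following.values():
        for pair in combinations(sorted(followees), 2):
            counts[pair] += 1
    return sorted((pair, c) for pair, c in counts.items() if c >= min_support)


following = {
    "u1": {"alice", "bob", "carol"},
    "u2": {"alice", "bob"},
    "u3": {"bob", "carol"},
    "u4": {"alice", "bob", "dave"},
}
print(frequently_followed_pairs(following, min_support=3))
```

Here only alice and bob are followed together by three users; pairs such as (carol, dave), which nobody follows together, are never materialized.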
Proceedings of the 2015 International Conference on Big Data Applications and Services, 2015
Data mining and analytics aim to analyze valuable data and extract implicit, previously unknown, and potentially useful information from the data. Due to advances in technology, high volumes of valuable data are generated at a high velocity from a wide variety of data sources in various real-life business, scientific and engineering applications. At such volumes, the quality and accuracy of these data depend on their veracity (i.e., the uncertainty of the data). This leads us into the new era of Big Data. This paper presents work on big data mining and computing, especially on the important task of frequent pattern mining, which mines big data for interesting knowledge in the form of frequently occurring sets of merchandise items in shopping markets, interesting co-located events, and/or popular individuals in social networks. The paper also shows how big data mining contributes to real-life applications and services.
2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), 2015
Business analytics techniques help mine and analyze business and financial data. For instance, a structural support vector machine (SSVM) can be used to perform classification on complex inputs such as the nodes of a graph structure. We connect collaborating companies in the information technology sector in an undirected graph and use an SSVM to predict positive or negative movement in their stock prices. Using a minimum graph-cut algorithm to drive the cutting-plane optimization problem of the SSVM yields an exact solution in polynomial time. The learned model exploits the associative relationship between the prices of the collaborating companies to outperform a regular SVM in accuracy. Experiments were conducted using the companies in the Standard and Poor's 500-45 Information Technology Sector index. Trades based on the learned model achieved superior returns in the range of 10% to 17%, whereas tracking the index alone over the same time periods yielded returns in the range of -17% to 9%.
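The polynomial-time claim rests on a classic result: for binary labels with non-negative, associative (Potts-style) pairwise weights, the minimum-energy labelling of every node can be found exactly with a minimum s-t cut. The sketch below illustrates only that inference step, using the standard graph construction. It does not implement the SSVM training loop or the authors' features, the unary costs, edge weights and company names are hypothetical, and max-flow is computed with a plain Edmonds-Karp routine so the block needs no third-party libraries.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Dict[str, float]]


def max_flow_min_cut(cap: Graph, s: str, t: str) -> Tuple[float, Set[str]]:
    """Edmonds-Karp max flow; returns (flow value, source side of a minimum cut)."""
    # Residual graph with reverse edges initialised to zero capacity.
    res: Graph = {s: {}, t: {}}
    for u, nbrs in cap.items():
        for v, c in nbrs.items():
            res.setdefault(u, {}).setdefault(v, 0.0)
            res.setdefault(v, {}).setdefault(u, 0.0)
            res[u][v] += c
    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            # No augmenting path: `parent` holds every node reachable from s.
            return flow, set(parent)
        # Push the bottleneck capacity along the path found.
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
        flow += bottleneck


def label_up_down(
    unary: Dict[str, Tuple[float, float]], pairwise: List[Tuple[str, str, float]]
) -> Tuple[Dict[str, str], float]:
    """Exact minimiser of sum_i cost_i(y_i) + sum_(i,j) w_ij * [y_i != y_j].

    unary maps a node to (cost if labelled "up", cost if labelled "down");
    all costs and weights must be non-negative (associative model).
    """
    src, sink = "<up>", "<down>"
    cap: Graph = {src: {}}
    for node, (cost_up, cost_down) in unary.items():
        cap[src][node] = cost_down  # cut when the node ends on the "down" side
        cap.setdefault(node, {})[sink] = cost_up  # cut when it ends on the "up" side
    for i, j, w in pairwise:
        # Disagreeing neighbours pay w_ij, whichever of them is "up".
        cap.setdefault(i, {})[j] = w
        cap.setdefault(j, {})[i] = w
    energy, up_side = max_flow_min_cut(cap, src, sink)
    labels = {n: "up" if n in up_side else "down" for n in unary}
    return labels, energy


# Hypothetical scores: A leans strongly up, C strongly down, B weakly down.
unary = {"A": (0.2, 2.0), "B": (1.0, 0.8), "C": (2.0, 0.1)}
pairwise = [("A", "B", 0.5), ("B", "C", 0.1)]
labels, energy = label_up_down(unary, pairwise)
print(labels, round(energy, 6))
```

On its own, B would be labelled down (0.8 < 1.0), but its stronger link to A makes "up" cheaper overall. The cut value 1.4 equals the energy of (up, up, down), the smallest of the eight possible labellings. In cutting-plane SSVM training, an exact inference routine of this kind would typically be called repeatedly to find the most violated constraint.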
Proceedings of the Eighth International C* Conference on Computer Science & Software Engineering - C3S2E '15, 2008
Many social networking sites, such as Facebook and Twitter, have been used for sharing knowledge and information among social entities. Social entities in these social networks are often linked by some interdependency such as friendship or "following" relationships. High volumes of high-value data can be easily collected and generated in these social networking sites. As social networks keep growing in the current era of big data, there are many real-life situations in which a social entity wants to find the frequently followed groups of social entities in these big data so that they can follow the same groups. In this paper, we present a big data mining algorithm to discover "following" patterns from these big social network data. Evaluation results show the efficiency and practicality of our algorithm in big social network mining for "following" patterns.
2015 IEEE International Conference on Data Science and Data Intensive Systems, 2015
As we are living in a "smart world" (which comprises cyber, physical and social worlds), big data are everywhere. High volumes of high-veracity, high-value data can be easily generated and collected at a high velocity from a wide variety of data sources in various real-life applications in science and engineering, finance, and social media, as well as from online information resources. These big data have become an increasingly decisive resource in modern society. Embedded in these big data are rich sets of useful information and knowledge. Hence, data-intensive systems that provide data science solutions are in demand. In this paper, we propose a system that applies the MapReduce programming model to improve communication in social networks. Experimental results show the efficiency and effectiveness of the two improvement methods used in our proposed social system in reducing the number of communication hubs. These efficiency improvements not only lead to practical social network communications but also contribute to the emergence of cyber-physical-social interaction and computing.
2014 IEEE Fourth International Conference on Big Data and Cloud Computing, 2014
Nowadays, high volumes of valuable uncertain data can be easily collected or generated at high velocity in many real-life applications. Mining these uncertain Big data is computationally intensive due to the presence of existential probability values associated with items in every transaction in the uncertain data. Each existential probability value expresses the likelihood of that item being present in a particular transaction in the Big data. In some situations, users may be interested in mining all frequent patterns from these uncertain Big data; in other situations, users may be interested in only a tiny portion of these mined patterns. To reduce the computation and to focus the mining in the latter situations, we propose a tree-based algorithm that (i) allows users to express the patterns to be mined according to their intention via the use of constraints and (ii) uses MapReduce to mine uncertain Big data for only those frequent patterns that satisfy user-specified constraints. Experimental results show the effectiveness of our algorithm in mining interesting patterns from uncertain Big data.
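Under the usual assumption that items in a transaction are independent, the expected support of an itemset is the sum, over all transactions, of the product of its items' existential probabilities. The minimal sketch below computes that quantity and tests a user-specified constraint before any support is computed, so itemsets the user did not ask about are skipped. It is a single-machine illustration with made-up data and a hypothetical constraint that enumerates candidates exhaustively; it is not the tree-based MapReduce algorithm itself.

```python
from itertools import combinations
from math import prod
from typing import Callable, Dict, FrozenSet, List

# Each uncertain transaction maps an item to its existential probability.
UncertainDB = List[Dict[str, float]]


def expected_support(itemset: FrozenSet[str], db: UncertainDB) -> float:
    """Sum over transactions of the product of the itemset's item probabilities.

    Assumes items occur independently within a transaction; a transaction
    that lacks any item of the itemset contributes 0.
    """
    total = 0.0
    for tx in db:
        if all(item in tx for item in itemset):
            total += prod(tx[item] for item in itemset)
    return total


def constrained_frequent(
    db: UncertainDB,
    max_size: int,
    minsup: float,
    constraint: Callable[[FrozenSet[str]], bool],
) -> Dict[FrozenSet[str], float]:
    """Itemsets of up to max_size items that satisfy the constraint and are frequent."""
    items = sorted({item for tx in db for item in tx})
    result: Dict[FrozenSet[str], float] = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            if not constraint(itemset):
                continue  # skipped before any support computation
            support = expected_support(itemset, db)
            if support >= minsup:
                result[itemset] = support
    return result


db = [
    {"a": 0.9, "b": 0.8, "c": 0.2},
    {"a": 0.6, "c": 0.9},
    {"a": 0.7, "b": 0.5},
]
# Hypothetical user constraint: only itemsets containing item "a".
frequent = constrained_frequent(db, max_size=2, minsup=1.0, constraint=lambda s: "a" in s)
for itemset, support in frequent.items():
    print(sorted(itemset), round(support, 2))
```

With this data, {a} (expected support 2.2) and {a, b} (0.72 + 0.35 = 1.07) qualify. {a, c} reaches only 0.72, and itemsets without "a" are never evaluated.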
In the current era of big data, high volumes of valuable data can be easily collected and generated. Social networks are one source of these big data. Users (or social entities) in these social networks are often linked by some interdependency such as friendship or “following” relationships. As these big social networks keep growing, there are situations in which individual users or businesses want to find the frequently followed groups of social entities so that they can follow the same groups. In this paper, we present a big data analytics solution that uses the MapReduce model to mine social networks for groups of frequently followed social entities. Evaluation results show the efficiency and practicality of our big data analytics solution in discovering “following” patterns from social networks.
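The MapReduce data flow for this task can be simulated in plain Python. In this sketch, the map phase turns each user's followee list into (group, 1) pairs for every group of up to `max_size` accounts, a shuffle collects the values by key, and the reduce phase sums them and keeps groups that meet a minimum support. It is a single-process illustration of the general model with hypothetical function names, not the paper's solution, and it would need a real framework (and smarter candidate generation) to scale.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

Group = Tuple[str, ...]


def map_phase(user: str, followees: Set[str], max_size: int) -> Iterator[Tuple[Group, int]]:
    """Emit (group, 1) for every group of up to max_size accounts the user follows.

    The user key is unused but kept to mirror a map function's (key, value) input.
    """
    ordered = sorted(followees)
    for k in range(1, max_size + 1):
        for group in combinations(ordered, k):
            yield group, 1


def shuffle(pairs: Iterable[Tuple[Group, int]]) -> Dict[Group, List[int]]:
    """Collect emitted values by key, as the framework does between the phases."""
    grouped: Dict[Group, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(group: Group, counts: List[int], min_support: int) -> Optional[Tuple[Group, int]]:
    """Sum the counts for one group and keep it only if it is frequent."""
    total = sum(counts)
    return (group, total) if total >= min_support else None


def following_patterns(following: Dict[str, Set[str]], max_size: int, min_support: int):
    emitted = (
        pair
        for user, followees in following.items()
        for pair in map_phase(user, followees, max_size)
    )
    results = (reduce_phase(g, c, min_support) for g, c in shuffle(emitted).items())
    return sorted(r for r in results if r is not None)


following = {
    "u1": {"alice", "bob", "carol"},
    "u2": {"alice", "bob"},
    "u3": {"alice", "bob", "carol"},
}
print(following_patterns(following, max_size=3, min_support=3))
```

Because map and reduce calls are independent per key, a framework can distribute them across machines; only the shuffle requires moving data between them.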
2014 IEEE International Conference on Data Mining Workshop, 2014
Frequent pattern mining is an important data mining task. Since its introduction, it has drawn attention from many researchers. Consequently, many frequent pattern mining algorithms have been proposed, including level-wise Apriori-based algorithms, tree-based algorithms, and hyperlinked-array-structure-based algorithms. While these algorithms are popular and have some advantages, they also suffer from some disadvantages. In this paper, we propose and evaluate an alternative frequent pattern mining algorithm called B-mine. Evaluation results show that our proposed algorithm is both space- and time-efficient. Furthermore, to show the practicality of B-mine in real-life applications, we apply B-mine to discover frequent following patterns in social networks.
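For contrast with B-mine, whose internals the abstract does not describe, here is a minimal sketch of the level-wise Apriori-based approach listed among the existing algorithms. Size-k candidates are built by joining frequent (k-1)-itemsets, pruned when any (k-1)-subset is infrequent, and then counted with a full scan of the transactions. The repeated scans and candidate sets are the kind of cost that alternative algorithms try to reduce.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Level-wise Apriori: returns every frequent itemset with its support count."""

    def count(candidates: Set[FrozenSet[str]]) -> Dict[FrozenSet[str], int]:
        # One full scan of the database per level.
        support = {c: 0 for c in candidates}
        for tx in transactions:
            for c in candidates:
                if c <= tx:
                    support[c] += 1
        return {c: s for c, s in support.items() if s >= min_support}

    frequent = count({frozenset([i]) for tx in transactions for i in tx})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "milk"},
]
result = apriori(transactions, min_support=2)
print(sorted((sorted(s), n) for s, n in result.items()))
```

In this example, {bread, butter, milk} is pruned without a scan because its subset {bread, butter} occurs only once.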
2015 IEEE 39th Annual Computer Software and Applications Conference, 2015
In the current era of big data, high volumes of valuable information are available in collections of documents, the web, social networks, and a wide variety of linked data. To search and retrieve useful information from these linked data, users often enter queries into information retrieval (IR) systems. Some of the information retrieved by these systems is relevant to the user queries (i.e., of interest to the users), but some is not. Moreover, some relevant information may not be retrieved by the systems at all. The effectiveness of these IR systems is often measured by metrics such as precision and recall. Most conventional IR systems (e.g., for web searches) aim to achieve high precision (i.e., a high percentage of the retrieved information is relevant) at the price of low recall (i.e., a low percentage of the relevant information is retrieved). However, there are real-life situations (e.g., patent searches) in which high recall is desirable. In this paper, we present two high-recall IR systems. Results of our evaluation show the effectiveness of our systems in providing high-recall IR from linked big data.
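The two metrics can be stated precisely as set ratios: precision = |retrieved ∩ relevant| / |retrieved| and recall = |retrieved ∩ relevant| / |relevant|. The minimal sketch below uses made-up document identifiers to show the trade-off between a precision-oriented result list and a recall-oriented one.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = hits / |retrieved|, recall = hits / |relevant|.

    hits is the number of retrieved items that are relevant; a metric whose
    denominator is empty is reported as 0.0.
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {"d1", "d2", "d3", "d4"}
# A precision-oriented system returns few, all-relevant documents...
print(precision_recall({"d1", "d2"}, relevant))  # (1.0, 0.5)
# ...while a recall-oriented one finds every relevant document plus some noise.
print(precision_recall({"d1", "d2", "d3", "d4", "d7", "d9"}, relevant))
```

The second list reaches full recall at the cost of precision (4 of 6 retrieved documents are relevant), which is the trade-off a patent search would accept.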
Over the past few years, social network sites (e.g., Facebook, Twitter, Weibo) have become very popular. These sites have been used for sharing knowledge and information among users. Nowadays, it is not unusual for a user to have many friends (e.g., hundreds or even thousands of friends) in these social networks. In general, social networks consist of social entities that are linked by some interdependency such as friendship. As social networks keep growing, it is not unusual for a user to want to find the frequently followed groups of social entities in the networks so that they can follow the same groups. In this paper, we propose (i) a space-efficient bitwise data structure to capture interdependency among social entities and (ii) a time-efficient data mining algorithm that makes the best use of our proposed data structure to discover groups of friends who are frequently followed by social entities in the social networks. Evaluation results show the efficiency of our data structure and mining algorithm.
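One common way to build a bitwise representation of such interdependencies is to give every followed account a bit vector over users, with bit i set when user i follows that account. This layout is an assumption for illustration, since the abstract does not specify the authors' structure. With it, the number of users following every member of a group is the population count of the bitwise AND of the members' vectors. Python integers serve as arbitrary-length bit vectors in this sketch, and the function names are hypothetical.

```python
from functools import reduce
from typing import Dict, Iterable, List


def build_bit_vectors(following: Dict[str, Iterable[str]], users: List[str]) -> Dict[str, int]:
    """Map each followed account to an int whose bit i is set if users[i] follows it."""
    vectors: Dict[str, int] = {}
    for i, user in enumerate(users):
        for account in following.get(user, ()):
            vectors[account] = vectors.get(account, 0) | (1 << i)
    return vectors


def group_support(vectors: Dict[str, int], group: Iterable[str]) -> int:
    """Number of users who follow every account in the (non-empty) group."""
    common = reduce(lambda a, b: a & b, (vectors.get(g, 0) for g in group))
    return bin(common).count("1")


users = ["u1", "u2", "u3"]
following = {"u1": ["alice", "bob"], "u2": ["alice", "bob", "carol"], "u3": ["bob"]}
vectors = build_bit_vectors(following, users)
print(group_support(vectors, ["alice", "bob"]))  # 2
```

Each account costs one bit per user, and the support of a group is obtained with a few word-level AND operations rather than by rescanning followee lists.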
To mine frequent itemsets from uncertain data, many existing algorithms rely on expected-support-based mining. An alternative approach relies on probabilistic-based mining, which captures the frequentness probability. While possible world semantics is widely used, the exponential growth in the number of possible worlds makes probabilistic-based mining computationally challenging compared to expected-support-based mining. In this paper, we propose two efficient approximate hyperlinked-structure-based algorithms, which use a novel upper bound to generate a collection of all potentially probabilistic frequent itemsets and then verify whether they are truly probabilistic frequent. Experimental results show the efficiency of our algorithms in mining probabilistic frequent itemsets from uncertain data.
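The difference between the two mining criteria is visible on a single itemset. Assume independent transactions, where p_i is the probability that the itemset occurs in transaction i. Its support then follows a Poisson-binomial distribution: the expected support is the sum of the p_i, while the frequentness probability P(support ≥ minsup) can be computed exactly with an O(n · minsup) dynamic program instead of enumerating possible worlds. The sketch computes this exact quantity only; it does not implement the authors' approximate, upper-bound-based algorithms.

```python
from typing import Sequence


def frequentness_probability(probs: Sequence[float], minsup: int) -> float:
    """P(support >= minsup) for independent transactions (Poisson-binomial tail).

    dp[j] holds P(support == j) for j < minsup, and dp[minsup] accumulates the
    probability that support has already reached minsup.
    """
    if minsup <= 0:
        return 1.0
    dp = [0.0] * (minsup + 1)
    dp[0] = 1.0
    for p in probs:
        # Mass that reaches the threshold in this transaction; mass already there stays.
        dp[minsup] += dp[minsup - 1] * p
        # Update downward so each transaction is counted at most once.
        for j in range(minsup - 1, 0, -1):
            dp[j] = dp[j] * (1 - p) + dp[j - 1] * p
        dp[0] *= 1 - p
    return dp[minsup]


x = [0.9, 0.8, 0.5]  # occurrence probabilities of itemset X
y = [1.0, 1.0, 0.2]  # occurrence probabilities of itemset Y
print(sum(x), sum(y))  # both expected supports are 2.2 (up to float rounding)
print(frequentness_probability(x, 2), frequentness_probability(y, 2))
```

Both itemsets have the same expected support, yet with minsup = 2 the frequentness probability of X is 0.85 while that of Y is 1.0. Probabilistic-based mining distinguishes them, and expected-support-based mining does not.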
Papers by Carson Leung