Generate pairwise constraints from unlabeled data for semi-supervised clustering
Pairwise constraint selection methods often rely on the label information of data to generate pairwise constraints. This paper proposes a new method of selecting pairwise constraints from unlabeled data for semi-supervised clustering ...
Online density estimation over high-dimensional stationary and non-stationary data streams
Efficient density estimation over an open-ended stream of high-dimensional data is of primary importance to machine learning. In general, parametric methods for density estimation are not suitable for high dimensions, and the widely ...
An efficient dynamic switching algorithm for mining colossal closed itemsets from high dimensional datasets
The abundance of data across a variety of domains, including bioinformatics, has led to the formation of datasets with high dimensionality. The conventional algorithms expend most of their time in mining a large number of small and mid-sized ...
Highlights
- An efficient Rowset Cardinality Table based closeness checking method was proposed.
A time-indexed mereology for SUMO
While the period of time during which a subprocess occurs is precisely the time during which the part-whole relation with its main process takes place, part-whole relations between objects do not obey such a rule. The parts of an ...
PRESISTANT: Learning based assistant for data pre-processing
Data pre-processing is one of the most time-consuming and relevant steps in a data analysis process (e.g., a classification task). A given data pre-processing operator can have positive, negative, or zero impact on the final result of ...
An efficient and scalable multi-dimensional indexing scheme for modular data centers
An efficient distributed indexing scheme plays an important role in improving the performance of cloud storage systems. To achieve concurrent query service and high manageability, the indexing scheme should meet the requirements of ...
Special Section: Third International Conference on Big Data and Smart Computing (BigComp2016)
Enhancing portfolio return based on sentiment-of-topic
While time-series analysis is commonly used in financial forecasting, a key source of market sentiment is often omitted. Financial news is known to have a persuasive impact on the markets. Without considering this additional source ...
A secure kNN query processing algorithm using homomorphic encryption on outsourced database
With the adoption of cloud computing, database outsourcing has emerged as a new platform. Due to the serious privacy concerns associated with cloud computing, databases must be encrypted before being outsourced to the cloud. Therefore, ...
An effective High Recall Retrieval method
The High Recall Retrieval (HRR) problem is one of the fundamental tasks for many applications such as patent retrieval, legal search, medical search, marketing research, charging and collecting tax, and literature review. Given ...
Constructing a paraphrase database for agglutinative languages
Paraphrase databases (PPDBs) are valuable resources for applications that use natural language processing (NLP) technology. In order to construct a high-quality PPDB for agglutinative languages, we propose a phrasal paraphrase ...
Social emotion classification based on noise-aware training
Social emotion classification has drawn the attention of many natural language processing researchers in recent years, since analyzing user-generated emotional documents on the Web is quite useful in recommending products, gathering public ...