Abstract: Outlier detection is an important problem in data mining and machine learning. This paper proposes an information-entropy-based k-nearest neighborhood relevant outlier factor algorithm that combines Shannon information theory with a triangle pruning strategy. The algorithm accounts for the data points whose k-nearest neighbors are distributed on the edge of the range within the designated radius. In particular, the neighborhood influence on each point is considered to address the problem of information concealment and submergence. Information entropy is used to calculate the weights that distinguish the importance of each attribute. Then, based on the attribute weights, the improved pruning strategy reduces the computational complexity of the subsequent procedures by removing some inliers and obtaining the outlier candidate dataset. Finally, according to the weighted distance between the objects in the candidate dataset and those in the original dataset, the algorithm calculates the dissimilarity between each object and its k-nearest neighbors. The $r$ data points with the highest dissimilarity are regarded as the outliers. Experimental results show that, compared to existing methods, the proposed approach improves pruning and detection rates while maintaining the coverage rate.
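The scoring steps described in this abstract (entropy-derived attribute weights, a weighted distance, and a top-$r$ ranking by k-nearest-neighbor dissimilarity) can be illustrated with a minimal sketch. This is not the paper's algorithm: the binning scheme, the choice to give higher-entropy attributes larger weights, the use of the mean k-NN distance as the dissimilarity, and all function names are assumptions made for illustration, and the pruning step is omitted.

```python
import math
from collections import Counter


def entropy_weights(data, bins=5):
    """Assumed weighting: normalized Shannon entropy of each binned attribute."""
    n_attrs = len(data[0])
    entropies = []
    for j in range(n_attrs):
        col = [row[j] for row in data]
        lo, hi = min(col), max(col)
        width = (hi - lo) / bins or 1.0  # avoid zero width for constant columns
        counts = Counter(min(int((v - lo) / width), bins - 1) for v in col)
        n = len(col)
        entropies.append(-sum(c / n * math.log2(c / n) for c in counts.values()))
    total = sum(entropies)
    if total == 0:  # every attribute is constant: fall back to uniform weights
        return [1.0 / n_attrs] * n_attrs
    return [h / total for h in entropies]


def weighted_distance(a, b, w):
    """Euclidean distance with per-attribute weights."""
    return math.sqrt(sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b)))


def top_r_outliers(data, k, r):
    """Indices of the r points with the largest mean distance to their k nearest neighbors."""
    w = entropy_weights(data)
    scores = []
    for i, p in enumerate(data):
        dists = sorted(weighted_distance(p, q, w) for j, q in enumerate(data) if j != i)
        scores.append((sum(dists[:k]) / k, i))
    scores.sort(reverse=True)
    return [i for _, i in scores[:r]]
```

For example, calling `top_r_outliers(points, k=2, r=1)` on a tight cluster plus one distant point returns the index of the distant point.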
Abstract: Data often need to be released to decision makers and researchers. Because such data contain sensitive personal information, privacy protection must be applied first. The k-anonymity algorithm is an important privacy protection algorithm, and partitioning is one of its key methods. To address the high computational complexity and low speed of existing privacy-preserving algorithms for high-dimensional data publishing, a probabilistic optimal projection partition k-dimensional (KD)-tree k-anonymity algorithm is proposed. First, some attribute dimensions are probabilistically selected from the global domain. Then, for these dimensions, the partition coefficient is calculated and the optimal partition point is determined. Furthermore, an improved KD-tree structure is introduced in which a node is a collection rather than a data point. The proposed KD-tree node is divided into left and right child nodes by the hyper-plane passing through the dividing point and perpendicular to the optimal dimension. The proposed algorithm is validated by a theoretical analysis and comparison experiments. The results show that the proposed algorithm can reduce the average generalization range by 11% to 22% compared to traditional k-anonymity. This enables better division and better dataset availability. Moreover, the runtime is reduced by 8% to 32% compared to globally optimal projection partitioning k-anonymity.
Keywords: Data publishing, privacy protection, k-anonymity, KD-tree, probabilistic partitioning
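The partitioning idea in the abstract above (sample a few dimensions, pick a split point, and divide a node that holds a collection of records) can be sketched as follows. This is a generic median-split partitioner in the spirit of KD-tree k-anonymity, not the proposed algorithm: the abstract does not define its partition coefficient, so the widest-range ordering of candidate dimensions, the median split point, and the names `partition`, `generalize`, and `n_candidates` are all assumptions.

```python
import random


def partition(records, k, n_candidates=2, rng=None):
    """Recursively split a collection of records so every leaf keeps at least k records."""
    if rng is None:
        rng = random.Random(0)
    n_dims = len(records[0])
    # Randomly sample a subset of dimensions instead of scanning all of them.
    candidates = rng.sample(range(n_dims), min(n_candidates, n_dims))
    # Assumed stand-in for the partition coefficient: prefer the widest value range.
    candidates.sort(
        key=lambda d: max(r[d] for r in records) - min(r[d] for r in records),
        reverse=True,
    )
    for d in candidates:
        ordered = sorted(records, key=lambda r: r[d])
        split = ordered[len(ordered) // 2][d]  # median value as the dividing point
        left = [r for r in ordered if r[d] < split]
        right = [r for r in ordered if r[d] >= split]
        if len(left) >= k and len(right) >= k:
            return partition(left, k, n_candidates, rng) + partition(right, k, n_candidates, rng)
    return [records]  # no admissible split: this node becomes an equivalence class


def generalize(group):
    """Replace each attribute of a group by its (min, max) range."""
    return [(min(r[d] for r in group), max(r[d] for r in group)) for d in range(len(group[0]))]
```

Each returned group can be published with `generalize(group)` in place of the exact values; narrower ranges correspond to a smaller generalization range.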
Abstract: Accurate map matching is an essential but difficult step in mapping raw floating car trajectories onto a digital road network. This task is challenging because of the unavoidable positioning errors of GPS devices and the complexity of the road network structure. To address these problems, this study focuses on three improvements over the existing hidden Markov model: (i) The direction feature between the current and historical points is used for calculating the observation probability; (ii) With regard to the reachable cost between the current road section and the destination, we overcome the shortcoming of feature rarefaction when calculating the transition probability with low sampling rates; (iii) The directional similarity shows a good performance in complex intersection environments. The experimental results verify that the proposed algorithm can reduce the error rate in intersection matching and is suitable for GPS devices with low sampling rates.
Abstract: Existing location-privacy-preserving methods primarily focus on solving the problem of location-privacy preservation in the global space. This not only increases the response time of the location service but also degrades the data quality. In this paper, a k-anonymity algorithm based on locality-sensitive hashing is proposed to solve the problem of location-privacy preservation in the subspace. In the proposed algorithm, higher efficiency and higher quality of service are achieved by applying a bottom-up grid-search method. Further, reasonable division is obtained based on locality-sensitive hashing by retaining position characteristics. The experimental results indicate that the proposed algorithm provides a smaller anonymous spatial region, higher data quality, and lower time cost than methods with no subspace.
Abstract: The traditional trajectory privacy protection algorithm approaches the task as a single-layer problem. Following the way humans tend to solve complex problems hierarchically, we propose a two-level hierarchical granularity model for this problem. The first level of the proposed model is a coarse-grained layer, in which the original dataset is divided into groups. The second level is a fine-grained layer, where problems are solved in each group instead of on the original dataset, which reduces complexity and computation while improving efficiency. On the basis of this hierarchical model, we propose the interpolation trajectory-anonymous privacy protection algorithm with temporal and spatial granularity constraints. In addition, as the trajectory similarity criterion for clustering within each group, we propose the interpolation-based modified Hausdorff distance on adjacent segment (IMHD_AS), which provides a smaller clustering area and better data utility than the traditional Euclidean distance. Further, we theoretically prove that the proposed algorithm outperforms the traditional algorithm in terms of data distortion and anonymity cost and verify its efficacy experimentally. Compared with the classic anonymity algorithm, the maximum information loss and the anonymity cost are reduced by up to 21.04% and 28.32%, respectively.
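For orientation, the plain modified Hausdorff distance that IMHD_AS builds on can be sketched in a few lines. The code below shows only the standard modified Hausdorff distance between two point sequences (the larger of the two directed mean nearest-point distances); the interpolation and adjacent-segment refinements of IMHD_AS are not described in the abstract and are not reproduced here, and the function names are illustrative.

```python
import math


def directed_mean_distance(a, b):
    """Mean, over points of a, of the distance to the nearest point of b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def modified_hausdorff(a, b):
    """Standard modified Hausdorff distance: the larger of the two directed means."""
    return max(directed_mean_distance(a, b), directed_mean_distance(b, a))
```

Unlike a point-by-point Euclidean comparison, this measure does not require the two trajectories to have the same number of sample points.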
Abstract: Privacy preservation is receiving increasing attention in trajectory data publishing. In this paper, we study the challenges of anonymizing published trajectory data. Most existing anonymization methods directly delete the trajectories or locations that violate specific constraints, which is likely to cause a large loss of information. To address this problem, this paper proposes a trajectory privacy preservation method based on 3D-Grid partition in order to reduce information loss in the process of trajectory anonymization. This method first divides the trajectory region into several spatio-temporal units (denoted as 3D-cells), and then conducts location exchange or suppression in each spatio-temporal unit. Based on the trajectory data partition, within each 3D-cell, the proposed method exchanges locations among trajectories or removes a few locations of the sub-trajectories that do not meet the conditions, rather than removing the whole trajectory. Our method considers three scenarios of trajectory distribution and measures trajectory similarity based on time, orientation, spatial location, and other trajectory features. After the reconstruction of the related anonymous sub-trajectories, an anonymized trajectory dataset is obtained. Theoretical analysis and experimental results show that, compared to other methods, the proposed algorithm effectively preserves trajectory data privacy and improves the anonymous results of trajectory data in terms of accuracy and availability.
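The first step described above, dividing the trajectory region into spatio-temporal units, can be sketched as a simple bucketing of `(x, y, t)` samples. This is an illustrative sketch only: the uniform cell size, the origin at zero, and the names `cell_of` and `partition_into_cells` are assumptions, and the location exchange and suppression steps are not shown.

```python
from collections import defaultdict


def cell_of(point, cell_size):
    """Map an (x, y, t) sample to the integer index of its 3D-cell."""
    x, y, t = point
    dx, dy, dt = cell_size
    return (int(x // dx), int(y // dy), int(t // dt))


def partition_into_cells(trajectories, cell_size):
    """Group samples by 3D-cell, keeping the sub-trajectory of each trajectory per cell."""
    cells = defaultdict(lambda: defaultdict(list))
    for traj_id, points in trajectories.items():
        for p in points:
            cells[cell_of(p, cell_size)][traj_id].append(p)
    return cells
```

Each entry of the result maps a cell index to the sub-trajectories that pass through it, which is the unit on which per-cell anonymization would then operate.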
Abstract: Adding noise to user history data helps to protect user privacy in recommendation systems but affects the recommendation performance. To solve this problem, a matrix factorization tourism point of interest recommendation model based on interest offset and differential privacy is proposed in this paper. The recommendation performance of the model is improved by analyzing user interest preferences. Specifically, user interest offsets are extracted from user tags and user ratings under time-series factors to calculate user interest drift. Then, similar neighbors are found to train user feature preferences, which are integrated into the matrix model in the form of regularization terms. Meanwhile, based on differential privacy theory, a privacy neighbor selection algorithm combining the K-Medoids clustering algorithm and the index mechanism is designed to effectively protect the identity of neighbors and prevent KNN attacks. In addition, the Laplace mechanism is used to implement differential privacy protection for the model’s gradient descent process. Finally, the feasibility of the proposed recommendation model is verified through experiments, and the experimental results indicate that this model has advantages in recommendation accuracy and privacy protection.
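The use of the Laplace mechanism during gradient descent can be illustrated with a minimal sketch. It shows only the generic mechanism, which adds independent Laplace noise of scale `sensitivity / epsilon` to each gradient coordinate; how the paper bounds the sensitivity and allocates the privacy budget is not stated in the abstract, so `sensitivity`, `epsilon`, and the function names are assumptions supplied by the caller.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_gradient(grad, sensitivity, epsilon, rng=None):
    """Generic Laplace mechanism: add Laplace(sensitivity / epsilon) noise per coordinate.

    `sensitivity` is an assumed bound on the L1 change of the gradient when one
    record changes (for example, enforced by clipping before this call).
    """
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon
    return [g + laplace_noise(scale, rng) for g in grad]
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of noisier updates, which is the accuracy trade-off the abstract refers to.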
Abstract: The use of intelligent technologies for providing useful recommendations to patients suffering from chronic diseases may play a positive role in improving the general quality of life of patients and help reduce the workload and cost involved in their daily healthcare. The objective of this study is to develop an intelligent recommender system based on predictive analysis that advises patients in the telehealth environment on whether they need to take the body test one day in advance, by analyzing the medical measurements of a patient for the past k days. The proposed algorithms supporting the recommender system have been validated using time series telehealth data recorded from heart disease patients, which were collected from May to January 2012, from our industry collaborator Tunstall. The experimental results show that the proposed system yields satisfactory recommendation accuracy and offers a promising way to reduce the workload of patients who would otherwise conduct body tests every day. This study highlights the possible usefulness of the computerized analysis of time series telehealth data in providing appropriate recommendations to patients suffering from chronic diseases such as heart disease.
Keywords: Intelligent system, recommender system, heart failure, time series prediction, telehealth
Abstract: Trajectory data may include the user’s occupation, medical records, and other similar information. However, attackers can use specific background knowledge to analyze published trajectory data and access a user’s private information. Different users have different requirements regarding the anonymity of sensitive information. To satisfy personalized privacy protection requirements and minimize data loss, we propose a novel trajectory privacy preservation method based on sensitive attribute generalization and trajectory perturbation. The proposed method can prevent an attacker who has a large amount of background knowledge and has exchanged information with other attackers from stealing private user information. First, a trajectory dataset is clustered and frequent patterns are mined according to the clustering results. Thereafter, the sensitive attributes found within the frequent patterns are generalized according to the user requirements. Finally, the trajectory locations are perturbed to achieve trajectory privacy protection. The results of theoretical analyses and experimental evaluations demonstrate the effectiveness of the proposed method in preserving personalized privacy in published trajectory data.