- research-article, April 2024
DeepLSH: Deep Locality-Sensitive Hash Learning for Fast and Efficient Near-Duplicate Crash Report Detection
ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, Article No.: 198, Pages 1–12
https://doi.org/10.1145/3597503.3639146
Automatic crash bucketing is a crucial phase in the software development process for efficiently triaging bug reports. It generally consists in grouping similar reports through clustering techniques. However, with real-time streaming bug collection, ...
- research-article, February 2024
Lightweight and personalised e-commerce recommendation based on collaborative filtering and LSH
International Journal of Ad Hoc and Ubiquitous Computing (IJAHUC), Volume 45, Issue 2, Pages 82–91
https://doi.org/10.1504/ijahuc.2024.136826
Nowadays, e-commerce has become one of the most popular shopping ways for worldwide customers especially after the outbreak of COVID-19 worldwide. To aid the scientific shopping decision-makings of customers, collaborative filtering is often used to ...
- research-article, October 2023
On the Maximal Independent Sets of k-mers with the Edit Distance
BCB '23: Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, Article No.: 42, Pages 1–6
https://doi.org/10.1145/3584371.3612982
In computational biology, k-mers and edit distance are fundamental concepts. However, little is known about the metric space of all k-mers equipped with the edit distance. In this work, we explore the structure of the k-mer space by studying its ...
- research-article, August 2023
Privacy-Aware Traffic Flow Prediction Based on Multi-Party Sensor Data with Zero Trust in Smart City
- Fan Wang,
- Guangshun Li,
- Yilei Wang,
- Wajid Rafique,
- Mohammad R. Khosravi,
- Guanfeng Liu,
- Yuwen Liu,
- Lianyong Qi
ACM Transactions on Internet Technology (TOIT), Volume 23, Issue 3, Article No.: 44, Pages 1–19
https://doi.org/10.1145/3511904
With the continuous increment of city volume and size, a number of traffic-related urban units (e.g., vehicles, roads, buildings, etc.) are emerging rapidly, which plays a heavy burden on the scientific traffic control of smart cities. In this situation, ...
- research-article, May 2023
A New Sparse Data Clustering Method Based On Frequent Items
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 1, Article No.: 5, Pages 1–28
https://doi.org/10.1145/3588685
Large, sparse categorical data is a natural way to represent complex data like sequences, trees, and graphs. Such data is prevalent in many applications, e.g., Criteo released a terabyte size click log data of 4 billion records with millions of ...
- research-article, November 2022
Local Density Estimation in High Dimensions
Mathematics of Operations Research (MOOR), Volume 47, Issue 4, Pages 2614–2640
https://doi.org/10.1287/moor.2021.1221
An important question that arises in the study of high-dimensional vector representations learned from data are, given a set D of vectors and a query q, estimate the number of points within a specified distance threshold of q. We develop two estimators, ...
- short-paper, October 2022
Scalable Graph Representation Learning via Locality-Sensitive Hashing
CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Pages 3878–3882
https://doi.org/10.1145/3511808.3557689
A massive amount of research on graph representation learning has been carried out to learn dense features as graph embedding for information networks, thereby capturing the semantics in complex networks and benefiting a variety of downstream tasks. ...
- research-article, January 2022
Optimal Las Vegas Approximate Near Neighbors in ℓp
ACM Transactions on Algorithms (TALG), Volume 18, Issue 1, Article No.: 7, Pages 1–27
https://doi.org/10.1145/3461777
We show that approximate near neighbor search in high dimensions can be solved in a Las Vegas fashion (i.e., without false negatives) for ℓp (1 ≤ p ≤ 2) while matching the performance of optimal locality-sensitive hashing. Specifically, we construct a data-...
- research-article, November 2021
A One-Pass Distributed and Private Sketch for Kernel Sums with Applications to Machine Learning at Scale
CCS '21: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Pages 3252–3265
https://doi.org/10.1145/3460120.3485255
Differential privacy is a compelling privacy definition that explains the privacy-utility tradeoff via formal, provable guarantees. In machine learning, we often wish to release a function over a dataset while preserving differential privacy. Although ...
- research-article, September 2021
Scalable feature selection using ReliefF aided by locality‐sensitive hashing
International Journal of Intelligent Systems (IJIS), Volume 36, Issue 11, Pages 6161–6179
https://doi.org/10.1002/int.22546
Feature selection algorithms, such as ReliefF, are very important for processing high‐dimensionality data sets. However, widespread use of popular and effective such algorithms is limited by their computational cost. We describe an adaptation of ...
- research-article, June 2021
Misactivation detection and user identification in smart home speakers using traffic flow features
WiSec '21: Proceedings of the 14th ACM Conference on Security and Privacy in Wireless and Mobile Networks, Pages 135–146
https://doi.org/10.1145/3448300.3468289
The advancement in Internet of Things (IoT) technology has transformed our daily lifestyle. Particularly, voice assistants such as Amazon's Alexa and Google Assistant are commonly deployed in households. These voice assistants enable users to interact ...
- research-article, June 2021
Parallel Index-Based Structural Graph Clustering and Its Approximation
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 1851–1864
https://doi.org/10.1145/3448016.3457278
SCAN (Structural Clustering Algorithm for Networks) is a well-studied, widely used graph clustering algorithm. For large graphs, however, sequential SCAN variants are prohibitively slow, and parallel SCAN variants do not effectively share work among ...
- research-article, June 2021
Point-to-Hyperplane Nearest Neighbor Search Beyond the Unit Hypersphere
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 777–789
https://doi.org/10.1145/3448016.3457240
Point-to-Hyperplane Nearest Neighbor Search (P2HNNS) is a fundamental yet challenging problem, and it has plenty of applications in various fields. Existing hyperplane hashing schemes enjoy sub-linear query time and achieve excellent performance on ...
- short-paper, November 2020
Voice Command Fingerprinting with Locality Sensitive Hashes
CPSIOTSEC'20: Proceedings of the 2020 Joint Workshop on CPS&IoT Security and Privacy, Pages 87–92
https://doi.org/10.1145/3411498.3419963
Smart home speakers are deployed in millions of homes around the world. These speakers enable users to interact with other IoT devices in the household and provide voice assistance such as telling the weather and reminding appointments. Although smart ...
- research-article, October 2020
Fast Distributed kNN Graph Construction Using Auto-tuned Locality-sensitive Hashing
- Carlos Eiras-Franco,
- David Martínez-Rego,
- Leslie Kanthan,
- César Piñeiro,
- Antonio Bahamonde,
- Bertha Guijarro-Berdiñas,
- Amparo Alonso-Betanzos
ACM Transactions on Intelligent Systems and Technology (TIST), Volume 11, Issue 6, Article No.: 71, Pages 1–18
https://doi.org/10.1145/3408889
The k-nearest-neighbors (kNN) graph is a popular and powerful data structure that is used in various areas of Data Science, but the high computational cost of obtaining it hinders its use on large datasets. Approximate solutions have been described in ...
- short-paper, November 2020
MinIsoClust: Isoform clustering using minhash and locality sensitive hashing
BCB '20: Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, Article No.: 64, Pages 1–7
https://doi.org/10.1145/3388440.3412424
With the advent of next-generation sequencing technologies, computational transcriptome assembly of RNA-Seq data has become a critical step in many biological and biomedical studies. The accuracy of these transcriptome assembly methods is hindered by ...
Locality-Sensitive Hashing Scheme based on Longest Circular Co-Substring
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, Pages 2589–2599
https://doi.org/10.1145/3318464.3389778
Locality-Sensitive Hashing (LSH) is one of the most popular methods for c-Approximate Nearest Neighbor Search (c-ANNS) in high-dimensional spaces. In this paper, we propose a novel LSH scheme based on the Longest Circular Co-Substring (LCCS) search ...
- research-article, March 2020
Computational Geometry Column 70: Processing Persistence Diagrams as Purely Geometric Objects
In this column, we review the most recent results on processing persistence diagrams as purely geometric objects. (Yes, this is not a typo! While persistence diagrams originally come from computational topology, this column will mainly focus on the ...
- research-article, June 2020
Embedding hierarchical signal to siamese network for fast name rectification
EDA tools are necessary to assist complicated flow of advanced IC design and verification in nowadays industry. After synthesis or simulation, the same signal could be viewed as different hierarchical names, especially for mixed-language designs. This ...
- research-article, July 2019
Discovering common bug‐fix patterns: A large‐scale observational study
Journal of Software: Evolution and Process (WSMR), Volume 31, Issue 7
https://doi.org/10.1002/smr.2173
Background: Automatic program repair aims to reduce costs associated with defect repair. The detection and characterization of common bug‐fix patterns in software repositories play an important role in advancing this field. Aim: In this paper, we ...
- We propose a novel automatic technique for unveiling the most prevalent and pervasive repair actions in Java. Our approach includes a preprocessing step based on locality-sensitive hashing (LSH) to remove outliers from the search space before clustering ...