Papers by Komminist Weldemariam
arXiv (Cornell University), Jan 20, 2023
Software defect data provide an invaluable source of information for developers, testers and so forth. A concise view of a software profile, its development process, and their relationships can be systematically extracted and analyzed to deduce adequate corrective measures based on previously discovered weaknesses. This kind of approach is being widely used in various projects to improve the quality of a software system. This paper builds on top of the orthogonal defect classification (ODC) scheme to provide a structured security-specific defect classification. We perform a detailed analysis on the classified data and obtain in-process feedback so that the next version of the software can be more secure and reliable. We evaluated our customized methodology on Firefox and Chrome defect repositories using six consecutive versions and milestones, respectively. We found that in-process feedback can help the development team take corrective actions as early as possible. We also studied...
This study aimed at identifying the factors associated with neonatal mortality. We analyzed the Demographic and Health Survey (DHS) datasets from 10 Sub-Saharan countries. For each survey, we trained machine learning models to identify women who had experienced a neonatal death within the 5 years prior to the survey being administered. We then inspected the models by visualizing the features that were important for each model, and how, on average, changing the values of the features affected the risk of neonatal mortality. We confirmed the known positive correlation between birth frequency and neonatal mortality and identified an unexpected negative correlation between household size and neonatal mortality. We further established that mothers living in smaller households have a higher risk of neonatal mortality compared to mothers living in larger households; and that factors such as the age and gender of the head of the household may influence the association between household size...
ArXiv, 2018
This work views neural networks as data generating systems and applies anomalous pattern detection techniques on that data in order to detect when a network is processing an anomalous input. Detecting anomalies is a critical component for multiple machine learning problems including detecting adversarial noise. More broadly, this work is a step towards giving neural networks the ability to recognize an out-of-distribution sample. This is the first work to introduce "Subset Scanning" methods from the anomalous pattern detection domain to the task of detecting anomalous input of neural networks. Subset scanning treats the detection problem as a search for the most anomalous subset of node activations (i.e., highest scoring subset according to non-parametric scan statistics). Mathematical properties of these scoring functions allow the search to be completed in log-linear rather than exponential time while still guaranteeing the most anomalous subset of nodes in the network i...
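The log-linear search described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each node already has an empirical p-value strictly between 0 and 1, it uses the Berk-Jones statistic as one example of a non-parametric scan statistic, and the names `berk_jones`, `fast_scan`, `brute_force_scan` and the `alpha_max` cap are hypothetical. For a fixed threshold, the best-scoring subset is every node whose p-value is at or below that threshold, so one sort followed by one pass over the sorted p-values suffices.

```python
import math
from itertools import combinations


def berk_jones(n_alpha, n, alpha):
    """Berk-Jones score: n * KL(n_alpha / n, alpha), or 0 if not significant."""
    if n == 0:
        return 0.0
    q = n_alpha / n  # observed fraction of p-values at or below alpha
    if q <= alpha:
        return 0.0
    if q == 1.0:
        return n * math.log(1.0 / alpha)
    kl = q * math.log(q / alpha) + (1 - q) * math.log((1 - q) / (1 - alpha))
    return n * kl


def fast_scan(pvalues, alpha_max=0.5):
    """Log-linear search: sort once, then score one candidate subset per threshold."""
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    best = (0.0, [], None)  # (score, node indices, alpha)
    for k, i in enumerate(order, start=1):
        alpha = pvalues[i]
        if alpha > alpha_max:
            break
        # On ties, wait for the last tied node so the subset is {p <= alpha}.
        if k < len(order) and pvalues[order[k]] == alpha:
            continue
        score = berk_jones(k, k, alpha)
        if score > best[0]:
            best = (score, sorted(order[:k]), alpha)
    return best


def brute_force_scan(pvalues, alpha_max=0.5):
    """Exhaustive check over all subsets and thresholds (exponential time)."""
    alphas = sorted({p for p in pvalues if p <= alpha_max})
    best = 0.0
    for size in range(1, len(pvalues) + 1):
        for subset in combinations(pvalues, size):
            for alpha in alphas:
                n_alpha = sum(1 for p in subset if p <= alpha)
                best = max(best, berk_jones(n_alpha, size, alpha))
    return best


# Hypothetical per-node p-values, for illustration only.
pvalues = [0.01, 0.6, 0.03, 0.9, 0.2, 0.45, 0.02, 0.75]
score, subset, alpha = fast_scan(pvalues)
print(subset, alpha, round(score, 3))
```

On these illustrative p-values the fast search selects the three nodes with the smallest p-values (indices 0, 2 and 6), and the exhaustive search over all 255 subsets reaches the same maximum score.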
2019 IEEE International Conference on Healthcare Informatics (ICHI)
Clinical records capture the temporal, participatory, and interventional details of the care provision process. The exchange of these records plays a critical role in care continuity. Recently, there has been increasing attention on health data privacy and confidentiality, which translates to questions on ownership and accessibility of clinical records. Traditional approaches to remedy this risk reducing the accessibility of these records, making care continuity across facilities more difficult. This poses a need for mechanisms that would enable the secure exchange of health data without adversely affecting access to clinical records. This paper presents the Digital Health Wallet (DHW), a blockchain-enabled system that allows seamless clinical workflow orchestration and patient-mediated data exchange through consent management in a privacy-preserving manner. We conducted a preliminary test to benchmark the performance of DHW in resource-constrained healthcare facilities in developing countries.
This document describes the details of the BON Egocentric vision dataset. BON denotes the initials of the locations where the dataset was collected: Barcelona (Spain), Oxford (UK), and Nairobi (Kenya). BON comprises first-person video, recorded when subjects were conducting common office activities. The preceding version of this dataset, FPV-O, has fewer subjects from only a single location (Barcelona). To develop a location-agnostic framework, data from multiple locations and/or office settings is essential. Thus, BON comprises videos from an increased number of participants and office settings, resulting in a six-fold increase in the number of video segments, i.e., 2639 (BON) vs. 464 (FPV-O). In the following sections, we describe the details of the dataset, data collection, stratification across activities, duration, locations, and participants (genders).
AMIA ... Annual Symposium proceedings. AMIA Symposium, 2021
Data-driven approaches can provide enhanced insights for domain experts in addressing critical global health challenges, such as newborn and child health, using surveys (e.g., Demographic and Health Survey). Though there are multiple surveys on the topic, data-driven insight extraction and analysis are often applied to these surveys separately, with limited efforts to exploit them jointly, which results in poor prediction performance for critical events, such as neonatal death. Existing machine learning approaches to utilise multiple data sources are not directly applicable to surveys that are disjoint on collection time and locations. In this paper, we propose, to the best of our knowledge, the first detailed work that automatically links multiple surveys for the improved predictive performance of newborn and child mortality and achieves cross-study impact analysis of covariates.
Reliably detecting attacks in a given set of inputs is of high practical relevance because of the vulnerability of neural networks to adversarial examples. These altered inputs create a security risk in applications with real-world consequences, such as self-driving cars, robotics and financial services. We propose an unsupervised method for detecting adversarial attacks in inner layers of autoencoder (AE) networks by maximizing a non-parametric measure of anomalous node activations. Previous work in this space has shown AE networks can detect anomalous images by thresholding the reconstruction error produced by the final layer. Furthermore, other detection methods rely on data augmentation or specialized training techniques which must be asserted before training time. In contrast, we use subset scanning methods from the anomalous pattern detection domain to enhance detection power without labeled examples of the noise, retraining or data augmentation methods. In addition to an anom...
Existing datasets available to address crucial problems, such as child mortality and family planning discontinuation in developing countries, are not ample for data-driven approaches. This is partly due to disjoint data collection efforts employed across locations, times, and variations of modalities. On the other hand, state-of-the-art methods for small-data problems are confined to image modalities. In this work, we propose a data-level linkage of disjoint surveys across Sub-Saharan African countries to improve prediction performance of neonatal death and provide cross-domain explainability.
Knowledge Management and Acquisition for Intelligent Systems, 2021
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020
We investigate the effect of variational autoencoder (VAE) based data anonymization and its ability to preserve anomalous subgroup properties. We present a Utility Guaranteed Deep Privacy (UGDP) system which casts existing anomalous pattern detection methods as a new utility measure for data synthesis. UGDP's approach shows that properties of an anomalous subset of records, identified in the original data set, are preserved through the anonymization of a VAE. This is despite the newly generated records being completely synthetic. More specifically, the Bias-Scan algorithm identifies a subgroup of records that are consistently over- (or under-) risked by a black-box classifier as an area of 'poor fit'. This scanning process is applied on both pre- and post-VAE synthesized data. The areas of poor fit (i.e., anomalous records) persist in both settings. We evaluate our approach using publicly available datasets from the financial industry. Our evaluation confirmed that the approach is able to produce synthetic datasets that preserved a high level of subgroup differentiation as identified initially in the original dataset. Such a distinction was maintained while having distinctly different records between the synthetic and original dataset.
2017 IEEE/ACM 39th International Conference on Software Engineering: Software Engineering in Society Track (ICSE-SEIS), 2017
In this paper, we address the problem of improving data collection of the education system by presenting School Census Hub (SCH). The SCH concept emerged from field studies with stakeholders in Kenya. The goal of these studies was to help unlock three key high-level requirements for the design of SCH: i) budget allocation, where allocating budget should be based on a verifiable number of active students and teachers; ii) spending, where spending on assets should be transparent and verifiable; and iii) improving the learning environment, by unlocking the limited insight into the statistical relationship between school effectiveness and demographic variables. We present the overall architecture and design of SCH based on the findings from the field studies. The first version supporting a core set of capabilities for school data collection has been implemented. To evaluate the system, we conducted a large-scale pilot in 97 schools. We report on a usability study of SCH that demonstrates user awareness and support for data acquisition and reporting in education management information systems in Sub-Saharan Africa.
2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 2020
2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2019
Bias in data can have unintended consequences which propagate to the design, development, and deployment of machine learning models. In the financial services sector, this can result in discrimination from certain financial instruments and services. At the same time, data privacy is of paramount importance, and recent data breaches have seen reputational damage for large institutions. Presented in this paper is a trusted model-lifecycle management platform that attempts to ensure consumer data protection, anonymization, and fairness. Specifically, we examine how datasets can be reproduced using deep learning techniques to effectively retain important statistical features in datasets whilst simultaneously protecting data privacy and enabling safe and secure sharing of sensitive personal information beyond the current state-of-practice.
Pattern Recognition Letters, 2021
arXiv: Atmospheric and Oceanic Physics, 2020
In an effort to provide optimal inputs to downstream modeling systems (e.g., a hydrodynamics model that simulates the water circulation of a lake), we hereby strive to enhance resolution of precipitation fields from a weather model by up to 9x. We test two super-resolution models: the enhanced super-resolution generative adversarial networks (ESRGAN) proposed in 2017, and the content adaptive resampler (CAR) proposed in 2020. Both models outperform simple bicubic interpolation, with the ESRGAN exceeding expectations for accuracy. We make several proposals for extending the work to ensure it can be a useful tool for quantifying the impact of climate change on local ecosystems while removing reliance on energy-intensive, high-resolution weather model simulations.
ArXiv, 2019
In this paper, we investigate the effect of machine learning based anonymization on anomalous subgroup preservation. In particular, we train a binary classifier to discover the most anomalous subgroup in a dataset by maximizing the bias between the group's predicted odds ratio from the model and observed odds ratio from the data. We then perform anonymization using a variational autoencoder (VAE) to synthesize an entirely new dataset that would ideally be drawn from the distribution of the original data. We repeat the anomalous subgroup discovery task on the new data and compare it to what was identified pre-anonymization. We evaluated our approach using publicly available datasets from the financial industry. Our evaluation confirmed that the approach was able to produce synthetic datasets that preserved a high level of subgroup differentiation as identified initially in the original dataset. Such a distinction was maintained while having distinctly different records between th...
2020 IEEE International Conference on Blockchain (Blockchain), 2020
Gaining insight into how deep convolutional neural network models perform image classification and how to explain their outputs has been a concern to computer vision researchers and decision makers. These deep models are often referred to as black boxes due to low comprehension of their internal workings. In an effort to develop explainable deep learning models, several methods have been proposed, such as finding gradients of class output with respect to the input image (sensitivity maps), class activation maps (CAM), and Gradient-based Class Activation Maps (Grad-CAM). These methods underperform when localizing multiple occurrences of the same class and do not work for all CNNs. In addition, Grad-CAM does not capture the entire object when used on single-object images, which affects performance on recognition tasks. With the intention to create an enhanced visual explanation in terms of visual sharpness, object localization and explaining multiple occurrences of objects ...
Proceedings of the Ninth International Conference on Information and Communication Technologies and Development, 2017
Several initiatives have been proposed to collect, report, and analyze data about school systems for supporting decision-making. These initiatives rely mostly on self-reported and summarized data collected irregularly and rarely. They also lack a single independent and systematic process to validate the collected data during its entire lifecycle. Furthermore, schools in developing countries still do not maintain complete and up-to-date school records. Due to these and other factors, addressing the education challenges in those countries remains a high priority for local and international governments, donors, and non-governmental agencies across the world. In this paper, we discuss our initial design, implementation, and evaluation of a blockchain-enabled School Information Hub (SIH) using Kenya's school system as a case study.