Abstract
Explainable AIs (XAIs) often do not provide relevant or understandable explanations for a domain-specific human-in-the-loop (HIL). In addition, internally used metrics have biases that might not match existing structures in the data. The habilitation thesis presents an alternative solution approach by deriving explanations from high-dimensional structures in the data rather than from predetermined classifications. So far, the detection of such density- or distance-based structures in data has typically entailed the challenges of choosing appropriate algorithms and their parameters, which burdens the HIL with a considerable number of complex decisions. The central steps of the solution approach are a parameter-free methodology for the estimation and visualization of probability density functions (PDFs), followed by a hypothesis for selecting an appropriate distance metric independent of the data context, in combination with projection-based clustering (PBC). PBC allows for the subsequent interactive identification of separable structures in the data. Hence, the HIL does not need deep knowledge of the underlying algorithms to identify structures in data. The complete data-driven XAI approach involving the HIL is based on a decision tree guided by distance-based structures in data (DSD). This data-driven XAI shows initial success in the application to multivariate time series and non-sequential high-dimensional data. It generates meaningful and relevant explanations that are evaluated against Grice's maxims.
1 Introduction
Modern artificial intelligence (AI) and machine learning (ML) algorithms can classify high-dimensional data sets with great accuracy and efficiency. These are data-driven systems that, after a learning phase on training data, are evaluated on unseen data, called test data, to estimate their generalization ability. Artificial neural networks, which consist of simple processing units (neurons) organized into connected layers (e.g., "deep learning"), are very successful in the context of supervised ML. Within AI, these algorithms are referred to as subsymbolic ML systems [1, 2]. Subsymbolic systems perform a task such as the classification of a case within a so-called black-box model. Due to its inherent complexity, it is infeasible for humans to forward a query directly to the black box about an explanation or rationale for its decisions [3]. Within medicine in particular, there is a pressing need for comprehensibility and transparency regarding decisions about patients' health or treatment options made by algorithms.
There are two fundamentally different alternative approaches to explaining the decision-making process of ML systems. The first generates explanations, often in the form of rules, from a separately performed classification of a black-box subsymbolic system, followed by post-hoc interpretation and explanation with explainers like LIME [4] or SHAP [5]. The advantage of these post-hoc explainers is that they can be combined with arbitrary subsymbolic ML systems. The disadvantage is that explanations are mostly extracted locally and thus learned from examples, which might be based on non-causal correlations. Also, a large number of locally extracted explanations is often required to cover the structures in the data [6, 7].
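To make the post-hoc paradigm concrete, the following minimal Python sketch (an illustration, not part of the thesis; the synthetic data, model, and parameter choices are assumptions) trains a black-box classifier and queries LIME for one local explanation:

```python
# Sketch of the post-hoc explanation paradigm: train an opaque model,
# then ask a local explainer (here LIME) for a per-case rationale.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)  # black box

explainer = LimeTabularExplainer(
    X, feature_names=[f"x{i}" for i in range(8)], mode="classification"
)
# One local explanation per queried case; many such explanations are
# needed to cover the structures in the data (cf. the text above).
exp = explainer.explain_instance(X[0], model.predict_proba, num_features=4)
print(exp.as_list())
```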
The second solution approach involves inherently interpretable ML procedures, which are referred to as symbolic because their algorithms make decision processes comprehensible [8]. In this work, the second approach is referred to as data-driven explainable AI (XAI). Here, the notion of explanation can be specified more precisely by defining levels of explanation with designated procedures based on cognitive processes, theoretically intended to allow conversational interaction between different levels of XAIs and a human [9].
Data-driven XAI methods have the advantage that higher-level structures in the data can be recognized and explained. The disadvantage is that such methods, due to their internal metrics, may have a bias and require already classified data. Such classifications or annotations might not reflect the structures in the data.
The challenge of both approaches is that the generated explanations are typically not meaningful to the domain expert but rather tailored to the data scientist [10]. Optimally, meaningful explanations should be developed in the language of the domain expert and, in particular, be contrastive [11]. Further, these XAI approaches pose the risk of modeling artifacts when the understanding and control of end-users are diminished, i.e., when structures in data are detected without human intervention [12].
The same challenges arise in one-dimensional data for the estimation and visualization of probability density functions (PDFs), for which many kernel density estimators and visualization methods are available. Still, all these methods regularly yield erroneous distributions [13]. Only integrating a human-in-the-loop (HIL) into the algorithms presented here at critical decision points leads to significantly better detection of distance-based structures in data (DSD) and, consequently, to understandable and relevant explanations [14].
This habilitation thesis proposes an alternative solution path for creating a data-driven XAI: higher-level structures in the data are recognized by enabling a HIL to identify them at critical decision points. The author thus follows the reasoning of Holzinger in that the integration of a HIL's knowledge, intuition, and experience may be indispensable, and the interaction of a HIL with the data can significantly improve the overall ML pipeline [15]. The HIL is an agent that interacts with algorithms, allowing the algorithms to optimize their learning behavior [16]. This perspective fundamentally integrates humans into the algorithmic loop to opportunistically and repeatedly use human knowledge and skills to improve the quality of ML systems [16,17,18].
2 Data-driven XAI Using Human-in-the-Loop
The new data-driven XAI approach is summarized in the following two steps. First, the distance-based structures in data (DSD) are identified, with a HIL involved at three critical decision points. Second, meaningful and relevant explanations are extracted by the XAI based on the structures in the data, involving a HIL at one critical decision point.
The first step works as follows: first, the empirical probability density functions (PDFs) are identified; for this purpose, a basic method called the Mirrored Density Plot (MD Plot) is proposed [13]. The MD Plot can detect and display distribution skewness, multimodality, abrupt distribution boundaries, and quantized data-generation processes. Multimodality, in particular, can be perceived more sensitively by a HIL than it can be detected by statistical tests [13]. By allowing the HIL to visually identify the distributions at the first decision point, appropriate transformations can be selected that normalize the variance of the variables prior to the distance choice, with the objective of avoiding undesirable weights of variables with large ranges of values and variances.
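The MD Plot itself is available in the author's R package DataVisualizations; as a rough, hedged approximation of its mirrored-density idea in Python (matplotlib's violin plot draws similar mirrored KDE silhouettes, though without the MD Plot's parameter-free density estimation; the synthetic variables are assumptions), one might write:

```python
# Rough approximation of a mirrored-density plot: one mirrored KDE
# silhouette per variable lets a HIL judge skewness, multimodality,
# and abrupt distribution boundaries visually.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
variables = {
    "unimodal": rng.normal(0, 1, 1000),
    "bimodal": np.concatenate([rng.normal(-2, 0.5, 500),
                               rng.normal(2, 0.5, 500)]),
}
fig, ax = plt.subplots()
ax.violinplot(list(variables.values()), showextrema=False)
ax.set_xticks([1, 2])
ax.set_xticklabels(list(variables.keys()))
ax.set_ylabel("value")
plt.show()
```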
At the second critical decision point, the distance metric for the high-dimensional data is selected based on the theoretical multimodality concept [19], with a practical example of four gene sets causally associated with pain and the chronification of pain, hearing loss, cancer, and drug addiction in [20]. Multimodality in the distance distribution indicates separate modes of intrapartition and interpartition distances [19]. Thirdly, an automatic structure-detection algorithm called projection-based clustering (PBC) is proposed [21], whose extension integrates the identification process through a HIL at the third critical decision point [22, 23], as described in Sect. 3.
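A hedged sketch of this second decision point follows (the synthetic data and candidate metrics are assumptions; the thesis uses a dedicated multimodality concept [19], for which plain KDE inspection stands in here):

```python
# Compare distance distributions of candidate metrics: multimodality
# indicates separate modes of intra- and inter-partition distances.
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 10)),
               rng.normal(5, 1, (100, 10))])  # two hidden partitions

grid = np.linspace(0, 1, 200)
for metric in ("euclidean", "cityblock", "cosine"):
    d = pdist(X, metric=metric)
    d = d / d.max()  # rescale so the metrics are visually comparable
    plt.plot(grid, gaussian_kde(d)(grid), label=metric)
plt.xlabel("rescaled pairwise distance")
plt.ylabel("estimated density")
plt.legend()
plt.show()
```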
In the second step, the identified structures in the data guide a supervised decision tree. The appropriate decision tree is chosen by the HIL according to Grice's maxims [14, 24]. At this decision point, the HIL thereby implicitly selects the splitting criterion, which is based on a metric defined by the class information [25, 26], so that the decision tree represents the identified structures in the data. After the decision tree has been learned, the paths from the root to the leaves are defined as explanations.
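A minimal sketch of this second step (blob labels stand in for the structures identified by PBC, and sklearn's default splitting criterion is an assumption, whereas the thesis lets the HIL select among trees with distance-based criteria [25, 26]):

```python
# Fit a supervised decision tree on previously identified structures
# (cluster labels) and read off root-to-leaf paths as explanations.
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in for identified structures: blob labels instead of PBC output.
X, labels = make_blobs(n_samples=300, centers=3, n_features=4,
                       random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, labels)
# Each printed root-to-leaf path is one candidate explanation that the
# HIL can judge against Grice's maxims.
print(export_text(tree, feature_names=[f"x{i}" for i in range(4)]))
```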
The data-driven XAI method was successfully applied to the problem of understanding water quality and its underlying processes. The multivariate time series consists of water-quality measurements for different years. The approach leads to explanations that are both meaningful and relevant to the domain expert [14].
In addition, the data-driven XAI was applied to quarterly available fundamental data of companies traded on the German stock market [24, 27, 28]. In principle, company fundamentals can be used to select stocks with a high probability of rising or falling stock prices. Still, many of the commonly known rules or explanations used for such a stock-picking process are too vague to be applied in concrete cases. Using the explanations of the data-driven XAI, the future price trends of specific stocks can be predicted with a higher success rate than with comparable methods [24].
3 Human-in-the-Loop Projection-Based Clustering (HIL-PBC)
The practical problem of López-García et al. showed that combining principal component analysis with clustering is rather disadvantageous [29]. It served as the motivation to define and systematically investigate distance- and density-based structures in data. The focus lies on the automatic detection, with subsequent HIL recognition, of partitions of separable high-dimensional structures in data, which are often also referred to as "natural" and whose data patterns in low-dimensional spaces are perceived by the human eye as separate objects (cf. [30]). Descriptions of and access to typical density- and distance-based structures [31] and algorithms [32] are provided. The subsequent work [33] highlights the pitfalls and challenges of automated cluster detection and cluster analysis pipelines. This work shows that
- Parameter optimization on datasets without distance-based structures,
- Algorithm selection using unsupervised quality measures on biomedical data, and
- Benchmarking detection algorithms with first-order statistics, box plots, or a small number of repetitions of identical algorithm calls
are biased and often not recommended [33]. This serves as motivation to investigate HIL approaches for structure identification, favoring human pattern recognition over automatic algorithmic detection [22].
In order to integrate a HIL into the recognition of structures in data, the combination of non-linear projection methods and automatic detection of structures proved to be very useful [21]: Let \(i\in I\) be d-dimensional data points in the input space \(I\subset {\mathbb{R}}^{d}\), and let \(o\in O\) be projected points in the output space \(O\subset {\mathbb{R}}^{b}\); then a mapping \(proj: I\to O,\ i\mapsto o\) performed by a dimensionality reduction method is called a projection onto a plane if \(b=2\). First, a non-linear projection (e.g., via NeRV [34], t-SNE [35], or Pswarm [36]) is computed for the data points, and then the projection points are quantized into grid points \({g}_{i}\in {\mathbb{R}}^{2}\) within the finite two-dimensional space (plane). A grid point \({g}_{l}\) is connected to \({g}_{j}\) via an edge \(e\) if and only if there exists a point \(x\in {\mathbb{R}}^{2}\) that is equally close to \({g}_{l}\) and \({g}_{j}\) in terms of the metric \(D\) and closer to \({g}_{l}\) and \({g}_{j}\) than to any other grid point, i.e., \(\exists x\in {\mathbb{R}}^{2}: D\left(x,{g}_{l}\right)=D\left(x,{g}_{j}\right) \wedge D\left(x,{g}_{l}\right)<D\left(x,{g}_{i}\right)\ \forall i\ne l,j\) (the Delaunay neighborhood condition).
Let the graph \(\varGamma\) be a pair \((V,E)\) in which the grid points are the vertices \(v\in V\), let \(\left\{{e}_{1}(l,k), \dots, {e}_{n}(m,j)\right\}\subseteq E\) be a sequence of edges defining a walk from grid point \({g}_{l}\) to \({g}_{j}\), and let \(d(l,j)\) denote the distance between the corresponding high-dimensional data points \(l\) and \(j\); then each edge is weighted with the high-dimensional distance of its endpoints, and the length \(\left|{p}_{l,j}\right|\in {P}_{l,j}\) of the walk \({p}_{l,j}=\left(d\left(l,k\right)\cdot {e}_{1}, \dots, d(m,j)\cdot {e}_{n}\right)\) is the sum of these edge weights. Paths are embedded in a two-dimensional toroidal plane, even if the projection is planar and not toroidal. In a two-dimensional toroidal plane, the four sides are cyclically connected; thus, border effects of the projection process can be compensated. The shortest path between two grid points \({g}_{l},{g}_{j}\) in \((\varGamma, P)\) is then defined by \(\widetilde{d}\left({g}_{l},{g}_{j}\right)=\text{min}\,{P}_{l,j}\).
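Because the edge condition above is the Delaunay neighborhood criterion, the construction can be sketched in Python as follows (t-SNE stands in for the projection methods named above; the grid quantization and the toroidal border handling are omitted for brevity, and the synthetic data are assumptions):

```python
# Sketch of PBC's graph construction: project, connect Delaunay
# neighbors in the plane, weight edges with high-dimensional distances,
# and obtain the graph distance d~ via shortest paths.
import numpy as np
from sklearn.manifold import TSNE
from scipy.spatial import Delaunay
from scipy.sparse import lil_matrix
from scipy.sparse.csgraph import dijkstra

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 10)),
               rng.normal(6, 1, (50, 10))])

O = TSNE(n_components=2, random_state=0).fit_transform(X)  # projection
tri = Delaunay(O)                        # Delaunay graph in the plane
n = len(X)
W = lil_matrix((n, n))
for simplex in tri.simplices:            # connect Delaunay neighbors
    for a in simplex:
        for b in simplex:
            if a < b:                    # edge weight = input-space distance
                W[a, b] = W[b, a] = np.linalg.norm(X[a] - X[b])
D = dijkstra(W.tocsr(), directed=False)  # shortest-path distances d~
```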
Let \({C}_{r}\subset I\) and \({C}_{q}\subset I\) be two partitions with \(r,q\in\left\{1,\dots,k\right\}\) and \({C}_{r}\cap {C}_{q}=\varnothing\) for \(r\ne q\), let data points in the partitions be denoted by \(l\in {C}_{q}\) and \(j\in {C}_{r}\), with cardinalities \(m=\left|{C}_{q}\right|\) and \(p=\left|{C}_{r}\right|\), and further let \(\{{g}_{l},{g}_{j}\}\) be the nearest neighbors of the two partitions; then, in each step, two partitions \(\{{C}_{r},{C}_{q}\}\) are aggregated bottom-up either with the minimum dispersion of \(\{{C}_{r},{C}_{q}\}\),

$$\Delta\left({C}_{r},{C}_{q}\right)=\frac{m\cdot p}{m+p}\,\widetilde{d}{\left({g}_{l},{g}_{j}\right)}^{2}, \quad (1)$$

or with the smallest distance between \(\{{C}_{r},{C}_{q}\}\),

$$\Delta\left({C}_{r},{C}_{q}\right)=\widetilde{d}\left({g}_{l},{g}_{j}\right). \quad (2)$$
The algorithm stops when the set number of partitions is reached. Yet, PBC requires two parameters to be set and, for specific projection methods like Pswarm, a distance to be selected.
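Reusing the shortest-path distances D from the sketch above, the aggregation can be approximated with scipy's hierarchical clustering, where the 'ward' method plays the role of the minimum-dispersion criterion (Eq. 1) and 'single' that of the smallest-distance criterion (Eq. 2); this mapping is an approximation, not the thesis's exact implementation:

```python
# Bottom-up aggregation on the graph distances D: 'ward' approximates
# the minimum-dispersion criterion (Eq. 1), 'single' the smallest-
# distance criterion (Eq. 2); k is the number of partitions to keep.
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

Z = linkage(squareform(D, checks=False), method="single")  # or "ward"
labels = fcluster(Z, t=2, criterion="maxclust")            # stop at k = 2
```

The choice between the two merge criteria is exactly the Boolean parameter that the HIL sets after inspecting the topographic map described next.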
Thus, HIL-PBC was proposed as an extension that integrates a HIL at critical decision points through an interactive topographic map to detect separable structures [22, 23]. These separable high-dimensional structures in data are visualized using a topographic map [37] based on the U-matrix principle [38, 39] with a height-dependent color mapping (so-called regional colors or "hypsometric tints"), which can even be 3D-printed [40].
The task of a HIL is to estimate, by inspecting the topographic map, whether there is a tendency for separable high-dimensional structures to appear (cf. clusterability [41]). Moreover, the interaction of a HIL with the topographic map enables estimating the number of partitions in the data and making the correct choice of the Boolean parameter (Eqs. 1, 2).
4 Concluding Remarks
The recommendation is to integrate the human-in-the-loop (HIL) at critical decision points with the goal of identifying structures in high-dimensional data and exploiting them in data-driven XAIs [13, 22]. The HIL is necessary because the thesis shows that automatic ML pipelines are disadvantageous [33]. One exemplary critical decision point is the selection of the distance metric by recognizing multimodality in its distribution, even when statistical testing is not sensitive enough [19]. These distance-based structures in data (DSD) guide the decision tree, whose splitting criterion satisfies Grice's maxims well [14, 24]. From the decision tree, the explanations are extracted without the necessity of previously labeled data, although HIL-PBC can verify that a given classification represents structures in the data. The algorithms proposed in the two steps described in Sect. 2 are compared with a wide range of conventional algorithms in a variety of published works.
If no multimodal distance distribution can be found, or the number of cases makes a distance calculation impracticable, first results on biomedical data show a successful application of an XAI called algorithmic population description (ALPODS) [6, 7]. ALPODS identifies density-based partitions within the training dataset that are relevant to the given classification. These partitions ("populations") are recursively generated as a sequence of decisions for each of the variables in the data [6, 7].
One open issue worth considering for future work is the evaluation of the proposed data-driven XAI based on HIL-PBC with additional human experts.
References
Ultsch A, Korus D (1995) Integration of neural networks and knowledge-based systems. In: International Conference on Neural Networks. Perth, Australia. Vol. 4, pp. 1828–1833
Ultsch A (1998) The integration of connectionist models with knowledge-based systems: hybrid systems. In: SMC’98 Conference Proceedings 1998 IEEE International Conference on Systems, Man, and Cybernetics. IEEE, San Diego, CA, USA, pp 1530–1535
Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1(5):206–215
Ribeiro MT, Singh S, Guestrin C (2016) “Why should I trust you?” Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, San Francisco, CA, USA, pp 1135–1144
Lundberg SM, Lee S-I (2017) A unified approach to interpreting model predictions. In: Advances in neural information processing systems, pp 4765–4774
Ultsch A, Hoffman J, Röhnert M, von Bonin M, Oelschlägel U, Brendel C et al (2022) An explainable AI system for the diagnosis of high-dimensional biomedical data. arXiv preprint arXiv:2107.01820. https://doi.org/10.48550/arXiv.2107.01820
Ultsch A, Hoffman J, Brendel C, Thrun MC (2021) ALPODS: an explainable AI for the diagnosis of B-cell lymphoma. In: Data Science, Statistics & Visualisation (DSSV) and the European Conference on Data Analysis (ECDA), July 7–9, Rotterdam, Netherlands
Biran O, Cotton C (2017) Explanation and justification in machine learning: A survey. IJCAI-17 workshop on explainable AI (XAI) 8–13
Dazeley R, Vamplew P, Foale C, Young C, Aryal S, Cruz F (2021) Levels of Explainable Artificial Intelligence for Human-Aligned Conversational Explanations. Artif Intell 299:103525
Miller T, Howe P, Sonenberg L (2017) Explainable AI: beware of inmates running the asylum. In: International Joint Conference on Artificial Intelligence, Workshop on Explainable AI (XAI), pp 36–42
Miller T (2019) Explanation in artificial intelligence: insights from the social sciences. Artif Intell 267:1–38
Holzinger A, Jurisica I (2014) Knowledge discovery and data mining in biomedical informatics: the future is in integrative, interactive machine learning solutions. In: Interactive knowledge discovery and data mining in biomedical informatics. Springer, pp 1–18
Thrun MC, Gehlert T, Ultsch A (2020) Analyzing the fine structure of distributions. PLoS ONE 15(10):e0238835. https://doi.org/10.1371/journal.pone.0238835
Thrun MC, Ultsch A, Breuer L (2021) Explainable AI framework for multivariate hydrochemical time series. Mach Learn Knowl Extr (MAKE) 3(1):170–205. https://doi.org/10.3390/make3010009
Holzinger A (2018) From machine learning to explainable AI. In: World Symposium on Digital Intelligence for Systems and Machines (DISA). IEEE, pp 55–66
Holzinger A, Plass M, Kickmeier-Rust M, Holzinger K, Crişan GC, Pintea C-M et al (2019) Interactive machine learning: experimental evidence for the human in the algorithmic loop. Appl Intell 49(7):2401–2414
Zanzotto FM (2019) Human-in-the-loop artificial intelligence. J Artif Intell Res 64:243–252
Mac Aodha O, Stathopoulos V, Brostow GJ, Terry M, Girolami M, Jones KE (2014) Putting the scientist in the loop – accelerating scientific progress with interactive machine learning. In: 2014 22nd International Conference on Pattern Recognition. IEEE, pp 9–17
Thrun MC (2021) The exploitation of distance distributions for clustering. Int J Comput Intell Appl 20(3):2150016. https://doi.org/10.1142/S1469026821500164
Thrun MC (2022) Knowledge-based identification of homogenous structures in genes. In: Rocha A, Adeli H, Dzemyda G, Moreira F (eds) Information Systems and Technologies, Lecture Notes in Networks and Systems, Vol 468. Springer, Cham. https://doi.org/10.1007/978-3-031-04826-5_9
Thrun MC, Ultsch A (2020) Using projection based clustering to find distance and density based clusters in high-dimensional data. J Classif 38(2):280–312. https://doi.org/10.1007/s00357-020-09373-2
Thrun MC, Pape F, Ultsch A (2021) Conventional displays of structures in data compared with interactive projection-based clustering (IPBC). Int J Data Sci Analytics 12(3):249–271. https://doi.org/10.1007/s41060-021-00264-2
Thrun MC, Pape F, Ultsch A (2020) Interactive machine learning tool for clustering in visual analytics. In: 7th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2020). IEEE, Sydney, Australia, pp 672–80. https://doi.org/10.1109/DSAA49011.2020.00062
Thrun MC (2022) Exploiting distance-based structures in data using an explainable AI for stock picking. Information 13(2):51. https://doi.org/10.3390/info13020051
Blockeel H, De Raedt L, Ramon J (1998) Top-down induction of clustering trees. In: Shavlik J (ed) Proceedings of the 15th International Conference on Machine Learning (ICML). Morgan Kaufmann, San Francisco, CA, USA, pp 55–63
De Mántaras RL (1991) A distance-based attribute selection measure for decision tree induction. Mach Learn 6(1):81–92
Thrun MC (2019) Knowledge discovery in quarterly financial data of stocks based on the prime standard using a hybrid of a swarm with SOM. In: Verleysen M (ed) European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN). Ciaco, Bruges, Belgium, pp 397–402
Thrun MC (2021) Human-in-the-loop detection of explainable distance-based structures in data for stock picking. In: Data science, statistics & visualisation (DSSV) and the European Conference on Data Analysis (ECDA). July 7-9, Rotterdam, Netherlands
López-García P, Argote DL, Thrun MC (2020) Projection-based classification of chemical groups and provenance analysis of archaeological materials. IEEE Access 8:152439–152451. https://doi.org/10.1109/ACCESS.2020.3016244
Stoll J, Thrun MC, Nuthmann A, Einhäuser W (2015) Overt attention in natural scenes: objects dominate features. Vision Res 107:36–48. doi: https://doi.org/10.1016/j.visres.2014.11.006
Thrun MC, Ultsch A (2020) Clustering benchmark datasets exploiting the fundamental clustering problems. Data Brief 30:105501. https://doi.org/10.1016/j.dib.2020.105501
Thrun MC, Stier Q (2021) Fundamental clustering algorithms suite. SoftwareX 13:100642. https://doi.org/10.1016/j.softx.2020.100642
Thrun MC (2021) Distance-based clustering challenges for unbiased benchmarking studies. Sci Rep 11(1):18988. https://doi.org/10.1038/s41598-021-98126-1
Venna J, Peltonen J, Nybo K, Aidos H, Kaski S (2010) Information retrieval perspective to nonlinear dimensionality reduction for data visualization. J Mach Learn Res 11:451–490
Van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(11):2579–2605
Thrun MC, Ultsch A (2021) Swarm intelligence for self-organized clustering. Artif Intell 290:103237. https://doi.org/10.1016/j.artint.2020.103237
Thrun MC, Ultsch A (2020) Uncovering high-dimensional structures of projections from dimensionality reduction methods. MethodsX 7:101093. https://doi.org/10.1016/j.mex.2020.101093
Ultsch A, Siemon HP (1990) Kohonen’s self organizing feature maps for exploratory data analysis. In: International Neural Network Conference. Kluwer Academic Press, Paris, France, pp 305–308
Ultsch A, Thrun MC (2017) Credible visualizations for planar projections. In: Cottrell M (ed) 12th International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization (WSOM). IEEE, Nancy, France, pp 1–5. https://doi.org/10.1109/WSOM.2017.8020010
Thrun MC, Lerch F, Lötsch J, Ultsch A (2016) Visualization and 3D printing of multivariate data of biomarkers. In: Skala V (ed) International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG). Plzen, pp 7–16
Thrun MC (2020) Improving the Sensitivity of Statistical Testing for Clusterability with Mirrored-Density Plot. In: Archambault D, Nabney I, Peltonen J (eds) Machine Learning Methods in Visualisation for Big Data. The Eurographics Association, Norrköping, Sweden. https://doi.org/10.2312/mlvis.20201102
Funding
Open Access funding enabled and organized by Projekt DEAL.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.