Enhancing concept alignment with explanatory interactive disentangled representation learning.
Deep learning aims to learn "good" representations that are effective for machine learning tasks, but these representations often lack interpretability due to the black-box nature of neural networks. Disentangled representation learning seeks to separate representations along independent, human-defined concepts, which paves the way to model explainability. However, traditional supervised learning approaches require extensive manual concept labeling, which is impractical for large-scale datasets. In this paper, we propose XIDRL (eXplanatory Interactive Disentangled Representation Learning), a framework that enables efficient collaboration between human experts and our proposed state-of-the-art representation disentangling technique, supervised contrastive learning with invariant risk minimization (SCL+IRM). The SCL+IRM algorithm provides improved alignment capabilities, further strengthening concept alignment. Building on this framework, we design and develop a visual analytics system that assists machine learning experts in exploring concept alignments, comprehending model behaviors, and refining concepts. Additionally, we incorporate the w-BiLRP algorithm to enhance model interpretability. The insights derived from these analyses are used to update the model and align the data with human concepts. Finally, we present two case studies that demonstrate how our prototype system facilitates the creation of interpretable and human-controllable disentangled representations. Code, data, and model checkpoints will be released after the review period.
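The abstract names SCL+IRM but does not spell out the objective. The sketch below is a minimal, hedged illustration of how the two standard ingredients are typically combined: a supervised contrastive (SupCon-style) loss that pulls embeddings with the same concept label together, plus an IRMv1-style gradient penalty computed per training environment. The function names, the dummy-scale formulation, and the way the two terms are weighted are assumptions for illustration; the paper's actual SCL+IRM formulation may differ.

```python
import torch
import torch.nn.functional as F


def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss (SupCon-style): anchors attract
    same-label embeddings and repel the rest."""
    feats = F.normalize(features, dim=1)
    sim = feats @ feats.T / temperature
    n = feats.size(0)
    self_mask = torch.eye(n, dtype=torch.bool)
    # Exclude self-similarity with a large finite negative (avoids -inf * 0 = nan).
    sim = sim.masked_fill(self_mask, -1e9)
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = ((labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask).float()
    # Mean log-probability over each anchor's positives.
    per_anchor = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return per_anchor[pos.sum(1) > 0].mean()


def irm_penalty(logits, labels):
    """IRMv1 penalty: squared gradient of the environment risk
    w.r.t. a dummy classifier scale fixed at 1.0."""
    scale = torch.tensor(1.0, requires_grad=True)
    loss = F.cross_entropy(logits * scale, labels)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return grad.pow(2).sum()


def scl_irm_objective(features, logits_per_env, labels_per_env, lam=1.0):
    """Hypothetical combined objective: contrastive alignment term
    plus an invariance penalty averaged over environments."""
    all_labels = torch.cat(labels_per_env)
    scl = supcon_loss(features, all_labels)
    irm = torch.stack(
        [irm_penalty(lg, lb) for lg, lb in zip(logits_per_env, labels_per_env)]
    ).mean()
    return scl + lam * irm
```

In practice the contrastive term drives concepts into separate subspaces, while the IRM penalty discourages the model from exploiting environment-specific shortcuts, which is one plausible route to the improved concept alignment the abstract claims.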
(Copyright © 2025. Published by Elsevier Ltd.)
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.