Explainable AI in the era of Large Language Models

The domain of Explainable Artificial Intelligence (XAI) has made significant strides in recent years. Various explanation techniques have been devised, each serving distinct purposes. Some of them explain individual predictions of AI models by highlighting influential input features, while others enhance comprehension of the model’s internal operations by visualizing the concepts encoded by individual neurons. Although these initial XAI techniques have proven valuable in scrutinizing models and detecting flawed prediction strategies (referred to as “Clever Hans” behaviors), they have predominantly been applied in the context of classification problems. The advancement of generative AI, notably the emergence of exceedingly large language models (LLMs), has underscored the necessity for next-generation explanation methodologies tailored to this fundamentally distinct category of models and challenges.

This workshop aims to address this necessity from several angles. First, we will reflect on what “explaining” means in the context of generative AI. Second, we will discuss recent methodological breakthroughs that allow us to gain deeper insights into the mysterious inner workings of LLMs. Lastly, we will examine the practical implications of a new class of explainable LLMs becoming available, not only from the standpoint of the lay user but also considering the opportunities for developers, experts, and regulators.

Reimagining Explainable AI Evaluation with LLMs

Anna Hedström

Abstract: Every explainable AI researcher needs to answer the question: how good is my explanation with respect to the model it seeks to explain? Without access to ground-truth labels, it is not obvious how to answer this. Researchers have therefore tried a variety of evaluation approaches: human-based studies, experiments restricted to toy settings, or metric-based measures that approximate explanation quality. In this talk, we begin by reviewing existing evaluation ideas and identifying pitfalls in prevalent evaluation practices. We then peek into the possible implications of large language models (LLMs) coming to dominate the field: what the evaluation-centric opportunities and challenges are, and what this may mean for the research community and society as a whole.
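To make "metric-based measures that approximate explanation quality" concrete, below is a minimal sketch of one such proxy: a deletion-style faithfulness test, in which the most-attributed features are removed first and the model's score is tracked as it drops. The function name, parameters, and the linear toy model in the comments are illustrative assumptions, not material from the talk.

```python
import numpy as np

def deletion_score(predict, x, attribution, steps=10, baseline=0.0):
    """Deletion-style faithfulness proxy (a sketch, not a standard API).

    Removes the most-attributed features first and records the model's
    score after each removal. A faster drop suggests the attribution
    really did point at the features the model relies on.

    predict     -- callable mapping a 1-D feature vector to a scalar score
    x           -- 1-D input feature vector
    attribution -- per-feature relevance scores, same shape as x
    Returns the mean of the score curve: lower means a faster drop,
    i.e. a (heuristically) more faithful explanation.
    """
    order = np.argsort(attribution)[::-1]        # most relevant features first
    x_perturbed = x.astype(float).copy()
    scores = [predict(x_perturbed)]              # score on the intact input
    chunk = max(1, len(order) // steps)
    for i in range(0, len(order), chunk):
        x_perturbed[order[i:i + chunk]] = baseline   # "delete" a chunk of features
        scores.append(predict(x_perturbed))
    return float(np.mean(scores))
```

On a toy linear model `predict(v) = v @ w`, the exact attribution `w * x` drives the score down faster (lower `deletion_score`) than an uninformative attribution, which is exactly the kind of sanity check such metrics enable without ground-truth labels.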
