Comparing Effects of Attribution-based, Example-based, and Feature-based Explanation Methods on AI-Assisted Decision-Making

Trust calibration is essential in AI-assisted decision-making tasks. If human users understand the reasons for a prediction of an AI model, they can assess whether or not the prediction is reasonable. Especially for high-risk tasks like mushroom hunting (where a wrong decision may be fatal), it is important that users trust or overrule the AI in the right situations. Various explainable AI (XAI) methods are currently being discussed as potentially useful for facilitating understanding and calibrating user trust. So far, however, it is unclear which approaches are most effective. Our work addresses this issue in a between-subjects experiment with 𝑁 = 501 participants, who were asked to classify the edibility of mushrooms shown in images. We compare the effects of three XAI methods on human AI-assisted decision-making behavior: (i) Grad-CAM attributions; (ii) nearest neighbor examples; and (iii) an adaptation of network dissection. For nearest neighbor examples, we found a statistically significant improvement in user performance compared to a condition without explanations. Effects did not reach statistical significance for Grad-CAM and network dissection. For network dissection, however, the effect size estimates show a tendency similar to that for nearest neighbor examples. We found that the effects also varied across task items (i.e., mushroom images). Explanations seem to be particularly effective if they reveal possible flaws when the AI classification is wrong or reassure users when it is correct. Our results suggest that well-established methods might not be as beneficial to end users as expected and that XAI techniques must be chosen carefully in real-world scenarios.