Deep Neural Network (DNN) models are widely used in machine translation, autonomous driving, facial recognition, and countless other applications. However, due to their complex characteristics, such as the high dimensionality of the data and the non-linearity of the constituent neurons, it has been very difficult for scientists to interpret exactly how these models make their predictions. This has severely limited the application of such models in high-risk fields such as medicine and defense, and model interpretability has thus become one of the fundamental research areas in artificial intelligence (AI).

As part of the effort to better understand the basis of such decision-making, a team of researchers including Professor Jaesik Choi (CEO of AI startup INEEJI) and Dr. Haedong Jeong of the Kim Jaechul Graduate School of AI, and researcher Giyoung Jeon (from UNIST and INEEJI), has proposed a new approach to measuring how much each input feature contributes to the final result of the model. Input attribution is one of the most popular ways to elucidate the inner workings of such models, and it is widely performed with gradient-based methods, which measure how sharply the final prediction changes with a small change in a feature's value. However, current gradient-based methods are often unreliable: they produce noisy explanations and are vulnerable to small perturbations of the input.
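To make the idea of gradient-based attribution concrete, here is a minimal sketch, not the team's method: for a toy differentiable model, we estimate the gradient of the output with respect to each input feature by finite differences, so the feature with the largest gradient magnitude is the one the prediction is most sensitive to. The toy model, weights, and function names here are illustrative assumptions, not from the paper.

```python
import numpy as np

def model(x, w, b):
    # Toy "network": a single logistic unit (hypothetical stand-in for a DNN)
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

def saliency(x, w, b, eps=1e-5):
    # Finite-difference gradient of the output w.r.t. each input feature:
    # how sharply does the prediction change when feature i moves slightly?
    grad = np.zeros_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        grad[i] = (model(xp, w, b) - model(xm, w, b)) / (2 * eps)
    return grad

# Hypothetical weights: feature 0 influences the output most, feature 2 not at all
w = np.array([2.0, -1.0, 0.0])
b = 0.0
x = np.array([1.0, 1.0, 1.0])

attr = np.abs(saliency(x, w, b))
print(attr.argmax())  # feature 0 gets the largest attribution
```

In a real DNN the same quantity would be computed with automatic differentiation rather than finite differences; the noisiness of such raw gradients is exactly the problem the team's work addresses.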

The team developed an algorithm that progressively distills away input features that have a comparatively small effect on the final result, so that only the features that strongly influence the final prediction remain. For instance, if the input to the model is image data, the parts of the image that contributed heavily to the model's prediction can be identified. The algorithm is independent of the structure of the model and can therefore be applied very generally. The team confirmed the efficacy and scalability of their algorithm by successfully using it to calculate the contributions of the input features in multiple well-known image classification models. Their work was published at NeurIPS, one of the largest AI conferences, under the title "Distilled Gradient Aggregation: Purify Features for Input Attribution in the Deep Neural Network".
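The "progressively distill away weak features" idea can be illustrated with a simplified sketch, which is not the paper's algorithm: repeatedly drop the active feature whose removal changes the model's output least, until only the strong contributors remain. The model, weights, and `keep` parameter below are illustrative assumptions.

```python
import numpy as np

def model(x, w):
    # Toy linear model standing in for a trained network
    return float(w @ x)

def distill_features(x, w, keep=2):
    # Iteratively remove the feature whose masking (zeroing) changes the
    # output the least, leaving only strongly influential features.
    active = set(range(x.size))

    def output_with(features):
        xm = np.array([x[i] if i in features else 0.0 for i in range(x.size)])
        return model(xm, w)

    while len(active) > keep:
        base = output_with(active)
        # The weakest feature is the one whose removal barely moves the output
        weakest = min(active, key=lambda i: abs(base - output_with(active - {i})))
        active.remove(weakest)
    return sorted(active)

# Hypothetical weights: features 0 and 2 dominate the prediction
w = np.array([3.0, 0.1, -2.0, 0.05])
x = np.ones(4)
print(distill_features(x, w, keep=2))  # → [0, 2]
```

This greedy masking loop conveys the filtering intuition; the published method works with aggregated gradients rather than exhaustive masking, which is what makes it scale to image classifiers.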

Copyright © The KAIST Herald. Unauthorized reproduction and redistribution prohibited.