Understanding Black-box Predictions via Influence Functions (arXiv). Pang Wei Koh and Percy Liang. In this paper, we use influence functions, a classic technique from robust statistics, to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. Related references: LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Christmann, A. and Steinwart, I. Cook, R. D. and Weisberg, S. Characterizations of an empirical influence function for detecting influential cases in regression.
The deep bootstrap framework: Good online learners are good offline generalizers.
Applications: understanding model behavior. Influence functions reveal insights about how models rely on and extrapolate from the training data.
Which algorithmic choices matter at which batch sizes? Often we want to identify which training samples were influential for a particular test prediction of a given machine learning model. There are two ways of measuring influence. The first option is to delete the instance from the training data, retrain the model on the reduced training set, and observe the difference in the model parameters or predictions (either individually or over the complete dataset); a sketch of this approach appears below. The second is to approximate that effect with influence functions, without retraining. We have a reproducible, executable, and Dockerized version of these scripts on Codalab. Koh, Pang Wei and Liang, Percy. Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1885-1894. Biggio, B., Nelson, B., and Laskov, P. Poisoning attacks against support vector machines. Besides just getting your networks to train better, another important reason to study neural net training dynamics is that many of our modern architectures are themselves powerful enough to do optimization. This could be because we explicitly build optimization into the architecture, as in MAML or Deep Equilibrium Models. Neural tangent kernel: Convergence and generalization in neural networks. D. Maclaurin, D. Duvenaud, and R. P. Adams. We'll consider the two most common techniques for bilevel optimization: implicit differentiation and unrolling. A. Mokhtari, A. Ozdaglar, and S. Pattathil. Alex Adam, Keiran Paster, and Jenny (Jingyi) Liu. 25%: Colab notebook and paper presentation. With the rapid adoption of machine learning systems in sensitive applications, there is an increasing need to make black-box models explainable. Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., and Darrell, T. Decaf: A deep convolutional activation feature for generic visual recognition.
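As an illustration of the first, retraining-based option, here is a minimal sketch; the scikit-learn logistic regression, the synthetic data, and the choice of measuring influence as the change in a single test point's loss are all assumptions made for this example, not part of the original scripts.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def test_loss(model, x_test, y_test):
    # Negative log-likelihood of a single test point under the fitted model.
    p = model.predict_proba(x_test.reshape(1, -1))[0, int(y_test)]
    return -np.log(p)

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = (X[:, 0] + 0.5 * rng.randn(200) > 0).astype(int)
x_test, y_test = rng.randn(5), 1

full_model = LogisticRegression().fit(X, y)
base_loss = test_loss(full_model, x_test, y_test)

# Leave-one-out influence: retrain without point i and measure the change in the
# test loss. Positive values mean removing the point hurts this prediction,
# i.e. the point was helpful for it.
influences = []
for i in range(len(X)):
    mask = np.arange(len(X)) != i
    model_i = LogisticRegression().fit(X[mask], y[mask])
    influences.append(test_loss(model_i, x_test, y_test) - base_loss)

print("Most helpful training points:", np.argsort(influences)[::-1][:5])
```

Retraining n times is exact but expensive, which is what motivates the influence-function approximation.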
How can we explain the predictions of a black-box model? In this paper, we use influence functions, a classic technique from robust statistics, to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. The power of interpolation: Understanding the effectiveness of SGD in modern over-parameterized learning. Requirements: chainer v3 (the implementation uses FunctionHook).
Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., and Vaughan, J. W. A theory of learning from different domains. The security of latent Dirichlet allocation. Components of influence. Students are encouraged to attend class each week. Therefore, this course will finish with bilevel optimization, drawing upon everything covered up to that point in the course. Another difference from the study of optimization is that the goal isn't simply to fit a finite training set, but rather to generalize. Theano: A Python framework for fast computation of mathematical expressions. C. Maddison, D. Paulin, Y.-W. Teh, B. O'Donoghue, and A. Doucet. Understanding Black-box Predictions via Influence Functions. International Conference on Machine Learning (ICML), 2017. On linear models and convolutional neural networks, we demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating visually-indistinguishable training-set attacks. See more on this video at https://www.microsoft.com/en-us/research/video/understanding-black-box-predictions-via-influence-functions/ This class is about developing the conceptual tools to understand what happens when a neural net trains. We show that even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information. Cadamuro, G., Gilad-Bachrach, R., and Zhu, X. Debugging machine learning models. For this class, we'll use Python and the JAX deep learning framework. J. Lucas, S. Sun, R. Zemel, and R. Grosse. P. Nakkiran, B. Neyshabur, and H. Sedghi. Precomputed values are only worth keeping if they can be held in RAM rather than calculated on-the-fly; apparently this worked. If the influence function is calculated for multiple test images, the harmfulness is ordered by average harmfulness over those test images, along with the prediction outcome of the processed test samples. Interpreting black box predictions using Fisher kernels. Fast convergence of natural gradient descent for overparameterized neural networks. The most barebones way of getting the code to run is with the defaults: config contains default values for the influence function calculation, which can of course be changed (an illustrative config is sketched below). Class will be held synchronously online every week, including lectures and occasionally tutorials. Deep learning via Hessian-free optimization. Thus, we can see that different models learn more from different images. If you have questions, please contact Pang Wei Koh (pangwei@cs.stanford.edu).
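For concreteness, this is roughly what such a config might look like; every key name and default value below is an illustrative assumption about typical knobs (damping, scaling, and recursion depth for the s_test estimate), not the repository's documented schema.

```python
# Illustrative config sketch; the key names are assumptions, not the
# repository's exact schema.
config = {
    "gpu": 0,                 # which GPU to use (-1 for CPU)
    "damp": 0.01,             # damping added to the Hessian for numerical stability
    "scale": 25,              # scaling of the Hessian during the s_test calculation
    "recursion_depth": 5000,  # iterations of the stochastic s_test estimation
    "r_averaging": 10,        # number of independent s_test estimates to average
    "test_sample_num": 1,     # how many test images to process
    "outdir": "outdir",       # where the influence results are written
}
print(config)
```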
Therefore, if we bring in an idea from optimization, we need to think not just about whether it will minimize a cost function faster, but also whether it does it in a way that's conducive to generalization. arXiv preprint arXiv:1703.04730 (2017). We'll also consider self-tuning networks, which try to solve bilevel optimization problems by training a network to locally approximate the best-response function (a toy unrolling example appears below). Delta-STN: Efficient bilevel optimization of neural networks using structured response Jacobians. The initial value of the Hessian during the s_test calculation can also be set in the config. On the limited memory BFGS method for large scale optimization.
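As a toy illustration of the unrolling approach to bilevel optimization (the quadratic inner problem, the L2 hyperparameter, and all constants here are assumptions chosen for brevity), we can differentiate a validation loss through a few inner gradient steps:

```python
import jax
import jax.numpy as jnp

# Toy bilevel problem: the inner problem fits w on training data with an L2
# penalty lam; the outer problem tunes lam against a validation loss.
X_tr = jnp.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_tr = jnp.array([1.0, -1.0, 0.5])
X_va = jnp.array([[1.0, -1.0]])
y_va = jnp.array([2.0])

def inner_loss(w, lam):
    return jnp.mean((X_tr @ w - y_tr) ** 2) + lam * jnp.sum(w ** 2)

def unrolled_val_loss(lam, steps=50, lr=0.1):
    # Run `steps` gradient descent steps on the inner objective, keeping the
    # whole trajectory differentiable with respect to lam.
    w = jnp.zeros(2)
    for _ in range(steps):
        w = w - lr * jax.grad(inner_loss)(w, lam)
    return jnp.mean((X_va @ w - y_va) ** 2)

# Hypergradient: derivative of the validation loss w.r.t. the hyperparameter.
print(jax.grad(unrolled_val_loss)(0.1))
```

Implicit differentiation instead uses the optimality conditions of the inner problem, which is the same structure the influence-function derivation exploits.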
PW Koh, P Liang. The model was ResNet-110. This can save training time and reduce memory requirements. The reference implementation can be found here: link.
If Influence Functions are the Answer, Then What is the Question? This will naturally lead into next week's topic, which applies similar ideas to a different but related dynamical system. When can we take advantage of parallelism to train neural nets? Why neural nets generalize despite their enormous capacity is intimately tied to the dynamics of training.
For details and examples, look here.
Debruyne, M., Hubert, M., and Suykens, J. In many cases, they have far more than enough parameters to memorize the data, so why do they generalize well? Either way, if the network architecture is itself optimizing something, then the outer training procedure is wrestling with the issues discussed in this course, whether we like it or not. Liu, Y., Jiang, S., and Liao, S. Efficient approximation of cross-validation for kernel methods using Bouligand influence function. Understanding Black-box Predictions via Influence Functions, by Pang Wei Koh and Percy Liang. We'll mostly focus on minimax optimization, or zero-sum games. config is a dict which contains the parameters used to calculate the influence. You can also compress your dataset slightly, keeping only the most influential images; for non-convex losses, a local quadratic approximation of the loss is used. Optimizing neural networks with Kronecker-factored approximate curvature. Deep inside convolutional networks: Visualising image classification models and saliency maps. As noted above, keeping the grad_zs only makes sense if they can be loaded faster than they can be recalculated on-the-fly. On robustness properties of convex risk minimization methods for pattern recognition. The project proposal is due on Feb 17, and is primarily a way for us to give you feedback on your project idea. To scale up influence functions to modern machine learning settings, we develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products (a minimal sketch of these two oracles follows below). Ribeiro, M. T., Singh, S., and Guestrin, C. "Why should I trust you?": Explaining the predictions of any classifier. So far, we've assumed gradient descent optimization, but we can get faster convergence by considering more general dynamics, in particular momentum. Here are the materials. For the Colab notebook and paper presentation, you will form a group of 2-3 and pick one paper from a list. Pang Wei Koh, Percy Liang. Proceedings of the 34th International Conference on Machine Learning. Shrikumar, A., Greenside, P., Shcherbina, A., and Kundaje, A. Understanding black-box predictions via influence functions. This is the case because grad_z has to be calculated twice.
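A minimal JAX sketch of those two oracles, per-example gradients and Hessian-vector products; the linear-regression loss and the toy numbers are placeholders rather than the paper's setup:

```python
import jax
import jax.numpy as jnp

def loss(params, x, y):
    # Placeholder model and loss: linear regression with squared error.
    return 0.5 * (jnp.dot(params, x) - y) ** 2

def grad_z(params, z):
    # Gradient of the loss at a single training point z = (x, y).
    x, y = z
    return jax.grad(loss)(params, x, y)

def hvp(params, z, v):
    # Hessian-vector product H v without forming H (forward-over-reverse).
    x, y = z
    return jax.jvp(lambda p: jax.grad(loss)(p, x, y), (params,), (v,))[1]

params = jnp.array([1.0, -2.0, 0.5])
z = (jnp.array([0.3, 0.1, -0.7]), 1.0)
v = jnp.ones(3)
print(grad_z(params, z))
print(hvp(params, z, v))
```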
He, M. Narayanan, S. Gershman, B. Kim, and F. Doshi-Velez. The details of the assignment are here. Linearization is one of our most important tools for understanding nonlinear systems (a tiny example follows below). It is known that in a high complexity class such as exponential time, one can convert worst-case hardness into average-case hardness. Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. Borys Bryndak, Sergio Casas, and Sean Segal. S. Arora, S. Du, W. Hu, Z. Li, and R. Wang. NIPS, p. 1097-1105. This is a PyTorch reimplementation of Influence Functions from the ICML 2017 best paper, Understanding Black-box Predictions via Influence Functions by Pang Wei Koh and Percy Liang. Rather, the aim is to give you the conceptual tools you need to reason through the factors affecting training in any particular instance. If there are n training samples, removing one corresponds to upweighting it by ε = -1/n. Implicit Regularization and Bayesian Inference [Slides]. Influence estimates align well with leave-one-out retraining for linear models. More details can be found in the project handout.
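For instance, jax.jvp gives the linearization of a function around a point; the cubic test function below is an arbitrary stand-in:

```python
import jax
import jax.numpy as jnp

f = lambda x: jnp.sin(x) + x ** 3

x0, dx = 1.0, 0.01
y0, df = jax.jvp(f, (x0,), (dx,))

# First-order (linearized) prediction vs. the true change of f around x0.
print(y0 + df, f(x0 + dx))
```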
Understanding Black-box Predictions via Influence Functions (2017): understanding model behavior, debugging models, detecting dataset errors, and even creating visually-indistinguishable training-set attacks.
A Survey of Methods for Explaining Black Box Models. Jianxin Ma, Peng Cui, Kun Kuang, Xin Wang, and Wenwu Zhu. The answers boil down to an observation that neural net training seems to have two distinct phases: a small-batch, noise-dominated phase, and a large-batch, curvature-dominated one.
Springenberg, J. T., Dosovitskiy, A., Brox, T., and Riedmiller, M. Striving for simplicity: The all convolutional net. Self-tuning networks: Bilevel optimization of hyperparameters using structured best-response functions. Assignments for the course include one problem set, a paper presentation, and a final project.
ICML 2017 best paper, from Stanford, by Pang Wei Koh and Percy Liang.

Upweighting a training point z by ε gives the perturbed minimizer
$$\hat{\theta}_{\epsilon, z} \stackrel{\text{def}}{=} \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z, \theta).$$
Its influence on the parameters is
$$\mathcal{I}_{\text{up,params}}(z) \stackrel{\text{def}}{=} \left.\frac{d \hat{\theta}_{\epsilon, z}}{d \epsilon}\right|_{\epsilon=0} = -H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta}).$$
By the chain rule, its influence on the loss at a test point is
$$\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) \stackrel{\text{def}}{=} \left.\frac{d L(z_{\text{test}}, \hat{\theta}_{\epsilon, z})}{d \epsilon}\right|_{\epsilon=0} = \left.\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top} \frac{d \hat{\theta}_{\epsilon, z}}{d \epsilon}\right|_{\epsilon=0} = -\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta}).$$
Removing z corresponds to ε = -1/n. For an input perturbation, write z = (x, y) and $z_{\delta} \stackrel{\text{def}}{=} (x + \delta, y)$, and define
$$\hat{\theta}_{\epsilon, z_{\delta}, -z} \stackrel{\text{def}}{=} \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z_{\delta}, \theta) - \epsilon L(z, \theta).$$
Then
$$\left.\frac{d \hat{\theta}_{\epsilon, z_{\delta}, -z}}{d \epsilon}\right|_{\epsilon=0} = \mathcal{I}_{\text{up,params}}(z_{\delta}) - \mathcal{I}_{\text{up,params}}(z) = -H_{\hat{\theta}}^{-1}\left(\nabla_{\theta} L(z_{\delta}, \hat{\theta}) - \nabla_{\theta} L(z, \hat{\theta})\right),$$
and for small δ this is approximately
$$\left.\frac{d \hat{\theta}_{\epsilon, z_{\delta}, -z}}{d \epsilon}\right|_{\epsilon=0} \approx -H_{\hat{\theta}}^{-1}\left[\nabla_{x} \nabla_{\theta} L(z, \hat{\theta})\right] \delta, \qquad \hat{\theta}_{z_{\delta}, -z} - \hat{\theta} \approx -\frac{1}{n} H_{\hat{\theta}}^{-1}\left[\nabla_{x} \nabla_{\theta} L(z, \hat{\theta})\right] \delta.$$
The corresponding influence of the perturbation on the test loss is
$$\mathcal{I}_{\text{pert,loss}}(z, z_{\text{test}})^{\top} \stackrel{\text{def}}{=} \left.\nabla_{\delta} L\left(z_{\text{test}}, \hat{\theta}_{z_{\delta}, -z}\right)^{\top}\right|_{\delta=0} = -\nabla_{\theta} L\left(z_{\text{test}}, \hat{\theta}\right)^{\top} H_{\hat{\theta}}^{-1} \nabla_{x} \nabla_{\theta} L(z, \hat{\theta}).$$
For logistic regression, $\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})$ has the closed form
$$-y_{\text{test}}\, y \cdot \sigma\left(-y_{\text{test}} \theta^{\top} x_{\text{test}}\right) \cdot \sigma\left(-y \theta^{\top} x\right) \cdot x_{\text{test}}^{\top} H_{\hat{\theta}}^{-1} x.$$
$\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})$ can be used to debug the training data: for a given test prediction it surfaces the training points most responsible for that prediction. Forming and inverting the Hessian directly is too expensive for large models, which motivates the stochastic estimation described later; a single estimate costs O(np). Experiments include an ImageNet dog-vs-fish task (900 images per class) with Inception v3 features and an SVM with an RBF kernel, a training-set poisoning attack built from influence functions, and a related-work comparison between training-set attacks and adversarial examples. For debugging labels, ranking training points by the self-influence $\mathcal{I}_{\text{up,loss}}(z_i, z_i)$ surfaces mislabeled examples (with 10% of labels flipped) faster than ranking by training loss or checking at random. See also Less Is Better: Unweighted Data Subsampling via Influence Function. Compared against leave-one-out retraining, the influence estimates correlate well (around 0.86 in one comparison); for an SVM, replacing the hinge with a smoothed hinge loss raises the agreement to about 0.95. A straightforward idea overall, and a deserving best paper.
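Putting the definitions above into code, here is a direct JAX sketch of I_up,loss for a model small enough to form and invert H explicitly; the tiny binary logistic-regression setup, the damping constant, and the gradient-descent fit are illustrative assumptions, not the paper's experimental code.

```python
import jax
import jax.numpy as jnp

def nll(theta, x, y):
    # Binary logistic regression loss, y in {-1, +1}: L(z, theta) = log(1 + exp(-y theta^T x)).
    return jnp.log1p(jnp.exp(-y * jnp.dot(theta, x)))

def empirical_risk(theta, X, Y):
    return jnp.mean(jax.vmap(nll, in_axes=(None, 0, 0))(theta, X, Y))

# Toy training data; theta_hat is fit by plain gradient descent as a stand-in
# for the empirical risk minimizer.
X = jnp.array([[1.0, 0.5], [-1.0, 0.2], [0.3, -1.0], [0.8, 0.9]])
Y = jnp.array([1.0, -1.0, -1.0, 1.0])
x_test, y_test = jnp.array([0.5, 0.5]), 1.0

theta_hat = jnp.zeros(2)
for _ in range(500):
    theta_hat = theta_hat - 1.0 * jax.grad(empirical_risk)(theta_hat, X, Y)

# H = (1/n) sum_i grad^2 L(z_i, theta_hat); the model is tiny, so form it explicitly.
H = jax.hessian(empirical_risk)(theta_hat, X, Y)
H = H + 0.01 * jnp.eye(H.shape[0])  # damping, used in practice for stability

grad_test = jax.grad(nll)(theta_hat, x_test, y_test)
s_test = jnp.linalg.solve(H, grad_test)  # H^{-1} grad L(z_test, theta_hat)

def influence_up_loss(x, y):
    # I_up,loss(z, z_test) = -grad L(z_test, theta_hat)^T H^{-1} grad L(z, theta_hat)
    return -jnp.dot(s_test, jax.grad(nll)(theta_hat, x, y))

scores = jax.vmap(influence_up_loss)(X, Y)
# Positive score: upweighting z raises the test loss (harmful for this prediction);
# negative: lowers it (helpful). Removing z corresponds to epsilon = -1/n.
print(scores)
```

For large models one never forms H; the stochastic estimator sketched further below replaces the explicit solve.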
10.5 Influential Instances (Interpretable Machine Learning). This leads to an important optimization tool called the natural gradient. s_test is the vector used to calculate the influence.
Calculating the influence of the individual training samples on the final predictions is straightforward. Mei, S. and Zhu, X. Data-trained predictive models see widespread use, but for the most part they are used as black boxes which output a prediction or score. The datasets for the experiments can also be found at the Codalab link. ICML'17: Proceedings of the 34th International Conference on Machine Learning, Volume 70. Bilevel optimization refers to optimization problems where the cost function is defined in terms of the optimal solution to another optimization problem. For toy functions and simple architectures, the training dynamics can often be analyzed directly.

Reference: Understanding Black-box Predictions via Influence Functions, ICML 2017 best paper, Stanford, Pang Wei Koh and Percy Liang. We want to explain the prediction at a test point $z_{\text{test}} = (x_{\text{test}}, y_{\text{test}})$. The training points are $z_1, \dots, z_n$ with $z_i = (x_i, y_i)$, $L(z, \theta)$ is the loss of point z at parameters θ, and the empirical risk minimizer is
$$\hat{\theta} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta).$$
Upweighting a point z by ε gives $\hat{\theta}_{\epsilon, z} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z, \theta)$, and the influence function is
$$\mathcal{I}_{\text{up,params}}(z) = \left.\frac{d \hat{\theta}_{\epsilon, z}}{d \epsilon}\right|_{\epsilon=0} = -H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta}), \qquad H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta}) \ \text{(the Hessian)}.$$
Chaining through the test loss,
$$\begin{aligned} \mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) &= \left.\frac{d L(z_{\text{test}}, \hat{\theta}_{\epsilon, z})}{d \epsilon}\right|_{\epsilon=0} = \left.\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top} \frac{d \hat{\theta}_{\epsilon, z}}{d \epsilon}\right|_{\epsilon=0} \\ &= \nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top} \mathcal{I}_{\text{up,params}}(z) = -\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta}). \end{aligned}$$
For logistic regression, $p(y \mid x) = \sigma(y \theta^{\top} x)$ with σ the sigmoid, and the influence of z on the loss at $z_{\text{test}}$ is
$$-y_{\text{test}}\, y \cdot \sigma\left(-y_{\text{test}} \theta^{\top} x_{\text{test}}\right) \cdot \sigma\left(-y \theta^{\top} x\right) \cdot x_{\text{test}}^{\top} H_{\hat{\theta}}^{-1} x.$$
The factor $\sigma(-y \theta^{\top} x)$ upweights outliers, and $x_{\text{test}}^{\top} H_{\hat{\theta}}^{-1} x$ weights the similarity between x and $x_{\text{test}}$ by the inverse curvature (resistance and variation), so influence is not plain Euclidean similarity.
Computing $H_{\hat{\theta}}$ explicitly costs $O(n p^{2} + p^{3})$ for n training points and p parameters, which is infeasible for large models. Two remedies, both built on Hessian-vector products (HVPs): conjugate gradients, which treats $H_{\hat{\theta}}^{-1} v$ as the solution of
$$H_{\hat{\theta}}^{-1} v = \arg\min_{t} \tfrac{1}{2} t^{\top} H_{\hat{\theta}} t - v^{\top} t,$$
and stochastic estimation. Writing
$$S_j = \sum_{i=0}^{j-1} (I - H)^{i} = \frac{I - (I - H)^{j}}{H}, \qquad \lim_{j \to \infty} S_j = H^{-1},$$
each step of the recursion only needs an HVP with the Hessian of a single sampled training point $\nabla_{\theta}^{2} L(z_i, \hat{\theta})$, so one estimate costs O(np). With $s_{\text{test}} = H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z_{\text{test}}, \hat{\theta})$, the influence becomes $\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) = -s_{\text{test}}^{\top} \nabla_{\theta} L(z, \hat{\theta})$ (a sketch of this estimator follows below).
Experiments: on MNIST, the influence estimates computed with H track the actual changes in loss; on the ImageNet dog-vs-fish task with Inception features, an SVM with an RBF kernel and the Inception model rely on very different training images; the Inception model correctly classifies 591 of the 600 test images, and influence-guided, visually indistinguishable training-set attacks flip 335 of those 591 predictions (57%); checking $\mathcal{I}_{\text{up,loss}}(z_i, z_i)$ for each training point prioritizes likely label errors, and with 10% of labels flipped this recovers mislabeled points faster than ranking by training loss or checking at random.
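A sketch of that truncated Neumann-series (stochastic) estimator for s_test; the toy logistic-regression setup and the damping, scaling, and depth constants are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def nll(theta, x, y):
    # Binary logistic loss, y in {-1, +1}.
    return jnp.log1p(jnp.exp(-y * jnp.dot(theta, x)))

def risk(theta, X, Y):
    return jnp.mean(jax.vmap(nll, in_axes=(None, 0, 0))(theta, X, Y))

X = jnp.array([[1.0, 0.5], [-1.0, 0.2], [0.3, -1.0], [0.8, 0.9]])
Y = jnp.array([1.0, -1.0, -1.0, 1.0])
x_test, y_test = jnp.array([0.5, 0.5]), 1.0

# Fit theta_hat with plain gradient descent as a stand-in for the trained model.
theta_hat = jnp.zeros(2)
for _ in range(300):
    theta_hat = theta_hat - 1.0 * jax.grad(risk)(theta_hat, X, Y)

def hvp_single(theta, x, y, v):
    # Hessian-vector product for one sampled training point, without forming the Hessian.
    return jax.jvp(lambda t: jax.grad(nll)(t, x, y), (theta,), (v,))[1]

v = jax.grad(nll)(theta_hat, x_test, y_test)  # v = grad L(z_test, theta_hat)
damp, scale, depth = 0.01, 1.0, 1000  # scale keeps the spectral radius of (H + damp*I)/scale below 1

# Recursion h <- v + (I - (H_j + damp*I)/scale) h with a freshly sampled point's
# Hessian H_j each step; it converges (in expectation) to scale * (H + damp*I)^{-1} v.
key = jax.random.PRNGKey(0)
h = v
for _ in range(depth):
    key, sub = jax.random.split(key)
    j = jax.random.randint(sub, (), 0, X.shape[0])
    h = v + h - (hvp_single(theta_hat, X[j], Y[j], h) + damp * h) / scale

s_test = h / scale  # estimate of (H + damp*I)^{-1} grad L(z_test, theta_hat)
influences = -jax.vmap(lambda x, y: jnp.dot(s_test, jax.grad(nll)(theta_hat, x, y)))(X, Y)
print(influences)  # I_up,loss(z_i, z_test) for each training point
```

In practice several independent estimates are averaged, and the damping and scaling constants are tuned so the recursion stays stable.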
The next figure shows the same but for a different model, DenseNet-100/12. Appendix (Understanding Black-box Predictions via Influence Functions, Pang Wei Koh and Percy Liang): Deriving the influence function I_up,params. For completeness, we provide a standard derivation of the influence function I_up,params in the context of loss minimization (M-estimation). Limitations of the empirical Fisher approximation for natural gradient descent. Understanding short-horizon bias in stochastic meta-optimization. On linear models and convolutional neural networks, we demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating visually-indistinguishable training-set attacks (a sketch of the dataset-error check follows below). Jaeckel, L. A. Model-agnostic meta-learning for fast adaptation of deep networks. Haoping Xu, Zhihuan Yu, and Jingcheng Niu. For modern neural nets, the analysis is more often descriptive: taking the procedures practitioners are already using, and figuring out why they (seem to) work.
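A compact sketch of the dataset-error check: rank training points by the magnitude of their self-influence I_up,loss(z_i, z_i) and inspect labels from the top of the list. The toy data (with one deliberately flipped label), the damping constant, and the gradient-descent fit are assumptions for illustration.

```python
import jax
import jax.numpy as jnp

def nll(theta, x, y):
    return jnp.log1p(jnp.exp(-y * jnp.dot(theta, x)))

def risk(theta, X, Y):
    return jnp.mean(jax.vmap(nll, in_axes=(None, 0, 0))(theta, X, Y))

X = jnp.array([[1.0, 0.5], [-1.0, 0.2], [0.3, -1.0], [0.8, 0.9], [1.2, 0.4]])
Y = jnp.array([1.0, -1.0, -1.0, 1.0, -1.0])  # last label deliberately flipped

theta_hat = jnp.zeros(2)
for _ in range(500):
    theta_hat = theta_hat - 0.5 * jax.grad(risk)(theta_hat, X, Y)

H = jax.hessian(risk)(theta_hat, X, Y) + 0.01 * jnp.eye(2)

def self_influence_magnitude(x, y):
    # |I_up,loss(z_i, z_i)| = grad L(z_i)^T H^{-1} grad L(z_i); large values mark
    # points whose own loss is unusually sensitive to their own weight.
    g = jax.grad(nll)(theta_hat, x, y)
    return jnp.dot(g, jnp.linalg.solve(H, g))

scores = jax.vmap(self_influence_magnitude)(X, Y)
print(jnp.argsort(-scores))  # check labels in this order, highest self-influence first
```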
Understanding Black-box Predictions via Influence Functions. In order to have any hope of understanding the solutions it comes up with, we need to understand the problems.