Facebook AI

Facebook AI has released Captum 0.4, a new version of its powerful, easy-to-use model interpretability library for PyTorch. The release adds a variety of new functionality for model understanding.

Interpretability tools such as Captum make it easier for AI researchers and engineers to design, develop, and debug advanced AI models. They also help people understand how their AI models work, so they can assess whether those models reflect their values and deliver accurate predictions that serve their businesses’ or organizations’ needs.

Captum also offers robustness tools that help model developers uncover vulnerabilities using robustness metrics and adversarial attacks. With version 0.4, Facebook has added tooling for evaluating model robustness, new attribution methods, and improvements to existing attribution methods.

Concept-based Interpretability Helps Remove Statistical Biases

Deep learning models can be difficult to understand. An image classifier, for example, operates on low-level features such as pixel values, lines, dots, and other minor details of an image, rather than on the concepts a person would use to describe it. Concept activation vectors (CAVs) offer a way to explain a neural network’s internal state by associating model predictions with concepts (such as “apron” or “cafe”) that people can easily understand, helping AI researchers see how a computer vision model interprets a complex image.

Captum 0.4 adds Testing with Concept Activation Vectors (TCAV), allowing researchers and engineers to assess how different user-defined concepts affect a model’s prediction. TCAV can also be used for fairness analysis, to check for algorithmic and label bias; researchers have found that some networks inadvertently embed biases that can be difficult to detect.

TCAV expands beyond currently available attribution methods, which quantify the importance of individual inputs, by also letting researchers and engineers quantify the impact of human-understandable concepts such as gender or race on a model’s prediction. In Captum 0.4, TCAV is implemented generically, allowing users to define custom concepts with example inputs for different modalities, including vision and text. In one experiment, for example, Facebook estimated the importance of the concept “positive adjectives” for the prediction of positive sentiment.

As a data set, Facebook used a list of movie reviews with positive sentiment. The graphs visualize TCAV scores for the positive adjectives concept alongside five different sets of neutral-term concepts. The positive adjectives concept is significantly more important for both convolutional layers across all five neutral concept sets, indicating the importance of positive adjectives in predicting positive sentiment.

The distribution of TCAV scores for positive adjectives vs. neutral terms concepts for two different convolutional layers of the sentiment-analysis model.
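To make the workflow concrete, here is a minimal sketch of TCAV in Captum 0.4. The tiny sentiment network, the layer names, and the randomly generated concept examples are all hypothetical placeholders; a real experiment would use a trained model and embedded examples of each concept.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from captum.concept import Concept, TCAV

# Hypothetical stand-in for a trained sentiment model that operates on
# fixed-size text embeddings of length 100.
class TinySentimentNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv1d(1, 4, kernel_size=3), nn.ReLU(),
            nn.Conv1d(4, 4, kernel_size=3), nn.ReLU(),
        )
        self.fc = nn.Linear(4 * 96, 2)  # class 1 = positive sentiment

    def forward(self, x):                # x: (batch, 100)
        h = self.convs(x.unsqueeze(1))   # -> (batch, 4, 96)
        return self.fc(h.flatten(1))

model = TinySentimentNet().eval()

# A Concept pairs a name with a loader of example inputs; random tensors
# stand in for embedded "positive adjective" and neutral-term examples.
positive_adjectives = Concept(
    id=0, name="positive_adjectives",
    data_iter=DataLoader(torch.randn(16, 100), batch_size=4),
)
neutral_terms = Concept(
    id=1, name="neutral_terms",
    data_iter=DataLoader(torch.randn(16, 100), batch_size=4),
)

# Fit CAVs on the two convolutional layers and score how strongly each
# concept influences the "positive sentiment" class.
tcav = TCAV(model=model, layers=["convs.0", "convs.2"])
scores = tcav.interpret(
    torch.randn(8, 100),  # placeholder inputs to explain
    experimental_sets=[[positive_adjectives, neutral_terms]],
    target=1,
)
print(scores)
```

For each listed layer, interpret trains a linear classifier on the concept examples’ activations to obtain a CAV, then reports per-concept scores indicating how sensitive the target prediction is to each concept direction.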

Building More Robust AI Models

Deep learning techniques can be vulnerable to adversarial inputs: small perturbations that are imperceptible to humans but can fool an AI model. Captum 0.4 includes robustness tooling to help developers understand a model’s limitations and vulnerabilities. A robust AI system should consistently produce safe and reliable results under predefined conditions, and it should react to unforeseen issues and make the changes necessary to avoid harming or otherwise negatively affecting people.

The new robustness tools include implementations of adversarial attacks (the fast-gradient sign method and projected gradient descent) as well as robustness metrics for evaluating the impact of different attacks or perturbations on a model.
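As a rough sketch of how these attacks can be applied, the snippet below perturbs a placeholder input with both methods. The toy model, epsilon, and step sizes are illustrative assumptions, not recommended values.

```python
import torch
import torch.nn as nn
from captum.robust import FGSM, PGD

# Hypothetical toy image classifier; any differentiable model works.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()

image = torch.rand(1, 3, 32, 32)  # placeholder input in [0, 1]
label = torch.tensor([3])         # placeholder ground-truth class

# Fast-gradient sign method: one signed-gradient step of size epsilon,
# clamped to the valid input range.
fgsm = FGSM(model, lower_bound=0.0, upper_bound=1.0)
adv_fgsm = fgsm.perturb(image, epsilon=0.03, target=label)

# Projected gradient descent: repeated gradient steps, each projected
# back into a ball of the given radius around the original input.
pgd = PGD(model, nn.CrossEntropyLoss(reduction="none"),
          lower_bound=0.0, upper_bound=1.0)
adv_pgd = pgd.perturb(image, radius=0.03, step_size=0.01,
                      step_num=7, target=label)

# Compare predictions on the clean input vs. the perturbed inputs.
for name, x in [("clean", image), ("fgsm", adv_fgsm), ("pgd", adv_pgd)]:
    print(name, model(x).argmax(dim=1).item())
```

If the predictions flip while the inputs remain visually indistinguishable, the model is vulnerable to these attacks at the chosen perturbation budget.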

Layer-wise Relevance Propagation and Attribution Improvements

In collaboration with Technische Universität Berlin, Facebook has implemented a new attribution algorithm, layer-wise relevance propagation (LRP), which offers a new perspective for explaining model predictions.

Captum 0.4 adds both LRP and a layer-attribution variant, layer LRP. LRP is based on a backward-propagation mechanism applied sequentially to all layers of the model: the model’s output score represents the initial relevance, which is decomposed into relevance values for each neuron of the underlying layers.
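The sketch below applies both variants to a small hypothetical CNN; the model, input shape, and target class are placeholder assumptions.

```python
import torch
import torch.nn as nn
from captum.attr import LRP, LayerLRP

# Hypothetical small CNN built from layer types for which Captum's LRP
# defines propagation rules (Conv2d, ReLU, Linear).
class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.fc = nn.Linear(8 * 32 * 32, 10)

    def forward(self, x):
        return self.fc(self.relu(self.conv(x)).flatten(1))

model = TinyCNN().eval()
inputs = torch.rand(1, 3, 32, 32)

# LRP treats the output score for the target class as the initial
# relevance and decomposes it backward into per-pixel relevance values.
relevance = LRP(model).attribute(inputs, target=5)

# Layer LRP stops the decomposition at a chosen layer instead of
# propagating all the way back to the input.
layer_relevance = LayerLRP(model, model.conv).attribute(inputs, target=5)
print(relevance.shape, layer_relevance.shape)
```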

Finally, Captum 0.4 adds multiple new tutorials, as well as a variety of improvements and bug fixes to existing attribution methods. More information can be found in the official release notes. Captum is also interoperable with the Fiddler platform for explainable AI, which enables engineers and developers to gather actionable insights and analyze the decision-making behavior behind AI models.

Helping the AI community build models that are more reliable, more predictable, and better able to resist adversarial attacks is an important long-term project.