shap_enhanced.explainers.AttnSHAP¶
AttnSHAPExplainer: Attention-Guided SHAP with General Proxy Attention¶
Theoretical Explanation¶
AttnSHAP is a feature attribution method that enhances the traditional SHAP framework by leveraging attention mechanisms to guide the sampling of feature coalitions. This is especially effective for sequential or structured data, where model-provided or proxy attention scores highlight important feature positions.
By biasing the coalition selection process with attention, AttnSHAP prioritizes the masking of less informative features and isolates the contributions of more relevant ones more effectively than uniform sampling.
Key Concepts¶
- Attention-Guided Sampling: When available, model attention weights (via get_attention_weights) are used to bias coalition sampling toward informative features.
- Proxy Attention: If direct attention is unavailable, attention scores can be approximated with one of the following strategies (see the sketch after this list):
  - Gradient-based: magnitude of the input gradients.
  - Input-based: magnitude of the input values.
  - Perturbation-based: change in model output when each individual feature is masked.
- Uniform Sampling: Falls back to classical SHAP’s uniform random sampling when attention is not used.
- Additivity Normalization: Attribution values are scaled so that their sum equals the model output difference between the original and fully-masked inputs.
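The three proxy strategies can be illustrated with a short sketch. The helper below is not part of the package API; it assumes the model maps a (1, T, F) tensor to a scalar (or single-element) output and that masking uses a zero baseline.

```python
import torch

def proxy_attention_scores(model, x, strategy="gradient"):
    """Illustrative proxy attention for a single input x of shape (T, F)."""
    if strategy == "gradient":
        # Gradient-based: magnitude of the input gradient of the output.
        x_req = x.clone().detach().requires_grad_(True)
        model(x_req.unsqueeze(0)).sum().backward()
        scores = x_req.grad.abs()
    elif strategy == "input":
        # Input-based: magnitude of the raw input values.
        scores = x.abs()
    elif strategy == "perturb":
        # Perturbation-based: |f(x) - f(x with position (t, f) zero-masked)|.
        with torch.no_grad():
            base = model(x.unsqueeze(0)).sum().item()
            scores = torch.zeros_like(x)
            for t in range(x.shape[0]):
                for f in range(x.shape[1]):
                    x_m = x.clone()
                    x_m[t, f] = 0.0  # zero baseline is an assumption
                    scores[t, f] = abs(model(x_m.unsqueeze(0)).sum().item() - base)
    else:
        raise ValueError(f"Unknown proxy_attention strategy: {strategy}")
    # Normalize to a probability-like weighting over positions.
    return scores / (scores.sum() + 1e-12)
```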
Algorithm¶
- Initialization: Takes a model, a background dataset, a flag for using attention, a proxy attention strategy, and a device context.
- Attention/Proxy Computation: For each input, retrieve model attention weights if available; otherwise, compute proxy attention with the configured method.
- Coalition Sampling: For each feature (see the sketch after this list),
  - repeatedly sample a coalition of other features, with probability weighted by attention (if applicable);
  - compute the model output after masking the coalition;
  - compute the model output after masking the coalition plus the target feature;
  - record the difference to estimate the marginal contribution.
- Normalization: Rescale feature attributions so that their sum matches the model output difference between the unmasked input and a fully-masked baseline.
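A minimal sketch of the per-feature sampling step described above, assuming a zero-masking baseline, a (T, F) input, and an attn tensor of per-position weights (the names and helper are illustrative, not the package's internals):

```python
import numpy as np
import torch

def marginal_contribution(model, x, attn, t, f, nsamples=100, coalition_size=3, seed=42):
    """Estimate the marginal contribution of position (t, f) for one input x of shape (T, F)."""
    rng = np.random.default_rng(seed)
    T, F = x.shape
    # Candidate coalition members: every position except the target.
    candidates = [(i, j) for i in range(T) for j in range(F) if (i, j) != (t, f)]
    weights = np.array([float(attn[i, j]) for i, j in candidates])
    probs = weights / weights.sum() if weights.sum() > 0 else None  # attention-biased or uniform

    total = 0.0
    with torch.no_grad():
        for _ in range(nsamples):
            picked = rng.choice(len(candidates), size=coalition_size, replace=False, p=probs)
            x_c = x.clone()
            for k in picked:
                i, j = candidates[k]
                x_c[i, j] = 0.0                      # mask the coalition (zero baseline assumed)
            x_cf = x_c.clone()
            x_cf[t, f] = 0.0                         # additionally mask the target feature
            out_c = model(x_c.unsqueeze(0)).sum().item()
            out_cf = model(x_cf.unsqueeze(0)).sum().item()
            total += out_c - out_cf                  # marginal contribution of (t, f)
    return total / nsamples
```

The resulting (T, F) matrix of contributions would then be rescaled so its sum equals f(x) - f(x_fully_masked), which is the normalization step above.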
References¶
Lundberg & Lee (2017), “A Unified Approach to Interpreting Model Predictions” [SHAP foundation]
Serrano & Smith (2019), “Is Attention Interpretable?” [Examines the interpretability and limitations of attention weights]
Jain & Wallace (2019), “Attention is not Explanation” [Argues that attention alone is not a reliable explanation mechanism]
Chefer, Gur, & Wolf (2021), “Transformer Interpretability Beyond Attention Visualization” [Shows advanced uses of attention and gradients for model interpretation]
Sundararajan et al. (2017), “Axiomatic Attribution for Deep Networks” [Introduces integrated gradients, a gradient-based attribution method relevant for proxy attention]
Janzing et al. (2020), “Explaining Classifiers by Removing Input Features” [Discusses alternative SHAP sampling strategies and implications for non-uniform sampling]
Classes¶
AttnSHAPExplainer: Attention-Guided SHAP Explainer for structured/sequential data.
- class shap_enhanced.explainers.AttnSHAP.AttnSHAPExplainer(model, background, use_attention=True, proxy_attention='gradient', device=None)[source]¶
Bases:
BaseExplainer
Attention-Guided SHAP Explainer for structured/sequential data.
This class implements an extension to the SHAP framework that leverages attention mechanisms (either native to the model or via proxy strategies) to guide the coalition sampling process, focusing attribution on informative feature regions.
- Parameters:
model – PyTorch model to be explained.
background – Background dataset used for SHAP estimation.
use_attention (bool) – If True, uses attention weights (or proxy) for guiding feature masking.
proxy_attention (str) – Strategy to approximate attention when model does not provide it. Options: “gradient”, “input”, “perturb”.
device – Computation device (‘cuda’ or ‘cpu’).
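A minimal construction sketch, where model and background are placeholders for a trained PyTorch model and a background tensor:

```python
from shap_enhanced.explainers.AttnSHAP import AttnSHAPExplainer

explainer = AttnSHAPExplainer(
    model=model,                 # trained PyTorch model (placeholder)
    background=background,       # background samples, e.g. a (N, T, F) tensor (placeholder)
    use_attention=True,          # bias coalition sampling with attention
    proxy_attention="gradient",  # fallback when the model exposes no attention weights
    device="cuda",
)
```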
- property expected_value¶
Optional property returning the expected model output on the background dataset.
- Returns:
Expected value if defined by the subclass, else None.
- Return type:
float or None
- explain(X, **kwargs)¶
Alias for shap_values, provided for flexibility and API compatibility.
- Parameters:
X (Union[np.ndarray, torch.Tensor, list]) – Input samples to explain.
kwargs – Additional arguments.
- Returns:
SHAP values.
- Return type:
Union[np.ndarray, list]
- shap_values(X, nsamples=100, coalition_size=3, check_additivity=True, random_seed=42, **kwargs)[source]¶
Compute SHAP values using attention-guided or proxy-guided coalition sampling.
For each feature at each time step, the marginal contribution is estimated by comparing the model output when a sampled coalition is masked with the output when the coalition plus the target feature is masked. Coalition sampling is optionally biased by attention scores.
The final attributions are normalized to satisfy SHAP’s additivity constraint:
\[\sum_{t=1}^T \sum_{f=1}^F \phi_{t,f} \approx f(x) - f(x_{masked})\]
- Parameters:
X (np.ndarray or torch.Tensor) – Input data of shape (B, T, F) or (T, F).
nsamples (int) – Number of coalitions sampled per feature.
coalition_size (int) – Number of features in each sampled coalition.
check_additivity (bool) – Whether to print additivity check results.
random_seed (int) – Seed for reproducible coalition sampling.
- Returns:
SHAP values of shape (T, F) for single input or (B, T, F) for batch.
- Return type:
np.ndarray
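A minimal usage sketch, assuming explainer was constructed as shown earlier and the model accepts (B, T, F) float inputs (the data here is random and purely illustrative):

```python
import numpy as np

X = np.random.randn(4, 24, 8).astype(np.float32)  # a batch of shape (B, T, F)

phi = explainer.shap_values(
    X,
    nsamples=200,           # coalitions sampled per feature
    coalition_size=3,       # positions masked jointly in each coalition
    check_additivity=True,  # print the additivity check
    random_seed=0,
)
print(phi.shape)  # (4, 24, 8); a single (T, F) input would yield (24, 8)
```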