shap_enhanced.explainers.ESSHAP

EnsembleSHAPWithNoise: Robust Ensemble Wrapper for SHAP/Custom Explainers

Theoretical Explanation

EnsembleSHAPWithNoise is a robust ensemble-based enhancement for SHAP and custom explainer methods. It addresses instability in feature attributions—especially in deep or highly sensitive models—by adding Gaussian noise to the inputs and/or background data across multiple runs of a base explainer, then aggregating the resulting attribution maps.

By simulating data perturbations, this technique reduces the variance of feature-importance estimates, yielding more reliable and stable interpretations.

Key Concepts

  • Ensemble Averaging:

    The explainer is executed multiple times on noisy versions of the input and/or background. The resulting attributions are aggregated with the specified method (mean or median); see the sketch after this list.

  • Noise Injection:
    Gaussian noise is applied to:
    • Input: Simulates perturbations in the sample to be explained.

    • Background: Introduces variability into the reference distribution used for attribution.

    • Both: Simulates end-to-end variability.

  • Explainer Flexibility:

    Compatible with standard SHAP explainers (e.g., DeepExplainer, KernelExplainer) as well as custom user-defined explainers. Automatically adapts inputs to the required format (NumPy or PyTorch).

  • Type Safety and Compatibility:

    Automatically handles conversions between NumPy arrays and PyTorch tensors, depending on the explainer’s requirements.
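
A minimal NumPy sketch of the two core building blocks, Gaussian perturbation and mean/median aggregation, is shown below. The helper name, shapes, and placeholder attribution values are illustrative only and are not part of the class API.

    import numpy as np

    rng = np.random.default_rng(0)

    def add_gaussian_noise(arr: np.ndarray, noise_level: float) -> np.ndarray:
        # Zero-mean Gaussian perturbation with standard deviation `noise_level`.
        return arr + rng.normal(0.0, noise_level, size=arr.shape)

    # Pretend each of 8 runs produced an attribution map of shape (n_samples, n_features).
    attributions = [rng.normal(size=(5, 10)) for _ in range(8)]
    stacked = np.stack(attributions, axis=0)      # (n_runs, n_samples, n_features)

    mean_attr = stacked.mean(axis=0)              # aggregation="mean"
    median_attr = np.median(stacked, axis=0)      # aggregation="median"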

Algorithm

  1. Initialization:
    • Accepts a model, background data, an explainer class (default: shap.DeepExplainer), the number of runs, the noise level (float), the target for noise injection (‘input’, ‘background’, or ‘both’), the aggregation method (‘mean’ or ‘median’), explainer kwargs, and the device context.

  2. Ensemble Loop:
    • For each of the specified number of runs:
      • Inject Gaussian noise into the background and/or input, as specified.

      • Convert noisy data into the appropriate type (NumPy or PyTorch).

      • Instantiate the explainer using the noisy background.

      • Compute SHAP values on the noisy input.

      • Store the resulting attributions.

  3. Aggregation:
    • Combine all attribution maps using the specified aggregation method (mean or median) to produce the final, noise-robust attribution result; a condensed sketch of the loop follows this list.
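
The following condensed sketch assumes a single-output PyTorch model and the default shap.DeepExplainer; the function name and the omission of type conversion and device handling are simplifications relative to the actual class.

    import numpy as np
    import shap
    import torch

    def ensemble_shap_sketch(model, background, X, n_runs=5, noise_level=0.1,
                             noise_target="input", aggregation="mean"):
        # Illustrative only: the real class also converts between NumPy and
        # PyTorch and moves tensors to the configured device.
        runs = []
        for _ in range(n_runs):
            bg, x = background, X
            if noise_target in ("background", "both"):
                bg = background + noise_level * torch.randn_like(background)
            if noise_target in ("input", "both"):
                x = X + noise_level * torch.randn_like(X)

            explainer = shap.DeepExplainer(model, bg)      # re-instantiated per run
            runs.append(np.asarray(explainer.shap_values(x)))

        stacked = np.stack(runs, axis=0)                   # (n_runs, n_samples, n_features)
        return stacked.mean(axis=0) if aggregation == "mean" else np.median(stacked, axis=0)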

References

  • Lundberg & Lee (2017), “A Unified Approach to Interpreting Model Predictions” [SHAP foundation—coalitional feature attribution framework]

  • Smilkov et al. (2017), “SmoothGrad: removing noise by adding noise” [Uses multiple noisy variants of an input to stabilize gradient‑based attributions—similar to aggregation under noise]

  • Jha et al. (2022), “Shaping Noise for Robust Attributions in Neural Stochastic Differential Equations” (AAAI-22) [Demonstrates that injecting attribution-driven noise improves attribution robustness and reduces sensitivity across methods, including DeepSHAP]

  • Slack et al. (2020), “Fooling LIME and SHAP: Adversarial Attacks on Post Hoc Explanation Methods” [Highlights how SHAP attributions can be highly sensitive to sampling and background-distribution perturbations, underscoring the need for robust aggregation]

  • Yasodhara et al. (2021), “On the Trustworthiness of Tree Ensemble Explainability Methods” [Evaluates attribution stability under data and model perturbations, emphasizing SHAP’s sensitivity and the need for robust strategies]

  • Ben Braiek & Khomh (2024), “Machine Learning Robustness: A Primer” [Surveys robustness concepts including ensembling and noise injection as post‑hoc methods to enhance explainability reliability] :contentReference[oaicite:4]{index=4}

Classes

EnsembleSHAPWithNoise(model[, background, ...])

EnsembleSHAPWithNoise: Robust Ensemble Wrapper for SHAP/Custom Explainers

class shap_enhanced.explainers.ESSHAP.EnsembleSHAPWithNoise(model, background=None, explainer_class=None, explainer_kwargs=None, n_runs=5, noise_level=0.1, noise_target='input', aggregation='mean', device=None)[source]

Bases: BaseExplainer

EnsembleSHAPWithNoise: Robust Ensemble Wrapper for SHAP/Custom Explainers

This class enhances the stability of SHAP (SHapley Additive exPlanations) values by performing multiple runs with Gaussian noise applied to inputs and/or background data, and aggregating the results. It wraps around standard SHAP explainers or custom user-defined ones, making them more robust in the presence of sensitivity or instability.

Note

This class automatically handles input conversion between NumPy and PyTorch, depending on the explainer type.

Parameters:
  • model – The model to explain.

  • background – Background data used for SHAP attribution (can be None if not required).

  • explainer_class – The SHAP or custom explainer class to wrap. Defaults to shap.DeepExplainer.

  • explainer_kwargs – Dictionary of keyword arguments to pass to the explainer during instantiation.

  • n_runs (int) – Number of noisy runs to perform for ensemble aggregation.

  • noise_level (float) – Standard deviation of Gaussian noise to inject.

  • noise_target (str) – Target for noise injection: “input”, “background”, or “both”.

  • aggregation (str) – Aggregation method across runs: “mean” or “median”.

  • device – Device context (e.g., ‘cpu’, ‘cuda’) for tensor-based explainers. Defaults to available GPU or CPU.
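
A hypothetical usage sketch follows; the model architecture, data shapes, and hyperparameter values are placeholders chosen purely for illustration.

    import shap
    import torch
    from shap_enhanced.explainers.ESSHAP import EnsembleSHAPWithNoise

    # Toy regression model and data; sizes are arbitrary.
    model = torch.nn.Sequential(
        torch.nn.Linear(10, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1)
    )
    background = torch.randn(100, 10)         # reference distribution
    X = torch.randn(5, 10)                    # samples to explain

    explainer = EnsembleSHAPWithNoise(
        model,
        background=background,
        explainer_class=shap.DeepExplainer,   # default, shown explicitly
        n_runs=10,                            # more runs -> lower variance, higher cost
        noise_level=0.05,                     # std. dev. of injected Gaussian noise
        noise_target="both",                  # perturb input and background
        aggregation="median",                 # robust to outlier runs
    )
    shap_vals = explainer.shap_values(X)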

property expected_value

Optional property returning the expected model output on the background dataset.

Returns:

Expected value if defined by the subclass, else None.

Return type:

float or None

explain(X, **kwargs)

Alias to shap_values for flexibility and API compatibility.

Parameters:
  • X (Union[np.ndarray, torch.Tensor, list]) – Input samples to explain.

  • kwargs – Additional arguments.

Returns:

SHAP values.

Return type:

Union[np.ndarray, list]

shap_values(X, **kwargs)[source]

Compute noise-robust SHAP values via ensemble averaging over multiple noisy runs.

For each run, Gaussian noise is added to the input and/or background (as configured), then the SHAP explainer is applied to compute attribution values. These are aggregated (mean or median) to produce a stable final output.

\[\begin{split}\text{Attribution}_{final}(i) = \begin{cases} \frac{1}{N} \sum_{j=1}^N \text{SHAP}_j(i) & \text{if aggregation = mean} \\ \text{median}\{\text{SHAP}_1(i), \ldots, \text{SHAP}_N(i)\} & \text{if aggregation = median} \end{cases}\end{split}\]
Parameters:
  • X (np.ndarray or torch.Tensor) – Input sample(s) to explain.

  • kwargs – Additional keyword arguments passed to the underlying explainer’s shap_values method.

Returns:

Aggregated attribution values across ensemble runs.

Return type:

np.ndarray
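
For example, continuing the instantiation sketch above, additional keyword arguments are forwarded to the wrapped explainer’s shap_values call (check_additivity below is a shap.DeepExplainer argument, not one defined by this class):

    attr = explainer.shap_values(X)        # aggregated over n_runs noisy runs
    print(attr.shape)                      # one attribution per input feature

    # Keyword arguments pass through to the underlying explainer,
    # e.g. for shap.DeepExplainer:
    attr = explainer.shap_values(X, check_additivity=False)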