shap_enhanced.tools.evaluation¶
Ground-Truth Shapley Value Estimation via Monte Carlo¶
Overview¶
This module provides brute-force Monte Carlo estimators for ground-truth Shapley values in both sequential and tabular settings. These estimators are model-agnostic and compute marginal contributions of each feature (or feature–time pair) by averaging the effect of masking/unmasking them across many random feature subsets.
These implementations serve as a reference baseline for benchmarking approximate SHAP methods, especially in experimental or synthetic settings where accuracy is paramount.
Key Functions¶
compute_shapley_gt_seq: Computes per-feature-per-timestep Shapley values for a sequence input via masked sampling.
compute_shapley_gt_tabular: Computes standard tabular Shapley values for a single input vector using random coalitions.
Methodology¶
For a given input and a feature subset S, the Shapley value of feature i is computed as:
\[\phi_i = \mathbb{E}_{S \subseteq N \setminus \{i\}} \left[ f(x_{S \cup \{i\}}) - f(x_S) \right]\]
where x_S denotes the input with the features outside S replaced by their baseline values. This expectation is approximated by sampling random subsets S and measuring the marginal contribution of feature i over those subsets.
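As a rough illustration of this procedure (a minimal sketch, not the module's actual implementation), the tabular estimator reduces to a simple sampling loop. The name mc_shapley_sketch, the coalition sampling scheme (each other feature included with probability 1/2), and the scalar-output PyTorch model are assumptions made for the example:

import numpy as np
import torch

def mc_shapley_sketch(model, x, baseline, nsamples=1000, device="cpu"):
    """Illustrative Monte Carlo Shapley estimate for a tabular input x of shape (F,)."""
    F = x.shape[0]
    phi = np.zeros(F)
    model = model.to(device).eval()

    def f(vec):
        # Evaluate the model on a single masked input.
        with torch.no_grad():
            t = torch.tensor(vec, dtype=torch.float32, device=device).unsqueeze(0)
            return model(t).squeeze().item()

    for i in range(F):
        total = 0.0
        for _ in range(nsamples):
            S = np.random.rand(F) < 0.5        # random coalition of features
            S[i] = False                       # feature i is excluded from S
            x_S = np.where(S, x, baseline)     # features outside S take baseline values
            x_Si = x_S.copy()
            x_Si[i] = x[i]                     # add feature i to the coalition
            total += f(x_Si) - f(x_S)
        phi[i] = total / nsamples              # average marginal contribution
    return phi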
Use Case¶
These estimators are useful for:

- Generating ground-truth SHAP values for synthetic benchmarking.
- Evaluating surrogate or approximate SHAP methods.
- Debugging the sensitivity of models to masking-based perturbations.
Example
from shap_enhanced.tools.evaluation import compute_shapley_gt_seq, compute_shapley_gt_tabular

shap_seq = compute_shapley_gt_seq(model, x_seq, baseline_seq, nsamples=500)
shap_tab = compute_shapley_gt_tabular(model, x_tab, baseline_tab, nsamples=1000)
Functions
compute_shapley_gt_seq(model, x, baseline[, nsamples, device])
    Estimate ground-truth Shapley values for a sequential input using Monte Carlo sampling.

compute_shapley_gt_tabular(model, x, baseline[, nsamples, device])
    Estimate ground-truth Shapley values for a tabular input using Monte Carlo sampling.
- shap_enhanced.tools.evaluation.compute_shapley_gt_seq(model, x, baseline, nsamples=200, device='cpu')¶
Estimate ground-truth Shapley values for a sequential input using Monte Carlo sampling.
For each feature–timestep pair (t, f), this method approximates the marginal contribution by computing the model output difference between including and excluding (t, f) from randomly sampled coalitions.
\[\phi_{t,f} = \mathbb{E}_{S \subseteq N \setminus \{(t,f)\}} \left[ f(x_{S \cup \{(t,f)\}}) - f(x_S) \right]\]
- Parameters:
model – A trained PyTorch model supporting (1, T, F)-shaped input.
x (np.ndarray) – Input instance of shape (T, F).
baseline (np.ndarray) – Reference baseline of same shape (T, F).
nsamples (int) – Number of random coalitions sampled.
device (str) – Device on which to run model evaluation (‘cpu’ or ‘cuda’).
- Returns:
Estimated Shapley values for each (t, f) position.
- Return type:
np.ndarray of shape (T, F)
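A minimal usage sketch for this function, assuming a small PyTorch sequence model with a scalar output; the TinySeqModel class, the synthetic data, and the all-zero baseline are illustrative assumptions, while the call itself follows the signature documented above:

import numpy as np
import torch.nn as nn

from shap_enhanced.tools.evaluation import compute_shapley_gt_seq

T, F = 10, 3  # arbitrary example sizes

class TinySeqModel(nn.Module):
    """Toy LSTM regressor mapping a (1, T, F) sequence to a scalar."""
    def __init__(self, n_features, hidden=16):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                # x: (batch, T, F)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])  # regress from the last hidden state

model = TinySeqModel(F).eval()
x_seq = np.random.randn(T, F).astype(np.float32)   # instance to explain
baseline_seq = np.zeros((T, F), dtype=np.float32)  # all-zero reference

shap_seq = compute_shapley_gt_seq(model, x_seq, baseline_seq, nsamples=200)
print(shap_seq.shape)  # expected: (T, F)

Increasing nsamples reduces the Monte Carlo noise of the estimates at a linear cost in model evaluations.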
- shap_enhanced.tools.evaluation.compute_shapley_gt_tabular(model, x, baseline, nsamples=1000, device='cpu')¶
Estimate ground-truth Shapley values for a tabular input using Monte Carlo sampling.
Each feature’s contribution is computed as the expected marginal impact on model output when added to a random subset of other features.
\[\phi_i = \mathbb{E}_{S \subseteq N \setminus \{i\}} \left[ f(x_{S \cup \{i\}}) - f(x_S) \right]\]
- Parameters:
model – A trained PyTorch model that accepts (1, F)-shaped inputs.
x (np.ndarray) – Input feature vector of shape (F,).
baseline (np.ndarray) – Baseline vector of same shape (F,).
nsamples (int) – Number of Monte Carlo samples.
device (str) – Device on which to run model evaluation (‘cpu’ or ‘cuda’).
- Returns:
Estimated Shapley values for each feature.
- Return type:
np.ndarray of shape (F,)
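A corresponding usage sketch for the tabular estimator; the small MLP and the random data are assumptions for illustration. If the estimate has converged, the attributions should approximately satisfy the efficiency property, i.e. sum to roughly f(x) - f(baseline):

import numpy as np
import torch
import torch.nn as nn

from shap_enhanced.tools.evaluation import compute_shapley_gt_tabular

F = 5  # arbitrary example size

# Toy MLP regressor mapping a (1, F) input to a scalar.
model = nn.Sequential(nn.Linear(F, 16), nn.ReLU(), nn.Linear(16, 1)).eval()

x_tab = np.random.randn(F).astype(np.float32)  # instance to explain
baseline_tab = np.zeros(F, dtype=np.float32)   # all-zero reference

shap_tab = compute_shapley_gt_tabular(model, x_tab, baseline_tab, nsamples=1000)

# Sanity check: with enough samples, the attributions should roughly sum to
# the difference in model output between x and the baseline.
with torch.no_grad():
    fx = model(torch.tensor(x_tab).unsqueeze(0)).item()
    fb = model(torch.tensor(baseline_tab).unsqueeze(0)).item()
print(shap_tab.sum(), "vs", fx - fb)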