Expert-Level Feature Engineering: Advanced Techniques for High-Stakes Models

In this article, you will learn three expert-level feature engineering strategies — counterfactual features, domain-constrained representations, and causal-invariant features — for building robust and explainable models in high-stakes settings.

Topics we will cover include:

How to generate counterfactual sensitivity features for decision-boundary awareness.
How to train a constrained autoencoder that encodes a monotonic domain rule into its representation.
How to discover causal-invariant features that remain stable across environments.

Without further delay, let’s begin.

Introduction

Building machine learning models in high-stakes contexts like finance, healthcare, and critical infrastructure often demands robustness, explainability, and other domain-specific constraints. In these situations, it can be worth going beyond classic feature engineering techniques and adopting advanced, expert-level strategies tailored to such settings.

This article presents three such techniques, explains how they work, and highlights their practical impact.

Counterfactual Feature Generation

Counterfactual feature generation comprises techniques that measure how close each prediction sits to a decision boundary by constructing hypothetical data points from minimal changes to the original features. The idea is simple: ask “how much must an original feature value change for the model’s prediction to cross a critical threshold?” The derived features improve interpretability (for example, “how close is a patient to a diagnosis?” or “what is the minimum income increase required for loan approval?”), and they encode sensitivity directly in feature space, which can improve robustness.
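For a linear classifier, that question has a closed-form answer. As a short derivation, assume a logit score s(x) with the decision boundary at s = 0; the symbols s, w, b, and e_0 (the unit vector along feature 0) are notation introduced here for illustration. Shifting only feature 0 by δ changes the score by δ·w_0, so:

```latex
\begin{aligned}
s(x) &= w^\top x + b, \\
s(x + \delta e_0) &= s(x) + \delta\, w_0 = 0
\quad\Longrightarrow\quad
\delta = -\frac{s(x)}{w_0}, \qquad w_0 \neq 0 .
\end{aligned}
```

The sign of δ gives the direction of the required change and |δ| its size. When w_0 is close to zero, no finite change in that feature alone reaches the boundary.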

The Python example below creates a counterfactual sensitivity feature, cf_delta_feat0, measuring how much input feature feat_0 must change (holding all others fixed) to cross the classifier’s decision boundary. Because the classifier is fit on standardized inputs, the change is expressed in standard deviations of feat_0. We’ll use NumPy, pandas, and scikit-learn.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler

# Toy data and baseline linear classifier
X, y = make_classification(n_samples=500, n_features=5, random_state=42)
df = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(X.shape[1])])
df["target"] = y

scaler = StandardScaler()
X_scaled = scaler.fit_transform(df.drop(columns="target"))
clf = LogisticRegression().fit(X_scaled, y)

# Decision boundary parameters
weights = clf.coef_[0]
bias = clf.intercept_[0]

def counterfactual_delta_feat0(x, eps=1e-9):
    """
    Minimal change to feature 0 (in standardized units), holding other
    features fixed, required to move the linear logit score to the
    decision boundary (0). For a linear model: delta = -score / w0
    """
    score = np.dot(weights, x) + bias
    w0 = weights[0]
    if abs(w0) < eps:
        # feat_0 has almost no influence, so no finite change reaches the boundary
        return np.nan
    return -score / w0

df["cf_delta_feat0"] = [counterfactual_delta_feat0(x) for x in X_scaled]
df.head()
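As a sanity check on the feature, the sketch below applies each row's delta to feat_0 and confirms that the shifted point lands on the decision boundary. It rebuilds the same toy data so it runs on its own; feat0_delta is a hypothetical helper name, and the unsigned distance at the end is one possible way to use the feature, not the only one.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Same toy setup as above, rebuilt so the snippet runs on its own
X, y = make_classification(n_samples=500, n_features=5, random_state=42)
X_scaled = StandardScaler().fit_transform(X)
clf = LogisticRegression().fit(X_scaled, y)
w, b = clf.coef_[0], clf.intercept_[0]

def feat0_delta(X_std, w, b):
    """Vectorized delta = -score / w0 for every row (hypothetical helper)."""
    scores = X_std @ w + b
    return -scores / w[0]

delta = feat0_delta(X_scaled, w, b)

# Apply each row's delta to feature 0 only and recompute the logit
X_cf = X_scaled.copy()
X_cf[:, 0] += delta
scores_cf = clf.decision_function(X_cf)

# Every counterfactual point should sit on the boundary (logit close to 0)
print("max |logit| after shift:", np.abs(scores_cf).max())

# The unsigned distance is often the more convenient model input
abs_delta = np.abs(delta)
print("median |delta| (std units):", np.median(abs_delta))
```

The signed value says which direction feat_0 has to move, while the absolute value behaves like a distance to the boundary along that single feature.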

Domain-Constrained Representation Learning (Constrained Autoencoders)

Autoencoders are widely used for unsupervised representation learning. We can adapt them for domain-constrained representation learning: learn a compressed representation (latent features) while enforcing explicit domain rules (e.g., safety margins or monotonicity laws). Unlike unconstrained latent factors, domain-constrained representations are trained to respect physical, ethical, or regulatory constraints.

Below, we train an autoencoder that learns three latent features and reconstructs inputs while softly enforcing a monotonic rule: higher values of feat_0 should not decrease the likelihood of the positive label. We add a simple supervised predictor head and penalize violations via a finite-difference monotonicity loss. Because the penalty is soft, it discourages violations on the training data but does not guarantee monotonicity everywhere. The implementation uses PyTorch.
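Written out, the training objective combines three terms. This is a sketch in notation introduced here for illustration: f_θ is the predictor's logit, x̂_i the reconstruction of x_i, d the number of input features, e_0 the unit vector along feat_0, and ℓ_BCE binary cross-entropy on logits. The weights 0.5 and 0.1 and the step ε = 10^-2 are the illustrative values used in this example, not tuned recommendations.

```latex
\mathcal{L}
= \underbrace{\frac{1}{N d}\sum_{i=1}^{N} \lVert \hat{x}_i - x_i \rVert_2^2}_{\text{reconstruction}}
+ 0.5\,\underbrace{\frac{1}{N}\sum_{i=1}^{N} \ell_{\mathrm{BCE}}\big(f_\theta(x_i),\, y_i\big)}_{\text{prediction}}
+ 0.1\,\underbrace{\frac{1}{N}\sum_{i=1}^{N} \max\big(0,\; f_\theta(x_i) - f_\theta(x_i + \varepsilon e_0)\big)}_{\text{monotonicity penalty}}
```

The last term is zero whenever nudging feat_0 upward leaves the logit unchanged or raises it, so only violations of the rule contribute to the loss.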

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split

# Supervised split using the earlier DataFrame `df`.
# Keep only the raw feat_* inputs so derived columns such as
# cf_delta_feat0 are not fed to the autoencoder.
feature_cols = [c for c in df.columns if c.startswith("feat_")]
X_train, X_val, y_train, y_val = train_test_split(
    df[feature_cols].values, df["target"].values, test_size=0.2, random_state=42
)
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)

torch.manual_seed(42)

class ConstrainedAutoencoder(nn.Module):
    def __init__(self, input_dim, latent_dim=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 8), nn.ReLU(),
            nn.Linear(8, latent_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 8), nn.ReLU(),
            nn.Linear(8, input_dim)
        )
        # Small predictor head on top of the latent code (logit output)
        self.predictor = nn.Linear(latent_dim, 1)

    def forward(self, x):
        z = self.encoder(x)
        recon = self.decoder(z)
        logit = self.predictor(z)
        return recon, z, logit

model = ConstrainedAutoencoder(input_dim=X_train.shape[1])
optimizer = optim.Adam(model.parameters(), lr=1e-3)
recon_loss_fn = nn.MSELoss()
pred_loss_fn = nn.BCEWithLogitsLoss()
epsilon = 1e-2  # finite-difference step for monotonicity on feat_0

for epoch in range(50):
    model.train()
    optimizer.zero_grad()
    recon, z, logit = model(X_train)

    # Reconstruction + supervised prediction loss
    loss_recon = recon_loss_fn(recon, X_train)
    loss_pred = pred_loss_fn(logit, y_train)

    # Monotonicity penalty: logit(x + eps*e0) - logit(x) should be >= 0
    X_plus = X_train.clone()
    X_plus[:, 0] = X_plus[:, 0] + epsilon
    _, _, logit_plus = model(X_plus)
    mono_violation = torch.relu(logit - logit_plus)  # > 0 where the slope is negative
    loss_mono = mono_violation.mean()

    loss = loss_recon + 0.5 * loss_pred + 0.1 * loss_mono
    loss.backward()
    optimizer.step()

# Latent features learned under the soft monotonicity penalty
with torch.no_grad():
    _, latent_feats, _ = model(X_train)
latent_feats[:5]
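Since the constraint is enforced only through a penalty, it is worth measuring how often it is still violated. The sketch below is one way to do that with the same finite-difference idea; monotonicity_violation_rate is a hypothetical helper, and a small linear layer with hand-set weights stands in for the trained model so the snippet runs on its own.

```python
import torch
import torch.nn as nn

def monotonicity_violation_rate(logit_fn, X, feature_idx=0, eps=1e-2, tol=1e-6):
    """Fraction of rows where raising one feature by eps lowers the logit.

    Hypothetical helper: logit_fn maps an (n, d) tensor to (n, 1) logits.
    """
    with torch.no_grad():
        X_plus = X.clone()
        X_plus[:, feature_idx] += eps
        drop = logit_fn(X) - logit_fn(X_plus)  # > 0 means the logit decreased
    return (drop > tol).float().mean().item()

# Stand-in model (an assumption for this sketch); with the autoencoder above,
# pass its logit output instead, e.g. lambda x: model(x)[2]
torch.manual_seed(0)
toy = nn.Linear(5, 1)
with torch.no_grad():
    # Positive slope on feat_0, negative slope on feat_1
    toy.weight.copy_(torch.tensor([[1.0, -0.5, 0.2, 0.0, 0.3]]))

X_check = torch.randn(200, 5)
rate_feat0 = monotonicity_violation_rate(toy, X_check, feature_idx=0)
rate_feat1 = monotonicity_violation_rate(toy, X_check, feature_idx=1)
print("violation rate, feat_0:", rate_feat0)
print("violation rate, feat_1:", rate_feat1)
```

Running the same check on held-out data (for instance the X_val split) gives a more honest picture than the training set, because the penalty was only ever applied to training rows.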

Causal-Invariant Features

Causal-invariant features are variables whose relationship to the outcome remains stable across different contexts or environments. By targeting causal signals rather than spurious correlations, models can generalize better to out-of-distribution settings. One practical route is to penalize differences between the risk gradients of each environment, so the model cannot lean on environment-specific shortcuts.

The example below simulates two environments. Only the first feature is truly causal; the second becomes spuriously correlated with the label in environment 1. We train a shared linear model across environments while penalizing gradient mismatch, encouraging reliance on invariant (causal) structure.
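In symbols, the objective for this example can be sketched as follows, where R_e(w) is the mean squared error in environment e and λ is the penalty weight. The notation is introduced here for illustration; the example sets λ = 100.

```latex
\min_{w}\;\; R_1(w) + R_2(w) + \lambda\,\big\lVert \nabla_w R_1(w) - \nabla_w R_2(w) \big\rVert_2^2,
\qquad
R_e(w) = \frac{1}{n}\sum_{i=1}^{n}\big(x_i^{(e)\top} w - y_i^{(e)}\big)^2 .
```

A weight vector that leans on a feature whose relationship to the label differs between environments produces different risk gradients in each one, so the penalty pushes that weight toward zero.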

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(42)
np.random.seed(42)

# Two environments with a spurious signal in env1
n = 300
X_env1 = np.random.randn(n, 2)
X_env2 = np.random.randn(n, 2)

# True causal relation: y depends only on X[:,0]
y_env1 = (X_env1[:, 0] + 0.1*np.random.randn(n) > 0).astype(int)
y_env2 = (X_env2[:, 0] + 0.1*np.random.randn(n) > 0).astype(int)

# Inject spurious correlation in env1 via feature 1
X_env1[:, 1] = y_env1 + 0.1*np.random.randn(n)

X1, y1 = torch.tensor(X_env1, dtype=torch.float32), torch.tensor(y_env1, dtype=torch.float32)
X2, y2 = torch.tensor(X_env2, dtype=torch.float32), torch.tensor(y_env2, dtype=torch.float32)

class LinearModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.randn(2, 1))

    def forward(self, x):
        return x @ self.w

model = LinearModel()
optimizer = optim.Adam(model.parameters(), lr=1e-2)

def env_risk(x, y, w):
    # Mean squared error of the linear output against the 0/1 label
    logits = x @ w
    return torch.mean((logits.squeeze() - y)**2)

for epoch in range(2000):
    optimizer.zero_grad()
    risk1 = env_risk(X1, y1, model.w)
    risk2 = env_risk(X2, y2, model.w)

    # Invariance penalty: align risk gradients across environments
    grad1 = torch.autograd.grad(risk1, model.w, create_graph=True)[0]
    grad2 = torch.autograd.grad(risk2, model.w, create_graph=True)[0]
    penalty = torch.sum((grad1 - grad2)**2)

    loss = (risk1 + risk2) + 100.0 * penalty
    loss.backward()
    optimizer.step()

print("Learned weights:", model.w.detach().numpy().ravel())
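To see what the penalty buys, it helps to compare against a fit without it. The sketch below regenerates data of the same shape (with its own random seed, so the numbers will not match the run above) and solves a plain pooled least-squares problem with no invariance penalty. The make_env helper is hypothetical and exists only to keep the snippet self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300

def make_env(spurious):
    """Simulate one environment; y depends only on feature 0 (hypothetical helper)."""
    X = rng.standard_normal((n, 2))
    y = (X[:, 0] + 0.1 * rng.standard_normal(n) > 0).astype(float)
    if spurious:
        # Feature 1 is a noisy copy of the label in this environment only
        X[:, 1] = y + 0.1 * rng.standard_normal(n)
    return X, y

X1, y1 = make_env(spurious=True)
X2, y2 = make_env(spurious=False)

# Pooled least squares: same linear model and squared loss, no intercept,
# and no invariance penalty
X_pool = np.vstack([X1, X2])
y_pool = np.concatenate([y1, y2])
w_pooled, *_ = np.linalg.lstsq(X_pool, y_pool, rcond=None)

print("Pooled least-squares weights:", w_pooled)
```

If the pooled fit puts clear weight on the second feature while the penalized model's weight on it stays near zero, that gap is the effect of the invariance penalty rather than of the data alone.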

Closing Remarks

We covered three advanced feature engineering techniques for high-stakes machine learning: counterfactual sensitivity features for decision-boundary awareness, domain-constrained autoencoders that encode expert rules, and causal-invariant features that promote stable generalization. Used judiciously, these tools can make models more robust, interpretable, and reliable where it matters most.
