Supervised Learning Approaches for Robust Predictive Modelling in Data Science
Supervised learning remains the dominant paradigm for predictive modeling in data science, yet real-world deployments frequently fail due to fragile data pipelines, distributional shift, and optimistic evaluation. This article surveys supervised learning approaches with a focus on robustness—defined as the stability of predictive performance under perturbations to data, environment, or assumptions. We organize the model space into seven families: linear and generalized linear models; tree-based models; kernel methods; instance-based methods; probabilistic generative models; neural networks; and ensemble learning. For each family we discuss inductive biases, optimization, computational complexity, calibration, and typical failure modes. We then synthesize a method-agnostic workflow spanning dataset auditing, leakage prevention, feature engineering, resampling, hyperparameter tuning, model selection, and post-hoc reliability analysis (calibration, uncertainty, and drift monitoring). Robustness strategies—regularization, data augmentation, adversarial training, cost-sensitive learning, resampling for class imbalance, monotonic constraints, conformal prediction, and causal sensitivity analysis—are reviewed with practical guidance. Case vignettes from healthcare, finance, and operations illustrate trade-offs between accuracy, interpretability, and reliability. The paper concludes with open research directions, including integrating causal structure into supervised objectives, leveraging self-supervised pretraining for tabular data, distributionally robust optimization, and aligning evaluation with societal impact.
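To make the leakage-prevention step of the workflow concrete, the following is a minimal sketch (assuming scikit-learn and a synthetic dataset, not code from the surveyed systems) of fitting preprocessing inside each cross-validation fold via a `Pipeline`, so that scaler statistics are never estimated on held-out data:

```python
# Hypothetical illustration: preventing preprocessing leakage by fitting the
# scaler inside each cross-validation fold through a scikit-learn Pipeline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (stands in for a real tabular dataset).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Correct pattern: scaling parameters are re-estimated on each training fold
# only, so the evaluation on each validation fold remains leak-free.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
print(f"5-fold ROC AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Fitting the scaler on the full dataset before splitting would instead leak validation-fold statistics into training, one of the optimistic-evaluation failure modes the workflow is designed to catch.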