*Result*: Linear convergence of proximal gradient method for linear sparse SVM.

Title:
Linear convergence of proximal gradient method for linear sparse SVM.
Authors:
Jiao X; Academy of Statistics and Interdisciplinary Sciences KLATASDS-MOE, East China Normal University, Shanghai, China; Department of Decision Analytics and Operations, City University of Hong Kong, Hong Kong, China., Lian H; CityUHK Shenzhen Research Institute, Shenzhen, China; Department of Mathematics, City University of Hong Kong, Hong Kong, China. Electronic address: henglian@cityu.edu.hk., Liu J; School of Mathematics and Physics, University of Science and Technology, Beijing, China., Zhang Y; Academy of Statistics and Interdisciplinary Sciences KLATASDS-MOE, East China Normal University, Shanghai, China.
Source:
Neural networks : the official journal of the International Neural Network Society [Neural Netw] 2026 Feb; Vol. 194, pp. 108162. Date of Electronic Publication: 2025 Oct 03.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Pergamon Press Country of Publication: United States NLM ID: 8805018 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1879-2782 (Electronic) Linking ISSN: 08936080 NLM ISO Abbreviation: Neural Netw Subsets: MEDLINE
Imprint Name(s):
Original Publication: New York : Pergamon Press, [c1988-
Contributed Indexing:
Keywords: Geometric/Linear convergence; Proximal gradient descent; Sparsity; Support vector machines
Entry Date(s):
Date Created: 20251010 Date Completed: 20251216 Latest Revision: 20251216
Update Code:
20260130
DOI:
10.1016/j.neunet.2025.108162
PMID:
41072289
Database:
MEDLINE

*Further Information*

*Although the hinge loss is neither strongly convex nor strongly smooth, we establish a linear rate of convergence for sparse linear support vector machines (SVM) up to statistical accuracy. The algorithm we use is the proximal gradient method for composite functions, applied along a decreasing sequence of regularization parameters to compute an approximate solution path on a grid. Unlike work on loss functions that are strongly convex and strongly smooth, we do not obtain linear convergence to the exact solution; instead, we demonstrate linear convergence to the population truth up to the statistical error (in particular, we consider numerical convergence and statistical convergence simultaneously). For any regularization parameter in the chosen decreasing sequence, we show that the estimator lies in a small neighborhood of the exact solution after O(log s*) iterations, where s* is the sparsity of the true coefficient vector, and a total of O(log n) stages (i.e., a sequence of regularization parameters of length O(log n)) is required to achieve the near-oracle statistical rate, with n the sample size.
(Copyright © 2025 Elsevier Ltd. All rights reserved.)*
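
To make the pathwise scheme described in the abstract concrete, below is a minimal sketch of proximal gradient descent with ℓ1 soft-thresholding, run over a geometrically decreasing grid of regularization parameters with warm starts between stages. This is an illustration under assumptions, not the authors' implementation: the squared hinge stands in as an assumed smooth surrogate (the paper analyzes the hinge loss itself), and the function names, step size, and decay factor `eta` are hypothetical choices.

```python
import numpy as np

def squared_hinge_grad(w, X, y):
    # Gradient of the smooth part f(w) = (1/n) * sum_i max(0, 1 - y_i <x_i, w>)^2.
    # NOTE: assumed smooth surrogate for the hinge loss, so that plain
    # gradient steps are well defined in this sketch; y has entries in {-1, +1}.
    margins = 1.0 - y * (X @ w)
    return -(2.0 / X.shape[0]) * (X.T @ (y * np.maximum(margins, 0.0)))

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (componentwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_grad_stage(w, X, y, lam, step, n_iters):
    # Proximal gradient iterations for a single regularization parameter lam:
    # w <- prox_{step * lam * ||.||_1}(w - step * grad f(w)).
    for _ in range(n_iters):
        w = soft_threshold(w - step * squared_hinge_grad(w, X, y), step * lam)
    return w

def pathwise_prox_grad(X, y, lam0, lam_target, eta=0.8, step=0.1,
                       iters_per_stage=50):
    # Run stages along a geometrically decreasing grid lam0 * eta^k,
    # warm-starting each stage at the previous stage's output.
    w = np.zeros(X.shape[1])
    lam = lam0
    while lam > lam_target:
        w = prox_grad_stage(w, X, y, lam, step, iters_per_stage)
        lam *= eta
    return prox_grad_stage(w, X, y, lam_target, step, iters_per_stage)
```

A call such as `pathwise_prox_grad(X, y, lam0=1.0, lam_target=0.01)` returns the estimate at the final grid point. The warm starts are the point of the multi-stage design: each stage begins close to its own solution, so a modest, fixed number of inner iterations per stage suffices, matching the abstract's picture of a short sequence of stages rather than one long run at the target parameter.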

*Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.*