Publications
A collection of my publications, with links to preprints, published versions, and code repositories.
A Tight Theory of Error Feedback Algorithms in Distributed Optimization
Daniel Berg Thomsen, Adrien Taylor, Aymeric Dieuleveut
ICML • 2/26/2026
Communication costs are a major bottleneck in distributed learning and first-order optimization. A common approach to alleviate this issue is to compress the gradient information exchanged between agents. However, such compression typically degrades the convergence guarantees of gradient-based methods. Error feedback mechanisms provide a simple and computationally cheap remedy for this issue, but numerous variants have been proposed, and their relative performance remains poorly understood. This paper provides tight convergence analyses for two of the main error-feedback algorithms from the literature, namely error feedback ($\mathrm{EF}$) and $\mathrm{EF}^{21}$, by identifying optimal step-size choices and by constructing a set of optimal Lyapunov functions tailored to each method. The results hold independently of the number of agents and recover the best known guarantees in the single-agent regime.
@inproceedings{bergthomsen2026tighttheory,
  title={A Tight Theory of Error Feedback Algorithms in Distributed Optimization},
  author={Berg Thomsen, Daniel and Taylor, Adrien and Dieuleveut, Aymeric},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2026}
}
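The error-feedback mechanism studied in this paper can be sketched in a few lines. Below is a generic single-agent illustration with a top-$k$ sparsifier: the compressor, step size, and objective are placeholders for exposition, not the specific setting analyzed in the paper.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef_gradient_descent(grad, x0, step, k, iters):
    """Compressed gradient descent with error feedback (EF), single agent.

    The part of the update dropped by the compressor is accumulated in a
    local error buffer and re-injected at the next step, so no gradient
    information is permanently lost.
    """
    x = x0.copy()
    e = np.zeros_like(x0)              # error buffer
    for _ in range(iters):
        g = grad(x)
        m = top_k(e + step * g, k)     # compress the error-corrected update
        x = x - m                      # apply only the compressed part
        e = e + step * g - m           # store what the compressor dropped
    return x
```

Without the buffer `e`, the coordinates discarded by `top_k` would simply be lost each iteration; with it, every coordinate is eventually applied, which is what restores convergence under compression.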
Tight analyses of first-order methods with error feedback
Daniel Berg Thomsen, Adrien Taylor, Aymeric Dieuleveut
NeurIPS • 6/5/2025
Communication between agents often constitutes a major computational bottleneck in distributed learning. One of the most common mitigation strategies is to compress the information exchanged, thereby reducing communication overhead. To counteract the degradation in convergence associated with compressed communication, error feedback schemes---most notably $\mathrm{EF}$ and $\mathrm{EF}^{21}$---were introduced. In this work, we provide a tight analysis of both of these methods. Specifically, we find the Lyapunov function that yields the best possible convergence rate for each method---with matching lower bounds. This principled approach yields sharp performance guarantees and enables a rigorous, apples-to-apples comparison between $\mathrm{EF}$, $\mathrm{EF}^{21}$, and compressed gradient descent. Our analysis is carried out in the simplified single-agent setting, which allows for clean theoretical insights and fair comparison of the underlying mechanisms.
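Analyses of this kind certify a rate through a potential that contracts at every step. A generic sketch of such a certificate (not the paper's exact Lyapunov function, whose coefficients are optimized per method) is:

```latex
% Potential combining suboptimality and the accumulated compression error:
V_k = f(x_k) - f^\star + c\,\|e_k\|^2,
\qquad V_{k+1} \le \rho\, V_k, \quad \rho \in (0,1),
% which telescopes to  f(x_k) - f^\star \le \rho^k V_0.
```

where $e_k$ is the error-feedback buffer and the weight $c$ balances progress on the objective against growth of the compression error.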
Complexity of Minimizing Regularized Convex Quadratic Functions
Daniel Berg Thomsen, Nikita Doikov
arXiv • 4/26/2024
In this work, we study the iteration complexity of gradient methods for minimizing convex quadratic functions regularized by powers of the Euclidean norm. We show that, due to the uniform convexity of the objective, gradient methods have improved convergence rates. Thus, for basic gradient descent with a novel step size, we prove a convergence rate of $O(k^{-p/(p-2)})$ for the functional residual, where $k$ is the iteration number and $p$ is the power of the regularization term. We also show that this rate is tight by establishing a corresponding lower bound for one-step first-order methods. Then, for the general class of all multi-step methods, we establish that the rate of $O(k^{-2p/(p-2)})$ is optimal, providing a sharp analysis of the minimization of uniformly convex regularized quadratic functions. This rate is achieved by the fast gradient method. A special case of our problem class is $p = 3$, which is the minimization of cubically regularized convex quadratic functions. It naturally appears as a subproblem at each iteration of the cubic Newton method. Therefore, our theory shows that the rate of $O(k^{-6})$ is optimal in this case. We also establish new lower bounds on minimizing the gradient norm within our framework.
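The problem class in this paper can be illustrated concretely for the cubic case. The sketch below runs gradient descent on $f(x) = \tfrac{1}{2}x^\top A x - b^\top x + \tfrac{\sigma}{3}\|x\|^3$; the Armijo backtracking rule is a standard stand-in for illustration only (the paper analyzes a specific novel step size), and `A`, `b`, `sigma` are arbitrary placeholders.

```python
import numpy as np

def cubic_reg_grad(A, b, sigma, x):
    """Gradient of f(x) = 0.5 x^T A x - b^T x + (sigma/3) ||x||^3."""
    return A @ x - b + sigma * np.linalg.norm(x) * x

def gradient_descent(A, b, sigma, x0, iters):
    """Gradient descent with Armijo backtracking on the cubically
    regularized quadratic (generic step rule, for illustration)."""
    f = lambda x: 0.5 * x @ A @ x - b @ x + sigma / 3 * np.linalg.norm(x) ** 3
    x = x0.astype(float).copy()
    for _ in range(iters):
        g = cubic_reg_grad(A, b, sigma, x)
        t = 1.0
        # Halve the step until the Armijo sufficient-decrease test passes.
        while f(x - t * g) > f(x) - 0.5 * t * (g @ g):
            t *= 0.5
        x = x - t * g
    return x
```

This objective is exactly the subproblem solved at each iteration of the cubic Newton method, which is why sharp rates for it matter in practice.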