Publications
A collection of my publications, with links to preprints, published versions, and code repositories.
A Tight Theory of Error Feedback Algorithms in Distributed Optimization
Daniel Berg Thomsen, Adrien Taylor, Aymeric Dieuleveut
ICML • 2/26/2026
Communication costs are a major bottleneck in distributed learning and first-order optimization. A common approach to alleviate this issue is to compress the gradient information exchanged between agents. However, such compression typically degrades the convergence guarantees of gradient-based methods. Error feedback mechanisms provide a simple and computationally cheap remedy for this issue, but numerous variants have been proposed, and their relative performance remains poorly understood. This paper provides tight convergence analyses for two of the main error-feedback algorithms from the literature, namely error feedback ($\mathrm{EF}$) and $\mathrm{EF}^{21}$, by identifying optimal step-size choices and by constructing a set of optimal Lyapunov functions tailored to each method. The results hold independently of the number of agents and recover the best known guarantees in the single-agent regime.
@inproceedings{bergthomsen2026tighttheory,
  title={A Tight Theory of Error Feedback Algorithms in Distributed Optimization},
  author={Berg Thomsen, Daniel and Taylor, Adrien and Dieuleveut, Aymeric},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2026}
}
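The error-feedback mechanism studied in this paper can be sketched in a few lines. Below is a generic single-agent illustration with a top-$k$ sparsifier: the compressor, step size, and objective are placeholders for exposition, not the specific setting analyzed in the paper.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef_gradient_descent(grad, x0, step, k, iters):
    """Compressed gradient descent with error feedback (EF), single agent.

    The part of the update dropped by the compressor is accumulated in a
    local error buffer and re-injected at the next step, so no gradient
    information is permanently lost.
    """
    x = x0.copy()
    e = np.zeros_like(x0)              # error buffer
    for _ in range(iters):
        g = grad(x)
        m = top_k(e + step * g, k)     # compress the error-corrected update
        x = x - m                      # apply only the compressed part
        e = e + step * g - m           # store what the compressor dropped
    return x
```

Without the buffer `e`, the coordinates discarded by `top_k` would simply be lost each iteration; with it, every coordinate is eventually applied, which is what restores convergence under compression.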
Tight analyses of first-order methods with error feedback
Daniel Berg Thomsen, Adrien Taylor, Aymeric Dieuleveut
NeurIPS • 6/5/2025
Communication between agents often constitutes a major computational bottleneck in distributed learning. One of the most common mitigation strategies is to compress the information exchanged, thereby reducing communication overhead. To counteract the degradation in convergence associated with compressed communication, error feedback schemes---most notably $\mathrm{EF}$ and $\mathrm{EF}^{21}$---were introduced. In this work, we provide a tight analysis of both of these methods. Specifically, we find the Lyapunov function that yields the best possible convergence rate for each method---with matching lower bounds. This principled approach yields sharp performance guarantees and enables a rigorous, apples-to-apples comparison between $\mathrm{EF}$, $\mathrm{EF}^{21}$, and compressed gradient descent. Our analysis is carried out in the simplified single-agent setting, which allows for clean theoretical insights and fair comparison of the underlying mechanisms.
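Analyses of this kind certify a rate through a potential that contracts at every step. A generic sketch of such a certificate (not the paper's exact Lyapunov function, whose coefficients are optimized per method) is:

```latex
% Potential combining suboptimality and the accumulated compression error:
V_k = f(x_k) - f^\star + c\,\|e_k\|^2,
\qquad V_{k+1} \le \rho\, V_k, \quad \rho \in (0,1),
% which telescopes to  f(x_k) - f^\star \le \rho^k V_0.
```

where $e_k$ is the error-feedback buffer and the weight $c$ balances progress on the objective against growth of the compression error.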
Complexity of Minimizing Regularized Convex Quadratic Functions
Daniel Berg Thomsen, Nikita Doikov
arXiv • 4/26/2024
In this work, we study the iteration complexity of gradient methods for minimizing convex quadratic functions regularized by powers of the Euclidean norm. We show that, due to the uniform convexity of the objective, gradient methods have improved convergence rates. Thus, for basic gradient descent with a novel step size, we prove a convergence rate of $O(k^{-p/(p-2)})$ for the functional residual, where $k$ is the iteration number and $p$ is the power of the regularization term. We also show that this rate is tight by establishing a corresponding lower bound for one-step first-order methods. Then, for the general class of all multi-step methods, we establish that the rate of $O(k^{-2p/(p-2)})$ is optimal, providing a sharp analysis of the minimization of uniformly convex regularized quadratic functions. This rate is achieved by the fast gradient method. A special case of our problem class is $p = 3$, which is the minimization of cubically regularized convex quadratic functions. It naturally appears as a subproblem at each iteration of the cubic Newton method. Therefore, our theory shows that the rate of $O(k^{-6})$ is optimal in this case. We also establish new lower bounds on minimizing the gradient norm within our framework.
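The problem class in this paper can be illustrated concretely for the cubic case. The sketch below runs gradient descent on $f(x) = \tfrac{1}{2}x^\top A x - b^\top x + \tfrac{\sigma}{3}\|x\|^3$; the Armijo backtracking rule is a standard stand-in for illustration only (the paper analyzes a specific novel step size), and `A`, `b`, `sigma` are arbitrary placeholders.

```python
import numpy as np

def cubic_reg_grad(A, b, sigma, x):
    """Gradient of f(x) = 0.5 x^T A x - b^T x + (sigma/3) ||x||^3."""
    return A @ x - b + sigma * np.linalg.norm(x) * x

def gradient_descent(A, b, sigma, x0, iters):
    """Gradient descent with Armijo backtracking on the cubically
    regularized quadratic (generic step rule, for illustration)."""
    f = lambda x: 0.5 * x @ A @ x - b @ x + sigma / 3 * np.linalg.norm(x) ** 3
    x = x0.astype(float).copy()
    for _ in range(iters):
        g = cubic_reg_grad(A, b, sigma, x)
        t = 1.0
        # Halve the step until the Armijo sufficient-decrease test passes.
        while f(x - t * g) > f(x) - 0.5 * t * (g @ g):
            t *= 0.5
        x = x - t * g
    return x
```

This objective is exactly the subproblem solved at each iteration of the cubic Newton method, which is why sharp rates for it matter in practice.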