Publications
A collection of my publications with links to preprints, published versions and code repositories.
Tight analyses of first-order methods with error feedback
Daniel Berg Thomsen, Adrien Taylor, Aymeric Dieuleveut
arXiv ⢠6/5/2025
Communication between agents often constitutes a major computational bottleneck in distributed learning. One of the most common mitigation strategies is to compress the information exchanged, thereby reducing communication overhead. To counteract the degradation in convergence associated with compressed communication, error feedback schemes---most notably $\mathrm{EF}$ and $\mathrm{EF}^{21}$---were introduced. In this work, we provide a tight analysis of both of these methods. Specifically, we find the Lyapunov function that yields the best possible convergence rate for each method---with matching lower bounds. This principled approach yields sharp performance guarantees and enables a rigorous, apples-to-apples comparison between $\mathrm{EF}$, $\mathrm{EF}^{21}$, and compressed gradient descent. Our analysis is carried out in a simplified yet representative setting, which allows for clean theoretical insights and fair comparison of the underlying mechanisms.
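For context, the sketch below runs single-worker versions of $\mathrm{EF}$, $\mathrm{EF}^{21}$, and compressed gradient descent with a top-$k$ compressor on a toy least-squares problem. The problem instance, compressor, step size, and iteration budget are illustrative choices made for this example, not the simplified setting analysed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares objective f(x) = 0.5 * ||A x - b||^2.
# The instance, compressor, step size, and iteration budget are illustrative
# choices for this sketch, not the setting analysed in the paper.
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)

def grad(x):
    return A.T @ (A @ x - b)

L = np.linalg.norm(A.T @ A, 2)   # smoothness constant of f
gamma = 0.25 / L                 # conservative fixed step size

def top_k(v, k=2):
    """Top-k compressor: keep the k largest-magnitude coordinates, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

x_ef, e = np.zeros(10), np.zeros(10)   # EF iterate and accumulated error
x_ef21 = np.zeros(10)
g = top_k(grad(x_ef21))                # EF21 compressed gradient estimate
x_cgd = np.zeros(10)                   # compressed gradient descent iterate

for _ in range(500):
    # EF: compress "step + accumulated error", store what was left out
    p = e + gamma * grad(x_ef)
    d = top_k(p)
    x_ef -= d
    e = p - d

    # EF21: step along the current estimate, then correct it with a compressed difference
    x_ef21 -= gamma * g
    g = g + top_k(grad(x_ef21) - g)

    # Compressed gradient descent: compress the gradient directly
    x_cgd -= gamma * top_k(grad(x_cgd))

for name, x in [("EF", x_ef), ("EF21", x_ef21), ("CGD", x_cgd)]:
    print(f"{name:5s} gradient norm: {np.linalg.norm(grad(x)):.2e}")
```

The structural difference between the two schemes is visible in the loop: $\mathrm{EF}$ compresses the step together with an accumulated error term, whereas $\mathrm{EF}^{21}$ maintains a compressed gradient estimate and corrects it with a compressed difference at every iteration.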
Complexity of Minimizing Regularized Convex Quadratic Functions
Daniel Berg Thomsen, Nikita Doikov
arXiv ⢠4/26/2024
In this work, we study the iteration complexity of gradient methods for minimizing convex quadratic functions regularized by powers of Euclidean norms. We show that, due to the uniform convexity of the objective, gradient methods have improved convergence rates. Thus, for the basic gradient descent with a novel step size, we prove a convergence rate of $\mathcal{O}(1/k^{p/(p-2)})$ for the functional residual, where $k$ is the iteration number and $p$ is the power of the regularization term. We also show that this rate is tight by establishing a corresponding lower bound for one-step first-order methods. Then, for the general class of all multi-step methods, we establish that the rate of $\mathcal{O}(1/k^{2p/(p-2)})$ is optimal, providing a sharp analysis of the minimization of uniformly convex regularized quadratic functions. This rate is achieved by the fast gradient method. A special case of our problem class is $p = 3$, which is the minimization of cubically regularized convex quadratic functions. It naturally appears as a subproblem at each iteration of the cubic Newton method. Therefore, our theory shows that the rate of $\mathcal{O}(1/k^6)$ is optimal in this case. We also establish new lower bounds on minimizing the gradient norm within our framework.
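As a concrete illustration of the problem class, the sketch below minimizes a small, randomly generated cubically regularized convex quadratic (the case $p = 3$, i.e. the cubic Newton subproblem) with plain gradient descent. The step-size rule is a generic Armijo backtracking chosen for simplicity; it is not the novel step size analysed in the paper, and the instance itself is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy instance of the problem class with p = 3 (cubic regularization):
#     f(x) = 0.5 <A x, x> - <b, x> + (sigma / p) * ||x||^p,  A >= 0.
# The instance is synthetic and the Armijo backtracking below is a generic
# step-size rule for the sketch, not the novel step size from the paper.
n, p, sigma = 10, 3, 1.0
M = rng.standard_normal((n, n))
A = M.T @ M                       # positive semidefinite, so the quadratic part is convex
b = rng.standard_normal(n)

def f(x):
    return 0.5 * x @ A @ x - b @ x + (sigma / p) * np.linalg.norm(x) ** p

def grad(x):
    # gradient of the regularizer (sigma/p)*||x||^p is sigma*||x||^(p-2)*x
    return A @ x - b + sigma * np.linalg.norm(x) ** (p - 2) * x

x, t = np.zeros(n), 1.0
for _ in range(200):
    g = grad(x)
    fx = f(x)
    # backtracking: shrink the step until a sufficient-decrease condition holds
    while f(x - t * g) > fx - 0.5 * t * (g @ g):
        t *= 0.5
    x = x - t * g
    t *= 2.0                      # let the step size grow back between iterations

print("objective:", f(x), " gradient norm:", np.linalg.norm(grad(x)))
```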