Lascado

Research

SMOP: Stochastic trust region method for multi-objective problems

Authors: Nataša Krejić, Nataša Krklec Jerinkić, Luka Rutešić

The problem considered is a multi-objective optimization problem, in which the goal is to find an optimal value of a vector function representing various criteria. The aim of this work is to develop an algorithm which utilizes the trust region framework with probabilistic model functions and is able to cope with noisy problems, using inaccurate functions and gradients. We prove the almost sure convergence of the proposed algorithm to a Pareto critical point if the model functions are good approximations in a probabilistic sense. Numerical results demonstrate the effectiveness of the probabilistic trust region by comparing it to competitive stochastic multi-objective solvers. The application in supervised machine learning is showcased by training non-discriminatory Logistic Regression models on data groups of different sizes. Additionally, we use several test examples with irregularly shaped fronts to exhibit the efficiency of the algorithm.

Optimization and Control (math.OC)
Hybrid Hu-Storey type methods for large-scale nonlinear monotone systems and signal recovery

Authors: Zoltan Papp, Sanja Rapajić, Abdulkarim Hassan Ibrahim, Supak Phiangsungnoen

We propose two hybrid methods for solving large-scale monotone systems, based on a derivative-free conjugate gradient approach and a hyperplane projection technique. The conjugate gradient approach is efficient for large-scale systems due to its low memory requirements, while the projection strategy is suitable for monotone equations because it enables simple globalization. A derivative-free, function-value-based line search is combined with Hu-Storey type search directions and the projection procedure in order to construct globally convergent methods. Furthermore, the proposed methods are applied to solving a number of large-scale monotone nonlinear systems and to the reconstruction of sparse signals. Numerical experiments indicate the robustness of the proposed methods.
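The hyperplane projection step these methods build on can be sketched in a few lines: a derivative-free, function-value-based line search locates a point z on the ray x + t d at which F(z) makes a sufficiently negative angle with d, and monotonicity then guarantees that the hyperplane through z with normal F(z) separates x from the solution set, so projecting onto it moves the iterate closer to every solution. The sketch below is an illustration under assumptions: it uses a plain d = -F(x) direction in place of the Hu-Storey directions, illustrative parameter values, and a toy monotone system that is not from the paper.

```python
import numpy as np

def projection_step(F, x, d, sigma=1e-4, beta=0.5):
    """One line-search + hyperplane-projection step for F(x) = 0, F monotone."""
    # Derivative-free line search: find t with -F(x + t d)^T d >= sigma t ||d||^2.
    t = 1.0
    while t > 1e-12:
        if -F(x + t * d) @ d >= sigma * t * (d @ d):
            break
        t *= beta
    z = x + t * d
    Fz = F(z)
    if Fz @ Fz == 0.0:                    # z already solves the system
        return z
    # Project x onto the hyperplane {y : F(z)^T (y - z) = 0}, which by
    # monotonicity separates x from the solution set.
    return x - (Fz @ (x - z)) / (Fz @ Fz) * Fz

# Toy monotone system F(x) = x + sin(x) (componentwise), with solution x = 0.
F = lambda v: v + np.sin(v)
x = np.array([2.0, -1.5, 0.5])
for _ in range(500):
    x = projection_step(F, x, -F(x))
```

Each projection is Fejér monotone with respect to the solution set, which is what makes the globalization "simple": no derivative or Lipschitz information is needed for the decrease guarantee.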

Optimization and Control (math.OC)
Relaxed-inertial derivative-free algorithm for systems of nonlinear pseudo-monotone equations

Authors: Abdulkarim Hassan Ibrahim, Sanja Rapajić, Ahmad Kamandi, Poom Kumam, Zoltan Papp

Solving systems of nonlinear equations has evolved into an active research field, with numerous iterative methods being proposed. Notably, iterative methods characterized by fast convergence remain of interest. In this paper, based on the modified line search scheme by Ou and Li, we introduce a derivative-free algorithm with a relaxed-inertial technique for approximating solutions of nonlinear systems involving pseudo-monotone mappings in Euclidean space. The global convergence of the proposed algorithm is established without Lipschitz continuity of the underlying mapping. Moreover, our approach allows flexibility in selecting the inertial extrapolation step length within the interval [0, 1]. To show the efficiency of the proposed method, we embed a derivative-free search direction into the scheme. Numerical experiments are given to illustrate the efficiency of the proposed algorithm for large-scale systems and sparse signal reconstruction.

Optimization and Control (math.OC)
Distributed Gradient Clustering: Convergence and the Effect of Initialization

Authors: Aleksandar Armacki, Himkant Sharma, Dragana Bajović, Dušan Jakovetić, Mrityunjoy Chakraborty, Soummya Kar

We study the effects of center initialization on the performance of a family of distributed gradient-based clustering algorithms that work over connected networks of users. In the considered scenario, each user holds a local dataset and communicates only with its immediate neighbours, with the aim of finding a global clustering of the joint data. We perform extensive numerical experiments, evaluating the effects of center initialization on the performance of our family of methods and demonstrating that our methods are more resilient to the effects of initialization than centralized gradient clustering. Next, inspired by the K-means++ initialization, we propose a novel distributed center initialization scheme, which is shown to improve the performance of our methods compared to the baseline random initialization.
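For background, the classical (centralized) K-means++ seeding that inspires the proposed distributed scheme samples each new center with probability proportional to the squared distance from the centers chosen so far. A minimal sketch on synthetic data (the data and function name are illustrative, not from the paper):

```python
import numpy as np

def kmeanspp_init(data, k, rng):
    """Classical K-means++ seeding: sample each new center with probability
    proportional to squared distance from the already chosen centers."""
    n = data.shape[0]
    centers = [data[rng.integers(n)]]
    for _ in range(k - 1):
        # Squared distance of every point to its nearest current center.
        d2 = np.min([np.sum((data - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(data[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centers)

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs (illustrative only).
data = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
centers = kmeanspp_init(data, 2, rng)
```

The distance-weighted sampling strongly favours spreading the seeds across well-separated clusters, which is the behaviour a distributed variant must reproduce without any node seeing the full dataset.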

Optimization and Control (math.OC)
Tackling heavy-tailed noise in distributed estimation: Asymptotic performance and tradeoffs

Authors: Dragana Bajović, Dušan Jakovetić, Soummya Kar, Manojlo Vuković

We present an algorithm for distributed estimation of an unknown vector parameter in the presence of heavy-tailed observation and communication noises. Heavy-tailed noises frequently appear, e.g., in densely deployed Internet of Things (IoT) or wireless sensor network systems. The presented algorithm falls within the class of consensus+innovations estimators and combats the effect of the heavy-tailed noises by adding general nonlinearities in the consensus and innovations update parts. We present results on almost sure convergence and asymptotic normality of the estimator. In addition, we provide novel analytical studies that reveal interesting tradeoffs between the system noises and the underlying network topology.
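A minimal scalar sketch of a consensus+innovations update with a clipping nonlinearity in both terms; the ring network, gain schedules, and Student-t noise here are illustrative assumptions, not the paper's exact setting:

```python
import numpy as np

def clip(u, c=1.0):
    """Componentwise clipping, one example of an admissible nonlinearity."""
    return np.clip(u, -c, c)

rng = np.random.default_rng(1)
n, theta = 20, 3.0          # n agents estimating the scalar parameter theta
x = np.zeros(n)             # agents' current estimates
for k in range(1, 2001):
    alpha = 1.0 / k                  # innovation gain (decaying)
    beta = 0.5 / k ** 0.6            # consensus gain (decaying more slowly)
    y = theta + rng.standard_t(df=3, size=n)     # heavy-tailed observations
    # Consensus over a ring network; the nonlinearity acts on the differences
    # with the two neighbours, bounding the impact of any single outlier.
    consensus = clip(x - np.roll(x, 1)) + clip(x - np.roll(x, -1))
    x = x - beta * consensus + alpha * clip(y - x)
```

Because the nonlinearity bounds each increment, a single heavy-tailed outlier in either the observation or a neighbour's message can perturb an estimate by at most the gain times the clipping level.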

Optimization and Control (math.OC)
Distributed Center-based Clustering: A Unified Framework

Authors: Aleksandar Armacki, Dragana Bajović, Dušan Jakovetić, Soummya Kar

We develop a family of distributed center-based clustering algorithms that work over networks of users. In the proposed scenario, each user holds a local dataset and communicates only with its immediate neighbours, with the aim of finding a clustering of the full, joint data. Experiments validate our theory, demonstrating noise symmetry in real-life settings and showing that clipping is not always the optimal nonlinearity, further underlining the value of a general framework.

Optimization and Control (math.OC)
High-probability Convergence Bounds for Online Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise

Authors: Aleksandar Armacki, Shuhua Yu, Pranay Sharma, Gauri Joshi, Dragana Bajovic, Dusan Jakovetic, Soummya Kar

We study high-probability convergence guarantees of learning on streaming data in the presence of heavy-tailed noise. In the proposed scenario, the model is updated in an online fashion, as new information is observed, without storing any additional data. To combat the heavy-tailed noise, we consider a general framework of nonlinear stochastic gradient descent (SGD), providing several strong results.
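Gradient clipping is a representative member of the nonlinearity family considered. A minimal streaming sketch on least squares with heavy-tailed label noise; the model, step-size schedule, and clipping level are illustrative assumptions, not the paper's analyzed setting:

```python
import numpy as np

def clip(g, c=1.0):
    """Norm clipping, one of the admissible nonlinearities."""
    norm = np.linalg.norm(g)
    return g if norm <= c else (c / norm) * g

rng = np.random.default_rng(2)
w_true = np.array([1.0, -2.0])
w = np.zeros(2)
for k in range(1, 5001):
    a = rng.normal(size=2)                    # one streaming feature vector
    b = a @ w_true + rng.standard_t(df=2.5)   # heavy-tailed label noise
    g = (a @ w - b) * a                       # online stochastic gradient
    w -= (0.5 / k ** 0.75) * clip(g)          # nonlinear (clipped) SGD step
```

Each sample is used once and discarded, matching the online setting: the state is just the current iterate, and the nonlinearity caps the damage any single heavy-tailed sample can do.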

Optimization and Control (math.OC)
Distributed Inexact Newton Method with Adaptive Step Sizes

Authors: Dušan Jakovetić, Nataša Krejić, Greta Malaspina

We consider two formulations for distributed optimization wherein N nodes in a generic connected network solve a problem of common interest: distributed personalized optimization and consensus optimization. A new method termed DINAS (Distributed Inexact Newton method with Adaptive step size) is proposed. DINAS employs large, adaptively computed step sizes, requires less knowledge of global parameters than existing alternatives, and can operate without any local Hessian inverse calculations or Hessian communications. When solving personalized distributed learning formulations, DINAS achieves quadratic convergence with respect to computational cost and linear convergence with respect to communication cost, the latter rate being independent of the local functions' condition numbers and of the network topology. When solving consensus optimization problems, DINAS is shown to converge to the global solution. Extensive numerical experiments demonstrate significant improvements of DINAS over existing alternatives. As a result of independent interest, we provide for the first time a convergence analysis of the Newton method with the adaptive Polyak step size when the Newton direction is computed inexactly in a centralized environment.
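For context, the classical Polyak step size for gradient descent, alpha_k = (f(x_k) - f*) / ||grad f(x_k)||^2, is the rule that DINAS adapts to inexact Newton directions. The sketch below shows only this textbook rule on a toy quadratic with known optimal value, not the paper's distributed algorithm:

```python
import numpy as np

# Toy strongly convex quadratic with known minimizer, so f* is available.
A = np.diag([1.0, 10.0])
b = np.array([1.0, -3.0])
f = lambda v: 0.5 * v @ A @ v - b @ v
x_star = np.linalg.solve(A, b)
f_star = f(x_star)

x = np.array([5.0, 5.0])
for _ in range(200):
    g = A @ x - b
    if g @ g < 1e-30:
        break
    alpha = (f(x) - f_star) / (g @ g)   # Polyak's adaptive step size
    x = x - alpha * g
```

The appeal of the rule is that it needs no Lipschitz or strong-convexity constants, only the optimal value f*; adapting it to inexact Newton directions is the centralized result the abstract refers to.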

Optimization and Control (math.OC)
Parallel Inexact Levenberg-Marquardt Method for Nearly-Separable Nonlinear Least Squares

Authors: Lidija Fodor, Dušan Jakovetić, Nataša Krejić, Greta Malaspina

Motivated by localization problems such as cadastral map refinement, we consider a generic Nonlinear Least Squares (NLS) problem of minimizing an aggregate squared fit across all nonlinear equations (measurements) with respect to the set of unknowns, e.g., coordinates of the unknown points' locations. In a number of scenarios, NLS problems exhibit a nearly-separable structure: the set of measurements can be partitioned into disjoint groups (blocks), such that the unknowns that correspond to different blocks are only loosely coupled. We propose an efficient parallel method, termed Parallel Inexact Levenberg-Marquardt (PILM), to solve such generic large-scale NLS problems. PILM builds upon the classical Levenberg-Marquardt (LM) method, with the main novelty that the nearly block-separable structure is leveraged in order to obtain a scalable parallel method. Therein, the problem-wide system of linear equations that needs to be solved at every LM iteration is tackled iteratively. At each (inner) iteration, the block-wise systems of linear equations are solved in parallel, while the problem-wide system is then handled via sparse, inexpensive inter-block communication. We establish strong convergence guarantees of PILM, analogous to those of the classical LM; provide a PILM implementation in a master-worker parallel compute environment; and demonstrate its efficiency on huge-scale cadastral map refinement problems.
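The classical LM iteration that PILM parallelizes solves the damped normal equations (J^T J + mu I) d = -J^T r at each step, increasing or decreasing the damping mu according to whether the step reduces the residual. A minimal sketch on a toy exponential-fitting problem; the data and update constants are illustrative choices, not the cadastral application:

```python
import numpy as np

def residual(x, t, y):
    return x[0] * np.exp(x[1] * t) - y

def jacobian(x, t):
    e = np.exp(x[1] * t)
    return np.column_stack([e, x[0] * t * e])

t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.0 * t)          # data generated with x_true = (2, -1)
x, mu = np.array([1.0, 0.0]), 1e-2
for _ in range(50):
    r, J = residual(x, t, y), jacobian(x, t)
    d = np.linalg.solve(J.T @ J + mu * np.eye(2), -J.T @ r)   # damped step
    if np.sum(residual(x + d, t, y) ** 2) < np.sum(r ** 2):
        x, mu = x + d, 0.5 * mu     # success: accept and relax damping
    else:
        mu *= 2.0                   # failure: increase damping and retry
```

In PILM this single linear solve is the expensive, problem-wide object: the nearly-separable structure lets each block solve its own subsystem in parallel, with the loose inter-block coupling handled by sparse communication.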

Optimization and Control (math.OC)
Variable metric proximal stochastic gradient methods with additional sampling

Authors: Nataša Krklec Jerinkić, Federica Porta, Valeria Ruggiero and Ilaria Trombini

Regularized empirical risk minimization problems arise in a variety of applications, including machine learning, signal processing, and image processing. Proximal stochastic gradient algorithms are a standard approach to solve these problems due to their low computational cost per iteration and a relatively simple implementation. This paper introduces a class of proximal stochastic gradient methods built on three key elements: a variable metric underlying the iterations, a stochastic line search governing the decrease properties and an incremental minibatch size technique based on additional sampling. Convergence results for the proposed algorithms are proved under different hypotheses on the function to minimize. No assumption is required regarding the Lipschitz continuity of the gradient of the differentiable part of the objective function. Possible strategies to automatically select the parameters of the suggested scheme are discussed. Numerical experiments on binary classification problems show the effectiveness of the suggested approach compared to other state-of-the-art proximal stochastic gradient methods.
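The backbone of such methods is the proximal stochastic gradient iteration: a minibatch gradient step on the smooth part followed by the proximal operator of the regularizer. A minimal fixed-step sketch for l1-regularized least squares, where the proximal operator is soft-thresholding; the variable metric, stochastic line search, and additional sampling of the paper are deliberately omitted, and the data and constants are illustrative:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(3)
n, p, lam, alpha = 500, 10, 0.05, 0.05
w_true = np.zeros(p)
w_true[:3] = [2.0, -1.0, 0.5]                 # sparse ground truth
X = rng.normal(size=(n, p))
y = X @ w_true + 0.01 * rng.normal(size=n)

w = np.zeros(p)
for _ in range(3000):
    idx = rng.choice(n, size=32, replace=False)        # minibatch
    g = X[idx].T @ (X[idx] @ w - y[idx]) / 32          # stochastic gradient
    w = soft_threshold(w - alpha * g, alpha * lam)     # proximal step
```

Note that only the smooth part is sampled; the nonsmooth regularizer is always handled exactly through its proximal operator, which is what keeps the iterates sparse.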

Optimization and Control (math.OC)
Spectral Stochastic Gradient Method with Additional Sampling for Finite and Infinite Sums

Authors: Nataša Krklec Jerinkić, Valeria Ruggiero and Ilaria Trombini

In this paper, we propose a new stochastic gradient method for the numerical minimization of finite sums. We also propose a modified version of this method applicable to more general problems, referred to as infinite sum problems, where the objective function is in the form of a mathematical expectation. The method is based on a strategy to exploit the effectiveness of the well-known Barzilai-Borwein (BB) rules, or variants of these (BB-like) rules, for updating the step length in the standard gradient method. The proposed method adapts this strategy to the stochastic framework by exploiting the same Sample Average Approximation (SAA) estimator of the objective function for several iterations. Furthermore, the sample size is controlled by an additional sampling which also plays a role in accepting the proposed iterate point. Moreover, the number of “inner” iterations with the same sample is controlled by an adaptive rule which prevents the method from getting stuck with the same estimator for too long. Convergence results are discussed for the finite and infinite sum versions, for general and strongly convex objective functions. For the strongly convex case, we provide a convergence rate and worst-case complexity analysis. Numerical experiments on well-known datasets for binary classification show very promising performance of the method, without the need to provide special values for the hyperparameters on which the method depends.
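The BB rules at the core of the method choose the step length from the last displacement s = x_k - x_{k-1} and gradient difference z = g_k - g_{k-1}: BB1 sets alpha = (s.s)/(s.z) and BB2 sets alpha = (s.z)/(z.z). A minimal deterministic sketch of BB1 on a toy quadratic; the stochastic SAA machinery and additional sampling of the paper are omitted:

```python
import numpy as np

A = np.diag([1.0, 4.0, 9.0])        # toy quadratic f(x) = 0.5 x^T A x
grad = lambda v: A @ v
x_prev = np.array([1.0, 1.0, 1.0])
x = x_prev - 0.1 * grad(x_prev)     # one plain gradient step to start
for _ in range(60):
    s = x - x_prev                  # displacement
    z = grad(x) - grad(x_prev)      # gradient difference
    if s @ z == 0.0:
        break
    alpha = (s @ s) / (s @ z)       # BB1 step (BB2 would be (s@z)/(z@z))
    x_prev, x = x, x - alpha * grad(x)
```

The step length is a Rayleigh-quotient estimate of an inverse Hessian eigenvalue; in the stochastic setting this only makes sense if s and z are computed on the same sample, which is why the method keeps the same SAA estimator for several iterations.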

Optimization and Control (math.OC)
SLiSeS: Subsampled Line Search Spectral Gradient Method for Finite Sums

Authors: Stefania Bellavia, Nataša Krejić, Nataša Krklec Jerinkić, Marcos Raydan

The spectral gradient method is known to be a powerful low-cost tool for solving large-scale optimization problems. In this paper, our goal is to exploit its advantages in the stochastic optimization framework, especially in the case of mini-batch subsampling that is often used in big data settings. To allow the spectral coefficient to properly explore the underlying approximate Hessian spectrum, we keep the same subsample for several iterations before subsampling again. We analyze the required algorithmic features and the conditions for almost sure convergence, and present initial numerical results that show the advantages of the proposed method.

Optimization and Control (math.OC)
A non-monotone trust-region method with noisy oracles and additional sampling

Authors: Nataša Krejić, Nataša Krklec Jerinkić, Angeles Martinez, Mahsa Yousefi

In this work, we introduce a novel stochastic second-order method, within the framework of a non-monotone trust-region approach, for solving the unconstrained, nonlinear, and non-convex optimization problems arising in the training of deep neural networks. The proposed algorithm makes use of subsampling strategies which yield noisy approximations of the finite sum objective function and its gradient. To effectively control the resulting approximation error, we introduce an adaptive sample size strategy based on inexpensive additional sampling. Depending on the estimated progress of the algorithm, this can yield sample size scenarios ranging from mini-batch to full sample functions.

Optimization and Control (math.OC)