Learning to Optimize: Convergence Guarantees and Adaptive Training Strategies
Learning-to-optimize replaces hand-designed optimization algorithms and training schedules with learned ones: optimizer hyperparameters, learning-rate schedules, sample weightings, and constraint-handling rules are fitted directly to families of problem instances rather than derived from worst-case analysis.
The fundamental challenge lies in designing learning frameworks that deliver fast, adaptive optimizers while retaining provable convergence and generalization guarantees on unseen problem instances.
This page brings together solutions from recent research, including PAC-Bayesian generalization bounds for learned optimizers, probabilistic convergence frameworks, curriculum and sample-weighting schemes, learned learning-rate schedules, and constraint-guided training. These and other methods demonstrate how learned and adaptive optimization can be deployed with explicit performance guarantees across machine learning, operations research, and evolutionary computation.
1. A Generalization Result for Convergence in Learning-to-Optimize
Michael Sucker, Peter Ochs, 2024
Convergence in learning-to-optimize is hardly studied, because conventional convergence guarantees in optimization are based on geometric arguments, which cannot be applied easily to learned algorithms. Thus, we develop a probabilistic framework that resembles deterministic optimization and allows for transferring geometric arguments into learning-to-optimize. Our main theorem is a generalization result for parametric classes of potentially non-smooth, non-convex loss functions and establishes the convergence of learned optimization algorithms to stationary points with high probability. This can be seen as a statistical counterpart to the use of geometric safeguards to ensure convergence. To the best of our knowledge, we are the first to prove convergence of optimization algorithms in such a probabilistic framework.
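As a rough schematic of what such a high-probability statement looks like (our notation, not the authors' theorem), for a learned algorithm producing iterates $x^{(t)}$ on losses $\ell$ drawn from a problem distribution $\mathcal{P}$:

```latex
\mathbb{P}_{\ell \sim \mathcal{P}}\!\left( \liminf_{t \to \infty} \operatorname{dist}\!\big( 0, \partial \ell(x^{(t)}) \big) = 0 \right) \;\geq\; 1 - \varepsilon,
```

where $\partial \ell$ is a suitable subdifferential for the potentially non-smooth, non-convex loss, so convergence to stationary points holds with probability at least $1-\varepsilon$ over problem instances rather than deterministically for every instance.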
2. Learning-to-Optimize with PAC-Bayesian Guarantees: Theoretical Considerations and Practical Implementation
Michael Sucker, Jalal Fadili, Peter Ochs, 2024
We use the PAC-Bayesian theory for the setting of learning-to-optimize. To the best of our knowledge, we present the first framework to learn optimization algorithms with provable generalization guarantees (PAC-Bayesian bounds) and explicit trade-off between convergence guarantees and convergence speed, which contrasts with the typical worst-case analysis. Our learned optimization algorithms provably outperform related ones derived from a (deterministic) worst-case analysis. The results rely on PAC-Bayesian bounds for general, possibly unbounded loss-functions based on exponential families. Then, we reformulate the learning procedure into a one-dimensional minimization problem and study the possibility of finding a global minimum. Furthermore, we provide a concrete algorithmic realization of the framework and new methodologies for learning-to-optimize, and we conduct four practically relevant experiments to support our theory. With this, we showcase that the provided learning framework yields optimization algorithms that provably outperform the state-of-the-art by orders of magnitude.
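A minimal sketch of the general recipe (not the authors' implementation; the quadratic test problems, the Gaussian posterior over a log step size, and the specific bound form are illustrative assumptions): roll the parameterized optimizer out on sampled problem instances to estimate the empirical risk, then minimize an upper bound of the form "risk + complexity".

```python
import numpy as np

def rollout_risk(step_size, problems, iters=50):
    """Empirical risk: mean final loss of gradient descent with the given
    step size over sampled quadratic instances (minimize 0.5 x'Ax - b'x)."""
    total = 0.0
    for A, b in problems:
        x = np.zeros(len(b))
        for _ in range(iters):
            x -= step_size * (A @ x - b)
        total += 0.5 * x @ A @ x - b @ x
    return total / len(problems)

def pac_bayes_objective(mu, log_sigma, problems, n_mc=8, delta=0.05):
    """Illustrative bound: E_Q[risk] + sqrt((KL(Q||P) + ln(1/delta)) / (2n)),
    with posterior Q = N(mu, sigma^2) over the log step size, prior P = N(0, 1)."""
    sigma = np.exp(log_sigma)
    # Monte Carlo estimate of the expected empirical risk under Q
    risks = [rollout_risk(np.exp(mu + sigma * np.random.randn()), problems)
             for _ in range(n_mc)]
    kl = 0.5 * (sigma**2 + mu**2 - 1.0 - 2.0 * log_sigma)  # KL(N(mu,s^2)||N(0,1))
    n = len(problems)
    return np.mean(risks) + np.sqrt((kl + np.log(1.0 / delta)) / (2.0 * n))
```

The returned objective can then be minimized over `(mu, log_sigma)` with any gradient-based or zeroth-order method, directly trading convergence speed (low risk) against the certified guarantee (small complexity term).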
3. Learning via Surrogate PAC-Bayes
Antoine Picard-Weibel, Roman Moscoviz, Benjamin Guedj, 2024
PAC-Bayes learning is a comprehensive setting for (i) studying the generalisation ability of learning algorithms and (ii) deriving new learning algorithms by optimising a generalisation bound. However, optimising generalisation bounds might not always be viable for tractability or computational reasons, or both. For example, iteratively querying the empirical risk might prove computationally expensive. In response, we introduce a novel principled strategy for building an iterative learning algorithm via the optimisation of a sequence of surrogate training objectives, inherited from PAC-Bayes generalisation bounds. The key argument is to replace the empirical risk (seen as a function of hypotheses) in the generalisation bound by its projection onto a constructible low-dimensional functional space: these projections can be queried much more efficiently than the initial risk. On top of providing that generic recipe for learning via surrogate PAC-Bayes bounds, we (i) contribute theoretical results establishing that iteratively optimising our surrogates implies the optimisation of the origin...
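A minimal sketch of the surrogate idea under illustrative assumptions (scalar hypotheses, a polynomial basis, and a stand-in risk): the expensive empirical risk is queried only at a few anchor hypotheses, projected onto a low-dimensional functional space by least squares, and the cheap projection is optimized in its place.

```python
import numpy as np

def fit_surrogate(query_points, risk_values, basis):
    """Project the empirical risk onto span{basis}: least-squares fit of
    coefficients c so that risk(h) ~ sum_k c_k * basis_k(h)."""
    Phi = np.array([[phi(h) for phi in basis] for h in query_points])
    c, *_ = np.linalg.lstsq(Phi, np.array(risk_values), rcond=None)
    return lambda h: sum(ck * phi(h) for ck, phi in zip(c, basis))

# Expensive risk queried at a handful of anchor hypotheses (scalars here),
# then replaced by a cheap quadratic surrogate for subsequent optimization.
expensive_risk = lambda h: (h - 1.3) ** 2 + 0.1 * np.sin(5 * h)  # stand-in
anchors = np.linspace(-2, 2, 7)
values = [expensive_risk(h) for h in anchors]
surrogate = fit_surrogate(anchors, values,
                          basis=[lambda h: 1.0, lambda h: h, lambda h: h ** 2])
```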
4. Learning to Optimize Contextually Constrained Problems for Real-Time Decision Generation
Aaron Babier, Timothy C. Y. Chan, Adam Diamant - Institute for Operations Research and the Management Sciences (INFORMS), 2024
The topic of learning to solve optimization problems has received interest from both the operations research and machine learning communities. In this paper, we combine ideas from both fields to address the problem of learning to generate decisions for instances of optimization problems with potentially nonlinear or nonconvex constraints, where the feasible set varies with contextual features. We propose a novel framework for training a generative model to produce provably optimal decisions by combining interior point methods and adversarial learning, which we further embed within an iterative data generation algorithm. To this end, we first train a classifier to learn feasibility and then train the generative model to produce optimal decisions to an optimization problem using the classifier as a regularizer. We prove that decisions generated by our model satisfy in-sample and out-of-sample optimality guarantees. Furthermore, the learning models are embedded in an active learning loop in which synthetic instances are iteratively added to the training data; this allows us to progressive...
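A minimal sketch of the two-stage structure, with all modelling details invented for illustration (a linear-logistic feasibility classifier and a generic cost function stand in for the paper's models):

```python
import numpy as np

def train_feasibility_classifier(decisions, contexts, labels, lr=0.1, steps=500):
    """Stage 1: learn P(feasible | decision, context). A trivial logistic
    model fit by gradient ascent; any classifier would do."""
    X = np.hstack([decisions, contexts])
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (labels - p) / len(labels)
    return w

def generator_loss(decision, context, w, cost_fn, lam=10.0):
    """Stage 2 objective: task cost plus an infeasibility penalty taken
    from the learned classifier (acting as a regularizer)."""
    x = np.concatenate([decision, context])
    p_feasible = 1.0 / (1.0 + np.exp(-x @ w))
    return cost_fn(decision, context) - lam * np.log(p_feasible + 1e-12)
```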
5. Data-Driven Performance Guarantees for Classical and Learned Optimizers
Rajiv Sambharya, Bartolomeo Stellato, 2024
We introduce a data-driven approach to analyze the performance of continuous optimization algorithms using generalization guarantees from statistical learning theory. We study classical and learned optimizers to solve families of parametric optimization problems. We build generalization guarantees for classical optimizers, using a sample convergence bound, and for learned optimizers, using the Probably Approximately Correct (PAC)-Bayes framework. To train learned optimizers, we use a gradient-based algorithm to directly minimize the PAC-Bayes upper bound. Numerical experiments in signal processing, control, and meta-learning showcase the ability of our framework to provide strong generalization guarantees for both classical and learned optimizers given a fixed budget of iterations. For classical optimizers, our bounds are much tighter than those that worst-case guarantees provide. For learned optimizers, our bounds outperform the empirical outcomes observed in their non-learned counterparts.
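On the classical-optimizer side, a minimal sketch of a sample-based certificate (an illustrative Hoeffding form, not necessarily the paper's bound): run the fixed algorithm for its iteration budget on N i.i.d. sampled problem instances, record a bounded performance metric, and certify the expected metric on unseen instances.

```python
import numpy as np

def sample_performance_bound(metrics, delta=0.05, metric_range=1.0):
    """Given per-instance performance metrics in [0, metric_range] from N
    i.i.d. sampled problems, return a (1 - delta) upper confidence bound
    on the expected metric via Hoeffding's inequality."""
    metrics = np.asarray(metrics)
    n = len(metrics)
    return metrics.mean() + metric_range * np.sqrt(np.log(1.0 / delta) / (2.0 * n))
```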
6. Learning Stress with Feet and Grids
Seung Suk Lee, Alessa Farinella, C.V. Kropas Hughes - Linguistic Society of America, 2023
This paper investigates quantity-insensitive stress learning using the MaxEnt learner of Pater and Prickett (2022) and compares the performance of the learner equipped with three different constraint sets: a foot-based constraint set and two grid-based constraint sets, one drawn directly from Gordon (2002), and one that changes the formulation of the main stress constraint to match the foot-based learner. The learner equipped with the foot-based constraint set succeeds at learning all the languages from the Gordon (2002) typology that it can represent; the structural ambiguity of the foot-based representations is not a problem in this regard. The foot-based learner also learns the languages as quickly in terms of number of epochs as the faster of the grid-based learners, which is the one with the revised main stress constraint. We conclude that the foot-based learner and the grid-based learner fare similarly well in this initial comparison on a typologically grounded set of learning problems.
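A minimal sketch of the MaxEnt machinery common to both learners (the toy tableau and constraint set are invented for illustration): candidates are scored by weighted constraint violations, and weights are fit by gradient ascent on the log-likelihood of the attested winner.

```python
import numpy as np

def candidate_probs(violations, weights):
    """MaxEnt grammar: P(candidate) proportional to exp(-sum_i w_i * violations_i)."""
    scores = -violations @ weights
    p = np.exp(scores - scores.max())   # numerically stable softmax
    return p / p.sum()

# Toy tableau: rows = candidate stress parses, columns = constraints
# (e.g., a foot-based set such as Parse-Syllable, FootBinarity, Align-Head).
violations = np.array([[0, 1, 0],
                       [1, 0, 0],
                       [0, 0, 2]], dtype=float)
weights = np.zeros(3)
winner = 0  # index of the attested stress pattern
for _ in range(200):  # gradient ascent on log-likelihood of the winner
    p = candidate_probs(violations, weights)
    grad = p @ violations - violations[winner]  # expected minus observed violations
    weights = np.maximum(weights + 0.5 * grad, 0.0)  # keep penalties nonnegative
```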
7. Which Samples Should Be Learned First: Easy or Hard?
Xiaoling Zhou, Ou Wu - Institute of Electrical and Electronics Engineers (IEEE), 2023
Treating each training sample unequally is prevalent in many machine-learning tasks, and numerous weighting schemes have been proposed: some take the easy-first mode, whereas others take the hard-first one. Naturally, an interesting yet realistic question is raised: given a new learning task, which samples should be learned first, easy or hard? To answer this question, both theoretical analysis and experimental verification are conducted. First, a general objective function is proposed, and the optimal weight can be derived from it, revealing the relationship between the difficulty distribution of the training set and the priority mode. Two novel findings are subsequently obtained: besides the easy-first and hard-first modes, there are two other typical modes, namely medium-first and two-ends-first, and the priority mode may vary if the difficulty distribution of the training set changes greatly. Second, inspired by the findings, a flexible weighting scheme (FlexW) is proposed for selecting the optimal priority mode when no prior knowledge or theoretical clues are available. The ...
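A minimal sketch of the four priority modes as difficulty-to-weight mappings (the concrete functional forms are illustrative, not FlexW's exact scheme):

```python
import numpy as np

def sample_weights(difficulty, mode):
    """Map per-sample difficulty scores in [0, 1] (e.g., normalized loss)
    to training weights under the four priority modes."""
    d = np.clip(difficulty, 0.0, 1.0)
    if mode == "easy_first":
        return 1.0 - d
    if mode == "hard_first":
        return d
    if mode == "medium_first":      # peak weight at mid difficulty
        return 1.0 - np.abs(2.0 * d - 1.0)
    if mode == "two_ends_first":    # emphasize both extremes
        return np.abs(2.0 * d - 1.0)
    raise ValueError(mode)
```

The resulting weights rescale per-sample losses in the training objective, so switching priority modes requires no other change to the training loop.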
8. PAC-Bayesian Learning of Optimization Algorithms
Michael Sucker, Peter Ochs, 2022
We apply the PAC-Bayes theory to the setting of learning-to-optimize. To the best of our knowledge, we present the first framework to learn optimization algorithms with provable generalization guarantees (PAC-bounds) and explicit trade-off between a high probability of convergence and a high convergence speed. Even in the limit case, where convergence is guaranteed, our learned optimization algorithms provably outperform related algorithms based on a (deterministic) worst-case analysis. Our results rely on PAC-Bayes bounds for general, unbounded loss-functions based on exponential families. By generalizing existing ideas, we reformulate the learning procedure into a one-dimensional minimization problem and study the possibility of finding a global minimum, which enables the algorithmic realization of the learning procedure. As a proof-of-concept, we learn hyperparameters of standard optimization algorithms to empirically underline our theory.
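The bounds in this line of work descend from change-of-measure arguments for exponential families; a representative Alquier-type form (a schematic, not necessarily the paper's exact bound) reads:

```latex
\mathbb{E}_{\alpha \sim Q}\big[ R(\alpha) \big] \;\le\; \mathbb{E}_{\alpha \sim Q}\big[ \hat{R}_n(\alpha) \big] \;+\; \frac{1}{\lambda}\left( \mathrm{KL}(Q \,\|\, P) + \ln\frac{1}{\delta} + \Psi(\lambda, n) \right),
```

where $Q$ is the learned distribution over algorithm hyperparameters $\alpha$, $P$ is a data-independent prior, and $\Psi$ absorbs the moment behavior of the possibly unbounded loss; tuning $\lambda$ realizes the trade-off between convergence probability and convergence speed.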
9. Learning an Interpretable Learning Rate Schedule via the Option Framework
Chaojing Yao - IEEE, 2022
Learning rate is a common and important hyperparameter in many gradient-based optimizers used for training machine learning models. Heuristic handcrafted learning rate schedules (LRSs) can work in many practical situations, but their design and tuning is tedious work, and there is no guarantee that a given handcrafted LRS matches a given problem. Many works have been dedicated to automatically learning an LRS from the training dynamics of the optimization problem, but most of them share a common deficit: they borrow algorithms designed elsewhere as methods for automatic outer-training, and those methods often lack interpretability in the context of learning an LRS. In this paper, we leverage the option framework, a generalization of the common reinforcement learning framework, to automatically learn an LRS based on the dynamics of optimization, which takes the idea of temporal abstraction as an underlying interpretation. To meet the requirements of learning an LRS, the RL state is designed as consisting of the global state and the per-parameter state. We propose a policy archit...
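A minimal sketch of the state design described above, with a stubbed-out action interface (the features and the discrete action set are illustrative; the paper additionally learns option policies and termination conditions):

```python
import numpy as np

def lr_controller_state(loss_history, grads):
    """Global state (loss dynamics) concatenated with per-parameter state
    (gradient statistics), mirroring the global/per-parameter split."""
    global_state = np.array([loss_history[-1],
                             loss_history[-1] - loss_history[-2]])
    per_param_state = np.array([np.mean(np.abs(grads)), np.std(grads)])
    return np.concatenate([global_state, per_param_state])

def apply_lr_action(lr, action):
    """Discrete actions rescale the current learning rate; an action can
    persist across steps, giving the schedule a temporal-abstraction flavor."""
    return lr * {0: 0.5, 1: 1.0, 2: 2.0}[action]
```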
10. Constraint Guided Gradient Descent: Guided Training with Inequality Constraints
Quinten Van Baelen, Peter Karsmakers - Ciaco - i6doc.com, 2022
Deep learning is typically performed by learning a neural network solely from data in the form of input-output pairs, ignoring available domain knowledge. In this work, the Constraint Guided Gradient Descent (CGGD) framework is proposed, which enables the injection of domain knowledge into the training procedure. The domain knowledge is assumed to be described as a conjunction of hard inequality constraints, which appears to be a natural choice for several applications. Compared to other neuro-symbolic approaches, the proposed method converges to a model that satisfies any inequality constraint on the training data and does not require first transforming the constraints into some ad-hoc term that is added to the learning (optimisation) objective. Under certain conditions, it is shown that CGGD can converge to a model that satisfies the constraints on the training set, while prior work does not necessarily converge to such a model. It is empirically shown on two independent and small data sets that CGGD makes training less dependent on the initialisation of the network and improves the constra...
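A minimal sketch of the constraint-guided flavor of the update (one plausible reading for illustration, not CGGD's exact rule): violated inequality constraints contribute a correction to the descent direction itself, instead of an ad-hoc penalty term added to the loss.

```python
import numpy as np

def guided_step(x, loss_grad, constraint_fns, constraint_grads, lr=0.01, guide=1.0):
    """Gradient step whose direction is corrected by the gradients of
    violated inequality constraints g_i(x) <= 0."""
    direction = loss_grad(x).copy()
    for g, dg in zip(constraint_fns, constraint_grads):
        if g(x) > 0:                    # constraint currently violated
            direction += guide * dg(x)  # push back toward the feasible set
        # satisfied constraints leave the direction untouched
    return x - lr * direction
```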
11. Adaptive Hierarchical Hyper-gradient Descent
Renlong Jie, Junbin Gao, Andrey L. Vasnev, 2020
In this study, we investigate learning rate adaptation at different levels based on the hyper-gradient descent framework and propose a method that adaptively learns the optimizer parameters by combining multiple levels of learning rates with hierarchical structures. Meanwhile, we show the relationship between regularizing over-parameterized learning rates and building combinations of adaptive learning rates at different levels. Experiments on several network architectures, including feed-forward networks, LeNet-5 and ResNet-18/34, show that the proposed multi-level adaptive approach can outperform baseline adaptive methods in a variety of circumstances.
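The single-level building block is the hyper-gradient descent rule of Baydin et al. (2018), which the paper combines across global, layer-wise, and parameter-wise levels; a minimal single-level sketch:

```python
import numpy as np

def hypergradient_sgd(grad_fn, x, alpha=0.01, beta=1e-4, steps=100):
    """SGD whose learning rate alpha is itself adapted by gradient descent:
    alpha_{t+1} = alpha_t + beta * <grad_t, grad_{t-1}> (hyper-gradient rule)."""
    prev_grad = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x)
        alpha += beta * g @ prev_grad  # increase lr while successive gradients agree
        x = x - alpha * g
        prev_grad = g
    return x
```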
12. Incremental and Parallel Machine Learning Algorithms With Automated Learning Rate Adjustments
Kazuhiro Hishinuma, Hideaki Iiduka - Frontiers Media SA, 2019
The existing machine learning algorithms for minimizing a convex function over a closed convex set suffer from slow convergence because their learning rates must be determined before running them. This paper proposes two machine learning algorithms incorporating the line search method, which automatically and algorithmically finds appropriate learning rates at run-time. One algorithm is based on the incremental subgradient algorithm, which sequentially and cyclically uses each part of the objective function; the other is based on the parallel subgradient algorithm, which uses the parts independently in parallel. These algorithms can be applied to constrained nonsmooth convex optimization problems appearing in tasks of learning support vector machines without adjusting the learning rates precisely. The proposed line search method can determine learning rates that satisfy weaker conditions than the ones used in existing machine learning algorithms. This implies that the two algorithms are generalizations of the existing incremental and parallel subgradient algorithms for solvin...
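A minimal sketch of the overall shape (using a classical Armijo-style backtracking condition for illustration; the paper's step-size conditions are weaker): each incremental subgradient step searches for its own step size at run-time, so no learning-rate sequence has to be fixed in advance.

```python
def line_search_subgradient_step(f, subgrad, x, lr0=1.0, shrink=0.5, c=1e-4):
    """One incremental-subgradient step on a component function f, with
    backtracking line search; x and subgradients are NumPy arrays."""
    g = subgrad(x)
    lr, fx = lr0, f(x)
    while f(x - lr * g) > fx - c * lr * (g @ g):  # sufficient-decrease test
        lr *= shrink
        if lr < 1e-12:  # nonsmooth f may not admit descent along -g
            break
    return x - lr * g
```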
13. Knowledge Distillation via Route Constrained Optimization
Jin Xiao, Baoyun Peng, Yichao Wu - IEEE, 2019
Distillation-based learning boosts the performance of a miniaturized neural network based on the hypothesis that the representation of a teacher model can be used as structured and relatively weak supervision, and thus would be easily learned by a miniaturized model. However, we find that the representation of a converged heavy model is still a strong constraint for training a small student model, which leads to a higher lower bound on the congruence loss. In this work, we consider knowledge distillation from the perspective of curriculum learning via the teacher's route. Instead of supervising the student model with a converged teacher model, we supervise it with anchor points selected from the route in parameter space that the teacher model passed through, a scheme we call route constrained optimization (RCO). We experimentally demonstrate that this simple operation greatly reduces the lower bound of congruence loss for knowledge distillation, hint and mimicking learning. On closed-set classification tasks like CIFAR and ImageNet, RCO improves knowledge distillation by 2.14% and 1.5% respect...
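A minimal structural sketch of route-constrained supervision (framework-agnostic; `student_update` stands in for one optimization step on a task-plus-distillation loss):

```python
def route_constrained_distillation(student_update, teacher_checkpoints,
                                   epochs_per_anchor, data):
    """Supervise the student with successive anchor points along the
    teacher's optimization route, earliest (easiest) teacher first."""
    for teacher in teacher_checkpoints:      # e.g., epoch-10, -40, -80, final
        for _ in range(epochs_per_anchor):
            for batch in data:
                # student_update should apply a KD loss (e.g., KL divergence
                # between softened teacher and student logits) plus task loss
                student_update(batch, teacher)
```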
14. A Learning Guided Parameter Setting for Constrained Multi-Objective Optimization
Zhun Fan, Jie Ruan, Wenji Li - IEEE, 2019
This paper proposes a learning-guided parameter setting method for constrained multi-objective optimization. More specifically, the proposed method generates penalty factors adaptively, inspired by the learning rate setting from deep learning. The suggested penalty function employs an exponential decay model integrating the constraint violation values, the objective values, the current generation counter, and the maximum number of generations. Furthermore, the proposed self-adaptive penalty method is embedded in the push and pull search framework (PPS-SA) to deal with constrained multi-objective optimization problems (CMOPs). In PPS-SA, the search process is divided into two stages: the push and pull search stages. In the push stage, a CMOP is optimized without considering any constraints. In the pull stage, the CMOP is optimized with a self-adaptive penalty constraint-handling method. To evaluate the performance regarding convergence and diversity, two commonly used metrics, IGD and HV, are used to test the proposed PPS-SA and four other state-of-the-art CM...
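A minimal sketch of a self-adaptive penalty of the described kind (the exact functional form below is an illustrative guess combining the four listed ingredients):

```python
import numpy as np

def adaptive_penalty_factor(violation, objective, gen, max_gen, rho0=1.0):
    """Penalty factor balancing objective and violation magnitudes, scaled
    exponentially with search progress gen/max_gen so that constraint
    handling tightens as the pull stage proceeds."""
    progress = gen / max_gen
    balance = abs(objective) / (abs(violation) + 1e-12)
    return rho0 * balance * np.exp(progress)

def penalized_objective(objective, violation, gen, max_gen):
    """Fitness used in the pull stage of PPS-SA-style search."""
    rho = adaptive_penalty_factor(violation, objective, gen, max_gen)
    return objective + rho * violation
```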
15. Theory of Curriculum Learning, with Convex Loss Functions
Daphna Weinshall, Dan Amir, 2018
Curriculum learning, the idea of teaching by gradually exposing the learner to examples in a meaningful order from easy to hard, has long been investigated in the context of machine learning. Although methods based on this concept have been empirically shown to improve the performance of several learning algorithms, no theoretical analysis has been provided even for simple cases. To address this shortfall, we start by formulating an ideal definition of the difficulty score: the loss of the optimal hypothesis at a given datapoint. We analyze the possible contribution of curriculum learning based on this score in two convex problems: linear regression, and binary classification by hinge loss minimization. We show that in both cases the expected convergence rate decreases monotonically with the ideal difficulty score, in accordance with earlier empirical results. We also prove that when the ideal difficulty score is fixed, the convergence rate is monotonically increasing with respect to the loss of the current hypothesis at each point. We discuss how these results bring to term two app...
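The ideal difficulty score is concrete enough to compute directly in the analyzed settings; a minimal sketch for linear regression, where the optimal hypothesis is the least-squares solution:

```python
import numpy as np

def ideal_difficulty_order(X, y):
    """Difficulty score of a datapoint = loss of the optimal hypothesis at
    that point; returns indices sorted easy-to-hard (curriculum order)."""
    w_opt, *_ = np.linalg.lstsq(X, y, rcond=None)  # optimal hypothesis
    difficulty = (X @ w_opt - y) ** 2              # per-point squared loss
    return np.argsort(difficulty)                  # easy -> hard
```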
16. Evaluation of Teaching Learning Based Optimization with Focused Learning on Expensive Optimization Problems (CEC2017)
Remya Kommadath, Prakash Kotecha - Springer Singapore, 2018
Teaching learning based optimization (TLBO) simulates the transfer of knowledge in a classroom environment for solving various optimization problems. In the current work, we propose a variant of TLBO that incorporates a focused learning strategy and evaluate its performance on the bound-constrained, single-objective, computationally expensive problems provided for CEC2017. The proposed variant of TLBO uses functional evaluations effectively to handle expensive optimization problems and has lower computational complexity.
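The base TLBO generation that the variant builds on is standard (the focused-learning modification itself is not reproduced here); a minimal sketch of the teacher and learner phases:

```python
import numpy as np

def tlbo_step(pop, fitness, f):
    """One TLBO generation: the teacher phase pulls learners toward the
    best solution; the learner phase lets random pairs learn from each other."""
    n, d = pop.shape
    teacher = pop[np.argmin(fitness)]
    mean = pop.mean(axis=0)
    for i in range(n):
        tf = np.random.randint(1, 3)                # teaching factor in {1, 2}
        cand = pop[i] + np.random.rand(d) * (teacher - tf * mean)
        fc = f(cand)
        if fc < fitness[i]:                         # greedy acceptance
            pop[i], fitness[i] = cand, fc
        j = np.random.choice([k for k in range(n) if k != i])
        step = pop[j] - pop[i] if fitness[j] < fitness[i] else pop[i] - pop[j]
        cand = pop[i] + np.random.rand(d) * step
        fc = f(cand)
        if fc < fitness[i]:
            pop[i], fitness[i] = cand, fc
    return pop, fitness
```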