- Research article, July 2008
Laplace maximum margin Markov networks
ICML '08: Proceedings of the 25th international conference on Machine learning, pages 1256–1263. https://doi.org/10.1145/1390156.1390314
We propose Laplace max-margin Markov networks (LapM3N), and a general class of Bayesian M3N (BM3N) of which the LapM3N is a special case with sparse structural bias, for robust structured prediction. BM3N generalizes extant structured prediction rules ...
- Research article, July 2008
Efficient multiclass maximum margin clustering
ICML '08: Proceedings of the 25th international conference on Machine learning, pages 1248–1255. https://doi.org/10.1145/1390156.1390313
This paper presents a cutting plane algorithm for multiclass maximum margin clustering (MMC). The proposed algorithm constructs a nested sequence of successively tighter relaxations of the original MMC problem, and each optimization problem in this ...
- Research article, July 2008
Estimating local optimums in EM algorithm over Gaussian mixture model
ICML '08: Proceedings of the 25th international conference on Machine learning, pages 1240–1247. https://doi.org/10.1145/1390156.1390312
The EM algorithm is a popular iterative method for estimating the parameters of a Gaussian mixture model from a large observation set. However, in most cases, the EM algorithm is not guaranteed to converge to the global optimum. Instead, it stops at some ...
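The EM iteration this abstract refers to is compact enough to sketch for a one-dimensional Gaussian mixture. This is a generic textbook EM, not the paper's local-optimum analysis; the function and variable names are illustrative:

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=50):
    """Plain EM for a 1-D Gaussian mixture: returns weights, means, variances."""
    n = len(x)
    w = np.full(k, 1.0 / k)                         # mixing weights
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)   # spread initial means over the data
    var = np.full(k, np.var(x))                     # shared initial variance
    for _ in range(iters):
        # E-step: responsibilities r[i, j] = P(component j | x_i), log-sum-exp stabilised
        d = x[:, None] - mu[None, :]
        logp = -0.5 * d**2 / var - 0.5 * np.log(2 * np.pi * var) + np.log(w)
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft counts
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu)**2).sum(axis=0) / nk
    return w, mu, var

# Two well-separated components; EM should recover means near 0 and 5.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(5.0, 1.0, 500)])
w, mu, var = em_gmm_1d(x, k=2)
```

Each sweep is guaranteed not to decrease the data log-likelihood, but, as the abstract notes, the fixed point it reaches can be a local rather than global optimum.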
- Research article, July 2008
Improved Nyström low-rank approximation and error analysis
ICML '08: Proceedings of the 25th international conference on Machine learning, pages 1232–1239. https://doi.org/10.1145/1390156.1390311
Low-rank matrix approximation is an effective tool for alleviating the memory and computational burdens of kernel methods, and sampling, as the mainstream of such algorithms, has drawn considerable attention in both theory and practice. This paper ...
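The Nyström idea itself is simple to state: sample m landmark columns of the kernel matrix and reconstruct the rest from them. A minimal sketch with uniform sampling and an RBF kernel (illustrative names; this is the baseline scheme, not the paper's improved variant):

```python
import numpy as np

def rbf(X, Y, gamma=0.1):
    """RBF kernel matrix between the rows of X and the rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom(X, m, gamma=0.1, seed=0):
    """Rank-m Nystrom approximation K ~= C W^+ C^T from m sampled landmarks."""
    idx = np.random.default_rng(seed).choice(len(X), size=m, replace=False)
    C = rbf(X, X[idx], gamma)       # n x m cross-kernel block
    W = C[idx]                      # m x m landmark block
    return C @ np.linalg.pinv(W) @ C.T, idx

X = np.random.default_rng(1).normal(size=(200, 5))
K = rbf(X, X)
K_hat, idx = nystrom(X, m=100)
rel_err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
```

On the sampled landmark block the approximation is exact (W pinv(W) W = W); the error lives entirely in the unsampled rows and columns, which is what the paper's sampling analysis targets.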
- Research article, July 2008
A quasi-Newton approach to non-smooth convex optimization
ICML '08: Proceedings of the 25th international conference on Machine learning, pages 1216–1223. https://doi.org/10.1145/1390156.1390309
We extend the well-known BFGS quasi-Newton method and its limited-memory variant LBFGS to the optimization of non-smooth convex objectives. This is done in a rigorous fashion by generalizing three components of BFGS to subdifferentials: the local ...
- Research article, July 2008
Preconditioned temporal difference learning
ICML '08: Proceedings of the 25th international conference on Machine learning, pages 1208–1215. https://doi.org/10.1145/1390156.1390308
This paper extends many of the recent popular policy evaluation algorithms to a generalized framework that includes least-squares temporal difference (LSTD) learning, least-squares policy evaluation (LSPE) and a variant of incremental LSTD (iLSTD). The ...
- Research article, July 2008
Democratic approximation of lexicographic preference models
ICML '08: Proceedings of the 25th international conference on Machine learning, pages 1200–1207. https://doi.org/10.1145/1390156.1390307
Previous algorithms for learning lexicographic preference models (LPMs) produce a "best guess" LPM that is consistent with the observations. Our approach is more democratic: we do not commit to a single LPM. Instead, we approximate the target using the ...
- Research article, July 2008
Fully distributed EM for very large datasets
ICML '08: Proceedings of the 25th international conference on Machine learning, pages 1184–1191. https://doi.org/10.1145/1390156.1390305
In EM and related algorithms, E-step computations distribute easily, because data items are independent given parameters. For very large data sets, however, even storing all of the parameters in a single node for the M-step can be impractical. We ...
- Research article, July 2008
Efficiently learning linear-linear exponential family predictive representations of state
ICML '08: Proceedings of the 25th international conference on Machine learning, pages 1176–1183. https://doi.org/10.1145/1390156.1390304
Exponential Family PSR (EFPSR) models capture stochastic dynamical systems by representing state as the parameters of an exponential family distribution over a short-term window of future observations. They are appealing from a learning perspective ...
- Research article, July 2008
Deep learning via semi-supervised embedding
ICML '08: Proceedings of the 25th international conference on Machine learning, pages 1168–1175. https://doi.org/10.1145/1390156.1390303
We show how nonlinear embedding algorithms popular for use with shallow semi-supervised learning techniques such as kernel methods can be applied to deep multilayer architectures, either as a regularizer at the output layer, or on each layer of the ...
- Research article, July 2008
Graph transduction via alternating minimization
ICML '08: Proceedings of the 25th international conference on Machine learning, pages 1144–1151. https://doi.org/10.1145/1390156.1390300
Graph transduction methods label input data by learning a classification function that is regularized to exhibit smoothness along a graph over labeled and unlabeled samples. In practice, these algorithms are sensitive to the initial set of labels ...
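The family of methods this abstract builds on can be illustrated with a plain harmonic label-propagation sweep: average neighbour labels over the graph while clamping the labelled nodes. This is the baseline scheme, not the paper's alternating-minimization method, and all names here are illustrative:

```python
import numpy as np

def label_propagation(W, y, labeled, iters=100):
    """Propagate labels over a graph: repeatedly average neighbour labels,
    clamping the labelled nodes each sweep (harmonic-function transduction)."""
    P = W / W.sum(axis=1, keepdims=True)   # row-normalised transition matrix
    f = np.zeros(len(y), dtype=float)
    f[labeled] = y[labeled]
    for _ in range(iters):
        f = P @ f
        f[labeled] = y[labeled]            # keep the known labels fixed
    return np.sign(f)

# Chain graph 0-1-2-3-4-5 with only the two endpoints labelled.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
y = np.array([-1.0, 0.0, 0.0, 0.0, 0.0, 1.0])
pred = label_propagation(W, y, labeled=[0, 5])
```

On the chain, the converged scores interpolate linearly between the two clamped endpoints, so the predicted labels split the chain in the middle. The sensitivity to the initial labels that the abstract mentions is visible here: the clamped values fully determine the solution.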
- Research article, July 2008
Dirichlet component analysis: feature extraction for compositional data
ICML '08: Proceedings of the 25th international conference on Machine learning, pages 1128–1135. https://doi.org/10.1145/1390156.1390298
We consider feature extraction (dimensionality reduction) for compositional data, where the data vectors are constrained to be positive and constant-sum. In real-world problems, the data components (variables) usually have complicated "correlations" ...
- Research article, July 2008
Manifold alignment using Procrustes analysis
ICML '08: Proceedings of the 25th international conference on Machine learning, pages 1120–1127. https://doi.org/10.1145/1390156.1390297
In this paper we introduce a novel approach to manifold alignment, based on Procrustes analysis. Our approach differs from "semi-supervised alignment" in that it results in a mapping that is defined everywhere; when used with a suitable dimensionality ...
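Classical Procrustes analysis, the building block named in the title, is itself a closed-form computation: after centring, the orthogonal map that best aligns one point set to another comes from an SVD. A minimal sketch (illustrative names; this is the classical step, not the paper's full manifold-alignment pipeline):

```python
import numpy as np

def procrustes_align(X, Y):
    """Orthogonal Procrustes: find rotation Q and scale k minimising
    ||Yc - k * Xc @ Q||_F after centring both point sets."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc.T @ Yc)
    Q = U @ Vt                          # optimal orthogonal map
    k = s.sum() / (Xc ** 2).sum()       # optimal isotropic scale
    return k * Xc @ Q, Q, k

# Recover a known rotation: Y is X rotated by 90 degrees, then shifted.
X = np.random.default_rng(0).normal(size=(50, 2))
R = np.array([[0.0, -1.0], [1.0, 0.0]])
Y = X @ R + 3.0
aligned, Q, k = procrustes_align(X, Y)
```

Because the optimal Q is the orthogonal polar factor of Xc.T @ Yc, the recovered map equals the planted rotation exactly, and the aligned points coincide with the centred targets.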
- Research article, July 2008
Sparse multiscale Gaussian process regression
ICML '08: Proceedings of the 25th international conference on Machine learning, pages 1112–1119. https://doi.org/10.1145/1390156.1390296
Most existing sparse Gaussian process (g.p.) models seek computational advantages by basing their computations on a set of m basis functions that are the covariance function of the g.p. with one of its two inputs fixed. We generalise this for the case ...
- Research article, July 2008
Prediction with expert advice for the Brier game
ICML '08: Proceedings of the 25th international conference on Machine learning, pages 1104–1111. https://doi.org/10.1145/1390156.1390295
We show that the Brier game of prediction is mixable and find the optimal learning rate and substitution function for it. The resulting prediction algorithm is applied to predict results of football and tennis matches. The theoretical performance ...
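The loss underlying the Brier game is simple to state: the squared distance between a probability forecast and the indicator vector of the realised outcome. A small illustration with hypothetical match probabilities:

```python
def brier_score(forecast, outcome):
    """Brier loss: sum over outcomes o of (forecast[o] - [o == outcome])^2."""
    return sum((p - (o == outcome)) ** 2 for o, p in forecast.items())

# A probability forecast for a football match; the actual result is a home win.
forecast = {"win": 0.5, "draw": 0.3, "loss": 0.2}
loss = brier_score(forecast, "win")   # (0.5-1)^2 + 0.3^2 + 0.2^2 = 0.38
```

Mixability of this game, which the paper establishes, is what lets an aggregating forecaster's cumulative Brier loss track the best expert's loss up to an additive constant.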
- Research article, July 2008
Beam sampling for the infinite hidden Markov model
ICML '08: Proceedings of the 25th international conference on Machine learning, pages 1088–1095. https://doi.org/10.1145/1390156.1390293
The infinite hidden Markov model is a non-parametric extension of the widely used hidden Markov model. Our paper introduces a new inference algorithm for the infinite hidden Markov model called beam sampling. Beam sampling combines slice sampling, which ...
- Research article, July 2008
A semiparametric statistical approach to model-free policy evaluation
ICML '08: Proceedings of the 25th international conference on Machine learning, pages 1072–1079. https://doi.org/10.1145/1390156.1390291
Reinforcement learning (RL) methods based on least-squares temporal difference (LSTD) have been developed recently and have shown good practical performance. However, the quality of their estimation has not been well elucidated. In this article, we ...
- Research article, July 2008
Training restricted Boltzmann machines using approximations to the likelihood gradient
ICML '08: Proceedings of the 25th international conference on Machine learning, pages 1064–1071. https://doi.org/10.1145/1390156.1390290
A new algorithm for training Restricted Boltzmann Machines is introduced. The algorithm, named Persistent Contrastive Divergence, is different from the standard Contrastive Divergence algorithms in that it aims to draw samples from almost exactly the ...
- Research article, July 2008
ν-support vector machine as conditional value-at-risk minimization
ICML '08: Proceedings of the 25th international conference on Machine learning, pages 1056–1063. https://doi.org/10.1145/1390156.1390289
The ν-support vector classification (ν-SVC) algorithm was shown to work well and provide intuitive interpretations, e.g., the parameter ν roughly specifies the fraction of support vectors. Although ν corresponds to a fraction, it cannot take the entire ...
- Research article, July 2008
The many faces of optimism: a unifying approach
ICML '08: Proceedings of the 25th international conference on Machine learning, pages 1048–1055. https://doi.org/10.1145/1390156.1390288
The exploration-exploitation dilemma has been an intriguing and unsolved problem within the framework of reinforcement learning. "Optimism in the face of uncertainty" and model building play central roles in advanced exploration methods. Here, we ...