
Deep learning in neural networks: An overview

Published: 01 January 2015

Abstract

In recent years, deep artificial neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. This historical survey compactly summarizes relevant work, much of it from the previous millennium. Shallow and Deep Learners are distinguished by the depth of their credit assignment paths, which are chains of possibly learnable, causal links between actions and effects. I review deep supervised learning (also recapitulating the history of backpropagation), unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.
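The abstract above recapitulates the history of backpropagation rather than restating the algorithm. As a companion illustration only (a minimal sketch, not taken from the paper), the following Python snippet trains a two-layer network on XOR by plain gradient descent; each layer contributes one learnable causal link to the credit assignment path, and the backward pass assigns credit along that path. All variable names and hyperparameters are illustrative choices.

import numpy as np

# Toy supervised task: XOR, the classic example that a shallow (single-layer)
# learner cannot solve but a deeper (two-layer) one can.
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # inputs
y = np.array([[0.], [1.], [1.], [0.]])                  # XOR targets

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)  # input -> hidden
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)  # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(5000):
    # Forward pass: each layer is one link in the credit assignment path.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass (backpropagation): chain rule through the same path,
    # here for the squared error 0.5 * sum((out - y)**2).
    d_out = (out - y) * out * (1.0 - out)   # error signal at the output layer
    d_h = (d_out @ W2.T) * h * (1.0 - h)    # credit assigned to the hidden layer

    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 3))  # should approach [0, 1, 1, 0]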

References

[1]
D. Aberdeen, Policy-gradient algorithms for partially observable Markov decision processes, Australian National University, 2003.
[2]
J. Abounadi, D. Bertsekas, V.S. Borkar, Learning algorithms for Markov decision processes with average cost, SIAM Journal on Control and Optimization, 40 (2002) 681-698.
[3]
H. Akaike, Statistical predictor identification, Annals of the Institute of Statistical Mathematics, 22 (1970) 203-217.
[4]
H. Akaike, Information theory and an extension of the maximum likelihood principle, in: Second intl. symposium on information theory, Akademinai Kiado, 1973, pp. 267-281.
[5]
H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control, 19 (1974) 716-723.
[6]
A. Allender, Application of time-bounded Kolmogorov complexity in complexity theory, in: EATCS monographs on theoretical computer science, Springer, 1992, pp. 6-22.
[7]
Almeida, L. B. (1987). A learning rule for asynchronous perceptrons with feedback in a combinatorial environment. In IEEE 1st international conference on neural networks, vol. 2 (pp. 609-618).
[8]
L.B. Almeida, T. Langlois, J.D. Amaral, R.A. Redol, On-line step size adaptation. Technical report, INESC, 9 Rua Alves Redol, 1000, 1997.
[9]
S. Amari, A theory of adaptive pattern classifiers, IEEE Transactions on Electronic Computers, 16 (1967) 299-307.
[10]
S.-I. Amari, Natural gradient works efficiently in learning, Neural Computation, 10 (1998) 251-276.
[11]
S. Amari, A. Cichocki, H. Yang, A new learning algorithm for blind signal separation, in: Advances in neural information processing systems (NIPS), vol. 8, The MIT Press, 1996.
[12]
S. Amari, N. Murata, Statistical theory of learning curves under entropic loss criterion, Neural Computation, 5 (1993) 140-153.
[13]
D.J. Amit, N. Brunel, Dynamics of a recurrent network of spiking neurons before and following learning, Network: Computation in Neural Systems, 8 (1997) 373-404.
[14]
G. An, The effects of adding noise during backpropagation training on a generalization performance, Neural Computation, 8 (1996) 643-674.
[15]
M.A. Andrade, P. Chacon, J.J. Merelo, F. Moran, Evaluation of secondary structure of proteins from UV circular dichroism spectra using an unsupervised learning neural network, Protein Engineering, 6 (1993) 383-390.
[16]
R. Andrews, J. Diederich, A.B. Tickle, Survey and critique of techniques for extracting rules from trained artificial neural networks, Knowledge-Based Systems, 8 (1995) 373-389.
[17]
D. Anguita, B.A. Gomes, Mixing floating- and fixed-point formats for neural network learning on neuroprocessors, Microprocessing and Microprogramming, 41 (1996) 757-769.
[18]
D. Anguita, G. Parodi, R. Zunino, An efficient implementation of BP on RISC-based workstations, Neurocomputing, 6 (1994) 57-65.
[19]
I. Arel, D.C. Rose, T.P. Karnowski, Deep machine learning-a new frontier in artificial intelligence research, IEEE Computational Intelligence Magazine, 5 (2010) 13-18.
[20]
T. Ash, Dynamic node creation in backpropagation neural networks, Connection Science, 1 (1989) 365-375.
[21]
J.J. Atick, Z. Li, A.N. Redlich, Understanding retinal color coding from first principles, Neural Computation, 4 (1992) 559-572.
[22]
A.F. Atiya, A.G. Parlos, New results on recurrent network training: unifying the algorithms and accelerating convergence, IEEE Transactions on Neural Networks, 11 (2000) 697-709.
[23]
J. Ba, B. Frey, Adaptive dropout for training deep neural networks, in: Advances in neural information processing systems (NIPS), 2013, pp. 3084-3092.
[24]
Baird, H. (1990). Document image defect models. In Proceedings, IAPR workshop on syntactic and structural pattern recognition.
[25]
Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In International conference on machine learning (pp. 30-37).
[26]
L. Baird, A.W. Moore, Gradient descent for general reinforcement learning, in: Advances in neural information processing systems, vol. 12 (NIPS), MIT Press, 1999, pp. 968-974.
[27]
B. Bakker, Reinforcement learning with long short-term memory, in: Advances in neural information processing systems, vol. 14, MIT Press, Cambridge, MA, 2002, pp. 1475-1482.
[28]
B. Bakker, J. Schmidhuber, Hierarchical reinforcement learning based on subgoal discovery and subpolicy specialization, in: Proc. 8th conference on intelligent autonomous systems IAS-8, IOS Press, Amsterdam, NL, 2004, pp. 438-445.
[29]
Bakker, B., Zhumatiy, V., Gruener, G., & Schmidhuber, J. (2003). A robot that reinforcement-learns to identify and memorize important previous observations. In Proceedings of the 2003 IEEE/RSJ international conference on intelligent robots and systems (pp. 430-435).
[30]
P. Baldi, Gradient descent learning algorithms overview: A general dynamical systems perspective, IEEE Transactions on Neural Networks, 6 (1995) 182-195.
[31]
P. Baldi, Autoencoders, unsupervised learning, and deep architectures, Journal of Machine Learning Research, 27 (2012) 37-50.
[32]
P. Baldi, S. Brunak, P. Frasconi, G. Pollastri, G. Soda, Exploiting the past and the future in protein secondary structure prediction, Bioinformatics, 15 (1999) 937-946.
[33]
P. Baldi, Y. Chauvin, Neural networks for fingerprint recognition, Neural Computation, 5 (1993) 402-418.
[34]
P. Baldi, Y. Chauvin, Hybrid modeling, HMM/NN architectures, and protein applications, Neural Computation, 8 (1996) 1541-1565.
[35]
P. Baldi, K. Hornik, Neural networks and principal component analysis: learning from examples without local minima, Neural Networks, 2 (1989) 53-58.
[36]
P. Baldi, K. Hornik, Learning in linear networks: a survey, IEEE Transactions on Neural Networks, 6 (1995) 837-858.
[37]
P. Baldi, G. Pollastri, The principled design of large-scale recursive neural network architectures-DAG-RNNs and the protein structure prediction problem, Journal of Machine Learning Research, 4 (2003) 575-602.
[38]
P. Baldi, P. Sadowski, The dropout learning algorithm, Artificial Intelligence, 210C (2014) 78-122.
[39]
Ballard, D. H. (1987). Modular learning in neural networks. In Proc. AAAI (pp. 279-284).
[40]
S. Baluja, Population-based incremental learning: A method for integrating genetic search based function optimization and competitive learning. Technical report CMU-CS-94-163, Carnegie Mellon University, 1994.
[41]
R. Balzer, A 15 year perspective on automatic programming, IEEE Transactions on Software Engineering, 11 (1985) 1257-1268.
[42]
H.B. Barlow, Unsupervised learning, Neural Computation, 1 (1989) 295-311.
[43]
H.B. Barlow, T.P. Kaushal, G.J. Mitchison, Finding minimum entropy codes, Neural Computation, 1 (1989) 412-423.
[44]
H.G. Barrow, Learning receptive fields, in: Proceedings of the IEEE 1st annual conference on neural networks, vol. IV, IEEE, 1987, pp. 115-121.
[45]
A.G. Barto, S. Mahadevan, Recent advances in hierarchical reinforcement learning, Discrete Event Dynamic Systems, 13 (2003) 341-379.
[46]
A.G. Barto, S. Singh, N. Chentanez, Intrinsically motivated learning of hierarchical collections of skills, in: Proceedings of international conference on developmental learning, MIT Press, Cambridge, MA, 2004, pp. 112-119.
[47]
A.G. Barto, R.S. Sutton, C.W. Anderson, Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man and Cybernetics, SMC-13 (1983) 834-846.
[48]
R. Battiti, Accelerated backpropagation learning: two optimization methods, Complex Systems, 3 (1989) 331-342.
[49]
R. Battiti, First- and second-order methods for learning: between steepest descent and Newton's method, Neural Computation, 4 (1992) 141-166.
[50]
E.B. Baum, D. Haussler, What size net gives valid generalization?, Neural Computation, 1 (1989) 151-160.
[51]
L.E. Baum, T. Petrie, Statistical inference for probabilistic functions of finite state Markov chains, The Annals of Mathematical Statistics, 37 (1966) 1554-1563.
[52]
J. Baxter, P.L. Bartlett, Infinite-horizon policy-gradient estimation, Journal of Artificial Intelligence Research, 15 (2001) 319-350.
[53]
Bayer, J., & Osendorfer, C. (2014). Variational inference of latent state sequences using recurrent networks. ArXiv Preprint arXiv:1406.1655.
[54]
Bayer, J., Osendorfer, C., Chen, N., Urban, S., & van der Smagt, P. (2013). On fast dropout and its applicability to recurrent networks. ArXiv Preprint arXiv:1311.0701.
[55]
Bayer, J., Wierstra, D., Togelius, J., & Schmidhuber, J. (2009). Evolving memory cell structures for sequence learning. In Proc. ICANN (2) (pp. 755-764).
[56]
T. Bayes, An essay toward solving a problem in the doctrine of chances, Philosophical Transactions of the Royal Society of London, 53 (1763) 370-418.
[57]
S. Becker, Unsupervised learning procedures for neural networks, International Journal of Neural Systems, 2 (1991) 17-33.
[58]
S. Becker, Y. Le Cun, Improving the convergence of back-propagation learning with second order methods, in: Proc. 1988 connectionist models summer school, Morgan Kaufmann, San Mateo, 1989, pp. 29-37.
[59]
Behnke, S. (1999). Hebbian learning and competition in the neural abstraction pyramid. In Proceedings of the international joint conference on neural networks, vol. 2 (pp. 1356-1361).
[60]
S. Behnke, Learning iterative image reconstruction in the neural abstraction pyramid, International Journal of Computational Intelligence and Applications, 1 (2001) 427-438.
[61]
Behnke, S. (2002). Learning face localization using hierarchical recurrent networks. In Proceedings of the 12th international conference on artificial neural networks (pp. 1319-1324).
[62]
Behnke, S. (2003a). Discovering hierarchical speech features using convolutional non-negative matrix factorization. In Proceedings of the international joint conference on neural networks, vol. 4 (pp. 2758-2763).
[63]
S. Behnke, Hierarchical neural networks for image interpretation, in: Lecture notes in computer science, Vol. 2766, Springer, 2003.
[64]
S. Behnke, Face localization and tracking in the neural abstraction pyramid, Neural Computing and Applications, 14 (2005) 97-103.
[65]
Behnke, S., & Rojas, R. (1998). Neural abstraction pyramid: a hierarchical image understanding architecture. In Proceedings of international joint conference on neural networks, vol. 2 (pp. 820-825).
[66]
A.J. Bell, T.J. Sejnowski, An information-maximization approach to blind separation and blind deconvolution, Neural Computation, 7 (1995) 1129-1159.
[67]
R. Bellman, Dynamic programming, Princeton University Press, Princeton, NJ, USA, 1957.
[68]
A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, E. Moulines, A blind source separation technique using second-order statistics, IEEE Transactions on Signal Processing, 45 (1997) 434-444.
[69]
Y. Bengio, Artificial neural networks and their application to sequence recognition, McGill University, (Computer Science), Montreal, QC, Canada, 1991.
[70]
Y. Bengio, Learning deep architectures for AI, in: Foundations and trends in machine learning, Vol. 2(1), Now Publishers, 2009.
[71]
Y. Bengio, A. Courville, P. Vincent, Representation learning: a review and new perspectives, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35 (2013) 1798-1828.
[72]
Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle, Greedy layer-wise training of deep networks, in: Advances in neural information processing systems, vol. 19 (NIPS), MIT Press, 2007, pp. 153-160.
[73]
Y. Bengio, P. Simard, P. Frasconi, Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks, 5 (1994) 157-166.
[74]
N. Beringer, A. Graves, F. Schiel, J. Schmidhuber, Classifying unprompted speech by retraining LSTM nets, in: LNCS, Vol. 3696, Springer-Verlag, Berlin, Heidelberg, 2005, pp. 575-581.
[75]
D.P. Bertsekas, Dynamic programming and optimal control, Athena Scientific, 2001.
[76]
D.P. Bertsekas, J.N. Tsitsiklis, Neuro-dynamic programming, Athena Scientific, Belmont, MA, 1996.
[77]
N.P. Bichot, A.F. Rossi, R. Desimone, Parallel and serial neural mechanisms for visual search in macaque area V4, Science, 308 (2005) 529-534.
[78]
F. Biegler-König, F. Bärmann, A learning algorithm for multilayered neural networks based on linear least squares problems, Neural Networks, 6 (1993) 127-131.
[79]
C.M. Bishop, Curvature-driven smoothing: A learning algorithm for feed-forward networks, IEEE Transactions on Neural Networks, 4 (1993) 882-884.
[80]
C.M. Bishop, Pattern recognition and machine learning, Springer, 2006.
[81]
A.D. Blair, J.B. Pollack, Analysis of dynamical recognizers, Neural Computation, 9 (1997) 1127-1142.
[82]
V.D. Blondel, J.N. Tsitsiklis, A survey of computational complexity results in systems and control, Automatica, 36 (2000) 1249-1274.
[83]
Bluche, T., Louradour, J., Knibbe, M., Moysset, B., Benzeghiba, F., & Kermorvant, C. (2014). The A2iA Arabic handwritten text recognition system at the OpenHaRT2013 evaluation. In International workshop on document analysis systems.
[84]
A.L. Blum, R.L. Rivest, Training a 3-node neural network is NP-complete, Neural Networks, 5 (1992) 117-127.
[85]
A. Blumer, A. Ehrenfeucht, D. Haussler, M.K. Warmuth, Occam's razor, Information Processing Letters, 24 (1987) 377-380.
[86]
L. Bobrowski, Learning processes in multilayer threshold nets, Biological Cybernetics, 31 (1978) 1-6.
[87]
M. Bodén, J. Wiles, Context-free and context-sensitive dynamics in recurrent neural networks, Connection Science, 12 (2000) 197-210.
[88]
U. Bodenhausen, A. Waibel, The Tempo 2 algorithm: adjusting time-delays by supervised learning, in: Advances in neural information processing systems, vol. 3, Morgan Kaufmann, 1991, pp. 155-161.
[89]
S.M. Bohte, J.N. Kok, H. La Poutre, Error-backpropagation in temporally encoded networks of spiking neurons, Neurocomputing, 48 (2002) 17-37.
[90]
L. Boltzmann, Wissenschaftliche Abhandlungen, Barth, Leipzig, 1909.
[91]
L. Bottou, Une approche théorique de l'apprentissage connexionniste; applications à la reconnaissance de la parole, Université de Paris XI, 1991.
[92]
H. Bourlard, N. Morgan, Connectionist speech recognition: a hybrid approach, Kluwer Academic Publishers, 1994.
[93]
Boutilier, C., & Poole, D. (1996). Computing optimal policies for partially observable Markov decision processes using compact representations. In Proceedings of the AAAI.
[94]
S.J. Bradtke, A.G. Barto, L.P. Kaelbling, Linear least-squares algorithms for temporal difference learning, Machine Learning, 22 (1996) 33-57.
[95]
R.I. Brafman, M. Tennenholtz, R-MAX-a general polynomial time algorithm for near-optimal reinforcement learning, Journal of Machine Learning Research, 3 (2002) 213-231.
[96]
J. Brea, W. Senn, J.-P. Pfister, Matching recall and storage in sequence learning with spiking neural networks, The Journal of Neuroscience, 33 (2013) 9565-9575.
[97]
L. Breiman, Bagging predictors, Machine Learning, 24 (1996) 123-140.
[98]
R. Brette, M. Rudolph, T. Carnevale, M. Hines, D. Beeman, J.M. Bower, Simulation of networks of spiking neurons: a review of tools and strategies, Journal of Computational Neuroscience, 23 (2007) 349-398.
[99]
T.M. Breuel, A. Ul-Hasan, M.A. Al-Azawi, F. Shafait, High-performance OCR for printed English and Fraktur using LSTM networks, in: 12th International conference on document analysis and recognition, IEEE, 2013, pp. 683-687.
[100]
J. Bromley, J.W. Bentz, L. Bottou, I. Guyon, Y. LeCun, C. Moore, Signature verification using a Siamese time delay neural network, International Journal of Pattern Recognition and Artificial Intelligence, 7 (1993) 669-688.
[101]
C.G. Broyden, A class of methods for solving nonlinear simultaneous equations, Mathematics of Computation, 19 (1965) 577-593.
[102]
Brueckner, R., & Schuller, B. (2014). Social signal classification using deep BLSTM recurrent neural networks. In Proceedings 39th IEEE international conference on acoustics, speech, and signal processing (pp. 4856-4860).
[103]
N. Brunel, Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons, Journal of Computational Neuroscience, 8 (2000) 183-208.
[104]
Bryson, A. E. (1961). A gradient method for optimizing multi-stage allocation processes. In Proc. Harvard Univ. symposium on digital computers and their applications.
[105]
A.E. Bryson Jr., W.F. Denham, A steepest-ascent method for solving optimum programming problems. Technical report BR-1303, Raytheon Company, Missile and Space Division, 1961.
[106]
A. Bryson, Y. Ho, Applied optimal control: optimization, estimation, and control, Blaisdell Pub. Co, 1969.
[107]
J. Buhler, Efficient large-scale sequence comparison by locality-sensitive hashing, Bioinformatics, 17 (2001) 419-428.
[108]
W.L. Buntine, A.S. Weigend, Bayesian back-propagation, Complex Systems, 5 (1991) 603-643.
[109]
N. Burgess, A constructive algorithm that converges for real-valued input patterns, International Journal of Neural Systems, 5 (1994) 59-66.
[110]
Cardoso, J.-F. (1994). On the performance of orthogonal source separation algorithms. In Proc. EUSIPCO (pp. 776-779).
[111]
M.A. Carreira-Perpinan, Continuous latent variable models for dimensionality reduction and sequential data reconstruction, University of Sheffield, UK, 2001.
[112]
M.J. Carter, F.J. Rudolph, A.J. Nucci, Operational fault tolerance of CMAC networks, in: Advances in neural information processing systems (NIPS), vol. 2, Morgan Kaufmann, San Mateo, CA, 1990, pp. 340-347.
[113]
R. Caruana, Multitask learning, Machine Learning, 28 (1997) 41-75.
[114]
M.P. Casey, The dynamics of discrete-time computation, with application to recurrent neural networks and finite state machine extraction, Neural Computation, 8 (1996) 1135-1178.
[115]
G. Cauwenberghs, A fast stochastic error-descent algorithm for supervised learning and optimization, in: Advances in neural information processing systems, vol. 5, Morgan Kaufmann, 1993, pp. 244.
[116]
G.J. Chaitin, On the length of programs for computing finite binary sequences, Journal of the ACM, 13 (1966) 547-569.
[117]
S.K. Chalup, A.D. Blair, Incremental training of first order recurrent neural networks to predict a context-sensitive language, Neural Networks, 16 (2003) 955-972.
[118]
Chellapilla, K., Puri, S., & Simard, P. (2006). High performance convolutional neural networks for document processing. In International workshop on Frontiers in handwriting recognition.
[119]
K. Chen, A. Salman, Learning speaker-specific characteristics with a deep neural architecture, IEEE Transactions on Neural Networks, 22 (2011) 1744-1756.
[120]
K. Cho, Foundations and advances in deep learning, Aalto University School of Science, 2014.
[121]
K. Cho, A. Ilin, T. Raiko, Tikhonov-type regularization for restricted Boltzmann machines, in: Intl. conf. on artificial neural networks 2012, Springer, 2012, pp. 81-88.
[122]
K. Cho, T. Raiko, A. Ilin, Enhanced gradient for training restricted Boltzmann machines, Neural Computation, 25 (2013) 805-831.
[123]
A. Church, An unsolvable problem of elementary number theory, The American Journal of Mathematics, 58 (1936) 345-363.
[124]
D.C. Ciresan, A. Giusti, L.M. Gambardella, J. Schmidhuber, Deep neural networks segment neuronal membranes in electron microscopy images, in: Advances in neural information processing systems (NIPS), 2012, pp. 2852-2860.
[125]
Ciresan, D. C., Giusti, A., Gambardella, L. M., & Schmidhuber, J. (2013). Mitosis detection in breast cancer histology images with deep neural networks. In Proc. MICCAI, vol. 2 (pp. 411-418).
[126]
D.C. Ciresan, U. Meier, L.M. Gambardella, J. Schmidhuber, Deep big simple neural nets for handwritten digit recognition, Neural Computation, 22 (2010) 3207-3220.
[127]
Ciresan, D. C., Meier, U., Masci, J., Gambardella, L. M., & Schmidhuber, J. (2011). Flexible, high performance convolutional neural networks for image classification. In Intl. joint conference on artificial intelligence (pp. 1237-1242).
[128]
Ciresan, D. C., Meier, U., Masci, J., & Schmidhuber, J. (2011). A committee of neural networks for traffic sign classification. In International joint conference on neural networks (pp. 1918-1921).
[129]
D.C. Ciresan, U. Meier, J. Masci, J. Schmidhuber, Multi-column deep neural network for traffic sign classification, Neural Networks, 32 (2012) 333-338.
[130]
Ciresan, D. C., Meier, U., & Schmidhuber, J. (2012a). Multi-column deep neural networks for image classification. In IEEE Conference on computer vision and pattern recognition. Long preprint arXiv:1202.2745v1 [cs.CV].
[131]
Ciresan, D. C., Meier, U., & Schmidhuber, J. (2012b). Transfer learning for Latin and Chinese characters with deep neural networks. In International joint conference on neural networks (pp. 1301-1306).
[132]
D.C. Ciresan, J. Schmidhuber, Multi-column deep neural networks for offline handwritten Chinese character classification. Technical report, IDSIA, 2013. arXiv:1309.0261
[133]
D.T. Cliff, P. Husbands, I. Harvey, Evolving recurrent dynamical networks for robot control, in: Artificial neural nets and genetic algorithms, Springer, 1993, pp. 428-435.
[134]
J. Clune, J.-B. Mouret, H. Lipson, The evolutionary origins of modularity, Proceedings of the Royal Society B: Biological Sciences, 280 (2013) 20122863.
[135]
J. Clune, K.O. Stanley, R.T. Pennock, C. Ofria, On the performance of indirect encoding across the continuum of regularity, IEEE Transactions on Evolutionary Computation, 15 (2011) 346-367.
[136]
Coates, A., Huval, B., Wang, T., Wu, D. J., Ng, A. Y., & Catanzaro, B. (2013). Deep learning with COTS HPC systems. In Proc. international conference on machine learning.
[137]
A. Cichocki, R. Unbehauen, Neural networks for optimization and signal processing, John Wiley & Sons, Inc, 1993.
[138]
R. Collobert, J. Weston, A unified architecture for natural language processing: deep neural networks with multitask learning, in: Proceedings of the 25th international conference on machine learning, ACM, 2008, pp. 160-167.
[139]
P. Comon, Independent component analysis-a new concept?, Signal Processing, 36 (1994) 287-314.
[140]
C.E. Connor, S.L. Brincat, A. Pasupathy, Transformation of shape information in the ventral pathway, Current Opinion in Neurobiology, 17 (2007) 140-147.
[141]
J. Connor, D.R. Martin, L.E. Atlas, Recurrent neural networks and robust time series prediction, IEEE Transactions on Neural Networks, 5 (1994) 240-254.
[142]
S.A. Cook, The complexity of theorem-proving procedures, in: Proceedings of the 3rd annual ACM symposium on the theory of computing, ACM, New York, 1971, pp. 151-158.
[143]
N.L. Cramer, A representation for the adaptive generation of simple sequential programs, in: Proceedings of an international conference on genetic algorithms and their applications, Carnegie-Mellon University, Lawrence Erlbaum Associates, Hillsdale, NJ, 1985.
[144]
P. Craven, G. Wahba, Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation, Numerische Mathematik, 31 (1979) 377-403.
[145]
G. Cuccu, M. Luciw, J. Schmidhuber, F. Gomez, Intrinsically motivated evolutionary search for vision-based reinforcement learning, in: Proceedings of the 2011 IEEE conference on development and learning and epigenetic robotics IEEE-ICDL-EPIROB, vol. 2, IEEE, 2011, pp. 1-7.
[146]
G.E. Dahl, T.N. Sainath, G.E. Hinton, Improving deep neural networks for LVCSR using rectified linear units and dropout, in: IEEE International conference on acoustics, speech and signal processing, IEEE, 2013, pp. 8609-8613.
[147]
G. Dahl, D. Yu, L. Deng, A. Acero, Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, IEEE Transactions on Audio, Speech and Language Processing, 20 (2012) 30-42.
[148]
D'Ambrosio, D. B., & Stanley, K. O. (2007). A novel generative encoding for exploiting neural network sensor and output geometry. In Proceedings of the conference on genetic and evolutionary computation (pp. 974-981).
[149]
M. Datar, N. Immorlica, P. Indyk, V.S. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, in: Proceedings of the 20th annual symposium on computational geometry, ACM, 2004, pp. 253-262.
[150]
P. Dayan, G. Hinton, Feudal reinforcement learning, in: Advances in neural information processing systems (NIPS), vol. 5, Morgan Kaufmann, 1993, pp. 271-278.
[151]
P. Dayan, G.E. Hinton, Varieties of Helmholtz machine, Neural Networks, 9 (1996) 1385-1403.
[152]
P. Dayan, G.E. Hinton, R.M. Neal, R.S. Zemel, The Helmholtz machine, Neural Computation, 7 (1995) 889-904.
[153]
P. Dayan, R. Zemel, Competition and multiple cause models, Neural Computation, 7 (1995) 565-579.
[154]
G. Deco, L. Parra, Non-linear feature extraction by redundancy reduction in an unsupervised stochastic neural network, Neural Networks, 10 (1997) 683-691.
[155]
G. Deco, E.T. Rolls, Neurodynamics of biased competition and cooperation for attention: a model with spiking neurons, Journal of Neurophysiology, 94 (2005) 295-313.
[156]
J.F.G. De Freitas, Bayesian methods for neural networks, University of Cambridge, 2003.
[157]
G. DeJong, R. Mooney, Explanation-based learning: an alternative view, Machine Learning, 1 (1986) 145-176.
[158]
D. DeMers, G. Cottrell, Non-linear dimensionality reduction, in: Advances in neural information processing systems (NIPS), vol. 5, Morgan Kaufmann, 1993, pp. 580-587.
[159]
A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society B, 39 (1977) 1-38.
[160]
L. Deng, D. Yu, Deep learning: methods and applications, NOW Publishers, 2014.
[161]
R. Desimone, T.D. Albright, C.G. Gross, C. Bruce, Stimulus-selective properties of inferior temporal neurons in the macaque, The Journal of Neuroscience, 4 (1984) 2051-2062.
[162]
M.C.P. de Souto, W.R. de Oliveira, The loading problem for pyramidal neural networks, Electronic Journal on Mathematics of Computation (1999).
[163]
R.L. De Valois, D.G. Albrecht, L.G. Thorell, Spatial frequency selectivity of cells in macaque visual cortex, Vision Research, 22 (1982) 545-559.
[164]
Y. Deville, K.K. Lau, Logic program synthesis, Journal of Logic Programming, 19 (1994) 321-350.
[165]
B. de Vries, J.C. Principe, A theory for neural networks with time delays, in: Advances in neural information processing systems (NIPS), vol. 3, Morgan Kaufmann, 1991, pp. 162-168.
[166]
J.J. DiCarlo, D. Zoccolan, N.C. Rust, How does the brain solve visual object recognition?, Neuron, 73 (2012) 415-434.
[167]
Dickmanns, E. D., Behringer, R., Dickmanns, D., Hildebrandt, T., Maurer, M., & Thomanek, F., et al. (1994). The seeing passenger car 'VaMoRs-P'. In Proc. int. symp. on intelligent vehicles (pp. 68-73).
[168]
D. Dickmanns, J. Schmidhuber, A. Winklhofer, Der genetische algorithmus: eine implementierung in prolog. Technical report, Inst. of Informatics, Tech. Univ. Munich, 1987. http://www.idsia.ch/~juergen/geneticprogramming.html
[169]
T.G. Dietterich, Ensemble methods in machine learning, in: Multiple classifier systems, Springer, 2000, pp. 1-15.
[170]
T.G. Dietterich, Hierarchical reinforcement learning with the MAXQ value function decomposition, Journal of Artificial Intelligence Research (JAIR), 13 (2000) 227-303.
[171]
P. Di Lena, K. Nagata, P. Baldi, Deep architectures for protein contact map prediction, Bioinformatics, 28 (2012) 2449-2457.
[172]
S.W. Director, R.A. Rohrer, Automated network design-the frequency-domain case, IEEE Transactions on Circuit Theory, CT-16 (1969) 330-337.
[173]
M. Dittenbach, D. Merkl, A. Rauber, The growing hierarchical self-organizing map, in: IEEE-INNS-ENNS International joint conference on neural networks, vol. 6, IEEE Computer Society, 2000, pp. 6015.
[174]
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., & Tzeng, E., et al. (2013). DeCAF: a deep convolutional activation feature for generic visual recognition. ArXiv Preprint arXiv:1310.1531.
[175]
Dorffner, G. (1996). Neural networks for time series processing. In Neural network world.
[176]
K. Doya, K. Samejima, K. Katagiri, M. Kawato, Multiple model-based reinforcement learning, Neural Computation, 14 (2002) 1347-1369.
[177]
S.E. Dreyfus, The numerical solution of variational problems, Journal of Mathematical Analysis and Applications, 5 (1962) 30-45.
[178]
S.E. Dreyfus, The computational solution of optimal control problems with time lag, IEEE Transactions on Automatic Control, 18 (1973) 383-385.
[179]
J. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, 12 (2011) 2121-2159.
[180]
Egorova, A., Gloye, A., Göktekin, C., Liers, A., Luft, M., & Rojas, R., et al. (2004). FU-fighters small size 2004, team description. In RoboCup 2004 symposium: papers and team description papers. CD edition.
[181]
S. Elfwing, M. Otsuka, E. Uchibe, K. Doya, Free-energy based reinforcement learning for vision-based navigation with high-dimensional sensory inputs, in: Neural information processing. theory and algorithms (ICONIP), vol. 1, Springer, 2010, pp. 215-222.
[182]
C. Eliasmith, How to build a brain: a neural architecture for biological cognition, Oxford University Press, New York, NY, 2013.
[183]
C. Eliasmith, T.C. Stewart, X. Choo, T. Bekolay, T. DeWolf, Y. Tang, A large-scale model of the functioning brain, Science, 338 (2012) 1202-1205.
[184]
J.L. Elman, Finding structure in time, Cognitive Science, 14 (1990) 179-211.
[185]
D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, P. Vincent, S. Bengio, Why does unsupervised pre-training help deep learning?, Journal of Machine Learning Research, 11 (2010) 625-660.
[186]
A.N. Escalante-B, L. Wiskott, How to solve classification and regression problems on high-dimensional data with a supervised extension of slow feature analysis, Journal of Machine Learning Research, 14 (2013) 3683-3719.
[187]
R.L. Eubank, Spline smoothing and nonparametric regression, in: Self-organizing methods in modeling, Marcel Dekker, New York, 1988.
[188]
Euler, L. (1744). Methodus inveniendi lineas curvas maximi minimive proprietate gaudentes.
[189]
Eyben, F., Weninger, F., Squartini, S., & Schuller, B. (2013). Real-life voice activity detection with LSTM recurrent neural networks and an application to Hollywood movies. In Proc. 38th IEEE international conference on acoustics, speech, and signal processing (pp. 483-487).
[190]
Faggin, F. (1992). Neural network hardware. In International joint conference on neural networks, vol. 1 (p. 153).
[191]
S.E. Fahlman, An empirical study of learning speed in back-propagation networks. Technical report CMU-CS-88-162, Carnegie-Mellon Univ., 1988.
[192]
S.E. Fahlman, The recurrent cascade-correlation learning algorithm, in: Advances in neural information processing systems (NIPS), vol. 3, Morgan Kaufmann, 1991, pp. 190-196.
[193]
M.S. Falconbridge, R.L. Stamps, D.R. Badcock, A simple Hebbian/anti-Hebbian network learns the sparse, independent components of natural images, Neural Computation, 18 (2006) 415-429.
[194]
Fan, Y., Qian, Y., Xie, F., & Soong, F. K. (2014). TTS synthesis with bidirectional LSTM based recurrent neural networks. In Proc. Interspeech.
[195]
C. Farabet, C. Couprie, L. Najman, Y. LeCun, Learning hierarchical features for scene labeling, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35 (2013) 1915-1929.
[196]
S.J. Farlow, Self-organizing methods in modeling: GMDH type algorithms, vol. 54, CRC Press, 1984.
[197]
L.A. Feldkamp, D.V. Prokhorov, C.F. Eagen, F. Yuan, Enhanced multi-stream Kalman filter training for recurrent networks, in: Nonlinear modeling, Springer, 1998, pp. 29-53.
[198]
L.A. Feldkamp, D.V. Prokhorov, T.M. Feldkamp, Simple and conditioned adaptive behavior from Kalman filter trained recurrent networks, Neural Networks, 16 (2003) 683-689.
[199]
L.A. Feldkamp, G.V. Puskorius, A signal processing framework based on dynamic neural networks with application to problems in adaptation, filtering, and classification, Proceedings of the IEEE, 86 (1998) 2259-2277.
[200]
D.J. Felleman, D.C. Van Essen, Distributed hierarchical processing in the primate cerebral cortex, Cerebral Cortex, 1 (1991) 1-47.
[201]
Fernández, S., Graves, A., & Schmidhuber, J. (2007a). An application of recurrent neural networks to discriminative keyword spotting. In Proc. ICANN (2) (pp. 220-229).
[202]
Fernandez, S., Graves, A., & Schmidhuber, J. (2007b). Sequence labelling in structured domains with hierarchical recurrent neural networks. In Proceedings of the 20th international joint conference on artificial intelligence.
[203]
Fernandez, R., Rendel, A., Ramabhadran, B., & Hoory, R. (2014). Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks. In Proc. Interspeech.
[204]
D.J. Field, Relations between the statistics of natural images and the response properties of cortical cells, Journal of the Optical Society of America, 4 (1987) 2379-2394.
[205]
D.J. Field, What is the goal of sensory coding?, Neural Computation, 6 (1994) 559-601.
[206]
Fieres, J., Schemmel, J., & Meier, K. (2008). Realizing biological spiking network models in a configurable wafer-scale hardware system. In IEEE International joint conference on neural networks (pp. 969-976).
[207]
S. Fine, Y. Singer, N. Tishby, The hierarchical hidden Markov model: analysis and applications, Machine Learning, 32 (1998) 41-62.
[208]
A. Fischer, C. Igel, Training restricted Boltzmann machines: an introduction, Pattern Recognition, 47 (2014) 25-39.
[209]
R. FitzHugh, Impulses and physiological states in theoretical models of nerve membrane, Biophysical Journal, 1 (1961) 445-466.
[210]
R. Fletcher, M.J. Powell, A rapidly convergent descent method for minimization, The Computer Journal, 6 (1963) 163-168.
[211]
D. Floreano, C. Mattiussi, Evolution of spiking neural controllers for autonomous vision-based robots, in: Evolutionary robotics. From intelligent robotics to artificial life, Springer, 2001, pp. 38-61.
[212]
D.B. Fogel, L.J. Fogel, V. Porto, Evolving neural networks, Biological Cybernetics, 63 (1990) 487-493.
[213]
L. Fogel, A. Owens, M. Walsh, Artificial intelligence through simulated evolution, Wiley, New York, 1966.
[214]
P. Földiák, Forming sparse representations by local anti-Hebbian learning, Biological Cybernetics, 64 (1990) 165-170.
[215]
P. Földiák, M.P. Young, Sparse coding in the primate cortex, in: The handbook of brain theory and neural networks, The MIT Press, 1995, pp. 895-898.
[216]
Förster, A., Graves, A., & Schmidhuber, J. (2007). RNN-based learning of compact maps for efficient robot localization. In 15th European symposium on artificial neural networks (pp. 537-542).
[217]
M. Franzius, H. Sprekeler, L. Wiskott, Slowness and sparseness lead to place, head-direction, and spatial-view cells, PLoS Computational Biology, 3 (2007) e166.
[218]
J. Friedman, T. Hastie, R. Tibshirani, The elements of statistical learning, in: Springer series in statistics, Vol. 1, Springer, New York, 2001.
[219]
V. Frinken, F. Zamora-Martinez, S. Espana-Boquera, M.J. Castro-Bleda, A. Fischer, H. Bunke, Long-short term memory neural networks language modeling for handwriting recognition, in: 2012 21st International conference on pattern recognition, IEEE, 2012, pp. 701-704.
[220]
B. Fritzke, A growing neural gas network learns topologies, in: NIPS, MIT Press, 1994, pp. 625-632.
[221]
K.S. Fu, Syntactic pattern recognition and applications, Springer, Berlin, 1977.
[222]
T. Fukada, M. Schuster, Y. Sagisaka, Phoneme boundary estimation using bidirectional recurrent neural networks and its applications, Systems and Computers in Japan, 30 (1999) 20-30.
[223]
K. Fukushima, Neural network model for a mechanism of pattern recognition unaffected by shift in position-Neocognitron, Transactions of the IECE, J62-A (1979) 658-665.
[224]
K. Fukushima, Neocognitron: A self-organizing neural network for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, 36 (1980) 193-202.
[225]
K. Fukushima, Increasing robustness against background noise: visual pattern recognition by a neocognitron, Neural Networks, 24 (2011) 767-778.
[226]
K. Fukushima, Artificial vision by multi-layered neural networks: neocognitron and its advances, Neural Networks, 37 (2013) 103-119.
[227]
K. Fukushima, Training multi-layered neural network neocognitron, Neural Networks, 40 (2013) 18-31.
[228]
D. Gabor, Theory of communication. Part 1: the analysis of information, Electrical Engineers-Part III: Journal of the Institution of Radio and Communication Engineering, 93 (1946) 429-441.
[229]
S.I. Gallant, Connectionist expert systems, Communications of the ACM, 31 (1988) 152-169.
[230]
Gauss, C. F. (1809). Theoria motus corporum coelestium in sectionibus conicis solem ambientium.
[231]
Gauss, C. F. (1821). Theoria combinationis observationum erroribus minimis obnoxiae (Theory of the combination of observations least subject to error).
[232]
S. Ge, C.C. Hang, T.H. Lee, T. Zhang, Stable adaptive neural network control, Springer, 2010.
[233]
Geiger, J. T., Zhang, Z., Weninger, F., Schuller, B., & Rigoll, G. (2014). Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modelling. In Proc. interspeech.
[234]
S. Geman, E. Bienenstock, R. Doursat, Neural networks and the bias/variance dilemma, Neural Computation, 4 (1992) 1-58.
[235]
F.A. Gers, J. Schmidhuber, Recurrent nets that time and count, in: Proceedings of the IEEE-INNS-ENNS international joint conference on neural networks, vol. 3, IEEE, 2000, pp. 189-194.
[236]
F.A. Gers, J. Schmidhuber, LSTM recurrent networks learn simple context free and context sensitive languages, IEEE Transactions on Neural Networks, 12 (2001) 1333-1340.
[237]
F.A. Gers, J. Schmidhuber, F. Cummins, Learning to forget: continual prediction with LSTM, Neural Computation, 12 (2000) 2451-2471.
[238]
F.A. Gers, N. Schraudolph, J. Schmidhuber, Learning precise timing with LSTM recurrent networks, Journal of Machine Learning Research, 3 (2002) 115-143.
[239]
W. Gerstner, W.K. Kistler, Spiking neuron models, Cambridge University Press, 2002.
[240]
W. Gerstner, J.L. van Hemmen, Associative memory in a network of spiking neurons, Network: Computation in Neural Systems, 3 (1992) 139-164.
[241]
Ghavamzadeh, M., & Mahadevan, S. (2003). Hierarchical policy gradient algorithms. In Proceedings of the twentieth international conference on machine learning (pp. 226-233).
[242]
Gherrity, M. (1989). A learning algorithm for analog fully recurrent neural networks. In IEEE/INNS International joint conference on neural networks, San Diego, vol. 1 (pp. 643-644).
[243]
R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation. Technical report, UC Berkeley and ICSI, 2013. arxiv.org/abs/1311.2524
[244]
L. Gisslen, M. Luciw, V. Graziano, J. Schmidhuber, Sequential constant size compressor for reinforcement learning, in: Proc. fourth conference on artificial general intelligence, Springer, 2011, pp. 31-40.
[245]
Giusti, A., Ciresan, D. C., Masci, J., Gambardella, L. M., & Schmidhuber, J. (2013). Fast image scanning with deep max-pooling convolutional neural networks. In Proc. ICIP.
[246]
B. Glackin, T.M. McGinnity, L.P. Maguire, Q. Wu, A. Belatreche, A novel approach for the implementation of large scale spiking neural networks on FPGA hardware, in: Computational intelligence and bioinspired systems, Springer, 2005, pp. 552-563.
[247]
T. Glasmachers, T. Schaul, Y. Sun, D. Wierstra, J. Schmidhuber, Exponential natural evolution strategies, in: Proceedings of the genetic and evolutionary computation conference, ACM, 2010, pp. 393-400.
[248]
Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. In AISTATS, vol. 15 (pp. 315-323).
[249]
A. Gloye, F. Wiesel, O. Tenchio, M. Simon, Reinforcing the driving quality of soccer playing robots by anticipation, IT-Information Technology, 47 (2005).
[250]
K. Gödel, Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I, Monatshefte für Mathematik und Physik, 38 (1931) 173-198.
[251]
D.E. Goldberg, Genetic algorithms in search, optimization and machine learning, Addison-Wesley, Reading, MA, 1989.
[252]
D. Goldfarb, A family of variable-metric methods derived by variational means, Mathematics of Computation, 24 (1970) 23-26.
[253]
G. Golub, M. Heath, G. Wahba, Generalized cross-validation as a method for choosing a good ridge parameter, Technometrics, 21 (1979) 215-224.
[254]
F.J. Gomez, Robust nonlinear control through neuroevolution, Department of Computer Sciences, University of Texas at Austin, 2003.
[255]
Gomez, F. J., & Miikkulainen, R. (2003). Active guidance for a finless rocket using neuroevolution. In Proc. GECCO 2003.
[256]
F.J. Gomez, J. Schmidhuber, Co-evolving recurrent neurons learn deep memory POMDPs, in: Proc. of the 2005 conference on genetic and evolutionary computation, ACM Press, New York, NY, USA, 2005.
[257]
F.J. Gomez, J. Schmidhuber, R. Miikkulainen, Accelerated neural evolution through cooperatively coevolved synapses, Journal of Machine Learning Research, 9 (2008) 937-965.
[258]
H. Gomi, M. Kawato, Neural network control for a closed-loop system using feedback-error-learning, Neural Networks, 6 (1993) 933-946.
[259]
Gonzalez-Dominguez, J., Lopez-Moreno, I., Sak, H., Gonzalez-Rodriguez, J., & Moreno, P. J. (2014). Automatic language identification using long short-term memory recurrent neural networks. In Proc. Interspeech.
[260]
Goodfellow, I. J., Bulatov, Y., Ibarz, J., Arnoud, S., & Shet, V. (2014). Multi-digit number recognition from street view imagery using deep convolutional neural networks. ArXiv Preprint arXiv:1312.6082v4.
[261]
Goodfellow, I. J., Courville, A., & Bengio, Y. (2011). Spike-and-slab sparse coding for unsupervised feature discovery. In NIPS Workshop on challenges in learning hierarchical models.
[262]
Goodfellow, I. J., Courville, A. C., & Bengio, Y. (2012). Large-scale feature learning with spike-and-slab sparse coding. In Proceedings of the 29th international conference on machine learning.
[263]
I. Goodfellow, M. Mirza, X. Da, A. Courville, Y. Bengio, An empirical investigation of catastrophic forgetting in gradient-based neural networks. TR, 2014. arXiv:1312.6211v2
[264]
Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013). Maxout networks. In International conference on machine learning.
[265]
A. Graves, Practical variational inference for neural networks, in: Advances in neural information processing systems (NIPS), 2011, pp. 2348-2356.
[266]
Graves, A., Eck, D., Beringer, N., & Schmidhuber, J. (2003). Isolated digit recognition with LSTM recurrent networks. In First international workshop on biologically inspired approaches to advanced information technology.
[267]
Graves, A., Fernandez, S., Gomez, F. J., & Schmidhuber, J. (2006). Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural nets. In ICML'06: Proceedings of the 23rd international conference on machine learning (pp. 369-376).
[268]
A. Graves, S. Fernandez, M. Liwicki, H. Bunke, J. Schmidhuber, Unconstrained on-line handwriting recognition with recurrent neural networks, in: Advances in neural information processing systems (NIPS), vol. 20, MIT Press, Cambridge, MA, 2008, pp. 577-584.
[269]
Graves, A., & Jaitly, N. (2014). Towards end-to-end speech recognition with recurrent neural networks. In Proc. 31st International conference on machine learning (pp. 1764-1772).
[270]
A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, J. Schmidhuber, A novel connectionist system for improved unconstrained handwriting recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 31 (2009) 855-868.
[271]
A. Graves, A.-R. Mohamed, G.E. Hinton, Speech recognition with deep recurrent neural networks, in: IEEE International conference on acoustics, speech and signal processing, IEEE, 2013, pp. 6645-6649.
[272]
A. Graves, J. Schmidhuber, Framewise phoneme classification with bidirectional LSTM and other neural network architectures, Neural Networks, 18 (2005) 602-610.
[273]
A. Graves, J. Schmidhuber, Offline handwriting recognition with multidimensional recurrent neural networks, in: Advances in neural information processing systems (NIPS), vol. 21, MIT Press, Cambridge, MA, 2009, pp. 545-552.
[274]
M. Graziano, The intelligent movement machine: an ethological perspective on the primate motor system, Oxford University Press, USA, 2009.
[275]
A. Griewank, Who invented the reverse mode of differentiation?, Documenta Mathematica, Extra Volume ISMP (2012) 389-400.
[276]
I. Grondman, L. Busoniu, G.A.D. Lopes, R. Babuska, A survey of actor-critic reinforcement learning: standard and natural policy gradients, IEEE Transactions on Systems, Man, and Cybernetics Part C: Applications and Reviews, 42 (2012) 1291-1307.
[277]
S. Grossberg, Some networks that can learn, remember, and reproduce any number of complicated space-time patterns, I, Journal of Mathematics and Mechanics, 19 (1969) 53-91.
[278]
S. Grossberg, Adaptive pattern classification and universal recoding, 1: parallel development and coding of neural feature detectors, Biological Cybernetics, 23 (1976) 121-134.
[279]
S. Grossberg, Adaptive pattern classification and universal recoding, 2: feedback, expectation, olfaction, and illusions, Biological Cybernetics, 23 (1976) 187-202.
[280]
F. Gruau, D. Whitley, L. Pyeatt, A comparison between cellular encoding and direct encoding for genetic neural networks. NeuroCOLT Technical report NC-TR-96-048, ESPRIT Working Group in Neural and Computational Learning, NeuroCOLT 8556, 1996.
[281]
P.D. Grünwald, I.J. Myung, M.A. Pitt, Advances in minimum description length: theory and applications, MIT Press, 2005.
[282]
M. Grüttner, F. Sehnke, T. Schaul, J. Schmidhuber, Multi-dimensional deep memory atari-go players for parameter exploring policy gradients, in: Proceedings of the international conference on artificial neural networks ICANN, Springer, 2010, pp. 114-123.
[283]
X. Guo, S. Singh, H. Lee, R. Lewis, X. Wang, Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning, in: Advances in neural information processing systems, vol. 27 (NIPS), 2014.
[284]
I. Guyon, V. Vapnik, B. Boser, L. Bottou, S.A. Solla, Structural risk minimization for character recognition, in: Advances in neural information processing systems (NIPS), vol. 4, Morgan Kaufmann, 1992, pp. 471-479.
[285]
J. Hadamard, Mémoire sur le problème d'analyse relatif à l'équilibre des plaques élastiques encastrées. Mémoires présentés par divers savants à l'Académie des sciences de l'Institut de France: Éxtrait, Imprimerie nationale, 1908.
[286]
R. Hadsell, S. Chopra, Y. LeCun, Dimensionality reduction by learning an invariant mapping, in: Proc. computer vision and pattern recognition conference, IEEE Press, 2006.
[287]
Hagras, H., Pounds-Cornish, A., Colley, M., Callaghan, V., & Clarke, G. (2004). Evolving spiking neural network controllers for autonomous robots. In IEEE International conference on robotics and automation, vol. 5 (pp. 4620-4626).
[288]
N. Hansen, S.D. Müller, P. Koumoutsakos, Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES), Evolutionary Computation, 11 (2003) 1-18.
[289]
N. Hansen, A. Ostermeier, Completely derandomized self-adaptation in evolution strategies, Evolutionary Computation, 9 (2001) 159-195.
[290]
S.J. Hanson, A stochastic version of the delta rule, Physica D: Nonlinear Phenomena, 42 (1990) 265-272.
[291]
S.J. Hanson, L.Y. Pratt, Comparing biases for minimal network construction with back-propagation, in: Advances in neural information processing systems (NIPS), vol. 1, Morgan Kaufmann, San Mateo, CA, 1989, pp. 177-185.
[292]
B.L. Happel, J.M. Murre, Design and evolution of modular neural network architectures, Neural Networks, 7 (1994) 985-1004.
[293]
S. Hashem, B. Schmeiser, Improving model accuracy using optimal linear combinations of trained neural networks, IEEE Transactions on Neural Networks, 6 (1992) 792-794.
[294]
B. Hassibi, D.G. Stork, Second order derivatives for network pruning: optimal brain surgeon, in: Advances in neural information processing systems, vol. 5, Morgan Kaufmann, 1993, pp. 164-171.
[295]
T.J. Hastie, R.J. Tibshirani, Generalized additive models, in: Monographs on statistics and applied probability, Vol. 43, 1990.
[296]
T. Hastie, R. Tibshirani, J. Friedman, The elements of statistical learning, in: Springer series in statistics, 2009.
[297]
J. Hawkins, D. George, Hierarchical temporal memory-concepts, theory, and terminology, Numenta Inc, 2006.
[298]
S.S. Haykin, Kalman filtering and neural networks, Wiley Online Library, 2001.
[299]
D.O. Hebb, The organization of behavior, Wiley, New York, 1949.
[300]
R. Hecht-Nielsen, Theory of the backpropagation neural network, in: International joint conference on neural networks, IEEE, 1989, pp. 593-605.
[301]
J.N. Heemskerk, Overview of neural hardware, in: Neurocomputers for brain-style processing. Design, implementation and application, 1995.
[302]
Heess, N., Silver, D., & Teh, Y. W. (2012). Actor-critic reinforcement learning with energy-based policies. In Proc. European workshop on reinforcement learning (pp. 43-57).
[303]
V. Heidrich-Meisner, C. Igel, Neuroevolution strategies for episodic reinforcement learning, Journal of Algorithms, 64 (2009) 152-168.
[304]
J. Herrero, A. Valencia, J. Dopazo, A hierarchical unsupervised growing neural network for clustering gene expression patterns, Bioinformatics, 17 (2001) 126-136.
[305]
J. Hertz, A. Krogh, R. Palmer, Introduction to the theory of neural computation, Addison-Wesley, Redwood City, 1991.
[306]
M.R. Hestenes, E. Stiefel, Methods of conjugate gradients for solving linear systems, Journal of Research of the National Bureau of Standards, 49 (1952) 409-436.
[307]
S.E. Hihi, Y. Bengio, Hierarchical recurrent neural networks for long-term dependencies, in: Advances in neural information processing systems, vol. 8, MIT Press, 1996, pp. 493-499.
[308]
G.E. Hinton, Connectionist learning procedures, Artificial Intelligence, 40 (1989) 185-234.
[309]
G.E. Hinton, Training products of experts by minimizing contrastive divergence, Neural Computation, 14 (2002) 1771-1800.
[310]
G.E. Hinton, P. Dayan, B.J. Frey, R.M. Neal, The wake-sleep algorithm for unsupervised neural networks, Science, 268 (1995) 1158-1160.
[311]
G.E. Hinton, L. Deng, D. Yu, G.E. Dahl, A. Mohamed, N. Jaitly, Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups, IEEE Signal Processing Magazine, 29 (2012) 82-97.
[312]
G.E. Hinton, Z. Ghahramani, Generative models for discovering sparse distributed representations, Philosophical Transactions of the Royal Society B, 352 (1997) 1177-1190.
[313]
G.E. Hinton, S. Osindero, Y.-W. Teh, A fast learning algorithm for deep belief nets, Neural Computation, 18 (2006) 1527-1554.
[314]
G. Hinton, R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science, 313 (2006) 504-507.
[315]
G.E. Hinton, T.E. Sejnowski, Learning and relearning in Boltzmann machines, in: Parallel distributed processing, vol. 1, MIT Press, 1986, pp. 282-317.
[316]
G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R.R. Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors. Technical report, 2012. arXiv:1207.0580
[317]
G.E. Hinton, D. van Camp, Keeping neural networks simple, in: Proceedings of the international conference on artificial neural networks, Amsterdam, Springer, 1993, pp. 11-18.
[318]
S. Hochreiter, Untersuchungen zu dynamischen neuronalen Netzen, Institut für Informatik, Lehrstuhl Prof. Brauer, Technische Universität München, 1991.
[319]
S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber, Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, in: A field guide to dynamical recurrent neural networks, IEEE Press, 2001.
[320]
Hochreiter, S., & Obermayer, K. (2005). Sequence classification for protein analysis. In Snowbird workshop, Snowbird: Utah. Computational and Biological Learning Society.
[321]
S. Hochreiter, J. Schmidhuber, Bridging long time lags by weight guessing and Long Short-Term Memory, in: Frontiers in artificial intelligence and applications, Vol. 37, IOS Press, Amsterdam, Netherlands, 1996, pp. 65-72.
[322]
S. Hochreiter, J. Schmidhuber, Flat minima, Neural Computation, 9 (1997) 1-42.
[323]
S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation, 9 (1997) 1735-1780.
[324]
S. Hochreiter, J. Schmidhuber, Feature extraction through LOCOCODE, Neural Computation, 11 (1999) 679-714.
[325]
S. Hochreiter, A.S. Younger, P.R. Conwell, Learning to learn using gradient descent, in: Lecture notes on comp. sci., Vol. 2130, Springer, Berlin, Heidelberg, 2001, pp. 87-94.
[326]
A.L. Hodgkin, A.F. Huxley, A quantitative description of membrane current and its application to conduction and excitation in nerve, The Journal of Physiology, 117 (1952) 500-544.
[327]
G.M. Hoerzer, R. Legenstein, W. Maass, Emergence of complex computational structures from chaotic neural networks through reward-modulated Hebbian learning, Cerebral Cortex, 24 (2014) 677-690.
[328]
S.B. Holden, On the theory of generalization and self-structuring in linearly weighted connectionist networks, Cambridge University, Engineering Department, 1994.
[329]
J.H. Holland, Adaptation in natural and artificial systems, University of Michigan Press, Ann Arbor, 1975.
[330]
V. Honavar, L.M. Uhr, A network of neuron-like units that learns to perceive by generation as well as reweighting of its links, in: Proc. of the 1988 connectionist models summer school, Morgan Kaufmann, San Mateo, 1988, pp. 472-484.
[331]
V. Honavar, L. Uhr, Generative learning structures and processes for generalized connectionist networks, Information Sciences, 70 (1993) 75-108.
[332]
J.J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proceedings of the National Academy of Sciences, 79 (1982) 2554-2558.
[333]
K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, Neural Networks, 2 (1989) 359-366.
[334]
D.H. Hubel, T. Wiesel, Receptive fields, binocular interaction, and functional architecture in the cat's visual cortex, Journal of Physiology (London), 160 (1962) 106-154.
[335]
D.H. Hubel, T.N. Wiesel, Receptive fields and functional architecture of monkey striate cortex, The Journal of Physiology, 195 (1968) 215-243.
[336]
D.A. Huffman, A method for construction of minimum-redundancy codes, Proceedings IRE, 40 (1952) 1098-1101.
[337]
C.P. Hung, G. Kreiman, T. Poggio, J.J. DiCarlo, Fast readout of object identity from macaque inferior temporal cortex, Science, 310 (2005) 863-866.
[338]
M. Hutter, The fastest and shortest algorithm for all well-defined problems, International Journal of Foundations of Computer Science, 13 (2002) 431-443.
[339]
M. Hutter, Universal artificial intelligence: sequential decisions based on algorithmic probability, Springer, Berlin, 2005.
[340]
A. Hyvärinen, P. Hoyer, E. Oja, Sparse code shrinkage: denoising by maximum likelihood estimation, in: Advances in neural information processing systems (NIPS), vol. 12, MIT Press, 1999.
[341]
A. Hyvärinen, J. Karhunen, E. Oja, Independent component analysis, John Wiley & Sons, 2001.
[342]
ICPR (2012). Contest on Mitosis Detection in Breast Cancer Histological Images. IPAL laboratory, TRIBVN company, Pitié-Salpêtrière hospital, and CIALAB of Ohio State Univ. http://ipal.cnrs.fr/ICPR2012/.
[343]
C. Igel, Neuroevolution for reinforcement learning using evolution strategies, in: Congress on evolutionary computation, vol. 4, IEEE, 2003, pp. 2588-2595.
[344]
C. Igel, M. Hüsken, Empirical evaluation of the improved Rprop learning algorithm, Neurocomputing, 50 (2003) 105-123.
[345]
S. Ikeda, M. Ochiai, Y. Sawaragi, Sequential GMDH algorithm and its application to river flow prediction, IEEE Transactions on Systems, Man and Cybernetics (1976) 473-479.
[346]
E. Indermühle, V. Frinken, H. Bunke, Mode detection in online handwritten documents using BLSTM neural networks, in: Frontiers in handwriting recognition (ICFHR), 2012 international conference on, IEEE, 2012, pp. 302-307.
[347]
E. Indermühle, V. Frinken, A. Fischer, H. Bunke, Keyword spotting in online handwritten documents containing text and non-text using BLSTM neural networks, in: Document analysis and recognition (ICDAR), 2011 international conference on, IEEE, 2011, pp. 73-77.
[348]
G. Indiveri, B. Linares-Barranco, T.J. Hamilton, A. Van Schaik, R. Etienne-Cummings, T. Delbruck, Neuromorphic silicon neuron circuits, Frontiers in Neuroscience, 5 (2011).
[349]
A.G. Ivakhnenko, The group method of data handling-a rival of the method of stochastic approximation, Soviet Automatic Control, 13 (1968) 43-55.
[350]
A.G. Ivakhnenko, Polynomial theory of complex systems, IEEE Transactions on Systems, Man and Cybernetics (1971) 364-378.
[351]
A.G. Ivakhnenko, The review of problems solvable by algorithms of the group method of data handling (GMDH), Pattern Recognition and Image Analysis/Raspoznavaniye Obrazov I Analiz Izobrazhenii, 5 (1995) 527-535.
[352]
A.G. Ivakhnenko, V.G. Lapa, Cybernetic predicting devices, CCM Information Corporation, 1965.
[353]
A.G. Ivakhnenko, V.G. Lapa, R.N. McDonough, Cybernetics and forecasting techniques, American Elsevier, NY, 1967.
[354]
E.M. Izhikevich, Simple model of spiking neurons, IEEE Transactions on Neural Networks, 14 (2003) 1569-1572.
[355]
T. Jaakkola, S.P. Singh, M.I. Jordan, Reinforcement learning algorithm for partially observable Markov decision problems, in: Advances in neural information processing systems, vol. 7, MIT Press, 1995, pp. 345-352.
[356]
Jackel, L., Boser, B., Graf, H.-P., Denker, J., LeCun, Y., & Henderson, D., et al. (1990). VLSI implementation of electronic neural networks: an example in character recognition. In IEEE (Ed.), IEEE international conference on systems, man, and cybernetics (pp. 320-322).
[357]
C. Jacob, A. Lindenmayer, G. Rozenberg, Genetic L-system programming, in: Lecture notes in computer science, 1994.
[358]
R.A. Jacobs, Increased rates of convergence through learning rate adaptation, Neural Networks, 1 (1988) 295-307.
[359]
H. Jaeger, The "echo state" approach to analysing and training recurrent neural networks. Technical report GMD Report 148, German National Research Center for Information Technology, 2001.
[360]
H. Jaeger, Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication, Science, 304 (2004) 78-80.
[361]
V. Jain, S. Seung, Natural image denoising with convolutional networks, in: Advances in neural information processing systems (NIPS), vol. 21, Curran Associates, Inc, 2009, pp. 769-776.
[362]
J. Jameson, Delayed reinforcement learning with multiple time scale hierarchical backpropagated adaptive critics, in: Neural networks for control, 1991.
[363]
S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35 (2013) 221-231.
[364]
K. Jim, C.L. Giles, B.G. Horne, Effects of noise on convergence and generalization in recurrent networks, in: Advances in neural information processing systems (NIPS), vol. 7, Morgan Kaufmann, San Mateo, CA, 1995, pp. 649-656.
[365]
X. Jin, M. Lujan, L.A. Plana, S. Davies, S. Temple, S.B. Furber, Modeling spiking neural networks on SpiNNaker, Computing in Science and Engineering, 12 (2010) 91-97.
[366]
S.R. Jodogne, J.H. Piater, Closed-loop learning of visual control policies, Journal of Artificial Intelligence Research, 28 (2007) 349-391.
[367]
J.P. Jones, L.A. Palmer, An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex, Journal of Neurophysiology, 58 (1987) 1233-1258.
[368]
M.I. Jordan, Serial order: a parallel distributed processing approach. Technical report ICS report 8604, Institute for Cognitive Science, University of California, San Diego, 1986.
[369]
M.I. Jordan, Supervised learning and systems with excess degrees of freedom. Technical report COINS TR 88-27, University of Massachusetts, Amherst, 1988.
[370]
M.I. Jordan, Serial order: a parallel distributed processing approach, Advances in Psychology, 121 (1997) 471-495.
[371]
M.I. Jordan, D.E. Rumelhart, Supervised learning with a distal teacher. Technical report Occasional Paper #40, Center for Cog. Sci., Massachusetts Institute of Technology, 1990.
[372]
M.I. Jordan, T.J. Sejnowski, Graphical models: foundations of neural computation, MIT Press, 2001.
[373]
R.D. Joseph, Contributions to perceptron theory, Cornell Univ, 1961.
[374]
C.-F. Juang, A hybrid of genetic algorithm and particle swarm optimization for recurrent network design, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 34 (2004) 997-1006.
[375]
J.S. Judd, Neural network design and the complexity of learning, in: Neural network modeling and connectionism, MIT Press, 1990.
[376]
C. Jutten, J. Herault, Blind separation of sources, part I: an adaptive algorithm based on neuromimetic architecture, Signal Processing, 24 (1991) 1-10.
[377]
L.P. Kaelbling, M.L. Littman, A.R. Cassandra, Planning and acting in partially observable stochastic domains. Technical report, Brown University, Providence RI, 1995.
[378]
L.P. Kaelbling, M.L. Littman, A.W. Moore, Reinforcement learning: A survey, Journal of AI Research, 4 (1996) 237-285.
[379]
Kak, S., Chen, Y., & Wang, L. (2010). Data mining using surface and deep agents based on neural networks. In AMCIS 2010 proceedings.
[380]
Y. Kalinke, H. Lehmann, Computation in recurrent neural networks: from counters to iterated function systems, in: LNAI, Vol. 1502, Springer, Berlin, Heidelberg, 1998.
[381]
R.E. Kalman, A new approach to linear filtering and prediction problems, Journal of Basic Engineering, 82 (1960) 35-45.
[382]
J. Karhunen, J. Joutsensalo, Generalizations of principal component analysis, optimization problems, and neural networks, Neural Networks, 8 (1995) 549-562.
[383]
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014). Large-scale video classification with convolutional neural networks. In IEEE conference on computer vision and pattern recognition.
[384]
N.K. Kasabov, NeuCube: a spiking neural network architecture for mapping, learning and understanding of spatio-temporal brain data, Neural Networks (2014).
[385]
H.J. Kelley, Gradient theory of optimal flight paths, ARS Journal, 30 (1960) 947-954.
[386]
R. Kempter, W. Gerstner, J.L. Van Hemmen, Hebbian learning and spiking neurons, Physical Review E, 59 (1999) 4498.
[387]
P. Kerlirzin, F. Vallet, Robustness in multilayer perceptrons, Neural Computation, 5 (1993) 473-482.
[388]
Khan, S. H., Bennamoun, M., Sohel, F., & Togneri, R. (2014). Automatic feature learning for robust shadow detection. In IEEE conference on computer vision and pattern recognition.
[389]
Khan, M. M., Khan, G. M., & Miller, J. F. (2010). Evolution of neural networks using Cartesian Genetic Programming. In IEEE congress on evolutionary computation (pp. 1-8).
[390]
M.M. Khan, D.R. Lester, L.A. Plana, A. Rast, X. Jin, E. Painkras, SpiNNaker: mapping neural networks onto a massively-parallel chip multiprocessor, in: International joint conference on neural networks, IEEE, 2008, pp. 2849-2856.
[391]
Kimura, H., Miyazaki, K., & Kobayashi, S. (1997). Reinforcement learning in POMDPs with function approximation. In ICML, vol. 97 (pp. 152-160).
[392]
W.M. Kistler, W. Gerstner, J.L. van Hemmen, Reduction of the Hodgkin-Huxley equations to a single-variable threshold model, Neural Computation, 9 (1997) 1015-1045.
[393]
H. Kitano, Designing neural networks using genetic algorithms with graph generation system, Complex Systems, 4 (1990) 461-476.
[394]
S. Klampfl, W. Maass, Emergence of dynamic memory traces in cortical microcircuit models through STDP, The Journal of Neuroscience, 33 (2013) 11515-11529.
[395]
M. Klapper-Rybicka, N.N. Schraudolph, J. Schmidhuber, Unsupervised learning in LSTM recurrent neural networks, in: Lecture Notes on Comp. Sci., Vol. 2130, Springer, Berlin, Heidelberg, 2001, pp. 684-691.
[396]
E. Kobatake, K. Tanaka, Neuronal selectivities to complex object features in the ventral visual pathway of the macaque cerebral cortex, Journal of Neurophysiology, 71 (1994) 856-867.
[397]
N. Kohl, P. Stone, Policy gradient reinforcement learning for fast quadrupedal locomotion, in: Robotics and automation, 2004. Proceedings. ICRA'04. 2004 IEEE international conference on, vol. 3, IEEE, 2004, pp. 2619-2624.
[398]
T. Kohonen, Correlation matrix memories, IEEE Transactions on Computers, C-21 (1972) 353-359.
[399]
T. Kohonen, Self-organized formation of topologically correct feature maps, Biological Cybernetics, 43 (1982) 59-69.
[400]
T. Kohonen, Self-organization and associative memory, Springer, 1988.
[401]
P. Koikkalainen, E. Oja, Self-organizing hierarchical feature maps, in: International joint conference on neural networks, IEEE, 1990, pp. 279-284.
[402]
A.N. Kolmogorov, On the representation of continuous functions of several variables by superposition of continuous functions of one variable and addition, Doklady Akademii Nauk SSSR, 114 (1957) 679-681.
[403]
A.N. Kolmogorov, Three approaches to the quantitative definition of information, Problems of Information Transmission, 1 (1965) 1-11.
[404]
V.R. Kompella, M.D. Luciw, J. Schmidhuber, Incremental slow feature analysis: Adaptive low-complexity slow feature updating from high-dimensional input streams, Neural Computation, 24 (2012) 2994-3024.
[405]
T. Kondo, GMDH neural network algorithm using the heuristic self-organization method and its application to the pattern identification problem, in: Proceedings of the 37th SICE annual conference, IEEE, 1998, pp. 1143-1148.
[406]
T. Kondo, J. Ueno, Multi-layered GMDH-type neural network self-selecting optimum neural network architecture and its application to 3-dimensional medical image recognition of blood vessels, International Journal of Innovative Computing, Information and Control, 4 (2008) 175-187.
[407]
P. Kordík, P. Náplava, M. Snorek, M. Genyk-Berezovskyj, Modified GMDH method and models quality evaluation by visualization, Control Systems and Computers, 2 (2003) 68-75.
[408]
Korkin, M., de Garis, H., Gers, F., & Hemmi, H. (1997). CBM (CAM-Brain Machine)-a hardware tool which evolves a neural net module in a fraction of a second and runs a million neuron artificial brain in real time.
[409]
B. Kosko, Unsupervised learning in noise, IEEE Transactions on Neural Networks, 1 (1990) 44-57.
[410]
J. Koutník, G. Cuccu, J. Schmidhuber, F. Gomez, Evolving large-scale neural networks for vision-based reinforcement learning, in: Proceedings of the genetic and evolutionary computation conference, ACM, Amsterdam, 2013, pp. 1061-1068.
[411]
Koutník, J., Gomez, F., & Schmidhuber, J. (2010). Evolving neural networks in compressed weight space. In Proceedings of the 12th annual conference on genetic and evolutionary computation (pp. 619-626).
[412]
Koutník, J., Greff, K., Gomez, F., & Schmidhuber, J. (2014). A clockwork RNN. In Proceedings of the 31st international conference on machine learning, vol. 32 (pp. 1845-1853). arXiv:1402.3511 [cs.NE].
[413]
J.R. Koza, Genetic programming: on the programming of computers by means of natural selection, MIT Press, 1992.
[414]
M. Kramer, Nonlinear principal component analysis using autoassociative neural networks, AIChE Journal, 37 (1991) 233-243.
[415]
S.C. Kremer, J.F. Kolen, Field guide to dynamical recurrent networks, Wiley-IEEE Press, 2001.
[416]
N. Kriegeskorte, M. Mur, D.A. Ruff, R. Kiani, J. Bodurka, H. Esteky, Matching categorical object representations in inferior temporal cortex of man and monkey, Neuron, 60 (2008) 1126-1141.
[417]
A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in neural information processing systems (NIPS), vol. 25, 2012, pp. 1097-1105.
[418]
A. Krogh, J.A. Hertz, A simple weight decay can improve generalization, in: Advances in neural information processing systems, vol. 4, Morgan Kaufmann, 1992, pp. 950-957.
[419]
N. Krüger, P. Janssen, S. Kalkan, M. Lappe, A. Leonardis, J. Piater, Deep hierarchies in the primate visual cortex: what can we learn for computer vision?, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35 (2013) 1847-1871.
[420]
S. Kullback, R.A. Leibler, On information and sufficiency, The Annals of Mathematical Statistics, 22 (1951) 79-86.
[421]
R. Kurzweil, How to create a mind: the secret of human thought revealed, Viking, 2012.
[422]
M.G. Lagoudakis, R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, 4 (2003) 1107-1149.
[423]
J. Lampinen, E. Oja, Clustering properties of hierarchical self-organizing maps, Journal of Mathematical Imaging and Vision, 2 (1992) 261-272.
[424]
K. Lang, A. Waibel, G.E. Hinton, A time-delay neural network architecture for isolated word recognition, Neural Networks, 3 (1990) 23-43.
[425]
Lange, S., & Riedmiller, M. (2010). Deep auto-encoder neural networks in reinforcement learning. In Neural networks, The 2010 international joint conference on (pp. 1-8).
[426]
A. Lapedes, R. Farber, A self-optimizing, nonsymmetrical neural net for content addressable memory and pattern recognition, Physica D, 22 (1986) 247-259.
[427]
P. Laplace, Mémoire sur la probabilité des causes par les évènements, Mémoires de l'Académie Royale des Sciences Présentés par Divers Savans, 6 (1774) 621-656.
[428]
P. Larrañaga, J.A. Lozano, Estimation of distribution algorithms: a new tool for evolutionary computation, Kluwer Academic Publishers, Norwell, MA, USA, 2001.
[429]
Le, Q. V., Ranzato, M., Monga, R., Devin, M., Corrado, G., & Chen, K., et al. (2012). Building high-level features using large scale unsupervised learning. In Proc. ICML'12.
[430]
LeCun, Y. (1985). Une procédure d'apprentissage pour réseau à seuil asymétrique. In Proceedings of cognitiva 85 (pp. 599-604).
[431]
Y. LeCun, A theoretical framework for back-propagation, in: Proceedings of the 1988 connectionist models summer school, Morgan Kaufmann, CMU, Pittsburgh, Pa, 1988, pp. 21-28.
[432]
Y. LeCun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, Back-propagation applied to handwritten zip code recognition, Neural Computation, 1 (1989) 541-551.
[433]
Y. LeCun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, Handwritten digit recognition with a back-propagation network, in: Advances in neural information processing systems, vol. 2, Morgan Kaufmann, 1990, pp. 396-404.
[434]
Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, 86 (1998) 2278-2324.
[435]
Y. LeCun, J.S. Denker, S.A. Solla, Optimal brain damage, in: Advances in neural information processing systems, vol. 2, Morgan Kaufmann, 1990, pp. 598-605.
[436]
Y. LeCun, U. Muller, E. Cosatto, B. Flepp, Off-road obstacle avoidance through end-to-end learning, in: Advances in neural information processing systems (NIPS 2005), 2006.
[437]
Y. LeCun, P. Simard, B. Pearlmutter, Automatic learning rate maximization by on-line estimation of the Hessian's eigenvectors, in: Advances in neural information processing systems, vol. 5 (NIPS 1992), Morgan Kaufmann Publishers, San Mateo, CA, 1993.
[438]
L. Lee, Learning of context-free languages: a survey of the literature. Technical report TR-12-96, Center for Research in Computing Technology, Harvard University, Cambridge, Massachusetts, 1996.
[439]
H. Lee, A. Battle, R. Raina, A.Y. Ng, Efficient sparse coding algorithms, in: Advances in neural information processing systems (NIPS), vol. 19, 2007, pp. 801-808.
[440]
H. Lee, C. Ekanadham, A.Y. Ng, Sparse deep belief net model for visual area V2, in: Advances in neural information processing systems (NIPS), vol. 20, 2007, pp. 873-880.
[441]
Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th international conference on machine learning (pp. 609-616).
[442]
S. Lee, R.M. Kil, A Gaussian potential function network with hierarchically self-organizing learning, Neural Networks, 4 (1991) 207-224.
[443]
Lee, H., Pham, P. T., Largman, Y., & Ng, A. Y. (2009). Unsupervised feature learning for audio classification using convolutional deep belief networks. In Proc. NIPS, vol. 22 (pp. 1096-1104).
[444]
A.M. Legendre, Nouvelles méthodes pour la détermination des orbites des comètes, F. Didot, 1805.
[445]
R.A. Legenstein, W. Maass, Neural circuits for pattern recognition with small total wire length, Theoretical Computer Science, 287 (2002) 239-249.
[446]
R. Legenstein, N. Wilbert, L. Wiskott, Reinforcement learning on slow features of high-dimensional input streams, PLoS Computational Biology, 6 (2010).
[447]
Leibniz, G. W. (1676). Memoir using the chain rule (cited in TMME 7:2&3 p. 321-332, 2010).
[448]
G.W. Leibniz, Nova methodus pro maximis et minimis, itemque tangentibus, quae nec fractas, nec irrationales quantitates moratur, et singulare pro illis calculi genus, Acta Eruditorum (1684) 467-473.
[449]
D.B. Lenat, Theory formation by heuristic search, Artificial Intelligence, 21 (1983) 31-59.
[450]
D.B. Lenat, J.S. Brown, Why AM and EURISKO appear to work, Artificial Intelligence, 23 (1984) 269-294.
[451]
P. Lennie, J.A. Movshon, Coding of color and form in the geniculostriate visual pathway, Journal of the Optical Society of America A, 22 (2005) 2013-2033.
[452]
K. Levenberg, A method for the solution of certain problems in least squares, Quarterly of Applied Mathematics, 2 (1944) 164-168.
[453]
L.A. Levin, On the notion of a random sequence, Soviet Mathematics Doklady, 14 (1973) 1413-1416.
[454]
L.A. Levin, Universal sequential search problems, Problems of Information Transmission, 9 (1973) 265-266.
[455]
A.U. Levin, T.K. Leen, J.E. Moody, Fast pruning using principal components, in: Advances in neural information processing systems (NIPS), vol. 6, Morgan Kaufmann, 1994, pp. 35-42.
[456]
A.U. Levin, K.S. Narendra, Control of nonlinear dynamical systems using neural networks. II. Observability, identification, and control, IEEE Transactions on Neural Networks, 7 (1995) 30-42.
[457]
M.S. Lewicki, B.A. Olshausen, Inferring sparse, overcomplete image codes using an efficient coding framework, in: Advances in neural information processing systems (NIPS), vol. 10, 1998, pp. 815-821.
[458]
G.F.A. L'Hôpital, Analyse des infiniment petits, pour l'intelligence des lignes courbes, L'Imprimerie Royale, Paris, 1696.
[459]
M. Li, P.M.B. Vitányi, An introduction to Kolmogorov complexity and its applications, Springer, 1997.
[460]
R. Li, W. Zhang, H.-I. Suk, L. Wang, J. Li, D. Shen, Deep learning based imaging data completion for improved brain disease diagnosis, in: Proc. MICCAI, Springer, 2014.
[461]
L. Lin, Reinforcement learning for robots using neural networks, Carnegie Mellon University, Pittsburgh, 1993.
[462]
T. Lin, B. Horne, P. Tino, C. Giles, Learning long-term dependencies in NARX recurrent neural networks, IEEE Transactions on Neural Networks, 7 (1996) 1329-1338.
[463]
A. Lindenmayer, Mathematical models for cellular interaction in development, Journal of Theoretical Biology, 18 (1968) 280-315.
[464]
S. Lindstädt, Comparison of two unsupervised neural network models for redundancy reduction, in: Proc. of the 1993 connectionist models summer school, Erlbaum Associates, Hillsdale, NJ, 1993, pp. 308-315.
[465]
S. Linnainmaa, The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors, Univ. Helsinki, 1970.
[466]
S. Linnainmaa, Taylor expansion of the accumulated rounding error, BIT Numerical Mathematics, 16 (1976) 146-160.
[467]
R. Linsker, Self-organization in a perceptual network, IEEE Computer, 21 (1988) 105-117.
[468]
M.L. Littman, A.R. Cassandra, L.P. Kaelbling, Learning policies for partially observable environments: scaling up, in: Machine learning: proceedings of the twelfth international conference, Morgan Kaufmann Publishers, San Francisco, CA, 1995, pp. 362-370.
[469]
S.-C. Liu, J. Kramer, G. Indiveri, T. Delbrück, T. Burg, R. Douglas, Orientation-selective aVLSI spiking neurons, Neural Networks, 14 (2001) 629-643.
[470]
L. Ljung, System identification, Springer, 1998.
[471]
N.K. Logothetis, J. Pauls, T. Poggio, Shape representation in the inferior temporal cortex of monkeys, Current Biology, 5 (1995) 552-563.
[472]
D. Loiacono, L. Cardamone, P.L. Lanzi, Simulated car racing championship competition software manual. Technical report, Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy, 2011.
[473]
Loiacono, D., Lanzi, P. L., Togelius, J., Onieva, E., Pelta, D. A., & Butz, M. V., et al. (2009). The 2009 simulated car racing championship.
[474]
Lowe, D. (1999). Object recognition from local scale-invariant features. In The Proceedings of the seventh IEEE international conference on computer vision, vol. 2 (pp. 1150-1157).
[475]
D. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, 60 (2004) 91-110.
[476]
M. Luciw, V.R. Kompella, S. Kazerounian, J. Schmidhuber, An intrinsic value system for developing multiple invariant representations with incremental slowness learning, Frontiers in Neurorobotics, 7 (2013).
[477]
A. Lusci, G. Pollastri, P. Baldi, Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules, Journal of Chemical Information and Modeling, 53 (2013) 1563-1575.
[478]
Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities improve neural network acoustic models. In International conference on machine learning.
[479]
W. Maass, Lower bounds for the computational power of networks of spiking neurons, Neural Computation, 8 (1996) 1-40.
[480]
W. Maass, Networks of spiking neurons: the third generation of neural network models, Neural Networks, 10 (1997) 1659-1671.
[481]
W. Maass, On the computational power of winner-take-all, Neural Computation, 12 (2000) 2519-2535.
[482]
W. Maass, T. Natschläger, H. Markram, Real-time computing without stable states: A new framework for neural computation based on perturbations, Neural Computation, 14 (2002) 2531-2560.
[483]
D.J.C. MacKay, A practical Bayesian framework for backpropagation networks, Neural Computation, 4 (1992) 448-472.
[484]
D.J.C. MacKay, K.D. Miller, Analysis of Linsker's simulation of Hebbian rules, Neural Computation, 2 (1990) 173-187.
[485]
R. Maclin, J.W. Shavlik, Using knowledge-based neural networks to improve algorithms: Refining the Chou-Fasman algorithm for protein folding, Machine Learning, 11 (1993) 195-215.
[486]
Maclin, R., & Shavlik, J. W. (1995). Combining the predictions of multiple classifiers: Using competitive learning to initialize neural networks. In Proc. IJCAI (pp. 524-531).
[487]
H.R. Madala, A.G. Ivakhnenko, Inductive learning algorithms for complex systems modeling, CRC Press, Boca Raton, 1994.
[488]
O. Madani, S. Hanks, A. Condon, On the undecidability of probabilistic planning and related stochastic optimization problems, Artificial Intelligence, 147 (2003) 5-34.
[489]
Maei, H. R., & Sutton, R. S. (2010). GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces. In Proceedings of the third conference on artificial general intelligence, vol. 1 (pp. 91-96).
[490]
R. Maex, G. Orban, Model circuit of spiking neurons generating directional selectivity in simple cells, Journal of Neurophysiology, 75 (1996) 1515-1545.
[491]
S. Mahadevan, Average reward reinforcement learning: Foundations, algorithms, and empirical results, Machine Learning, 22 (1996) 159-195.
[492]
J. Malik, P. Perona, Preattentive texture discrimination with early vision mechanisms, Journal of the Optical Society of America A, 7 (1990) 923-932.
[493]
V. Maniezzo, Genetic evolution of the topology and weight distribution of neural networks, IEEE Transactions on Neural Networks, 5 (1994) 39-53.
[494]
P. Manolios, R. Fanelli, First-order recurrent neural networks and deterministic finite state automata, Neural Computation, 6 (1994) 1155-1173.
[495]
Marchi, E., Ferroni, G., Eyben, F., Gabrielli, L., Squartini, S., & Schuller, B. (2014). Multi-resolution linear prediction based features for audio onset detection with bidirectional LSTM neural networks. In Proc. 39th IEEE international conference on acoustics, speech, and signal processing (pp. 2183-2187).
[496]
H. Markram, The human brain project, Scientific American, 306 (2012) 50-55.
[497]
D.W. Marquardt, An algorithm for least-squares estimation of nonlinear parameters, Journal of the Society for Industrial & Applied Mathematics, 11 (1963) 431-441.
[498]
J. Martens, Deep learning via Hessian-free optimization, in: Proceedings of the 27th international conference on machine learning, OmniPress, Haifa, Israel, 2010, pp. 735-742.
[499]
Martens, J., & Sutskever, I. (2011). Learning recurrent neural networks with Hessian-free optimization. In Proceedings of the 28th international conference on machine learning (pp. 1033-1040).
[500]
T.M. Martinetz, H.J. Ritter, K.J. Schulten, Three-dimensional neural net for learning visuomotor coordination of a robot arm, IEEE Transactions on Neural Networks, 1 (1990) 131-136.
References 501 through 888 have been omitted.



Published In

Neural Networks  Volume 61, Issue C
January 2015
130 pages

Publisher

Elsevier Science Ltd.

United Kingdom

Publication History

Published: 01 January 2015

Author Tags

  1. Deep learning
  2. Evolutionary computation
  3. Reinforcement learning
  4. Supervised learning
  5. Unsupervised learning

Qualifiers

  • Research-article


Cited By

  • (2024) Computer-aided detection of prostate cancer in early stages using multi-parameter MRI. Technology and Health Care, 32:S1 (125-133). DOI: 10.3233/THC-248011. Online publication date: 31-May-2024.
  • (2024) Deep learning based decision tree ensembles for incomplete medical datasets. Technology and Health Care, 32:1 (75-87). DOI: 10.3233/THC-220514. Online publication date: 5-Jan-2024.
  • (2024) Hybrid optimized multimodal spatiotemporal feature fusion for vision-based sports activity recognition. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 46:1 (1481-1501). DOI: 10.3233/JIFS-233498. Online publication date: 1-Jan-2024.
  • (2024) Analytical Calculation of Weights Convolutional Neural Network. Optical Memory and Neural Networks, 33:2 (157-177). DOI: 10.3103/S1060992X24700061. Online publication date: 1-Jun-2024.
  • (2024) ASOD: an adaptive stream outlier detection method using online strategy. Journal of Cloud Computing: Advances, Systems and Applications, 13:1. DOI: 10.1186/s13677-024-00682-0. Online publication date: 5-Jul-2024.
  • (2024) Detection of cotton leaf curl disease's susceptibility scale level based on deep learning. Journal of Cloud Computing: Advances, Systems and Applications, 13:1. DOI: 10.1186/s13677-023-00582-9. Online publication date: 26-Feb-2024.
  • (2024) A modified LSTM network to predict the citation counts of papers. Journal of Information Science, 50:4 (894-909). DOI: 10.1177/01655515221111000. Online publication date: 1-Aug-2024.
  • (2024) Transformer Inrush Current and Internal Fault Discrimination Using Multitypes of Convolutional Neural Network Techniques. Journal of Electrical and Computer Engineering, 2024. DOI: 10.1155/2024/3986400. Online publication date: 1-Jan-2024.
  • (2024) A Comprehensive Analysis of Explainable AI for Malware Hunting. ACM Computing Surveys, 56:12 (1-40). DOI: 10.1145/3677374. Online publication date: 11-Jul-2024.
  • (2024) Synthesizing Particle-In-Cell Simulations through Learning and GPU Computing for Hybrid Particle Accelerator Beamlines. Proceedings of the Platform for Advanced Scientific Computing Conference (1-11). DOI: 10.1145/3659914.3659937. Online publication date: 3-Jun-2024.
