
Deep learning in neural networks: An overview

Published: 01 January 2015

Abstract

In recent years, deep artificial neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. This historical survey compactly summarizes relevant work, much of it from the previous millennium. Shallow and Deep Learners are distinguished by the depth of their credit assignment paths, which are chains of possibly learnable, causal links between actions and effects. I review deep supervised learning (also recapitulating the history of backpropagation), unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.
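The abstract above recapitulates the history of backpropagation rather than restating the algorithm. As a companion illustration only (a minimal sketch, not taken from the paper), the following Python snippet trains a two-layer network on XOR by plain gradient descent; each layer contributes one learnable causal link to the credit assignment path, and the backward pass assigns credit along that path. All variable names and hyperparameters are illustrative choices.

import numpy as np

# Toy supervised task: XOR, the classic example that a shallow (single-layer)
# learner cannot solve but a deeper (two-layer) one can.
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # inputs
y = np.array([[0.], [1.], [1.], [0.]])                  # XOR targets

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)  # input -> hidden
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)  # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(5000):
    # Forward pass: each layer is one link in the credit assignment path.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass (backpropagation): chain rule through the same path,
    # here for the squared error 0.5 * sum((out - y)**2).
    d_out = (out - y) * out * (1.0 - out)   # error signal at the output layer
    d_h = (d_out @ W2.T) * h * (1.0 - h)    # credit assigned to the hidden layer

    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 3))  # should approach [0, 1, 1, 0]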

References

[1]
D. Aberdeen, Policy-gradient algorithms for partially observable Markov decision processes, Australian National University, 2003.
[2]
J. Abounadi, D. Bertsekas, V.S. Borkar, Learning algorithms for Markov decision processes with average cost, SIAM Journal on Control and Optimization, 40 (2002) 681-698.
[3]
H. Akaike, Statistical predictor identification, Annals of the Institute of Statistical Mathematics, 22 (1970) 203-217.
[4]
H. Akaike, Information theory and an extension of the maximum likelihood principle, in: Second intl. symposium on information theory, Akademinai Kiado, 1973, pp. 267-281.
[5]
H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control, 19 (1974) 716-723.
[6]
A. Allender, Application of time-bounded Kolmogorov complexity in complexity theory, in: EATCS monographs on theoretical computer science, Springer, 1992, pp. 6-22.
[7]
Almeida, L. B. (1987). A learning rule for asynchronous perceptrons with feedback in a combinatorial environment. In IEEE 1st international conference on neural networks, vol. 2 (pp. 609-618).
[8]
L.B. Almeida, T. Langlois, J.D. Amaral, R.A. Redol, On-line step size adaptation. Technical report, INESC, 9 Rua Alves Redol, 1000, 1997.
[9]
S. Amari, A theory of adaptive pattern classifiers, IEEE Transactions on Electronic Computers, 16 (1967) 299-307.
[10]
S.-I. Amari, Natural gradient works efficiently in learning, Neural Computation, 10 (1998) 251-276.
[11]
S. Amari, A. Cichocki, H. Yang, A new learning algorithm for blind signal separation, in: Advances in neural information processing systems (NIPS), vol. 8, The MIT Press, 1996.
[12]
S. Amari, N. Murata, Statistical theory of learning curves under entropic loss criterion, Neural Computation, 5 (1993) 140-153.
[13]
D.J. Amit, N. Brunel, Dynamics of a recurrent network of spiking neurons before and following learning, Network: Computation in Neural Systems, 8 (1997) 373-404.
[14]
G. An, The effects of adding noise during backpropagation training on a generalization performance, Neural Computation, 8 (1996) 643-674.
[15]
M.A. Andrade, P. Chacon, J.J. Merelo, F. Moran, Evaluation of secondary structure of proteins from UV circular dichroism spectra using an unsupervised learning neural network, Protein Engineering, 6 (1993) 383-390.
[16]
R. Andrews, J. Diederich, A.B. Tickle, Survey and critique of techniques for extracting rules from trained artificial neural networks, Knowledge-Based Systems, 8 (1995) 373-389.
[17]
D. Anguita, B.A. Gomes, Mixing floating- and fixed-point formats for neural network learning on neuroprocessors, Microprocessing and Microprogramming, 41 (1996) 757-769.
[18]
D. Anguita, G. Parodi, R. Zunino, An efficient implementation of BP on RISC-based workstations, Neurocomputing, 6 (1994) 57-65.
[19]
I. Arel, D.C. Rose, T.P. Karnowski, Deep machine learning-a new frontier in artificial intelligence research, IEEE Computational Intelligence Magazine, 5 (2010) 13-18.
[20]
T. Ash, Dynamic node creation in backpropagation neural networks, Connection Science, 1 (1989) 365-375.
[21]
J.J. Atick, Z. Li, A.N. Redlich, Understanding retinal color coding from first principles, Neural Computation, 4 (1992) 559-572.
[22]
A.F. Atiya, A.G. Parlos, New results on recurrent network training: unifying the algorithms and accelerating convergence, IEEE Transactions on Neural Networks, 11 (2000) 697-709.
[23]
J. Ba, B. Frey, Adaptive dropout for training deep neural networks, in: Advances in neural information processing systems (NIPS), 2013, pp. 3084-3092.
[24]
Baird, H. (1990). Document image defect models. In Proceedings, IAPR workshop on syntactic and structural pattern recognition.
[25]
Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In International conference on machine learning (pp. 30-37).
[26]
L. Baird, A.W. Moore, Gradient descent for general reinforcement learning, in: Advances in neural information processing systems, vol. 12 (NIPS), MIT Press, 1999, pp. 968-974.
[27]
B. Bakker, Reinforcement learning with long short-term memory, in: Advances in neural information processing systems, vol. 14, MIT Press, Cambridge, MA, 2002, pp. 1475-1482.
[28]
B. Bakker, J. Schmidhuber, Hierarchical reinforcement learning based on subgoal discovery and subpolicy specialization, in: Proc. 8th conference on intelligent autonomous systems IAS-8, IOS Press, Amsterdam, NL, 2004, pp. 438-445.
[29]
Bakker, B., Zhumatiy, V., Gruener, G., & Schmidhuber, J. (2003). A robot that reinforcement-learns to identify and memorize important previous observations. In Proceedings of the 2003 IEEE/RSJ international conference on intelligent robots and systems (pp. 430-435).
[30]
P. Baldi, Gradient descent learning algorithms overview: A general dynamical systems perspective, IEEE Transactions on Neural Networks, 6 (1995) 182-195.
[31]
P. Baldi, Autoencoders, unsupervised learning, and deep architectures, Journal of Machine Learning Research, 27 (2012) 37-50.
[32]
P. Baldi, S. Brunak, P. Frasconi, G. Pollastri, G. Soda, Exploiting the past and the future in protein secondary structure prediction, Bioinformatics, 15 (1999) 937-946.
[33]
P. Baldi, Y. Chauvin, Neural networks for fingerprint recognition, Neural Computation, 5 (1993) 402-418.
[34]
P. Baldi, Y. Chauvin, Hybrid modeling, HMM/NN architectures, and protein applications, Neural Computation, 8 (1996) 1541-1565.
[35]
P. Baldi, K. Hornik, Neural networks and principal component analysis: learning from examples without local minima, Neural Networks, 2 (1989) 53-58.
[36]
P. Baldi, K. Hornik, Learning in linear networks: a survey, IEEE Transactions on Neural Networks, 6 (1995) 837-858.
[37]
P. Baldi, G. Pollastri, The principled design of large-scale recursive neural network architectures-DAG-RNNs and the protein structure prediction problem, Journal of Machine Learning Research, 4 (2003) 575-602.
[38]
P. Baldi, P. Sadowski, The dropout learning algorithm, Artificial Intelligence, 210C (2014) 78-122.
[39]
Ballard, D. H. (1987). Modular learning in neural networks. In Proc. AAAI (pp. 279-284).
[40]
S. Baluja, Population-based incremental learning: A method for integrating genetic search based function optimization and competitive learning. Technical report CMU-CS-94-163, Carnegie Mellon University, 1994.
[41]
R. Balzer, A 15 year perspective on automatic programming, IEEE Transactions on Software Engineering, 11 (1985) 1257-1268.
[42]
H.B. Barlow, Unsupervised learning, Neural Computation, 1 (1989) 295-311.
[43]
H.B. Barlow, T.P. Kaushal, G.J. Mitchison, Finding minimum entropy codes, Neural Computation, 1 (1989) 412-423.
[44]
H.G. Barrow, Learning receptive fields, in: Proceedings of the IEEE 1st annual conference on neural networks, vol. IV, IEEE, 1987, pp. 115-121.
[45]
A.G. Barto, S. Mahadevan, Recent advances in hierarchical reinforcement learning, Discrete Event Dynamic Systems, 13 (2003) 341-379.
[46]
A.G. Barto, S. Singh, N. Chentanez, Intrinsically motivated learning of hierarchical collections of skills, in: Proceedings of international conference on developmental learning, MIT Press, Cambridge, MA, 2004, pp. 112-119.
[47]
A.G. Barto, R.S. Sutton, C.W. Anderson, Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man and Cybernetics, SMC-13 (1983) 834-846.
[48]
R. Battiti, Accelerated backpropagation learning: two optimization methods, Complex Systems, 3 (1989) 331-342.
[49]
R. Battiti, First- and second-order methods for learning: between steepest descent and Newton's method, Neural Computation, 4 (1992) 141-166.
[50]
E.B. Baum, D. Haussler, What size net gives valid generalization?, Neural Computation, 1 (1989) 151-160.
[51]
L.E. Baum, T. Petrie, Statistical inference for probabilistic functions of finite state Markov chains, The Annals of Mathematical Statistics, 37 (1966) 1554-1563.
[52]
J. Baxter, P.L. Bartlett, Infinite-horizon policy-gradient estimation, Journal of Artificial Intelligence Research, 15 (2001) 319-350.
[53]
Bayer, J., & Osendorfer, C. (2014). Variational inference of latent state sequences using recurrent networks. ArXiv Preprint arXiv:1406.1655.
[54]
Bayer, J., Osendorfer, C., Chen, N., Urban, S., & van der Smagt, P. (2013). On fast dropout and its applicability to recurrent networks. ArXiv Preprint arXiv:1311.0701.
[55]
Bayer, J., Wierstra, D., Togelius, J., & Schmidhuber, J. (2009). Evolving memory cell structures for sequence learning. In Proc. ICANN (2) (pp. 755-764).
[56]
T. Bayes, An essay toward solving a problem in the doctrine of chances, Philosophical Transactions of the Royal Society of London, 53 (1763) 370-418.
[57]
S. Becker, Unsupervised learning procedures for neural networks, International Journal of Neural Systems, 2 (1991) 17-33.
[58]
S. Becker, Y. Le Cun, Improving the convergence of back-propagation learning with second order methods, in: Proc. 1988 connectionist models summer school, Morgan Kaufmann, San Mateo, 1989, pp. 29-37.
[59]
Behnke, S. (1999). Hebbian learning and competition in the neural abstraction pyramid. In Proceedings of the international joint conference on neural networks, vol. 2 (pp. 1356-1361).
[60]
S. Behnke, Learning iterative image reconstruction in the neural abstraction pyramid, International Journal of Computational Intelligence and Applications, 1 (2001) 427-438.
[61]
Behnke, S. (2002). Learning face localization using hierarchical recurrent networks. In Proceedings of the 12th international conference on artificial neural networks (pp. 1319-1324).
[62]
Behnke, S. (2003a). Discovering hierarchical speech features using convolutional non-negative matrix factorization. In Proceedings of the international joint conference on neural networks, vol. 4 (pp. 2758-2763).
[63]
S. Behnke, Hierarchical neural networks for image interpretation, in: Lecture notes in computer science, Vol. 2766, Springer, 2003.
[64]
S. Behnke, Face localization and tracking in the neural abstraction pyramid, Neural Computing and Applications, 14 (2005) 97-103.
[65]
Behnke, S., & Rojas, R. (1998). Neural abstraction pyramid: a hierarchical image understanding architecture. In Proceedings of international joint conference on neural networks, vol. 2 (pp. 820-825).
[66]
A.J. Bell, T.J. Sejnowski, An information-maximization approach to blind separation and blind deconvolution, Neural Computation, 7 (1995) 1129-1159.
[67]
R. Bellman, Dynamic programming, Princeton University Press, Princeton, NJ, USA, 1957.
[68]
A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, E. Moulines, A blind source separation technique using second-order statistics, IEEE Transactions on Signal Processing, 45 (1997) 434-444.
[69]
Y. Bengio, Artificial neural networks and their application to sequence recognition, McGill University, (Computer Science), Montreal, QC, Canada, 1991.
[70]
Y. Bengio, Learning deep architectures for AI, in: Foundations and trends in machine learning, Vol. 2(1), Now Publishers, 2009.
[71]
Y. Bengio, A. Courville, P. Vincent, Representation learning: a review and new perspectives, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35 (2013) 1798-1828.
[72]
Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle, Greedy layer-wise training of deep networks, in: Advances in neural information processing systems, vol. 19 (NIPS), MIT Press, 2007, pp. 153-160.
[73]
Y. Bengio, P. Simard, P. Frasconi, Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks, 5 (1994) 157-166.
[74]
N. Beringer, A. Graves, F. Schiel, J. Schmidhuber, Classifying unprompted speech by retraining LSTM nets, in: LNCS, Vol. 3696, Springer-Verlag, Berlin, Heidelberg, 2005, pp. 575-581.
[75]
D.P. Bertsekas, Dynamic programming and optimal control, Athena Scientific, 2001.
[76]
D.P. Bertsekas, J.N. Tsitsiklis, Neuro-dynamic programming, Athena Scientific, Belmont, MA, 1996.
[77]
N.P. Bichot, A.F. Rossi, R. Desimone, Parallel and serial neural mechanisms for visual search in macaque area V4, Science, 308 (2005) 529-534.
[78]
F. Biegler-König, F. Bärmann, A learning algorithm for multilayered neural networks based on linear least squares problems, Neural Networks, 6 (1993) 127-131.
[79]
C.M. Bishop, Curvature-driven smoothing: A learning algorithm for feed-forward networks, IEEE Transactions on Neural Networks, 4 (1993) 882-884.
[80]
C.M. Bishop, Pattern recognition and machine learning, Springer, 2006.
[81]
A.D. Blair, J.B. Pollack, Analysis of dynamical recognizers, Neural Computation, 9 (1997) 1127-1142.
[82]
V.D. Blondel, J.N. Tsitsiklis, A survey of computational complexity results in systems and control, Automatica, 36 (2000) 1249-1274.
[83]
Bluche, T., Louradour, J., Knibbe, M., Moysset, B., Benzeghiba, F., & Kermorvant, C. (2014). The A2iA Arabic handwritten text recognition system at the OpenHaRT2013 evaluation. In International workshop on document analysis systems.
[84]
A.L. Blum, R.L. Rivest, Training a 3-node neural network is NP-complete, Neural Networks, 5 (1992) 117-127.
[85]
A. Blumer, A. Ehrenfeucht, D. Haussler, M.K. Warmuth, Occam's razor, Information Processing Letters, 24 (1987) 377-380.
[86]
L. Bobrowski, Learning processes in multilayer threshold nets, Biological Cybernetics, 31 (1978) 1-6.
[87]
M. Bodén, J. Wiles, Context-free and context-sensitive dynamics in recurrent neural networks, Connection Science, 12 (2000) 197-210.
[88]
U. Bodenhausen, A. Waibel, The Tempo 2 algorithm: adjusting time-delays by supervised learning, in: Advances in neural information processing systems, vol. 3, Morgan Kaufmann, 1991, pp. 155-161.
[89]
S.M. Bohte, J.N. Kok, H. La Poutre, Error-backpropagation in temporally encoded networks of spiking neurons, Neurocomputing, 48 (2002) 17-37.
[90]
L. Boltzmann, Wissenschaftliche Abhandlungen, Barth, Leipzig, 1909.
[91]
L. Bottou, Une approche théorique de l'apprentissage connexionniste; applications à la reconnaissance de la parole, Université de Paris XI, 1991.
[92]
H. Bourlard, N. Morgan, Connectionist speech recognition: a hybrid approach, Kluwer Academic Publishers, 1994.
[93]
Boutilier, C., & Poole, D. (1996). Computing optimal policies for partially observable Markov decision processes using compact representations. In Proceedings of the AAAI.
[94]
S.J. Bradtke, A.G. Barto, L.P. Kaelbling, Linear least-squares algorithms for temporal difference learning, Machine Learning, 22 (1996) 33-57.
[95]
R.I. Brafman, M. Tennenholtz, R-MAX-a general polynomial time algorithm for near-optimal reinforcement learning, Journal of Machine Learning Research, 3 (2002) 213-231.
[96]
J. Brea, W. Senn, J.-P. Pfister, Matching recall and storage in sequence learning with spiking neural networks, The Journal of Neuroscience, 33 (2013) 9565-9575.
[97]
L. Breiman, Bagging predictors, Machine Learning, 24 (1996) 123-140.
[98]
R. Brette, M. Rudolph, T. Carnevale, M. Hines, D. Beeman, J.M. Bower, Simulation of networks of spiking neurons: a review of tools and strategies, Journal of Computational Neuroscience, 23 (2007) 349-398.
[99]
T.M. Breuel, A. Ul-Hasan, M.A. Al-Azawi, F. Shafait, High-performance OCR for printed English and Fraktur using LSTM networks, in: 12th International conference on document analysis and recognition, IEEE, 2013, pp. 683-687.
[100]
J. Bromley, J.W. Bentz, L. Bottou, I. Guyon, Y. LeCun, C. Moore, Signature verification using a Siamese time delay neural network, International Journal of Pattern Recognition and Artificial Intelligence, 7 (1993) 669-688.
[101]
C.G. Broyden, A class of methods for solving nonlinear simultaneous equations, Mathematics of Computation, 19 (1965) 577-593.
[102]
Brueckner, R., & Schuller, B. (2014). Social signal classification using deep BLSTM recurrent neural networks. In Proceedings 39th IEEE international conference on acoustics, speech, and signal processing (pp. 4856-4860).
[103]
N. Brunel, Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons, Journal of Computational Neuroscience, 8 (2000) 183-208.
[104]
Bryson, A. E. (1961). A gradient method for optimizing multi-stage allocation processes. In Proc. Harvard Univ. symposium on digital computers and their applications.
[105]
A.E. Bryson Jr., W.F. Denham, A steepest-ascent method for solving optimum programming problems. Technical report BR-1303, Raytheon Company, Missile and Space Division, 1961.
[106]
A. Bryson, Y. Ho, Applied optimal control: optimization, estimation, and control, Blaisdell Pub. Co, 1969.
[107]
J. Buhler, Efficient large-scale sequence comparison by locality-sensitive hashing, Bioinformatics, 17 (2001) 419-428.
[108]
W.L. Buntine, A.S. Weigend, Bayesian back-propagation, Complex Systems, 5 (1991) 603-643.
[109]
N. Burgess, A constructive algorithm that converges for real-valued input patterns, International Journal of Neural Systems, 5 (1994) 59-66.
[110]
Cardoso, J.-F. (1994). On the performance of orthogonal source separation algorithms. In Proc. EUSIPCO (pp. 776-779).
[111]
M.A. Carreira-Perpinan, Continuous latent variable models for dimensionality reduction and sequential data reconstruction, University of Sheffield, UK, 2001.
[112]
M.J. Carter, F.J. Rudolph, A.J. Nucci, Operational fault tolerance of CMAC networks, in: Advances in neural information processing systems (NIPS), vol. 2, Morgan Kaufmann, San Mateo, CA, 1990, pp. 340-347.
[113]
R. Caruana, Multitask learning, Machine Learning, 28 (1997) 41-75.
[114]
M.P. Casey, The dynamics of discrete-time computation, with application to recurrent neural networks and finite state machine extraction, Neural Computation, 8 (1996) 1135-1178.
[115]
G. Cauwenberghs, A fast stochastic error-descent algorithm for supervised learning and optimization, in: Advances in neural information processing systems, vol. 5, Morgan Kaufmann, 1993, pp. 244.
[116]
G.J. Chaitin, On the length of programs for computing finite binary sequences, Journal of the ACM, 13 (1966) 547-569.
[117]
S.K. Chalup, A.D. Blair, Incremental training of first order recurrent neural networks to predict a context-sensitive language, Neural Networks, 16 (2003) 955-972.
[118]
Chellapilla, K., Puri, S., & Simard, P. (2006). High performance convolutional neural networks for document processing. In International workshop on Frontiers in handwriting recognition.
[119]
K. Chen, A. Salman, Learning speaker-specific characteristics with a deep neural architecture, IEEE Transactions on Neural Networks, 22 (2011) 1744-1756.
[120]
K. Cho, Foundations and advances in deep learning, Aalto University School of Science, 2014.
[121]
K. Cho, A. Ilin, T. Raiko, Tikhonov-type regularization for restricted Boltzmann machines, in: Intl. conf. on artificial neural networks 2012, Springer, 2012, pp. 81-88.
[122]
K. Cho, T. Raiko, A. Ilin, Enhanced gradient for training restricted Boltzmann machines, Neural Computation, 25 (2013) 805-831.
[123]
A. Church, An unsolvable problem of elementary number theory, The American Journal of Mathematics, 58 (1936) 345-363.
[124]
D.C. Ciresan, A. Giusti, L.M. Gambardella, J. Schmidhuber, Deep neural networks segment neuronal membranes in electron microscopy images, in: Advances in neural information processing systems (NIPS), 2012, pp. 2852-2860.
[125]
Ciresan, D. C., Giusti, A., Gambardella, L. M., & Schmidhuber, J. (2013). Mitosis detection in breast cancer histology images with deep neural networks. In Proc. MICCAI, vol. 2 (pp. 411-418).
[126]
D.C. Ciresan, U. Meier, L.M. Gambardella, J. Schmidhuber, Deep big simple neural nets for handwritten digit recognition, Neural Computation, 22 (2010) 3207-3220.
[127]
Ciresan, D. C., Meier, U., Masci, J., Gambardella, L. M., & Schmidhuber, J. (2011). Flexible, high performance convolutional neural networks for image classification. In Intl. joint conference on artificial intelligence (pp. 1237-1242).
[128]
Ciresan, D. C., Meier, U., Masci, J., & Schmidhuber, J. (2011). A committee of neural networks for traffic sign classification. In International joint conference on neural networks (pp. 1918-1921).
[129]
D.C. Ciresan, U. Meier, J. Masci, J. Schmidhuber, Multi-column deep neural network for traffic sign classification, Neural Networks, 32 (2012) 333-338.
[130]
Ciresan, D. C., Meier, U., & Schmidhuber, J. (2012a). Multi-column deep neural networks for image classification. In IEEE Conference on computer vision and pattern recognition. Long preprint arXiv:1202.2745v1 [cs.CV].
[131]
Ciresan, D. C., Meier, U., & Schmidhuber, J. (2012b). Transfer learning for Latin and Chinese characters with deep neural networks. In International joint conference on neural networks (pp. 1301-1306).
[132]
D.C. Ciresan, J. Schmidhuber, Multi-column deep neural networks for offline handwritten Chinese character classification. Technical report, IDSIA, 2013. arXiv:1309.0261
[133]
D.T. Cliff, P. Husbands, I. Harvey, Evolving recurrent dynamical networks for robot control, in: Artificial neural nets and genetic algorithms, Springer, 1993, pp. 428-435.
[134]
J. Clune, J.-B. Mouret, H. Lipson, The evolutionary origins of modularity, Proceedings of the Royal Society B: Biological Sciences, 280 (2013) 20122863.
[135]
J. Clune, K.O. Stanley, R.T. Pennock, C. Ofria, On the performance of indirect encoding across the continuum of regularity, IEEE Transactions on Evolutionary Computation, 15 (2011) 346-367.
[136]
Coates, A., Huval, B., Wang, T., Wu, D. J., Ng, A. Y., & Catanzaro, B. (2013). Deep learning with COTS HPC systems. In Proc. international conference on machine learning.
[137]
A. Cichocki, R. Unbehauen, Neural networks for optimization and signal processing, John Wiley & Sons, Inc, 1993.
[138]
R. Collobert, J. Weston, A unified architecture for natural language processing: deep neural networks with multitask learning, in: Proceedings of the 25th international conference on machine learning, ACM, 2008, pp. 160-167.
[139]
P. Comon, Independent component analysis-a new concept?, Signal Processing, 36 (1994) 287-314.
[140]
C.E. Connor, S.L. Brincat, A. Pasupathy, Transformation of shape information in the ventral pathway, Current Opinion in Neurobiology, 17 (2007) 140-147.
[141]
J. Connor, D.R. Martin, L.E. Atlas, Recurrent neural networks and robust time series prediction, IEEE Transactions on Neural Networks, 5 (1994) 240-254.
[142]
S.A. Cook, The complexity of theorem-proving procedures, in: Proceedings of the 3rd annual ACM symposium on the theory of computing, ACM, New York, 1971, pp. 151-158.
[143]
N.L. Cramer, A representation for the adaptive generation of simple sequential programs, in: Proceedings of an international conference on genetic algorithms and their applications, Carnegie-Mellon University, Lawrence Erlbaum Associates, Hillsdale, NJ, 1985.
[144]
P. Craven, G. Wahba, Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation, Numerische Mathematik, 31 (1979) 377-403.
[145]
G. Cuccu, M. Luciw, J. Schmidhuber, F. Gomez, Intrinsically motivated evolutionary search for vision-based reinforcement learning, in: Proceedings of the 2011 IEEE conference on development and learning and epigenetic robotics IEEE-ICDL-EPIROB, vol. 2, IEEE, 2011, pp. 1-7.
[146]
G.E. Dahl, T.N. Sainath, G.E. Hinton, Improving deep neural networks for LVCSR using rectified linear units and dropout, in: IEEE International conference on acoustics, speech and signal processing, IEEE, 2013, pp. 8609-8613.
[147]
G. Dahl, D. Yu, L. Deng, A. Acero, Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, IEEE Transactions on Audio, Speech and Language Processing, 20 (2012) 30-42.
[148]
D'Ambrosio, D. B., & Stanley, K. O. (2007). A novel generative encoding for exploiting neural network sensor and output geometry. In Proceedings of the conference on genetic and evolutionary computation (pp. 974-981).
[149]
M. Datar, N. Immorlica, P. Indyk, V.S. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, in: Proceedings of the 20th annual symposium on computational geometry, ACM, 2004, pp. 253-262.
[150]
P. Dayan, G. Hinton, Feudal reinforcement learning, in: Advances in neural information processing systems (NIPS), vol. 5, Morgan Kaufmann, 1993, pp. 271-278.
[151]
P. Dayan, G.E. Hinton, Varieties of Helmholtz machine, Neural Networks, 9 (1996) 1385-1403.
[152]
P. Dayan, G.E. Hinton, R.M. Neal, R.S. Zemel, The Helmholtz machine, Neural Computation, 7 (1995) 889-904.
[153]
P. Dayan, R. Zemel, Competition and multiple cause models, Neural Computation, 7 (1995) 565-579.
[154]
G. Deco, L. Parra, Non-linear feature extraction by redundancy reduction in an unsupervised stochastic neural network, Neural Networks, 10 (1997) 683-691.
[155]
G. Deco, E.T. Rolls, Neurodynamics of biased competition and cooperation for attention: a model with spiking neurons, Journal of Neurophysiology, 94 (2005) 295-313.
[156]
J.F.G. De Freitas, Bayesian methods for neural networks, University of Cambridge, 2003.
[157]
G. DeJong, R. Mooney, Explanation-based learning: an alternative view, Machine Learning, 1 (1986) 145-176.
[158]
D. DeMers, G. Cottrell, Non-linear dimensionality reduction, in: Advances in neural information processing systems (NIPS), vol. 5, Morgan Kaufmann, 1993, pp. 580-587.
[159]
A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society B, 39 (1977) 1-38.
[160]
L. Deng, D. Yu, Deep learning: methods and applications, NOW Publishers, 2014.
[161]
R. Desimone, T.D. Albright, C.G. Gross, C. Bruce, Stimulus-selective properties of inferior temporal neurons in the macaque, The Journal of Neuroscience, 4 (1984) 2051-2062.
[162]
M.C.P. de Souto, W.R. de Oliveira, The loading problem for pyramidal neural networks, Electronic Journal on Mathematics of Computation (1999).
[163]
R.L. De Valois, D.G. Albrecht, L.G. Thorell, Spatial frequency selectivity of cells in macaque visual cortex, Vision Research, 22 (1982) 545-559.
[164]
Y. Deville, K.K. Lau, Logic program synthesis, Journal of Logic Programming, 19 (1994) 321-350.
[165]
B. de Vries, J.C. Principe, A theory for neural networks with time delays, in: Advances in neural information processing systems (NIPS), vol. 3, Morgan Kaufmann, 1991, pp. 162-168.
[166]
J.J. DiCarlo, D. Zoccolan, N.C. Rust, How does the brain solve visual object recognition?, Neuron, 73 (2012) 415-434.
[167]
Dickmanns, E. D., Behringer, R., Dickmanns, D., Hildebrandt, T., Maurer, M., & Thomanek, F., et al. (1994). The seeing passenger car 'VaMoRs-P'. In Proc. int. symp. on intelligent vehicles (pp. 68-73).
[168]
D. Dickmanns, J. Schmidhuber, A. Winklhofer, Der genetische algorithmus: eine implementierung in prolog. Technical report, Inst. of Informatics, Tech. Univ. Munich, 1987. http://www.idsia.ch/~juergen/geneticprogramming.html
[169]
T.G. Dietterich, Ensemble methods in machine learning, in: Multiple classifier systems, Springer, 2000, pp. 1-15.
[170]
T.G. Dietterich, Hierarchical reinforcement learning with the MAXQ value function decomposition, Journal of Artificial Intelligence Research (JAIR), 13 (2000) 227-303.
[171]
P. Di Lena, K. Nagata, P. Baldi, Deep architectures for protein contact map prediction, Bioinformatics, 28 (2012) 2449-2457.
[172]
S.W. Director, R.A. Rohrer, Automated network design-the frequency-domain case, IEEE Transactions on Circuit Theory, CT-16 (1969) 330-337.
[173]
M. Dittenbach, D. Merkl, A. Rauber, The growing hierarchical self-organizing map, in: IEEE-INNS-ENNS International joint conference on neural networks, vol. 6, IEEE Computer Society, 2000, pp. 6015.
[174]
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., & Tzeng, E., et al. (2013). DeCAF: a deep convolutional activation feature for generic visual recognition. ArXiv Preprint arXiv:1310.1531.
[175]
Dorffner, G. (1996). Neural networks for time series processing. In Neural network world.
[176]
K. Doya, K. Samejima, K. Katagiri, M. Kawato, Multiple model-based reinforcement learning, Neural Computation, 14 (2002) 1347-1369.
[177]
S.E. Dreyfus, The numerical solution of variational problems, Journal of Mathematical Analysis and Applications, 5 (1962) 30-45.
[178]
S.E. Dreyfus, The computational solution of optimal control problems with time lag, IEEE Transactions on Automatic Control, 18 (1973) 383-385.
[179]
J. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, 12 (2011) 2121-2159.
[180]
Egorova, A., Gloye, A., Göktekin, C., Liers, A., Luft, M., & Rojas, R., et al. (2004). FU-fighters small size 2004, team description. In RoboCup 2004 symposium: papers and team description papers. CD edition.
[181]
S. Elfwing, M. Otsuka, E. Uchibe, K. Doya, Free-energy based reinforcement learning for vision-based navigation with high-dimensional sensory inputs, in: Neural information processing. theory and algorithms (ICONIP), vol. 1, Springer, 2010, pp. 215-222.
[182]
C. Eliasmith, How to build a brain: a neural architecture for biological cognition, Oxford University Press, New York, NY, 2013.
[183]
C. Eliasmith, T.C. Stewart, X. Choo, T. Bekolay, T. DeWolf, Y. Tang, A large-scale model of the functioning brain, Science, 338 (2012) 1202-1205.
[184]
J.L. Elman, Finding structure in time, Cognitive Science, 14 (1990) 179-211.
[185]
D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, P. Vincent, S. Bengio, Why does unsupervised pre-training help deep learning?, Journal of Machine Learning Research, 11 (2010) 625-660.
[186]
A.N. Escalante-B, L. Wiskott, How to solve classification and regression problems on high-dimensional data with a supervised extension of slow feature analysis, Journal of Machine Learning Research, 14 (2013) 3683-3719.
[187]
R.L. Eubank, Spline smoothing and nonparametric regression, in: Self-organizing methods in modeling, Marcel Dekker, New York, 1988.
[188]
Euler, L. (1744). Methodus inveniendi lineas curvas maximi minimive proprietate gaudentes.
[189]
Eyben, F., Weninger, F., Squartini, S., & Schuller, B. (2013). Real-life voice activity detection with LSTM recurrent neural networks and an application to Hollywood movies. In Proc. 38th IEEE international conference on acoustics, speech, and signal processing (pp. 483-487).
[190]
Faggin, F. (1992). Neural network hardware. In International joint conference on neural networks, vol. 1 (p. 153).
[191]
S.E. Fahlman, An empirical study of learning speed in back-propagation networks. Technical report CMU-CS-88-162, Carnegie-Mellon Univ., 1988.
[192]
S.E. Fahlman, The recurrent cascade-correlation learning algorithm, in: Advances in neural information processing systems (NIPS), vol. 3, Morgan Kaufmann, 1991, pp. 190-196.
[193]
M.S. Falconbridge, R.L. Stamps, D.R. Badcock, A simple Hebbian/anti-Hebbian network learns the sparse, independent components of natural images, Neural Computation, 18 (2006) 415-429.
[194]
Fan, Y., Qian, Y., Xie, F., & Soong, F. K. (2014). TTS synthesis with bidirectional LSTM based recurrent neural networks. In Proc. Interspeech.
[195]
C. Farabet, C. Couprie, L. Najman, Y. LeCun, Learning hierarchical features for scene labeling, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35 (2013) 1915-1929.
[196]
S.J. Farlow, Self-organizing methods in modeling: GMDH type algorithms, vol. 54, CRC Press, 1984.
[197]
L.A. Feldkamp, D.V. Prokhorov, C.F. Eagen, F. Yuan, Enhanced multi-stream Kalman filter training for recurrent networks, in: Nonlinear modeling, Springer, 1998, pp. 29-53.
[198]
L.A. Feldkamp, D.V. Prokhorov, T.M. Feldkamp, Simple and conditioned adaptive behavior from Kalman filter trained recurrent networks, Neural Networks, 16 (2003) 683-689.
[199]
L.A. Feldkamp, G.V. Puskorius, A signal processing framework based on dynamic neural networks with application to problems in adaptation, filtering, and classification, Proceedings of the IEEE, 86 (1998) 2259-2277.
[200]
D.J. Felleman, D.C. Van Essen, Distributed hierarchical processing in the primate cerebral cortex, Cerebral Cortex, 1 (1991) 1-47.
[201]
Fernández, S., Graves, A., & Schmidhuber, J. (2007a). An application of recurrent neural networks to discriminative keyword spotting. In Proc. ICANN (2) (pp. 220-229).
[202]
Fernandez, S., Graves, A., & Schmidhuber, J. (2007b). Sequence labelling in structured domains with hierarchical recurrent neural networks. In Proceedings of the 20th international joint conference on artificial intelligence.
[203]
Fernandez, R., Rendel, A., Ramabhadran, B., & Hoory, R. (2014). Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks. In Proc. Interspeech.
[204]
D.J. Field, Relations between the statistics of natural images and the response properties of cortical cells, Journal of the Optical Society of America, 4 (1987) 2379-2394.
[205]
D.J. Field, What is the goal of sensory coding?, Neural Computation, 6 (1994) 559-601.
[206]
Fieres, J., Schemmel, J., & Meier, K. (2008). Realizing biological spiking network models in a configurable wafer-scale hardware system. In IEEE International joint conference on neural networks (pp. 969-976).
[207]
S. Fine, Y. Singer, N. Tishby, The hierarchical hidden Markov model: analysis and applications, Machine Learning, 32 (1998) 41-62.
[208]
A. Fischer, C. Igel, Training restricted Boltzmann machines: an introduction, Pattern Recognition, 47 (2014) 25-39.
[209]
R. FitzHugh, Impulses and physiological states in theoretical models of nerve membrane, Biophysical Journal, 1 (1961) 445-466.
[210]
R. Fletcher, M.J. Powell, A rapidly convergent descent method for minimization, The Computer Journal, 6 (1963) 163-168.
[211]
D. Floreano, C. Mattiussi, Evolution of spiking neural controllers for autonomous vision-based robots, in: Evolutionary robotics. From intelligent robotics to artificial life, Springer, 2001, pp. 38-61.
[212]
D.B. Fogel, L.J. Fogel, V. Porto, Evolving neural networks, Biological Cybernetics, 63 (1990) 487-493.
[213]
L. Fogel, A. Owens, M. Walsh, Artificial intelligence through simulated evolution, Wiley, New York, 1966.
[214]
P. Földiák, Forming sparse representations by local anti-Hebbian learning, Biological Cybernetics, 64 (1990) 165-170.
[215]
P. Földiák, M.P. Young, Sparse coding in the primate cortex, in: The handbook of brain theory and neural networks, The MIT Press, 1995, pp. 895-898.
[216]
Förster, A., Graves, A., & Schmidhuber, J. (2007). RNN-based learning of compact maps for efficient robot localization. In 15th European symposium on artificial neural networks (pp. 537-542).
[217]
M. Franzius, H. Sprekeler, L. Wiskott, Slowness and sparseness lead to place, head-direction, and spatial-view cells, PLoS Computational Biology, 3 (2007) e166.
[218]
J. Friedman, T. Hastie, R. Tibshirani, The elements of statistical learning, in: Springer series in statistics, Vol. 1, Springer, New York, 2001.
[219]
V. Frinken, F. Zamora-Martinez, S. Espana-Boquera, M.J. Castro-Bleda, A. Fischer, H. Bunke, Long-short term memory neural networks language modeling for handwriting recognition, in: 2012 21st International conference on pattern recognition, IEEE, 2012, pp. 701-704.
[220]
B. Fritzke, A growing neural gas network learns topologies, in: NIPS, MIT Press, 1994, pp. 625-632.
[221]
K.S. Fu, Syntactic pattern recognition and applications, Springer, Berlin, 1977.
[222]
T. Fukada, M. Schuster, Y. Sagisaka, Phoneme boundary estimation using bidirectional recurrent neural networks and its applications, Systems and Computers in Japan, 30 (1999) 20-30.
[223]
K. Fukushima, Neural network model for a mechanism of pattern recognition unaffected by shift in position-Neocognitron, Transactions of the IECE, J62-A (1979) 658-665.
[224]
K. Fukushima, Neocognitron: A self-organizing neural network for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, 36 (1980) 193-202.
[225]
K. Fukushima, Increasing robustness against background noise: visual pattern recognition by a neocognitron, Neural Networks, 24 (2011) 767-778.
[226]
K. Fukushima, Artificial vision by multi-layered neural networks: neocognitron and its advances, Neural Networks, 37 (2013) 103-119.
[227]
K. Fukushima, Training multi-layered neural network neocognitron, Neural Networks, 40 (2013) 18-31.
[228]
D. Gabor, Theory of communication. Part 1: the analysis of information, Electrical Engineers-Part III: Journal of the Institution of Radio and Communication Engineering, 93 (1946) 429-441.
[229]
S.I. Gallant, Connectionist expert systems, Communications of the ACM, 31 (1988) 152-169.
[230]
Gauss, C. F. (1809). Theoria motus corporum coelestium in sectionibus conicis solem ambientium.
[231]
Gauss, C. F. (1821). Theoria combinationis observationum erroribus minimis obnoxiae (Theory of the combination of observations least subject to error).
[232]
S. Ge, C.C. Hang, T.H. Lee, T. Zhang, Stable adaptive neural network control, Springer, 2010.
[233]
Geiger, J. T., Zhang, Z., Weninger, F., Schuller, B., & Rigoll, G. (2014). Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modelling. In Proc. interspeech.
[234]
S. Geman, E. Bienenstock, R. Doursat, Neural networks and the bias/variance dilemma, Neural Computation, 4 (1992) 1-58.
[235]
F.A. Gers, J. Schmidhuber, Recurrent nets that time and count, in: Proceedings of the IEEE-INNS-ENNS international joint conference on neural networks, vol. 3, IEEE, 2000, pp. 189-194.
[236]
F.A. Gers, J. Schmidhuber, LSTM recurrent networks learn simple context free and context sensitive languages, IEEE Transactions on Neural Networks, 12 (2001) 1333-1340.
[237]
F.A. Gers, J. Schmidhuber, F. Cummins, Learning to forget: continual prediction with LSTM, Neural Computation, 12 (2000) 2451-2471.
[238]
F.A. Gers, N. Schraudolph, J. Schmidhuber, Learning precise timing with LSTM recurrent networks, Journal of Machine Learning Research, 3 (2002) 115-143.
[239]
W. Gerstner, W.K. Kistler, Spiking neuron models, Cambridge University Press, 2002.
[240]
W. Gerstner, J.L. van Hemmen, Associative memory in a network of spiking neurons, Network: Computation in Neural Systems, 3 (1992) 139-164.
[241]
Ghavamzadeh, M., & Mahadevan, S. (2003). Hierarchical policy gradient algorithms. In Proceedings of the twentieth international conference on machine learning (pp. 226-233).
[242]
Gherrity, M. (1989). A learning algorithm for analog fully recurrent neural networks. In IEEE/INNS International joint conference on neural networks, San Diego, vol. 1 (pp. 643-644).
[243]
R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation. Technical report, UC Berkeley and ICSI, 2013. arxiv.org/abs/1311.2524
[244]
L. Gisslen, M. Luciw, V. Graziano, J. Schmidhuber, Sequential constant size compressor for reinforcement learning, in: Proc. fourth conference on artificial general intelligence, Springer, 2011, pp. 31-40.
[245]
Giusti, A., Ciresan, D. C., Masci, J., Gambardella, L. M., & Schmidhuber, J. (2013). Fast image scanning with deep max-pooling convolutional neural networks. In Proc. ICIP.
[246]
B. Glackin, T.M. McGinnity, L.P. Maguire, Q. Wu, A. Belatreche, A novel approach for the implementation of large scale spiking neural networks on FPGA hardware, in: Computational intelligence and bioinspired systems, Springer, 2005, pp. 552-563.
[247]
T. Glasmachers, T. Schaul, Y. Sun, D. Wierstra, J. Schmidhuber, Exponential natural evolution strategies, in: Proceedings of the genetic and evolutionary computation conference, ACM, 2010, pp. 393-400.
[248]
Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. In AISTATS, vol. 15 (pp. 315-323).
[249]
A. Gloye, F. Wiesel, O. Tenchio, M. Simon, Reinforcing the driving quality of soccer playing robots by anticipation, IT-Information Technology, 47 (2005).
[250]
K. Gödel, Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I, Monatshefte für Mathematik und Physik, 38 (1931) 173-198.
[251]
D.E. Goldberg, Genetic algorithms in search, optimization and machine learning, Addison-Wesley, Reading, MA, 1989.
[252]
D. Goldfarb, A family of variable-metric methods derived by variational means, Mathematics of Computation, 24 (1970) 23-26.
[253]
G. Golub, M. Heath, G. Wahba, Generalized cross-validation as a method for choosing a good ridge parameter, Technometrics, 21 (1979) 215-224.
[254]
F.J. Gomez, Robust nonlinear control through neuroevolution, Department of Computer Sciences, University of Texas at Austin, 2003.
[255]
Gomez, F. J., & Miikkulainen, R. (2003). Active guidance for a finless rocket using neuroevolution. In Proc. GECCO 2003.
[256]
F.J. Gomez, J. Schmidhuber, Co-evolving recurrent neurons learn deep memory POMDPs, in: Proc. of the 2005 conference on genetic and evolutionary computation, ACM Press, New York, NY, USA, 2005.
[257]
F.J. Gomez, J. Schmidhuber, R. Miikkulainen, Accelerated neural evolution through cooperatively coevolved synapses, Journal of Machine Learning Research, 9 (2008) 937-965.
[258]
H. Gomi, M. Kawato, Neural network control for a closed-loop system using feedback-error-learning, Neural Networks, 6 (1993) 933-946.
[259]
Gonzalez-Dominguez, J., Lopez-Moreno, I., Sak, H., Gonzalez-Rodriguez, J., & Moreno, P. J. (2014). Automatic language identification using long short-term memory recurrent neural networks. In Proc. Interspeech.
[260]
Goodfellow, I. J., Bulatov, Y., Ibarz, J., Arnoud, S., & Shet, V. (2014). Multi-digit number recognition from street view imagery using deep convolutional neural networks. ArXiv Preprint arXiv:1312.6082v4.
[261]
Goodfellow, I. J., Courville, A., & Bengio, Y. (2011). Spike-and-slab sparse coding for unsupervised feature discovery. In NIPS Workshop on challenges in learning hierarchical models.
[262]
Goodfellow, I. J., Courville, A. C., & Bengio, Y. (2012). Large-scale feature learning with spike-and-slab sparse coding. In Proceedings of the 29th international conference on machine learning.
[263]
I. Goodfellow, M. Mirza, X. Da, A. Courville, Y. Bengio, An empirical investigation of catastrophic forgetting in gradient-based neural networks. TR, 2014. arXiv:1312.6211v2
[264]
Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013). Maxout networks. In International conference on machine learning.
[265]
A. Graves, Practical variational inference for neural networks, in: Advances in neural information processing systems (NIPS), 2011, pp. 2348-2356.
[266]
Graves, A., Eck, D., Beringer, N., & Schmidhuber, J. (2003). Isolated digit recognition with LSTM recurrent networks. In First international workshop on biologically inspired approaches to advanced information technology.
[267]
Graves, A., Fernandez, S., Gomez, F. J., & Schmidhuber, J. (2006). Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural nets. In ICML'06: Proceedings of the 23rd international conference on machine learning (pp. 369-376).
[268]
A. Graves, S. Fernandez, M. Liwicki, H. Bunke, J. Schmidhuber, Unconstrained on-line handwriting recognition with recurrent neural networks, in: Advances in neural information processing systems (NIPS), vol. 20, MIT Press, Cambridge, MA, 2008, pp. 577-584.
[269]
Graves, A., & Jaitly, N. (2014). Towards end-to-end speech recognition with recurrent neural networks. In Proc. 31st International conference on machine learning (pp. 1764-1772).
[270]
A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, J. Schmidhuber, A novel connectionist system for improved unconstrained handwriting recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 31 (2009) 855-868.
[271]
A. Graves, A.-R. Mohamed, G.E. Hinton, Speech recognition with deep recurrent neural networks, in: IEEE International conference on acoustics, speech and signal processing, IEEE, 2013, pp. 6645-6649.
[272]
A. Graves, J. Schmidhuber, Framewise phoneme classification with bidirectional LSTM and other neural network architectures, Neural Networks, 18 (2005) 602-610.
[273]
A. Graves, J. Schmidhuber, Offline handwriting recognition with multidimensional recurrent neural networks, in: Advances in neural information processing systems (NIPS), vol. 21, MIT Press, Cambridge, MA, 2009, pp. 545-552.
[274]
M. Graziano, The intelligent movement machine: an ethological perspective on the primate motor system, Oxford University Press, USA, 2009.
[275]
A. Griewank, Who invented the reverse mode of differentiation?, Documenta Mathematica, Extra Volume ISMP (2012) 389-400.
[276]
I. Grondman, L. Busoniu, G.A.D. Lopes, R. Babuska, A survey of actor-critic reinforcement learning: standard and natural policy gradients, IEEE Transactions on Systems, Man, and Cybernetics Part C: Applications and Reviews, 42 (2012) 1291-1307.
[277]
S. Grossberg, Some networks that can learn, remember, and reproduce any number of complicated space-time patterns, I, Journal of Mathematics and Mechanics, 19 (1969) 53-91.
[278]
S. Grossberg, Adaptive pattern classification and universal recoding, 1: parallel development and coding of neural feature detectors, Biological Cybernetics, 23 (1976) 121-134.
[279]
S. Grossberg, Adaptive pattern classification and universal recoding, 2: feedback, expectation, olfaction, and illusions, Biological Cybernetics, 23 (1976) 187-202.
[280]
F. Gruau, D. Whitley, L. Pyeatt, A comparison between cellular encoding and direct encoding for genetic neural networks. NeuroCOLT Technical report NC-TR-96-048, ESPRIT Working Group in Neural and Computational Learning, NeuroCOLT 8556, 1996.
[281]
P.D. Grünwald, I.J. Myung, M.A. Pitt, Advances in minimum description length: theory and applications, MIT Press, 2005.
[282]
M. Grüttner, F. Sehnke, T. Schaul, J. Schmidhuber, Multi-dimensional deep memory atari-go players for parameter exploring policy gradients, in: Proceedings of the international conference on artificial neural networks ICANN, Springer, 2010, pp. 114-123.
[283]
X. Guo, S. Singh, H. Lee, R. Lewis, X. Wang, Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning, in: Advances in neural information processing systems, vol. 27 (NIPS), 2014.
[284]
I. Guyon, V. Vapnik, B. Boser, L. Bottou, S.A. Solla, Structural risk minimization for character recognition, in: Advances in neural information processing systems (NIPS), vol. 4, Morgan Kaufmann, 1992, pp. 471-479.
[285]
J. Hadamard, Mémoire sur le problème d'analyse relatif à l'équilibre des plaques élastiques encastrées. Mémoires présentés par divers savants à l'Académie des sciences de l'Institut de France: Éxtrait, Imprimerie nationale, 1908.
[286]
R. Hadsell, S. Chopra, Y. LeCun, Dimensionality reduction by learning an invariant mapping, in: Proc. computer vision and pattern recognition conference, IEEE Press, 2006.
[287]
Hagras, H., Pounds-Cornish, A., Colley, M., Callaghan, V., & Clarke, G. (2004). Evolving spiking neural network controllers for autonomous robots. In IEEE International conference on robotics and automation, vol. 5 (pp. 4620-4626).
[288]
N. Hansen, S.D. Müller, P. Koumoutsakos, Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES), Evolutionary Computation, 11 (2003) 1-18.
[289]
N. Hansen, A. Ostermeier, Completely derandomized self-adaptation in evolution strategies, Evolutionary Computation, 9 (2001) 159-195.
[290]
S.J. Hanson, A stochastic version of the delta rule, Physica D: Nonlinear Phenomena, 42 (1990) 265-272.
[291]
S.J. Hanson, L.Y. Pratt, Comparing biases for minimal network construction with back-propagation, in: Advances in neural information processing systems (NIPS), vol. 1, Morgan Kaufmann, San Mateo, CA, 1989, pp. 177-185.
[292]
B.L. Happel, J.M. Murre, Design and evolution of modular neural network architectures, Neural Networks, 7 (1994) 985-1004.
[293]
S. Hashem, B. Schmeiser, Improving model accuracy using optimal linear combinations of trained neural networks, IEEE Transactions on Neural Networks, 6 (1992) 792-794.
[294]
B. Hassibi, D.G. Stork, Second order derivatives for network pruning: optimal brain surgeon, in: Advances in neural information processing systems, vol. 5, Morgan Kaufmann, 1993, pp. 164-171.
[295]
T.J. Hastie, R.J. Tibshirani, Generalized additive models, in: Monographs on statistics and applied probability, Vol. 43, 1990.
[296]
T. Hastie, R. Tibshirani, J. Friedman, The elements of statistical learning, in: Springer series in statistics, 2009.
[297]
J. Hawkins, D. George, Hierarchical temporal memory-concepts, theory, and terminology, Numenta Inc, 2006.
[298]
S.S. Haykin, Kalman filtering and neural networks, Wiley Online Library, 2001.
[299]
D.O. Hebb, The organization of behavior, Wiley, New York, 1949.
[300]
R. Hecht-Nielsen, Theory of the backpropagation neural network, in: International joint conference on neural networks, IEEE, 1989, pp. 593-605.
[301]
J.N. Heemskerk, Overview of neural hardware, in: Neurocomputers for brain-style processing. Design, implementation and application, 1995.
[302]
Heess, N., Silver, D., & Teh, Y. W. (2012). Actor-critic reinforcement learning with energy-based policies. In Proc. European workshop on reinforcement learning (pp. 43-57).
[303]
V. Heidrich-Meisner, C. Igel, Neuroevolution strategies for episodic reinforcement learning, Journal of Algorithms, 64 (2009) 152-168.
[304]
J. Herrero, A. Valencia, J. Dopazo, A hierarchical unsupervised growing neural network for clustering gene expression patterns, Bioinformatics, 17 (2001) 126-136.
[305]
J. Hertz, A. Krogh, R. Palmer, Introduction to the theory of neural computation, Addison-Wesley, Redwood City, 1991.
[306]
M.R. Hestenes, E. Stiefel, Methods of conjugate gradients for solving linear systems, Journal of Research of the National Bureau of Standards, 49 (1952) 409-436.
[307]
S.E. Hihi, Y. Bengio, Hierarchical recurrent neural networks for long-term dependencies, in: Advances in neural information processing systems, vol. 8, MIT Press, 1996, pp. 493-499.
[308]
G.E. Hinton, Connectionist learning procedures, Artificial Intelligence, 40 (1989) 185-234.
[309]
G.E. Hinton, Training products of experts by minimizing contrastive divergence, Neural Computation, 14 (2002) 1771-1800.
[310]
G.E. Hinton, P. Dayan, B.J. Frey, R.M. Neal, The wake-sleep algorithm for unsupervised neural networks, Science, 268 (1995) 1158-1160.
[311]
G.E. Hinton, L. Deng, D. Yu, G.E. Dahl, A. Mohamed, N. Jaitly, Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups, IEEE Signal Processing Magazine, 29 (2012) 82-97.
[312]
G.E. Hinton, Z. Ghahramani, Generative models for discovering sparse distributed representations, Philosophical Transactions of the Royal Society B, 352 (1997) 1177-1190.
[313]
G.E. Hinton, S. Osindero, Y.-W. Teh, A fast learning algorithm for deep belief nets, Neural Computation, 18 (2006) 1527-1554.
[314]
G. Hinton, R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science, 313 (2006) 504-507.
[315]
G.E. Hinton, T.E. Sejnowski, Learning and relearning in Boltzmann machines, in: Parallel distributed processing, vol. 1, MIT Press, 1986, pp. 282-317.
[316]
G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R.R. Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors. Technical report, 2012. arXiv:1207.0580
[317]
G.E. Hinton, D. van Camp, Keeping neural networks simple, in: Proceedings of the international conference on artificial neural networks, Amsterdam, Springer, 1993, pp. 11-18.
[318]
S. Hochreiter, Untersuchungen zu dynamischen neuronalen Netzen, Institut für Informatik, Lehrstuhl Prof. Brauer, Technische Universität München, 1991.
[319]
S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber, Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, in: A field guide to dynamical recurrent neural networks, IEEE Press, 2001.
[320]
Hochreiter, S., & Obermayer, K. (2005). Sequence classification for protein analysis. In Snowbird workshop, Snowbird: Utah. Computational and Biological Learning Society.
[321]
S. Hochreiter, J. Schmidhuber, Bridging long time lags by weight guessing and Long Short-Term Memory, in: Frontiers in artificial intelligence and applications, Vol. 37, IOS Press, Amsterdam, Netherlands, 1996, pp. 65-72.
[322]
S. Hochreiter, J. Schmidhuber, Flat minima, Neural Computation, 9 (1997) 1-42.
[323]
S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation, 9 (1997) 1735-1780.
[324]
S. Hochreiter, J. Schmidhuber, Feature extraction through LOCOCODE, Neural Computation, 11 (1999) 679-714.
[325]
S. Hochreiter, A.S. Younger, P.R. Conwell, Learning to learn using gradient descent, in: Lecture notes on comp. sci., Vol. 2130, Springer, Berlin, Heidelberg, 2001, pp. 87-94.
[326]
A.L. Hodgkin, A.F. Huxley, A quantitative description of membrane current and its application to conduction and excitation in nerve, The Journal of Physiology, 117 (1952) 500-544.
[327]
G.M. Hoerzer, R. Legenstein, W. Maass, Emergence of complex computational structures from chaotic neural networks through reward-modulated Hebbian learning, Cerebral Cortex, 24 (2014) 677-690.
[328]
S.B. Holden, On the theory of generalization and self-structuring in linearly weighted connectionist networks, Cambridge University, Engineering Department, 1994.
[329]
J.H. Holland, Adaptation in natural and artificial systems, University of Michigan Press, Ann Arbor, 1975.
[330]
V. Honavar, L.M. Uhr, A network of neuron-like units that learns to perceive by generation as well as reweighting of its links, in: Proc. of the 1988 connectionist models summer school, Morgan Kaufmann, San Mateo, 1988, pp. 472-484.
[331]
V. Honavar, L. Uhr, Generative learning structures and processes for generalized connectionist networks, Information Sciences, 70 (1993) 75-108.
[332]
J.J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proceedings of the National Academy of Sciences, 79 (1982) 2554-2558.
[333]
K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, Neural Networks, 2 (1989) 359-366.
[334]
D.H. Hubel, T. Wiesel, Receptive fields, binocular interaction, and functional architecture in the cat's visual cortex, Journal of Physiology (London), 160 (1962) 106-154.
[335]
D.H. Hubel, T.N. Wiesel, Receptive fields and functional architecture of monkey striate cortex, The Journal of Physiology, 195 (1968) 215-243.
[336]
D.A. Huffman, A method for construction of minimum-redundancy codes, Proceedings IRE, 40 (1952) 1098-1101.
[337]
C.P. Hung, G. Kreiman, T. Poggio, J.J. DiCarlo, Fast readout of object identity from macaque inferior temporal cortex, Science, 310 (2005) 863-866.
[338]
M. Hutter, The fastest and shortest algorithm for all well-defined problems, International Journal of Foundations of Computer Science, 13 (2002) 431-443.
[339]
M. Hutter, Universal artificial intelligence: sequential decisions based on algorithmic probability, Springer, Berlin, 2005.
[340]
A. Hyvärinen, P. Hoyer, E. Oja, Sparse code shrinkage: denoising by maximum likelihood estimation, in: Advances in neural information processing systems (NIPS), vol. 12, MIT Press, 1999.
[341]
A. Hyvärinen, J. Karhunen, E. Oja, Independent component analysis, John Wiley & Sons, 2001.
[342]
ICPR (2012). Contest on Mitosis Detection in Breast Cancer Histological Images. IPAL laboratory, TRIBVN company, Pitié-Salpêtrière hospital, and CIALAB of Ohio State Univ. http://ipal.cnrs.fr/ICPR2012/.
[343]
C. Igel, Neuroevolution for reinforcement learning using evolution strategies, in: Congress on evolutionary computation, vol. 4, IEEE, 2003, pp. 2588-2595.
[344]
C. Igel, M. Hüsken, Empirical evaluation of the improved Rprop learning algorithm, Neurocomputing, 50 (2003) 105-123.
[345]
S. Ikeda, M. Ochiai, Y. Sawaragi, Sequential GMDH algorithm and its application to river flow prediction, IEEE Transactions on Systems, Man and Cybernetics (1976) 473-479.
[346]
E. Indermühle, V. Frinken, H. Bunke, Mode detection in online handwritten documents using BLSTM neural networks, in: Frontiers in handwriting recognition (ICFHR), 2012 international conference on, IEEE, 2012, pp. 302-307.
[347]
E. Indermühle, V. Frinken, A. Fischer, H. Bunke, Keyword spotting in online handwritten documents containing text and non-text using BLSTM neural networks, in: Document analysis and recognition (ICDAR), 2011 international conference on, IEEE, 2011, pp. 73-77.
[348]
G. Indiveri, B. Linares-Barranco, T.J. Hamilton, A. Van Schaik, R. Etienne-Cummings, T. Delbruck, Neuromorphic silicon neuron circuits, Frontiers in Neuroscience, 5 (2011).
[349]
A.G. Ivakhnenko, The group method of data handling-a rival of the method of stochastic approximation, Soviet Automatic Control, 13 (1968) 43-55.
[350]
A.G. Ivakhnenko, Polynomial theory of complex systems, IEEE Transactions on Systems, Man and Cybernetics (1971) 364-378.
[351]
A.G. Ivakhnenko, The review of problems solvable by algorithms of the group method of data handling (GMDH), Pattern Recognition and Image Analysis/Raspoznavaniye Obrazov I Analiz Izobrazhenii, 5 (1995) 527-535.
[352]
A.G. Ivakhnenko, V.G. Lapa, Cybernetic predicting devices, CCM Information Corporation, 1965.
[353]
A.G. Ivakhnenko, V.G. Lapa, R.N. McDonough, Cybernetics and forecasting techniques, American Elsevier, NY, 1967.
[354]
E.M. Izhikevich, Simple model of spiking neurons, IEEE Transactions on Neural Networks, 14 (2003) 1569-1572.
[355]
T. Jaakkola, S.P. Singh, M.I. Jordan, Reinforcement learning algorithm for partially observable Markov decision problems, in: Advances in neural information processing systems, vol. 7, MIT Press, 1995, pp. 345-352.
[356]
Jackel, L., Boser, B., Graf, H.-P., Denker, J., LeCun, Y., & Henderson, D., et al. (1990). VLSI implementation of electronic neural networks: an example in character recognition. In IEEE (Ed.), IEEE international conference on systems, man, and cybernetics (pp. 320-322).
[357]
C. Jacob, A. Lindenmayer, G. Rozenberg, Genetic L-system programming, in: Lecture notes in computer science, 1994.
[358]
R.A. Jacobs, Increased rates of convergence through learning rate adaptation, Neural Networks, 1 (1988) 295-307.
[359]
H. Jaeger, The "echo state" approach to analysing and training recurrent neural networks. Technical report GMD Report 148, German National Research Center for Information Technology, 2001.
[360]
H. Jaeger, Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication, Science, 304 (2004) 78-80.
[361]
V. Jain, S. Seung, Natural image denoising with convolutional networks, in: Advances in neural information processing systems (NIPS), vol. 21, Curran Associates, Inc, 2009, pp. 769-776.
[362]
J. Jameson, Delayed reinforcement learning with multiple time scale hierarchical backpropagated adaptive critics, in: Neural networks for control, 1991.
[363]
S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35 (2013) 221-231.
[364]
K. Jim, C.L. Giles, B.G. Horne, Effects of noise on convergence and generalization in recurrent networks, in: Advances in neural information processing systems (NIPS), vol. 7, Morgan Kaufmann, San Mateo, CA, 1995, pp. 649-656.
[365]
X. Jin, M. Lujan, L.A. Plana, S. Davies, S. Temple, S.B. Furber, Modeling spiking neural networks on SpiNNaker, Computing in Science and Engineering, 12 (2010) 91-97.
[366]
S.R. Jodogne, J.H. Piater, Closed-loop learning of visual control policies, Journal of Artificial Intelligence Research, 28 (2007) 349-391.
[367]
J.P. Jones, L.A. Palmer, An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex, Journal of Neurophysiology, 58 (1987) 1233-1258.
[368]
M.I. Jordan, Serial order: a parallel distributed processing approach. Technical report ICS report 8604, Institute for Cognitive Science, University of California, San Diego, 1986.
[369]
M.I. Jordan, Supervised learning and systems with excess degrees of freedom. Technical report COINS TR 88-27, University of Massachusetts, Amherst, 1988.
[370]
M.I. Jordan, Serial order: a parallel distributed processing approach, Advances in Psychology, 121 (1997) 471-495.
[371]
M.I. Jordan, D.E. Rumelhart, Supervised learning with a distal teacher. Technical report Occasional Paper #40, Center for Cog. Sci., Massachusetts Institute of Technology, 1990.
[372]
M.I. Jordan, T.J. Sejnowski, Graphical models: foundations of neural computation, MIT Press, 2001.
[373]
R.D. Joseph, Contributions to perceptron theory, Cornell Univ, 1961.
[374]
C.-F. Juang, A hybrid of genetic algorithm and particle swarm optimization for recurrent network design, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 34 (2004) 997-1006.
[375]
J.S. Judd, Neural network design and the complexity of learning, in: Neural network modeling and connectionism, MIT Press, 1990.
[376]
C. Jutten, J. Herault, Blind separation of sources, part I: an adaptive algorithm based on neuromimetic architecture, Signal Processing, 24 (1991) 1-10.
[377]
L.P. Kaelbling, M.L. Littman, A.R. Cassandra, Planning and acting in partially observable stochastic domains. Technical report, Brown University, Providence RI, 1995.
[378]
L.P. Kaelbling, M.L. Littman, A.W. Moore, Reinforcement learning: A survey, Journal of AI Research, 4 (1996) 237-285.
[379]
Kak, S., Chen, Y., & Wang, L. (2010). Data mining using surface and deep agents based on neural networks. In AMCIS 2010 proceedings.
[380]
Y. Kalinke, H. Lehmann, Computation in recurrent neural networks: from counters to iterated function systems, in: LNAI, Vol. 1502, Springer, Berlin, Heidelberg, 1998.
[381]
R.E. Kalman, A new approach to linear filtering and prediction problems, Journal of Basic Engineering, 82 (1960) 35-45.
[382]
J. Karhunen, J. Joutsensalo, Generalizations of principal component analysis, optimization problems, and neural networks, Neural Networks, 8 (1995) 549-562.
[383]
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014). Large-scale video classification with convolutional neural networks. In IEEE conference on computer vision and pattern recognition.
[384]
N.K. Kasabov, NeuCube: a spiking neural network architecture for mapping, learning and understanding of spatio-temporal brain data, Neural Networks (2014).
[385]
H.J. Kelley, Gradient theory of optimal flight paths, ARS Journal, 30 (1960) 947-954.
[386]
R. Kempter, W. Gerstner, J.L. Van Hemmen, Hebbian learning and spiking neurons, Physical Review E, 59 (1999) 4498.
[387]
P. Kerlirzin, F. Vallet, Robustness in multilayer perceptrons, Neural Computation, 5 (1993) 473-482.
[388]
Khan, S. H., Bennamoun, M., Sohel, F., & Togneri, R. (2014). Automatic feature learning for robust shadow detection. In IEEE conference on computer vision and pattern recognition.
[389]
Khan, M. M., Khan, G. M., & Miller, J. F. (2010). Evolution of neural networks using Cartesian Genetic Programming. In IEEE congress on evolutionary computation (pp. 1-8).
[390]
M.M. Khan, D.R. Lester, L.A. Plana, A. Rast, X. Jin, E. Painkras, SpiNNaker: mapping neural networks onto a massively-parallel chip multiprocessor, in: International joint conference on neural networks, IEEE, 2008, pp. 2849-2856.
[391]
Kimura, H., Miyazaki, K., & Kobayashi, S. (1997). Reinforcement learning in POMDPs with function approximation. In ICML, vol. 97 (pp. 152-160).
[392]
W.M. Kistler, W. Gerstner, J.L. van Hemmen, Reduction of the Hodgkin-Huxley equations to a single-variable threshold model, Neural Computation, 9 (1997) 1015-1045.
[393]
H. Kitano, Designing neural networks using genetic algorithms with graph generation system, Complex Systems, 4 (1990) 461-476.
[394]
S. Klampfl, W. Maass, Emergence of dynamic memory traces in cortical microcircuit models through STDP, The Journal of Neuroscience, 33 (2013) 11515-11529.
[395]
M. Klapper-Rybicka, N.N. Schraudolph, J. Schmidhuber, Unsupervised learning in LSTM recurrent neural networks, in: Lecture Notes on Comp. Sci., Vol. 2130, Springer, Berlin, Heidelberg, 2001, pp. 684-691.
[396]
E. Kobatake, K. Tanaka, Neuronal selectivities to complex object features in the ventral visual pathway of the macaque cerebral cortex, Journal of Neurophysiology, 71 (1994) 856-867.
[397]
N. Kohl, P. Stone, Policy gradient reinforcement learning for fast quadrupedal locomotion, in: Robotics and automation, 2004. Proceedings. ICRA'04. 2004 IEEE international conference on, vol. 3, IEEE, 2004, pp. 2619-2624.
[398]
T. Kohonen, Correlation matrix memories, IEEE Transactions on Computers, C-21 (1972) 353-359.
[399]
T. Kohonen, Self-organized formation of topologically correct feature maps, Biological Cybernetics, 43 (1982) 59-69.
[400]
T. Kohonen, Self-organization and associative memory, Springer, 1988.
[401]
P. Koikkalainen, E. Oja, Self-organizing hierarchical feature maps, in: International joint conference on neural networks, IEEE, 1990, pp. 279-284.
[402]
A.N. Kolmogorov, On the representation of continuous functions of several variables by superposition of continuous functions of one variable and addition, Doklady Akademii Nauk SSSR, 114 (1957) 679-681.
[403]
A.N. Kolmogorov, Three approaches to the quantitative definition of information, Problems of Information Transmission, 1 (1965) 1-11.
[404]
V.R. Kompella, M.D. Luciw, J. Schmidhuber, Incremental slow feature analysis: Adaptive low-complexity slow feature updating from high-dimensional input streams, Neural Computation, 24 (2012) 2994-3024.
[405]
T. Kondo, GMDH neural network algorithm using the heuristic self-organization method and its application to the pattern identification problem, in: Proceedings of the 37th SICE annual conference, IEEE, 1998, pp. 1143-1148.
[406]
T. Kondo, J. Ueno, Multi-layered GMDH-type neural network self-selecting optimum neural network architecture and its application to 3-dimensional medical image recognition of blood vessels, International Journal of Innovative Computing, Information and Control, 4 (2008) 175-187.
[407]
P. Kordík, P. Náplava, M. Snorek, M. Genyk-Berezovskyj, Modified GMDH method and models quality evaluation by visualization, Control Systems and Computers, 2 (2003) 68-75.
[408]
Korkin, M., de Garis, H., Gers, F., & Hemmi, H. (1997). CBM (CAM-Brain Machine)-a hardware tool which evolves a neural net module in a fraction of a second and runs a million neuron artificial brain in real time.
[409]
B. Kosko, Unsupervised learning in noise, IEEE Transactions on Neural Networks, 1 (1990) 44-57.
[410]
J. Koutník, G. Cuccu, J. Schmidhuber, F. Gomez, Evolving large-scale neural networks for vision-based reinforcement learning, in: Proceedings of the genetic and evolutionary computation conference, ACM, Amsterdam, 2013, pp. 1061-1068.
[411]
Koutník, J., Gomez, F., & Schmidhuber, J. (2010). Evolving neural networks in compressed weight space. In Proceedings of the 12th annual conference on genetic and evolutionary computation (pp. 619-626).
[412]
Koutník, J., Greff, K., Gomez, F., & Schmidhuber, J. (2014). A clockwork RNN. In Proceedings of the 31st international conference on machine learning, vol. 32 (pp. 1845-1853). arXiv:1402.3511 [cs.NE].
[413]
J.R. Koza, Genetic programming: on the programming of computers by means of natural selection, MIT Press, 1992.
[414]
M. Kramer, Nonlinear principal component analysis using autoassociative neural networks, AIChE Journal, 37 (1991) 233-243.
[415]
S.C. Kremer, J.F. Kolen, Field guide to dynamical recurrent networks, Wiley-IEEE Press, 2001.
[416]
N. Kriegeskorte, M. Mur, D.A. Ruff, R. Kiani, J. Bodurka, H. Esteky, Matching categorical object representations in inferior temporal cortex of man and monkey, Neuron, 60 (2008) 1126-1141.
[417]
A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in neural information processing systems (NIPS), vol. 25, 2012, pp. 1097-1105.
[418]
A. Krogh, J.A. Hertz, A simple weight decay can improve generalization, in: Advances in neural information processing systems, vol. 4, Morgan Kaufmann, 1992, pp. 950-957.
[419]
N. Krüger, P. Janssen, S. Kalkan, M. Lappe, A. Leonardis, J. Piater, Deep hierarchies in the primate visual cortex: what can we learn for computer vision?, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35 (2013) 1847-1871.
[420]
S. Kullback, R.A. Leibler, On information and sufficiency, The Annals of Mathematical Statistics, 22 (1951) 79-86.
[421]
R. Kurzweil, How to create a mind: the secret of human thought revealed, Viking, 2012.
[422]
M.G. Lagoudakis, R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, 4 (2003) 1107-1149.
[423]
J. Lampinen, E. Oja, Clustering properties of hierarchical self-organizing maps, Journal of Mathematical Imaging and Vision, 2 (1992) 261-272.
[424]
K. Lang, A. Waibel, G.E. Hinton, A time-delay neural network architecture for isolated word recognition, Neural Networks, 3 (1990) 23-43.
[425]
Lange, S., & Riedmiller, M. (2010). Deep auto-encoder neural networks in reinforcement learning. In Neural networks, The 2010 international joint conference on (pp. 1-8).
[426]
A. Lapedes, R. Farber, A self-optimizing, nonsymmetrical neural net for content addressable memory and pattern recognition, Physica D, 22 (1986) 247-259.
[427]
P. Laplace, Mémoire sur la probabilité des causes par les évènements, Mémoires de l'Académie Royale des Sciences Présentés par Divers Savans, 6 (1774) 621-656.
[428]
P. Larrañaga, J.A. Lozano, Estimation of distribution algorithms: a new tool for evolutionary computation, Kluwer Academic Publishers, Norwell, MA, USA, 2001.
[429]
Le, Q. V., Ranzato, M., Monga, R., Devin, M., Corrado, G., & Chen, K., et al. (2012). Building high-level features using large scale unsupervised learning. In Proc. ICML'12.
[430]
LeCun, Y. (1985). Une procédure d'apprentissage pour réseau à seuil asymétrique. In Proceedings of cognitiva 85 (pp. 599-604).
[431]
Y. LeCun, A theoretical framework for back-propagation, in: Proceedings of the 1988 connectionist models summer school, Morgan Kaufmann, CMU, Pittsburgh, Pa, 1988, pp. 21-28.
[432]
Y. LeCun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, Back-propagation applied to handwritten zip code recognition, Neural Computation, 1 (1989) 541-551.
[433]
Y. LeCun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, Handwritten digit recognition with a back-propagation network, in: Advances in neural information processing systems, vol. 2, Morgan Kaufmann, 1990, pp. 396-404.
[434]
Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, 86 (1998) 2278-2324.
[435]
Y. LeCun, J.S. Denker, S.A. Solla, Optimal brain damage, in: Advances in neural information processing systems, vol. 2, Morgan Kaufmann, 1990, pp. 598-605.
[436]
Y. LeCun, U. Muller, E. Cosatto, B. Flepp, Off-road obstacle avoidance through end-to-end learning, in: Advances in neural information processing systems (NIPS 2005), 2006.
[437]
Y. LeCun, P. Simard, B. Pearlmutter, Automatic learning rate maximization by on-line estimation of the Hessian's eigenvectors, in: Advances in neural information processing systems, vol. 5 (NIPS 1992), Morgan Kaufmann Publishers, San Mateo, CA, 1993.
[438]
L. Lee, Learning of context-free languages: a survey of the literature. Technical report TR-12-96, Center for Research in Computing Technology, Harvard University, Cambridge, Massachusetts, 1996.
[439]
H. Lee, A. Battle, R. Raina, A.Y. Ng, Efficient sparse coding algorithms, in: Advances in neural information processing systems (NIPS), vol. 19, 2007, pp. 801-808.
[440]
H. Lee, C. Ekanadham, A.Y. Ng, Sparse deep belief net model for visual area V2, in: Advances in neural information processing systems (NIPS), vol. 20, 2007, pp. 873-880.
[441]
Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th international conference on machine learning (pp. 609-616).
[442]
S. Lee, R.M. Kil, A Gaussian potential function network with hierarchically self-organizing learning, Neural Networks, 4 (1991) 207-224.
[443]
Lee, H., Pham, P. T., Largman, Y., & Ng, A. Y. (2009). Unsupervised feature learning for audio classification using convolutional deep belief networks. In Proc. NIPS, vol. 22 (pp. 1096-1104).
[444]
A.M. Legendre, Nouvelles méthodes pour la détermination des orbites des comètes, F. Didot, 1805.
[445]
R.A. Legenstein, W. Maass, Neural circuits for pattern recognition with small total wire length, Theoretical Computer Science, 287 (2002) 239-249.
[446]
R. Legenstein, N. Wilbert, L. Wiskott, Reinforcement learning on slow features of high-dimensional input streams, PLoS Computational Biology, 6 (2010).
[447]
Leibniz, G. W. (1676). Memoir using the chain rule (cited in TMME 7:2&3 p. 321-332, 2010).
[448]
G.W. Leibniz, Nova methodus pro maximis et minimis, itemque tangentibus, quae nec fractas, nec irrationales quantitates moratur, et singulare pro illis calculi genus, Acta Eruditorum (1684) 467-473.
[449]
D.B. Lenat, Theory formation by heuristic search, Artificial Intelligence, 21 (1983) 31-59.
[450]
D.B. Lenat, J.S. Brown, Why AM and EURISKO appear to work, Artificial Intelligence, 23 (1984) 269-294.
[451]
P. Lennie, J.A. Movshon, Coding of color and form in the geniculostriate visual pathway, Journal of the Optical Society of America A, 22 (2005) 2013-2033.
[452]
K. Levenberg, A method for the solution of certain problems in least squares, Quarterly of Applied Mathematics, 2 (1944) 164-168.
[453]
L.A. Levin, On the notion of a random sequence, Soviet Mathematics Doklady, 14 (1973) 1413-1416.
[454]
L.A. Levin, Universal sequential search problems, Problems of Information Transmission, 9 (1973) 265-266.
[455]
A.U. Levin, T.K. Leen, J.E. Moody, Fast pruning using principal components, in: Advances in neural information processing systems (NIPS), vol. 6, Morgan Kaufmann, 1994, pp. 35-42.
[456]
A.U. Levin, K.S. Narendra, Control of nonlinear dynamical systems using neural networks. II. Observability, identification, and control, IEEE Transactions on Neural Networks, 7 (1995) 30-42.
[457]
M.S. Lewicki, B.A. Olshausen, Inferring sparse, overcomplete image codes using an efficient coding framework, in: Advances in neural information processing systems (NIPS), vol. 10, 1998, pp. 815-821.
[458]
G.F.A. L'Hôpital, Analyse des infiniment petits, pour l'intelligence des lignes courbes, L'Imprimerie Royale, Paris, 1696.
[459]
M. Li, P.M.B. Vitányi, An introduction to Kolmogorov complexity and its applications, Springer, 1997.
[460]
R. Li, W. Zhang, H.-I. Suk, L. Wang, J. Li, D. Shen, Deep learning based imaging data completion for improved brain disease diagnosis, in: Proc. MICCAI, Springer, 2014.
[461]
L. Lin, Reinforcement learning for robots using neural networks, Carnegie Mellon University, Pittsburgh, 1993.
[462]
T. Lin, B. Horne, P. Tino, C. Giles, Learning long-term dependencies in NARX recurrent neural networks, IEEE Transactions on Neural Networks, 7 (1996) 1329-1338.
[463]
A. Lindenmayer, Mathematical models for cellular interaction in development, Journal of Theoretical Biology, 18 (1968) 280-315.
[464]
S. Lindstädt, Comparison of two unsupervised neural network models for redundancy reduction, in: Proc. of the 1993 connectionist models summer school, Erlbaum Associates, Hillsdale, NJ, 1993, pp. 308-315.
[465]
S. Linnainmaa, The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors, Univ. Helsinki, 1970.
[466]
S. Linnainmaa, Taylor expansion of the accumulated rounding error, BIT Numerical Mathematics, 16 (1976) 146-160.
[467]
R. Linsker, Self-organization in a perceptual network, IEEE Computer, 21 (1988) 105-117.
[468]
M.L. Littman, A.R. Cassandra, L.P. Kaelbling, Learning policies for partially observable environments: scaling up, in: Machine learning: proceedings of the twelfth international conference, Morgan Kaufmann Publishers, San Francisco, CA, 1995, pp. 362-370.
[469]
S.-C. Liu, J. Kramer, G. Indiveri, T. Delbrück, T. Burg, R. Douglas, Orientation-selective aVLSI spiking neurons, Neural Networks, 14 (2001) 629-643.
[470]
L. Ljung, System identification, Springer, 1998.
[471]
N.K. Logothetis, J. Pauls, T. Poggio, Shape representation in the inferior temporal cortex of monkeys, Current Biology, 5 (1995) 552-563.
[472]
D. Loiacono, L. Cardamone, P.L. Lanzi, Simulated car racing championship competition software manual. Technical report, Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy, 2011.
[473]
Loiacono, D., Lanzi, P. L., Togelius, J., Onieva, E., Pelta, D. A., & Butz, M. V., et al. (2009). The 2009 simulated car racing championship.
[474]
Lowe, D. (1999). Object recognition from local scale-invariant features. In The Proceedings of the seventh IEEE international conference on computer vision, vol. 2 (pp. 1150-1157).
[475]
D. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, 60 (2004) 91-110.
[476]
M. Luciw, V.R. Kompella, S. Kazerounian, J. Schmidhuber, An intrinsic value system for developing multiple invariant representations with incremental slowness learning, Frontiers in Neurorobotics, 7 (2013).
[477]
A. Lusci, G. Pollastri, P. Baldi, Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules, Journal of Chemical Information and Modeling, 53 (2013) 1563-1575.
[478]
Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities improve neural network acoustic models. In International conference on machine learning.
[479]
W. Maass, Lower bounds for the computational power of networks of spiking neurons, Neural Computation, 8 (1996) 1-40.
[480]
W. Maass, Networks of spiking neurons: the third generation of neural network models, Neural Networks, 10 (1997) 1659-1671.
[481]
W. Maass, On the computational power of winner-take-all, Neural Computation, 12 (2000) 2519-2535.
[482]
W. Maass, T. Natschläger, H. Markram, Real-time computing without stable states: A new framework for neural computation based on perturbations, Neural Computation, 14 (2002) 2531-2560.
[483]
D.J.C. MacKay, A practical Bayesian framework for backpropagation networks, Neural Computation, 4 (1992) 448-472.
[484]
D.J.C. MacKay, K.D. Miller, Analysis of Linsker's simulation of Hebbian rules, Neural Computation, 2 (1990) 173-187.
[485]
R. Maclin, J.W. Shavlik, Using knowledge-based neural networks to improve algorithms: Refining the Chou-Fasman algorithm for protein folding, Machine Learning, 11 (1993) 195-215.
[486]
Maclin, R., & Shavlik, J. W. (1995). Combining the predictions of multiple classifiers: Using competitive learning to initialize neural networks. In Proc. IJCAI (pp. 524-531).
[487]
H.R. Madala, A.G. Ivakhnenko, Inductive learning algorithms for complex systems modeling, CRC Press, Boca Raton, 1994.
[488]
O. Madani, S. Hanks, A. Condon, On the undecidability of probabilistic planning and related stochastic optimization problems, Artificial Intelligence, 147 (2003) 5-34.
[489]
Maei, H. R., & Sutton, R. S. (2010). GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces. In Proceedings of the third conference on artificial general intelligence, vol. 1 (pp. 91-96).
[490]
R. Maex, G. Orban, Model circuit of spiking neurons generating directional selectivity in simple cells, Journal of Neurophysiology, 75 (1996) 1515-1545.
[491]
S. Mahadevan, Average reward reinforcement learning: Foundations, algorithms, and empirical results, Machine Learning, 22 (1996) 159-195.
[492]
J. Malik, P. Perona, Preattentive texture discrimination with early vision mechanisms, Journal of the Optical Society of America A, 7 (1990) 923-932.
[493]
V. Maniezzo, Genetic evolution of the topology and weight distribution of neural networks, IEEE Transactions on Neural Networks, 5 (1994) 39-53.
[494]
P. Manolios, R. Fanelli, First-order recurrent neural networks and deterministic finite state automata, Neural Computation, 6 (1994) 1155-1173.
[495]
Marchi, E., Ferroni, G., Eyben, F., Gabrielli, L., Squartini, S., & Schuller, B. (2014). Multi-resolution linear prediction based features for audio onset detection with bidirectional LSTM neural networks. In Proc. 39th IEEE international conference on acoustics, speech, and signal processing (pp. 2183-2187).
[496]
H. Markram, The human brain project, Scientific American, 306 (2012) 50-55.
[497]
D.W. Marquardt, An algorithm for least-squares estimation of nonlinear parameters, Journal of the Society for Industrial & Applied Mathematics, 11 (1963) 431-441.
[498]
J. Martens, Deep learning via Hessian-free optimization, in: Proceedings of the 27th international conference on machine learning, OmniPress, Haifa, Israel, 2010, pp. 735-742.
[499]
Martens, J., & Sutskever, I. (2011). Learning recurrent neural networks with Hessian-free optimization. In Proceedings of the 28th international conference on machine learning (pp. 1033-1040).
[500]
T.M. Martinetz, H.J. Ritter, K.J. Schulten, Three-dimensional neural net for learning visuomotor coordination of a robot arm, IEEE Transactions on Neural Networks, 1 (1990) 131-136.
References 501 through 888 have been omitted.



Published In

Neural Networks  Volume 61, Issue C
January 2015
130 pages

Publisher

Elsevier Science Ltd.

United Kingdom

Publication History

Published: 01 January 2015

Author Tags

  1. Deep learning
  2. Evolutionary computation
  3. Reinforcement learning
  4. Supervised learning
  5. Unsupervised learning

Qualifiers

  • Research-article


Cited By

  • (2024) Computer-aided detection of prostate cancer in early stages using multi-parameter MRI. Technology and Health Care, 32:S1 (125-133). DOI: 10.3233/THC-248011. Online publication date: 31-May-2024.
  • (2024) Deep learning based decision tree ensembles for incomplete medical datasets. Technology and Health Care, 32:1 (75-87). DOI: 10.3233/THC-220514. Online publication date: 5-Jan-2024.
  • (2024) Hybrid optimized multimodal spatiotemporal feature fusion for vision-based sports activity recognition. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 46:1 (1481-1501). DOI: 10.3233/JIFS-233498. Online publication date: 1-Jan-2024.
  • (2024) Analytical Calculation of Weights Convolutional Neural Network. Optical Memory and Neural Networks, 33:2 (157-177). DOI: 10.3103/S1060992X24700061. Online publication date: 1-Jun-2024.
  • (2024) ASOD: an adaptive stream outlier detection method using online strategy. Journal of Cloud Computing: Advances, Systems and Applications, 13:1. DOI: 10.1186/s13677-024-00682-0. Online publication date: 5-Jul-2024.
  • (2024) Detection of cotton leaf curl disease's susceptibility scale level based on deep learning. Journal of Cloud Computing: Advances, Systems and Applications, 13:1. DOI: 10.1186/s13677-023-00582-9. Online publication date: 26-Feb-2024.
  • (2024) A modified LSTM network to predict the citation counts of papers. Journal of Information Science, 50:4 (894-909). DOI: 10.1177/01655515221111000. Online publication date: 1-Aug-2024.
  • (2024) Transformer Inrush Current and Internal Fault Discrimination Using Multitypes of Convolutional Neural Network Techniques. Journal of Electrical and Computer Engineering, 2024. DOI: 10.1155/2024/3986400. Online publication date: 1-Jan-2024.
  • (2024) A Comprehensive Analysis of Explainable AI for Malware Hunting. ACM Computing Surveys, 56:12 (1-40). DOI: 10.1145/3677374. Online publication date: 11-Jul-2024.
  • (2024) Synthesizing Particle-In-Cell Simulations through Learning and GPU Computing for Hybrid Particle Accelerator Beamlines. Proceedings of the Platform for Advanced Scientific Computing Conference (1-11). DOI: 10.1145/3659914.3659937. Online publication date: 3-Jun-2024.
