The success of Deep Learning is evident from its numerous wins in pattern recognition and machine learning competitions over the past few years. In this review blog we explore Deep Learning from its humble beginnings to its recent achievements, and conclude by highlighting future directions and applications. People from a range of fields, including science, medicine, engineering, and statistics, have contributed to bringing Deep Learning to its current state of the art. It is impossible to cover all of these contributions and research works in a single blog, so we highlight some of the major events and contributions that had a long-term impact on the direction of Deep Learning. We also introduce some of the popular software tools available for implementing Deep Learning and some real-world applications.
Deep Learning is an area of machine learning that strives to make machines capable of artificial intelligence. It does this by using multiple layers of abstraction and representation to make sense of the original data [1]. A fundamental motivation for Artificial Intelligence has always been the ability to mimic the human brain using machines. Our brains are fed a vast amount of sensory data, yet we are somehow able to capture the critical information in this data and store it in a form suitable for current and future use. Achieving the same capability in machines requires processing high-dimensional data. Unfortunately, for a linear increase in dimension, the computational complexity increases exponentially, a problem often referred to as the ‘Curse of Dimensionality’ [2].
Existing machine learning techniques address this issue by pre-processing the data to reduce its dimensionality, using techniques such as feature extraction, layered abstraction, statistical regression, and structure breakdown.
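To make the exponential growth concrete, here is a minimal sketch. It assumes we cover the input space with a regular grid at a fixed resolution per axis; the function name `grid_cells` and the choice of 10 bins per axis are illustrative only.

```python
def grid_cells(bins_per_axis: int, dimensions: int) -> int:
    """Number of cells needed to cover a space at a fixed resolution per axis."""
    # Each added dimension multiplies the cell count by bins_per_axis,
    # so a linear increase in dimensions gives exponential growth in cells.
    return bins_per_axis ** dimensions


if __name__ == "__main__":
    for d in (1, 2, 3, 10):
        print(f"{d} dimension(s): {grid_cells(10, d)} cells")
```

Going from 3 to 10 dimensions at the same resolution raises the count from a thousand cells to ten billion, which is why reducing the dimension before learning matters so much.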
Historical Inspirations (1940-1980)
The hierarchical structure of Deep Learning draws its inspiration primarily from Artificial Neural Networks (ANNs) and the backpropagation (BP) algorithm. The basic computational model for ANNs was developed in 1943 by McCulloch and Pitts, who used mathematical logic to approximately represent the neurons of the human brain [3]. This was followed by Bellman's development of Dynamic Programming in 1957 [2], which divides the main problem into several subproblems in order to evaluate the solution efficiently. This concept formed the basis for the backpropagation algorithm, which was proposed by Werbos in 1974 [4]. In backpropagation, the errors are propagated through the network from the outputs back to the inputs, which guides the adjustment of the weights towards the desired solution and thereby enables learning.
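The error propagation described above can be sketched on a deliberately tiny network. This is an illustrative toy, not the formulation from the cited works: it assumes one input, one sigmoid hidden unit, one linear output, and a squared-error loss, and all function names are made up for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1, w2, x):
    """Tiny network: one input, one sigmoid hidden unit, one linear output."""
    h = sigmoid(w1 * x)  # hidden activation
    y = w2 * h           # network output
    return h, y


def loss(w1, w2, x, target):
    """Squared error between the network output and the target."""
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2


def backward(w1, w2, x, target):
    """Propagate the output error back towards the input using the chain rule."""
    h, y = forward(w1, w2, x)
    d_y = y - target            # error at the output
    grad_w2 = d_y * h           # gradient for the output weight
    d_h = d_y * w2              # error passed back to the hidden unit
    d_z = d_h * h * (1.0 - h)   # error passed back through the sigmoid
    grad_w1 = d_z * x           # gradient for the input weight
    return grad_w1, grad_w2


def train(w1, w2, x, target, lr=0.5, steps=200):
    """Plain gradient descent: move each weight against its gradient."""
    for _ in range(steps):
        g1, g2 = backward(w1, w2, x, target)
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2
```

Calling `train(0.3, -0.2, 1.0, 0.8)` returns adjusted weights whose loss is lower than that of the starting weights; the same backward pass generalizes layer by layer to deeper networks.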
Initial Implementations (1980-2000)
While Deep Learning is often considered a relatively new concept, its first major breakthrough came in 1989, when Yann LeCun et al. applied the standard backpropagation algorithm to a deep neural network to recognize handwritten ZIP codes [6]. This was soon followed by the wake-sleep algorithm (1995) for neural networks with six hidden layers and several hundred hidden units [7]. Essentially, the algorithm alternates between two phases: in the wake phase, the recognition weights drive the network bottom-up while the generative weights are adjusted to better reconstruct the input; in the sleep phase, the generative weights drive the network top-down while the recognition weights are adjusted to better recover the representations that produced the generated data.
Failures and Setbacks (1980-2000)
While the early research on Deep Learning seemed very promising, several challenges highlighted in the early 1990s slowed research in the field. In 1991, Sepp Hochreiter described the problem of exponential error decay in Recurrent Neural Networks (RNNs), now commonly known as the vanishing gradient problem [8]. Another major issue, essentially a consequence of the exponential error decay, was the extremely slow learning of most ANN algorithms that use the steepest (gradient) descent method. Gradient descent methods were also constrained to small values of the learning rate.
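The exponential decay of the error signal can be illustrated with a short sketch. It assumes a hypothetical one-unit recurrent network, h_t = sigmoid(w * h_{t-1}), chosen only because its backward pass is a product of scalars; the function name and default values are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def backpropagated_error(weight, steps, h0=0.5):
    """Scale of an error signal after travelling back 1..steps time steps.

    Assumes a one-unit recurrent network h_t = sigmoid(weight * h_{t-1}).
    """
    h = h0
    local_derivatives = []
    for _ in range(steps):
        h = sigmoid(weight * h)
        # dh_t / dh_{t-1} = weight * sigmoid'(.) and sigmoid' is at most 0.25
        local_derivatives.append(weight * h * (1.0 - h))

    factor = 1.0
    factors = []
    # Walk backwards in time: each step multiplies by one local derivative.
    for d in reversed(local_derivatives):
        factor *= d
        factors.append(abs(factor))
    return factors


if __name__ == "__main__":
    for k, f in enumerate(backpropagated_error(1.0, 20), start=1):
        print(f"{k:2d} steps back: error scaled by {f:.3e}")
```

With a weight of 1.0 every local derivative is below 0.25, so the error shrinks by more than a factor of four per time step and early time steps receive almost no learning signal.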
Another issue with ANNs was their huge computational cost, which exceeded the capabilities of the hardware available at the time. As a result, fast computation on large matrices was not possible, making ANNs slower than other methods of pattern recognition and object identification.
Several other methods also had an edge over ANNs for classification and identification tasks, such as Gabor filters, support vector machines (SVMs), and Gaussian mixture model/hidden Markov model (GMM-HMM) systems, and these attracted more attention from the research community at the time.
First Breakthrough (2000s)
In 2007, Geoffrey Hinton used Restricted Boltzmann Machines (RBMs) to improve the learning capabilities of ANNs. It is extremely difficult to classify raw sensory input, such as an image or speech, directly. It is therefore useful to first generate a high-level representation of the raw sensory data in an abstraction layer and then use this representation for classification. Doing so avoids the need for a large amount of labeled data, which is otherwise required for simple feed-forward backpropagation neural networks [9]. This technique of top-down generation made ANNs far more efficient, as the dimensionality of the problem was reduced to that of the high-level abstract data. The approach can be extended by using multiple hidden layers of abstraction, built by connecting several RBMs in a cascaded layout. The capabilities and potential applications of deep networks highlighted in this paper rekindled interest in Deep Learning among research communities all over the world.
Second Breakthrough (2010s)
As Deep Learning gained popularity, several new models were developed, especially in the fields of image and speech recognition. Major tech companies saw the advantages of Deep Learning and started investing in its applications. Rapid innovation in hardware also contributed to faster computation for Deep Learning problems. One of the early attempts was to develop special hardware dedicated exclusively to neural network problems [10]. However, researchers later realized that GPUs, which were already well developed and available at much lower prices, were ideally suited to deep learning tasks because they are designed for large matrix operations. This was followed by the victory of Deep Learning based AI systems in various contests and benchmarks including ImageNet, CIFAR-10, and Kaggle, amongst others [11].
Deep Learning has already had a vast impact on various computational intelligence challenges and contests in the last few years. These successes have led a number of major tech companies to invest heavily in Deep Learning, and a number of startups have been set up to commercialize its capabilities. Below, we highlight some of the major organizations that are finding solutions to real-world problems and challenges using Deep Learning, followed by a list of areas and fields involved in active research and projects using Deep Learning.
Google Brain started as a collaboration between Google Fellow Jeff Dean, Google Researcher Greg Corrado, and Stanford University professor Andrew Ng, with the goal of building a large-scale deep learning software system on top of Google's cloud infrastructure. The project is currently being used in the speech recognition system for the Android operating system, photo search for Google+, and YouTube video recommendations, amongst others.
Facebook Deepface (FAIR)
Among other things, Facebook’s AI technology uses Deep Learning for face recognition and identification, object recognition, and scene detection to enhance the social media experience.
Speech Recognition – Siri, Cortana, Baidu, Google, Alexa
All the major tech companies that use speech recognition in their technology use some form of deep learning to achieve high-quality speech recognition and in some cases, synthesis.
Google DeepMind, which recently became popular in the news when its AlphaGo program beat the European champion in Go, uses Deep Learning extensively in a number of its projects. Another popular application has been finding solutions to riddles using Deep Learning.
Butterfly is one of the newest startups in the field of Deep Learning. It hopes to use Deep Learning in medicine to develop new imaging devices with AI capabilities at a fraction of the cost of today’s machines.
| Area of Application | Industry (Non-Exhaustive) |
|---|---|
| Face Recognition, Machine Vision, Image Search | Automotive, Security, Search Engine, Social Media |
| Voice Recognition, Voice Search, Flaw Detection | UI/UX, Security, Automotive, Telecom, Aviation |
| Fraud Detection, Sentiment Analysis, Data Mining, Data Science | Finance, Security, CRM, Social Media, IoT, IT, Software |
| Real-time Threat Detection, Motion Detection | Social Media, Government, Security |
Deep machine learning is an active area of research. The field has seen exponential growth in the last decade and is expected to grow even faster as more data sets and computational power become available. The research community involved in deep learning has been growing at an equally fast rate, and so has the number of papers published on the topic. Future work in Deep Learning may be broadly classified into three areas:
- Optimization and improvements of current Algorithms
- General Purpose Learning Algorithms
- New frontiers of Research and Development
The journey towards deep learning began more than half a century ago with the development of Artificial Neural Networks, followed by the Perceptron and the backpropagation algorithm later in the 20th century. Within the last decade, Deep Learning has seen tremendous growth and holds a lot of promise for the future. While it is still in the early stages of development and many challenges remain to be addressed, Deep Learning certainly seems to be a very promising candidate to take machine learning to the next level of Artificial Intelligence. When this happens, it will have a great impact on our ways of life and our future.
To be continued …
- 1. Deng L. Deep Learning: Methods and Applications. FNT in Signal Processing. 2014:197-387. doi:10.1561/2000000039
- 2. Bellman R. Dynamic Programming. Science. July 1966:34-37. doi:10.1126/science.153.3731.34
- 3. McCulloch WS, Pitts W. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics. December 1943:115-133. doi:10.1007/bf02478259
- 4. Werbos P. “Beyond Regression:” New Tools for Prediction and Analysis in the Behavioral Sciences. 1974.
- 5. What’s New In Gartner’s Hype Cycle For AI, 2019. Forbes. https://www.forbes.com/sites/louiscolumbus/2019/09/25/whats-new-in-gartners-hype-cycle-for-ai-2019/#1cd42481547b. Published September 25, 2019.
- 6. LeCun Y, Boser B, Denker JS, et al. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Computation. December 1989:541-551. doi:10.1162/neco.1989.1.4.541
- 7. Hinton G, Dayan P, Frey B, Neal R. The “wake-sleep” algorithm for unsupervised neural networks. Science. May 1995:1158-1161. doi:10.1126/science.7761831
- 8. Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies. In: A Field Guide to Dynamical Recurrent Networks. IEEE; 2009. doi:10.1109/9780470544037.ch14
- 9. Hinton GE. Learning multiple layers of representation. Trends in Cognitive Sciences. October 2007:428-434. doi:10.1016/j.tics.2007.09.004
- 10. Jackel LD, Boser B, Graf HP, et al. VLSI implementations of electronic neural networks: an example in character recognition. In: 1990 IEEE International Conference on Systems, Man, and Cybernetics Conference Proceedings. IEEE. doi:10.1109/icsmc.1990.142119
- 11. Deeplearning4j: Open-source, distributed deep learning for the JVM. Deep Learning’s Accuracy. http://deeplearning4j.org/accuracy.html. Accessed March 2, 2016.