Deep Learning (DL)

What exactly is deep learning? When did it arrive? How is it different from machine learning?

Deep learning is a subfield of machine learning that focuses on the development and application of artificial neural networks, particularly deep neural networks. Deep learning algorithms are designed to automatically learn and extract hierarchical representations of data through multiple layers of interconnected nodes, known as artificial neurons or units. These neural networks are called "deep" because they typically consist of multiple hidden layers, allowing them to learn complex patterns and relationships in the data.
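The "multiple hidden layers" idea can be sketched as a tiny forward pass. This is a minimal NumPy illustration, not a trainable model; the layer sizes and random weights are arbitrary choices for demonstration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# A "deep" network: input -> two hidden layers -> output.
# Each weight matrix maps one layer's activations to the next.
layer_sizes = [4, 8, 8, 3]  # illustrative sizes, not from the text
weights = [rng.standard_normal((m, n)) * 0.1
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    """Forward pass: each hidden layer re-represents its input."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)              # hidden layers extract features
    return h @ weights[-1] + biases[-1]  # linear output layer

out = forward(rng.standard_normal(4))
print(out.shape)  # (3,)
```

In a real deep learning system the weights are not random: they are adjusted by gradient descent so that the stacked layers end up computing useful intermediate representations.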

Deep learning has its roots in the early development of artificial neural networks in the 1940s and 1950s. However, significant advancements and breakthroughs in deep learning occurred in the late 2000s and early 2010s when computing power and the availability of large-scale labeled datasets increased.

One of the key differences between deep learning and traditional machine learning is the level of abstraction and feature engineering required. In traditional machine learning, human experts often need to manually engineer or select relevant features from the data, which can be a time-consuming and challenging task. Deep learning, on the other hand, aims to automate this feature engineering process by allowing the model to automatically learn useful representations and features directly from the raw data.

Deep learning models, such as deep neural networks, are characterized by their ability to automatically learn hierarchical representations of data, extracting higher-level features from lower-level ones. This capability enables deep learning models to handle complex patterns and large amounts of data more effectively than traditional machine learning models in certain domains, such as computer vision, natural language processing, and speech recognition.

Another key aspect of deep learning is its reliance on large-scale datasets and computational resources. Deep learning models often require substantial amounts of labeled training data to effectively learn complex patterns. Additionally, training deep neural networks can be computationally intensive, benefiting from specialized hardware, such as graphics processing units (GPUs) or tensor processing units (TPUs), to accelerate the computations involved.

While deep learning has achieved remarkable success in various domains, it is important to note that it is not always the best approach for every problem. Traditional machine learning techniques can still be effective and appropriate for certain tasks, especially when data is limited, interpretability is crucial, or feature engineering expertise is readily available. Deep learning and traditional machine learning are complementary approaches that can be applied in combination or individually, depending on the specific problem and available resources.


Around 2006, we started doing what we call 'Deep Learning'. The rise in popularity of deep learning techniques can be traced back to innovations in machine learning, which focuses on using data and algorithms to let machines "learn" the way humans do, gradually improving their accuracy. Feed-forward neural networks emerged, and a major breakthrough was the introduction of Convolutional Neural Networks (CNNs).
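The core operation of a CNN can be sketched in a few lines: a small filter slides across the image and responds wherever its pattern appears. A minimal NumPy illustration follows; the edge-detecting kernel is hand-crafted here as a stand-in, whereas in a real CNN the filter values are learned from data:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation, the core operation of a CNN layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector (hand-crafted stand-in for a learned filter)
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)
image = np.zeros((5, 5))
image[:, 2:] = 1.0  # left half dark, right half bright
print(conv2d(image, edge_kernel))  # strong response near the edge
```

Stacking many such filter layers, with the filters learned rather than hand-designed, is what lets a deep CNN build up from edges to textures to whole objects.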

Before CNNs, it had been hard to get neural nets with many layers of representation to learn complicated things. Gradually the deep learning models got fancier and deeper, from the eight layers of AlexNet to the ResNet-50 convolutional neural network, which is 50 layers deep. With ResNet-50 you could load a 'pretrained' version of the network, trained on more than a million images from the ImageNet database.

By 2009, we had already produced something that was better than the best speech recognizers at identifying which phoneme you were saying. This was a different technology from all the other speech recognizers, which used a standard approach that had been tuned for 20 or 30 years. There were other techniques using neural nets, but they were not using deep neural nets. And then we found better ways of initializing the networks with what we now call 'pre-training'. The 'G' in ChatGPT stands for 'Generative', the 'P' for 'Pre-trained', and the 'T' for 'Transformer'. And it was actually these generative models that provided this better way of pre-training neural nets.

In 2012, two big things happened. One was that research done in 2009 by two of Geoffrey Hinton's students led to better speech recognition, and it got disseminated to all the big speech recognition labs at Microsoft, IBM and Google. In 2012, Google was the first to turn it into a product, and suddenly speech recognition on Android was as good as Siri, if not better. That was a deployment of the deep neural nets that had been applied to speech recognition three years earlier. Within a few months of that happening, two other students of mine developed an object recognition system that would look at images and tell you what the object was, and it worked much better than previous systems.

Most modern deep learning models are based on Artificial Neural Networks (ANNs), specifically Convolutional Neural Networks (CNNs), although they can also include propositional formulas or latent variables organized layer-wise in deep generative models.
