Artificial Neural Networks (ANNs)


A neural network is a method in artificial intelligence that teaches computers to process data in a way that is inspired by the human brain. It is a type of machine learning process, called ‘Deep Learning’, that uses interconnected nodes or neurons in a layered structure that resembles the human brain. It creates an adaptive system that computers use to learn from their mistakes and improve continuously. Thus, artificial neural networks attempt to solve complicated problems, like summarizing documents or recognizing faces, with greater accuracy.

A neural network is a series of algorithms that endeavors to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates. In this sense, neural networks refer to systems of neurons, either organic or artificial in nature.

AI - the term Artificial Intelligence is not new, and there is nothing ‘artificial’ about its impact: that impact is real. It is already felt by people, and it is already woven into our society.

The convergence of three major factors led to the AI revolution:

1. The ‘Convolutional Neural Network’ and the maturing of algorithms

2. Massive scaling of computation

3. The availability of massive amounts of data

AI is just exploding.

‘Vision’ started in animals about 540 million years ago, making it one of the oldest perceptual systems in the animal kingdom. Modern zoology has hypothesized that it was vision that triggered the Cambrian explosion of animal speciation around that time. Generation after generation of subsequent evolution eventually delivered what we know today as the most incredible intelligence system in the known universe: the human intelligence system, built on a biological neural network.

Vision is very much a cornerstone of biological intelligence, though it took humans a long time to realize and prove this. We have always ‘observed’ objects and ‘communicated’ about them through language and images.

In the mid-1980s, the cognitive psychologist George Miller at Princeton set out to organize the English language - all the lexicons of the English-speaking world - into a database called ‘WordNet’. WordNet led to a lot of important work in Natural Language Processing (NLP), as it provided a data set for training models. Inspired by WordNet, in 2009 Fei-Fei Li, a Stanford AI researcher and scientist, created the ImageNet data set, which consisted of 15 million images distilled from over 1 billion images downloaded from the internet and organized into the WordNet structure of 22,000 object classes. With ImageNet and WordNet combined, you can find pictures for each WordNet noun entry. For example, a ‘panda bear’ entry is attached to pictures of panda bears.

These object classes can themselves be further organized by level of abstraction. The cognitive psychologist Eleanor Rosch at Berkeley had shown in the 1970s that objects in the world are organized into superordinate, basic, and subordinate categories. (For example, depending on our knowledge and the situation, we might label the food in front of us as a ‘fruit’ (superordinate level), an ‘apple’ (basic level), or a ‘golden delicious’ (subordinate level).)

Arrival of Deep Learning

Deep Learning is a machine learning method based on artificial neural networks with ‘representation learning’. The learning can be supervised, semi-supervised, or unsupervised. Deep learning is a technique that teaches computers to do what comes naturally to humans: learn by example. It learns and improves on its own by repeatedly examining data. While simpler machine learning relies on simpler concepts, deep learning works with artificial neural networks, which are designed to imitate how humans think and learn. Deep learning structures algorithms in layers to create an ‘artificial neural network’ that can learn and make intelligent decisions on its own.

The word "Deep" in "Deep Learning" refers to the number of layers through which the data is transformed. More precisely, deep learning systems have a substantial Credit Assignment Path (CAP) depth. The CAP is the chain of transformations from input to output. CAPs describe potentially causal connections between input and output.

Deep learning algorithms use multiple layers to progressively extract higher-level features from the raw input. For example, in image processing, lower layers may identify edges, while higher layers may identify concepts relevant to a human, such as digits, letters, or faces.

In Deep Learning, each level learns to transform its input data into a slightly more abstract and composite representation. In an image recognition application, the raw input may be a matrix of pixels; the first representational layer may abstract the pixels and encode edges; the second layer may compose and encode arrangements of edges; the third layer may encode a nose and eyes; and the fourth layer may recognize that the image contains a face. Importantly, a deep learning process can learn which features to optimally place in which level on its own. This does not eliminate the need for hand-tuning; for example, varying numbers of layers and layer sizes can provide different degrees of abstraction.
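This layer-by-layer abstraction can be sketched as a toy forward pass. The sketch below is purely illustrative - the layer sizes are made up and the weights are random rather than trained - but it shows the structure: each layer turns its input into a smaller, more abstract representation.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Hypothetical 3-layer stack: each layer maps its input to a smaller,
# more abstract representation (pixels -> edges -> parts -> object score).
# The weights are random here; in a real system they would be learned.
w1 = rng.normal(size=(64, 32))   # "pixels -> edges"
w2 = rng.normal(size=(32, 16))   # "edges -> parts"
w3 = rng.normal(size=(16, 1))    # "parts -> face score"

pixels = rng.normal(size=(64,))  # stand-in for a flattened 8x8 image

edges = relu(pixels @ w1)        # first representational layer
parts = relu(edges @ w2)         # second layer composes edge patterns
score = parts @ w3               # final "is this a face?" activation
```

With random weights the output is meaningless; the point is only the shape of the computation, where each matrix multiply plus nonlinearity is one level of abstraction.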

Feature Detectors in Image Recognition

At the first level of features you might make ‘feature detectors’ that take and examine little combinations of pixels. You might make a feature detector that, for example, fires when the pixels on one side of a small patch are dark and the pixels on the other side are bright. That feature detector would represent an edge - a vertical edge. You might have another one that fires when the pixels above are dark and the pixels below are bright; that detector would represent a horizontal edge. You can have others for the edges of different body parts (of a mammal, for example). There have to be a whole lot of feature detectors like that, and that is what you actually find in a mammalian brain: if you look in a cat or monkey cortex, it has feature detectors like that.
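The vertical- and horizontal-edge detectors described above can be sketched as tiny convolution kernels. This is an illustrative example with hand-picked kernel values, not learned weights:

```python
import numpy as np

# A vertical-edge detector: it fires where pixels to the left are dark
# and pixels to the right are bright. Kernel values are illustrative
# choices, not learned weights.
vertical_kernel = np.array([[-1.0, 1.0]])   # dark-left, bright-right
horizontal_kernel = vertical_kernel.T       # dark-above, bright-below

# Tiny image: left half dark (0), right half bright (1) -> a vertical edge.
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

def detect(image, kernel):
    """Slide the kernel over every position and return the max response."""
    kh, kw = kernel.shape
    h, w = image.shape
    responses = [
        (image[i:i + kh, j:j + kw] * kernel).sum()
        for i in range(h - kh + 1)
        for j in range(w - kw + 1)
    ]
    return max(responses)

v = detect(image, vertical_kernel)    # strong response at the dark->bright boundary
h = detect(image, horizontal_kernel)  # no response: there is no horizontal edge
```

On this image the vertical detector responds with 1.0 at the boundary column while the horizontal detector responds with 0.0 everywhere, which is exactly the division of labor the text describes.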

At the next level you would say: okay, suppose I have two edge detectors that join at a fine angle - that could be an ear. So the next level up will have a feature detector that fires when two of the lower-level detectors join at a fine angle. We might also notice a bunch of edges that roughly form a circle, and have a detector for that. Then the next level up might have a detector that says: hey, I found this ear-like thing, and I found a circular thing, in roughly the right spatial relationship to be the eye and the large ear of an elephant. And so at the next level up you'd have an elephant detector that says: if I see those two there, I think it might be an elephant.

The idea of ‘backpropagation’ is to start with random weights. The network then makes a prediction with those weights. If it happens to predict ‘elephant’ (the right answer), you leave the weights alone. But if it predicts ‘cat’, then you go backwards through the network and ask the following question - you answer it with a branch of mathematics called calculus, but you only need to understand the question itself: “how should I change this connection strength so the network is less likely to say ‘cat’ and more likely to say ‘elephant’?” The difference between what it said and what it should have said is called the error, or the discrepancy. And you figure out, for every connection strength, how to change it a little bit to make the network more likely to say ‘elephant’ and less likely to say ‘cat’.

So a person looked at the image and said it’s an ‘elephant’, not a ‘cat’ - that is a label supplied by a person. Backpropagation is then just a way of figuring out how to change every connection strength to make the network more likely to say ‘elephant’ and less likely to say ‘cat’.

It just keeps trying and keeps tuning - doing that over and over - and if you show it enough elephants and enough cats, then when you show it an elephant it will say ‘elephant’.
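The loop described above - predict, measure the discrepancy, nudge every connection strength, repeat - can be sketched with the simplest possible ‘network’: a single neuron trained by gradient descent. The two ‘images’, the labels, and the learning rate below are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two made-up 4-pixel "images": one labelled elephant (1), one cat (0).
x = np.array([[1.0, 0.0, 1.0, 0.0],   # "elephant"
              [0.0, 1.0, 0.0, 1.0]])  # "cat"
y = np.array([1.0, 0.0])

w = rng.normal(size=4)                 # start from random weights
b = 0.0

for _ in range(500):
    p = sigmoid(x @ w + b)             # current predictions
    err = p - y                        # the discrepancy (error)
    # For this one-neuron net, backpropagation reduces to this gradient:
    # it says how to nudge each connection strength to shrink the error.
    w -= 0.5 * (x.T @ err) / len(y)
    b -= 0.5 * err.mean()

p = sigmoid(x @ w + b)
# After enough rounds of tuning, the first image scores near 1
# ("elephant") and the second near 0 ("cat").
```

A real image classifier has millions of connections across many layers, but the principle is the same: the error at the output tells every weight which direction to move.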

Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs)

Before 2017, there were Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). RNNs were great at processing sequences of data, like sentences or time series. They had a "memory" that allowed them to consider the previous inputs when predicting the next one. However, RNNs faced challenges with capturing long-range dependencies. They tended to forget information from earlier parts of the sequence, making it harder to understand long sentences or pieces of text.
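The ‘memory’ of an RNN can be sketched as a hidden state that is updated at every time step. Below is a minimal vanilla RNN cell with random, untrained weights - the dimensions are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal vanilla RNN cell: the hidden state h carries a summary of
# everything seen so far - the "memory" described above.
dim_in, dim_h = 3, 4
W_x = rng.normal(scale=0.5, size=(dim_h, dim_in))  # input -> hidden
W_h = rng.normal(scale=0.5, size=(dim_h, dim_h))   # hidden -> hidden
b = np.zeros(dim_h)

def rnn_step(h, x):
    """One time step: fold the new input x into the running state h."""
    return np.tanh(W_h @ h + W_x @ x + b)

sequence = rng.normal(size=(5, dim_in))  # five time steps of input
h = np.zeros(dim_h)
for x in sequence:
    h = rnn_step(h, x)  # the same weights are reused at every step
```

The long-range-dependency problem mentioned above falls out of this structure: each step squashes the state through `tanh`, so the influence of early inputs fades as the sequence grows.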

Convolutional Neural Networks (CNNs) were particularly successful in computer vision tasks, like image recognition. They used filters to scan and extract features from images, enabling them to identify objects or patterns. However, CNNs were not as effective in handling sequential data, where the order of the information mattered.

These previous techniques brought us significant progress, but they had their limitations when it came to understanding language and handling sequential data effectively. The need for a more efficient and powerful architecture led to the birth of the Transformer.

The Transformer has unquestionably transformed the field of AI, revolutionizing the way machines understand and process language. At the heart of the Transformer lies a powerful concept called self-attention. This mechanism allows the model to pay attention to different parts of the input sequence simultaneously.
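Self-attention can be sketched in a few lines of NumPy. This is a toy scaled dot-product attention over a short sequence, with random matrices standing in for the learned projections a real Transformer would train:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Toy sequence: 4 tokens, each an 8-dimensional embedding. The projection
# matrices are random stand-ins for learned parameters.
seq_len, d = 4, 8
X = rng.normal(size=(seq_len, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores = Q @ K.T / np.sqrt(d)      # every token scores every other token
weights = softmax(scores, axis=-1) # each row is one token's attention
output = weights @ V               # weighted mix of all value vectors
```

Because the score matrix relates every position to every other position in one matrix product, the model attends to the whole sequence simultaneously - there is no step-by-step state to forget, which is what RNNs struggled with.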

As we continue to unlock the full potential of AI architectures, including CNNs, RNNs, and Transformers, we can expect further breakthroughs that redefine the possibilities of AI and reshape industries worldwide.
