🔲AI TECHNOLOGY
What is Artificial Intelligence?
Summary:
Artificial Intelligence (AI) can be understood by drawing parallels with the evolution of intelligence in mammals, especially humans. A pivotal moment in the history of evolution was the Cambrian explosion approximately 540 million years ago, in which vision played a central role in driving speciation. Today, vision stands as a testament to the power of evolution, manifesting in humans as part of the most intricate intelligence system known to us.
This biological perspective sets the stage for AI. Just as vision is fundamental to our biological intelligence, various components come together to constitute what we refer to as AI. These components work in tandem to potentially emulate the vast range of human functions:
Speech Recognition - Enables machines to understand and transcribe spoken language.
Natural Language Processing (NLP) - Helps computers process and understand human language.
Computer Vision - Equips machines with the ability to interpret visual information.
Robotics - Empowers machines to physically interact with their surroundings.
Pattern Recognition - Allows machines to identify patterns in data.
Machine Learning - Provides machines the capacity to learn from data.
Human Neural Networks – The neurons and the synaptic connections they form in our brains.
Artificial Neural Networks (ANNs) - Simulate the structure of human brain neurons to process data.
Arrival of the Transformer Architecture - An innovative structure in neural networks that has redefined the landscape of natural language processing, with models like BERT and GPT at its forefront. By processing sequences in parallel and utilizing a unique attention mechanism, the Transformer stands as a significant stride towards artificial general intelligence, especially when combined with vast computational resources and data. In essence, the fusion of these mature AI architectures, computational power, and data brings us closer than ever to achieving the zenith of AI potential.
Artificial intelligence as it exists today is not even close to matching the actual functional abilities of our brains. However, the inspiration drawn from the structure and function of the human brain is leading to tremendous progress in developing artificial neural networks. Perhaps within a few decades, if not sooner, ANNs may actually mimic the brain to a degree where machines will acquire human-like intelligence.
Artificial Intelligence Under the Hood
Let's start our journey toward discovering what artificial intelligence looks like under the hood and, hopefully, demystify this marvelous human innovation, one that has the potential to bring prosperity to all of humanity and, if misused, to threaten it.
Perhaps the simplest way to understand Artificial Intelligence (AI) is to draw parallels with the evolution of intelligence in mammals, especially humans.
Modern zoology hypothesizes that 'Vision' led to the Cambrian explosion of animal speciation around 540 million years ago. Vision is among the earliest perceptual systems in the animal kingdom.
Generations of evolution, driven in part by vision, have culminated in what we today recognize as the most extraordinary intelligence system known to us: human intelligence.
Vision is fundamental to biological intelligence. Through it, humans have observed and conveyed objects using language and images.
In this paper, we'll break down AI into its various components. Working together, these components may soon enable AI to emulate almost all human functions, both intelligently and autonomously. The future is approaching rapidly, so let's get started; we'll explain each component so that you have a solid foundation.
1. Speech Recognition
Humans communicate through spoken language. The field that gives machines this ability is known as 'Speech Recognition'. Language is pivotal for expressing thoughts, ideas, emotions, and information.
Speaking and listening are innate skills cultivated from early childhood. Over millennia, our languages have become intricate, capturing abstract concepts and nuanced expressions, facilitating effective communication.
'Speech Recognition' aims to develop methods enabling computers to recognize and understand spoken language. Essentially, it's about training machines to hear and comprehend spoken words in the same way humans do. If you've used voice-activated virtual assistants like Siri, Google Assistant, or Alexa, or AI tools such as IBM's Watson Speech to Text, you've interacted with speech recognition.
Interestingly, much of speech recognition is rooted in statistics, hence it's often termed 'Statistical Learning'. At its core, most contemporary speech recognition systems employ mathematical models to process audio signals and map them to words or phrases.
Using statistical models, the system analyzes vast amounts of audio data to predict the likelihood of certain sounds or sequences corresponding to specific words or phrases. For instance, upon hearing "a-p-l", the system might deduce that the speaker likely said "apple", based on its prior statistical learning.
Statistical learning emphasizes 'learning from data' through statistical techniques. For speech recognition, it entails the system analyzing copious spoken language datasets to comprehend its statistical properties and patterns. This insight then helps predict future audio inputs. The goal is to detect patterns in the data to decode new, unfamiliar speech.
Humans inherently communicate via speech. Our mission is to enable machines to replicate this through data, statistical techniques, and "statistical learning" – a task we're nearly accomplishing.
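To make the statistical idea concrete, here is a minimal sketch in Python. The candidate words, acoustic scores, and word priors are invented numbers, not the output of any real recognizer; the point is only to show how a system might combine an acoustic likelihood with a language-model prior and pick the most probable word.

```python
# Toy illustration of statistical speech recognition (all numbers are invented).
# acoustic_score: how well each candidate word matches the heard sounds "a-p-l"
# prior: how common the word is in the language (a crude language-model prior)

candidates = {
    #  word      (acoustic_score, prior)
    "apple":    (0.80, 0.020),
    "appeal":   (0.55, 0.005),
    "a pill":   (0.40, 0.001),
}

def most_likely_word(candidates):
    """Pick the word maximizing acoustic_score * prior (a crude Bayes-style rule)."""
    return max(candidates, key=lambda w: candidates[w][0] * candidates[w][1])

print(most_likely_word(candidates))  # -> "apple"
```

Real systems score thousands of hypotheses with far richer acoustic and language models, but the basic idea of choosing the statistically most likely interpretation is the same.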
2. Natural Language Processing (NLP)
Humans can read and write text. The field that gives machines this ability is 'Natural Language Processing' (or NLP).
Reading and writing are essential human skills, yet their relatively recent advent suggests limited evolutionary influence. While humans have communicated observations through speech, gestures, and actions for ages, writing and reading only began between 6 to 10 thousand years ago.
This highlights the brain's unique capability to learn diverse subjects and raises the question: how can we achieve feats, like reading and writing, that we haven't specifically evolved for and that other mammals cannot?
This leads to the notion of 'learning' within the human brain's biological 'neural networks'.
Brain theory suggests our brain functions like a computational entity. The visual cortex processes visual data initially, allowing the brain to make inferences based on what the eyes capture.
Visual perception underpins reading, writing, and motion. Through studies, neuroscientists have identified the brain's hierarchical information processing and its capacity for persistent information storage: our 'memory'.
Unlike transient spoken language, written language offers durable information storage. Throughout history, writing has enabled civilizations to archive knowledge, record events, and communicate across eras. The evolution of scripts, alphabets, and written languages has significantly impacted human progress, with AI being a noteworthy result.
In AI, 'Natural Language Processing' deals with human-computer interactions through 'Natural Language', marrying linguistics and computer science. NLP aims to equip computers with the ability to understand, interpret, generate, and respond to human language meaningfully.
Just as humans convey, document, and comprehend information via writing, NLP seeks to bestow similar capabilities upon machines.
Recent advancements in machine learning, specifically deep learning, have boosted Natural Language Processing, resulting in more advanced and accurate AI systems.
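As a small, hedged illustration of the data-driven side of NLP, the sketch below trains a tiny bag-of-words sentiment classifier with scikit-learn. The example sentences and labels are invented for demonstration only; a real system would need far more data and a far richer representation of language.

```python
# A minimal bag-of-words text classifier (illustrative only; tiny invented dataset).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts  = ["I love this film", "What a great story", "Terrible acting", "I hated the ending"]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

vectorizer = CountVectorizer()          # turns text into word-count vectors
X = vectorizer.fit_transform(texts)

model = LogisticRegression().fit(X, labels)

new = vectorizer.transform(["great film, I loved it"])
print(model.predict(new))               # expected: [1] (positive)
```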
3. Computer Vision
Humans perceive their surroundings using vision. The field that gives machines this ability is Computer Vision.
Our vision system involves capturing light and images with our eyes and processing and interpreting these images in the brain.
Computer Vision seeks to grant machines the capability to "see" and interpret visual data. Its goal is to emulate the human vision system in a computer. This enables machines to detect, recognize, and categorize objects, faces, scenes, and activities based on digital images and videos.
As Computer Vision improves, its applications proliferate. For instance:
Face recognition apps identify and verify individuals based on facial features.
Object detection pinpoints specific items within imagery.
Scene recognition determines the broader context of an image.
Image segmentation breaks down an image into distinct segments.
Autonomous vehicles leverage cameras and other sensors for real-time navigation and decision-making.
Historically, early AI and Computer Vision tasks hinged on symbolic AI, which utilizes symbols (e.g., words) and rules to represent and manipulate knowledge. However, the rise of deep learning and neural networks has shifted Computer Vision away from primarily symbolic methodologies. Now, data-driven approaches, where systems discern patterns directly from ample labeled data, are prevalent. For instance, Convolutional Neural Networks (CNNs) have excelled in image recognition tasks. We'll talk a bit more about CNNs later on.
As humans perceive and interpret the visual world, Computer Vision aims to imbue machines with similar faculties. Although symbolic AI still has a role, data-driven neural networks have become the dominant approach.
4. Robotics
Humans have the innate capability to comprehend their environment and navigate it seamlessly. The field that aims to give machines this capability is Robotics.
Humans possess an innate ability to perceive, understand, and interact with their surroundings. Using a mix of sensory inputs such as sight, touch, and hearing, combined with cognitive processes, humans can identify objects, assess distances, detect obstacles, and traverse complex terrains.
This capability allows humans to move fluidly, avoid collisions, reach out and grasp objects, and perform countless other physical tasks with ease and precision.
Just as humans inherently understand and interact with their environment, Robotics aims to endow machines with similar capabilities.
Robotics combines Engineering, Computer Science, and other disciplines to design, construct, and operate robots. Robots are autonomous or semi-autonomous machines capable of carrying out tasks in the real world. The goal of Robotics is to replicate or emulate many of the capabilities humans have, like:
Navigating and interacting with the environment.
Perceiving their surroundings and using senses to comprehend them.
Processing and interpreting data and information to understand the environment and decide on actions.
Using effectors or actuators to perform actions.
Navigating through unfamiliar environments autonomously.
Interacting with other robots as well as humans, and understanding human gestures, voice commands, or even emotions.
The overarching goal of Robotics is to create Robots that can autonomously operate in diverse environments and contexts, making decisions, navigating challenges, and assisting humans in various capacities.
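A common way to structure a robot's software is a "sense, plan, act" loop. The sketch below is a highly simplified, hypothetical Python loop: the sensor and motor functions are placeholders invented for illustration, not a real robot API, and the decision rule is deliberately trivial. It is only meant to show how perception, decision-making, and actuation fit together.

```python
import random
import time

# Placeholder "sensor": a real robot would read a lidar, camera, or sonar here.
def read_distance_to_obstacle():
    return random.uniform(0.0, 2.0)   # metres (simulated)

# Placeholder "actuators": a real robot would send motor commands here.
def drive_forward():  print("driving forward")
def turn_left():      print("turning left to avoid obstacle")

def control_loop(steps=5, safe_distance=0.5):
    for _ in range(steps):
        distance = read_distance_to_obstacle()   # SENSE
        if distance < safe_distance:             # PLAN (a trivial decision rule)
            turn_left()                          # ACT
        else:
            drive_forward()
        time.sleep(0.1)                          # pretend this runs in real time

control_loop()
```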
5. Pattern Recognition
Humans inherently recognize patterns, exemplified by our ability to group similar objects. This skill falls under the umbrella of 'Pattern Recognition'.
The human brain is remarkably adept at recognizing patterns and making sense of the world around us. This capability is deeply ingrained in our evolutionary history; it's been essential for survival.
Recognizing patterns helps us predict and anticipate events, draw conclusions from observations, and make decisions based on recurring phenomena.
For instance, consider our natural ability to group similar objects. From a very young age, children start classifying objects by shape, color, size, or function. If you spread out an array of toys, fruits, and stationery, most people could quickly group them into their respective categories without much thought.
Pattern recognition focuses on the automatic recognition of regularities and patterns in data. The goal is to provide machines with a similar capability to what humans possess in terms of recognizing and categorizing data patterns.
Data can come in many forms, such as images, video clips, audio clips, text documents, or any other kind of signal or dataset. Pattern recognition itself includes these three tasks:
Feature Extraction: Before patterns can be recognized, meaningful features or attributes from the data must be extracted. Features are measurable properties or characteristics of the observed data. For instance, in image recognition, features might include edges, textures, or colors.
Classification: Once features are extracted, the pattern recognition system classifies the data based on its features. For example, in handwriting recognition, the system would classify drawn shapes as particular letters or numbers.
Training and Learning: Many pattern recognition systems use Machine Learning, which means they require training. During training, the system is exposed to vast amounts of labeled data. It learns to recognize patterns based on this data, and once trained, it can classify new, previously unseen data.
Pattern recognition is foundational in many applications, from computer vision (recognizing objects in images) to speech recognition (identifying spoken words), and even in areas like information security and finance (for example, detecting unusual patterns in transaction data that might indicate fraud).
Basically, Pattern Recognition aims to bestow machines with a similar capability, allowing them to detect and categorize patterns in diverse sets of data.
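The sketch below illustrates the feature extraction and classification steps with scikit-learn on a tiny invented dataset: each fruit is described by two hand-picked features (weight in grams and a colour score), and a nearest-neighbour classifier labels a new example. It is a toy under those assumptions, not a production pattern-recognition system.

```python
# Toy pattern recognition: features -> classifier -> prediction (invented data).
from sklearn.neighbors import KNeighborsClassifier

# Feature extraction has already happened here: each item is reduced to
# [weight_in_grams, colour_score], where colour_score runs from 0 (green) to 1 (red).
features = [[150, 0.9], [170, 0.8], [120, 0.2], [110, 0.3]]
labels   = ["apple", "apple", "lime", "lime"]

clf = KNeighborsClassifier(n_neighbors=3).fit(features, labels)

print(clf.predict([[160, 0.85]]))  # expected: ['apple']
```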
6. Machine Learning
As mentioned earlier with Computer Vision, there are two ways AI works: one is 'Symbolic-based' and the other is 'Data-based'.
In Data-based AI, the machine requires a substantial amount of data to learn: the more data, the more it learns. With lots of data, patterns emerge. If the machine can learn these patterns, it can make 'predictions' based on what it has learned, at a scale and speed humans can't come close to matching. This domain is known as 'Machine Learning'.
Despite the human brain's remarkable ability to intuitively recognize patterns, its cognitive capacity has boundaries, especially when it involves rapidly processing extensive data.
Machines, on the other hand, can analyze massive datasets without getting overwhelmed or fatigued. They can handle and process data at scales that are beyond human capacity. This is especially valuable in today's age of big data, where information is generated at an unprecedented and ever-growing rate.
While humans are good at visualizing two or three dimensions (like length, width, and height on a graph), it is near impossible for humans to conceptualize or visualize data in higher dimensions (for example, a dataset with hundreds of features). Machines, however, can process and find patterns in "high-dimensional data" - or data with many dimensions. This ability allows machines to identify complex relationships and patterns that might be obscured or invisible in lower-dimensional representations.
Machine Learning in Artificial Intelligence (or 'ML' as it is sometimes called) focuses on building systems that can learn from and make decisions based on data. Instead of being explicitly programmed to perform a task, machine learning systems use algorithms and statistical models to analyze and draw inferences from data. There are many ways in which machines learn; a short code sketch of the first two follows this list. For instance, there is:
Supervised Learning: This is the most common technique, where the machine learns from labeled training data and makes predictions based on that data. It's similar to teaching a child by showing them examples. In 'Supervised Learning', you train an algorithm with data that also contains the answers. For example, to train a machine to recognize your friends by name, you must show the computer pictures of your friends along with their names.
Unsupervised Learning: Here, the machine is provided with unlabeled data and must find structures and patterns on its own. This is similar to a child learning through exploration without explicit guidance. In 'Unsupervised Learning', you train an AI algorithm with data and let the machine figure out the patterns on its own. For example, you might feed it data about celestial objects in the universe and expect the machine to come up with patterns in that data by itself.
Reinforcement Learning (or RL): If you give an algorithm a goal and expect the machine to achieve that goal through trial and error, it's called 'Reinforcement Learning'. A robot attempting to climb over a wall until it succeeds is an example. Here, the robot interacts with its environment (which includes the wall). Each time the robot attempts to climb and fails, it might receive a negative reward. When it finally succeeds, it might receive a positive reward. Over time, by trying different climbing techniques and learning from the outcomes, the robot aims to find the best approach to climb the wall efficiently.
Reinforcement Learning with Human Feedback (or RLHF): When humans take part in the machine learning process by giving feedback, it is called 'Reinforcement Learning with Human Feedback'. Traditional Reinforcement Learning relies on rewards from the environment, but sometimes these rewards are sparse, unclear, or hard to specify. Incorporating human feedback helps guide the learning process, making it more efficient or aligning the machine's behavior with human values. In RLHF, humans can offer insights, corrections, or even rank the different trajectories AI agents take, aiding the learning process.
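Here is the sketch promised above: a compact, hedged illustration of supervised versus unsupervised learning using scikit-learn on tiny invented datasets. It is only meant to show the difference in what the machine is given (labels versus no labels), not to demonstrate realistic performance.

```python
# Supervised vs. unsupervised learning in miniature (invented data, illustrative only).
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# --- Supervised: the data comes WITH the answers (labels). ---
X_labeled  = [[1, 1], [2, 1], [8, 9], [9, 8]]
y          = ["cat", "cat", "dog", "dog"]
classifier = LogisticRegression().fit(X_labeled, y)
print(classifier.predict([[8, 8]]))          # expected: ['dog']

# --- Unsupervised: similar data, but NO labels; the machine groups it itself. ---
X_unlabeled = [[1, 1], [2, 1], [8, 9], [9, 8]]
clustering  = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_unlabeled)
print(clustering.labels_)                    # two clusters, e.g. [0 0 1 1]
```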
7. Artificial Neural Networks (ANNs)
Before we delve into ANNs, let's talk about the fascinating human brain, which is itself a biological neural network. The human brain is a network of neurons, and humans use these neurons to learn. If we can replicate the structure and function of the human brain, we might be able to develop advanced 'Cognitive Abilities' in machines, and maybe someday even 'Reasoning'. This is the field of Neural Networks.
The human brain is an intricate organ composed of approximately 86 billion neurons. These neurons communicate with each other using electrical and chemical signals. Each neuron can form thousands of connections to other neurons. These connections are called 'synaptic connections' or 'synapses', and they allow for complex networks of communication pathways. These networks enable humans to process information, remember experiences, think, reason, and learn. The process of learning, at a fundamental level, involves changes in these synaptic connections in response to experiences. The estimated 100 trillion synapses in the brain, all working in highly complex, dynamic, and interconnected ways, are still a mystery that we are trying to fully understand.
It's essential to note that the sheer number of neurons and synapses is not the sole determinant of the brain's capabilities. The organization, function, and intricate dynamics of these neural networks, combined with other cellular components and neurochemical processes, contribute to the brain's remarkable cognitive and functional abilities.
This idea is a foundational premise in computational neuroscience and Artificial Intelligence. The notion is that if we can understand and then recreate the processes and structures underlying human cognition, we can potentially bestow machines with similar cognitive capabilities. While the exact replication of the human brain's complexity and function is still beyond our current technology, these foundational concepts have inspired many advances in AI.
Artificial Neural Networks are computing systems inspired by the structure and function of biological neural networks in the human brain. ANNs have 'artificial equivalents' of biological neurons and synapses. Specifically:
Artificial Neurons are Analogous to Biological Neurons: Artificial Neurons are simple Nodes or Computational units that receive input, process it, and produce an output. Each artificial neuron receives multiple inputs, multiplies each by an associated weight, sums the products, and then passes the result through an activation function to produce an output.
The activation function (like the sigmoid, ReLU, or tanh function) introduces non-linearity, allowing the neural network to model more complex relationships.
Then we have ‘Weights’, which are Analogous to Synapses in the Biological brain: As mentioned earlier, in the biological brain, synapses are the connections between neurons. They control the strength and direction of the signal (either excitatory or inhibitory) between connected neurons. The strength of these connections can change over time through a process known as ‘synaptic plasticity’.
In artificial neural networks, synapses are represented by weights. These are numerical values that determine the strength and sign of the connection between artificial neurons.
During the training process, these weights are adjusted using optimization techniques such as gradient descent, in response to the input data and the error between the predicted output and the actual target values. This adjustment process is analogous to synaptic plasticity in biological systems.
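To make the analogy concrete, here is a minimal NumPy sketch of a single artificial neuron: inputs are multiplied by weights, summed with a bias, and passed through a sigmoid activation. The input values and weights are arbitrary numbers chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# One artificial neuron with three inputs (all values are arbitrary examples).
inputs  = np.array([0.5, -1.2, 3.0])
weights = np.array([0.8,  0.1, -0.4])   # analogous to synaptic strengths
bias    = 0.2

weighted_sum = np.dot(inputs, weights) + bias   # multiply each input by its weight, then sum
output = sigmoid(weighted_sum)                  # activation introduces non-linearity
print(output)
```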
In short, ANNs are algorithms inspired by the structure and function of the brain's biological neural networks, and that inspiration has driven much of the progress in developing them.
Neural networks aim to emulate aspects of biological learning, aspiring to develop machines that learn from data in ways analogous to human cognition.
Here's a deeper look at ANNs and their various implementation architectures:
Let's begin with ANN basics: essentially, an ANN is composed of layers of interconnected nodes, similar to neurons. These nodes process information by combining weighted inputs, applying a transfer function, and outputting a signal. The weights (analogous to synaptic strengths in biological neurons) are adjusted during the learning process.
So how do ANNs learn? Artificial Neural Networks "learn" by adjusting the weights of connections based on the error of the output they produce compared to the desired output. This is often achieved using algorithms like back-propagation.
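As a hedged illustration of that learning rule, the sketch below nudges the weights of the single neuron from the previous sketch in the direction that reduces the error between its output and a target. This is one neuron and plain gradient descent on a squared error, not a full back-propagation implementation, and all numbers are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])     # inputs (arbitrary example values)
w = np.array([0.8,  0.1, -0.4])    # current weights
b, target, lr = 0.2, 1.0, 0.5      # bias, desired output, learning rate

for step in range(20):
    y = sigmoid(np.dot(x, w) + b)             # forward pass
    error = y - target                        # how wrong are we?
    grad = error * y * (1 - y) * x            # gradient of squared error w.r.t. weights
    w -= lr * grad                            # adjust weights ("synaptic plasticity")
    b -= lr * error * y * (1 - y)             # adjust the bias the same way

print("final output:", sigmoid(np.dot(x, w) + b))   # closer to the target of 1.0
```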
Deep Learning involves stacking layers of these artificial neurons to create deep neural networks capable of representing intricate functions. So, when neural networks contain many layers, they're known as ‘Deep Neural Networks’ and this is the field of "Deep Learning." These deep networks are especially powerful for tasks like image and speech recognition.
Another way to look at Deep Learning is that the neural networks have many layers, each analyzing different aspects of the data. These networks are more complex, or 'deeper', and we use them to learn more complex things.
Earlier we mentioned Convolutional Neural Networks (or CNNs): these are a specialized kind of neural network optimized for processing grid-like data, such as images. CNNs have revolutionized Computer Vision tasks.
In CNNs, small filters scan the image from left to right and top to bottom, detecting local features that are combined to recognize objects in a scene. This is how much of object recognition is accomplished in Computer Vision and AI.
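The sketch below shows the core operation behind that scanning: a small filter (here a hand-written vertical-edge detector, used only for illustration) slides across a tiny made-up grayscale image from left to right and top to bottom, producing a feature map. Real CNNs learn their filters from data rather than using hand-written ones.

```python
import numpy as np

# A tiny 6x6 "image": bright on the left, dark on the right (invented values).
image = np.array([[9, 9, 9, 0, 0, 0]] * 6, dtype=float)

# A 3x3 filter that responds to vertical edges (hand-written for illustration).
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

def convolve2d(img, k):
    """Slide the filter over the image, top to bottom and left to right."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

print(convolve2d(image, kernel))   # large values where the vertical edge is
```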
Another marvel in Deep Learning is Recurrent Neural Networks (or RNNs): these are designed for sequential data, like time series or natural language, allowing for a "memory" of previous inputs in their internal structure. Humans can remember the past, like what you had for dinner last night. We can get a neural network to remember a limited past as well by using a Recurrent Neural Network.
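Here is a minimal, hand-rolled sketch of that "memory": a recurrent cell keeps a hidden state and updates it at each step from the new input and the previous state. The weights are random and untrained, so this only illustrates the mechanics, not a working model.

```python
import numpy as np

rng = np.random.default_rng(0)
W_x = rng.normal(size=(4, 3))   # input-to-hidden weights (untrained, random)
W_h = rng.normal(size=(4, 4))   # hidden-to-hidden weights: this is the "memory" path
b   = np.zeros(4)

hidden = np.zeros(4)            # the network's memory of everything seen so far
sequence = [rng.normal(size=3) for _ in range(5)]   # a toy 5-step input sequence

for x_t in sequence:
    # The new state depends on the current input AND the previous hidden state.
    hidden = np.tanh(W_x @ x_t + W_h @ hidden + b)

print(hidden)   # a summary of the whole sequence, carried forward step by step
```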
And then, in the past few years, ANNs took a giant leap: the Transformer architecture appeared. The Transformer was first introduced in the 2017 paper "Attention is All You Need" and has since been a pivotal advancement in neural networks, particularly for natural language processing.
Its core innovation of using an attention mechanism allows capturing context and relationships in data effectively. The Transformer processes sequences in parallel rather than recursively, enabling faster training. The architecture is highly scalable, versatile across tasks, and simplifies model design by focusing on self-attention and feed-forward layers.
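Here is a small NumPy sketch of the scaled dot-product attention at the heart of the Transformer, following the formula softmax(QK^T / sqrt(d)) V from "Attention is All You Need". The query, key, and value matrices are random stand-ins for what a real model would compute from its inputs.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # how much each position attends to every other
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted mix of the values

rng = np.random.default_rng(0)
seq_len, d = 4, 8                       # 4 tokens, 8-dimensional representations
Q, K, V = (rng.normal(size=(seq_len, d)) for _ in range(3))

print(attention(Q, K, V).shape)         # (4, 8): one context-aware vector per token
```

Because every position attends to every other position in a single matrix operation, the whole sequence can be processed in parallel, which is the property that makes Transformers so fast to train.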
Transformer-based models like Bidirectional Encoder Representations from Transformers ('BERT') and the Generative Pre-trained Transformer ('GPT') have achieved state-of-the-art results across NLP benchmarks, inducing a revolution in the field.
The merging of matured AI architectures, expansive computation, and vast data resources is empowering ANNs and Transformer-based Large Language Models, edging machines nearer to acquiring Artificial General Intelligence.
Other popular deep learning architectures include:
Generative Adversarial Networks (GANs): GANs are a type of deep learning architecture that can be used to generate realistic data, such as images, text, and audio. GANs work by training two neural networks against each other: a generator network and a discriminator network. The generator network tries to generate realistic data, while the discriminator network tries to distinguish between real and generated data.
Autoencoders: Autoencoders are a type of deep learning architecture that can be used for unsupervised learning tasks, such as dimensionality reduction and anomaly detection. Autoencoders work by training a neural network to reconstruct its input data.
Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU): LSTM and GRU are two types of RNNs that are well-suited for processing sequential data with long-term dependencies. LSTM and GRU networks are commonly used for natural language processing tasks, such as machine translation and text generation.
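As one concrete example from this list, here is a hedged PyTorch sketch of a tiny autoencoder: it compresses 64-dimensional vectors down to 8 dimensions and learns to reconstruct them. The architecture sizes are chosen arbitrarily and the data is random noise, used only to show the training mechanics, so nothing meaningful is actually learned.

```python
import torch
import torch.nn as nn

# A tiny autoencoder: 64 -> 8 -> 64 (sizes chosen arbitrarily for illustration).
model = nn.Sequential(
    nn.Linear(64, 8),   # encoder: squeeze the input into a compact representation
    nn.ReLU(),
    nn.Linear(8, 64),   # decoder: reconstruct the original input
)

data = torch.randn(256, 64)                     # random stand-in data
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):
    reconstruction = model(data)
    loss = loss_fn(reconstruction, data)        # how far off is the reconstruction?
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(loss.item())   # reconstruction error after training
```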
Conclusion
GenAI is not limited to LLMs; models such as GANs and VAEs, among others, are also classified as Generative AI.
It is easy to get confused, because many GenAI models are complex AI systems that can belong to several categories simultaneously.
In reality, most tools we are familiar with do not belong to just one model category. For example, GPT-4 from OpenAI is a transformer-based model, a multimodal model, and also an LLM.
GANs (Generative Adversarial Networks) - Two neural networks compete, resulting in improved content generation over time.
Transformer-based models - Learn long-range dependencies between words.
Diffusion Models - Learn to reverse a gradual noising process, iteratively removing noise to produce complex and realistic data.
VAEs (Variational Autoencoders) - Learn to encode data into a compact representation, enabling creative content generation through interpolation in the latent space.
These are just a few of the many popular deep learning architectures. There are many other architectures that are used for a variety of tasks. The best architecture for a particular task will depend on the specific data and the desired outcome.