
Glossary of AI Terms

Artificial intelligence (AI) and natural language (NL) technologies are critical to the enterprise business but, for many, are difficult to assess due to their complexity and nuance. No one, however, should be excluded from such an important conversation. For this very reason, we have compiled a glossary of AI- and NL-specific terms to help simplify the conversation.

The following list of terms covers words and phrases that are essential to building and expanding your knowledge of natural language and artificial intelligence technologies. With them, you can confidently navigate your journey toward adopting and implementing natural language processing and natural language understanding solutions at your enterprise organization.

Accuracy:

Accuracy is a scoring system in binary classification (i.e., determining if an answer or output is correct or not) and is calculated as (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives).
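
The formula above can be sketched in a few lines of Python (the counts here are illustrative values, not drawn from a real system):

```python
def accuracy(tp, tn, fp, fn):
    """(True Positives + True Negatives) / all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative counts: 90 correct positives, 850 correct negatives,
# 30 false alarms, 30 misses -> 940 correct out of 1,000 predictions.
print(accuracy(tp=90, tn=850, fp=30, fn=30))  # 0.94
```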

Want to learn more about accuracy? Read this article on our Community.

Actionable Intelligence:

Information you can leverage to support decision making.

Anaphora:

In linguistics, an anaphora is a reference to a noun by way of a pronoun. For example, in the sentence, “While John didn’t like the appetizers, he enjoyed the entrée,” the word “he” is an anaphora.

Annotation:

The process of tagging language data by identifying and flagging grammatical, semantic or phonetic elements in language data.

Application Programming Interface (API)

An API, or application programming interface, is a set of rules and protocols that allows different software programs to communicate and exchange information with each other. It acts as a kind of intermediary, enabling different programs to interact and work together, even if they are not built using the same programming languages or technologies. APIs provide a way for different software programs to talk to each other and share data, helping to create a more interconnected and seamless user experience.

Artificial Intelligence (AI):

The intelligence displayed by machines in performing tasks that typically require human intelligence, such as learning, problem-solving, decision-making, and language understanding. AI is achieved by developing algorithms and systems that can process, analyze, and understand large amounts of data and make decisions based on that data.

Artificial Neural Network (ANN)

Commonly referred to as a neural network, this system consists of a collection of nodes/units that loosely mimics the processing abilities of the human brain.

Auto-classification

The application of machine learning, natural language processing (NLP), and other AI-guided techniques to automatically classify text in a faster, more cost-effective, and more accurate manner.

Auto-complete

Auto-complete is a search functionality that suggests possible queries based on the text a user has typed so far.
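
As a minimal sketch, auto-complete can be as simple as prefix matching against a log of past queries (the query list here is made up for illustration):

```python
def autocomplete(prefix, past_queries, limit=5):
    """Suggest past queries that start with what the user has typed so far."""
    p = prefix.lower()
    return [q for q in past_queries if q.lower().startswith(p)][:limit]

queries = ["natural language processing", "natural language understanding",
           "neural network", "knowledge graph"]
print(autocomplete("natural", queries))
# ['natural language processing', 'natural language understanding']
```

Production systems typically rank suggestions by popularity and use a trie or search index rather than a linear scan.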

BERT (aka Bidirectional Encoder Representations from Transformers)

Google’s technology: a large-scale pretrained model that is first trained on very large amounts of unannotated data. The model is then transferred to an NLP task, where it is fed a smaller, task-specific dataset used to fine-tune the final model.

Cataphora

In linguistics, a cataphora is a reference placed before any instance of the noun it refers to. For example, in the sentence, “Though he enjoyed the entrée, John didn’t like the appetizers,” the word “he” is a cataphora.

Categorization

Categorization is a natural language processing function that assigns a category to a document.

Want to learn more about categorization? Read our blog post “How to Remove Pigeonholing from Your Classification Process”.

Category

A category is a label assigned to a document in order to describe the content within said document.

Category Trees

A category tree enables you to view all of the rule-based categories in a collection. It is used to create and delete categories and to edit the rules that associate documents with categories. A category tree is also called a taxonomy and is arranged in a hierarchy.

Classification

Techniques that assign a set of predefined categories to open-ended text to be used to organize, structure, and categorize any kind of text – from documents, medical records, emails, files, within any application and across the web or social media networks.

CLIP

CLIP (Contrastive Language-Image Pre-training) by OpenAI is a model that learns the relationship between images and text. It was trained on a massive dataset of image-text pairs, and it can identify the images that best match a given text description, or the text that best matches a given image.

Co-occurrence

A co-occurrence commonly refers to the presence of different elements in the same document. It is often used in business intelligence to heuristically recognize patterns and guess associations between concepts that are not naturally connected (e.g., the name of an investor often mentioned in articles about startups successfully closing funding rounds could be interpreted as the investor is particularly good at picking his or her investments.).
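
As an illustration, co-occurring term pairs can be counted with a few lines of Python (the documents and the “acme” investor name are invented for this sketch):

```python
from collections import Counter
from itertools import combinations

def co_occurrences(documents):
    """Count how often each pair of distinct terms appears in the same document."""
    counts = Counter()
    for doc in documents:
        terms = sorted(set(doc.lower().split()))  # unique terms, stable pair order
        counts.update(combinations(terms, 2))
    return counts

docs = ["acme leads funding round",
        "acme joins funding round for startup",
        "weather is sunny"]
counts = co_occurrences(docs)
print(counts[("acme", "funding")])  # 2 (the pair appears in two documents)
```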

Cognitive Map

A mental representation that helps an individual acquire, code, store, recall, and decode information about the relative locations and attributes of phenomena in their environment.

Composite AI

The combined application of different AI techniques to improve the efficiency of learning in order to broaden the level of knowledge representations and, ultimately, to solve a wider range of business problems in a more efficient manner.

Learn more from expert.ai

Computational Linguistics (Text Analytics, Text Mining)

Computational linguistics is an interdisciplinary field concerned with the computational modeling of natural language.

Find out more about computational linguistics on our blog by reading the post “Why you need text analytics”.

Computational Semantics (Semantic Technology)

Computational semantics is the study of how to automate the construction and reasoning of meaning representations of natural language expressions.

Learn more about computational semantics on our blog by reading the post “Word Meaning and Sentence Meaning in Semantics”.

Content Enrichment or Enrichment

The process of applying advanced techniques such as machine learning, artificial intelligence, and language processing to automatically extract meaningful information from your text-based documents.

Controlled Vocabulary

A controlled vocabulary is a curated collection of words and phrases that are relevant to an application or a specific industry. These elements can come with additional properties that indicate both how they behave in common language and what meaning they carry, in terms of topic and more.

While the value of a controlled vocabulary is similar to that of taxonomy, they differ in that the nodes in taxonomy are only labels representing a category, while the nodes in a controlled vocabulary represent the words and phrases that must be found in a text.

Conversational AI

Conversational AI is used by developers to build conversational user interfaces, chatbots and virtual assistants for a variety of use cases. These tools offer integration into chat interfaces such as messaging platforms, social media, SMS and websites. A conversational AI platform has a developer API so third parties can extend the platform with their own customizations.

Convolutional Neural Networks (CNN)

A deep learning class of neural networks with one or more convolutional layers, commonly used for image recognition and processing.

Corpus

The entire set of language data to be analyzed. More specifically, a corpus is a balanced collection of documents that should be representative of the documents an NLP solution will face in production, both in terms of content as well as distribution of topics and concepts.

Compute Unified Device Architecture (CUDA):

CUDA is a way for computers to work on really hard and big problems by breaking them down into smaller pieces and solving them all at the same time. It helps the computer work faster and better by using special parts inside it called GPUs. It's like having lots of friends help you do a puzzle: it goes much faster than trying to do it all by yourself.

The term "CUDA" is a trademark of NVIDIA Corporation, which developed and popularized the technology.

Data Discovery:

The process of uncovering data insights and getting those insights to the users who need them, when they need them.

Learn more

Data Drift

Data Drift occurs when the distribution of the input data changes over time; this is also known as covariate shift.

Data Extraction

Data extraction is the process of collecting or retrieving disparate types of data from a variety of sources, many of which may be poorly organized or completely unstructured.

Data Ingestion

The process of obtaining disparate data from multiple sources, restructuring it, and importing it into a common format or repository to make it easy to utilize.

Data Labelling

A technique through which data is marked to make objects recognizable by machines. Information is added to various data types (text, audio, image and video) to create metadata used to train AI models.

Data Processing:

The process of preparing raw data for use in a machine learning model, including tasks such as cleaning, transforming, and normalizing the data.

Data Scarcity

The lack of data that could possibly satisfy the need of the system to increase the accuracy of predictive analytics.

Deep Learning (DL)

Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Deep learning uses deep neural networks with many layers to learn complex patterns from data. In other words, deep learning models can learn to classify concepts from images, text or sound.

In this blog post “Word Meaning and Sentence Meaning in Semantics” you can find more about Deep Learning.

Did You Mean (DYM):

“Did You Mean” is an NLP function used in search applications to identify typos in a query or suggest similar queries that could produce results in the search database being used.
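
A minimal “Did You Mean” can be sketched with Python’s standard difflib module, which suggests close matches by string similarity (the known-query list here is invented):

```python
import difflib

def did_you_mean(query, known_queries):
    """Suggest the closest known query when the input looks like a typo."""
    matches = difflib.get_close_matches(query, known_queries, n=1, cutoff=0.7)
    return matches[0] if matches else None

known = ["machine learning", "deep learning", "reinforcement learning"]
print(did_you_mean("machne learning", known))  # machine learning
```

Production search engines usually combine edit distance with query-log frequency so that the most popular plausible correction wins.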

DPO (Direct Preference Optimization):

DPO, in the context of AI model training, stands for Direct Preference Optimization. It is a relatively new approach that is gaining traction for its ability to align AI systems with human preferences without the need for complex reward functions or reinforcement learning.

Disambiguation

Disambiguation, or word-sense disambiguation, is the process of removing confusion around terms that express more than one meaning and can lead to different interpretations of the same string of text.

Want to learn more? Read our blog post “Disambiguation: The Cornerstone of NLU“.

Domain Knowledge

The experience and expertise your organization has acquired over time.

Embedding

When we want a computer to understand language, we need to represent words as numbers, because computers can only understand numbers. An embedding is a way of doing that. Here's how it works: we take a word, like "cat", and convert it into a numerical representation (a vector of numbers) that captures its meaning. We do this by using an algorithm that looks at the word in the context of the other words around it. The resulting vector represents the word's meaning and can be used by the computer to understand what the word means and how it relates to other words. For example, the word "kitten" might have a similar embedding to "cat" because they are related in meaning, while the word "dog" might have a different embedding than "cat" because they have different meanings. This allows the computer to understand relationships between words and make sense of language.
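
The idea can be illustrated with hand-made toy vectors; real embeddings are learned by a model and have hundreds of dimensions. Similarity between embeddings is commonly measured with cosine similarity:

```python
import math

def cosine_similarity(a, b):
    """Higher value = more similar directions = more related meanings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Tiny hand-written 3-dimensional "embeddings" for illustration only.
cat    = [0.90, 0.80, 0.10]
kitten = [0.85, 0.75, 0.15]
car    = [0.10, 0.20, 0.90]

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))  # True
```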

Emotion AI (aka Affective Computing)

AI to analyze the emotional state of a user (via computer vision, audio/voice input, sensors and/or software logic). It can initiate responses by performing specific, personalized actions to fit the mood of the customer.

Entity

An entity is any noun, word or phrase in a document that refers to a concept, person, object, abstract or otherwise (e.g., car, Microsoft, New York City). Measurable elements are also included in this group (e.g., 200 pounds, 14 fl. oz.)

Environmental, Social, and Governance (ESG)

An acronym initially used in business and government pertaining to enterprises’ societal impact and accountability; reporting in this area is governed by a mix of binding and voluntary regulations.

Entity Extraction (aka Entity Recognition)

Entity extraction is an NLP function that serves to identify relevant entities in a document.

Explainable AI / Explainability

An AI approach in which the performance of the algorithms can be trusted and easily understood by humans. Unlike black-box AI, the logic behind an explainable system’s reasoning and results can be seen and understood.

Learn more

Extraction or Keyphrase Extraction

Multiple words that describe the main ideas and essence of text in documents.

F-score (F-measure, F1 measure)

An F-score is the harmonic mean of a system’s precision and recall values. It can be calculated by the following formula: 2 x [(Precision x Recall) / (Precision + Recall)]. Criticism around the use of F-score values to determine the quality of a predictive system is based on the fact that a moderately high F-score can be the result of an imbalance between precision and recall and, therefore, not tell the whole story. On the other hand, systems at a high level of accuracy struggle to improve precision or recall without negatively impacting the other.

Critical (risk) applications that value information retrieval more than accuracy (i.e., producing a large number of false positives but virtually guaranteeing that all the true positives are found) can adopt a different scoring system called F2 measure, where recall is weighed more heavily. The opposite (precision is weighed more heavily) is achieved by using the F0.5 measure.
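
The F1, F2 and F0.5 measures mentioned above are all instances of the general F-beta formula, which can be sketched as follows (the precision and recall values are illustrative):

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean: beta > 1 favors recall, beta < 1 favors precision."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 0.8, 0.5
print(round(f_beta(p, r), 3))            # 0.615 (F1: balanced)
print(round(f_beta(p, r, beta=2), 3))    # 0.541 (F2: recall weighs more)
print(round(f_beta(p, r, beta=0.5), 3))  # 0.714 (F0.5: precision weighs more)
```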

Read this article on our Community to learn more about F-score.

Feature Engineering

The process of selecting and creating new features from the raw data that can be used to improve the performance of a machine learning model.

Foundation Models (FMs)

Foundation Models are large deep learning neural networks that have changed the way data scientists approach machine learning (ML). They are trained on massive datasets. Rather than develop artificial intelligence (AI) from scratch, data scientists use a foundation model as a starting point to develop ML models that power new applications more quickly and cost-effectively. The term foundation model was coined by researchers to describe ML models trained on a broad spectrum of generalized and unlabeled data and capable of performing a wide variety of general tasks such as understanding language, generating text and images, and conversing in natural language.

Freemium

You might see the term "Freemium" used often on this site. It simply means that the specific tool that you're looking at has both free and paid options. Typically there is very minimal, but unlimited, usage of the tool at a free tier with more access and features introduced in paid tiers.

Generative Adversarial Network (GAN)

A type of computer program that creates new things, such as images or music, by training two neural networks against each other. One network, called the generator, creates new data, while the other network, called the discriminator, checks the authenticity of the data. The generator learns to improve its data generation through feedback from the discriminator, which becomes better at identifying fake data. This back and forth process continues until the generator is able to create data that is almost impossible for the discriminator to tell apart from real data. GANs can be used for a variety of applications, including creating realistic images, videos, and music, removing noise from pictures and videos, and creating new styles of art.

Generative Art

Generative art is a form of art that is created using a computer program or algorithm to generate visual or audio output. It often involves the use of randomness or mathematical rules to create unique, unpredictable, and sometimes chaotic results.

Generative Pre-trained Transformer (GPT)

GPT stands for Generative Pretrained Transformer. It is a type of large language model developed by OpenAI.

Giant Language model Test Room (GLTR)

GLTR is a tool that helps people tell if a piece of text was written by a computer or a person. It does this by looking at how each word in the text is used and how likely it is that a computer would have chosen that word. GLTR is like a helper that shows you clues by coloring different parts of the sentence different colors. Green means the word is very likely to have been written by a person, yellow means it's not sure, red means it's more likely to have been written by a computer and violet means it's very likely to have been written by a computer.

GitHub

GitHub is a platform for hosting and collaborating on software projects.

Google Colab

Google Colab is an online platform that allows users to share and run Python scripts in the cloud.

Gradient descent:

Picture a model's error as a landscape of hills and valleys, one dimension per weight. Gradient descent is the algorithm that navigates this high-dimensional "valley" by adjusting the weights in the neural network in small steps, iteratively moving in the direction that reduces the error the most. It's like taking small steps downhill until you can't go any further and have (hopefully) found the bottom of the valley, which is the point of minimum error.
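
A minimal sketch of the idea on a one-dimensional "valley" (a toy quadratic, not a real neural network):

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step downhill, opposite to the gradient's direction."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3); minimum is at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(minimum, 4))  # 3.0
```

In a neural network, x is replaced by millions of weights and the gradient is computed by backpropagation, but the update rule is the same.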

Graphics Processing Unit (GPU)

A GPU, or graphics processing unit, is a special type of computer chip that is designed to handle the complex calculations needed to display images and video on a computer or other device. It's like the brain of your computer's graphics system, and it's really good at doing lots of math really fast. GPUs are used in many different types of devices, including computers, phones, and gaming consoles. They are especially useful for tasks that require a lot of processing power, like playing video games, rendering 3D graphics, or running machine learning algorithms.

Hallucinations

Made-up information presented as fact in generated text; it is plausible but inaccurate or incorrect. These fabrications can also include invented references or sources.

Learn more: “Chain-of-Verification (CoV) Reduces Hallucination in Large Language Models”.

Hybrid AI

Hybrid AI is any artificial intelligence technology that combines multiple AI methodologies. In NLP, this often means that a workflow will leverage both symbolic and machine learning techniques.

Want to learn more about hybrid AI? Read this blog post “What Is Hybrid Natural Language Understanding?“.

Hyperparameters

These are adjustable model parameters that are tuned in order to obtain optimal performance of the model.

Inference Engine

A component of an expert system that applies logical rules to the knowledge base to deduce new or additional information.

Inpainting:

Inpainting and Outpainting are two techniques used in AI generative art to create new images by manipulating existing ones. Inpainting refers to the process of filling in missing or damaged parts of an image. This can be done by using a variety of techniques, such as Patching and Inpainting from scratch.

Insight Engines

An insight engine, also called cognitive search or enterprise knowledge discovery, applies relevancy methods to describe, discover, organize and analyze data. It combines search with AI capabilities to provide information for users and data for machines. The goal of an insight engine is to provide timely data that delivers actionable intelligence.

Intelligent Document Processing (IDP) or Intelligent Document Extraction and Processing (IDEP)

This is the ability to automatically read and convert unstructured and semi-structured data, identify usable data, extract it, and then leverage it via automated processes. IDP is often an enabling technology for Robotic Process Automation (RPA) tasks.

Knowledge Graph

A knowledge graph is a graph of concepts whose value resides in its ability to meaningfully represent a portion of reality, specialized or otherwise. Every concept is linked to at least one other concept, and the quality of this connection can belong to different classes (see: taxonomies).

The interpretation of every concept is represented by its links. Consequently, every node is the concept it represents only based on its position in the graph (e.g., the concept of an apple, the fruit, is a node whose parents are “apple tree”, “fruit”, etc.). Advanced knowledge graphs can have many properties attached to a node including the words used in language to represent a concept (e.g., “apple” for the concept of an apple), if it carries a particular sentiment in a culture (“bad”, “beautiful”) and how it behaves in a sentence.
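
As a minimal sketch, a knowledge graph can be represented as labeled edges between concept nodes (the concepts and relation names here are illustrative; real knowledge graphs use dedicated graph databases and richer node properties):

```python
# Each concept maps to a list of (relation, target concept) edges.
graph = {
    "apple":      [("is_a", "fruit"), ("grows_on", "apple tree")],
    "apple tree": [("is_a", "tree")],
    "fruit":      [("is_a", "food")],
}

def related(concept, relation):
    """Follow edges of a given type from a concept."""
    return [target for rel, target in graph.get(concept, []) if rel == relation]

print(related("apple", "is_a"))  # ['fruit']
```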

Learn more about knowledge graphs by reading the blog post “Knowledge Graph: The Brains Behind Symbolic AI” on our blog.

Knowledge Model

A process of creating a computer interpretable model of knowledge or standards about a language, domain, or process(es). It is expressed in a data structure that enables the knowledge to be stored in a database and be interpreted by software.

Learn more

Labelled Data

See Data Labelling.

LangOps (Language Operations)

The workflows and practices that support the training, creation, testing, production deployment and ongoing curation of language models and natural language solutions.

Learn more

Language Data

Language data is data made up of words; it is a form of unstructured data. It is qualitative data, also known as text data; simply put, it refers to the written and spoken words in a language.

Large Language Models (LLM)

A large language model is a deep learning model, typically based on the transformer architecture, that is trained on a very large amount of text data and is able to generate natural-sounding text.

LangChain:

LangChain is a library that helps users connect artificial intelligence models to external sources of information. The tool allows users to chain together commands or queries across different sources, enabling the creation of agents or chatbots that can perform actions on a user's behalf. It aims to simplify the process of connecting AI models to external sources of information, enabling more complex and powerful applications of artificial intelligence.

Latent Space in Image Generation:

In the context of image generation in LLMs, particularly with techniques like Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Diffusion Models, the term "latent space" refers to an intermediate, lower-dimensional representation of the input data learned by the model. The word 'Latent' is used to describe something which is hidden and not obvious at the moment, but which may develop further in the future. Latent space refers to the fact that this compressed representation captures the underlying, hidden, or "latent" features and factors that generate the images.

Lemma

The base form of a word representing all its inflected forms.

Lexicon

Knowledge of all of the possible meanings of words, in their proper context; is fundamental for processing text content with high precision.

Linked Data

Linked data indicates that a recognizable store of knowledge is connected to another one, typically one used as a standard reference. For instance, a knowledge graph in which every concept/node is linked to its respective page on Wikipedia.

Machine Learning (ML)

Machine learning is the study of computer algorithms that can improve automatically through experience and the use of data. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as “training data,” in order to make predictions or decisions without being explicitly programmed to do so. In NLP, ML-based solutions can quickly cover the entire scope of a problem (or, at least of a corpus used as sample data), but are demanding in terms of the work required to achieve production-grade accuracy.

Read this post “What Is Machine Learning? A Definition” on our blog.

Metadata

Data that describes or provides information about other data.

Model

A machine learning model is the artifact produced after an ML algorithm has processed the sample data it was fed during the training phase. The model is then used by the algorithm in production to analyze text (in the case of NLP) and return information and/or predictions.

Model Drift

Model drift is the decay of a model’s predictive power as a result of changes in real-world environments. It is caused by a variety of factors, including changes in the digital environment and the ensuing changes in the relationships between variables. An example is a model that detects spam based on email content, which degrades when the content used in spam messages changes.

Model Parameter

These are parameters in the model that are determined by using the training data. They are the fitted/configured variables internal to the model whose value can be estimated from data. They are required by the model when making predictions. Their values define the capability and fit of the model.

Morphological Analysis

Breaking a problem with many known solutions down into its most basic elements or forms, in order to more completely understand them. Morphological analysis is used in general problem solving, linguistics and biology.

Natural Language Processing (NLP):

A subfield of artificial intelligence and linguistics, natural language processing focuses on the interactions between computers and human language: teaching machines to understand, process, and generate it. More specifically, it focuses on the ability of computers to read and analyze large volumes of unstructured language data (e.g., text).

Read blog post “6 Real-World Examples of Natural Language Processing” to learn more about Natural Language Processing (NLP).

Natural Language Understanding

A subset of natural language processing, natural language understanding is focused on the actual computer comprehension of processed and analyzed unstructured language data. This is enabled via semantics.

Learn more about Natural Language Understanding (NLU) reading our blog post “What Is Natural Language Understanding?”.

NLG (aka Natural Language Generation)

Solutions that automatically convert structured data, such as that found in a database, an application or a live feed, into a text-based narrative. This makes the data easier for users to access by reading or listening, and therefore to comprehend.

Neural Networks:

A type of machine learning algorithm modeled on the structure and function of the brain.

Neural Radiance Fields (NeRF):

Neural Radiance Fields are a type of deep learning model that can be used for a variety of tasks, including image generation, object detection, and segmentation. NeRFs are inspired by the idea of using a neural network to model the radiance of an image, which is a measure of the amount of light that is emitted or reflected by an object.

NLQ (aka Natural Language Query)

A natural language input that only includes terms and phrases as they occur in spoken language (i.e. without non-language characters).

NLT (aka Natural Language Technology)

A subfield of linguistics, computer science and artificial intelligence (AI) dealing with Natural Language Processing (NLP), Natural Language Understanding (NLU), and Natural Language Generation (NLG).

One-Shot Learning (OSL):

One-Shot Learning is a machine learning algorithm that requires very little data to identify and classify objects based on their similarities. It's mainly used in computer vision tasks like facial recognition and passport identification checks.

Ontology

An ontology is similar to a taxonomy, but it enhances its simple tree-like classification structure by adding properties to each node/element and connections between nodes that can extend to other branches. These properties are not standard, nor are they limited to a predefined set. Therefore, they must be agreed upon by the classifier and the user.

Read blog post “Understanding Ontology and How It Adds Value to NLU” to learn more about the ontologies.

Ontologies Function: Think of ontologies as dictionaries on steroids. They define the concepts and relationships within a specific domain.

Ontologies Structure: They organize knowledge in a formal way, specifying:

  • Entities: The types of things that exist in the domain (e.g., "car," "person," "building").

  • Attributes: The properties of these entities (e.g., "color" for cars, "name" for people, "location" for buildings).

  • Relationships: How entities are connected (e.g., "owns" - a person owns a car, "located in" - a building is located in a city).

Benefits of Ontologies: Ontologies provide a shared understanding of a domain, which is crucial for tasks like:

  • Information retrieval: Finding relevant information by specifying entities and relationships.

  • Data integration: Combining data from different sources that use the same ontology.

  • Machine reasoning: Enabling machines to understand and reason about the knowledge within the ontology.

Ontology Example: An ontology for an online store might define entities like "product," "customer," and "order." It would specify attributes like "price" for products and "name" for customers. Relationships could include "buys" (customer buys product) and "part of" (order includes product).
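
The online-store example above can be sketched as plain data structures (a simplification for illustration; real ontologies are expressed in formal languages such as OWL):

```python
# Entities with their attributes, and (subject, relation, object) triples.
ontology = {
    "entities": {
        "product":  {"attributes": ["price"]},
        "customer": {"attributes": ["name"]},
        "order":    {"attributes": []},
    },
    "relationships": [
        ("customer", "buys", "product"),
        ("product", "part_of", "order"),
    ],
}

def relations_for(entity):
    """All relationship triples in which the entity takes part."""
    return [(s, r, o) for s, r, o in ontology["relationships"]
            if s == entity or o == entity]

print(relations_for("product"))
# [('customer', 'buys', 'product'), ('product', 'part_of', 'order')]
```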

OpenAI:

OpenAI is a research institute focused on developing and promoting artificial intelligence technologies that are safe, transparent, and beneficial to society.

Overfitting:

A common problem in machine learning, in which the model performs well on the training data but poorly on new, unseen data. It occurs when the model is too complex and has learned too many details from the training data, so it doesn't generalize well.

Outpainting:

Inpainting and Outpainting are two techniques used in AI generative art to create new images by manipulating existing ones. Outpainting refers to the process of extending an image beyond its original borders. This can be done by using a variety of techniques, such as Mirroring, Pattern extension and Outpainting from scratch.

Parsing

Identifying the single elements that constitute a text, then assigning them their logical and grammatical value.

Part-of-Speech Tagging

A Part-of-Speech (POS) tagger is an NLP function that identifies grammatical information about the elements of a sentence. Basic POS tagging can be limited to labeling every word by grammar type, while more complex implementations can group phrases and other elements in a clause, recognize different types of clauses, build a dependency tree of a sentence, and even assign a logical function to every word (e.g., subject, predicate, temporal adjunct, etc.).
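
As a toy illustration only, basic POS tagging can be mimicked with a fixed word-to-tag lookup; real taggers use statistical or neural models that take context into account:

```python
# A tiny hand-made lexicon; any word outside it is tagged UNK (unknown).
LEXICON = {"the": "DET", "dog": "NOUN", "chased": "VERB", "cat": "NOUN"}

def pos_tag(sentence):
    """Label every word in the sentence with its grammar type."""
    return [(word, LEXICON.get(word, "UNK")) for word in sentence.lower().split()]

print(pos_tag("The dog chased the cat"))
# [('the', 'DET'), ('dog', 'NOUN'), ('chased', 'VERB'), ('the', 'DET'), ('cat', 'NOUN')]
```

A real tagger also resolves ambiguity from context, e.g. "run" as a noun versus a verb, which a lookup table cannot do.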

Find out more about Part-of-Speech (POS) tagger in this article on our Community.

PEMT (aka Post Edit Machine Translation)

A solution that allows a translator to edit a document that has already been machine translated. Typically, this is done sentence by sentence using a specialized computer-assisted translation application.

Post-processing

Procedures that can include various pruning routines, rule filtering, or even knowledge integration. All these procedures provide a kind of symbolic filter for noisy and imprecise knowledge derived by an algorithm.

Pre-processing

A step in the data mining and data analysis process that takes raw data and transforms it into a format that can be understood and analyzed by computers. Analyzing structured data, like whole numbers, dates, currency and percentages, is straightforward. However, unstructured data, in the form of text and images, must first be cleaned and formatted before analysis.

Precision

Given a set of results from a processed document, precision is the percentage value that indicates how many of those results are correct based on the expectations of a certain application. It can apply to any class of a predictive AI system such as search, categorization and entity recognition.

For example, say you have an application that is supposed to find all the dog breeds in a document. If the application analyzes a document that mentions 10 dog breeds but only returns five values (all of which are correct), the system will have performed at 100% precision. Even if half of the instances of dog breeds were missed, the ones that were returned were correct.

Want to learn more about precision? Read this article on our Community.
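The dog-breed example above can be computed directly. This sketch treats the expected and returned results as sets (the breed names are invented for illustration):

```python
def precision(returned: set, expected: set) -> float:
    """Fraction of returned results that are correct: TP / (TP + FP)."""
    if not returned:
        return 0.0
    true_positives = len(returned & expected)
    return true_positives / len(returned)

expected = {"beagle", "poodle", "boxer", "pug", "husky",
            "collie", "corgi", "akita", "saluki", "basenji"}
returned = {"beagle", "poodle", "boxer", "pug", "husky"}  # 5 of 10, all correct

print(precision(returned, expected))  # 1.0 -> 100% precision
```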

Prompt Engineering:

A method to query AI data or a model (like ChatGPT) and get a response; it requires specific techniques and detailed language to return a valid, useful result.

Prompt:

A prompt is a piece of text that is used to prime a large language model and guide its generation.

Python:

Python is a popular, high-level programming language known for its simplicity, readability, and flexibility (many AI tools use it).

RAG (Retrieval-Augmented Generation):

RAG stands for "Retrieval-Augmented Generation". It is a technique used in natural language processing (NLP) that combines two key components: a retriever model and a generator model. This method is particularly relevant in tasks that involve generating responses or text based on a large body of knowledge, such as open-domain question-answering, where the model needs access to a wide range of facts and information.
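The retrieve-then-generate flow can be sketched with a toy keyword-overlap retriever. The documents and the prompt format below are invented for illustration, and the generator model itself is omitted; real RAG systems typically retrieve with vector embeddings rather than word overlap:

```python
# Toy RAG pipeline: retrieve the most relevant passage, then build the
# prompt a generator model would answer. The corpus here is hypothetical.
DOCS = [
    "The Eiffel Tower is located in Paris, France.",
    "Mount Everest is the tallest mountain on Earth.",
    "Python is a popular high-level programming language.",
]

def words(text: str) -> set:
    return set(text.lower().replace(",", " ").replace(".", " ")
                           .replace("?", " ").split())

def retrieve(question: str) -> str:
    q = words(question)
    return max(DOCS, key=lambda d: len(q & words(d)))  # best word overlap

def build_prompt(question: str) -> str:
    return f"Context: {retrieve(question)}\nQuestion: {question}\nAnswer:"

print(build_prompt("Where is the Eiffel Tower?"))
```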

Random Forest

A supervised machine learning algorithm that grows and combines multiple decision trees to create a “forest.” Used for both classification and regression problems in R and Python.
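A minimal classification sketch, assuming scikit-learn is installed; the toy dataset below is invented for illustration:

```python
from sklearn.ensemble import RandomForestClassifier

# Toy dataset: points near the origin are class 0, points far from it class 1.
X = [[0, 0], [1, 1], [0, 1], [1, 0], [2, 2], [3, 3], [2, 3], [3, 2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Each tree in the forest is trained on a bootstrap sample of the data;
# predictions are made by majority vote across the trees.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, y)

print(forest.predict([[0, 0], [3, 3]]))
```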

Reinforcement Learning (RL):

A type of machine learning in which the model learns by trial and error, receiving rewards or punishments for its actions and adjusting its behavior accordingly.

RLHF (Reinforcement Learning from Human Feedback):

RLHF is a machine learning (ML) technique that uses human feedback to optimize ML models to self-learn more efficiently. Reinforcement learning (RL) techniques train software to make decisions that maximize rewards, making their outcomes more accurate. RLHF incorporates human feedback in the reward function, so the ML model can perform tasks more aligned with human goals, wants, and needs. RLHF is used throughout generative artificial intelligence (generative AI) applications, including in large language models (LLMs).

Recall:

Given a set of results from a processed document, recall is the percentage value that indicates how many correct results have been retrieved based on the expectations of the application. It can apply to any class of a predictive AI system such as search, categorization and entity recognition.

For example, say you have an application that is supposed to find all the dog breeds in a document. If the application analyzes a document that mentions 10 dog breeds but only returns five values (all of which are correct), the system will have performed at 50% recall.

Find out more about recall by reading this article on our Community.
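The same dog-breed example can be computed as code; the breed names are invented for illustration:

```python
def recall(returned: set, expected: set) -> float:
    """Fraction of expected results that were retrieved: TP / (TP + FN)."""
    if not expected:
        return 0.0
    true_positives = len(returned & expected)
    return true_positives / len(expected)

expected = {"beagle", "poodle", "boxer", "pug", "husky",
            "collie", "corgi", "akita", "saluki", "basenji"}
returned = {"beagle", "poodle", "boxer", "pug", "husky"}  # only 5 of 10 found

print(recall(returned, expected))  # 0.5 -> 50% recall
```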

Recurrent Neural Networks (RNN):

A neural network model, commonly used in natural language processing and speech recognition, that allows previous outputs to be used as inputs.

Relations:

The identification of relationships is an advanced NLP function that presents information on how elements of a statement are related to each other. For example, “John is Mary’s father” will report that John and Mary are connected, and this datapoint will carry a link property that labels the connection as “family” or “parent-child.”

Responsible AI:

Responsible AI is a broad term that encompasses the business and ethical choices associated with how organizations adopt and deploy AI capabilities. Generally, Responsible AI looks to ensure the use of AI is transparent (can you see how an AI model works?), explainable (can you explain why a specific decision in an AI model was made?), fair (can you ensure that a specific group is not disadvantaged based on an AI model's decision?) and sustainable (can the development and curation of AI models be done on an environmentally sustainable basis?).


Rules-based Machine Translation (RBMT)

Considered the “classical approach” to machine translation, RBMT is based on linguistic information about the source and target languages that allows words to have different meanings depending on the context.

SAO (Subject-Action-Object)

Subject-Action-Object (SAO) is an NLP function that identifies the logical function of portions of sentences in terms of the elements that are acting as the subject of an action, the action itself, the object receiving the action (if one exists), and any adjuncts if present.

Read this article on our Community to learn more about Subject-Action-Object (SAO).

Semantic Network

A form of knowledge representation, used in several natural language processing applications, where concepts are connected to each other by semantic relationship.

Semantic Search

The use of natural language technologies to improve user search capabilities by processing the relationships and underlying intent between words. Concepts and entities such as people and organizations are identified, along with their attributes and relationships.

Semantics

Semantics is the study of the meaning of words and sentences. It concerns the relation of linguistic forms to non-linguistic concepts and mental representations to explain how sentences are understood by the speakers of a language.

Learn more about semantics on our blog by reading the post “Introduction to Semantics“.

Semi-structured Data

Data that is structured in some way but does not obey the tabular structure of traditional databases or other conventional data tables, which are most commonly organized in rows and columns. Attributes of the data can differ even when they are grouped together. A simple example is a form; a more advanced example is an object database, where the data is represented as related objects (e.g., automobile make relates to model, which relates to trim level).

Sentiment

Sentiment is the general disposition expressed in a text.

Read blog post “Natural Language Processing and Sentiment Analysis” to learn more about sentiment.

Sentiment Analysis

Sentiment analysis is an NLP function that identifies the sentiment in text. This can be applied to anything from a business document to a social media post. Sentiment is typically measured on a linear scale (negative, neutral or positive), but advanced implementations can categorize text in terms of emotions, moods, and feelings.
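The negative/neutral/positive scale can be sketched with a toy lexicon-based scorer. The word lists below are invented for illustration; production systems typically use trained models rather than fixed word lists:

```python
# Toy lexicon-based sentiment scorer (illustration only).
POSITIVE = {"good", "great", "excellent", "enjoyed", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "awful"}

def sentiment(text: str) -> str:
    words = text.lower().split()
    # Count positive hits minus negative hits to get a crude score.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("The service was excellent and we enjoyed the food"))  # positive
print(sentiment("The room was terrible"))                              # negative
```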

Similarity (and Correlation)

Similarity is an NLP function that retrieves documents similar to a given document. It usually offers a score to indicate the closeness of each document to that used in a query. However, there are no standard ways to measure similarity. Thus, this measurement is often specific to an application versus generic or industry-wide use cases.
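One common (but by no means universal) choice is cosine similarity over word-count vectors; this sketch uses invented sentences for illustration:

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between two word-count vectors (1 = identical)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

print(cosine_similarity("the cat sat on the mat", "the cat sat on the rug"))
print(cosine_similarity("the cat sat on the mat", "stock prices fell sharply"))
```

Similar sentences score close to 1; sentences with no words in common score 0.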

Simple Knowledge Organization System (SKOS)

A common data model for knowledge organization systems such as thesauri, classification schemes, subject heading systems, and taxonomies.

Speech Analytics

The process of analyzing recordings or live calls with speech recognition software to find useful information and provide quality assurance. Speech analytics software identifies words and analyzes audio patterns to detect emotions and stress in a speaker’s voice.

Speech Recognition

Speech recognition, also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text, enables a software program to process human speech into a written/text format.

Structured Data

Structured data is data that conforms to a specific data model, has a well-defined structure, follows a consistent order, and can be easily accessed and used by a person or a computer program. Structured data is usually stored in rigid schemas such as databases.

Symbolic Methodology

A symbolic methodology is an approach to developing AI systems for NLP based on a deterministic, conditional approach. In other words, a symbolic approach designs a system using very specific, narrow instructions that guarantee the recognition of a linguistic pattern. Rule-based solutions tend to have a high degree of precision, though they may require more work than ML-based solutions to cover the entire scope of a problem, depending on the application.

Want to learn more about symbolic methodology? Read our blog post “The Case for Symbolic AI in NLP Models“.

Syntax

The arrangement of words and phrases in a specific order to create meaning in language. If you change the position of one word, it is possible to change the context and meaning.

Spatial Computing:

Spatial computing is the use of technology to add digital information and experiences to the physical world. This can include things like augmented reality, where digital information is added to what you see in the real world, or virtual reality, where you can fully immerse yourself in a digital environment. It has many different uses, such as in education, entertainment, and design, and can change how we interact with the world and with each other.

Stable Diffusion:

Stable Diffusion generates complex artistic images based on text prompts. It’s an open-source image synthesis AI model available to everyone. Stable Diffusion can be installed locally using code found on GitHub, or used through several online user interfaces that also leverage Stable Diffusion models.

Supervised Learning:

A type of machine learning in which the training data is labeled and the model is trained to make predictions based on the relationships between the input data and the corresponding labels.

Tagging

See Part-of-Speech Tagging (aka POS Tagging).

Taxonomy

A taxonomy is a predetermined group of classes of a subset of knowledge (e.g., animals, drugs, etc.). It includes dependencies between elements in a “part of” or “type of” relationship, giving itself a multi-level, tree-like structure made of branches (the final node or element of every branch is known as a leaf). This creates order and hierarchy among knowledge subsets.

Companies use taxonomies to more concisely organize their documents which, in turn, enables internal or external users to more easily search for and locate the documents they need. They can be specific to a single company or become de-facto languages shared by companies across specific industries.

Find out more about taxonomies by reading our blog post “What Are Taxonomies and How Should You Use Them?“.
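The tree-like, multi-level structure described above can be sketched as nested dictionaries, where each branch records a "type of" relationship and leaves have no children (the animal classes below are illustrative):

```python
# A taxonomy as a nested dict: each child is a "type of" its parent.
taxonomy = {
    "animal": {
        "mammal": {"dog": {}, "cat": {}},        # leaves have no children
        "bird": {"sparrow": {}, "eagle": {}},
    }
}

def path_to(tree: dict, target: str, path=()) -> tuple:
    """Return the branch from the root down to a given class, if present."""
    for node, children in tree.items():
        if node == target:
            return path + (node,)
        found = path_to(children, target, path + (node,))
        if found:
            return found
    return ()

print(path_to(taxonomy, "dog"))  # ('animal', 'mammal', 'dog')
```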

Temporal Coherence:

Temporal Coherence refers to the consistency and continuity of information or patterns across time. This concept is particularly important in areas such as computer vision, natural language processing, and time-series analysis, where AI models need to process and understand data that evolves over time.

Temporal coherence can be viewed from different perspectives, depending on the specific application:

In computer vision, temporal coherence might refer to the smoothness and consistency of visual content in videos, where objects and scenes should maintain their properties and relationships across frames.

In natural language processing, it could refer to the consistency and flow of information in a text or conversation, ensuring that the AI model generates responses or summaries that logically follow previous statements or events.

In time-series analysis, temporal coherence could relate to the consistency of patterns and trends in the data, such that the AI model can predict future values based on past observations.

Test Set

A test set is a collection of sample documents representative of the challenges and types of content an ML solution will face once in production. A test set is used to measure the accuracy of an ML system after it has gone through a round of training.

Text Analytics

Techniques used to process large volumes of unstructured text (or text that does not have a predefined, structured format) to derive insights, patterns, and understanding; the process can include determining and classifying the subjects of texts, summarizing texts, extracting key entities from texts, and identifying the tone or sentiment of texts.


Text Summarization

A range of techniques that automatically produce short textual summaries representing longer or multiple texts. The principal purpose of this technology is to reduce employee time and effort required to acquire insight from content, either by signaling the value of reading the source(s), or by delivering value directly in the form of the summary.

Thesauri

A language or terminological resource (“dictionary”) describing relationships between lexical words and phrases in a formalized form of natural language(s), enabling the use of these descriptions and relationships in text processing.

Tokens

The individual words used to compose a sentence.
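A simple word-level tokenizer can be sketched with a regular expression. Note that large language models typically use sub-word tokenizers (e.g., byte-pair encoding), so their tokens are often fragments of words rather than whole words:

```python
import re

def tokenize(sentence: str) -> list:
    """Split a sentence into word tokens (keeping internal apostrophes)."""
    return re.findall(r"[A-Za-z']+", sentence)

print(tokenize("The quick brown fox doesn't jump."))
# ['The', 'quick', 'brown', 'fox', "doesn't", 'jump']
```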

Training Set

A training set is the pre-tagged sample data fed to an ML algorithm for it to learn about a problem, find patterns and, ultimately, produce a model that can recognize those same patterns in future analyses.

Read this article on our Community to learn more about training sets.

Transformer Neural Networks:

The transformer neural network is an architecture designed to solve sequence-to-sequence tasks while handling long-range dependencies with ease. It was introduced in the 2017 paper “Attention Is All You Need,” which proposed an encoder-decoder architecture based on attention layers, and it is now a state-of-the-art technique in the field of NLP.

One main feature is that the input sequence can be processed in parallel, so GPUs can be used effectively and training speed increased. Because the architecture is built on multi-headed attention layers, it also largely avoids the vanishing gradient issue.
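The core operation inside those attention layers is scaled dot-product attention. As a minimal sketch with a single query over three key/value pairs (the vectors are invented for illustration; real models use learned, high-dimensional projections):

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention: softmax(q.k / sqrt(d)) weighted values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    exps = [math.exp(s) for s in scores]
    weights = [e / sum(exps) for e in exps]      # softmax over the scores
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]      # weighted sum of values

out = attention([1.0, 0.0],                       # one query vector
                [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # three keys
                [[1.0], [2.0], [3.0]])            # three values
print(out)  # weighted average of the values, ~[2.0]
```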

Treemap

Treemaps display large amounts of hierarchically structured (tree-structured) data. The space in the visualization is split up into rectangles that are sized and ordered by a quantitative variable. The levels in the hierarchy of the treemap are visualized as rectangles containing other rectangles.

Triple or Triplet Relations (aka Subject-Action-Object (SAO))

An advanced extraction technique which identifies three items (subject, predicate and object) that can be used to store information.
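As a deliberately naive sketch of the idea, this extractor assumes a simple "subject verb object" word order; real systems rely on full dependency parsing rather than word position:

```python
def extract_triple(sentence: str) -> tuple:
    """Naive SAO extraction for simple 'subject verb object' sentences."""
    words = sentence.strip(".").split()
    subject, action = words[0], words[1]
    obj = " ".join(words[2:])   # everything after the verb (crude assumption)
    return subject, action, obj

print(extract_triple("John repaired the car."))
# ('John', 'repaired', 'the car')
```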

Tuning (aka Model Tuning or Fine Tuning)

The procedure of re-training a pre-trained language model using your own custom data. The weights of the original model are updated to account for the characteristics of the domain data and the task you are interested in modeling. This customization typically produces more accurate outcomes and better insights.

Unstructured Data:

Unstructured data do not conform to a data model and have no rigid structure. Lacking rigid constructs, unstructured data are often more representative of “real world” business information (examples – Web pages, images, videos, documents, audio).

Unsupervised Learning

A type of machine learning in which the training data is not labeled, and the model is trained to find patterns and relationships in the data on its own.

ViT (Vision Transformer)

Vision Transformer (ViT) is a type of Artificial Intelligence (AI) model that can process images without using traditional convolutional neural networks (CNNs). Instead, ViT uses a transformer architecture, which is a type of neural network commonly used for natural language processing (NLP).

Webhook

A webhook is a way for one computer program to send a message or data to another program over the internet in real time. It works by sending the message or data to a specific URL, which belongs to the other program. Webhooks are often used to automate processes and make it easier for different programs to communicate and work together. They are a useful tool for developers who want to build custom applications or create integrations between different software systems.
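As a minimal sketch using only the Python standard library, a webhook delivery is just an HTTP POST of a JSON payload to the receiver's URL. The endpoint URL and event name below are hypothetical:

```python
import json
import urllib.request

def build_webhook(url: str, event: str, data: dict) -> urllib.request.Request:
    """Build a webhook delivery: a JSON POST aimed at the receiver's URL."""
    payload = json.dumps({"event": event, "data": data}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical receiver URL and event; a caller would send the request with
# urllib.request.urlopen(req) when the event actually occurs.
req = build_webhook("https://example.com/hooks/orders", "order.created", {"id": 42})
print(req.get_method(), req.get_header("Content-type"))
```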

World Models

World models aim to capture the state of the environment at a specific point in time. They represent the world as the system (like a robot or AI) perceives it.

Structure: World models are more dynamic than ontologies. They include:

  • Objects: The things that exist in the environment (e.g., furniture, people, obstacles).

  • Properties: The current state of these objects (e.g., location, position, status).

  • Relationships: How objects are spatially or functionally related (e.g., "in front of," "connected to").

Ontologies and world models are both ways of representing knowledge, but they serve different purposes: an ontology defines the concepts and relationships of a domain in a static, formal way, while a world model tracks the changing state of a specific environment.

Zero Shot Video

Zero-shot in video and AI refers to the ability of a model to generate videos based on a textual description, without having seen any training data of videos with that description. This is a challenging task, as it requires the model to be able to understand the meaning of the text description and to generate a video that is consistent with the description.
