Datasets List from Dr. Alan Thompson

Dr Alan D. Thompson – LifeArchitect.ai

Datasets: https://docs.google.com/spreadsheets/d/1O5KVQW1Hx5ZAkcg8AIRjbQLQzx2wVaLl0SqUu-ir9Fs/edit#gid=484905095

Models Table: https://lifearchitect.ai/models-table/

Last updated 1 year ago