Ai Tools Research

Datasets List from Dr. Alan Thompson

Datasets: https://docs.google.com/spreadsheets/d/1O5KVQW1Hx5ZAkcg8AIRjbQLQzx2wVaLl0SqUu-ir9Fs/edit#gid=484905095

Models Table: https://lifearchitect.ai/models-table/


Last updated 1 year ago