# 3. Transformers

These models, introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017), rely on self-attention mechanisms and are highly effective across many NLP tasks. Notable variants include:

(a) Transformer (base model)

(b) BERT (Bidirectional Encoder Representations from Transformers)

(c) GPT (Generative Pretrained Transformer)

(d) T5 (Text-to-Text Transfer Transformer)

(e) BART (Bidirectional and Auto-Regressive Transformers)

(f) RoBERTa, ALBERT, DistilBERT (variants of BERT with different training strategies or model sizes)
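All of the variants above share the same core building block: scaled dot-product self-attention. As a rough illustration, here is a minimal single-head sketch in NumPy; the projection matrices `Wq`, `Wk`, `Wv` and the dimensions are illustrative assumptions, not taken from any particular model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X:          (seq_len, d_model) input embeddings
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices
    Returns:    (seq_len, d_k) context vectors.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Attention weights: softmax(Q K^T / sqrt(d_k)); each row sums to 1.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    # Each output is a weighted mixture of all value vectors.
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                      # 4 tokens, model dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Encoder-style models such as BERT apply this bidirectionally (every token attends to every other token), while decoder-style models such as GPT additionally mask the scores so each token attends only to earlier positions.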
