3. Transformers

These models, introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017), rely on self-attention mechanisms and are highly effective across a wide range of NLP tasks. Variants include (a sketch of self-attention follows the list):

(a) Transformer (base model)

(b) BERT (Bidirectional Encoder Representations from Transformers)

(c) GPT (Generative Pre-trained Transformer)

(d) T5 (Text-to-Text Transfer Transformer)

(e) BART (Bidirectional and Auto-Regressive Transformers)

(f) RoBERTa, ALBERT, DistilBERT (variants of BERT with different training strategies or model sizes)
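
To make the self-attention mechanism concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation shared by all the variants above. The shapes, weight matrices, and function names are illustrative assumptions for a single attention head, not taken from any particular implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k) projections.
    q = x @ w_q                        # queries
    k = x @ w_k                        # keys
    v = x @ w_v                        # values
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)    # (seq_len, seq_len) similarities
    weights = softmax(scores, axis=-1) # each row sums to 1
    return weights @ v                 # weighted sum of value vectors

# Toy usage: 4 tokens, model width 8, attention width 4 (all hypothetical).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 4)
```

Real models stack many such heads (multi-head attention) with feed-forward layers, residual connections, and layer normalization on top of this operation.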
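
In practice, these variants are often used through a common interface. As one illustration, assuming the Hugging Face transformers library (with PyTorch installed), the sketch below loads a BERT checkpoint; the other variants follow the same from_pretrained pattern with different checkpoint names.

```python
from transformers import AutoModel, AutoTokenizer

# "bert-base-uncased" is one public checkpoint; GPT, T5, BART, RoBERTa,
# ALBERT, and DistilBERT checkpoints load the same way.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Attention is all you need.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```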
