Florence-2: Vision foundation model (Microsoft)

🔥 Microsoft drops Florence-2: a vision foundation model that slays! 🚀 All models are released on the Hugging Face Hub.

- The 230M & 770M param models outperform specialists in captioning, object detection, segmentation, VQA & more 💪
- The 230M model beats Flamingo 80B (roughly 350x bigger!) in zero-shot 🤯
- Trained with multi-task learning on the massive FLD-5B dataset: 5.4B annotations, 126M images 📊
- Fine-tuned: SOTA in captioning, VQA, referring expressions 🏆
- Beats larger models like PaLI and PaLI-X in specialist tasks 🥊

🌟 Florence-2 is clearly a unified vision representation powerhouse! 🦾 🙌 Kudos to Microsoft for advancing vision foundation models and 👏 for open-sourcing them!
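For a quick sanity check on those numbers, here is a minimal Python sketch that works out the size gap to Flamingo 80B and the annotation density of FLD-5B. It uses only the figures quoted in this post; the constant and function names are made up for illustration and do not come from any Florence-2 code. The gap to the 230M model comes out at roughly 350x.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the post.
# All names below are illustrative assumptions, not part of any Florence-2 API.

FLORENCE2_SMALL_PARAMS = 230e6   # 230M parameter model
FLORENCE2_LARGE_PARAMS = 770e6   # 770M parameter model
FLAMINGO_PARAMS = 80e9           # Flamingo 80B
FLD5B_ANNOTATIONS = 5.4e9        # annotations in FLD-5B
FLD5B_IMAGES = 126e6             # images in FLD-5B


def size_ratio(bigger: float, smaller: float) -> float:
    """Return how many times larger `bigger` is than `smaller`."""
    return bigger / smaller


small_ratio = size_ratio(FLAMINGO_PARAMS, FLORENCE2_SMALL_PARAMS)
large_ratio = size_ratio(FLAMINGO_PARAMS, FLORENCE2_LARGE_PARAMS)
annotations_per_image = FLD5B_ANNOTATIONS / FLD5B_IMAGES

print(f"Flamingo 80B vs 230M model: ~{small_ratio:.0f}x bigger")
print(f"Flamingo 80B vs 770M model: ~{large_ratio:.0f}x bigger")
print(f"FLD-5B annotations per image: ~{annotations_per_image:.1f}")
```

The last figure is an average only: 5.4B annotations over 126M images is about 43 annotations per image, which says nothing about how they are distributed across images or annotation types.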

