Florence-2: Vision foundation model (Microsoft)

🔥 Microsoft drops Florence-2, a vision foundation model that slays! 🚀 All models are released on the Hugging Face Hub.

- Available in 230M & 770M param versions for all 🤗
- Both sizes outperform specialist models in captioning, detection & more 💪
- The 230M model beats Flamingo 80B (roughly 350x bigger!) in zero-shot evaluation 🤯
- Trained on FLD-5B: 5.4B annotations across 126M images 📊
- Leverages multi-task learning on the massive FLD-5B dataset 💡
- Excels in captioning, object detection, segmentation, VQA & more 🎨🔍❓
- When fine-tuned: SOTA in captioning, VQA, referring expressions 🏆
- Beats larger models like PaLI and PaLI-X on specialist tasks 🥊

🌟 Florence-2 is clearly a unified vision representation powerhouse! 🦾

🙌 Kudos to Microsoft for advancing vision foundation models and for open-sourcing them! 👏
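The size comparisons in this post are simple ratios, so they are easy to sanity-check. Here is a minimal sketch that uses only the rounded figures quoted in the post (230M, 80B, 5.4B annotations, 126M images). The constant and function names are made up for illustration, and the inputs are not exact parameter counts.

```python
# Rounded figures as quoted in the post; these are assumptions,
# not exact parameter or dataset counts.
FLORENCE2_BASE_PARAMS = 230e6   # "230M"
FLAMINGO_PARAMS = 80e9          # "Flamingo 80B"
FLD5B_ANNOTATIONS = 5.4e9       # "5.4B annotations"
FLD5B_IMAGES = 126e6            # "126M images"


def ratio(larger: float, smaller: float) -> float:
    """Return how many times `larger` exceeds `smaller`."""
    return larger / smaller


size_ratio = ratio(FLAMINGO_PARAMS, FLORENCE2_BASE_PARAMS)
annotations_per_image = ratio(FLD5B_ANNOTATIONS, FLD5B_IMAGES)

print(f"Flamingo 80B vs Florence-2 230M: {size_ratio:.0f}x")          # 348x
print(f"FLD-5B annotations per image: {annotations_per_image:.1f}")  # 42.9
```

With these rounded inputs, Flamingo 80B comes out at about 348x the size of the 230M model, and FLD-5B averages about 43 annotations per image.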
