
Meta AI releases DINOv2
DukeRem
  #Meta #AI, the #AI #research division of Meta, has announced the release of #DINOv2, a new method for training high-performance #computer-vision #models using #self-supervised learning (#SSL). This approach lets a model learn from any collection of images without #labelled data, providing a powerful and flexible way to train AI models.

DINOv2 produces high-performance features that can be used directly as inputs for simple linear classifiers, making it suitable as a backbone for many different computer vision tasks (a minimal usage sketch appears at the end of this article). The model does not require fine-tuning, so it remains general and can serve many different tasks simultaneously. According to Meta AI, DINOv2 can learn features, such as depth estimation, that the current standard approach cannot. The company has open-sourced the model and shared an interactive demo that lets users explore its capabilities.

In a statement, Meta AI said that DINOv2 will be useful in a wide variety of applications. The company has already collaborated with the World Resources Institute to use AI to map forests, tree by tree, across areas the size of continents. The self-supervised model was trained on data from forests in North America but was found to generalize well and deliver accurate maps in other locations around the world.

The release of DINOv2 comes at a time when the performance of joint-embedding models, which train features by matching data augmentations, is plateauing: evaluation performance on ImageNet had improved by only 1% since 2021, and not much since 2019. The community instead focused on developing alternatives, such as masked-image modelling, limiting progress in the joint-embedding line of work. In addition, the DINO class of models, like other SSL methods, was difficult to train outside the classical scope of ImageNet, limiting its adoption for research.

Making progress from DINO to DINOv2 required overcoming several challenges: creating a large and curated training dataset, improving the training algorithm and implementation, and designing a functional distillation pipeline. Meta AI built a pipeline, inspired by LASER, to select useful data, and assembled a pretraining dataset of 142 million images out of 1.2 billion source images. With more training data, larger models outperform smaller ones, but training them poses two major challenges. First, increasing the model size makes training harder because of potential instability. Second, larger models require more efficient implementations. The DINOv2 training code integrates the latest mixed-precision and distributed training implementations from the cutting-edge PyTorch 2, allowing faster and more efficient iteration cycles (a sketch of the mixed-precision pattern follows below).
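The sketch below illustrates the frozen-backbone workflow described above: features are extracted from a pretrained DINOv2 model without any fine-tuning and fed to a simple linear classifier. The torch.hub entry point is the one published in the facebookresearch/dinov2 repository; the batch, image size, and class count are illustrative assumptions, not part of Meta AI's release.

```python
# Minimal sketch: a frozen DINOv2 backbone as a feature extractor
# for a simple linear classifier. Dataset, image size, and class
# count below are placeholder assumptions.
import torch
import torch.nn as nn

# Load a pretrained DINOv2 ViT-S/14 backbone (weights download on first use).
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
backbone.eval()  # the backbone stays frozen; no fine-tuning is needed

# A single linear layer on top of the global feature is often enough.
num_classes = 10                           # assumption: a 10-class task
linear_head = nn.Linear(384, num_classes)  # ViT-S/14 emits 384-dim features

# Image sides must be divisible by the patch size (14); 224 = 16 * 14.
images = torch.randn(8, 3, 224, 224)  # dummy batch standing in for real data

with torch.no_grad():             # no gradients flow into the frozen backbone
    features = backbone(images)   # shape: (8, 384)

logits = linear_head(features)    # only the linear head would be trained
print(logits.shape)               # torch.Size([8, 10])
```

Because the backbone stays frozen, the same extracted features can be reused by several downstream heads, which is what allows one model to serve many tasks at once.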
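As a rough illustration of the mixed-precision pattern the article mentions, the following is a minimal sketch built on PyTorch's standard automatic mixed precision (AMP) APIs. The model, data, and hyperparameters are placeholder assumptions; this is not Meta AI's actual training code, which additionally relies on distributed training across many GPUs.

```python
# Minimal mixed-precision training loop using standard PyTorch AMP.
# Assumes a CUDA GPU is available; everything here is illustrative.
import torch
import torch.nn as nn

model = nn.Linear(384, 10).cuda()          # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()       # rescales gradients to avoid fp16 underflow

inputs = torch.randn(8, 384, device="cuda")          # dummy features
targets = torch.randint(0, 10, (8,), device="cuda")  # dummy labels

for step in range(10):
    optimizer.zero_grad(set_to_none=True)
    # Run the forward pass in half precision where it is safe to do so.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = nn.functional.cross_entropy(model(inputs), targets)
    scaler.scale(loss).backward()   # backward pass on the scaled loss
    scaler.step(optimizer)          # unscales gradients, then steps
    scaler.update()                 # adapts the scale factor over time
print(float(loss))
```

Running the forward pass in half precision roughly halves activation memory and speeds up matrix multiplications on modern GPUs, which is one reason such implementations enable the faster iteration cycles the article describes.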