Meta SAM 2 for real-time segmentation of images and videos
Meta Releases SAM 2: Innovations in Image and Video Segmentation.
Key Points:
- Unification of Segmentation: SAM 2 segments objects in both images and videos in real time.
- Open Source: Code and model weights released under the Apache 2.0 license; the SA-V dataset under the CC BY 4.0 license.
- Advanced Performance: Significant improvements in accuracy and speed over previous models.
- Broad Applications: From video editing to data annotation for machine vision systems.
Meta announced the release of SAM 2, an advanced version of the Meta Segment Anything Model, which extends object segmentation capabilities from images to video, operating in real time. SAM 2 represents a significant step toward unified image and video segmentation, offering new possibilities for both practical applications and scientific research.
SAM 2 has been released under the open-source Apache 2.0 license, allowing anyone to access the code and model weights. In addition, Meta shared the SA-V dataset, consisting of about 51,000 videos and more than 600,000 masklets (spatio-temporal masks), under a CC BY 4.0 license, making it one of the largest video segmentation datasets available.
The new model is distinguished by its ability to segment objects in any video or image, including objects it has never seen before, thanks to zero-shot generalization. This allows SAM 2 to be used without custom fine-tuning, enabling a wide range of applications.
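In practice, this zero-shot use amounts to loading a pretrained checkpoint and prompting the model with a click or box on a new image. Below is a minimal sketch following the usage examples in Meta's sam2 repository; the checkpoint and config file names are illustrative and should be matched to the release you actually download.

```python
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Illustrative paths; use the config/checkpoint pair you downloaded.
checkpoint = "./checkpoints/sam2_hiera_large.pt"
model_cfg = "sam2_hiera_l.yaml"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

image = np.array(Image.open("photo.jpg").convert("RGB"))

with torch.inference_mode():
    predictor.set_image(image)  # compute the image embedding once
    # Prompt with a single foreground click at (x, y); label 1 = foreground.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
    )

best_mask = masks[np.argmax(scores)]  # keep the highest-scoring candidate mask
```

Because the embedding is computed once per image, additional clicks on the same image only rerun the lightweight mask decoder, which is what makes interactive refinement fast.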
SAM 2 has potential uses in a variety of areas. It can be used in video editing to create new effects, improve visual data annotation tools, and assist in building advanced computer vision systems. Its fast and accurate performance makes it particularly useful in applications requiring real-time segmentation, such as robotics and autonomous vehicles.
The SA-V dataset was developed with an interactive model-in-the-loop approach, in which human annotators used SAM 2 to create masklets in videos. This iterative cycle improved both the model and the dataset, producing a data collection significantly larger and more diverse than previous datasets.
SAM 2 introduces a memory architecture to handle segmentation in videos: it stores information about the segmented objects and uses it to condition predictions on subsequent frames. This approach allows segmentation accuracy to be maintained even through occlusions and illumination changes.
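From the user's side, this memory mechanism surfaces as a simple workflow: prompt an object on one frame, then propagate the mask through the rest of the video. The sketch below follows the video predictor examples in the sam2 repository; exact function names (e.g. add_new_points_or_box) and paths may differ slightly between releases.

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

# Illustrative paths; adjust to the checkpoint/config you downloaded.
checkpoint = "./checkpoints/sam2_hiera_large.pt"
model_cfg = "sam2_hiera_l.yaml"
predictor = build_sam2_video_predictor(model_cfg, checkpoint)

with torch.inference_mode():
    # init_state loads the frames and allocates the per-object memory bank.
    state = predictor.init_state(video_path="./video_frames")

    # Prompt one object with a single foreground click on the first frame.
    _, obj_ids, _ = predictor.add_new_points_or_box(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[210, 350]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),  # 1 = foreground click
    )

    # Propagate through the video: each frame's prediction is conditioned
    # on memories of earlier frames, which is what carries the object
    # through occlusions and appearance changes.
    segments = {}
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        segments[frame_idx] = (mask_logits[0] > 0.0).cpu().numpy()
```

If the object is lost (for example after a long occlusion), an additional corrective click on a later frame is added to the same state and propagation is simply rerun.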
SAM 2's results show superior performance compared with previous models, with significant improvements in interactive video segmentation and a reduction in the human interaction time required. Its ability to segment in real time at about 44 frames per second makes it ideal for applications that require immediate response.
The release of SAM 2 and the SA-V dataset is an important contribution to the artificial intelligence community, promoting further research and development in this field. Meta invites researchers and developers to explore the potential of SAM 2 by experimenting with new applications and use cases that can benefit from its advanced segmentation capabilities.
With SAM 2, Meta continues to push the boundaries of computer vision, offering powerful and accessible tools that can have a significant impact on various fields, from creativity to scientific and medical research.