New method for image generation: high quality with fewer resources | Turtles AI
An innovative image generation technique yields high-quality results with reduced computational cost, ensuring greater operational efficiency. Combining two distinct approaches has made it possible to develop a system that runs on common devices, such as laptops and smartphones, while significantly accelerating processing times.
Key points:
- Hybrid approach between autoregressive and diffusion models to improve quality and speed.
- Reduced computational consumption for execution on standard devices.
- Applications in areas such as simulation, robotics and visual design.
- Compatibility with multimodal models for advanced integration with AI systems.
AI image generation has taken a step forward with a hybrid method that combines the speed of autoregressive models with the accuracy of diffusion models. The new system, called HART (Hybrid Autoregressive Transformer) and developed by a team of researchers from MIT and NVIDIA, overcomes the limitations of traditional methods, delivering images of quality comparable to or higher than the most advanced diffusion models at a generation speed up to nine times faster.
Diffusion models, such as Stable Diffusion and DALL-E, produce detailed images through an iterative noise-removal process that is computationally expensive. In contrast, autoregressive models predict the image sequentially, token by token, which is faster but more error-prone, because compressing the data into discrete tokens loses information. HART addresses this by using an autoregressive model to outline the general structure of the image and a lightweight diffusion model to refine the details through residual tokens, which recover the information lost in the first phase. This drastically reduces the number of iterations needed for generation, improving the overall efficiency of the system.
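The two-stage idea described above can be sketched in a few lines of Python. Everything here (function names, token counts, the toy "denoiser") is an illustrative assumption for exposition, not HART's actual implementation:

```python
# Minimal sketch of a two-stage hybrid generator in the spirit of HART:
# stage 1 predicts a coarse image as discrete tokens; stage 2 refines it
# with a few residual-denoising steps. Purely illustrative.
import numpy as np

rng = np.random.default_rng(0)

def autoregressive_stage(num_tokens: int, vocab_size: int) -> np.ndarray:
    """Stage 1: predict the image as a sequence of discrete tokens.

    A real model scores the vocabulary conditioned on previous tokens;
    here we fake it with random draws to keep the sketch self-contained.
    """
    tokens = [rng.integers(vocab_size) for _ in range(num_tokens)]
    return np.array(tokens)

def decode_tokens(tokens: np.ndarray, vocab_size: int) -> np.ndarray:
    """Map discrete tokens back to continuous pixel-like values in [0, 1].

    Quantization into tokens loses information; the residual stage
    exists precisely to recover that lost detail.
    """
    return tokens / (vocab_size - 1)

def diffusion_refine(coarse: np.ndarray, steps: int = 8) -> np.ndarray:
    """Stage 2: a lightweight denoiser predicts residual detail.

    Refining residuals needs far fewer steps than generating the whole
    image from pure noise, which is where the speedup comes from.
    """
    residual = rng.normal(scale=0.1, size=coarse.shape)
    for _ in range(steps):
        residual *= 0.5  # each step shrinks the remaining predicted noise
    return coarse + residual

vocab_size = 1024
tokens = autoregressive_stage(num_tokens=64, vocab_size=vocab_size)
coarse = decode_tokens(tokens, vocab_size)
image = diffusion_refine(coarse)
print(image.shape)  # (64,)
```

The design point this sketch captures is the division of labor: the sequential pass fixes global structure cheaply, and the diffusion pass only has to model a small residual rather than the full image distribution.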
A key aspect of this methodology is the ability to run the model on everyday devices, such as commercial laptops and smartphones, without the need for specialized hardware. This feature opens the way to a wide range of applications, from the creation of simulated environments for training autonomous vehicles to graphic design for video games and virtual scenarios. Furthermore, the structure of HART facilitates integration with multimodal AI models, which combine text and images, improving human-machine interactions in complex contexts.
From a technical point of view, HART pairs a 700-million-parameter autoregressive transformer with a 37-million-parameter diffusion model, achieving performance comparable to diffusion models with 2 billion parameters while improving computational efficiency by 31%. This architecture overcomes the difficulties of integrating the two approaches by invoking the diffusion model only in the final phase, to generate the missing details. The synergy between the two components has proven fundamental in avoiding cumulative errors, ensuring high fidelity in the resulting images.
Future prospects include extending the method to video and audio generation, further expanding the potential applications of this technology. The project has received support from the MIT-IBM Watson AI Lab, the MIT AI Hardware Program, Amazon Science Hub and the US National Science Foundation, with GPU infrastructure provided by NVIDIA. The research will be presented at the International Conference on Learning Representations, highlighting the impact of this approach in the generative AI landscape.
A further step towards efficiency and versatility in the field of digital imaging.