Apple, in collaboration with researchers from the University of California, Santa Barbara, has released an open-source AI model called “MGIE” (MLLM-Guided Image Editing). The model uses multimodal large language models (MLLMs) to interpret natural-language commands and carry out pixel-level image edits. MGIE handles a range of editing scenarios, from simple colour adjustments to complex object manipulations, and can perform both global and local edits. It derives a clear, explicit instruction from the user’s input, generates a latent representation of the desired edit, and uses that representation to guide pixel-level manipulation.

The model is available as an open-source project on GitHub, and users can also try it through a web demo hosted on Hugging Face Spaces.

MGIE marks a significant advance in instruction-based image editing, demonstrating how MLLMs can improve image editing and open new possibilities for cross-modal interaction and communication. It could be applied in areas such as social media, e-commerce, education, entertainment, and art, helping users express their ideas and emotions through images. The release also highlights Apple’s growing strength in AI research and development, and shows how AI can enhance everyday creative tasks. However, experts say there is still plenty of work ahead to improve multimodal AI systems.


Image credit: VentureBeat, made with Midjourney