Google Launches AI for Virtual Clothing Try-On Experience
Google has announced a new feature that could transform online clothes shopping. Called "virtual try-on," it lets users see clothes on a model before buying.
With this feature, users can see in detail how a garment fits a model's body, including how it drapes, folds, clings, stretches, and wrinkles according to the model's body shape.
To bring this feature to life, artificial intelligence (AI) researchers at Google Shopping have developed a new generative AI model that produces realistic images of clothing worn by humans.
Let's take a closer look at this new AI model and how exactly the virtual try-on (VTO) feature works.
Our new virtual try-on feature uses a technique called diffusion to show you what clothes look like on a wide range of people. Learn more about the tech that's making it easier for you to get a better sense of what clothes will look like on you → https://t.co/MbhscWYUml
— Google (@Google) June 15, 2023
The idea of virtual try-on (VTO) was famously inspired by the movie "Clueless," and the technique has advanced significantly since then. Existing approaches rely on geometric transformations: a clothing image is cut out, pasted onto a person, and warped to match the body's silhouette. The results are still not entirely realistic. The clothing does not adapt convincingly to the body shape, and visual flaws such as misplaced folds make it look shapeless and unnatural.
When Google set out to build the new VTO feature, the primary goal was therefore to generate high-quality, realistic images of clothing. The team arrived at a new approach built on an AI technique called diffusion.
To understand how this model works, let's first explain diffusion. Diffusion is a process of gradually adding random "noise" to the pixels of an image until the image becomes unrecognizable, and then removing that noise step by step until the original image is reconstructed. Text-to-image models like Imagen combine diffusion with text encoded by a large language model (LLM) to generate realistic images from the text a user enters.
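The noising and denoising described above can be sketched in a few lines of Python. This is only an illustration, not Google's implementation: the linear noise schedule, the function names, and the four-pixel "image" are all assumptions, and the true noise stands in for the estimate that a trained network would normally provide.

```python
import math
import random

def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal left after each step (assumed linear schedule)."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(pixels, alpha_bar, rng):
    """Forward direction: blend the clean pixels with Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    noisy = [signal_scale * p + noise_scale * n for p, n in zip(pixels, noise)]
    return noisy, noise

def remove_noise(noisy, noise_estimate, alpha_bar):
    """Reverse direction: undo the blend, given an estimate of the noise."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [(x - noise_scale * n) / signal_scale for x, n in zip(noisy, noise_estimate)]

rng = random.Random(0)
image = [0.1, 0.5, 0.9, 0.3]          # toy "image" of four pixel values
alpha_bars = alpha_bar_schedule(1000)

# After the final step almost none of the original signal remains.
noisy, true_noise = add_noise(image, alpha_bars[-1], rng)

# A trained model would predict the noise; here the true noise is reused
# to show that a good estimate is enough to recover the image.
restored = remove_noise(noisy, true_noise, alpha_bars[-1])
```

In a real diffusion model the hard part is learning the noise estimate; the arithmetic around it stays this simple.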
Inspired by Imagen, Google decided to apply diffusion to VTO, with one modification. Instead of using text as input during diffusion, Google uses a pair of images: one of the garment and one of a person. Each image is fed into its own neural network (a U-Net), and the two networks share information through a process called "cross-attention" to produce an output image of the person realistically wearing the garment. This combination of image-based diffusion and cross-attention makes up Google's new AI model.
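To give a feel for cross-attention, here is a minimal sketch in which features from the person image act as queries and features from the garment image act as keys and values. It is a simplification under stated assumptions: real models apply learned projections inside the U-Nets, and the two-dimensional feature vectors and function names below are invented for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(person_feats, garment_feats):
    """Each person feature gathers a weighted mix of the garment features."""
    dim = len(garment_feats[0])
    outputs, all_weights = [], []
    for query in person_feats:
        # Similarity between this person feature and every garment feature.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in garment_feats
        ]
        weights = softmax(scores)
        # Weighted average of the garment features (used as the values).
        mixed = [
            sum(w * value[i] for w, value in zip(weights, garment_feats))
            for i in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical toy features: two locations on the person, three on the garment.
person_feats = [[1.0, 0.0], [0.0, 1.0]]
garment_feats = [[4.0, 0.0], [0.0, 4.0], [1.0, 1.0]]

outputs, weights = cross_attention(person_feats, garment_feats)
```

Each row of `weights` shows which parts of the garment a given part of the person "looks at", which is how information from one image can shape the other.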
To make the VTO feature as useful and realistic as possible, the new AI model was trained rigorously. Instead of relying on an LLM as Imagen does, however, the training drew on Google's Shopping Graph, which Google describes as the world's most comprehensive collection of product, seller, brand, review, and inventory data.
Google trained the model on many pairs of images, each pair showing the same person wearing the same garment in two different poses: for example, one image of someone wearing a shirt while standing sideways and another of them facing forward.
In this scenario, the AI model learns to match the shape of the shirt in the sideways pose with the person in the front-facing pose and vice versa, eventually generating realistic images of the shirt on that person from all angles.
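The pairing scheme described above can be sketched as follows. This is an assumption-laden illustration rather than Google's data pipeline: the record fields, file names, and the `build_training_pairs` helper are all hypothetical.

```python
from collections import defaultdict
from itertools import permutations

def build_training_pairs(photos):
    """Pair up photos of the same person in the same garment in different poses."""
    groups = defaultdict(list)
    for photo in photos:
        groups[(photo["person"], photo["garment"])].append(photo)

    pairs = []
    for group in groups.values():
        # permutations yields both orders, so the model sees
        # sideways -> front as well as front -> sideways.
        for source, target in permutations(group, 2):
            if source["pose"] != target["pose"]:
                pairs.append((source["file"], target["file"]))
    return pairs

# Hypothetical catalog records.
photos = [
    {"person": "A", "garment": "shirt-1", "pose": "sideways", "file": "a_side.jpg"},
    {"person": "A", "garment": "shirt-1", "pose": "front", "file": "a_front.jpg"},
    {"person": "B", "garment": "shirt-1", "pose": "front", "file": "b_front.jpg"},
]

pairs = build_training_pairs(photos)
```

Person B contributes no pairs here because only one pose is available, which hints at why a large and varied image collection matters for this kind of training.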
To improve quality, Google repeated this process using millions of random image pairs covering different garments and people. The result lets users see how a top looks on the model they choose.
Starting today, users can use the virtual try-on feature for women's tops from various brands in Google's Shopping Graph, including Anthropologie, LOFT, H&M, and Everlane. Over time, Google hopes to make the tool more accurate and to expand it to more of users' favorite brands.