Part 3: The 3-Engine Showdown — PASTA-GAN++ vs Nano Banana vs Vertex AI VTO
Introduction
Part 2 established that Nano Banana can perform virtual try-on effectively in isolation. But how does it compare to Google’s dedicated Vertex AI Virtual Try-On model? And how do both compare to the original PASTA-GAN++ system?
This final article documents 12 additional test cases designed to answer these questions. The comparison script (compare_vto.py) generates side-by-side images with all engines processing the same inputs — making the differences impossible to miss.
Source code: github.com/matu79go/metafit
Setting Up the Comparison
The comparison system runs Nano Banana and Vertex AI VTO against the same image pair, then composites the results into a single side-by-side image. For three-engine comparisons, PASTA-GAN++ results from the legacy system were included as an additional panel.
from PIL import Image

def create_comparison(person_img, clothing_img, nano_result, vto_result,
                      pasta_result=None, description=""):
    # The two inputs always come first; engine results are appended only
    # when that engine actually produced an image.
    panels = [person_img, clothing_img]
    labels = ["Person", "Clothing"]
    if pasta_result is not None:
        panels.append(pasta_result)
        labels.append("PASTA-GAN++")
    if nano_result is not None:
        panels.append(nano_result)
        labels.append("Nano Banana")
    if vto_result is not None:
        panels.append(vto_result)
        labels.append("Vertex VTO")
    # Resize all panels to the same height, preserving aspect ratio
    target_h = 800
    resized = []
    for panel in panels:
        ratio = target_h / panel.height
        resized.append(panel.resize(
            (int(panel.width * ratio), target_h), Image.LANCZOS
        ))
    # ... compose into single image
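The compose step is elided above. As a rough sketch of what it involves (not the actual code from compare_vto.py), the layout arithmetic can be separated from the drawing. The function name `layout_panels` and the `gap` parameter are assumptions made for this illustration.

```python
from typing import List, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def layout_panels(sizes: List[Size], target_h: int = 800, gap: int = 10):
    """Hypothetical helper: scale panels to a shared height and place them
    left to right.

    Returns (scaled_sizes, paste_offsets, canvas_size).
    """
    # Scale every panel to target_h while keeping its aspect ratio,
    # mirroring the resize loop in create_comparison.
    scaled = []
    for w, h in sizes:
        ratio = target_h / h
        scaled.append((int(w * ratio), target_h))

    # Walk left to right, recording the top-left corner of each panel.
    offsets = []
    x = 0
    for w, _ in scaled:
        offsets.append((x, 0))
        x += w + gap

    # Drop the trailing gap after the last panel.
    canvas_w = x - gap if scaled else 0
    return scaled, offsets, (canvas_w, target_h)
```

With Pillow, the result could then be drawn by creating a canvas with `Image.new("RGB", canvas_size, "white")` and calling `canvas.paste(panel, offset)` for each resized panel.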
Nano Banana vs Vertex VTO: Clothing Mode
The first set of tests compared the two engines in clothing mode — applying a flat product image to a person photo. This is Vertex VTO’s primary use case.
Test: Woman + Red Dress

Left to right: Target person, Clothing product, Nano Banana result, Vertex VTO result
| Aspect | Nano Banana | Vertex AI VTO |
|---|---|---|
| Color accuracy | Shifted from red to pink | Exact red preserved |
| Design fidelity | Changed to long pleated dress | Maintained original length and shape |
| Pose preservation | Good | Good |
Vertex VTO wins clearly. Nano Banana “interpreted” the dress design and altered it significantly. Vertex VTO reproduced the exact product as shown in the image.
Test: Man + Hoodie

Left to right: Target person, Clothing product, Nano Banana result, Vertex VTO result
Both engines produced good results. Vertex VTO maintained the background more accurately.
Test: Man + Suit

Left to right: Target person, Clothing product, Nano Banana result, Vertex VTO result
Both engines performed comparably. For simple, well-defined garments, the difference between the two was minimal.
Clothing Mode Summary
| Pattern | Nano Banana | Vertex AI VTO |
|---|---|---|
| Distinctive designs (dresses) | Tends to alter design | Faithful reproduction |
| Simple garments (T-shirts, hoodies) | Good | Good |
| Background preservation | Some variation | More consistent |
For e-commerce (EC) applications where product fidelity is critical, Vertex VTO is the clear choice.
Nano Banana vs Vertex VTO: Transfer Mode
Transfer mode — extracting clothing from one person and applying to another — revealed fundamental differences between the two engines.
Test: Dance Pose Transfer

Left to right: Target person, Source person (clothing to extract), Nano Banana result, Vertex VTO result
- Target: Woman in dance pose, wearing tank top
- Source: Standing woman in T-shirt
| Aspect | Nano Banana | Vertex AI VTO |
|---|---|---|
| Clothing extraction | Correctly extracted source’s T-shirt and applied it | Reverted to target’s original tank top |
| Shoe transfer | Did not transfer shoes | Transferred shoes accurately |
This was a defining discovery. Vertex VTO is designed for product images (flat, white background). When given a person image as the “product,” it cannot reliably extract the clothing — it defaults to the target’s existing garment.
For transfer mode, Nano Banana is the only viable option. Vertex VTO simply does not support this use case.
Safety Filter Differences

Left to right: Target person, Source person, Nano Banana result, Vertex VTO result
- Target: Curvy woman (159cm)
- Source: Woman in dance pose, exposed clothing
| Aspect | Nano Banana | Vertex AI VTO |
|---|---|---|
| Result | Blocked by IMAGE_SAFETY | Succeeded |
| Body preservation | — | Maintained body shape |
Nano Banana’s safety filter is stricter than VTO’s for exposed clothing. This creates a practical gap for fashion applications where sportswear, swimwear, and evening wear are standard product categories.
However, VTO’s safety filter showed inconsistencies — blocking female models in some contexts while allowing male models in similar states of undress.
The Three-Engine Showdown
The most revealing tests added PASTA-GAN++ results alongside the two generative AI engines. These tests specifically targeted scenarios where PASTA-GAN++ had historically struggled: underrepresented body types and complex garment patterns.
Test: Large Male + Border Sweater (5-Panel Comparison)

Left to right: Source person, PASTA-GAN++ result, Nano Banana result (Vertex VTO failed — clothing was not transferred)
- Target: Large-build man
- Source: Slim man in border sweater and khaki shorts
| Aspect | PASTA-GAN++ | Nano Banana | Vertex AI VTO |
|---|---|---|---|
| Border pattern | Reproduced | Clean reproduction | — |
| Body preservation | Slightly slimmed | Faithful to original | — |
| Shorts transfer | Transferred | Transferred | — |
| Face quality | Degraded | High quality | — |
| Overall | Functional | Excellent | Failed — clothing not transferred |
Vertex VTO failed entirely. The API returned an image, but the subject’s clothing was unchanged: the model could not interpret the source person image as a “product.” This matches the dance pose result and indicates that VTO cannot reliably perform transfer mode.
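Because the API returns an image even when nothing was transferred, a silent failure like this is easy to miss in a batch run. One possible guard, sketched here under assumptions and not part of the original comparison script, is to flag results that are nearly identical to the target photo. The function names and the threshold of 5.0 are hypothetical, and the threshold would need tuning on real outputs.

```python
def mean_abs_diff(a, b) -> float:
    """Mean absolute difference between two equal-length pixel sequences.

    Each sequence holds grayscale values in the range 0-255, for example
    the bytes of a small grayscale thumbnail.
    """
    if len(a) != len(b):
        raise ValueError("pixel sequences must have the same length")
    if len(a) == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def looks_unchanged(target_pixels, result_pixels, threshold: float = 5.0) -> bool:
    """Return True when the result is almost the same image as the target.

    The threshold is an assumed value for illustration only.
    """
    return mean_abs_diff(target_pixels, result_pixels) < threshold
```

With Pillow, comparable pixel sequences could be obtained from both images via `img.convert("L").resize((64, 64)).tobytes()`. A whole-image difference is a crude signal, since a changed garment covers only part of the frame, so a flagged result should still be checked by eye.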
Tests 10-12: Curvy Woman (159cm) × 3 Outfits — The Definitive Comparison
These three tests used the same curvy female subject with different clothing sources, providing the most controlled comparison of all three engines.
Test 10: Floral Shirt + Denim (Desigual)

Left to right: Target person, Source person, PASTA-GAN++, Nano Banana, Vertex VTO
| Aspect | PASTA-GAN++ | Nano Banana | Vertex AI VTO |
|---|---|---|---|
| Floral pattern | Collapsed — garment unrecognizable | Clean reproduction | Reasonable |
| Body preservation | Slimmed | Faithful | Pulled toward source pose |
| Denim reproduction | Failed | Accurate | Reasonable |
Test 11: Grey Sweatshirt + Black Pants (H&M)

Left to right: Target person, Source person, PASTA-GAN++, Nano Banana, Vertex VTO
| Aspect | PASTA-GAN++ | Nano Banana | Vertex AI VTO |
|---|---|---|---|
| Garment reproduction | Collapsed | Accurate | Reasonable |
| Body preservation | Slimmed | Faithful | Pulled toward source pose |
Test 12: White T-shirt + Jeans (Male → Female)

Left to right: Target person, Source person, PASTA-GAN++, Nano Banana, Vertex VTO
| Aspect | PASTA-GAN++ | Nano Banana | Vertex AI VTO |
|---|---|---|---|
| Garment reproduction | Collapsed | Accurate | Accurate |
| Body preservation | Slimmed | Legs faithfully reproduced | Pulled toward source pose |
| Cross-gender adaptation | Failed | Natural | Reasonable |
Nano Banana’s standout result: the jeans were rendered to match the 159 cm subject’s actual leg proportions. The fabric stretched and contoured plausibly around the thighs, which suggests the model adapts garments to the wearer’s body shape rather than applying a generic template.
Three-Engine Summary
The pattern across all curvy body type tests was unambiguous:
PASTA-GAN++: Failed comprehensively. In all three curvy-subject tests, garments collapsed into visual noise and the subject’s body was systematically slimmed. This likely reflects GAN training data bias: the model appears to have learned primarily from slim models and could not generalize beyond that distribution.
Nano Banana: Consistently excellent in transfer mode. Body shape was faithfully preserved, complex patterns were reproduced accurately, and cross-gender adaptation worked naturally.
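Vertex AI VTO: Not usable for transfer mode. It left the target’s clothing unchanged in the dance pose and border sweater tests, and when it did produce a transfer (Tests 10-12) it pulled the subject’s body toward the source person’s pose. In clothing mode, it was reliable.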
Final Comparison Matrix
| Capability | PASTA-GAN++ | Nano Banana | Vertex AI VTO |
|---|---|---|---|
| Clothing mode (product → person) | N/A | △–○ | ◎ |
| Transfer mode (person → person) | △ | ◎ | × Not supported |
| Body type preservation | × Slims subjects | ◎ | ○ (clothing mode) |
| Complex patterns | × Collapses | ◎ | ○ |
| Face quality | △ | ◎ | ◎ |
| Shoe/accessory transfer | N/A | × | ◎ |
| Color fidelity (clothing mode) | N/A | △ Alters colors | ◎ |
| Safety filter | None (local) | △ Strict | △ Inconsistent |
| Setup complexity | High (Docker + GPU) | Low (API key) | Medium (GCP) |
| Cost per image | GPU compute | ~Free tier | ~$0.02-0.04 |
| Commercial license | × Non-commercial | ◎ | ◎ |
| Cross-gender | × | ◎ | ○ |
The Optimal Architecture
Based on all 28 test cases (16 from Part 2 + 12 from this article), the recommended production architecture is:
| Input Type | Engine | Mode | Best For |
|---|---|---|---|
| Product image (flat, white background) | Vertex AI VTO | Clothing mode | EC product pages, color fidelity, shoes/accessories |
| Person image (wearing clothes) | Nano Banana (Gemini) | Transfer mode | Social/UGC, body diversity, cross-gender adaptation |
Neither engine alone covers all use cases. The hybrid approach — routing by input type — delivers the best results across the full range of virtual try-on scenarios.
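Routing by input type can be sketched as a small lookup table. This is an illustration of the idea only: the enum, the engine identifiers, and the `route` function are hypothetical names, and how the input type gets detected (by a classifier or by the caller) is left open.

```python
from enum import Enum
from typing import Tuple


class InputType(Enum):
    PRODUCT = "product"  # flat product shot on a plain background
    PERSON = "person"    # photo of a person wearing the garment


# Routing table taken from the recommendation above.
# The engine and mode strings are placeholder identifiers.
ROUTES = {
    InputType.PRODUCT: ("vertex_vto", "clothing"),
    InputType.PERSON: ("nano_banana", "transfer"),
}


def route(input_type: InputType) -> Tuple[str, str]:
    """Return the (engine, mode) pair for a given garment input type."""
    return ROUTES[input_type]
```

Keeping the mapping in one table means a new engine or mode changes a single entry, while the calling code stays the same.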
Reflecting on 20 Years
I first imagined a virtual try-on system roughly two decades ago. At the time, it was a photo booth concept — hardware that did not yet exist, powered by AI that had not yet been invented.
Over those years, I built implementations using PF-AFN, PIFu, and PASTA-GAN++. Each represented the best available technology of its era. Each produced results that were promising but ultimately insufficient: garments that collapsed on diverse body types, processing that required GPU clusters, licenses that prevented commercial use.
The arrival of generative AI changed the equation. An API key and a well-crafted prompt now produce results that surpass everything the multi-stage GAN pipeline could achieve. Body diversity — the problem I could not solve with GANs — is handled naturally. Processing speed went from minutes per image to seconds. Infrastructure went from Docker + NVIDIA GPU to a single HTTP request.
Challenges remain. Safety filters block legitimate fashion content. The poses of hands and feet sometimes shift. Real-time processing at scale requires cost optimization. But these are engineering problems with clear solutions, not fundamental limitations of the approach.
The 20-year-old dream of seeing yourself in clothes before buying is no longer a question of if. The generative AI era has made it a question of how well — and the answer is already remarkably good.
META FIT GenAI Series:
- Part 1: From GANs to Generative AI
- Part 2: Nano Banana Virtual Try-On — 16 Test Cases
- Part 3: The 3-Engine Showdown (You are here)