Part 3: The 3-Engine Showdown — PASTA-GAN++ vs Nano Banana vs Vertex AI VTO

Introduction

Part 2 established that Nano Banana can perform virtual try-on effectively in isolation. But how does it compare to Google’s dedicated Vertex AI Virtual Try-On model? And how do both compare to the original PASTA-GAN++ system?

This final article documents 12 additional test cases designed to answer these questions. The comparison script (compare_vto.py) generates side-by-side images with all engines processing the same inputs — making the differences impossible to miss.

Source code: github.com/matu79go/metafit

Setting Up the Comparison

The comparison system runs Nano Banana and Vertex AI VTO against the same image pair, then composites the results into a single side-by-side image. For three-engine comparisons, PASTA-GAN++ results from the legacy system were included as an additional panel.

from PIL import Image


def create_comparison(person_img, clothing_img, nano_result, vto_result,
                      pasta_result=None, description=""):
    """Collect the inputs and each engine's output as labeled panels."""
    panels = [person_img, clothing_img]
    labels = ["Person", "Clothing"]

    # An engine that failed or was skipped is passed as None and gets no panel.
    if pasta_result is not None:
        panels.append(pasta_result)
        labels.append("PASTA-GAN++")

    if nano_result is not None:
        panels.append(nano_result)
        labels.append("Nano Banana")

    if vto_result is not None:
        panels.append(vto_result)
        labels.append("Vertex VTO")

    # Resize all panels to the same height, preserving each aspect ratio
    target_h = 800
    resized = []
    for panel in panels:
        ratio = target_h / panel.height
        resized.append(panel.resize(
            (int(panel.width * ratio), target_h), Image.LANCZOS
        ))
    # ... compose into single image
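
The elided compose step could look roughly like the following. This is a minimal sketch, not the code from compare_vto.py: the helper names `layout_panels` and `compose_panels`, the gap between panels, and the height of the label strip are all assumptions made for illustration.

```python
def layout_panels(sizes, target_h=800, gap=10, label_h=40):
    """Compute canvas size, resized widths, and top-left corners for the panels.

    sizes: list of (width, height) tuples of the original panels.
    Hypothetical helper; gap and label_h are illustrative values.
    """
    widths = [int(w * target_h / h) for w, h in sizes]
    positions = []
    x = 0
    for w in widths:
        positions.append((x, label_h))  # panels sit below the label strip
        x += w + gap
    canvas_w = x - gap if widths else 0
    return (canvas_w, target_h + label_h), widths, positions


def compose_panels(panels, labels, target_h=800, gap=10, label_h=40):
    """Paste resized panels side by side with a text label above each one."""
    # Imported here so layout_panels can be used without Pillow installed.
    from PIL import Image, ImageDraw

    canvas_size, widths, positions = layout_panels(
        [p.size for p in panels], target_h, gap, label_h
    )
    canvas = Image.new("RGB", canvas_size, "white")
    draw = ImageDraw.Draw(canvas)
    for panel, label, w, (x, y) in zip(panels, labels, widths, positions):
        resized = panel.convert("RGB").resize((w, target_h), Image.LANCZOS)
        canvas.paste(resized, (x, y))
        draw.text((x + 5, 10), label, fill="black")  # default bitmap font
    return canvas
```

Keeping the layout arithmetic separate from the drawing makes it easy to check panel positions without loading any images.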

Nano Banana vs Vertex VTO: Clothing Mode

The first set of tests compared the two engines in clothing mode — applying a flat product image to a person photo. This is Vertex VTO’s primary use case.

Test: Woman + Red Dress

Red dress comparison

Left to right: Target person, Clothing product, Nano Banana result, Vertex VTO result

Aspect	Nano Banana	Vertex AI VTO
Color accuracy	Shifted from red to pink	Exact red preserved
Design fidelity	Changed to long pleated dress	Maintained original length and shape
Pose preservation	Good	Good

Vertex VTO wins clearly. Nano Banana “interpreted” the dress design and altered it significantly. Vertex VTO reproduced the exact product as shown in the image.

Test: Man + Hoodie

Hoodie comparison

Left to right: Target person, Clothing product, Nano Banana result, Vertex VTO result

Both engines produced good results. Vertex VTO maintained the background more accurately.

Test: Man + Suit

Suit comparison

Left to right: Target person, Clothing product, Nano Banana result, Vertex VTO result

Both engines performed comparably. For simple, well-defined garments, the difference between the two was minimal.

Clothing Mode Summary

Pattern	Nano Banana	Vertex AI VTO
Distinctive designs (dresses)	Tends to alter design	Faithful reproduction
Simple garments (T-shirts, hoodies)	Good	Good
Background preservation	Some variation	More consistent

For e-commerce applications where product fidelity is critical, Vertex VTO is the clear choice.

Nano Banana vs Vertex VTO: Transfer Mode

Transfer mode — extracting clothing from one person and applying to another — revealed fundamental differences between the two engines.

Test: Dance Pose Transfer

Transfer mode comparison

Left to right: Target person, Source person (clothing to extract), Nano Banana result, Vertex VTO result

Target: Woman in dance pose, wearing tank top
Source: Standing woman in T-shirt

Aspect	Nano Banana	Vertex AI VTO
Clothing extraction	Correctly extracted source’s T-shirt and applied it	Reverted to target’s original tank top
Shoe transfer	Did not transfer shoes	Transferred shoes accurately

This was a defining discovery. Vertex VTO is designed for product images (flat, white background). When given a person image as the “product,” it cannot reliably extract the clothing — it defaults to the target’s existing garment.

For transfer mode, Nano Banana was the only viable option in these tests. Vertex VTO is not designed for this use case.

Safety Filter Differences

Safety filter comparison

Left to right: Target person, Source person, Nano Banana result, Vertex VTO result

Target: Curvy woman (159cm)
Source: Woman in dance pose, exposed clothing

Aspect	Nano Banana	Vertex AI VTO
Result	Blocked by IMAGE_SAFETY	Succeeded
Body preservation	—	Maintained body shape

Nano Banana’s safety filter is stricter than VTO’s for exposed clothing. This creates a practical gap for fashion applications where sportswear, swimwear, and evening wear are standard product categories.

However, VTO’s safety filter showed inconsistencies — blocking female models in some contexts while allowing male models in similar states of undress.
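
One way to work around this gap is to try one engine and fall back to the other when a safety filter blocks the request. The sketch below assumes each engine is wrapped in a plain callable and that a block surfaces as a hypothetical `SafetyBlocked` exception. Neither the exception nor the wrappers come from the project code, and the real APIs report blocks in their own ways.

```python
class SafetyBlocked(Exception):
    """Hypothetical error raised by an engine wrapper when a safety filter blocks a request."""


def try_on_with_fallback(person, garment, engines):
    """Try each (name, callable) engine in order.

    Returns (name, result, blocked), where blocked lists the engines that
    refused the request before one succeeded. If every engine blocks the
    request, name and result are both None.
    """
    blocked = []
    for name, run in engines:
        try:
            return name, run(person, garment), blocked
        except SafetyBlocked as exc:
            blocked.append((name, str(exc)))  # record the reason, try the next engine
    return None, None, blocked
```

Given the transfer-mode results above, falling back to Vertex VTO is only worthwhile for inputs it can actually handle, so the engine order would need to depend on the input type.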

The Three-Engine Showdown

The most revealing tests added PASTA-GAN++ results alongside the two generative AI engines. These tests specifically targeted scenarios where PASTA-GAN++ had historically struggled: underrepresented body types and complex garment patterns.

Test: Large Male + Border Sweater (5-Panel Comparison)

3-engine comparison: border sweater on diverse body type

Left to right: Source person, PASTA-GAN++ result, Nano Banana result (Vertex VTO failed — clothing was not transferred)

Target: Large-build man
Source: Slim man in border sweater and khaki shorts

Aspect	PASTA-GAN++	Nano Banana	Vertex AI VTO
Border pattern	Reproduced	Clean reproduction	—
Body preservation	Slightly slimmed	Faithful to original	—
Shorts transfer	Transferred	Transferred	—
Face quality	Degraded	High quality	—
Overall	Functional	Excellent	Failed — clothing not transferred

Vertex VTO failed entirely. The API returned an image, but the subject’s clothing was unchanged: the model could not interpret the source person image as a “product.” This was consistent with the dance-pose result and showed that VTO could not be relied on for transfer mode.
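
Because the API returned an image without reporting a failure, this kind of result has to be caught by inspecting the output. One rough heuristic, offered here as an assumption and not something the comparison script is known to do, is to measure how much the output differs from the target photo: a near-zero difference suggests the garment was never applied. The function names and the threshold below are hypothetical.

```python
def mean_abs_diff(pixels_a, pixels_b):
    """Mean absolute per-channel difference between two equal-length RGB pixel sequences."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("images must have the same number of pixels")
    if not pixels_a:
        raise ValueError("images must not be empty")
    total = 0
    for a, b in zip(pixels_a, pixels_b):
        total += sum(abs(ca - cb) for ca, cb in zip(a, b))
    return total / (len(pixels_a) * 3)


def looks_unchanged(target_pixels, result_pixels, threshold=4.0):
    """Flag outputs that are nearly identical to the target photo.

    threshold is an illustrative value on a 0-255 scale, not a tuned one.
    """
    return mean_abs_diff(target_pixels, result_pixels) < threshold
```

Both images would need to be resized to the same dimensions first. A whole-image average is diluted by unchanged background, so restricting the comparison to the torso region would likely be more sensitive.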

Tests 10-12: Curvy Woman (159cm) × 3 Outfits — The Definitive Comparison

These three tests used the same curvy female subject with different clothing sources, providing the most controlled comparison of all three engines.

Test 10: Floral Shirt + Denim (Desigual)

3-engine: complex floral pattern on curvy body

Left to right: Target person, Source person, PASTA-GAN++, Nano Banana, Vertex VTO

Aspect	PASTA-GAN++	Nano Banana	Vertex AI VTO
Floral pattern	Collapsed — garment unrecognizable	Clean reproduction	Reasonable
Body preservation	Slimmed	Faithful	Pulled toward source pose
Denim reproduction	Failed	Accurate	Reasonable

Test 11: Grey Sweatshirt + Black Pants (H&M)

3-engine: simple garments on curvy body

Left to right: Target person, Source person, PASTA-GAN++, Nano Banana, Vertex VTO

Aspect	PASTA-GAN++	Nano Banana	Vertex AI VTO
Garment reproduction	Collapsed	Accurate	Reasonable
Body preservation	Slimmed	Faithful	Pulled toward source pose

Test 12: White T-shirt + Jeans (Male → Female)

3-engine showdown highlight: male clothing on curvy female body

Left to right: Target person, Source person, PASTA-GAN++, Nano Banana, Vertex VTO

Aspect	PASTA-GAN++	Nano Banana	Vertex AI VTO
Garment reproduction	Collapsed	Accurate	Accurate
Body preservation	Slimmed	Legs faithfully reproduced	Pulled toward source pose
Cross-gender adaptation	Failed	Natural	Reasonable

Nano Banana’s standout result: the jeans were rendered to match the 159cm subject’s actual leg proportions. The fabric stretched and contoured accurately around the thighs, which suggests that the model adapts garments to the wearer’s body shape rather than applying a generic template.

Three-Engine Summary

The pattern across all curvy body type tests was unambiguous:

PASTA-GAN++: Failed comprehensively. In every one of these tests, garments collapsed into visual noise and the subject’s body was slimmed. This likely reflects training data bias: the model appears to have learned primarily from slim models and did not generalize beyond that distribution.

Nano Banana: Consistently excellent in transfer mode. Body shape was faithfully preserved, complex patterns were reproduced accurately, and cross-gender adaptation worked naturally.

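Vertex AI VTO: Unreliable in transfer mode. It left the target’s clothing unchanged in the dance-pose and border-sweater tests, and in Tests 10-12 it produced reasonable garments but pulled the subject’s body toward the source person’s pose. In clothing mode, it was reliable.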

Final Comparison Matrix

Capability	PASTA-GAN++	Nano Banana	Vertex AI VTO
Clothing mode (product → person)	N/A	△–○	◎
Transfer mode (person → person)	△	◎	× Not supported
Body type preservation	× Slims subjects	◎	○ (clothing mode)
Complex patterns	× Collapses	◎	○
Face quality	△	◎	◎
Shoe/accessory transfer	N/A	×	◎
Color fidelity (clothing mode)	N/A	△ Alters colors	◎
Safety filter	None (local)	△ Strict	△ Inconsistent
Setup complexity	High (Docker + GPU)	Low (API key)	Medium (GCP)
Cost per image	GPU compute	~Free tier	~$0.02-0.04
Commercial license	× Non-commercial	◎	◎
Cross-gender	×	◎	○
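
To put the per-image cost in context, the arithmetic below scales the approximate Vertex AI VTO range from the table to a batch of images. The batch size is an arbitrary example, and the prices are the rough figures quoted above, not a current price list.

```python
def cost_range(num_images, low_per_image=0.02, high_per_image=0.04):
    """Return the (low, high) estimated cost in USD for a batch of images."""
    return (
        round(num_images * low_per_image, 2),
        round(num_images * high_per_image, 2),
    )


low, high = cost_range(10_000)
print(f"10,000 images: ${low:,.2f} to ${high:,.2f}")
```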

The Optimal Architecture

Based on all 28 test cases (16 from Part 2 + 12 from this article), the recommended production architecture is:

Input Type	Engine	Mode	Best For
Product image (flat, white background)	Vertex AI VTO	Clothing mode	E-commerce product pages, color fidelity, shoes/accessories
Person image (wearing clothes)	Nano Banana (Gemini)	Transfer mode	Social/UGC, body diversity, cross-gender adaptation

Neither engine alone covers all use cases. The hybrid approach — routing by input type — delivers the best results across the full range of virtual try-on scenarios.
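
The routing rule can be sketched as a small dispatch function. The enum, the function name, and the engine labels are hypothetical. Deciding whether an uploaded image is a flat product shot or a photo of a person wearing clothes is assumed to happen upstream.

```python
from enum import Enum


class InputType(Enum):
    PRODUCT_IMAGE = "product"  # flat garment on a plain background
    PERSON_IMAGE = "person"    # someone wearing the garment


# Routing table taken from the recommendation above.
ROUTES = {
    InputType.PRODUCT_IMAGE: ("Vertex AI VTO", "clothing"),
    InputType.PERSON_IMAGE: ("Nano Banana", "transfer"),
}


def route(input_type):
    """Return the (engine, mode) pair recommended for this kind of garment input."""
    try:
        return ROUTES[input_type]
    except (KeyError, TypeError):
        raise ValueError(f"unsupported input type: {input_type!r}") from None
```

Keeping the mapping in a table makes it simple to add a third route later, for example if another engine proves better for a specific garment category.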

Reflecting on 20 Years

I first imagined a virtual try-on system roughly two decades ago. At the time, it was a photo booth concept — hardware that did not yet exist, powered by AI that had not yet been invented.

Over those years, I built implementations using PF-AFN, PIFu, and PASTA-GAN++. Each represented the best available technology of its era. Each produced results that were promising but ultimately insufficient: garments that collapsed on diverse body types, processing that required GPU clusters, licenses that prevented commercial use.

The arrival of generative AI changed the equation. An API key and a well-crafted prompt now produce results that surpass everything the multi-stage GAN pipeline could achieve. Body diversity — the problem I could not solve with GANs — is handled naturally. Processing speed went from minutes per image to seconds. Infrastructure went from Docker + NVIDIA GPU to a single HTTP request.

Challenges remain. Safety filters block legitimate fashion content. Hand and foot poses sometimes shift. Real-time processing at scale requires cost optimization. But these are engineering problems with clear solutions, not fundamental limitations of the approach.

The 20-year-old dream of seeing yourself in clothes before buying is no longer a question of if. The generative AI era has made it a question of how well — and the answer is already remarkably good.

META FIT GenAI Series:

Part 1: From GANs to Generative AI
Part 2: Nano Banana Virtual Try-On — 16 Test Cases
Part 3: The 3-Engine Showdown (You are here)

Related:

Part 1: From Photo Booths to Virtual Try-On — The 20-Year Quest

Part 2: Understanding GANs — The Engine Behind Virtual Try-On

Part 3: Inside PF-AFN — The Try-On Engine in Code