Part 3: The 3-Engine Showdown — PASTA-GAN++ vs Nano Banana vs Vertex AI VTO

Introduction

Part 2 established that Nano Banana can perform virtual try-on effectively in isolation. But how does it compare to Google’s dedicated Vertex AI Virtual Try-On model? And how do both compare to the original PASTA-GAN++ system?

This final article documents 12 additional test cases designed to answer these questions. The comparison script (compare_vto.py) generates side-by-side images with all engines processing the same inputs — making the differences impossible to miss.

Source code: github.com/matu79go/metafit


Setting Up the Comparison

The comparison system runs Nano Banana and Vertex AI VTO against the same image pair, then composites the results into a single side-by-side image. For three-engine comparisons, PASTA-GAN++ results from the legacy system were included as an additional panel.

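from PIL import Image, ImageDraw


def create_comparison(person_img, clothing_img, nano_result, vto_result,
                      pasta_result=None, description=""):
    """Build a side-by-side comparison image from PIL images."""
    panels = [person_img, clothing_img]
    labels = ["Person", "Clothing"]

    # Engine results are optional: pass None for an engine that was not run
    if pasta_result is not None:
        panels.append(pasta_result)
        labels.append("PASTA-GAN++")

    if nano_result is not None:
        panels.append(nano_result)
        labels.append("Nano Banana")

    if vto_result is not None:
        panels.append(vto_result)
        labels.append("Vertex VTO")

    # Resize all panels to the same height, preserving aspect ratio
    target_h = 800
    resized = []
    for panel in panels:
        ratio = target_h / panel.height
        resized.append(panel.resize(
            (int(panel.width * ratio), target_h), Image.LANCZOS
        ))

    # Compose into a single image (a simple assumed layout: panels pasted
    # left to right, each with a text label in its top-left corner)
    canvas = Image.new("RGB", (sum(p.width for p in resized), target_h), "white")
    draw = ImageDraw.Draw(canvas)
    x = 0
    for panel, label in zip(resized, labels):
        canvas.paste(panel.convert("RGB"), (x, 0))
        draw.text((x + 10, 10), label, fill="red")
        x += panel.width
    return canvas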

Nano Banana vs Vertex VTO: Clothing Mode

The first set of tests compared the two engines in clothing mode — applying a flat product image to a person photo. This is Vertex VTO’s primary use case.

Test: Woman + Red Dress

Red dress comparison

Left to right: Target person, Clothing product, Nano Banana result, Vertex VTO result

| Aspect | Nano Banana | Vertex AI VTO |
|---|---|---|
| Color accuracy | Shifted from red to pink | Exact red preserved |
| Design fidelity | Changed to long pleated dress | Maintained original length and shape |
| Pose preservation | Good | Good |

Vertex VTO wins clearly. Nano Banana “interpreted” the dress design and altered it significantly. Vertex VTO reproduced the exact product as shown in the image.

Test: Man + Hoodie

Hoodie comparison

Left to right: Target person, Clothing product, Nano Banana result, Vertex VTO result

Both engines produced good results. Vertex VTO maintained the background more accurately.

Test: Man + Suit

Suit comparison

Left to right: Target person, Clothing product, Nano Banana result, Vertex VTO result

Both engines performed comparably. For simple, well-defined garments, the difference between the two was minimal.

Clothing Mode Summary

| Pattern | Nano Banana | Vertex AI VTO |
|---|---|---|
| Distinctive designs (dresses) | Tends to alter design | Faithful reproduction |
| Simple garments (T-shirts, hoodies) | Good | Good |
| Background preservation | Some variation | More consistent |

For e-commerce (EC) applications where product fidelity is critical, Vertex VTO is the clear choice.
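The red-to-pink shift seen in the dress test can be put on a rough numeric footing. The sketch below is not part of compare_vto.py; it compares two average garment colors in HSV space. The function name `color_drift` and the sample RGB values are illustrative assumptions, and it presumes the average colors have already been sampled from the garment region of each image.

```python
import colorsys


def color_drift(product_rgb, result_rgb):
    """Return (hue_shift_degrees, saturation_change) between two RGB colors.

    Both arguments are (r, g, b) tuples with 0-255 channels, e.g. the mean
    color of the garment region in the product image and in the result.
    """
    h1, s1, _ = colorsys.rgb_to_hsv(*(c / 255 for c in product_rgb))
    h2, s2, _ = colorsys.rgb_to_hsv(*(c / 255 for c in result_rgb))
    # Hue is circular, so take the shorter way around the color wheel.
    dh = abs(h1 - h2) * 360
    hue_shift = min(dh, 360 - dh)
    return hue_shift, s2 - s1


# Illustrative values only: a saturated red versus a washed-out pink.
hue_shift, sat_change = color_drift((200, 20, 30), (240, 150, 170))
print(f"hue shift: {hue_shift:.1f} deg, saturation change: {sat_change:+.2f}")
```

A red that drifts to pink mostly shows up as a drop in saturation rather than a large hue change, which is why the sketch reports both numbers.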


Nano Banana vs Vertex VTO: Transfer Mode

Transfer mode — extracting clothing from one person and applying to another — revealed fundamental differences between the two engines.

Test: Dance Pose Transfer

Transfer mode comparison

Left to right: Target person, Source person (clothing to extract), Nano Banana result, Vertex VTO result

  • Target: Woman in dance pose, wearing tank top
  • Source: Standing woman in T-shirt

| Aspect | Nano Banana | Vertex AI VTO |
|---|---|---|
| Clothing extraction | Correctly extracted source’s T-shirt and applied it | Reverted to target’s original tank top |
| Shoe transfer | Did not transfer shoes | Transferred shoes accurately |

This was a defining discovery. Vertex VTO is designed for product images (flat, white background). When given a person image as the “product,” it cannot reliably extract the clothing — it defaults to the target’s existing garment.

For transfer mode, Nano Banana is the only viable option. Vertex VTO simply does not support this use case.

Safety Filter Differences

Safety filter comparison

Left to right: Target person, Source person, Nano Banana result, Vertex VTO result

  • Target: Curvy woman (159cm)
  • Source: Woman in dance pose, revealing clothing

| Aspect | Nano Banana | Vertex AI VTO |
|---|---|---|
| Result | Blocked by IMAGE_SAFETY | Succeeded |
| Body preservation | | Maintained body shape |

Nano Banana’s safety filter is stricter than VTO’s for revealing clothing. This creates a practical gap for fashion applications where sportswear, swimwear, and evening wear are standard product categories.

However, VTO’s safety filter showed inconsistencies — blocking female models in some contexts while allowing male models in similar states of undress.
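One way to work around the gap between the two filters is to try one engine and fall back to the other when a request is blocked. The sketch below is engine-agnostic: `run_nano` and `run_vto` are hypothetical stand-ins for the real API wrappers, and `SafetyBlocked` is an assumed exception that such a wrapper would raise when a response reports a block such as IMAGE_SAFETY.

```python
class SafetyBlocked(Exception):
    """Assumed exception: raised by a wrapper when an engine blocks a request."""


def try_on_with_fallback(person, clothing, engines):
    """Try each (name, fn) pair in order; return (name, result) of the first success.

    Each fn(person, clothing) returns a result or raises SafetyBlocked.
    """
    blocked = []
    for name, fn in engines:
        try:
            return name, fn(person, clothing)
        except SafetyBlocked as exc:
            blocked.append(f"{name}: {exc}")
    raise SafetyBlocked("all engines blocked the request: " + "; ".join(blocked))


# Stand-in engines for demonstration; real wrappers would call the APIs.
def run_nano(person, clothing):
    raise SafetyBlocked("IMAGE_SAFETY")


def run_vto(person, clothing):
    return f"{person} wearing {clothing}"


engine, result = try_on_with_fallback(
    "target.jpg", "source.jpg", [("nano", run_nano), ("vto", run_vto)]
)
print(engine, result)
```

A fallback like this only helps where the second engine supports the mode being requested, so the engine list would need to be chosen per mode.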


The Three-Engine Showdown

The most revealing tests added PASTA-GAN++ results alongside the two generative AI engines. These tests specifically targeted scenarios where PASTA-GAN++ had historically struggled: underrepresented body types and complex garment patterns.

Test: Large Male + Border Sweater (5-Panel Comparison)

3-engine comparison: border sweater on diverse body type

Left to right: Source person, PASTA-GAN++ result, Nano Banana result (Vertex VTO failed — clothing was not transferred)

  • Target: Large-build man
  • Source: Slim man in border (horizontally striped) sweater and khaki shorts

| Aspect | PASTA-GAN++ | Nano Banana | Vertex AI VTO |
|---|---|---|---|
| Border pattern | Reproduced | Clean reproduction | |
| Body preservation | Slightly slimmed | Faithful to original | |
| Shorts transfer | Transferred | Transferred | |
| Face quality | Degraded | High quality | |
| Overall | Functional | Excellent | Failed: clothing not transferred |

Vertex VTO failed entirely. The API returned an image, but the subject’s clothing was unchanged — the model could not interpret the source person image as a “product.” This confirmed that VTO cannot perform transfer mode.
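Because the API call itself succeeds in this failure mode, it can only be caught by inspecting the output. A minimal sketch, assuming both images have already been converted to same-size grayscale byte buffers (with Pillow, for example, `img.convert("L").resize(size).tobytes()`). The function names and the threshold of 2.0 are illustrative assumptions that would need tuning on real outputs.

```python
def mean_abs_diff(a: bytes, b: bytes) -> float:
    """Mean absolute difference between two equal-length 8-bit pixel buffers."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    if not a:
        raise ValueError("buffers must not be empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def looks_unchanged(target: bytes, result: bytes, threshold: float = 2.0) -> bool:
    """Heuristic: flag a result that barely differs from the target photo."""
    return mean_abs_diff(target, result) < threshold


# Tiny synthetic buffers standing in for grayscale images.
target = bytes([100] * 16)
near_copy = bytes([101] * 16)            # almost identical to the input
changed = bytes([100] * 8 + [200] * 8)   # half of the pixels replaced

print(looks_unchanged(target, near_copy))
print(looks_unchanged(target, changed))
```

Comparing only the garment region rather than the whole frame would make the check more sensitive, since the background is expected to stay the same in a successful try-on.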

Tests 10-12: Curvy Woman (159cm) × 3 Outfits — The Definitive Comparison

These three tests used the same curvy female subject with different clothing sources, providing the most controlled comparison of all three engines.

Test 10: Floral Shirt + Denim (Desigual)

3-engine: complex floral pattern on curvy body

Left to right: Target person, Source person, PASTA-GAN++, Nano Banana, Vertex VTO

| Aspect | PASTA-GAN++ | Nano Banana | Vertex AI VTO |
|---|---|---|---|
| Floral pattern | Collapsed, garment unrecognizable | Clean reproduction | Reasonable |
| Body preservation | Slimmed | Faithful | Pulled toward source pose |
| Denim reproduction | Failed | Accurate | Reasonable |

Test 11: Grey Sweatshirt + Black Pants (H&M)

3-engine: simple garments on curvy body

Left to right: Target person, Source person, PASTA-GAN++, Nano Banana, Vertex VTO

| Aspect | PASTA-GAN++ | Nano Banana | Vertex AI VTO |
|---|---|---|---|
| Garment reproduction | Collapsed | Accurate | Reasonable |
| Body preservation | Slimmed | Faithful | Pulled toward source pose |

Test 12: White T-shirt + Jeans (Male → Female)

3-engine showdown highlight: male clothing on curvy female body

Left to right: Target person, Source person, PASTA-GAN++, Nano Banana, Vertex VTO

| Aspect | PASTA-GAN++ | Nano Banana | Vertex AI VTO |
|---|---|---|---|
| Garment reproduction | Collapsed | Accurate | Accurate |
| Body preservation | Slimmed | Legs faithfully reproduced | Pulled toward source pose |
| Cross-gender adaptation | Failed | Natural | Reasonable |

Nano Banana’s standout result: the jeans were rendered to match the 159cm subject’s actual leg proportions. The fabric stretched and contoured around the thighs, which suggests that the model adapts the garment to the body shape in front of it rather than applying a generic template.

Three-Engine Summary

The pattern across all curvy body type tests was unambiguous:

PASTA-GAN++: Failed comprehensively. In every test, garments collapsed into visual noise and the subject’s body was systematically slimmed. This reflects GAN training data bias — the model learned primarily from slim models and could not generalize beyond that distribution.

Nano Banana: Consistently excellent in transfer mode. Body shape was faithfully preserved, complex patterns were reproduced accurately, and cross-gender adaptation worked naturally.

Vertex AI VTO: Could not perform transfer mode at all. In clothing mode, it was reliable but tended to pull the subject’s pose toward the source image’s pose.


Final Comparison Matrix

| Capability | PASTA-GAN++ | Nano Banana | Vertex AI VTO |
|---|---|---|---|
| Clothing mode (product → person) | N/A | △–○ | |
| Transfer mode (person → person) | | | × Not supported |
| Body type preservation | × Slims subjects | | ○ (clothing mode) |
| Complex patterns | × Collapses | | |
| Face quality | | | |
| Shoe/accessory transfer | N/A | × | |
| Color fidelity (clothing mode) | N/A | △ Alters colors | |
| Safety filter | None (local) | △ Strict | △ Inconsistent |
| Setup complexity | High (Docker + GPU) | Low (API key) | Medium (GCP) |
| Cost per image | GPU compute | ~Free tier | ~$0.02-0.04 |
| Commercial license | × Non-commercial | | |
| Cross-gender | × | | |
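The cost row translates into a simple volume estimate. The sketch below uses the ~$0.02-0.04 per-image range listed for Vertex AI VTO in the matrix; the monthly volume is an invented example, and current Google Cloud pricing should be checked before relying on these numbers.

```python
def monthly_cost_range(images_per_month, low_per_image=0.02, high_per_image=0.04):
    """Return (low, high) estimated monthly cost in USD for a given image volume."""
    return images_per_month * low_per_image, images_per_month * high_per_image


# Hypothetical volume: 50,000 try-on images per month.
low, high = monthly_cost_range(50_000)
print(f"${low:,.0f} to ${high:,.0f} per month")
```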

The Optimal Architecture

Based on all 28 test cases (16 from Part 2 + 12 from this article), the recommended production architecture is:

| Input Type | Engine | Mode | Best For |
|---|---|---|---|
| Product image (flat, white background) | Vertex AI VTO | Clothing mode | E-commerce product pages, color fidelity, shoes/accessories |
| Person image (wearing clothes) | Nano Banana (Gemini) | Transfer mode | Social/UGC, body diversity, cross-gender adaptation |

Neither engine alone covers all use cases. The hybrid approach — routing by input type — delivers the best results across the full range of virtual try-on scenarios.
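Routing by input type can be sketched in a few lines. The enum, the engine identifiers, and the function name below are hypothetical. The sketch also assumes the caller already knows whether the garment image is a flat product shot or a photo of a person wearing the garment; classifying that automatically is outside its scope.

```python
from enum import Enum


class InputType(Enum):
    PRODUCT_IMAGE = "product"   # flat garment shot, white background
    PERSON_IMAGE = "person"     # photo of someone wearing the garment


# Mapping taken from the architecture table: input type -> (engine, mode).
ROUTES = {
    InputType.PRODUCT_IMAGE: ("vertex_vto", "clothing"),
    InputType.PERSON_IMAGE: ("nano_banana", "transfer"),
}


def route(input_type: InputType) -> tuple[str, str]:
    """Return the (engine, mode) pair recommended for this garment input."""
    return ROUTES[input_type]


print(route(InputType.PRODUCT_IMAGE))
print(route(InputType.PERSON_IMAGE))
```

Keeping the mapping in a plain dictionary makes it easy to add a third engine or change a route without touching the calling code.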


Reflecting on 20 Years

I first imagined a virtual try-on system roughly two decades ago. At the time, it was a photo booth concept — hardware that did not yet exist, powered by AI that had not yet been invented.

Over those years, I built implementations using PF-AFN, PIFu, and PASTA-GAN++. Each represented the best available technology of its era. Each produced results that were promising but ultimately insufficient: garments that collapsed on diverse body types, processing that required GPU clusters, licenses that prevented commercial use.

The arrival of generative AI changed the equation. An API key and a well-crafted prompt now produce results that surpass everything the multi-stage GAN pipeline could achieve. Body diversity — the problem I could not solve with GANs — is handled naturally. Processing speed went from minutes per image to seconds. Infrastructure went from Docker + NVIDIA GPU to a single HTTP request.

Challenges remain. Safety filters block legitimate fashion content. The positions of hands and feet sometimes shift. Real-time processing at scale requires cost optimization. But these are engineering problems with clear solutions, not fundamental limitations of the approach.

The 20-year-old dream of seeing yourself in clothes before buying is no longer a question of if. The generative AI era has made it a question of how well — and the answer is already remarkably good.

