Part 2: Nano Banana Virtual Try-On — 16 Test Cases and What They Revealed

Introduction

Part 1 covered why the migration from GANs to generative AI happened and how the new architecture works. This article documents what happened when the system was put to the test.

Over the course of a single day, 16 test cases were executed across three phases of increasing difficulty. The tests were designed to answer specific questions: Can Gemini handle body type diversity? Does it preserve garment details? What about action poses? And critically — does it need the preprocessing pipeline that the GAN approach required?

The answers were clear, and sometimes surprising.

Source code: github.com/matu79go/metafit

Test Setup

All tests used the Gemini 3 Pro Image Preview model (gemini-3-pro-image-preview), referred to as Nano Banana. No preprocessing (OpenPose, Graphonomy, MediaPipe) was used unless explicitly stated.

The pipeline for every test was identical:

# Clothing mode (product image → person)
python try_on_test.py --mode clothing \
  --person test_data/person/target.jpg \
  --clothing test_data/clothing/item.png

# Transfer mode (person → person)
python try_on_test.py --mode transfer \
  --person test_data/person/target.jpg \
  --source test_data/person/source.jpg

Two modes were tested:

Clothing mode: A flat product image (e.g., a black T-shirt on white background) is applied to a person photo
Transfer mode: Clothing is extracted from one person (source) and applied to another (target)
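
The command lines above imply a small argument contract: `--person` is always required, `--clothing` goes with clothing mode, and `--source` goes with transfer mode. A minimal sketch of that contract with `argparse` follows. It is an illustration only: the function names (`build_parser`, `parse_args`, `garment_image`) are hypothetical, and the actual `try_on_test.py` in the repository may be organized differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags shown in the commands above."""
    parser = argparse.ArgumentParser(
        description="Virtual try-on test runner (illustrative sketch)"
    )
    parser.add_argument("--mode", choices=["clothing", "transfer"], required=True)
    parser.add_argument("--person", required=True, help="target person photo")
    parser.add_argument("--clothing", help="flat product image (clothing mode)")
    parser.add_argument("--source", help="photo of a person wearing the outfit (transfer mode)")
    return parser


def parse_args(argv=None):
    """Parse arguments and enforce the per-mode requirement (an assumed rule)."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.mode == "clothing" and not args.clothing:
        parser.error("--clothing is required in clothing mode")
    if args.mode == "transfer" and not args.source:
        parser.error("--source is required in transfer mode")
    return args


def garment_image(args):
    """Return the path of the image that supplies the garment for this mode."""
    return args.clothing if args.mode == "clothing" else args.source
```

Keeping the mode check in one place means the rest of a pipeline only needs two images, the person and the garment provider, regardless of mode.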

Phase 1: Initial Tests with Noisy Images

The first four tests used the same measurement-annotated images from the original PASTA-GAN++ dataset — images with visible text labels showing height, chest, waist, and hip measurements drawn directly on the photos.

Test 1: Clothing Mode — Woman + Black T-shirt

Test 1: Clothing mode — product image (left), input person (center), result (right)

Input: 160cm woman (wearing brown T-shirt, measurement text visible) + black T-shirt product image
Result: The garment fit naturally. Body shape and pose were preserved.
Issue: The face changed slightly, and accessories were added that did not exist in the original.

Test 2-3: Clothing Mode — Man + Black T-shirt (Prompt Iteration)

Test 2: Clothing mode on a larger body type — note measurement text rendered as garment design

Input: 178cm man (large build) + black T-shirt product image
Test 2: Body type was preserved well, but the framing shifted from full-body to upper-body.
Test 3: After strengthening the face preservation prompt, face quality improved — but the measurement text on the original image was interpreted as garment design and rendered onto the T-shirt.

This was the first key insight: input image noise is not filtered — it is interpreted. The model treated measurement labels as part of the clothing design.
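
The prompt iteration in Tests 2 and 3 can be pictured as a template with optional preservation rules. The sketch below is hypothetical: the wording of every instruction and the `build_prompt` helper are assumptions for illustration, not the prompts actually used in these tests.

```python
# All instruction strings below are illustrative assumptions.
FACE_RULE = "Keep the person's face, hairstyle, and skin tone exactly as in the person image."
BASE_RULES = [
    "Preserve the person's body shape, pose, framing, and background.",
    "Do not add accessories that are not in the original person image.",
]


def build_prompt(mode, preserve_face=True):
    """Assemble a try-on prompt for 'clothing' or 'transfer' mode."""
    if mode == "clothing":
        task = "Dress the person in the first image in the garment shown in the second image."
    elif mode == "transfer":
        task = "Dress the person in the first image in the outfit worn by the person in the second image."
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    rules = list(BASE_RULES)
    if preserve_face:
        # Test 3 strengthened this kind of instruction to improve face quality.
        rules.append(FACE_RULE)
    return "\n".join([task, *rules])
```

A prompt rule of this kind addresses face drift and framing, but as Test 3 showed, it does not stop the model from interpreting annotations in the input image. That requires clean input.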

Test 4: Transfer Mode — Cross-Gender

Test 4: Transfer mode — source person's clothing extracted and applied to target person

Input: 178cm man (target) + 160cm woman in brown T-shirt and denim shorts (source)
Result: The clothing design, color, and logo transferred correctly to the male body shape. The fit was natural.
Issue: Measurement text persisted, and shoes were also transferred (not just clothing).

Phase 1 Summary

Aspect	Clothing Mode	Transfer Mode
Garment fit	Natural	Natural
Body preservation	Good	Good
Face identity	Changed slightly	Changed slightly
Composition	Sometimes shifted	Mostly maintained
Input noise tolerance	Poor — text misinterpreted	Poor — text persisted

Conclusion: The model works, but input quality matters. Clean images are essential.

Phase 2: Clean Image Tests

All Phase 1 issues pointed to one root cause: noisy input images. Phase 2 used clean photographs without measurement annotations.

All Phase 2 tests used transfer mode — the more challenging scenario where clothing must be extracted from one person and applied to another.

Test 5: Man → Man (Standard Build)

Man-to-man transfer: source clothing (left) applied to target person (center), result (right)

Input: Standing man (target) + slim man in border sweater and khaki shorts (source)
Result: Excellent. Garment design and color transferred accurately. Body shape and face both well preserved.
Issues: None.

Test 6: Child → Child

Child-to-child transfer: source clothing (left) applied to target child (center), result (right)

Input: Child A (target) + Child B (source)
Result: Excellent. The garment adapted naturally to the child’s body proportions. High design fidelity.
Issues: None.

Test 7: Woman → Soccer Player (Action Pose, Low Resolution)

Low-resolution action pose: face quality degraded in result

Input: Female soccer player (target: action pose, 320px resolution) + standing woman (source)
Result: Clothing transferred, but face quality degraded noticeably. The action pose reduced accuracy.
Issues: Face regeneration was obvious. This prompted the MediaPipe preprocessing experiment.

A face restoration postprocessing step was attempted, blending the original face back onto the generated image using LAB color correction and elliptical feather masking:

After face restoration: original face blended back

The restoration improved face quality, but the underlying cause was later identified as low resolution, not a fundamental model limitation.
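
For readers unfamiliar with the two techniques named above, here is a minimal NumPy sketch. It makes several assumptions: both face crops are already aligned, equal in size, and converted to LAB (for example with OpenCV's `cv2.cvtColor(img, cv2.COLOR_BGR2LAB)`); the color correction is simple per-channel mean and standard deviation matching, which is one common approach and not necessarily the one used in this project; and all function names are hypothetical.

```python
import numpy as np


def elliptical_feather_mask(height, width, feather=0.3):
    """Float mask in [0, 1]: 1 near the centre, fading linearly to 0 at the ellipse edge."""
    if not 0 < feather <= 1:
        raise ValueError("feather must be in (0, 1]")
    ys, xs = np.mgrid[0:height, 0:width]
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    # Normalised elliptical radius: 0 at the centre, about 1 on the inscribed ellipse.
    r = np.sqrt(((ys - cy) / (height / 2.0)) ** 2 + ((xs - cx) / (width / 2.0)) ** 2)
    return np.clip((1.0 - r) / feather, 0.0, 1.0)


def match_channel_stats(src, ref):
    """Shift and scale each channel of `src` to the mean and std of `ref`."""
    src = src.astype(np.float64)
    ref = ref.astype(np.float64)
    src_mean, src_std = src.mean(axis=(0, 1)), src.std(axis=(0, 1))
    ref_mean, ref_std = ref.mean(axis=(0, 1)), ref.std(axis=(0, 1))
    scale = ref_std / np.maximum(src_std, 1e-6)  # avoid division by zero
    return (src - src_mean) * scale + ref_mean


def blend_face(original_face_lab, generated_face_lab, feather=0.3):
    """Colour-match the original face to the generated crop, then feather-blend it in."""
    corrected = match_channel_stats(original_face_lab, generated_face_lab)
    h, w = corrected.shape[:2]
    mask = elliptical_feather_mask(h, w, feather)[..., None]
    return mask * corrected + (1.0 - mask) * generated_face_lab
```

The result is a float array that is not clipped to the valid LAB range, so a real pipeline would clip and convert back to the display color space before pasting the crop into the full image.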

Test 8: Slim → Large Build (Body Type Diversity)

Body type transfer: border sweater from slim source stretches naturally on larger target

Input: Large-build man (target) + slim man in border sweater (source)
Result: Excellent. The border sweater stretched naturally over the larger body. The garment adapted to the body type difference without distortion.
Issues: None.

This was a critical test. PASTA-GAN++ consistently failed on underrepresented body types — garments would collapse into visual artifacts, and the model would “slim” the subject toward its training distribution. Nano Banana showed no such bias.

Phase 2 Summary

Rating scale: ◎ excellent, ○ good, △ noticeable issues.

Test	Garment	Body	Face	Pose	Overall
Test 5: Man → Man	◎	◎	◎	◎	◎
Test 6: Child → Child	◎	◎	◎	◎	◎
Test 7: Woman → Soccer	○	○	△	△	△
Test 8: Slim → Large	◎	◎	◎	◎	◎

Conclusion: Clean images produce high-quality results for static poses. The one failure (Test 7) occurred on a low-resolution image — raising the question of whether resolution, not preprocessing, was the real variable.

Phase 3: High-Resolution Action Poses

Phase 2 left an open question: was Test 7’s quality degradation caused by the action pose, or by the 320px input resolution? Phase 3 answered this definitively by using high-resolution images (1000–6000px) from Unsplash with extreme action poses.

Test 9: Male Dance (Intense Motion, 6193×9290px)

High-res dance pose: source clothing (left), target person (center), result (right)

Input: Male dancer (target: intense dance pose, 6193×9290px) + slim man in border sweater (source)
Result: Excellent. The border sweater and khaki shorts transferred accurately despite the extreme dance pose. Every limb position was maintained.
Issues: None. High-resolution input eliminated the quality problems seen in Phase 2.

This single test invalidated the assumption that action poses were inherently problematic. The Phase 2 soccer player test failed because of 320px resolution, not because of the action pose.

Test 10: Female Punch Pose + Tight Top / Loose Bottom

Punch pose: source clothing (left), target person (center), result (right)

Input: Woman in punching pose (target, 3024×4032px) + standing woman in beige crop top and sweatpants (source)
Result: Excellent. The texture difference between the tight crop top and loose sweatpants was faithfully reproduced. The punch pose was maintained.

Test 11: Club Dance (Low Light + Colored Lighting)

Club dance: source clothing (left), target person (center), result (right)

Input: Woman dancing in club (target) + standing woman in pink pleated dress (source)
Result: Excellent. The pink dress transferred naturally under purple club lighting. The model handled the color interaction between garment and ambient lighting.

Test 12: Beach Dance (Sunset Backlight)

Beach dance: source clothing (left), target person (center), result (right)

Input: Woman dancing on pier at sunset (target) + standing woman in white T-shirt and skinny jeans (source)
Result: Excellent. The garment adapted to the backlit conditions naturally.

Test 13: Sitting ← Jumping (Maximum Pose Difference)

Extreme pose difference: source clothing (left), target person (center), result (right)

Input: Seated man (target: crouching) + jumping man in white T-shirt, black skinny jeans, and red sneakers (source)
Result: Excellent. The white T-shirt, black skinny jeans, and red sneakers all transferred to the crouching position. Natural wrinkles appeared at the compression points.

This was the most challenging pose combination: the source was airborne, the target was seated. The model handled it without issue.

Test 14: Stretch + Complex Pattern (Tie-dye + Character Print)

Input: Woman stretching sideways (target) + woman in Care Bears tie-dye T-shirt (source)
Result: Good. The complex tie-dye pattern transferred accurately. However, a hand covered the face in the original target image, and the generated image revealed the face: the model appeared to prefer showing faces over accurately reproducing hand-over-face poses.

Test 15: Cross-Gender — Male Suit → Female Dance

Cross-gender: source clothing (left), target person (center), result (right)

Input: Woman in dance pose (target) + man in brown suit (source)
Result: Good. The suit adapted to the female body shape naturally. Shirt cuffs were reproduced. However, the leg pose changed from the original.

IMAGE_SAFETY Filter

These three source images were blocked by Gemini's IMAGE_SAFETY filter

Several tests triggered Gemini’s IMAGE_SAFETY filter:

Three images of dance and running poses were blocked when used as the source
The common factor was revealing clothing in the source image (sportswear, dance attire)

This is a practical constraint for fashion applications, where garments with varying levels of coverage are common and legitimate.
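
In a batch pipeline, blocked generations are easier to handle when they are reported as a distinct outcome instead of a generic failure. The sketch below assumes the caller has already extracted a finish-reason name and any returned image bytes from the API response. The `TryOnOutcome` type, the `interpret_result` helper, and the use of the string `"IMAGE_SAFETY"` as the finish-reason name are assumptions based on the filter name reported above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TryOnOutcome:
    ok: bool
    reason: str
    image_bytes: Optional[bytes] = None


def interpret_result(finish_reason: str, image_bytes: Optional[bytes]) -> TryOnOutcome:
    """Classify one generation attempt so blocked inputs can be logged separately."""
    if finish_reason == "IMAGE_SAFETY":
        # Blocked by the safety filter: retrying the same images is unlikely to help.
        return TryOnOutcome(False, "blocked_by_image_safety")
    if not image_bytes:
        # The call finished but returned no image (for example, text only).
        return TryOnOutcome(False, "no_image_returned")
    return TryOnOutcome(True, "ok", image_bytes)
```

Separating the blocked case makes it possible to count how often legitimate garment photos are rejected and to route them to a fallback, such as asking for a different source image.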

Phase 3 Summary

Test	Garment	Body	Face	Pose	Overall
Test 9: Male Dance	◎	◎	◎	◎	◎
Test 10: Female Punch	◎	◎	◎	◎	◎
Test 11: Club Dance	◎	◎	◎	◎	◎
Test 12: Beach Dance	◎	◎	◎	◎	◎
Test 13: Sit ← Jump	◎	◎	◎	◎	◎
Test 14: Stretch + Pattern	◎	◎	◎	△	○
Test 15: Suit → Female	◎	◎	◎	△	○

The 14 Key Findings

Across all three phases and 15 test cases, the following patterns emerged:

1. Nano Banana alone can perform virtual try-on: no OpenPose, Graphonomy, or other preprocessing is required.
2. Transfer mode works with prompts only: the PASTA-GAN++ functionality (model-to-model re-dressing) is achievable without any trained model.
3. Prompt engineering significantly affects quality: explicit face preservation instructions improve results.
4. Input image quality is critical: noise (measurement text, annotations) is interpreted as content, not filtered.
5. Licensing is clean: no components restricted to non-commercial use are involved.
6. Clean images dramatically improve results: removing noise improved face and body preservation.
7. Body type adaptation is strong: from slim to large, the garment stretches and fits naturally.
8. Action poses are fine with high-resolution input. The earlier conclusion that action poses are problematic was revised after Phase 3.
9. Resolution is the single most important quality factor: 320px degrades quality; 1000px+ produces excellent results.
10. Preprocessing (MediaPipe) is unnecessary: Gemini alone is sufficient for high-resolution inputs.
11. Extremity poses sometimes change: a hand covering the face may be removed, and leg angles may shift.
12. Cross-gender try-on works: male suit → female produced natural results.
13. The IMAGE_SAFETY filter blocks revealing clothing: a Gemini constraint for fashion applications.
14. Physical garment behavior is understood: tight/loose texture differences, gravity-induced draping, and compression wrinkles are all rendered accurately.

The most consequential finding was #9: resolution, not preprocessing, determines quality. This single insight eliminated the need for the entire MediaPipe preprocessing step and the face restoration postprocessor. For production use, simply ensuring high-resolution input images is sufficient.
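
A resolution gate of this kind could be as small as the sketch below. The 1000px threshold comes from the observation above; applying it to the shorter side of the image is a conservative assumption, since the tests do not say which dimension matters, and the `resolution_verdict` helper is hypothetical.

```python
# Threshold taken from the "1000px+" observation; applying it to the
# shorter side is an assumption, not something the tests established.
MIN_SHORT_SIDE_PX = 1000


def resolution_verdict(width, height, min_short_side=MIN_SHORT_SIDE_PX):
    """Return 'ok' if the shorter side meets the threshold, else 'too_small'."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return "ok" if min(width, height) >= min_short_side else "too_small"
```

Image dimensions can be read with Pillow, where `Image.open(path).size` returns `(width, height)`. Inputs that fail the check can be rejected before spending an API call.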

What Comes Next

These 16 tests established that Nano Banana can perform virtual try-on effectively. But how does it compare to the dedicated Vertex AI Virtual Try-On model? And how do both compare to the original PASTA-GAN++?

Part 3 documents the head-to-head comparison across 12 additional test cases, including the three-engine showdown that definitively demonstrates the generational gap between GANs and generative AI.

META FIT GenAI Series:

Part 1: From GANs to Generative AI
Part 2: Nano Banana Virtual Try-On — 16 Test Cases (You are here)
Part 3: The 3-Engine Showdown