Part 2: Nano Banana Virtual Try-On — 16 Test Cases and What They Revealed
Introduction
Part 1 covered why the migration from GANs to generative AI happened and how the new architecture works. This article documents what happened when the system was put to the test.
Over the course of a single day, 16 test cases were executed across three phases of increasing difficulty. The tests were designed to answer specific questions: Can Gemini handle body type diversity? Does it preserve garment details? What about action poses? And critically — does it need the preprocessing pipeline that the GAN approach required?
The answers were clear, and sometimes surprising.
Source code: github.com/matu79go/metafit
Test Setup
All tests used the Gemini 3 Pro Image Preview model (gemini-3-pro-image-preview), referred to as Nano Banana. No preprocessing (OpenPose, Graphonomy, MediaPipe) was used unless explicitly stated.
The pipeline for every test was identical:
```shell
# Clothing mode (product image → person)
python try_on_test.py --mode clothing \
  --person test_data/person/target.jpg \
  --clothing test_data/clothing/item.png

# Transfer mode (person → person)
python try_on_test.py --mode transfer \
  --person test_data/person/target.jpg \
  --source test_data/person/source.jpg
```
Two modes were tested:
- Clothing mode: A flat product image (e.g., a black T-shirt on white background) is applied to a person photo
- Transfer mode: Clothing is extracted from one person (source) and applied to another (target)
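The two modes differ only in which garment input they take. As a minimal sketch of how a script could expose them, the parser below mirrors the four flags shown in the commands above. The validation rule (each mode requires its own garment input) and the function names are assumptions for illustration; this is not the repository's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """CLI mirroring the flags shown above (a sketch, not the repository's code)."""
    parser = argparse.ArgumentParser(prog="try_on_test.py")
    parser.add_argument("--mode", choices=["clothing", "transfer"], required=True)
    parser.add_argument("--person", required=True, help="target person photo")
    parser.add_argument("--clothing", help="flat product image (clothing mode)")
    parser.add_argument("--source", help="person wearing the garment (transfer mode)")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed rule: each mode needs its own garment input.
    if args.mode == "clothing" and not args.clothing:
        parser.error("--clothing is required in clothing mode")
    if args.mode == "transfer" and not args.source:
        parser.error("--source is required in transfer mode")
    return args


# Example: transfer mode with a target and a source photo.
args = parse_args(["--mode", "transfer", "--person", "target.jpg", "--source", "source.jpg"])
print(args.mode, args.person, args.source)
```

Checking the mode-specific argument up front means a missing garment image fails before any API call is made.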
Phase 1: Initial Tests with Noisy Images
The first four tests used the same measurement-annotated images from the original PASTA-GAN++ dataset — images with visible text labels showing height, chest, waist, and hip measurements drawn directly on the photos.
Test 1: Clothing Mode — Woman + Black T-shirt

- Input: 160cm woman (wearing brown T-shirt, measurement text visible) + black T-shirt product image
- Result: The garment fit naturally. Body shape and pose were preserved.
- Issue: The face changed slightly, and accessories were added that did not exist in the original.
Test 2-3: Clothing Mode — Man + Black T-shirt (Prompt Iteration)

- Input: 178cm man (large build) + black T-shirt product image
- Test 2: Body type was preserved well, but the framing shifted from full-body to upper-body.
- Test 3: After strengthening the face preservation prompt, face quality improved — but the measurement text on the original image was interpreted as garment design and rendered onto the T-shirt.
This was the first key insight: input image noise is not filtered — it is interpreted. The model treated measurement labels as part of the clothing design.
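Prompt iteration of this kind is easier to track when the prompt is assembled from named parts. The sketch below shows one way to do that. Every prompt sentence here is hypothetical wording written for illustration, not the project's actual prompt, and whether each constraint helps would need to be tested the same way as above.

```python
# Hypothetical prompt fragments. The wording is illustrative and is NOT the
# project's actual prompt.
BASE_INSTRUCTION = {
    "clothing": "Dress the person in the first image in the garment shown in the second image.",
    "transfer": "Dress the person in the first image in the clothes worn by the person in the second image.",
}

CONSTRAINTS = {
    "face": "Keep the person's face, hair, and skin tone exactly as in the first image.",
    "framing": "Keep the original framing, pose, and background. Do not crop or zoom.",
    "garments_only": "Change only the clothing. Do not add accessories or change the shoes.",
}


def build_prompt(mode: str, constraints=("face", "framing", "garments_only")) -> str:
    """Join the base instruction for a mode with the selected constraint sentences."""
    if mode not in BASE_INSTRUCTION:
        raise ValueError(f"unknown mode: {mode!r}")
    parts = [BASE_INSTRUCTION[mode]]
    parts.extend(CONSTRAINTS[name] for name in constraints)
    return " ".join(parts)


# Example: a clothing-mode prompt with only the face constraint enabled.
print(build_prompt("clothing", constraints=("face",)))
```

Toggling one constraint at a time makes it possible to attribute a change in output, such as the framing shift in Test 2, to a specific instruction.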
Test 4: Transfer Mode — Cross-Gender

- Input: 178cm man (target) + 160cm woman in brown T-shirt and denim shorts (source)
- Result: The clothing design, color, and logo transferred correctly to the male body shape. The fit was natural.
- Issue: Measurement text persisted, and shoes were also transferred (not just clothing).
Phase 1 Summary
| Aspect | Clothing Mode | Transfer Mode |
|---|---|---|
| Garment fit | Natural | Natural |
| Body preservation | Good | Good |
| Face identity | Changed slightly | Changed slightly |
| Composition | Sometimes shifted | Mostly maintained |
| Input noise tolerance | Poor — text misinterpreted | Poor — text persisted |
Conclusion: The model works, but input quality matters. Clean images are essential.
Phase 2: Clean Image Tests
All Phase 1 issues pointed to one root cause: noisy input images. Phase 2 used clean photographs without measurement annotations.
All Phase 2 tests used transfer mode — the more challenging scenario where clothing must be extracted from one person and applied to another.
Test 5: Man → Man (Standard Build)

- Input: Standing man (target) + slim man in border sweater and khaki shorts (source)
- Result: Excellent. Garment design and color transferred accurately. Body shape and face both well preserved.
- Issues: None.
Test 6: Child → Child

- Input: Child A (target) + Child B (source)
- Result: Excellent. The garment adapted naturally to the child’s body proportions. High design fidelity.
- Issues: None.
Test 7: Woman → Soccer Player (Action Pose, Low Resolution)

- Input: Female soccer player (target: action pose, 320px resolution) + standing woman (source)
- Result: Clothing transferred, but face quality degraded noticeably. The action pose reduced accuracy.
- Issues: Face regeneration was obvious. This prompted the MediaPipe preprocessing experiment.
A face restoration postprocess was attempted, blending the original face back onto the generated image using LAB color correction and elliptical feather masking:

The restoration improved face quality, but the underlying cause was later identified as low resolution, not a fundamental model limitation.
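For readers unfamiliar with the two techniques named above, here is a minimal NumPy sketch of the idea, not the project's implementation. It assumes the images are already in LAB space as float arrays (for example after an OpenCV color conversion), that the face center and radii come from a face detector, and that the original face is color-matched to the generated image before blending. All function names are hypothetical.

```python
import numpy as np


def elliptical_feather_mask(height, width, center, axes, feather=0.3):
    """Float mask in [0, 1]: 1 in the ellipse core, fading to 0 at its boundary.

    center = (cx, cy) and axes = (rx, ry) are in pixels; feather (> 0) is the
    fraction of the radius over which the mask fades out.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    cx, cy = center
    rx, ry = axes
    # Normalized elliptical distance: 0 at the center, 1 on the boundary.
    dist = np.sqrt(((xs - cx) / rx) ** 2 + ((ys - cy) / ry) ** 2)
    return np.clip((1.0 - dist) / feather, 0.0, 1.0)


def match_lab_statistics(face_lab, reference_lab, mask):
    """Shift and scale each channel of face_lab to match reference_lab inside the mask."""
    region = mask > 0.5  # must contain at least one pixel
    out = face_lab.astype(np.float64).copy()
    for c in range(3):
        src = out[..., c][region]
        ref = reference_lab[..., c][region].astype(np.float64)
        scale = ref.std() / (src.std() + 1e-6)
        out[..., c] = (out[..., c] - src.mean()) * scale + ref.mean()
    return out


def blend_face(original, generated, mask):
    """Alpha-blend the original face over the generated image with the feathered mask."""
    alpha = mask[..., None]
    return alpha * original + (1.0 - alpha) * generated


# Example on synthetic data: a random "original" pasted onto a random "generated" image.
rng = np.random.default_rng(0)
original = rng.uniform(0, 255, size=(64, 48, 3))
generated = rng.uniform(0, 255, size=(64, 48, 3))
mask = elliptical_feather_mask(64, 48, center=(24, 32), axes=(12, 16))
restored = blend_face(match_lab_statistics(original, generated, mask), generated, mask)
print(restored.shape)
```

The feathered edge avoids a visible seam around the face, and the per-channel statistics matching reduces the lighting mismatch between the pasted face and the generated image.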
Test 8: Slim → Large Build (Body Type Diversity)

- Input: Large-build man (target) + slim man in border sweater (source)
- Result: Excellent. The border sweater stretched naturally over the larger body. The garment adapted to the body type difference without distortion.
- Issues: None.
This was a critical test. PASTA-GAN++ consistently failed on underrepresented body types — garments would collapse into visual artifacts, and the model would “slim” the subject toward its training distribution. Nano Banana showed no such bias.
Phase 2 Summary
| Test | Garment | Body | Face | Pose | Overall |
|---|---|---|---|---|---|
| Test 5: Man → Man | ◎ | ◎ | ◎ | ◎ | ◎ |
| Test 6: Child → Child | ◎ | ◎ | ◎ | ◎ | ◎ |
| Test 7: Woman → Soccer | ○ | ○ | △ | △ | △ |
| Test 8: Slim → Large | ◎ | ◎ | ◎ | ◎ | ◎ |
Conclusion: Clean images produce high-quality results for static poses. The one failure (Test 7) occurred on a low-resolution image — raising the question of whether resolution, not preprocessing, was the real variable.
Phase 3: High-Resolution Action Poses
Phase 2 left an open question: was Test 7’s quality degradation caused by the action pose, or by the 320px input resolution? Phase 3 answered this definitively by using high-resolution images (1000–6000px) from Unsplash with extreme action poses.
Test 9: Male Dance (Intense Motion, 6193×9290px)

- Input: Male dancer (target: intense dance pose, 6193×9290px) + slim man in border sweater (source)
- Result: Excellent. The border sweater and khaki shorts transferred accurately despite the extreme dance pose. Every limb position was maintained.
- Issues: None. High-resolution input eliminated the quality problems seen in Phase 2.
This single test invalidated the assumption that action poses were inherently problematic. The Phase 2 soccer player test failed because of 320px resolution, not because of the action pose.
Test 10: Female Punch Pose + Tight Top / Loose Bottom

- Input: Woman in punching pose (target, 3024×4032px) + standing woman in beige crop top and sweatpants (source)
- Result: Excellent. The texture difference between the tight crop top and loose sweatpants was faithfully reproduced. The punch pose was maintained.
Test 11: Club Dance (Low Light + Colored Lighting)

- Input: Woman dancing in club (target) + standing woman in pink pleated dress (source)
- Result: Excellent. The pink dress transferred naturally under purple club lighting. The model handled the color interaction between garment and ambient lighting.
Test 12: Beach Dance (Sunset Backlight)

- Input: Woman dancing on pier at sunset (target) + standing woman in white T-shirt and skinny jeans (source)
- Result: Excellent. The garment adapted to the backlit conditions naturally.
Test 13: Sitting ← Jumping (Maximum Pose Difference)

- Input: Seated man (target: crouching) + jumping man in white T-shirt, black skinny jeans, and red sneakers (source)
- Result: Excellent. The white T-shirt, black skinny jeans, and red sneakers all transferred to the crouching position. Natural wrinkles appeared at the compression points.
- This was the most challenging pose combination: the source was airborne, the target was seated. The model handled it without issue.
Test 14: Stretch + Complex Pattern (Tie-dye + Character Print)
- Input: Woman stretching sideways (target) + woman in Care Bears tie-dye T-shirt (source)
- Result: Good. The complex tie-dye pattern transferred accurately. However, the source image had a hand covering the face, and the generated image revealed the face — the model appeared to prefer showing faces rather than accurately reproducing hand-over-face poses.
Test 15: Cross-Gender — Male Suit → Female Dance

- Input: Woman in dance pose (target) + man in brown suit (source)
- Result: Good. The suit adapted to the female body shape naturally. Shirt cuffs were reproduced. However, the leg pose changed from the original.
IMAGE_SAFETY Filter

Several tests triggered Gemini’s IMAGE_SAFETY filter:
- Three images featuring dance and running poses were blocked when used as the source
- The common factor: revealing clothing in the source image (sportswear, dance attire)
This is a practical constraint for fashion applications, where garments with varying levels of coverage are common and legitimate.
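A batch test run benefits from recording these blocks rather than treating them as generic failures. The sketch below assumes the response object exposes a `candidates` list whose items carry a `finish_reason` named `IMAGE_SAFETY`, as response objects in the google-genai Python SDK do to the best of my knowledge. The helper names are hypothetical, and a block may also surface in other fields that this sketch does not inspect.

```python
from types import SimpleNamespace


def finish_reason_name(candidate) -> str:
    """Return the finish reason as a plain string, whether it is an enum or a str."""
    reason = getattr(candidate, "finish_reason", None)
    if reason is None:
        return ""
    return getattr(reason, "name", str(reason))


def blocked_by_image_safety(response) -> bool:
    """True if any candidate stopped with the IMAGE_SAFETY finish reason."""
    candidates = getattr(response, "candidates", None) or []
    return any(finish_reason_name(c) == "IMAGE_SAFETY" for c in candidates)


# Example with a stand-in object instead of a real API response.
fake_response = SimpleNamespace(
    candidates=[SimpleNamespace(finish_reason="IMAGE_SAFETY")]
)
print(blocked_by_image_safety(fake_response))  # True
```

Logging which source images were blocked makes it easier to see patterns such as the sportswear and dance attire noted above.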
Phase 3 Summary
| Test | Garment | Body | Face | Pose | Overall |
|---|---|---|---|---|---|
| Test 9: Male Dance | ◎ | ◎ | ◎ | ◎ | ◎ |
| Test 10: Female Punch | ◎ | ◎ | ◎ | ◎ | ◎ |
| Test 11: Club Dance | ◎ | ◎ | ◎ | ◎ | ◎ |
| Test 12: Beach Dance | ◎ | ◎ | ◎ | ◎ | ◎ |
| Test 13: Sit ← Jump | ◎ | ◎ | ◎ | ◎ | ◎ |
| Test 14: Stretch + Pattern | ◎ | ◎ | ◎ | △ | ○ |
| Test 15: Suit → Female | ◎ | ◎ | ◎ | △ | ○ |
The 14 Key Findings
Across all three phases and 15 test cases, the following patterns emerged:
1. Nano Banana alone can perform virtual try-on: no OpenPose, Graphonomy, or other preprocessing is required
2. Transfer mode works with prompts only: the PASTA-GAN++ functionality (model-to-model re-dressing) is achievable without training a dedicated model
3. Prompt engineering significantly affects quality: explicit face preservation instructions improve results
4. Input image quality is critical: noise (measurement text, annotations) is interpreted as content, not filtered
5. The license is completely clean: no components restricted to non-commercial use are involved
6. Clean images dramatically improve results: removing noise improved face and body preservation
7. Body type adaptation is strong: from slim to large, the garment stretches and fits naturally
8. Action poses are fine with high-resolution input (this revises the earlier assumption that action poses are problematic)
9. Resolution is the single most important quality factor: 320px degrades quality; 1000px+ produces excellent results
10. Preprocessing (MediaPipe) is unnecessary: Gemini alone is sufficient for high-resolution inputs
11. Extremity poses sometimes change: hands covering faces and leg angles may shift
12. Cross-gender try-on works: male suit → female produced natural results
13. The IMAGE_SAFETY filter blocks revealing clothing: a Gemini constraint for fashion applications
14. Physical garment behavior is understood: tight/loose texture differences, gravity-induced draping, and compression wrinkles are all rendered accurately
The most consequential finding was #9: resolution, not preprocessing, determines quality. This single insight eliminated the need for the entire MediaPipe preprocessing step and the face restoration postprocessor. For production use, simply ensuring high-resolution input images is sufficient.
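A simple pre-flight check can enforce this before an image is sent to the model. The sketch below assumes the "1000px+" threshold applies to the shorter side of the image, which the tests do not state explicitly. The function name and constant are hypothetical, and the width and height would come from whatever image library is already in use.

```python
MIN_SIDE_PX = 1000  # assumption: "1000px+" is read as the shorter side


def resolution_ok(width: int, height: int, min_side: int = MIN_SIDE_PX) -> bool:
    """Return True if the shorter side of the image meets the minimum size."""
    return min(width, height) >= min_side


# Dimensions from Tests 9 and 10, plus a 320px input like Test 7
# (the second 320 is a placeholder; only one dimension was reported).
for size in [(6193, 9290), (3024, 4032), (320, 320)]:
    print(size, resolution_ok(*size))
```

Rejecting or upscaling small inputs at this point is cheaper than discovering degraded faces after generation.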
What Comes Next
These 16 tests established that Nano Banana can perform virtual try-on effectively. But how does it compare to the dedicated Vertex AI Virtual Try-On model? And how do both compare to the original PASTA-GAN++?
Part 3 documents the head-to-head comparison across 12 additional test cases, including the three-engine showdown that definitively demonstrates the generational gap between GANs and generative AI.
META FIT GenAI Series:
- Part 1: From GANs to Generative AI
- Part 2: Nano Banana Virtual Try-On — 16 Test Cases (You are here)
- Part 3: The 3-Engine Showdown