Part 5: Results, Failure Modes, and the Path to Modern Image Generation
Introduction
Over the previous four parts, we traced META FIT from its origins as a photo booth concept through GAN theory, the PF-AFN implementation, and the supporting systems for pose estimation and body measurement. This final installment takes an honest look at results: what the system achieved, where it failed, why those failures occurred, the smartphone app design that would bring it to consumers, and how the generational shift from GANs to diffusion models is reshaping the future of virtual try-on.
What Worked
For standard scenarios — a front-facing person with a typical body type trying on a relatively simple garment — the system produced genuinely useful results.
Garment transfer quality. The warped garment preserved its original texture, pattern, and color with reasonable fidelity. Stripes stayed straight. Solid colors remained consistent. The overall shape of the garment conformed to the person’s body contour.
Body contour adaptation. The coarse-to-fine flow estimation in the AFWM successfully adapted garments to different body shapes. Broader shoulders resulted in a wider garment placement; a narrower torso produced appropriate compression. The garment looked like it was being worn, not pasted on.
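The core resampling operation behind this warping can be illustrated with a minimal sketch: an appearance flow is a per-pixel offset field that tells the warp where in the flat garment image to sample from. The NumPy version below uses nearest-neighbor sampling at a single resolution, whereas the actual AFWM estimates flow coarse-to-fine with bilinear sampling, so treat this as a toy illustration of the resampling step only (the function name is mine, not from the codebase).

```python
import numpy as np

def warp_with_flow(garment: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Resample a garment image along a dense 2D flow field.

    garment: (H, W, C) flat garment image.
    flow:    (H, W, 2) per-pixel (dy, dx) offsets pointing from each
             output pixel back into the garment image.
    Nearest-neighbor sampling for brevity; real appearance-flow
    warping uses differentiable bilinear sampling.
    """
    H, W = garment.shape[:2]
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Offset each output coordinate by the flow, then clamp to bounds.
    src_y = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, W - 1)
    return garment[src_y, src_x]
```

A zero flow reproduces the input unchanged; a smoothly varying flow stretches or compresses the garment, which is how broader shoulders translate into a wider garment placement.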
Multi-mode support. PASTA-GAN++’s three modes — full (top and bottom together), upper, and lower — all produced functional results. Users could try on a complete outfit or swap individual garments independently.
PASTA-GAN++ vs. PF-AFN. In comparative testing, PASTA-GAN++ consistently outperformed PF-AFN for full-body try-on. The addition of OpenPose and Graphonomy as auxiliary inputs — the very parsing information that PF-AFN’s “parser-free” approach eliminated — proved valuable for accurate garment placement, particularly at the boundaries between upper and lower garments.
Systematic Failure Analysis
Thorough testing revealed several systematic failure modes, documented in the project’s improvement notes. These failures were not random — they pointed to fundamental limitations of the GAN-based approach and the training data.
Processing Speed
This was the single largest barrier to productization. The full inference pipeline — OpenPose, Graphonomy, and PASTA-GAN++ — requires substantial GPU computation. A single try-on image generation takes far longer than what a consumer-facing mobile app would need for a responsive experience. Users expect near-instantaneous results; the system could not deliver that.
The infrastructure requirement compounded the problem. Each inference requires a Docker container with NVIDIA GPU access. Scaling this to thousands of concurrent users would demand significant cloud GPU resources, making the per-user cost prohibitive for a consumer product without substantial optimization.
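The economics can be framed as simple arithmetic. The sketch below is a back-of-envelope cost model with entirely hypothetical numbers — the project did not measure these figures — but it shows why per-user cost scales directly with inference time and GPU pricing.

```python
def per_tryon_cost(inference_seconds: float, gpu_hourly_rate: float,
                   utilization: float = 0.7) -> float:
    """Back-of-envelope GPU cost of one try-on generation.

    All inputs are hypothetical placeholders, not measured values:
      inference_seconds: wall-clock time for the full pipeline
                         (OpenPose + Graphonomy + PASTA-GAN++).
      gpu_hourly_rate:   cloud price per GPU-hour, in dollars.
      utilization:       fraction of paid GPU time actually serving
                         requests; a fleet never runs at 100%.
    """
    rate_per_second = gpu_hourly_rate / 3600.0
    return inference_seconds * rate_per_second / utilization
```

Even modest assumptions make the point: tens of seconds of GPU time per image, multiplied across thousands of concurrent users, yields a per-try-on cost that is hard to absorb in a free consumer app without substantial optimization.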
Body Type Diversity
The training datasets (VITON, DeepFashion) consist predominantly of slim fashion models in standard poses. This training data bias manifested directly in the results:
- For body types well-represented in the training data, results were good.
- For larger body types, unusual proportions, or non-standard poses, quality degraded noticeably.
- The model would sometimes distort body proportions to match the distribution it had learned, rather than accurately rendering the garment on the actual body shape.
This is not merely a technical limitation — it is a fairness issue. A virtual try-on system that works well only for one body type range fails to serve the broader population it is intended for.
Specific Failure Modes
Five distinct failure patterns emerged during systematic testing:
1. Garment design alteration. Patterns, prints, and logos sometimes changed during the transfer process. The GAN would “hallucinate” textures rather than faithfully preserving the original design. A striped shirt might arrive with slightly different stripe spacing. A logo might blur or shift. This is a fundamental challenge with generative models: they create plausible outputs, but plausibility and fidelity are not the same thing.
2. Body proportion distortion. In some outputs, bust and hip measurements visibly shifted compared to the input person. The generator, trained on a narrow distribution of body types, would subtly pull proportions toward its learned average. The person in the output still looked like a person, but not always like the same person.
3. Limb artifacts. Arms occasionally appeared cut off or blurred at the boundaries where garment met skin. The composite mask struggled with precise edge transitions, particularly where sleeves ended and bare arms began. This boundary problem is inherent to the mask-based composition approach — the mask must make a sharp decision about what is garment and what is not, and it does not always get that decision right.
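The composition step itself is a simple alpha blend, which makes the boundary problem easy to see: wherever the mask is wrong by even a pixel, garment color replaces skin or vice versa. A minimal NumPy sketch of this kind of mask-based blending (not the PF-AFN code itself):

```python
import numpy as np

def composite(person: np.ndarray, warped_garment: np.ndarray,
              mask: np.ndarray) -> np.ndarray:
    """Blend a warped garment onto a person image with a soft mask.

    person, warped_garment: (H, W, C) images.
    mask: (H, W) values in [0, 1]; 1 = garment pixel, 0 = keep the
    original person pixel. Intermediate values at sleeve edges are
    exactly where the limb artifacts described above appear.
    """
    m = mask[..., None]  # broadcast the mask over the channel axis
    return m * warped_garment + (1.0 - m) * person
```

Every output pixel is a hard commitment: there is no mechanism for the blend to reconsider whether a boundary pixel is sleeve or skin, which is why diffusion-based approaches that generate the whole frame jointly sidestep this class of artifact.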
4. Skinny pants rendering. Tight-fitting lower garments produced color bleeding artifacts. The flow-based warping, designed to handle the relatively loose fit of most clothing, struggled when garments needed to conform closely to leg shape. The warped garment would bleed color into adjacent skin regions, producing an unnatural appearance.
5. Male body results. Results for male subjects were generally less reliable than for female subjects, particularly for lower-body garments. The training data is predominantly female fashion models in women’s clothing, creating a significant domain gap for male try-on. This was most pronounced for items like trousers and shorts, where male and female body geometry differs substantially.
Smartphone App Design
Despite the processing speed limitations for real-time use, I designed a complete mobile application to define the user experience that META FIT would deliver.
Alpha Version
The initial release would establish the core flow:
- User registration and profile setup (height, preferences)
- Basic photo capture with pose guidance overlay
- Single garment try-on with server-side processing
Beta Version: Full Experience
The beta design expanded the flow into a comprehensive shopping experience:
| Screen | Function |
|---|---|
| Welcome | Onboarding and first-time setup |
| Dashboard | Main navigation hub |
| Create Avatar | Photo capture with real-time pose guidance (stand here, arms at sides) |
| Avatar Display | Generated avatar with extracted body measurements |
| Outfit Menu | Category selection: tops, bottoms, full outfits, shoes |
| Outfit List | Browse available garments with filters |
| Fitting Room | Virtual try-on result display with before/after comparison |
| Cart | Purchase flow integration with partner retailers |
The design targeted the iPhone 12 form factor and comprised 10 complete screen mockups created in Figma. The guiding UX principle was minimum friction: from opening the app to seeing a try-on result should require as few steps as possible. Take a photo, pick a garment, see yourself wearing it.
The gap between this polished app design and the backend’s processing speed limitations highlighted the central challenge: the technology worked, but not fast enough for the experience it was meant to power.
The Generational Shift: From GANs to Diffusion Models
META FIT was built during what might be called the “GAN era” of generative AI, roughly spanning 2014 to 2021. The technical choices throughout this series — adversarial training, flow-based warping, U-Net generators with composite masks — reflect the state of the art during that period.
Since then, the landscape has shifted fundamentally. Diffusion models — DDPM, DALL-E 2, Stable Diffusion, Midjourney — have demonstrated that an entirely different generative paradigm can match and surpass GAN quality across most image generation tasks.
Why Diffusion Models Matter for Virtual Try-On
Several properties of diffusion models address the specific limitations we encountered with the GAN-based approach:
More stable training. Without the adversarial dynamic — no generator-discriminator balancing act, no mode collapse — diffusion models train more reliably and consistently. The training instabilities that plagued GAN development are largely absent.
Better generalization. Diffusion models trained on large, diverse datasets generalize more effectively across body types, poses, and garment categories. The body type bias that limited our GAN results is less pronounced when the training data is web-scale.
Higher image quality. Modern diffusion models produce images with fewer artifacts, better detail preservation, and more natural compositing — precisely the areas where our GAN-based system showed failure modes.
More natural compositing. Rather than relying on an explicit warping step followed by mask-based blending, diffusion-based approaches can handle the entire generation process in a unified framework. The “pasted on” artifacts that arise from imperfect masks become less of a concern.
The trade-off: inference speed. Diffusion models require multiple denoising steps (typically 20-50), making each generation slower than a single GAN forward pass. However, recent advances in distillation (fewer-step models), optimized architectures, and hardware acceleration are steadily closing this gap.
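The cost difference comes down to how many times the network runs per image. A GAN produces a sample in one generator forward pass; a diffusion sampler calls its denoiser once per step. The toy loop below (a linear "denoiser" standing in for a full network) just makes the call-count scaling concrete — it is an illustration of the sampling structure, not a real diffusion implementation.

```python
import numpy as np

def sample_diffusion(denoiser, rng, num_steps: int = 50) -> np.ndarray:
    """Iterative denoising from pure noise.

    Each loop iteration is one full forward pass of the denoising
    network, so inference cost scales linearly with num_steps —
    versus exactly one forward pass for a GAN generator.
    """
    x = rng.standard_normal(8)          # start from Gaussian noise
    for t in range(num_steps, 0, -1):   # step t = num_steps ... 1
        x = denoiser(x, t)              # one network call per step
    return x
```

Distilled few-step samplers attack exactly this loop: by training a student to match many teacher steps in one, `num_steps` drops from 50 toward single digits, which is how diffusion inference is approaching GAN-like latency.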
Current Experiments and Future Directions
Early experiments with modern image synthesis approaches for virtual try-on show substantial improvements over the GAN-based results. The field is advancing rapidly, with new papers and models appearing regularly.
Key improvements that modern approaches are delivering:
- Faster inference through model distillation and optimized sampling, bringing generation times closer to real-time requirements
- Better body type coverage from training on larger, more diverse datasets
- Higher resolution output that captures fine garment details — stitching, fabric texture, button placement
- More natural garment-body blending without explicit mask-based composition
- Multi-view consistency enabling try-on from different angles
The fundamental vision that started this project two decades ago — enabling anyone to see how clothes look on their own body before purchasing — is closer to practical reality than it has ever been.
Technical Stack Summary
| Component | Technology | Status |
|---|---|---|
| Virtual Try-On (primary) | PASTA-GAN++ (PyTorch, CUDA) | Functional prototype |
| Virtual Try-On (initial) | PF-AFN (PyTorch, CUDA) | Evaluated, superseded |
| Pose Estimation | OpenPose (body_pose_model.pth) | Integrated |
| Human Parsing | Graphonomy (inference.pth) | Integrated |
| Body Measurement | Custom Python (OpenCV, NumPy) | Validated with 10 subjects |
| 3D Reconstruction | PiFu (HGPIFuNet) | Explored, deferred |
| Web Prototype | TensorFlow.js, PoseNet | Proof-of-concept complete |
| App Design | Figma mockups (iPhone 12) | 10 screens designed |
| Infrastructure | Docker, NVIDIA NGC, GPU compute | Operational |
Reflection
This project spans two decades and multiple generations of technology. It started with a simple idea — a photo booth that shows you wearing different clothes — and evolved through classical computer vision, convolutional neural networks, generative adversarial networks, and now the early stages of diffusion-based generation.
Each technology generation brought the vision measurably closer to reality, and each also revealed its own limitations. The GAN-based system demonstrated that photorealistic virtual try-on is achievable, but also that achieving it at consumer-grade speed, quality, and inclusivity requires continued advancement.
What I have carried forward from this project is not just a set of technical skills — flow-based warping, GAN training, pose estimation, body measurement — but a deeper appreciation for the iterative nature of building AI systems. Research, prototype, test, identify limitations, integrate the next generation of tools. The cycle has repeated several times over the life of this project, and it will continue.
The twenty-year-old question has not changed: can we let people see how clothes look on their body without trying them on? The answer, finally, is approaching “yes.”
META FIT Series:
- Part 1: From Photo Booths to Virtual Try-On
- Part 2: Understanding GANs — The Engine Behind Virtual Try-On
- Part 3: Inside PF-AFN — The Try-On Engine in Code
- Part 4: Pose Estimation, Body Measurement, and 3D Reconstruction
- Part 5: Results, Failure Modes, and the Path to Modern Image Generation (You are here)