Part 5: Results, Failure Modes, and the Path to Modern Image Generation

Introduction

Over the previous four parts, we traced META FIT from its origins as a photo booth concept through GAN theory, the PF-AFN implementation, and the supporting systems for pose estimation and body measurement. This final installment takes an honest look at results: what the system achieved, where it failed, why those failures occurred, the smartphone app design that would bring it to consumers, and how the generational shift from GANs to diffusion models is reshaping the future of virtual try-on.


What Worked

For standard scenarios — a front-facing subject with a typical body type trying on a relatively simple garment — the system produced genuinely useful results.

Garment transfer quality. The warped garment preserved its original texture, pattern, and color with reasonable fidelity. Stripes stayed straight. Solid colors remained consistent. The overall shape of the garment conformed to the person’s body contour.

Body contour adaptation. The coarse-to-fine flow estimation in the AFWM successfully adapted garments to different body shapes. Broader shoulders resulted in a wider garment placement; a narrower torso produced appropriate compression. The garment looked like it was being worn, not pasted on.
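Concretely, the appearance flow is a per-pixel lookup map: for every output pixel, the network predicts where in the flat garment image to sample. Below is a minimal NumPy sketch of just that sampling step — the real AFWM predicts the flow with a coarse-to-fine network and samples bilinearly; the nearest-neighbor sampling and rigid-shift flow here are illustrative simplifications.

```python
import numpy as np

def warp_with_flow(garment, flow):
    """Warp a garment image with a dense appearance flow.

    garment: (H, W, C); flow: (H, W, 2) per-pixel source offsets
    (dy, dx). Each output pixel copies the garment pixel at its own
    location plus the predicted offset (nearest-neighbor sampling).
    """
    H, W, _ = garment.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_y = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, W - 1)
    return garment[src_y, src_x]

# Simplest possible flow: shift the whole garment 2 px to the right.
img = np.zeros((4, 4, 3))
img[1, 1] = 1.0                      # a single bright garment pixel
flow = np.zeros((4, 4, 2))
flow[..., 1] = -2.0                  # each output pixel samples 2 px left
warped = warp_with_flow(img, flow)   # the bright pixel moves to (1, 3)
```

A learned flow simply replaces the constant shift with a different offset at every pixel, which is what lets the garment stretch over broader shoulders or compress onto a narrower torso.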

Successful try-on results: tops adapted to different body contours while preserving garment textures

Pattern preservation: various skirt styles maintain their visual identity after virtual try-on

Multi-mode support. PASTA-GAN++’s three modes — full (top and bottom together), upper, and lower — all produced functional results. Users could try on a complete outfit or swap individual garments independently.

PASTA-GAN++ vs. PF-AFN. In comparative testing, PASTA-GAN++ consistently outperformed PF-AFN for full-body try-on. The addition of OpenPose and Graphonomy as auxiliary inputs — the very parsing information that PF-AFN’s “parser-free” approach eliminated — proved valuable for accurate garment placement, particularly at the boundaries between upper and lower garments.

Men's virtual try-on: business casual jackets and shirts fitted to different body types

Virtual try-on results across diverse subjects and garment styles

Additional results demonstrating the system's capability with different body types and garment categories


Systematic Failure Analysis

Thorough testing revealed several systematic failure modes, documented in the project’s improvement notes. These failures were not random — they pointed to fundamental limitations of the GAN-based approach and the training data.

Processing Speed

This was the single largest barrier to productization. The full inference pipeline — OpenPose, Graphonomy, and PASTA-GAN++ — requires substantial GPU computation. A single try-on image generation takes far longer than what a consumer-facing mobile app would need for a responsive experience. Users expect near-instantaneous results; the system could not deliver that.

The infrastructure requirement compounded the problem. Each inference requires a Docker container with NVIDIA GPU access. Scaling this to thousands of concurrent users would demand significant cloud GPU resources, making the per-user cost prohibitive for a consumer product without substantial optimization.
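A back-of-envelope estimate makes the economics concrete. Every number below is an illustrative assumption, not a measured figure from the project:

```python
# Back-of-envelope serving cost. All figures are assumed for illustration.
gpu_hourly_usd = 1.50          # assumed cloud GPU instance price
seconds_per_tryon = 15.0       # assumed end-to-end pipeline latency
tryons_per_hour = 3600 / seconds_per_tryon            # throughput per GPU
cost_per_tryon = gpu_hourly_usd / tryons_per_hour     # raw GPU cost each
# A browsing session of 20 try-ons, in raw GPU time alone:
cost_per_session = 20 * cost_per_tryon
```

Raw GPU time is only the floor: provisioning for peak concurrency, keeping containers warm, and paying for idle capacity can multiply the effective cost several times over, which is what made the economics difficult at consumer scale.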

Body Type Diversity

The training datasets (VITON, DeepFashion) consist predominantly of slim fashion models in standard poses. This training data bias manifested directly in the results:

  • For body types well-represented in the training data, results were good.
  • For larger body types, unusual proportions, or non-standard poses, quality degraded noticeably.
  • The model would sometimes distort body proportions to match the distribution it had learned, rather than accurately rendering the garment on the actual body shape.
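One lightweight way to surface this bias is to report the quality metric per body-type group rather than as a single average, which can hide exactly the failure pattern described above. A sketch with assumed per-image error values (illustrative data; the project did not compute formal per-group metrics):

```python
import numpy as np

def per_group_error(errors, groups):
    """Average an error metric within each body-type group."""
    return {g: float(np.mean([e for e, gg in zip(errors, groups) if gg == g]))
            for g in set(groups)}

errors = [0.10, 0.12, 0.11, 0.45, 0.52]            # assumed per-image errors
groups = ["slim", "slim", "slim", "plus", "plus"]  # assumed group labels

by_group = per_group_error(errors, groups)         # slim vs. plus gap exposed
overall = float(np.mean(errors))                   # single average hides it
```

The single overall number looks acceptable while the underrepresented group's error is several times worse — the same shape as the failure the GAN exhibited.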

Body type diversity failure: the system produces significant artifacts for larger body types (center) while slim figures produce acceptable results (left, right)

Training data bias: garment patterns become severely distorted on body types underrepresented in training data

This is not merely a technical limitation — it is a fairness issue. A virtual try-on system that works well only for one body type range fails to serve the broader population it is intended for.

Specific Failure Modes

Five distinct failure patterns emerged during systematic testing:

1. Garment design alteration. Patterns, prints, and logos sometimes changed during the transfer process. The GAN would “hallucinate” textures rather than faithfully preserving the original design. A striped shirt might arrive with slightly different stripe spacing. A logo might blur or shift. This is a fundamental challenge with generative models: they create plausible outputs, but plausibility and fidelity are not the same thing.

Garment fidelity issues: subtle changes in pattern alignment and garment structure during transfer
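One way to make the plausibility-versus-fidelity distinction measurable is to compare the generated output against the warped garment directly, inside the garment region. A minimal sketch with a toy striped texture — mean absolute error is an illustrative simplification; perceptual metrics like SSIM or LPIPS are the usual choice:

```python
import numpy as np

def pattern_fidelity(warped_garment, output, mask):
    """Mean absolute error between the warped garment and the generated
    output, restricted to the garment mask. A faithful transfer keeps
    this near zero; a hallucinated texture drifts away from the source
    design even when the output still looks plausible."""
    m = mask.astype(bool)
    return float(np.abs(warped_garment[m] - output[m]).mean())

stripes = np.tile([0.0, 1.0], (8, 4))        # toy striped garment crop
faithful = stripes.copy()                    # design preserved exactly
hallucinated = np.roll(stripes, 1, axis=1)   # stripes shifted by one pixel
mask = np.ones_like(stripes)

f_err = pattern_fidelity(stripes, faithful, mask)
h_err = pattern_fidelity(stripes, hallucinated, mask)
```

The shifted stripes are a perfectly plausible striped shirt — every pixel is a valid stripe color — yet the error is maximal, which is exactly why a GAN can produce convincing but unfaithful garments.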

2. Body proportion distortion. In some outputs, bust and hip measurements visibly shifted compared to the input person. The generator, trained on a narrow distribution of body types, would subtly pull proportions toward its learned average. The person in the output still looked like a person, but not always like the same person.

3. Limb artifacts. Arms occasionally appeared cut off or blurred at the boundaries where garment met skin. The composite mask struggled with precise edge transitions, particularly where sleeves ended and bare arms began. This boundary problem is inherent to the mask-based composition approach — the mask must make a sharp decision about what is garment and what is not, and it does not always get that decision right.

Boundary artifacts: visible issues at garment-skin transitions, particularly around sleeves and hemlines
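At its core, the mask-based composition described above is a per-pixel blend: garment pixels where the mask says garment, rendered-person pixels elsewhere. A minimal sketch (toy shapes and values, not the model's actual tensors):

```python
import numpy as np

def composite(warped_garment, rendered_person, mask):
    """Blend the warped garment and rendered person with a [0, 1] mask.
    Near 0 or 1 the soft mask acts as a hard garment/skin decision,
    which is why a mask that is wrong by even a pixel or two at a
    sleeve boundary produces visible cut-off or blur artifacts."""
    m = mask[..., None]                    # broadcast over color channels
    return m * warped_garment + (1.0 - m) * rendered_person

garment = np.full((4, 4, 3), 0.8)          # light garment stand-in
person = np.zeros((4, 4, 3))               # dark person/background stand-in
mask = np.zeros((4, 4))
mask[:, :2] = 1.0                          # left half classified as garment
out = composite(garment, person, mask)     # garment left, person right
```

Every boundary error is baked directly into the output: there is no later stage that can recover skin the mask has already assigned to the garment.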

4. Skinny pants rendering. Tight-fitting lower garments produced color bleeding artifacts. The flow-based warping, designed to handle the relatively loose fit of most clothing, struggled when garments needed to conform closely to leg shape. The warped garment would bleed color into adjacent skin regions, producing an unnatural appearance.

5. Male body results. Results for male subjects were generally less reliable than for female subjects, particularly for lower-body garments. The training data is predominantly female fashion models in women’s clothing, creating a significant domain gap for male try-on. This was most pronounced for items like trousers and shorts, where male and female body geometry differs substantially.

Male try-on failure: the same blazer outfit (left, original) produces significant body proportion distortion and artifacts when transferred to a male subject (center, right)


Smartphone App Design

Despite the processing speed limitations for real-time use, I designed a complete mobile application to define the user experience that META FIT would deliver.

Alpha Version

The initial release would establish the core flow:

  • User registration and profile setup (height, preferences)
  • Basic photo capture with pose guidance overlay
  • Single garment try-on with server-side processing

Beta Version: Full Experience

The beta design expanded the flow into a comprehensive shopping experience:

Screen | Function
------ | --------
Welcome | Onboarding and first-time setup
Dashboard | Main navigation hub
Create Avatar | Photo capture with real-time pose guidance (stand here, arms at sides)
Avatar Display | Generated avatar with extracted body measurements
Outfit Menu | Category selection: tops, bottoms, full outfits, shoes
Outfit List | Browse available garments with filters
Fitting Room | Virtual try-on result display with before/after comparison
Cart | Purchase flow integration with partner retailers

The design targeted the iPhone 12 form factor and comprised 10 complete screen mockups created in Figma. The guiding UX principle was minimum friction: from opening the app to seeing a try-on result should require as few steps as possible. Take a photo, pick a garment, see yourself wearing it.

The gap between this polished app design and the backend’s processing speed limitations highlighted the central challenge: the technology worked, but not fast enough for the experience it was meant to power.


The Generational Shift: From GANs to Diffusion Models

META FIT was built during what might be called the “GAN era” of generative AI, roughly spanning 2014 to 2021. The technical choices throughout this series — adversarial training, flow-based warping, U-Net generators with composite masks — reflect the state of the art during that period.

Since then, the landscape has shifted fundamentally. Diffusion models — DDPM, DALL-E 2, Stable Diffusion, Midjourney — have demonstrated that an entirely different generative paradigm can match and surpass GAN quality across most image generation tasks.

Why Diffusion Models Matter for Virtual Try-On

Several properties of diffusion models address the specific limitations we encountered with the GAN-based approach:

More stable training. Without the adversarial dynamic — no generator-discriminator balancing act, no mode collapse — diffusion models train more reliably and consistently. The training instabilities that plagued GAN development are largely absent.

Better generalization. Diffusion models trained on large, diverse datasets generalize more effectively across body types, poses, and garment categories. The body type bias that limited our GAN results is less pronounced when the training data is web-scale.

Higher image quality. Modern diffusion models produce images with fewer artifacts, better detail preservation, and more natural compositing — precisely the areas where our GAN-based system showed failure modes.

More natural compositing. Rather than relying on an explicit warping step followed by mask-based blending, diffusion-based approaches can handle the entire generation process in a unified framework. The “pasted on” artifacts that arise from imperfect masks become less of a concern.

The trade-off: inference speed. Diffusion models require multiple denoising steps (typically 20-50), making each generation slower than a single GAN forward pass. However, recent advances in distillation (fewer-step models), optimized architectures, and hardware acceleration are steadily closing this gap.
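The speed gap comes from the structure of sampling itself: a GAN produces an image in one generator call, while a diffusion model loops over denoising steps, one network call each. A toy sketch of that loop (the denoiser here is a stand-in function, not a trained DDPM):

```python
import numpy as np

def diffusion_sample(denoise_step, steps, shape, seed=0):
    """Iterative sampling: start from pure noise and apply the denoiser
    once per timestep. Inference cost scales linearly with `steps`,
    which is why 20-50 step sampling is slower than a single GAN
    forward pass, and why step-distillation pays off."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)         # x_T: pure Gaussian noise
    for t in reversed(range(steps)):
        x = denoise_step(x, t)             # one network call per step
    return x

calls = 0
def toy_denoiser(x, t):
    global calls
    calls += 1
    return x * 0.9                         # nudge the sample toward "data"

sample = diffusion_sample(toy_denoiser, steps=50, shape=(4,))
# 50 denoiser calls for one image, versus a GAN's single forward pass
```

Distilled few-step models shrink `steps` from tens to single digits, which is the main lever closing the latency gap with GANs.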


Current Experiments and Future Directions

Early experiments with modern image synthesis approaches for virtual try-on show substantial improvements over the GAN-based results. The field is advancing rapidly, with new papers and models appearing regularly.

Key improvements that modern approaches are delivering:

  • Faster inference through model distillation and optimized sampling, bringing generation times closer to real-time requirements
  • Better body type coverage from training on larger, more diverse datasets
  • Higher resolution output that captures fine garment details — stitching, fabric texture, button placement
  • More natural garment-body blending without explicit mask-based composition
  • Multi-view consistency enabling try-on from different angles

The fundamental vision that started this project two decades ago — enabling anyone to see how clothes look on their own body before purchasing — is closer to practical reality than it has ever been.


Technical Stack Summary

Component | Technology | Status
--------- | ---------- | ------
Virtual Try-On (primary) | PASTA-GAN++ (PyTorch, CUDA) | Functional prototype
Virtual Try-On (initial) | PF-AFN (PyTorch, CUDA) | Evaluated, superseded
Pose Estimation | OpenPose (body_pose_model.pth) | Integrated
Human Parsing | Graphonomy (inference.pth) | Integrated
Body Measurement | Custom Python (OpenCV, NumPy) | Validated with 10 subjects
3D Reconstruction | PIFu (HGPIFuNet) | Explored, deferred
Web Prototype | TensorFlow.js, PoseNet | Proof-of-concept complete
App Design | Figma mockups (iPhone 12) | 10 screens designed
Infrastructure | Docker, NVIDIA NGC, GPU compute | Operational

Reflection

This project spans two decades and multiple generations of technology. It started with a simple idea — a photo booth that shows you wearing different clothes — and evolved through classical computer vision, convolutional neural networks, generative adversarial networks, and now the early stages of diffusion-based generation.

Each technology generation brought the vision measurably closer to reality, and each also revealed its own limitations. The GAN-based system demonstrated that photorealistic virtual try-on is achievable, but also that achieving it at consumer-grade speed, quality, and inclusivity requires continued advancement.

What I have carried forward from this project is not just a set of technical skills — flow-based warping, GAN training, pose estimation, body measurement — but a deeper appreciation for the iterative nature of building AI systems. Research, prototype, test, identify limitations, integrate the next generation of tools. The cycle has repeated several times over the life of this project, and it will continue.

The twenty-year-old question has not changed: can we let people see how clothes look on their body without trying them on? The answer, finally, is approaching “yes.”

