Rare disease detection in medical imaging is fundamentally a data problem: underrepresented classes are systematically misclassified because the model never sees enough examples of them to learn from. This project evaluated whether GAN-generated synthetic images could mitigate that problem without collecting new data, using the ODIR retinal fundus dataset, which contains eight ocular disease labels with substantial class imbalance. Class-specific GANs were trained separately for two underrepresented conditions, age-related macular degeneration and hypertensive retinopathy, and filtered synthetic samples were used to augment the training data for a ResNet-18 multi-label classifier. The results were striking in one direction and instructive in the other: augmentation nearly doubled F1 for macular degeneration (0.279 → 0.507) and improved macro-F1 from 0.417 to 0.455, while hypertensive retinopathy showed no improvement because of insufficient generative fidelity. That finding has direct implications for how synthetic augmentation should be scoped in practice.
A secondary augmentation-percentage analysis with validation-tuned thresholds showed that performance peaked at +50% synthetic augmentation (macro-F1 0.592 vs. 0.565 baseline) and declined at higher levels, indicating a saturation effect in which additional synthetic samples yield diminishing returns rather than useful diversity. The study also demonstrated that a single global decision threshold is suboptimal for imbalanced multi-label classification: class-specific threshold calibration was necessary to recover discriminative signal, particularly for rare classes where predicted probabilities were systematically miscalibrated. Together, these findings underscore that synthetic augmentation in medical imaging is beneficial only when generative fidelity, calibration, and threshold selection are considered jointly.
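
To make the thresholding point concrete, the sketch below shows one way class-specific thresholds could be tuned on a validation set and compared against a single global threshold of 0.5. It is an illustration under stated assumptions, not the project's actual code: the function names (`f1_binary`, `tune_thresholds`, `macro_f1`), the threshold grid, and the toy probabilities are all hypothetical, and the project may have selected thresholds differently.

```python
import numpy as np

def f1_binary(y_true, y_pred):
    """F1 for one class; returns 0.0 when there are no positives or predictions."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0

def tune_thresholds(val_probs, val_labels, grid=None):
    """Pick, for each class, the grid threshold that maximizes validation F1."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)  # assumed search grid
    n_classes = val_probs.shape[1]
    thresholds = np.full(n_classes, 0.5)
    for c in range(n_classes):
        scores = [
            f1_binary(val_labels[:, c], (val_probs[:, c] >= t).astype(int))
            for t in grid
        ]
        thresholds[c] = grid[int(np.argmax(scores))]
    return thresholds

def macro_f1(probs, labels, thresholds):
    """Unweighted mean of per-class F1 under per-class thresholds."""
    preds = (probs >= thresholds).astype(int)  # thresholds broadcast over rows
    per_class = [f1_binary(labels[:, c], preds[:, c]) for c in range(labels.shape[1])]
    return float(np.mean(per_class))

# Toy data (invented for illustration): column 0 is a common class,
# column 1 a rare class whose probabilities all sit well below 0.5.
toy_labels = np.array([[1, 0], [0, 0], [1, 1], [0, 0]])
toy_probs = np.array([[0.91, 0.12], [0.22, 0.12], [0.83, 0.33], [0.12, 0.07]])

global_thresholds = np.full(2, 0.5)
tuned_thresholds = tune_thresholds(toy_probs, toy_labels)

global_score = macro_f1(toy_probs, toy_labels, global_thresholds)
tuned_score = macro_f1(toy_probs, toy_labels, tuned_thresholds)
```

On this toy data the global threshold never fires for the rare class, so its F1 is zero, while the tuned threshold recovers it. The example scores the same samples it tuned on only to stay short; in practice thresholds would be chosen on validation data and reported on a held-out test set, since tuning and scoring on the same split overstates the gain.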