In this thesis we present a data generation system designed to improve the training of object‑counting models. Existing datasets often contain incorrect or missing annotations and are typically biased toward a single dominant object category per image, so models trained on them learn to count only the dominant class. Our system, GeCoGen, mitigates these limitations through controlled image generation that allows adjusting category diversity, instance‑count distributions, and class balance. It also provides parameters to simulate occlusion, minimum visibility, and partial coverage of instances. Using GeCoGen we created the training dataset GeCoDa, comprising 7,900 images across 79 categories, with an average of 1.87 different categories per image. We evaluated the impact of this dataset on the state‑of‑the‑art general object counter GeCo, which we trained on three training sets: GeCoDa, FSC-147, and the combination of GeCoDa and FSC-147. Performance was measured on FSC-147, CA-44, and MCAC. Relative to the model trained only on FSC-147, the model trained on the combined dataset reduced the mean absolute error (MAE) on MCAC by 61%, while the model trained solely on GeCoDa reduced the MAE on CA-44 by 58%.