Image annotation is a crucial but often time-consuming step in preparing image datasets. Annotation speed depends not only on the graphical interface of the annotation tool but also on the segmentation approaches it implements. Advances in deep learning for computer vision have made it possible to replace manual annotation and traditional segmentation algorithms with faster and more accurate approaches. One such approach is the foundation model Segment Anything (SAM), which we analyzed in several variants (ViT-B, ViT-L, ViT-H, MobileSAM, SAM-Med2D, MedSAM) and tested on the Kvasir-SEG dataset of colonoscopic images and the Kvasir-Instrument dataset of colonoscopic instruments. We evaluated the segmentation accuracy and time complexity of the models against ground-truth object masks and, based on the results, integrated the functionality of the best-performing model into a prototype annotation program.
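
As an illustration of what comparing a predicted mask with a ground-truth mask can look like, the following is a minimal sketch that computes two common overlap scores, intersection over union (IoU) and the Dice coefficient. The choice of these two metrics, the function name `iou_and_dice`, and the nested-list mask representation are assumptions made for this example; the abstract does not state which accuracy measures were used.

```python
def iou_and_dice(pred, truth):
    """Return (IoU, Dice) for two binary masks.

    Masks are equally sized nested lists of 0/1 values (an assumed
    representation chosen to keep the sketch dependency-free).
    """
    # Pixels marked as object in both masks.
    inter = sum(
        1
        for pred_row, truth_row in zip(pred, truth)
        for p, t in zip(pred_row, truth_row)
        if p and t
    )
    pred_area = sum(1 for row in pred for p in row if p)
    truth_area = sum(1 for row in truth for t in row if t)
    union = pred_area + truth_area - inter

    # Convention assumed here: two empty masks count as perfect agreement.
    if union == 0:
        return 1.0, 1.0

    iou = inter / union
    dice = 2 * inter / (pred_area + truth_area)
    return iou, dice


# Toy 2x3 masks: 2 shared object pixels, 3 object pixels in each mask.
predicted = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]

iou, dice = iou_and_dice(predicted, ground_truth)
print(f"IoU = {iou:.3f}, Dice = {dice:.3f}")
```

In practice such scores would be averaged over all images in a dataset, and per-image inference time would be measured separately for each model variant.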