This master's thesis presents and describes modern methods of optical character recognition in natural scenes. Methods with high classification accuracy that are robust to illumination and geometric transformations were selected for the thesis. Our work is based on the implementation of three different feature extraction methods. The basic HOG method, which also underlies the other two methods, is one of the most popular feature extraction methods in object detection and character recognition. The HOG method was originally used for human detection, but has also been adapted for character recognition. The PHOG method, which is based on HOG, converts the basic HOG algorithm into a pyramid scheme and also includes bilinear interpolation. Due to its pyramid structure, PHOG is slower than the HOG algorithm, but more precise, since its feature vectors are larger. The third feature extraction method we implemented is the Co-HOG algorithm, which inherits the desirable properties of HOG, such as invariance to illumination and geometric changes. Co-HOG differs from HOG and PHOG in its feature representation: it also captures the spatial relationship of neighbouring pixels in order to describe characters more accurately. Among other things, Co-HOG is also computationally faster than HOG and PHOG.
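To make the relationship between HOG and its pyramid variant concrete, the following is a minimal sketch, assuming Python with scikit-image; the cell sizes, block sizes and pyramid levels are illustrative defaults, not the exact parameters of the thesis implementation.

```python
# Minimal sketch of HOG feature extraction for a character crop, and a
# PHOG-like variant that concatenates descriptors from several pyramid
# levels. Parameter values are illustrative, not the thesis settings.
import numpy as np
from skimage import color, io, transform
from skimage.feature import hog

def load_gray(image_path, size=(64, 64)):
    """Load a character crop, convert to grayscale and normalise its size."""
    img = io.imread(image_path)
    if img.ndim == 3:
        img = color.rgb2gray(img)
    return transform.resize(img, size)

def hog_features(image_path):
    """Plain HOG: gradient-orientation histograms over a grid of cells,
    block-normalised to reduce sensitivity to local illumination."""
    return hog(load_gray(image_path),
               orientations=9,
               pixels_per_cell=(8, 8),
               cells_per_block=(2, 2),
               block_norm='L2-Hys')

def phog_like_features(image_path, levels=((16, 16), (8, 8), (4, 4))):
    """Pyramid variant: HOG computed on coarser-to-finer cell grids and
    concatenated, giving a longer but more descriptive feature vector."""
    gray = load_gray(image_path)
    return np.concatenate([
        hog(gray, orientations=9, pixels_per_cell=cell,
            cells_per_block=(1, 1), block_norm='L2')
        for cell in levels
    ])
```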
Due to various factors in natural scene text images, traditional character recognition systems produce inaccurate results, because they assume that characters do not differ in font and color and presume a uniform image background. When extracting features from natural scene images, the algorithms should instead be robust and invariant to character size, background noise, different fonts, local illumination changes and attention-grabbing visual effects such as color blending. The methods described above do not require preprocessing and segmentation as traditional systems do, since they extract features that describe the appearance and shape of an object through gradient intensities and edge directions.
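As a small illustration of the gradient quantities that these descriptors are built from (not code from the thesis), per-pixel gradient magnitude and edge orientation can be computed with simple Sobel filters:

```python
# Per-pixel gradient magnitude (edge strength) and unsigned edge
# orientation, the raw ingredients of HOG-style descriptors.
import numpy as np
from scipy import ndimage

def gradient_magnitude_and_orientation(gray):
    """gray: 2-D float array holding a character crop."""
    gx = ndimage.sobel(gray, axis=1)                      # horizontal derivative
    gy = ndimage.sobel(gray, axis=0)                      # vertical derivative
    magnitude = np.hypot(gx, gy)                          # edge strength
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # orientation in [0, 180)
    return magnitude, orientation
```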
The feature extraction methods were evaluated on several databases, such as ICDAR, Chars74K and CVL OCR DB. We also generated a synthetic database of character images that simulates characters in natural scenes by including a large variety of fonts and noise in the images. The synthetic image database was generated with the aim of enlarging the training set and improving classification accuracy.
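Purely as an illustration of this idea (the actual generation procedure is described in the thesis itself), a synthetic character sample could be produced along the following lines, assuming Python with Pillow and NumPy; the font paths and noise levels are placeholders.

```python
# Hedged sketch of synthetic character generation: render a character in a
# randomly chosen font, then add blur and noise to mimic natural scenes.
import random
import numpy as np
from PIL import Image, ImageDraw, ImageFont, ImageFilter

FONT_PATHS = ["DejaVuSans.ttf", "DejaVuSerif-Bold.ttf"]   # placeholder font files

def synth_char(char, size=64, noise_sigma=12.0):
    font = ImageFont.truetype(random.choice(FONT_PATHS), int(size * 0.8))
    img = Image.new("L", (size, size), color=random.randint(140, 255))  # light background
    draw = ImageDraw.Draw(img)
    draw.text((size // 6, size // 10), char, font=font,
              fill=random.randint(0, 100))                 # dark glyph
    img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0, 1.5)))
    arr = np.asarray(img, dtype=np.float32)
    arr += np.random.normal(0, noise_sigma, arr.shape)     # additive sensor-like noise
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

# Example: a few noisy variants of the letter 'A'
samples = [synth_char("A") for _ in range(5)]
```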