A Study on Artificial Intelligence for Scene Text Recognition and Its Applications
- This thesis proposes scene text recognition algorithms and applies them to the recognition of slab identification numbers (SINs) in factory scenes. Recognizing SINs is challenging because of the complex backgrounds of factory scenes and the low quality of the characters in SINs. To address these challenges with data-driven algorithms, the proposed methods use convolutional neural networks (CNNs) and fully convolutional networks (FCNs).
In the first part of the thesis, a CNN-based algorithm is proposed to recognize machine-printed SINs in factory scenes. Patch images are extracted with a sliding window, and a CNN classifier assigns each patch either to a character class or to the background. The main contribution of the proposed algorithm is two-fold: an accumulated response map and a model-based score function. Adjacent patches within a true character region tend to be classified as the same character, whereas an isolated character candidate tends to be a false candidate. The accumulated response map therefore combines the information of neighboring patches to reduce the effect of misclassification. The model-based score function exploits the structural information of SINs: a SIN consists of 9 characters, and every character in the SIN must be recognized correctly to identify the slab. By optimizing the model-based score function, all characters in a SIN are recognized simultaneously.
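The two ideas above can be illustrated with a minimal sketch. It assumes that the CNN produces, for every class, a grid of per-patch probabilities indexed by sliding-window position, and that character candidates have already been reduced to (x position, character, score) tuples along one text line. The plain neighbourhood sum used for the accumulated response map, and the dynamic program with a linear spacing penalty used as the model-based score, are illustrative stand-ins rather than the thesis's exact formulations; the names `accumulated_response_map` and `best_sin` and the parameters `radius`, `spacing`, and `penalty` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Grid = List[List[float]]              # grid[row][col], one cell per sliding-window patch
Candidate = Tuple[float, str, float]  # (x position, character, score); hypothetical format


def accumulated_response_map(prob_maps: Dict[str, Grid], radius: int = 1) -> Dict[str, Grid]:
    """For each class, sum patch probabilities over a (2*radius+1)^2 neighbourhood.

    Clusters of patches that agree on a class reinforce each other, while an
    isolated response stays small. The plain sum is an assumed accumulation rule.
    """
    out: Dict[str, Grid] = {}
    for cls, grid in prob_maps.items():
        rows, cols = len(grid), len(grid[0])
        acc = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                total = 0.0
                for dr in range(-radius, radius + 1):
                    for dc in range(-radius, radius + 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols:  # skip cells outside the grid
                            total += grid[rr][cc]
                acc[r][c] = total
        out[cls] = acc
    return out


def best_sin(cands: List[Candidate], n_chars: int = 9,
             spacing: float = 1.0, penalty: float = 1.0) -> Optional[Tuple[str, float]]:
    """Choose n_chars candidates left to right, maximising total score minus a spacing penalty.

    Hypothetical score: sum of candidate scores - penalty * |gap - spacing| per adjacent pair.
    Returns (string, score), or None if fewer than n_chars candidates can be chained.
    """
    cands = sorted(cands)
    n = len(cands)
    neg = float("-inf")
    # best[j][i]: best score of a (j+1)-character sequence ending at candidate i
    best = [[neg] * n for _ in range(n_chars)]
    back = [[-1] * n for _ in range(n_chars)]
    for i in range(n):
        best[0][i] = cands[i][2]
    for j in range(1, n_chars):
        for i in range(n):
            for k in range(i):
                gap = cands[i][0] - cands[k][0]
                if best[j - 1][k] == neg or gap <= 0:  # unreachable, or not strictly to the right
                    continue
                s = best[j - 1][k] + cands[i][2] - penalty * abs(gap - spacing)
                if s > best[j][i]:
                    best[j][i], back[j][i] = s, k
    if n == 0:
        return None
    end = max(range(n), key=lambda i: best[n_chars - 1][i])
    if best[n_chars - 1][end] == neg:
        return None
    chars, i = [], end
    for j in range(n_chars - 1, -1, -1):  # follow back-pointers to recover the sequence
        chars.append(cands[i][1])
        i = back[j][i]
    return "".join(reversed(chars)), best[n_chars - 1][end]
```

For example, nine evenly spaced candidates plus one isolated false candidate between two of them yield the nine true characters, because any sequence including the false candidate loses both score and spacing regularity. Raising `penalty` makes the selection favour evenly spaced characters more strongly, which is one way to encode the fixed 9-character layout of a SIN.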
In the second part, an FCN-based algorithm is proposed to recognize handwritten SINs. The CNN-based method is computationally inefficient because it requires multiscale analysis to recognize SINs of various sizes. Furthermore, the CNN-based method cannot be applied to handwritten SINs, because handwritten characters vary in size and are irregularly positioned within a SIN. To address these limitations, a recognition pipeline that uses an FCN with deconvolution layers is proposed. The main contribution of the second part is a novel form of ground-truth data (GTD), called position-based GTD, for training the FCN. Position-based GTD allows the FCN to be trained in an image-to-image manner. Experiments were conducted on industrial data collected from an actual steelworks to verify the effectiveness of the proposed method.
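The following minimal sketch shows one possible form of position-based GTD. It assumes that each training character is annotated with a bounding box and that the target map marks a small square around each character centre with that character's class label, with 0 for background. An FCN would be trained to predict such a map pixel by pixel; the decoder shows how a predicted map could be turned back into a character string by grouping connected pixels of the same label and ordering the groups left to right. The box format, the `CLASSES` character set, the `radius` parameter, and both function names are hypothetical, not details confirmed by the thesis.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int, str]  # (x0, y0, x1, y1, character); hypothetical annotation format
CLASSES = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed character set; label 0 = background


def position_based_gtd(height: int, width: int, boxes: List[Box], radius: int = 1) -> List[List[int]]:
    """Build a label map that marks a small square at each character centre with its class label."""
    gtd = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1, ch in boxes:
        label = CLASSES.index(ch) + 1  # shift by 1 so that 0 stays background
        cx, cy = (x0 + x1) // 2, (y0 + y1) // 2
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                gtd[y][x] = label
    return gtd


def decode_label_map(label_map: List[List[int]]) -> str:
    """Read characters from a (predicted) label map, ordered by mean x of each region."""
    h, w = len(label_map), len(label_map[0])
    seen = [[False] * w for _ in range(h)]
    found: List[Tuple[float, str]] = []
    for y in range(h):
        for x in range(w):
            lab = label_map[y][x]
            if lab == 0 or seen[y][x]:
                continue
            # flood-fill the 4-connected region of pixels sharing this label
            queue = deque([(y, x)])
            seen[y][x] = True
            xs: List[int] = []
            while queue:
                cy, cx = queue.popleft()
                xs.append(cx)
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and label_map[ny][nx] == lab:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            found.append((sum(xs) / len(xs), CLASSES[lab - 1]))
    return "".join(ch for _, ch in sorted(found))
```

Because the target encodes only where each character sits and which class it belongs to, characters of different sizes and irregular vertical positions map to the same kind of compact mark. A one-pixel centre mark would also work in principle; the small square here simply gives each character more positive pixels.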