A Study on Artificial Intelligence for Scene Text Recognition and Its Applications

This thesis proposes scene text recognition algorithms and applies them to the recognition of slab identification numbers (SINs) in factory scenes. Recognizing SINs is challenging because of the complex backgrounds of factory scenes and the low quality of the characters in SINs. To address these challenges with data-driven algorithms, the proposed methods use a convolutional neural network (CNN) and a fully convolutional network (FCN).

In the first part of the thesis, a CNN-based algorithm is proposed to recognize machine-printed SINs in factory scenes. Patch images are extracted by a sliding-window method, and each patch is classified as either a character category or background by a CNN classifier. The main contribution of this algorithm is two-fold: an accumulated response map and a model-based score function. Adjacent patches of a true character region tend to be classified as the same character, whereas an isolated character candidate tends to be a false candidate. The accumulated response map combines the information of neighboring patches to reduce the effect of incorrect classifications. The model-based score function exploits the structural information of SINs. A SIN consists of 9 characters, and every character in a SIN must be recognized correctly to identify the slab. By optimizing the model-based score function, all characters in a SIN can be recognized simultaneously.

In the second part, an FCN-based algorithm is proposed to recognize handwritten SINs. The CNN-based method is computationally inefficient because it requires multiscale analysis to recognize SINs of various sizes. Furthermore, the CNN-based method cannot be applied to handwritten SINs, because handwritten characters are irregularly positioned and vary in size within a SIN. To address these limitations, a recognition pipeline that uses an FCN with deconvolution layers is proposed. The main contribution of the second part is a novel form of ground-truth data (GTD), called position-based GTD, for training an FCN. With position-based GTD, an FCN can be trained in an image-to-image manner. Experiments were conducted on industrial data collected from an actual steelworks to verify the effectiveness of the proposed method.
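The idea behind the accumulated response map can be illustrated with a minimal sketch. The abstract does not give the actual formulation, so everything below is an assumption made for illustration: the per-patch labels stand in for the output of a CNN classifier, and the names `BACKGROUND`, `accumulate_responses`, `suppress_isolated`, `radius`, and `min_support` are hypothetical. The sketch only shows the stated intuition that a candidate supported by neighboring patches with the same label is more trustworthy than an isolated one.

```python
from typing import List

BACKGROUND = -1  # hypothetical label meaning "no character in this patch"


def accumulate_responses(labels: List[List[int]], radius: int = 1) -> List[List[int]]:
    """Count, for each character patch, how many patches in its
    (2*radius+1) x (2*radius+1) neighborhood (itself included) share its label.
    Background patches keep a count of 0."""
    rows, cols = len(labels), len(labels[0])
    acc = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            label = labels[r][c]
            if label == BACKGROUND:
                continue
            # Clip the neighborhood to the grid boundaries.
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    if labels[rr][cc] == label:
                        acc[r][c] += 1
    return acc


def suppress_isolated(labels: List[List[int]], acc: List[List[int]],
                      min_support: int = 2) -> List[List[int]]:
    """Relabel as background any patch whose accumulated count is below min_support."""
    return [
        [lab if a >= min_support else BACKGROUND for lab, a in zip(lab_row, acc_row)]
        for lab_row, acc_row in zip(labels, acc)
    ]


# Toy grid of per-patch labels: a 2x2 cluster of "3" and one isolated "7".
labels = [
    [-1, 3, 3, -1, -1],
    [-1, 3, 3, -1, 7],
    [-1, -1, -1, -1, -1],
]
acc = accumulate_responses(labels)
cleaned = suppress_isolated(labels, acc)
```

In this toy grid the cluster of "3" patches supports itself and survives, while the isolated "7" has no agreeing neighbor and is relabeled as background. A real system would more plausibly accumulate classifier scores than hard labels, possibly across scales; that choice is not specified in the abstract.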
