Enhancing the segmentation of Arabic characters using baseline information

Hadeel M. Al-Ateeq, AbdulMalik S. Al-Salman

Abstract


Optical Character Recognition (OCR) systems automatically recognize large printed documents for archiving or further processing, which improves the interaction between humans and machines in many applications. The system proposed here was developed for the Arabic language. Arabic is written cursively and consists of 29 characters, each of which takes several different shapes. An Arabic OCR (AOCR) system is divided into several steps: image acquisition, preprocessing, segmentation, feature extraction, recognition and post-processing. The segmentation step splits the text into lines, then into glyphs, and finally into characters. The most important and sensitive of these is character segmentation, because its output affects all the following steps and, ultimately, the recognition rate. This study concentrates on character segmentation by enhancing an algorithm already published in the literature. The new algorithm reduces processing time and avoids segmenting descenders such as the letter Ra (ر) and the final Ya shapes (ـي، ى، ئ).
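To make the line-to-character pipeline and the role of the baseline more concrete, here is a minimal sketch of the common projection-profile approach. It is not the authors' enhanced algorithm and does not implement their handling of descenders such as Ra or final Ya. It assumes a binarized image stored as a list of rows (1 = ink, 0 = background). It treats the densest row of a text line as the baseline, and it uses a hypothetical cut rule: a column is a candidate cut point when its only ink pixel lies on the baseline.

```python
from typing import List, Tuple

# Assumption: binary image as a list of rows, 1 = ink, 0 = background.
Image = List[List[int]]


def horizontal_projection(img: Image) -> List[int]:
    """Count the ink pixels in each row."""
    return [sum(row) for row in img]


def segment_lines(img: Image) -> List[Tuple[int, int]]:
    """Return (start_row, end_row) pairs (end exclusive), one per run of
    consecutive rows that contain ink; empty rows separate text lines."""
    lines = []
    start = None
    for r, count in enumerate(horizontal_projection(img)):
        if count > 0 and start is None:
            start = r
        elif count == 0 and start is not None:
            lines.append((start, r))
            start = None
    if start is not None:
        lines.append((start, len(img)))
    return lines


def estimate_baseline(line: Image) -> int:
    """Heuristic: the baseline is the row with the most ink pixels
    (max() returns the first such row on ties)."""
    proj = horizontal_projection(line)
    return max(range(len(proj)), key=lambda r: proj[r])


def candidate_cuts(line: Image, baseline: int) -> List[int]:
    """Hypothetical rule: a column is a candidate cut point when its only
    ink pixel lies on the baseline, i.e. a thin connecting stroke between
    cursive characters. Column indices run left to right, even though
    Arabic is read right to left."""
    cuts = []
    for c in range(len(line[0])):
        column_ink = sum(line[r][c] for r in range(len(line)))
        if column_ink == 1 and line[baseline][c] == 1:
            cuts.append(c)
    return cuts


if __name__ == "__main__":
    # Toy page with two text lines separated by an empty row.
    page = [
        [0, 0, 0, 0, 0, 0, 0],
        [1, 0, 0, 0, 1, 0, 0],
        [1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0],
        [1, 1, 1, 0, 0, 0, 0],
    ]
    print(segment_lines(page))  # [(1, 3), (4, 6)]
    first = page[1:3]
    base = estimate_baseline(first)
    print(base, candidate_cuts(first, base))  # 1 [1, 2, 3, 5, 6]
```

In practice, a rule this simple over-segments letters whose strokes run along or below the baseline. That weakness is the kind of error the abstract says the enhanced algorithm avoids for Ra and the final Ya shapes.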




International Multilingual Academic Journal

Copyright © IMAJ 2023