
A Post-Processing Scheme for Malayalam using Statistical Sub-character Language Models

Most of the Indian scripts do not have any robust commercial OCRs. Many of the laboratory prototypes report reasonable results at recognition/classification stage. However, word level accuracies are still poor. It is well known that word accuracy decreases as the number of characters in a word i...


Bibliographic Details
Main Authors: Karthika Mohan and C. V. Jawahar
Format: Printed Book
Published: ACM 2010
Subjects:
Online Access: http://10.26.1.76/ks/005435.pdf
LEADER 01741nam a22001457a 4500
100 |a Karthika Mohan and C. V. Jawahar  |9 26700 
245 |a A Post-Processing Scheme for Malayalam using Statistical Sub-character Language Models 
260 |b ACM  |c 2010 
500 |a DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems 493-500  
520 |a Most of the Indian scripts do not have any robust commercial OCRs. Many of the laboratory prototypes report reasonable results at the recognition/classification stage. However, word-level accuracies are still poor. It is well known that word accuracy decreases as the number of characters in a word increases. For Malayalam, the average number of characters in a word is almost twice that of English. Moreover, the number of words required to cover 80% of the Malayalam language is more than forty times that of other Indian languages such as Hindi. Hence a direct dictionary-based post-processing scheme is not suitable for Malayalam. In this paper, we propose a post-processing scheme which uses statistical language models at the sub-character level to boost word-level recognition results. We use a multi-stage graph representation and formulate the recognition task as an optimization problem. Edges of the graph encode the language information and nodes represent the visual similarities. An optimal path from source node to destination node represents the recognized text. We validate our method on more than 10,000 words from a Malayalam corpus. 
650 |a UNICODE   |a CONFERENCE PROCEEDINGS  |9 26701 
856 |u http://10.26.1.76/ks/005435.pdf 
942 |c KS 
999 |c 76175  |d 76175 
952 |0 0  |1 0  |4 0  |7 0  |9 68174  |a MGUL  |b MGUL  |d 2016-02-08  |l 0  |r 2016-02-08  |w 2016-02-08  |y KS
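
Illustrative note on the abstract (field 520): the multi-stage graph it describes, with nodes carrying visual similarity and edges carrying language information, can be pictured as a Viterbi-style dynamic program. The sketch below is only a minimal illustration under stated assumptions, not the authors' implementation. The function name `best_path`, the bigram model, the additive log-score combination, the `lm_weight` and `floor` parameters, and the Latin placeholder symbols (standing in for Malayalam sub-character units) are all hypothetical choices made for this example.

```python
import math

def best_path(stages, bigram_logp, lm_weight=1.0, floor=-10.0):
    """Find the highest-scoring path through a multi-stage graph.

    stages      : list of dicts {symbol: visual log-score}, one dict per stage
                  (node scores; assumed to come from a classifier).
    bigram_logp : dict {(prev_symbol, symbol): log-probability} (edge scores;
                  a bigram model is an assumption of this sketch).
    floor       : log-score used for symbol pairs missing from bigram_logp.
    Returns (list of symbols, total log-score).
    """
    if not stages:
        return [], 0.0
    # best[s] = (score of the best path ending in symbol s, that path)
    best = {s: (v, [s]) for s, v in stages[0].items()}
    for stage in stages[1:]:
        nxt = {}
        for s, v in stage.items():
            # Pick the predecessor maximizing path score + weighted edge score.
            prev, (pscore, ppath) = max(
                best.items(),
                key=lambda kv: kv[1][0]
                + lm_weight * bigram_logp.get((kv[0], s), floor),
            )
            edge = lm_weight * bigram_logp.get((prev, s), floor)
            nxt[s] = (pscore + edge + v, ppath + [s])
        best = nxt
    score, path = max(best.values(), key=lambda t: t[0])
    return path, score

# Toy data (hypothetical): the classifier slightly prefers "e" at stage 1,
# but the language model strongly prefers "c" before "a".
stages = [
    {"e": math.log(0.55), "c": math.log(0.45)},
    {"a": math.log(0.9), "o": math.log(0.1)},
    {"t": math.log(0.8), "f": math.log(0.2)},
]
bigram_logp = {
    ("c", "a"): math.log(0.5), ("e", "a"): math.log(0.05),
    ("c", "o"): math.log(0.2), ("e", "o"): math.log(0.1),
    ("a", "t"): math.log(0.6), ("a", "f"): math.log(0.1),
    ("o", "t"): math.log(0.3), ("o", "f"): math.log(0.3),
}
path, score = best_path(stages, bigram_logp)
# path == ["c", "a", "t"]
```

In this toy run the edge scores override the classifier's first-stage preference, which is the kind of correction a language-model post-processor is meant to provide. Working on sub-character units rather than whole words avoids needing a word dictionary, which the abstract argues is impractical for Malayalam.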