A Post-Processing Scheme for Malayalam using Statistical Sub-character Language Models
Most of the Indian scripts do not have any robust commercial OCRs. Many of the laboratory prototypes report reasonable results at recognition/classification stage. However, word level accuracies are still poor. It is well known that word accuracy decreases as the number of characters in a word i...
| Main Author: | Karthika Mohan and C. V. Jawahar |
|---|---|
| Format: | Printed Book |
| Published: | ACM, 2010 |
| Subjects: | UNICODE; CONFERENCE PROCEEDINGS |
| Online Access: | http://10.26.1.76/ks/005435.pdf |
| LEADER | 01741nam a22001457a 4500 | ||
|---|---|---|---|
| 100 | |a Karthika Mohan and C. V. Jawahar |9 26700 | ||
| 245 | |a A Post-Processing Scheme for Malayalam using Statistical Sub-character Language Models | ||
| 260 | |b ACM |c 2010 | ||
| 500 | |a DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems 493-500 | ||
| 520 | |a Most of the Indian scripts do not have any robust commercial OCRs. Many of the laboratory prototypes report reasonable results at the recognition/classification stage. However, word level accuracies are still poor. It is well known that word accuracy decreases as the number of characters in a word increases. For Malayalam, the average number of characters in a word is almost twice that of English. Moreover, the number of words required to cover 80% of the Malayalam language is more than forty times that of other Indian languages such as Hindi. Hence a direct dictionary based post-processing scheme is not suitable for Malayalam. In this paper, we propose a post-processing scheme which uses statistical language models at the sub-character level to boost word level recognition results. We use a multi-stage graph representation and formulate the recognition task as an optimization problem. Edges of the graph encode the language information and nodes represent the visual similarities. An optimal path from source node to destination node represents the recognized text. We validate our method on more than 10,000 words from a Malayalam corpus. | ||
| 650 | |a UNICODE |a CONFERENCE PROCEEDINGS |9 26701 | ||
| 856 | |u http://10.26.1.76/ks/005435.pdf | ||
| 942 | |c KS | ||
| 999 | |c 76175 |d 76175 | ||
| 952 | |0 0 |1 0 |4 0 |7 0 |9 68174 |a MGUL |b MGUL |d 2016-02-08 |l 0 |r 2016-02-08 |w 2016-02-08 |y KS | ||
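
The abstract describes a multi-stage graph in which nodes carry visual similarity and edges carry language information, with the recognized text read off an optimal source-to-destination path. The record does not give the authors' formulation, so the following is only a minimal sketch under assumptions: each stage is taken to be a list of candidate symbols with log visual scores, the edge weight is taken to be a bigram log-probability, and the path is found with a standard Viterbi-style dynamic program. The names `decode`, `bigram_logp` and `unseen_logp`, and the Latin-letter toy data standing in for Malayalam sub-character units, are hypothetical.

```python
import math


def decode(stages, bigram_logp, unseen_logp=math.log(1e-6)):
    """Return (best_symbols, best_score) for a multi-stage candidate graph.

    stages      -- list of non-empty lists of (symbol, visual_logp) pairs;
                   one list per position in the word (assumed input format).
    bigram_logp -- dict mapping (prev_symbol, symbol) to a log-probability
                   (assumed stand-in for the sub-character language model).
    unseen_logp -- log-probability used for bigrams missing from the dict.
    """
    if not stages:
        return [], 0.0

    # Best score of any path ending at each candidate of the current stage.
    scores = [vis for _, vis in stages[0]]
    backptrs = []  # backptrs[t - 1][j] = best predecessor index for stage t

    for t in range(1, len(stages)):
        new_scores, ptrs = [], []
        for sym, vis in stages[t]:
            # Edge weight (language model) added to each predecessor's score.
            cands = [
                scores[k] + bigram_logp.get((prev_sym, sym), unseen_logp)
                for k, (prev_sym, _) in enumerate(stages[t - 1])
            ]
            k_best = max(range(len(cands)), key=cands.__getitem__)
            # Node weight (visual similarity) added once per candidate.
            new_scores.append(cands[k_best] + vis)
            ptrs.append(k_best)
        scores = new_scores
        backptrs.append(ptrs)

    # Trace the optimal path back from the best final candidate.
    j = max(range(len(scores)), key=scores.__getitem__)
    total = scores[j]
    path = [stages[-1][j][0]]
    for t in range(len(stages) - 1, 0, -1):
        j = backptrs[t - 1][j]
        path.append(stages[t - 1][j][0])
    path.reverse()
    return path, total


# Toy data: the classifier slightly prefers "o" at position 2,
# but the bigram model favours "c-a-t" over "c-o-t".
toy_stages = [
    [("c", math.log(0.9)), ("e", math.log(0.1))],
    [("a", math.log(0.4)), ("o", math.log(0.6))],
    [("t", math.log(1.0))],
]
toy_bigrams = {
    ("c", "a"): math.log(0.5),
    ("c", "o"): math.log(0.1),
    ("a", "t"): math.log(0.5),
    ("o", "t"): math.log(0.1),
}
toy_path, toy_score = decode(toy_stages, toy_bigrams)
```

In this toy case the language-model edges override the classifier's top choice at the second position, which is the kind of correction a post-processing stage is meant to provide. A real system would replace the bigram table with whatever sub-character model and smoothing the paper actually uses.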