Gaudiya Discussions Archive » TECH ISSUES

OCR scanning & diacritics

braja - Wed, 17 Nov 2004 23:45:47 +0530
Is there any OCR software around that can deal with diacritics?
Keshava - Thu, 18 Nov 2004 01:18:39 +0530
I don't know of any. If you can find a product that you can teach to recognize new forms, you could train it to recognize the diacritics. Otherwise, what I have found is that when you OCR a text with diacritics in it, the program substitutes other characters for the diacritic letters. Then you have to go through and do search-and-replace editing for those. The problem is that sometimes the programs mistake diacritic letters for other common letters. If you could just get a program that would substitute a distinct letter or symbol for each different diacritic letter, whatever that symbol was, then search and replace would be easy.
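A minimal sketch of that search-and-replace pass in Python, assuming the OCR program consistently outputs one distinct stand-in character for each diacritic letter. The stand-ins in the table (â, î, û) are only an illustrative guess at what a French-trained recognizer might produce, not the documented output of any product, and the function name is made up for this example.

```python
import unicodedata

# Hypothetical table: OCR stand-in character -> intended diacritic letter.
# Adjust it to whatever your OCR program actually produces.
SUBSTITUTIONS = {
    "â": "ā",
    "î": "ī",
    "û": "ū",
}


def restore_diacritics(text, table=SUBSTITUTIONS):
    """Replace each stand-in character with the intended letter."""
    # Normalize to composed form so "a" + combining circumflex
    # is matched the same way as the single character "â".
    text = unicodedata.normalize("NFC", text)
    return text.translate(str.maketrans(table))


print(restore_diacritics("gîtâ"))
```

This only works for stand-ins that never occur legitimately in the text. A diacritic letter that gets read as a plain "s" or "n" cannot be recovered this way and still needs manual proofreading.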
Keshava - Thu, 18 Nov 2004 01:20:58 +0530
Come to think of it, my best suggestion is to use an OCR program that supports other languages like French, or even some Eastern European languages that use basically Roman scripts with diacritical marks.
arekaydee - Thu, 18 Nov 2004 01:38:35 +0530
OmniPage supports some fairly diacritic-heavy languages. It's a little pricey, though.

OmniPage Supported languages.

I think Keshava is right and that some training of the software would be needed.
Madhava - Thu, 18 Nov 2004 03:43:15 +0530
I once played around with OmniPage and tried to teach it some tricks, but if memory serves I never got too far. Maybe it will work if you persist; I just didn't have a pressing need at the time, so I didn't. From what I hear, it seems to be the best of the lot.
Kalkidas - Thu, 18 Nov 2004 05:01:14 +0530
Try FineReader Pro.
It has an option to create user-defined languages (based on existing ones) and recognises 177 languages, so I think there shouldn't be any problem finding the diacritic letters you need.