Gaudiya Discussions Archive » TECH ISSUES
PC problems, recommended software, tips and tricks, coding and so forth. Things that make your life in the cyberspace easier.

HK to Unicode Bangla input - need consensus on extra bengali letters



ananga - Fri, 04 Feb 2005 04:26:42 +0530
I have found this webpage that converts transliterated Bengali to Unicode Bangla. The code is open source (GPL licence) and it looks fairly easy to hack it to convert HK to Bangla.

Of course there needs to be some agreement regarding the extra letters not used in Sanskrit. Any ideas, Jagat, Madhava? Perhaps the help page of this website could give some ideas.

Bangla Unicode conversion page
Keshava - Sat, 05 Feb 2005 06:08:43 +0530
I tried the webpage conversion to Bangla on "oM namo nArAyaNAya".

The problem with the output is that some Unicode characters, like the Bangla character for "o", are written both before and after the consonant they are attached to. The output simply writes both parts after, that is to the right of, the consonant, so the output is not good. What is needed is for the code to realize that when it sees a consonant followed by an "o", it should insert the sign for medial "e" before the consonant and the sign for medial "A" after it.

For those of you who are familiar with Bangla and Devanagari (or other Indic) scripts this should make sense. I don't know why the creators of Unicode fonts don't realise how these letters have to be combined in Indic scripts. Probably it is because the people who make the Unicode scripts mostly don't know the languages involved.

For a better way to make a proper Bangla (or many other Indic script) output see the Itrans online interface at

http://www.aczoom.com/itrans/online
ananga - Thu, 10 Feb 2005 05:29:14 +0530
Keshava

That's a very nice webpage and I'd probably make use of it if I had broadband which I don't have yet.

The fact that the o-kar comes after the consonant rather than wrapped around it means that your browser is not configured properly for Indic scripts. On my machine (Windows ME) it does that in Firefox, but in IE v5.5 it works fine.

I have hacked around with the JavaScript code on the above webpage enough to get it to produce fairly decent Bangla output from something that is nearly standard HK, with some additions which are by no means set in stone. I welcome any feedback.
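The point about configuration can be checked without any font at all. As a minimal sketch in Python (used here only for illustration; the converter in this thread is JavaScript), the standard `unicodedata` module shows that "ko" is stored as the consonant followed by a single o-kar code point, and that this code point canonically decomposes into the e-kar and A-kar halves. Placing the e-kar half to the left of the consonant is then left to the renderer.

```python
import unicodedata

# "ko": BENGALI LETTER KA followed by BENGALI VOWEL SIGN O, in logical order.
ko = "\u0995\u09cb"

# List what is actually stored, independent of how any font draws it.
stored = [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in ko]
for line in stored:
    print(line)

# Canonical decomposition (NFD) splits the o-kar into its two visual halves:
# the e-kar (U+09C7) and the A-kar (U+09BE).
halves = unicodedata.normalize("NFD", "\u09cb")
print([f"U+{ord(ch):04X}" for ch in halves])
```

If a page contains this sequence and the o-kar still shows entirely to the right of the consonant, the fault is more likely in the font or browser than in the converter output.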

Ananga



Attachment: convertAgain9Feb.html
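A converter of the kind described above can be sketched as a table lookup with one bit of state. The following Python sketch is not Ananga's JavaScript; the function name `hk_to_bangla` and the tables are hypothetical. The extra codes are inferred from samples posted later in this thread (`Y` for য, `y` for য়, `X` for ড়, `_` as a separator, `|` for dari), while `Xh` for ঢ় is purely an assumption.

```python
import unicodedata

VIRAMA = "\u09cd"  # hasanta: suppresses the inherent "a" of a consonant

CONSONANTS = dict(zip(
    "k kh g gh G c ch j jh J T Th D Dh N t th d dh n p ph b bh m Y r l v z S s h".split(),
    "ক খ গ ঘ ঙ চ ছ জ ঝ ঞ ট ঠ ড ঢ ণ ত থ দ ধ ন প ফ ব ভ ম য র ল ব শ ষ স হ".split(),
))
# Extra letters not used in Sanskrit, written as base letter + nukta (U+09BC).
CONSONANTS.update({"y": "\u09af\u09bc", "X": "\u09a1\u09bc", "Xh": "\u09a2\u09bc"})

# HK code: (independent letter, dependent vowel sign)
VOWELS = {
    "a": ("\u0985", ""), "A": ("\u0986", "\u09be"),
    "i": ("\u0987", "\u09bf"), "I": ("\u0988", "\u09c0"),
    "u": ("\u0989", "\u09c1"), "U": ("\u098a", "\u09c2"),
    "R": ("\u098b", "\u09c3"),
    "e": ("\u098f", "\u09c7"), "ai": ("\u0990", "\u09c8"),
    "o": ("\u0993", "\u09cb"), "au": ("\u0994", "\u09cc"),
}

# anusvara, visarga, dari, double dari, and "_" as a silent separator
OTHER = {"M": "\u0982", "H": "\u0983", "|": "\u0964", "||": "\u0965", "_": ""}

TOKENS = set(CONSONANTS) | set(VOWELS) | set(OTHER)


def hk_to_bangla(text):
    """Convert an HK-style string to Bengali script (illustrative sketch)."""
    out = []
    pending = False  # True while the last consonant still awaits its vowel
    i = 0
    while i < len(text):
        # Longest match first, so "kh" wins over "k" and "ai" over "a".
        pair = text[i:i + 2]
        tok = pair if pair in TOKENS else text[i]
        i += len(tok)
        if tok in CONSONANTS:
            if pending:
                out.append(VIRAMA)  # consonant + consonant makes a conjunct
            out.append(CONSONANTS[tok])
            pending = True
        elif tok in VOWELS:
            independent, sign = VOWELS[tok]
            out.append(sign if pending else independent)
            pending = False
        else:
            if pending:
                out.append(VIRAMA)  # consonant with no vowel gets a hasanta
            out.append(OTHER.get(tok, tok))
            pending = False
    if pending:
        out.append(VIRAMA)
    return unicodedata.normalize("NFC", "".join(out))


print(hk_to_bangla("anaGgabAbA kAja karache|"))
```

Note that the o-kar is emitted as one sign after the consonant; no reordering is done in the converter, on the assumption that the font and browser handle placement.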
Gaurasundara - Thu, 10 Feb 2005 06:34:23 +0530
While we're on the subject of transliteration. Does anyone know an encoding format for Tamil, as in the works of the [ZrI VaiSNava] AlvArs?

Just as Madhava produced a picture of the Harvard-Kyoto encoding, I would like someone to produce a similarly simple transliteration scheme for Tamil, as I haven't been able to find one.
ananga - Thu, 10 Feb 2005 16:27:31 +0530
The ITRANS link that Keshava posted above appears to deal with Tamil. Whether or not it has all the necessary characters for what you want to do, you'll have to find out by experimenting.

It's probably a good idea to download a Unicode Tamil font if you don't already have one.
Keshava - Fri, 11 Feb 2005 06:59:32 +0530
QUOTE(Gaurasundara @ Feb 9 2005, 03:04 PM)
While we're on the subject of transliteration. Does anyone know an encoding format for Tamil, as in the works of the [ZrI VaiSNava] AlvArs?



Do you mean what is the standard scholarly diacritic system for special Tamil letters?
Check this out![attachmentid=1297]

Or

Do you mean how do we represent Tamil script?

Itrans deals with both. Check out aczoom.com/itrans/online
Attachment: Tamil.pdf
Talasiga - Fri, 11 Feb 2005 16:23:49 +0530
gO.Diiya Diskashans :)
Keshava - Tue, 15 Feb 2005 08:31:01 +0530
QUOTE(ananga @ Feb 10 2005, 12:57 AM)
It's probably a good idea to download a Unicode Tamil font if you don't already have one.



I have several Unicode fonts that include Tamil letters, but none are satisfactory. Using Itrans I can use the archaic Tamil forms as well as the modern ones. The Unicode fonts could not even do consonant-vowel combinations properly. If someone has a good Unicode font that can actually do all the conjuncts that Itrans can do, please let me know.

Also, I have seen Itrans do numbered Tamil letters for writing Sanskrit in Tamil script. This is very important, as Tamil does not have as many letters as Sanskrit, and some people add small subscripted numbers to the consonants to indicate their true pronunciation.

Also, if anyone knows of a Grantha script font I would love to get one. Grantha is a Tamil-like script that actually has all the letters for writing Sanskrit completely. Many very old manuscripts and some old books are in this script. I have books on the script, but so far I have not seen it made for computers, except for a demonstration of it by the guys at IIT Madras.
ananga - Fri, 25 Feb 2005 02:33:22 +0530
This is primarily for Malika (Advitiya) but also for anyone else interested.

I am using the Likhan TTF font from www.stat.wisc.edu/~deepayan/bengali/webpage/font/fonts.html which makes the attached latest version of my JavaScript HTML file display properly.

There are some glitches regarding numbers, darhi, double darhi, khondo to, and suitable codes for the flapped retroflex consonants, but I'll overcome those with time and as I learn how to program in JavaScript.

I need to adapt the code to implement Sanskrit-only HK codes, i.e. rare letters, sandhi, compound nouns, apostrophe versus avagraha, and things I don't understand like

anunAsika &
upadhmAnIya f
jihvAmUlIya x
udAtta ;
and svarita :

which are of more specialised usage, but we probably need to be of use to Sanskritists who want to use Bangla script.
A complete list is at:
http://www.ucl.ac.uk/~ucgadkw/members/tran...tml#x1-120005.2

Maybe, Madhava, you could disturb Jagat's bhajan at some opportune moment to ask him his opinion on the subject. And sometime, Madhava, when you get back, some help with JavaScript would be awesome.

onek koshto dilam

Ananga
Attachment: convertAgain10Feb.html
Advitiya - Fri, 25 Feb 2005 03:19:53 +0530
anaGgabAbA kAja karache| AmAya phona kore bAMlA likhate sAhAYYa karache|

অনঙ্গবাবা কাজ করছে। আমায় ফোন কোরে বাংলা লিখতে সাহায্য করছে।
ananga - Fri, 25 Feb 2005 03:46:42 +0530
QUOTE(Advitiya @ Feb 24 2005, 09:49 PM)
anaGgabAbA kAja karache| AmAya phona kore bAMlA likhate sAhAYYa karache|

অনঙ্গবাবা কাজ করছে। আমায় ফোন কোরে বাংলা লিখতে সাহায্য করছে।




Yata la_ibe rAdhAra nAma karibe Ananda //
dibAnizi bhaja re mana zrI rAdhe govinda //


যত লইবে রাধার নাম করিবে আনন্দ //

দিবানিশি ভজ রে মন শ্রী রাধে গোবিন্দ //
Keshava - Sat, 26 Feb 2005 00:38:25 +0530
QUOTE(Advitiya @ Feb 24 2005, 11:49 AM)
anaGgabAbA kAja karache| AmAya phona kore bAMlA likhate sAhAYYa karache|

? ? ? ? ? ? ? ? ? ?



Look at the above and now compare the output made by Itrans below:

[attachmentid=1394]
Attachment: Image
Advitiya - Sat, 26 Feb 2005 00:50:54 +0530
bAbA Ananga, ekTu zuddha kore dilam |

nAma zune YAra eta prema jAge, cokhe Ane kata jala, sakhi se hari kemana bala

নাম শুনে যার এত প্রেম জাগে, চোখে আনে কত জল, সখি সে হরি কেমন বল
Advitiya - Sat, 26 Feb 2005 00:58:14 +0530
QUOTE
QUOTE
(Advitiya @ Feb 24 2005, 11:49 AM)
anaGgabAbA kAja karache| AmAya phona kore bAMlA likhate sAhAYYa karache|

? ? ? ? ? ? ? ? ? ?

Look at the above and now compare the output made by Itrans below:


I like the other one better.
Keshava - Sat, 26 Feb 2005 02:05:50 +0530
QUOTE(Advitiya @ Feb 25 2005, 09:28 AM)
I like the other one better.



I don't know which one you are talking about. I edited my last post and changed the GIF to a JPEG so that you can see it better. However, when I look at all the other attempts, they are not correct Bengali script. Please note that the conjuncts are not done properly and neither are the vowels added to the consonants properly. What do you have to say about that?
ananga - Sat, 26 Feb 2005 04:34:37 +0530
Keshava
Your computer is clearly not configured to display Bengali properly.

Firstly, which operating system and browser are you running? Malika and I are both running Windows ME and Internet Explorer, and once she installed the Likhan font (which is the only one I've tried it with so far) it all displayed fine. If you are using Internet Explorer you may need to select Tools: Internet Options and
  1. click on the Fonts button and select Likhan for the Bengali language font
  2. click on the fonts button and add the Bengali language

When I used Firefox, or the computer at the library, I got results similar to those you describe.


The ITRANS is quite nice but
  1. it is only available online
  2. the GIFs that it produces are not editable text which can be easily shared with others or used in PMs etc.
  3. hanging around GD I have naturally become more fluent in HK, so I prefer that to Itrans. That is why, when I've satisfactorily completed the HK to Unicode Bengali converter, I'll extend it to convert from HK to Devanagari, so that I'll be able to use the very magnificent sanskrit2003 font, which has awesome Devanagari conjuncts and splendid typography

The online Itrans website does also convert into Unicode text, which is handy, and even better if you have a broadband connection. I don't.


Ananga
Advitiya - Sat, 26 Feb 2005 09:30:30 +0530
QUOTE
QUOTE
anaGgabAbA kAja karache| AmAya phona kore bAMlA likhate sAhAYYa karache|

অনঙ্গবাবা কাজ করছে। আমায় ফোন কোরে বাংলা লিখতে সাহায্য করছে।

However when I look at all the other attempts they are not correct Bengali script. Please note that the conjuncts are not done properly and neither are the vowels added to the consonants properly. What do you have to say about that?


Keshavaji! I don't understand. The conjuncts are not done properly? The vowels are not added to the consonants properly? What do you mean? Give me an example, please.

Did you ever try Banglaword? It's very easy to type directly as you type in general on your keyboard. Of course, here we are trying the conversion from HK.
Kalkidas - Sat, 26 Feb 2005 14:09:54 +0530
QUOTE(Keshava @ Feb 25 2005, 10:08 PM)
QUOTE(Advitiya @ Feb 24 2005, 11:49 AM)
anaGgabAbA kAja karache| AmAya phona kore bAMlA likhate sAhAYYa karache|

? ? ? ? ? ? ? ? ? ?



Look at the above and now compare the output made by Itrans below:

[attachmentid=1394]



Dear Keshavaji, do you see the same thing that we see?

[attachmentid=1397]

I can't see much difference from what you've posted...
Attachment: Image
Keshava - Sun, 27 Feb 2005 04:59:02 +0530
Thanks for posting the screen shot. I am NOT seeing what you are seeing. Must be my Unicode bengali font or the fact that I am using a Mac or the browser settings. Does anyone have any ideas?

But this is my very point. Unicode is fine if everyone can see it correctly. Otherwise how is it better than other systems? With Itrans I can generate a postscript file, HTML, a pdf file or a gif (which can then be made into a JPEG, etc). So when Unicode fails to show up correctly the JPEG, GIF or PDF will.

OK, well I just tried this HK to Bangla converter again with IE using Virtual PC to emulate Windows XP Pro. It still did not work. I could not cut from the document and paste it to my Unicode capable text editor Textedit. It came out just as screwed up as before.

If someone can send me a Unicode text file that will appear correctly in either Bengali or Tamil or Devanagari, and which has been processed by one of these JavaScript converters, please do at gregjay(at)bluebottle.com
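Whether a problem lies in the data or in the display can be tested on the file itself. A minimal Python sketch, with a temporary file name chosen only for illustration: text forced through a legacy single-byte encoding turns into question marks, whereas a UTF-8 file returns exactly the code points that were written, whatever the viewer's font then does with them.

```python
import os
import tempfile

sample = "\u0995\u09be\u099c"  # "kAja" in Bengali script: KA, AA sign, JA

# A legacy single-byte encoding has no Bengali letters, so each one is
# replaced by "?" and the original text cannot be recovered.
lossy = sample.encode("latin-1", errors="replace").decode("latin-1")
print(lossy)

# UTF-8 keeps every code point, so the file can be mailed and reopened intact.
path = os.path.join(tempfile.mkdtemp(), "sample_bn.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)
with open(path, encoding="utf-8") as f:
    restored = f.read()

print(restored == sample)
```

Question marks therefore point to an encoding loss somewhere along the way, while intact code points with misplaced vowel signs point to the font or the rendering software.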

The devanagari converter at http://ccat.sas.upenn.edu/plc/tamilweb/tra...ansunicode.html works just fine for me.

Check out the converters at http://www.higopi.com/ucedit/index.html
ananga - Sun, 27 Feb 2005 06:38:41 +0530
QUOTE(Keshava @ Feb 26 2005, 11:29 PM)
If someone can send me a Unicode text file that will appear correctly in either Bengali or Tamil or Devanagari, and which has been processed by one of these JavaScript converters, please do at gregjay(at)bluebottle.com

The devanagari converter at http://ccat.sas.upenn.edu/plc/tamilweb/tra...ansunicode.html works just fine for me.

Check out the converters at http://www.higopi.com/ucedit/index.html




Try going to http://www.stat.wisc.edu/~deepayan/Bengali...Font/fonts.html and downloading the likhan.otf file instead of the likhan.ttf file.

OTF is a new cross-platform font standard which should work on any platform (if the computer is new enough, which mine isn't). The likhan.ttf file is what I have been using; it has reasonably good but not outstanding conjuncts, but it is, I think, a PC-specific font file, whereas the otf file should theoretically work on a recent Mac.

Alan Wood's Unicode website had scant information on Bangla on the Mac, but if you can find any information on how to set up Devanagari on the Mac it will be similar, as both require "Indic script processing" to sort out the vowel matra placement problem.

Hope that'll be of some help.

Ananga
Advitiya - Thu, 31 Mar 2005 23:01:39 +0530
bAhirete AlAbholA
antare hRdaya galA
mukhe sadA kRSNa balA
cokhe azrumAlA
dInatAya se mATira mAnuSa
niSThAte acalA |

kRSNa dite kRSNa nite
dhare zakti saba
alaukika lokavat_
gauXIya vaiSNava ||

বাহিরেতে আলাভোলা

অন্তরে হৃদয় গলা

মুখে সদা কৃষ্ণ বলা

চোখে অশ্রুমালা

দীনতায় সে মাটির মানুষ

নিষ্ঠাতে অচলা ।



কৃষ্ণ দিতে কৃষ্ণ নিতে

ধরে শক্তি সব

অলৌকিক লোকবত্

গৌড়ীয় বৈষ্ণব ।।


Sorry Ananga, I couldn't get "khanda ta" although I typed it the way it was shown above.
ananga - Tue, 30 Aug 2005 00:54:02 +0530
I've fixed the da shanya ra and dha shanya rha ড/ড় & ঢ/ঢ় problem.

I've fixed the numerals ১২৩৪৫৬৭৮৯০

and the dari & double dari । & ॥

Khondoto is still not working, and conjuncts with y only work if you type Y instead, which is a bearable halfway house.
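The three fixes above can be sketched in Python as follows (an illustration, not the attached JavaScript; `same_text` and `punctuate` are hypothetical helper names). ড় and ঢ় each have two spellings in Unicode, a single precomposed letter or the base letter plus a nukta (U+09BC), so comparisons are safer after normalization. The dari and double dari are the shared danda characters U+0964 and U+0965, and the numerals occupy U+09E6 to U+09EF.

```python
import unicodedata

# Precomposed letters and their base-letter + nukta spellings.
RRA, RHA = "\u09dc", "\u09dd"
DDA_NUKTA, DDHA_NUKTA = "\u09a1\u09bc", "\u09a2\u09bc"


def same_text(a, b):
    """Compare two strings after canonical (NFC) normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# ASCII digits 0-9 mapped onto the Bengali digits U+09E6..U+09EF.
BANGLA_DIGITS = "".join(chr(0x09E6 + d) for d in range(10))
TO_BANGLA_DIGITS = str.maketrans("0123456789", BANGLA_DIGITS)


def punctuate(text):
    """Replace '||' and '|' with double and single dari, then map digits."""
    # The double bar must be replaced first, or it would become two dari-s.
    text = text.replace("||", "\u0965").replace("|", "\u0964")
    return text.translate(TO_BANGLA_DIGITS)


print(same_text(RRA, DDA_NUKTA), RRA == DDA_NUKTA)
print(punctuate("108 ||"))
```

Replacing the double bar before the single one is the design choice that matters here; doing it in the other order would produce two single dari-s side by side.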

Still more useful than the previous attempt though.

I've had to get my hands messy with coding and get my head round nested/multiple if-then-else constructions :'( and I anticipate properly rewriting the code. I'm still at the stage of working out what someone else has written rather than writing my own code.

By the way, has anybody using Linux or Mac OS been able to read these pages with Bangla in them?

জয় জয় শ্রী রাধে শ্যাম নিতাই গৌর হরিবল
Attachment: HK_to_Bangla_Unicode_converter_29_Aug.html
Advitiya - Tue, 30 Aug 2005 02:16:15 +0530
জয় রাধে জয় রাধে গোবিন্দ
জয় রাধে জয় রাধে ॥

ভোর সময় কালে
কোকিলা ডাকে ডালে
ভ্রমরা হরিগুণ গায় রে ॥

একৈ পালঙ্ক করি
দুঁহু-জন বৈঠল
দুঁহু-মুখ সুন্দর সাজে ॥

শ্যামের বামে রাই
নবীন কিশোরী
মুচকি মুচকি হাসে ॥

শ্যামের গলে
বন-মালা বিরাজিত
রাই-গলে গজমোতি সাজে ॥

শ্যামের করে রতন-বলয়
রাই-করে কঙ্কন সাজে ॥

দুঁহার চরণে মণিময় নূপুর
রুনুঝুনু রুনুঝুনু বাজে ॥


This kIrtan is sung in a morning melody [প্রভাতী সুর].
Advitiya - Tue, 30 Aug 2005 02:38:35 +0530
গৌড়ীয় বৈষ্ণব

গৌড়ীয় ডিস্কাশন্স্

দৃঢ় করি ধর নিতাইর পায়//


O.k. I've got ড় & ঢ় but what happened to one dAri two dAri-s?