Gaudiya Discussions Archive » TECH ISSUES
PC problems, recommended software, tips and tricks, coding and so forth. Things that make your life in the cyberspace easier.

I'm creating a combined bangla/hindi/sanskrit diacritic font - and working out what characters I need to include



ananga - Mon, 16 Aug 2004 18:35:25 +0530
Hello everybody

During the recent visit of my (sort of siksha-) guru Sri Raman Bihari Das Babaji, we discussed compiling a song book of all the songs that are sung in our group and also to document my paramgurudev's padavali kirtan that has remained untranslated in devotees' tape collections.

My style of transliteration is somewhat different from the usual type, as I also include the original Bengali script (and in future the Devanagari), and my English transliteration aims only to reproduce the pronunciation rather than the spelling, which is available in the original script anyway.

This is an example of one of my dodgy translations in progress.

Attachment: Image
ananga - Mon, 16 Aug 2004 18:38:14 +0530
This chart explains the correspondence between the letters, my transliteration and the conventional system (which in my opinion is inadequate for Bengali).
Attachment: Image
ananga - Mon, 16 Aug 2004 18:40:51 +0530

I am pretty happy with the Bengali system I have developed, and although it does have a few disadvantages I can live with them. They are:

1:
the final inherent vowel is sometimes silent (in spoken Bengali) and sometimes sounded (in metre when sung), and as yet I have no symbol for an optional final inherent vowel

2:
currently I have only two possible symbols for the inherent vowel, either an "å" or an "o", and there are more than two actual sounds in Bangla because it is a continuum of vowel sounds. That I can live with, though.

The songbook will include songs in bangla, hindi and sanskrit. For sanskrit I will use the standard transliteration system possibly substituting the occasional gy for jn.

I will probably be using Itranslator99 for the Hindi Devanagari, which in the main is pretty good, but I am not completely happy with the transliteration it produces, mainly for the anusvara and chandrabindu, so I am looking for an appropriate replacement. Would a chandrabindu work for that?

I am currently working out symbols I need to include in a font that will:

1:
reproduce pronunciation in Bengali

2:
serve as a normal sanskrit diacritic font

and most crucially at the moment
3:
faithfully reproduce pronunciation in Hindi


I am not really very good at Hindi, which I sing like a Bengali. I would like to ask people with good Hindi pronunciation for help in deciding what would be an appropriate choice of characters. I'll write more later.

As an aside I've decided to create a sans serif font as it will have a distinctive look and not look like balaram!

Any comments or feedback welcome.

Ananga Manjari Das
Glastonbury, Somerset, UK
Madhava - Mon, 16 Aug 2004 18:47:45 +0530
QUOTE (ananga @ Aug 16 2004, 03:05 PM)
This is an example of one of my dodgy translations in progress.

The Bengali for the second stanza is repeated twice, the first one is missing.
ananga - Mon, 16 Aug 2004 18:55:23 +0530
Here is a hindi example with diacritics generated by Itranslator99

Attachment: Image
ananga - Mon, 16 Aug 2004 18:56:43 +0530
thanks Madhava

Actually there are loads of mistakes. If anyone sees any more, message me privately, as I'd like to limit the public conversation to topics of legibility, typography, pronunciation etc. The book will go through lengthy proofing and editing, but that's a separate issue really.

I loved your photos of your Delhi book printing adventure (going off-topic myself!) ;)
Jagat - Mon, 16 Aug 2004 19:17:42 +0530
I'll have to look at this more closely when I get the time.

The good thing about Balaram (at least a good direction it's headed...) is that it is convertible from the transliterated font to Devanagari. It was never done with Bengali in mind, and as far as I know, no such conversion program has been written. And there are difficulties in the current Balaram system even if we wanted to do that.

Nevertheless: If this hope exists, then the idea of one-to-one correspondence of glyphs is vital. More than one letter having the same correspondence in English transliteration (such as "j" or "sh") causes problems.
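
One way to see the problem is to check a transliteration table for collisions. The sketch below is only an illustration: the table is a made-up three-entry sample (the Bengali letters জ and য both written as "j" in a purely phonetic scheme), not anyone's actual scheme, and `find_collisions` is a hypothetical helper name.

```python
from collections import defaultdict

# Made-up sample table: Bengali letter -> phonetic Roman spelling.
# Both জ and য are pronounced "j", so a purely phonetic scheme maps them alike.
SAMPLE_TABLE = {
    "\u099c": "j",  # জ (borgiyo jo)
    "\u09af": "j",  # য (ontohstho jo)
    "\u0995": "k",  # ক
}


def find_collisions(table):
    """Return {roman: [source letters]} for every Roman spelling used more than once."""
    by_roman = defaultdict(list)
    for letter, roman in table.items():
        by_roman[roman].append(letter)
    return {roman: letters for roman, letters in by_roman.items() if len(letters) > 1}


collisions = find_collisions(SAMPLE_TABLE)
print(collisions)
```

Any spelling that shows up in the result cannot be converted back to the original script without extra information.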

So let's forget that for the time being.

(1) I don't care much for the a-dot solution for the short "o" sound. I would prefer using "o" and "" or "" for the long o sound. "koro" just seems more natural than "kr". You could then either abandon the a- division, using only "a" as you have been doing, or possibly find a real need for it.

Another possible solution I have considered is to use a character from linguistics, which I am sorry I cannot reproduce here for the moment. (Backwards "c")

(2) The a-i, a-u problem would not be entirely solved by (1) above. The current solution of umlaut is the one that I prefer: haïyA rather than haiyA. (In the current Balaram system, this cannot be done, and so BBT books usually have ha-iyA, ha-uk. This is a very bad solution.)

Your system takes care of this, as there is no confusion between i and oi, whereas my proposed solution above would not solve it.

(3) The "w" in how seems superfluous. I have been using jaoa haoa, which in English automatically incorporates the "w" sound. The same for when the letter replaces "y" (though I tend to use it) jAiA could be as good as jAiyA

(4) One of the biggest defects of Balaram is its inability to accommodate anuswar/candra-bindu properly. Most current systems prefer using the tilde over the accompanying vowel, like in Portuguese. I don't care much for m-dot for anusvara, but it's not really a big problem. It's less sloppy than "ng." The phonetic character that looks like an n with a hook on it seems the most succinct and natural solution, but regrettably a little too radical.

Those are preliminary comments. I'll have to look more closely later.

Jagat - Mon, 16 Aug 2004 19:19:47 +0530
The Hindi looks very nice.
Madhava - Mon, 16 Aug 2004 19:42:32 +0530
QUOTE (Jagat @ Aug 16 2004, 03:47 PM)
(1) I don't care much for the a-dot solution for the short "o" sound. I would prefer using "o" and "" or "" for the long o sound. "koro" just seems more natural than "kr". You could then either abandon the a- division, using only "a" as you have been doing, or possibly find a real need for it.

å is actually a pretty good choice; here we call it "the Swedish o". For example, Bhrigu's first name is Måns, and we pronounce it Mons.


QUOTE
Another possible solution I have considered is to use a character from linguistics, which I am sorry I cannot reproduce here for the moment. (Backwards "c")

I would strongly recommend against using non-ASCII characters, as crunching them back and forth on different platforms can turn out to be a real pain. Extended ASCII, from which many of Balaram's characters have been taken, is still all right. 255 characters, however, is the limit at the moment. Basically, you can use any of the following:

! " # $ % & ' ( ) * + , - . /
0 1 2 3 4 5 6 7 8 9
: ; < = > ? @
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
[ \ ] ^ _ `
a b c d e f g h i j k l m n o p q r s t u v w x y z
{ | } ~ 
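
As a rough way of checking which of these ranges a piece of transliterated text actually needs, something like the following could be used. It is a minimal sketch: `classify_text` and its three labels are made up for illustration, and it only looks at code point values, not at what any particular font or code page displays in those positions.

```python
def classify_text(text):
    """Report which range the highest character in text falls into."""
    highest = max((ord(ch) for ch in text), default=0)
    if highest < 128:
        return "ascii"         # lower 128 positions, safest across platforms
    if highest < 256:
        return "single-byte"   # fits in one byte, but depends on the code page
    return "beyond-8-bit"      # needs Unicode or a custom font encoding


print(classify_text("krsna"))                 # plain ASCII letters
print(classify_text("\u00e5"))                # å, code point 229
print(classify_text("k\u1e5b\u1e63\u1e47a"))  # ṛ, ṣ, ṇ are above 255
```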






QUOTE
(4) One of the biggest defects of Balaram is its inability to accommodate anuswar/candra-bindu properly. Most current systems prefer using the tilde over the accompanying vowel, like in Portuguese. I don't care much for m-dot for anusvara, but it's not really a big problem. It's less sloppy than "ng." The phonetic character that looks like an n with a hook on it seems the most succinct and natural solution, but regrettably a little too radical.

I can crunch together a font with whatever you need if you can come up with a proposal of what we need. I can also add characters to the existing Balaram if necessary, but I would be much in favor of an open-source solution. This is one of the projects in progress.
ananga - Mon, 16 Aug 2004 21:08:59 +0530
Thanks Jagat for your response.

Whatever system I choose to use, it'll be a compromise. I'll stick to the a with a circle (which I'm using instead of the backwards c) cos visually it looks like what it is: something between an "a" and an "o". It is clumsy, though, having to make a distinction sometimes, but I think it is better than the "sri guru chaaaaraaaanaaaaa paaaadmaaaa" scenario. Since I'm including the original Devanagari/Bangla script, that should satisfy all the linguo-geeks, who should be able to read the original script anyway ;)

I'm not so worried about the Bengali, although I found out that for the word cha(chandrabindu)DAbe I need to create a somewhat inelegant glyph made up of a letter "a", a circle/letter "o" above it and a chandrabindu above that. I prefer a chandrabindu to a tilde cos people recognise chandrabindus from the letter OM, and people are generally better at OMming than they are at deciphering strange symbols.

I next have to decide which symbols I actually need for hindi.

So far the font I have cobbled together from Balaram just uses the uppercase letters for the accented letters. Since there are no actual uppercase letters in the original scripts, I thought they were disposable, so I shouldn't have to go beyond the 256 and probably won't have to go beyond the "lower 128" either if I continue to use uppercase letters for accented letters. These technical difficulties will probably fall into place when I have researched such matters further.
ananga - Wed, 18 Aug 2004 03:04:38 +0530


QUOTE
The good thing about Balaram (at least a good direction it's headed...) is that it is convertible from the transliterated font to Devanagari. It was never done with Bengali in mind, and as far as I know, no such conversion program has been written. And there are difficulties in the current Balaram system even if we wanted to do that.

Nevertheless: If this hope exists, then the idea of one-to-one correspondence of glyphs is vital. More than one letter having the same correspondence in English transliteration (such as "j" or "sh") causes problems.


Jagadananda, We appear to work in two different ways:

I enter my text in Bengali directly and transliterate to (almost) faithfully reproduce the pronunciation for kirtan singers.

You (and others on this site) are very fluent in HK which (almost) faithfully reproduces bengali spelling and is understood by a more scholarly audience who for example either already know that an initial "y" is pronounced j or don't really care.

It would be a good idea therefore to agree on an extension to the HK scheme to satisfactorily and uniquely represent bengali Spelling as this would be a good starting point for conversion into balaram or bangla script as is already possible for. We would need to agree on and actively adopt codes for:

onusvar
chandrabindu, may I suggest a ~
ntohstho y (ntohstho j with a dot underneath)
murdhonyo r (murdhonyo d with a dot underneath)
murdhonyo rh (murdhonyo dh with a dot underneath)

one would have to be very careful to explicitly include all unspoken inherent vowels for otherwise conversion into BALARAM or into bangla script would not work
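
To make the idea of agreed codes concrete, here is a minimal sketch of how a converter might split extended HK text into codes before mapping each one to a letter. The code list is a hypothetical, incomplete sample (standard two-letter HK codes plus "~" for chandrabindu as suggested above), not an agreed scheme, and `tokenize` is a made-up name.

```python
# Hypothetical, incomplete code list: some standard HK two-letter codes plus
# "~" for chandrabindu as suggested above. Not an agreed scheme.
CODES = ["kh", "gh", "ch", "jh", "Th", "Dh", "th", "dh", "ph", "bh",
         "ai", "au", "~"]


def tokenize(text, codes=CODES):
    """Split text into codes, always preferring the longest match."""
    ordered = sorted(codes, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(text):
        for code in ordered:
            if text.startswith(code, i):
                tokens.append(code)
                i += len(code)
                break
        else:
            # Anything not in the list is treated as a one-character code.
            tokens.append(text[i])
            i += 1
    return tokens


print(tokenize("bA~ndhile"))
print(tokenize("hai"))
```

Because the longest match wins, "ai" is always read as one vowel here, which is why two separate vowels in a row would need some kind of explicit separator in the input.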

REGARDING THE SPECIFICATION OF A BALARAM-TYPE FONT

QUOTE
One of the biggest defects of Balaram is its inability to accommodate anuswar/candra-bindu properly. Most current systems prefer using the tilde over the accompanying vowel, like in Portuguese. I don't care much for m-dot for anusvara, but it's not really a big problem. It's less sloppy than "ng." The phonetic character that looks like an n with a hook on it seems the most succinct and natural solution, but regrettably a little too radical.


I prefer stealing the chandrabindu and using it in my font, but a tilde is good too. There are a lot of vowels to tilde-fy, and that's probably why the BBT people went over to an "n with a dot on top". I would prefer to keep said n solely for the "n" in the first row of consonants, whose name eludes me.

I'm happy with the ng for the onusvar but I appreciate the need for alternatives. We could have an ng ligature, or we could use something from the IPA, hopefully consistent with Dr Radice's system, which I'll mention in a bit.

On the subject of ligatures, I'm thinking of an ai ligature for the Hindi vowel to distinguish it from the Sanskrit diphthong.

There is no reason why any "replacement balaram" shouldn't include my phonetic approach maybe some IPA characters AND the normal accented characters with the additions for the above letters and for all sounds/letters in sanskrit, hindi, bangla, brajabasha, brajabuli (sp?) and maybe oriya that are used by Gaudiya Vaishnavas.

For an interesting use of IPA characters for Bengali, and an example of a transliteration that conveys both spelling AND pronunciation, I recommend looking at Dr William Radice's excellent Teach Yourself Bengali (which is also an example of excellent Bengali typography).

QUOTE

You could then either abandon the a- division, using only "a" as you have been doing, or possibly find a real need for it.


"a-dot" or "backwards c" some character is definitely needed for the inherent vowel here:
maNi nao mANika nao AJcale bA~ndhile rao (prArthanA)
Or could you get away without it Jagat? ;)


QUOTE

The a-i, a-u problem would not be entirely solved by (1) above. The current solution of umlaut is the one that I prefer: haïyA rather than haiyA. (In the current Balaram system, this cannot be done, and so BBT books usually have ha-iyA, ha-uk. This is a very bad solution.)


I like the umlaut; let's have lots of umlauts in our font. The way they are used in French, they go on the second of the two vowels. I think we should be consistent with that, otherwise I would get confused. What umlauted characters would you like? Definitely an umlauted u, i; any others?


[TECHNICAL TANGENT]

I have drawn a preliminary venn diagram (!) of all characters I think should be included which I shall upload soon.

Documents already created in balaram should also be readable using any new font or at least convertible into a form which can be read by the new font.

In my transliteration font I made the "executive decision" not to use any upper case letters. This was partly because Indian languages don't use them, and partly because this allowed me to use shifted letters and normally unused letters, e.g. "q" and "x", for my accented characters without having to delve too deeply into the murky world of font creation. There is also the issue of cross-platform compatibility and compatibility with older systems and simple text editors like Notepad on the PC. I have been reliably informed by Ananda Das that if we restricted ourselves to the standard ASCII lower 128 positions in the font then we would have fewer compatibility issues.

If we chose this route we would need a means of converting all uppercase balaram letters into lower case. I also have to learn how to stop MSWord from capitalising the first letter of each sentence which is getting very tiresome.
[/TECHNICAL TANGENT]
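
The lower-casing step in the tangent above could be sketched roughly as follows. Everything here is an assumption for illustration: the slot assignment (capital A, I, U standing for ā, ī, ū) is invented and is not the real Balaram layout, and both function names are hypothetical. Only the plain ASCII capitals A to Z are touched, so that anything stored in the upper positions of an old font is left alone.

```python
import string

# Lower-case only the plain ASCII capitals A-Z; other positions are untouched.
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Hypothetical slot assignment: which accented letter each capital-letter
# position would display in the new font. Purely illustrative.
SLOT_TO_UNICODE = str.maketrans({
    "A": "\u0101",  # ā
    "I": "\u012b",  # ī
    "U": "\u016b",  # ū
})


def lower_ascii_capitals(text):
    """Remove real capitals so they cannot be mistaken for accent slots."""
    return text.translate(ASCII_LOWER)


def slots_to_unicode(text):
    """Show what slot-encoded text would look like as ordinary Unicode."""
    return text.translate(SLOT_TO_UNICODE)


print(lower_ascii_capitals("Radhe Radhe"))
print(slots_to_unicode("rAdhe"))
```

The order matters: existing documents would have to be lower-cased before any capital is given a new meaning, otherwise an ordinary sentence-initial capital would turn into an accented letter.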

I agree with Madhava that any new font should be open source (or something similar). Are there such things available that we could adapt? I'd personally prefer a serif font but there's no reason why we can't have both serif and sans serif.

QUOTE

I can crunch together a font with whatever you need if you can come up with a proposal of what we need. I can also add characters to the existing Balaram if necessary, but I would be much in favor of an open-source solution. This is one of the projects in progress.


Madhava, please tell me about relevant open-source projects in progress.
Open source is essential. Maybe it could tie in with open source Bangla font development, and we could end up with something similar to Itranslator but in Bengali.

That's enough for now!

Radhe Radhe
Ananda das - Sat, 21 Aug 2004 13:14:57 +0530
Prabhus, you may wish to look at transliteration tables for both Bengali and Hindi available at



From the page, you can download "Bengali-Assamese-Manipuri.pdf" and "Hindi-Marathi-Nepali.pdf".

Even if we have disagreements with the system they advocate (apparently an ISO 15919 standard for both Bengali and Hindi -- there is a lot of overlap between the transliteration schemes), it may serve as a good basis for discussion. Explicitly state which characters we ought to change, and provide a rationale. For example, the argument could be made that people trained to pronounce the Sanskrit letters through a unified or canonical transliteration scheme are quite likely to transfer their pronunciation habits to Bengali texts. Therefore, we could replace only the characters which have a distinctly different pronunciation in standard Bengali from the pronunciation of the etymologically equivalent character in Sanskrit.

This could be done by creating a font with all the agreed-upon Bengali variants in distinct positions from their Sanskrit etymological equivalents. If Ananga Manjari prabhu then were to encode a Bengali text using this font, the /a/ would be encoded as an /o/ or /[IPA open-o]/ (or possibly as /a-ring/) glyph, depending upon the correct pronunciation context. Similarly for /oi/ ligatures, etc.

Now, if Jagat prabhu receives a text from Ananga prabhu encoded in this way, he could either run a little Perl or Python script to convert all the characters in the text-stream to standard Sanskrit (assuming he wished to do this) or, even better, make use of a second font which has the identical Roman-transliteration character glyph used in Sanskrit, but located in two positions in his font layout grid. The second alternative has the advantage of preserving bidirectional translation among the two character schemes. Those who wished to download texts by the Jagat method would utilize the special dual-location J-font. Those who wished to more closely indicate the pronunciation differences would utilize the richer-glyph A-font.
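
The first alternative, the little conversion script, might look something like this in Python. It is a sketch under stated assumptions: the two variant characters (å and the IPA open-o) are only guesses at what an agreed font might use for the Bengali inherent vowel, and `to_standard` is a made-up name.

```python
# Assumed variant characters for the Bengali inherent vowel; the real set
# would be whatever the agreed font actually uses.
TO_SANSKRIT = str.maketrans({
    "\u00e5": "a",  # å (a-ring)
    "\u0254": "a",  # ɔ (IPA open o)
})


def to_standard(text):
    """Fold the pronunciation variants back to the standard Sanskrit letter."""
    return text.translate(TO_SANSKRIT)


print(to_standard("k\u00e5r\u00e5"))  # kara
```

Note that this direction loses information: two differently pronounced spellings can collapse to the same standard one, which is the advantage of the two-position font over the script.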

Looking over the glyph correspondence table approved by the International Organization for Standardization (ISO), I note that even the Sanskritists may wish to change their transliteration scheme. It seems, for example, that the ISKCON "r-dot-under" for the vowel in Krsna should really be "r-ring-under". This preserves the dotted r for the flapped ra and rha consonants. Other characters with a macron under or two dots under are used for letters which are dotted in Hindi script.
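
For Sanskrit-only text, the change from r-dot-under to r-ring-under is mechanical. The sketch below covers just the short and long vocalic r and assumes the input is Unicode text with precomposed ṛ and ṝ; `to_iso_vocalic_r` is a hypothetical name. It should not be run on Hindi or Bengali text where the dotted r already stands for the flapped consonant.

```python
# Vocalic r only: dot-below form -> ring-below form (r + combining ring below).
VOCALIC_R = str.maketrans({
    "\u1e5b": "r\u0325",        # ṛ -> r̥
    "\u1e5d": "r\u0325\u0304",  # ṝ -> r̥̄ (ring below, then macron)
})


def to_iso_vocalic_r(text):
    """Rewrite dotted vocalic r as r with ring below (Sanskrit text only)."""
    return text.translate(VOCALIC_R)


print(to_iso_vocalic_r("k\u1e5b\u1e63\u1e47a"))  # kṛṣṇa -> kr̥ṣṇa
```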

I hope some of this may be helpful in crystallizing an agreement on direction.

Best regards to all,
Ananda das
Jagat - Sat, 21 Aug 2004 18:10:07 +0530
Much food for thought, Anandaji. I am afraid that I have gotten too familiar with doing things the way they have been and have not been putting much thought into this.

Certainly the ISO system should be given serious consideration, though I recall finding certain defects. I was not all that crazy about the "r"s. But Balaram is seriously deficient, there's no doubt about it.

I'd like to see how Ananga responds. With your expertise and contribution, maybe we could actually get something done.

I have about five or six Bengali fonts that follow different ASCII schemes. There is an ISCII or something that supposedly standardizes the Indian fonts, but Bangladesh follows a different scheme. The Assamese seem to follow yet another. The "Amar Bangla" fonts that we use on "Banglaword" follow the Bangladeshi scheme, but there are several deficiencies: Quote marks and other punctuation, for one. But also it misses several characters necessary for Sanskrit text in Bengali font.
DharmaChakra - Sat, 28 Aug 2004 19:29:38 +0530
Just out of total curiosity, and it has nothing to do with transliteration schemes and the like, but will you be designing the glyphs themselves from scratch? I know Balaram and most Indic transliteration fonts are absolutely terrible looking... I consider Balaram to be absolutely unreadable on a screen... at least on any Linux box I've put it on...

I don't have any experience with glyph design, and I know it's a bit of an esoteric science, but I think an excellent screen transliteration font is much needed in the community...

Just my $.02
ananga - Sat, 04 Sep 2004 03:46:58 +0530
The deficiencies in Balaram are pretty much the same as the deficiencies in Harvard-Kyoto, so here is my response to the deficiencies of the latter:

http://www.gaudiyadiscussions.com/index.ph...pic=2207&st=15#

Feedback from bengali linguists please HERE not there.

radhe radhe

Ananga

ananga - Sat, 04 Sep 2004 06:18:01 +0530
this should make it clearer
Attachment: Image
ananga - Sat, 04 Sep 2004 06:18:51 +0530
and
Attachment: Image
ananga - Sat, 04 Sep 2004 06:36:55 +0530
Other issues
Madhava - Sun, 08 May 2005 19:47:23 +0530
Did we ever get anywhere with extending H-K to accommodate Bengali? Capital Y is very sensible and seems to be free in the grand H-K scheme. However, what about .D, and the a:i, e:i, a:u etc.?
ananga - Sun, 08 May 2005 23:00:00 +0530
QUOTE(Madhava @ May 8 2005, 02:17 PM)
Did we ever get anywhere with extending H-K to accommodate Bengali? Capital Y is very sensible and seems to be free in the grand H-K scheme. However, what about .D, and the a:i, e:i, a:u etc.?




I adapted someone's JavaScript code to make this still somewhat buggy converter (look at message #9 of this thread), and I decided on Y for the onthoshto jo, and a_i, o_u for two vowels in a row, because that was what the original coder whose code I adapted had used, and it seems sensible, especially as it is a basic ASCII character.

I like the idea of P for gauPiya discussions for three somewhat silly reasons: it sounds like "more gopis", which I like; it is almost a letter "R"; and it almost looks like a "D" backwards. It's also a capital Greek letter Rho.

I definitely don't like the .D, because when I write by hand I write the dot afterwards, and anyway the "." is already used for explaining how words are joined together in Sanskrit.

I seem to have problems with khondoto at the end of words not displaying properly and with numerals, and I still need to work out how to get nitya নিত্য by typing in nitya, not nitYa.

Another minor issue is deciding whether " ' " is turned into an avagraha (for Sanskrit) or an apostrophe (for Bengali).
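
One possible way round the apostrophe question is to make the language an explicit setting rather than trying to guess it. The sketch below is in Python rather than the converter's JavaScript, and only shows the idea: `convert_apostrophe` and its `language` argument are hypothetical, and it assumes the output is Bengali script, where the avagraha is U+09BD.

```python
BENGALI_AVAGRAHA = "\u09bd"  # ঽ


def convert_apostrophe(text, language):
    """Turn ' into an avagraha for Sanskrit; leave it alone for Bengali."""
    if language == "sanskrit":
        return text.replace("'", BENGALI_AVAGRAHA)
    if language == "bengali":
        return text
    raise ValueError(f"unknown language: {language!r}")


print(convert_apostrophe("so'ham", "sanskrit"))
print(convert_apostrophe("so'ham", "bengali"))
```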