Gaudiya Discussions Archive » TECH ISSUES
PC problems, recommended software, tips and tricks, coding and so forth. Things that make your life in cyberspace easier.

XML or some other markup language - how language tags could help



ananga - Tue, 25 Jan 2005 02:46:45 +0530
I recently spent quite a few hours converting one of Jagat's essays from a mix of Harvard-Kyoto and English text to a mix of Balaram and English text. I used Madhavananda's JavaScript web-page converter, but I still had a lot of cutting and pasting to do, plus loads of proofreading afterwards to check that I hadn't missed anything.

While I was doing this, I wondered whether there is a shared desire to automate processes like this even further. The answer seems to lie in language tagging.

From the section "Ambiguity and language tagging" at:
http://www.ucl.ac.uk/~ucgadkw/members/tran...l/translit.html
QUOTE
Language tagging is required if automatic retransliteration is envisaged, as in the use of the widely-distributed Indian GIST card, a system for the entry, display, and printing of Indian and roman transliterated scripts on IBM compatible personal computers.
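To make the idea concrete, here is a minimal Python sketch of what language tagging buys you. It assumes a made-up "skt" element that marks Sanskrit typed in Harvard-Kyoto (XML's standard xml:lang attribute would be a more formal choice). Only the text inside those elements is retransliterated to Unicode IAST, so English words such as "Radha" outside the tags are left alone. The conversion table is partial and illustrative, not a full HK specification.

```python
import xml.etree.ElementTree as ET

# Partial Harvard-Kyoto -> Unicode IAST table (illustrative, not complete).
HK_TO_IAST = {
    "lRR": "ḹ", "lR": "ḷ", "RR": "ṝ", "R": "ṛ",
    "A": "ā", "I": "ī", "U": "ū",
    "M": "ṃ", "H": "ḥ", "G": "ṅ", "J": "ñ",
    "T": "ṭ", "D": "ḍ", "N": "ṇ", "z": "ś", "S": "ṣ",
}
# Try longer keys first so that "RR" is matched before "R".
_KEYS = sorted(HK_TO_IAST, key=len, reverse=True)


def hk_to_iast(text):
    """Convert a Harvard-Kyoto string to IAST by greedy longest match."""
    out, i = [], 0
    while i < len(text):
        for key in _KEYS:
            if text.startswith(key, i):
                out.append(HK_TO_IAST[key])
                i += len(key)
                break
        else:  # no HK sequence starts here: copy the character unchanged
            out.append(text[i])
            i += 1
    return "".join(out)


def retransliterate(xml_text, tag="skt"):
    """Convert only the text of the tagged elements ("skt" is a hypothetical tag).

    Assumes the tagged elements hold plain text with no child elements.
    """
    root = ET.fromstring(xml_text)
    for el in root.iter(tag):
        el.text = hk_to_iast(el.text or "")
    return ET.tostring(root, encoding="unicode")


print(retransliterate("<p>The song <skt>rAdhASTakam</skt> praises Radha.</p>"))
# <p>The song <skt>rādhāṣṭakam</skt> praises Radha.</p>
```

Without the tag, a converter has no way to know that "Radha" is English: hk_to_iast("Radha") returns "ṛadha", because a capital R means the vowel ṛ in Harvard-Kyoto.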

Does anyone have any experience of GIST?

some preliminary questions
Would XML or XHTML do the job in this case?
Can XML files be viewed in any browser?
What wysiwyg XML editors are available (preferably open-source and cross platform)?

I am also thinking that storing our material (granthas, songs, shastras, etc.) in this form would have a number of advantages.

A source XML file would be separate from any output formatting. This would be a good thing, because with adequate tags it would be possible to create any of the following automatically or semi-automatically:
  1. a portrait A6 booklet of all Sanskrit songs about Radharani, including word-for-word and translation
  2. a PDF of all Bengali songs in the original Bengali script, with an English translation but no transliteration
  3. a landscape A4 (or US letter size) booklet of a particular aSTakam, one shloka per page, with very large Devanagari, smaller Devanagari sandhi underneath, English word-for-word underneath that, and a small English translation in the bottom-right corner
I assume that others here would enjoy this kind of facility.
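Here is a rough sketch of the "one source, many outputs" idea in Python, using only the standard library. The element and attribute names (song, lang, subject, original, wordforword, translation) and the placeholder songs are invented for the example; a real tag set would need more thought. Each output in the list then becomes a query plus a choice of which parts to print.

```python
import xml.etree.ElementTree as ET

# Hypothetical source: metadata in attributes, each part of a song in its own element.
SOURCE = """<songs>
  <song lang="sa" subject="radharani" title="Song A">
    <original>original A</original>
    <wordforword>word-for-word A</wordforword>
    <translation>translation A</translation>
  </song>
  <song lang="bn" subject="radharani" title="Song B">
    <original>original B</original>
    <translation>translation B</translation>
  </song>
  <song lang="sa" subject="krishna" title="Song C">
    <original>original C</original>
    <translation>translation C</translation>
  </song>
</songs>"""


def select(root, lang=None, subject=None):
    """Return the songs that match every criterion that is given."""
    return [
        song for song in root.iter("song")
        if (lang is None or song.get("lang") == lang)
        and (subject is None or song.get("subject") == subject)
    ]


def booklet_lines(songs, parts=("original", "wordforword", "translation")):
    """Flatten the chosen parts of each song into lines for a later layout step."""
    lines = []
    for song in songs:
        lines.append(song.get("title"))
        for part in parts:
            el = song.find(part)
            if el is not None:  # skip parts this song does not have
                lines.append(el.text)
    return lines


root = ET.fromstring(SOURCE)
# Output 1: Sanskrit songs about Radharani, with word-for-word and translation.
print(booklet_lines(select(root, lang="sa", subject="radharani")))
# Output 2: Bengali songs, original script and translation only.
print(booklet_lines(select(root, lang="bn"), parts=("original", "translation")))
```

The page layout itself (A6 portrait, PDF, one shloka per page) would be a separate step working on these lines, which is exactly the separation of content from formatting that makes the source reusable.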

some secondary questions
I've heard that XML uses style sheets in a similar way as HTML uses CSS; does anyone know anything about these?
What would constitute a sufficient set of tags for vaishnava shastrologists?!
What programming/scripting languages or tools can be used to process XML files?
Has anyone done this already? We wouldn't want to reinvent the wheel, would we?

Comments welcome!

Ananga
Madhava - Tue, 25 Jan 2005 10:43:51 +0530
You can type XML in Notepad or any plain-text editor, and you can just make up your own tags.
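Notepad really is enough, but one rule still applies to made-up tags: the file has to be well-formed, meaning every tag is closed and properly nested. Here is a quick way to check that, sketched with Python's standard-library parser; the tag names are invented.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Made-up tags are fine as long as they are properly closed and nested.
print(is_well_formed("<verse><line>first line</line></verse>"))  # True
print(is_well_formed("<verse><line>first line</verse>"))         # False: <line> is never closed
```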

However, I would suggest just keeping the originals in Balaram. Balaram is "lossless", as it has separate characters for all the diacritic letters, while Harvard-Kyoto is not reversible in an environment with capital letters, because it represents diacritic letters with capitals.
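A small Python illustration of the reversibility point. In Harvard-Kyoto the case of a letter is part of the spelling ("R" is ṛ, "S" is ṣ, "N" is ṇ), so any case change that a word processor or a title style applies destroys information. Unicode IAST is used here as a stand-in for Balaram, since both give each diacritic letter its own character; the Balaram font encoding itself is not modelled.

```python
def survives_case_change(word):
    """True if upper-casing and then lower-casing gives back the original word."""
    return word.upper().lower() == word


hk = "kRSNa"      # kṛṣṇa in Harvard-Kyoto: capitals mark the diacritics
iast = "kṛṣṇa"    # the same word in Unicode IAST

print(hk.upper().lower())          # krsna: the diacritics are gone
print(survives_case_change(hk))    # False
print(survives_case_change(iast))  # True: Ṛ, Ṣ, Ṇ lower-case back to ṛ, ṣ, ṇ
```

The problem also runs the other way: an ordinary English-style capital, as in "Radha", is read by a Harvard-Kyoto converter as the vowel ṛ.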

In this case, why don't you just e-mail Jagat and ask for the originals?

Of course, if someone wanted to produce source files in XML format, they could easily be converted into whatever encoding you wanted. However, if you imported them into a text processor, the tags would have to be removed for the text to look sensible, which would again leave us with a format that cannot be turned back into XML.

You can parse XML with PHP, for example; I suspect that would be the easiest way, that is, if you intend to start from scratch.
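PHP is the suggestion here; since nothing in the thread depends on PHP specifically, this is a comparable sketch in Python's standard library (a swapped-in language, not the one proposed). The "skt" and "eng" tag names are made up. The second function also illustrates the point about importing into a text processor: once the tags are stripped, nothing records which words were Harvard-Kyoto and which were English.

```python
import xml.etree.ElementTree as ET

SOURCE = "<verse><skt>rAdhA</skt> <eng>Radha</eng></verse>"  # hypothetical tags


def tagged_runs(xml_text):
    """List (tag, text) pairs for the direct children of the root element."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text) for child in root]


def plain_text(xml_text):
    """Strip all tags, keeping only the character data."""
    return "".join(ET.fromstring(xml_text).itertext())


print(tagged_runs(SOURCE))  # [('skt', 'rAdhA'), ('eng', 'Radha')]
print(plain_text(SOURCE))   # rAdhA Radha  (the language information is lost)
```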

And XHTML wouldn't do you much good here...
DharmaChakra - Tue, 25 Jan 2005 21:06:18 +0530
QUOTE(ananga @ Jan 24 2005, 05:16 PM)
some preliminary questions
Would XML or XHTML do the job in this case?
Can XML files be viewed in any browser?
What wysiwyg XML editors are available (preferably open-source and cross platform)?

Linux-based:
kxmleditor (KDE)
Conglomerate (Gnome)

Sorry, I'm not very familiar with Windows apps. I'm sure there are some freeware editors out there. However, most editors are not really smooth WYSIWYG environments but rather workflow-automation tools that make it easier to work with the structure of XML files.
QUOTE(ananga @ Jan 24 2005, 05:16 PM)
some secondary questions
I've heard that XML uses style sheets in a similar way as HTML uses CSS; does anyone know anything about these?
What would constitute a sufficient set of tags for vaishnava shastrologists?!
What programming/scripting languages or tools can be used to process XML files?
Has anyone done this already? We wouldn't want to reinvent the wheel would we.
Ananga


XML has a few style sheet systems in place (CSS and XSL, for example). What you do is take an XML marked-up document, give meaning to the tags with the style sheets, and then prepare a transform into the output format you need. As you can imagine, it's not something you throw together overnight. Some work has been done on a standardized format called DocBook, and O'Reilly has an OpenBook on it. As far as I understand, it is a ready-made set of schemas for publishing books and various other print material, which takes the work out of tagging tables of contents, indexes, etc. I think a good approach would be to extend the format to cover multilingual documents (assuming someone has not done so already).
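XSL is the usual way to write such a transform, but the idea fits in a few lines of Python too. The sketch below is not XSLT: it uses a plain dictionary as a stand-in "style sheet" that says what each (invented) source tag becomes in HTML, and rebuilds the tree recursively. The CSS class names are assumptions; an actual CSS file would then decide fonts and sizes, for instance very large Devanagari.

```python
import xml.etree.ElementTree as ET

# Stand-in "style sheet": source tag -> (HTML element, CSS class). All names hypothetical.
TAG_TO_HTML = {
    "verse": ("div", "verse"),
    "original": ("p", "devanagari"),
    "wordforword": ("p", "wfw"),
    "translation": ("p", "translation"),
}


def transform(elem):
    """Recursively map a source element and its children to HTML elements."""
    # Unknown tags fall back to a <span> whose class is the source tag name.
    html_tag, css_class = TAG_TO_HTML.get(elem.tag, ("span", elem.tag))
    out = ET.Element(html_tag, {"class": css_class})
    out.text, out.tail = elem.text, elem.tail  # keep the character data
    for child in elem:
        out.append(transform(child))
    return out


src = "<verse><original>a</original><translation>b</translation></verse>"
print(ET.tostring(transform(ET.fromstring(src)), encoding="unicode"))
# <div class="verse"><p class="devanagari">a</p><p class="translation">b</p></div>
```

Swapping the dictionary for a different one (say, LaTeX commands instead of HTML elements) gives another output format from the same source, which is the point of keeping the tags meaningful rather than presentational.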

Another option might be LaTeX. Madhava's script makes it easy to convert to LaTeX (the Velthuis option), and it should be lossless as far as diacritics are concerned. LaTeX converts easily to PDF, and there are HTML/XML output systems as well. You just have to learn it first. (If Madhava knows any LaTeX, he can hear my evil chuckle...) Personally, I like LaTeX, but it does have a somewhat steep learning curve.
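For comparison with the Velthuis route, here is a different, swapped-in approach: writing each IAST letter as a standard LaTeX text-accent command (\= macron, \d dot below, \. dot above, \' acute, \~ tilde), which plain LaTeX can typeset without extra packages. It is only a sketch: lower-case letters alone are mapped, rarer letters such as ṝ are left out, and everything else passes through unchanged.

```python
import unicodedata

# Lower-case IAST letter -> LaTeX accent command (partial table, illustrative).
IAST_TO_LATEX = {
    "ā": r"\={a}", "ī": r"\={\i}", "ū": r"\={u}",
    "ṛ": r"\d{r}", "ḷ": r"\d{l}", "ṃ": r"\d{m}", "ḥ": r"\d{h}",
    "ṅ": r"\.{n}", "ñ": r"\~{n}",
    "ṭ": r"\d{t}", "ḍ": r"\d{d}", "ṇ": r"\d{n}",
    "ś": r"\'{s}", "ṣ": r"\d{s}",
}


def iast_to_latex(text):
    """Replace mapped IAST letters with LaTeX accents; copy other characters."""
    text = unicodedata.normalize("NFC", text)  # merge base letter + combining mark
    return "".join(IAST_TO_LATEX.get(ch, ch) for ch in text)


print(iast_to_latex("kṛṣṇa"))  # k\d{r}\d{s}\d{n}a
```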