Tamil Discussion archive


Re: glyph choices for char. encoding-version 1.2



Dear Kalyan,

>i) If my little understanding of tamil is correct, the basic mei /
>consonants of tamil alphabet are the series ik, ing, ic, ing,...
>and not the series ka, nga, ca,....
>So wouldn't it be better if the former series appear first, right
>after the vowel (uyir) series followed by ka, nga series?
>(Nagu: I will invert the Ra/La in the next version).

The following sequence seems logical to me :

	first - the modifiers (kokkis, kombus, kaiyakaram)
	uyirs - a to aq (or ak or q - whatever ;-))
	akaram Eriya uyirs - (ka to na)
	di & dI  (I just wish these are not this unique ;-))
	ukaram Eriya uyirs - (ku to nu)
	UkAram Eriya uyirs - (kU to nU)
	mey's

This is what I wanted to send you in a GIF.  This order helps a great
deal in sorting.
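
A rough sketch of why this order helps sorting, assuming the groups are laid out back to back in the order listed. The slot numbers and the cut-down sample inventory below are hypothetical placeholders, not the actual version 1.2 table. If slot order equals the desired sort order, a plain byte-wise sort is already correct:

```python
# Hypothetical layout: groups placed back to back starting at slot 128,
# in the order proposed above. Names are romanized stand-ins for glyphs,
# and each group is cut down to a few members for illustration.
GROUPS = [
    ("modifiers", ["kaal", "kokki", "kombu"]),
    ("uyir", ["a", "A", "i"]),
    ("akaram", ["ka", "nga", "ca"]),
    ("ukaram", ["ku", "ngu", "cu"]),
    ("mey", ["ik", "ing", "ic"]),
]

def build_slots(groups, first_slot=128):
    """Assign consecutive slot numbers to the names, group by group."""
    slots = {}
    slot = first_slot
    for _, names in groups:
        for name in names:
            slots[name] = slot
            slot += 1
    return slots

SLOTS = build_slots(GROUPS)

def encode(names):
    """Encode a sequence of glyph names as bytes using the slot table."""
    return bytes(SLOTS[n] for n in names)

words = [encode(["ku", "a"]), encode(["a", "ka"]), encode(["ka", "ik"])]
# No collation table needed: comparing raw bytes follows the slot order.
in_order = sorted(words)
```

Here the string starting with an uyir sorts first, then the one starting with ka, then the one starting with ku, purely from the byte values.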

Also, I see no problems in starting the first character (kaal) at
slot 128.  Any objections ? (and why please ;-))

>ii) Regarding the slots 149-152, I fully agree with Prof. Hart 
>that once and for all we should decide on what goes there -
>either the four diacritical markers or the set of old style characters.
>Going for a series of fonts where each one has different glyphs
>put in these will reduce significantly all current efforts to have
>a world standard.
>
>My personal preferences are for the diacritical markers. I have
>stated clearly several times the reasons for this: all major libraries
>use them to catalogue tamil books, indologists all around the world
>use them; practically all south asian journals use them in place of
>tamil or other indic scripts; all major tamil research centers of
>tamilnadu
>(particularly Inst. of Asian studies and International Inst. of Tamil
>Studies) use predominantly transliterated tamil with these markers
>(including monumental reference works such as Encyclopaedia).
>Having these four glyphs in the scheme will allow integration of all
>these efforts under a single umbrella. I can even think of OCR 
>packages for tamil based on this unified, polyvalent font that can
>scan all these texts -containing either tamil script or transliterated
>texts and save them in electronic form.

I think these should follow at the tail end - though I still feel that
we do not need diacritical markers to be encoded.  IMHO, having a huge
collection of materials in one format does not sound like a good enough
reason - especially when I feel that transliteration standards without
diac. marks will have tremendous benefits that'll make a one-time
conversion effort more than worth it !!!

I have also not heard comments on my input on how we are going to store
Tamil e-text.  Having diac. marks *encoded* suggests that we can store
text in both formats - right ?  Is this OK ?  Something's not quite
right here - right ?  Comments please......  Please enlighten me if I'm
missing something.

What if I suggest that we do away with diac. marks ?  Any other points
besides *legacy* storage ? ;-)

>As far as old style characters, they can still go in the other spots
>(currently blanked off as X) or have it available as a special pull
>down option in dedicated DTP packages/softwares - in some form
>of "font substitution". All DTP packages involving romanized or
>phonetic input must put out current version of lai, nai etc as the
>default option.  
>Such a procedure can be less controversial and easily digestible.
>Please throw in your views so that we can decide on this soon.

Again - pardon me for raising this point again and again -
but I have not heard any *solid* arguments for the points I have
raised on ambiguity in recognising etext if we support two forms of
na, ra and la.  Have you thought through these issues ?  What are
your thoughts ?  I think I have a valid point - at least I'm 
confused as to how we can deal with this electronically.  Having
an automatic substitution in any text manipulation function (i.e.
to *guess* which la or na one has used) is rather painful.  Any
suggestions on how we can fix that ?  Otherwise, do you agree that
it's only wise to drop them from the encoding and implement them
as substitution glyphs ?  (Team, you may want to check out the
white papers on OpenType - which allows for glyph substitution.
Just do a net search - there is a huge collection at www.microsoft.com.)
This may not even involve any programming - it can/may be implemented in
the font itself.
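
To make the ambiguity concrete, here is a rough sketch. The slot numbers (200 for the current lai, 149 for an old-style lai, 180 for some other character) are made up for illustration and are not taken from the version 1.2 table. With both forms encoded, the same word can be stored as two different byte strings, so every text function has to fold one form into the other. With only one form encoded, the old-style shape becomes a display-time choice, which is roughly the idea behind doing the substitution in the font:

```python
# Hypothetical slots, for illustration only.
LAI_NEW = 200   # current form of lai
LAI_OLD = 149   # old-style form of lai
OTHER = 180     # some unrelated character

# Case 1: both forms are encoded. Same word, two different byte strings.
typed_new = bytes([LAI_NEW, OTHER])
typed_old = bytes([LAI_OLD, OTHER])
naive_match = typed_new == typed_old          # a plain comparison misses it

# Every compare/search/sort would first have to fold old into new.
FOLD = bytes.maketrans(bytes([LAI_OLD]), bytes([LAI_NEW]))

def fold(text: bytes) -> bytes:
    """Map old-style codes onto their current-form equivalents."""
    return text.translate(FOLD)

folded_match = fold(typed_new) == fold(typed_old)

# Case 2: only the current form is encoded. The old-style shape is picked
# at display time, so the stored text never differs.
OLD_STYLE_GLYPHS = {LAI_NEW: "lai (old-style glyph)"}

def glyphs_for(text: bytes, old_style: bool = False) -> list:
    """Return glyph names for display; substitution happens here only."""
    out = []
    for code in text:
        if old_style and code in OLD_STYLE_GLYPHS:
            out.append(OLD_STYLE_GLYPHS[code])
        else:
            out.append(f"glyph#{code}")
    return out
```

In the second case searching and sorting need no folding step at all, since there is only one stored code per character.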

>iii) having the meis of grantha characters (is, ih, ij,...) also in place
>of the modifier 'dot':  Since we have the special situation that, all
>uyirmeis of grantha are to be generated using the modifiers 
>(aakara, ikara, iikara, ukara and uukara varisais), having the meis
>also generated this way is consistent. If the mapping/correspondence
>table is clearly defined on how the entire 256 tamil alphabets are
>to be generated using the present character encoding scheme
>(I am currently working on this), I do not see any problem.
>In any case, we need to generate some slots (delete existing ones)
>if we want to do this. 

I don't quite see the consistency as I do not differentiate the 
granthas ;-)  Moreover, ukarams and UkArams of granthas are generated
with discrete modifiers - thus it's perfectly alright to encode
them the way we have done now.

If we assign separate slots for the mey's of grantha's - then all the
uyirs, all the meys and all the akaram eriya uyirmeis are treated
exactly the same way *electronically*.  IMHO doing this has a lot
more *value* than having old style characters - they can make way for
this ;-).  Doesn't this sound more consistent ? :-)

>iv) smart quote replacement in most softwares: 
>I can talk only about Mac softwares. 
>Yes Word of Microsoft and ClarisDraw of Claris particularly
>have this automatic replacement of straight quotes by curly quotes.
>I have written hundreds of emails to Mylai users that they have to
>remove this default replacement for tamil vowels e and E in Mylai 
>to appear properly in screen and in print.
>I find it unnecessary that this option is forced on us as a default.

Any comments on this point from others ?  The *user* problem (as 
Kalyan stated) is exactly my point as well - since the entire world
defaults to "smart quotes".  Are there any critical apps. that *do not*
let the user override this default ?
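
For text that has already been through such an application, a small clean-up pass could undo the replacement. A minimal sketch, assuming the text is available as a Unicode string and that only the four standard curly quote characters were substituted for the straight quote keys:

```python
# Map the four curly ("smart") quotes back to the straight ASCII quotes.
STRAIGHTEN = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})

def straighten_quotes(text: str) -> str:
    """Undo automatic smart-quote replacement."""
    return text.translate(STRAIGHTEN)
```

This only repairs stored text; it does not stop the application from substituting again on the next edit, so the default still has to be switched off by the user.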

Looking forward to hearing from the team :-)

anbudan,

~ MUTHU

 ------------------------------- End of Message --------------------------------


---- Begin included message ----
Dear Muthu and Friends:
While we are still reviewing version 1.2 of the proposed
character encoding scheme, I have the following questions:

i) If my little understanding of tamil is correct, the basic mei /
consonants of tamil alphabet are the series ik, ing, ic, ing,...
and not the series ka, nga, ca,....
So wouldn't it be better if the former series appear first, right
after the vowel (uyir) series followed by ka, nga series?
(Nagu: I will invert the Ra/La in the next version).

ii) Regarding the slots 149-152, I fully agree with Prof. Hart 
that once and for all we should decide on what goes there -
either the four diacritical markers or the set of old style characters.
Going for a series of fonts where each one has different glyphs
put in these will reduce significantly all current efforts to have
a world standard.

My personal preferences are for the diacritical markers. I have
stated clearly several times the reasons for this: all major libraries
use them to catalogue tamil books, indologists all around the world
use them; practically all south asian journals use them in place of
tamil or other indic scripts; all major tamil research centers of
tamilnadu
(particularly Inst. of Asian studies and International Inst. of Tamil
Studies) use predominantly transliterated tamil with these markers
(including monumental reference works such as Encyclopaedia).
Having these four glyphs in the scheme will allow integration of all
these efforts under a single umbrella. I can even think of OCR 
packages for tamil based on this unified, polyvalent font that can
scan all these texts -containing either tamil script or transliterated
texts and save them in electronic form.

As far as old style characters, they can still go in the other spots
(currently blanked off as X) or have it available as a special pull
down option in dedicated DTP packages/softwares - in some form
of "font substitution". All DTP packages involving romanized or
phonetic input must put out current version of lai, nai etc as the
default option.  
Such a procedure can be less controversial and easily digestible.
Please throw in your views so that we can decide on this soon.

iii) having the meis of grantha characters (is, ih, ij,...) also in place
of the modifier 'dot':  Since we have the special situation that, all
uyirmeis of grantha are to be generated using the modifiers 
(aakara, ikara, iikara, ukara and uukara varisais), having the meis
also generated this way is consistent. If the mapping/correspondence
table is clearly defined on how the entire 256 tamil alphabets are
to be generated using the present character encoding scheme
(I am currently working on this), I do not see any problem.
In any case, we need to generate some slots (delete existing ones)
if we want to do this. 

iv) smart quote replacement in most softwares: 
I can talk only about Mac softwares. 
Yes Word of Microsoft and ClarisDraw of Claris particularly
have this automatic replacement of straight quotes by curly quotes.
I have written hundreds of emails to Mylai users that they have to
remove this default replacement for tamil vowels e and E in Mylai 
to appear properly in screen and in print.
I find it unnecessary that this option is forced on us as a default.

Kalyan

PS:  AN APPEAL TO TNC Members:
We are witnessing an unprecedented, healthy situation for
tamil computing where people from four corners of the
world are actively participating in the standardisation debate
via electronic mail. We are sampling a large, representative
mass involved/interested in tamil computing.
Yesterday I quoted part of the summary report of the 
discussion panel of the last TamilNet'97 conference held in
Singapore regarding their preferences for an 8-bit scheme.
I would like to draw particular attention to one line there
"The TamilNadu Computer Standardization 
Committee will work with developers towards a unified 
8-bit character set. "
Since we are now kind of going around in circles on what
should go in the character set, it is high time that the members
of TNC break their 'silent spectator role' and throw in their
viewpoints/preferences. This will be in the spirit of the 
above decisions made at the last Singapore conference 
(in the presence of the majority of the TNC members and the
Hon'ble Minister of Tamil Culture for Tamilnadu, Prof.
Thamizhkudimagan). 
I stated yesterday: "Deciding on whether we go for 
7-bit font or 8-bit fonts, 
with or without diacritical markers,
with or without old style tamil alphabets, 
with or without grantha characters
is a difficult issue, since the choices are more at the
personal preferences level. "
Bottom line: we can live with any of these choices.
It will be a futile exercise if we all agree on one and
TNC comes up with something else for reasons better
known to them.

Kalyan
---- End included message ----
