Tamil Discussion archive
Re: glyph choices for char.encoding -version 1.2
>> Also, I see no problems in starting the first character
>> (kaal) at slot 128. Any objections ? (and why please ;-))
>In case you have missed, Dr. Srinivasan wrote this monday
>specifically the following:
Thanks for including this Kalyan - yes, I missed this. Must
have come in while I switched mailers during my last travel.
>"The positions 128, 129, 130, 141, 142, 143, 144, 157, 158,
>222, 234 do not display in either MS WORD or in COREL WORD
>PERFECT. That covers about 90% of the MS Windows word
>processor market. That is why I left those few places
>empty in the 8 Bit Roman-Tamil bilingual ADHAWIN.TTF.
>Of course, we can standardize some other layout like
>ISO-8859-X, ignore the 90% of existing softwares and
>develop "our own softwares" :-) "
>Ravi Paul also expressed similar reservations on putting
>key characters in the slots 128-159. These two have
>extensive track experience writing softwares for PCs
>and for Windows. So I wanted to implement their advice.
>So, I masked off these slots in the version 1.2 and placed
>only unused tamil characters ngu, nguu, nyu and nyuu.
>Probably in Anjal you use all slots 128-255.
>Are you asking for opinions from others for clarifications?
Yes, I'd like to hear more on this, as I have not seen problems
with these slots in Word or Word Perfect. Also, could
Dr. Srinivasan and/or Ravi comment on why you think this is
happening ? It may take us to the root of the problem. I
have experienced problems with slot 160, and to date I do not
understand why. I also see other character sets
avoiding 160 (almost all the time).
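
One way to start narrowing this down is to compare the reported slots against the Latin-1 control range and the Windows-1252 code page. The Python sketch below is only an illustration, not a diagnosis. It assumes the word processors run under a Windows-1252 style code page, which nobody in this thread has confirmed, and the helper name `classify` is made up for this example. Python's `cp1252` codec also reflects the present-day table, which may define more slots than older Windows versions did.

```python
import unicodedata

# Slots reported in the quoted message as not displaying.
REPORTED = [128, 129, 130, 141, 142, 143, 144, 157, 158, 222, 234]


def classify(slot):
    """Return (is_latin1_control, cp1252_char_or_None) for one 8-bit slot.

    Assumption: the application interprets bytes as Latin-1 or Windows-1252.
    """
    raw = bytes([slot])
    # In ISO-8859-1, slots 128-159 are C1 control codes (category "Cc").
    is_control = unicodedata.category(raw.decode("latin-1")) == "Cc"
    try:
        cp1252_char = raw.decode("cp1252")
    except UnicodeDecodeError:
        # The slot has no character assigned in Python's cp1252 table.
        cp1252_char = None
    return is_control, cp1252_char


for slot in REPORTED + [160]:
    is_control, char = classify(slot)
    print(slot, "control" if is_control else "printable", repr(char))
```

Slots that fall in the control range, or that come back undefined in the code page, would be the first ones to test in an actual word processor. Slot 160 decodes to the no-break space (U+00A0) in both tables, which could be one reason applications treat it specially.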
>ii) I am not sure which is the best way to decide on
>to have or not to have a certain sets of glyphs - be
>it diacritical markers or old style or grantha ones:
>No objections were raised for tamil numerals and so
>the case was clear.
I'm not sure if this is how it should be concluded. IMHO
the decision should be based on a logical diagnosis of the
implications. These take up space, so they have an effect on
the encoding at large. If they do, we should identify and
work on solutions or workarounds. I took it that we place
low priority on this (am I right ?)
>For grantha ones, amongst those who cared to let know
>their views, majority are for keeping them.
Yep, agreed, and IMHO it's rather incomplete without them.
>For the old-style tamil characters, the best approach
>appears to be assigning them the lowest priority.
>For the diacritical markers, I listed several reasons
>why it is useful to have them. You have not questioned
>any of these nor gave any specific objections as to
>why they should not be there. No one else made any
>statements one way or other.
I think I have said this several times already. You may
want to check my last posting.
>I tend to agree with your preferences to have transl.
>tamil using plain/lower ASCII roman without diacritics.
>But I am not sure what the majority opinion is.
Let's hope we hear them too - not just a *vote* of yes
or no - but with some supporting notes :-)
>It would be better if we debate specifically
>this point, viz standards for transliterated form
>of writing tamil before we go further. Vasu
>Ranganathan raised this issue (and also Sujatha)
>earlier and listed several questions to be answered.
>
>i) Should we go for transliteration schemes that are
> based on plain ASCII without diacritics or
> adopt a scheme with diacritics ?
>ii) what should be the actual scheme under either of
> the above two possibilities?
>
>If we decide to go for plain ASCII without diacritics,
>then there is no need to keep these markers in the
>character encoding scheme.
>If we agree for keeping transliteration scheme with
>diacritics as the standard for translit. tamil, then
>we cannot have a second font just for handling these.
>I would like to hear from you and others, why not have
>such standards in solid grounds with specific encoding
>in a single font. I am not in favour of leaving
>it as a playground for software developers to choose the
>way they want to treat it. Having a code assigned
>for a marker makes it standardised.
>I am sure Profs. Hart, Schiffman, Vasu and many others
>on this net have something to say on standards for
>transliterated tamil. I request them to post their
>views.
I think we *can* take the issue of transliteration out of
the encoding (refer to my points at the end).
>> Having diac. marks *encoded* suggests that we
>> can store text in both formats - right ? Is this OK ?
>> Something's not quite right here - right ?
>> Comments please......
>Let me explain what I think. Maybe others can comment.
>Having diac. marks encoded means that, using a single
>font, tamil texts can be entered in either format (
>in tamil script OR in transliterated format).
>
>In self-standing fonts with direct output features,
>the format will be decided already by the input
>process. I can also open up any of the thousands
>of archived materials in either format using the
>same std. tamil font and read them. All tamil related
>materials are handled by one single font. Period.
Kalyan, you only talked about *viewing* Tamil text.
How would one implement a search function in a database
of documents that contains text in both formats ? Will the
search need to be done twice ?
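
To make the concern concrete, here is a minimal Python sketch of what a search looks like when the same word can be stored in either form. Everything in it is assumed for illustration: the sample documents, the spelling "thamiz" as the transliterated form, and the use of Unicode strings in place of the 8-bit codes under discussion.

```python
# Hypothetical store: one document in Tamil script, one transliterated.
DOCS = {
    "a.txt": "தமிழ் இலக்கியம்",
    "b.txt": "thamiz ilakkiyam",
    "c.txt": "unrelated text",
}


def search_one(docs, needle):
    """Plain substring search over one stored form."""
    return sorted(name for name, text in docs.items() if needle in text)


def search_both(docs, script_form, translit_form):
    """Search twice, once per stored form, and merge the hits."""
    hits = set(search_one(docs, script_form))
    hits |= set(search_one(docs, translit_form))
    return sorted(hits)


print(search_one(DOCS, "தமிழ்"))             # finds only the script document
print(search_both(DOCS, "தமிழ்", "thamiz"))  # needs both spellings of the query
```

With a single stored form, `search_one` alone would be enough; with two, every query has to be expressed and run in both, or the store has to be normalised first.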
>Of course in specialised DTP softwares with convertor
>routines incorporated, additional options are possible
>to store in either format - store in tamil script
>even if the input is in transliterated format.
>The latter is no different from the romanized
>input method already accepted as a standard
>inputting process.
No, I'm not talking about converter routines - they are
plain and simple and we know how those work. I'm suggesting
that we store Tamil text in *only* one format - whatever the
transliteration scheme we adopt - and have the glyph substitution
implemented in the font. This is in line with what the Unicode
folks thought about and what the OpenType font specification is
all about. We are encoding characters - we are *not* designing
a font.
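
The distinction between encoded characters and font glyphs can be sketched as a toy model in Python. This is not how OpenType tables are actually laid out; the slot numbers, glyph names, and the single substitution rule are all hypothetical and chosen only to show that one stored form can yield different renderings.

```python
# Hypothetical character-to-glyph map (one glyph per encoded character).
CMAP = {0xC8: "ai_sign", 0xD9: "na"}

# Hypothetical substitution rule held by the font, not by the encoding:
# the sequence (ai_sign, na) is replaced by a single old-style glyph.
OLD_STYLE_RULES = {("ai_sign", "na"): ["nai_oldstyle"]}


def to_glyphs(stored):
    """One-to-one mapping from stored character codes to glyph names."""
    return [CMAP[code] for code in stored]


def substitute(glyphs, rules):
    """Apply sequence-to-glyph substitution rules, scanning left to right."""
    out, i = [], 0
    while i < len(glyphs):
        for seq, replacement in rules.items():
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.extend(replacement)
                i += len(seq)
                break
        else:
            # No rule matched at this position: keep the glyph unchanged.
            out.append(glyphs[i])
            i += 1
    return out


stored = bytes([0xC8, 0xD9, 0xD9])           # the only form kept on disk
plain = to_glyphs(stored)                    # current-style rendering
old = substitute(plain, OLD_STYLE_RULES)     # old-style rendering, same text
print(plain, old)
```

The stored bytes never change; only the font-side rule set decides whether the old-style glyph appears.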
With this, we will have the following :
1. Ability to render Tamil text in its current form (i.e. without
   old style nai etc) with any one-to-one mapped font. In other
   words, anyone will be able to develop a TTF or Type1 font (or one
   in any of the zillion formats around) that maps every character to
   a glyph. If we agree to include the grantha meys as single
   characters, we *can* implement this on character terminals,
   POS systems, and display boards as well !
2. Ability to render Tamil text in old-style (nai etc) with the
   glyph substitution technology that is *embedded* in OpenType
   fonts (no programming reqd). Note that the old style chars
   need *not* be encoded. The number of glyphs in a (OpenType)
   font need not match the number of encoded characters. I believe
   this is also true for TrueType - except that the latter does
   not provide for glyph substitution.
3. Ability to render Tamil text in transliterated form using
   either :
   a. Conversion routines that transliterate the text on the fly.
      - useful for both graphical and non-graphical (dumb) terminals
        as well.
      - needs coding
   b. Glyph substitution (as in (2)), substituting
      roman (diac-ed or otherwise) glyphs for tamil character
      sequences.
      - no coding reqd. (I'll need to verify this).
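
For option 3a, a conversion routine could look roughly like the Python sketch below. It is a guess at the general shape only: the table entries and slot numbers are invented, no transliteration scheme has been agreed in this thread, and the greedy longest-match strategy is simply one common way to handle multi-code sequences.

```python
# Hypothetical table: sequences of 8-bit codes -> roman output.
TABLE = {
    (0xC8, 0xD9): "nai",
    (0xD9,): "na",
    (0xB0,): "a",
}
MAX_LEN = max(len(key) for key in TABLE)


def transliterate(data):
    """Convert stored codes to roman text using greedy longest match."""
    out, i = [], 0
    while i < len(data):
        # Try the longest candidate sequence first, then shorter ones.
        for n in range(min(MAX_LEN, len(data) - i), 0, -1):
            key = tuple(data[i:i + n])
            if key in TABLE:
                out.append(TABLE[key])
                i += n
                break
        else:
            # Pass plain ASCII through; mark unknown upper-half codes.
            code = data[i]
            out.append(chr(code) if code < 128 else "?")
            i += 1
    return "".join(out)


print(transliterate(bytes([0xB0, 0xC8, 0xD9, 0x20, 0xD9])))
```

Because the output is plain text, the same routine would serve a dumb terminal as well as a graphical display, which is the point made under 3a.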
Comments please....
anbudan,
~ MUTHU
------------------------------- End of Message --------------------------------