Tamil Discussion archive
Thought provoking on Tamil encoding
--------
It is my sincere wish to make it very clear that I in no way intend to hurt
or offend anyone; my interest is purely and solely to put forward my views
for the enshrinement of sweet Tamil. If anything anywhere in my writing
offends anyone doing mighty service for the enshrinement of Tamil, or anyone
using it, I once again ask them to forgive it for the sake of the prosperity
of the language for which such unprecedented discussions are taking place.
No doubt these discussions will pave the way for better understanding and
the best solutions for Tamil.
The Tamil language has evolved and been reformed since time immemorial.
Since we believe what we see, some participants in the discussion believe
that representing (displaying and printing) Tamil on computers is Tamil
encoding.
Tamil has witnessed and withstood many changes in its script form as well as
in its character set.
Before we start encoding Tamil glyphs, a requirement (aim) has to be
formulated stating what is going to be encoded; without it, it is not
advisable to select and segregate Tamil glyphs according to one's taste.
I want to make one point very clear. There appears to be some confusion
within the discussion between a font/glyph set and a character set. A set of
glyphs is not, by itself, a Tamil character set. The character set of Tamil
is the "uyir eluthukkal and mei eluthukkal" (the 12 vowels and 18
consonants). These thirty letters are the basis of Tamil, and their
combinations form hundreds of characters (18 x 12 = 216 uyir mei letters
alone); it is not possible to encode every one of these characters on
computers. The basic characters common to the Indian languages are treated
as their character set and are encoded in ISCII (the new standard).
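The following minimal Python sketch makes the arithmetic concrete. It assumes Unicode's Tamil code points; ISCII byte values are not reproduced here. It builds every uyir mei letter as a consonant followed by a vowel sign, with the inherent 'a' taking no sign. It builds every mei as a consonant plus the pulli (U+0BCD). The names UYIR, MATRAS, CONSONANTS and uyir_mei are illustrative only.

```python
# Sketch: compose Tamil letters from the 30 basic letters (12 uyir + 18 mei),
# assuming Unicode code points for the Tamil script (not ISCII bytes).

# 12 vowels (uyir): a, aa, i, ii, u, uu, e, ee, ai, o, oo, au
UYIR = [chr(c) for c in (0x0B85, 0x0B86, 0x0B87, 0x0B88, 0x0B89, 0x0B8A,
                         0x0B8E, 0x0B8F, 0x0B90, 0x0B92, 0x0B93, 0x0B94)]

# Matching vowel signs (matras); "a" is inherent, so it has no sign.
MATRAS = [""] + [chr(c) for c in (0x0BBE, 0x0BBF, 0x0BC0, 0x0BC1, 0x0BC2,
                                  0x0BC6, 0x0BC7, 0x0BC8, 0x0BCA, 0x0BCB, 0x0BCC)]

# 18 consonants in traditional order: k ng ch nj tt nn th n p m y r l v zh ll rr nnn
CONSONANTS = [chr(c) for c in (0x0B95, 0x0B99, 0x0B9A, 0x0B9E, 0x0B9F, 0x0BA3,
                               0x0BA4, 0x0BA8, 0x0BAA, 0x0BAE, 0x0BAF, 0x0BB0,
                               0x0BB2, 0x0BB5, 0x0BB4, 0x0BB3, 0x0BB1, 0x0BA9)]

PULLI = "\u0BCD"  # virama: marks a consonant as a bare mei


def mei(consonant: str) -> str:
    """Bare consonant: consonant + pulli."""
    return consonant + PULLI


def uyir_mei(consonant: str, vowel_index: int) -> str:
    """Combined letter, stored in logical order: consonant, then vowel sign."""
    return consonant + MATRAS[vowel_index]


# The full uyir mei grid: 18 rows (consonants) x 12 columns (vowels).
GRID = [[uyir_mei(c, v) for v in range(len(UYIR))] for c in CONSONANTS]

if __name__ == "__main__":
    print(len(UYIR) + len(CONSONANTS), "basic letters")
    print(sum(len(row) for row in GRID), "uyir mei letters")
```

In this scheme, the thirty letters plus the vowel signs and the pulli are enough to represent all 216 uyir mei letters. None of those letters needs a code of its own.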
Because of a misinterpretation by some of the people involved in the
earlier versions of the ISCII standard, unnecessary codes appear to have
been assigned to matra characters. These matras (vowel signs) indicate the
vowel present in the "uyir mei eluthukkal". Among the Indian languages, a
vowel sign may be written in one, two or three parts (in Tamil, at most two
parts are used). It may appear only on the right side, as in "kA, ki, kI",
only on the left side, as in "kai, ke, kE", or on both sides, as in "ko, kO,
kou". This is not peculiar to Tamil; it also occurs in other Indian scripts
such as Malayalam, Oriya, Bengali (Bangla) and Assamese.
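Here is a small Python sketch of the three positions described above, assuming Unicode code points. It uses the standard library's `unicodedata.decomposition` to detect the two-part signs. Unicode defines each of these as canonically equivalent to a left part plus a right part. The function name sign_position is hypothetical, and it covers only the signs named in the text.

```python
import unicodedata

# Vowel signs written entirely to the left of the consonant: e, ee, ai.
LEFT_SIGNS = {"\u0BC6", "\u0BC7", "\u0BC8"}


def sign_position(sign: str) -> str:
    """Where a Tamil vowel sign appears relative to its consonant.

    Two-part signs (o, oo, au) have a canonical decomposition into a left
    part and a right part, so they are reported as "both". Only the signs
    discussed in the text are covered; u and uu, which fuse with the
    consonant into one shape, are out of scope for this sketch.
    """
    parts = unicodedata.decomposition(sign).split()
    if len(parts) == 2:
        return "both"
    if sign in LEFT_SIGNS:
        return "left"
    return "right"


if __name__ == "__main__":
    for name, sign in [("kA", "\u0BBE"), ("ki", "\u0BBF"), ("kI", "\u0BC0"),
                       ("kai", "\u0BC8"), ("ke", "\u0BC6"), ("kE", "\u0BC7"),
                       ("ko", "\u0BCA"), ("kO", "\u0BCB"), ("kou", "\u0BCC")]:
        print(name, sign_position(sign), unicodedata.decomposition(sign) or "-")
```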
Even though we call the composite characters "uyir mei", their actual
composition is a consonant (mei) followed by a vowel (uyir). Indian scripts
are coded on computers on this basis. This applies even to the earlier ISCII
standard, ISCII-91 (called level 1): in ISCII-91, consonants are followed by
matras. It is the same in Unicode. "Kannal kanpadhum poi....... theera
vicharippadhe mei" (what the eye sees may be false ... only what is
thoroughly investigated is true). It seems that most of the participants
have not understood the encoding followed in ISCII and in Unicode.
I humbly repeat: character encoding and font design are two different
issues, and they must not be mixed up.
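The two levels can be sketched in Python as follows, assuming Unicode storage. Text is kept in logical order (consonant, then matra), and the parts are arranged in visual order only at rendering time. The function to_visual is hypothetical. A real shaping engine does much more, such as ligatures and positioning.

```python
import unicodedata

# Vowel-sign parts that are drawn before (to the left of) the consonant.
LEFT_PARTS = {"\u0BC6", "\u0BC7", "\u0BC8"}


def to_visual(text: str) -> list[str]:
    """Reorder logically stored Tamil into left-to-right drawing order.

    Stored: consonant + matra (e.g. ka + o sign).
    Drawn:  left part, consonant, right part (e.g. e sign, ka, aa sign).
    """
    # NFD splits the two-part signs (o, oo, au) into left and right parts.
    chars = list(unicodedata.normalize("NFD", text))
    out = []
    i = 0
    while i < len(chars):
        if i + 1 < len(chars) and chars[i + 1] in LEFT_PARTS:
            # Draw the left part in front of the consonant it follows in storage.
            out.extend([chars[i + 1], chars[i]])
            i += 2
        else:
            out.append(chars[i])
            i += 1
    return out
```

The stored string never changes; only the drawing order does. Reordering of this kind is the job of the font and rendering software, not of the character encoding.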
Some people have misunderstood the current discussion on encoding glyphs as
being about encoding Tamil on computers, and as the basis for enshrining
Tamil electronically. This is a wrong conception, a false image that has
engulfed the discussion.
Font encoding cannot solve many issues, such as sorting, searching, indexing
and the preservation of Tamil itself. Font encoding is just one way of
displaying (rendering) Tamil on computers, used because the software still
needs a lot of maturing.
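Here is a minimal Python illustration of the searching problem. It assumes a hypothetical glyph encoding that stores glyphs in visual order, with each glyph represented by the Unicode character it depicts. Searching the glyph stream for "kA" wrongly matches inside "ko", because "ko" is drawn as the e sign, ka, then the aa sign. Searching character-encoded text gives no such false match.

```python
# Hypothetical glyph-encoded (visual-order) streams, as a font-based encoding
# would store them; each glyph is modelled by the character it depicts.
GLYPHS_KO = ["\u0BC6", "\u0B95", "\u0BBE"]   # ko: e-sign glyph, ka glyph, aa-sign glyph
GLYPHS_KAA = ["\u0B95", "\u0BBE"]            # kA: ka glyph, aa-sign glyph


def contains(seq: list, sub: list) -> bool:
    """True if sub occurs as a contiguous run inside seq."""
    n = len(sub)
    return any(seq[i:i + n] == sub for i in range(len(seq) - n + 1))


# Glyph search: "kA" is wrongly found inside "ko".
glyph_hit = contains(GLYPHS_KO, GLYPHS_KAA)

# Character search (consonant + matra, logical order): no false match.
char_hit = "\u0B95\u0BBE" in "\u0B95\u0BCA"
```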
Regarding "glyph substitution" (wrongly called font substitution; font
substitution means substituting one font for another, say 'Arial' with
'Times New Roman' in the Windows environment), I feel we can consider it as
one of the options. Since glyph substitution is already implemented in
Windows NT and Windows 95, TrueType fonts (not OpenType) are the best
option. Everything depends on our requirement, and none of us has yet
decided which environment we are discussing the issue for. If we are
talking about the future, including present-day computers capable of
running Windows 95 on PCs or System 7.x on Apple, we can definitely adopt
the "glyph substitution method". If our target is something else, glyph
substitution will fail to support us. "Future international extensions to
True type may require a unique Glyph", as mentioned in Microsoft's TrueType
documentation "True type 1.0 font files - Technical specification Version
1.66". Since TrueType is being promoted by both Microsoft and Apple, it
seems that glyph substitution will continue.
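As a rough model of what glyph substitution does, here is a Python sketch with a made-up substitution table. The stored characters stay the same, and the font replaces certain sequences with a single glyph. An example is ka + u sign, which Tamil writes as one fused shape. This is not the TrueType or OpenType table format, and the glyph names are hypothetical.

```python
# Hypothetical substitution table: character sequence -> glyph name.
# In Tamil, ka + u sign and tta + i sign are written as fused shapes.
SUBSTITUTIONS = {
    ("\u0B95", "\u0BC1"): "ta_ku",   # ka + u sign
    ("\u0B9F", "\u0BBF"): "ta_tti",  # tta + i sign
}
MAX_LEN = max(len(key) for key in SUBSTITUTIONS)


def substitute(text: str) -> list[str]:
    """Map characters to glyph names, preferring the longest matching sequence."""
    chars = list(text)
    glyphs = []
    i = 0
    while i < len(chars):
        for size in range(min(MAX_LEN, len(chars) - i), 1, -1):
            key = tuple(chars[i:i + size])
            if key in SUBSTITUTIONS:
                glyphs.append(SUBSTITUTIONS[key])
                i += size
                break
        else:
            # No substitution applies: fall back to a one-character glyph.
            glyphs.append(f"uni{ord(chars[i]):04X}")
            i += 1
    return glyphs
```

The text that is stored, sorted and searched is unaffected. Only the glyph list handed to the display changes.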
The glyph ordering followed by Dr Kalyan seems illogical; to arrive at the
correct order, simply follow the Thamizh nedunkanakku.
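This Python sketch assumes that the nedunkanakku order means the traditional sequence of 12 vowels followed by 18 consonants; the aytham and the uyir mei rows are left out. It assigns consecutive glyph slots in that order. The starting slot 0xA1 and the name assign_slots are arbitrary choices for illustration. Unicode's own code point order differs: it places ன (U+0BA9) right after ந (U+0BA8), and ற (U+0BB1) right after ர (U+0BB0).

```python
# Assumed traditional order: 12 vowels, then 18 consonants.
UYIR = "".join(chr(c) for c in (0x0B85, 0x0B86, 0x0B87, 0x0B88, 0x0B89, 0x0B8A,
                                0x0B8E, 0x0B8F, 0x0B90, 0x0B92, 0x0B93, 0x0B94))
MEI = "".join(chr(c) for c in (0x0B95, 0x0B99, 0x0B9A, 0x0B9E, 0x0B9F, 0x0BA3,
                               0x0BA4, 0x0BA8, 0x0BAA, 0x0BAE, 0x0BAF, 0x0BB0,
                               0x0BB2, 0x0BB5, 0x0BB4, 0x0BB3, 0x0BB1, 0x0BA9))


def assign_slots(first_slot: int) -> dict[str, int]:
    """Give consecutive glyph slots to the letters in traditional order."""
    return {letter: first_slot + i for i, letter in enumerate(UYIR + MEI)}


slots = assign_slots(0xA1)  # 0xA1: arbitrary example starting slot

# Sorting the consonants by code point does not reproduce the traditional order.
code_point_order = "".join(sorted(MEI))
```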
I feel the glyph encoding has to be discussed: whether we need 8 bits or 7
bits; whether to support only GUI computers or at least machines from the AT
286 onwards (most Government offices in India still use these outdated
machines); or whether to cater to all electronic gadgets, such as the POS
terminals someone pointed out. I have implemented a few Indian languages on
pagers.
Someone may ask why we cannot use positions 128-160. The range 128-160 simply
replicates 0-32: 128-159 are control codes, like 0-31, and 160 (no-break
space) has to be the same as 32 (space), with the same advance width.
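The slot arithmetic can be checked with a short Python sketch. It assumes an 8-bit scheme that keeps ASCII in 0-127 and treats the upper bytes as Latin-1 positions. Positions 128-159 are then control characters and 160 is the no-break space, which leaves 95 positions (161-255) for Tamil glyphs. The helper name free_upper_slots is hypothetical.

```python
import unicodedata


def free_upper_slots() -> list[int]:
    """Byte values 128-255 usable for glyphs when ASCII (0-127) is kept.

    128-159 are control codes (category Cc, mirroring 0-31), and 160 is the
    no-break space (mirroring 32, the space), so neither is available.
    """
    return [b for b in range(128, 256)
            if unicodedata.category(chr(b)) != "Cc" and b != 0xA0]


# For comparison, a 7-bit scheme has the printable positions 33-126.
SEVEN_BIT_PRINTABLE = list(range(33, 127))
```

Neither 95 nor 94 slots can hold a separate glyph for each of the 216 uyir mei letters. This is why 8-bit glyph encodings build many letters out of pieces.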
Dr Herald Schiffman's requirement, and that of like-minded linguists (that
the old Tamil letters are nothing but different 'varivadivam', or letter
shapes, for the same basic constituents), is taken care of in the ISCII
standard. That is, any Tamil literature could be stored using the ISCII
encoding scheme to preserve Tamil. Since no common interface software is
available yet, the current developers can provide converters to store text
in the ISCII standard (Apple has implemented ISCII - level in their machines
and Microsoft is going for ISCII level 2).
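A converter of the kind suggested above might look like this Python sketch. It turns a visual-order glyph stream, with each glyph modelled as the Unicode character it depicts, back into consonant-first character order. Unicode is used only as a stand-in target, because the ISCII byte table is not reproduced in this message. The name visual_to_logical is hypothetical.

```python
import unicodedata

# Glyphs for vowel-sign parts that are drawn before the consonant.
LEFT_PARTS = {"\u0BC6", "\u0BC7", "\u0BC8"}


def visual_to_logical(glyphs: list[str]) -> str:
    """Convert a visual-order glyph stream to consonant-first character order."""
    out = []
    pending = None  # a left-side part waiting for its consonant
    for g in glyphs:
        if g in LEFT_PARTS:
            pending = g
        elif pending is not None:
            # g is the consonant: store it first, then the held vowel sign.
            out.extend([g, pending])
            pending = None
        else:
            out.append(g)
    if pending is not None:
        out.append(pending)  # stray sign at the end: keep it rather than drop it
    # NFC recombines e/ee + aa sign, and e + au length mark, into o, oo, au.
    return unicodedata.normalize("NFC", "".join(out))
```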
I remember Dr Herald Schiffman referring to quote marks, and I would like to
present my view here. I feel his requirement is for us to have the quote
marks used in Tamil texts (and in Indian languages and English in India), in
which the single quote looks as if a comma has been shifted up to the height
of the character ascenders. Since the glyph encoding is an 8-bit encoding
that retains English in the lower half, these marks now have to be
accommodated in the upper slots. The quote marks used in Tamil are different
from the one used in English: there are two different single quote marks, an
open quote and a close quote. They resemble the inverted comma and comma
found at ANSI character positions 145 and 146 in the Arial font used in
Windows.
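For reference, the two ANSI positions mentioned map to the Unicode left and right single quotation marks, U+2018 and U+2019. This Python sketch reads them from the standard library's cp1252 codec. The helper name ansi_char is hypothetical.

```python
import unicodedata


def ansi_char(position: int) -> str:
    """Character at a given position in Windows code page 1252 ("ANSI")."""
    return bytes([position]).decode("cp1252")


OPEN_SINGLE = ansi_char(145)   # inverted-comma shape
CLOSE_SINGLE = ansi_char(146)  # comma shape raised to the ascender

if __name__ == "__main__":
    for ch in (OPEN_SINGLE, CLOSE_SINGLE, ansi_char(39)):
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```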
In India, Indian-language numerals are gaining popularity (except in Tamil)
because of promotional efforts and because they are being recognised as
part of the language itself (I feel a language cannot be complete without
its own numbering system).
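Tamil digits are already available as Unicode characters U+0BE6 to U+0BEF. This Python sketch writes numbers with them, assuming positional use like 0-9. The traditional signs for ten, hundred and thousand (U+0BF0 to U+0BF2) are not handled. The function names are hypothetical.

```python
# Tamil digits zero to nine: U+0BE6 .. U+0BEF.
TAMIL_DIGITS = "".join(chr(0x0BE6 + d) for d in range(10))
TO_TAMIL = str.maketrans("0123456789", TAMIL_DIGITS)


def to_tamil_digits(n: int) -> str:
    """Write a non-negative integer with Tamil digits (positional notation)."""
    return str(n).translate(TO_TAMIL)


def from_tamil_digits(s: str) -> int:
    """Python's int() accepts any Unicode decimal digits, Tamil ones included."""
    return int(s)
```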
I have not seen the romanised keyboard proposed by the Tamilnadu
Standardisation Committee (has it been finalised?). If it is finalised, does
it use only English letters, or diacritic marks as well? If it is based only
on English letters, it provides keyboarding without any 'extras', and that
is the end of the transliteration subject. I feel the transliteration scheme
should make it possible to key in Tamil without any extra fonts or software.
In conclusion, I suggest that Tamil be encoded based on its basic character
set. Tamil is not like English, which has a one-to-one relationship between
character coding and display. Tamil has to be handled at two levels: an
encoding based on Tamil characters, and a font to render (display) the Tamil
script. In the present scenario, it is not possible to have a single
character encoding scheme and a single font encoding scheme that cater to
all the computers in use and their operating systems. ASCII in the DOS
environment and ANSI in the Windows (and other) environments are two
different encoding schemes.
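The last point is easy to check: the same byte above 127 means different things in the DOS code page (437) and the Windows ANSI code page (1252). This Python sketch uses the standard library's codecs. The name decode_both is hypothetical.

```python
def decode_both(data: bytes) -> tuple[str, str]:
    """Decode the same bytes as DOS code page 437 and as Windows-1252 ("ANSI")."""
    return data.decode("cp437"), data.decode("cp1252")


# Bytes 0x82 and 0x91 differ between the two environments.
dos_text, ansi_text = decode_both(bytes([0x82, 0x91]))
```

Plain ASCII bytes (0-127) decode identically in both code pages; only the upper half differs. The upper half is exactly where 8-bit Tamil glyph encodings place their glyphs.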
ANBU arasan.