Tamil Discussion archive
[WMASTERS] Anbu Arasan mail-repost
Mani wrote:
>>On Sep 17 AnbuArasan clearly explained that the proposed encodings should
>>be based on basic characters, not glyphs, and he gave the reasons. At
>>this point we have to look at his mail seriously. Can any one of you
>>repost that mail?
>
>Unfortunately I don't have that post. I second this request.
Here I repost the posting of Anbu Arasan referred to above.
-----REPOST------
Received-Date: Wed, 17 Sep 97 19:55:53 +0530
Posted-Date: Wed Sep 17 19:29:09 1997
Message-Id: <9709171359.AA07487@axcess.nbanglore.axcess.net.in>
To: tamilnet@tamilnews.org.sg
X-Sent-To-Axcess: sujatha@md2.vsnl.net.in(1|NN| )
X-Sent-To-Axcess: govind@irdu.nus.sg(1|NN| )
X-Sent-To-Axcess: ananda@md2.vsnl.net.in(1|NN| )
X-Sent-To-Axcess-Cc: ghart@socrates.berkeley.edu(1|NN| )
X-Sent-To-Axcess-Cc: haroldfs@ccat.sas.upenn.edu(1|NN| )
Date: Wed Sep 17 19:29:09 1997
X-Mailer: aXcess Mail (version 2.0)
Subject: Thought provoking on Tamil encoding
Sender: owner-tamilnet@irdu.nus.sg
Precedence: bulk
Reply-To: anbu.arasan@axcess.net.in
--------
I want to make it very clear that I do not intend to hurt or disparage
anyone; my sole interest is to put forward my views for the enshrinement
of sweet Tamil. If anything I write offends those who are doing mighty
service for the enshrinement of Tamil, or those who use it, I ask them
once again to overlook and forgive it for the sake of the prosperity of
the language that has prompted these unprecedented discussions. No
doubt these discussions will pave the way for better understanding and
for the best solutions for Tamil.

Tamil has evolved and been reformed since time immemorial. Because we
tend to believe what we see, some participants in the discussion
believe that representing (displaying and printing) Tamil on computers
is the same thing as Tamil encoding.

Tamil has witnessed and withstood many changes in its script form as
well as in its character set.

Before we start encoding Tamil glyphs, we have to formulate a
requirement (aim) stating what exactly is going to be encoded. Without
it, it is not advisable to select and segregate Tamil glyphs according
to one's own taste.

I want to make one point very clear. There appears to be some confusion
in this discussion between a font/glyph set and a character set. A set
of glyphs is not, in itself, a Tamil character set. The character set
of Tamil is the "uyir eluthukkal" (vowels) and the "mei eluthukkal"
(consonants). These thirty letters are the basis of Tamil; their
combinations form hundreds of characters, and it is not possible to
encode all of these characters on computers. In ISCII (the new
standard), the basic common characters are treated as the character set
of the Indian languages and encoded.
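
As a rough illustration of "thirty letters, hundreds of combinations",
the sketch below builds the consonant-vowel syllables from the base
letters. It assumes Unicode Tamil code points instead of ISCII values,
and the names UYIR, MEI_BASE, SIGNS and uyirmei are made up for this
example.

```python
# Assumption: Unicode Tamil code points stand in for the "basic characters".
UYIR = [0x0B85, 0x0B86, 0x0B87, 0x0B88, 0x0B89, 0x0B8A,
        0x0B8E, 0x0B8F, 0x0B90, 0x0B92, 0x0B93, 0x0B94]      # 12 vowels
MEI_BASE = [0x0B95, 0x0B99, 0x0B9A, 0x0B9E, 0x0B9F, 0x0BA3,
            0x0BA4, 0x0BA8, 0x0BAA, 0x0BAE, 0x0BAF, 0x0BB0,
            0x0BB2, 0x0BB5, 0x0BB4, 0x0BB3, 0x0BB1, 0x0BA9]  # 18 consonants
# Vowel signs (matras) matching UYIR[1:]. The first vowel "a" has no sign
# because in Unicode the consonant letter already carries it.
SIGNS = [0x0BBE, 0x0BBF, 0x0BC0, 0x0BC1, 0x0BC2,
         0x0BC6, 0x0BC7, 0x0BC8, 0x0BCA, 0x0BCB, 0x0BCC]


def uyirmei(consonant, vowel_index):
    """Return one consonant-vowel syllable as a sequence of characters."""
    if vowel_index == 0:
        return chr(consonant)
    return chr(consonant) + chr(SIGNS[vowel_index - 1])


base_letters = len(UYIR) + len(MEI_BASE)
syllables = [uyirmei(c, v) for c in MEI_BASE for v in range(len(UYIR))]
print(base_letters, len(syllables))
```

Thirty base letters plus a handful of signs are enough to spell every
one of the 216 syllables, so none of the syllables needs a code of its
own.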

Because some of the people involved in the earlier versions of the
ISCII standard misinterpreted this, unnecessary codes appear to have
been assigned to the matra characters. These matras (vowel signs)
indicate the vowel present in the "uyir mei eluthukkal". Across the
Indian languages a vowel sign may consist of one, two or three signs
(in Tamil, at most two signs are used). The sign can appear only on the
right side, as in "kA, ki, kI"; only on the left side, as in "kai, ke,
kE"; or on both sides, as in "ko, kO, kou". This is so not only in
Tamil but also in some other Tamil-influenced languages such as
Malayalam, Oriya, Bengali (Bangla) and Assamese.
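
The left, right and both-sides placement can be sketched with the
Unicode Tamil vowel signs (an assumption; the discussion above is about
ISCII). The SIDE table and the parts helper are illustrative names.
The decomposition relies on Python's standard unicodedata module.

```python
import unicodedata

# Placement of each vowel sign relative to the consonant, following the
# kA/ki/kI, kai/ke/kE and ko/kO/kou grouping above.
SIDE = {
    "\u0bbe": "right",  # aa
    "\u0bbf": "right",  # i
    "\u0bc0": "right",  # ii
    "\u0bc6": "left",   # e
    "\u0bc7": "left",   # ee
    "\u0bc8": "left",   # ai
    "\u0bca": "both",   # o
    "\u0bcb": "both",   # oo
    "\u0bcc": "both",   # au
}


def parts(sign):
    """Split a vowel sign into its component signs (canonical decomposition)."""
    return list(unicodedata.normalize("NFD", sign))


for sign, side in SIDE.items():
    pieces = [f"U+{ord(p):04X}" for p in parts(sign)]
    print(f"U+{ord(sign):04X} {side:5} -> {pieces}")
```

Each both-sides sign decomposes into exactly two pieces, which agrees
with the remark that Tamil uses at most two signs.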

Even though we call the composite characters "uyir mei", each is
actually composed of a consonant (mei) and a vowel (uyir). Indian
scripts are coded on computers on this basis. This applies even to the
earlier ISCII standard, ISCII-91 (called level 1). In ISCII-91,
consonants are followed by matras. It is the same in Unicode. As the
saying goes, "KANNAL Kanpadthum poi....... theera vicharippadhe mei"
(what the eye sees may be false; only thorough inquiry reveals the
truth). It seems that most of the participants have not understood the
encoding followed in ISCII and in Unicode.
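
A small check of the "consonant followed by matra" order, using Unicode
and Python's unicodedata module. The stored_order helper is an
illustrative name.

```python
import unicodedata


def stored_order(syllable):
    """Return the Unicode character names in the order they are stored."""
    return [unicodedata.name(ch) for ch in syllable]


kai = "\u0b95\u0bc8"  # the syllable "kai": its sign is drawn on the LEFT
print(stored_order(kai))
```

The consonant comes first in storage even though the sign for "ai" is
drawn to its left; what is seen on screen and what is stored are not in
the same order.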

I humbly repeat: character encoding and font design are two different
issues, and they are not to be mixed up.

Some people misunderstand the current discussion on encoding glyphs to
be about encoding Tamil on computers, and take it as the basis for
enshrining Tamil electronically. This appears to be a misconception,
and a false picture, that has engulfed the discussion.

Font encoding cannot solve many issues, such as sorting, searching,
indexing and preserving Tamil itself. Font encoding is just one way of
displaying (rendering) Tamil on computers (since software development
still has a lot of maturing to do).
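
The sorting and searching problem can be sketched as follows. The
to_visual helper is a made-up stand-in for a glyph (font) encoding that
stores left-side signs before the consonant, the way they are drawn.
Unicode code point order is used here only as a rough stand-in for
Tamil alphabetical order.

```python
LEFT_SIGNS = {"\u0bc6", "\u0bc7", "\u0bc8"}  # signs e, ee, ai


def to_visual(text):
    """Reorder text the way a glyph encoding would store it: a left-side
    sign is moved in front of the consonant it belongs to."""
    out = []
    for ch in text:
        if ch in LEFT_SIGNS and out:
            out.insert(len(out) - 1, ch)
        else:
            out.append(ch)
    return "".join(out)


words = ["\u0b9a\u0bc6", "\u0b95\u0bc6", "\u0b9a", "\u0b95"]  # ce, ke, ca, ka

by_character = sorted(words)             # ka, ke, ca, ce
by_glyph = sorted(words, key=to_visual)  # ka, ca, ke, ce
print(by_character)
print(by_glyph)

# Searching for syllables that begin with "k" also breaks in glyph order:
print("\u0b95\u0bc6".startswith("\u0b95"))             # True
print(to_visual("\u0b95\u0bc6").startswith("\u0b95"))  # False
```

In character order the two "k" syllables stay together; in glyph order
every syllable that carries a left-side sign is pulled away from its
consonant.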

Regarding "glyph substitution" (wrongly described as font substitution;
font substitution means replacing one font, say 'Arial' in the Windows
environment, with another, such as 'Times New Roman'), I feel we can
consider it as one of the options. Since glyph substitution is already
implemented in Windows NT and Windows 95, TrueType fonts (this is not
OpenType) are the best option. It all depends on our requirement (we
have not yet decided, all of us, which environment we are discussing
the issue for). If we are talking about the future, including
present-day computers capable of running Windows 95 on PCs or System
7.x on Apple machines, we can definitely adopt the "glyph substitution
method". If our target is something else, glyph substitution will fail
to support us. "Future international extensions to True type may
require a unique Glyph", as mentioned in the TrueType documentation
"True type 1.0 font files - Technical specification Version 1.66" by
Microsoft. Since TrueType is being promoted by both Microsoft and
Apple, it seems that glyph substitution will continue.
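
The idea of glyph substitution, as opposed to giving every shape its
own code, can be sketched like this. This is not the TrueType table
format; the SUBSTITUTIONS and DEFAULT_GLYPHS tables, the glyph names
and the shape function are all hypothetical.

```python
# Hypothetical table: a sequence of character codes -> one ligature glyph.
SUBSTITUTIONS = {
    ("\u0b95", "\u0bc1"): "ku",    # ka + sign u
    ("\u0b95", "\u0bc2"): "kuu",   # ka + sign uu
    ("\u0b9f", "\u0bbf"): "tti",   # tta + sign i
}
# Hypothetical one-to-one glyphs for characters that are not substituted.
DEFAULT_GLYPHS = {
    "\u0b95": "ka",
    "\u0b9f": "tta",
    "\u0bc1": "sign_u",
    "\u0bc2": "sign_uu",
    "\u0bbf": "sign_i",
}


def shape(text):
    """Turn stored characters into glyph names, substituting known pairs."""
    glyphs = []
    i = 0
    while i < len(text):
        pair = tuple(text[i:i + 2])
        if pair in SUBSTITUTIONS:
            glyphs.append(SUBSTITUTIONS[pair])
            i += 2
        else:
            glyphs.append(DEFAULT_GLYPHS.get(text[i], ".notdef"))
            i += 1
    return glyphs


print(shape("\u0b95\u0bc1\u0b95"))  # ['ku', 'ka']
```

The stored text keeps only basic characters; the choice of ligature
happens at display time and can differ from font to font.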

The glyph ordering followed by Dr Kalyan seems illogical. To arrive at
the correct order, just follow the thamizh nedunkanakku (the Tamil
alphabet order).

I feel the glyph encoding itself has to be discussed: whether we need
8 bits or 7 bits, and whether to support only GUI computers, or
machines from the AT 286 upwards (most government offices in India
still use these outdated machines), or all electronic gadgets, as
someone pointed out about POS. I have implemented a few Indian
languages on pagers.

Someone may ask why we cannot use 128-160. Positions 128-160 are just a
replication of 0-32. This means that 160 (no-break space) has to be the
same as 32 (space), with the same advance width.
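
The parallel between 0-32 and 128-160 can be checked under the
assumption that the 8-bit layout follows ISO 8859-1 (Latin-1); Windows
code page 1252 differs and places printable characters in 128-159. The
categories helper is an illustrative name.

```python
import unicodedata


def categories(start, stop):
    """Unicode general categories of the byte values start..stop-1 (Latin-1)."""
    return {unicodedata.category(bytes([b]).decode("latin-1"))
            for b in range(start, stop)}


print(categories(0, 32))     # {'Cc'}: control characters
print(categories(128, 160))  # {'Cc'}: control characters again
print(unicodedata.category(chr(32)), unicodedata.category(chr(160)))  # Zs Zs
```

Both ranges hold control codes, and both 32 and 160 are space
separators; the equal advance width is a property of the font and is
not something this check can show.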

The requirement of Dr Harold Schiffman and like-minded linguists (the
old Tamil letters are nothing but different 'varivadivam', that is,
letter shapes, for the same basic constituents) is taken care of in the
ISCII standard. That is, any Tamil literature could be stored using the
ISCII encoding scheme to preserve Tamil. Since no common interface
software is available yet, the current developers can provide some kind
of converter to store text in the ISCII standard (Apple has implemented
ISCII - level in their machines, and Microsoft is going for ISCII
level 2).

I remember Dr Harold Schiffman referring to quote marks. I would like
to present my view here. I feel his requirement is that we have the
quote marks as used in Tamil texts (and in Indian languages and English
in India), that is, the single quote should look like a comma shifted
up to match the ascender of the character. Since the glyph encoding is
roughly an 8-bit encoding that retains English, these marks now have to
be accommodated in the upper slot. Quote marks used in Tamil are
different from the ones used in English. There are two different single
quote marks, the open quote and the close quote. They are similar to
the inverted comma and the comma, as seen at ANSI character positions
145 and 146 in the Arial fonts used in Windows.
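
Positions 145 and 146 can be looked up directly, assuming that "ANSI"
here means Windows code page 1252 (Python's "cp1252" codec).

```python
import unicodedata

# Decode the two byte values as Windows code page 1252 (assumed "ANSI").
quotes = bytes([145, 146]).decode("cp1252")
names = [unicodedata.name(ch) for ch in quotes]
print(names)
```

The two positions hold the left and right single quotation marks, that
is, the open and close quotes described above.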

In India, the numerals of the Indian languages (except Tamil) are
gaining popularity, because of the effort made to promote them and
because they are being recognised as part of the language itself (I
feel a language cannot be complete without its own numbering system).

I have not seen the romanised keyboard proposed by the Tamilnadu
Standardisation Committee (has it been finalised?). If it has been
finalised, does it use only English letters, or diacritic marks as
well? If it is based only on English letters, it provides keyboarding
without any 'extras', and that settles the transliteration question. I
feel the transliteration scheme should make it possible to key in Tamil
without any extra fonts or software.
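
A minimal sketch of keying in Tamil with English letters only. The
scheme is hypothetical: it borrows the "kA, ki, kI, kai, ke, kE, ko,
kO, kou" spellings used earlier in this post, is not the Committee's
keyboard, and covers only a few consonants. The output is Unicode text
(an assumption).

```python
# Hypothetical romanisation tables (only a few consonants for brevity).
CONSONANTS = {"k": "\u0b95", "c": "\u0b9a", "t": "\u0ba4", "n": "\u0ba8",
              "p": "\u0baa", "m": "\u0bae", "r": "\u0bb0", "l": "\u0bb2"}
VOWELS = {"a": "\u0b85", "A": "\u0b86", "i": "\u0b87", "I": "\u0b88",
          "u": "\u0b89", "U": "\u0b8a", "e": "\u0b8e", "E": "\u0b8f",
          "ai": "\u0b90", "o": "\u0b92", "O": "\u0b93", "ou": "\u0b94"}
SIGNS = {"a": "", "A": "\u0bbe", "i": "\u0bbf", "I": "\u0bc0",
         "u": "\u0bc1", "U": "\u0bc2", "e": "\u0bc6", "E": "\u0bc7",
         "ai": "\u0bc8", "o": "\u0bca", "O": "\u0bcb", "ou": "\u0bcc"}
PULLI = "\u0bcd"  # marks a consonant with no vowel


def match_vowel(text, i):
    """Return the vowel key found at position i, longest first, or None."""
    for key in ("ai", "ou"):
        if text.startswith(key, i):
            return key
    if i < len(text) and text[i] in VOWELS:
        return text[i]
    return None


def transliterate(text):
    """Convert romanised input into Tamil characters."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            i += 1
            vowel = match_vowel(text, i)
            if vowel is None:
                out.append(CONSONANTS[ch] + PULLI)
            else:
                out.append(CONSONANTS[ch] + SIGNS[vowel])
                i += len(vowel)
        else:
            vowel = match_vowel(text, i)
            if vowel is None:
                out.append(ch)  # pass through anything not in the tables
                i += 1
            else:
                out.append(VOWELS[vowel])
                i += len(vowel)
    return "".join(out)


print(transliterate("ammA"))
```

Capital letters stand for the long vowels, so no diacritic marks or
special fonts are needed at the keyboard.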

To conclude, I suggest encoding Tamil on the basis of its basic
character set. Tamil is not like English, which has a one-to-one
relationship between character coding and display. Tamil has to be
handled at two levels, i.e. an encoding based on Tamil characters and a
font to render (display) Tamil script. In the present scenario it is
not possible to have a single character encoding scheme and a single
font encoding scheme that cater to all the computers in use and their
operating systems. ASCII in the DOS environment and ANSI in the Windows
(and other) environments are two different encoding schemes.
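
The difference between the DOS and Windows encodings can be shown with
a single byte. The sketch assumes code page 437 for the DOS environment
and code page 1252 for Windows "ANSI"; other DOS code pages would give
different results.

```python
raw = bytes([0x82])  # one byte from the upper half of the 8-bit range

dos_text = raw.decode("cp437")       # assumed DOS code page
windows_text = raw.decode("cp1252")  # assumed Windows "ANSI" code page
print(repr(dos_text), repr(windows_text))

# The lower (7-bit) half is the same in both environments.
lower = bytes(range(32, 127))
print(lower.decode("cp437") == lower.decode("cp1252"))
```

The same stored byte is a different character in each environment, so a
glyph layout placed in the upper half cannot be assumed to carry over
from one to the other.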
ANBU arasan.