Thought provoking on Tamil encoding

To: tamilnet@tamilnews.org.sg
Subject: Thought provoking on Tamil encoding
From: anbu.arasan@axcess.net.in
Date: Wed Sep 17 19:29:09 1997
Content-Length: 8218
Content-Type: text
Posted-Date: Wed Sep 17 19:29:09 1997
Reply-To: anbu.arasan@axcess.net.in
Sender: owner-tamilnet@irdu.nus.sg
--------
Let me make it very clear at the outset that I in no way intend to hurt or
criticise anyone; my interest is purely and solely to put forward my views
for the enshrinement of the sweet Tamil. If anything anywhere in my writing
offends someone doing their mighty service for the enshrinement of TAMIL,
or those using it, I once again request that it be overlooked and forgiven
for the sake of the prosperity of the language for which such unprecedented
discussions are taking place. No doubt, these discussions will pave the way
for better understanding and the best solutions for TAMIL.

The Tamil language has evolved and been reformed since time immemorial.

Since we believe what we see, some of the participants in the discussion
believe that representing (displaying and printing) Tamil on computers is
Tamil encoding.

Tamil has witnessed and withstood many changes in its script form as well
as in its character set.

Before starting to encode Tamil glyphs, a requirement (aim) has to be
formulated stating what it is that is going to be encoded; without it, it
is not advisable to select and segregate the Tamil glyphs according to
one's taste.

I want to make one point very clear. There appears to be some confusion
within the discussion between font/glyph and character set. A set of glyphs
is not, in itself, a Tamil character set. The character set of Tamil is
"uyir eluthukkal and mei eluthukkal" (the vowels and the consonants). These
thirty letters are the basis for Tamil; combinations of these letters form
hundreds of characters, and it is not possible to encode all of these
characters on computers. The basic common characters are considered the
character set of Indian languages and are encoded in ISCII (new standard).
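
To illustrate the point about thirty letters against hundreds of
combinations, here is a minimal sketch. It assumes Unicode code points as
the character-level representation (ISCII would serve equally well); the
names UYIR, MEI, VOWEL_SIGNS and uyirmei_table are mine, for illustration
only.

```python
# Sketch only: the thirty basic Tamil letters as Unicode code points.
# Unicode is an assumption here; the names below are illustrative.

# 12 uyir (vowels), in traditional order.
UYIR = [chr(cp) for cp in (0x0B85, 0x0B86, 0x0B87, 0x0B88, 0x0B89, 0x0B8A,
                           0x0B8E, 0x0B8F, 0x0B90, 0x0B92, 0x0B93, 0x0B94)]

# 18 mei (consonants), in traditional order, written without the pulli.
MEI = [chr(cp) for cp in (0x0B95, 0x0B99, 0x0B9A, 0x0B9E, 0x0B9F, 0x0BA3,
                          0x0BA4, 0x0BA8, 0x0BAA, 0x0BAE, 0x0BAF, 0x0BB0,
                          0x0BB2, 0x0BB5, 0x0BB4, 0x0BB3, 0x0BB1, 0x0BA9)]

# Vowel signs in the same order as UYIR; the inherent 'a' has no sign.
VOWEL_SIGNS = [""] + [chr(cp) for cp in (0x0BBE, 0x0BBF, 0x0BC0, 0x0BC1,
                                         0x0BC2, 0x0BC6, 0x0BC7, 0x0BC8,
                                         0x0BCA, 0x0BCB, 0x0BCC)]


def uyirmei_table():
    """Build every consonant + vowel combination from the basic letters."""
    return [[mei + sign for sign in VOWEL_SIGNS] for mei in MEI]


if __name__ == "__main__":
    table = uyirmei_table()
    print(len(UYIR) + len(MEI), "basic letters")
    print(sum(len(row) for row in table), "composite forms built from them")
```

Only the thirty letters (plus the signs that stand for the vowels) need
code positions; the composite forms are derived, not stored.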

It is because of a misinterpretation by some of the people involved in the
earlier versions of the ISCII Standards that an unnecessary coding appears
to have been done for matra characters. These matras (vowel signs) indicate
the corresponding vowel present in the "uyir mei eluthukkal". A vowel sign
could consist of just one sign, or of two or three, among the Indian
languages (in Tamil, only up to two signs are used). It can come only on
the right side as in "kA, ki, kI" etc., only on the left side as in
"kai, ke, kE" etc., or on both sides as in "ko, kO, kou" etc. This is so
not only in Tamil, but also in some of the other Tamil influenced languages
like Malayalam, Oriya, Bengali (Bangla), Assamese, etc.

Even though we call the composite characters "uyir mei", their actual
composition is consonant (mei) followed by vowel (uyir). Using this as the
basis, Indian scripts are coded on computers. This applies even to the
earlier ISCII Standard, ISCII-91 (called level 1). In ISCII-91, consonants
are followed by matras. It is the same in Unicode also. KANNAL Kanpadthum
poi....... theera vicharippadhe mei (what the eye sees may be false; only
what is thoroughly inquired into is the truth). It seems that most of the
participants have not understood the encoding followed in ISCII as well as
in Unicode.
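
The "consonant followed by matra" order can be checked on any machine with
Python, using the standard unicodedata module. This is only a sketch on the
Unicode side (I am not showing ISCII byte values here); the function name
stored_code_points is mine.

```python
import unicodedata


def stored_code_points(syllable):
    """List the code points of a syllable in stored (logical) order.

    NFD normalisation splits the two-part vowel signs into their
    left and right halves, so both parts become visible.
    """
    decomposed = unicodedata.normalize("NFD", syllable)
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in decomposed]


if __name__ == "__main__":
    # "ke": the sign is drawn on the LEFT but stored AFTER the consonant.
    for item in stored_code_points("\u0b95\u0bc6"):
        print(item)
    print()
    # "ko": one sign drawn on both sides, still stored after the consonant.
    for item in stored_code_points("\u0b95\u0bca"):
        print(item)
```

In both cases the consonant comes first in storage, whatever side the sign
is drawn on. Where the sign is drawn is the business of the font.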

I humbly repeat: character encoding and font design are two different
issues, and they are not to be mixed up together.

Some have misunderstood the current discussion on encoding glyphs to be a
discussion on encoding Tamil on computers, and to be the basis for
enshrining Tamil electronically. This appears to be a wrong conception and
a false image that has engulfed the discussion.

Font encoding cannot solve many issues such as sorting, searching, indexing
and preserving Tamil itself. Font encoding is just one way of displaying
(rendering) Tamil on computers (since a lot of maturing is still desired in
software development).

Regarding "glyph substitution" (wrongly stated as font substitution; font
substitution means substituting one font, say 'arial' in the Windows
environment, with 'times new roman'), I feel we can consider it as one of
the options. Since glyph substitution is already implemented in Windows NT
and Windows 95, True Type fonts (this is not Open Type) are the best
option. It all depends on our requirement (all of us; we have not yet
decided for which environment we are discussing the issue). If we are
talking about the future, including the present day computers capable of
running Windows 95 on PCs or System 7.x on Apple, we can definitely adopt
the "glyph substitution method". If our target is something else, glyph
substitution will fail to support us. "Future international extensions to
True type may require a unique Glyph" is as mentioned in the True type
documentation "True type 1.0 font files - Technical specification Version
1.66" by Microsoft. Since True type is being promoted by both Microsoft and
Apple, it seems that glyph substitution will continue.
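
To show what I mean by keeping the character level and the glyph level
apart, here is a toy sketch of the second level. It is NOT the True Type
mechanism itself, only an illustration under my own assumptions: characters
are stored in logical order (Unicode code points are assumed), and a
hypothetical function to_visual_order rearranges them into the left to
right order in which glyphs are drawn.

```python
# Toy illustration only; not how any real font engine is implemented.

# Vowel signs drawn to the LEFT of the consonant (e, E, ai).
LEFT_SIGNS = {"\u0bc6", "\u0bc7", "\u0bc8"}

# Vowel signs drawn on BOTH sides (o, O, au): (left part, right part).
SPLIT_SIGNS = {
    "\u0bca": ("\u0bc6", "\u0bbe"),
    "\u0bcb": ("\u0bc7", "\u0bbe"),
    "\u0bcc": ("\u0bc6", "\u0bd7"),
}


def to_visual_order(text):
    """Map stored characters to the order in which glyphs are drawn."""
    glyphs = []
    for ch in text:
        if ch in LEFT_SIGNS and glyphs:
            # Draw the sign before the consonant it follows in storage.
            glyphs.insert(len(glyphs) - 1, ch)
        elif ch in SPLIT_SIGNS and glyphs:
            left, right = SPLIT_SIGNS[ch]
            glyphs.insert(len(glyphs) - 1, left)
            glyphs.append(right)
        else:
            glyphs.append(ch)
    return glyphs


if __name__ == "__main__":
    # Stored as ka + sign o; drawn as left part, ka, right part.
    print([f"U+{ord(g):04X}" for g in to_visual_order("\u0b95\u0bca")])
```

The stored text never changes; only the drawing order does. A glyph level
encoding stores the drawing order itself, which is where the trouble with
sorting and searching begins.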

The glyph ordering followed by Dr Kalyan seems to be illogical; to arrive
at the correct order, just follow the thamizh nedunkanakku.
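
As a rough sketch of ordering by the nedunkanakku on top of a character
level encoding: the code below assumes Unicode code points as input, and it
further assumes (my assumption, not a standard) that a consonant with pulli
sorts before the same consonant with any vowel. The name nedunkanakku_key
is illustrative.

```python
# Sketch only. Assumptions: Unicode input; pulli form sorts before the
# consonant + vowel forms; letters outside the basic thirty sort last.

VOWELS = (0x0B85, 0x0B86, 0x0B87, 0x0B88, 0x0B89, 0x0B8A,
          0x0B8E, 0x0B8F, 0x0B90, 0x0B92, 0x0B93, 0x0B94)
AYTHAM = (0x0B83,)
CONSONANTS = (0x0B95, 0x0B99, 0x0B9A, 0x0B9E, 0x0B9F, 0x0BA3,
              0x0BA4, 0x0BA8, 0x0BAA, 0x0BAE, 0x0BAF, 0x0BB0,
              0x0BB2, 0x0BB5, 0x0BB4, 0x0BB3, 0x0BB1, 0x0BA9)

LETTER_RANK = {chr(cp): i
               for i, cp in enumerate(VOWELS + AYTHAM + CONSONANTS)}

# Rank 0 = pulli, rank 1 = inherent 'a' (no sign), then signs in vowel order.
SIGN_RANK = {"\u0bcd": 0}
for i, cp in enumerate((0x0BBE, 0x0BBF, 0x0BC0, 0x0BC1, 0x0BC2, 0x0BC6,
                        0x0BC7, 0x0BC8, 0x0BCA, 0x0BCB, 0x0BCC), start=2):
    SIGN_RANK[chr(cp)] = i


def nedunkanakku_key(word):
    """Sort key: one (letter rank, sign rank) pair per syllable."""
    key = []
    for ch in word:
        if ch in SIGN_RANK and key:
            letter_rank, _ = key[-1]
            key[-1] = (letter_rank, SIGN_RANK[ch])
        else:
            key.append((LETTER_RANK.get(ch, len(LETTER_RANK)), 1))
    return key


if __name__ == "__main__":
    letters = ["\u0ba9", "\u0bb1", "\u0bb3", "\u0bb4"]
    print(sorted(letters))                        # raw code value order
    print(sorted(letters, key=nedunkanakku_key))  # nedunkanakku order
```

The last four consonants come out in a different order under the two
sorts, which shows that the code value order and the nedunkanakku need not
coincide; the ordering rule belongs in the software, on top of the
character codes.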

I feel the glyph encoding has to be discussed: whether we need 8 bit or 7
bit; whether to support only GUI computers or machines at least from the AT
286 upward (most of the Government offices in India still use these
outdated machines); or whether to cater to all electronic gadgets, as
someone pointed out about POS. I have implemented a few Indian languages on
pagers.

Someone may wonder why we cannot use 128-160. The positions 128-160 are
just a replication of 0-32. This means that 160 (no break space) has to be
the same as 32 (space), with the same advance width.
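
A small check of this, assuming the ISO 8859-1 (Latin-1) layout as the
reference for the upper half; the function name latin1_category is mine.
In that layout 128-159 are control positions like 0-31, and 160 is a space
like 32.

```python
import unicodedata


def latin1_category(value):
    """Unicode general category of a byte value read as ISO 8859-1."""
    return unicodedata.category(bytes([value]).decode("latin-1"))


if __name__ == "__main__":
    # "Cc" means control character, "Zs" means space separator.
    print({latin1_category(v) for v in range(0, 32)})     # lower controls
    print({latin1_category(v) for v in range(128, 160)})  # upper controls
    print(latin1_category(32), latin1_category(160))      # the two spaces
    print(unicodedata.name(chr(160)))
```

Software that follows this layout may treat 128-159 as controls and
swallow them, so glyphs placed there are not safe.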

The requirement of Dr Harold Schiffman and like-minded linguists (the old
Tamil letters are nothing but a different 'varivadivam' for the same basic
constituents) is taken care of in the ISCII Standard. That is, any Tamil
literature could be stored using the ISCII encoding scheme to preserve
Tamil. Since no common interface software is available yet, the current
developers can provide some kind of converter to store text in the ISCII
Standard (Apple has implemented ISCII - level in their machine and
Microsoft is going for ISCII level 2).

I remember Dr Harold Schiffman referring to quote marks. I would like to
present my view here. I feel his requirement is for us to have the quote
marks as used in Tamil texts (and in Indian languages and English in
India), that is, the single quote looks as if a comma has been shifted up
to match the ascender of the character. Since the glyph encoding is more or
less an 8 bit encoding that retains English, these marks now have to be
accommodated in the upper slot. The quote marks used in Tamil are different
from the one used in English. There are two different single quote marks,
an open quote and a close quote. They are similar to the inverted comma and
comma seen at ANSI character positions 145 and 146 in the Arial fonts used
in Windows.
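
For those who want to see which characters these two positions hold, here
is a short sketch. It assumes "ANSI" means Windows code page 1252, which
Python calls "cp1252"; the function name ansi_quote_marks is mine.

```python
import unicodedata


def ansi_quote_marks():
    """Decode ANSI positions 145 and 146, assuming Windows code page 1252."""
    return bytes([145, 146]).decode("cp1252")


if __name__ == "__main__":
    for ch in ansi_quote_marks():
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

These are the open and close single quote marks, distinct from the straight
apostrophe at position 39.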

In India, the Indian language numerals are seen to be gaining popularity
(except in Tamil) because of the effort to push them and because they are
being recognised as part of the language itself (I feel a language cannot
be complete without its own numbering system).

I have not seen the romanised keyboard proposed by the Tamilnadu
Standardisation Committee (has it been finalised?). If it is finalised,
does it use only English alphabets or also diacritic marks? If it is based
only on English alphabets, it provides keyboarding without any 'extras' and
that is the end of the transliteration subject. I feel the transliteration
scheme should make it possible to key in Tamil without any extra font or
software.
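
As I have not seen the Committee's scheme, the following is only a toy
sketch of a transliteration that uses nothing but English alphabets. The
mapping tables are hypothetical and deliberately incomplete (a handful of
consonants, with capital letters for long vowels as in "kA, kI" above), and
the output is assumed to be Unicode code points.

```python
# Toy sketch; the tables are hypothetical and cover only a few letters.
PULLI = "\u0bcd"

# Roman consonant -> Tamil consonant.
CONSONANTS = {"k": "\u0b95", "t": "\u0ba4", "n": "\u0ba8", "p": "\u0baa",
              "m": "\u0bae", "l": "\u0bb2", "zh": "\u0bb4"}

# Roman vowel -> (independent vowel, vowel sign after a consonant).
VOWELS = {"a": ("\u0b85", ""), "A": ("\u0b86", "\u0bbe"),
          "i": ("\u0b87", "\u0bbf"), "I": ("\u0b88", "\u0bc0"),
          "u": ("\u0b89", "\u0bc1"), "U": ("\u0b8a", "\u0bc2"),
          "e": ("\u0b8e", "\u0bc6"), "E": ("\u0b8f", "\u0bc7"),
          "ai": ("\u0b90", "\u0bc8"),
          "o": ("\u0b92", "\u0bca"), "O": ("\u0b93", "\u0bcb")}


def longest_match(table, text, pos):
    """Return the longest key of `table` found at text[pos:], or None."""
    for length in (2, 1):
        key = text[pos:pos + length]
        if len(key) == length and key in table:
            return key
    return None


def transliterate(text):
    """Convert romanised input into Tamil characters in logical order."""
    out = []
    pos = 0
    while pos < len(text):
        cons = longest_match(CONSONANTS, text, pos)
        if cons:
            pos += len(cons)
            out.append(CONSONANTS[cons])
            vowel = longest_match(VOWELS, text, pos)
            if vowel:
                pos += len(vowel)
                out.append(VOWELS[vowel][1])   # vowel sign (may be empty)
            else:
                out.append(PULLI)              # bare consonant
            continue
        vowel = longest_match(VOWELS, text, pos)
        if vowel:
            pos += len(vowel)
            out.append(VOWELS[vowel][0])       # independent vowel
            continue
        out.append(text[pos])                  # pass anything else through
        pos += 1
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("tamizh"))
```

The input needs only the keys of an ordinary English keyboard; no special
font or software is needed to type it.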

In conclusion, I suggest encoding Tamil based on its basic character set.
Tamil is not like English, which has a one to one relationship between
character coding and display. Tamil has to be handled at two levels, i.e.,
an encoding based on Tamil characters and a font to render (display) the
Tamil script. In the present scenario, it is not possible to have a single
character encoding scheme and a single font encoding scheme to cater to all
the living computers and their operating systems. ASCII in the DOS
environment and ANSI in the WINDOWS (and other) environment are two
different encoding schemes.
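
The difference between the two environments is easy to demonstrate. The
sketch below assumes code page 437 for the DOS side and code page 1252 for
the Windows ("ANSI") side; other DOS code pages exist, so take this only as
one example. The function name decode_both is mine.

```python
def decode_both(value):
    """Decode one byte value under the DOS and Windows code pages.

    Assumes cp437 for DOS and cp1252 for Windows; positions that are
    undefined in a code page are shown as a replacement character.
    """
    raw = bytes([value])
    return (raw.decode("cp437", errors="replace"),
            raw.decode("cp1252", errors="replace"))


if __name__ == "__main__":
    for value in (0x41, 0xE9):
        dos, windows = decode_both(value)
        print(f"{value:3d}  DOS: {dos!r}  Windows: {windows!r}")
```

Below 128 the two agree; in the upper half the same code value stands for
different characters, so a glyph placed there in one environment is not
guaranteed to be the same glyph in the other.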

ANBU arasan.