SMS standards

Mobile technology is an important means of communication today. With the accelerating growth of this technology in India, the number of subscribers from rural areas will grow manifold. English literacy is relatively low in rural areas, so unless Indian language messaging support is improved significantly, a large number of subscribers will be deprived of the benefits of SMS.

In mobile technology, multilingual data handling is vital across different layers. Any encoding scheme chosen for data transmission should meet the following requirements:

• The data encoding scheme should support all possible characters and character combinations in Indian languages as per the Unicode standard.
• There should be a provision to change languages within a single message.
• The encoding should be flexible enough to accommodate future versions of the Unicode standard.

The three SMS encoding schemes currently prevalent in India are:

• ISCII-based 7-bit encoding
• 7-bit default alphabet as per the GSM standard
• UTF-8

The GSM standard supports the 7-bit default alphabet and UCS2. For Indian languages, each of these encodings has its own pros and cons, especially with regard to the number of characters and the state of standard implementations. The 7-bit EA-ISCII can handle all the intricacies of Indian languages, but it lacks flexibility and at present does not support all Unicode characters. Adopting 7-bit standards to cater to the growing demands of Indian languages will therefore not make mobile devices truly localized for Indian languages.
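
To make the trade-off in the number of characters concrete, the sketch below works out how many characters fit in one message for different code-unit sizes. It assumes the 140-octet user-data payload of a single GSM short message. The function names are illustrative, and the UTF-8 figure assumes that UTF-8 bytes are carried directly in that payload, which is an assumption made here for illustration only.

```python
# User-data payload of a single GSM short message, in octets.
PAYLOAD_OCTETS = 140


def max_chars_7bit() -> int:
    """Characters per message when every character packs into 7 bits."""
    return PAYLOAD_OCTETS * 8 // 7


def max_chars_ucs2() -> int:
    """Characters per message when every character is one 16-bit unit."""
    return PAYLOAD_OCTETS // 2


def max_chars_utf8(bytes_per_char: int) -> int:
    """Characters per message if UTF-8 bytes filled the payload directly
    (assumption for illustration). Indian-script letters in the Basic
    Multilingual Plane take 3 bytes each in UTF-8."""
    return PAYLOAD_OCTETS // bytes_per_char


print(max_chars_7bit())    # 160
print(max_chars_ucs2())    # 70
print(max_chars_utf8(3))   # 46
```

A 7-bit scheme therefore carries the most characters per message, but only for the limited repertoire it defines.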

Advantages of UTF-8 encoding for SMS

The UTF-8 encoding is widely used by websites, email and many open source applications. It can represent any character in the Unicode standard. Its advantages include:
• UTF-8 preserves ASCII transparency
• UTF-8 is the preferred encoding for HTML and similar Internet protocols
• UTF-8 is byte-serialized
• Self-synchronizing: the first byte of a UTF-8 sequence indicates the number of bytes that follow in a multi-byte sequence
• Every code point in the Basic Multilingual Plane, which includes the Indian scripts, is represented in at most 3 bytes (supplementary code points take 4 bytes)
• UTF-8 allows efficient forward parsing
• UTF-8 is a variable-width encoding form using 8-bit code units, in which the high bits of each code unit indicate the part of the code unit sequence to which that byte belongs.
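
The properties listed above can be illustrated with a short sketch. It is a minimal example, not a full validator: the helper names (`sequence_length`, `is_continuation`, `resync`, `split_characters`) are hypothetical, and the sample message is an arbitrary mix of English and Hindi written with escape codes.

```python
def sequence_length(lead_byte: int) -> int:
    """Return the total length of a UTF-8 sequence from its first byte."""
    if lead_byte < 0x80:            # 0xxxxxxx: single-byte ASCII
        return 1
    if lead_byte >> 5 == 0b110:     # 110xxxxx: 2-byte sequence
        return 2
    if lead_byte >> 4 == 0b1110:    # 1110xxxx: 3-byte sequence
        return 3
    if lead_byte >> 3 == 0b11110:   # 11110xxx: 4-byte sequence
        return 4
    raise ValueError("continuation or invalid byte, not a lead byte")


def is_continuation(byte: int) -> bool:
    """Continuation bytes always have the form 10xxxxxx."""
    return byte >> 6 == 0b10


def resync(data: bytes, offset: int) -> int:
    """Move forward from an arbitrary offset to the next character start."""
    while offset < len(data) and is_continuation(data[offset]):
        offset += 1
    return offset


def split_characters(data: bytes) -> list:
    """Forward-parse a UTF-8 byte string one code point at a time."""
    chars, i = [], 0
    while i < len(data):
        n = sequence_length(data[i])
        chars.append(data[i:i + n].decode("utf-8"))
        i += n
    return chars


# "Hi " followed by the Hindi word "namaste" (six Devanagari code points).
text = "Hi \u0928\u092e\u0938\u094d\u0924\u0947"
message = text.encode("utf-8")

print(message[:3])                  # b'Hi ' (ASCII bytes are unchanged)
print(len(message))                 # 21 (3 ASCII bytes + 6 x 3 bytes)
print(sequence_length(message[3]))  # 3
print(resync(message, 4))           # 6 (start of the next character)
```

Because lead bytes and continuation bytes never share a bit pattern, a receiver that starts reading in the middle of a character can skip ahead to the next character boundary without decoding from the beginning.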

W3C View

According to the W3C's view on encoding standards, Unicode is a good choice for representing content that is served in multiple languages. The amount of bandwidth required to transmit content can vary significantly depending on the character encoding used.
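
The effect of the encoding on transmitted size can be seen in a small sketch. The function name `encoded_sizes` and the two sample strings are illustrative assumptions; the encodings are those shipped with Python's standard codecs.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte count of `text` under a few Unicode encoding forms."""
    encodings = ("utf-8", "utf-16-be", "utf-32-be")
    return {enc: len(text.encode(enc)) for enc in encodings}


english = "Hello"
# The Hindi word "namaste", six Devanagari code points.
hindi = "\u0928\u092e\u0938\u094d\u0924\u0947"

print(encoded_sizes(english))  # {'utf-8': 5, 'utf-16-be': 10, 'utf-32-be': 20}
print(encoded_sizes(hindi))    # {'utf-8': 18, 'utf-16-be': 12, 'utf-32-be': 24}
```

The same content costs a different number of bytes in each encoding form, and the cheapest form depends on the script being sent.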

Keyboard layouts for Mobile

Multiple keyboard layouts prevail in the market. C-DAC holds one patent (MUM 805 dt. 2002) for enabling handsets with Indian languages, which covers the Rasterizer Engine and a two-key non-predictive input mechanism. The keyboard layout for mobile is a subset of this; it uses the 9 keys available on the handset and supports input for all Indian languages.

CEWiT [Centre for Wireless Technology, IIT Madras] has recently been working on an Indian language keypad. Luna Ergonomics has also developed an on-screen keypad for Indian languages.

Role of TDIL

The TDIL programme of MeitY is actively working with stakeholders to promote UTF-8 encoding for SMS in Indian languages and is also brainstorming on developing a standard for enabling Indian languages on mobile devices.