go-தமிழ்
Whenever you see a post with this callout, it means I have ported one of my old (but still close to heart) posts from my legacy blog host to here. Enjoy some good old classics ☕️
Having got my feet wet with golang by implementing the lsgo equivalent command in golang, it's now time to explore some depths.
In this post, I'm gonna talk about my new project, go-தமிழ், a Tamil transliteration tool written in golang for fun & learning.
தமிழ் (Thamizh, not Tamil) is my mother tongue. It is absolutely fantastic to see Thamizh letters on the internet. Thanks to recent developments in typographic and Indic-language technologies, it is now very easy to type & view text in native languages.
At a very basic level, Unicode assigns every character of every language a number (a code point), and encodings such as UTF-8, UTF-16 and UTF-32 store those numbers as bytes. This way, computers can make sense of every char of every possible language as just an integer.
Although handling UTF-8 strings is definitely a pain, golang seems to support this out of the box. Its unicode/utf8 package is especially worth a read.
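A minimal sketch of what that built-in support looks like, assuming nothing beyond the standard library. The helper name byteAndRuneCount is made up for illustration. A Go string is a sequence of UTF-8 bytes, so len counts bytes, while utf8.RuneCountInString counts code points, and a range loop walks the string one code point at a time.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount is a hypothetical helper: it returns the number of
// UTF-8 bytes and the number of Unicode code points in s.
func byteAndRuneCount(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	word := "தமிழ்"
	b, r := byteAndRuneCount(word)
	fmt.Printf("%q: %d bytes, %d code points\n", word, b, r)

	// range decodes UTF-8 for us: i is the byte offset, c is the code point.
	for i, c := range word {
		fmt.Printf("byte offset %2d: %c (U+%04X)\n", i, c, c)
	}
}
```

Every code point in the Tamil block needs three bytes in UTF-8, which is why the byte count comes out at three times the code point count.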
Knowing that golang can support தமிழ் natively & having learnt the basics of golang, why not develop an English -> தமிழ் transliteration tool?
Basics of go-தமிழ்
தமிழ் letters can be largely categorized as உயிர் (vowels), மெய் (consonants) & உயிர்மெய் (vowel-consonant compounds).
For example, the தமிழ் letter க is derived from க் (which is மெய்) and அ (which is உயிர்), i.e. க் + அ = க. Similarly, மி = ம் + இ.
However, in the Unicode world, the vowel part of a compound letter appears as a special combining character such as ா or ி, and ் (the pulli) marks a consonant with no vowel. So, in order to get மி, we should concatenate ம & ி, i.e. மி = ம + ி.
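The concatenation can be sketched in a few lines of Go. The code points come from the Unicode Tamil block. The constant names and the compose helper are made up for illustration.

```go
package main

import "fmt"

// Code points from the Unicode Tamil block (names are illustrative).
const (
	ka    rune = 0x0B95 // க
	ma    rune = 0x0BAE // ம
	signI rune = 0x0BBF // ி  (vowel sign for இ)
	pulli rune = 0x0BCD // ்  (removes the inherent "a" sound)
)

// compose is a hypothetical helper: it joins a base consonant with a
// combining mark (a vowel sign or the pulli) into one string.
func compose(base, mark rune) string {
	return string([]rune{base, mark})
}

func main() {
	fmt.Println(compose(ma, signI)) // ம followed by ி
	fmt.Println(compose(ka, pulli)) // க followed by ்
	fmt.Println(string(ka))         // க alone already carries the "a" sound
}
```

Note the inversion compared to the grammar: in Unicode the bare consonant க is the base code point, and the மெய் form க் is the one that needs an extra mark.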
So it turns out that generating Tamil characters is quite challenging & interesting. Upon receiving an English transliteration text, say vanakkam, we first need to find the pattern that tells apart printing a உயிர் from printing a உயிர்மெய்.
For instance, vaa can be interpreted as வஅ or வா. So it is quite clear that we need a mechanism to identify whether the user wants a vowel sound attached to the previous letter or the standalone vowel letter. In order to solve this problem, I resorted to having my own encoding scheme for go-தமிழ்.
Architecture of go-தமிழ்
Having decided that I need to come up with my own encoding rules (heck, this is my own new encoding tool for fun!), I then started to lay out a basic grammar for my own tanglish language.
You can take a look at the grammar for go-தமிழ் in the help page of the webpage that gets served as part of the go-தமிழ் daemon mode.
To give you a glimpse…
தமிழ் | English |
---|---|
அ | a |
ஆ | 2a |
இ | i |
ஈ | 2i |
உ | u |
ஊ | 2u |
எ | e |
ஏ | 2e |
ஐ | 3i |
ஒ | o |
ஓ | 2o |
For complete details on the go-தமிழ் encoding rules, please see this page.
Algo
- Get the input text and split it on the space delimiter, resulting in a slice of input tokens.
- Now iterate over each token and treat the letters of that token as, in turn, a slice.
- Using Golang's slice-of-a-slice technique, iterate from 0 to len(token).
  - Match the new slice against the uyir, mei or vowel sign patterns.
  - If found, then increment both start & end indices.
  - If not, then increment only end and re-slice the slice as start:end.
- Loop & repeat till exit.
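The steps above can be sketched roughly as follows. This is not the actual go-தமிழ் source. The uyir map comes from the table earlier in the post, while the vowelSigns map, the small mei subset (k, m, p, v) and all function names are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

const pulli = "\u0bcd" // ்

// uyir: standalone vowels, taken from the encoding table in the post.
var uyir = map[string]string{
	"a": "\u0b85", "2a": "\u0b86", "i": "\u0b87", "2i": "\u0b88",
	"u": "\u0b89", "2u": "\u0b8a", "e": "\u0b8e", "2e": "\u0b8f",
	"3i": "\u0b90", "o": "\u0b92", "2o": "\u0b93",
}

// vowelSigns: ASSUMPTION, the same keys mapped to combining vowel signs.
// "a" maps to nothing because the bare consonant already carries it.
var vowelSigns = map[string]string{
	"a": "", "2a": "\u0bbe", "i": "\u0bbf", "2i": "\u0bc0",
	"u": "\u0bc1", "2u": "\u0bc2", "e": "\u0bc6", "2e": "\u0bc7",
	"3i": "\u0bc8", "o": "\u0bca", "2o": "\u0bcb",
}

// mei: ASSUMPTION, a hypothetical subset of consonant mappings.
var mei = map[string]string{
	"k": "\u0b95", "m": "\u0bae", "p": "\u0baa", "v": "\u0bb5",
}

// transliterateToken grows a window letters[start:end] until it matches
// a pattern, emits the match, then moves both indices forward.
func transliterateToken(token string) string {
	letters := []rune(token)
	var out strings.Builder
	pending := false // true while the last consonant still waits for a vowel
	start, end := 0, 1
	for end <= len(letters) {
		piece := string(letters[start:end])
		base, isMei := mei[piece]
		_, isUyir := uyir[piece]
		switch {
		case isMei:
			if pending {
				out.WriteString(pulli) // previous consonant had no vowel
			}
			out.WriteString(base)
			pending = true
		case isUyir && pending:
			out.WriteString(vowelSigns[piece]) // attach to the consonant
			pending = false
		case isUyir:
			out.WriteString(uyir[piece]) // standalone vowel
		default:
			end++ // no match yet: widen the window only
			continue
		}
		start, end = end, end+1 // match: move both indices
	}
	if pending {
		out.WriteString(pulli)
	}
	out.WriteString(string(letters[start:])) // unmatched tail, kept as typed
	return out.String()
}

// transliterate splits the input on whitespace and converts each token.
func transliterate(text string) string {
	tokens := strings.Fields(text)
	for i, t := range tokens {
		tokens[i] = transliterateToken(t)
	}
	return strings.Join(tokens, " ")
}

func main() {
	fmt.Println(transliterate("amm2a app2a"))
	fmt.Println(transliterate("vaa"), transliterate("v2a"))
}
```

With these assumed tables, vaa and v2a give different results: the first a is absorbed by the consonant and the second one becomes a standalone அ, while 2a attaches as the long vowel sign. Anything the tables do not cover is passed through unchanged.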
Deployment
After the main logic got working, it is now just a matter of how to present & package the tool. Usability is the key aspect here.
Next, in order to spice up the meal, I decided to have 2 modes of operation: Console mode & Daemon mode.
Console mode
Console mode mimics a go-தமிழ் >> shell, which takes in English input and returns தமிழ் text in the terminal output (if the terminal supports UTF-8).
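A prompt loop of this kind might look like the sketch below. It is not the tool's real code: the transliterate function is only a placeholder that echoes its input, the exit command is an assumption, and the session helper exists just to try the loop on a fixed string.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

const prompt = "go-தமிழ் >> "

// transliterate is a placeholder for the real conversion; it echoes its input.
func transliterate(s string) string { return s }

// runShell prints the prompt, reads one line at a time from in and writes
// the converted text to out, until EOF or the (assumed) "exit" command.
func runShell(in io.Reader, out io.Writer) error {
	scanner := bufio.NewScanner(in)
	fmt.Fprint(out, prompt)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "exit" {
			break
		}
		if line != "" {
			fmt.Fprintln(out, transliterate(line))
		}
		fmt.Fprint(out, prompt)
	}
	return scanner.Err()
}

// session runs the shell over a fixed input and returns everything printed.
func session(input string) string {
	var out strings.Builder
	_ = runShell(strings.NewReader(input), &out)
	return out.String()
}

func main() {
	if err := runShell(os.Stdin, os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Taking an io.Reader and io.Writer instead of using os.Stdin and os.Stdout directly keeps the loop easy to exercise without a real terminal.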
Daemon mode
Daemon mode runs a webserver at port 8080 and serves transliteration as a service.
For this, I shamelessly 🙈 copied the Golang playground CSS and re-used it for my theme. I have to say, it fitted my design perfectly and I'm kinda proud of it :-)
Although this is not a full-fledged webserver, it does the job for this fun exercise. So I'm good with it.
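For a rough idea of how small such a service can be with net/http, here is a sketch. Only port 8080 comes from the post. The /transliterate path, the text query parameter, the placeholder transliterate function and the serveOnce helper are all assumptions, not the tool's real interface.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// transliterate is a placeholder for the real conversion; it echoes its input.
func transliterate(s string) string { return s }

// handleTransliterate reads the (assumed) "text" query parameter and
// replies with the converted text as UTF-8 plain text.
func handleTransliterate(w http.ResponseWriter, r *http.Request) {
	text := r.URL.Query().Get("text")
	if text == "" {
		http.Error(w, "missing text parameter", http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	fmt.Fprintln(w, transliterate(text))
}

// newMux wires the handler to a hypothetical /transliterate route.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/transliterate", handleTransliterate)
	return mux
}

// serveOnce sends one GET request through the mux without opening a port
// and returns the status code and body.
func serveOnce(target string) (int, string) {
	rec := httptest.NewRecorder()
	newMux().ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	// Blocks and serves until the process is stopped.
	log.Fatal(http.ListenAndServe(":8080", newMux()))
}
```

Setting the charset explicitly in the Content-Type header matters here, since the response body is தமிழ் text encoded as UTF-8.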
Looking back, I authored the original version back in 2017 and I tweaked it a little for this blog. It was a nostalgic moment to look back at how I evolved from a curious Gopher 🐣 to where I am today. Time flies indeed 🦅