
Who I am

Anand kumar M
Working as a Research Associate at CEN, Amrita Viswa Vidyapeetham, Coimbatore.

Native: Portonovo/Parangipettai, Cuddalore District

Areas of Interest: Morphological Analyzer and Generator, Dependency Parsing, Statistical Machine Translation, Machine Learning, Support Vector Machines, Machine Learning for NLP.

Completed Projects

POS Tagger for Tamil
Morphological Analyzer for Tamil (Novel Method)
Morphological Analyzer for Malayalam
Morphological Analyzer for Telugu
Morphological Generator for Tamil (Novel Method)
Morphological Generator for Malayalam
Morphological Generator for Telugu
Statistical Machine Translation for English to Tamil (Currently Working)


International Journals

A Novel Data Driven Algorithm for Tamil Morphological Generator, International Journal of Computer Applications (IJCA) - Foundation of Computer Science, 6(12):52,56, 2010.

A Sequence Labeling Approach to Morphological Analyzer for Tamil Language, International Journal on Computer Science and Engineering (IJCSE), Vol. 02, No. 06, 2201-2208, 2010.

A Natural Language Processing Tools for Tamil Grammar Learning and Teaching, International Journal of Computer Applications (IJCA) - Foundation of Computer Science, October 2010.

“Tamil POS Tagging using Linear Programming”, International Journal of Recent Trends in Engineering, Vol. 1, No. 2, ISSN 1797-9617.

A Paradigm Based Morphological Analyzer for English to Kannada using a Machine Learning Approach, Research India Publication (RIP), October 2010.

International Conferences

“Tamil Part-of-Speech tagger based on SVMTool”, Proceedings of International Conference on Asian Language Processing 2008 (IALP 2008), Chiang Mai, Thailand.

“Morphological Analyzer for Agglutinative Languages Using Machine Learning Approaches”, Proceedings of International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom 2009), Kottayam, India.

“Chunker for Tamil”, Proceedings of International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom 2009), Kottayam, India.

“Postagger and Chunker for Tamil Language”, Proceedings of the 8th Tamil Internet Conference, Cologne, Germany.

“A Novel Approach for Tamil Morphological Analyzer”, Proceedings of the 8th Tamil Internet Conference 2009, Cologne, Germany.

“Chunker for Tamil using Machine Learning”, 7th International Conference on Natural Language Processing 2009 (ICON2009), IIIT Hyderabad, India.

“Morphological generator for Tamil a new data driven approach”, 9th Tamil Internet Conference, Chemmozhi Maanaadu, Coimbatore, India.

“Grammar Teaching Tools for Tamil”, Technology for Education Conference (T4E), IIT Bombay, India.

“A Novel Approach to Morphological Generator for Tamil”, 2nd International Conference on Data Engineering and Management (ICDEM 2010), Trichy, India.

“Morphological Analyzer for Malayalam Using Machine Learning”, 2nd International Conference on Data Engineering and Management (ICDEM 2010), Trichy, India.

“Morphological analyzer for Telugu using Support Vector Machine”, International Conference on Advances in Information and Communication Technologies (ICT 2010), Kochi, India.

“A Novel Algorithm for Tamil Morphological generator”, 8th International Conference on Natural Language Processing 2010 (ICON2010), IIT-Kharagpur, India.

Saturday, December 4, 2010

POS Tagger and Chunker for Tamil

Part-of-speech (POS) tagging and chunking are fundamental processing steps for any language processing task. POS tagging is the process of automatically annotating each word in a corpus with its syntactic category. Chunking is the task of identifying and segmenting text into syntactically correlated word groups. Both are done with machine learning techniques, where the linguistic knowledge is automatically extracted from an annotated corpus. We have developed our own tagset for annotating the corpus, which is used for training and testing the POS tagger generator and the chunker. The present tagset consists of thirty-two tags for POS and nine tags for chunking. A corpus of two hundred and twenty-five thousand words was used for training and testing the POS tagger and chunker. We found that an SVM-based machine learning tool gives the most encouraging results for the Tamil POS tagger (95.64%) and chunker (95.82%).
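To make the two tasks concrete, here is a minimal Python sketch of what a tagged and chunked sentence looks like and how a per-token accuracy figure can be computed. It is only an illustration under stated assumptions: the English sentence, the tag names (DET, NOUN, NP, VP and so on), the BIO-style chunk encoding, and the helper names `extract_chunks` and `tag_accuracy` are all hypothetical and are not the tagset, corpus, or tool described above.

```python
from typing import List, Optional, Tuple

# Hypothetical (word, POS tag, chunk tag) triples. The tags are illustrative
# placeholders, NOT the 32-tag POS set or 9-tag chunk set from the post.
Token = Tuple[str, str, str]

sentence: List[Token] = [
    ("the", "DET", "B-NP"),
    ("small", "ADJ", "I-NP"),
    ("boy", "NOUN", "I-NP"),
    ("reads", "VERB", "B-VP"),
    ("a", "DET", "B-NP"),
    ("book", "NOUN", "I-NP"),
]


def extract_chunks(tokens: List[Token]) -> List[Tuple[str, List[str]]]:
    """Group words into chunks, assuming BIO-style chunk tags (B-X, I-X, O)."""
    chunks: List[Tuple[str, List[str]]] = []
    label: Optional[str] = None
    words: List[str] = []
    for word, _pos, chunk_tag in tokens:
        starts_new = chunk_tag.startswith("B-") or (
            chunk_tag.startswith("I-") and label != chunk_tag[2:]
        )
        if starts_new:
            # Close the chunk in progress (if any) and open a new one.
            if words:
                chunks.append((label, words))
            label, words = chunk_tag[2:], [word]
        elif chunk_tag.startswith("I-"):
            # Continue the current chunk.
            words.append(word)
        else:
            # "O": the word is outside any chunk.
            if words:
                chunks.append((label, words))
            label, words = None, []
    if words:
        chunks.append((label, words))
    return chunks


def tag_accuracy(gold: List[str], predicted: List[str]) -> float:
    """Percentage of positions where the predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)


chunks = extract_chunks(sentence)

# Compare gold POS tags with a made-up prediction that gets one tag wrong.
gold_pos = [pos for _word, pos, _chunk in sentence]
predicted_pos = ["DET", "NOUN", "NOUN", "VERB", "DET", "NOUN"]
accuracy = tag_accuracy(gold_pos, predicted_pos)
```

Here `chunks` groups the six words into a noun phrase, a verb phrase, and a second noun phrase, and `accuracy` is the share of tokens whose predicted tag matches the gold tag. Figures such as those reported above are typically computed this way over a held-out portion of the annotated corpus, though the post does not describe the exact evaluation setup.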


  1. Sir, I am Sathish Kumar, doing B.E. in CSE at Anna University. I want the Tamil POS tagger and chunker tool for free download. Can you send the link to my mail?

  2. Sir, I am Mohamed Hashim Miver, a student of South Asian University, New Delhi. I am doing my thesis on POS tagging and Tamil text classification.
    I couldn't find the POS tagger. Therefore I would be thankful if you could send me the link to download the POS tagger for free. My email is
