
Part-of-speech (POS) tagging and chunking are fundamental processing steps for any language processing task. POS tagging is the process of automatically annotating each word in a corpus with its syntactic category. Chunking is the task of identifying and segmenting text into syntactically correlated groups of words. Both tasks are performed here with machine learning techniques, in which linguistic knowledge is automatically extracted from an annotated corpus. We developed our own tagset for annotating the corpus, and this corpus is used to train and test both the POS tagger generator and the chunker. The tagset currently consists of thirty-two POS tags and nine chunk tags. A corpus of 225,000 words was used to train the POS tagger and the chunker and to test their accuracy. We found that an SVM-based machine learning tool gives the most encouraging results for Tamil, with an accuracy of 95.64% for the POS tagger and 95.82% for the chunker.
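The abstract does not say which features or training procedure the SVM tool uses. As a minimal sketch under stated assumptions, SVM-based tagging can be framed as one-vs-rest classification: one linear SVM per tag, each trained by stochastic subgradient descent on the L2-regularized hinge loss, with a word receiving the tag whose classifier scores highest. The window features (current word, its last three characters as a crude stand-in for Tamil suffixes, previous word), the tag names, the hyperparameters, and the two romanized toy sentences are all hypothetical illustrations, not the authors' tagset, features, or corpus. The same scheme would apply to chunking if each word's label were a chunk tag instead of a POS tag.

```python
from collections import defaultdict

# Hypothetical toy data: two romanized Tamil sentences with illustrative tags.
# Neither the tag names nor the sentences come from the paper's tagset or corpus.
TOY_CORPUS = [
    [("avan", "PRON"), ("pallikku", "NOUN"), ("ponaan", "VERB")],
    [("nalla", "ADJ"), ("paiyan", "NOUN"), ("vandhaan", "VERB")],
]


def features(words, i):
    """Hypothetical window features for the word at position i."""
    word = words[i]
    prev = words[i - 1] if i > 0 else "<s>"
    # The last three characters are a crude stand-in for a suffix feature.
    return ["w=" + word, "suf=" + word[-3:], "prev=" + prev]


def train(corpus, epochs=100, lr=0.1, lam=0.001):
    """One-vs-rest linear SVM: SGD on the L2-regularized hinge loss."""
    tags = sorted({tag for sent in corpus for _, tag in sent})
    weights = {tag: defaultdict(float) for tag in tags}
    examples = []
    for sent in corpus:
        words = [word for word, _ in sent]
        for i, (_, gold) in enumerate(sent):
            examples.append((features(words, i), gold))

    for _ in range(epochs):
        for feats, gold in examples:
            for tag in tags:
                w = weights[tag]
                y = 1.0 if tag == gold else -1.0
                margin = y * sum(w[f] for f in feats)
                # Regularization step: shrink every weight slightly.
                for f in w:
                    w[f] *= 1.0 - lr * lam
                # Hinge-loss subgradient: update only when the margin is below 1.
                if margin < 1.0:
                    for f in feats:
                        w[f] += lr * y
    return weights


def tag_sentence(weights, words):
    """Assign each word the tag whose binary classifier scores highest."""
    result = []
    for i in range(len(words)):
        feats = features(words, i)
        best = max(weights, key=lambda t: sum(weights[t].get(f, 0.0) for f in feats))
        result.append(best)
    return result


if __name__ == "__main__":
    model = train(TOY_CORPUS)
    print(tag_sentence(model, ["avan", "pallikku", "ponaan"]))
```

On this toy data the model reproduces the training tags, and the shared `suf=aan` feature lets it guess VERB for an unseen word such as `sonnaan`. A real tagger would need far richer features and a large annotated corpus, and would normally rely on a dedicated SVM toolkit rather than a hand-written training loop.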