Week from 06/07 to 06/13
This week my mentor and I started writing our research paper, "Evaluating Language Identification Methods over Linked Data". I was assigned the task of evaluating the language detection tools LangTag(s), LangTag(C), langdetect, openNLP and Tika. LangTag(s) is the model we trained on the qald-7 training dataset, and LangTag(C) is the model we trained on all the qald datasets from qald-3 to qald-9.
How were the evaluations done?
All the language detection tools were evaluated on both the keywords and the full sentences of the qald test datasets. First, I created .csv files from the qald datasets that contain only the question, keywords and language columns. Then I ran each tool on these files. We evaluated the tools not only on the full qald datasets but also on the English, French and German questions separately, and recorded the accuracy of each tool in every setting.
Observations
We observed that for full sentences, our models perform better than the other three tools. But all the models p...
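The evaluation procedure described above can be sketched roughly as follows. This is only a minimal illustration, not our actual evaluation script: the column names `question`, `keywords` and `language` follow the .csv layout described in this post, but the three sample rows are made up, and `toy_detect` is a hypothetical stand-in for a real detector such as LangTag or langdetect. The scores it produces say nothing about the real tools.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the assumed layout: question, keywords, language.
SAMPLE_CSV = """question,keywords,language
"Who is the mayor of Berlin?","mayor, Berlin",en
"Wer ist der Bürgermeister von Berlin?","Bürgermeister, Berlin",de
"Qui est le maire de Berlin ?","maire, Berlin",fr
"""


def toy_detect(text):
    """Hypothetical stand-in detector based on a few function words."""
    words = set(text.lower().split())
    if words & {"der", "die", "das", "ist", "wer"}:
        return "de"
    if words & {"le", "la", "qui", "est", "quelle"}:
        return "fr"
    return "en"


def accuracy_by_language(rows, detect, text_column):
    """Return accuracy per gold language, plus an overall 'all' entry."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        gold = row["language"]
        predicted = detect(row[text_column])
        # Count each row once for its own language and once overall.
        for key in (gold, "all"):
            total[key] += 1
            if predicted == gold:
                correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Evaluate the same detector on full sentences and on keywords.
sentence_scores = accuracy_by_language(rows, toy_detect, "question")
keyword_scores = accuracy_by_language(rows, toy_detect, "keywords")

print("full sentences:", sentence_scores)
print("keywords:", keyword_scores)
```

To evaluate a real tool, one would pass a different `detect` function that wraps that tool and returns a language code in the same format as the `language` column.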