Week from 06/07 to 06/13

July 13, 2020

This week my mentor and I started writing our research paper "Evaluating Language Identification Methods over Linked Data". I was assigned the task of evaluating the language detection tools LangTag(s), LangTag(C), langdetect, openNLP and Tika.

LangTag(s) is the model we trained on the qald-7 training dataset, and LangTag(C) is the model we trained on all the qald datasets from qald-3 to qald-9.

How are the evaluations done?

All the language detection tools are evaluated on the keywords and the full sentences of the qald test datasets. First I created .csv files from the qald datasets that contain only the question, keywords and language columns. The evaluation is then run on these files.
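The .csv step could look roughly like the sketch below. It assumes the qald JSON layout in which each entry under "questions" holds a "question" list with "language", "string" and "keywords" fields; the function names and the sample data are made up for illustration and are not the script used for the paper.

```python
import csv
import io
import json

def qald_to_rows(qald):
    """Flatten a qald-style dict into one row per question variant.

    Assumed layout: {"questions": [{"question": [{"language", "string", "keywords"}]}]}
    """
    rows = []
    for item in qald.get("questions", []):
        for variant in item.get("question", []):
            rows.append({
                "question": variant.get("string", ""),
                "keywords": variant.get("keywords", ""),
                "language": variant.get("language", ""),
            })
    return rows

def write_rows(rows, handle):
    """Write the three columns used for the evaluation to an open file handle."""
    writer = csv.DictWriter(handle, fieldnames=["question", "keywords", "language"])
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical sample in the assumed layout (not taken from a real qald file).
sample = json.loads("""
{"questions": [{"id": "1", "question": [
  {"language": "en", "string": "Who is the mayor of Berlin?", "keywords": "mayor, Berlin"},
  {"language": "de", "string": "Wer ist der Bürgermeister von Berlin?", "keywords": "Bürgermeister, Berlin"}
]}]}
""")

buffer = io.StringIO()  # stands in for open("qald_test.csv", "w", newline="")
write_rows(qald_to_rows(sample), buffer)
print(buffer.getvalue())
```

Writing to a real file only needs the StringIO buffer swapped for a file opened with newline="".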

We evaluated the tools not only on the full qald datasets but also on the English, French and German questions separately.

We recorded the accuracy of the language detection tools.
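As a rough illustration of how the accuracy can be computed overall and per language, here is a minimal sketch. The detector is passed in as a plain function, so any of the tools could be wrapped that way (for example the detect function of langdetect). The toy_detect function and the three rows below are invented stand-ins so the example runs on its own; its numbers say nothing about the real results.

```python
from collections import defaultdict

def accuracy_by_language(rows, detect, column):
    """Return accuracy per gold language plus an "all" entry.

    rows:   dicts with "question", "keywords" and "language" keys
    detect: any callable mapping a text to a language code
    column: "question" for full sentences, "keywords" for keywords
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        gold = row["language"]
        total[gold] += 1
        total["all"] += 1
        try:
            predicted = detect(row[column])
        except Exception:
            predicted = None  # a failed detection counts as a miss
        if predicted == gold:
            correct[gold] += 1
            correct["all"] += 1
    return {lang: correct[lang] / total[lang] for lang in total}

def toy_detect(text):
    """Hypothetical stand-in detector based on a few function words."""
    tokens = set(text.lower().split())
    if tokens & {"wer", "ist", "der"}:
        return "de"
    if tokens & {"qui", "est", "le"}:
        return "fr"
    return "en"

# Invented rows in the question/keywords/language layout described above.
rows = [
    {"question": "Who is the mayor of Berlin?", "keywords": "mayor, Berlin", "language": "en"},
    {"question": "Wer ist der Bürgermeister von Berlin?", "keywords": "Bürgermeister, Berlin", "language": "de"},
    {"question": "Qui est le maire de Berlin ?", "keywords": "maire, Berlin", "language": "fr"},
]

sentence_acc = accuracy_by_language(rows, toy_detect, "question")
keyword_acc = accuracy_by_language(rows, toy_detect, "keywords")
print("sentences:", sentence_acc)
print("keywords:", keyword_acc)
```

Keeping the detector as a parameter means the same loop can be reused for every tool and for both the sentence and keyword columns.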

Observations

We observed that on full sentences our models perform better than the other three tools (langdetect, openNLP and Tika). However, all of the tools performed worse on keywords than on full sentences. (The results are in the research paper.)
