Week from 08/02 to 08/08

My mentor pointed out that the reason for our model's better performance may be that we are evaluating fewer languages.

| Approach   | #Languages |
|------------|------------|
| LangTag(S) | 10         |
| LangTag(C) | 12         |
| langdetect | 55         |
| Tika       | 18         |
| openNLP    | 103        |

Since our model supports only 12 languages while openNLP supports 103, this could be the reason for our model's better performance.

How were the other tools evaluated with a restricted set of languages?

My mentor suggested that we can obtain the desired result by taking the vector of probabilities each model uses to predict the language and restricting it to our supported languages. I was able to get the probability vectors of the openNLP and langdetect models, but this was not possible for Tika. I re-evaluated openNLP and langdetect on the QALD datasets, entity labels, and abstracts, and updated the results tables in the repo.
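The restriction step can be sketched as follows. This is a minimal illustration rather than the actual evaluation code: it assumes we already have a tool's probability vector as a mapping from language code to probability (the values and the `supported` set below are hypothetical), and it simply picks the most probable language among those our model supports.

```python
# Hypothetical probability vector returned by a language-detection tool
# (e.g. langdetect or openNLP), mapping language codes to probabilities.
prob_vector = {
    "en": 0.40, "de": 0.25, "nl": 0.20, "af": 0.10, "fr": 0.05,
}

# Illustrative subset standing in for the 12 languages our model supports.
supported = {"en", "de", "fr"}

def predict_restricted(probs, allowed):
    """Return the most probable language among the allowed set only."""
    restricted = {lang: p for lang, p in probs.items() if lang in allowed}
    if not restricted:
        return None  # the tool predicted none of the allowed languages
    return max(restricted, key=restricted.get)

print(predict_restricted(prob_vector, supported))  # -> en
```

Restricting the probability vector this way lets us compare all tools over the same label space, instead of penalizing our model for supporting fewer languages.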

Observations

We observed that both openNLP and langdetect performed much better when the languages were restricted.

