Week from 08/02 to 08/08
My mentor pointed out to me that the reason for our model's better performance may be that we are evaluating fewer languages.
Since our model supports only 12 languages while openNLP supports 103, that could explain our model's better performance.
How did you evaluate the other tools with the languages restricted?
My mentor suggested that we can obtain the desired result by extracting the vector of probabilities each model uses to predict the language, and then considering only the languages our model supports when picking the prediction. I was able to get the probability vectors of the openNLP and langdetect models, but for Tika it was not possible. I re-evaluated the openNLP and langdetect tools on the QALD datasets, entity-labels, and abstracts, and updated the results tables in the repo.
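The following is a minimal sketch of that restriction step, assuming the Python port of langdetect, whose detect_langs call returns the full probability vector. The SUPPORTED set and the detect_restricted helper are illustrative placeholders, not the project's actual code; the openNLP evaluation was done analogously through its own API.

```python
from langdetect import DetectorFactory, detect_langs

DetectorFactory.seed = 0  # make langdetect deterministic across runs

# Illustrative stand-in for the 12 languages our model supports.
SUPPORTED = {"en", "de", "fr", "es", "it", "pt", "nl", "ru",
             "ja", "zh-cn", "ar", "hi"}

def detect_restricted(text):
    """Return the most probable language among the supported ones only."""
    # detect_langs returns the whole probability vector, e.g. [en:0.86, de:0.14]
    candidates = [c for c in detect_langs(text) if c.lang in SUPPORTED]
    if not candidates:
        return None  # no supported language received any probability mass
    return max(candidates, key=lambda c: c.prob).lang
```

Restricting the vector this way means a tool is never penalised for predicting one of the languages our model cannot produce, which makes the comparison fair.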
Observations
It was observed that both openNLP and langdetect performed much better when the languages were restricted.