Week from 06/14 to 06/20
This week I re-evaluated the models' performance and measured their runtimes.
How are the evaluations done?
All the language detection tools are evaluated on keywords and on full sentences from the QALD test datasets. Runtimes are measured with Python's `time` library. The full results are available in the AKSW/LangTagger repository: https://github.com/AKSW/LangTagger.
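The timing loop can be sketched roughly as below. The function name `timed_detect` and the averaging over questions are my illustration of the setup, not the exact benchmark code, which lives in the repository; `perf_counter` is used here because it is the recommended clock for measuring short durations.

```python
import time

def timed_detect(detect_fn, questions):
    """Average runtime (in seconds) of a language-detection function.

    detect_fn is any callable that takes a question string and returns
    a language tag. This is an illustrative harness, not the actual
    LangTagger benchmark.
    """
    start = time.perf_counter()
    for question in questions:
        detect_fn(question)
    elapsed = time.perf_counter() - start
    return elapsed / len(questions)  # seconds per question
```

Running this once per tool and per QALD dataset, separately for keywords and full sentences, yields a table like the one below.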
Runtimes are in seconds; K = keywords, F = full sentences.

| QALD | 3 (K) | 3 (F) | 4 (K) | 4 (F) | 5 (K) | 5 (F) | 6 (K) | 6 (F) | 7 (K) | 7 (F) | 8 (K) | 8 (F) | 9 (K) | 9 (F) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LangTag(S) | 0.0003 | 0.0003 | 0.0006 | 0.0003 | 0.0006 | 0.0003 | 0.0002 | 0.0002 | 0.0019 | 0.0004 | 0.0041 | 0.0014 | 0.0001 | 0.0002 |
| LangTag(C) | 0.0018 | 0.0012 | 0.0026 | 0.0021 | 0.0029 | 0.0022 | 0.0017 | 0.0011 | 0.0036 | 0.0031 | 0.0131 | 0.0120 | 0.0017 | 0.0012 |
| langdetect | 0.0087 | 0.0063 | 0.0079 | 0.0042 | 0.0072 | 0.0057 | 0.0078 | 0.0054 | 0.0082 | 0.0041 | 0.0092 | 0.0021 | 0.0075 | 0.0116 |
| Tika | 1.5677 | 1.4068 | 1.4021 | 1.4009 | 1.6072 | 1.3928 | 1.5981 | 1.3978 | 1.4379 | 1.3955 | 1.4213 | 1.3778 | 1.9081 | 1.4836 |
| openNLP | 0.0027 | 0.0011 | 0.0036 | 0.0039 | 0.0035 | 0.0030 | 0.0023 | 0.0011 | 0.0058 | 0.0062 | 0.0032 | 0.0026 | 0.0012 | 0.0014 |
The table above shows that the LangTag(S) and LangTag(C) models are faster than the other language detection tools. Since the LangTag models use a simple Naive Bayes classifier, which is a fast probabilistic model, they achieve much lower runtimes.
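To make the Naive Bayes point concrete, here is a minimal sketch of a multinomial Naive Bayes language detector over character bigrams. This is my illustration of the general technique, not the actual LangTag implementation; the class name, the bigram features, and the toy training data are all assumptions.

```python
import math
from collections import Counter, defaultdict

def char_ngrams(text, n=2):
    """Character n-grams, the usual features for language identification."""
    text = f" {text.lower()} "
    return [text[i:i + n] for i in range(len(text) - n + 1)]

class NaiveBayesLangDetector:
    """Minimal multinomial Naive Bayes over character bigrams.

    A sketch of the approach, not the actual LangTag model.
    """

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)  # per-language bigram counts
        self.priors = Counter(labels)       # per-language document counts
        for text, lang in zip(texts, labels):
            self.counts[lang].update(char_ngrams(text))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, text):
        best, best_score = None, -math.inf
        total_docs = sum(self.priors.values())
        for lang, counts in self.counts.items():
            total = sum(counts.values())
            # Log prior plus Laplace-smoothed log likelihood per bigram.
            score = math.log(self.priors[lang] / total_docs)
            for g in char_ngrams(text):
                score += math.log((counts[g] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = lang, score
        return best
```

Because prediction is only a pass over the input's n-grams with a few hundred multiplications per language, this kind of model is orders of magnitude cheaper than heavier detectors such as Tika, which matches the runtime gap in the table.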