Week from 06/14 to 06/20

This week I re-evaluated the models' performance and measured their runtimes.

How are the evaluations done?

All the language detection tools are evaluated on the keywords and the full sentences of the QALD test datasets. Runtimes are measured with the Python 'time' library. The full results are available in this repository: https://github.com/AKSW/LangTagger .
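As a rough sketch of how such runtimes can be collected (the actual benchmarking code lives in the repository linked above; the sample questions and the use of langdetect here are only for illustration):

```python
import time
from langdetect import detect  # one of the tools compared below

# Illustrative questions; the real evaluation iterates over the QALD test sets.
questions = ["Who is the mayor of Berlin?",
             "Wer ist der Bürgermeister von Berlin?"]

start = time.time()
for q in questions:
    detect(q)  # returns an ISO 639-1 code such as 'en' or 'de'
elapsed = time.time() - start

print(f"total: {elapsed:.4f} s, average per question: {elapsed / len(questions):.4f} s")
```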

Runtimes in seconds on the QALD test datasets (K = keyword questions, F = full-sentence questions):

| Model | QALD-3 K | QALD-3 F | QALD-4 K | QALD-4 F | QALD-5 K | QALD-5 F | QALD-6 K | QALD-6 F | QALD-7 K | QALD-7 F | QALD-8 K | QALD-8 F | QALD-9 K | QALD-9 F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LangTag(S) | 0.0003 | 0.0003 | 0.0006 | 0.0003 | 0.0006 | 0.0003 | 0.0002 | 0.0002 | 0.0019 | 0.0004 | 0.0041 | 0.0014 | 0.0001 | 0.0002 |
| LangTag(C) | 0.0018 | 0.0012 | 0.0026 | 0.0021 | 0.0029 | 0.0022 | 0.0017 | 0.0011 | 0.0036 | 0.0031 | 0.0131 | 0.0120 | 0.0017 | 0.0012 |
| langdetect | 0.0087 | 0.0063 | 0.0079 | 0.0042 | 0.0072 | 0.0057 | 0.0078 | 0.0054 | 0.0082 | 0.0041 | 0.0092 | 0.0021 | 0.0075 | 0.0116 |
| Tika | 1.5677 | 1.4068 | 1.4021 | 1.4009 | 1.6072 | 1.3928 | 1.5981 | 1.3978 | 1.4379 | 1.3955 | 1.4213 | 1.3778 | 1.9081 | 1.4836 |
| openNLP | 0.0027 | 0.0011 | 0.0036 | 0.0039 | 0.0035 | 0.0030 | 0.0023 | 0.0011 | 0.0058 | 0.0062 | 0.0032 | 0.0026 | 0.0012 | 0.0014 |


From the table above we can observe that the LangTag(S) and LangTag(C) models are faster than the other language detection tools. Since the LangTag models use a simple Naive Bayes classifier, a lightweight probabilistic model, they can make their predictions very quickly.
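To illustrate why a Naive Bayes classifier is cheap at prediction time, here is a minimal sketch of a character n-gram Naive Bayes language classifier built with scikit-learn; the toy training data and the pipeline settings are my own assumptions for this example, not the actual LangTag implementation:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data (illustrative only; LangTag is trained on much larger corpora).
texts = ["who is the mayor of berlin",
         "what is the capital of germany",
         "wer ist der bürgermeister von berlin",
         "was ist die hauptstadt von deutschland"]
labels = ["en", "en", "de", "de"]

# Character n-grams + multinomial Naive Bayes: prediction reduces to a few
# sparse dot products over n-gram counts, which is why it is so fast.
model = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    MultinomialNB(),
)
model.fit(texts, labels)

print(model.predict(["wie hoch ist der eiffelturm"]))  # most likely ['de'] with this toy data
```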

