Week from 06/14 to 06/20
This week I re-evaluated the models' performance and measured their runtimes.
How are the evaluations done?
All the language detection tools are evaluated on keywords and on full sentences from the QALD test datasets. Runtimes are measured with Python's `time` library. The full results are available in the AKSW/LangTagger repository: https://github.com/AKSW/LangTagger.
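The timing loop can be sketched roughly as below. The function name `timed_detect` and the averaging over questions are my illustration of the setup, not the exact benchmark code, which lives in the repository; `perf_counter` is used here because it is the recommended clock for measuring short durations.

```python
import time

def timed_detect(detect_fn, questions):
    """Average runtime (in seconds) of a language-detection function.

    detect_fn is any callable that takes a question string and returns
    a language tag. This is an illustrative harness, not the actual
    LangTagger benchmark.
    """
    start = time.perf_counter()
    for question in questions:
        detect_fn(question)
    elapsed = time.perf_counter() - start
    return elapsed / len(questions)  # seconds per question
```

Running this once per tool and per QALD dataset, separately for keywords and full sentences, yields a table like the one below.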
Runtimes are in seconds; K = keywords, F = full sentences.

| QALD | 3 (K) | 3 (F) | 4 (K) | 4 (F) | 5 (K) | 5 (F) | 6 (K) | 6 (F) | 7 (K) | 7 (F) | 8 (K) | 8 (F) | 9 (K) | 9 (F) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LangTag(S) | 0.0003 | 0.0003 | 0.0006 | 0.0003 | 0.0006 | 0.0003 | 0.0002 | 0.0002 | 0.0019 | 0.0004 | 0.0041 | 0.0014 | 0.0001 | 0.0002 |
| LangTag(C) | 0.0018 | 0.0012 | 0.0026 | 0.0021 | 0.0029 | 0.0022 | 0.0017 | 0.0011 | 0.0036 | 0.0031 | 0.0131 | 0.0120 | 0.0017 | 0.0012 |
| langdetect | 0.0087 | 0.0063 | 0.0079 | 0.0042 | 0.0072 | 0.0057 | 0.0078 | 0.0054 | 0.0082 | 0.0041 | 0.0092 | 0.0021 | 0.0075 | 0.0116 |
| Tika | 1.5677 | 1.4068 | 1.4021 | 1.4009 | 1.6072 | 1.3928 | 1.5981 | 1.3978 | 1.4379 | 1.3955 | 1.4213 | 1.3778 | 1.9081 | 1.4836 |
| openNLP | 0.0027 | 0.0011 | 0.0036 | 0.0039 | 0.0035 | 0.0030 | 0.0023 | 0.0011 | 0.0058 | 0.0062 | 0.0032 | 0.0026 | 0.0012 | 0.0014 |
The table above shows that the LangTag(S) and LangTag(C) models are faster than the other language detection tools. Since the LangTag models use a simple Naive Bayes classifier, which is a fast probabilistic model, they achieve much lower runtimes.
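To make the Naive Bayes point concrete, here is a minimal sketch of a multinomial Naive Bayes language detector over character bigrams. This is my illustration of the general technique, not the actual LangTag implementation; the class name, the bigram features, and the toy training data are all assumptions.

```python
import math
from collections import Counter, defaultdict

def char_ngrams(text, n=2):
    """Character n-grams, the usual features for language identification."""
    text = f" {text.lower()} "
    return [text[i:i + n] for i in range(len(text) - n + 1)]

class NaiveBayesLangDetector:
    """Minimal multinomial Naive Bayes over character bigrams.

    A sketch of the approach, not the actual LangTag model.
    """

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)  # per-language bigram counts
        self.priors = Counter(labels)       # per-language document counts
        for text, lang in zip(texts, labels):
            self.counts[lang].update(char_ngrams(text))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, text):
        best, best_score = None, -math.inf
        total_docs = sum(self.priors.values())
        for lang, counts in self.counts.items():
            total = sum(counts.values())
            # Log prior plus Laplace-smoothed log likelihood per bigram.
            score = math.log(self.priors[lang] / total_docs)
            for g in char_ngrams(text):
                score += math.log((counts[g] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = lang, score
        return best
```

Because prediction is only a pass over the input's n-grams with a few hundred multiplications per language, this kind of model is orders of magnitude cheaper than heavier detectors such as Tika, which matches the runtime gap in the table.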