Week from 07/19 to 07/25

August 28, 2020

This week we started to focus on the machine translation task. First, I tried to understand how the "TensorFlow Neural Machine Translation" model is implemented, and then I trained the model on the QALD datasets.

How are the datasets created?

From the QALD datasets (QALD-3 through QALD-7), I created a dataset of language pairs such as English-Spanish and English-German. Pairs were created between English and each of the following languages: German, Spanish, French, Brazilian Portuguese, Portuguese, Italian, Dutch, Hindi, Romanian, Persian, and Russian.
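A minimal sketch of how such pairs could be extracted is shown here. It assumes the JSON layout used by later QALD releases, where each entry carries a "question" list of {"language", "string"} objects; earlier releases may need conversion first. The function name build_language_pairs and the sample data are made up for illustration and are not the code used in the project.

```python
from collections import defaultdict


def build_language_pairs(qald, source_lang="en"):
    """Group (source, target) question pairs by target language code.

    Assumes `qald` is a parsed QALD-style JSON document:
    {"questions": [{"question": [{"language": ..., "string": ...}, ...]}, ...]}
    """
    pairs = defaultdict(list)
    for entry in qald.get("questions", []):
        # Map language code -> question text for this entry,
        # skipping translations with a missing code or empty text.
        by_lang = {
            q["language"]: q["string"].strip()
            for q in entry.get("question", [])
            if q.get("language") and q.get("string")
        }
        source = by_lang.get(source_lang)
        if not source:
            continue  # no source-language text to pair against
        for lang, text in by_lang.items():
            if lang != source_lang:
                pairs[lang].append((source, text))
    return dict(pairs)


# Hypothetical sample in the assumed layout.
sample = {
    "questions": [
        {"question": [
            {"language": "en", "string": "Who wrote Dracula?"},
            {"language": "de", "string": "Wer schrieb Dracula?"},
            {"language": "es", "string": ""},  # empty translation, skipped
        ]},
        {"question": [
            {"language": "de", "string": "Wie hoch ist der Eiffelturm?"},
        ]},  # no English text, skipped entirely
    ]
}

pairs = build_language_pairs(sample)
```

Each key of the returned dictionary is a target language code, so one training file per language pair can be written from it.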

How is the evaluation done?

I trained the TensorFlow "Neural machine translation with attention" model on each of the datasets created as described above and got the following results:

language	accuracy %
spanish	60.6299
german	65.8595
french	63.3587
russian	14.6666
italian	31.6301
portuguese	3.33333
pt_BR	4.54545
hindi	37.3333
dutch	61.9422
persian	8.14479
romanian	52.3026
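
The post does not say how the accuracy figure is computed. One plausible reading is sentence-level exact match: the share of test sentences whose predicted translation equals the reference after light normalization. The sketch below illustrates that reading only; the function names are hypothetical and the metric actually used may differ.

```python
def normalize(sentence):
    """Lowercase and collapse whitespace so trivial differences do not count as errors."""
    return " ".join(sentence.lower().split())


def exact_match_accuracy(predictions, references):
    """Return the percentage of predictions that match their reference exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return 100.0 * matches / len(references)


# Hypothetical model outputs and references.
predicted = ["wer schrieb dracula ?", "wie hoch ist der turm ?", "Wo liegt  Berlin ?"]
expected = ["Wer schrieb Dracula ?", "wie hoch ist der eiffelturm ?", "wo liegt berlin ?"]
accuracy = exact_match_accuracy(predicted, expected)
```

Exact match is a strict metric for translation, since a valid paraphrase counts as an error, which may partly explain low scores on small test sets.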

Observations

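The results are poor overall. Even the best language, German, reaches only 65.86%, and Russian, Persian, Brazilian Portuguese, and Portuguese all fall below 15%. Likely reasons include the small dataset size and the small vocabulary.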
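
To check the dataset-size and vocabulary explanation, a quick count per language could look like the sketch below. It assumes whitespace tokenization and a list of sentences per language; corpus_stats is a hypothetical helper, not part of the training code.

```python
from collections import Counter


def corpus_stats(sentences):
    """Count sentences, tokens, distinct tokens, and tokens seen only once."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    return {
        "sentences": len(sentences),
        "tokens": sum(counts.values()),
        "vocabulary": len(counts),
        # Words seen once give the model almost nothing to learn from.
        "singletons": sum(1 for c in counts.values() if c == 1),
    }


# Hypothetical two-sentence corpus.
stats = corpus_stats([
    "who is the mayor of berlin ?",
    "who wrote the book ?",
])
```

A high share of singletons relative to the vocabulary would support the idea that the model sees most words too rarely to translate them.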
