Automated methods for Question-Answering in Icelandic

Abstract

Question Answering (QA) is the automated task of providing an answer to a question posed in human language. Whether through search engines or speech-controlled home assistants, it has become a tightly integrated part of many people's daily routines at work and at home. In recent years, QA methods have improved greatly for widely spoken languages such as English. This improvement can be attributed almost wholly to advances in sequence modeling using deep neural networks, an increase in computing power, and the creation of large data sets suitable for training. In this thesis, such QA methods are described, implemented, and evaluated for Icelandic. Three methods are applied: a statistical approach based on term frequency, the current standard-practice approach using a neural language model for Icelandic, and a modern variant using pre-encoded phrase lookup. A new QA corpus and Icelandic language models are also presented. The result is a baseline for extractive QA in Icelandic, where an answer is highlighted in a single document or in a larger corpus. Finally, a cross-lingual extension of the phrase lookup method is investigated and adapted for Icelandic QA. In this system, questions can be asked in Icelandic and are answered with segments from the English Wikipedia. The system is then adapted to answer Icelandic questions in Icelandic using segments from the Icelandic Wikipedia, taking advantage of a bilingual language model.

Publication
MSc thesis, University of Iceland