[en] Mining a parallel corpus for automatic generation of Estonian grammar exercises


Document type

Conference papers

Author(s)

Chalvin, Antoine

Eensoo, Egle

Stuck, François

Title of the work

Electronic lexicography in the 21st century: thinking outside the paper. Proceedings of the eLex 2013 conference.

Instance

INALCO

Is part of

Études Nordiques

Meeting

Third biennial conference on electronic lexicography (eLex 2013) Electronic lexicography in the 21st century: thinking outside the paper - 2013-10-17 / 2013-10-19 - Tallinn - Estonia

Keywords (en)

Computer Science [cs]/Technology for Human Learning

Humanities and Social Sciences/Education

Humanities and Social Sciences/Library and information sciences

Humanities and Social Sciences/Linguistics

parallel corpora

readability

e-learning

Estonian as a foreign language

grammar exercises

Keywords (fr)

Informatique [cs]/Environnements Informatiques pour l'Apprentissage Humain

Sciences de l'Homme et Société/Education

Sciences de l'Homme et Société/Linguistique

Sciences de l'Homme et Société/Sciences de l'information et de la communication

Publication date

2013-10-17

Document language

English

Abstract

[en] The aim of our research is to develop a system that generates Estonian grammar exercises for French-speaking learners, based on a large lemmatised parallel corpus (http://corpus.estfra.ee) and on the data of the Comprehensive French–Estonian Dictionary (http://www.estfra.ee). We concentrate on exercises on nominal and verbal morphology. Although the corpus is not syntactically tagged, we also explore the possibilities of generating some types of syntax exercises. The system generates, on demand, exercises consisting of a specified number of Estonian sentences in which the relevant word forms are replaced by their lemmas. The learner has to construct the right form and can then check his or her answers. Sentences are accompanied by their French translation. In this article, we concentrate on the problems related to the definition and tuning of sentence selection criteria. Exercises can be generated at three levels of difficulty. Relevant sentences are selected from the corpus according to their length and the "frequency" of the lemmas they contain, i.e. the presence of those lemmas in one of the four subsets of headwords specified in the dictionary data: basic vocabulary (4,000 words), small dictionary (10,000 words), lower-medium dictionary (15,000 words), and upper-medium dictionary (40,000 words).
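
The selection and gap-filling steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Token` and `SentencePair` types, the `is_eligible` and `make_gap_item` functions, the length threshold, and the tiny lemma set are all hypothetical, and the abstract does not say how the three difficulty levels map onto the four headword subsets.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Token:
    form: str                # word form as it appears in the Estonian sentence
    lemma: str               # lemma supplied by the lemmatised corpus
    is_target: bool = False  # True if the learner must reconstruct this form


@dataclass
class SentencePair:
    tokens: List[Token]      # Estonian sentence, tokenised and lemmatised
    french: str              # aligned French translation


def is_eligible(pair: SentencePair, allowed_lemmas: Set[str], max_tokens: int) -> bool:
    """Keep a sentence only if it is short enough and every lemma belongs to
    the headword subset chosen for the level (both criteria are assumptions
    about how length and lemma "frequency" might be combined)."""
    if len(pair.tokens) > max_tokens:
        return False
    return all(t.lemma in allowed_lemmas for t in pair.tokens)


def make_gap_item(pair: SentencePair) -> Tuple[str, List[str], str]:
    """Replace each target word form by its lemma in parentheses and return
    the prompt, the expected answers, and the French translation."""
    prompt = " ".join(f"({t.lemma})" if t.is_target else t.form for t in pair.tokens)
    answers = [t.form for t in pair.tokens if t.is_target]
    return prompt, answers, pair.french


# Hypothetical stand-in for a headword subset such as the basic vocabulary.
basic_vocabulary = {"mina", "elama", "suur", "linn"}

example = SentencePair(
    tokens=[
        Token("Ma", "mina"),
        Token("elan", "elama"),
        Token("suures", "suur"),
        Token("linnas", "linn", is_target=True),
    ],
    french="J'habite dans une grande ville.",
)

if is_eligible(example, basic_vocabulary, max_tokens=6):
    prompt, answers, french = make_gap_item(example)
    print(prompt)   # Ma elan suures (linn)
    print(answers)  # ['linnas']
    print(french)
```

Passing the lemma set and the length limit as parameters keeps the sketch neutral about tuning, which the abstract identifies as the main open problem.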

Provenance

Electronic lexicography in the 21st century: thinking outside the paper. Proceedings of the eLex 2013 conference.


Collection

HAL

Source

HAL

Resource type

Full text

Is a version of

https://inalco.hal.science/hal-01295040v1

Licence

Distributed under a Creative Commons Attribution 4.0 International License

Bibliographic citation

Antoine Chalvin, Egle Eensoo, François Stuck. Mining a parallel corpus for automatic generation of Estonian grammar exercises. Third biennial conference on electronic lexicography (eLex 2013) "Electronic lexicography in the 21st century: thinking outside the paper", Eesti Keele Instituut (Tallinn, Estonia), Oct 2013, Tallinn, Estonia. pp.280-295. [hal-01295040]

Cite this resource

[en] Mining a parallel corpus for automatic generation of Estonian grammar exercises, in Études nordiques, accessed 19 April 2025, https://etudes-nordiques.cnrs.fr/s/numenord/item/17367