Master's Thesis in Computer Science: Direkt Profil
18th January 2007
Supervisors: Jonas Granfeldt and Pierre Nugues
Abstract
Direkt Profil (DP) is a system for grammatical profiling. It detects, annotates and displays grammatical constructs, both correct and incorrect, in freely written texts by Swedish-speaking learners of French. It can also determine the learner's developmental stage, given a text with enough identifying attributes.
The scope of my work is the final step, the classification of the text as being typical for a certain stage, for which machine learning (ML) methods, more specifically C4.5, LMT (Logistic Model Tree) and SVM (Support Vector Machine), have been applied.
This thesis is part of a longer-term research project, led by Jonas Granfeldt and Suzanne Schlyter at the Centre for Languages and Literature at Lund University. The research project aims at increasing our knowledge regarding how Swedish-speaking learners acquire proficiency in written French.
During a three-year period, commencing in 2005, it is being financed by the Swedish Science Council.
In my experiments with an early version (1.5.2) of the profiling system, precision and recall values of a ternary classifier (basic/intermediate/native), based on support vector machines, have reached 70–83 %. The system has also been tested with C4.5- and logistic model tree-based classifiers, yielding similar (LMT) or slightly inferior (C4.5) results.
Direkt Profil 2.0 gives similar performance even for a quintary classifier, and ternary classifier precision and recall are somewhat improved as well (up to 88 %). The Naive Bayes method yields a small further increase in overall precision and recall, and is much faster than SMO (SVM) on the experiment corpus.
This project paves the way for further experiments with parameter selection and classifier performance.
Contents
1 Introduction
2 Learning a Foreign Language
2.1 Introduction