Posts: 86 | Thanked: 362 times | Joined on Dec 2007 @ Paris / France
#158
There are a lot of questions about adding new languages to OKBoard, so here are some details (sorry for the length, no TL;DR).


The main focus of this keyboard is to allow fast typing (more words per minute), so it must be tolerant of inaccurate strokes. Try to type as fast as you can and you will understand :-)
As a consequence, when you input a word, there are a lot of possible candidates.

One strategy is to simply pick the "best match" for the stroke, but this usually does not work: the faster you type, the more likely your stroke is to look like a different word, and the guessing rate drops.

The solution chosen is to implement a language model that can figure out which candidates are likely in a given context (and filter out the ones that cannot occur in this context).

Here is an example: I have just tried to type: "There is no place like home"
When I swipe "home", the context (surrounding words) is "There is no place like" and the candidates are home, hone, gimme, hinge, gone. The keyboard will clearly prefer "home" to other candidates because they do not make sense (or are far less likely in an English sentence).
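
To make the idea concrete, here is a minimal sketch of context-based ranking. It is not OKBoard's actual model: it assumes a toy bigram model that only looks at the previous word, with add-one smoothing, and the corpus and the names train_bigrams and rank_candidates are made up for illustration.

```python
from collections import Counter

def train_bigrams(text):
    """Count single words and adjacent word pairs in a whitespace-separated text."""
    words = text.lower().split()
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    return unigrams, bigrams

def rank_candidates(prev_word, candidates, unigrams, bigrams):
    """Sort candidates by the smoothed probability of following prev_word."""
    vocab_size = len(unigrams)

    def score(word):
        # Add-one smoothing: unseen pairs get a small non-zero probability.
        return (bigrams[(prev_word, word)] + 1) / (unigrams[prev_word] + vocab_size)

    return sorted(candidates, key=score, reverse=True)

# Tiny hypothetical corpus; a real one needs millions of words.
TOY_CORPUS = (
    "there is no place like home . "
    "i feel like home is far away . "
    "she has gone home . "
    "hone your skills ."
)

unigrams, bigrams = train_bigrams(TOY_CORPUS)
ranked = rank_candidates(
    "like", ["home", "hone", "gimme", "hinge", "gone"], unigrams, bigrams
)
print(ranked[0])
```

A real keyboard would also have to combine this score with how well the stroke matches each word, and would look at more than one word of context.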


To train the model you need to feed it with a huge volume of text. The text should be representative of the kind of text you will type.
For example, if you use a Wikipedia corpus, the keyboard will be very uncooperative when you try to type informal text that would look unnatural in a Wikipedia article.

Building language files is not just a matter of pouring random text into the build tool, or you will end up with a high error rate.
I recommend using a lot of text (my French corpus is over 40 million words, and in some cases this is not enough), and using different kinds of documents: articles (news / Wikipedia), e-mail, IRC and chat logs ...
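
If you want a rough check of whether your corpus is representative, one option is to count how many words of a sample of your own typing never appear in it. This is just a sketch under my own assumptions: it is not part of the OKBoard build tool, and tokenize, corpus_stats and unseen_rate are hypothetical helper names.

```python
import re
from collections import Counter

# Runs of letters, optionally with one inner apostrophe (e.g. "don't").
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")

def tokenize(text):
    """Split text into lowercase words, ignoring digits and punctuation."""
    return [w.lower() for w in WORD_RE.findall(text)]

def corpus_stats(sources):
    """sources maps a source name to its text; returns (total words, word counts)."""
    vocab = Counter()
    for text in sources.values():
        vocab.update(tokenize(text))
    return sum(vocab.values()), vocab

def unseen_rate(sample_text, vocab):
    """Fraction of words in sample_text that never occur in the corpus."""
    words = tokenize(sample_text)
    if not words:
        return 0.0
    unseen = sum(1 for w in words if w not in vocab)
    return unseen / len(words)

# Made-up miniature sources, only to show the effect of mixing document types.
wiki_only = {"wikipedia": "The keyboard is an input device used to enter text."}
mixed = dict(wiki_only, chat="lol gimme a sec, brb")

sample = "gimme the keyboard lol"
_, wiki_vocab = corpus_stats(wiki_only)
_, mixed_vocab = corpus_stats(mixed)
print(unseen_rate(sample, wiki_vocab), unseen_rate(sample, mixed_vocab))
```

A high unseen rate on your own messages suggests the corpus is missing the kind of text you actually type, whatever its total size.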

At the moment the keyboard still has a lot of issues, so accuracy will remain low even with a good language model, but hopefully it will improve soon :-)

I have added lots of information to the README file (how to package and distribute files ...). Any patches or suggestions for improvement are welcome.
 

The Following 20 Users Say Thank You to eber42 For This Useful Post: