Re: [Announcement]Open source text prediction input plugin
Quote:
I am pretty sure we'd get a Tay.ai type prediction engine out of that corpus! :p |
Re: [Announcement]Open source text prediction input plugin
With such a huge file, we may have to split it into smaller parts. Otherwise RAM will probably become an issue.
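A rough sketch of what such a split could look like, assuming the corpus is a plain UTF-8 text file with one sentence or record per line. The function name `split_corpus`, the part file naming, and the default part size are made up for illustration and are not part of the plugin or its tools. The file is read line by line, so only one line is held in memory at a time.

```python
from pathlib import Path


def split_corpus(source, out_dir, lines_per_part=1_000_000):
    """Split a large text file into numbered parts of at most lines_per_part lines.

    Hypothetical helper: the name, defaults and file naming are assumptions.
    Returns the list of part paths in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    out = None
    with open(source, encoding="utf-8") as src:
        # Iterating over the file object streams it line by line.
        for i, line in enumerate(src):
            if i % lines_per_part == 0:
                # Start a new part file every lines_per_part lines.
                if out is not None:
                    out.close()
                part_path = out_dir / f"part_{len(parts):04d}.txt"
                parts.append(part_path)
                out = open(part_path, "w", encoding="utf-8")
            out.write(line)
    if out is not None:
        out.close()
    return parts
```

Each part could then be processed separately and the resulting counts merged afterwards, provided the tool that builds the database supports that.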
|
Re: [Announcement]Open source text prediction input plugin
Quote:
I might need to adjust the dictionary size a bit, but as a non-native speaker I await your opinions before doing something more for Finnish. I will try to find some time to continue to work on the hyphenation problems that are really annoying in Swedish at least. |
Re: [Announcement]Open source text prediction input plugin
I had time to test it this morning and it seems to work pretty well after quick testing :). I can confirm that there is a hyphenation problem with some words. However, it is not a big problem in normal use, since the issue seems to be linked to compound words. Here are a few examples:
English: Finnish: my input: text-prediction
I compared the text prediction with an Android phone, and both worked quite similarly with the most common words. Sometimes the most obvious conjugation is among the last words in the list, but I believe that will improve with use (in Sailfish). Also, the prediction knows every bad word in Finnish and some name-calling slang words. I believe that is not a surprise, since the corpus came from a forum. EDIT: And I almost forgot: huge thanks to you, tusen tack! |
Re: [Announcement]Open source text prediction input plugin
Profanity is an issue and it would be great to get rid of it. I had the same problem when composing the database for English; a large fraction of the time was spent on that. I would suggest filtering the database and removing all n-grams that include any of the words classified as "bad". For that, we need a list of the words (possibly as substrings). That would have to be provided by native speakers though. Maybe such a list has already been compiled somewhere...
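To make the filtering idea concrete, here is a minimal sketch. It assumes the n-grams are available as (text, count) pairs with space-separated words, which may not match the actual database format. All names here (`load_bad_words`, `contains_bad_word`, `filter_ngrams`) are hypothetical. The `match_substrings` switch covers the "possibly as substrings" case, which helps with inflected forms but will also drop innocent words that happen to contain a listed string.

```python
def load_bad_words(lines):
    """Build a lowercase set of bad words from an iterable of lines, skipping blanks."""
    return {line.strip().lower() for line in lines if line.strip()}


def contains_bad_word(ngram, bad_words, match_substrings=False):
    """Return True if any word of the n-gram is, or optionally contains, a bad word."""
    tokens = ngram.lower().split()
    if match_substrings:
        # Substring mode: catches inflected and compound forms, but may over-filter.
        return any(bad in token for token in tokens for bad in bad_words)
    # Exact mode: a token must equal a listed word.
    return any(token in bad_words for token in tokens)


def filter_ngrams(ngrams, bad_words, match_substrings=False):
    """Yield only the (ngram, count) pairs that contain no bad word.

    Assumption: ngrams is an iterable of (text, count) pairs; the real
    database layout may differ.
    """
    for ngram, count in ngrams:
        if not contains_bad_word(ngram, bad_words, match_substrings):
            yield ngram, count
```

For a heavily inflected language like Finnish, substring matching on word stems is probably the more useful mode, but the word list would need to be checked by native speakers for false positives.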
|
Re: [Announcement]Open source text prediction input plugin
Quote:
dozens of conjugation forms. Here are a few examples: Word: run = juosta
|