Advanced text entry on Sailfish (Swype or similar) - Page 28 - maemo.org



spidernik84	2016-02-05 , 09:41
Posts: 27 | Thanked: 35 times | Joined on Jan 2016 @ Sweden	#271

@ferlanero: what os did you use to compile the files?


ferlanero	2016-02-05 , 10:23
Posts: 105 | Thanked: 205 times | Joined on Dec 2015 @ Spain	#272

Originally Posted by spidernik84

@ferlanero: what os did you use to compile the files?

Arch Linux, but I still can't write words containing the letter "ñ", such as España.


Feathers McGraw	2016-02-06 , 23:10
Posts: 654 | Thanked: 2,368 times | Joined on Jul 2014 @ UK	#273

Originally Posted by ferlanero

ArchLinux


eber42	2016-02-07 , 18:22
Posts: 86 | Thanked: 362 times | Joined on Dec 2007 @ Paris / France	#274

Originally Posted by ferlanero

I can't write words with letter "ñ" like España

Make sure you use the real "N" key and not "Ñ" (this is a known limitation that has been discussed earlier, and I have some ideas on how to fix it).

If it still does not work, send me your language files and logs (you have to enable logs in the application; they can be found in ~/.local/share/okboard). And do not expect a quick answer (I have not even started working on the weeks-old transparency issue).

BTW, last week I worked on the engine that processes swipes. It is not much more accurate now, but it no longer assigns bad scores to expected results. Next week I will start re-implementing the prediction engine (the part that handles the language model and learning; at the moment it is completely broken), and after that I will go back to regular bug fixing.


The Following 14 Users Say Thank You to eber42 For This Useful Post:
Feathers McGraw, Hariainm, jonquark, Jordi, juiceme, LameDuck, mautz, Mikkosssss, nodevel, rob_kouw, Saturn, spidernik84, velox, Watchmaker

ferlanero	2016-02-19 , 10:13
Posts: 105 | Thanked: 205 times | Joined on Dec 2015 @ Spain	#275

Originally Posted by eber42

Make sure you use the real "N" key and not "Ñ" (this is a known limitation that has been discussed earlier, and I have some ideas on how to fix it).

If it still does not work, send me your language files and logs (you have to enable logs in the application; they can be found in ~/.local/share/okboard). And do not expect a quick answer (I have not even started working on the weeks-old transparency issue).

BTW, last week I worked on the engine that processes swipes. It is not much more accurate now, but it no longer assigns bad scores to expected results. Next week I will start re-implementing the prediction engine (the part that handles the language model and learning; at the moment it is completely broken), and after that I will go back to regular bug fixing.

Hi eber42! Thank you very much for your answer. Yes, if I swipe over N instead of Ñ for words containing that letter, it works perfectly. So I'll mention it on the OpenRepos page for the Spanish OKBoard language. Thank you for all your efforts! However, it would be nice if swiping over Ñ worked too. If you need testers, please ask me.

Thank you very much!


uggeli	2016-03-26 , 12:15
Posts: 10 | Thanked: 12 times | Joined on Jun 2013	#276

Originally Posted by ferlanero

The steps to do that are these:

- You need a Linux environment (I'm using Arch Linux, but Ubuntu or another distribution works too).

- You need to download the tarball first: http://git.tuxfamily.org/okboard/okb...master.tar.bz2 and uncompress it in your /home directory.

- You need the dictionaries. I took mine from https://github.com/titoBouzout/Dictionaries but it needs to be adjusted, so I attach the already processed file (see Spanish.dic.txt.zip on this post).

- You need the corpora files for your language (e.g. Spanish):
http://corpora2.informatik.uni-leipzig.de/download.html
http://www.cs.upc.edu/~nlp/wikicorpus/
http://opus.lingfil.uu.se/OpenSubtitles2016.php
http://www.lllf.uam.es/ESP/Corlec.html
https://tatoeba.org/spa/downloads

- You need the "aspell-es" package (in the case of Spanish) installed from your distro's repos.

- You need the "lbzip2" package installed on your system too.

- You need "rsync" installed on your system.

- You need "Qt5" installed on your system.

- Now create a folder somewhere and put the dictionary inside (e.g. /home/username/okboard/langs).

- If you have several corpora files, concatenate them:
Code:
cat file1 file2 file3 file4 file5 > corpus-es.txt
- Open a terminal window.

- Set the two environment variables:
Code:
export CORPUS_DIR=/home/username/okboard/langs
Code:
export WORK_DIR=/home/username/okboard/langs
- If you're curious, you can check the value of either variable with
Code:
echo $VARIABLE_NAME

- You need to compress the file (Spanish.dic.txt) you put earlier in /home/username/okboard/langs:
Code:
bzip2 Spanish.dic.txt
- It should now be named corpus-$LANG.txt.bz2. In our case: corpus-es.txt.bz2, for Spanish.

- There should be a single file inside.

- Next, in the same terminal window, move to the okboard source code directory, in our case "/home/username/okb-engine-master/":
Code:
cd /home/username/okb-engine-master/
- In the 'db' folder you must first create a lang-es.cf file. You can copy it from another .cf file in the same folder (e.g. copy lang-en.cf and rename the copy to lang-es.cf).

- Leave only ASCII characters in those files:
Code:
lbzip2 -d < corpus.txt.bz2 | clean_corpus.py | lbzip2 > new_corpus.txt.bz2
- Execute
Code:
db/build.sh es
("es" in case of Spanish)

- After this, the script creates the dictionaries for OKBoard, producing the following files:

add-words-fr.txt
es-predict.dict
lang-fr.cf
clusters-es.log
es-test.txt.bz2
lang-nl.cf
clusters-es.txt
es.tre
predict-es.db
corpus-es.txt.bz2
grams-es-full.csv.bz2
predict-es.ng
db.version
grams-es-learn.csv.bz2
predict-es.rpt.bz2
es-full.dict
grams-es-test.csv.bz2
predict-es.txt.bz2
es-full.tre
lang-en.cf
words-es.txt
es-learn.txt.bz2
lang-es.cf

- So now we have the Spanish dictionary created.

After this, I don't know what to do with these files, so any help is welcome.
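
For reference, the corpus preparation part of the steps above could be collected into one small shell helper. This is only a sketch under assumptions: `prepare_corpus` is a made-up name and not part of okb-engine, it uses plain `bzip2` instead of `lbzip2`, and it assumes that the file the build expects is the concatenated corpus, named corpus-es.txt.bz2 as in the steps above. It does not run db/build.sh itself.

```shell
#!/bin/sh
# Sketch only: prepare_corpus is a hypothetical helper, not part of okb-engine.
# Usage: prepare_corpus LANG OUT_DIR FILE...
# Concatenates the given corpus files and writes OUT_DIR/corpus-LANG.txt.bz2.
prepare_corpus() {
    lang=$1
    out_dir=$2
    shift 2
    mkdir -p "$out_dir" || return 1
    # cat joins the inputs in the order given; bzip2 -c compresses to stdout.
    cat "$@" | bzip2 -c > "$out_dir/corpus-$lang.txt.bz2"
}

# Example (paths are the placeholders used in the steps above):
#   export CORPUS_DIR=/home/username/okboard/langs
#   export WORK_DIR=/home/username/okboard/langs
#   prepare_corpus es "$CORPUS_DIR" file1 file2 file3
#   cd /home/username/okb-engine-master/ && db/build.sh es
```

Keeping the language code as a parameter means the same helper could be reused for another language by changing "es".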

-----------------------------------

I'm trying to add Finnish support to OKBoard, but I need to ask you guys for some tips, as I'm not experienced in this kind of thing. Anyway, here is my current checklist:

1) I have a Linux distribution to use.

2) I've downloaded the OKBoard tarball.

3) Dictionaries: there's no dictionary file for Finnish at the link provided.

4) Corpora file: I first tried http://www.corpora.heliohost.org/download.html but the file (the 2016 version) has a CRC error, so I ended up getting the Finnish version from here instead: http://opus.lingfil.uu.se/OpenSubtitles2016.php

5) I think Finnish spellchecking doesn't use aspell, but the Malaga-based Voikko: http://voikko.puimula.org/ and, if I've understood correctly, Voikko is used by ispell, for example. But how to get that Finnish dictionary file is still unclear to me.

Once all this is done I could try to move forward, but it seems there is still a lot of work. Also, what do you think: would it be good to include some additional sources too (like more official ones: http://kielitoimistonsanakirja.fi / http://kaino.kotus.fi/sanat/nykysuomi/ ), and with multiple sources, how can duplicates be removed easily?
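
On the duplicates question: if each source can be reduced to a plain text file with one word per line, standard tools can merge them. A minimal sketch under that assumption; `merge_wordlists` and the file names are made up for illustration, and whether the okboard build expects exactly this format is not confirmed here.

```shell
#!/bin/sh
# Sketch only: merge_wordlists is a hypothetical helper.
# Usage: merge_wordlists FILE... > merged.txt
# Assumes one word per line in each input file.
merge_wordlists() {
    # Strip carriage returns, drop blank lines, then sort and keep one copy
    # of each word. LC_ALL=C makes sort compare raw bytes, so the result does
    # not depend on the system locale.
    cat "$@" | tr -d '\r' | grep -v '^[[:space:]]*$' | LC_ALL=C sort -u
}

# Example (file names are placeholders):
#   merge_wordlists source1.txt source2.txt > finnish-merged.txt
```

Note that this only removes exact duplicates: "Talo" and "talo" would both be kept.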


The Following 2 Users Say Thank You to uggeli For This Useful Post:
ajalkane, eekkelund

spidernik84	2016-03-26 , 17:10
Posts: 27 | Thanked: 35 times | Joined on Jan 2016 @ Sweden	#277

Originally Posted by uggeli

5) I think Finnish spellchecking doesn't use aspell, but Malaga based Voikko: http://voikko.puimula.org/ and if I'm not misunderstood, voikko is used by ispell for example. But how to get that finnish dictionary file is somehow unclear to me.

After a quick look, I could not find any packaged Finnish dict either.
I did a search and came up with this file: https://packetstormsecurity.com/file...innish.gz.html

My knowledge of Finnish is limited to "Tervetuloa", "Kiitos", "Rautatientori" and some mixed insults that I'd rather not write, so I'm not the best person to judge if that file is free of spelling mistakes.
I think it's not the best dict, since it seems to contain even occasional English and Italian words, but it's a good start.

The okboard readme explains how to name the file and where to put it to bypass the aspell dict.

Good luck again!


mautz	2016-03-26 , 17:54
Posts: 635 | Thanked: 1,535 times | Joined on Feb 2014 @ Germany	#278

Why don't you use aspell-fi?

Finnish corpora files are available at: http://corpora2.informatik.uni-leipzig.de/download.html


uggeli	2016-03-29 , 07:14
Posts: 10 | Thanked: 12 times | Joined on Jun 2013	#279

Well, I'm now test-driving Fedora and still have to get used to it. So I first just looked in the "software center" and there was no aspell-fi. But I just tested via the command line and it seems to be there, but... If I'm not wrong, all spelling efforts have gone into Voikko for years now. I don't know of any distro that uses anything other than Voikko as the default for Finnish. Then again, as I said, this whole thing is so new to me that it's quite possible someone else will have to do this in the end, but until then I can give it a try and study things when I have some time.
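
For anyone wanting to repeat the command-line check: `aspell dump dicts` lists the names of the installed aspell dictionaries, one per line. A small sketch; `dict_listed` is a hypothetical helper, and the dictionary name "fi" is an assumption about how the aspell-fi package registers itself.

```shell
#!/bin/sh
# Sketch only: dict_listed is a hypothetical helper.
# Reads dictionary names (one per line) on stdin and succeeds if NAME
# matches one of them exactly.
dict_listed() {
    grep -qxF -- "$1"
}

# Example, if aspell is installed:
#   aspell dump dicts | dict_listed fi && echo "Finnish aspell dictionary found"
#
# A commonly used way to dump an installed dictionary as a plain word list
# (output file name is a placeholder):
#   aspell -d fi dump master | aspell -l fi expand > finnish-words.txt
```

The exact match matters here: a plain substring search for "fi" would also match unrelated names that merely contain those letters.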