#170
Originally Posted by cvp:
can someone explain how i setup the WORK_DIR and CORPUS_DIR environments variable
The steps to do that are these:

- You need a Linux environment (I'm using Arch Linux, but Ubuntu or another distribution works too)

- You need to download the tarball first: http://git.tuxfamily.org/okboard/okb...master.tar.bz2 and extract it in your home directory

- You need the dictionaries. I took mine from https://github.com/titoBouzout/Dictionaries, but they need to be adjusted, so I attach the already processed file (see Spanish.dic.txt.zip on this post)

- You need the corpus files for your language (e.g. Spanish):
http://corpora2.informatik.uni-leipzig.de/download.html
http://www.cs.upc.edu/~nlp/wikicorpus/
http://opus.lingfil.uu.se/OpenSubtitles2016.php
http://www.lllf.uam.es/ESP/Corlec.html
https://tatoeba.org/spa/downloads

Keep this tip in mind when building your corpus files:
Corpus file <= 4 GB for computers with 16 GB of RAM
Corpus file <= 1.5 GB for computers with 8 GB of RAM
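
To check the size tip above before merging anything, here is a minimal sketch in plain POSIX shell. The function names total_bytes and fits_limit are my own, not part of OKBoard, and the limit is passed in megabytes (4096 for the 16 GB case, 1536 for the 8 GB case).

```shell
# Print the combined size in bytes of all files given as arguments.
total_bytes() {
    total=0
    for f in "$@"; do
        size=$(wc -c < "$f") || return 1
        total=$((total + size))
    done
    echo "$total"
}

# Succeed if the files fit within a limit given in megabytes.
# $1 = limit in MB, remaining arguments = corpus files.
fits_limit() {
    limit_mb=$1
    shift
    [ "$(total_bytes "$@")" -le $((limit_mb * 1024 * 1024)) ]
}

# Example: fits_limit 1536 file1 file2 file3 && echo "OK for 8 GB of RAM"
```

The limits themselves are only the rough values from the tip, so treat a result close to the limit with care.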

- You need the "aspell-es" package (in the case of Spanish) installed from the repos of your distro.

- You need the "lbzip2" package installed on your system too.

- You need "rsync" installed on your system.

- You need "Qt 5" installed on your system.

- You need "python3-dev" installed on your system.
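
A quick way to see which of the command-line tools above are still missing is sketched below. The function name missing_tools is my own, and the command names are assumptions: package names and binary names differ between distros (aspell-es is a dictionary for the aspell binary, and python3-dev and Qt 5 ship headers and libraries that this check cannot see).

```shell
# Print the name of every given command that is not found in PATH,
# one per line. Prints nothing when all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: missing_tools lbzip2 rsync aspell python3
```

If the example prints nothing, those four commands are installed; the development packages still have to be checked with your distro's package manager.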

- Now you need to create a folder somewhere and put the dictionary inside (e.g. /home/USERNAME/okboard/langs)

- If you have several corpus files, concatenate them into one:

Code:
cat file1 file2 file3 file4 file5 > corpus-es.txt
- Open a terminal window

- And set the two environment variables:

*NOTE: You should replace USERNAME with your own user name
Code:
export CORPUS_DIR=/home/USERNAME/okboard/langs
Code:
export WORK_DIR=/home/USERNAME/okboard/langs
- You can see those variables with

Code:
echo $VARIABLE_NAME
if you're curious
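
If you want more than echo, this small sketch checks that a variable is set and really points to an existing folder. The function name check_dir_var is my own invention, not something from OKBoard.

```shell
# Succeed only if the variable named in $1 is set and is an existing directory.
# Prints NAME=value on success and an explanation on stderr otherwise.
check_dir_var() {
    eval "value=\${$1:-}"
    if [ -z "$value" ]; then
        echo "$1 is not set" >&2
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$1 points to a missing directory: $value" >&2
        return 1
    fi
    echo "$1=$value"
}

# Example: check_dir_var CORPUS_DIR && check_dir_var WORK_DIR
```

Remember that export only lasts for the current terminal window, so run the check in the same window where you will run the build.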

- You need to compress the corpus file (corpus-es.txt) you placed earlier in /home/USERNAME/okboard/langs:

Code:
bzip2 /home/USERNAME/okboard/langs/corpus-es.txt
- The file should now be named corpus-$LANG.txt.bz2. In our case that is corpus-es.txt.bz2, because of Spanish

- There should be a single file inside.
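
Before going on, it may be worth confirming that the compressed corpus has the expected name and is not damaged. This is only a sketch: check_corpus is a name I made up, and it relies on the standard "bzip2 -t" integrity test.

```shell
# Check that corpus-<lang>.txt.bz2 exists in a directory and is a valid archive.
# $1 = directory with the corpus, $2 = language code (e.g. es).
check_corpus() {
    dir=$1
    lang=$2
    file="$dir/corpus-$lang.txt.bz2"
    if [ ! -f "$file" ]; then
        echo "missing: $file" >&2
        return 1
    fi
    # -t tests the integrity of the archive without writing any output
    bzip2 -t "$file"
}

# Example: check_corpus /home/USERNAME/okboard/langs es && echo "corpus looks fine"
```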

- The next thing to do, in the same terminal window, is to change to the folder with OKBoard's source code, in our case "/home/USERNAME/okb-engine-master/":

Code:
cd /home/USERNAME/okb-engine-master/
- In the 'db' folder you must first create a lang-es.cf file. You can copy it from another .cf file in the same folder (e.g. copy lang-en.cf and name the copy lang-es.cf)

- Then clean the corpus so that only ASCII characters are left in it:

Code:
lbzip2 -d < /home/USERNAME/okboard/langs/corpus-es.txt.bz2 | ./tools/clean_corpus.py | lbzip2 > /home/USERNAME/okboard/langs/clean_corpus-es.txt.bz2
Then move the original corpus-es.txt.bz2 file to a safe place:

Code:
mv /home/USERNAME/okboard/langs/corpus-es.txt.bz2 /home/USERNAME/okboard
And rename the file that was cleaned of invalid characters to the proper name to start the process:

Code:
mv /home/USERNAME/okboard/langs/clean_corpus-es.txt.bz2 /home/USERNAME/okboard/langs/corpus-es.txt.bz2
- Execute
Code:
db/build.sh es
("es" in case of Spanish)

- After this, the script creates the OKBoard dictionaries in /home/USERNAME/okboard/langs/:

add-words-es.txt
affixes-es.txt
clusters-es.log
clusters-es.txt
corpus-es.txt.bz2
db.version
es-full.dict
es-full.tre
es-learn.txt.bz2
es-predict.dict
es-test.txt.bz2
es.tre
grams-es-full.csv.bz2
grams-es-learn.csv.bz2
grams-es-test.csv.bz2
lang-es.cf
ngrams-es.rpt
predict-es.db
predict-es.id
predict-es.ng
predict-es.rpt.bz2
predict-es.txt.bz2
tmp-words-es.txt

- So, now we have the Spanish dictionary created.
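
To confirm the build really produced the four files that the rest of this post compresses and copies to the phone, something like this sketch can help. The function name missing_outputs is my own; the file names are taken from the list above.

```shell
# Print which of the four files needed on the phone are missing.
# $1 = directory with the build output, $2 = language code (e.g. es).
missing_outputs() {
    dir=$1
    lang=$2
    for name in "$lang.tre" "predict-$lang.db" "predict-$lang.ng" "predict-$lang.id"; do
        [ -f "$dir/$name" ] || printf '%s\n' "$name"
    done
}

# Example: missing_outputs /home/USERNAME/okboard/langs es
```

No output means all four files are there.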

Now we have to gzip the files OKBoard will use for swipe typing:

Code:
gzip -9 /home/USERNAME/okboard/langs/es.tre
Code:
gzip -9 /home/USERNAME/okboard/langs/predict-es.db
Code:
gzip -9 /home/USERNAME/okboard/langs/predict-es.ng
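
The three gzip commands above can also be written as one loop, which is handy if you build several languages. This is just a sketch and compress_outputs is a name I chose; it does the same as the commands above and stops at the first file that fails.

```shell
# gzip the three files OKBoard needs compressed, for one language.
# $1 = directory with the build output, $2 = language code (e.g. es).
compress_outputs() {
    dir=$1
    lang=$2
    for name in "$lang.tre" "predict-$lang.db" "predict-$lang.ng"; do
        # gzip replaces each file with a .gz version of it
        gzip -9 "$dir/$name" || return 1
    done
}

# Example: compress_outputs /home/USERNAME/okboard/langs es
```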
Now, connect the phone via ssh to the computer as I explain here: http://www.linuxleon.org/2015/09/how...jolla-con.html

And follow these instructions to create the RPM package directly on your Sailfish phone: http://talk.maemo.org/showthread.php?t=92963

After that, copy the .gz files and predict-es.id onto the phone:

*NOTE: You should replace PHONE_IP with your phone's IP address (these commands assume you run them on the computer and that nemo is the user on the phone)
Code:
scp /home/USERNAME/okboard/langs/es.tre.gz nemo@PHONE_IP:/home/nemo/rpmbuild/BUILD/okboard-spanish-x.x-1.arm/
Code:
scp /home/USERNAME/okboard/langs/predict-es.db.gz nemo@PHONE_IP:/home/nemo/rpmbuild/BUILD/okboard-spanish-x.x-1.arm/
Code:
scp /home/USERNAME/okboard/langs/predict-es.id nemo@PHONE_IP:/home/nemo/rpmbuild/BUILD/okboard-spanish-x.x-1.arm/
Code:
scp /home/USERNAME/okboard/langs/predict-es.ng.gz nemo@PHONE_IP:/home/nemo/rpmbuild/BUILD/okboard-spanish-x.x-1.arm/

Last edited by ferlanero; 2017-01-10 at 10:05.
 
