maemo.org - Talk - View Single Post - Advanced text entry on Sailfish (Swype or similar)

spidernik84	2016-01-25, 20:02
Posts: 27 | Thanked: 35 times | Joined on Jan 2016 @ Sweden	#233

Originally Posted by ljo

@spidernik84 et al., this should rather be between 0.7 and 1.8 million word forms, and not much more, given the 92,034 stems (roughly what we count as words). That is about the size of a standard working vocabulary in other Latin-script languages such as French (0.63 million aspell word forms). So something is wrong with the assumptions in the expansion processing.
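Divided by the 92,034 stems, ljo's estimated range works out to roughly 8 to 20 inflected forms per stem:

```latex
\frac{0.7 \times 10^{6}}{92\,034} \approx 7.6,
\qquad
\frac{1.8 \times 10^{6}}{92\,034} \approx 19.6
```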

I think you are right. Another generation attempt just failed: it ran out of 20 GB of RAM plus 5 GB of swap.
I compared the output with English, and this is what I see:

Code:

nico@hendrix:~/aspell/aspell6-it-2.4-20070901-0$ aspell -l en dump master | aspell -l en expand | wc
 119789  119789 1153336
nico@hendrix:~/aspell/aspell6-it-2.4-20070901-0$ aspell -l it dump master | aspell -l it expand | wc
  95193 36636439 655315062

The number of words generated for Italian is insane: about 36.6 million word forms (655 MB of text) from 95,193 input lines, versus 119,789 forms for English.
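Those figures come to about 36,636,439 / 95,193 ≈ 385 forms per line, roughly 20 to 50 times the per-stem range estimated above. To find out which roots are responsible, a sketch like the following could help. `expand_stats` and `worst_roots` are hypothetical helper names, and the script assumes the default `aspell expand` level, which writes each root's expansions on one line separated by spaces (consistent with the wc output above).

```shell
#!/bin/sh
# Hypothetical helpers for inspecting `aspell expand` output on stdin.
# Assumption: each line holds one root's expansions separated by spaces
# (the default expand level), as in the wc comparison above.

# expand_stats: print the number of lines (roots), the total number of
# word forms, and the average number of forms per root.
expand_stats() {
    awk '{ forms += NF }
         END {
             if (NR > 0)
                 printf "roots=%d forms=%d avg=%.1f\n", NR, forms, forms / NR
         }'
}

# worst_roots N: print the N lines that expand to the most forms, as
# "<number of forms> <first form on the line>", largest first.
worst_roots() {
    awk '{ print NF, $1 }' | sort -k1,1nr | head -n "${1:-10}"
}
```

For example, save the expansion once with `aspell -l it dump master | aspell -l it expand > it-expanded.txt` (about 655 MB, going by the wc byte count), then run `expand_stats < it-expanded.txt` and `worst_roots 20 < it-expanded.txt`. Looking up the worst roots in the `dump master` output shows which affix flags they carry, which is where an overly generous expansion rule would show up. `expand_stats` streams through awk in constant memory, while `worst_roots` relies on `sort`.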
You seem to know a lot about this. Do you have any idea what can be done to keep the dictionary smaller? I've been searching for alternative aspell dictionaries with no luck.
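One cheap check before anything else: `wc` counts every form on every line, so if several roots produce the same surface forms, the 36.6 million figure overstates the size of the final word list. The sketch below, using a hypothetical `distinct_forms` helper, puts one form per line and counts the unique ones. `sort` handles large inputs by spilling to temporary files (GNU sort does this), so it should not exhaust RAM the way the generation step did, though it does need free disk space on the order of the expanded text.

```shell
#!/bin/sh
# distinct_forms (hypothetical helper): count the unique word forms in
# `aspell expand` output read from stdin.
distinct_forms() {
    # tr: turn spaces into newlines; -s squeezes repeated newlines, so a
    #     run of spaces becomes a single line break.
    # LC_ALL=C: compare raw bytes, which is faster and locale-independent.
    # grep -c .: count only non-empty lines.
    tr -s ' ' '\n' | LC_ALL=C sort -u | grep -c .
}
```

Usage would be `aspell -l it dump master | aspell -l it expand | distinct_forms`, comparing the result with the 36,636,439 words reported by `wc`.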

Thanks. I certainly hope we won't need to rent a Cray cluster to generate this dictionary...

