Posts: 635 | Thanked: 1,535 times | Joined on Feb 2014 @ Germany
#205
I used all the news files from the Leipzig website to build my corpus file. I think the wiki articles use language that is too formal, and sometimes too scientific, for everyday use. For the dictionary, I found a file containing the 10,000 most used words of my language (German in my case), and I dumped my SMS conversations from my Jolla and added them to the dictionary file as well. Some ebooks that use everyday language are also a good thing to add to the corpus file.
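For anyone who wants to do the same, here is a rough Python sketch of the dictionary merge. It is not the exact script I used. The function name `merge_dictionary`, the `min_count` threshold and the assumption that the SMS dump is just plain text are all made up for illustration. Case is kept as-is because German nouns are capitalized.

```python
import re
from collections import Counter

# Runs of letters only (umlauts and ß included); digits and underscores are excluded.
WORD_RE = re.compile(r"[^\W\d_]+")

def merge_dictionary(frequent_words, extra_text, min_count=2):
    """Return frequent_words (deduplicated, order kept) followed by words
    from extra_text that occur at least min_count times.

    frequent_words: iterable of words, e.g. lines of a "most used words" file.
    extra_text: plain text, e.g. a dump of SMS conversations (assumed format).
    min_count: hypothetical threshold to keep one-off typos out of the dictionary.
    """
    stripped = (w.strip() for w in frequent_words)
    merged = list(dict.fromkeys(w for w in stripped if w))
    known = set(merged)

    counts = Counter(WORD_RE.findall(extra_text))
    # most_common() yields the highest counts first.
    for word, count in counts.most_common():
        if count >= min_count and word not in known:
            merged.append(word)
            known.add(word)
    return merged

if __name__ == "__main__":
    top_words = ["und", "der", "und"]
    sms_dump = "Moin moin, bin gleich da. Moin! gleich gleich und"
    for word in merge_dictionary(top_words, sms_dump):
        print(word)
```

Raising `min_count` keeps more of your own typos out of the dictionary, at the cost of dropping rare words you really do use.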

I'm fine-tuning my dictionary and corpus files and will look into building an RPM package so I can release the German language on OpenRepos in a few days.
 

The Following 6 Users Say Thank You to mautz For This Useful Post: