I think a good dictionary is far more important than an extremely large corpus file. Mine is only 200 MB compressed, and with my dictionary the prediction accuracy is nearly perfect.