2018-12-06
, 06:42
|
Posts: 36 |
Thanked: 118 times |
Joined on Nov 2018
|
#22
|
2018-12-06
, 15:49
|
Posts: 102 |
Thanked: 187 times |
Joined on Jan 2010
|
#23
|
I don't know how it will work out. I have downloaded the files and uploaded them to the drive (65 GB). I can share a link if someone wants to try it out. If not, I might try the Finnish n-grams from the National Library's journals myself, because it is easier that way.
2018-12-06
, 19:55
|
Posts: 1,414 |
Thanked: 7,547 times |
Joined on Aug 2016
@ Estonia
|
#24
|
2018-12-07
, 03:19
|
Posts: 36 |
Thanked: 118 times |
Joined on Nov 2018
|
#25
|
2018-12-07
, 09:07
|
Posts: 102 |
Thanked: 187 times |
Joined on Jan 2010
|
#26
|
2018-12-11
, 11:48
|
Posts: 102 |
Thanked: 187 times |
Joined on Jan 2010
|
#27
|
2018-12-12
, 09:29
|
Posts: 36 |
Thanked: 118 times |
Joined on Nov 2018
|
#28
|
2018-12-12
, 10:03
|
Posts: 1,414 |
Thanked: 7,547 times |
Joined on Aug 2016
@ Estonia
|
#29
|
2018-12-12
, 10:39
|
Posts: 36 |
Thanked: 118 times |
Joined on Nov 2018
|
#30
|
Profanity is an issue, and it would be great to get rid of it. I had the same problem when composing the database for English; a large fraction of the time was spent on that. I would suggest filtering the database and removing all n-grams that include any word classified as "bad". For that, we need a list of such words (possibly matched as substrings). That list would have to be provided by native speakers, though. Maybe such a list has already been compiled somewhere...
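
A rough sketch of what that filtering could look like, in case it helps the discussion. It assumes the n-grams are already loaded as (words, count) pairs; that layout, the function name `filter_ngrams`, and the placeholder entry "badword" are all made up for illustration and are not the actual database format or tooling.

```python
from typing import Iterable, Iterator, Sequence, Tuple

# Assumed in-memory layout: a tuple of words plus an occurrence count.
NGram = Tuple[Tuple[str, ...], int]


def filter_ngrams(ngrams: Iterable[NGram],
                  bad_substrings: Sequence[str]) -> Iterator[NGram]:
    """Yield only the n-grams in which no word contains a listed substring."""
    # Normalise the block list once; skip empty entries, which would match everything.
    bad = [b.lower() for b in bad_substrings if b]
    for words, count in ngrams:
        # Case-insensitive substring test against every word of the n-gram.
        if not any(b in word.lower() for word in words for b in bad):
            yield words, count


if __name__ == "__main__":
    # "badword" is a placeholder; the real list would come from native speakers.
    sample = [
        (("hello", "world"), 10),
        (("some", "badword"), 3),
        (("BADWORDS", "here"), 2),
    ]
    for words, count in filter_ngrams(sample, ["badword"]):
        print(" ".join(words), count)
```

Substring matching also catches inflected forms, which matters for Finnish, but it can drop harmless words that merely contain a listed string, so the list would need some care.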
Tags |
predictive text, presage, text-prediction |
|
I am pretty sure we'd get a Tay.ai type prediction engine out of that corpus!
Dave999: Meateo balloons. What’s so special with em? Is it a ballon?