taixzo | Posts: 959 | Thanked: 3,427 times | Joined on Apr 2012
#775
Originally Posted by mariusmssj
Where would one find espeak? I looked on OpenRepos but found nothing for Jolla.
I installed MartinK's build from here.



Originally Posted by nodevel
Thanks for the app and for the compliment - I'm glad you like the icon.

It looks very promising, but I have a few comments:
1)
I think that the reason for the misspellings mentioned by vistaus is the limited recognition dictionary.
Currently, it seems like Saera tries to fit everything you say into the few words/sentences it recognizes, which results in triggering many actions you didn't want to trigger.

One example:
I was trying to say something to Saera, but it recognized one word I said as the name of a city and immediately changed my home location to that city.

If it had a bigger dictionary, it could either recognize the word correctly, or misrecognize it as something else (not necessarily the city name, since there would be more words "between" your pronounced word and the city name) and then say "I'm sorry, I don't know what you mean by...", which doesn't seem to be an option right now.

I remember that this was a problem on the N900, but with the computing power of the Jolla, it shouldn't be a problem to have more robust recognition. Or am I wrong?
It's not really a question of computing power; the input is currently decoded in essentially real time, and more words would be no problem. The issue with recognition accuracy is the training data. This is what sets commercial speech recognition systems like Google's apart - they use the same algorithms, but trained on vastly more data. Unfortunately, even if we had access to those data sets, they would be useless on a mobile platform simply due to size - and in that regard, the Jolla has no more storage space than the N900.

The acoustic model I have included is the VoxForge model, which is nearly 7 MB. To get recognition accuracy like Google's, you need tens of gigabytes of training data, which is not feasible on a mobile platform (and is why Google Now and Siri send audio off to be processed on a server).
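
For the curious, this is roughly how an acoustic model gets wired into Julius. A minimal sketch of a jconf configuration - the file names (hmmdefs, tiedlist, saera.dfa/saera.dict) follow the standard VoxForge layout and are assumptions, not Saera's actual setup:

    # saera.jconf - hypothetical Julius configuration
    # decode live microphone input, processing audio as it arrives
    -input mic
    -realtime
    # VoxForge acoustic model: HTK HMM definitions + tied-state list
    -h hmmdefs
    -hlist tiedlist
    # compiled grammar: loads saera.dfa and saera.dict
    -gram saera

Started with something like julius -C saera.jconf.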

That's for the acoustic model. The other issue is the language model, which tells the recognition engine how words are likely to fit into a sentence. I currently have to hand-assemble the language model, which is partly why it is not so big - it took me about two days to build the current one. I'm working on scripting some of that, so it can load new words and fit them into grammar types (like being able to pronounce contact names), but the fact is that Julius doesn't have a free-form speech (dictation) grammar for English, and even if it did, it would likely have accuracy issues like Pocketsphinx did.
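
To give an idea of what "hand-assemble" means here: Julius grammars are written as a .grammar file (sentence structure) plus a .voca file (the words in each category, with phoneme transcriptions), then compiled with the mkdfa.pl tool that ships with Julius. A toy sketch - the words and phoneme symbols are made up for illustration and would have to match the VoxForge model's phone set:

    # greet.grammar - sentence rules; the names refer to .voca categories
    S     : NS_B GREET NAME NS_E

    # greet.voca - words per category, with pronunciations
    % NS_B
    <s>      sil
    % NS_E
    </s>     sil
    % GREET
    hello    hh ax l ow
    % NAME
    saera    s eh r ax

    # compile to greet.dfa + greet.dict with: mkdfa.pl greet

Every new word means another hand-written line like those, which is why loading things like contact names automatically needs scripting.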

Originally Posted by nodevel
2)
I know that it can't get into the Harbour right now, but in case this situation changes, you should consider renaming the binary/package to 'harbour-saera' to prevent upgrade path breakages in the future.
Good point; I'll have that changed by release.
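
For anyone curious how a rename like that is usually kept upgrade-safe on an RPM-based system like Sailfish: the new package can declare that it replaces the old name. A hypothetical spec fragment - the old package name "saera" and the version bound are assumptions:

    # harbour-saera.spec (fragment, hypothetical)
    Name:      harbour-saera
    # let the renamed package cleanly replace the old "saera" package
    Obsoletes: saera <= 0.9
    Provides:  saera = %{version}-%{release}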

Originally Posted by nodevel
3)
It would be cool to have some kind of modularity in the future. Something like:
  • the GHNS API in KDE.
  • the Situations app on SailfishOS, which lets you download more features from inside the app (even paid ones, in the future).

For example, I thought of a silly 'Truth or Dare' plugin, but it is too silly to push upstream, yet it's something I'd do if I had some free time.


Anyways, good luck with the app!
Plugins are a planned feature, but not ready for this release.
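
To sketch what a plugin might eventually look like (purely hypothetical - the class name, attribute, and method here are placeholders, since the real API isn't designed yet):

    # truth_or_dare.py - hypothetical Saera plugin sketch
    class TruthOrDarePlugin:
        # phrases the plugin would contribute to the language model,
        # so the recognizer can actually hear them
        phrases = ["truth or dare"]

        def handle(self, text):
            """Return a spoken reply if this plugin claims the input,
            or None to let other handlers try."""
            if "truth or dare" in text.lower():
                return "Dare: say something nice about the N900."
            return None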
 
