maemo.org - Talk (https://talk.maemo.org/index.php)
-   Applications (https://talk.maemo.org/forumdisplay.php?f=41)
-   -   [DEVEL] Saera: Siri clone for Maemo5, Harmattan and Sailfish OS (https://talk.maemo.org/showthread.php?t=84753)

vistaus 2015-04-18 17:59

Re: [DEVEL] Saera: Siri clone for Maemo5 and Harmattan
 
Awesoooooome! But still buggy. I speak as clearly as possible, but she thinks I'm asking for something quite different from what I actually asked :( Otherwise, great already!

taixzo 2015-04-18 20:37

Re: [DEVEL] Saera: Siri clone for Maemo5 and Harmattan
 
Quote:

Originally Posted by deiv (Post 1467640)
Good job! But on my phone, Saera doesn't speak, and "she" misspells almost everything.

For speech support you need eSpeak installed. As for misspelling: not really sure how this would happen, could you give me an example?
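
A minimal sketch of how an app could check for eSpeak before trying to speak. This is not Saera's actual code: the function names are made up for illustration, and it only assumes the standard `espeak` command-line tool with its `-v` (voice) and `-s` (speed) options.

```python
import shutil
import subprocess


def build_espeak_command(text, voice="en", speed_wpm=160):
    """Build the argument list for the espeak command-line tool."""
    # -v selects the voice, -s sets the speed in words per minute.
    return ["espeak", "-v", voice, "-s", str(speed_wpm), text]


def speak(text):
    """Speak the text aloud; return False if eSpeak is not installed."""
    if shutil.which("espeak") is None:
        # No espeak binary on PATH: stay silent instead of crashing.
        return False
    subprocess.run(build_espeak_command(text), check=False)
    return True
```

If `speak()` returns False, the app can fall back to showing the reply as text only, which matches the "doesn't speak" symptom when eSpeak is missing.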

Quote:

Originally Posted by vistaus (Post 1467642)
Awesoooooome! But still buggy. I speak as clearly as possible, but she thinks I'm asking for something quite different from what I actually asked :( Otherwise, great already!

Bear in mind that as yet Saera supports only a limited number of actions and will try to interpret anything you say as one of those. Here are examples of things Saera should recognize correctly:
  • Set alarm for two thirty
  • Flip a coin
  • What emails do I have
  • Hello
  • Play music
  • Pause the music
  • Roll a pair of dice
  • I live in London
  • What time is it
  • What is the weather in New York

Does Saera correctly recognize things from this list?

mariusmssj 2015-04-18 20:48

Re: [DEVEL] Saera: Siri clone for Maemo5 and Harmattan
 
Where would one find eSpeak? I looked on OpenRepos but found nothing for Jolla.

nodevel 2015-04-18 20:55

Re: [DEVEL] Saera: Siri clone for Maemo5 and Harmattan
 
Thanks for the app and for the compliment - I'm glad you like the icon :)

It looks very promising, but I have a few comments:
1)
I think that the reason for the misspelling mentioned by vistaus is the limited recognition dictionary.
Currently, it seems like Saera tries to fit everything you say into the few words/sentences it recognizes, which results in triggering many actions you didn't want to trigger.

One example:
I was trying to say something to Saera, but it recognized one word I said as a name of a city and immediately changed my home location to that city.

If it had a bigger dictionary, it could either recognize it correctly, or recognize it incorrectly as something else (but not necessarily the city name, as there would be more words "between" your pronounced word and the city name) and then say "I'm sorry, I don't know what you mean by..." which doesn't seem to be an option right now.

EDIT:
I hope you know what I mean - more freedom in recognition would bring the possibility to 'fail' - to say something that doesn't trigger anything, but is at least well recognized
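
To illustrate the "freedom to fail" idea, here is a rough sketch that compares a forced best match against a match with a rejection threshold. It is only an illustration: it works on already-recognized text using plain string similarity, not on audio or a Julius grammar, and the command list, function names and the 0.75 threshold are assumptions, not anything taken from Saera.

```python
from difflib import SequenceMatcher

# A few of the phrases listed earlier in the thread, lowercased.
KNOWN_COMMANDS = [
    "set alarm for two thirty",
    "flip a coin",
    "what time is it",
    "play music",
    "pause the music",
]


def best_match(heard, commands=KNOWN_COMMANDS):
    """Return (similarity, command) for the closest known command."""
    scored = [
        (SequenceMatcher(None, heard.lower(), command).ratio(), command)
        for command in commands
    ]
    return max(scored)


def interpret(heard, threshold=0.75):
    """Return (command, None), or (None, apology) if nothing is close enough."""
    score, command = best_match(heard)
    if score < threshold:
        # The ability to 'fail' instead of triggering an unwanted action.
        return None, f"I'm sorry, I don't know what you mean by '{heard}'."
    return command, None
```

Without the threshold, `best_match()` always returns some command, however poor the fit, which is the behaviour described above. With it, an unrelated phrase produces an apology instead of an action.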


I remember that this was a problem on the N900, but with the computing power of Jolla, it shouldn't be a problem to have more robust recognition. Or am I wrong?

2)
I know that it can't get into the Harbour right now, but in case this situation changes, you should consider renaming the binary/package to 'harbour-saera' to prevent upgrade path breakages in the future.


3)
It would be cool to have some kind of modularity in the future:
Something like:
  • the GHNS API in KDE.
  • the Situations app on SailfishOS, which allows downloading more features from inside the app (even paid ones, in the future).

For example, I thought of a silly plugin, 'Truth or Dare'. It is too silly to push upstream, yet it's something I'd do if I had some free time :)


Anyways, good luck with the app!

taixzo 2015-04-18 21:19

Re: [DEVEL] Saera: Siri clone for Maemo5 and Harmattan
 
Quote:

Originally Posted by mariusmssj (Post 1467659)
Where would one find eSpeak? I looked on OpenRepos but found nothing for Jolla.

I installed MartinK's build from here.



Quote:

Originally Posted by nodevel (Post 1467660)
Thanks for the app and for the compliment - I'm glad you like the icon :)

It looks very promising, but I have a few comments:
1)
I think that the reason for the misspelling mentioned by vistaus is the limited recognition dictionary.
Currently, it seems like Saera tries to fit everything you say into the few words/sentences it recognizes, which results in triggering many actions you didn't want to trigger.

One example:
I was trying to say something to Saera, but it recognized one word I said as a name of a city and immediately changed my home location to that city.

If it had a bigger dictionary, it could either recognize it correctly, or recognize it incorrectly as something else (but not necessarily the city name, as there would be more words "between" your pronounced word and the city name) and then say "I'm sorry, I don't know what you mean by..." which doesn't seem to be an option right now.

I remember that this was a problem on the N900, but with the computing power of Jolla, it shouldn't be a problem to have more robust recognition. Or am I wrong?

It's not really a question of computing power; the input is currently decoded in essentially real-time, and would have no problem with more words. The issue with recognition accuracy is the training data. This is what sets commercial speech recognition systems like Google's apart - they use the same algorithms, but trained with vastly more data. Unfortunately, even if we had access to those data sets, they would be useless on a mobile platform simply due to size - and in that regard, the Jolla has no more storage space than the N900. The acoustic model that I have included is the VoxForge model, which is nearly 7 MB. To get recognition accuracy like Google's, you need tens of gigabytes of training data, which is not feasible on a mobile platform (and why Google Now and Siri send audio off to be processed on a server).

That's for the acoustic model. The other issue is the language model, which tells the recognition engine how words are likely to fit into a sentence. I currently have to hand-assemble the language model, which is partly why it is not so big - it took me about two days to build the current model. I'm working on scripting some bits, so it can load new words and fit them into grammar types (like being able to pronounce contact names), but the fact is that Julius doesn't have a free speech (dictation) grammar for English, and even if it did it would likely have accuracy issues like Pocketsphinx did.
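
As a rough illustration of what "loading new words into grammar types" could look like, the sketch below renders one word category in the style of a Julius `.voca` file (a `% CATEGORY` line followed by one `word phonemes` line per entry). This is an assumption-laden example, not Saera's script: the function name, the `CONTACT` category and the phoneme strings are placeholders, and a real phoneme set has to match the acoustic model in use.

```python
def voca_category(category, words, pronunciations):
    """Render one vocabulary category; also return words with no pronunciation."""
    lines = [f"% {category}"]
    missing = []
    for word in words:
        phones = pronunciations.get(word.lower())
        if phones is None:
            # Without a phoneme sequence the recognizer cannot use the word.
            missing.append(word)
            continue
        lines.append(f"{word.upper()}\t{' '.join(phones)}")
    return "\n".join(lines) + "\n", missing


# Hypothetical pronunciation lookup; the phonemes are placeholders.
PRONUNCIATIONS = {"anna": ["ae", "n", "ax"]}

text, missing = voca_category("CONTACT", ["Anna", "Zed"], PRONUNCIATIONS)
```

Names left in `missing` would still need a pronunciation from somewhere (a dictionary or a letter-to-sound tool) before they could be added to the grammar.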

Quote:

Originally Posted by nodevel (Post 1467660)
2)
I know that it can't get into the Harbour right now, but in case this situation changes, you should consider renaming the binary/package to 'harbour-saera' to prevent upgrade path breakages in the future.

Good point; I'll have that changed by release.

Quote:

Originally Posted by nodevel (Post 1467660)
3)
It would be cool to have some kind of modularity in the future:
Something like:
  • the GHNS API in KDE.
  • the Situations app on SailfishOS, which allows downloading more features from inside the app (even paid ones, in the future).

For example, I thought of a silly plugin, 'Truth or Dare'. It is too silly to push upstream, yet it's something I'd do if I had some free time :)


Anyways, good luck with the app!

Plugins are a planned feature, but not ready for this release. ;)

taixzo 2015-04-19 04:11

Re: [DEVEL] Saera: Siri clone for Maemo5 and Harmattan
 
I have created a website for the project.

mariusmssj 2015-04-19 06:43

Re: [DEVEL] Saera: Siri clone for Maemo5 and Harmattan
 
Can't seem to install MartinK's build of espeak, says "failed to install" every time. I guess I am missing other libraries that it requires.

phap 2015-04-19 06:50

Re: [DEVEL] Saera: Siri clone for Maemo5 and Harmattan
 
Sorry if this was already asked or answered, but is there a "how to" somewhere? I saw all the features on the website from the previous post, but how do you make them work? What do you have to ask?

nodevel 2015-04-19 07:07

Re: [DEVEL] Saera: Siri clone for Maemo5 and Harmattan
 
Quote:

Originally Posted by mariusmssj (Post 1467678)
Can't seem to install MartinK's build of espeak, says "failed to install" every time. I guess I am missing other libraries that it requires.

Yes, you need to install portaudio first.

EDIT:
Quote:

Originally Posted by taixzo (Post 1467663)
To get recognition accuracy like Google's, you need tens of gigabytes of training data, which is not feasible on a mobile platform (and why Google Now and Siri send audio off to be processed on a server).

Thank you for the explanation! Could you maybe offer an option in the future to choose between Julius and the Google Speech Recognition API?
I am aware of the advantages of Julius (it works offline and doesn't send data to a 3rd party), but an online option shouldn't be too hard to implement.
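
A sketch of what such a switch could look like, assuming each backend is wrapped in a plain function that takes audio bytes and returns text. The backend names and the stub functions are hypothetical. No real Julius or Google API call is shown here.

```python
def recognize(audio, backends, preferred="online", fallback="offline"):
    """Try the preferred backend first, then the fallback.

    Returns (backend_name, text), or (None, None) if both fail.
    """
    order = [preferred] if preferred == fallback else [preferred, fallback]
    for name in order:
        try:
            text = backends[name](audio)
        except (KeyError, OSError):
            # Unknown backend name, or e.g. no network connection.
            continue
        if text:
            return name, text
    return None, None


# Hypothetical stand-ins for the real engines.
def offline_stub(audio):
    return "play music"


def online_stub(audio):
    raise OSError("no network")


BACKENDS = {"offline": offline_stub, "online": online_stub}
```

With these stubs, a request that prefers the online backend falls back to the offline one when the network is down, so the offline advantage of Julius is kept.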

mariusmssj 2015-04-19 08:00

Re: [DEVEL] Saera: Siri clone for Maemo5 and Harmattan
 
Thanks nodevel :) I found that neil made an RPM on OpenRepos.

Also, taixzo, the commands on your list work pretty well :)


All times are GMT.