maemo.org - Talk - [DEVEL] Saera: Siri clone for Maemo5, Harmattan and Sailfish OS

- - [DEVEL] Saera: Siri clone for Maemo5, Harmattan and Sailfish OS (https://talk.maemo.org/showthread.php?t=84753)

Re: [DEVEL] Saera: Siri clone for Maemo 5

Hi taixzo

Originally posted by taixzo:

Quote:

I hope there is some way to extract the score, if not I may have to dive into the plugin code and try to find a way to extract it.

In case I did not make myself clear: the patched plugin already delivers the score. To stay compatible, it sends an additional message (called 'result_score') right before it sends the 'result' message, so after receiving 'result_score' the 'result' message can be ignored. If you want, I can send you that plugin.

Quote:

One thing that I hope for is runtime model/dictionary switching,

My app always aims to keep the current dictionary/language model as small as possible (for better recognition accuracy) and switches the dict/lm according to its internal context. For example: imagine there is a small command set that allows launching a new app. After the voice-control app has recognized the command 'launch', it switches its context to 'I have to launch something' and loads a new, appropriate dictionary which contains only the names of the programs that might be launched, etc.

Perhaps this could also be an approach for Saera.
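A minimal sketch of the context-switching idea described above, in plain Python without any GStreamer dependency. The context names, the file names and the `ContextSwitcher` class are all hypothetical; how the chosen files are actually handed to the recognizer is left out.

```python
# Hypothetical mapping from recognition context to dictionary / language-model
# files. The file names are placeholders, not files shipped with any package.
CONTEXT_MODELS = {
    "command": ("commands.dic", "commands.lm"),
    "launch": ("apps.dic", "apps.lm"),
}

# Hypothetical table: which recognized word moves us into which context.
TRANSITIONS = {
    ("command", "launch"): "launch",
}


class ContextSwitcher:
    """Track the current context and report which small model to load next."""

    def __init__(self):
        self.context = "command"

    def models(self):
        """Return the (dictionary, language model) pair for the current context."""
        return CONTEXT_MODELS[self.context]

    def on_result(self, hyp):
        """Update the context after a recognition result; return the new models."""
        word = hyp.strip().lower()
        if self.context != "command":
            # An argument (e.g. a program name) was recognized: back to commands.
            self.context = "command"
        else:
            # Unknown commands leave us in the base command context.
            self.context = TRANSITIONS.get((self.context, word), "command")
        return self.models()
```

After 'launch' is recognized, `on_result` returns the program-name files; after the program name has been recognized, it falls back to the small command set.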

Re: [DEVEL] Saera: Siri clone for Maemo 5

Quote:

Originally Posted by taixzo (Post 1224906)

One thing that I hope for is runtime model/dictionary switching, to allow full dictation when sending e.g. a text. Hopefully I can use some gstreamer trickery (like using a filesink at the end of the pipeline), and re-try to understand what was said (with a larger dictionary/model) if the score is low...I hope there is some way to extract the score, if not I may have to dive into the plugin code and try to find a way to extract it.

The only way out for me is some kind of grammar engine. But it'd be hard to code.

Re: [DEVEL] Saera: Siri clone for Maemo 5

Quote:

Originally Posted by myra (Post 1224883)

That German translation is absolutely correct except for two minor spelling errors. It should be "Wusstest Du, dass das N900 Mac OSX verwenden kann?" instead of "Wußtest Du, das das N900 Mac OSX verwenden kann?", but this does not influence espeak's pronunciation.

Thanks. And you are somewhat right on the second part ('das das' is wrong, there should be a sharp 's' as the first one, i.e. 'daß das').

But I'm afraid I have to clarify that I'm an evangelist of the true (literally old-school) German spelling rules, and therefore I strongly detest the silly, unwanted (by most of the population) spelling reforms forced upon us by the government in 1996. So, for example, no double 's' for me, but 'ß'.

Re: [DEVEL] Saera: Siri clone for Maemo 5

Quote:

Originally Posted by myra (Post 1225297)

Hi taixzo

Originally posted by taixzo:

In case I did not make myself clear: the patched plugin already delivers the score. To stay compatible, it sends an additional message (called 'result_score') right before it sends the 'result' message, so after receiving 'result_score' the 'result' message can be ignored. If you want, I can send you that plugin.

My app always aims to keep the current dictionary/language model as small as possible (for better recognition accuracy) and switches the dict/lm according to its internal context. For example: imagine there is a small command set that allows launching a new app. After the voice-control app has recognized the command 'launch', it switches its context to 'I have to launch something' and loads a new, appropriate dictionary which contains only the names of the programs that might be launched, etc.

Perhaps this could also be an approach for Saera.

I would greatly appreciate it if you could send me that code. Regarding what you said: that was more or less what I had in mind. The base dictionary would have a few words (actions like "call", "text", "launch" etc. and question words like "what", "when" etc.), and recognizing one of them would switch to the appropriate dictionary, only re-trying with the larger list if the score wasn't high enough.

This should also allow Saera to start up faster: we don't necessarily need to load the big model until Saera needs to re-try something.
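A rough sketch of that retry logic, assuming a recognizer can be treated as a callable that takes audio and returns `(hypothesis, score)`. Both that interface and the names `recognize_with_fallback` and `LazyModel` are made up for illustration; the real pipeline is event-driven and would look different.

```python
class LazyModel:
    """Load an expensive model only the first time it is actually needed."""

    def __init__(self, loader):
        self._loader = loader      # hypothetical function that builds the model
        self._model = None
        self.load_count = 0        # how often the loader really ran

    def __call__(self):
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1
        return self._model


def recognize_with_fallback(audio, small_recognizer, load_big_recognizer, min_score):
    """Try the small model first; use the big one only if the score is too low.

    Higher scores are assumed to be better, as in the score check used
    elsewhere in this thread.
    """
    hyp, score = small_recognizer(audio)
    if score >= min_score:
        return hyp, score, "small"
    big_recognizer = load_big_recognizer()   # loaded on first use only
    hyp, score = big_recognizer(audio)
    return hyp, score, "big"
```

With this shape, startup only pays for the small model; the big one is loaded the first time a low score forces a retry and is reused afterwards.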

Re: [DEVEL] Saera: Siri clone for Maemo 5

Quote:

Originally Posted by don_falcone (Post 1225304)

Let's use 'ß' in the sentences list, and we can add an option to convert them all to 'ss' if necessary.
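Such an option could be as small as the following sketch. The function name and the `use_ss` flag are hypothetical. Note that a blanket replacement is not the same thing as the 1996 orthography, which keeps 'ß' after long vowels (as in "Straße"); it simply removes the character everywhere.

```python
def convert_eszett(sentences, use_ss=False):
    """Return the sentences unchanged, or with every 'ß' replaced by 'ss'."""
    if not use_ss:
        return list(sentences)
    return [sentence.replace("ß", "ss") for sentence in sentences]


# Example with the sentence discussed above (old-school spelling as input).
old_style = ["Wußtest Du, daß das N900 Mac OSX verwenden kann?"]
new_style = convert_eszett(old_style, use_ss=True)
```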

Re: [DEVEL] Saera: Siri clone for Maemo 5

Quote:

i strongly detest the silly, unwanted (by most of the population) spelling reforms

Same here, don_falcone, but in the long run we have to accept it. I hope you don't mind my asking: are you from Bavaria?

Re: [DEVEL] Saera: Siri clone for Maemo 5

(H)o(b|pe)viously not in my lifetime :) And nope too - but check my location, it's obvious ;)

Re: [DEVEL] Saera: Siri clone for Maemo 5

taixzo, you can download the patched gstpocketsphinx plugin here.
md5sum: 84fd12b19df535177870920c29615789

Here are the appropriate saera-code snippets to make use of the scored result:

Code:

class Saera:
        def __init__(self):
                # Becomes True once a 'result_score' message has been received
                self.result_score = False

Code:

        def init_gst(self):
                """Initialize the speech components"""
                self.pipeline = gst.parse_launch('pulsesrc ! audioconvert ! audioresample '
                                                 + '! vader name=vad auto-threshold=true '
                                                 + '! pocketsphinx name=asr ! fakesink')
                asr = self.pipeline.get_by_name('asr')
                asr.connect('partial_result', self.asr_partial_result)
                asr.connect('result', self.asr_result)
                # Additional signal provided by the patched plugin
                asr.connect('result_score', self.asr_result_score)
                asr.set_property('configured', True)

Code:

        def asr_result_score(self, asr, text, score):
                """Forward result signals on the bus to the main thread."""
                struct = gst.Structure('result_score')
                struct.set_value('hyp', text)
                struct.set_value('score', score)
                asr.post_message(gst.message_new_application(asr, struct))

Code:

        def application_message(self, bus, msg):
                """Receive application messages from the bus."""
                msgtype = msg.structure.get_name()
                if msgtype == 'partial_result':
                        self.partial_result(msg.structure['hyp'], msg.structure['uttid'])
                elif msgtype == 'result_score':
                        # The scored result arrives right before the plain 'result'
                        self.result_score = True
                        self.final_result_score(msg.structure['hyp'], msg.structure['score'])
                        # self.pipeline.set_state(gst.STATE_PAUSED)
                elif msgtype == 'result' and not self.result_score:
                        # Only handled if no 'result_score' has been seen
                        self.final_result(msg.structure['hyp'], msg.structure['uttid'])

Code:

        def final_result_score(self, hyp, score):
                """Insert the final result."""
                # All this stuff appears as one single action
                print "Final Result: ", hyp, "  score: ", score
                # Only act on hypotheses above the minimum score
                if int(score) > -18500000:
                        self.run_saera(None, "speech-event", hyp)

The minimum score depends on the device (on my laptop the score is much higher than on the N900) and on the pocketsphinx settings (min frequency, max frequency, num channels etc.), so I suggest staying with the default settings. (I tried varying these settings to get better accuracy, but had no luck.)
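Since the threshold is device-dependent, one possible way to avoid hard-coding it is to calibrate it from scores observed on the device itself. The following is only a sketch of that idea: the function names and the 10% margin are assumptions, not something the plugin or Saera provides.

```python
def calibrate_min_score(correct_scores, margin=0.1):
    """Pick a threshold slightly below the worst score of known-good results.

    correct_scores: scores logged on this device for utterances that were
    recognized correctly. margin: assumed safety margin as a fraction of the
    worst score.
    """
    if not correct_scores:
        raise ValueError("need at least one score to calibrate")
    worst = min(correct_scores)
    return worst - abs(worst) * margin


def accept(score, min_score):
    """Same comparison as in final_result_score, with a configurable threshold."""
    return int(score) > min_score
```

The scores printed by `final_result_score` for a handful of correctly recognized commands would be the natural input for `calibrate_min_score`.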

BTW: How can I upload a file on talk.maemo.org?

Re: [DEVEL] Saera: Siri clone for Maemo 5

taixzo, it's absolutely wonderful and amazing what this project has evolved into in just a few days. I'm absolutely sure that You're one of the favorites for the Coding Competition with Saera, so please don't forget to apply there.

As for the issues with upload permissions, we're working hard with our technical contact to fix them. Of course, I could also lend You my garage account with upload permissions, but I think that's quite pointless; we need the whole procedure working smoothly.
---

As for the program itself: again, what amazes me most is that it's not simply a "speech recognition" program, but a first attempt to bring AI to our N900. Sure, I don't expect it to pass the Turing test soon ;) (not that the Turing test is a good measure of intelligence, anyway). I really hope that being a basic - and developed - AI won't disappear from the scope of Saera for the sake of functionality.

BTW, what's the current status of the possibility to change her name? For reasons too long to explain here, it would be very useful in my use case :)

Also, I'm quite new to this whole digitized speech thing. Even if I write, let's say, a Polish corpus.txt, how do I ensure that text written in correct Polish will be pronounced correctly?

I know You're a busy guy here, so I'm not asking for a (re)written tutorial; maybe a link to some documentation? Or is everything I need already present in the package?

/Estel

Re: [DEVEL] Saera: Siri clone for Maemo 5

Quote:

Originally Posted by Estel (Post 1226448)

According to the website that the banner links to, the submission site isn't live yet, so I added it to the table on that site.

Quote:

As for the issues with upload permissions, we're working hard with our technical contact to fix them. Of course, I could also lend You my garage account with upload permissions, but I think that's quite pointless; we need the whole procedure working smoothly.

Thank you!

Quote:

As for the program itself: again, what amazes me most is that it's not simply a "speech recognition" program, but a first attempt to bring AI to our N900. Sure, I don't expect it to pass the Turing test soon ;) (not that the Turing test is a good measure of intelligence, anyway). I really hope that being a basic - and developed - AI won't disappear from the scope of Saera for the sake of functionality.

Saera will continue to be an AI project, and I am hopeful that this will allow for increased functionality (maybe you can eventually 'teach' Saera how to do things she doesn't know how to do).

Quote:

BTW, what's the current status of the possibility to change her name? For reasons too long to explain here, it would be very useful in my use case :)

Changing the official name or changing the name after installation? In the latter case it would be fairly simple, just search and replace "Saera" in a few files.
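A search-and-replace over a list of files might look like the sketch below. Which files need changing is not stated here, so the caller has to supply the paths; the function name is hypothetical. The replacement is case-sensitive and blind, so it also renames identifiers such as a `Saera` class, which only works if every reference is in one of the listed files.

```python
from pathlib import Path


def rename_assistant(paths, new, old="Saera"):
    """Replace every occurrence of `old` with `new` in the given text files.

    Returns a dict mapping each path to the number of replacements made.
    """
    counts = {}
    for path in map(Path, paths):
        text = path.read_text(encoding="utf-8")
        counts[str(path)] = text.count(old)
        if counts[str(path)]:
            # Rewrite the file only if something actually changed.
            path.write_text(text.replace(old, new), encoding="utf-8")
    return counts
```

Working on a copy of the installation first would be prudent, since the files are overwritten in place.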

Quote:

Also, I'm quite new to this whole digitized speech thing. Even if I write, let's say, a Polish corpus.txt, how do I ensure that text written in correct Polish will be pronounced correctly?

I know You're a busy guy here, so I'm not asking for a (re)written tutorial; maybe a link to some documentation? Or is everything I need already present in the package?

/Estel

The beginning of each sentences_<language>.py file contains a line with the espeak command line, of the form

Code:

espeak_cmdline = "espeak -vCC+f2"

where CC is the two-letter language code. You can also change the 'f' to an 'm' to give Saera a male voice.
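As a rough illustration of that pattern, the command line could be assembled from a language code and a voice letter. The helper name and its parameters are hypothetical; only the `espeak -vCC+f2` form comes from the sentences files.

```python
def build_espeak_cmdline(lang, gender="f", variant=2):
    """Build a command line of the form 'espeak -vCC+f2'.

    lang: two-letter language code (e.g. 'de', 'pl').
    gender: 'f' for a female voice variant, 'm' for a male one.
    """
    if gender not in ("f", "m"):
        raise ValueError("gender must be 'f' or 'm'")
    return "espeak -v%s+%s%d" % (lang, gender, variant)


# A Polish sentences file with a male voice would then start with:
espeak_cmdline = build_espeak_cmdline("pl", gender="m")
```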
As for documentation: I haven't written any. The basic idea of the corpus is that it defines the words that Pocketsphinx will recognize and gives it a model to build a grammar from. It should recognize any sentence in the corpus with nearly 100% accuracy, but if you say something else, it will try to build it out of words and grammar constructs found in the corpus.
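To make the "defines the words" part concrete, here is a small sketch that lists the vocabulary a corpus would give the recognizer and shows which words of a new sentence fall outside it. It assumes the corpus has one sentence per line, which is an assumption about the file layout; the function names are hypothetical, and building the actual dictionary and language model is done by separate tools not shown here.

```python
import re

# One or more letters (any alphabet), optionally followed by an apostrophe part.
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")


def corpus_vocabulary(lines):
    """Collect the distinct lower-cased words of a corpus, one sentence per line."""
    words = set()
    for line in lines:
        words.update(WORD.findall(line.lower()))
    return sorted(words)


def unknown_words(sentence, vocabulary):
    """Return the words of a sentence that the corpus does not contain."""
    known = set(vocabulary)
    return [w for w in WORD.findall(sentence.lower()) if w not in known]
```

Any word reported by `unknown_words` cannot be recognized at all until a sentence containing it is added to the corpus.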