It's one thing to match pre-recorded samples against microphone input, and an entirely different matter to do complete speech-to-text recognition. Take Google's own recognition system as an example: even with all the processing power their servers have, it can't achieve more than about 70% accuracy, and that is when the speaker speaks very, very clearly. If the speaker doesn't speak all that clearly, if there is any kind of background noise, or if the person speaks a dialect or has an accent, the accuracy of recognition drops sharply. Then there's the issue of the N900 being a small device with a limited microphone: it doesn't have enough processing power to do accurate recognition, and the microphone would pick up sufficiently clear input only when you speak very close to it.