speech recognition - Help with SAPI v5.1 SpeechRecognitionEngine always gives same wrong result with C#

Question

posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

I was playing around with this SAPI v5.1 library and testing it on a sample WAV file I have. (Download it from here.) The sound in that file is clear and simple: it contains a single word, the number three. But when I run the following code, I get the number 8, or "eight". If I remove that choice, I get 7. If I shuffle the order of the list, I get different results again. I'm really getting confused and starting to think that speech recognition in the SAPI library doesn't work at all.

Here is what I'm doing:

    // Requires: using System; using System.Speech.Recognition; using System.Windows.Forms;
    private void button1_Click(object sender, EventArgs e)
    {
        //Add choices to grammar.
        Choices mychoices = new Choices();
        mychoices.Add("one");
        mychoices.Add("two");
        mychoices.Add("three");
        mychoices.Add("four");
        mychoices.Add("five");
        mychoices.Add("six");
        mychoices.Add("seven");
        mychoices.Add("eight");
        mychoices.Add("nine");
        mychoices.Add("zero");
        mychoices.Add("1");
        mychoices.Add("2");
        mychoices.Add("3");
        mychoices.Add("4");
        mychoices.Add("5");
        mychoices.Add("6");
        mychoices.Add("7");
        mychoices.Add("8");
        mychoices.Add("9");
        mychoices.Add("0");

        Grammar myGrammar = new Grammar(new GrammarBuilder(mychoices));

        //Create the engine.
        SpeechRecognitionEngine reco = new SpeechRecognitionEngine();

        //Read audio stream from wav file.
        reco.SetInputToWaveFile("3.wav");
        reco.LoadGrammar(myGrammar);

        //Get the recognized value.
        reco.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(reco_SpeechRecognized);

        reco.RecognizeAsync(RecognizeMode.Multiple);
    }

    void reco_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        MessageBox.Show(e.Result.Text);
    }


1 Reply

深蓝 · Answer 1 · 2021-10-17T00:50:17+0000

How did you create your WAV file? It looks like it has a high bitrate. There are only certain formats supported by the recognizer. Try:

8 bits per sample
single channel mono
22,050 samples per second
PCM encoding

You have about 3 seconds of audio and the file size is 520 KB. That seems too big for the supported formats.
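To make the size argument concrete: uncompressed PCM needs samples per second × channels × bytes per sample for every second of audio. The small C# sketch below (class and method names are made up for illustration) computes that for about 3 seconds in a few mono PCM formats, ignoring the 44-byte WAV header.

```csharp
using System;

// Hypothetical helper: size in bytes of raw (uncompressed) PCM audio.
// The 44-byte header of a canonical WAV file is not included.
public static class PcmSize
{
    public static long Bytes(int samplesPerSecond, int channels, int bitsPerSample, double seconds)
    {
        // bytes per second = samples/s x channels x bytes per sample
        long bytesPerSecond = (long)samplesPerSecond * channels * (bitsPerSample / 8);
        return (long)Math.Round(bytesPerSecond * seconds);
    }

    public static void PrintExamples()
    {
        const double seconds = 3.0;   // "about 3 seconds", as estimated for the question's file
        Console.WriteLine($"PCM 22050 Hz,  8-bit, mono: {Bytes(22050, 1, 8, seconds)} bytes");
        Console.WriteLine($"PCM 22050 Hz, 16-bit, mono: {Bytes(22050, 1, 16, seconds)} bytes");
        Console.WriteLine($"PCM 16000 Hz, 16-bit, mono: {Bytes(16000, 1, 16, seconds)} bytes");
    }
}
```

PrintExamples() reports 66,150, 132,300 and 96,000 bytes. Even the largest of these is only about a quarter of 520 KB.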

You can use the RecognizerInfo class to find the audio formats your recognizer supports, through its SupportedAudioFormats property (see the RecognizerInfo.SupportedAudioFormats Property documentation).
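A minimal sketch of that lookup, assuming the System.Speech assembly is referenced (it only runs on Windows; on .NET 5 and later it comes from the System.Speech NuGet package). It walks every installed recognizer and prints what each one reports; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Speech.AudioFormat;   // SpeechAudioFormatInfo
using System.Speech.Recognition;   // SpeechRecognitionEngine, RecognizerInfo

// Hypothetical helper: lists the audio formats that each installed recognizer
// reports through RecognizerInfo.SupportedAudioFormats.
public static class RecognizerFormats
{
    public static List<string> Describe()
    {
        var lines = new List<string>();
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            lines.Add($"{info.Name} ({info.Culture.Name})");
            int index = 0;
            foreach (SpeechAudioFormatInfo format in info.SupportedAudioFormats)
            {
                lines.Add($"  {index++}: {format.EncodingFormat}, " +
                          $"BitsPerSample={format.BitsPerSample}, BlockAlign={format.BlockAlign}, " +
                          $"ChannelCount={format.ChannelCount}, SamplesPerSecond={format.SamplesPerSecond}");
            }
        }
        return lines;
    }

    public static void Print()
    {
        foreach (string line in Describe())
            Console.WriteLine(line);
    }
}
```

For the engine created in the question you can also read reco.RecognizerInfo.SupportedAudioFormats directly. The result depends on the machine and on which recognizers are installed.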

Update:

Your audio file is kind of a mess. It is very noisy, and it is also in an unsupported format: Audacity reports it as stereo, 44.1 kHz, 32-bit float. I silenced the noise at the beginning and end, resampled it to 22.05 kHz, reduced it to a single (mono) channel, and then exported it as uncompressed 8-bit unsigned WAV. After that it works fine.
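If you would rather do the format part of that clean-up in code than in Audacity, the C# sketch below shows the idea. It assumes the audio has already been decoded into interleaved 32-bit float stereo samples at 44.1 kHz (decoding the original file is not shown), and the class and method names are made up. Averaging pairs of frames is only a crude low-pass filter before halving the sample rate, so a proper resampler will do better, and the noise trimming step is left out.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helpers that mimic the manual Audacity steps in code:
// stereo 44.1 kHz float -> mono -> 22.05 kHz -> 8-bit unsigned PCM WAV.
// Noise trimming is not attempted.
public static class RecognizerAudio
{
    // 'interleaved' holds L, R, L, R, ... samples in [-1, 1] at 44,100 Hz.
    public static byte[] ToMono8Bit22k(float[] interleaved)
    {
        int frames = interleaved.Length / 2;      // stereo frames (L+R pairs)
        int outCount = frames / 2;                // 44,100 Hz -> 22,050 Hz
        var output = new byte[outCount];
        for (int i = 0; i < outCount; i++)
        {
            // Average both channels of two consecutive frames: a stereo downmix plus
            // a crude two-tap low-pass filter before dropping every other frame.
            int f = 4 * i;
            double mono = (interleaved[f] + interleaved[f + 1]
                         + interleaved[f + 2] + interleaved[f + 3]) / 4.0;
            // 8-bit WAV samples are unsigned; 128 represents silence.
            int value = (int)Math.Round(mono * 127.0) + 128;
            output[i] = (byte)Math.Max(0, Math.Min(255, value));
        }
        return output;
    }

    // Writes a canonical WAV file: PCM, 1 channel, 22,050 Hz, 8 bits per sample.
    public static void WriteWav(Stream stream, byte[] samples)
    {
        const int sampleRate = 22050;
        const short channels = 1;
        const short bitsPerSample = 8;
        const short blockAlign = channels * bitsPerSample / 8;
        int pad = samples.Length % 2;             // RIFF chunks are word-aligned

        using (var w = new BinaryWriter(stream, Encoding.ASCII, true))   // true: leave stream open
        {
            w.Write(Encoding.ASCII.GetBytes("RIFF"));
            w.Write(36 + samples.Length + pad);   // RIFF chunk size
            w.Write(Encoding.ASCII.GetBytes("WAVE"));
            w.Write(Encoding.ASCII.GetBytes("fmt "));
            w.Write(16);                          // fmt chunk size for plain PCM
            w.Write((short)1);                    // format tag 1 = PCM
            w.Write(channels);
            w.Write(sampleRate);
            w.Write(sampleRate * blockAlign);     // average bytes per second
            w.Write(blockAlign);
            w.Write(bitsPerSample);
            w.Write(Encoding.ASCII.GetBytes("data"));
            w.Write(samples.Length);
            w.Write(samples);
            if (pad == 1) w.Write((byte)0);
        }
    }
}
```

For example, writing the converted samples with using (var file = File.Create("3-fixed.wav")) RecognizerAudio.WriteWav(file, pcm); (the file name is only an illustration) gives a file that can be passed to SetInputToWaveFile.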

On my Windows 7 machine, my default recognizer supports only the following audio formats:

    #  EncodingFormat  BitsPerSample  BlockAlign  ChannelCount  SamplesPerSecond
    0  Pcm             8              1           1             16000
    1  Pcm             16             2           1             16000
    2  Pcm             8              1           1             22050
    3  Pcm             16             2           1             22050
    4  ALaw            8              1           1             22050
    5  ULaw            8              1           1             22050
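To check a WAV file against a list like this without opening it in Audacity, you can read the format fields from the file's "fmt " chunk. The C# sketch below is a minimal, hypothetical helper: it hard-codes the formats listed above (one Windows 7 recognizer, so yours may differ) and uses the standard WAV format tags 1 (PCM), 6 (A-law) and 7 (mu-law).

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: reads the "fmt " chunk of a RIFF/WAVE stream and checks it
// against the formats listed above (one recognizer on one machine).
public static class WavFormatCheck
{
    public struct WavFormat
    {
        public int FormatTag;      // 1 = PCM, 3 = IEEE float, 6 = A-law, 7 = mu-law
        public int Channels;
        public int SampleRate;
        public int BitsPerSample;
    }

    public static WavFormat Read(Stream stream)
    {
        using (var r = new BinaryReader(stream, Encoding.ASCII, true))
        {
            if (ReadId(r) != "RIFF") throw new InvalidDataException("Not a RIFF file.");
            r.ReadInt32();                                   // RIFF chunk size
            if (ReadId(r) != "WAVE") throw new InvalidDataException("Not a WAVE file.");
            while (true)
            {
                string id = ReadId(r);
                int size = r.ReadInt32();
                if (id == "fmt ")
                {
                    var f = new WavFormat();
                    f.FormatTag = r.ReadUInt16();
                    f.Channels = r.ReadUInt16();
                    f.SampleRate = r.ReadInt32();
                    r.ReadInt32();                           // average bytes per second
                    r.ReadUInt16();                          // block align
                    f.BitsPerSample = r.ReadUInt16();
                    return f;
                }
                r.ReadBytes(size + (size & 1));              // skip chunk plus pad byte
            }
        }
    }

    static string ReadId(BinaryReader r)
    {
        byte[] id = r.ReadBytes(4);
        if (id.Length < 4) throw new InvalidDataException("Unexpected end of file.");
        return Encoding.ASCII.GetString(id);
    }

    // True if the format matches one of the entries in the list above.
    public static bool IsListed(WavFormat f)
    {
        if (f.Channels != 1) return false;                   // every entry is mono
        if (f.FormatTag == 1)                                // PCM: 16 or 22.05 kHz, 8 or 16 bits
            return (f.SampleRate == 16000 || f.SampleRate == 22050)
                && (f.BitsPerSample == 8 || f.BitsPerSample == 16);
        if (f.FormatTag == 6 || f.FormatTag == 7)            // A-law / mu-law
            return f.SampleRate == 22050 && f.BitsPerSample == 8;
        return false;
    }
}
```

A stereo 44.1 kHz float file like the one in the question fails this check. Files written with the extensible format tag (0xFFFE) are also reported as not listed, even when the samples inside are PCM; that is a limitation of this sketch.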

You should also remove the numeric choices from the grammar. Right now the recognizer returns two alternates: "three" and "3". This probably isn't what you want. You could use a semantic result value in your grammar to return the number 3 for the word "three".
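A sketch of the semantic-value approach with System.Speech's SemanticResultValue and SemanticResultKey classes, keeping only the word choices from the question. The class name, method names and the "digit" key are made up for illustration; you would load it with reco.LoadGrammar(DigitGrammar.Build()) in place of the question's myGrammar.

```csharp
using System;
using System.Speech.Recognition;

// Hypothetical helper: a digit grammar where each spoken word carries its number
// as a semantic value, so separate "3"-style choices are not needed.
public static class DigitGrammar
{
    static readonly string[] Words =
        { "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine" };

    public static Grammar Build()
    {
        var digits = new Choices();
        for (int i = 0; i < Words.Length; i++)
        {
            // "three" is recognized as text, but its semantic value is the int 3.
            digits.Add(new SemanticResultValue(Words[i], i).ToGrammarBuilder());
        }

        // "digit" is an arbitrary key name used to look the value up later.
        var key = new SemanticResultKey("digit", digits.ToGrammarBuilder());
        return new Grammar(new GrammarBuilder(key)) { Name = "digits" };
    }

    // Attach with: reco.SpeechRecognized += DigitGrammar.OnRecognized;
    public static void OnRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        int digit = Convert.ToInt32(e.Result.Semantics["digit"].Value);
        Console.WriteLine($"Heard \"{e.Result.Text}\", value {digit}");
    }
}
```

With this grammar the recognizer has only one alternate per spoken word, and the handler gets the number from the semantics instead of parsing e.Result.Text.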
