
ios - Continuous speech recognition with SFSpeechRecognizer (iOS 10 beta)

I am trying to perform continuous speech recognition using AVCapture on iOS 10 beta. I have set up captureOutput(...) to continuously receive CMSampleBuffers, and I put these buffers directly into an SFSpeechAudioBufferRecognitionRequest, which I set up previously like this:

// ... do some setup
SFSpeechRecognizer.requestAuthorization { authStatus in
    if authStatus == SFSpeechRecognizerAuthorizationStatus.authorized {
        self.m_recognizer = SFSpeechRecognizer()
        self.m_recognRequest = SFSpeechAudioBufferRecognitionRequest()
        self.m_recognRequest?.shouldReportPartialResults = false
        self.m_isRecording = true
    } else {
        print("not authorized")
    }
}
// ... do further setup


func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
    if !m_AV_initialized {
        print("captureOutput(...): not initialized !")
        return
    }
    if !m_isRecording {
        return
    }

    let formatDesc = CMSampleBufferGetFormatDescription(sampleBuffer)
    let mediaType = CMFormatDescriptionGetMediaType(formatDesc!)
    if mediaType == kCMMediaType_Audio {
        // process audio here
        m_recognRequest?.appendAudioSampleBuffer(sampleBuffer)
    }
}

The whole thing works for a few seconds; then captureOutput is not called anymore. If I comment out the appendAudioSampleBuffer(sampleBuffer) line, captureOutput is called for as long as the app runs (as expected). Apparently, handing the sample buffers to the speech recognition request somehow blocks further capture. My guess is that the pool of available buffers gets consumed after some time and the capture process stalls because it can't get any more buffers?

I should mention that everything recorded during the first 2 seconds is recognized correctly. I just don't know exactly how the SFSpeech API works, since Apple did not put any text into the beta docs. BTW: how is SFSpeechAudioBufferRecognitionRequest.endAudio() supposed to be used?
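My best guess (untested) is that it marks the end of the audio stream once I stop appending buffers, roughly like the sketch below; stopRecording is just a hypothetical helper around the members shown above:

func stopRecording() {
    m_isRecording = false
    // presumably tells the recognizer that no further buffers will arrive,
    // so it can finalize the pending recognition result
    m_recognRequest?.endAudio()
}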

Does anybody know something about this?

Thanks, Chris



1 Reply


I converted the SpeakToMe sample Swift code from the Speech Recognition WWDC developer talk to Objective-C, and it worked for me. For Swift, see https://developer.apple.com/videos/play/wwdc2016/509/, or for Objective-C see below.

- (void)viewDidAppear:(BOOL)animated {
    [super viewDidAppear:animated];

    _recognizer = [[SFSpeechRecognizer alloc] initWithLocale:[NSLocale localeWithLocaleIdentifier:@"en-US"]];
    [_recognizer setDelegate:self];

    [SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus authStatus) {
        switch (authStatus) {
            case SFSpeechRecognizerAuthorizationStatusAuthorized:
                // User gave access to speech recognition
                NSLog(@"Authorized");
                break;

            case SFSpeechRecognizerAuthorizationStatusDenied:
                // User denied access to speech recognition
                NSLog(@"SFSpeechRecognizerAuthorizationStatusDenied");
                break;

            case SFSpeechRecognizerAuthorizationStatusRestricted:
                // Speech recognition restricted on this device
                NSLog(@"SFSpeechRecognizerAuthorizationStatusRestricted");
                break;

            case SFSpeechRecognizerAuthorizationStatusNotDetermined:
                // Speech recognition not yet authorized
                break;

            default:
                NSLog(@"Default");
                break;
        }
    }];

    audioEngine = [[AVAudioEngine alloc] init];
    _speechSynthesizer = [[AVSpeechSynthesizer alloc] init];
    [_speechSynthesizer setDelegate:self];
}
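One detail that is easy to miss on iOS 10: speech recognition requires an NSSpeechRecognitionUsageDescription entry in Info.plist, and recording from the microphone additionally requires NSMicrophoneUsageDescription; without them the authorization flow will not succeed. A minimal guard before starting a recording, shown in Swift for comparison with the question's code (the commented startRecording() call is a placeholder for your own start method):

import Foundation
import Speech

SFSpeechRecognizer.requestAuthorization { status in
    // The callback may arrive on a background queue, so hop to main
    // before touching UI or starting the audio engine.
    DispatchQueue.main.async {
        guard status == .authorized else {
            print("Speech recognition not authorized: \(status)")
            return
        }
        // startRecording()   // placeholder for your own start method
    }
}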


- (void)startRecording
{
    [self clearLogs:nil];

    NSError *outError = nil;

    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    [audioSession setCategory:AVAudioSessionCategoryRecord error:&outError];
    [audioSession setMode:AVAudioSessionModeMeasurement error:&outError];
    [audioSession setActive:true withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&outError];

    request2 = [[SFSpeechAudioBufferRecognitionRequest alloc] init];

    inputNode = [audioEngine inputNode];

    if (request2 == nil) {
        NSLog(@"Unable to create a SFSpeechAudioBufferRecognitionRequest object");
    }

    if (inputNode == nil) {
        NSLog(@"Unable to create an inputNode object");
    }

    request2.shouldReportPartialResults = true;

    _currentTask = [_recognizer recognitionTaskWithRequest:request2
                                                  delegate:self];

    [inputNode installTapOnBus:0 bufferSize:4096 format:[inputNode outputFormatForBus:0] block:^(AVAudioPCMBuffer *buffer, AVAudioTime *when) {
        NSLog(@"Block tap!");

        [request2 appendAudioPCMBuffer:buffer];
    }];

    [audioEngine prepare];
    [audioEngine startAndReturnError:&outError];
    NSLog(@"Error %@", outError);
}

- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition:(SFSpeechRecognitionResult *)result {

    NSLog(@"speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition");

    NSString *translatedString = [[[result bestTranscription] formattedString] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

    [self log:translatedString];

    if ([result isFinal]) {
        [audioEngine stop];
        [inputNode removeTapOnBus:0];
        _currentTask = nil;
        request2 = nil;
    }
}
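For comparison with the question's Swift code, the same AVAudioEngine tap pattern looks roughly like the sketch below. This is a sketch, not the WWDC sample verbatim: startContinuousRecognition() is a made-up name, authorization is assumed to have been granted already, and error handling is kept minimal.

import AVFoundation
import Speech

let audioEngine = AVAudioEngine()
let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
var request: SFSpeechAudioBufferRecognitionRequest?
var task: SFSpeechRecognitionTask?

func startContinuousRecognition() throws {
    let recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    recognitionRequest.shouldReportPartialResults = true
    request = recognitionRequest

    // Feed microphone buffers to the recognition request via an input-node tap
    // instead of going through AVCaptureSession / CMSampleBuffers.
    let inputNode = audioEngine.inputNode
    let format = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 4096, format: format) { buffer, _ in
        recognitionRequest.append(buffer)
    }

    task = recognizer?.recognitionTask(with: recognitionRequest) { result, error in
        if let result = result {
            print(result.bestTranscription.formattedString)
        }
        // Tear down when the recognizer reports a final result or an error.
        if error != nil || result?.isFinal == true {
            audioEngine.stop()
            inputNode.removeTap(onBus: 0)
        }
    }

    audioEngine.prepare()
    try audioEngine.start()
}

To stop from your own code (e.g. a button action), stop the engine, remove the tap, and call request?.endAudio() so the recognizer knows no more audio is coming and can deliver its final result; that is, as far as I can tell, what endAudio() from the question is for.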
