Streaming input to System.Speech.Recognition.SpeechRecognitionEngine

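I am trying to do "streaming" speech recognition in C# from a TCP socket. The problem is that SpeechRecognitionEngine.SetInputToAudioStream() seems to require a seekable Stream of a defined length. Right now the only way I can think of to make this work is to run the recognizer repeatedly on a MemoryStream as more input comes in.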

Here is some code to illustrate:

    SpeechRecognitionEngine appRecognizer = new SpeechRecognitionEngine();
    System.Speech.AudioFormat.SpeechAudioFormatInfo formatInfo =
        new System.Speech.AudioFormat.SpeechAudioFormatInfo(
            8000,
            System.Speech.AudioFormat.AudioBitsPerSample.Sixteen,
            System.Speech.AudioFormat.AudioChannel.Mono);
    NetworkStream stream = new NetworkStream(socket, true);
    appRecognizer.SetInputToAudioStream(stream, formatInfo);
    // The line above throws a NotSupportedException complaining that
    // "This stream does not support seek operations."

Does anyone know how to get around this? The engine must support streaming input of some sort, since it works fine with the microphone via SetInputToDefaultAudioDevice().

Thanks, Sean

I got real-time speech recognition working by overriding the Stream class:

    using System.Collections.Generic;
    using System.IO;
    using System.Threading;

    class SpeechStreamer : Stream
    {
        private AutoResetEvent _writeEvent;
        private List<byte> _buffer;
        private int _buffersize;
        private int _readposition;
        private int _writeposition;
        private bool _reset; // true while the writer has wrapped around and the reader has not

        public SpeechStreamer(int bufferSize)
        {
            _writeEvent = new AutoResetEvent(false);
            _buffersize = bufferSize;
            _buffer = new List<byte>(_buffersize);
            for (int i = 0; i < _buffersize; i++)
                _buffer.Add(new byte());
            _readposition = 0;
            _writeposition = 0;
        }

        public override bool CanRead { get { return true; } }
        public override bool CanSeek { get { return false; } }
        public override bool CanWrite { get { return true; } }
        public override long Length { get { return -1L; } }
        public override long Position { get { return 0L; } set { } }

        public override long Seek(long offset, SeekOrigin origin) { return 0L; }
        public override void SetLength(long value) { }

        public override int Read(byte[] buffer, int offset, int count)
        {
            int i = 0;
            while (i < count && _writeEvent != null)
            {
                // Nothing new to read yet: wait for the writer instead of returning short.
                if (!_reset && _readposition >= _writeposition)
                {
                    _writeEvent.WaitOne(100, true);
                    continue;
                }
                // offset applies to the destination array, not to the ring buffer.
                buffer[offset + i] = _buffer[_readposition];
                _readposition++;
                if (_readposition == _buffersize)
                {
                    _readposition = 0;
                    _reset = false;
                }
                i++;
            }
            return count;
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            for (int i = offset; i < offset + count; i++)
            {
                _buffer[_writeposition] = buffer[i];
                _writeposition++;
                if (_writeposition == _buffersize)
                {
                    _writeposition = 0;
                    _reset = true;
                }
            }
            _writeEvent.Set(); // wake up a reader blocked in Read
        }

        public override void Close()
        {
            _writeEvent.Close();
            _writeEvent = null;
            base.Close();
        }

        public override void Flush() { }
    }

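...and used an instance of it as the stream input to the SetInputToAudioStream method. As soon as the stream returns a length, or a read returns fewer bytes than requested, the recognition engine considers the input finished. The class above sets up a circular buffer whose reads never finish short.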
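Something still has to feed the circular buffer from the socket. The following is only a minimal sketch of that step, not part of the original answer: `AudioPump` and `Pump` are hypothetical names, and the 3200-byte default chunk is an arbitrary choice (0.2 seconds of 8 kHz 16-bit mono audio).

```csharp
using System;
using System.IO;

static class AudioPump
{
    // Copies source into sink chunk by chunk and returns the number of bytes copied.
    // Stops when source.Read returns 0, which is how a Stream signals end of input.
    public static long Pump(Stream source, Stream sink, int chunkSize = 3200)
    {
        var chunk = new byte[chunkSize];
        long total = 0;
        int read;
        while ((read = source.Read(chunk, 0, chunk.Length)) > 0)
        {
            // With a SpeechStreamer as the sink, Write also wakes a blocked reader.
            sink.Write(chunk, 0, read);
            total += read;
        }
        return total;
    }
}
```

In practice this would presumably run on its own thread with the NetworkStream as `source` and the SpeechStreamer as `sink`, while the recognizer reads from the same SpeechStreamer. Note that SpeechStreamer.Write, as written, does not check whether it overtakes the reader, so the buffer should be sized generously.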

Have you tried wrapping the network stream in a System.IO.BufferedStream?

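    NetworkStream netStream = new NetworkStream(socket, true);
    // Buffers 1 second of audio: 8000 samples/s * 2 bytes per 16-bit sample * 1 channel.
    // (The buffer size is in bytes; 8000*16*1 would be the bit rate, i.e. 8 seconds' worth.)
    BufferedStream buffStream = new BufferedStream(netStream, 8000 * 2 * 1);
    appRecognizer.SetInputToAudioStream(buffStream, formatInfo);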
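One thing worth checking before relying on this: BufferedStream reports the seek capability of the stream it wraps. The sketch below illustrates that with a hypothetical `ForwardOnlyStream` standing in for NetworkStream (both class names are made up for the example). Whether the recognizer then accepts the wrapped stream has not been verified here.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for NetworkStream: readable, but not seekable.
class ForwardOnlyStream : Stream
{
    private readonly Stream _inner;
    public ForwardOnlyStream(Stream inner) { _inner = inner; }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return _inner.Read(buffer, offset, count);
    }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override void Flush() { }
}

static class SeekCheck
{
    // Wraps s in a BufferedStream and returns what the wrapper reports for CanSeek.
    public static bool WrappedCanSeek(Stream s)
    {
        var buffered = new BufferedStream(s, 16000);
        return buffered.CanSeek;
    }
}
```

`SeekCheck.WrappedCanSeek(new ForwardOnlyStream(new MemoryStream()))` returns false, while passing a plain MemoryStream returns true, so buffering alone does not add seek support.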

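I ended up buffering the input and then sending it to the speech recognition engine in successively larger chunks. For example, I might send the first 0.25 seconds, then the first 0.5 seconds, then the first 0.75 seconds, and so on until I get a result. I am not sure this is the most efficient way to go about it, but it gives me satisfactory results.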

Best of luck, Sean
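The growing-prefix schedule described above can be sketched as follows. This is an illustration rather than the code actually used: `PrefixSchedule`, `BytesPerSecond`, and `PrefixLengths` are hypothetical names, and the audio is assumed to be raw PCM in the format from the question (8000 Hz, 16-bit, mono).

```csharp
using System;
using System.Collections.Generic;

static class PrefixSchedule
{
    // Bytes per second of raw PCM: samples per second * bytes per sample * channels.
    public static int BytesPerSecond(int sampleRate, int bitsPerSample, int channels)
    {
        return sampleRate * (bitsPerSample / 8) * channels;
    }

    // Yields prefix lengths in bytes that grow by stepSeconds each time,
    // finishing with the full length of the audio buffered so far.
    public static IEnumerable<int> PrefixLengths(int bufferedBytes, int bytesPerSecond, double stepSeconds)
    {
        int step = (int)(bytesPerSecond * stepSeconds);
        if (step <= 0) throw new ArgumentOutOfRangeException("stepSeconds");
        for (int len = step; len < bufferedBytes; len += step)
            yield return len;
        if (bufferedBytes > 0)
            yield return bufferedBytes;
    }
}
```

With 10000 buffered bytes and a 0.25 second step at 16000 bytes per second, the prefixes are 4000, 8000, and 10000 bytes. Each prefix could then be wrapped as `new MemoryStream(audio, 0, len)`, passed to SetInputToAudioStream, and recognized, stopping at the first prefix that yields a result.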

Apparently it cannot be done ("by design"!). See http://social.msdn.microsoft.com/Forums/en/netfxbcl/thread/fcf62d6d-19df-4ca9-9f1f-17724441f84e

Here is my solution.

    using System;
    using System.IO;
    using System.Net.Sockets;

    class FakeStreamer : Stream
    {
        public bool bExit = false;
        Stream stream;
        TcpClient client;

        public FakeStreamer(TcpClient client)
        {
            this.client = client;
            this.stream = client.GetStream();
            this.stream.ReadTimeout = 100; // 100 ms
        }

        public override bool CanRead { get { return stream.CanRead; } }
        public override bool CanSeek { get { return false; } }
        public override bool CanWrite { get { return stream.CanWrite; } }
        public override long Length { get { return -1L; } }
        public override long Position { get { return 0L; } set { } }

        public override long Seek(long offset, SeekOrigin origin) { return 0L; }
        public override void SetLength(long value) { stream.SetLength(value); }

        public override int Read(byte[] buffer, int offset, int count)
        {
            int len = 0, c = count;
            // Keep reading until the request is filled, so the recognizer
            // never sees a short read while the connection is alive.
            while (c > 0 && !bExit)
            {
                try
                {
                    len = stream.Read(buffer, offset, c);
                }
                catch (Exception e)
                {
                    if (e.HResult == -2146232800) // IOException HResult; treated as a read timeout
                    {
                        continue;
                    }
                    else
                    {
                        break; // exit read loop
                    }
                }
                if (!client.Connected || len == 0)
                {
                    break; // connection closed: exit read loop
                }
                offset += len;
                c -= len;
            }
            // Report the bytes actually read; a short count tells the recognizer the input has ended.
            return count - c;
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            stream.Write(buffer, offset, count);
        }

        public override void Close()
        {
            stream.Close();
            base.Close();
        }

        public override void Flush()
        {
            stream.Flush();
        }
    }

How to use it:

    // Client connects
    TcpClient clientSocket = ServerSocket.AcceptTcpClient();
    FakeStreamer buffStream = new FakeStreamer(clientSocket);
    ...
    // Recognizer init
    m_recognizer.SetInputToAudioStream(buffStream, audioFormat);
    ...
    // Recognizer end
    if (buffStream != null)
        buffStream.bExit = true;
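A note on the HResult comparison in FakeStreamer.Read: -2146232800 is 0x80131620, the general HResult carried by IOException, so that check matches any IOException and not only timeouts. A narrower test might look like the sketch below. It assumes that a NetworkStream read timeout surfaces as an IOException whose InnerException is a SocketException with SocketErrorCode set to TimedOut; `SocketErrors` and `IsReadTimeout` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

static class SocketErrors
{
    // True when e is an IOException wrapping a SocketException
    // whose SocketErrorCode is SocketError.TimedOut.
    public static bool IsReadTimeout(Exception e)
    {
        var io = e as IOException;
        if (io == null) return false;
        var sock = io.InnerException as SocketException;
        return sock != null && sock.SocketErrorCode == SocketError.TimedOut;
    }
}
```

The catch block in Read could then call `SocketErrors.IsReadTimeout(e)` in place of the numeric comparison, so that other I/O failures end the read loop instead of being retried.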