“Chunked”MemoryStream

我正在寻找MemoryStream的实现,它不会将内存分配为一个大块,而是一个块的集合。 我想在内存(64位)中存储几GB的数据,并避免内存碎片的限制。

您需要首先确定虚拟地址碎片是否是问题。

如果你是64位机器(你似乎表明你是),我非常怀疑它。 每个64位进程几乎都有可用的整个64位虚拟内存空间,您唯一担心的是虚拟地址空间碎片而非物理内存碎片(这是操作系统必须担心的)。 操作系统内存管理器已经在内存中寻找内存。 对于可预见的未来,在物理内存耗尽之前,您不会耗尽虚拟地址空间。 在我们退休之前,这不太可能发生变化。

如果您有32位地址空间,然后在GB ramge中分配连续的大块内存,您将很快遇到碎片问题。 CLR中没有库存块分配内存流。 ASP.NET中有一个(由于其他原因),但它不可访问。 如果你必须走这条路,你可能最好自己写一个,因为你的应用程序的使用模式不太可能与许多其他类似,并且试图将你的数据放入32位地址空间可能是你的性能瓶颈。

如果您正在操作GB数据,我强烈建议您需要64位进程。 无论你是多么敏锐,它都会比32位地址空间碎片的手动解决方案做得更好。

像这样的东西:

class ChunkedMemoryStream : Stream { private readonly List _chunks = new List(); private int _positionChunk; private int _positionOffset; private long _position; public override bool CanRead { get { return true; } } public override bool CanSeek { get { return true; } } public override bool CanWrite { get { return true; } } public override void Flush() { } public override long Length { get { return _chunks.Sum(c => c.Length); } } public override long Position { get { return _position; } set { _position = value; _positionChunk = 0; while (_positionOffset != 0) { if (_positionChunk >= _chunks.Count) throw new OverflowException(); if (_positionOffset < _chunks[_positionChunk].Length) return; _positionOffset -= _chunks[_positionChunk].Length; _positionChunk++; } } } public override int Read(byte[] buffer, int offset, int count) { int result = 0; while ((count != 0) && (_positionChunk != _chunks.Count)) { int fromChunk = Math.Min(count, _chunks[_positionChunk].Length - _positionOffset); if (fromChunk != 0) { Array.Copy(_chunks[_positionChunk], _positionOffset, buffer, offset, fromChunk); offset += fromChunk; count -= fromChunk; result += fromChunk; _position += fromChunk; } _positionOffset = 0; _positionChunk++; } return result; } public override long Seek(long offset, SeekOrigin origin) { long newPos = 0; switch (origin) { case SeekOrigin.Begin: newPos = offset; break; case SeekOrigin.Current: newPos = Position + offset; break; case SeekOrigin.End: newPos = Length - offset; break; } Position = Math.Max(0, Math.Min(newPos, Length)); return newPos; } public override void SetLength(long value) { throw new NotImplementedException(); } public override void Write(byte[] buffer, int offset, int count) { while ((count != 0) && (_positionChunk != _chunks.Count)) { int toChunk = Math.Min(count, _chunks[_positionChunk].Length - _positionOffset); if (toChunk != 0) { Array.Copy(buffer, offset, _chunks[_positionChunk], _positionOffset, toChunk); offset += toChunk; count -= toChunk; _position += toChunk; } _positionOffset = 0; _positionChunk++; } if (count != 0) { byte[] chunk = new byte[count]; Array.Copy(buffer, offset, chunk, 0, count); _chunks.Add(chunk); _positionChunk = _chunks.Count; _position += count; } } } class Program { static void Main(string[] args) { ChunkedMemoryStream cms = new ChunkedMemoryStream(); Debug.Assert(cms.Length == 0); Debug.Assert(cms.Position == 0); cms.Position = 0; byte[] helloworld = Encoding.UTF8.GetBytes("hello world"); cms.Write(helloworld, 0, 3); cms.Write(helloworld, 3, 3); cms.Write(helloworld, 6, 5); Debug.Assert(cms.Length == 11); Debug.Assert(cms.Position == 11); cms.Position = 0; byte[] b = new byte[20]; cms.Read(b, 3, (int)cms.Length); Debug.Assert(b.Skip(3).Take(11).SequenceEqual(helloworld)); cms.Position = 0; cms.Write(Encoding.UTF8.GetBytes("seeya"), 0, 5); Debug.Assert(cms.Length == 11); Debug.Assert(cms.Position == 5); cms.Position = 0; cms.Read(b, 0, (byte) cms.Length); Debug.Assert(b.Take(11).SequenceEqual(Encoding.UTF8.GetBytes("seeya world"))); Debug.Assert(cms.Length == 11); Debug.Assert(cms.Position == 11); cms.Write(Encoding.UTF8.GetBytes(" again"), 0, 6); Debug.Assert(cms.Length == 17); Debug.Assert(cms.Position == 17); cms.Position = 0; cms.Read(b, 0, (byte)cms.Length); Debug.Assert(b.Take(17).SequenceEqual(Encoding.UTF8.GetBytes("seeya world again"))); } } 

Bing团队发布了RecyclableMemoryStream并在此处写了相关内容。 他们引用的好处是:

  1. 使用池缓冲区消除大对象堆分配
  2. 由于GC导致停顿时间少得多,因此产生的二代GC数量要少得多
  3. 通过限制池大小避免内存泄漏
  4. 避免内存碎片
  5. 提供出色的可调试性
  6. 提供绩效跟踪指标

我在我的应用程序中发现了类似的问题。 我已经阅读了大量压缩数据,并且使用MemoryStream遭遇了OutOfMemoryException。 我根据字节数组的集合编写了自己的“chunked”内存流实现。 如果你有任何想法如何使这个内存流更有效,请写信告诉我。

  public sealed class ChunkedMemoryStream : Stream { #region Constants private const int BUFFER_LENGTH = 65536; private const byte ONE = 1; private const byte ZERO = 0; #endregion #region Readonly & Static Fields private readonly Collection _chunks; #endregion #region Fields private long _length; private long _position; private const byte TWO = 2; #endregion #region C'tors public ChunkedMemoryStream() { _chunks = new Collection { new byte[BUFFER_LENGTH], new byte[BUFFER_LENGTH] }; _position = ZERO; _length = ZERO; } #endregion #region Instance Properties public override bool CanRead { get { return true; } } public override bool CanSeek { get { return true; } } public override bool CanWrite { get { return true; } } public override long Length { get { return _length; } } public override long Position { get { return _position; } set { if (!CanSeek) throw new NotSupportedException(); _position = value; if (_position > _length) _position = _length - ONE; } } private byte[] CurrentChunk { get { long positionDividedByBufferLength = _position / BUFFER_LENGTH; var chunkIndex = Convert.ToInt32(positionDividedByBufferLength); byte[] chunk = _chunks[chunkIndex]; return chunk; } } private int PositionInChunk { get { int positionInChunk = Convert.ToInt32(_position % BUFFER_LENGTH); return positionInChunk; } } private int RemainingBytesInCurrentChunk { get { Contract.Ensures(Contract.Result() > ZERO); int remainingBytesInCurrentChunk = CurrentChunk.Length - PositionInChunk; return remainingBytesInCurrentChunk; } } #endregion #region Instance Methods public override void Flush() { } public override int Read(byte[] buffer, int offset, int count) { if (offset + count > buffer.Length) throw new ArgumentException(); if (buffer == null) throw new ArgumentNullException(); if (offset < ZERO || count < ZERO) throw new ArgumentOutOfRangeException(); if (!CanRead) throw new NotSupportedException(); int bytesToRead = count; if (_length - _position < bytesToRead) bytesToRead = Convert.ToInt32(_length - _position); int bytesreaded = 0; while (bytesToRead > ZERO) { // get remaining bytes in current chunk // read bytes in current chunk // advance to next position int remainingBytesInCurrentChunk = RemainingBytesInCurrentChunk; if (remainingBytesInCurrentChunk > bytesToRead) remainingBytesInCurrentChunk = bytesToRead; Array.Copy(CurrentChunk, PositionInChunk, buffer, offset, remainingBytesInCurrentChunk); //move position in source _position += remainingBytesInCurrentChunk; //move position in target offset += remainingBytesInCurrentChunk; //bytesToRead is smaller bytesToRead -= remainingBytesInCurrentChunk; //count readed bytes; bytesreaded += remainingBytesInCurrentChunk; } return bytesreaded; } public override long Seek(long offset, SeekOrigin origin) { switch (origin) { case SeekOrigin.Begin: Position = offset; break; case SeekOrigin.Current: Position += offset; break; case SeekOrigin.End: Position = Length + offset; break; } return Position; } private long Capacity { get { int numberOfChunks = _chunks.Count; long capacity = numberOfChunks * BUFFER_LENGTH; return capacity; } } public override void SetLength(long value) { if (value > _length) { while (value > Capacity) { var item = new byte[BUFFER_LENGTH]; _chunks.Add(item); } } else if (value < _length) { var decimalValue = Convert.ToDecimal(value); var valueToBeCompared = decimalValue % BUFFER_LENGTH == ZERO ? Capacity : Capacity - BUFFER_LENGTH; //remove data chunks, but leave at least two chunks while (value < valueToBeCompared && _chunks.Count > TWO) { byte[] lastChunk = _chunks.Last(); _chunks.Remove(lastChunk); } } _length = value; if (_position > _length - ONE) _position = _length == 0 ? ZERO : _length - ONE; } public override void Write(byte[] buffer, int offset, int count) { if (!CanWrite) throw new NotSupportedException(); int bytesToWrite = count; while (bytesToWrite > ZERO) { //get remaining space in current chunk int remainingBytesInCurrentChunk = RemainingBytesInCurrentChunk; //if count of bytes to be written is fewer than remaining if (remainingBytesInCurrentChunk > bytesToWrite) remainingBytesInCurrentChunk = bytesToWrite; //if remaining bytes is still greater than zero if (remainingBytesInCurrentChunk > ZERO) { //write remaining bytes to current Chunk Array.Copy(buffer, offset, CurrentChunk, PositionInChunk, remainingBytesInCurrentChunk); //change offset of source array offset += remainingBytesInCurrentChunk; //change bytes to write bytesToWrite -= remainingBytesInCurrentChunk; //change length and position _length += remainingBytesInCurrentChunk; _position += remainingBytesInCurrentChunk; } if (Capacity == _position) _chunks.Add(new byte[BUFFER_LENGTH]); } } ///  /// Gets entire content of stream regardless of Position value and return output as byte array ///  /// byte array public byte[] ToArray() { var outputArray = new byte[Length]; if (outputArray.Length != ZERO) { long outputPosition = ZERO; foreach (byte[] chunk in _chunks) { var remainingLength = (Length - outputPosition) > chunk.Length ? chunk.Length : Length - outputPosition; Array.Copy(chunk, ZERO, outputArray, outputPosition, remainingLength); outputPosition = outputPosition + remainingLength; } } return outputArray; } ///  /// Method set Position to first element and write entire stream to another ///  /// Target stream public void WriteTo(Stream stream) { Contract.Requires(stream != null); Position = ZERO; var buffer = new byte[BUFFER_LENGTH]; int bytesReaded; do { bytesReaded = Read(buffer, ZERO, BUFFER_LENGTH); stream.Write(buffer, ZERO, bytesReaded); } while (bytesReaded > ZERO); } #endregion } 

这是一个完整的实现:

 ///  /// Defines a MemoryStream that does not sit on the Large Object Heap, thus avoiding memory fragmentation. ///  public sealed class ChunkedMemoryStream : Stream { ///  /// Defines the default chunk size. Currently defined as 0x10000. ///  public const int DefaultChunkSize = 0x10000; // needs to be < 85000 private List _chunks = new List(); private long _position; private int _chunkSize; private int _lastChunkPos; private int _lastChunkPosIndex; ///  /// Initializes a new instance of the  class. ///  public ChunkedMemoryStream() : this(DefaultChunkSize) { } ///  /// Initializes a new instance of the  class. ///  /// Size of the underlying chunks. public ChunkedMemoryStream(int chunkSize) : this(null) { } ///  /// Initializes a new instance of the  class based on the specified byte array. ///  /// The array of unsigned bytes from which to create the current stream. public ChunkedMemoryStream(byte[] buffer) : this(DefaultChunkSize, buffer) { } ///  /// Initializes a new instance of the  class based on the specified byte array. ///  /// Size of the underlying chunks. /// The array of unsigned bytes from which to create the current stream. public ChunkedMemoryStream(int chunkSize, byte[] buffer) { FreeOnDispose = true; ChunkSize = chunkSize; _chunks.Add(new byte[chunkSize]); if (buffer != null) { Write(buffer, 0, buffer.Length); Position = 0; } } ///  /// Gets or sets a value indicating whether to free the underlying chunks on dispose. ///  /// true if [free on dispose]; otherwise, false. public bool FreeOnDispose { get; set; } ///  /// Releases the unmanaged resources used by the  and optionally releases the managed resources. ///  /// true to release both managed and unmanaged resources; false to release only unmanaged resources. protected override void Dispose(bool disposing) { if (FreeOnDispose) { if (_chunks != null) { _chunks = null; _chunkSize = 0; _position = 0; } } base.Dispose(disposing); } ///  /// When overridden in a derived class, clears all buffers for this stream and causes any buffered data to be written to the underlying device. /// This implementation does nothing. ///  public override void Flush() { // do nothing } ///  /// When overridden in a derived class, reads a sequence of bytes from the current stream and advances the position within the stream by the number of bytes read. ///  /// An array of bytes. When this method returns, the buffer contains the specified byte array with the values between  and ( +  - 1) replaced by the bytes read from the current source. /// The zero-based byte offset in  at which to begin storing the data read from the current stream. /// The maximum number of bytes to be read from the current stream. ///  /// The total number of bytes read into the buffer. This can be less than the number of bytes requested if that many bytes are not currently available, or zero (0) if the end of the stream has been reached. ///  ///  /// The sum of  and  is larger than the buffer length. ///  ///  ///  is null. ///  ///  ///  or  is negative. ///  ///  /// Methods were called after the stream was closed. ///  public override int Read(byte[] buffer, int offset, int count) { if (buffer == null) throw new ArgumentNullException("buffer"); if (offset < 0) throw new ArgumentOutOfRangeException("offset"); if (count < 0) throw new ArgumentOutOfRangeException("count"); if ((buffer.Length - offset) < count) throw new ArgumentException(null, "count"); CheckDisposed(); int chunkIndex = (int)(_position / ChunkSize); if (chunkIndex == _chunks.Count) return 0; int chunkPos = (int)(_position % ChunkSize); count = (int)Math.Min(count, Length - _position); if (count == 0) return 0; int left = count; int inOffset = offset; int total = 0; do { int toCopy = Math.Min(left, ChunkSize - chunkPos); Buffer.BlockCopy(_chunks[chunkIndex], chunkPos, buffer, inOffset, toCopy); inOffset += toCopy; left -= toCopy; total += toCopy; if ((chunkPos + toCopy) == ChunkSize) { if (chunkIndex == (_chunks.Count - 1)) { // last chunk break; } chunkPos = 0; chunkIndex++; } else { chunkPos += toCopy; } } while (left > 0); _position += total; return total; } ///  /// Reads a byte from the stream and advances the position within the stream by one byte, or returns -1 if at the end of the stream. ///  ///  /// The unsigned byte cast to an Int32, or -1 if at the end of the stream. ///  ///  /// Methods were called after the stream was closed. ///  public override int ReadByte() { CheckDisposed(); if (_position >= Length) return -1; byte b = _chunks[(int)(_position / ChunkSize)][_position % ChunkSize]; _position++; return b; } ///  /// When overridden in a derived class, sets the position within the current stream. ///  /// A byte offset relative to the  parameter. /// A value of type  indicating the reference point used to obtain the new position. ///  /// The new position within the current stream. ///  ///  /// Methods were called after the stream was closed. ///  public override long Seek(long offset, SeekOrigin origin) { CheckDisposed(); switch (origin) { case SeekOrigin.Begin: Position = offset; break; case SeekOrigin.Current: Position += offset; break; case SeekOrigin.End: Position = Length + offset; break; } return Position; } private void CheckDisposed() { if (_chunks == null) throw new ObjectDisposedException(null, "Cannot access a disposed stream"); } ///  /// When overridden in a derived class, sets the length of the current stream. ///  /// The desired length of the current stream in bytes. ///  /// Methods were called after the stream was closed. ///  public override void SetLength(long value) { CheckDisposed(); if (value < 0) throw new ArgumentOutOfRangeException("value"); if (value > Length) throw new ArgumentOutOfRangeException("value"); long needed = value / ChunkSize; if ((value % ChunkSize) != 0) { needed++; } if (needed > int.MaxValue) throw new ArgumentOutOfRangeException("value"); if (needed < _chunks.Count) { int remove = (int)(_chunks.Count - needed); for (int i = 0; i < remove; i++) { _chunks.RemoveAt(_chunks.Count - 1); } } _lastChunkPos = (int)(value % ChunkSize); } ///  /// Converts the current stream to a byte array. ///  /// An array of bytes public byte[] ToArray() { CheckDisposed(); byte[] bytes = new byte[Length]; int offset = 0; for (int i = 0; i < _chunks.Count; i++) { int count = (i == (_chunks.Count - 1)) ? _lastChunkPos : _chunks[i].Length; if (count > 0) { Buffer.BlockCopy(_chunks[i], 0, bytes, offset, count); offset += count; } } return bytes; } ///  /// When overridden in a derived class, writes a sequence of bytes to the current stream and advances the current position within this stream by the number of bytes written. ///  /// An array of bytes. This method copies  bytes from  to the current stream. /// The zero-based byte offset in  at which to begin copying bytes to the current stream. /// The number of bytes to be written to the current stream. ///  /// The sum of  and  is greater than the buffer length. ///  ///  ///  is null. ///  ///  ///  or  is negative. ///  ///  /// Methods were called after the stream was closed. ///  public override void Write(byte[] buffer, int offset, int count) { if (buffer == null) throw new ArgumentNullException("buffer"); if (offset < 0) throw new ArgumentOutOfRangeException("offset"); if (count < 0) throw new ArgumentOutOfRangeException("count"); if ((buffer.Length - offset) < count) throw new ArgumentException(null, "count"); CheckDisposed(); int chunkPos = (int)(_position % ChunkSize); int chunkIndex = (int)(_position / ChunkSize); if (chunkIndex == _chunks.Count) { _chunks.Add(new byte[ChunkSize]); } int left = count; int inOffset = offset; do { int copied = Math.Min(left, ChunkSize - chunkPos); Buffer.BlockCopy(buffer, inOffset, _chunks[chunkIndex], chunkPos, copied); inOffset += copied; left -= copied; if ((chunkPos + copied) == ChunkSize) { chunkIndex++; chunkPos = 0; if (chunkIndex == _chunks.Count) { _chunks.Add(new byte[ChunkSize]); } } else { chunkPos += copied; } } while (left > 0); _position += count; if (chunkIndex == (_chunks.Count - 1)) { if ((chunkIndex > _lastChunkPosIndex) || ((chunkIndex == _lastChunkPosIndex) && (chunkPos > _lastChunkPos))) { _lastChunkPos = chunkPos; _lastChunkPosIndex = chunkIndex; } } } ///  /// Writes a byte to the current position in the stream and advances the position within the stream by one byte. ///  /// The byte to write to the stream. ///  /// Methods were called after the stream was closed. ///  public override void WriteByte(byte value) { CheckDisposed(); int chunkIndex = (int)(_position / ChunkSize); int chunkPos = (int)(_position % ChunkSize); if (chunkPos > (ChunkSize - 1)) //changed from (chunkPos >= (ChunkSize - 1)) { chunkIndex++; chunkPos = 0; if (chunkIndex == _chunks.Count) { _chunks.Add(new byte[ChunkSize]); } } _chunks[chunkIndex][chunkPos++] = value; _position++; if (chunkIndex == (_chunks.Count - 1)) { if ((chunkIndex > _lastChunkPosIndex) || ((chunkIndex == _lastChunkPosIndex) && (chunkPos > _lastChunkPos))) { _lastChunkPos = chunkPos; _lastChunkPosIndex = chunkIndex; } } } ///  /// Writes to the specified stream. ///  /// The stream. public void WriteTo(Stream stream) { if (stream == null) throw new ArgumentNullException("stream"); CheckDisposed(); for (int i = 0; i < _chunks.Count; i++) { int count = (i == (_chunks.Count - 1)) ? _lastChunkPos : _chunks[i].Length; stream.Write(_chunks[i], 0, count); } } ///  /// When overridden in a derived class, gets a value indicating whether the current stream supports reading. ///  ///  /// true if the stream supports reading; otherwise, false. ///  public override bool CanRead { get { return true; } } ///  /// When overridden in a derived class, gets a value indicating whether the current stream supports seeking. ///  ///  /// true if the stream supports seeking; otherwise, false. ///  public override bool CanSeek { get { return true; } } ///  /// When overridden in a derived class, gets a value indicating whether the current stream supports writing. ///  ///  /// true if the stream supports writing; otherwise, false. ///  public override bool CanWrite { get { return true; } } ///  /// When overridden in a derived class, gets the length in bytes of the stream. ///  ///  ///  /// A long value representing the length of the stream in bytes. ///  ///  /// Methods were called after the stream was closed. ///  public override long Length { get { CheckDisposed(); if (_chunks.Count == 0) return 0; return (_chunks.Count - 1) * ChunkSize + _lastChunkPos; } } ///  /// Gets or sets the size of the underlying chunks. Cannot be greater than or equal to 85000. ///  /// The chunks size. public int ChunkSize { get { return _chunkSize; } set { if ((value <= 0) || (value >= 85000)) throw new ArgumentOutOfRangeException("value"); _chunkSize = value; } } ///  /// When overridden in a derived class, gets or sets the position within the current stream. ///  ///  ///  /// The current position within the stream. ///  ///  /// Methods were called after the stream was closed. ///  public override long Position { get { CheckDisposed(); return _position; } set { CheckDisposed(); if (value < 0) throw new ArgumentOutOfRangeException("value"); if (value > Length) throw new ArgumentOutOfRangeException("value"); _position = value; } } } 

在处理超过2GB的内存块时应该使用UnmanagedMemoryStream ,因为MemoryStream限制为2GB,并且UnmanagedMemoryStream用于处理此问题。

SparseMemoryStream在.NET中执行此操作,但它深埋在内部类库中 – 当然,源代码可用,因为Microsoft将其全部作为开源提供。

你可以在这里获取代码: http : //www.dotnetframework.org/default.aspx/4@0/4@0/DEVDIV_TFS/Dev10/Releases/RTMRel/wpf/src/Base/MS/Internal/IO /包装/ SparseMemoryStream @ CS / 1305600 / SparseMemoryStream @ CS

话虽这么说,我强烈建议不要按原样使用它 – 至少删除所有对启动器的IsolatedStorage调用,因为这似乎是框架的打包API中没有bug结束*的原因。

(*:除了在流中传播数据外,如果它变得太大,它基本上会出于某种原因重新发明交换文件 – 在用户的隔离存储中不会少 – 而且巧合的是,大多数允许基于.NET的MS产品加载项没有以可以访问独立存储的方式设置其应用程序域 – 例如,VSTO加载项因此问题而臭名昭着。)

分块流的另一个实现可以被视为库存MemoryStream替换。 另外,它允许在LOH上分配一个大字节数组,用作“chunk”池,在所有ChunkedStream实例之间共享…

https://github.com/ImmortalGAD/ChunkedStream