Improving binary serialization performance for large lists of structs

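I have a struct that holds 3D coordinates in three ints. In a test I put together a List<> of 1 million random points and then used binary serialization to a memory stream.

The memory stream comes out at about 21 MB, which seems very inefficient: 1,000,000 points * 3 coords * 4 bytes is 12,000,000 bytes, so it should come out at about 11 MB minimum.

It also takes about 3 seconds on my test rig.

Any ideas for improving performance and/or size?

(I don't have to keep the ISerializable interface if it helps; I could write directly to a memory stream.)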

Edit: from the answers below I have put together a serialization showdown comparing BinaryFormatter, 'raw' BinaryWriter and Protobuf.

    using System;
    using System.Text;
    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.VisualStudio.TestTools.UnitTesting;
    using System.Runtime.Serialization;
    using System.Runtime.Serialization.Formatters.Binary;
    using System.IO;
    using ProtoBuf;

    namespace asp_heatmap.test
    {
        [Serializable()] // For .NET BinaryFormatter
        [ProtoContract]  // For Protobuf
        public class Coordinates : ISerializable
        {
            [Serializable()]
            [ProtoContract]
            public struct CoOrd
            {
                public CoOrd(int x, int y, int z)
                {
                    this.x = x;
                    this.y = y;
                    this.z = z;
                }

                [ProtoMember(1)] public int x;
                [ProtoMember(2)] public int y;
                [ProtoMember(3)] public int z;
            }

            internal Coordinates() { }

            [ProtoMember(1)]
            public List<CoOrd> Coords = new List<CoOrd>();

            // Fills Coords with one million random points.
            public void SetupTestArray()
            {
                Random r = new Random();
                for (int i = 0; i < 1000000; i++)
                {
                    Coords.Add(new CoOrd(r.Next(), r.Next(), r.Next()));
                }
            }

            #region Using Framework Binary Formatter Serialization

            void ISerializable.GetObjectData(SerializationInfo info, StreamingContext context)
            {
                info.AddValue("Coords", this.Coords);
            }

            internal Coordinates(SerializationInfo info, StreamingContext context)
            {
                this.Coords = (List<CoOrd>)info.GetValue("Coords", typeof(List<CoOrd>));
            }

            #endregion

            #region 'Raw' Binary Writer serialization

            public MemoryStream RawSerializeToStream()
            {
                // Pre-size the stream: 3 ints per point plus the leading count.
                MemoryStream stream = new MemoryStream(Coords.Count * 3 * 4 + 4);
                BinaryWriter writer = new BinaryWriter(stream);
                writer.Write(Coords.Count);
                foreach (CoOrd point in Coords)
                {
                    writer.Write(point.x);
                    writer.Write(point.y);
                    writer.Write(point.z);
                }
                writer.Flush(); // the writer is not disposed, so the stream stays open for the caller
                return stream;
            }

            public Coordinates(MemoryStream stream)
            {
                using (BinaryReader reader = new BinaryReader(stream))
                {
                    int count = reader.ReadInt32();
                    Coords = new List<CoOrd>(count);
                    for (int i = 0; i < count; i++)
                    {
                        Coords.Add(new CoOrd(reader.ReadInt32(), reader.ReadInt32(), reader.ReadInt32()));
                    }
                }
            }

            #endregion
        }

        [TestClass]
        public class SerializationTest
        {
            [TestMethod]
            public void TestBinaryFormatter()
            {
                Coordinates c = new Coordinates();
                c.SetupTestArray();

                // Serialize to memory stream
                MemoryStream mStream = new MemoryStream();
                BinaryFormatter bformatter = new BinaryFormatter();
                bformatter.Serialize(mStream, c);
                Console.WriteLine("Length : {0}", mStream.Length);

                // Now Deserialize
                mStream.Position = 0;
                Coordinates c2 = (Coordinates)bformatter.Deserialize(mStream);
                Console.Write(c2.Coords.Count);
                mStream.Close();
            }

            [TestMethod]
            public void TestBinaryWriter()
            {
                Coordinates c = new Coordinates();
                c.SetupTestArray();

                MemoryStream mStream = c.RawSerializeToStream();
                Console.WriteLine("Length : {0}", mStream.Length);

                // Now Deserialize
                mStream.Position = 0;
                Coordinates c2 = new Coordinates(mStream);
                Console.Write(c2.Coords.Count);
            }

            [TestMethod]
            public void TestProtoBufV2()
            {
                Coordinates c = new Coordinates();
                c.SetupTestArray();

                MemoryStream mStream = new MemoryStream();
                ProtoBuf.Serializer.Serialize(mStream, c);
                Console.WriteLine("Length : {0}", mStream.Length);

                mStream.Position = 0;
                Coordinates c2 = ProtoBuf.Serializer.Deserialize<Coordinates>(mStream);
                Console.Write(c2.Coords.Count);
            }
        }
    }

Results (note PB v2.0.0.423 beta):

                        Serialize | Ser + Deserialize | Size
    -----------------------------------------------------------
    BinaryFormatter     2.89s     | 26.00s !!!        | 21.0 MB
    ProtoBuf v2         0.52s     | 0.83s             | 18.7 MB
    Raw BinaryWriter    0.27s     | 0.36s             | 11.4 MB

Obviously, this only looks at speed and size and does not take any other factors into account.
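The question does not say how the timings were taken, so the following is only a minimal sketch of one way to collect comparable numbers with System.Diagnostics.Stopwatch. The names TimingSketch, Measure and WritePoints are hypothetical, and the workload is a stand-in for the raw BinaryWriter path rather than the test class above.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class TimingSketch
{
    // Hypothetical helper: runs the action once and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Action action)
    {
        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed;
    }

    // Stand-in workload: writes a count followed by three random ints per point
    // and returns the number of bytes written.
    public static long WritePoints(int count)
    {
        var random = new Random(123456);
        using (var stream = new MemoryStream(count * 3 * sizeof(int) + sizeof(int)))
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(count);
            for (int i = 0; i < count; i++)
            {
                writer.Write(random.Next());
                writer.Write(random.Next());
                writer.Write(random.Next());
            }
            writer.Flush();
            return stream.Length;
        }
    }

    public static void Main()
    {
        long length = 0;
        TimeSpan elapsed = Measure(() => length = WritePoints(1000000));
        Console.WriteLine("Length : {0} bytes, time : {1:F2}s", length, elapsed.TotalSeconds);
    }
}
```

A single run like this includes JIT and allocation costs, so repeating the measurement a few times and discarding the first run would give steadier figures.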

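Binary serialization using BinaryFormatter includes type information in the bytes it generates. This takes up additional space. It is useful in cases where you don't know what structure of data to expect at the other end, for example.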

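In your case, you know what format the data has at both ends, and it doesn't sound like that will change. So you can write a simple encode and decode method. Your CoOrd type no longer needs to be serializable either.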

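I would use System.IO.BinaryReader and System.IO.BinaryWriter, then loop through each CoOrd instance and read/write the X, Y, Z values to the stream. Note that BinaryWriter.Write(int) always writes four bytes, so a million points come to 12,000,004 bytes (about 11.4 MB) including the count. Getting below that requires a variable-length integer encoding, such as the 7-bit scheme BinaryWriter uses for string length prefixes, and that only pays off if many of your numbers are small (say below 0x7F or 0x7FFF).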

Something like this:

    using (var writer = new BinaryWriter(stream))
    {
        // write the number of items so we know how many to read out
        writer.Write(points.Count);

        // write three ints per point
        foreach (var point in points)
        {
            writer.Write(point.X);
            writer.Write(point.Y);
            writer.Write(point.Z);
        }
    }

To read from the stream:

    List<CoOrd> points;
    using (var reader = new BinaryReader(stream))
    {
        // the count was written first, so read it first
        var count = reader.ReadInt32();
        points = new List<CoOrd>(count);
        for (int i = 0; i < count; i++)
        {
            var x = reader.ReadInt32();
            var y = reader.ReadInt32();
            var z = reader.ReadInt32();
            points.Add(new CoOrd(x, y, z));
        }
    }
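If many coordinates are small, a variable-length encoding can take the stream below the fixed 12 bytes per point. The following is a minimal sketch under stated assumptions: values are non-negative, and WriteVarint and ReadVarint are hypothetical helper names written here for illustration, not members of BinaryWriter or BinaryReader. Each byte carries 7 bits of the value, and the high bit signals that more bytes follow.

```csharp
using System;
using System.IO;

public static class VarintSketch
{
    // Writes the value 7 bits at a time, lowest bits first.
    // The high bit of each byte means "another byte follows".
    public static void WriteVarint(BinaryWriter writer, uint value)
    {
        while (value >= 0x80)
        {
            writer.Write((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        writer.Write((byte)value);
    }

    // Reads a value written by WriteVarint.
    public static uint ReadVarint(BinaryReader reader)
    {
        uint result = 0;
        int shift = 0;
        while (true)
        {
            byte b = reader.ReadByte();
            result |= (uint)(b & 0x7F) << shift;
            if ((b & 0x80) == 0)
            {
                return result;
            }
            shift += 7;
            if (shift > 28)
            {
                throw new FormatException("Varint is too long for a 32-bit value.");
            }
        }
    }

    public static void Main()
    {
        using (var stream = new MemoryStream())
        {
            var writer = new BinaryWriter(stream);
            WriteVarint(writer, 9999);
            writer.Flush();
            Console.WriteLine("Bytes used : {0}", stream.Length); // 2 instead of 4

            stream.Position = 0;
            var reader = new BinaryReader(stream);
            Console.WriteLine("Value read : {0}", ReadVarint(reader)); // 9999
        }
    }
}
```

The trade-off is that values of 2^28 and above take five bytes instead of four, so for uniformly random 32-bit values, as in the original test, this encoding makes the stream larger rather than smaller.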

For the simplicity of using a pre-built serializer, I recommend protobuf-net; here it is with protobuf-net v2, just adding some attributes:

    using System;
    using System.Collections.Generic;
    using System.Runtime.Serialization;

    [DataContract]
    public class Coordinates
    {
        [DataContract]
        public struct CoOrd
        {
            public CoOrd(int x, int y, int z)
            {
                this.x = x;
                this.y = y;
                this.z = z;
            }

            [DataMember(Order = 1)] int x;
            [DataMember(Order = 2)] int y;
            [DataMember(Order = 3)] int z;
        }

        [DataMember(Order = 1)]
        public List<CoOrd> Coords = new List<CoOrd>();

        // Fills Coords with one million points; values are limited to below 10,000.
        public void SetupTestArray()
        {
            Random r = new Random(123456);
            for (int i = 0; i < 1000000; i++)
            {
                Coords.Add(new CoOrd(r.Next(10000), r.Next(10000), r.Next(10000)));
            }
        }
    }

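Serialize it with:

    ProtoBuf.Serializer.Serialize(mStream, c);

This takes 10,960,823 bytes, but note that I tweaked SetupTestArray to limit the values to below 10,000, since by default it uses "varint" encoding for integers, whose size depends on the magnitude of the value. The 10k isn't significant here (in fact I didn't check what the "steps" are). If you prefer fixed size (which allows any range):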

    [ProtoMember(1, DataFormat = DataFormat.FixedSize)] int x;
    [ProtoMember(2, DataFormat = DataFormat.FixedSize)] int y;
    [ProtoMember(3, DataFormat = DataFormat.FixedSize)] int z;

This takes 16,998,640 bytes.
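Both sizes can be roughly reproduced from the protobuf wire format. The sketch below is a back-of-the-envelope estimate, not protobuf-net's own code, and it rests on these assumptions: each point is written as a length-delimited item of field 1, every field number is below 16 so each tag takes one byte, and values are non-negative and non-zero. The names ProtoSizeSketch, VarintLength, VarintPointSize and FixedPointSize are hypothetical.

```csharp
using System;

public static class ProtoSizeSketch
{
    // Bytes needed to encode a non-negative value as a base-128 varint:
    // 1 byte below 2^7, 2 bytes below 2^14, 3 below 2^21, and so on.
    public static int VarintLength(ulong value)
    {
        int length = 1;
        while (value >= 0x80)
        {
            value >>= 7;
            length++;
        }
        return length;
    }

    // Estimated bytes for one point with varint-encoded coordinates
    // (assumes non-negative inputs).
    public static int VarintPointSize(int x, int y, int z)
    {
        // three one-byte field tags plus three varint payloads
        int body = 3 + VarintLength((ulong)x) + VarintLength((ulong)y) + VarintLength((ulong)z);
        // list-item tag + length prefix + body
        return 1 + VarintLength((ulong)body) + body;
    }

    // Estimated bytes for one point with fixed 4-byte coordinates.
    public static int FixedPointSize()
    {
        int body = 3 * (1 + sizeof(int));
        return 1 + VarintLength((ulong)body) + body;
    }

    public static void Main()
    {
        Console.WriteLine(VarintPointSize(9999, 9999, 9999)); // 11
        Console.WriteLine(FixedPointSize());                  // 17
    }
}
```

At one million points that gives about 11,000,000 and 17,000,000 bytes, close to the measured 10,960,823 and 16,998,640. It also suggests where the "steps" are: a varint grows by one byte each time the value crosses 2^7, 2^14, 2^21 and 2^28, so any limit between 128 and 16,383 costs two bytes for most values.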