Why is XML serialization with XmlWriter via StringBuilder UTF-16, while via Stream it is UTF-8?

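I was surprised when I ran into this, so I wrote a console application to check it and to make sure I wasn't doing anything else wrong.

Can anyone explain this?

Here is the code: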

    using System;
    using System.IO;
    using System.Text;
    using System.Xml;
    using System.Xml.Serialization;

    namespace ConsoleApplication1
    {
        public class Program
        {
            static void Main(string[] args)
            {
                var o = new SomeObject { Field1 = "string value", Field2 = 8 };

                Console.WriteLine("ObjectToXmlViaStringBuilder");
                Console.Write(ObjectToXmlViaStringBuilder(o));
                Console.WriteLine();
                Console.WriteLine();
                Console.WriteLine("ObjectToXmlViaStream");
                Console.Write(StreamToString(ObjectToXmlViaStream(o)));
                Console.ReadKey();
            }

            // Serializes into a StringBuilder; UTF-8 is requested in the settings.
            public static string ObjectToXmlViaStringBuilder(SomeObject someObject)
            {
                var output = new StringBuilder();
                var settings = new XmlWriterSettings { Encoding = Encoding.UTF8, Indent = true };
                using (var xmlWriter = XmlWriter.Create(output, settings))
                {
                    var serializer = new XmlSerializer(typeof(SomeObject));
                    var namespaces = new XmlSerializerNamespaces();
                    xmlWriter.WriteStartDocument();
                    xmlWriter.WriteDocType("Field1", null, "someObject.dtd", null);
                    namespaces.Add(string.Empty, string.Empty);
                    serializer.Serialize(xmlWriter, someObject, namespaces);
                }
                return output.ToString();
            }

            private static string StreamToString(Stream stream)
            {
                var reader = new StreamReader(stream);
                return reader.ReadToEnd();
            }

            // Serializes into a MemoryStream with exactly the same settings.
            public static Stream ObjectToXmlViaStream(SomeObject someObject)
            {
                var output = new MemoryStream();
                var settings = new XmlWriterSettings { Encoding = Encoding.UTF8, Indent = true };
                using (var xmlWriter = XmlWriter.Create(output, settings))
                {
                    var serializer = new XmlSerializer(typeof(SomeObject));
                    var namespaces = new XmlSerializerNamespaces();
                    xmlWriter.WriteStartDocument();
                    xmlWriter.WriteDocType("Field1", null, "someObject.dtd", null);
                    namespaces.Add(string.Empty, string.Empty);
                    serializer.Serialize(xmlWriter, someObject, namespaces);
                }
                // Rewind so the caller can read from the beginning.
                output.Seek(0L, SeekOrigin.Begin);
                return output;
            }

            public class SomeObject
            {
                public string Field1 { get; set; }
                public int Field2 { get; set; }
            }
        }
    }

Here is the result:

ObjectToXmlViaStringBuilder

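    <?xml version="1.0" encoding="utf-16"?>
    <!DOCTYPE Field1 SYSTEM "someObject.dtd">
    <SomeObject>
      <Field1>string value</Field1>
      <Field2>8</Field2>
    </SomeObject>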

ObjectToXmlViaStream

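    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE Field1 SYSTEM "someObject.dtd">
    <SomeObject>
      <Field1>string value</Field1>
      <Field2>8</Field2>
    </SomeObject>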

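When you create an XmlWriter around a TextWriter, the XmlWriter always uses the encoding of the underlying TextWriter. The encoding of a StringWriter is always UTF-16, because that is how .NET strings are encoded internally.

When you create an XmlWriter around a Stream, no encoding is defined for the Stream, so it uses the encoding specified in the XmlWriterSettings.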
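One common way to act on this is to hand the XmlWriter a StringWriter that reports a different encoding. The sketch below is not from the original posts: `Utf8StringWriter`, `EncodingDemo`, and `WriteWith` are hypothetical names used only for illustration. It requests UTF-32 in the settings on purpose, to show that for a TextWriter target the declaration follows the writer's `Encoding` property rather than the setting.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: a StringWriter that reports UTF-8 instead of UTF-16.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}

public static class EncodingDemo
{
    // Writes a tiny document to the given TextWriter and returns the text.
    public static string WriteWith(TextWriter target)
    {
        // UTF-32 is requested deliberately; it should be ignored for TextWriter targets.
        var settings = new XmlWriterSettings { Encoding = Encoding.UTF32 };
        using (var writer = XmlWriter.Create(target, settings))
        {
            writer.WriteStartDocument();
            writer.WriteElementString("root", "value");
        }
        return target.ToString();
    }

    public static void Main()
    {
        // The declaration names the encoding reported by each writer.
        Console.WriteLine(WriteWith(new StringWriter()));
        Console.WriteLine(WriteWith(new Utf8StringWriter()));
    }
}
```

Note that the returned value is still an ordinary .NET string in memory; only the encoding named in the XML declaration changes, so this is mainly useful when the string will later be written out as UTF-8 bytes.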

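For me, the most elegant solution was to write to a memory stream and then use an Encoding to turn the stream's bytes into a string in whatever encoding is required. Like so: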

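    // OmitXmlHeader, xmlSerializer, objectToSerialize, retString and encoding
    // are defined elsewhere in the surrounding code.
    using (MemoryStream memS = new MemoryStream())
    {
        // Set up the XML settings.
        // Note: settings.Encoding is left at its default (UTF-8), so encoding
        // below must also be UTF-8 for the bytes to decode correctly.
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = OmitXmlHeader;

        using (XmlWriter writer = XmlWriter.Create(memS, settings))
        {
            // Write the XML to the stream; disposing the writer flushes it.
            xmlSerializer.Serialize(writer, objectToSerialize);
        }

        // Decode the memory stream's bytes into the XML string.
        retString.AppendFormat("{0}", encoding.GetString(memS.ToArray()));
    }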

The conversion happens in … encoding.GetString(memS.ToArray()) …, which decodes the bytes in the stream into a string.
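For anyone who wants to try the memory-stream approach in isolation, here is a minimal self-contained sketch. It is an illustration under assumptions, not the answer's actual code: `StreamSerializer` and `ToXml` are hypothetical names, and `SomeObject` is copied from the question. It passes the same `Encoding` to both the writer settings and the decoding step, and uses a `UTF8Encoding` constructed without a byte order mark so the resulting string should not begin with a stray U+FEFF character.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public class SomeObject
{
    public string Field1 { get; set; }
    public int Field2 { get; set; }
}

public static class StreamSerializer
{
    // Hypothetical helper: serialize to bytes in the given encoding, then decode them.
    public static string ToXml<T>(T value, Encoding encoding)
    {
        var settings = new XmlWriterSettings { Encoding = encoding, Indent = true };
        using (var buffer = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(buffer, settings))
            {
                new XmlSerializer(typeof(T)).Serialize(writer, value);
            } // disposing the writer flushes everything into the buffer

            // Decode with the same encoding that was used for writing.
            return encoding.GetString(buffer.ToArray());
        }
    }

    public static void Main()
    {
        // UTF-8 without a byte order mark.
        var utf8NoBom = new UTF8Encoding(false);
        var o = new SomeObject { Field1 = "string value", Field2 = 8 };
        Console.WriteLine(ToXml(o, utf8NoBom));
    }
}
```

Because the target is a Stream, the declaration here reflects the encoding given in the settings.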

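Wherever possible, XmlWriter uses the encoding of the underlying stream. If it wrote UTF-8 data to a stream it knows to be UTF-16, you would end up with a mess. Writing UTF-16 data to a UTF-8 stream would also cause problems, especially for environments that use null-terminated strings (such as C/C++).

StringBuilder/StringWriter presents a UTF-16 stream to the XmlWriter, so the XmlWriter ignores the setting you requested and uses UTF-16 instead.

In practice I usually don't emit the header at all, so that I can use a StringBuilder underneath and save a few lines of code that would otherwise be spent fiddling with encodings.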
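A minimal sketch of that header-free variant, assuming the header is suppressed with `XmlWriterSettings.OmitXmlDeclaration` (the class and method names `NoHeaderDemo` and `Build` are made up for this example):

```csharp
using System;
using System.Text;
using System.Xml;

public static class NoHeaderDemo
{
    // Writes a small document into a StringBuilder without an XML declaration.
    public static string Build()
    {
        var output = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var writer = XmlWriter.Create(output, settings))
        {
            writer.WriteElementString("root", "value");
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build()); // no encoding is named anywhere in the output
    }
}
```

With no declaration in the output, there is no encoding name that could disagree with however the string is eventually written out.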