Convert a string's character encoding from windows-1252 to UTF-8

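I have converted a Word document (docx) to HTML, and the converted HTML uses windows-1252 as its character encoding. In .NET, with this 1252 encoding, all special characters are displayed as "�". The HTML is shown in the Rad Editor, and it would display correctly if the HTML were in UTF-8.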

I have tried the following code, but without success:

```csharp
Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
byte[] wind1252Bytes = wind1252.GetBytes(strHtml);
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
char[] utf8Chars = new char[utf8.GetCharCount(utf8Bytes, 0, utf8Bytes.Length)];
utf8.GetChars(utf8Bytes, 0, utf8Bytes.Length, utf8Chars, 0);
string utf8String = new string(utf8Chars);
```

Any suggestions on how to convert the HTML to UTF-8?
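One common way this symptom arises can be reproduced in a few lines. The sketch below is illustrative only: it assumes .NET Core 3.0 or later (where code page 1252 is available only after registering `CodePagesEncodingProvider`; on .NET Framework that call is unnecessary) and uses a hypothetical four-byte sample. The same windows-1252 bytes decode to replacement characters (U+FFFD, shown as "�") when read as UTF-8, and to the intended curly quotes when read as windows-1252.

```csharp
using System;
using System.Text;

// Assumption: .NET Core 3.0+ / .NET 5+, where code page 1252 must be registered first.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical sample: the text “Hi” (with curly quotes) as saved in windows-1252.
byte[] fileBytes = { 0x93, 0x48, 0x69, 0x94 };

// Decoding with the wrong encoding: 0x93 and 0x94 are not valid UTF-8 on their own,
// so the default decoder replaces each of them with U+FFFD.
string asUtf8 = Encoding.UTF8.GetString(fileBytes);

// Decoding with the encoding the bytes were actually written in.
string asCp1252 = Encoding.GetEncoding(1252).GetString(fileBytes);

Console.WriteLine(asUtf8 == "\uFFFDHi\uFFFD");
Console.WriteLine(asCp1252 == "\u201CHi\u201D");
```

Once the text has been decoded the wrong way, the original bytes behind each "�" are gone, which is why the fix has to happen at the point where the raw bytes are decoded.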

This should do it:

```csharp
Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
byte[] wind1252Bytes = wind1252.GetBytes(strHtml);
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);
```

Actually, the problem is here:

```csharp
byte[] wind1252Bytes = wind1252.GetBytes(strHtml);
```

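We should not take the bytes from the HTML string. By the time the HTML is a .NET string it has already been decoded, so any character that was decoded with the wrong encoding is already lost. I tried the code below, which reads the bytes directly from the file, and it works.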

```csharp
Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
// Read the raw bytes from the file instead of taking them from an already-decoded string
byte[] wind1252Bytes = ReadFile(Server.MapPath(HtmlFile));
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);

// Reads the whole file into a byte array (File.ReadAllBytes does the same in one call)
public static byte[] ReadFile(string filePath)
{
    byte[] buffer;
    using (FileStream fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    {
        int length = (int)fileStream.Length;  // get file length
        buffer = new byte[length];            // create buffer
        int count;                            // actual number of bytes read
        int sum = 0;                          // total number of bytes read

        // read until Read returns 0 (end of the stream has been reached)
        while ((count = fileStream.Read(buffer, sum, length - sum)) > 0)
            sum += count;                     // sum is the buffer offset for the next read
    }
    return buffer;
}
```
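The key step in the working code is decoding the file's bytes as windows-1252. A shorter variant, sketched here under the assumption of .NET Core 3.0 or later (hence the `CodePagesEncodingProvider` registration) and with a hypothetical temp file standing in for `Server.MapPath(HtmlFile)`, decodes the file directly with `File.ReadAllText(path, encoding)`. Because a .NET string is UTF-16 internally, no `Encoding.Convert` call is needed just to display the text; UTF-8 only matters when bytes are written out again.

```csharp
using System;
using System.IO;
using System.Text;

// Assumption: .NET Core 3.0+ / .NET 5+; on .NET Framework code page 1252 is built in
// and the RegisterProvider call can be dropped.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding cp1252 = Encoding.GetEncoding(1252);

// Hypothetical input file standing in for Server.MapPath(HtmlFile), saved as windows-1252.
string path = Path.Combine(Path.GetTempPath(), "sample-1252.html");
File.WriteAllBytes(path, cp1252.GetBytes("<p>\u201CCaf\u00E9\u201D</p>"));

// Decode the raw bytes with the encoding they were written in.
// The result is an ordinary .NET string that displays correctly.
string html = File.ReadAllText(path, cp1252);

// Only when UTF-8 bytes are required (e.g. saving the file) encode the string again;
// new UTF8Encoding(false) writes UTF-8 without a byte-order mark.
string utf8Path = Path.ChangeExtension(path, ".utf8.html");
File.WriteAllText(utf8Path, html, new UTF8Encoding(false));
byte[] utf8Bytes = File.ReadAllBytes(utf8Path);

// Clean up the hypothetical sample files.
File.Delete(path);
File.Delete(utf8Path);
```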

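How are you going to use the generated HTML? In my opinion, the most appropriate way to solve the problem is to add a `<meta>` tag to the document's `<head>` that declares its character encoding.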

Use the Encoding.Convert method. The details are in the MSDN article on the Encoding.Convert method.
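For reference, a minimal sketch of what `Encoding.Convert(srcEncoding, dstEncoding, bytes)` does: it decodes a byte array with the first encoding and re-encodes the text with the second, so it operates on bytes, not on a string that has already been decoded. The sample bytes are hypothetical, and the provider registration assumes .NET Core 3.0 or later.

```csharp
using System;
using System.Text;

// Assumption: .NET Core 3.0+ / .NET 5+, where code page 1252 must be registered first.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding cp1252 = Encoding.GetEncoding(1252);

// Hypothetical windows-1252 bytes: 0xE9 is "é" and 0x80 is "€" in that code page.
byte[] source = { 0xE9, 0x80 };

// Decode with windows-1252, then re-encode as UTF-8 (no byte-order mark is added).
byte[] utf8Bytes = Encoding.Convert(cp1252, Encoding.UTF8, source);

Console.WriteLine(BitConverter.ToString(utf8Bytes)); // C3-A9-E2-82-AC
```

Each single-byte windows-1252 character becomes a multi-byte UTF-8 sequence, which is exactly the conversion needed when the output must be stored or sent as UTF-8.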