Converting a string (UTF-16) to UTF-8 in C#

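I need to convert a string to UTF-8 in C#. I have tried many approaches, but none of them works the way I want. I convert my string to a byte array and then try to write it to an XML file (encoded as UTF-8...), but either I get the same string back (not encoded at all) or I get a useless list of bytes. Has anyone faced the same problem?

Edit: here is some of the code I used:

    string str = "testé";
    byte[] utf8Bytes = Encoding.UTF8.GetBytes(str);
    return Encoding.UTF8.GetString(utf8Bytes);

The result is "testé", whereas I expected something like "testÃ©"...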
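The round trip in the snippet above cannot change the text: GetBytes produces the UTF-8 bytes, and GetString with the same encoding decodes them straight back into the original UTF-16 string. A minimal sketch that makes the intermediate bytes visible (the class and method names are hypothetical, chosen only for illustration):

```csharp
using System;
using System.Text;

public static class Utf8RoundTripSketch
{
    // Hypothetical helper: the UTF-8 bytes of a string, rendered as hex pairs.
    public static string ToUtf8Hex(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return BitConverter.ToString(utf8Bytes);
    }

    // Hypothetical helper: encode and decode with the same encoding.
    // For any well-formed string this returns the input unchanged.
    public static string RoundTrip(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(utf8Bytes);
    }
}
```

For "testé", ToUtf8Hex returns 74-65-73-74-C3-A9 (the 'é' becomes the two bytes C3 A9), while RoundTrip returns "testé" again. The UTF-8 form exists only in the byte array.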

If you want a UTF8 string where every byte is correct ('Ö' -> [195, 0], [150, 0]), you can use the following:

    // Requires: using System; using System.Text;
    public static string Utf16ToUtf8(string utf16String)
    {
        /*
         * Every .NET string stores text with the UTF16 encoding,
         * known as Encoding.Unicode. Other encodings may exist as
         * a byte array or incorrectly stored with the UTF16 encoding.
         *
         * UTF8 = 1 to 4 bytes per char
         *   ["100" for the ANSI 'd']
         *   ["206" and "186" for the Greek 'κ']
         *
         * UTF16 = 2 bytes per char (code unit)
         *   ["100, 0" for the ANSI 'd']
         *   ["186, 3" for the Greek 'κ']
         *
         * UTF8 inside UTF16
         *   ["100, 0" for the ANSI 'd']
         *   ["206, 0" and "186, 0" for the Greek 'κ']
         *
         * We can use the convert encoding function to convert a
         * UTF16 byte array to a UTF8 byte array. If we now used the
         * UTF8 encoding's GetString method, we would get a UTF16 string.
         *
         * So we imitate UTF16 by filling the second byte of a char
         * with a 0 byte (binary 0) while creating the string.
         */

        // Storage for the UTF8 string
        string utf8String = String.Empty;

        // Get UTF16 bytes and convert UTF16 bytes to UTF8 bytes
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(utf16String);
        byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);

        // Fill UTF8 bytes inside UTF8 string
        for (int i = 0; i < utf8Bytes.Length; i++)
        {
            // Because char always saves 2 bytes, fill char with 0
            byte[] utf8Container = new byte[2] { utf8Bytes[i], 0 };
            utf8String += BitConverter.ToChar(utf8Container, 0);
        }

        // Return UTF8
        return utf8String;
    }

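In my case the DLL requested a UTF8 string too, but unfortunately the UTF8 string had to be interpreted with UTF16 encoding ('Ö' -> [195, 0], [19, 32]). So the ANSI '–', which is 150, has to be converted to the UTF16 '–', which is 8211. If you have this case too, you can use the following instead:

    // Requires: using System.Text;
    public static string Utf16ToUtf8(string utf16String)
    {
        // Get UTF16 bytes and convert UTF16 bytes to UTF8 bytes
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(utf16String);
        byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);

        // Return UTF8 bytes as ANSI string.
        // Note: Encoding.Default is the system ANSI code page on .NET Framework;
        // on .NET Core and later it is UTF-8, which turns this into a plain round trip.
        return Encoding.Default.GetString(utf8Bytes);
    }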

Or the native method:

    // Requires: using System; using System.Runtime.InteropServices; using System.Text;
    [DllImport("kernel32.dll")]
    private static extern Int32 WideCharToMultiByte(
        UInt32 CodePage,
        UInt32 dwFlags,
        [MarshalAs(UnmanagedType.LPWStr)] String lpWideCharStr,
        Int32 cchWideChar,
        [Out, MarshalAs(UnmanagedType.LPStr)] StringBuilder lpMultiByteStr,
        Int32 cbMultiByte,
        IntPtr lpDefaultChar,
        IntPtr lpUsedDefaultChar);

    public static string Utf16ToUtf8(string utf16String)
    {
        // First call: ask for the required buffer size. Passing -1 treats the
        // input as null-terminated, so the result includes the terminator
        // and matches the second call below.
        Int32 iNewDataLen = WideCharToMultiByte(Convert.ToUInt32(Encoding.UTF8.CodePage), 0,
            utf16String, -1, null, 0, IntPtr.Zero, IntPtr.Zero);

        if (iNewDataLen > 1)
        {
            // Second call: perform the conversion into the buffer
            StringBuilder utf8String = new StringBuilder(iNewDataLen);
            WideCharToMultiByte(Convert.ToUInt32(Encoding.UTF8.CodePage), 0,
                utf16String, -1, utf8String, utf8String.Capacity, IntPtr.Zero, IntPtr.Zero);

            return utf8String.ToString();
        }
        else
        {
            return String.Empty;
        }
    }

If you need it the other way around, see Utf8ToUtf16. Hope I could help.
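The Utf8ToUtf16 code the answer refers to is not reproduced in this thread. As a rough illustration only, the reverse of the zero-filled variant ('Ö' stored as the chars 195 and 150) could look like the sketch below. The class and method names are hypothetical, and the sketch assumes every char in the input holds a single byte value (0 to 255); it does not reverse the ANSI-based variant.

```csharp
using System.Text;

public static class Utf8InUtf16ReverseSketch
{
    // Hypothetical reverse of the "UTF8 inside UTF16" trick:
    // take the low byte of every char, then decode those bytes as UTF-8.
    public static string NarrowAndDecode(string utf8InUtf16)
    {
        byte[] utf8Bytes = new byte[utf8InUtf16.Length];
        for (int i = 0; i < utf8InUtf16.Length; i++)
        {
            // Assumption: the high byte is 0. If it is not, it is silently dropped.
            utf8Bytes[i] = unchecked((byte)utf8InUtf16[i]);
        }
        return Encoding.UTF8.GetString(utf8Bytes);
    }
}
```

Given the chars 't', 'e', 's', 't', U+00C3, U+00A9, the method collects the bytes 74 65 73 74 C3 A9 and decodes them to "testé".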

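A string in C# is always UTF-16; there is no way to "convert" it. As long as you manipulate the string in memory, the encoding is irrelevant. It only matters when you write the string to a stream (file, memory stream, network stream...).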
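To illustrate that point, here is a minimal sketch that writes the same in-memory string to a MemoryStream with two different encodings. The class and method names are hypothetical; the only thing that differs between the two calls is the Encoding handed to the StreamWriter.

```csharp
using System.IO;
using System.Text;

public static class StreamEncodingSketch
{
    // Hypothetical helper: write text through a StreamWriter and return the raw bytes.
    public static byte[] WriteWith(string text, Encoding encoding)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new StreamWriter(stream, encoding))
            {
                writer.Write(text);
            }
            // ToArray still works after the writer has closed the stream.
            return stream.ToArray();
        }
    }
}
```

With new UTF8Encoding(false) the string "testé" comes out as 6 bytes, and with new UnicodeEncoding(false, false) as 10 bytes (both created without a byte order mark). The string itself is the same object in both cases.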

If you want to write the string to an XML file, just specify the encoding when you create the XmlWriter.
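A minimal sketch of that suggestion, assuming the XML goes to a stream and that no byte order mark is wanted. The class name, method name, and the "value" element are made up for illustration; XmlWriterSettings.Encoding is the setting that controls the output encoding.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class XmlUtf8Sketch
{
    // Hypothetical helper: serialize one element as UTF-8 encoded XML.
    public static byte[] WriteXml(string text)
    {
        var settings = new XmlWriterSettings
        {
            // UTF-8 without a byte order mark (an assumption; pass true to emit one)
            Encoding = new UTF8Encoding(false)
        };

        using (var stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteStartDocument();
                writer.WriteStartElement("value");
                writer.WriteString(text);   // the string is passed as-is, no manual conversion
                writer.WriteEndElement();
                writer.WriteEndDocument();
            }
            return stream.ToArray();
        }
    }
}
```

To write a file instead, XmlWriter.Create also accepts a file path together with the same settings object. The writer does the UTF-16 to UTF-8 conversion while writing, so the string never needs to be touched beforehand.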

The author of the original post concatenates strings. Strings in .NET are immutable, so every concatenation creates a new string and copies everything accumulated so far. As a result, the function provided will be noticeably slow. Don't do that. Use an array of chars instead, write into it directly, and then convert the result to a string. In my case, processing 500 KB of text, the difference was almost 5 minutes.

    // Requires: using System; using System.Text;
    private static string Utf16ToUtf8(string utf16String)
    {
        /*
         * Every .NET string stores text with the UTF16 encoding,
         * known as Encoding.Unicode. Other encodings may exist as
         * a byte array or incorrectly stored with the UTF16 encoding.
         *
         * UTF8 = 1 to 4 bytes per char
         *   ["100" for the ANSI 'd']
         *   ["206" and "186" for the Greek 'κ']
         *
         * UTF16 = 2 bytes per char (code unit)
         *   ["100, 0" for the ANSI 'd']
         *   ["186, 3" for the Greek 'κ']
         *
         * UTF8 inside UTF16
         *   ["100, 0" for the ANSI 'd']
         *   ["206, 0" and "186, 0" for the Greek 'κ']
         *
         * We can use the convert encoding function to convert a
         * UTF16 byte array to a UTF8 byte array. If we now used the
         * UTF8 encoding's GetString method, we would get a UTF16 string.
         *
         * So we imitate UTF16 by filling the second byte of a char
         * with a 0 byte (binary 0) while creating the string.
         */

        // Get UTF16 bytes and convert UTF16 bytes to UTF8 bytes
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(utf16String);
        byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);

        // One char per UTF8 byte, written directly into a preallocated array
        char[] chars = new char[utf8Bytes.Length];
        for (int i = 0; i < utf8Bytes.Length; i++)
        {
            chars[i] = BitConverter.ToChar(new byte[2] { utf8Bytes[i], 0 }, 0);
        }

        // Return UTF8
        return new String(chars);
    }
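The per-byte loop maps each UTF-8 byte value to the char with the same numeric value. ISO-8859-1 (Latin-1) decoding performs exactly that mapping for all 256 byte values, so the loop can probably be replaced by a single decode call. This is a swapped-in technique, not the answerer's code, and the class and method names below are hypothetical.

```csharp
using System.Text;

public static class Utf8InUtf16Sketch
{
    // Hypothetical alternative: encode to UTF-8, then reinterpret every byte
    // as one char by decoding the bytes as ISO-8859-1 (byte n -> char n).
    public static string Widen(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);
    }
}
```

Widen("Ö") yields a two-char string with the values 195 and 150, and Widen("testé") yields "testÃ©", the output the question asked for. The decode allocates the result once, which avoids both the repeated concatenation and the per-char temporary arrays.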

Check Jon Skeet's answer to this question: UTF-16 to UTF-8 conversion (for scripting in Windows)

It contains the source code you need.

Hope it helps.

Does this example help?

    using System;
    using System.IO;
    using System.Text;

    class Test
    {
        public static void Main()
        {
            using (StreamWriter output = new StreamWriter("practice.txt"))
            {
                // Create and write a string containing the symbol for Pi.
                string srcString = "Area = \u03A0r^2";

                // Convert the UTF-16 encoded source string to UTF-8 and ASCII.
                byte[] utf8String = Encoding.UTF8.GetBytes(srcString);
                byte[] asciiString = Encoding.ASCII.GetBytes(srcString);

                // Write the UTF-8 and ASCII encoded byte arrays.
                output.WriteLine("UTF-8 Bytes: {0}", BitConverter.ToString(utf8String));
                output.WriteLine("ASCII Bytes: {0}", BitConverter.ToString(asciiString));

                // Convert UTF-8 and ASCII encoded bytes back to UTF-16 encoded
                // string and write.
                output.WriteLine("UTF-8 Text : {0}", Encoding.UTF8.GetString(utf8String));
                output.WriteLine("ASCII Text : {0}", Encoding.ASCII.GetString(asciiString));

                Console.WriteLine(Encoding.UTF8.GetString(utf8String));
                Console.WriteLine(Encoding.ASCII.GetString(asciiString));
            }
        }
    }

    using System;
    using System.Text;

    class Program
    {
        static void Main(string[] args)
        {
            String unicodeString =
                "This Unicode string contains two characters " +
                "with codes outside the traditional ASCII code range, " +
                "Pi (\u03a0) and Sigma (\u03a3).";
            Console.WriteLine("Original string:");
            Console.WriteLine(unicodeString);

            UnicodeEncoding unicodeEncoding = new UnicodeEncoding();
            byte[] utf16Bytes = unicodeEncoding.GetBytes(unicodeString);

            // GetBytes does not emit a byte order mark, so decoding starts at index 0.
            // (Starting at index 2 would drop the first character.)
            char[] chars = unicodeEncoding.GetChars(utf16Bytes, 0, utf16Bytes.Length);
            string s = new string(chars);

            Console.WriteLine();
            Console.WriteLine("Char Array:");
            foreach (char c in chars) Console.Write(c);
            Console.WriteLine();
            Console.WriteLine();
            Console.WriteLine("String from Char Array:");
            Console.WriteLine(s);
            Console.ReadKey();
        }
    }
