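Replacing characters in C# (ASCII)

I have a file containing these characters: à, è, ì, ò, ù - À. What I need to do is replace those characters with plain ones, e.g. à = a, è = e, and so on. This is my code so far: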

    StreamWriter sw = new StreamWriter(@"C:/JoinerOutput.csv");
    string path = @"C:/Joiner.csv";
    string line = File.ReadAllText(path);

    if (line.Contains("à"))
    {
        string asAscii = Encoding.ASCII.GetString(
            Encoding.Convert(
                Encoding.UTF8,
                Encoding.GetEncoding(
                    Encoding.ASCII.EncodingName,
                    new EncoderReplacementFallback("a"),
                    new DecoderExceptionFallback()),
                Encoding.UTF8.GetBytes(line)));

        Console.WriteLine(asAscii);
        Console.ReadLine();
        sw.WriteLine(asAscii);
        sw.Flush();
    }

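Basically, this searches the file for a specific character and replaces it with another character. The problem I'm having is that my if statement doesn't work. How do I fix this?

Here is a sample of the input file: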

 DimàkàtsoMokgàlo
 MàmàRàtlàdi
 KoosNèl
 PàsèkàModisè
 JèrèmiàhMorèmi
 KhèthiwèButhèlèzi
 TiànàPillày
 ViviànMàswàngànyè
 ThirèshànRèddy
 WàdèCornèlius
 ènosNètshimbupfè

If I use line = line.Replace('à', 'a'); the output is:

 Chï¿½rlï¿½nï¿½Kirstï¿½n
 Mï¿½mï¿½Rï¿½tlï¿½di
 KoosNï¿½l
 Pï¿½sï¿½kï¿½Modisï¿½
 Jï¿½rï¿½miï¿½hMorï¿½mi
 Khï¿½thiwï¿½Buthï¿½lï¿½zi
 Tiï¿½nï¿½Pillï¿½y
 Viviï¿½nMï¿½swï¿½ngï¿½nyï¿½
 Thirï¿½shï¿½nRï¿½ddy
 Wï¿½dï¿½Cornï¿½lius
 ï¿½nosNï¿½tshimbupfï¿½

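With my code, the symbols are removed entirely.

Not sure whether this is useful, but in an internal tool that writes messages to an LED screen we use the following replacements (I'm sure there are smarter ways to make this work using the Unicode tables, but this is good enough for this small internal tool):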

    // Requires: using System.Text.RegularExpressions;
    strMessage = Regex.Replace(strMessage, "[éèëêð]", "e");
    strMessage = Regex.Replace(strMessage, "[ÉÈËÊ]", "E");
    strMessage = Regex.Replace(strMessage, "[àâä]", "a");
    strMessage = Regex.Replace(strMessage, "[ÀÁÂÃÄÅ]", "A");
    strMessage = Regex.Replace(strMessage, "[àáâãäå]", "a");
    strMessage = Regex.Replace(strMessage, "[ÙÚÛÜ]", "U");
    strMessage = Regex.Replace(strMessage, "[ùúûüµ]", "u");
    strMessage = Regex.Replace(strMessage, "[òóôõöø]", "o");
    strMessage = Regex.Replace(strMessage, "[ÒÓÔÕÖØ]", "O");
    strMessage = Regex.Replace(strMessage, "[ìíîï]", "i");
    strMessage = Regex.Replace(strMessage, "[ÌÍÎÏ]", "I");
    strMessage = Regex.Replace(strMessage, "[š]", "s");
    strMessage = Regex.Replace(strMessage, "[Š]", "S");
    strMessage = Regex.Replace(strMessage, "[ñ]", "n");
    strMessage = Regex.Replace(strMessage, "[Ñ]", "N");
    strMessage = Regex.Replace(strMessage, "[ç]", "c");
    strMessage = Regex.Replace(strMessage, "[Ç]", "C");
    strMessage = Regex.Replace(strMessage, "[ÿ]", "y");
    strMessage = Regex.Replace(strMessage, "[Ÿ]", "Y");
    strMessage = Regex.Replace(strMessage, "[ž]", "z");
    strMessage = Regex.Replace(strMessage, "[Ž]", "Z");
    strMessage = Regex.Replace(strMessage, "[Ð]", "D");
    strMessage = Regex.Replace(strMessage, "[œ]", "oe");
    strMessage = Regex.Replace(strMessage, "[Œ]", "Oe");
    // Typographic quotes and the ellipsis character
    strMessage = Regex.Replace(strMessage, "[«»\u201C\u201D\u201E\u201F\u2033\u2036]", "\"");
    strMessage = Regex.Replace(strMessage, "[\u2026]", "...");

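One thing to note: in most languages the text remains understandable after this kind of processing, but that is not always the case, and it often forces the reader to rely on the context of the sentence to make sense of it. Not something you want if you have the choice.

Note that the correct solution would be to use the Unicode tables: replace each character that has an integrated diacritic with its "combining diacritical mark(s)" + base character form, and then remove the diacritics…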
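To make the decomposition step concrete, here is a small sketch that prints the UTF-16 code units of è before and after normalizing to form D. The class and method names (DecompositionDemo, CodePoints) are made up for illustration; only String.Normalize and NormalizationForm come from the framework.

```csharp
using System;
using System.Linq;
using System.Text;

public static class DecompositionDemo
{
    // Formats each UTF-16 code unit of a string as U+XXXX (illustrative helper).
    public static string CodePoints(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    public static void Main()
    {
        string composed = "\u00E8"; // è as a single precomposed character
        string decomposed = composed.Normalize(NormalizationForm.FormD);

        Console.WriteLine(CodePoints(composed));   // U+00E8
        Console.WriteLine(CodePoints(decomposed)); // U+0065 U+0300
    }
}
```

After decomposition the accent is a separate combining character (U+0300), which is what makes it possible to filter it out while keeping the base letter.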

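Others have commented on using a Unicode lookup table to remove diacritics. I did a quick Google search and found this example. Code shamelessly copied (and reformatted) and posted below: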

    using System;
    using System.Text;
    using System.Globalization;

    public static class Remove
    {
        public static string RemoveDiacritics(string stIn)
        {
            // Decompose, e.g. "è" becomes "e" followed by a combining grave accent
            string stFormD = stIn.Normalize(NormalizationForm.FormD);
            StringBuilder sb = new StringBuilder();

            for (int ich = 0; ich < stFormD.Length; ich++)
            {
                UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
                // Keep everything except the combining marks
                if (uc != UnicodeCategory.NonSpacingMark)
                {
                    sb.Append(stFormD[ich]);
                }
            }

            return sb.ToString().Normalize(NormalizationForm.FormC);
        }
    }

So your code could clean the input by calling:

 line = Remove.RemoveDiacritics(line);

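I often use an extension method based on the version Dana provided. A quick explanation:

Normalizing to form D splits characters such as è into an e and a nonspacing `
From this, the nonspacing characters are removed
The result is normalized back to form C (I'm not sure whether this is necessary)

Code: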

    using System.Linq;
    using System.Text;
    using System.Globalization;

    // namespace here
    public static class Utility
    {
        public static string RemoveDiacritics(this string str)
        {
            if (str == null) return null;

            // Decompose, then drop the nonspacing (combining) marks
            var chars =
                from c in str.Normalize(NormalizationForm.FormD).ToCharArray()
                let uc = CharUnicodeInfo.GetUnicodeCategory(c)
                where uc != UnicodeCategory.NonSpacingMark
                select c;

            var cleanStr = new string(chars.ToArray()).Normalize(NormalizationForm.FormC);
            return cleanStr;
        }
    }

Why are you making things complicated?
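One caveat with the normalization approach: letters such as ø, œ and ß have no canonical decomposition, so they pass through unchanged. A possible way around that, sketched below, is to combine the mark removal with a small fallback table. The AsciiFolding class, the Fold method and the contents of the table are assumptions made for this illustration, not part of any answer above or of the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class AsciiFolding
{
    // Hypothetical fallback table for letters that FormD does not decompose.
    private static readonly Dictionary<char, string> Fallback = new Dictionary<char, string>
    {
        { '\u00F8', "o" },  // ø
        { '\u00D8', "O" },  // Ø
        { '\u0153', "oe" }, // œ
        { '\u0152', "Oe" }, // Œ
        { '\u00DF', "ss" }  // ß
    };

    public static string Fold(string input)
    {
        if (input == null) return null;

        var sb = new StringBuilder(input.Length);
        foreach (char c in input.Normalize(NormalizationForm.FormD))
        {
            // Drop the combining marks produced by the decomposition.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            string replacement;
            if (Fallback.TryGetValue(c, out replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        Console.WriteLine(Fold("Kh\u00E8thiw\u00E8 Buth\u00E8l\u00E8zi")); // Khethiwe Buthelezi
        Console.WriteLine(Fold("S\u00F8ren"));                              // Soren
    }
}
```

Which replacements belong in the table depends on the language of the text, so treat the entries as placeholders.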

 line = line.Replace('à', 'a');

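Update:

The documentation for File.ReadAllText says:

This method attempts to automatically detect the encoding of a file based on the presence of byte order marks. Encoding formats UTF-8 and UTF-32 (both big-endian and little-endian) can be detected.

Use the ReadAllText(String, Encoding) method overload when reading files that might contain imported text, because unrecognized characters may not be read correctly.

What encoding is C:/Joiner.csv in? Maybe you should use the other overload of File.ReadAllText, where you specify the input encoding yourself?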
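As a rough illustration of why the encoding matters, the sketch below writes a sample line as ISO-8859-1 bytes to a temporary file and reads it back twice: once with the default overload and once with the encoding given explicitly. That the real file is ISO-8859-1 is only an assumption here (Windows-1252 would behave the same way for è), and the class and method names are invented for the example.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadWithEncodingDemo
{
    // Writes the text as ISO-8859-1 bytes (no BOM) and returns the temp file path.
    public static string WriteLatin1Sample(string text)
    {
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, Encoding.GetEncoding("iso-8859-1").GetBytes(text));
        return path;
    }

    public static void Main()
    {
        string path = WriteLatin1Sample("KoosN\u00E8l");

        // No BOM, so the default overload decodes as UTF-8; the lone 0xE8 byte
        // is invalid there and is turned into the replacement character U+FFFD.
        string guessed = File.ReadAllText(path);

        // With the matching encoding the è survives and Replace can find it.
        string explicitRead = File.ReadAllText(path, Encoding.GetEncoding("iso-8859-1"));

        Console.WriteLine(guessed.Contains("\u00E8"));           // False
        Console.WriteLine(explicitRead.Contains("\u00E8"));      // True
        Console.WriteLine(explicitRead.Replace('\u00E8', 'e'));  // KoosNel

        File.Delete(path);
    }
}
```

If this is what is happening with Joiner.csv, it would also explain why line.Contains("à") is never true: by the time the string is searched, the accented characters have already been lost.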

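Use this:

    if (line.Contains("OldChar"))
    {
        line = line.Replace("OldChar", "NewChar");
    }

This is simple to do. The code below replaces all special characters with ASCII characters in two lines of code. It gives you the same result as Julien Roncaglia's solution.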

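    byte[] bytes = System.Text.Encoding.GetEncoding("Cyrillic").GetBytes(inputText);
    string outputText = System.Text.Encoding.ASCII.GetString(bytes);

It sounds like what you want to do is convert extended ASCII (8-bit) to ASCII (7-bit), so searching for that might help.

I have seen libraries that handle this in other languages, but I have never done it in C#. This looks like it could be of some help:

Convert two ASCII characters to their "corresponding" one-character extended ASCII representation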
