How to split text into words?

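Example text:

'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'

The words in that line are:

Oh
you
can't
help
that
said
the
Cat
we're
all
mad
here
I'm
mad
You're
mad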

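Split the text on whitespace, then trim the punctuation.

    using System;
    using System.Linq;

    var text = "'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'";

    // Every distinct punctuation character that occurs in the text.
    var punctuation = text.Where(Char.IsPunctuation).Distinct().ToArray();

    // Split() with no arguments splits on whitespace; Trim then strips punctuation
    // from both ends of each token and leaves inner apostrophes alone.
    var words = text.Split().Select(x => x.Trim(punctuation));

The result matches the example exactly.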
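`Split()` with no arguments returns an empty string for every extra whitespace character, and a token made only of punctuation (for example a lone `-`) trims down to an empty string as well. A minimal sketch, assuming you want those empty entries dropped; `SplitWords` is a hypothetical helper name, not part of the answer above:

```csharp
using System;
using System.Linq;

// Hypothetical helper: split on any whitespace, trim punctuation, drop empty tokens.
string[] SplitWords(string text)
{
    // Every distinct punctuation character that occurs in the input.
    char[] punctuation = text.Where(char.IsPunctuation).Distinct().ToArray();

    return text
        .Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries) // empty separator array = any whitespace
        .Select(token => token.Trim(punctuation))
        .Where(word => word.Length > 0) // a token such as "-" trims to ""
        .ToArray();
}

var words = SplitWords("'Oh,  you can't\thelp that,' - said the Cat.");
Console.WriteLine(string.Join("|", words)); // Oh|you|can't|help|that|said|the|Cat
```

The double space, the tab and the stray dash would otherwise show up as empty strings in the result.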

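First, remove all special characters:

    using System.Text.RegularExpressions;

    // Keep only ASCII letters, digits and spaces. This regex also removes
    // apostrophes ("can't" becomes "cant"), so the extension method below is better.
    var fixedInput = Regex.Replace(input, "[^a-zA-Z0-9 ]", string.Empty);

Then split it:

    var splitted = fixedInput.Split(' ');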

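For a simpler C# way to remove special characters (one you can easily adapt), add this extension method (I added support for apostrophes):

    using System.Text;

    // Extension methods must live in a static class; the class name is arbitrary.
    public static class StringExtensions
    {
        // Keeps ASCII digits and letters, apostrophes and spaces (so the result
        // can still be split into words); every other character is dropped.
        public static string RemoveSpecialCharacters(this string str)
        {
            StringBuilder sb = new StringBuilder();
            foreach (char c in str)
            {
                if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '\'' || c == ' ')
                {
                    sb.Append(c);
                }
            }
            return sb.ToString();
        }
    }

Then use it like this:

    var words = input.RemoveSpecialCharacters().Split(' ');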

You'll be surprised how efficient this extension method is (it is typically faster than the regex approach), so I recommend using it ;)

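Update

I agree this is an English-only approach, but to make it Unicode-compatible you only need to replace:

    (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')

with:

    char.IsLetter(c)

which supports Unicode. .NET also provides char.IsSymbol and char.IsLetterOrDigit for other cases.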
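Putting the update together, a minimal sketch of a Unicode-aware filter. It is written as a plain local function rather than an extension method so it runs as top-level code, and `RemoveSpecialCharactersUnicode` is a hypothetical name. It keeps letters and digits from any script plus apostrophes and spaces:

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical Unicode-aware variant of RemoveSpecialCharacters.
string RemoveSpecialCharactersUnicode(string str)
{
    var sb = new StringBuilder(str.Length);
    foreach (char c in str)
    {
        // char.IsLetterOrDigit accepts letters and decimal digits from any script;
        // the apostrophe and the space are kept explicitly.
        if (char.IsLetterOrDigit(c) || c == '\'' || c == ' ')
        {
            sb.Append(c);
        }
    }
    return sb.ToString();
}

string[] words = RemoveSpecialCharactersUnicode("Voilà, le café est prêt!").Split(' ');
Console.WriteLine(string.Join("|", words)); // Voilà|le|café|est|prêt
```

Because the check runs one UTF-16 `char` at a time, combining accent marks in decomposed text are not letters and get dropped; normalizing the input first avoids that.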

Just to add to @Adam Fridental's very good answer, you can try this regex:

    using System.Text.RegularExpressions;

    var text = "'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'";
    var matches = Regex.Matches(text, @"\w+[^\s]*\w+|\w");
    foreach (Match match in matches)
    {
        var word = match.Value; // one word per match
    }

I believe this is the shortest regex that gets all the words:

    \w+[^\s]*\w+|\w
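To get the matches as a list instead of looping, a minimal sketch using `Regex.Matches` with LINQ and the same pattern. The first alternative, `\w+[^\s]*\w+`, matches a run that starts and ends with a word character and may contain anything except whitespace in between, so `can't` stays whole while a trailing `,'` is left out; the `|\w` alternative catches single-character words such as `a`.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var text = "'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'";

// Each match must start and end with a word character, so surrounding
// quotes and punctuation are excluded while inner apostrophes are kept.
var words = Regex.Matches(text, @"\w+[^\s]*\w+|\w")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

Console.WriteLine(string.Join("|", words));
// Oh|you|can't|help|that|said|the|Cat|we're|all|mad|here|I'm|mad|You're|mad
```

Note that two words joined by punctuation with no space (for example `that,'said`) would come back as a single match.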

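If you don't want to use a Regex object, you could do something like this:

    using System.Collections.Generic;
    using System.Linq;

    string mystring = "Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.";
    List<string> words = mystring.Replace(",", "").Replace(":", "").Replace(".", "").Split(' ').ToList();

You still need to handle the apostrophe left at the end of "that" (and the one at the start of "we're").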
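One way to deal with the leftover apostrophes, sketched under the assumption that an apostrophe at the start or end of a word is always a quote mark rather than part of the word (this misreads plural possessives such as `cats'`): trim `'` from each piece after splitting. The chained `Replace` calls mirror the snippet above; `RemoveEmptyEntries` is added so consecutive spaces do not produce empty strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string mystring = "Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.";

List<string> words = mystring
    .Replace(",", "").Replace(":", "").Replace(".", "")
    .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
    // Assumption: an apostrophe at either end of a word is a quote mark.
    .Select(w => w.Trim('\''))
    .ToList();

Console.WriteLine(string.Join("|", words));
// Oh|you|can't|help|that|said|the|Cat|we're|all|mad|here|I'm|mad|You're|mad
```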

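Here is one solution that doesn't use any built-in helper classes or methods:

    using System.Collections.Generic;

    // Returns each run of ASCII letters as a separate word. Any other character,
    // including an apostrophe, ends the current word.
    public static List<string> ExtractChars(string inputString)
    {
        var result = new List<string>();
        int startIndex = -1; // start of the current word, or -1 when not inside a word
        for (int i = 0; i < inputString.Length; i++)
        {
            var character = inputString[i];
            if ((character >= 'a' && character <= 'z') || (character >= 'A' && character <= 'Z'))
            {
                if (startIndex == -1)
                {
                    startIndex = i;
                }
                // A word that runs to the end of the input has no terminator, so add it here.
                if (i == inputString.Length - 1)
                {
                    result.Add(GetString(inputString, startIndex, i));
                }
                continue;
            }
            // A non-letter ends the current word, if there is one.
            if (startIndex != -1)
            {
                result.Add(GetString(inputString, startIndex, i - 1));
                startIndex = -1;
            }
        }
        return result;
    }

    // Builds inputString[startIndex..endIndex] (inclusive) one character at a time.
    public static string GetString(string inputString, int startIndex, int endIndex)
    {
        string result = "";
        for (int i = startIndex; i <= endIndex; i++)
        {
            result += inputString[i];
        }
        return result;
    }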
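Because only ASCII letters count as word characters in `ExtractChars`, `can't` comes out as `can` and `t`. A sketch of one way to extend the same scanning loop, assuming an apostrophe belongs to a word only when a letter comes right before and right after it; `ExtractWords` and `IsAsciiLetter` are hypothetical names, and `Substring` stands in for the hand-written `GetString` for brevity.

```csharp
using System;
using System.Collections.Generic;

bool IsAsciiLetter(char c) => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');

// Hypothetical variant of ExtractChars that keeps in-word apostrophes.
List<string> ExtractWords(string input)
{
    var result = new List<string>();
    int start = -1; // index where the current word began, or -1 if not in a word

    for (int i = 0; i < input.Length; i++)
    {
        char c = input[i];
        // An apostrophe continues the current word only if a letter follows it
        // (start != -1 already guarantees a letter precedes it).
        bool inWord = IsAsciiLetter(c)
            || (c == '\'' && start != -1 && i + 1 < input.Length && IsAsciiLetter(input[i + 1]));

        if (inWord)
        {
            if (start == -1) start = i;
        }
        else if (start != -1)
        {
            result.Add(input.Substring(start, i - start));
            start = -1;
        }
    }

    // Flush a word that runs to the end of the input.
    if (start != -1) result.Add(input.Substring(start));

    return result;
}

var words = ExtractWords("'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'");
Console.WriteLine(string.Join("|", words));
```

Quote marks such as the ones around `'Oh` and `that,'` have no letter on one side, so they are skipped rather than glued onto a word.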

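You can try using a regex to remove apostrophes that are not surrounded by letters (i.e. quote marks), then use the static Char methods to strip all other characters. Because the regex runs first, contraction apostrophes (as in can't) are kept, while quote marks such as the one in 'Oh are removed.

    using System;
    using System.Text;
    using System.Text.RegularExpressions;

    string myText = "'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'";

    // Remove any quote character that is not between two letters:
    // either no letter before it, or no letter after it.
    Regex reg = new Regex(@"(?<!\p{L})[""']|[""'](?!\p{L})");
    myText = reg.Replace(myText, "");
    string[] listOfWords = RemoveCharacters(myText);

    // Keeps letters, whitespace and the remaining in-word apostrophes, then splits on spaces.
    static string[] RemoveCharacters(string input)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in input)
        {
            if (Char.IsLetter(c) || Char.IsWhiteSpace(c) || c == '\'')
                sb.Append(c);
        }
        return sb.ToString().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }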
