Split a string on commas, ignoring any punctuation (including ',') inside quotes
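How can I split a string (from a text box) on commas, excluding the commas inside double quotes (without removing the quotes) and ignoring any other punctuation (e.g. '.', ';', '-')?
For example, if someone enters the following into the text box: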
apple, orange, "baboons, cows", rainbow, "unicorns, gummy bears"
How can I split the string above into the following strings (to put into a List, for example)?
apple
orange
"baboons, cows"
rainbow
"Unicorns, gummy bears..."
Thanks for your help!
You could try the regex below, which uses a positive lookahead:
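    using System;
    using System.Text.RegularExpressions;

    string value = @"apple, orange, ""baboons, cows"", rainbow, ""unicorns, gummy bears""";

    // Split on ", " only when the lookahead sees either an opening quote
    // or an unquoted token that runs up to the next comma or the end of the string.
    // Caveat: a quoted item with more than one comma (e.g. "a, b, c") is still split.
    string[] lines = Regex.Split(value, @", (?=(?:""[^""]*?(?: [^""]*)*))|, (?=[^"",]+(?:,|$))");

    foreach (string line in lines)
    {
        Console.WriteLine(line);
    }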
Output:
apple
orange
"baboons, cows"
rainbow
"unicorns, gummy bears"
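A lookahead can also be used to check that an even number of quotes remains after a comma, which means the comma sits outside any quoted item. The following is a minimal sketch of that variant, not the answer's own pattern; the class name EvenQuoteSplitter is made up for illustration, and it assumes quotes are balanced and never escaped inside an item.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EvenQuoteSplitter
{
    // Hypothetical helper: splits on a comma (plus any following whitespace)
    // only when the rest of the string contains an even number of double quotes.
    public static string[] Split(string value)
    {
        return Regex.Split(value, @",\s*(?=(?:[^""]*""[^""]*"")*[^""]*$)");
    }
}
```

Because the lookahead rescans the remainder of the string for every comma, this is best kept to short inputs such as a text box value.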
Try this:
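    using System;
    using System.Text.RegularExpressions;

    string input = @"apple, orange, ""baboons, cows"", rainbow, ""unicorns, gummy bears""";

    // An item starts at the beginning of the string or right after a comma.
    // \s* skips the space after the comma so a quoted item is still seen as quoted.
    Regex itemPattern = new Regex("(?:^|,)\\s*(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);

    foreach (Match m in itemPattern.Matches(input))
    {
        // Group 1 holds the item without the leading comma and whitespace.
        Console.WriteLine(m.Groups[1].Value);
    }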
You could also take a look at FileHelpers.
Instead of a regex, you can walk through the string one character at a time, the way a CSV parser does:
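    // Requires: using System.Collections.Generic;
    public List<string> ItemStringToList(string inputString)
    {
        var itemList = new List<string>();
        var currentItem = "";
        var quotesOpen = false;

        for (int i = 0; i < inputString.Length; i++)
        {
            if (inputString[i] == '"')
            {
                // Toggle the quoted state. The quote itself is dropped;
                // remove this 'continue' to keep the quotes in the item.
                quotesOpen = !quotesOpen;
                continue;
            }

            if (inputString[i] == ',' && !quotesOpen)
            {
                // A comma outside quotes ends the current item.
                itemList.Add(currentItem);
                currentItem = "";
                continue;
            }

            // Skip leading spaces of an item.
            if (currentItem == "" && inputString[i] == ' ') continue;

            currentItem += inputString[i];
        }

        if (currentItem != "") itemList.Add(currentItem);
        return itemList;
    }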
Example test usage:
    var test1 = ItemStringToList("one, two, three");
    var test2 = ItemStringToList("one, \"two\", three");
    var test3 = ItemStringToList("one, \"two, three\"");
    var test4 = ItemStringToList("one, \"two, three\", four, \"five six\", seven");
    var test5 = ItemStringToList("one, \"two, three\", four, \"five six\", seven");
    var test6 = ItemStringToList("one, \"two, three\", four, \"five six, seven\"");
    var test7 = ItemStringToList("\"one, two, three\", four, \"five six, seven\"");
If you want faster character concatenation, you can change it to use a StringBuilder.
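As a rough sketch of that change, the same loop could accumulate each item in a StringBuilder instead of concatenating strings. The class and method names here (QuoteAwareSplitter.Split) are made up for illustration; the behavior is meant to mirror ItemStringToList, including dropping the quote characters.

```csharp
using System.Collections.Generic;
using System.Text;

public static class QuoteAwareSplitter
{
    // Hypothetical helper name; same logic as ItemStringToList,
    // but the current item is built in a StringBuilder.
    public static List<string> Split(string inputString)
    {
        var items = new List<string>();
        var current = new StringBuilder();
        var quotesOpen = false;

        foreach (char c in inputString)
        {
            if (c == '"')
            {
                quotesOpen = !quotesOpen; // toggle state; the quote itself is dropped
                continue;
            }
            if (c == ',' && !quotesOpen)
            {
                items.Add(current.ToString()); // comma outside quotes ends the item
                current.Clear();
                continue;
            }
            if (current.Length == 0 && c == ' ')
                continue; // skip leading spaces of an item
            current.Append(c);
        }

        if (current.Length > 0)
            items.Add(current.ToString());
        return items;
    }
}
```

For example, QuoteAwareSplitter.Split("one, \"two, three\", four") yields three items: one, then two, three, then four.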
Try this; it shows several ways to split an array of strings. If you want to split on whitespace, just pass a space (' ') to Split.
    using System;
    using System.Collections.Generic;
    using System.Linq;

    namespace LINQExperiment1
    {
        class Program
        {
            static void Main(string[] args)
            {
                string[] sentence = new string[] { "apple", "orange", "baboons cows", " rainbow", "unicorns gummy bears" };

                Console.WriteLine("option 1:");
                Console.WriteLine("-------------");
                // option 1: Select returns one string[] per input string,
                // each holding the pieces of that string.
                IEnumerable<string[]> words1 = sentence.Select(w => w.Split(' '));
                // to get each word, we have to use two foreach loops
                foreach (string[] segment in words1)
                    foreach (string word in segment)
                        Console.WriteLine(word);
                Console.WriteLine();

                Console.WriteLine("option 2:");
                Console.WriteLine("-------------");
                // option 2: SelectMany flattens the per-string arrays
                // into a single sequence of strings
                IEnumerable<string> words2 = sentence.SelectMany(segment => segment.Split(','));
                // with SelectMany we have every string individually
                foreach (var word in words2)
                    Console.WriteLine(word);

                // option 3: the same flattening as option 2, written using
                // the query expression syntax (multiple froms); it splits on ' ' here
                IEnumerable<string> words3 =
                    from segment in sentence
                    from word in segment.Split(' ')
                    select word;
            }
        }
    }
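This turned out to be trickier than I expected, and I think it is a good practical question.
Here is the solution I came up with. One thing I don't like about it is having to add the double quotes back, and another is the variable names :p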
    using System;
    using System.Collections.Generic;

    internal class Program
    {
        private static void Main(string[] args)
        {
            string searchString = @"apple, orange, ""baboons, cows. dogs- hounds"", rainbow, ""unicorns, gummy bears"", abc, defghj";
            char delimeter = ',';
            char excludeSplittingWithin = '"';

            // Splitting on the quote character first isolates the quoted segments.
            string[] splittedByExcludeSplittingWithin = searchString.Split(excludeSplittingWithin);
            List<string> splittedSearchString = new List<string>();

            for (int i = 0; i < splittedByExcludeSplittingWithin.Length; i++)
            {
                if (i == 0 || splittedByExcludeSplittingWithin[i].StartsWith(delimeter.ToString()))
                {
                    // Unquoted segment: split it on the delimiter as usual.
                    string[] splitttedByDelimeter = splittedByExcludeSplittingWithin[i].Split(delimeter);
                    for (int j = 0; j < splitttedByDelimeter.Length; j++)
                    {
                        splittedSearchString.Add(splitttedByDelimeter[j].Trim());
                    }
                }
                else
                {
                    // Quoted segment: keep it whole and put the quotes back.
                    splittedSearchString.Add(excludeSplittingWithin + splittedByExcludeSplittingWithin[i] + excludeSplittingWithin);
                }
            }

            foreach (string s in splittedSearchString)
            {
                // Empty entries come from the delimiters next to a quoted segment.
                if (s.Trim() != string.Empty)
                {
                    Console.WriteLine(s);
                }
            }

            Console.ReadKey();
        }
    }
Another Regex solution:
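    // Requires: using System.Collections.Generic; using System.Linq;
    //           using System.Text.RegularExpressions;
    private static IEnumerable<string> Parse(string input)
    {
        // if used frequently, should be instantiated with Compiled option
        // Note: [^,\s]* stops at whitespace, so an unquoted item that
        // contains a space is cut off at its first word.
        Regex regex = new Regex(@"(?<=^|,\s)(\""(?:[^\""]|\""\"")*\""|[^,\s]*)");
        return regex.Matches(input)
                    .Cast<Match>()
                    .Where(m => m.Success)
                    .Select(m => m.Value);
    }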