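Parsing a string that contains an array

I want to convert a string containing a recursive array of strings into an array of depth one.

Example: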

StringToArray("[a, b, [c, [d, e]], f, [g, h], i]") == ["a", "b", "[c, [d, e]]", "f", "[g, h]", "i"] 

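Seems simple enough. However, I come from a functional background and I'm not that familiar with the .NET Framework standard library, so every time (I have started from scratch about 3 times) I just end up with plain ugly code. My latest implementation is here. As you can see, it's ugly.

So, what's the C# way to do this?

@ojlovecd has a good answer using regular expressions.
However, his answer is overly complicated, so here is my similar, simpler answer.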

    public string[] StringToArray(string input) {
        var pattern = new Regex(@"
            \[
                (?:
                \s*
                    (?<results>(?:
                    (?(open)  [^\[\]]+  |  [^\[\],]+  )
                    |(?<open>\[)
                    |(?<-open>\])
                    )+)
                    (?(open)(?!))
                ,?
                )*
            \]
        ", RegexOptions.IgnorePatternWhitespace);

        // Find the first match:
        var result = pattern.Match(input);
        if (result.Success) {
            // Extract the captured values:
            var captures = result.Groups["results"].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
            return captures;
        }
        // Not a match
        return null;
    }

Using this code, you will see that StringToArray("[a, b, [c, [d, e]], f, [g, h], i]") returns the following array: ["a", "b", "[c, [d, e]]", "f", "[g, h]", "i"]

For more information on the balancing groups that I used to match balanced brackets, take a look at Microsoft's documentation.
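To see balancing groups in isolation, here is a minimal sketch that only checks whether the brackets in a string are balanced. The class name BalancedBrackets and the method IsBalanced are made up for illustration and are not part of the answer above; the sketch assumes .NET's System.Text.RegularExpressions, which is the engine that supports the (?<open>...), (?<-open>...) and (?(open)...) constructs.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, introduced only to illustrate balancing groups.
public static class BalancedBrackets
{
    private static readonly Regex Balanced = new Regex(@"
        ^
        (?:
            [^\[\]]          # any character that is not a bracket
          | (?<open>\[)      # '[' pushes a capture onto the 'open' stack
          | (?<-open>\])     # ']' pops one capture; cannot match if the stack is empty
        )*
        (?(open)(?!))        # fail if any '[' is still unmatched
        $
    ", RegexOptions.IgnorePatternWhitespace);

    // Returns true when every '[' has a matching ']' in the right order.
    public static bool IsBalanced(string input)
    {
        return Balanced.IsMatch(input);
    }
}
```

BalancedBrackets.IsBalanced("[a, [b, c]]") is true, while "[a, [b]" and "a]" are both rejected. The answer's pattern uses the same push/pop idea, but additionally captures each top-level item into the 'results' group.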

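Update
Per the comments, if you also want to balance quotes, here is a possible modification. (Note that in a C# verbatim string, " is escaped as "".) I have also added descriptions to the pattern to help clarify it: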

    var pattern = new Regex(@"
        \[
            (?:
            \s*
                (?<results>(?:                 # Capture everything into 'results'
                    (?(open)                   # If 'open' Then
                        [^\[\]]+               #   Capture everything but brackets
                        |                      # Else (not open):
                        (?:                    #   Capture either:
                            [^\[\],'""]+       #     Unimportant characters
                            |                  #   Or
                            ['""][^'""]*?['""] #     Anything between quotes
                        )
                    )                          # End If
                    |(?<open>\[)               # Open bracket
                    |(?<-open>\])              # Close bracket
                )+)
                (?(open)(?!))                  # Fail while there's an unbalanced 'open'
            ,?
            )*
        \]
    ", RegexOptions.IgnorePatternWhitespace);

With Regex, this can solve your problem:

    static string[] StringToArray(string str)
    {
        // Strip the outer brackets; anything else is not a match.
        Regex reg = new Regex(@"^\[(.*)\]$");
        Match match = reg.Match(str);
        if (!match.Success)
            return null;
        str = match.Groups[1].Value;

        // Replace each balanced nested array with a placeholder token,
        // so the remaining commas are all top-level separators.
        reg = new Regex(@"\[[^\[\]]*(((?'Open'\[)[^\[\]]*)+((?'-Open'\])[^\[\]]*)+)*(?(Open)(?!))\]");
        Dictionary<string, string> dic = new Dictionary<string, string>();
        int index = 0;
        str = reg.Replace(str, m =>
        {
            string temp = "ojlovecd" + (index++).ToString();
            dic.Add(temp, m.Value);
            return temp;
        });

        // Split on commas, then swap the placeholders back.
        string[] result = str.Split(',');
        for (int i = 0; i < result.Length; i++)
        {
            string s = result[i].Trim();
            if (dic.ContainsKey(s))
                result[i] = dic[s].Trim();
            else
                result[i] = s;
        }
        return result;
    }

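Honestly, I would just write this method in an F# assembly, as it is probably much easier. If you look at the JavaScriptSerializer implementation in C# (with a decompiler such as dotPeek or Reflector), you can see how messy the array parsing code is for a similar array in JSON. Granted, it has to handle a much wider variety of tokens, but you get the idea.

Here is their DeserializeList implementation. It is uglier than it normally would be because this is dotPeek's decompiled version rather than the original, but you get the idea. DeserializeInternal recurses down into the child list.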

    private IList DeserializeList(int depth)
    {
        IList list = (IList) new ArrayList();
        char? nullable1 = this._s.MoveNext();
        if (((int) nullable1.GetValueOrDefault() != 91 ? 1 : (!nullable1.HasValue ? 1 : 0)) != 0)
            throw new ArgumentException(this._s.GetDebugString(AtlasWeb.JSON_InvalidArrayStart));
        bool flag = false;
        char? nextNonEmptyChar;
        char? nullable2;
        do
        {
            char? nullable3 = nextNonEmptyChar = this._s.GetNextNonEmptyChar();
            if ((nullable3.HasValue ? new int?((int) nullable3.GetValueOrDefault()) : new int?()).HasValue)
            {
                char? nullable4 = nextNonEmptyChar;
                if (((int) nullable4.GetValueOrDefault() != 93 ? 1 : (!nullable4.HasValue ? 1 : 0)) != 0)
                {
                    this._s.MovePrev();
                    object obj = this.DeserializeInternal(depth);
                    list.Add(obj);
                    flag = false;
                    nextNonEmptyChar = this._s.GetNextNonEmptyChar();
                    char? nullable5 = nextNonEmptyChar;
                    if (((int) nullable5.GetValueOrDefault() != 93 ? 0 : (nullable5.HasValue ? 1 : 0)) == 0)
                    {
                        flag = true;
                        nullable2 = nextNonEmptyChar;
                    }
                    else
                        goto label_8;
                }
                else
                    goto label_8;
            }
            else
                goto label_8;
        }
        while (((int) nullable2.GetValueOrDefault() != 44 ? 1 : (!nullable2.HasValue ? 1 : 0)) == 0);
        throw new ArgumentException(this._s.GetDebugString(AtlasWeb.JSON_InvalidArrayExpectComma));
    label_8:
        if (flag)
            throw new ArgumentException(this._s.GetDebugString(AtlasWeb.JSON_InvalidArrayExtraComma));
        char? nullable6 = nextNonEmptyChar;
        if (((int) nullable6.GetValueOrDefault() != 93 ? 1 : (!nullable6.HasValue ? 1 : 0)) != 0)
            throw new ArgumentException(this._s.GetDebugString(AtlasWeb.JSON_InvalidArrayEnd));
        else
            return list;
    }

Recursive parsing, though, just isn't as manageable in C# as it is in F#.

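There is no real "standard" way of doing this. Note that the implementation can get pretty messy if you want to consider all possibilities. I would recommend something recursive like: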

    private static IEnumerable<object> StringToArray2(string input)
    {
        var characters = input.GetEnumerator();
        return InternalStringToArray2(characters);
    }

    private static IEnumerable<object> InternalStringToArray2(IEnumerator<char> characters)
    {
        StringBuilder valueBuilder = new StringBuilder();

        while (characters.MoveNext())
        {
            char current = characters.Current;
            switch (current)
            {
                case '[':
                    // Nested array: recurse, sharing the same enumerator.
                    yield return InternalStringToArray2(characters);
                    break;
                case ']':
                    // End of this level: emit the pending value and stop.
                    yield return valueBuilder.ToString();
                    valueBuilder.Clear();
                    yield break;
                case ',':
                    yield return valueBuilder.ToString();
                    valueBuilder.Clear();
                    break;
                default:
                    valueBuilder.Append(current);
                    break;
            }
        }
    }

You are not bound to recursion, though, and can always fall back to a single method like

    private static IEnumerable<object> StringToArray1(string input)
    {
        // One list per nesting level; the stack holds the enclosing levels.
        Stack<List<object>> levelEntries = new Stack<List<object>>();
        List<object> current = null;
        StringBuilder currentLineBuilder = new StringBuilder();

        foreach (char nextChar in input)
        {
            switch (nextChar)
            {
                case '[':
                    levelEntries.Push(current);
                    current = new List<object>();
                    break;
                case ']':
                    current.Add(currentLineBuilder.ToString());
                    currentLineBuilder.Clear();
                    var last = current;
                    if (levelEntries.Peek() != null)
                    {
                        current = levelEntries.Pop();
                        current.Add(last);
                    }
                    break;
                case ',':
                    current.Add(currentLineBuilder.ToString());
                    currentLineBuilder.Clear();
                    break;
                default:
                    currentLineBuilder.Append(nextChar);
                    break;
            }
        }

        return current;
    }

Whichever flavor works for you.
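If only the depth-one result from the question is needed, neither recursion nor a stack is strictly required: a single depth counter is enough to tell top-level commas from nested ones. The following is a minimal sketch of that idea, not code from any of the answers; the names DepthOneSplitter and Split are hypothetical, and it assumes balanced brackets, one outer pair of brackets, and no quoted items.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper; assumes balanced brackets and no quoting.
public static class DepthOneSplitter
{
    public static string[] Split(string input)
    {
        string trimmed = input.Trim();
        if (trimmed.Length < 2 || trimmed[0] != '[' || trimmed[trimmed.Length - 1] != ']')
            return null; // same "not a match" convention as the regex answers

        var items = new List<string>();
        var current = new StringBuilder();
        int depth = 0;

        // Walk only the text between the outer brackets.
        for (int i = 1; i < trimmed.Length - 1; i++)
        {
            char c = trimmed[i];
            if (c == '[') depth++;
            else if (c == ']') depth--;

            if (c == ',' && depth == 0)
            {
                // A comma outside any nested array ends the current item.
                items.Add(current.ToString().Trim());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        // Add the final item, unless the list was empty ("[]").
        string last = current.ToString().Trim();
        if (last.Length > 0 || items.Count > 0)
            items.Add(last);

        return items.ToArray();
    }
}
```

For the input from the question, DepthOneSplitter.Split("[a, b, [c, [d, e]], f, [g, h], i]") yields the six items a, b, [c, [d, e]], f, [g, h] and i. Nested arrays are kept as raw strings, which matches the requested output but means the caller has to call Split again to descend a level.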

    using System;
    using System.Text;
    using System.Text.RegularExpressions;
    using Microsoft.VisualBasic.FileIO; //Microsoft.VisualBasic.dll
    using System.IO;

    public class Sample
    {
        static void Main(){
            string data = "[a, b, [c, [d, e]], f, [g, h], i]";
            string[] fields = StringToArray(data);
            //check print
            foreach(var item in fields){
                Console.WriteLine("\"{0}\"",item);
            }
        }
        static string[] StringToArray(string data){
            string[] fields = null;
            Regex innerPat = new Regex(@"\[\s*(.+)\s*\]");
            string innerStr = innerPat.Matches(data)[0].Groups[1].Value;
            StringBuilder wk = new StringBuilder();
            var balance = 0;
            for(var i = 0;i