How to split a string while keeping whole words?

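I need to split a long sentence into parts that keep whole words. Each part should have at most a given number of characters (including spaces, dots, etc.). For example: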

    int partLength = 35;
    string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";

Output:

    1 part: "Silver badges are awarded for"
    2 part: "longer term goals. Silver badges are"
    3 part: "uncommon."

Try this:

    using System;
    using System.Collections.Generic;

    class Program
    {
        static void Main(string[] args)
        {
            int partLength = 35;
            string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
            string[] words = sentence.Split(' ');
            var parts = new Dictionary<int, string>();
            string part = string.Empty;
            int partCounter = 0;
            foreach (var word in words)
            {
                // Append the word (with a separating space) if the part stays within partLength.
                if (string.IsNullOrEmpty(part) || part.Length + 1 + word.Length <= partLength)
                {
                    part += string.IsNullOrEmpty(part) ? word : " " + word;
                }
                else
                {
                    // The current part is full: store it and start a new one with this word.
                    parts.Add(partCounter, part);
                    part = word;
                    partCounter++;
                }
            }
            parts.Add(partCounter, part); // store the last part
            foreach (var item in parts)
            {
                Console.WriteLine("Part {0} (length = {2}): {1}", item.Key, item.Value, item.Value.Length);
            }
            Console.ReadLine();
        }
    }

I knew there had to be a nice LINQ-y way to do this, so here's one just for the fun of it:

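    using System;
    using System.Linq;

    var input = "The quick brown fox jumps over the lazy dog.";
    var charCount = 0;
    var maxLineLength = 11;
    // Each word advances charCount by its length plus one space;
    // the running total divided by the line length is the line index.
    var lines = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                     .GroupBy(w => (charCount += w.Length + 1) / maxLineLength)
                     .Select(g => string.Join(" ", g));
    // That's all :)
    foreach (var line in lines)
    {
        Console.WriteLine(line);
    }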

Obviously, this code only works as long as the query isn't parallel, since it relies on charCount being incremented "in word order".

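I've been testing Jon's and Lessan's answers, but they don't work properly if your maximum length needs to be absolute rather than approximate. As their counter increments, it doesn't account for the empty space left at the end of a line.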

Running their code against the OP's example, you get:

    1 part: "Silver badges are awarded for " - 29 Characters
    2 part: "longer term goals. Silver badges are" - 36 Characters
    3 part: "uncommon. " - 13 Characters

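The "are" on the second line should be on the third line. This happens because the counter doesn't include the 6 characters left unused at the end of the first line.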
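
To make that drift visible, here is a small sketch that replays only the grouping key used by the counter-based answers, `(charCount += w.Length + 1) / max`, and reports the running count and resulting key for each word. It is not either poster's code; the names `CounterDrift` and `GroupKeys` are hypothetical, and it assumes C# 7 named tuples.

```csharp
using System;
using System.Linq;

// Hypothetical helper: replays the key selector of the counter-based
// LINQ answers and records, for every word, the running count and the key.
public static class CounterDrift
{
    public static (string Word, int Count, int Key)[] GroupKeys(string text, int max)
    {
        var charCount = 0;
        return text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(w =>
            {
                charCount += w.Length + 1;   // the word plus one separating space
                return (Word: w, Count: charCount, Key: charCount / max);
            })
            .ToArray();                      // enumerate once, in word order
    }
}
```

With the question's sentence and max = 35, "for" brings the count to 30 (key 0) and "longer" to 37 (key 1). The second "are" brings it to 67, and 67 / 35 is still 1, so "are" joins the second group even though that line is then 36 characters long.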

I came up with the following modification of Lessan's answer:

    using System;
    using System.Linq;

    public static class ExtensionMethods
    {
        public static string[] Wrap(this string text, int max)
        {
            var charCount = 0;
            var lines = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            // If the word plus its trailing space would reach or pass the end of the
            // current line, first pad charCount up to the start of the next line.
            return lines.GroupBy(w => (charCount += (((charCount % max) + w.Length + 1 >= max)
                                                        ? max - (charCount % max) : 0)
                                                     + w.Length + 1) / max)
                        .Select(g => string.Join(" ", g.ToArray()))
                        .ToArray();
        }
    }

Split the string on ' ' (space), build new strings from the resulting array, and stop before the limit for each new segment.

Untested pseudocode:

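    using System.Collections.Generic;

    // sentence and myLimit (e.g. 35) come from the question.
    string[] words = sentence.Split(new char[] { ' ' });
    IList<string> sentenceParts = new List<string>();
    sentenceParts.Add(string.Empty);
    int partCounter = 0;
    foreach (var word in words)
    {
        // Parts keep a trailing space, so Length + word.Length is the trimmed length with this word added.
        if (sentenceParts[partCounter].Length > 0 &&
            sentenceParts[partCounter].Length + word.Length > myLimit)
        {
            partCounter++;
            sentenceParts.Add(string.Empty);
        }
        sentenceParts[partCounter] += word + " ";
    }
    for (int i = 0; i < sentenceParts.Count; i++)
        sentenceParts[i] = sentenceParts[i].TrimEnd(); // drop the trailing space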
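
The split-and-accumulate approach described above can be written as a small self-contained method. This is a minimal sketch, not any poster's code: `WordWrap` and `SplitIntoParts` are hypothetical names, and it assumes words are separated by ordinary spaces and that a word longer than the limit should stay whole on a part of its own rather than being cut.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class WordWrap
{
    // Greedy wrap: keep appending words to the current part and start a new
    // part when the next word (plus one separating space) would push it past
    // maxLength. An over-long word is emitted whole on its own part.
    public static List<string> SplitIntoParts(string sentence, int maxLength)
    {
        var parts = new List<string>();
        var current = new StringBuilder();

        foreach (var word in sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Length the current part would have with this word appended.
            int newLength = current.Length == 0
                ? word.Length
                : current.Length + 1 + word.Length;

            if (current.Length > 0 && newLength > maxLength)
            {
                parts.Add(current.ToString()); // the current part is full
                current.Clear();
            }

            if (current.Length > 0)
                current.Append(' ');
            current.Append(word);
        }

        if (current.Length > 0)
            parts.Add(current.ToString());     // flush the last part

        return parts;
    }
}
```

On the question's sentence with a limit of 35 this returns "Silver badges are awarded for", "longer term goals. Silver badges" and "are uncommon.", so every part stays within the limit and "are" moves to the third part.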

At first I thought this might be a job for a regex, but here's my shot at it:

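    using System.Collections.Generic;
    using System.Text;

    List<string> parts = new List<string>();
    int partLength = 35;
    string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
    string[] pieces = sentence.Split(' ');
    StringBuilder tempString = new StringBuilder("");
    foreach (var piece in pieces)
    {
        // Close the current part when the next piece would not fit.
        if (piece.Length + tempString.Length + 1 > partLength)
        {
            parts.Add(tempString.ToString());
            tempString.Clear();
        }
        // Every piece is prefixed with a space (including the first one),
        // and the part still in tempString after the loop is not added to parts.
        tempString.Append(" " + piece);
    }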

Expanding on Jon's answer: I needed to replace g with g.ToArray(), and also change max to (max + 2) to get an exact wrap at max characters.

    using System;
    using System.Linq;

    public static class ExtensionMethods
    {
        public static string[] Wrap(this string text, int max)
        {
            var charCount = 0;
            var lines = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            return lines.GroupBy(w => (charCount += w.Length + 1) / (max + 2))
                        .Select(g => string.Join(" ", g.ToArray()))
                        .ToArray();
        }
    }

Here is example usage in the form of NUnit tests:

    [Test]
    public void TestWrap()
    {
        Assert.AreEqual(2, "ABC".Wrap(4).Length);
        Assert.AreEqual(1, "ABC".Wrap(5).Length);
        Assert.AreEqual(2, "AA BB CC".Wrap(7).Length);
        Assert.AreEqual(1, "AA BB CC".Wrap(8).Length);
        Assert.AreEqual(2, "TEST TEST TEST TEST".Wrap(10).Length);
        Assert.AreEqual(2, " TEST TEST TEST TEST ".Wrap(10).Length);
        Assert.AreEqual("TEST TEST", " TEST TEST TEST TEST ".Wrap(10)[0]);
    }

Joel, there's a small bug in your code, which I've corrected here:

    using System.Collections.Generic;
    using System.Text;

    public static string[] StringSplitWrap(string sentence, int MaxLength)
    {
        List<string> parts = new List<string>();
        string[] pieces = sentence.Split(' ');
        StringBuilder tempString = new StringBuilder("");
        foreach (var piece in pieces)
        {
            // Close the current part if adding " " + piece would exceed MaxLength.
            if (tempString.Length > 0 && piece.Length + tempString.Length + 1 > MaxLength)
            {
                parts.Add(tempString.ToString());
                tempString.Clear();
            }
            // Only put a space between words, not before the first one.
            tempString.Append((tempString.Length == 0 ? "" : " ") + piece);
        }
        // Don't forget the last part.
        if (tempString.Length > 0)
            parts.Add(tempString.ToString());
        return parts.ToArray();
    }

This works:

    using System.Collections.Generic;
    using System.Linq;

    int partLength = 35;
    string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
    List<string> lines =
        sentence
            .Split(' ')
            .Aggregate(new[] { "" }.ToList(), (a, x) =>
            {
                var last = a[a.Count - 1];
                // Start a new line if appending " " + x would exceed partLength.
                if (last.Length > 0 && (last + " " + x).Length > partLength)
                {
                    a.Add(x);
                }
                else
                {
                    a[a.Count - 1] = (last + " " + x).Trim();
                }
                return a;
            });

It gives me:

    Silver badges are awarded for
    longer term goals. Silver badges
    are uncommon.

Although CsConsoleFormat† is primarily designed for formatting text for the console, it also supports generating plain text.

    using Alba.CsConsoleFormat;

    var doc = new Document().AddChildren(
        new Div("Silver badges are awarded for longer term goals. Silver badges are uncommon.") {
            TextWrap = TextWrapping.WordWrap
        }
    );
    var bounds = new Rect(0, 0, 35, Size.Infinity);
    string text = ConsoleRenderer.RenderDocumentToText(doc, new TextRenderTarget(), bounds);

And if you really need trimmed strings, as in your question:

    List<string> lines = text.Trim()
        .Split(new[] { Environment.NewLine }, StringSplitOptions.None)
        .Select(s => s.Trim())
        .ToList();

Besides word wrapping on spaces, you also get proper handling of hyphens, zero-width spaces, non-breaking spaces, etc.

† I'm the developer of CsConsoleFormat.