检测给定列表中至少3个序列号的序列

我有一个数字列表,例如21,4,7,9,12,22,17,8,2,20,23

我希望能够选出连续数字序列(最少3个项目),所以从上面的例子可以看出它是7,8,9和20,21,22,23。

我玩了一些丑陋的庞大function,但我想知道是否有一个简洁的LINQ-ish方法来做到这一点。

有什么建议?

更新:

非常感谢所有的回应,非常感谢。 我现在正与他们一起玩,看看哪个最适合我们的项目。

Jon Skeet / Timwi的解决方案是可行的方法。

为了好玩,这是一个完成工作的LINQ查询( 非常低效):

var sequences = input.Distinct() .GroupBy(num => Enumerable.Range(num, int.MaxValue - num + 1) .TakeWhile(input.Contains) .Last()) //use the last member of the consecutive sequence as the key .Where(seq => seq.Count() >= 3) .Select(seq => seq.OrderBy(num => num)); // not necessary unless ordering is desirable inside each sequence. 

通过将输入加载到HashSet (以改进Contains )可以略微提高查询的性能,但这仍然不会产生任何接近有效的解决方案。

我所知道的唯一错误是如果序列包含大数量的负数(我们不能代表Rangecount参数),则算术溢出的可能性。 使用自定义static IEnumerable To(this int start, int end)扩展方法很容易解决static IEnumerable To(this int start, int end) 。 如果有人能想到任何其他避免溢出的简单技术,请告诉我。

编辑:这是一个稍微冗长(但同样效率低)的变种,没有溢出问题。

 var sequences = input.GroupBy(num => input.Where(candidate => candidate >= num) .OrderBy(candidate => candidate) .TakeWhile((candidate, index) => candidate == num + index) .Last()) .Where(seq => seq.Count() >= 3) .Select(seq => seq.OrderBy(num => num)); 

让我感到震惊的是,你要做的第一件事就是订购清单。 然后,只需要走过它,记住当前序列的长度并检测它何时结束。 说实话,我怀疑一个简单的foreach循环将是最简单的方法 – 我不能立即想到任何奇妙的LINQ类似的方法。 如果你真的想要,你当然可以在迭代器块中执行它,但请记住,从列表开始排序意味着你有一个合理的“预先”成本。 所以我的解决方案看起来像这样:

 var ordered = list.OrderBy(x => x); int count = 0; int firstItem = 0; // Irrelevant to start with foreach (int x in ordered) { // First value in the ordered list: start of a sequence if (count == 0) { firstItem = x; count = 1; } // Skip duplicate values else if (x == firstItem + count - 1) { // No need to do anything } // New value contributes to sequence else if (x == firstItem + count) { count++; } // End of one sequence, start of another else { if (count >= 3) { Console.WriteLine("Found sequence of length {0} starting at {1}", count, firstItem); } count = 1; firstItem = x; } } if (count >= 3) { Console.WriteLine("Found sequence of length {0} starting at {1}", count, firstItem); } 

编辑:好的,我刚刚想到了更多LINQ-ish的做事方式。 我现在没有时间全面实施它,但是:

  • 订购序列
  • 使用SelectWithPrevious (可能更好地命名为SelectConsecutive )来获取连续的元素对
  • 使用包含索引的Select的重载来获取(索引,当前,前一个)元组
  • 过滤掉(current = previous + 1)的任何项目,以获得计为序列开头的任何位置(特殊情况索引= 0)
  • 在结果上使用SelectWithPrevious来获取两个起点之间的序列长度 (从前一个减去一个索引)
  • 过滤掉长度小于3的任何序列

怀疑你需要在有序序列上连接int.MinValue ,以保证最终项目被正确使用。

编辑:好的,我已经实现了这一点。 这是关于我能想到的LINQiest方法…我使用null值作为“sentinel”值来强制开始和结束序列 – 有关更多详细信息,请参阅注释。

总的来说,我不推荐这个解决方案。 很难让你的头脑清醒,虽然我有理由相信这是正确的,但我花了一些时间考虑可能的一对一错误等。这是一个有趣的旅程,你可以用LINQ做什么……还有什么你可能不应该。

哦,请注意我已将“最小长度为3”的部分推送到调用者 – 当你有一系列像这样的元组时,单独过滤它是更清晰的,IMO。

 using System; using System.Collections.Generic; using System.Linq; static class Extensions { public static IEnumerable SelectConsecutive (this IEnumerable source, Func selector) { using (IEnumerator iterator = source.GetEnumerator()) { if (!iterator.MoveNext()) { yield break; } TSource prev = iterator.Current; while (iterator.MoveNext()) { TSource current = iterator.Current; yield return selector(prev, current); prev = current; } } } } class Test { static void Main() { var list = new List { 21,4,7,9,12,22,17,8,2,20,23 }; foreach (var sequence in FindSequences(list).Where(x => x.Item1 >= 3)) { Console.WriteLine("Found sequence of length {0} starting at {1}", sequence.Item1, sequence.Item2); } } private static readonly int?[] End = { null }; // Each tuple in the returned sequence is (length, first element) public static IEnumerable> FindSequences (IEnumerable input) { // Use null values at the start and end of the ordered sequence // so that the first pair always starts a new sequence starting // with the lowest actual element, and the final pair always // starts a new one starting with null. That "sequence at the end" // is used to compute the length of the *real* final element. return End.Concat(input.OrderBy(x => x) .Select(x => (int?) x)) .Concat(End) // Work out consecutive pairs of items .SelectConsecutive((x, y) => Tuple.Create(x, y)) // Remove duplicates .Where(z => z.Item1 != z.Item2) // Keep the index so we can tell sequence length .Select((z, index) => new { z, index }) // Find sequence starting points .Where(both => both.z.Item2 != both.z.Item1 + 1) .SelectConsecutive((start1, start2) => Tuple.Create(start2.index - start1.index, start1.z.Item2.Value)); } } 

我认为我的解决方案更优雅,更简单,因此更容易validation为正确:

 /// Returns a collection containing all consecutive sequences of /// integers in the input collection. /// The collection of integers in which to find /// consecutive sequences. /// Minimum length that a sequence should have /// to be returned. static IEnumerable> ConsecutiveSequences( IEnumerable input, int minLength = 1) { var results = new List>(); foreach (var i in input.OrderBy(x => x)) { var existing = results.FirstOrDefault(lst => lst.Last() + 1 == i); if (existing == null) results.Add(new List { i }); else existing.Add(i); } return minLength <= 1 ? results : results.Where(lst => lst.Count >= minLength); } 

优于其他解决方案:

  • 它可以找到重叠的序列。
  • 它可以正确地重复使用并记录在案。
  • 我没有发现任何错误;-)

以下是如何以“LINQish”方式解决问题:

 int[] arr = new int[]{ 21, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 }; IOrderedEnumerable sorted = arr.OrderBy(x => x); int cnt = sorted.Count(); int[] sortedArr = sorted.ToArray(); IEnumerable selected = sortedArr.Where((x, idx) => idx <= cnt - 3 && sortedArr[idx + 1] == x + 1 && sortedArr[idx + 2] == x + 2); IEnumerable result = selected.SelectMany(x => new int[] { x, x + 1, x + 2 }).Distinct(); Console.WriteLine(string.Join(",", result.Select(x=>x.ToString()).ToArray())); 

由于arrays复制和重建,这个解决方案 – 当然 – 不如传统的循环解决方案那么高效。

不是100%Linq,但这是一个通用变体:

 static IEnumerable> GetSequences( int minSequenceLength, Func areSequential, IEnumerable items) where TItem : IComparable { items = items .OrderBy(n => n) .Distinct().ToArray(); var lastSelected = default(TItem); var sequences = from startItem in items where startItem.Equals(items.First()) || startItem.CompareTo(lastSelected) > 0 let sequence = from item in items where item.Equals(startItem) || areSequential(lastSelected, item) select (lastSelected = item) where sequence.Count() >= minSequenceLength select sequence; return sequences; } static void UsageInt() { var sequences = GetSequences( 3, (a, b) => a + 1 == b, new[] { 21, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 }); foreach (var sequence in sequences) Console.WriteLine(string.Join(", ", sequence.ToArray())); } static void UsageChar() { var list = new List( "abcdefghijklmnopqrstuvwxyz".ToCharArray()); var sequences = GetSequences( 3, (a, b) => (list.IndexOf(a) + 1 == list.IndexOf(b)), "PleaseBeGentleWithMe".ToLower().ToCharArray()); foreach (var sequence in sequences) Console.WriteLine(string.Join(", ", sequence.ToArray())); } 

这是我的镜头:

 public static class SequenceDetector { public static IEnumerable> DetectSequenceWhere(this IEnumerable sequence, Func inSequenceSelector) { List subsequence = null; // We can only have a sequence with 2 or more items T last = sequence.FirstOrDefault(); foreach (var item in sequence.Skip(1)) { if (inSequenceSelector(last, item)) { // These form part of a sequence if (subsequence == null) { subsequence = new List(); subsequence.Add(last); } subsequence.Add(item); } else if (subsequence != null) { // We have a previous seq to return yield return subsequence; subsequence = null; } last = item; } if (subsequence != null) { // Return any trailing seq yield return subsequence; } } } public class test { public static void run() { var list = new List { 21, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 }; foreach (var subsequence in list .OrderBy(i => i) .Distinct() .DetectSequenceWhere((first, second) => first + 1 == second) .Where(seq => seq.Count() >= 3)) { Console.WriteLine("Found subsequence {0}", string.Join(", ", subsequence.Select(i => i.ToString()).ToArray())); } } } 

这将返回形成子序列的特定项目,并允许任何类型的项目和任何标准定义,只要可以通过比较相邻项目来确定。

那么对数组进行排序然后创建另一个数组,该数组是前一个元素之间的差异

 sortedArray = 8, 9, 10, 21, 22, 23, 24, 27, 30, 31, 32 diffArray = 1, 1, 11, 1, 1, 1, 3, 3, 1, 1 

现在遍历差异数组; 如果差值等于1,则将变量countHength的计数增加1.如果差值> 1,检查sequenceLength是否> = 2,那么您的序列至少有3个连续元素。 然后将sequenceLenght重置为0并继续在差异数组上循环。

这是我在F#中解决的一个解决方案,将它转换为C#LINQ查询应该相当容易,因为fold几乎等同于LINQ聚合运算符。

 #light let nums = [21;4;7;9;12;22;17;8;2;20;23] let scanFunc (mainSeqLength, mainCounter, lastNum:int, subSequenceCounter:int, subSequence:'a list, foundSequences:'a list list) (num:'a) = (mainSeqLength, mainCounter + 1, num, (if num <> lastNum + 1 then 1 else subSequenceCounter+1), (if num <> lastNum + 1 then [num] else subSequence@[num]), if subSequenceCounter >= 3 then if mainSeqLength = mainCounter+1 then foundSequences @ [subSequence@[num]] elif num <> lastNum + 1 then foundSequences @ [subSequence] else foundSequences else foundSequences) let subSequences = nums |> Seq.sort |> Seq.fold scanFunc (nums |> Seq.length, 0, 0, 0, [], []) |> fun (_,_,_,_,_,results) -> results 

Linq不是所有东西的解决方案,有时候你通过一个简单的循环就会更好。 这是一个解决方案,只需要一点Linq来订购原始序列并过滤结果

 void Main() { var numbers = new[] { 21,4,7,9,12,22,17,8,2,20,23 }; var sequences = GetSequences(numbers, (prev, curr) => curr == prev + 1); .Where(s => s.Count() >= 3); sequences.Dump(); } public static IEnumerable> GetSequences( IEnumerable source, Func areConsecutive) { bool first = true; T prev = default(T); List seq = new List(); foreach (var i in source.OrderBy(i => i)) { if (!first && !areConsecutive(prev, i)) { yield return seq.ToArray(); seq.Clear(); } first = false; seq.Add(i); prev = i; } if (seq.Any()) yield return seq.ToArray(); } 

我想到了和Jon一样的事情:代表一系列连续的整数,你真正需要的只是两个整数! 所以我从那里开始:

 struct Range : IEnumerable { readonly int _start; readonly int _count; public Range(int start, int count) { _start = start; _count = count; } public int Start { get { return _start; } } public int Count { get { return _count; } } public int End { get { return _start + _count - 1; } } public IEnumerator GetEnumerator() { for (int i = 0; i < _count; ++i) { yield return _start + i; } } // Heck, why not? public static Range operator +(Range x, int y) { return new Range(x.Start, x.Count + y); } // skipping the explicit IEnumerable.GetEnumerator implementation } 

从那里,您可以编写一个静态方法来返回一系列与序列的连续编号对应的Range值。

 static IEnumerable FindRanges(IEnumerable source, int minCount) { // throw exceptions on invalid arguments, maybe... var ordered = source.OrderBy(x => x); Range r = default(Range); foreach (int value in ordered) { // In "real" code I would've overridden the Equals method // and overloaded the == operator to write something like // if (r == Range.Empty) here... but this works well enough // for now, since the only time r.Count will be 0 is on the // first item. if (r.Count == 0) { r = new Range(value, 1); continue; } if (value == r.End) { // skip duplicates continue; } else if (value == r.End + 1) { // "append" consecutive values to the range r += 1; } else { // return what we've got so far if (r.Count >= minCount) { yield return r; } // start over r = new Range(value, 1); } } // return whatever we ended up with if (r.Count >= minCount) { yield return r; } } 

演示:

 int[] numbers = new[] { 21, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 }; foreach (Range r in FindConsecutiveRanges(numbers, 3)) { // Using .NET 3.5 here, don't have the much nicer string.Join overloads. Console.WriteLine(string.Join(", ", r.Select(x => x.ToString()).ToArray())); } 

输出:

 7,8,9
 20,21,22,23

这是我对LINQ-y问题的看法:

 static IEnumerable> ConsecutiveSequences(this IEnumerable input, int minLength = 3) { int order = 0; var inorder = new SortedSet(input); return from item in new[] { new { order = 0, val = inorder.First() } } .Concat( inorder.Zip(inorder.Skip(1), (x, val) => new { order = x + 1 == val ? order : ++order, val })) group item.val by item.order into list where list.Count() >= minLength select list; } 
  • 不使用显式循环,但仍应为O(n lg n)
  • 使用SortedSet而不是.OrderBy().Distinct()
  • 将连续元素与list.Zip(list.Skip(1))结合起来

这是一个使用Dictionary而不是排序的解决方案……它将项添加到Dictionary中,然后为每个值增加上下以找到最长的序列。
它不是严格的LINQ,虽然它确实使用了一些LINQ函数,我认为它比纯LINQ解决方案更具可读性。

 static void Main(string[] args) { var items = new[] { -1, 0, 1, 21, -2, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 }; IEnumerable> sequences = FindSequences(items, 3); foreach (var sequence in sequences) { //print results to consol Console.Out.WriteLine(sequence.Select(num => num.ToString()).Aggregate((a, b) => a + "," + b)); } Console.ReadLine(); } private static IEnumerable> FindSequences(IEnumerable items, int minSequenceLength) { //Convert item list to dictionary var itemDict = new Dictionary(); foreach (int val in items) { itemDict[val] = val; } var allSequences = new List>(); //for each val in items, find longest sequence including that value foreach (var item in items) { var sequence = FindLongestSequenceIncludingValue(itemDict, item); allSequences.Add(sequence); //remove items from dict to prevent duplicate sequences sequence.ForEach(i => itemDict.Remove(i)); } //return only sequences longer than 3 return allSequences.Where(sequence => sequence.Count >= minSequenceLength).ToList(); } //Find sequence around start param value private static List FindLongestSequenceIncludingValue(Dictionary itemDict, int value) { var result = new List(); //check if num exists in dictionary if (!itemDict.ContainsKey(value)) return result; //initialize sequence list result.Add(value); //find values greater than starting value //and add to end of sequence var indexUp = value + 1; while (itemDict.ContainsKey(indexUp)) { result.Add(itemDict[indexUp]); indexUp++; } //find values lower than starting value //and add to start of sequence var indexDown = value - 1; while (itemDict.ContainsKey(indexDown)) { result.Insert(0, itemDict[indexDown]); indexDown--; } return result; }