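Find a sequence in an IEnumerable<T> using LINQ

What is the most efficient way to find a sequence within an IEnumerable<T> using LINQ?

I want to be able to create an extension method that allows the following call:

    int startIndex = largeSequence.FindSequence(subSequence);

The match must be contiguous and in order.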
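
To make the requirement concrete, here is a minimal brute-force sketch of what such a call is expected to return. The name FindSequence comes from the question; the body, the class names, and the choice to buffer both inputs into lists are assumptions made for illustration only, and this is not meant as an efficient answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FindSequenceSketch
{
    // Hypothetical reference implementation: buffers both inputs and
    // tries every possible start index in turn (O(n * m) comparisons).
    public static int FindSequence<T>(this IEnumerable<T> source, IEnumerable<T> subSequence)
    {
        var haystack = source.ToList();
        var needle = subSequence.ToList();
        var comparer = EqualityComparer<T>.Default;

        for (int start = 0; start + needle.Count <= haystack.Count; start++)
        {
            // Count how many consecutive items match from this start index
            int matched = 0;
            while (matched < needle.Count &&
                   comparer.Equals(haystack[start + matched], needle[matched]))
            {
                matched++;
            }

            if (matched == needle.Count)
                return start; // contiguous, in-order match found
        }

        return -1; // no match
    }
}

public static class FindSequenceDemo
{
    public static void Main()
    {
        var largeSequence = new[] { 1, 2, 3, 1, 2, 4, 5 };
        Console.WriteLine(largeSequence.FindSequence(new[] { 1, 2, 4 })); // 3
        Console.WriteLine(largeSequence.FindSequence(new[] { 1, 3 }));    // -1 (not adjacent)
        Console.WriteLine(largeSequence.FindSequence(new[] { 4, 2 }));    // -1 (wrong order)
    }
}
```

In this sketch, items that appear in the source but are not adjacent, or are in the wrong order, do not count as a match.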

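Here is an implementation of an algorithm that finds a subsequence within a sequence. I called the method IndexOfSequence because it makes the intent more explicit and is similar to the existing IndexOf method: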

    public static class ExtensionMethods
    {
        public static int IndexOfSequence<T>(this IEnumerable<T> source, IEnumerable<T> sequence)
        {
            return source.IndexOfSequence(sequence, EqualityComparer<T>.Default);
        }

        public static int IndexOfSequence<T>(this IEnumerable<T> source, IEnumerable<T> sequence, IEqualityComparer<T> comparer)
        {
            var seq = sequence.ToArray();

            // An empty searched sequence trivially matches at the start
            // (this also avoids indexing seq[0] below)
            if (seq.Length == 0)
                return 0;

            int p = 0; // current position in source sequence
            int i = 0; // current position in searched sequence
            var prospects = new List<int>(); // start indexes of prospective matches
            foreach (var item in source)
            {
                // Remove bad prospective matches
                prospects.RemoveAll(k => !comparer.Equals(item, seq[p - k]));

                // Is it the start of a prospective match?
                if (comparer.Equals(item, seq[0]))
                {
                    prospects.Add(p);
                }

                // Does the current item continue the partial match?
                if (comparer.Equals(item, seq[i]))
                {
                    i++;
                    // Do we have a complete match?
                    if (i == seq.Length)
                    {
                        // Bingo!
                        return p - seq.Length + 1;
                    }
                }
                else // Mismatch
                {
                    // Do we have prospective matches to fall back to?
                    if (prospects.Count > 0)
                    {
                        // Yes, use the first one
                        int k = prospects[0];
                        i = p - k + 1;
                    }
                    else
                    {
                        // No, start from beginning of searched sequence
                        i = 0;
                    }
                }
                p++;
            }
            // No match
            return -1;
        }
    }

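I haven't fully tested it, so it might still contain bugs. I only ran a few tests on well-known corner cases to make sure I wasn't falling into obvious traps. It seems to work fine so far...

I think the complexity is close to O(n), but I'm not an expert in Big O notation, so I could be wrong... At least it enumerates the source sequence only once, without ever going back, so it should be reasonably efficient.

The code you say you want to be able to use isn't LINQ, so I don't see why it needs to be implemented with LINQ.

This is essentially the same problem as substring searching (indeed, an enumeration where order is significant is a generalisation of "string").

Since computer science has studied this problem for a long time, you get to stand on the shoulders of giants.

Some reasonable starting points are: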

http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

http://en.wikipedia.org/wiki/Rabin-karp

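Even just the pseudocode in the Wikipedia articles is enough to port to C# fairly easily. Look at the descriptions of performance in different cases and decide which cases your code is most likely to encounter.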
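
As an illustration of how such a port might look, here is a minimal sketch of Knuth-Morris-Pratt adapted to IEnumerable<T>. The method name IndexOfSequenceKmp, the class names, the optional comparer parameter, and the decision to return 0 for an empty pattern are all assumptions of this sketch rather than anything taken from the articles above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KmpSketch
{
    // Hypothetical name. Returns the start index of the first contiguous
    // match of pattern in source, or -1 if there is none.
    public static int IndexOfSequenceKmp<T>(this IEnumerable<T> source, IEnumerable<T> pattern,
                                            IEqualityComparer<T> comparer = null)
    {
        comparer = comparer ?? EqualityComparer<T>.Default;
        T[] pat = pattern.ToArray();
        if (pat.Length == 0) return 0; // assumption: an empty pattern matches at index 0

        // failure[j] = length of the longest proper prefix of pat[0..j]
        // that is also a suffix of pat[0..j]
        int[] failure = new int[pat.Length];
        for (int j = 1, k = 0; j < pat.Length; j++)
        {
            while (k > 0 && !comparer.Equals(pat[j], pat[k])) k = failure[k - 1];
            if (comparer.Equals(pat[j], pat[k])) k++;
            failure[j] = k;
        }

        int matched = 0;  // number of pattern items currently matched
        int position = 0; // index of the current item in source
        foreach (T item in source)
        {
            // On a mismatch, fall back to the longest shorter partial match
            while (matched > 0 && !comparer.Equals(item, pat[matched])) matched = failure[matched - 1];
            if (comparer.Equals(item, pat[matched])) matched++;
            if (matched == pat.Length) return position - pat.Length + 1;
            position++;
        }
        return -1;
    }
}

public static class KmpDemo
{
    public static void Main()
    {
        Console.WriteLine("abababc".IndexOfSequenceKmp("ababc"));                           // 2
        Console.WriteLine(new[] { 1, 2, 3, 1, 2, 4, 5 }.IndexOfSequenceKmp(new[] { 1, 2, 4 })); // 3
        Console.WriteLine(new[] { 1, 2, 3 }.IndexOfSequenceKmp(new[] { 3, 2 }));             // -1
    }
}
```

The source is enumerated once and never rewound, and only the pattern is buffered, so the sketch also works on sources that cannot be enumerated twice. The total number of comparisons is linear in the combined length of the source and the pattern.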

I know this is an old question, but I needed this exact method, so I wrote it like this:

    public static int ContainsSubsequence<T>(this IEnumerable<T> elements, IEnumerable<T> subSequence) where T : IEquatable<T>
    {
        return ContainsSubsequence(elements, 0, subSequence);
    }

    private static int ContainsSubsequence<T>(IEnumerable<T> elements, int index, IEnumerable<T> subSequence) where T : IEquatable<T>
    {
        // Does the sub-sequence match the elements starting at this position?
        if (StartsWith(elements, subSequence))
            return index; // Matched! index is where the match starts

        // No elements left to try
        if (!elements.Any())
            return -1; // Nope, didn't match

        // No, it didn't match here. Try the next element, restarting
        // from the beginning of the sub-sequence
        return ContainsSubsequence(elements.Skip(1), index + 1, subSequence);
    }

    private static bool StartsWith<T>(IEnumerable<T> elements, IEnumerable<T> subSequence) where T : IEquatable<T>
    {
        // No elements of sub-sequence left, which means even if there are
        // more elements, we matched the sub-sequence fully
        if (!subSequence.Any())
            return true;

        // No elements but sub-sequence not fully matched
        if (!elements.Any())
            return false;

        // Check the first element of the sub-sequence against the first element.
        // If it matched, consume (skip) one element in each and move on to the next
        return subSequence.First().Equals(elements.First())
            && StartsWith(elements.Skip(1), subSequence.Skip(1));
    }

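Updated to return not just whether the sub-sequence matched, but the index where it starts in the original sequence.

This is an extension method on IEnumerable<T>; it is fully lazy, terminates early, and I believe it is more efficient than the currently top-voted answer. However, as @wai-ha-lee pointed out, it is recursive and creates a lot of enumerators. Use it where applicable (performance/memory). It was fine for my needs, but YMMV.

Update: given the clarification of the question, my response below does not apply. I'm leaving it for historical purposes.

You probably want to use mySequence.Where(). The key is then to optimise the predicate so that it works well in your environment. This can vary quite a bit depending on your requirements and typical usage patterns.

It is quite possible that what works well for small collections does not scale well to much larger collections, depending on the type of T.

Of course, if 90% of the use is for small collections, then optimising for the outlier large collection seems a bit YAGNI.

You can use a library called Sequences to do this (disclaimer: I'm the author).

It has an IndexOfSlice method that does exactly what you need: it is an implementation of the Knuth-Morris-Pratt algorithm.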

 int startIndex = largeSequence.AsSequence().IndexOfSlice(subSequence);
