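Efficient way to generate combinations ordered by increasing sum of indexes

For a heuristic algorithm, I need to evaluate the combinations of a certain set one after another until I reach a stop criterion.

Since there are a lot of them, I currently generate them with the following memory-efficient iterator block (inspired by Python's itertools.combinations):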

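    public static IEnumerable<T[]> GetCombinations<T>(this IList<T> pool, int r)
    {
        int n = pool.Count;
        if (r > n)
            throw new ArgumentException("r cannot be greater than pool size");

        // Start from the first r indexes: 0, 1, ..., r - 1.
        int[] indices = Enumerable.Range(0, r).ToArray();
        yield return indices.Select(idx => pool[idx]).ToArray();

        while (true)
        {
            // Find the rightmost index that has not reached its maximum value.
            int i;
            for (i = r - 1; i >= 0; i--)
                if (indices[i] != i + n - r)
                    break;
            if (i < 0)
                break;

            // Advance it and reset every index to its right.
            indices[i] += 1;
            for (int j = i + 1; j < r; j++)
                indices[j] = indices[j - 1] + 1;

            yield return indices.Select(idx => pool[idx]).ToArray();
        }
    }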

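The problem is that, to greatly improve the efficiency of my heuristic, I need these combinations generated in order of the sum of their indexes (in other words, I need the combinations containing the first elements of the set to come first).

For example, consider the set S = {0,1,2,3,4,5}
(I chose this set for simplicity, since the elements and their indexes coincide).
All the possible combinations of r=4 numbers, as generated by the algorithm above, are: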

    (0, 1, 2, 3) SUM: 6
    (0, 1, 2, 4) SUM: 7
    (0, 1, 2, 5) SUM: 8
    (0, 1, 3, 4) SUM: 8
    (0, 1, 3, 5) SUM: 9
    (0, 1, 4, 5) SUM: 10
    (0, 2, 3, 4) SUM: 9
    (0, 2, 3, 5) SUM: 10
    (0, 2, 4, 5) SUM: 11
    (0, 3, 4, 5) SUM: 12
    (1, 2, 3, 4) SUM: 10
    (1, 2, 3, 5) SUM: 11
    (1, 2, 4, 5) SUM: 12
    (1, 3, 4, 5) SUM: 13
    (2, 3, 4, 5) SUM: 14

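As you can see, the combinations are not sorted by ascending sum.

The desired result is the following instead
(the order of combinations having the same sum does not matter):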

    (0, 1, 2, 3) SUM: 6
    (0, 1, 2, 4) SUM: 7
    (0, 1, 2, 5) SUM: 8
    (0, 1, 3, 4) SUM: 8
    (0, 1, 3, 5) SUM: 9
    (0, 2, 3, 4) SUM: 9
    (0, 1, 4, 5) SUM: 10
    (0, 2, 3, 5) SUM: 10
    (1, 2, 3, 4) SUM: 10
    (0, 2, 4, 5) SUM: 11
    (1, 2, 3, 5) SUM: 11
    (0, 3, 4, 5) SUM: 12
    (1, 2, 4, 5) SUM: 12
    (1, 3, 4, 5) SUM: 13
    (2, 3, 4, 5) SUM: 14

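A trivial solution would be to generate all the combinations and then sort them by their sum; but this is not really efficient or feasible, since the number of combinations becomes huge as n grows.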
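For reference, that naive baseline could look roughly like the sketch below. The class `NaiveBaseline` and its method names are invented for illustration and work on indexes only. It holds every combination in memory, which is exactly the cost the question wants to avoid, but it can serve as an oracle when testing a smarter generator on small inputs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class NaiveBaseline
{
    // Enumerates all r-element index combinations of {0..n-1} in lexicographic order.
    public static IEnumerable<int[]> IndexCombinations(int n, int r)
    {
        if (r < 0 || r > n) yield break;
        int[] indices = Enumerable.Range(0, r).ToArray();
        while (true)
        {
            // Hand out a copy so callers never see later mutations.
            yield return (int[])indices.Clone();

            // Find the rightmost index that can still be advanced.
            int i = r - 1;
            while (i >= 0 && indices[i] == i + n - r) i--;
            if (i < 0) yield break;

            indices[i]++;
            for (int j = i + 1; j < r; j++) indices[j] = indices[j - 1] + 1;
        }
    }

    // Materializes every combination, then sorts by index sum.
    // Memory use is proportional to the number of combinations.
    public static List<int[]> SortedByIndexSum(int n, int r)
    {
        return IndexCombinations(n, r)
            .OrderBy(c => c.Sum()) // OrderBy is stable, so ties stay in lexicographic order
            .ToList();
    }
}
```

Calling `NaiveBaseline.SortedByIndexSum(6, 4)` produces the 15 index combinations of the example, ordered by sum.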

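I also had a quick look at combinatorial Gray codes, but I couldn't find one that suits this problem.

Do you have an idea of how to implement something like this?

EDIT:

This problem has an alternative (unfortunately not easier) formulation.
Given a set S and a number r, all the possible sums are trivial to find, since they are simply all the numbers from the sum of the first r elements of S to the sum of the last r elements of S.

That being said, if for each sum T we can efficiently¹ find all the combinations having sum T, we solve the original problem, since we simply generate them in ascending order of T.

¹ "Efficiently" means that I don't want to generate all the combinations and discard the ones having a different sum.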
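To make the range of sums concrete, here is a short worked formula. It assumes, as in the example, that the elements are the indexes 0, ..., n-1; the symbols T_min and T_max are notation introduced here, not part of the original post.

```latex
T_{\min} = \sum_{i=0}^{r-1} i = \frac{r(r-1)}{2},
\qquad
T_{\max} = \sum_{i=n-r}^{n-1} i = \frac{r(2n-r-1)}{2}
```

For n = 6 and r = 4 this gives T_min = 6 and T_max = 14, matching the first and last rows of the listings above. Every integer in between is attainable, because any combination other than the last one has some index that can be increased by 1 without colliding with its neighbour, which raises the sum by exactly 1.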

EDIT 2:

Following @EricLippert's suggestion, I wrote the following code:

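    public static IEnumerable<T[]> GetCombinationsSortedByIndexSum<T>(this IList<T> pool, int r)
    {
        int n = pool.Count;
        if (r > n)
            throw new ArgumentException("r cannot be greater than pool size");

        // Smallest index sum: 0 + 1 + ... + (r - 1).
        int minSum = ((r - 1) * r) / 2;
        // Largest index sum: (n - r) + ... + (n - 1).
        int maxSum = ((n - 1) * n) / 2 - ((n - r - 1) * (n - r)) / 2;

        for (int sum = minSum; sum <= maxSum; sum++)
        {
            foreach (var indexes in AllMonotIncrSubseqOfLenMWhichSumToN(0, n - 1, r, sum))
                yield return indexes.Select(x => pool[x]).ToArray();
        }
    }

    // Returns all strictly increasing subsequences of length m, taken from the
    // consecutive integers seqFirstElement..seqLastElement, whose sum is n.
    static IEnumerable<IEnumerable<int>> AllMonotIncrSubseqOfLenMWhichSumToN(int seqFirstElement, int seqLastElement, int m, int n)
    {
        for (int i = seqFirstElement; i <= seqLastElement - m + 1; i++)
        {
            if (m == 1)
            {
                if (i == n)
                    yield return new int[] { i };
            }
            else
            {
                foreach (var el in AllMonotIncrSubseqOfLenMWhichSumToN(i + 1, seqLastElement, m - 1, n - i))
                    yield return new int[] { i }.Concat(el);
            }
        }
    }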

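This works fine (hopefully it is what Eric meant :P), but I'm still concerned about the complexity of the recursive method. In fact, it seems that we regenerate all the combinations for each sum, not only the ones that add up to the desired value.

To reduce the complexity of the inner function, I found a way to limit the iterations by using effective lower and upper bounds (and now it's really hard to say what the complexity is).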
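One way to derive such bounds, consistent with the final code posted further down, is sketched here. The notation is introduced for this sketch only: we pick m strictly increasing integers from a, ..., b with total s, and i is the first one picked.

```latex
\begin{align*}
F(k) &= \frac{k(k+1)}{2} \\
a \le i &\le b - m + 1
  && \text{(room for the other } m-1 \text{ values)} \\
s - i \le F(b) - F(b-m+1)
  &\;\Longrightarrow\; i \ge s - F(b) + F(b-m+1)
  && \text{(rest is at most the top } m-1 \text{ values)} \\
s - i \ge F(a+m-1) - F(a)
  &\;\Longrightarrow\; i \le s - F(a+m-1) + F(a)
  && \text{(rest is at least } a+1, \dots, a+m-1)
\end{align*}
```

For a = 1, b = 5, m = 3, s = 8 the lower bound is 8 - 15 + 6 = -1, clamped to 1, and the upper bound is 8 - 6 + 1 = 3, so only i = 1, 2, 3 are tried. The upper bound is valid but not the tightest possible, since the remaining values must in fact exceed i rather than a.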

See my answer for the final code.

The solution I had in mind was:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Program
    {
        // Preconditions:
        // * items is a sequence of non-negative monotone increasing integers
        // * n is the number of items to be in the subsequence
        // * sum is the desired sum of that subsequence.
        // Result:
        // A sequence of subsequences of the original sequence where each
        // subsequence has n items and the given sum.
        static IEnumerable<IEnumerable<int>> M(IEnumerable<int> items, int sum, int n)
        {
            // Let's start by taking some easy outs. If the sum is negative
            // then there is no solution. If the number of items in the
            // subsequence is negative then there is no solution.
            if (sum < 0 || n < 0)
                yield break;

            // If the number of items in the subsequence is zero then
            // the only possible solution is if the sum is zero.
            if (n == 0)
            {
                if (sum == 0)
                    yield return Enumerable.Empty<int>();
                yield break;
            }

            // If the number of items is less than the required number of
            // items, there is no solution.
            if (items.Count() < n)
                yield break;

            // We have at least n items in the sequence, and
            // n is greater than zero, so First() is valid:
            int first = items.First();

            // We need n items from a monotone increasing subsequence
            // that have a particular sum. We might already be too
            // large to meet that requirement:
            if (n * first > sum)
                yield break;

            // There might be some solutions that involve the first element.
            // Find them all.
            foreach (var subsequence in M(items.Skip(1), sum - first, n - 1))
                yield return new[] { first }.Concat(subsequence);

            // And there might be some solutions that do not involve the first element.
            // Find them all.
            foreach (var subsequence in M(items.Skip(1), sum, n))
                yield return subsequence;
        }

        static void Main()
        {
            int[] x = { 0, 1, 2, 3, 4, 5 };
            for (int i = 0; i <= 15; ++i)
                foreach (var seq in M(x, i, 4))
                    Console.WriteLine("({0}) SUM {1}", string.Join(",", seq), i);
        }
    }

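The output is your desired output.

I've made no attempt to optimize this. It would be interesting to profile it and see where most of the time is spent.

UPDATE: Just for fun, I wrote a version that uses an immutable stack instead of an arbitrary enumerable. Enjoy!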

    using System;
    using System.Collections.Generic;
    using System.Linq;

    abstract class ImmutableList<T> : IEnumerable<T>
    {
        public static readonly ImmutableList<T> Empty = new EmptyList();
        private ImmutableList() {}
        public abstract bool IsEmpty { get; }
        public abstract T Head { get; }
        public abstract ImmutableList<T> Tail { get; }
        public ImmutableList<T> Push(T newHead)
        {
            return new List(newHead, this);
        }

        private sealed class EmptyList : ImmutableList<T>
        {
            public override bool IsEmpty { get { return true; } }
            public override T Head { get { throw new InvalidOperationException(); } }
            public override ImmutableList<T> Tail { get { throw new InvalidOperationException(); } }
        }

        private sealed class List : ImmutableList<T>
        {
            private readonly T head;
            private readonly ImmutableList<T> tail;
            public override bool IsEmpty { get { return false; } }
            public override T Head { get { return head; } }
            public override ImmutableList<T> Tail { get { return tail; } }
            public List(T head, ImmutableList<T> tail)
            {
                this.head = head;
                this.tail = tail;
            }
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return this.GetEnumerator();
        }

        public IEnumerator<T> GetEnumerator()
        {
            for (ImmutableList<T> current = this; !current.IsEmpty; current = current.Tail)
                yield return current.Head;
        }
    }

    class Program
    {
        // Preconditions:
        // * items is a sequence of non-negative monotone increasing integers
        // * n is the number of items to be in the subsequence
        // * sum is the desired sum of that subsequence.
        // Result:
        // A sequence of subsequences of the original sequence where each
        // subsequence has n items and the given sum.
        static IEnumerable<ImmutableList<int>> M(ImmutableList<int> items, int sum, int n)
        {
            // Let's start by taking some easy outs. If the sum is negative
            // then there is no solution. If the number of items in the
            // subsequence is negative then there is no solution.
            if (sum < 0 || n < 0)
                yield break;

            // If the number of items in the subsequence is zero then
            // the only possible solution is if the sum is zero.
            if (n == 0)
            {
                if (sum == 0)
                    yield return ImmutableList<int>.Empty;
                yield break;
            }

            // If the number of items is less than the required number of
            // items, there is no solution.
            if (items.Count() < n)
                yield break;

            // We have at least n items in the sequence, and
            // n is greater than zero.
            int first = items.Head;

            // We need n items from a monotone increasing subsequence
            // that have a particular sum. We might already be too
            // large to meet that requirement:
            if (n * first > sum)
                yield break;

            // There might be some solutions that involve the first element.
            // Find them all.
            foreach (var subsequence in M(items.Tail, sum - first, n - 1))
                yield return subsequence.Push(first);

            // And there might be some solutions that do not involve the first element.
            // Find them all.
            foreach (var subsequence in M(items.Tail, sum, n))
                yield return subsequence;
        }

        static void Main()
        {
            ImmutableList<int> x = ImmutableList<int>.Empty.Push(5).
                Push(4).Push(3).Push(2).Push(1).Push(0);
            for (int i = 0; i <= 15; ++i)
                foreach (var seq in M(x, i, 4))
                    Console.WriteLine("({0}) SUM {1}", string.Join(",", seq), i);
        }
    }

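If the worst case you are looking at is 35 choose 10, that yields 183,579,396 unique combinations according to this binomial coefficient calculator, which is the best free one I have found on the web so far. Most modern CPUs should be able to get through this in a second or two at most, depending on the language and not counting the sort time. With C++, it will probably finish in under a second. If you go the C++ route, you will probably want to build it as a DLL and call it via platform invoke (P/Invoke). There are also some sorting algorithms that perform better on lists that are already mostly sorted, which looks like the case here.

If around a second is still too slow, you could consider precomputing all the N choose K cases you need and writing them out to a file (after sorting them by the sum of the k-indexes), then reading the file in at program startup. Depending on the application and where it is hosted, this might not be very practical, for example on a Windows CE platform with limited memory. But for a PC or another system with plenty of hard drive space, it should not be a problem.

To answer your question about what I meant by "specifying the index into the set":

I have written a class in C# that can take an index into the sorted binomial coefficient table and return the corresponding k-indexes for that index without having to iterate through all the combinations before it. There is another method that does the reverse and returns the corresponding index (or rank) for given k-indexes. The rank starts at zero, which in the example above designates the k-indexes 0, 1, 2, 3. Rank 1 is for the k-indexes 0, 1, 2, 4, and so on. So, for example, in the 35 choose 10 case, if you know that you need all the k-indexes above rank 150,000,000, you don't have to iterate through the first 150M to get the values after that. You can call the class method and pass 150000000 as the index, and it will return the k-indexes for that index. The methods are highly optimized and are based on a mathematical relationship that can be seen in Pascal's Triangle.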
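To illustrate the idea of translating between a rank and its k-indexes, here is a minimal sketch. It is not the code of the class described above: `CombinationRank`, `Binomial`, `Unrank` and `Rank` are hypothetical names, and the sketch assumes plain lexicographic order, which agrees with the two ranks quoted above (rank 0 is 0, 1, 2, 3 and rank 1 is 0, 1, 2, 4) but may differ from the ordering the real class uses.

```csharp
using System;

// Hypothetical helper, for illustration only; not the BinCoeff class.
public static class CombinationRank
{
    // Number of k-element subsets of an n-element set (multiplicative formula).
    public static long Binomial(int n, int k)
    {
        if (k < 0 || k > n) return 0;
        k = Math.Min(k, n - k);
        long result = 1;
        for (int i = 1; i <= k; i++)
            result = result * (n - k + i) / i; // exact: equals C(n - k + i, i) after this step
        return result;
    }

    // Returns the k-indexes of the combination at the given 0-based lexicographic rank.
    public static int[] Unrank(int n, int k, long rank)
    {
        if (rank < 0 || rank >= Binomial(n, k))
            throw new ArgumentOutOfRangeException(nameof(rank));
        var combo = new int[k];
        int x = 0; // smallest value still available for the current position
        for (int i = 0; i < k; i++)
        {
            // Skip whole blocks of combinations whose i-th index is x.
            long block = Binomial(n - x - 1, k - i - 1);
            while (rank >= block)
            {
                rank -= block;
                x++;
                block = Binomial(n - x - 1, k - i - 1);
            }
            combo[i] = x++;
        }
        return combo;
    }

    // Inverse of Unrank: the 0-based lexicographic rank of strictly increasing k-indexes.
    public static long Rank(int n, int[] combo)
    {
        int k = combo.Length;
        long rank = 0;
        int x = 0;
        for (int i = 0; i < k; i++)
        {
            // Count the combinations that put a smaller value at position i.
            for (; x < combo[i]; x++)
                rank += Binomial(n - x - 1, k - i - 1);
            x = combo[i] + 1;
        }
        return rank;
    }
}
```

With this sketch, `CombinationRank.Unrank(6, 4, 1)` gives the indexes 0, 1, 2, 4, and `Rank` maps them back to 1. Neither call enumerates the combinations in between, which is the property described above.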

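The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that, when true, creates a generic list to hold the objects to be managed. If this value is false, the table is not created. The table does not need to be created in order to use the translation methods described above. Accessor methods are provided to access the table.

There is an associated test class that shows how to use the class and its methods. It has been extensively tested with at least 2 cases, and there are no known bugs.

To read about this class and download the code, see Tablizing The Binomial Coeffieicent.

The following test code iterates through each unique combination: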

    public void Test10Choose5()
    {
        String S;
        int Loop;
        int N = 10;  // Total number of elements in the set.
        int K = 5;   // Total number of elements in each group.

        // Create the bin coeff object required to get all
        // the combos for this N choose K combination.
        BinCoeff<int> BC = new BinCoeff<int>(N, K, false);
        int NumCombos = BinCoeff<int>.GetBinCoeff(N, K);

        // The KIndexes array specifies the indexes for a lexicographic element.
        int[] KIndexes = new int[K];
        StringBuilder SB = new StringBuilder();

        // Loop through all the combinations for this N choose K case.
        for (int Combo = 0; Combo < NumCombos; Combo++)
        {
            // Get the k-indexes for this combination.
            BC.GetKIndexes(Combo, KIndexes);

            // Verify that the KIndexes returned can be used to retrieve the
            // rank or lexicographic order of the KIndexes in the table.
            int Val = BC.GetIndex(true, KIndexes);
            if (Val != Combo)
            {
                S = "Val of " + Val.ToString() + " != Combo Value of " + Combo.ToString();
                Console.WriteLine(S);
            }

            SB.Remove(0, SB.Length);
            for (Loop = 0; Loop < K; Loop++)
            {
                SB.Append(KIndexes[Loop].ToString());
                if (Loop < K - 1)
                    SB.Append(" ");
            }
            S = "KIndexes = " + SB.ToString();
            Console.WriteLine(S);
        }
    }

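Make sure to use the version of GetBinCoeff in the class that implements Mark Dominus's method of calculating the number of combinations. It uses long values and is much less likely to overflow.

For completeness and clarity, I'll post my final code: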

    // Given a pool of elements returns all the
    // combinations of the groups of length r in pool,
    // such that the combinations are ordered (ascending) by the sum of
    // the indexes of the elements.
    // e.g. pool = {A,B,C,D,E} r = 3
    // returns
    // (A, B, C)   indexes: (0, 1, 2)   sum: 3
    // (A, B, D)   indexes: (0, 1, 3)   sum: 4
    // (A, B, E)   indexes: (0, 1, 4)   sum: 5
    // (A, C, D)   indexes: (0, 2, 3)   sum: 5
    // (A, C, E)   indexes: (0, 2, 4)   sum: 6
    // (B, C, D)   indexes: (1, 2, 3)   sum: 6
    // (A, D, E)   indexes: (0, 3, 4)   sum: 7
    // (B, C, E)   indexes: (1, 2, 4)   sum: 7
    // (B, D, E)   indexes: (1, 3, 4)   sum: 8
    // (C, D, E)   indexes: (2, 3, 4)   sum: 9
    public static IEnumerable<T[]> GetCombinationsSortedByIndexSum<T>(this IList<T> pool, int r)
    {
        int n = pool.Count;
        if (r > n)
            throw new ArgumentException("r cannot be greater than pool size");

        // Smallest index sum: 0 + 1 + ... + (r - 1).
        int minSum = F(r - 1);
        // Largest index sum: (n - r) + ... + (n - 1).
        int maxSum = F(n - 1) - F(n - r - 1);

        for (int sum = minSum; sum <= maxSum; sum++)
        {
            foreach (var indexes in AllSubSequencesWithGivenSum(0, n - 1, r, sum))
                yield return indexes.Select(x => pool[x]).ToArray();
        }
    }

    // Given a start element and a last element of a sequence of consecutive integers
    // returns all the monotonically increasing subsequences of length "m" having sum "sum"
    // e.g. seqFirstElement = 1, seqLastElement = 5, m = 3, sum = 8
    //      returns {1,2,5} and {1,3,4}
    static IEnumerable<IEnumerable<int>> AllSubSequencesWithGivenSum(int seqFirstElement, int seqLastElement, int m, int sum)
    {
        // Bounds on the first element i, so that the remaining m - 1 larger
        // elements can still add up to sum - i.
        int lb = sum - F(seqLastElement) + F(seqLastElement - m + 1);
        int ub = sum - F(seqFirstElement + m - 1) + F(seqFirstElement);

        lb = Math.Max(seqFirstElement, lb);
        ub = Math.Min(seqLastElement - m + 1, ub);

        for (int i = lb; i <= ub; i++)
        {
            if (m == 1)
            {
                if (i == sum) // this check shouldn't be necessary anymore since LB/UB should automatically exclude wrong solutions
                    yield return new int[] { i };
            }
            else
            {
                foreach (var el in AllSubSequencesWithGivenSum(i + 1, seqLastElement, m - 1, sum - i))
                    yield return new int[] { i }.Concat(el);
            }
        }
    }

    // Formula to compute the sum of the numbers from 0 to n
    // e.g. F(4) = 0 + 1 + 2 + 3 + 4 = 10
    static int F(int n)
    {
        return (n * (n + 1)) / 2;
    }