Compare two arrays of different lengths and show the differences

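Question:
I have two arrays that may differ in length. I need to iterate over both arrays and find the items that are the same, added, or removed.

What is the fastest and most efficient way to do this in C#?

Edit: The arrays are pre-sorted and can contain anywhere from 50 to 100 items. Also, there are no constraints on speed and/or memory usage (however, nobody likes a memory hog ;)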


For example:

    String[] Foo_Old = {"test1", "test2", "test3"};
    String[] Foo_New = {"test1", "test2", "test4", "test5"};

    String[] Bar_Old = {"test1", "test2", "test4"};
    String[] Bar_New = {"test1", "test3"};

Differences:

(with respect to the Foo_New array)

 [Same] "test1"
 [Same] "test2"
 [Removed] "test3"
 [Added] "test4"
 [Added] "test5"

(with respect to the Bar_New array)

 [Same] "test1"
 [Removed] "test2"
 [Removed] "test4"
 [Added] "test3"

You can use Except and Intersect...

    var Foo_Old = new[] { "test1", "test2", "test3" };
    var Foo_New = new[] { "test1", "test2", "test4", "test5" };

    var diff = Foo_New.Except( Foo_Old );
    var inter = Foo_New.Intersect( Foo_Old );
    var rem = Foo_Old.Except(Foo_New);

    foreach (var s in diff)
    {
        Console.WriteLine("Added " + s);
    }

    foreach (var s in inter)
    {
        Console.WriteLine("Same " + s);
    }

    foreach (var s in rem)
    {
        Console.WriteLine("Removed " + s);
    }
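One thing to keep in mind with this approach: Except and Intersect are set operations, so each distinct value is reported once even if it occurs several times in the input. The sketch below illustrates that; the class `SetSemanticsDemo` and the method `Added` are hypothetical names introduced only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetSemanticsDemo
{
    // Hypothetical helper: returns the items Except would report as "added".
    // Except yields the set difference, so duplicates in newItems collapse
    // to a single entry.
    public static List<string> Added(string[] oldItems, string[] newItems)
    {
        return newItems.Except(oldItems).ToList();
    }
}
```

For instance, `SetSemanticsDemo.Added(new[] { "a" }, new[] { "a", "b", "b" })` yields a single "b". If the arrays never contain duplicates, this makes no difference.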

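I went ahead and hand-coded a version and also ran the example from the accepted answer; the hand-coded one performs slightly better. I handle my strings a little differently. Other factors to consider include whether Except makes a sorted copy of the array (since it cannot assume the input is already sorted), or whether it does some kind of hashing or linear search (it is effectively restricted to IEnumerable, which could be a problem for very large arrays that are already sorted). You could change mine to compare IEnumerable (which is more general) instead of IComparable[].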

    using System;
    using System.Diagnostics;
    using System.Linq;

    static class Program
    {
        // Hand-coded version: a single merge-style pass over two pre-sorted arrays.
        static void ArrayCompare(IComparable[] Old, IComparable[] New)
        {
            int lpOld = 0;
            int lpNew = 0;
            int OldLength = Old.Length;
            int NewLength = New.Length;

            while (lpOld < OldLength || lpNew < NewLength)
            {
                int compare;
                if (lpOld >= OldLength)
                    compare = 1;        // Old is exhausted: the rest of New was added
                else if (lpNew >= NewLength)
                    compare = -1;       // New is exhausted: the rest of Old was removed
                else
                    compare = Old[lpOld].CompareTo(New[lpNew]);

                if (compare < 0)
                {
                    Debug.WriteLine(string.Format("[Removed] {0}", Old[lpOld].ToString()));
                    lpOld++;
                }
                else if (compare > 0)
                {
                    Debug.WriteLine(string.Format("[Added] {0}", New[lpNew].ToString()));
                    lpNew++;
                }
                else
                {
                    Debug.WriteLine(string.Format("[Same] {0}", Old[lpOld].ToString()));
                    lpOld++;
                    lpNew++;
                }
            }
        }

        // LINQ version from the accepted answer, for comparison.
        static void ArrayCompare2(IComparable[] Old, IComparable[] New)
        {
            var diff = New.Except( Old );
            var inter = New.Intersect( Old );
            var rem = Old.Except(New);

            foreach (var s in diff)
            {
                Debug.WriteLine("Added " + s);
            }

            foreach (var s in inter)
            {
                Debug.WriteLine("Same " + s);
            }

            foreach (var s in rem)
            {
                Debug.WriteLine("Removed " + s);
            }
        }

        static void Main(string[] args)
        {
            String[] Foo_Old = {"test1", "test2", "test3"};
            String[] Foo_New = {"test1", "test2", "test4", "test5"};
            String[] Bar_Old = {"test1", "test2", "test4"};
            String[] Bar_New = {"test1", "test3"};

            // Time 10000 iterations of the hand-coded version.
            Stopwatch w1 = new Stopwatch();
            w1.Start();
            for (int lp = 0; lp < 10000; lp++)
            {
                ArrayCompare(Foo_Old, Foo_New);
                ArrayCompare(Bar_Old, Bar_New);
            }
            w1.Stop();

            // Time 10000 iterations of the LINQ version.
            Stopwatch w2 = new Stopwatch();
            w2.Start();
            for (int lp = 0; lp < 10000; lp++)
            {
                ArrayCompare2(Foo_Old, Foo_New);
                ArrayCompare2(Bar_Old, Bar_New);
            }
            w2.Stop();

            Debug.WriteLine(w1.Elapsed.ToString());
            Debug.WriteLine(w2.Elapsed.ToString());
        }
    }
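The suggestion of comparing IEnumerable instead of IComparable[] might look roughly like the sketch below. It is not part of the original answer: the class `SortedDiff`, the method `Compare`, and the tuple-based result are hypothetical names, and the code assumes both sequences are already sorted consistently with the comparer that is passed in (value tuples require C# 7 or later).

```csharp
using System;
using System.Collections.Generic;

public static class SortedDiff
{
    // Hypothetical helper: walks two pre-sorted sequences in lock step and
    // yields a (Tag, Value) pair per item. Assumes both inputs are sorted
    // according to `comparer` (Comparer<T>.Default when none is given).
    public static IEnumerable<(string Tag, T Value)> Compare<T>(
        IEnumerable<T> oldItems, IEnumerable<T> newItems, IComparer<T> comparer = null)
    {
        comparer = comparer ?? Comparer<T>.Default;

        using (IEnumerator<T> o = oldItems.GetEnumerator())
        using (IEnumerator<T> n = newItems.GetEnumerator())
        {
            bool hasOld = o.MoveNext();
            bool hasNew = n.MoveNext();

            while (hasOld || hasNew)
            {
                int c;
                if (!hasOld) c = 1;          // old side exhausted: remaining items were added
                else if (!hasNew) c = -1;    // new side exhausted: remaining items were removed
                else c = comparer.Compare(o.Current, n.Current);

                if (c < 0)
                {
                    yield return ("Removed", o.Current);
                    hasOld = o.MoveNext();
                }
                else if (c > 0)
                {
                    yield return ("Added", n.Current);
                    hasNew = n.MoveNext();
                }
                else
                {
                    yield return ("Same", o.Current);
                    hasOld = o.MoveNext();
                    hasNew = n.MoveNext();
                }
            }
        }
    }
}
```

A call such as `SortedDiff.Compare(Foo_Old, Foo_New, StringComparer.Ordinal)` can then be enumerated with foreach. Because it works on enumerators rather than indexes, it never needs the lengths of the inputs and does not copy them.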

I wrote this a while back:

Usage:

    foreach (var diff in Foo_Old.Diff(Foo_New))
    {
        Console.WriteLine("{0} action performed on {1}", diff.DiffAction, diff.Value);
    }

Implementation:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;

    namespace LinqExtensions
    {
        enum DiffAction
        {
            Added,
            Removed,
            Same
        }

        class DiffPair<T>
        {
            public T Value { get; set; }
            public DiffAction DiffAction { get; set; }
        }

        static class DiffExtension
        {
            public static IEnumerable<DiffPair<T>> Diff<T>
                     (
                         this IEnumerable<T> original,
                         IEnumerable<T> target
                     )
            {
                Dictionary<T, DiffAction> results = new Dictionary<T, DiffAction>();

                // Start by assuming every original item was removed.
                foreach (var item in original)
                {
                    results[item] = DiffAction.Removed;
                }

                // Items also present in target are Same; the rest are Added.
                foreach (var item in target)
                {
                    if (results.ContainsKey(item))
                    {
                        results[item] = DiffAction.Same;
                    }
                    else
                    {
                        results[item] = DiffAction.Added;
                    }
                }

                return results.Select(
                    pair => new DiffPair<T>
                    {
                        Value = pair.Key,
                        DiffAction = pair.Value
                    });
            }
        }
    }

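Since your arrays are sorted, you should be able to walk through both arrays simultaneously and determine, in a single pass, whether each element is in the other array. (This is similar to the merge step in merge sort.) You can see an example of that below: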

    string[] oldVersion = { "test1", "test2", "test3" };
    string[] newVersion = { "test1", "test2", "test4", "test5" };

    int oldIndex = 0, newIndex = 0;

    while ((oldIndex < oldVersion.Length) && (newIndex < newVersion.Length))
    {
        int comparison = oldVersion[oldIndex].CompareTo(newVersion[newIndex]);

        if (comparison < 0)
            Console.WriteLine("[Removed]\t" + oldVersion[oldIndex++]);
        else if (comparison > 0)
            Console.WriteLine("[Added]\t\t" + newVersion[newIndex++]);
        else
        {
            Console.WriteLine("[Same]\t\t" + oldVersion[oldIndex++]);
            newIndex++;
        }
    }

    while (oldIndex < oldVersion.Length)
        Console.WriteLine("[Removed]\t" + oldVersion[oldIndex++]);

    while (newIndex < newVersion.Length)
        Console.WriteLine("[Added]\t\t" + newVersion[newIndex++]);

Otherwise, you would need to walk through one array and, for each element in it, make a separate pass over the other array to look for a match.
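For contrast, that nested-scan alternative could be sketched as follows. The class `NaiveDiff` and the method `Compare` are hypothetical names, not part of the original answer. The sketch makes no assumption about ordering, but it may need up to oldItems.Length × newItems.Length comparisons in each of its two loops.

```csharp
using System;
using System.Collections.Generic;

public static class NaiveDiff
{
    // Hypothetical helper: for every element of one array, linearly scan
    // the other array for a match. Works on unsorted input.
    public static List<string> Compare(string[] oldItems, string[] newItems)
    {
        var result = new List<string>();

        // Every old item is either still present (Same) or gone (Removed).
        foreach (string o in oldItems)
        {
            bool found = Array.IndexOf(newItems, o) >= 0;   // linear search
            result.Add((found ? "[Same] " : "[Removed] ") + o);
        }

        // Any new item with no match in the old array was added.
        foreach (string n in newItems)
        {
            if (Array.IndexOf(oldItems, n) < 0)
                result.Add("[Added] " + n);
        }

        return result;
    }
}
```

Note that the output is grouped (same and removed items first, then added items) rather than interleaved in sorted order as in the single-pass version.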

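EDIT: JP has a good suggestion for how to do this using the framework. However, assuming the arrays are sorted, the benefit of my approach is that you only need a single pass to find all the results. You don't need to make three passes.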