C#: Filter a List to remove any duplicate objects

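I have searched and tested many examples in this forum but cannot get a fully working method.

I am using LINQ to bulk insert a list of entity classes (RemoteReadings).

Due to a unique constraint, I need to filter out any items that have already been inserted.

Uniqueness is defined by two columns in the RemoteReadings table: meterid and datetime.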

    // Approx. 5000 records (I need to do this in batches of 2000 due to a
    // constraint in L2S, but can do that once this is working).
    List<RemoteReadings> lst = createListFromCSV();

    // Option 1:
    // This does not work, as I am comparing an in-memory list to a DB list.
    // I need to use the Contains() method, which is what I am trying to
    // accomplish in the following examples.
    List<RemoteReadings> myLst = (from ri in db.RemoteReadings
                                  from l in lst
                                  where l.meterid == ri.meterid && l.date == ri.date
                                  select ri).ToList();

    // Option 2:
    // Get the rows from the DB that are also in the in-memory lst.
    List<RemoteReadings> myLst = (from ri in db.RemoteReadings
                                  where
                                  // where in this list, by comparing meterid and date measured
                                  (from l in lst
                                   select
                                   /// help here !
                                   ///
                                  select ri).ToList();

    // Option 3:
    // Get the items from lst that are not in the database.
    // I am a bit confused here!

    // Also tried to remove any duplicates from the list:
    List<RemoteReadings> result = (List<RemoteReadings>)myLst.Except(lst).ToList();

    // Ultimately:
    db.RemoteReadings.InsertAllOnSubmit(result);
    db.SubmitChanges();

Any help, please?

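Due to limitations in EF, we cannot join a database query with an in-memory list. Also, Contains can only be used with a list of primitive values. So it takes some extra work to find duplicates on two columns: fetch the probable matches from the database first, then narrow them down in memory.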

    var newItems = createListFromCSV();
    var meterIds = newItems.Select(n => n.meterid).Distinct().ToList();
    var dates = newItems.Select(n => n.date).Distinct().ToList();

    // Fetch a superset of the possible duplicates from the database.
    var probableMatches = (from ri in db.RemoteReadings
                           where meterIds.Contains(ri.meterid) || dates.Contains(ri.date)
                           select new { ri.meterid, ri.date }).ToList();

    // Narrow the candidates down in memory by joining on both key columns.
    var duplicates = (from existingRi in probableMatches
                      join newRi in newItems
                          on new { existingRi.meterid, existingRi.date }
                          equals new { newRi.meterid, newRi.date }
                      select newRi).ToList();

    var insertList = newItems.Except(duplicates).ToList();
    db.RemoteReadings.Insert(insertList); // or whatever
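The in-memory step above (join, then Except) can also be written as a set lookup on the two-column key. The following is only a minimal sketch of that idea, not part of the answer: the `Reading` class, the `DuplicateFilter` helper, and the property names are hypothetical stand-ins for the real entity, and it assumes a compiler with value tuple support (C# 7 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the RemoteReadings entity; only the two key columns are modelled.
public class Reading
{
    public int MeterId { get; set; }
    public DateTime Date { get; set; }
}

public static class DuplicateFilter
{
    // Returns the items of newItems whose (MeterId, Date) pair is not among existingKeys.
    // existingKeys plays the role of the probable matches already fetched from the database.
    public static List<Reading> ExcludeExisting(
        IEnumerable<Reading> newItems,
        IEnumerable<(int MeterId, DateTime Date)> existingKeys)
    {
        // Value tuples compare by value, so the set lookup matches on both columns at once.
        var seen = new HashSet<(int MeterId, DateTime Date)>(existingKeys);
        return newItems.Where(r => !seen.Contains((r.MeterId, r.Date))).ToList();
    }
}
```

A call would look like `DuplicateFilter.ExcludeExisting(newItems, keysFromDb)`. Compared with join plus Except, this does a single pass over the new items and does not rely on reference equality of the entity objects.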

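With the help of aSharma and some other tweaks, I finally got a working and tested method. As my list contains more than 5000 items, I had to run the query in batches to stay under the limit of 2100 parameters per SQL RPC call. Added some comments and credits :)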

    /// lst contains a list of database entity classes (RemoteReadings).
    public List<RemoteReadings> removeDublicatesFirst(List<RemoteReadings> lst)
    {
        try
        {
            DataClasses1DataContext db = new DataClasses1DataContext();

            var meterIds = lst.Select(n => n.meterId).Distinct().ToList();
            var dates = lst.Select(n => n.mydate).Distinct().ToList();

            var myfLst = new List<RemoteReadings>();

            // To avoid the following SqlException, the LINQ query should be executed in batches:
            // {System.Data.SqlClient.SqlException
            //  The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect.
            //  Too many parameters were provided in this RPC request. The maximum is 2100.
            foreach (var batch in dates.Batch(2000))
            {
                // Get a list of possible matches from the DB.
                var probableMatches = (from ri in db.RemoteReadingss
                                       where (meterIds.Contains(ri.meterId)
                                              && batch.Contains(ri.mydate))
                                       select new { ri.meterId, ri.mydate }).ToList();

                // Join the probableMatches with the in-memory lst on the unique
                // constraint (meterId, mydate) to find any duplicates.
                var duplicates = (from existingRi in probableMatches
                                  join newRi in lst
                                      on new { existingRi.meterId, existingRi.mydate }
                                      equals new { newRi.meterId, newRi.mydate }
                                  select newRi).ToList();

                // Collect the duplicates in a separate list because of the batch execution.
                foreach (var s in duplicates)
                {
                    myfLst.Add(s);
                }
            }

            // Remove the duplicates found in myfLst from lst.
            var insertList = lst.Except(myfLst).ToList();
            return insertList;
        }
        catch (Exception ex)
        {
            return null;
        }
    }

    // Found this extension class to divide an IEnumerable into batches.
    // http://stackoverflow.com/a/13731854/288865
    public static class MyExtensions
    {
        public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> items, int maxItems)
        {
            return items.Select((item, inx) => new { item, inx })
                        .GroupBy(x => x.inx / maxItems)
                        .Select(g => g.Select(x => x.item));
        }
    }
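One thing to watch with the batch size: every value in a Contains list is sent as its own SQL parameter, so the meter IDs count against the 2100 limit together with the dates in each batch. The sketch below shows one way to derive a batch size from that budget. It is an illustration under assumptions, not part of the tested method: `ParameterBudget` and `SafeBatchSize` are hypothetical names, the headroom of 50 is an arbitrary safety margin, and the usage line relies on `Enumerable.Chunk`, which needs .NET 6 or later (the `Batch` extension above serves the same purpose on older frameworks).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParameterBudget
{
    // Hypothetical helper: how many values may go into one batch when the query
    // already sends otherParameterCount values (for example, the distinct meter IDs).
    // limit is the SQL Server maximum quoted in the exception message; headroom is an
    // assumed margin for parameters the provider may add on its own.
    public static int SafeBatchSize(int otherParameterCount, int limit = 2100, int headroom = 50)
    {
        int size = limit - headroom - otherParameterCount;
        if (size <= 0)
        {
            throw new ArgumentOutOfRangeException(
                nameof(otherParameterCount),
                "The other parameters alone use up the budget; batch them as well.");
        }
        return size;
    }
}
```

For example, with 50 distinct meter IDs, `ParameterBudget.SafeBatchSize(50)` gives 2000, and `dates.Chunk(2000)` then splits 5000 dates into batches of 2000, 2000 and 1000. If the number of meter IDs approaches the limit on its own, the helper throws, which is a hint that the meter IDs need batching too.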