.net集合用于快速插入/删除

我需要维护一个非常短缺且频繁上下的连接客户名单。 由于潜在的客户端数量,我需要一个支持快速插入/删除的集合。 建议?

C5通用馆藏图书馆

我在C#和C ++中找到的最佳实现是 – 对于C#/ CLI:

它经过深入研究,具有可扩展的unit testing,自2月起,它们还在.Net中实现了通用接口,这使得使用集合更容易。 它们在Channel9上有特色,他们对这些系列进行了广泛的性能测试。

如果你正在使用数据结构,那么这些研究人员在他们的库中有一个红黑树实现,类似于你发现Lützreflection器并查看System.Data的内部结构时所发现的:p。 插入复杂度:O(log(n))。

无锁C ++集合

然后,如果你可以允许一些C ++互操作并且你绝对需要速度并且希望尽可能少的开销,那么来自Dmitriy V’jukov的这些无锁ADT可能是你在这个世界上可以获得的最好的,超过英特尔的并发库ADT。

我已经阅读了一些代码,而且这些代码非常精通这些代码。 VC ++可以在没有恼人边界的情况下进行本机C ++互操作。 http://www.swig.org/可以帮助你包装C ++接口以便在.Net中使用,或者你可以通过P / Invoke自己完成。

微软的Take

他们编写了教程, 这篇教程在C#中实现了一个相当粗糙的跳转列表 ,并讨论了其他类型的数据结构。 ( 在CodeProject上有一个更好的SkipList ,它非常完善并以良好的方式实现接口。)它们还有一些与.Net捆绑的数据结构,即HashTable / Dictionary <,>和HashSet 。 当然还有“ResizeArray”/ List类型以及堆栈和队列,但它们在搜索时都是“线性的”。

谷歌的工具

如果您希望加快内存分配所需的时间,可以使用google的perf工具。 它们在谷歌代码中可用,它们包含一个非常有趣的multithreadingmalloc实现(TCMalloc) ,它显示比正常malloc更一致的时序。 您可以将它与上面的无锁结构一起使用,以实现对性能的疯狂。

通过memoization改善响应时间

您还可以在函数上使用memoization来通过缓存来提高性能,如果您使用的是F# ,则会感兴趣。 F#还允许C ++互操作,所以你就可以了。

好)

使用在bloom-filters上进行的研究也可以自己做一些事情,它允许O(k)查找复杂度,其中k是一个常数,取决于你实现的哈希函数的数量。 这就是谷歌的BigTable实施的方式。 这些filter会为你提供元素,如果它在集合中,或者可能具有非常低的可能性,而不是你正在寻找的元素(参见维基百科的图表 – 它接近P(wrong_key) – > 0.01作为大小大概是10000个元素,但你可以通过实现更多的哈希函数/减少集合来解决这个问题。

我没有搜索过它的.Net实现,但由于散列计算是独立的,你可以使用MS的性能团队的Tasks实现来加快速度。

“我的”采取 – 随机化达到平均O(log n)

碰巧我刚做了一个涉及数据结构的课程。 在这种情况下,我们使用C ++,但很容易转换为C#。 我们构建了三种不同的数据结构; bloom-filter,skip-list和随机二叉搜索树 。

请参阅最后一段后面的代码和分析。

基于硬件的“集合”

最后,为了使我的答案“完整”,如果你真的需要速度,你可以使用像路由表或内容可寻址内存 。 这使您可以非常快速地在原则上对O(1)进行数据的“哈希”值查找。

随机二进制搜索树/布隆filterC ++代码

如果您在代码中发现错误,或者只是指示我如何更好地(或更好地使用模板),我会非常感谢您的反馈。 请注意,布隆filter不像现实生活中那样; 通常你不必从它中删除然后它比我允许删除测试的黑客空间效率更高。

DataStructure.h

#ifndef DATASTRUCTURE_H_ #define DATASTRUCTURE_H_ class DataStructure { public: DataStructure() {countAdd=0; countDelete=0;countFind=0;} virtual ~DataStructure() {} void resetCountAdd() {countAdd=0;} void resetCountFind() {countFind=0;} void resetCountDelete() {countDelete=0;} unsigned int getCountAdd(){return countAdd;} unsigned int getCountDelete(){return countDelete;} unsigned int getCountFind(){return countFind;} protected: unsigned int countAdd; unsigned int countDelete; unsigned int countFind; }; #endif /*DATASTRUCTURE_H_*/ 

Key.h

 #ifndef KEY_H_ #define KEY_H_ #include  using namespace std; const int keyLength = 128; class Key : public string { public: Key():string(keyLength, ' ') {} Key(const char in[]): string(in){} Key(const string& in): string(in){} bool operator<(const string& other); bool operator>(const string& other); bool operator==(const string& other); virtual ~Key() {} }; #endif /*KEY_H_*/ 

Key.cpp

 #include "Key.h" bool Key::operator<(const string& other) { return compare(other) < 0; }; bool Key::operator>(const string& other) { return compare(other) > 0; }; bool Key::operator==(const string& other) { return compare(other) == 0; } 

BloomFilter.h

 #ifndef BLOOMFILTER_H_ #define BLOOMFILTER_H_ #include  #include  #include  #include  #include "Key.h" #include "DataStructure.h" #define LONG_BIT 32 #define bitmask(val) (unsigned long)(1 << (LONG_BIT - (val % LONG_BIT) - 1)) // TODO: Implement RW-locking on the reads/writes to the bitmap. class BloomFilter : public DataStructure { public: BloomFilter(){} BloomFilter(unsigned long length){init(length);} virtual ~BloomFilter(){} void init(unsigned long length); void dump(); void add(const Key& key); void del(const Key& key); /** * Returns true if the key IS BELIEVED to exist, false if it absolutely doesn't. */ bool testExist(const Key& key, bool v = false); private: unsigned long hash1(const Key& key); unsigned long hash2(const Key& key); bool exist(const Key& key); void getHashAndIndicies(unsigned long& h1, unsigned long& h2, int& i1, int& i2, const Key& key); void getCountIndicies(const int i1, const unsigned long h1, const int i2, const unsigned long h2, int& i1_c, int& i2_c); vector m_tickBook; vector m_useCounts; unsigned long m_length; // number of bits in the bloom filter unsigned long m_pockets; //the number of pockets static const unsigned long m_pocketSize; //bits in each pocket }; #endif /*BLOOMFILTER_H_*/ 

BloomFilter.cpp

 #include "BloomFilter.h" const unsigned long BloomFilter::m_pocketSize = LONG_BIT; void BloomFilter::init(unsigned long length) { //m_length = length; m_length = (unsigned long)((2.0*length)/log(2))+1; m_pockets = (unsigned long)(ceil(double(m_length)/m_pocketSize)); m_tickBook.resize(m_pockets); // my own (allocate nr bits possible to store in the other vector) m_useCounts.resize(m_pockets * m_pocketSize); unsigned long i; for(i=0; i< m_pockets; i++) m_tickBook[i] = 0; for (i = 0; i < m_useCounts.size(); i++) m_useCounts[i] = 0; // my own } unsigned long BloomFilter::hash1(const Key& key) { unsigned long hash = 5381; unsigned int i=0; for (i=0; i< key.length(); i++){ hash = ((hash << 5) + hash) + key.c_str()[i]; /* hash * 33 + c */ } double d_hash = (double) hash; d_hash *= (0.5*(sqrt(5)-1)); d_hash -= floor(d_hash); d_hash *= (double)m_length; return (unsigned long)floor(d_hash); } unsigned long BloomFilter::hash2(const Key& key) { unsigned long hash = 0; unsigned int i=0; for (i=0; i< key.length(); i++){ hash = key.c_str()[i] + (hash << 6) + (hash << 16) - hash; } double d_hash = (double) hash; d_hash *= (0.5*(sqrt(5)-1)); d_hash -= floor(d_hash); d_hash *= (double)m_length; return (unsigned long)floor(d_hash); } bool BloomFilter::testExist(const Key& key, bool v){ if(exist(key)) { if(v) cout<<"Key "<< key<<" is in the set"< 0) && ((m_tickBook[i2] & bitmask(h2)) > 0); } /* * Gets the values of the indicies for two hashes and places them in * the passed parameters. The index is into m_tickBook. */ void BloomFilter::getHashAndIndicies(unsigned long& h1, unsigned long& h2, int& i1, int& i2, const Key& key) { h1 = hash1(key); h2 = hash2(key); i1 = (int) h1/m_pocketSize; i2 = (int) h2/m_pocketSize; } /* * Gets the values of the indicies into the count vector, which keeps * track of how many times a specific bit-position has been used. */ void BloomFilter::getCountIndicies(const int i1, const unsigned long h1, const int i2, const unsigned long h2, int& i1_c, int& i2_c) { i1_c = i1*m_pocketSize + h1%m_pocketSize; i2_c = i2*m_pocketSize + h2%m_pocketSize; } 

** RBST.h **

 #ifndef RBST_H_ #define RBST_H_ #include  #include  #include  #include  #include "Key.h" #include "DataStructure.h" #define BUG(str) printf("%s:%d FAILED SIZE INVARIANT: %s\n", __FILE__, __LINE__, str); using namespace std; class RBSTNode; class RBSTNode: public Key { public: RBSTNode(const Key& key):Key(key) { m_left =NULL; m_right = NULL; m_size = 1U; // the size of one node is 1. } virtual ~RBSTNode(){} string setKey(const Key& key){return Key(key);} RBSTNode* left(){return m_left; } RBSTNode* right(){return m_right;} RBSTNode* setLeft(RBSTNode* left) { m_left = left; return this; } RBSTNode* setRight(RBSTNode* right) { m_right =right; return this; } #ifdef DEBUG ostream& print(ostream& out) { out << "Key(" << *this << ", m_size: " << m_size << ")"; return out; } #endif unsigned int size() { return m_size; } void setSize(unsigned int val) { #ifdef DEBUG this->print(cout); cout << "::setSize(" << val << ") called." << endl; #endif if (val == 0) throw "Cannot set the size below 1, then just delete this node."; m_size = val; } void incSize() { #ifdef DEBUG this->print(cout); cout << "::incSize() called" << endl; #endif m_size++; } void decrSize() { #ifdef DEBUG this->print(cout); cout << "::decrSize() called" << endl; #endif if (m_size == 1) throw "Cannot decrement size below 1, then just delete this node."; m_size--; } #ifdef DEBUG unsigned int size(RBSTNode* x); #endif private: RBSTNode(){} RBSTNode* m_left; RBSTNode* m_right; unsigned int m_size; }; class RBST : public DataStructure { public: RBST() { m_size = 0; m_head = NULL; srand(time(0)); }; virtual ~RBST() {}; /** * Tries to add key into the tree and will return * true for a new item added * false if the key already is in the tree. * * Will also have the side-effect of printing to the console if v=true. */ bool add(const Key& key, bool v=false); /** * Same semantics as other add function, but takes a string, * but diff name, because that'll cause an ambiguity because of inheritance. */ bool addString(const string& key); /** * Deletes a key from the tree if that key is in the tree. * Will return * true for success and * false for failure. * * Will also have the side-effect of printing to the console if v=true. */ bool del(const Key& key, bool v=false); /** * Tries to find the key in the tree and will return * true if the key is in the tree and * false if the key is not. * * Will also have the side-effect of printing to the console if v=true. */ bool find(const Key& key, bool v = false); unsigned int count() { return m_size; } #ifdef DEBUG int dump(char sep = ' '); int dump(RBSTNode* target, char sep); unsigned int size(RBSTNode* x); #endif private: RBSTNode* randomAdd(RBSTNode* target, const Key& key); RBSTNode* addRoot(RBSTNode* target, const Key& key); RBSTNode* rightRotate(RBSTNode* target); RBSTNode* leftRotate(RBSTNode* target); RBSTNode* del(RBSTNode* target, const Key& key); RBSTNode* join(RBSTNode* left, RBSTNode* right); RBSTNode* find(RBSTNode* target, const Key& key); RBSTNode* m_head; unsigned int m_size; }; #endif /*RBST_H_*/ 

** RBST.cpp **

 #include "RBST.h" bool RBST::add(const Key& key, bool v){ unsigned int oldSize = m_size; m_head = randomAdd(m_head, key); if (m_size > oldSize){ if(v) cout<<"Node "<left(), sep); cout<< *target<right(), sep); return ret; }; #endif /** * Rotates the tree around target, so that target's left * is the new root of the tree/subtree and updates the subtree sizes. * *(target) b (l) a * / \ right / \ * a ? ----> ? b * / \ / \ * ? xx ? * */ RBSTNode* RBST::rightRotate(RBSTNode* target) // private { if (target == NULL) throw "Invariant failure, target is null"; // Note: may be removed once tested. if (target->left() == NULL) throw "You cannot rotate right around a target whose left node is NULL!"; #ifdef DEBUG cout <<"Right-rotating b-node "; target->print(cout); cout << " for a-node "; target->left()->print(cout); cout << "." << endl; #endif RBSTNode* l = target->left(); int as0 = l->size(); // re-order the sizes l->setSize( l->size() + (target->right() == NULL ? 0 : target->right()->size()) + 1); // a.size += b.right.size + 1; where b.right may be null. target->setSize( target->size() -as0 + (l->right() == NULL ? 0 : l->right()->size()) ); // b.size += -a_0_size + x.size where x may be null. // swap b's left (for a) target->setLeft(l->right()); // and a's right (for b's left) l->setRight(target); #ifdef DEBUG cout << "A-node size: " << l->size() << ", b-node size: " << target->size() << "." << endl; #endif // return the new root, a. return l; }; /** * Like rightRotate, but the other way. See docs for rightRotate(RBSTNode*) */ RBSTNode* RBST::leftRotate(RBSTNode* target) { if (target == NULL) throw "Invariant failure, target is null"; if (target->right() == NULL) throw "You cannot rotate left around a target whose right node is NULL!"; #ifdef DEBUG cout <<"Left-rotating a-node "; target->print(cout); cout << " for b-node "; target->right()->print(cout); cout << "." << endl; #endif RBSTNode* r = target->right(); int bs0 = r->size(); // re-roder the sizes r->setSize(r->size() + (target->left() == NULL ? 0 : target->left()->size()) + 1); target->setSize(target->size() -bs0 + (r->left() == NULL ? 0 : r->left()->size())); // swap a's right (for b's left) target->setRight(r->left()); // swap b's left (for a) r->setLeft(target); #ifdef DEBUG cout << "Left-rotation done: a-node size: " << target->size() << ", b-node size: " << r->size() << "." << endl; #endif return r; }; // /** * Adds a key to the tree and returns the new root of the tree. * If the key already exists doesn't add anything. * Increments m_size if the key didn't already exist and hence was added. * * This function is not called from public methods, it's a helper function. */ RBSTNode* RBST::addRoot(RBSTNode* target, const Key& key) { countAdd++; if (target == NULL) return new RBSTNode(key); #ifdef DEBUG cout << "addRoot("; cout.flush(); target->print(cout) << "," << key << ") called." << endl; #endif if (*target < key) { target->setRight( addRoot(target->right(), key) ); target->incSize(); // Should I? RBSTNode* res = leftRotate(target); #ifdef DEBUG if (target->size() != size(target)) BUG("in addRoot 1"); #endif return res; } target->setLeft( addRoot(target->left(), key) ); target->incSize(); // Should I? RBSTNode* res = rightRotate(target); #ifdef DEBUG if (target->size() != size(target)) BUG("in addRoot 2"); #endif return res; }; /** * This function is called from the public add(key) function, * and returns the new root node. */ RBSTNode* RBST::randomAdd(RBSTNode* target, const Key& key) { countAdd++; if (target == NULL) { m_size++; return new RBSTNode(key); } #ifdef DEBUG cout << "randomAdd("; target->print(cout) << ", \"" << key << "\") called." << endl; #endif int r = (rand() % target->size()) + 1; // here is where we add the target as root! if (r == 1) { m_size++; // TODO: Need to lock. return addRoot(target, key); } #ifdef DEBUG printf("randomAdd recursion part, "); #endif // otherwise, continue recursing! if (*target <= key) { #ifdef DEBUG printf("target <= key\n"); #endif target->setRight( randomAdd(target->right(), key) ); target->incSize(); // TODO: Need to lock. #ifdef DEBUG if (target->right()->size() != size(target->right())) BUG("in randomAdd 1"); #endif } else { #ifdef DEBUG printf("target > key\n"); #endif target->setLeft( randomAdd(target->left(), key) ); target->incSize(); // TODO: Need to lock. #ifdef DEBUG if (target->left()->size() != size(target->left())) BUG("in randomAdd 2"); #endif } #ifdef DEBUG printf("randomAdd return part\n"); #endif m_size++; // TODO: Need to lock. return target; }; ///////////////////////////////////////////////////////////// ///////////////////// DEL FUNCTIONS //////////////////////// ///////////////////////////////////////////////////////////// /** * Deletes a node with the passed key. * Returns the root node. * Decrements m_size if something was deleted. */ RBSTNode* RBST::del(RBSTNode* target, const Key& key) { countDelete++; if (target == NULL) return NULL; #ifdef DEBUG cout << "del("; target->print(cout) << ", \"" << key << "\") called." << endl; #endif RBSTNode* ret = NULL; // found the node to delete if (*target == key) { ret = join(target->left(), target->right()); m_size--; delete target; return ret; // return the newly built joined subtree! } // store a temporary size before recursive deletion. unsigned int size = m_size; if (*target < key) target->setRight( del(target->right(), key) ); else target->setLeft( del(target->left(), key) ); // if the previous recursion changed the size, we need to decrement the size of this target too. if (m_size < size) target->decrSize(); #ifdef DEBUG if (RBST::size(target) != target->size()) BUG("in del"); #endif return target; }; /** * Joins the two subtrees represented by left and right * by randomly choosing which to make the root, weighted on the * size of the sub-tree. */ RBSTNode* RBST::join(RBSTNode* left, RBSTNode* right) { if (left == NULL) return right; if (right == NULL) return left; #ifdef DEBUG cout << "join("; left->print(cout); cout << ","; right->print(cout) << ") called." << endl; #endif // Find the chance that we use the left tree, based on its size over the total tree size. // 3 sd randomness :-p eg 60.3% chance. bool useLeft = ((rand()%1000) < (signed)((float)left->size()/(float)(left->size() + right->size()) * 1000.0)); RBSTNode* subtree = NULL; if (useLeft) { subtree = join(left->right(), right); left->setRight(subtree) ->setSize((left->left() == NULL ? 0 : left->left()->size()) + subtree->size() + 1 ); #ifdef DEBUG if (size(left) != left->size()) BUG("in join 1"); #endif return left; } subtree = join(right->left(), left); right->setLeft(subtree) ->setSize((right->right() == NULL ? 0 : right->right()->size()) + subtree->size() + 1); #ifdef DEBUG if (size(right) != right->size()) BUG("in join 2"); #endif return right; }; ///////////////////////////////////////////////////////////// ///////////////////// FIND FUNCTIONS /////////////////////// ///////////////////////////////////////////////////////////// /** * Tries to find the key in the tree starting * search from target. * * Returns NULL if it was not found. */ RBSTNode* RBST::find(RBSTNode* target, const Key& key) { countFind++; // Could use private method only counting the first call. if (target == NULL) return NULL; // not found. if (*target == key) return target; // found (does string override ==?) if (*target < key) return find(target->right(), key); // search for gt to the right. return find(target->left(), key); // search for lt to the left. }; #ifdef DEBUG unsigned int RBST::size(RBSTNode* x) { if (x == NULL) return 0; return 1 + size(x->left()) + size(x->right()); } #endif 

我将另外保存SkipList,因为已经可以从链接中找到SkipList的良好实现,而且我的版本没有太大的不同。

从测试文件生成的图形如下:

图表显示为BloomFilter,RBST和SkipList添加新项目所花费的时间。 图http://sofzh.miximages.com/c%23/addtimer.png

图表显示为BloomFilter,RBST和SkipList查找项目所花费的时间 http://sofzh.miximages.com/c%23/findtimer.png

图表显示删除BloomFilter,RBST和SkipList 图表 项目所花费的时间 http://sofzh.miximages.com/c%23/deltimer.png

正如您所看到的,随机二叉搜索树比SkipList要好得多。 布隆filter达到其O(k)。

考虑基于哈希的集合,例如HashSetDictionaryHashTable ,它们为添加和删除元素提供恒定的时间性能。

.NET Framework开发人员指南中提供的更多信息:

  • Hashtable和字典集合类型
  • HashSet集合类型

那么,您需要查询多少钱? 链接列表具有快速插入/删除(在任何位置),但不像(例如)字典/排序列表那样快速搜索。 或者,在每个中具有位/值对的直列表 – 即“仍具有值”。 只需在追加之前重复使用逻辑上空的单元格。 删除只清除单元格。

对于引用类型,“null”将在此处执行。 对于值类型, Nullable

您可以使用Hashtable或强类型的Dictionary 。 客户端类可能会覆盖GetHashCode以提供更快的哈希代码生成,或者如果使用Hashtable,您可以选择使用IHashCodeProvider。

你如何找到客户? 是必要的元组/字典吗? 你很可能在Jeffrey Richter的Power Collections库中找到解决问题的东西,它有列表,树,你能想到的大多数数据结构。

Channel9采访Peter Sestoft给我留下了非常深刻的印象:

channel9.msdn.com/shows/Going+Deep/Peter-Sestoft-C5-Generic-Collection-Library-for-C-and-CLI/

他是哥本哈根IT大学的教授,他帮助创建了The C5 Generic Collection Library:

http://www.itu.dk/research/c5/

它可能是矫枉过正,或者它可能只是你正在寻找的快速collections……

心连心,

-麦克风