C#:阿特金筛选的实施

我想知道是否有人在这里有一个很好的实施他们想要分享的阿特金筛选。

我正在尝试实现它,但不能完全包围它。 这是我到目前为止所拥有的。

public class Atkin : IEnumerable { private readonly List primes; private readonly ulong limit; public Atkin(ulong limit) { this.limit = limit; primes = new List(); } private void FindPrimes() { var isPrime = new bool[limit + 1]; var sqrt = Math.Sqrt(limit); for (ulong x = 1; x <= sqrt; x++) for (ulong y = 1; y <= sqrt; y++) { var n = 4*x*x + y*y; if (n <= limit && (n % 12 == 1 || n % 12 == 5)) isPrime[n] ^= true; n = 3*x*x + y*y; if (n  y && n <= limit && n % 12 == 11) isPrime[n] ^= true; } for (ulong n = 5; n <= sqrt; n++) if (isPrime[n]) for (ulong k = n*n; k <= limit; k *= k) isPrime[k] = false; primes.Add(2); primes.Add(3); for (ulong n = 5; n <= limit; n++) if (isPrime[n]) primes.Add(n); } public IEnumerator GetEnumerator() { if (!primes.Any()) FindPrimes(); foreach (var p in primes) yield return p; } IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); } } 

我几乎只是试图“翻译” 维基百科上列出的伪代码,但它无法正常工作。 所以要么我误解了某些东西,要么只是做错了什么。 或者最有可能两者……

有一个我用作测试的前500个素数的列表,我的实现在40号(或41?)处失败。

指数[40]的值不同
预计:179
但是:175

您是否能够找到我的错误,您是否有可以分享的实施,或者两者兼而有之?


我使用的确切测试看起来像这样:

 public abstract class AtkinTests { [Test] public void GetEnumerator_FirstFiveHundredNumbers_AreCorrect() { var sequence = new Atkin(2000000); var actual = sequence.Take(500).ToArray(); var expected = First500; CollectionAssert.AreEqual(expected, actual); } private static readonly ulong[] First500 = new ulong[] { 2, 3, 5, 7, 11, 13, 17, ... }; } 

这段代码:

 for (ulong k = n*n; k <= limit; k *= k) isPrime[k] = false; 

似乎不是这个伪代码的忠实翻译:

 is_prime(k) ← false, k ∈ {n², 2n², 3n², ..., limit} 

你的代码看起来会运行n * n,n ^ 4,n ^ 8等,即每次平方而不是每次都加上n平方。 试试这个:

 ulong nSquared = n * n; for (ulong k = nSquared; k <= limit; k += nSquared) isPrime[k] = false; 

Aaron Mugatroyd的最后一个答案来自于Atkin的Sieve(SoA)翻译的Python源代码并不是太糟糕,但它可以在几个方面得到改进,因为它错过了一些重要的优化,如下所示:

  1. 他的回答并没有使用完整的modulo 60原版Atkin和Bernstein版本的Sieve,而是稍微改进了维基百科文章中伪代码的变化,因此使用大约0.36的数值筛网范围组合切换/剔除操作; 我的代码使用了相当有效的非页面段伪代码,根据我在Atkin的筛选中的评论中的评论使用了数值范围的约0.26倍的因子来将完成的工作量减少到大约一倍。七分之二。

  2. 他的代码通过仅具有奇数表示来减少缓冲区大小,而我的代码进一步用bit pack来消除可被3和5整除的数字的任何表示以及可被“仅有几率”暗示的可被2整除的数字; 这将内存需求减少了近一半(至8/15)并有助于更好地利用CPU缓存,以便由于减少平均内存访问时间而进一步提高速度。

  3. 我的代码使用快速查找表(LUT)弹出计数技术计算素数的数量,与使用他使用的逐位技术的大约一秒相比,几乎没有时间计数; 但是,在此示例代码中,即使从代码的定时部分中取出很小的时间。

  4. 最后,我的代码优化了位操作操作,每个内部循环的代码最少。 例如,它不使用连续的右移一个来产生奇数表示索引,实际上通过将所有内部循环写为常数模(等位位置)操作来实现小位移。 同样,Aaron的翻译代码在操作中是非常低效的,例如在素数正方形自由剔除中,它将素数的平方添加到索引然后检查奇数结果而不是仅仅添加方形的两倍以便不需要检查; 然后,在内循环中执行剔除操作之前,通过将数字右移1(除以2)使得检查更加冗余,就像对所有循环一样。 这种效率低下的代码使用这种“大筛网缓冲arrays”技术对大范围的执行时间没有多大影响,因为每次操作的大部分时间都用于RAM存储器访问(大约37个CPU时钟周期或更长时间用于范围为10亿),但会使执行时间比适合CPU缓存的较小范围所需的时间慢得多; 换句话说,它为每个操作的执行速度设置了太高的最低限制。

代码如下:

 //Sieve of Atkin based on full non page segmented modulo 60 implementation... using System; using System.Collections; using System.Collections.Generic; using System.Linq; namespace NonPagedSoA { //implements the non-paged Sieve of Atkin (full modulo 60 version)... class SoA : IEnumerable { private ushort[] buf = null; private long cnt = 0; private long opcnt = 0; private static byte[] modPRMS = { 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 49, 53, 59, 61 }; private static ushort[] modLUT; private static byte[] cntLUT; //initialize the private LUT's... static SoA() { modLUT = new ushort[60]; for (int i = 0, m = 0; i < modLUT.Length; ++i) { if ((i & 1) != 0 || (i + 7) % 3 == 0 || (i + 7) % 5 == 0) modLUT[i] = 0; else modLUT[i] = (ushort)(1 << (m++)); } cntLUT = new byte[65536]; for (int i = 0; i < cntLUT.Length; ++i) { var c = 0; for (int j = i; j > 0; j >>= 1) c += j & 1; cntLUT[i] = (byte)c; } } //initialization and all the work producing the prime bit array done in the constructor... public SoA(ulong range) { this.opcnt = 0; if (range < 7) { if (range > 1) { cnt = 1; if (range > 2) this.cnt += (long)(range - 1) / 2; } this.buf = new ushort[0]; } else { this.cnt = 3; var nrng = range - 7; var lmtw = nrng / 60; //initialize sufficient wheels to non-prime this.buf = new ushort[lmtw + 1]; //Put in candidate primes: //for the 4 * x ^ 2 + y ^ 2 quadratic solution toggles - all x odd y... ulong n = 6; // equivalent to 13 - 7 = 6... for (uint x = 1, y = 3; n <= nrng; n += (x << 3) + 4, ++x, y = 1) { var cb = n; if (x <= 1) n -= 8; //cancel the effect of skipping the first one... for (uint i = 0; i < 15 && cb <= range; cb += (y << 2) + 4, y += 2, ++i) { var cbd = cb / 60; var cm = modLUT[cb % 60]; if (cm != 0) for (uint c = (uint)cbd, my = y + 15; c < buf.Length; c += my, my += 30) { buf[c] ^= cm; // ++this.opcnt; } } } //for the 3 * x ^ 2 + y ^ 2 quadratic solution toggles - x odd y even... n = 0; // equivalent to 7 - 7 = 0... for (uint x = 1, y = 2; n <= nrng; n += ((x + x + x) << 2) + 12, x += 2, y = 2) { var cb = n; for (var i = 0; i < 15 && cb <= range; cb += (y << 2) + 4, y += 2, ++i) { var cbd = cb / 60; var cm = modLUT[cb % 60]; if (cm != 0) for (uint c = (uint)cbd, my = y + 15; c < buf.Length; c += my, my += 30) { buf[c] ^= cm; // ++this.opcnt; } } } //for the 3 * x ^ 2 - y ^ 2 quadratic solution toggles all x and opposite y = x - 1... n = 4; // equivalent to 11 - 7 = 4... for (uint x = 2, y = x - 1; n <= nrng; n += (ulong)(x << 2) + 4, y = x, ++x) { var cb = n; int i = 0; for ( ; y > 1 && i < 15 && cb <= nrng; cb += (ulong)(y << 2) - 4, y -= 2, ++i) { var cbd = cb / 60; var cm = modLUT[cb % 60]; if (cm != 0) { uint c = (uint)cbd, my = y; for ( ; my >= 30 && c < buf.Length; c += my - 15, my -= 30) { buf[c] ^= cm; // ++this.opcnt; } if (my > 0 && c < buf.Length) { buf[c] ^= cm; /* ++this.opcnt; */ } } } if (y == 1 && i < 15) { var cbd = cb / 60; var cm = modLUT[cb % 60]; if ((cm & 0x4822) != 0 && cbd < (ulong)buf.Length) { buf[cbd] ^= cm; /* ++this.opcnt; */ } } } //Eliminate squares of base primes, only for those on the wheel: for (uint i = 0, w = 0, pd = 0, pn = 0, msk = 1; w < this.buf.Length ; ++i) { uint p = pd + modPRMS[pn]; ulong sqr = (ulong)p * (ulong)p; //to handle ranges above UInt32.MaxValue if (sqr > range) break; if ((this.buf[w] & msk) != 0) { //found base prime, square free it... ulong s = sqr - 7; for (int j = 0; s <= nrng && j < modPRMS.Length; s = sqr * modPRMS[j] - 7, ++j) { var cd = s / 60; var cm = (ushort)(modLUT[s % 60] ^ 0xFFFF); //may need ulong loop index for ranges larger than two billion //but buf length only good to about 2^31 * 60 = 120 million anyway, //even with large array setting and half that with 32-bit... for (ulong c = cd; c < (ulong)this.buf.Length; c += sqr) { this.buf[c] &= cm; // ++this.opcnt; } } } if (msk >= 0x8000) { msk = 1; pn = 0; ++w; pd += 60; } else { msk <<= 1; ++pn; } } //clear any overflow primes in the excess space in the last wheel/word: var ndx = nrng % 60; //clear any primes beyond the range for (; modLUT[ndx] == 0; --ndx) ; this.buf[lmtw] &= (ushort)((modLUT[ndx] << 1) - 1); } } //uses a fast pop count Look Up Table to return the total number of primes... public long Count { get { long cnt = this.cnt; for (int i = 0; i < this.buf.Length; ++i) cnt += cntLUT[this.buf[i]]; return cnt; } } //returns the number of toggle/cull operations used to sieve the prime bit array... public long Ops { get { return this.opcnt; } } //generate the enumeration of primes... public IEnumerator GetEnumerator() { yield return 2; yield return 3; yield return 5; ulong pd = 0; for (uint i = 0, w = 0, pn = 0, msk = 1; w < this.buf.Length; ++i) { if ((this.buf[w] & msk) != 0) //found a prime bit... yield return pd + modPRMS[pn]; //add it to the list if (msk >= 0x8000) { msk = 1; pn = 0; ++w; pd += 60; } else { msk <<= 1; ++pn; } } } //required for the above enumeration... IEnumerator IEnumerable.GetEnumerator() { return this.GetEnumerator(); } } class Program { static void Main(string[] args) { Console.WriteLine("This program calculates primes by a simple full version of the Sieve of Atkin.\r\n"); const ulong n = 1000000000; var elpsd = -DateTime.Now.Ticks; var gen = new SoA(n); elpsd += DateTime.Now.Ticks; Console.WriteLine("{0} primes found to {1} using {2} operations in {3} milliseconds.", gen.Count, n, gen.Ops, elpsd / 10000); //Output prime list for testing... //Console.WriteLine(); //foreach (var p in gen) { // Console.Write(p + " "); //} //Console.WriteLine(); //Test options showing what one can do with the enumeration, although more slowly... // Console.WriteLine("\r\nThere are {0} primes with the last one {1} and the sum {2}.",gen.Count(),gen.Last(),gen.Sum(x => (long)x)); Console.Write("\r\nPress any key to exit:"); Console.ReadKey(true); Console.WriteLine(); } } } 

此代码的运行速度大约是Aaron代码的两倍(在i7-2700K(3.5 GHz)上使用64位或32位模式约为2.7秒,缓冲区大约为16.5兆字节,大约为25.58亿次组合切换/素数无方块剔除操作(可以通过取消注释“++ this.opcnt”语句显示)筛选范围为10亿,相比之下5.4 / 6.2秒(32位/ 64位)的代码没有计数时间和几乎两次内存使用,使用大约0.359亿次联合切换/剔除操作,筛选多达10亿。

虽然它比他最优化的天然几率实施的非分页的Eratosthenes筛选(SoE) 更快但这并不能使Atkin的筛子比Eratosthenes的Sieve更快 ,就像使用类似的技术一样。以上对SoE plus的SoA实现使用最大轮分解,SoE将与此速度大致相同。

分析:虽然完全优化的SoE的操作数量与筛选范围为10亿的SoA的操作数量大致相同,但是一旦筛选缓冲区大小超过,这些非分页实现的主要瓶颈就是内存访问CPU高速缓存大小(在一个时钟周期访问时为32千字节L1高速缓存,在大约4个时钟周期访问时间为256千字节L2高速缓存,在我的i7大约20个时钟周期访问时间时为8兆字节L3高速缓存),之后内存访问可能超过百个时钟周期。

现在,当人们将算法调整到页面分割时,两者都有大约8倍的内存访问速度提升因子,因此可以筛选出无法满足可用内存的范围。 然而,由于剔除扫描的巨大进步迅速增长到数百倍,由于难以实现算法的“素数无方块”部分,因此筛选范围开始变得非常大,因此SoE继续超过SoA。页面缓冲区的大小。 同样,也许更严重的是,为每个页面缓冲区的最低表示forms计算’x’的每个值的新起点以及另一个相当的’y’的值,它将获得非常大的内存和/或计算密集度。随着范围的增长,分页SoA的效率大幅下降与SoE相比。

EDIT_ADD: Aaron Murgatroyd使用的仅有赔率的SoE使用大约10.26亿次剔除操作,筛选范围为10亿,所以操作次数是SoA的四倍,因此运行速度应该慢四倍,但SoA即使实施也是如此这里有一个更复杂的内环,特别是由于更高比例的几率,只有SoE剔除在剔除扫描中的步幅比SoA的步幅短得多,天真的几率SoE具有更好的平均内存访问时间尽管筛缓冲区大大超过了CPU缓存大小(更好地利用了缓存关联性)。 这就解释了为什么上述SoA的速度仅为仅有几率的SoE的两倍,尽管理论上它似乎只能完成四分之一的工作。

如果使用与上述SoA相同的恒模数内循环的类似算法并实施相同的2/3/5车轮分解,则SoE会将剔除操作的数量减少到约4,050亿次操作,因此仅增加约50%操作比SoA,理论上运行速度稍微慢于SoA,但可能会以大约相同的速度运行,因为对于这种“天真”的大内存缓冲区使用,平均步幅仍然比SoA平均小一些。 将车轮分解增加到2/3/5/7车轮意味着对于10亿的剔除范围,SoE剔除操作减少到大约0.314,并且可能使该版本的SoE以相同的速度运行。

进一步使用车轮分解可以通过预先剔除2/3/5/7/11/13/17/19素数因子的筛arrays(模式复制),几乎不花费执行时间来减少筛选范围为10亿的剔除操作总数约为2.51亿,而且即使对于这些大型内存缓冲版本,SoE也会比SoA运行速度更快或速度更快,SoE仍然有很多代码复杂度低于上述代码。

因此,可以看出,SoE的操作次数可以从幼稚或甚至仅赔率或2/3/5轮分解版本大大减少,使得操作的数量与SoA的操作数量大致相同。同时,由于不太复杂的内部循环和更有效的内存访问,每个操作的时间实际上可能更少。 END_EDIT_ADD

EDIT_ADD2:我在这里使用类似的常量模/位位置技术为最内层循环添加代码,如上面的SoA,根据上面链接的答案下面的伪代码。 尽管采用了高轮分解和预剔除,但是剔除操作的总数实际上小于SoA的组合切换/剔除操作直到筛分范围,因此代码比上述SoA复杂得多。大约20亿。 代码如下:

EDIT_FINAL更正了以下代码以及与之相关的评论END_EDIT_FINAL

 //Sieve of Eratosthenes based on maximum wheel factorization and pre-culling implementation... using System; using System.Collections; using System.Collections.Generic; namespace NonPagedSoE { //implements the non-paged Sieve of Eratosthenes (full modulo 210 version with preculling)... class SoE : IEnumerable { private ushort[] buf = null; private long cnt = 0; private long opcnt = 0; private static byte[] basePRMS = { 2, 3, 5, 7, 11, 13, 17, 19 }; private static byte[] modPRMS = { 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, //positions + 23 97, 101, 103, 107, 109, 113, 121, 127, 131, 137, 139, 143, 149, 151, 157, 163, 167, 169, 173, 179, 181 ,187 ,191 ,193, 197, 199, 209, 211, 221, 223, 227, 229 }; private static byte[] gapsPRMS = { 6, 2, 6, 4, 2, 4, 6, 6, 2, 6, 4, 2, 6, 4, 6, 8, 4, 2, 4, 2, 4, 8, 6, 4, 6, 2, 4, 6, 2, 6, 6, 4, 2, 4, 6, 2, 6, 4, 2, 4, 2, 10, 2, 10, 2, 4, 2, 4 }; private static ulong[] modLUT; private static byte[] cntLUT; //initialize the private LUT's... static SoE() { modLUT = new ulong[210]; for (int i = 0, m = 0; i < modLUT.Length; ++i) { if ((i & 1) != 0 || (i + 23) % 3 == 0 || (i + 23) % 5 == 0 || (i + 23) % 7 == 0) modLUT[i] = 0; else modLUT[i] = 1UL << (m++); } cntLUT = new byte[65536]; for (int i = 0; i < cntLUT.Length; ++i) { var c = 0; for (int j = i ^ 0xFFFF; j > 0; j >>= 1) c += j & 1; //reverse logic; 0 is prime; 1 is composite cntLUT[i] = (byte)c; } } //initialization and all the work producing the prime bit array done in the constructor... public SoE(ulong range) { this.opcnt = 0; if (range < 23) { if (range > 1) { for (int i = 0; i < modPRMS.Length; ++i) if (modPRMS[i] <= range) this.cnt++; else break; } this.buf = new ushort[0]; } else { this.cnt = 8; var nrng = range - 23; var lmtw = nrng / 210; var lmtwt3 = lmtw * 3; //initialize sufficient wheels to prime this.buf = new ushort[lmtwt3 + 3]; //initial state of all zero's is all potential prime. //initialize array to account for preculling the primes of 11, 13, 17, and 19; //(2, 3, 5, and 7 already eliminated by the bit packing to residues). for (int pn = modPRMS.Length - 4; pn < modPRMS.Length; ++pn) { uint p = modPRMS[pn] - 210u; ulong pt3 = p * 3; ulong s = p * p - 23; ulong xrng = Math.Min(9699709, nrng); // only do for the repeating master pattern size ulong nwrds = (ulong)Math.Min(138567, this.buf.Length); for (int j = 0; s <= xrng && j < modPRMS.Length; s += p * gapsPRMS[(pn + j++) % 48]) { var sm = modLUT[s % 210]; var si = (sm < (1UL << 16)) ? 0UL : ((sm < (1UL << 32)) ? 1UL : 2UL); var cd = s / 210 * 3 + si; var cm = (ushort)(sm >> (int)(si << 4)); for (ulong c = cd; c < nwrds; c += pt3) { //tight culling loop for size of master pattern this.buf[c] |= cm; // ++this.opcnt; //reverse logic; mark composites with ones. } } } //Now copy the master pattern so it repeats across the main buffer, allow for overflow... for (long i = 138567; i < this.buf.Length; i += 138567) if (i + 138567 <= this.buf.Length) Array.Copy(this.buf, 0, this.buf, i, 138567); else Array.Copy(this.buf, 0, this.buf, i, this.buf.Length - i); //Eliminate all composites which are factors of base primes, only for those on the wheel: for (uint i = 0, w = 0, wi = 0, pd = 0, pn = 0, msk = 1; w < this.buf.Length; ++i) { uint p = pd + modPRMS[pn]; ulong sqr = (ulong)p * (ulong)p; if (sqr > range) break; if ((this.buf[w] & msk) == 0) { //found base prime, mark its composites... ulong s = sqr - 23; ulong pt3 = p * 3; for (int j = 0; s <= nrng && j < modPRMS.Length; s += p * gapsPRMS[(pn + j++) % 48]) { var sm = modLUT[s % 210]; var si = (sm < (1UL << 16)) ? 0UL : ((sm < (1UL << 32)) ? 1UL : 2UL); var cd = s / 210 * 3 + si; var cm = (ushort)(sm >> (int)(si << 4)); for (ulong c = cd; c < (ulong)this.buf.Length; c += pt3) { //tight culling loop this.buf[c] |= cm; // ++this.opcnt; //reverse logic; mark composites with ones. } } } ++pn; if (msk >= 0x8000) { msk = 1; ++w; ++wi; if (wi == 3) { wi = 0; pn = 0; pd += 210; } } else msk <<= 1; } //clear any overflow primes in the excess space in the last wheel/word: var ndx = nrng % 210; //clear any primes beyond the range for (; modLUT[ndx] == 0; --ndx) ; var cmsk = (~(modLUT[ndx] - 1)) << 1; //force all bits above to be composite ones. this.buf[lmtwt3++] |= (ushort)cmsk; this.buf[lmtwt3++] |= (ushort)(cmsk >> 16); this.buf[lmtwt3] |= (ushort)(cmsk >> 32); } } //uses a fast pop count Look Up Table to return the total number of primes... public long Count { get { long cnt = this.cnt; for (int i = 0; i < this.buf.Length; ++i) cnt += cntLUT[this.buf[i]]; return cnt; } } //returns the number of cull operations used to sieve the prime bit array... public long Ops { get { return this.opcnt; } } //generate the enumeration of primes... public IEnumerator GetEnumerator() { yield return 2; yield return 3; yield return 5; yield return 7; yield return 11; yield return 13; yield return 17; yield return 19; ulong pd = 0; for (uint i = 0, w = 0, wi = 0, pn = 0, msk = 1; w < this.buf.Length; ++i) { if ((this.buf[w] & msk) == 0) //found a prime bit... yield return pd + modPRMS[pn]; ++pn; if (msk >= 0x8000) { msk = 1; ++w; ++wi; if (wi == 3) { wi = 0; pn = 0; pd += 210; } } else msk <<= 1; } } //required for the above enumeration... IEnumerator IEnumerable.GetEnumerator() { return this.GetEnumerator(); } } class Program { static void Main(string[] args) { Console.WriteLine("This program calculates primes by a simple maximually wheel factorized version of the Sieve of Eratosthenes.\r\n"); const ulong n = 1000000000; var elpsd = -DateTime.Now.Ticks; var gen = new SoE(n); elpsd += DateTime.Now.Ticks; Console.WriteLine("{0} primes found to {1} using {2} operations in {3} milliseconds.", gen.Count, n, gen.Ops, elpsd / 10000); // Console.WriteLine(); // foreach (var p in gen) { // Console.Write(p + " "); // } // Console.WriteLine(); // Console.WriteLine("\r\nThere are {0} primes with the last one {1} and the sum {2}.",gen.Count(),gen.Last(),gen.Sum(x => (long)x)); Console.Write("\r\nPress any key to exit:"); Console.ReadKey(true); Console.WriteLine(); } } } 

这个代码实际运行速度比上面的SoA快了几个百分点,因为操作稍微少一点,这个大数组大小的主要瓶颈是十亿分之一的存储器访问时间,大概是40到100个CPU时钟周期取决于CPU和内存规格; 这意味着代码优化(除了减少操作总数)是无效的,因为大部分时间都花在等待内存访问上。 无论如何,使用巨大的内存缓冲区并不是筛选大范围的最有效方法,使用具有相同最大轮分解的页面分割的SoE最多可提高8倍(这也为此铺平了道路多处理)。

正是在实现页面分割和多处理时,与SoE相比,SoA实际上缺少超过40亿的范围,因为由于SoA的渐近复杂度降低而导致的任何增益都会被与页面处理相关的页面处理开销因素迅速消耗掉。素数广场免费处理和计算更大数量的页面起始地址; 或者,人们通过将标记存储在RAM存储器中以存储器消耗的巨大成本以及访问这些标记存储结构的进一步低效来克服这一点。 END_EDIT_ADD2

简而言之,与全轮分解SoE相比,SoA实际上并不是一个实用的筛子,因为正如渐近复杂度的增益开始使性能接近完全优化的SoE,它开始失去效率,因为关于相对存储器访问时间和页面分割复杂性以及通常更复杂和难以编写的实际实现的细节。 在我看来,与SoE相比,它更像是一个有趣的智力概念和心理练习,而不是一个实用的筛子。

有一天,我会将这些技术应用到一个multithreading页面分段的Eratosthenes Sieve,在C#中的速度与Atkin和Bernstein在’C’中实现SoA的“primegen”速度一样快,并且会在大范围内将其吹出水面在我的i7(包括超线程在内的八个内核)上进行multithreading处理时,大约有四十亿甚至单线程,速度最高可提升四倍。

这是另一个实现。 它使用BitArray来节省内存。 Parallel.For需要.NET Framework 4。

 static List FindPrimesBySieveOfAtkins(int max) { // var isPrime = new BitArray((int)max+1, false); // Can't use BitArray because of threading issues. var isPrime = new bool[max + 1]; var sqrt = (int)Math.Sqrt(max); Parallel.For(1, sqrt, x => { var xx = x * x; for (int y = 1; y <= sqrt; y++) { var yy = y * y; var n = 4 * xx + yy; if (n <= max && (n % 12 == 1 || n % 12 == 5)) isPrime[n] ^= true; n = 3 * xx + yy; if (n <= max && n % 12 == 7) isPrime[n] ^= true; n = 3 * xx - yy; if (x > y && n <= max && n % 12 == 11) isPrime[n] ^= true; } }); var primes = new List() { 2, 3 }; for (int n = 5; n <= sqrt; n++) { if (isPrime[n]) { primes.Add(n); int nn = n * n; for (int k = nn; k <= max; k += nn) isPrime[k] = false; } } for (int n = sqrt + 1; n <= max; n++) if (isPrime[n]) primes.Add(n); return primes; } 

这是Atkin的Sieve的更快实现,我在这里从这个Python脚本中窃取了算法(我不相信算法):

http://programmingpraxis.com/2010/02/19/sieve-of-atkin-improved/

 using System; using System.Collections; using System.Collections.Generic; namespace PrimeGenerator { // The block element type for the bit array, // use any unsigned value. WARNING: UInt64 is // slower even on x64 architectures. using BitArrayType = System.UInt32; // This should never be any bigger than 256 bits - leave as is. using BitsPerBlockType = System.Byte; // The prime data type, this can be any unsigned value, the limit // of this type determines the limit of Prime value that can be // found. WARNING: UInt64 is slower even on x64 architectures. using PrimeType = System.Int32; ///  /// Calculates prime number using the Sieve of Eratosthenes method. ///  ///  ///  /// var lpPrimes = new Eratosthenes(1e7); /// foreach (UInt32 luiPrime in lpPrimes) /// Console.WriteLine(luiPrime); ///  public class Atkin : IEnumerable { #region Constants ///  /// Constant for number of bits per block, calculated based on size of BitArrayType. ///  const BitsPerBlockType cbBitsPerBlock = sizeof(BitArrayType) * 8; #endregion #region Protected Locals ///  /// The limit for the maximum prime value to find. ///  protected readonly PrimeType mpLimit; ///  /// The number of primes calculated or null if not calculated yet. ///  protected PrimeType? mpCount = null; ///  /// The current bit array where a set bit means /// the odd value at that location has been determined /// to not be prime. ///  protected BitArrayType[] mbaOddPrime; #endregion #region Initialisation ///  /// Create Sieve of Atkin generator. ///  /// The limit for the maximum prime value to find. public Atkin(PrimeType limit) { // Check limit range if (limit > PrimeType.MaxValue - (PrimeType)Math.Sqrt(PrimeType.MaxValue)) throw new ArgumentOutOfRangeException(); mpLimit = limit; FindPrimes(); } #endregion #region Private Methods ///  /// Finds the prime number within range. ///  private unsafe void FindPrimes() { // Allocate bit array. mbaOddPrime = new BitArrayType[(((mpLimit >> 1) + 1) / cbBitsPerBlock) + 1]; PrimeType lpYLimit, lpN, lpXX3, lpXX4, lpDXX, lpDN, lpDXX4, lpXX, lpX, lpYY, lpMinY, lpS, lpK; fixed (BitArrayType* lpbOddPrime = &mbaOddPrime[0]) { // n = 3x^2 + y^2 section lpXX3 = 3; for (lpDXX = 0; lpDXX < 12 * SQRT((mpLimit - 1) / 3); lpDXX += 24) { lpXX3 += lpDXX; lpYLimit = (12 * SQRT(mpLimit - lpXX3)) - 36; lpN = lpXX3 + 16; for (lpDN = -12; lpDN < lpYLimit + 1; lpDN += 72) { lpN += lpDN; lpbOddPrime[(lpN >> 1) / cbBitsPerBlock] ^= (BitArrayType)((BitArrayType)1 << (int)((lpN >> 1) % cbBitsPerBlock)); } lpN = lpXX3 + 4; for (lpDN = 12; lpDN < lpYLimit + 1; lpDN += 72) { lpN += lpDN; lpbOddPrime[(lpN >> 1) / cbBitsPerBlock] ^= (BitArrayType)((BitArrayType)1 << (int)((lpN >> 1) % cbBitsPerBlock)); } } // # n = 4x^2 + y^2 section lpXX4 = 0; for (lpDXX4 = 4; lpDXX4 < 8 * SQRT((mpLimit - 1) / 4) + 4; lpDXX4 += 8) { lpXX4 += lpDXX4; lpN = lpXX4 + 1; if ((lpXX4 % 3) != 0) { for (lpDN = 0; lpDN < (4 * SQRT(mpLimit - lpXX4)) - 3; lpDN += 8) { lpN += lpDN; lpbOddPrime[(lpN >> 1) / cbBitsPerBlock] ^= (BitArrayType)((BitArrayType)1 << (int)((lpN >> 1) % cbBitsPerBlock)); } } else { lpYLimit = (12 * SQRT(mpLimit - lpXX4)) - 36; lpN = lpXX4 + 25; for (lpDN = -24; lpDN < lpYLimit + 1; lpDN += 72) { lpN += lpDN; lpbOddPrime[(lpN >> 1) / cbBitsPerBlock] ^= (BitArrayType)((BitArrayType)1 << (int)((lpN >> 1) % cbBitsPerBlock)); } lpN = lpXX4 + 1; for (lpDN = 24; lpDN < lpYLimit + 1; lpDN += 72) { lpN += lpDN; lpbOddPrime[(lpN >> 1) / cbBitsPerBlock] ^= (BitArrayType)((BitArrayType)1 << (int)((lpN >> 1) % cbBitsPerBlock)); } } } // # n = 3x^2 - y^2 section lpXX = 1; for (lpX = 3; lpX < SQRT(mpLimit / 2) + 1; lpX += 2) { lpXX += 4 * lpX - 4; lpN = 3 * lpXX; if (lpN > mpLimit) { lpMinY = ((SQRT(lpN - mpLimit) >> 2) << 2); lpYY = lpMinY * lpMinY; lpN -= lpYY; lpS = 4 * lpMinY + 4; } else lpS = 4; for (lpDN = lpS; lpDN < 4 * lpX; lpDN += 8) { lpN -= lpDN; if (lpN <= mpLimit && lpN % 12 == 11) lpbOddPrime[(lpN >> 1) / cbBitsPerBlock] ^= (BitArrayType)((BitArrayType)1 << (int)((lpN >> 1) % cbBitsPerBlock)); } } // xx = 0 lpXX = 0; for (lpX = 2; lpX < SQRT(mpLimit / 2) + 1; lpX += 2) { lpXX += 4*lpX - 4; lpN = 3*lpXX; if (lpN > mpLimit) { lpMinY = ((SQRT(lpN - mpLimit) >> 2) << 2) - 1; lpYY = lpMinY * lpMinY; lpN -= lpYY; lpS = 4*lpMinY + 4; } else { lpN -= 1; lpS = 0; } for (lpDN = lpS; lpDN < 4 * lpX; lpDN += 8) { lpN -= lpDN; if (lpN <= mpLimit && lpN % 12 == 11) lpbOddPrime[(lpN>>1) / cbBitsPerBlock] ^= (BitArrayType)((BitArrayType)1 << (int)((lpN>>1) % cbBitsPerBlock)); } } // # eliminate squares for (lpN = 5; lpN < SQRT(mpLimit) + 1; lpN += 2) if ((lpbOddPrime[(lpN >> 1) / cbBitsPerBlock] & ((BitArrayType)1 << (int)((lpN >> 1) % cbBitsPerBlock))) != 0) for (lpK = lpN * lpN; lpK < mpLimit; lpK += lpN * lpN) if ((lpK & 1) == 1) lpbOddPrime[(lpK >> 1) / cbBitsPerBlock] &= (BitArrayType)~((BitArrayType)1 << (int)((lpK >> 1) % cbBitsPerBlock)); } } ///  /// Calculates the truncated square root for a number. ///  /// The value to get the square root for. /// The truncated sqrt of the value. private unsafe PrimeType SQRT(PrimeType value) { return (PrimeType)Math.Sqrt(value); } ///  /// Gets a bit value by index. ///  /// The blocks containing the bits. /// The index of the bit. /// True if bit is set, false if cleared. private bool GetBitSafe(BitArrayType[] bits, PrimeType index) { if ((index & 1) == 1) return (bits[(index >> 1) / cbBitsPerBlock] & ((BitArrayType)1 << (int)((index >> 1) % cbBitsPerBlock))) != 0; else return false; } #endregion #region Public Properties ///  /// Get the limit for the maximum prime value to find. ///  public PrimeType Limit { get { return mpLimit; } } ///  /// Returns the number of primes found in the range. ///  public PrimeType Count { get { if (!mpCount.HasValue) { PrimeType lpCount = 0; foreach (PrimeType liPrime in this) lpCount++; mpCount = lpCount; } return mpCount.Value; } } ///  /// Determines if a value in range is prime or not. ///  /// The value to test for primality. /// True if the value is prime, false otherwise. public bool this[PrimeType test] { get { if (test > mpLimit) throw new ArgumentOutOfRangeException(); if (test <= 1) return false; if (test == 2) return true; if ((test & 1) == 0) return false; return !GetBitSafe(mbaOddPrime, test >> 1); } } #endregion #region Public Methods ///  /// Gets the enumerator for the primes. ///  /// The enumerator of the primes. public IEnumerator GetEnumerator() { // return [2,3] + filter(primes.__getitem__, xrange(5,limit,2)) // Two & Three always prime. yield return 2; yield return 3; // Start at first block, third MSB (5). int liBlock = 0; byte lbBit = 2; BitArrayType lbaCurrent = mbaOddPrime[0] >> lbBit; // For each value in range stepping in incrments of two for odd values. for (PrimeType lpN = 5; lpN <= mpLimit; lpN += 2) { // If current bit not set then value is prime. if ((lbaCurrent & 1) == 1) yield return lpN; // Move to NSB. lbaCurrent >>= 1; // Increment bit value. lbBit++; // If block is finished. if (lbBit == cbBitsPerBlock) { lbBit = 0; lbaCurrent = mbaOddPrime[++liBlock]; //// Move to first bit of next block skipping full blocks. while (lbaCurrent == 0) { lpN += ((PrimeType)cbBitsPerBlock) << 1; if (lpN <= mpLimit) lbaCurrent = mbaOddPrime[++liBlock]; else break; } } } } #endregion #region IEnumerable Implementation ///  /// Gets the enumerator for the primes. ///  ///  IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); } #endregion } } 

Its close in speed to my most optimised version of the Sieve of Eratosthenes, but its still slower by about 20%, it can be found here:

https://stackoverflow.com/a/9700790/738380

Heres mine, it uses a class called CompartmentalisedParallel which allows you to perform parallel for loops but control the number of threads so that the indexes are grouped up. However, due to the threading issues you need to either lock the BitArray each time it is altered or create a separate BitArray for each thread and then XOR them together at the end, the first option was pretty slow because the amount of locks, the second option seemed faster for me!

 using System; using System.Collections; using System.Collections.Generic; using System.Threading.Tasks; namespace PrimeGenerator { public class Atkin : Primes { protected BitArray mbaPrimes; protected bool mbThreaded = true; public Atkin(int limit) : this(limit, true) { } public Atkin(int limit, bool threaded) : base(limit) { mbThreaded = threaded; if (mbaPrimes == null) FindPrimes(); } public bool Threaded { get { return mbThreaded; } } public override IEnumerator GetEnumerator() { yield return 2; yield return 3; for (int lsN = 5; lsN <= msLimit; lsN += 2) if (mbaPrimes[lsN]) yield return lsN; } private void FindPrimes() { mbaPrimes = new BitArray(msLimit + 1, false); int lsSQRT = (int)Math.Sqrt(msLimit); int[] lsSquares = new int[lsSQRT + 1]; for (int lsN = 0; lsN <= lsSQRT; lsN++) lsSquares[lsN] = lsN * lsN; if (Threaded) { CompartmentalisedParallel.For( 1, lsSQRT + 1, new ParallelOptions(), (start, finish) => { return new BitArray(msLimit + 1, false); }, (lsX, lsState, lbaLocal) => { int lsX2 = lsSquares[lsX]; for (int lsY = 1; lsY <= lsSQRT; lsY++) { int lsY2 = lsSquares[lsY]; int lsN = 4 * lsX2 + lsY2; if (lsN <= msLimit && (lsN % 12 == 1 || lsN % 12 == 5)) lbaLocal[lsN] ^= true; lsN -= lsX2; if (lsN <= msLimit && lsN % 12 == 7) lbaLocal[lsN] ^= true; if (lsX > lsY) { lsN -= lsY2 * 2; if (lsN <= msLimit && lsN % 12 == 11) lbaLocal[lsN] ^= true; } } return lbaLocal; }, (lbaResult, start, finish) => { lock (mbaPrimes) mbaPrimes.Xor(lbaResult); }, -1 ); } else { for (int lsX = 1; lsX <= lsSQRT; lsX++) { int lsX2 = lsSquares[lsX]; for (int lsY = 1; lsY <= lsSQRT; lsY++) { int lsY2 = lsSquares[lsY]; int lsN = 4 * lsX2 + lsY2; if (lsN <= msLimit && (lsN % 12 == 1 || lsN % 12 == 5)) mbaPrimes[lsN] ^= true; lsN -= lsX2; if (lsN <= msLimit && lsN % 12 == 7) mbaPrimes[lsN] ^= true; if (lsX > lsY) { lsN -= lsY2 * 2; if (lsN <= msLimit && lsN % 12 == 11) mbaPrimes[lsN] ^= true; } } } } for (int lsN = 5; lsN < lsSQRT; lsN += 2) if (mbaPrimes[lsN]) { var lsS = lsSquares[lsN]; for (int lsK = lsS; lsK <= msLimit; lsK += lsS) mbaPrimes[lsK] = false; } } } } 

And the CompartmentalisedParallel class:

 using System; using System.Threading.Tasks; namespace PrimeGenerator { public static class CompartmentalisedParallel { #region Int private static int[] CalculateCompartments(int startInclusive, int endExclusive, ref int threads) { if (threads == 0) threads = 1; if (threads == -1) threads = Environment.ProcessorCount; if (threads > endExclusive - startInclusive) threads = endExclusive - startInclusive; int[] liThreadIndexes = new int[threads + 1]; liThreadIndexes[threads] = endExclusive - 1; int liIndexesPerThread = (endExclusive - startInclusive) / threads; for (int liCount = 0; liCount < threads; liCount++) liThreadIndexes[liCount] = liCount * liIndexesPerThread; return liThreadIndexes; } public static void For( int startInclusive, int endExclusive, ParallelOptions parallelOptions, Func localInit, Func body, Action localFinally, int threads) { int[] liThreadIndexes = CalculateCompartments(startInclusive, endExclusive, ref threads); if (threads > 1) Parallel.For( 0, threads, parallelOptions, (liThread, lsState) => { TLocal llLocal = localInit(liThreadIndexes[liThread], liThreadIndexes[liThread + 1]); for (int liCounter = liThreadIndexes[liThread]; liCounter < liThreadIndexes[liThread + 1]; liCounter++) body(liCounter, lsState, llLocal); localFinally(llLocal, liThreadIndexes[liThread], liThreadIndexes[liThread + 1]); } ); else { TLocal llLocal = localInit(startInclusive, endExclusive); for (int liCounter = startInclusive; liCounter < endExclusive; liCounter++) body(liCounter, null, llLocal); localFinally(llLocal, startInclusive, endExclusive); } } public static void For( int startInclusive, int endExclusive, ParallelOptions parallelOptions, Action body, int threads) { int[] liThreadIndexes = CalculateCompartments(startInclusive, endExclusive, ref threads); if (threads > 1) Parallel.For( 0, threads, parallelOptions, (liThread, lsState) => { for (int liCounter = liThreadIndexes[liThread]; liCounter < liThreadIndexes[liThread + 1]; liCounter++) body(liCounter, lsState); } ); else for (int liCounter = startInclusive; liCounter < endExclusive; liCounter++) body(liCounter, null); } public static void For( int startInclusive, int endExclusive, ParallelOptions parallelOptions, Action body, int threads) { int[] liThreadIndexes = CalculateCompartments(startInclusive, endExclusive, ref threads); if (threads > 1) Parallel.For( 0, threads, parallelOptions, (liThread) => { for (int liCounter = liThreadIndexes[liThread]; liCounter < liThreadIndexes[liThread + 1]; liCounter++) body(liCounter); } ); else for (int liCounter = startInclusive; liCounter < endExclusive; liCounter++) body(liCounter); } public static void For( int startInclusive, int endExclusive, Action body, int threads) { For(startInclusive, endExclusive, new ParallelOptions(), body, threads); } public static void For( int startInclusive, int endExclusive, Action body, int threads) { For(startInclusive, endExclusive, new ParallelOptions(), body, threads); } public static void For( int startInclusive, int endExclusive, Func localInit, Func body, Action localFinally, int threads) { For(startInclusive, endExclusive, new ParallelOptions(), localInit, body, localFinally, threads); } #endregion #region Long private static long[] CalculateCompartments(long startInclusive, long endExclusive, ref long threads) { if (threads == 0) threads = 1; if (threads == -1) threads = Environment.ProcessorCount; if (threads > endExclusive - startInclusive) threads = endExclusive - startInclusive; long[] liThreadIndexes = new long[threads + 1]; liThreadIndexes[threads] = endExclusive - 1; long liIndexesPerThread = (endExclusive - startInclusive) / threads; for (long liCount = 0; liCount < threads; liCount++) liThreadIndexes[liCount] = liCount * liIndexesPerThread; return liThreadIndexes; } public static void For( long startInclusive, long endExclusive, ParallelOptions parallelOptions, Func localInit, Func body, Action localFinally, long threads) { long[] liThreadIndexes = CalculateCompartments(startInclusive, endExclusive, ref threads); if (threads > 1) Parallel.For( 0, threads, parallelOptions, (liThread, lsState) => { TLocal llLocal = localInit(liThreadIndexes[liThread], liThreadIndexes[liThread + 1]); for (long liCounter = liThreadIndexes[liThread]; liCounter < liThreadIndexes[liThread + 1]; liCounter++) body(liCounter, lsState, llLocal); localFinally(llLocal, liThreadIndexes[liThread], liThreadIndexes[liThread + 1]); } ); else { TLocal llLocal = localInit(startInclusive, endExclusive); for (long liCounter = startInclusive; liCounter < endExclusive; liCounter++) body(liCounter, null, llLocal); localFinally(llLocal, startInclusive, endExclusive); } } public static void For( long startInclusive, long endExclusive, ParallelOptions parallelOptions, Action body, long threads) { long[] liThreadIndexes = CalculateCompartments(startInclusive, endExclusive, ref threads); if (threads > 1) Parallel.For( 0, threads, parallelOptions, (liThread, lsState) => { for (long liCounter = liThreadIndexes[liThread]; liCounter < liThreadIndexes[liThread + 1]; liCounter++) body(liCounter, lsState); } ); else for (long liCounter = startInclusive; liCounter < endExclusive; liCounter++) body(liCounter, null); } public static void For( long startInclusive, long endExclusive, ParallelOptions parallelOptions, Action body, long threads) { long[] liThreadIndexes = CalculateCompartments(startInclusive, endExclusive, ref threads); if (threads > 1) Parallel.For( 0, threads, parallelOptions, (liThread) => { for (long liCounter = liThreadIndexes[liThread]; liCounter < liThreadIndexes[liThread + 1]; liCounter++) body(liCounter); } ); else for (long liCounter = startInclusive; liCounter < endExclusive; liCounter++) body(liCounter); } public static void For( long startInclusive, long endExclusive, Action body, long threads) { For(startInclusive, endExclusive, new ParallelOptions(), body, threads); } public static void For( long startInclusive, long endExclusive, Action body, long threads) { For(startInclusive, endExclusive, new ParallelOptions(), body, threads); } public static void For( long startInclusive, long endExclusive, Func localInit, Func body, Action localFinally, long threads) { For(startInclusive, endExclusive, new ParallelOptions(), localInit, body, localFinally, threads); } #endregion } } 

Primes base class:

 using System.Collections; using System.Collections.Generic; namespace PrimeGenerator { public abstract class Primes : IEnumerable { protected readonly int msLimit; public Primes(int limit) { msLimit = limit; } public int Limit { get { return msLimit; } } public int Count { get { int liCount = 0; foreach (int liPrime in this) liCount++; return liCount; } } public abstract IEnumerator GetEnumerator(); IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); } } } 

Use it by doing the following:

  var lpPrimes = new Atkin(count, true); Console.WriteLine(lpPrimes.Count); Console.WriteLine(s.ElapsedMilliseconds); 

However, i found the Eratosthenes to be quicker in all cases, even with a four core CPU running in multithreaded mode for the Atkin:

 using System; using System.Collections; using System.Collections.Generic; namespace PrimeGenerator { public class Eratosthenes : Primes { protected BitArray mbaOddEliminated; public Eratosthenes(int limit) : base(limit) { if (mbaOddEliminated == null) FindPrimes(); } public override IEnumerator GetEnumerator() { yield return 2; for (int lsN = 3; lsN <= msLimit; lsN+=2) if (!mbaOddEliminated[lsN>>1]) yield return lsN; } private void FindPrimes() { mbaOddEliminated = new BitArray((msLimit>>1) + 1); int lsSQRT = (int)Math.Sqrt(msLimit); for (int lsN = 3; lsN < lsSQRT + 1; lsN += 2) if (!mbaOddEliminated[lsN>>1]) for (int lsM = lsN*lsN; lsM <= msLimit; lsM += lsN<<1) mbaOddEliminated[lsM>>1] = true; } } } 

If you get the Atkin to run faster, please let me know!

Heres an improvement of the Sieve of Eratosthenes using custom FixBitArrays and unsafe code for speed results, this is about 225% faster than my previous Eratosthenes algorithm, and the class is standalone (this is not multithreaded – Eratosthenes is not compatible with multi threading), On an AMD Phenom II X4 965 Processor I can calculate Primes to 1,000,000,000 limit in 9,261 ms:

 using System; using System.Collections; using System.Collections.Generic; namespace PrimeGenerator { // The block element type for the bit array, // use any unsigned value. WARNING: UInt64 is // slower even on x64 architectures. using BitArrayType = System.UInt32; // This should never be any bigger than 256 bits - leave as is. using BitsPerBlockType = System.Byte; // The prime data type, this can be any unsigned value, the limit // of this type determines the limit of Prime value that can be // found. WARNING: UInt64 is slower even on x64 architectures. using PrimeType = System.UInt32; ///  /// Calculates prime number using the Sieve of Eratosthenes method. ///  ///  ///  /// var lpPrimes = new Eratosthenes(1e7); /// foreach (UInt32 luiPrime in lpPrimes) /// Console.WriteLine(luiPrime); ///  public class Eratosthenes : IEnumerable { #region Constants ///  /// Constant for number of bits per block, calculated based on size of BitArrayType. ///  const BitsPerBlockType cbBitsPerBlock = sizeof(BitArrayType) * 8; #endregion #region Protected Locals ///  /// The limit for the maximum prime value to find. ///  protected readonly PrimeType mpLimit; ///  /// The current bit array where a set bit means /// the odd value at that location has been determined /// to not be prime. ///  protected BitArrayType[] mbaOddNotPrime; #endregion #region Initialisation ///  /// Create Sieve of Eratosthenes generator. ///  /// The limit for the maximum prime value to find. public Eratosthenes(PrimeType limit) { // Check limit range if (limit > PrimeType.MaxValue - (PrimeType)Math.Sqrt(PrimeType.MaxValue)) throw new ArgumentOutOfRangeException(); mpLimit = limit; FindPrimes(); } #endregion #region Private Methods ///  /// Finds the prime number within range. ///  private unsafe void FindPrimes() { // Allocate bit array. mbaOddNotPrime = new BitArrayType[(((mpLimit >> 1) + 1) / cbBitsPerBlock) + 1]; // Cache Sqrt of limit. PrimeType lpSQRT = (PrimeType)Math.Sqrt(mpLimit); // Fix the bit array for pointer access fixed (BitArrayType* lpbOddNotPrime = &mbaOddNotPrime[0]) // Scan primes up to lpSQRT for (PrimeType lpN = 3; lpN <= lpSQRT; lpN += 2) // If the current bit value for index lpN is cleared (prime) if ( ( lpbOddNotPrime[(lpN >> 1) / cbBitsPerBlock] & ((BitArrayType)1 << (BitsPerBlockType)((lpN >> 1) % cbBitsPerBlock)) ) == 0 ) // Leave it cleared (prime) and mark all multiples of lpN*2 from lpN*lpN as not prime for (PrimeType lpM = lpN * lpN; lpM <= mpLimit; lpM += lpN << 1) // Set as not prime lpbOddNotPrime[(lpM >> 1) / cbBitsPerBlock] |= (BitArrayType)((BitArrayType)1 << (BitsPerBlockType)((lpM >> 1) % cbBitsPerBlock)); } ///  /// Gets a bit value by index. ///  /// The blocks containing the bits. /// The index of the bit. /// True if bit is set, false if cleared. private bool GetBitSafe(BitArrayType[] bits, PrimeType index) { return (bits[index / cbBitsPerBlock] & ((BitArrayType)1 << (BitsPerBlockType)(index % cbBitsPerBlock))) != 0; } #endregion #region Public Properties ///  /// Get the limit for the maximum prime value to find. ///  public PrimeType Limit { get { return mpLimit; } } ///  /// Returns the number of primes found in the range. ///  public PrimeType Count { get { PrimeType lptCount = 0; foreach (PrimeType liPrime in this) lptCount++; return lptCount; } } ///  /// Determines if a value in range is prime or not. ///  /// The value to test for primality. /// True if the value is prime, false otherwise. public bool this[PrimeType test] { get { if (test > mpLimit) throw new ArgumentOutOfRangeException(); if (test <= 1) return false; if (test == 2) return true; if ((test & 1) == 0) return false; return !GetBitSafe(mbaOddNotPrime, test >> 1); } } #endregion #region Public Methods ///  /// Gets the enumerator for the primes. ///  /// The enumerator of the primes. public IEnumerator GetEnumerator() { // Two always prime. yield return 2; // Start at first block, second MSB. int liBlock = 0; byte lbBit = 1; BitArrayType lbaCurrent = mbaOddNotPrime[0] >> 1; // For each value in range stepping in incrments of two for odd values. for (PrimeType lpN = 3; lpN <= mpLimit; lpN += 2) { // If current bit not set then value is prime. if ((lbaCurrent & 1) == 0) yield return lpN; // Move to NSB. lbaCurrent >>= 1; // Increment bit value. lbBit++; // If block is finished. if (lbBit == cbBitsPerBlock) { // Move to first bit of next block. lbBit = 0; liBlock++; lbaCurrent = mbaOddNotPrime[liBlock]; } } } #endregion #region IEnumerable Implementation ///  /// Gets the enumerator for the primes. ///  /// The enumerator for the prime numbers. IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); } #endregion } } 

Primes found in 1,000,000,000: 50,847,534 in 9,261 ms