Sieve of Eratosthenes

@hatecopy - The Other Art Fair

I recently came across a coding problem with a red herring that misdirected me into generating prime numbers. I do not recall ever needing to generate primes and therefore spent some time researching the various algorithms. 

A sieve by definition is used to separate one type of particle from another. Sieve algorithms are used to eliminate numbers that are not prime and usually require a fixed upper limit. 

The oldest and simplest algorithm is called the Sieve of Eratosthenes

A prime number by definition is only divisible by itself and 1. Therefore conversely, multiples of a number cannot be prime. The Sieve of Eratosthenes uses this information to eliminate multiples. At the end of the run we are left with prime numbers.

Example

In the following example, we will generate primes up to the number 14.

We start at the number 2, and eliminate all its multiples. There are two ways to do this

  • use a multiplication table
  • simply add 2 to get the next number to eliminate


We then pick the next number that hasn't been eliminated - 3, and repeat the process by eliminating its multiples. At this step it may be obvious that we are eliminating numbers that have already been eliminated in the prior step. This is a necessary inefficiency of the algorithm.

We could continue repeating the steps (as shown below with 5 and 11) until we run out of numbers to eliminate.  


Optimization - Rather that continue with processing 5, 11 and 13, we can terminate the algorithm once we pass the square root of 14. Any number past the square root would have been a multiple of the lower numbers and would have been eliminated in the prior steps.
At the end of the run, the non eliminated numbers is our list of primes from 1 to 14


Code

In the following code I used a Set to keep track of all numbers that have been eliminated, and use that to create a list of primes. 

def findPrimes(n):

    # Eratosthenes
    notPrimes = { 0 }

    p = 2 # start at 2 which we know to be prime

    # we quit when we reach an index that is the square root of the limit
    # because we would have already processed anything past this
    while (p*p <= n): 

        if p not in notPrimes: # true on first run
            start = p ** 2 # start at the first multiple (of 2)
            while start < n + 1:
                notPrimes.add(start)
                start += p # multiples are really addition of the number
        
        p += 1

    return [ i for i in range(n+1) if i not in notPrimes ]

I have also seen code that uses an N sized array initialized with True to keep track of the numbers that have been eliminated much like the figures in the example above. The indexes that are True at the end of the run are the prime numbers