The Project Euler Experience: 3 - Prime Factors and Foundations

The Problem

The prime factors for 13,195 are 5, 7, 13, and 29.

What is the largest prime factor of the number 600,851,475,143?

Background

Prime numbers hold a special place in mathematics. There are super computers dedicated to calculating higher and higher primes. And the distinction between prime and composite numbers was discovered early on. There are indications that the ancient Egyptians knew about prime numbers, based on translations of the Rhind papyrus. And by the ancient Greeks, around the time of Alexander the Great, people definitely knew about primes and were even crafting theorems about them.

Euclid, known as the father of Geometry, and famous for compiling almost all mathematical knowledge at the time into his great works Elements, has an interesting proof regarding the infinite nature of primes.

Lets assume that there aren't an infinite number of primes. And somebody was able to give you list of prime numbers $p_1, p_2, p_3, \ldots , p_n$. Let's suppose you then multiply all the numbers on that list to get the product $P = p_1\times p_2\times p_3 \ldots \times p_n$, and you add one to get $Q = P + 1$.

$Q$ can then either be prime, in which case there is at least one more prime,

or $Q$ can be composite. But none of numbers in the list of primes can divide $Q$ evenly. They are all factors of $P$, but if you divide $Q$ by one of the primes on the list you would get a remainder of 1. But if $Q$ is composite then there must be a prime factor that divides $Q$. That being the case there must be at least one prime number, not in the list that divides $Q$.

This creates a contradiction, any list of primes of finite primes will reveal that there is at least one more prime than those listed, therefore, there must be an infinite number of primes.

This is all leading up to the Fundamental Theorem of Arithmetic. Every branch of mathematics has its own fundamental theorem, and they can get pretty complex, but Arithmetic's is pretty simple.

Pretty much, any integer greater than 1 can be uniquely expressed as the product of one or more primes. A product of one number is just itself. So all primes have the trivial product of themselves. And composite numbers can be expressed as primes.

For example:
$$ 10 = 2 \times 5 \\
12 = 2^2 \times 3 \\
14 = 2 \times 7$$

But don't take my word for it, here is the proof.

Proof by least counterexample:

Let's assume $a$ is the lowest composite number which cannot be written as the product of primes. $a$ cannot be a prime number, because all primes numbers have the trivial product of themselves. So there must exist at least two numbers $b$ and $c$ which are greater than $1$ but smaller than $a$, where $a = bc$.

$b$ and $c$ must then be able to be written as the product of primes, since $a$ is the smallest number which cannot be written as primes. But, since $a$ is the product of $b$ and $c$, which are themselves products of primes, $a$ must therefore able to be written as the product of primes. Since there can be no least composite number which cannot be written as the product of primes, all integers greater than 1 must be able to be written as a product of primes.

This proof isn't ground breaking, it doesn't reveal anything that isn't already intuitive. Most kids learn prime factorization in school and quickly realize this without needing the proof. And indeed, it may seem silly that such a proof is even needed. But mathematics is a science that strives to make as few assumptions as possible. It is founded on a few axioms, which cannot be proven to be true, but by reason they can be assumed. But once a proof is made based on that axiom, other proofs can be created off of previous proofs.

There is a game I like to play called little alchemy that demonstrates this principle. The game starts you off with four elements; earth, air, wind, and fire. You can combine elements two at a time to form new things. Like dust, lava, and wind all the way down to zombies, cities, and sunglasses. There are only sixteen ways to combine the beginning four elements. But you can use what you made in the combination of these elements to combine with other objects. This is similar to how simple axioms can be used to craft insane proofs, like there are infinite primes, or that there are no positive integers $a, b, c$ such that $a^4 + b^4 = c^4$.

Mathematics today uses what is called the Peano axioms for defining mathematics concerning the natural numbers. These are complex, but they can be generalized like this.

0 is a natural number
$x = x$: A number is equal to itself
if $x = y$ then $y = x$
if $x = y$ and $y = z$ then $x = z$: This is called the transitive property
if $x$ is a natural number, and $x = y$, then $y$ is also a natural number.
For every natural number $n$, the next number $S(n)$ is also a natural number. $S(n)$ is defined to be the successor function, and is equivalent to n + 1.
$S(n) = 0$ is false. There are no negative natural numbers.
If $S(n) = S(m)$ then $n = m$
Induction holds, and the set of natural numbers contains $0$, and $S(n)$

With these axioms we can prove that 1 is a natural number, the natural numbers are closed under addition, and all of number theory. It's good to know that mathematics is founded on such simple concepts.

The Math

Finding primes is hard. There is no prime number function where we can just plug in numbers and get the nth prime out of it. Usually it is done by an algorithm which takes more and more time the larger the number is. But not all algorithms are equal. In programming we need to consider the order of the algorithm, which is basically the growth in the amount of time it takes to work on a larger input. The way to describe the order is often called big O notation. If we checked all the numbers below some given value for primes to determine if they divided our number it would be $O(n)$, because we would a new value for each larger n, which isn't the best. Even if we checked to half of 600 billion, and only checked the even numbers, it would still be $O(n)$, because there is a linear growth rate, coefficients don't matter, even if that algorithm is 4 times faster.

We have another option. We only really need to check up to the square root of a number to get prime factors. If none of the numbers below, or including, the square root are factors, then that automatically makes the number prime. If a number larger than the square root was a factor, it would have to be multiplied by a number smaller than the square root.

The main technique will be the same for both programs, they will both take the working number, divide out the lowest number which divides it, and make the dividend the new working number. The difference is that one program will be O(n) and the other will be O(sqrt(n)).

The Code

Python:

The way we check in programming if a number is prime is to see if every number below the square root of that number doesn't divide into it. Basically,

def isPrime(n):
    p = 2
    while n%p != 0 and p <= math.sqrt(n):
        p += 1

    if p > math.sqrt(n):
        return True

    else:
        return False

But there are improvements that could be made on it. For one we are incrementing by one, which is slow and there is no purpose for. We could save time instead by incrementing by 2, and only checking odd numbers, because no even number past two is prime. But even if we do this we are checking numbers that we don't need to. All the multiples of 3 and 5 and 7 don't need to even be there. It would be so much more optimal if we could just check with a list of prime numbers that we have already generated, and passed along to the function. An easy way to generate this list is using a technique called the seive of Erathmus. Checking to the square root turns out to be fast enough for our purposes.

Code 1:

originalNum = 600851475143
maxPrime = math.ceil(originalNum/2)

#Check if number is divisible by two
n = 2
workingNum = originalNum
# a while loop is used in case the factor repeats
while workingNum % n == 0:
    workingNum /= n

#Check odd numbers
n = 3
while n < math.sqrt(workingNum):
    while workingNum%n == 0:
        workingNum /= n

    n += 2

print(workingNum)

The simplicity of this code is that we don't need to check if the factor is prime, only that it is a factor. And in this case we were lucky because the largest prime is reletivily small. Only a few thousand, even though it could have been a few hundred billion. We got lucky, but it could have been the case that highest prime was much higher. If that were the case we might want to skip more than the even numbers and we would do something like this.

Code 2:

import math
originalNum = 600851475143
maxPrime = math.ceil(originalNum/2)

#Check if number is divisible by two
n = 2
workingNum = originalNum

largestPrime = 0
# a while loop is used in case the factor repeats
while workingNum % n == 0:
    workingNum /= n

    largestPrime = 2

#Check odd numbers
n = 3
while n <= maxPrime:
    while workingNum%n == 0:
        workingNum /= n

        largestPrime = n

    n += 2

print(largestPrime)

You are probably interested in how these programs performed. Both of them did pretty well. Code 1 finished faster, but not by much. I clocked it at about .0009999 seconds, while code 2 got about .002. Twice as slow doesn't seem so bad, but that is because the answer to this problem is pretty low. If we changed the original number by adding 1 the differences become more pronounced. Code 1 finished in .001999 seconds, while Code 2 was much sloweer at 1.0999 seconds. And if we chanced the input by adding 2, Code 1 finished in .11800 seconds, whereas Code 2 didn't finish, I left it running for over 2 minutes.

This shows a principle of flexibility. Just because your code works for some values does not mean that it is good, or even works, for others.

C:

This is pretty much a clone of the python code, not much to add. Except I will add the data type. In C an int data type is usually between -2 billion and 2 billion. The number we are working with is larger than this, so to prevent any overflow errors we are using a larger data type called a long long. Which uses twice the space and can store numbers between -9 quintillion and 9 quintillion. or $-2^{63}, 2^{63}$. We probably won't be needing a datatype larger than this.

#include <stdio.h>
#include <math.h>

#define ORIGINAL 600851475143

main()
{
    long long working = ORIGINAL; /*Cast as a double because too large for int*/
    int p = 2; /*The first prime, and the variable we will increment*/
    int max = ceil(sqrt(ORIGINAL));
    
    /*Check for case 2*/
    while (working%p == 0){
        working /= p;
    }
    p++;
    
    /*Check over odd numbers*/
    while (p < sqrt(working)){
        while (working%p == 0){
            working /= p;
        }
        
        p += 2; 
    }
    
    printf("%d\n", working);
    
}

The Project Euler Experience

Friday, August 22, 2014

3 - Prime Factors and Foundations