---
title: "Computational Mathematics Overview"
author: "Joseph Wood"
date: "11/30/2023"
output:
  rmarkdown::html_vignette:
    toc: true
    number_sections: false
vignette: >
  %\VignetteIndexEntry{Computational Mathematics Overview}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

This document serves as an overview for solving problems common in [Computational Mathematics](<https://en.wikipedia.org/wiki/Computational_mathematics>). Of note, `primeSieve` and `primeCount` are based on the excellent work by [Kim Walisch](<https://github.com/kimwalisch>).

------------------------------------------------------------------------

## `primeSieve`

The primeSieve function is based on the [Segmented Sieve of Eratosthenes](<https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes#Segmented_sieve>). As stated in the linked article, the sieve itself is already very efficient. The problem from an efficiency standpoint, is due to the memory requirements. The segmented version overcomes this by only sieving small sections at a time, which greatly facilitates use of the cache.

``` r
library(RcppAlgos)
library(microbenchmark)
options(width = 90)

microbenchmark(primeSieve(1e6))
#> Warning in microbenchmark(primeSieve(1000000)): less accurate nanosecond times to avoid
#> potential integer overflows
#> Unit: microseconds
#>               expr     min       lq     mean   median       uq      max neval
#>  primeSieve(1e+06) 831.562 837.5275 853.7422 843.0215 853.5175 1195.191   100

## Single threaded primes under a billion!!!
system.time(primeSieve(10^9))
#>    user  system elapsed 
#>   0.961   0.042   1.003

## Using 8 threads we can get under 0.5 seconds!!!
system.time(primeSieve(10^9, nThreads = 8))
#>    user  system elapsed 
#>   1.417   0.037   0.212

## Quickly generate large primes over small interval. N.B. The
## order for the bounds does not matter.
options(scipen = 50)
system.time(myPs <- primeSieve(10^13 + 10^3, 10^13))
#>    user  system elapsed 
#>   0.005   0.000   0.006

myPs
#>  [1] 10000000000037 10000000000051 10000000000099 10000000000129 10000000000183
#>  [6] 10000000000259 10000000000267 10000000000273 10000000000279 10000000000283
#> [11] 10000000000313 10000000000343 10000000000391 10000000000411 10000000000433
#> [16] 10000000000453 10000000000591 10000000000609 10000000000643 10000000000649
#> [21] 10000000000657 10000000000687 10000000000691 10000000000717 10000000000729
#> [26] 10000000000751 10000000000759 10000000000777 10000000000853 10000000000883
#> [31] 10000000000943 10000000000957 10000000000987 10000000000993

## Object created is small
object.size(myPs)
#> 320 bytes
```

### Larger primes

Since version `2.3.0`, we are implementing the cache-friendly improvements for larger primes originally developed by [Tomás Oliveira](<https://sweet.ua.pt/tos/software/prime_sieve.html>).

``` r
## Version <= 2.2.0.. i.e. older versions
system.time(old <- RcppAlgos220::primeSieve(1e15, 1e15 + 1e9))
#>    user  system elapsed 
#>   2.441   0.052   2.497

invisible(gc())
## v2.3.0+ is faster
system.time(a <- primeSieve(1e15, 1e15 + 1e9))
#>    user  system elapsed 
#>   1.217   0.038   1.255

invisible(gc())
## And using nThreads is much much faster
system.time(b <- primeSieve(1e15, 1e15 + 1e9, nThreads = 8))
#>    user  system elapsed 
#>   2.219   0.084   0.398

identical(old, a)
#> [1] TRUE

identical(a, b)
#> [1] TRUE
```

## `primeCount`

The library by Kim Walisch relies on [OpenMP](<https://en.wikipedia.org/wiki/OpenMP>) for parallel computation with [Legendre’s Formula](<https://mathworld.wolfram.com/LegendresFormula.html>). Currently, the default compiler on `macOS` is `clang`, which does not support `OpenMP`. James Balamuta (a.k.a. TheCoatlessProfessor… well at least [we think so](<https://thecoatlessprofessor.com/about/>)) has written a great article on this topic, which you can find here: <https://thecoatlessprofessor.com/programming/openmp-in-r-on-os-x/>. One of the goals of `RcppAlgos` is to be accessible by all users. With this in mind, we set out to count primes in parallel *without* `OpenMP`.

At first glance, this seems trivial as we have a function in `Primes.cpp` called `phiWorker` that counts the primes up to `x`. If you look in [phi.cpp](<https://github.com/kimwalisch/primecount/blob/master/src/phi.cpp>) in the `primecount` library by Kim Walisch, we see that `OpenMP` does its magic on a for loop that makes repeated calls to `phi` (which is what `phiWorker` is based on). All we need to do is break this loop into *n* intervals where *n* is the number of threads. Simple, right?

We can certainly do this, but what you will find is that *n - 1* threads will complete very quickly and the *n<sup>th</sup>* thread will be left with a heavy computation. In order to alleviate this unbalanced load, we divide the loop mentioned above into smaller intervals. The idea is to completely calculate `phi` up to a limit *m* using all *n* threads and then gradually increase *m*. The advantage here is that we are benefiting greatly from the caching done by the work of the previous *n* threads.

With this is mind, here are some results:

``` r
## Enumerate the number of primes below trillion
system.time(underOneTrillion <- primeCount(10^12))
#>    user  system elapsed 
#>   0.123   0.001   0.125

underOneTrillion
#> [1] 37607912018

## Enumerate the number of primes below ten billion in 2 milliseconds
microbenchmark(primeCount(10^10))
#> Unit: milliseconds
#>               expr      min       lq     mean   median       uq      max neval
#>  primeCount(10^10) 1.717367 1.721693 1.751844 1.724911 1.733521 2.628879   100

system.time(underOneHundredTrillion <- primeCount(1e14, nThreads = 8))
#>    user  system elapsed 
#>   6.844   0.487   1.229

underOneHundredTrillion
#> [1] 3204941750802

## Still not as fast as Kim Walisch's primecount library:
cat(paste(
    system("primecount 1e14 --legendre --time", intern = TRUE),
    collapse = "\n"
))
#> 3204941750802
#> Seconds: 0.397
```

## Other Sieving Functions

`RcppAlgos` comes equipped with several functions for quickly generating essential components for problems common in computational mathematics. All functions below can be executed in parallel by using the argument `nThreads`.

The following sieving functions (`primeFactorizeSieve`, `divisorsSieve`, `numDivisorSieve`, & `eulerPhiSieve`) are very useful and flexible. Generate components up to a number or between two bounds.

``` r
## get the number of divisors for every number from 1 to n
numDivisorSieve(20)
#>  [1] 1 2 2 3 2 4 2 4 3 4 2 6 2 4 4 5 2 6 2 6

## If you want the complete factorization from 1 to n, use divisorsList
system.time(allFacs <- divisorsSieve(10^5, namedList = TRUE))
#>    user  system elapsed 
#>   0.023   0.006   0.030

allFacs[c(4339, 15613, 22080)]
#> $`4339`
#> [1]    1 4339
#> 
#> $`15613`
#> [1]     1    13  1201 15613
#> 
#> $`22080`
#>  [1]     1     2     3     4     5     6     8    10    12    15    16    20    23    24
#> [15]    30    32    40    46    48    60    64    69    80    92    96   115   120   138
#> [29]   160   184   192   230   240   276   320   345   368   460   480   552   690   736
#> [43]   920   960  1104  1380  1472  1840  2208  2760  3680  4416  5520  7360 11040 22080


## Between two bounds
primeFactorizeSieve(10^12, 10^12 + 5)
#> [[1]]
#>  [1] 2 2 2 2 2 2 2 2 2 2 2 2 5 5 5 5 5 5 5 5 5 5 5 5
#> 
#> [[2]]
#> [1]       73      137 99990001
#> 
#> [[3]]
#> [1]            2            3 166666666667
#> 
#> [[4]]
#> [1]      61   14221 1152763
#> 
#> [[5]]
#> [1]      2      2     17    149    197 501001
#> 
#> [[6]]
#> [1]           3           5 66666666667


## Creating a named object
eulerPhiSieve(20, namedVector = TRUE)
#>  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
#>  1  1  2  2  4  2  6  4  6  4 10  4 12  6  8  8 16  6 18  8


system.time(a <- eulerPhiSieve(1e12, 1e12 + 1e7))
#>    user  system elapsed 
#>   0.316   0.012   0.327

## Using nThreads for greater efficiency
system.time(b <- eulerPhiSieve(1e12, 1e12 + 1e7, nThreads = 8))
#>    user  system elapsed 
#>   0.993   0.032   0.153

identical(a, b)
#> [1] TRUE
```

## Vectorized Functions

There are three very fast vectorized functions for general factoring (e.g. all divisors of number), primality testing, as well as prime factoring (`divisorsRcpp`, `isPrimeRcpp`, `primeFactorize`).

``` r
## get result for individual numbers
primeFactorize(123456789)
#> [1]    3    3 3607 3803


## or for an entire vector
## N.B. The R Version you are using... random sampling
## has changed throughout the years
R.version[["version.string"]]
#> [1] "R version 4.3.1 (2023-06-16)"
set.seed(100)

myVec <- sample(-100000000:100000000, 5)
divisorsRcpp(myVec, namedList = TRUE)
#> $`33331928`
#>  [1]        1        2        4        7        8       14       19       28       38
#> [10]       56       76      133      152      266      532     1064    31327    62654
#> [19]   125308   219289   250616   438578   595213   877156  1190426  1754312  2380852
#> [28]  4166491  4761704  8332982 16665964 33331928
#> 
#> $`99961494`
#>  [1]        1        2        3        6       37       74      111      222   450277
#> [10]   900554  1350831  2701662 16660249 33320498 49980747 99961494
#> 
#> $`30377219`
#> [1]        1       19  1598801 30377219
#> 
#> $`-46085563`
#> [1] -46085563   -201247      -229        -1         1       229    201247  46085563
#> 
#> $`-26510714`
#>  [1] -26510714 -13255357    -73846    -36923      -718      -359        -2        -1
#>  [9]         1         2       359       718     36923     73846  13255357  26510714


## Creating a named object
isPrimeRcpp(995:1000, namedVector = TRUE)
#>   995   996   997   998   999  1000 
#> FALSE FALSE  TRUE FALSE FALSE FALSE

system.time(a <- primeFactorize(1e12:(1e12 + 1e5)))
#>    user  system elapsed 
#>   0.656   0.009   0.665

## Using nThreads for greater efficiency
system.time(b <- primeFactorize(1e12:(1e12 + 1e5), nThreads = 8))
#>    user  system elapsed 
#>   0.937   0.007   0.146

identical(a, b)
#> [1] TRUE
```