Advanced Factorization Techniques: The Number Field Sieve

Magma provides an experimental implementation of the fastest general purpose factoring algorithm known: the Number Field Sieve (NFS). The implementation may be used both as a General Number Field Sieve and a Special Number Field Sieve -- the only difference is in the selection of a suitable polynomial.

Contents

The Magma Number Field Sieve Implementation

In order to make use of the Magma NFS, the user should have some knowledge of the algorithm. The Magma NFS implementation also requires a significant amount of memory and disk space to be available for the duration of the factorization. For example, factorization of an 80-digit number may require at least 64 megabytes of RAM and half a gigabyte of disk space.

Magma's NFS implementation uses one linear polynomial (the "rational side") and one polynomial of higher degree (the "algebraic side"). At the time of writing this is not a major restriction, since the best methods for selecting polynomials for factorization of numbers of more than 100 digits involve one linear and one non-linear polynomial. Magma provides a number of functions to assist in choosing a good algebraic-side polynomial for the factorization of a particular number, following the ideas of Montgomery and Murphy in [Mur99].

Magma provides two methods for using the NFS implementation. The first is the one-step function NFS, which provides a naive NFS factorization attempt using default algorithm parameters.

The second, more powerful method is to work with an NFS process object, splitting the algorithm into four stages: Sieving, Auxiliary data, Linear algebra and Final factorization. This approach allows greater control over the algorithm, as the user may supply their own algorithm parameter values. It also allows the user to distribute the computationally intensive sieving and final factorization stages over several machines or processors.

Some functions are included to allow Magma users to co-operate in factorization attempts using CWI tools.

A verbose flag may be set to obtain informative printing on progress in the various stages of the NFS algorithm.

SetVerbose("NFS", v) : MonStgElt, RngIntElt ->
Set the verbose printing level for the NFS algorithms to the integer v. Currently the legal values for v are 0, 1, 2 and 3.

If the level is 0, no verbose output is produced.

If the level is 1, NFS will produce basic information about its progress, and will also print information on NFS algorithm parameters.

If the level is 2, NFS will provide more detailed information about progress and parameters.

If the level is 3, NFS will print out extremely detailed information about progress and data. This level will only be useful for experts and developers.

Naive NFS

Magma's Number Field Sieve implementation provides a one-step black-box function NFS. Here, the user provides the integer n to be factored, a homogeneous bivariate integer polynomial F and integers m1 and m2 such that F(m1, m2) ≡ 0 mod n. Magma will attempt to factor n using F, m1 and m2, automatically selecting the other parameters (see below) for the algorithm.

The automatically chosen parameters are NOT optimal in general, and therefore no conclusions should be drawn about the speed of the implementation or the algorithm itself based on the use of this function.

For example, note that the default algebraic factor base size of NFS is chosen to be rather large to decrease the likelihood of running out of useful relations. This slows the algorithm considerably, since it increases the size of the matrix to be reduced -- but it also means that the algorithm should succeed in finding a factor unless one chooses a really bad polynomial.

NumberFieldSieve(n, F, m1, m2) : RngIntElt, RngMPolElt, RngIntElt, RngIntElt -> RngIntElt
NFS(n, F, m1, m2) : RngIntElt, RngMPolElt, RngIntElt, RngIntElt -> RngIntElt
Performs the factorization of an integer n using the Number Field Sieve with algebraic polynomial F, where the integers m1 and m2 satisfy F(m1, m2) ≡ 0 mod n. Returns a nontrivial factor of n if one is found, or 0 otherwise.

Factoring with NFS Processes

An NFS Process (an object of category NFSProc) encapsulates the data of a Magma NFS factorization. It contains the number n to be factored, the algebraic polynomial F and the integers m1 and m2. It also provides access to a number of NFS algorithm parameters (such as approximate factor base sizes). These parameters are attributes of the NFS process. If any of the parameters are not set, sensible (but not necessarily optimal) defaults will be provided by Magma.

The NFS algorithm is divided into four stages:

1.
Sieving
2.
Auxiliary data gathering
3.
Linear algebra
4.
Factorization

The stages are described in detail below.

After creating an NFS process for the factorization attempt, the user should proceed through each of the four stages in the above order.

NFSProcess(n, F, m1, m2) : RngIntElt, RngMPolElt, RngIntElt, RngIntElt -> NFSProc
Given a (composite) integer n, a bivariate homogeneous integer polynomial F, and nonzero integers m1 and m2 such that F(m1, m2) ≡ 0 mod n, this function creates an NFS process object for an NFS factorization of n.

Example RngInt_nfsprocessparameters (H19E10)

The attributes associated with an NFS process are:
> ListAttributes(NFSProc);
AlgebraicError              OutputFilename
AlgebraicFBBound            RationalError
AlgebraicLargePrimeBound    RationalFBBound
CacheSize                   RationalLargePrimeBound
F                           m1
Firstb                      m2
Lastb                       n
Maximuma

OutputFilename is the base name for NFS-generated data files. These files (and their actual names) are discussed below.

AlgebraicFBBound is the upper bound for smooth primes in the algebraic-side factor base, and RationalFBBound, the upper bound for smooth primes in rational-side factor base.

Maximuma bounds the sieve interval for a: NFS will sieve for relations with |a| ≤ Maximuma.

Firstb is the first value of b to sieve on, and Lastb is the last.

AlgebraicLargePrimeBound gives the upper bound for "large" (non-smooth) primes in the algebraic-side factor base. Similarly, RationalLargePrimeBound is the upper bound for the rational side.

AlgebraicError defines an "error" tolerance for logarithm arithmetic on algebraic side. Similarly, RationalError defines an "error" tolerance for the rational side.

CacheSize is a flag reflecting the computer cache memory size, for optimisation.

Attribute Selection

As a guideline for the selection of attributes, we include here a few examples of attributes that we have determined to be good for the Magma NFS implementation.

The best choice for the factor base size depends on many variables, including the average log size and the Murphy α parameter (defined in [Mur99]) for the polynomial F. Our polynomials above are quite good: if the user does not know much about determining the quality of polynomials, then he or she should use much larger factor bases.

Example RngInt_70digitnfs (H19E11)

Sample attributes for a 70-digit number:
> n := 5235869680233366295366904510725458053043111241035678897933802235060927;
> R<X,Y> := PolynomialRing(Integers( ), 2);
> F := 2379600*X^4 - 12052850016*X^3*Y - 13804671642407*X^2*Y^2 +
>      11449640164912254*X*Y^3 + 7965530070546332840*Y^4 ;
> m1 := 6848906180202117;
> m2 := 1;
> P := NFSProcess(n,F,m1,m2);
> P`AlgebraicFBBound := 8*10^5;
> P`RationalFBBound := 6*10^5;
> P`OutputFilename := "/tmp/nfs_70_digit";
> P`Maximuma := 4194280;
> P`AlgebraicError := 16;
> P`RationalError := 14;

Example RngInt_80digitnfs (H19E12)

Sample attributes for an 80-digit number:
> n := 1871831866357686493451122722951040222063279350383738650253906933489072\
> 2483083589;
> P<X,Y> := PolynomialRing(Integers(),2);
> F := 15901200*X^4 + 9933631795*X^3*Y - 112425819157429*X^2*Y^2 -
>     231659214929438137*X*Y^3 - 73799500175565303965*Y^4;
> m1 := 1041619817688573426;
> m2 := 1;
> P := NFSProcess(n, F, m1, m2);
> P`AlgebraicFBBound := 8*10^5;
> P`RationalFBBound := 6*10^5;
> P`OutputFilename := "/tmp/nfs_80_dgit";
> P`Maximuma := 10485760;
> P`AlgebraicError := 16;
> P`RationalError := 14;

Example RngInt_87digitnfs (H19E13)

Sample attributes for an 87-digit number:
> n := 12118618732463427472219179104631767765107839384219612469780841876821498\
> 2402918637227743;
> P<X,Y> := PolynomialRing(Integers(),2);
> F := 190512000*X^4 - 450872401242*X^3*Y +
>     1869594915648551*X^2*Y^2 + 2568544235742498*X*Y^3 -
>     9322965583419801010104*Y^4;
> m1 := 28241170741195273211;
> m2 := 1;
> P := NFSProcess(n, F, m1, m2);
> P`AlgebraicFBBound := 16*10^5;
> P`RationalFBBound := 10^6;
> P`OutputFilename := "/tmp/nfs_87_digit";
> P`Maximuma := 2^24;
> P`AlgebraicError := 24;
> P`RationalError := 18;
The Sieving Stage

Magma's NFS uses a "line-by-line" (or "classical") sieving algorithm. Future versions may include lattice sieving.

The line-by-line siever sieves values of F(a, b) on the algebraic side and corresponding values a .m2 - b .m1 on the rational side. This is done by fixing a value of b (beginning with the parameter Firstb, if supplied), then sieving all values of a between -a0 and a0, where a0 is approximately equal to the parameter Maximuma (some rounding off is done to make sure that the sieve interval length is divisible by a high power of 2). When this is completed b is incremented, and the next value of b is processed.

The sieving continues until either the maximum value of b (specified by the parameter Lastb) has been reached, or until enough relations are obtained to complete the factorization. If Lastb is not defined, the sieve simply continues until enough relations are found. The number of relations required may be determined by the function NumberOfRelationsRequired.

"Cycles" among partial relations are counted after every 256 iterations.

The sieve implementation uses (rounded natural) logarithms of primes to mark the sieve interval. Moreover, the implementation does not sieve with prime powers. Therefore, we must allow for some error in scanning the sieve arrays for useful relations; the acceptable sieve threshold errors for each side are defined by the AlgebraicError and RationalError parameters. If, in addition, the user wants to take advantage of large prime relations (recommended), then larger error terms should be used. The implementation will keep relations having up to 2 large primes on each side, but will only find such relations if the user selects large enough sieve threshold error bounds. The user should be cautious when sieving for (and subsequently using) relations with large primes, as they greatly increase overall disk space requirements. Some experimentation may be required in order to determine the best error bounds for speed or disk space optimization purposes.

The CacheSize parameter may be used to take advantage of the cache memory size of the computer: a value of 1 indicates a small cache size, 2 a medium cache size, and 3 a for large cache size.

NumberOfRelationsRequired(P) : NFSProc -> RngIntElt
The minimum number of relations required for an NFS factor attempt with NFS process P.
FindRelations(P) : NFSProc -> RngIntElt
Given an NFS process P for factoring an integer n, generates relations to factor n with the Number Field Sieve algorithm. Returns the number of full relations plus the number of cycles found.
The Auxiliary Data Stage

In this stage of the algorithm, "cycles" [LD95] are detected in the partial relations from the sieving stage, and quadratic characters are calculated for the relations. This greatly improves the efficiency of the NFS.

In a typical factorization, the user should call the procedures CreateCycleFile and CreateCharacterFile in succession.

CreateCycleFile(P) : NFSProc -> .
Creates a file with all the cycle information that the NFS algorithm requires to complete the matrix reduction and final factorization stages for the NFS process P.
CycleCount(P) : NFSProc -> RngIntElt
Returns the number of cycles in the partial relations of the NFS process P. This function is mainly intended for factoring with multiple processors.
CycleCount(fn) : MonStgElt -> RngIntElt
Returns the number of cycles in the partial data file corresponding to the base file name fn. This function is mainly intended for factoring with multiple processors.
CreateCharacterFile(P) : NFSProc -> .
Creates a file with the quadratic character data for the full relations and cycles in the NFS process P.
CreateCharacterFile(P, cc) : NFSProc, RngIntElt -> .
Creates a file with the quadratic character data for the full relations and cycles in the NFS process P. There are cc sets of 32 quadratic character columns created.
Finding Dependencies: the Linear Algebra Stage

In this stage, the relations are collected together to form a matrix, and then block Lanczos reduction is applied to find linear dependencies among the relations. These dependencies become candidates for factorization.

FindDependencies(P) : NFSProc -> .
Finds dependencies between relations in the NFS process P.
The Factorization Stage

In this stage, number field square roots are extracted and we attempt to factor the dependencies found in the linear algebra stage.

Factor(P) : NFSProc -> RngIntElt
Try to factor with each dependency in the NFS process P until a proper factor is found. Returns the factor, or 0 if no factor is found.
Factor(P,k) : NFSProc, RngIntElt -> RngIntElt
Attempt to factor with the k-th dependency in the NFS process P. Returns a proper factor if found, 0 otherwise.

Data Files

Many data files are ued for an NFS factorization. The user can control the names and location of the files by specifying the OutputFilename parameter; then all output files will have names beginning with the OutputFilename string, with a range of suffixes depending on their purpose.

In general, all files are appended to rather than overwritten; so to avoid inconsistencies (and to save disk space) the user should call RemoveFiles after a successful factorization.

When distributing factorizations, or collecting results from sieving stages that have been broken up into several runs for some reason (for example, if a process has been interrupted), Magma provides the function MergeFiles. This takes a sequence of base filenames (which are treated as if they were the value for OutputFilename), and reads in the corresponding relation and partial relation files; it then combines the contents of these files, removing duplicates and corrupted lines of data, and places the results into new relation and partial relation files.

RemoveFiles(P) : NFSProc -> .
Deletes any data files created by the NFS process P.
MergeFiles(S, fn) : [MonStgElt], MonStgElt -> RngIntElt, RngIntElt
Merges the NFS relation files named in the sequence S (and their associated partial relation files) into a pair of new relation and partial relation files, while removing duplicate and corrupted lines of data; returns the number of relations and the number of partial relations in the new output files. The combined full relations are stored in a file named fn, and the partial relations in a file named fn_partials.
Magma Native NFS Data Files

Here we describe the files used in a typical Magma NFS factorization. These files all use formats peculiar to Magma's NFS.

The first kind of file created by NFS stores the relations generated in the sieving stage by the FindRelations procedure. The name of the file is precisely the OutputFilename string.

NFS also stores partial relations generated in the sieving stage; these are stored in a file named OutputFilename_partials.

Whenever cycles [LD95] are counted (for example, in CycleCount, a file named OutputFilename_cycles is created to store them in. Some other files are also created and then deleted during the cycle counting process.

The quadratic characters calculated in CreateCharacterFile are stored in a file named OutputFilename_cc.

The linear algebra stage creates a file named OutputFilename_null_space, which lists relations making up null space vectors for the NFS matrix.

Distributing NFS Factorizations

Magma provides a number of tools for distributing the sieving and final factoring stages over a number of computers.

To distribute the sieving stage, each processor should get a unique range of b-values to sieve and unique data file names. During the sieving, the user must manually check when the combined data has enough relations to factor the number. To do this, the data files must first be merged using MergeFiles, and then the cycles can be counted with CycleCount. If the combined number of full relations plus the number of cycles exceeds the size of both factor bases combined, then the user can proceed to the other stages of the factorization attempt using the merged data file name.

To distribute the factorization stage, the user may choose a dependency for each process to factor, then call Factor(P,k) where P is the NFS process and k the number specifying the dependency to factor, with a different value of k for each process.

Example RngInt_distributed (H19E14)

Here we demonstrate a distributed NFS factorization (of a very small n) over two processes, A and B -- which may be on different machines, or different magma processes on the same machine, or even in the same magma process.

We begin with process A:

> R<X,Y> := PolynomialRing(Integers( ),2);
> n := 70478782497479747987234958341;
> F := 814*X^4 + 3172*X^3*Y - 49218*X^2*Y^2 - 142775*X*Y^3
> - 65862*Y^4;
> m1 := 3050411;
> m2 := 1;
> A := NFSProcess(n,F,m1,m2);
> A`Firstb := 0;
> A`Lastb := 99;
> A`OutputFilename := "/tmp/nfs-distrib-A";
> FindRelations(A);
3852

Now, process B, with n, F, m1 and m2 as above:

> B := NFSProcess(n,F,m1,m2);
> B`Firstb := 99;
> B`Lastb := 199;
> B`OutputFilename := "/tmp/nfs-distrib-B";
> FindRelations(B);
2455

Then later, on a single machine,

> input_files := ["/tmp/nfs-distrib-A","/tmp/nfs-distrib-B"];
> P := NFSProcess(n,F,m1,m2);
> P`OutputFilename := "/tmp/nfs-distrib-all";
> MergeFiles(input_files, P`OutputFilename);
4162 25925
> CycleCount(P);
4368
> CreateCycleFile(P);
> CreateCharacterFile(P);
> FindDependencies(P);

Now, the final factorization stage may be distributed over more than one processor also. We attempt to factor a relation on A:

> A`OutputFilename := "/tmp/nfs-distrib-all";
> Factor(A,9);  // factor dependency 9
0

No factor was found on machine A, but meantime on B:

> B`OutputFilename := "/tmp/nfs-distrib-all";
> Factor(P,1);  // factor dependency 1
94899629
> n mod $1, n div $1;
0 742666575624650207929

We have a successful factorisation.

Magma and CWI NFS Interoperability

At the time of writing, the record NFS factorizations were lead by the CWI group and by people using CWI's or Arjen Lenstra's code. The CWI tools use a different data file format to Magma's native format, but Magma supplies some tools to allow users to assist in CWI factorization attempts.

The user may generate relations in CWI relation format, rather than Magma native format, by using FindRelationsInCWIFormat. The user should note that relations in CWI format cannot at present be used in the Auxiliary data, Linear algebra or Factorization stages of the Magma NFS.

Alternatively, assuming some Magma NFS relations have already been computed for a process, then the user may use the procedure ConvertToCWIFormat to convert the relation data files from Magma native format to CWI format. The resulting data file is named OutputFilename_CWI_format, and will contain both the full and partial relations of the process.

FindRelationsInCWIFormat(P) : NFSProc -> RngIntElt
Given an NFS process P for factoring an integer n, generates relations to factor n with the Number Field Sieve algorithm, in a file format suitable for use with CWI's NFS tools. Returns the number of relations found.
ConvertToCWIFormat(P, pb) : NFSProc, RngIntElt -> .;
Converts the relation files of the NFS process P to CWI format, storing primes only greater than or equal to the prime printing bound pb. The resulting data file name will be named P`OutputFilename_CWI_format.

Tools for Finding a Suitable Polynomial

Magma does not provide a function to select an optimal polynomial for the factorization of a given number. However, Magma does provide some functions that are useful for the implementation of the polynomial selection algorithms developed by Peter Montgomery and Brian Murphy in [Mur99].

The functions BaseMPolynomial, MurphyAlphaApproximation, OptimalSkewness, BestTranslation, PolynomialSieve, and DickmanRho, will be useful for those wanting to implement polynomial selection routines within the Magma interpreter language.

BaseMPolynomial(n, m, d) : RngIntElt, RngIntElt, RngIntElt -> RngMPolElt
Given integers n, m and d, returns a homogeneous bivariate polynomial F = ∑i=0d ciXiYd - i such that the coefficients ci give a base m representation of n: that is, ∑i=0d cimi = n. The coefficients also satisfy |ci| ≤m/2.

This polynomial F may be used to factorize n using the number field sieve (with m1 := m and m2 := 1).

This function requires that d ≥2 and n ≥md.

MurphyAlphaApproximation(F, b) : RngMPolElt, RngIntElt -> FldReElt
Given a univariate or homogeneous bivariate polynomial F, return an approximation of the α value of F, using primes less than the positive integer bound b.

The α value of a polynomial is defined in [Mur99].

Since random sampling is used for primes dividing the discriminant, successive calls to this function will give slightly different results.

OptimalSkewness(F) : RngMPolElt -> FldReElt, FldReElt
Given a univariate or homogeneous bivariate polynomial F, return its optimal skewness and corresponding average log size.

The optimal skewness and average log size values are defined in [Mur99].

Example RngInt_GetPoly (H19E15)

This example illustrates an effective (though not optimal) method for finding a "good" polynomial for use in NFS factorizations.

Here we search for a degree d=4 polynomial to use in factoring a 52-digit integer n.

We define the rating of a polynomial to be the sum of the α value and corresponding "average log size" (see [Mur99]).

We then proceed by iterating over base m polynomials with successive leading coefficients (with the values of m near ((m)1/d(m)^(1/(d + 1)))1/2, and chosen to minimize the second-to-leading coefficient), and choosing as a result the polynomial with the smallest rating.

> n := RandomPrime(90)*RandomPrime(90);
> n;
3596354707256253204076739374167770148715218949803889
> d := 4;
> approx_m := Iroot( Iroot( n, d+1 ) * Iroot( n, d ) , 2 );
> leading_coeff := n div approx_m^d;
> leading_coeff;
143082
> m := Iroot( n div leading_coeff, d );
> P<X,Y> := PolynomialRing( Integers(), 2 );
> F<X,Y> := BaseMPolynomial(n,m,d);
> F;
143082*X^4 + 463535*X^3*Y - 173869838910*X^2*Y^2 + 167201617413*X*Y^3 +
    159859288415*Y^4
> skew, als := OptimalSkewness( F );
> alpha := MurphyAlphaApproximation( F, 2000 );
> rating := als + alpha;
> rating;
23.143714548914575193314917
>
> best_rating := rating;
> best_m := m;
> for i in [1..100] do
>   leading_coeff := leading_coeff + 1;
>   m := Iroot( n div leading_coeff, d );
>   F<X,Y> := BaseMPolynomial(n,m,d);
>   skew, als := OptimalSkewness( F );
>   alpha := MurphyAlphaApproximation( F, 2000 );
>   rating := als + alpha;
>   if rating lt best_rating then
>     best_rating := rating;
>     best_m := m;
>   end if;
> end for;
> best_rating;
20.899568473033257031950385
> best_m;
398116527578
> F<X,Y> := BaseMPolynomial(n,best_m,d);
> F;
143160*X^4 + 199085*X^3*Y - 9094377652*X^2*Y^2 - 93898749030*X*Y^3 -
    169859083883*Y^4
> OptimalSkewness( F );
165.514255523681640625 20.969934467920612180646408
> MurphyAlphaApproximation( F, 2000 );
-0.0542716157630141449500150842
> time NFS( n, F, best_m, 1 );
...
BestTranslation(F, m, a) : RngMPolElt, RngIntElt, FldReElt, FldReElt -> RngMPolElt, RngIntElt, FldReElt, FldReElt
Given a univariate or homogeneous bivariate polynomial F, an integer m, and real value a (which should be the average log size of F for some optimal skewness), returns a polynomial G and an integer m' such that G(m') = F(m), together with the average log size and optimal skewness of G. The translation G is selected such that the average log size is a local minimum.
PolynomialSieve(F, m, J0, J1,MaxAlpha) : RngMPolElt, RngIntElt, RngIntElt, RngIntElt, FldReElt -> List
    PrimeBound: RngIntElt               Default: 1000
Given a homogeneous bivariate integer polynomial F of degree d, together with integers m, J0 and J1 and a real value MaxAlpha, returns a list of tuples, each of which contains a polynomial G = F + j1x2yd - 2 - (|j0| + j1m)xyd - 1 + (j0m)yd, where |j0| ≤J0 and |j1| ≤J1 such that the α value (see [Mur99]) of G is "better" (that is, lower) than MaxAlpha.

Each tuple contains the data <average log size + α, skewness, α, G, m, j0, j1 >.

If the optional parameter PrimeBound is set, it is used as an upper bound for primes used to calculate α.

V2.28, 13 July 2023