Gröbner Bases

The computation of Gröbner bases of multivariate polynomial ideals by the Faugere F4 algorithm can be parallelised when the base ring is a finite field. These are the versions of the algorithm for which parallelisation is currently supported:

(i)
Standard (non-dense) F4: For ideals defined over GF(p) with p prime and 2<p<223.5 (POSIX threads for the specialist mixed sparse/dense linear algebra).
(ii)
General Dense F4: For ideals with dense input polynomials defined over GF(pk) with p<223.5 and k≥1 (POSIX threads and GPUs, mostly mapping to the parallel linear algebra (above).
(iii)
Special HFE Dense F4: For ideals with dense input polynomials defined over GF(2), in the `HFE' mode (where the hash table is not used in the symbolic reduction); the whole reduction phase is completely parallelised with POSIX threads, which is faster than the distinct calls to matrix multiplication in the general dense algorithm and significant memory saving is also achieved (POSIX threads and GPUs) [new in V2.26]).

For each of the above cases, parallelisation with POSIX threads is simply selected by calling the following procedure (before calling the functions below):

    SetNthreads(NT);
where NT is the number of threads desired. Note that (as for linear algebra) the speedup will not necessarily be close to the number of threads, and hyperthreading may not be beneficial.

For the dense versions (second and third cases above), the GPU mode is automatically selected by default in the CUDA executable when running on a computer with CUDA support; this can be disabled by first calling SetGPU(false);. A hybrid mode (POSIX threads and GPU combined) can be selected by also setting the number of threads while running the CUDA executable; this is more likely to be beneficial in the Special HFE Dense version.

Once parallelisation is selected by one of the above methods, the parallel F4 algorithm can be called by the procedure Groebner or the function GroebnerBasis and the other functions which take a multivariate polynomial ideal defined over a finite field and implicitly create a Gröbner basis first (such as Dimension and Variety).

The General Dense algorithm can be selected by passing the parameter Dense to the Groebner or GroebnerBasis functions, but this will also be automatically selected if the input polynomials are considered dense, while passing Dense := false will force the Standard (non-dense) algorithm to be used.

The Special HFE dense algorithm is selected by passing the parameters Dense and HFE to the Groebner or GroebnerBasis functions. (As before, Dense may be omitted if the input is already dense.)

Example Par_minrank (H5E5)

In this example we solve (n, k, r)-MinRank problems by first using the General Dense F4 algorithm to compute a grevlex Gröbner basis of the relevant ideal (based on the minors of a symbolic matrix with linear forms) and then using a block Wiedemann algorithm to compute the variety of the ideal (see [Ste15] for more details). After choosing a parameter r, a hard instance can be found with by taking n=r + 3 and k=(n - r)2. The runs are on a server with dual Intel 3.2GHz Gold 6146 CPUs with 24 cores total, and an NVIDIA V100 GPU.

> K := GF(PreviousPrime(2^16));
> r := 6; n := r + 3; k := (n - r)^2;
> n, k, r;
9 9 6
> time B := MinRankSystem(K, n, k, r);
Time: 160.030
> I := Ideal(B);
> I: Minimal;
Ideal of Polynomial ring of rank 9 over GF(65521)
> // Single thread
> SetNthreads(1);
> // DynamicStrategy handles large reduction to zero better:
> time Groebner(I: PairsLimit := 2000, Dense, DynamicStrategy);
Time: 2511.040
> QuotientDimension(I);
41580
> time V := Variety(I: Al := "Wiedemann");
Time: 2449.620
> V;
[ <40578, 37019, 22371, 45628, 40625, 2374, 42277, 33586, 51176> ]
> // Check solution is correct:
> time { Evaluate(f, V[1]): f in B };
{ 0 }
Time: 1.320
> // 24 threads
> SetNthreads(24);
> I := Ideal(B); // Recreate ideal
> time Groebner(I: PairsLimit := 2000, Dense, DynamicStrategy);
Time: 548.030[r]
> time V := Variety(I: Al := "Wiedemann");
Time: 539.280[r]
> V;
[ <40578, 37019, 22371, 45628, 40625, 2374, 42277, 33586, 51176> ]
> // GPU
> SetNthreads(1); SetGPU(true);
> I := Ideal(B); // Recreate ideal
> time Groebner(I: PairsLimit := 2000, Dense, DynamicStrategy);
Time: 477.280
> time V := Variety(I: Al := "Wiedemann");
Time: 520.450
> V;
[ <40578, 37019, 22371, 45628, 40625, 2374, 42277, 33586, 51176> ]
Taking r=7 gives a much bigger system where the parallel speedup is greater, especially for the Wiedemann stage which completely depends on several large matrix multiplications.
> K := GF(PreviousPrime(2^16));
> r := 7; n := r + 3; k := (n - r)^2;
> n, k, r;
10 9 7
> time B := MinRankSystem(K, n, k, r);
Time: 305.080
> I := Ideal(B);
> I: Minimal;
Ideal of Polynomial ring of rank 9 over GF(65521)
> #B;
2025
> {* Length(f): f in B *};
{* 22283^^5, 22284^^83, 22285^^490, 22286^^1447 *}

With one thread only:

> SetNthreads(1);
> time Groebner(I: PairsLimit := 5000, Dense, DynamicStrategy);
Time: 16108.420
> QuotientDimension(I);
108900
> time V := Variety(I: Al := "Wiedemann");
Time: 35348.420
> V;
[ <38565, 61856, 56127, 27828, 31304, 30257, 38178, 28576, 24230> ]
> // Check solution is correct:
> time { Evaluate(f, V[1]): f in B };
{ 0 }
Time: 4.220

With 24 threads:

> SetNthreads(24);
> I := Ideal(B); // Recreate ideal
> time Groebner(I: PairsLimit := 5000, Dense, DynamicStrategy);
Time: 2582.500[r]
> time V := Variety(I: Al := "Wiedemann");
Time: 5317.040[r]
> V;
[ <38565, 61856, 56127, 27828, 31304, 30257, 38178, 28576, 24230> ]

With one thread and a GPU:

> SetNthreads(1); SetGPU(true);
> I := Ideal(B); // Recreate ideal
> time Groebner(I: PairsLimit := 5000, Dense, DynamicStrategy);
Time: 2127.830
> time V := Variety(I: Al := "Wiedemann");
Time: 4316.960
> V;
[ <38565, 61856, 56127, 27828, 31304, 30257, 38178, 28576, 24230> ]

Example Par_randsys (H5E6)

In this example we solve some over-determined n-variable polynomial systems generated by m random dense quadratic polynomials, where the constant terms are adjusted so there is at least one solution over the original field. The hardest system in Table 1 of [MS17] has n=27, m=2.n=54.
> function RandomDenseSystem(K, n, m, d)
>     P<[x]> := PolynomialRing(K, n, "grevlex");
>     B := [
>         &+[Random(K)*s: s in MonomialsOfDegree(P, k), k in [0 .. d]]:
>             i in [1 .. m]
>     ];
>     sol := [Random(K): j in [1 .. n]];
>     return [f - Evaluate(f, sol): f in B];
> end function;
> n := 27; B := RandomDenseSystem(GF(31), n, 2*n, 2);
> // Single thread only
> time G := GroebnerBasis(B, 6: Dense, PairsLimit := 5000);
Time: 2799.290
> G;
[
    x[1] + 12, x[2] + 19, x[3] + 8, x[4] + 10, x[5] + 25, ...  x[27] + 21
]
> Variety(Ideal(G));
[ <19, 12, 23, 21, 6, 27, 19, 14, 18, 17, 10, 14, 29, 25, 12, 0, 12,
13, 30, 13, 7, 17, 12, 30, 29, 22, 10> ]
> // 24 threads
> SetNthreads(24);
> time G := GroebnerBasis(B, 6: Dense, PairsLimit := 5000);
Time: 815.190[r]
> // GPU
> SetNthreads(1); SetGPU(true);
> time G := GroebnerBasis(B, 6: Dense, PairsLimit := 5000);
Time: 519.140
Taking n=32 gives a much harder system where the parallel speedup is greater.
> n := 32; B := RandomDenseSystem(GF(31), n, 2*n, 2);
> // Single thread only
> time G := GroebnerBasis(B, 6: Dense, PairsLimit := 10000);
Time: 156326.220
> Basis(I);
[
    x[1] + 12, x[2] + 19, x[3] + 8, x[4] + 10, x[5] + 25, ...  x[27] + 21
]
> Variety(Ideal(G));
[ <19, 12, 23, 21, 6, 27, 19, 14, 18, 17, 10, 14, 29, 25, 12, 0, 12,
13, 30, 13, 7, 17, 12, 30, 29, 22, 10> ]
> // 24 threads
> SetNthreads(24);
> I := Ideal(B); // Recreate ideal
> time G := GroebnerBasis(B, 6: Dense, PairsLimit := 10000);
Time: 21921.790[r]
> // GPU
> SetNthreads(1); SetGPU(true);
> I := Ideal(B); // Recreate ideal
> time G := GroebnerBasis(B, 6: Dense, PairsLimit := 10000);
Time: 13335.090

Example Par_hfe (H5E7)

In this example we compute the Gröbner basis of a 80-variable HFE multivariate polynomial system over GF(2) with secret degree 100 to solve the underlying cryptosystem. Since the secret degree is 100, it suffices to compute pairs up to total degree 4. On the 3.6GHz Intel Core i7-7700, using 4 threads gives a speedup of about 3.4 over using 1 thread.

> B := HFESystem(2, 80, 100);
> Universe(B);
Polynomial ring of rank 80 over GF(2)
Order: Graded Reverse Lexicographical
Variables: x[1], x[2], x[3], x[4], x[5], x[6], ...
> SetNthreads(1);
> time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 1000);
Time: 77.010
> [Sprint(f): f in G];
[ x[1] + 1, x[2] + 1, x[3], x[4], x[5], x[6], x[7], x[8], ...
x[75] + 1, x[76] + 1, x[77] + 1, x[78], x[79] + 1, x[80] + 1 ]
> Variety(Ideal(G));
[ <1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1,
0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0,
0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1> ]
> SetNthreads(4);
> time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 1000);
Time: 26.990[r]

A much larger example has 200 variables with secret degree 100 again. On the 3.6GHz Intel Core i7-7700, using 4 threads gives a speedup of about 3.4 over using 1 thread.

> B := HFESystem(2, 200, 100);
> Universe(B);
Polynomial ring of rank 200 over GF(2)
Order: Graded Reverse Lexicographical
Variables: x[1], x[2], x[3], x[4], x[5], x[6], ...
> SetNthreads(1);
> time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 2500);
Time: 8997.120
> [Sprint(f): f in G];
[ x[1] + 1, x[2], x[3], x[4] + 1, x[5] + 1, x[6], x[7], x[8] + 1, ...
x[195], x[196] + 1, x[197] + 1, x[198] + 1, x[199] + 1, x[200] ]
> Variety(Ideal(G));
[ <1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, ...
0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0> ]
> SetNthreads(4);
> time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 2500);
Time: 2618.300[r]

The same polynomial system is here solved on a server with dual Intel 3.2GHz Gold 6146 CPUs with 24 cores total, and a NVIDIA V100 GPU. The GPU helps the echelonisation phase at the end of each step, so gives a moderate improvement.

> B := HFESystem(2, 200, 100);
> SetNthreads(1);
> time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 2500);
Time: 6227.490
> SetNthreads(24);
> time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 2500);
Time: 786.670[r]
> SetNthreads(1); SetGPU(true);
> time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 2500);
Time: 700.450[r]
V2.28, 13 July 2023