The computation of Gröbner bases of multivariate polynomial ideals by the Faugere F4 algorithm can be parallelised when the base ring is a finite field. These are the versions of the algorithm for which parallelisation is currently supported:
SetNthreads(NT);where NT is the number of threads desired. Note that (as for linear algebra) the speedup will not necessarily be close to the number of threads, and hyperthreading may not be beneficial.
For the dense versions (second and third cases above), the GPU mode is automatically selected by default in the CUDA executable when running on a computer with CUDA support; this can be disabled by first calling SetGPU(false);. A hybrid mode (POSIX threads and GPU combined) can be selected by also setting the number of threads while running the CUDA executable; this is more likely to be beneficial in the Special HFE Dense version.
Once parallelisation is selected by one of the above methods, the parallel F4 algorithm can be called by the procedure Groebner or the function GroebnerBasis and the other functions which take a multivariate polynomial ideal defined over a finite field and implicitly create a Gröbner basis first (such as Dimension and Variety).
The General Dense algorithm can be selected by passing the parameter Dense to the Groebner or GroebnerBasis functions, but this will also be automatically selected if the input polynomials are considered dense, while passing Dense := false will force the Standard (non-dense) algorithm to be used.
The Special HFE dense algorithm is selected by passing the parameters Dense and HFE to the Groebner or GroebnerBasis functions. (As before, Dense may be omitted if the input is already dense.)
In this example we solve (n, k, r)-MinRank problems by first using the General Dense F4 algorithm to compute a grevlex Gröbner basis of the relevant ideal (based on the minors of a symbolic matrix with linear forms) and then using a block Wiedemann algorithm to compute the variety of the ideal (see [Ste15] for more details). After choosing a parameter r, a hard instance can be found with by taking n=r + 3 and k=(n - r)2. The runs are on a server with dual Intel 3.2GHz Gold 6146 CPUs with 24 cores total, and an NVIDIA V100 GPU.
> K := GF(PreviousPrime(2^16)); > r := 6; n := r + 3; k := (n - r)^2; > n, k, r; 9 9 6 > time B := MinRankSystem(K, n, k, r); Time: 160.030 > I := Ideal(B); > I: Minimal; Ideal of Polynomial ring of rank 9 over GF(65521) > // Single thread > SetNthreads(1); > // DynamicStrategy handles large reduction to zero better: > time Groebner(I: PairsLimit := 2000, Dense, DynamicStrategy); Time: 2511.040 > QuotientDimension(I); 41580 > time V := Variety(I: Al := "Wiedemann"); Time: 2449.620 > V; [ <40578, 37019, 22371, 45628, 40625, 2374, 42277, 33586, 51176> ] > // Check solution is correct: > time { Evaluate(f, V[1]): f in B }; { 0 } Time: 1.320 > // 24 threads > SetNthreads(24); > I := Ideal(B); // Recreate ideal > time Groebner(I: PairsLimit := 2000, Dense, DynamicStrategy); Time: 548.030[r] > time V := Variety(I: Al := "Wiedemann"); Time: 539.280[r] > V; [ <40578, 37019, 22371, 45628, 40625, 2374, 42277, 33586, 51176> ] > // GPU > SetNthreads(1); SetGPU(true); > I := Ideal(B); // Recreate ideal > time Groebner(I: PairsLimit := 2000, Dense, DynamicStrategy); Time: 477.280 > time V := Variety(I: Al := "Wiedemann"); Time: 520.450 > V; [ <40578, 37019, 22371, 45628, 40625, 2374, 42277, 33586, 51176> ]Taking r=7 gives a much bigger system where the parallel speedup is greater, especially for the Wiedemann stage which completely depends on several large matrix multiplications.
> K := GF(PreviousPrime(2^16)); > r := 7; n := r + 3; k := (n - r)^2; > n, k, r; 10 9 7 > time B := MinRankSystem(K, n, k, r); Time: 305.080 > I := Ideal(B); > I: Minimal; Ideal of Polynomial ring of rank 9 over GF(65521) > #B; 2025 > {* Length(f): f in B *}; {* 22283^^5, 22284^^83, 22285^^490, 22286^^1447 *}
With one thread only:
> SetNthreads(1); > time Groebner(I: PairsLimit := 5000, Dense, DynamicStrategy); Time: 16108.420 > QuotientDimension(I); 108900 > time V := Variety(I: Al := "Wiedemann"); Time: 35348.420 > V; [ <38565, 61856, 56127, 27828, 31304, 30257, 38178, 28576, 24230> ] > // Check solution is correct: > time { Evaluate(f, V[1]): f in B }; { 0 } Time: 4.220
With 24 threads:
> SetNthreads(24); > I := Ideal(B); // Recreate ideal > time Groebner(I: PairsLimit := 5000, Dense, DynamicStrategy); Time: 2582.500[r] > time V := Variety(I: Al := "Wiedemann"); Time: 5317.040[r] > V; [ <38565, 61856, 56127, 27828, 31304, 30257, 38178, 28576, 24230> ]
With one thread and a GPU:
> SetNthreads(1); SetGPU(true); > I := Ideal(B); // Recreate ideal > time Groebner(I: PairsLimit := 5000, Dense, DynamicStrategy); Time: 2127.830 > time V := Variety(I: Al := "Wiedemann"); Time: 4316.960 > V; [ <38565, 61856, 56127, 27828, 31304, 30257, 38178, 28576, 24230> ]
> function RandomDenseSystem(K, n, m, d) > P<[x]> := PolynomialRing(K, n, "grevlex"); > B := [ > &+[Random(K)*s: s in MonomialsOfDegree(P, k), k in [0 .. d]]: > i in [1 .. m] > ]; > sol := [Random(K): j in [1 .. n]]; > return [f - Evaluate(f, sol): f in B]; > end function; > n := 27; B := RandomDenseSystem(GF(31), n, 2*n, 2); > // Single thread only > time G := GroebnerBasis(B, 6: Dense, PairsLimit := 5000); Time: 2799.290 > G; [ x[1] + 12, x[2] + 19, x[3] + 8, x[4] + 10, x[5] + 25, ... x[27] + 21 ] > Variety(Ideal(G)); [ <19, 12, 23, 21, 6, 27, 19, 14, 18, 17, 10, 14, 29, 25, 12, 0, 12, 13, 30, 13, 7, 17, 12, 30, 29, 22, 10> ] > // 24 threads > SetNthreads(24); > time G := GroebnerBasis(B, 6: Dense, PairsLimit := 5000); Time: 815.190[r] > // GPU > SetNthreads(1); SetGPU(true); > time G := GroebnerBasis(B, 6: Dense, PairsLimit := 5000); Time: 519.140Taking n=32 gives a much harder system where the parallel speedup is greater.
> n := 32; B := RandomDenseSystem(GF(31), n, 2*n, 2); > // Single thread only > time G := GroebnerBasis(B, 6: Dense, PairsLimit := 10000); Time: 156326.220 > Basis(I); [ x[1] + 12, x[2] + 19, x[3] + 8, x[4] + 10, x[5] + 25, ... x[27] + 21 ] > Variety(Ideal(G)); [ <19, 12, 23, 21, 6, 27, 19, 14, 18, 17, 10, 14, 29, 25, 12, 0, 12, 13, 30, 13, 7, 17, 12, 30, 29, 22, 10> ] > // 24 threads > SetNthreads(24); > I := Ideal(B); // Recreate ideal > time G := GroebnerBasis(B, 6: Dense, PairsLimit := 10000); Time: 21921.790[r] > // GPU > SetNthreads(1); SetGPU(true); > I := Ideal(B); // Recreate ideal > time G := GroebnerBasis(B, 6: Dense, PairsLimit := 10000); Time: 13335.090
In this example we compute the Gröbner basis of a 80-variable HFE multivariate polynomial system over GF(2) with secret degree 100 to solve the underlying cryptosystem. Since the secret degree is 100, it suffices to compute pairs up to total degree 4. On the 3.6GHz Intel Core i7-7700, using 4 threads gives a speedup of about 3.4 over using 1 thread.
> B := HFESystem(2, 80, 100); > Universe(B); Polynomial ring of rank 80 over GF(2) Order: Graded Reverse Lexicographical Variables: x[1], x[2], x[3], x[4], x[5], x[6], ... > SetNthreads(1); > time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 1000); Time: 77.010 > [Sprint(f): f in G]; [ x[1] + 1, x[2] + 1, x[3], x[4], x[5], x[6], x[7], x[8], ... x[75] + 1, x[76] + 1, x[77] + 1, x[78], x[79] + 1, x[80] + 1 ] > Variety(Ideal(G)); [ <1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1> ] > SetNthreads(4); > time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 1000); Time: 26.990[r]
A much larger example has 200 variables with secret degree 100 again. On the 3.6GHz Intel Core i7-7700, using 4 threads gives a speedup of about 3.4 over using 1 thread.
> B := HFESystem(2, 200, 100); > Universe(B); Polynomial ring of rank 200 over GF(2) Order: Graded Reverse Lexicographical Variables: x[1], x[2], x[3], x[4], x[5], x[6], ... > SetNthreads(1); > time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 2500); Time: 8997.120 > [Sprint(f): f in G]; [ x[1] + 1, x[2], x[3], x[4] + 1, x[5] + 1, x[6], x[7], x[8] + 1, ... x[195], x[196] + 1, x[197] + 1, x[198] + 1, x[199] + 1, x[200] ] > Variety(Ideal(G)); [ <1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, ... 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0> ] > SetNthreads(4); > time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 2500); Time: 2618.300[r]
The same polynomial system is here solved on a server with dual Intel 3.2GHz Gold 6146 CPUs with 24 cores total, and a NVIDIA V100 GPU. The GPU helps the echelonisation phase at the end of each step, so gives a moderate improvement.
> B := HFESystem(2, 200, 100); > SetNthreads(1); > time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 2500); Time: 6227.490 > SetNthreads(24); > time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 2500); Time: 786.670[r] > SetNthreads(1); SetGPU(true); > time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 2500); Time: 700.450[r]