Allan Steel (December 2013)
UPDATE: Minrank Challenge C Solved Directly (August 2014)
Magma V2.20 (released December 2013) contains a new dense variant of the Faugere F4 algorithm for computing Groebner bases, which I developed in 2013. This page briefly outlines some basic information and timing results for this algorithm.
The dense variant of the F4 algorithm is currently of practical use only for input systems over a finite field whose polynomials are considered "dense"; that is, if the input polynomials are written as a matrix whose columns are labelled by the occurring monomials, then that matrix should be dense. Equivalently: if the field size is q and m is the number of distinct monomials occurring in the input, then the number of monomials in each input polynomial should be reasonably close to (1-1/q)*m.
Applicable dense input systems include various cryptographic inputs such as HFE, MQQ and Minrank systems (as seen below), but systems which do not involve the solving of simultaneous equations are also applicable. (The algorithm does work correctly on sparser input systems, but for such inputs it is usually less efficient than the standard F4 algorithm.)
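As a concrete way of checking this density criterion, one can compare the average number of terms per input polynomial against (1-1/q)*m. A minimal Magma sketch, assuming B is a non-empty sequence of input polynomials over a finite field:

```
// Sketch: estimate how dense the input system B is, assuming B is a
// non-empty sequence of polynomials over a finite field of size q.
q := #BaseRing(Universe(B));
mons := &join[Set(Monomials(f)) : f in B];   // all monomials in the input
m := #mons;
avg := &+[#Monomials(f) : f in B] / #B;      // average terms per polynomial
// The input counts as "dense" when avg is reasonably close to (1 - 1/q)*m:
printf "avg terms: %o, dense target: %o\n", avg, (1 - 1/q)*m;
```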
Some key features of the dense variant of the F4 algorithm are as follows:
| System | #vars | D | V2.19 | V2.20 | V2.20 GPU | V2.20 RH | V2.20 RH+GPU | Speedup Ratio |
|---|---|---|---|---|---|---|---|---|
| MQQ160 | 160 | 3 | 123.6 [2.6] | 48.2 [1.5] | 36.9 [1.5] | 10.8 [1.5] | 10.0 [1.5] | 12.4 [1.7] |
| HFE40_96 | 40 | 4 | 23.1 [0.25] | 9.7 [0.18] | 6.9 [0.18] | 6.3 [0.16] | 4.8 [0.16] | 4.8 [1.6] |
| HFE80_100 | 80 | 4 | 3477.1 [13.7] | 447.4 [5.7] | 220.7 [5.7] | 180.4 [2.2] | 86.0 [2.2] | 40.4 [6.2] |
| HFE140_100 | 140 | 4 | 217984.6 [384] | 17715.4 [142] | 5853.9 [142] | 4196.8 [26] | 1246.2 [26] | 174.9 [14.7] |
| HFE200_100 | 200 | 4 | 40227.7 [124] | 10906.3 [124] | | | | |
| HFE250_100 | 250 | 4 | 152294.4 [369] | 37405.4 [363] | | | | |
| HFE50_288 | 50 | 5 | 25571.9 [27.0] | 9332.9 [22.9] | 2165.4 [20.7] | 2488.9 [15.0] | 830.4 [13.5] | 30.8 [2.0] |
| HFE60_288 | 60 | 5 | 154219.2 [111.8] | 46371.0 [81.0] | 9737.0 [75.9] | 11570.3 [43.6] | 3218.6 [43.6] | 47.9 [2.6] |
| HFE80_288 | 80 | 5 | 118119.6 [275] | 34498.3 [276] | | | | |
Some comments:
The heuristic attempts to reduce the size of the matrices involved in the linear algebra phase of each F4 step. When the heuristic is selected, the run may simply fail, but when it succeeds, it often saves significant time and memory. The saving tends to be greatest for systems in which, at the F4 steps of maximal degree D, the number of S-polynomials is relatively small compared to the total number of monomials of degree D. The systems in the above table are all of this kind, which is why the heuristic gives such a large speedup for them.
Currently the heuristic depends on a manual choice by the user of a numerical parameter M, which is specified by passing the parameter ReductionHeuristic := M to the function GroebnerBasis together with a maximal degree bound D. Thus if B is the sequence of input polynomials, to select the Reduction Heuristic with parameter M, one should invoke the algorithm with something like:

```
GroebnerBasis(B, D: ReductionHeuristic := M);
```

where D is the expected maximal degree reached in the computation.
If M is large enough, then the computation finishes correctly, and hopefully in considerably less time than would be the case without the heuristic; there always exists some bound such that any M above it guarantees success. But if M is too small, the algorithm fails: a runtime error is usually triggered at the point where the algorithm realizes that M is too small. Conceivably, the algorithm could also fail to discover that M is too small, so that a wrong Groebner basis is returned (which makes the algorithm Monte-Carlo). However: (1) a value of M which is too small is usually detected by Magma (and a runtime error is triggered); (2) the algorithm can actually be considered Las Vegas when solving a typical system known to have at least one solution, since potential solutions can be trivially checked at the end (so, coupled with checking, the algorithm either finds correct solutions or simply fails).
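For instance, once candidate solutions have been read off from the computed basis, they can be verified directly against the original system. A minimal sketch, where S is an assumed sequence of candidate solution vectors for the input sequence B:

```
// Sketch: verify candidate solutions against the original input system.
// S is assumed to be a sequence of candidate solution vectors (sequences
// of field elements), B the original sequence of input polynomials.
good := [s : s in S | forall{f : f in B | Evaluate(f, s) eq 0}];
// If good is non-empty, provably correct solutions have been found;
// otherwise this run of the heuristic must be counted as a failure.
```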
At the moment, I cannot suggest a suitable choice of M for a given system: I do not yet know in general how to pick a value of M which is guaranteed to work or is reasonably close to optimal -- if I knew how to do this, it would be incorporated into the algorithm automatically (the heuristic is certainly very experimental at this stage)! In practice it is currently recommended to start with M=500 or 1000 and, on failure, to increase M successively by 500 or 1000, restarting from scratch each time, until the algorithm succeeds. Note that making M much larger than necessary reduces the effectiveness of the heuristic, so one should not just set M to a very large value.
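The restart strategy just described can be written as a simple loop, using Magma's try/catch to trap the runtime error signalled when M is too small (B and D as above; the step size 500 is just the suggested starting increment):

```
// Sketch: rerun with increasing M until the Reduction Heuristic succeeds.
M := 500;
done := false;
repeat
    try
        G := GroebnerBasis(B, D: ReductionHeuristic := M);
        done := true;
    catch e
        M +:= 500;   // M was too small: increase it and restart from scratch
    end try;
until done;
```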
It may seem inefficient if one has to try several runs with potential failure, but this is often alleviated as follows:
The values for M which were used in the above timings table are:
In the near future, I hope to explain this heuristic in detail and generalize and automate it as much as possible...
The Courtois Minrank cryptographic challenge ("Problem C" [2001]) has (n,k,r)=(11,9,8). In their ISSAC 2010 paper (Sec 5.3), Faugere, Safey El Din and Spaenlehauer analyzed this challenge for the largest 16-bit prime field K=GF(65521) and suggested that the best attack would be to perform 65521 Groebner basis computations of the system with one variable evaluated at each value of K. Here is a comparison of their result with what can be done with Magma's new dense variant of F4:
Thus the new Magma V2.20 time seems to be about 50 times faster than that of Faugere et al and clearly the challenge problem can now be easily solved on a moderately-sized cluster.
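The evaluation attack described above amounts to one Groebner basis computation per element of K. A minimal sketch, assuming B is the Minrank system over K = GF(65521), that the last variable is the one specialized, and with D and M as in the discussion of the Reduction Heuristic above:

```
// Sketch: evaluation approach -- specialize one variable at each a in K
// and run the dense F4 variant on each specialized system.
K := GF(65521);
n := Rank(Universe(B));              // number of variables in the system
for a in K do
    Ba := [Evaluate(f, n, a) : f in B];  // evaluate the last variable at a
    G := GroebnerBasis(Ba, D: ReductionHeuristic := M);
    // inspect G for solutions of the specialized system ...
end for;
```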
Using this, on 23 August 2014 I managed to solve a random instance over GF(65521) of the Courtois Minrank Challenge C with (n,k,r)=(11,9,8), using the direct approach with the Wiedemann algorithm (that is, without using the above evaluation method).
The current times for this direct method are:
Update (July 2015): Details on the computation (which can now be done in 15.1 hours with just one core and a K40 Tesla GPU) can be found in my PASCO 2015 paper.