This chapter provides basic instructions for using the parallelism
facilities provided in the current release of Magma. This version of
Magma provides some general tools for implementing parallelism as
well as the explicit parallelisation of some key algorithms in the
areas of linear algebra, commutative algebra, integral lattices and
linear codes.
The fact that Moore's law no longer applies to the development of
ever faster computers has greatly increased interest in exploiting
multiple processors for computations that use large amounts
of CPU time. In 2019 a project was commenced to develop tools
for parallelising some classes of computations in Magma.
Two approaches are being taken:
- (i)
- The parallelisation of key algorithms in the Magma
kernel. This utilises multiple cores on a single compute node
using POSIX threads, multiple machines by distributing tasks
across a network, and GPUs.
- (ii)
- A framework for the manager/worker model of parallelism
has been constructed which allows users to parallelise their Magma
code when the underlying algorithm lends itself to being parallelised.
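The manager/worker model mentioned in (ii) can be sketched in a few lines of Python (used here purely for illustration; Magma's own framework and its intrinsics are documented later in this chapter). A manager splits a job into independent tasks, a pool of workers consumes them, and the manager combines the results:

```python
# Minimal sketch of the manager/worker model: the manager partitions a
# job into independent tasks, workers process them in parallel, and the
# results are combined. (Illustrative only; not Magma's actual API.)
from multiprocessing import Pool

def worker(task):
    # Each worker handles one independent task; here, summing a range.
    lo, hi = task
    return sum(range(lo, hi))

def manager(n, chunks=4):
    # Split [0, n) into chunks and distribute them to a worker pool.
    step = n // chunks
    tasks = [(i * step, (i + 1) * step if i < chunks - 1 else n)
             for i in range(chunks)]
    with Pool(chunks) as pool:
        return sum(pool.map(worker, tasks))
```

This pattern pays off exactly when the tasks are independent, which is the condition stated above for user code to lend itself to parallelisation.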
A key element of this project has been the development of tools that
support asynchronous communication between copies of Magma running
on different machines. As part of this work, the serialisation of
Magma objects is being undertaken. This machinery is described in
the chapter on Input and Output.
In Magma, when parallelism is selected, the execution is
parallelised for the following intrinsics, or for intrinsics
which are called (directly or indirectly) in the listed algorithms:
- (1)
- Multiplication of matrices over a finite field GF(p^k),
where p is a prime such that 2 ≤ p ≤ 2^23.5.
As a consequence, other intrinsics such as
EchelonForm, inverse (A^-1), Rank,
Determinant, Nullspace, RowSpace,
Solution, IsConsistent, and MinimalPolynomial
(if the Keller-Gehrig algorithm is specified) are
also parallelised.
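The parallelism in item (1) comes from the fact that disjoint row blocks of a matrix product can be computed independently. A hedged Python sketch of that row-partitioning idea (Magma's kernel uses highly optimised native code, so this only illustrates the decomposition, not the actual implementation):

```python
# Sketch of parallelising C = A*B over GF(p) by splitting the rows of A
# among workers; each row block is an independent task.
from concurrent.futures import ThreadPoolExecutor

def row_block(A_rows, B, p):
    # Multiply a block of rows of A by B, reducing mod p.
    k, n = len(B), len(B[0])
    return [[sum(r[t] * B[t][j] for t in range(k)) % p for j in range(n)]
            for r in A_rows]

def matmul_mod(A, B, p, workers=4):
    # Partition A's rows into blocks and multiply the blocks in parallel.
    step = max(1, len(A) // workers)
    blocks = [A[i:i + step] for i in range(0, len(A), step)]
    with ThreadPoolExecutor(workers) as ex:
        parts = ex.map(lambda blk: row_block(blk, B, p), blocks)
    return [row for part in parts for row in part]
```

Since echelonisation, rank, determinant and the other intrinsics listed above reduce to matrix multiplication, speeding up the product speeds them all up.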
- (2)
- Construction of a Gröbner basis for a system of
polynomials defined over a finite field GF(p^k),
where either k=1 and p is a prime such that
2 ≤ p < 2^63, or 2 < p^k ≤ 2^20.
Includes an optimised asynchronous version for the dense
variant of the Faugère F4 algorithm over GF(2).
- (3)
- Construction of the affine variety (by calling Variety(I))
of an ideal
of a multivariate polynomial ring defined over a small finite
field K of size q (by exhaustive evaluation of k
variables at a time at all q elements of K, thus yielding
many smaller systems to solve in parallel). The point-counting
variant VarietySize(I) returns only the number of solutions,
without constructing them.
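The splitting idea behind item (3) can be sketched in Python over GF(2): fixing the first k variables at each of the q^k possible values yields q^k independent subsystems whose solution sets partition the variety. The example system below is hypothetical, and the brute-force subsystem solver merely stands in for Magma's Gröbner-basis machinery:

```python
# Sketch of parallel variety computation by exhaustive evaluation of
# the first SPLIT variables over GF(2): each assignment gives an
# independent subsystem, solved here by brute force.
from itertools import product
from multiprocessing import Pool

# Hypothetical example system over GF(2) in variables x0..x3.
def system(x):
    x0, x1, x2, x3 = x
    return ((x0 + x1 + x2) % 2, (x1 * x3 + x0) % 2)

K, NVARS, SPLIT = (0, 1), 4, 2  # field elements, #variables, #fixed vars

def solve_branch(prefix):
    # Solve one subsystem: the first SPLIT variables fixed to `prefix`.
    return [prefix + rest for rest in product(K, repeat=NVARS - SPLIT)
            if all(v == 0 for v in system(prefix + rest))]

def variety():
    # The union of the branch solutions is the full variety.
    with Pool(4) as pool:
        branches = pool.map(solve_branch, list(product(K, repeat=SPLIT)))
    return sorted(pt for br in branches for pt in br)
```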
- (4)
- Multiplication of high-degree univariate polynomials
defined over prime finite fields or integer residue rings with
large integer coefficients;
multiplication of high-degree univariate polynomials
defined over finite fields of characteristic 2
(Karatsuba and Cantor algorithms).
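The Karatsuba scheme named in item (4) is naturally parallelisable because its three recursive subproducts are independent. A sequential sketch of the recursion itself (coefficient lists are low-degree-first; any field reduction is left to the caller, and Magma's kernel implementation is of course far more optimised):

```python
# Karatsuba multiplication of dense univariate polynomials: split each
# operand at degree m, form three half-size products, and recombine.
def poly_add(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def poly_sub(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return [x - y for x, y in zip(a, b)]

def shift(a, k):
    # Multiply a polynomial by x^k.
    return [0] * k + a

def karatsuba(a, b):
    if len(a) <= 1 or len(b) <= 1:  # base case: schoolbook product
        return [x * y for x in a for y in b] if a and b else [0]
    m = max(len(a), len(b)) // 2
    a0, a1, b0, b1 = a[:m], a[m:], b[:m], b[m:]
    low = karatsuba(a0, b0)           # these three recursive products
    high = karatsuba(a1, b1)          # are independent, hence suitable
    mid = poly_sub(karatsuba(poly_add(a0, a1), poly_add(b0, b1)),
                   poly_add(low, high))  # for parallel evaluation
    return poly_add(poly_add(low, shift(mid, m)), shift(high, 2 * m))
```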
- (5)
- Bivariate resultant of two univariate polynomials (this also
speeds up factorisation of polynomials in Z[x]).
- (6)
- Factorisation of large integers (via ECM or MPQS algorithms).
- (7)
- LLL reduction of an integral lattice or basis matrix.
- (8)
- Enumeration of vectors in an integral lattice having
small norm (directly in intrinsics ShortVectors and
ShortestVectors). As a consequence the lattice
intrinsics Minimum, KissingNumber,
PackingRadius, HermiteNumber, CentreDensity,
Density, and ThetaSeries, amongst others, are also
effectively parallelised.
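The enumeration in item (8) parallelises because disjoint ranges of the leading coefficient give independent search branches. A deliberately naive Python sketch of that branching (Magma uses pruned Fincke-Pohst enumeration; the example basis, box radius, and norm bound below are hand-picked assumptions for illustration only):

```python
# Naive sketch of short-vector enumeration: walk a box of integer
# coefficient vectors for the basis rows, keep vectors of small norm,
# and split the work on the first coefficient.
from itertools import product
from multiprocessing import Pool

BASIS = [[2, 0], [1, 2]]   # example integral lattice basis (rows)
RADIUS = 2                 # coefficient box half-width (crude bound)
NORM_BOUND = 8             # collect v with 0 < |v|^2 <= NORM_BOUND

def branch(c0):
    # Enumerate all coefficient vectors whose first entry is c0.
    found = []
    for rest in product(range(-RADIUS, RADIUS + 1), repeat=len(BASIS) - 1):
        coeffs = (c0,) + rest
        v = [sum(c * row[j] for c, row in zip(coeffs, BASIS))
             for j in range(len(BASIS[0]))]
        n = sum(x * x for x in v)
        if 0 < n <= NORM_BOUND:
            found.append((n, tuple(v)))
    return found

def short_vectors():
    # Each first-coefficient value is an independent branch.
    with Pool(4) as pool:
        parts = pool.map(branch, range(-RADIUS, RADIUS + 1))
    return sorted(v for part in parts for v in part)
```

The smallest norm found by such an enumeration is what intrinsics like Minimum report, which is why they inherit the parallel speed-up.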
- (9)
- Enumeration of vectors in an integral lattice that are
close to a given vector, which need not be in the lattice
(CloseVectors).
- (10)
- Minimum weight of a linear error-correcting code defined
over a finite field. As well as MinimumWeight,
the related intrinsics MinimumDistance and
MinimumWord are also parallelised.
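The brute-force core of the minimum-weight computation in item (10) splits cleanly: the nonzero message vectors are partitioned among workers, each of which reports the smallest codeword weight it encounters. A Python sketch for a binary code (Magma uses far more sophisticated information-set techniques; the generator matrix below, for the [7,4] Hamming code, is just a worked example):

```python
# Sketch of minimum weight for a binary linear code by parallel
# enumeration of messages: each worker encodes a chunk of messages and
# returns the smallest nonzero codeword weight it sees.
from itertools import product
from multiprocessing import Pool

G = [[1, 0, 0, 0, 0, 1, 1],  # generator matrix of the [7,4] Hamming code
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]

def weight_of_chunk(messages):
    # Encode each message over GF(2) and track the minimum weight.
    best = len(G[0]) + 1
    for m in messages:
        word = [sum(mi * gij for mi, gij in zip(m, col)) % 2
                for col in zip(*G)]
        best = min(best, sum(word) or best)
    return best

def minimum_weight(workers=4):
    # Partition the nonzero messages into chunks, one task per chunk.
    msgs = [m for m in product((0, 1), repeat=len(G)) if any(m)]
    step = max(1, len(msgs) // workers)
    chunks = [msgs[i:i + step] for i in range(0, len(msgs), step)]
    with Pool(workers) as pool:
        return min(pool.map(weight_of_chunk, chunks))
```

Since MinimumDistance of a linear code equals the minimum weight, the related intrinsics listed above benefit from the same decomposition.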
- (11)
- Weight distribution of a linear error-correcting code
defined over a finite field. As well as WeightDistribution
the related intrinsics WeightEnumerator and
DualWeightDistribution are also parallelised.
- (12)
- Class group of a number field or an order (intrinsic ClassGroup).
V2.28, 13 July 2023