Linear Codes over Finite Fields

Introduction

Let K be a finite field and let V be the vector space of n-tuples over K. The Hamming distance between elements x and y of V, denoted d(x,y), is defined by

d(x,y) = #{ 1 ≤ i ≤ n  |  x_i ≠ y_i }.

The minimum distance d for a subset C of V is then

d = min{ d(x,y)  |  x ∈ C, y ∈ C, x ≠ y }.

The subset C of V is called an (n, M, d) code if the minimum distance for the subset C is d and |C| = M. The code C is called an [n, k, d] linear code if C is a k-dimensional subspace of V.
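These definitions can be computed directly for a small code. The following Python sketch (function names are illustrative, not part of any package) treats vectors as tuples over GF(2):

```python
from itertools import combinations

def hamming_distance(x, y):
    """d(x, y): the number of coordinates in which x and y differ."""
    return sum(xi != yi for xi, yi in zip(x, y))

def minimum_distance(C):
    """Minimum of d(x, y) over all distinct pairs of codewords of C."""
    return min(hamming_distance(x, y) for x, y in combinations(C, 2))

# The binary [3, 1, 3] repetition code, a 1-dimensional subspace of GF(2)^3.
C = [(0, 0, 0), (1, 1, 1)]
print(minimum_distance(C))  # 3
```

For a linear code the minimum distance equals the minimum weight of a non-zero codeword, since d(x, y) = d(x − y, 0).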

In this section, the term “code” will refer to a linear code. Machinery is provided for studying linear codes over finite fields GF(q), over integer residue class rings ℤ_m = ℤ/mℤ, and over Galois rings GR(p^n, k). Magma supports not only linear codes, but also codes over finite fields which are linear only over some subfield. These are known as additive codes and are covered in a subsequent section. At present, Magma provides support for block codes but not convolutional codes.

Constructions for Codes

One of the strengths of the Magma coding theory package is the provision of a huge number of tools for constructing linear codes. Firstly, all of the standard ways of defining a linear code are present: by generator matrix, parity check matrix, set of vectors belonging to V, generator polynomial, etc. Secondly, a very large number of constructions are provided which build a new code from one or more existing codes. Examples are the direct sum and product of codes, the Plotkin sum, concatenated codes, Construction X, and many others. The third group of construction functions implements families of interesting codes such as cyclic codes, BCH codes, Hamming codes, Reed–Muller codes, and Reed–Solomon codes, as well as more exotic codes such as quasi-twisted cyclic codes, Chien–Choy codes, alternant codes, etc. Practically all of the constructions found in The Handbook of Coding Theory are available.
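To give the flavour of the second group, here is a minimal Python sketch (not Magma code) of one such construction, the direct sum, whose generator matrix is block-diagonal in the generator matrices of the summands:

```python
def direct_sum(G1, G2):
    """Generator matrix of the direct sum of two codes: the block-diagonal
    matrix [[G1, 0], [0, G2]], with matrices stored as lists of rows."""
    n1, n2 = len(G1[0]), len(G2[0])
    return ([row + [0] * n2 for row in G1] +
            [[0] * n1 + row for row in G2])

# Direct sum of the [3, 1, 3] and [2, 1, 2] binary repetition codes,
# giving a [5, 2, 2] code.
print(direct_sum([[1, 1, 1]], [[1, 1]]))
# [[1, 1, 1, 0, 0], [0, 0, 0, 1, 1]]
```

The parameters combine as expected: lengths and dimensions add, while the minimum distance of the sum is the smaller of the two minimum distances.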

Functions are provided which give lower and upper bounds for the parameters n, k, and d associated with codes. There are also tables of the best known bounds for linear codes. Of particular interest are databases of the best known linear codes. An [n,k] linear code C is said to be a best known linear [n,k] code (BKLC) if C has the highest minimum weight among all known [n,k] linear codes. An [n,k] linear code C is said to be an optimal linear [n,k] code if the minimum weight of C achieves the theoretical upper bound on the minimum weight of [n,k] linear codes. Magma currently has databases of best known linear codes over GF(q) for q = 2, 3, 4, 5, 7, 8, 9. Given any two of the parameters length, dimension, and minimum weight, Magma will return the code with the best possible value of the omitted parameter. Specifying the length and minimum weight, for example, will result in a corresponding code of the maximal possible dimension.

The database for codes over GF(2) contains constructions of best codes of every dimension in GF(2)-vector spaces of length up to 256. The codes of length up to 31 are optimal. The GF(2) database is complete in the sense that it contains a construction for every set of parameters. Thus the user has access to the 33,152 best known binary codes covering every dimension and block length up to 256. Partial lists of best known codes are available for the other fields GF(q) listed above. The databases are virtual in the sense that most codes are created, when requested, from a small number of “seed codes”. These databases were constructed over the period 2000–2008 by J. Cannon (Magma), M. Grassl (Karlsruhe), and G. White (Magma).

A small amount of machinery is provided for constructing Algebraic Geometric codes. The most important tools are functions implementing the Goppa method for constructing a code from an irreducible plane curve defined over GF(q). Other functions enable the user to construct a code based on toric geometry, specifically from the lattice points of a polygon.
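The polygon construction can be sketched quite directly: a toric code is obtained by evaluating the monomials x^a y^b attached to the lattice points of the polygon at every point of the torus (GF(q)*)^2. The Python below is purely illustrative (prime q only, and not Magma's actual interface):

```python
def toric_code_generator(points, q):
    """Generator matrix of a toric code (sketch, prime q only): evaluate
    the monomial x^a y^b for each lattice point (a, b) of the polygon at
    every point of the torus (GF(q)*)^2, working with integers mod q."""
    torus = [(x, y) for x in range(1, q) for y in range(1, q)]
    return [[pow(x, a, q) * pow(y, b, q) % q for (x, y) in torus]
            for (a, b) in points]

# The four lattice points of the unit square give a [16, 4] code over GF(5).
G = toric_code_generator([(0, 0), (1, 0), (0, 1), (1, 1)], 5)
```

The row attached to (0, 0) is the all-ones vector, and distinct monomials give independent rows, so here the dimension equals the number of lattice points.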

Minimum Weight and Weight Distribution

Perhaps the key invariant of a linear code is its minimum weight, as this equals its minimum distance which, in turn, determines the number of errors the code is guaranteed to be able to correct. Much effort has been invested in designing and implementing fast algorithms for minimum weight determination. For non-cyclic linear codes an algorithm of A.E. Brouwer and K.H. Zimmermann is used. If the code is cyclic, a more efficient variant of the Brouwer–Zimmermann algorithm due to A.E. Brouwer is used. G. White generalised this improvement to quasi-cyclic codes and to any codes whose known automorphisms have large cycles.
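For contrast, the brute-force baseline simply encodes all 2^k − 1 non-zero messages and keeps the lightest codeword; the algorithms above avoid this full enumeration by working with several information sets. A Python sketch of the baseline, for the binary case only:

```python
from itertools import product

def minimum_weight(G):
    """Brute-force minimum weight of a binary linear code given by a
    generator matrix G (a list of rows): encode all 2^k - 1 non-zero
    messages and take the minimum Hamming weight."""
    best = len(G[0])
    for msg in product([0, 1], repeat=len(G)):
        if any(msg):
            word = [sum(m * g for m, g in zip(msg, col)) % 2
                    for col in zip(*G)]
            best = min(best, sum(word))
    return best

# [7, 4] Hamming code: minimum weight (= minimum distance) is 3.
G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]
print(minimum_weight(G))  # 3
```

The exponential cost in k is exactly why the Brouwer–Zimmermann approach matters for codes of any serious dimension.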

Both of the above minimum weight algorithms are very easy to parallelise, in such a way that the speed-up factor approaches the number of processors as that number becomes large. As an example, the minimum weight of the hull of the [273, 102] binary code generated by the lines of the dual Dempwolff projective plane of order 16 (the hull being a [273, 101] code) was found to be 24. It is sometimes desirable to list all of the codewords of minimum weight, and for this a variation of the Brouwer–Zimmermann minimum weight algorithm is used.

Algorithms are also provided for determining the weight distribution of a linear code. The general algorithm for this problem involves enumerating the codewords, so it is only useful for codes of small dimension. For example, the weight distribution of a [32210, 10] linear code over GF(11) was found in 63,950 seconds running on a single processor. Various functions are provided for computing the weight enumerator or complete weight enumerator.
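A Python sketch of the enumeration-based algorithm for the binary case (illustrative only); as noted, its cost grows with 2^k, so it is feasible only for small dimension:

```python
from itertools import product
from collections import Counter

def weight_distribution(G):
    """Number of codewords of each weight, found by encoding all 2^k
    messages for a binary generator matrix G (a list of rows)."""
    dist = Counter()
    for msg in product([0, 1], repeat=len(G)):
        word = [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G)]
        dist[sum(word)] += 1
    return sorted(dist.items())

# Direct sum of two [3, 1, 3] repetition codes: weights 0, 3, 3, 6.
print(weight_distribution([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]))
# [(0, 1), (3, 2), (6, 1)]
```

The weight enumerator polynomial is obtained from this list by reading each pair (w, a) as the term a x^(n−w) y^w.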

Equivalence and Automorphisms

The equivalence of two linear codes may be determined for codes defined over finite fields GF(q), where q is a small prime or 4. The algorithm is due to J. Leon. An algorithm developed by W. Unger (Magma) is capable of handling much larger codes (especially codes having large block lengths). At present it applies only to binary codes, but it will be extended to other finite fields in the near future.

The automorphism group of a linear code C over the field K is the group A of all monomial-action permutations which preserve C. Thus, both permutation of coordinates and multiplication of components by non-zero elements of K are allowed. Variations of both the Leon and Unger equivalence algorithms are provided for computing the automorphism group of a linear code. The restrictions on the finite field that apply for testing equivalence also apply here.
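Over GF(2) the only non-zero scalar is 1, so the monomial action reduces to coordinate permutations, and by linearity it suffices to check the images of a set of generators. The Python sketch below illustrates the membership test only (real algorithms such as Leon's avoid enumerating the code):

```python
from itertools import product

def codewords(G):
    """The span of the rows of G over GF(2), as a set of tuples."""
    return {tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G))
            for msg in product([0, 1], repeat=len(G))}

def is_automorphism(G, perm):
    """Test whether the coordinate permutation i -> perm[i] preserves the
    code: by linearity, check only the images of the generators."""
    C = codewords(G)
    return all(tuple(row[perm[i]] for i in range(len(row))) in C for row in G)

# [7, 4] Hamming code: swapping coordinates 0,1 and 4,5 together preserves
# the code, while swapping 0,1 alone does not.
G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]
print(is_automorphism(G, [1, 0, 2, 3, 5, 4, 6]))  # True
print(is_automorphism(G, [1, 0, 2, 3, 4, 5, 6]))  # False
```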

Decoding

Currently, two methods are provided: syndrome decoding and a Euclidean decoding algorithm. The syndrome algorithm applies to any code but requires storage of a set of coset representatives for the code with respect to its parent vector space. Consequently, it is restricted to linear codes of small codimension. The Euclidean algorithm decoder applies to any alternant code, a class which includes BCH, Goppa, and Reed–Solomon codes amongst others.
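A Python sketch of syndrome decoding for the binary case (illustrative, not Magma's implementation): the table of coset leaders has 2^(n−k) entries, which is exactly the storage cost mentioned above.

```python
from itertools import product

def syndrome(H, v):
    """H v^T over GF(2), as a tuple."""
    return tuple(sum(h * x for h, x in zip(row, v)) % 2 for row in H)

def syndrome_table(H):
    """Map each of the 2^(n-k) syndromes to a minimum-weight coset leader;
    enumerating vectors by increasing weight makes the first hit a leader."""
    table = {}
    for v in sorted(product([0, 1], repeat=len(H[0])), key=sum):
        table.setdefault(syndrome(H, v), v)
    return table

def decode(H, table, r):
    """Subtract the coset leader of r's coset to reach a nearest codeword."""
    e = table[syndrome(H, r)]
    return tuple((ri - ei) % 2 for ri, ei in zip(r, e))

# [7, 4] Hamming code: the table has 2^3 = 8 entries and corrects one error.
H = [[0, 1, 1, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [1, 1, 0, 1, 0, 0, 1]]
table = syndrome_table(H)
# Codeword 1000011 received with an error in the last coordinate.
print(decode(H, table, [1, 0, 0, 0, 0, 1, 0]))  # (1, 0, 0, 0, 0, 1, 1)
```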

Low Density Parity Check Codes

As their name suggests, low density parity check (LDPC) codes are block codes with parity-check matrices H containing only a very small number of non-zero entries. It is the sparseness of H which guarantees both a decoding complexity that increases only linearly with the code length and a minimum distance that also increases linearly with the code length. LDPC codes are designed by first constructing a sparse parity-check matrix and then determining a generator matrix for the code. The biggest difference between LDPC codes and classical block codes is the manner in which they are decoded. Classical block codes are generally decoded using maximum likelihood decoding (decoding a received vector into a closest codeword) and so are usually short. LDPC codes, however, are decoded iteratively using a representation of their parity-check matrix as a graph.

Magma provides machinery for constructing an LDPC code C either by choosing a parity-check matrix from some random distribution or by using a standard construction such as the group-based construction of Margulis. All of the standard operations for linear codes over GF(q) apply to LDPC codes. Special operations implemented for an LDPC code include the construction of its Tanner graph and the calculation of its theoretical rate from an ensemble used to construct C.

LDPC codes of practical interest are usually chosen to have very large block lengths so that it is not possible to determine the minimum weight and related properties. Instead, their usefulness is determined by simulation. Magma supports two different channel models: the binary symmetric channel (BSC) and the Gaussian channel. The standard iterative decoding algorithm for LDPC codes is provided. For a given LDPC code, a simulation function is provided which simulates N transmissions across the given channel and returns the accumulated bit-error rate and word-error rate. The asymptotic performance of ensembles of LDPC codes can be determined using density evolution. An ensemble of LDPC codes is defined by a pair of degree distributions, corresponding to the degrees at the variable and check nodes of the Tanner graph. Density evolution may be applied to an ensemble of codes with respect to either the BSC or the Gaussian channel models.
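To illustrate iterative decoding on the Tanner graph, here is a toy bit-flipping decoder for the BSC in Python (the standard iterative decoder mentioned above is the message-passing algorithm; this cruder variant merely exhibits the idea, and the small matrix below, the cycle code of the complete graph K4, stands in for a genuinely sparse LDPC matrix):

```python
def bit_flip_decode(H, received, max_iters=20):
    """Toy bit-flipping decoder over the BSC: repeatedly flip the bits
    that occur in the largest number of unsatisfied parity checks."""
    r = list(received)
    n = len(r)
    for _ in range(max_iters):
        checks = [sum(h * x for h, x in zip(row, r)) % 2 for row in H]
        if not any(checks):
            return r                      # every parity check is satisfied
        votes = [sum(c for row, c in zip(H, checks) if row[i])
                 for i in range(n)]
        worst = max(votes)
        r = [x ^ 1 if votes[i] == worst else x for i, x in enumerate(r)]
    return r                              # give up: decoding failure

# Parity-check matrix of the cycle code of K4 (every column has weight 2);
# correct a single bit error in the last coordinate.
H = [[1, 1, 1, 0, 0, 0],
     [1, 0, 0, 1, 1, 0],
     [0, 1, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1]]
print(bit_flip_decode(H, [1, 1, 0, 1, 0, 1]))  # [1, 1, 0, 1, 0, 0]
```

Because each bit participates in exactly two checks here, a single error makes both of its checks fail and only that bit collects the maximal vote, so one flip suffices.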

The McEliece Cryptosystem

The McEliece cryptosystem is a public-key cryptosystem whose security is based on the difficulty of decoding a general linear code. Interest in this cryptosystem has grown recently owing to the belief that it is one of the few current public-key cryptosystems immune to attack by quantum computers. An attack on the McEliece cryptosystem must determine the coset leader (of known weight) of a given error coset. Magma contains implementations of a number of attacks on McEliece. These include the attacks of McEliece, Lee–Brickell, Leon, Stern and Canteaut–Chabaud. In addition, the user may choose any combination of the implemented methods, which include improvements on the standard attacks.
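The simplest attack of this kind, due to Prange, already shows the shape of the problem: guess an information set, solve for the message under the assumption that no errors fall in it, and accept if the resulting error vector has the known weight t. The later attacks listed above refine this search. The Python toy below is merely illustrative (it runs through information sets exhaustively rather than at random, so it is usable only for tiny parameters):

```python
from itertools import combinations

def solve_gf2(A, b):
    """Solve the square system A m = b over GF(2); None if A is singular."""
    k = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if M[r][col]), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col and M[r][col]:
                M[r] = [(u + v) % 2 for u, v in zip(M[r], M[col])]
    return [M[r][k] for r in range(k)]

def prange_decode(G, y, t):
    """Find the weight-t coset leader e with y - e in the code spanned by G."""
    k = len(G)
    for I in combinations(range(len(G[0])), k):
        A = [[G[i][j] for i in range(k)] for j in I]   # columns of G on I
        m = solve_gf2(A, [y[j] for j in I])
        if m is None:
            continue                                   # not an information set
        c = [sum(mi * gi for mi, gi in zip(m, col)) % 2 for col in zip(*G)]
        e = [(yi + ci) % 2 for yi, ci in zip(y, c)]
        if sum(e) == t:
            return e                                   # coset leader found
    return None

# [7, 4] Hamming code; y is a codeword with a single error in position 2.
G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]
y = [1, 0, 0, 1, 0, 1, 0]
print(prange_decode(G, y, 1))  # [0, 0, 1, 0, 0, 0, 0]
```

Since this code has minimum distance 3, the weight-1 coset leader is unique, so the answer does not depend on which information set is found first.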

Use and Applications

The Magma facility for algebraic coding theory is superior to other packages in this field (noting that it is not intended to compete with packages designed for the engineering side of ECC). Most groups working in algebraic coding theory use Magma for a great diversity of applications. Typical uses involve the construction of special codes, the construction of combinatorial objects from codes, the search for better codes (here the minimum weight tools are fundamental), certain attacks on cryptosystems that reduce to finding low weight codewords, and the classification of linear codes of a certain type.

Approximately 400 published papers on coding theory that cite the use of Magma have been identified (the exact number is unknown). A major project that has been carried out by Japanese mathematicians over the past decade has been to classify all self-dual codes of various lengths. Here one of the roles played by Magma was to test codes for equivalence. Another project to which Magma has contributed over many years has been research in finite geometry. For example, it was used in 2008 to show that none of the non-desarguesian affine planes of order 16 are tame.