Distributed Computation for Internal Algorithms

The term distributed computation refers to a computation performed on a number of nodes connected by a network. Magma uses the manager/worker model for its distributed computations. In this model, one of the nodes performing such a computation, the manager, breaks the computation up into small chunks, each of which it sends to one of the other nodes for execution. The latter nodes are referred to as workers.


Automatic Worker Startup

The following procedures (and the manual worker startup method in the next subsection) apply to the internal distributed parallel algorithms supported by Magma (currently lattice vector enumeration, code minimum weight and weight distribution, and integer factorisation).

Suppose that the manager is a machine called node1 and the worker nodes are node2, node3, etc. Since V2.27, one can start up worker nodes simply by calling the following procedure from within the manager job, before starting a computation which will use distributed computation.

StartWorkers(H, T) : MonStgElt, RngIntElt ->
    Magma: MonStgElt                    Default: 
Automatically start up a Magma worker on the remote host H (given by a string) with T threads to run on H. By default this procedure starts a non-login shell (via ssh) on the remote host H and assumes that the command to start Magma is called magma on H and is already in the path when that shell starts (determined by the PATH environment variable).

For the Z shell, for example, if the PATH variable is set and exported in the file .zshenv to include the directory containing 'magma', then the correct magma command will be found automatically, even in a non-login shell. For other shells, adjust the corresponding startup file similarly so that 'magma' is in the path of a non-login shell by default.
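
For instance, a single line of this kind in .zshenv on the worker host suffices (a sketch only; /usr/local/bin is just an assumed install location, matching the example below):

    # ~/.zshenv on the worker host: ensure the directory containing the
    # magma binary is in PATH for non-login shells
    export PATH=/usr/local/bin:$PATH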

Alternatively, if magma is NOT in the path by default on H, then the Magma parameter must be set to the full path name of the Magma command to be used on H (see example below).

SetWorkerPort(P) : RngIntElt ->
Set the port used for communicating with worker jobs to P. It is not necessary to choose a port when using StartWorkers (an unused port is chosen automatically by default), but a specific port P can be fixed by calling this procedure before calling StartWorkers.

Whether the port is chosen automatically by Magma or specified by the user, it is always passed on to the remote workers when they are started up via StartWorkers, so they know which port to use to connect back to the job running on the manager host.
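
For instance, a fixed port can be combined with automatic worker startup as follows (a sketch only, assuming port 4000 is free on the network and node2 is reachable via ssh):

    SetWorkerPort(4000);          // fix the port before starting any workers
    StartWorkers("node2", 16);    // the worker connects back on port 4000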

Example Par_StartWorkers (H5E9)

Here are simple examples using StartWorkers, to be run on the manager node1, assuming that the remote machines are node2, node3, etc. (each running 16 threads).

When magma is in the shell path on these nodes by default:

> StartWorkers("node2", 16); // start worker with 16 threads on node2
> StartWorkers("node3", 16); // start worker with 16 threads on node3

When the full path of magma is /usr/local/bin/magma (and not in the path for a non-login shell):

> StartWorkers("node2", 16: Magma := "/usr/local/bin/magma");
> StartWorkers("node3", 16: Magma := "/usr/local/bin/magma");

Now one could perform a lattice or code operation, or an integer factorisation, and the remote worker machines will automatically contribute (each with the given number of threads) when needed.
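
Continuing in the same manager session, one might then run, for example (a sketch, assuming a linear code C has already been constructed):

> SetVerbose("Code", 1);  // verbose information
> time MinimumWeight(C);  // workers on node2 and node3 contribute automatically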

Manual Worker Startup

Suppose again that the manager is a machine called node1 and the worker nodes are node2, node3, etc. One can manually start up worker jobs and communicate with them via a specific port. Since V2.27 it is preferable to use the automatic startup method explained above, but it may sometimes be useful to have more manual control.

The manual setup first involves choosing a port number PORT which is not already in use on the network (a number in the range 4000 to 6000, for example, is typically free). The input to each worker then consists of something similar to the following statements:

    NUM_THREADS := 32;       // number of threads to run on worker
    PORT := 4000;            // chosen port number
    SetNthreads(NUM_THREADS);
    Worker("node1", PORT);

Alternatively, for each worker, this information can be provided on the Magma command line (with no input file needed):

    magma -m node1:4000 -t 32

The worker jobs may be started either before or after the manager is started. The manager can run several kinds of distributed jobs in succession; each worker contributes to each job and then waits for the next one until the manager job finally exits. Also, for the currently supported code and lattice enumeration cases, killing and restarting worker nodes should be robust: if a worker job is deliberately killed or exits accidentally, the manager will notice and the worker's task will be reassigned to another worker (and thus completed at some point). Further, a new worker job using the same port can be started at any time to connect to the same manager job and take on new work.
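
For example, the following manager input runs two distributed minimum weight computations in succession using the same workers (a sketch only, assuming linear codes C1 and C2 have already been constructed and the workers were started with port 4000):

    SetWorkerPort(4000);      // same port as given to the worker jobs
    time MinimumWeight(C1);   // first distributed job
    time MinimumWeight(C2);   // the same workers contribute to this job too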

The input for the manager job only needs to set the port number when commencing work:

    SetWorkerPort(PORT);

where PORT is the same port number chosen for the worker jobs. After this, for the code and lattice enumeration algorithms, the input to the manager job is simply what one would use for a normal enumeration job (see the sections Integral Lattices and Linear Codes below for details).

Example Par_SetWorkerPort (H5E10)

For determining the minimum weight of a linear code C, a typical manager input might be:

    PORT := 4000;           // or other free port number (matching worker)
    SetWorkerPort(PORT);
    SetVerbose("Code", 1);  // for verbose information
    time MinimumWeight(C);
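
The corresponding worker jobs would each be started as in the previous subsection, e.g. (repeating the command from above, with the matching port):

    magma -m node1:4000 -t 32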

The distributed computation can be augmented with threads in the manager process itself, simply by calling SetNthreads before the computation starts. For instance, for determining the theta series of a lattice L, a typical manager input might be:

    NUM_THREADS := 32;      // number of threads to run on manager
    PORT := 4000;           // or other free port number (matching worker)
    SetNthreads(NUM_THREADS);
    SetWorkerPort(PORT);
    SetVerbose("Enum", 1);   // for verbose information
    time ThetaSeries(L, bound);