Prior to multicore CPUs, host files had a very simple format. You simply wrote a list of all the names of the hosts you wanted to use, one per line. So the host file:

    turing
    augusta
    chomsky
    hoare

would specify that MPI should spawn processes on the hosts turing, augusta, chomsky, and hoare. Simple, eh?
When multicore CPUs came along, a mechanism was needed to indicate that more than one process can be run on a given host. The natural place to do this is in the hosts file. Unfortunately, there is not (yet) a standard for the format this specification should take.
OpenMPI uses the format:

    hostname slots=numCores

so if turing, augusta, and chomsky each have a single dual-core CPU, while hoare has two dual-core CPUs (or a single quad-core CPU), then we might write:

    turing slots=2
    augusta slots=2
    chomsky slots=2
    hoare slots=4

If you then give mpirun the --byslot switch:

    mpirun -machinefile hosts -np 10 --byslot myProgram

OpenMPI will spawn the first two processes on turing, the next two on augusta, the next two on chomsky, and the last four on hoare. Since --byslot is the default policy, this is also the behavior you get if you don't specify a switch.
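The by-slot policy described above can be modelled in a few lines of Python. This is only an illustrative sketch, not OpenMPI's actual implementation: the function names parse_hostfile and place_by_slot are hypothetical, and the sketch assumes that a host listed without a slots= entry has one slot and that '#' starts a comment.

```python
def parse_hostfile(text):
    """Parse lines of the form 'hostname slots=N' into (hostname, slots) pairs."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        name, *options = line.split()
        slots = 1  # assumption: no slots= entry means one slot
        for option in options:
            key, _, value = option.partition("=")
            if key == "slots":
                slots = int(value)
        hosts.append((name, slots))
    return hosts


def place_by_slot(hosts, np):
    """Fill every slot on one host before moving on to the next host."""
    placement = []
    for name, slots in hosts:
        for _ in range(slots):
            if len(placement) == np:
                return placement
            placement.append(name)
    # Shorter than np if the slots run out; the text does not cover that case.
    return placement


HOSTFILE = """\
turing slots=2
augusta slots=2
chomsky slots=2
hoare slots=4
"""

placement = place_by_slot(parse_hostfile(HOSTFILE), 10)
for number, host in enumerate(placement, start=1):
    print(f"process {number} -> {host}")
```

Running the sketch lists turing twice, augusta twice, chomsky twice, and hoare four times, in that order, which matches the by-slot walk-through above.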
The alternative is to use the --bynode switch:

    mpirun -machinefile hosts -np 10 --bynode myProgram

This will cause OpenMPI to spawn the processes in "round robin" fashion: the first process on turing, the second process on augusta, the third process on chomsky, the fourth process on hoare, the fifth process on turing, the sixth process on augusta, the seventh process on chomsky, the eighth process on hoare, the ninth process on turing, and the tenth process on augusta.
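To see where the ten processes end up under this round-robin policy, here is a small Python model of it. It is a sketch of the behavior as described above, not OpenMPI code; place_by_node is a hypothetical name, and the slot counts are carried along only so they can be compared with the resulting process counts.

```python
from collections import Counter


def place_by_node(hosts, np):
    """Deal processes out one per host, cycling through the list.

    Slot counts are deliberately ignored, as in the description above.
    """
    names = [name for name, _slots in hosts]
    return [names[index % len(names)] for index in range(np)]


hosts = [("turing", 2), ("augusta", 2), ("chomsky", 2), ("hoare", 4)]
placement = place_by_node(hosts, 10)
per_host = Counter(placement)

for name, slots in hosts:
    print(f"{name}: {per_host[name]} processes for {slots} slots")
```

The per-host tally makes it easy to compare the number of processes each host receives against the number of cores it actually has.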
Round-robin placement like this is probably not what we want here, since turing and augusta are each oversubscribed (i.e., have more processes than they have cores), and hoare, with just 2 processes for its 4 cores, is undersubscribed. If you want to "round robin" your processes across the hosts this way, but avoid this oversubscription problem, you can do so by using the max_slots specifier in the host file:

    turing slots=2 max_slots=2
    augusta slots=2 max_slots=2
    chomsky slots=2 max_slots=2
    hoare slots=4 max_slots=4

With this host file and the --bynode switch, OpenMPI will spawn the first process on turing, the second process on augusta, the third process on chomsky, the fourth process on hoare, the fifth process on turing, the sixth process on augusta, the seventh process on chomsky, the eighth process on hoare, the ninth process on hoare, and the tenth process on hoare.
The basic premise is that once the number of processes spawned on a host equals its max_slots value, that host is skipped for the rest of the "round robin" scheduling. If the value given to mpirun's -np switch exceeds the sum of the max_slots values (here, 2 + 2 + 2 + 4 = 10), then mpirun will abort without launching any processes.
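The skipping rule and the abort condition can be sketched in Python as follows. This models only the behavior described above, under the assumption that each host is given as a (hostname, cap) pair where cap is its per-host maximum; place_by_node_capped is a hypothetical name, and the ValueError merely stands in for mpirun aborting.

```python
def place_by_node_capped(hosts, np):
    """Round-robin placement that skips hosts that have reached their cap.

    hosts is a list of (hostname, cap) pairs.
    """
    total = sum(cap for _name, cap in hosts)
    if np > total:
        # Stand-in for the abort described above: nothing is launched.
        raise ValueError(f"{np} processes requested, but only {total} are allowed")

    counts = {name: 0 for name, _cap in hosts}
    placement = []
    while len(placement) < np:
        for name, cap in hosts:
            if len(placement) == np:
                break
            if counts[name] < cap:  # a full host is skipped
                counts[name] += 1
                placement.append(name)
    return placement


hosts = [("turing", 2), ("augusta", 2), ("chomsky", 2), ("hoare", 4)]
placement = place_by_node_capped(hosts, 10)
for number, host in enumerate(placement, start=1):
    print(f"process {number} -> {host}")
```

With these four hosts, the first eight processes alternate across all hosts and the last two both land on hoare, as in the walk-through above. Asking for 11 processes raises the error instead of returning a partial placement.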
MPICH 1.x uses a simpler syntax to denote multicore/multiprocessor CPUs:

    hostname:numCores

so if we have the same situation as before, where turing, augusta, and chomsky each have a single dual-core CPU, while hoare has two dual-core CPUs (or a single quad-core CPU), then we would write:

    turing:2
    augusta:2
    chomsky:2
    hoare:4

If we then invoke mpirun:

    mpirun -machinefile hosts -np 10 myProgram

MPICH will (by default) spawn the first MPI process on the machine on which mpirun was invoked (which may or may not be turing). It will then spawn the second process on turing, the third process on augusta, the fourth process on chomsky, the fifth process on hoare, and then continue this "round robin" among the hosts. MPICH will automatically skip any host that has already been allocated the number of processes specified in the host file. So if the mpirun command were executed on turing, then the sixth process is spawned on augusta, the seventh process on chomsky, and the eighth, ninth, and tenth processes on hoare.
To avoid having any processes spawn on the host where mpirun is invoked, you can give mpirun the -nolocal switch. This is mostly used in large clusters where one node is dedicated to job scheduling and execution.
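The MPICH 1.x placement described above, with and without -nolocal, can be modelled roughly like this in Python. It is a sketch under stated assumptions rather than MPICH's real logic: parse_machinefile and place_mpich1 are hypothetical names, "frontend" is a made-up head node that does not appear in the host file, a process placed on the invoking host is assumed to count against that host's limit if it is listed, and -nolocal is assumed to do nothing more than suppress that implicit first process.

```python
def parse_machinefile(text):
    """Parse 'hostname:numCores' lines; a bare hostname counts as one core."""
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, cores = line.partition(":")
        hosts.append((name, int(cores) if cores else 1))
    return hosts


def place_mpich1(hosts, np, local_host, nolocal=False):
    """Model of the placement described in the text (assumptions in the lead-in)."""
    counts = {name: 0 for name, _cores in hosts}
    placement = []
    if np > 0 and not nolocal:
        placement.append(local_host)  # first process runs where mpirun was invoked
        if local_host in counts:
            counts[local_host] += 1
    while len(placement) < np:
        progressed = False
        for name, cores in hosts:
            if len(placement) == np:
                break
            if counts[name] < cores:  # hosts that are already full are skipped
                counts[name] += 1
                placement.append(name)
                progressed = True
        if not progressed:
            break  # every host is full; the text does not say what happens next
    return placement


MACHINEFILE = "turing:2\naugusta:2\nchomsky:2\nhoare:4\n"
hosts = parse_machinefile(MACHINEFILE)

on_turing = place_mpich1(hosts, 10, local_host="turing")
on_frontend = place_mpich1(hosts, 10, local_host="frontend")
no_local = place_mpich1(hosts, 10, local_host="frontend", nolocal=True)

for label, result in [("turing", on_turing), ("frontend", on_frontend), ("frontend -nolocal", no_local)]:
    print(label, result)
```

The first call reproduces the walk-through for an mpirun executed on turing. The second and third show the difference -nolocal makes under these assumptions when mpirun is launched from a dedicated head node: without the switch one process runs on that node, with it all ten run on the listed hosts.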
The simplicity of the format in MPICH 1.x has a price: it provides less control over how processes are mapped onto cores than that of OpenMPI.
More control is available in MPICH 2.x, via the mpd mechanism and its --ncpus switch. See the MPICH2 Installer's Guide for more information.