Installing mpich on a Network of Workstations

by Joel Adams and Nan Schaller.

These instructions assume that you know the basics of UNIX system administration. If you do not, you will want to consult with your system administrator. Listed below are problems we encountered installing mpich in a lab of Sun workstations running Solaris 2.5, and how we were able to fix them.

  1. Make sure that your system is configured normally, with networking services like domain name service.
  2. Make sure that your system has the complete X11 distribution, the complete package of GNU utilities (gzip, gunzip, gmake, gcc, g++, gdb, ...), and ideally, tcl/tk and wish. If your system does not have them, get your sysadmin to install them before proceeding.
  3. On the web, go to the Argonne Labs Installation Guide to mpich, a Portable Implementation of MPI.
  4. Follow the directions you find there for getting and unzipping the mpich distribution from Argonne.
  5. Read over the web pages on configuring mpich. Look for any potential problems such as items mpich will be expecting that are not on your system (e.g., f77, mpe, X11) or items on your system that are not installed in their default directories.
  6. Follow the 'Quick Start' directions to configure MPI, running './configure' with no arguments in a scrollable window. If your UNIX system is set up properly, this should work seamlessly. (Ours did not, as explained below.)
    1. If './configure' fails, analyze the messages it logged to make sure that it is finding and using gmake for make, gcc for cc, and so on. If it is not, then either your path is set incorrectly, or the tools are installed in a non-standard place (or both). You can fix the problem by adjusting your path, or by calling configure with the appropriate switches, which are discussed on the Argonne web pages (e.g., './configure -make=/usr/oddPlace/gmake -cc=gcc ...').
    2. (From Victor Thompson at Macalester, vlthompson@macalester.edu:) If `./configure' returns multiple errors from sed, try running `configure.2', located in the same directory as `configure'. This script splits up the sed commands so that there are fewer than 100 per call. This problem seems to appear when the GNU version of sed is used.
    3. If configure still fails, it could be for a variety of reasons. Try to identify the problem from the context immediately before the failure message.
      1. If configure is looking for something that is not on your system, rerun configure, adding the switch(es) that tell configure not to use that item:
            './configure -make=/usr/oddPlace/gmake -cc=gcc -nof77'
            
      2. If configure is looking for something that IS on your system, and it is not finding it, then the problem is in your path. Correct it as you did above.
    4. To set up MPI for C++, use the -c++=gcc switch. You may want to also set some of the other c++-related switches (-c++flags, -c++linker, etc.), all of which are given at Argonne's page on Configuring mpich. We neglected to do this, and being too lazy to go through all of this again, we must specify a variety of switches when compiling and linking C++ programs that use mpich. (See our Makefile and other notes related to using C++ and MPI.)
    5. Even after doing all of this, we still got this linking error when running configure:
         ...
         checking for sys/systeminfo.h... yes
         checking for getdomainname... undefined symbol       first referenced      in file
         getdomainname           /usr/tmp/cca000lj1.o
         ld: fatal: symbol referencing errors. No output written to conftest
         no
         checking for catopen... yes
         checking for catclose... yes
         ...
         
      We're not sure how to resolve this, and since the system seems to work in spite of its presence, we decided to ignore it... What? Me Worry? 8^)
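
The path problems described in steps 6.1 and 6.3 can be checked before rerunning configure. The following is a minimal sketch, not part of mpich: it assumes a POSIX-style shell that provides `command -v` (older Bourne shells may not), and the function name `missing_tools` is our own invention. It reports which of the tools named in step 2 cannot be found on the current path.

```shell
#!/bin/sh
# Print the name of each listed tool that cannot be found on PATH,
# one per line. Prints nothing if every tool is found.
# (missing_tools is a hypothetical helper, not an mpich utility.)
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Tools named in step 2; adjust this list for your site.
missing=$(missing_tools gzip gunzip gmake gcc g++ gdb)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Not found on PATH:" $missing
else
    echo "All listed tools were found on PATH."
fi
```

Any tool reported as missing either needs to be added to your path or passed to configure explicitly with a switch such as -make= or -cc=.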
  7. Read over the Argonne web pages on 'Compiling mpich.' Then follow the 'Quick Start' instructions, using 'gmake' to coordinate the translation (use a full pathname to gmake, if necessary). Once our configure had been done correctly, this step succeeded. (If it does not, go back and recheck the output from configure.)
  8. If you want to run mpich on a network of workstations:
    1. Check that you can rsh to each machine in your network. If you try to run an mpich program using hosts to which you cannot rsh, the program will die with a 'Permission denied' error.

      If this happens, and your sysadmin is willing, add the names of the machines to which you cannot rsh to the system file that names trusted remote hosts (e.g., /etc/hosts.equiv). Since /etc is local to each machine, you will have to do this on each machine you wish to serve as part of your multicomputer.

      If your sysadmin is unwilling to do this for security reasons, then each user will need a .rhosts file in their home directory, listing each of the machines on which they want to run mpich. See the man pages for rsh and rlogin for more details.

    2. Once you can rsh to each machine, go to the directory mpich/util/machines and add the name of each host on your network to the file machines.X (where X is your operating system). Be sure to use the same naming format as is used in your hosts.equiv (or .rhosts) file. It may or may not be necessary to use full domain names on your system, but you should be consistent.

      If you try to run an mpich program with a naming mismatch, the program will die with a 'gethostname failed' error message.

      If you try to run an mpich program with more processors than you have named in your machines.X file, the program will die with an 'index' or 'rindex' error message.
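
The checks in step 8 can be scripted. What follows is a rough sketch under several assumptions: the machines file lists one host name per line, lines beginning with '#' are comments, and rsh exits with a non-zero status when a host refuses the connection. The function names and the RSH override variable are hypothetical helpers of our own, not part of mpich.

```shell
#!/bin/sh
# Print the host names in a machines file, one per line, skipping
# blank lines and lines that begin with '#' (assumed to be comments).
list_hosts() {
    awk 'NF > 0 && $1 !~ /^#/ { print $1 }' "$1"
}

# Print how many hosts the machines file names.
count_hosts() {
    list_hosts "$1" | wc -l | tr -d ' '
}

# Succeed only if the requested process count ($2) does not exceed
# the number of hosts named in the machines file ($1).
enough_hosts() {
    [ "$2" -le "$(count_hosts "$1")" ]
}

# Run a trivial remote command on each host and report the outcome.
# Set RSH to substitute another remote-shell command (hypothetical knob).
check_rsh() {
    for host in $(list_hosts "$1"); do
        if ${RSH:-rsh} "$host" true </dev/null >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: rsh failed"
        fi
    done
}

# Example use (substitute your own machines.X file and process count):
#   check_rsh machines.X
#   enough_hosts machines.X 8 || echo "more processes than hosts"
```

A host reported as 'rsh failed' is a candidate for the hosts.equiv or .rhosts fix described in step 8.1.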

  9. Read over the Argonne web pages on 'Running an MPI Program.' Then follow the 'Quick Start' directions to build and run a simple test program. If it fails, check the diagnostic message against the discussion above for hints as to what might be wrong.
  10. Run the whole test suite as described in the 'Thorough Testing' web pages from Argonne. If any of these fail, check the diagnostics to determine what you are missing.
  11. If you want to install mpich in a public place:
    1. Follow the 'Quick Start' directions to 'Build the rest of the MPICH environment' (step # 10), and make serv_p4. You need to do this step before the public installation, or else the public installation will fail.

      If you have tk, build nupshot while you're at it.

    2. Read over the 'Installing mpich for Others to Use' web pages from Argonne. Then decide where you want to install mpich publicly. (For a workstation network, this needs to be somewhere that is visible to each workstation in the network.) You should probably do this in consultation with your sysadmin.

      Then follow the 'Installing mpich for Others to Use' directions, not the 'Quick Start' instructions in step # 9. The 'make install PREFIX=...' command creates a new directory structure, and copies files from your first installation into this new place.

    3. If you want everyone to be able to use mpich transparently, add the path of $PREFIX/bin to your system's default shell initialization script (e.g., /etc/csh.cshrc).

      Otherwise, distribute this path to those who are to use mpich so that they can modify their own path environment variable.
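
For users who modify their own path, a Bourne-style shell startup file could contain something like the sketch below. The install location /usr/local/mpich is purely a placeholder for whatever PREFIX you chose, and `add_to_path` is a hypothetical helper that avoids adding the same directory twice.

```shell
#!/bin/sh
# PREFIX is a placeholder; use the value you gave to 'make install PREFIX=...'.
PREFIX=${PREFIX:-/usr/local/mpich}

# Append a directory to PATH unless it is already listed there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already on PATH; nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

add_to_path "$PREFIX/bin"
export PATH
```

This is Bourne-shell syntax; csh and tcsh users would instead extend the path with a line of the form 'set path = ($path /your/prefix/bin)'.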

  12. Read over the rest of Argonne's 'Installation Guide to mpich' web pages, to install documentation, examples, etc.
  13. Get a life! (Remember: life runs in parallel.) 8^)
Good luck! If you find ambiguities in these instructions, or can supply additional information from your own experience, e-mail them to me at the address on the link below, and I'll try to incorporate them.
This page maintained by Joel Adams