man hboot (Commands) - Start LAM on the local node.
NAME
hboot - Start LAM on the local node.
SYNTAX
hboot [-dhstvNV] [-c <conf>] [-I <inet_topo>] [-R <rtr_topo>]
OPTIONS
- -d
- Turn on debugging. This implies -v.
- -h
- Print the command help menu.
- -s
- Close stdio of child processes.
- -t
- Terminate (tkill(1)) any previous LAM session before starting.
- -v
- Be verbose.
- -N
- Go through the motions but do not actually take any action.
- -V
- Format and print the process schema.
- -c <conf>
- Use <conf> as the process schema.
- -I <inet_topo>
- Set the $inet_topo variable in the process schema.
- -R <rtr_topo>
- Set the $rtr_topo variable in the process schema.
DESCRIPTION
Most MPI users will probably not need to use the hboot command; see lamboot(1).
The hboot tool is a generic utility that starts multiple processes on the local node, based on information in a process schema. It is not restricted to starting LAM. It is part of the startup sequence performed by lamboot(1).
A process schema is a description of the processes which constitute the operating system on a given node. Naturally, the process schema used by hboot should be the one that describes LAM on a node. The grammar of the process schema is described in conf(5).
When starting LAM on a remote machine using rsh(1), the open file descriptors of the processes started by hboot must be closed in order for rsh(1) to exit. This is done by using the -s option. The -t option can be used to force a tkill(1) on the machine before attempting to start LAM. This feature is used by lamboot(1) to handle the case where a user might start a machine a second time without using lamwipe(1) to terminate the previous LAM session.
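The remote case described above can be sketched as a small shell function. This is only an illustration: remote_hboot is a hypothetical name, and it assumes that rsh access to the host is already configured and that hboot is on the remote user's PATH. In normal use lamboot(1) takes care of this step.

```shell
# Hypothetical helper, not part of LAM: start LAM on one remote host.
# Assumes rsh access is configured and hboot is on the remote PATH.
remote_hboot() {
    host="$1"
    # -s closes stdio of the child processes so that rsh can exit.
    # -t runs tkill first, in case an earlier session was never wiped.
    rsh "$host" hboot -s -t
}
```

Without -s, the rsh(1) invocation would be expected to hang, because the processes started by hboot keep its file descriptors open.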
The -I and -R options set their respective variables to the given values. The $inet_topo variable is typically used by the LAM Internet datalinks that communicate with other nodes. The $rtr_topo variable is passed to the LAM router that handles network and topology information. The variables can also be set in the process schema file (see conf(5)) but their values are overridden by the command line options.
When LAM is started, the kernel records all processes that attach to it, including all the processes in the process schema. It is the job of tkill(1) to use this information to remove these processes from the node.
EXAMPLES
- hboot -v
- Start LAM on the local node with the default process schema. Report each step as it is performed.
- hboot -c myconfig
- Boot the local node with the custom process schema, myconfig.
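A custom schema can be rehearsed before it is used for real by combining -c with the -N and -v options listed under OPTIONS. The wrapper below is a minimal sketch; hboot_rehearse is a hypothetical name, and myconfig is simply the schema name used in the example above.

```shell
# Hypothetical wrapper, not part of LAM: rehearse a boot without acting.
hboot_rehearse() {
    schema="$1"
    if ! command -v hboot >/dev/null 2>&1; then
        echo "hboot not found in PATH" >&2
        return 127
    fi
    # -N: go through the motions only; -v: be verbose;
    # -c: use the given file as the process schema.
    hboot -N -v -c "$schema"
}
```

Running hboot_rehearse myconfig should report what would be done without starting any process; dropping -N from the command performs the actual boot.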
FILES
- laminstalldir/etc/lam-conf.lamd
- Default node process schema, where "laminstalldir" is the directory where LAM/MPI was installed.
- laminstalldir/etc/lam7.1.1helpfile
- Default location for help file for diagnostic messages that hboot may generate.
- /tmp/lam-$USER@<hostname>
- Kill file for the LAM session on machine <hostname>, where $USER is the userid.
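The kill file pattern above can be used to check whether a session has been recorded on the local node. The following is a sketch only: lam_killfile is a hypothetical name, and it assumes that $USER (or id -un) and hostname(1) produce the same userid and host name that LAM used when it created the file.

```shell
# Hypothetical helper, not part of LAM: print the expected kill file path,
# following the pattern /tmp/lam-$USER@<hostname>.
# Assumption: hostname(1) reports the name in the form LAM used.
lam_killfile() {
    printf '/tmp/lam-%s@%s\n' "${USER:-$(id -un)}" "$(hostname)"
}

killfile=$(lam_killfile)
if [ -e "$killfile" ]; then
    echo "kill file present: $killfile"
else
    echo "no kill file at $killfile"
fi
```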
DIAGNOSTICS
Using ps(1) after hboot will display, among others, the LAM processes that have been started. They may be killed one by one with kill(1), or all at once by killing the LAM kernel process with a HUP signal. The preferred method is to use the LAM tool tkill(1) which should kill them all at once, and also remove the kill file. New users should make liberal use of ps(1) to gain confidence that the system is working properly. In a disaster, ps(1) and kill(1) are your only hope of recovery.
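The cleanup order described above, tkill(1) first and manual recovery only as a last resort, can be sketched as follows. lam_cleanup is a hypothetical name; the function only wraps tkill and does not attempt to pick processes for kill(1) itself, since that choice is best made by inspecting ps(1) output.

```shell
# Hypothetical helper, not part of LAM: prefer tkill, else point to ps/kill.
lam_cleanup() {
    if command -v tkill >/dev/null 2>&1; then
        # Preferred: removes the LAM processes and the kill file.
        tkill
    else
        echo "tkill not found; inspect with ps(1) and remove with kill(1)" >&2
        return 1
    fi
}
```

After a call to lam_cleanup, running ps(1) again is a simple way to confirm that no LAM processes remain.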