
Building a Diskless Linux Cluster: Debian (Etch) + DRBL + GridEngine

It's quite common in University departments for roomfuls of very capable PCs to sit idle from 5pm until 8am every day and all weekend. Thankfully, it's possible to turn such a wasted resource into a powerful cluster of Linux machines using freely available open source software. In fact, even a group of machines in an office, or at home, can be transformed in the same way with the minimum of effort. Because the PCs (or nodes) boot over the network, no changes are made to the current setup of the machines, so reverting to their normal state is as simple as rebooting. This guide will bring you through the process of creating a High-Performance Computing (HPC) cluster using the latest version of the Debian GNU/Linux operating system, Diskless Remote Boot in Linux (DRBL), and Sun's Open Source N1 Grid Engine 6.

Requirements

Creating a Linux cluster doesn't require a huge amount of resources, either in hardware or in time, but there is a bare minimum of hardware that you need to get started:

  1. A moderately powerful machine (at least a Pentium 4 with 1 GB of RAM) to act as the master node. This really needs to be a dedicated machine, as this will allow users to submit jobs whether the cluster is currently running or not.
  2. The master node obviously needs a network card, but if you want to give the master or the nodes internet access, or are planning to have more than about 40 nodes, you will need two or three network cards. Even high-end Gigabit ethernet cards are cheap these days, and the way your nodes will communicate with the master can make or break a successful cluster.
  3. Network infrastructure to connect all of the nodes together. A 100Mbit network is the absolute minimum, particularly if you will have more than 10 nodes. Gigabit ethernet is recommended.
  4. One or more node PCs which support PXE booting (network booting). If the BIOS doesn't have an option to boot from the network card, or you don't see any messages about netbooting when you start the PCs, then you need to install a card which supports PXE. These nodes don't need to have a hard drive, but if they do (and you have some spare space) then it can be used for a swap partition (virtual memory in Windows-speak). It helps if the nodes contain similar hardware (such as is found in most offices or University computer rooms), as it reduces the amount of configuration required (almost nobody likes to rebuild the Linux kernel 20 times).

Download the script files

You will need to download some scripts which help set up GridEngine with DRBL. Python is required to run them, but it is installed by default in Debian.

Install Debian

For this project, I used the latest release of Debian, 'Etch'. You can download it here. I'm not going to go through the installation process, as you can find plenty of excellent documentation over at the Debian website. I suggest heading straight for the installation manual. Thanks to the excellent installation scripts for Diskless Remote Boot in Linux (DRBL), you can choose whatever installation type you want when installing Debian. For example, if you want to use graphical configuration utilities to set up your system, then select a desktop install, which will include the X Window System. If you only plan to access the master over SSH, then choose a lightweight console-only install.

Install DRBL

Again, I won't duplicate the excellent and thorough installation guide available from the DRBL website. However, for the purposes of this guide it is important that you choose the Single System Image (SSI) mode when installing. This means that all the nodes download the same system image over the network, which provides the minimum filesystem needed to boot Linux, and the rest of the system, including /home, /usr and /opt, is NFS mounted (accessed over the network) from the master node. Therefore, any software and user files on the master node are available to all of the cluster nodes. To deploy new software across the cluster, the most dramatic step you need to take is to reboot the nodes, something that DRBL will even do for you. A final point here is to take extra care when configuring the network setup of your nodes. During installation, DRBL will assume that any network card configured with a private address (e.g. 192.168.*) has nodes attached to it. If you will have over 40 machines on the network, consider splitting them into two groups and installing two network cards in the master, each on a different subnet, e.g. 192.168.1.1/255.255.255.0 and 192.168.2.1/255.255.255.0. Having a multi-homed master node will increase the amount of data that you can push out to the nodes.
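If you want to sanity-check a two-subnet layout before running the DRBL installer, Python's standard ipaddress module can do the arithmetic. This is only a rough sketch using the two example addresses above. It assumes a modern Python 3 (newer than what shipped with Etch), and the variable names are my own.

```python
import ipaddress

# The two example master-side interfaces, in address/netmask form.
interfaces = ["192.168.1.1/255.255.255.0", "192.168.2.1/255.255.255.0"]

# Derive the network each interface lives on.
networks = [ipaddress.ip_interface(i).network for i in interfaces]

for net in networks:
    # num_addresses includes the network and broadcast addresses,
    # so subtract 2 to get the addresses usable by hosts.
    usable = net.num_addresses - 2
    print(f"{net}: private={net.is_private}, usable addresses={usable}")

# The two groups of nodes must not share address space.
overlap = networks[0].overlaps(networks[1])
print("subnets overlap:", overlap)
```

Each /24 has room for far more than 40 nodes, so the split is about spreading traffic over two cards rather than about running out of addresses.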

Install Grid Engine

Before you go any further, make sure that DRBL is installed, and that you can boot your nodes over the network from your master node. All following steps assume a working DRBL Single System Image setup with one or more nodes. Now that you have a network of machines, the next step is to install software which will manage the distribution and scheduling of jobs on the cluster. There are several alternatives to choose from, including Torque, SLURM, and Condor.

Note: I strongly suggest that you do not install or use the Condor project on your system(s). Not only is the source code only available on request, and only to those that the maintainers feel are 'suitable', but by default the program will report information about your installation back to the Condor servers on a regular basis. There is no mention of this during the installation procedure. To add to this, the license requires that any publication which includes data that has been analysed or processed using Condor must include a reference to the Condor project. Given that there are plenty of open source alternatives available, I don't recommend using this.

UPDATE: As of 6.9.5, Condor is now released under the Apache Licence, v. 2.0. Therefore, some of the above concerns have been addressed. In particular, the source code is now available, but only when you agree to fill in personal information. However, I am still unsure whether usage information is reported to the maintainers by default. Caveat Emptor.

For this guide, we will be installing Sun's N1 GridEngine 6. GridEngine is simple to install on Linux, requires only a minimum of installed libraries, is relatively easy to integrate with an existing DRBL setup, and provides a wide range of powerful tools to manage and use your cluster. There is also extensive documentation available online. Installing GridEngine involves three steps:

  1. Installing the qmaster on the master node
  2. Installing the execution host on the master node
  3. Configuring the nodes and GridEngine to work together

Installing qmaster

This step simply involves installing qmaster on the master node as per the online documentation. However, before starting the installation script, there are a few things you will need to do:

  • Make sure the hostname of the master (or any of the nodes) doesn't also point to localhost or localhost.localdomain. SGE objects very strongly if it does. So, if your /etc/hosts file looks like this:
      127.0.0.1 localhost mymaster
      192.168.0.1 mymaster.mydomain.com mymaster
    then you will want to change it to:
      127.0.0.1 localhost
      192.168.0.1 mymaster.mydomain.com mymaster
  • Create a file containing the hostnames of your DRBL nodes. This will be used during the GridEngine installation process, when you will need to supply a text file which contains one hostname per line. This information is available from your DRBL installation at /etc/drbl/IP_HOSTS_TABLE. You can generate the file using:
      create_drbl_hostlist.py /path/to/output/hostlist
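The /etc/hosts check in the first point is easy to get wrong by eye, so here is a small sketch of how it could be automated. It is not one of the downloadable scripts: the function name loopback_aliases is hypothetical, and the sample file contents are the two examples from above.

```python
def loopback_aliases(hosts_text, hostname):
    """Return the loopback addresses in hosts_text that also name hostname."""
    offending = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        if address.startswith("127.") or address == "::1":
            # Compare against both full names and their short forms.
            short_names = {n.split(".")[0] for n in names}
            if hostname in names or hostname in short_names:
                offending.append(address)
    return offending

bad_hosts = """127.0.0.1 localhost mymaster
192.168.0.1 mymaster.mydomain.com mymaster
"""
good_hosts = """127.0.0.1 localhost
192.168.0.1 mymaster.mydomain.com mymaster
"""

print(loopback_aliases(bad_hosts, "mymaster"))
print(loopback_aliases(good_hosts, "mymaster"))
```

To check a real machine you would pass in the contents of /etc/hosts and the master's hostname. An empty list means the hostname is not tied to a loopback address.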

During the qmaster installation process you can safely accept most of the defaults, but make sure you choose to:

  • Install as the root user. You don't have to do this, but it simplifies the process of getting SGE to run on the DRBL nodes. Please note that in recommending this, I am presuming that your cluster network is private, and the nodes won't be accessible by non-privileged users (i.e. not you).
  • Do opt to verify file permissions
  • Select to use Berkeley DB, but without a spool server
  • Use the ID range suggested in the manual: 20000-20100
  • Accept to install startup scripts
  • Accept to load a file which contains the hostnames of your nodes. Here you enter the full path to the file you created before running the install script.
  • Use normal scheduling

Once it's installed, source the SGE settings file from your .bashrc by adding the line: source sge-root/cell/common/settings.sh where sge-root is the SGE installation directory and cell is the cell name you chose during installation.

Install the execution host on the master server

We don't actually want to make the master node an execution node (although you may want to), as its job is to manage and schedule jobs on the other nodes. However, as the nodes rely on the master node's filesystem for their applications, we will temporarily set up the execution daemon on the master so that it can then be run on the nodes. Again, installation is as per the documentation, but there are some important choices to make:

  • Specify a local spool dir: /var/tmp/spool
  • Accept to install startup scripts

After the install is finished:

  • Move /etc/init.d/sgeexecd to somewhere safe
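  • Run update-rc.d sgeexecd remove (because we don't want the daemon to start at boot time on the master node). Leave out the -n flag here: with -n, update-rc.d only prints what it would do and changes nothing.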
  • Move the sgeexecd script back into /etc/init.d (which will make it available to the nodes once we rerun the drblpush script)
  • Add sgeexecd to /opt/drbl/conf/client-extra-service where /opt/drbl is your DRBL installation directory. This will make sure that the execution host daemon is started automatically when the nodes boot, so they will be available to run jobs.
  • On the master node, save the current execution host config as a template with: qconf -sconf hostname > node.conf.template where hostname is the hostname of the master. Open the resulting file with a text editor and if the first line is 'hostname:' then remove it (thanks Josh). The config includes the custom local spool dir /var/tmp/spool, which we want to be set for each of our nodes so that the spool is stored locally in RAM, and not on the master. If all of the nodes saved their spool information to the master via NFS, it would place a greater strain on the master node and the network infrastructure. Storing the information locally means that network bandwidth is saved for booting nodes and transferring user data/scripts for execution on the cluster.
  • Finally, run: create_sge_nodeconfig.py /path/to/node.conf.template This script will create a new GridEngine configuration for each of the DRBL nodes (as defined in /etc/drbl/IP_HOSTS_TABLE) based on the template we saved. Note: it is not strictly necessary to have a separate configuration for each of the nodes, as the main purpose is to set the spool location to /var/tmp/spool. This is also possible by modifying the global configuration parameters of GridEngine. However, I personally prefer having the ability to change the configuration of individual nodes if necessary without having to create a new configuration for them each time. As we use scripts to create and update the configurations, it doesn't require any effort to adopt either setup.
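To give a feel for what the template step involves, here is a rough sketch of the idea: strip the leading 'hostname:' line from the saved qconf -sconf output and produce one copy of the remaining settings per node. This is an illustration only, not the code of create_sge_nodeconfig.py. The function names and the node01/node02 hostnames are made up for the example.

```python
def strip_host_header(template_text):
    """Drop a leading 'hostname:' style line from saved qconf -sconf output."""
    lines = template_text.splitlines()
    # The header is a single word ending in a colon, e.g. "mymaster:".
    if lines and len(lines[0].split()) == 1 and lines[0].rstrip().endswith(":"):
        lines = lines[1:]
    return "\n".join(lines) + "\n"

def node_configs(template_text, hostnames):
    """Map each node hostname to its own copy of the cleaned template."""
    body = strip_host_header(template_text)
    return {host: body for host in hostnames}

template = """mymaster:
mailer /bin/mail
execd_spool_dir /var/tmp/spool
"""

configs = node_configs(template, ["node01", "node02"])
print(configs["node01"])
```

Each entry could then be written to a file and loaded into GridEngine. As far as I know, qconf -Aconf takes such files and uses the file name as the host name, but check the qconf man page for your version before relying on that.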

Configure the nodes

At the moment, a job queue ('all.q') exists on the master, because we ran the execution host install script. For this cluster, we can either add all our nodes to that queue or create a separate queue for each node. For this setup we will have a separate job queue on each execution host. We now want to remove the all.q queue from the master, remove the master as an execution node, and add queues to all of our DRBL hosts:

  • Save the queue config for the master with: qconf -sq all.q > node.queue.template
  • We'll now modify the queue config template so that we can easily recreate a new config for each node with a script. Open the queue conf template in your favourite text editor and make the following changes:
      qname %HOST%.q
      hostlist %HOST%
      slots 1 (if you have more than 1 processor on all your nodes then you can change the number to reflect this)
      shell /bin/bash (set this to whatever shell you want to use for jobs)
  • Run create_sge_nodequeues.py /path/to/queue.template This will create a new queue on each node with one slot per processor, enabling jobs to be submitted and executed.
  • Delete the all.q queue on the master as we won't be using it: qconf -dq all.q
  • And now, remove the @allhosts group because we have made individual queues for each node, so we don't need it either: qconf -dhgrp @allhosts
  • And last we remove the master as an execution node. It was set as such by running the execution host installation script, but we can undo that with: qconf -de master
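The %HOST% placeholder is nothing more than plain text substitution. A minimal sketch of the idea follows. It is not the actual create_sge_nodequeues.py, and the function name and node hostnames are hypothetical.

```python
def render_queue(template_text, host):
    """Fill in the %HOST% placeholder for a single node."""
    return template_text.replace("%HOST%", host)

# Only the lines changed in the template are shown here; a real
# `qconf -sq all.q` dump contains many more attributes.
queue_template = """qname     %HOST%.q
hostlist  %HOST%
slots     1
shell     /bin/bash
"""

for host in ["node01", "node02"]:
    print(render_queue(queue_template, host))
```

Each rendered definition describes one queue bound to one host. To my knowledge a definition saved to a file can be added with qconf -Aq, though I'd confirm that against the qconf documentation for your release.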

If you now run qhost -q | more you should see a list of all your DRBL nodes, and below each should be a queue definition. Running qstat -f should now list the various queues that were created, and show how many slots each of them has and how many are filled at the moment. Now, rerun drblpush -i or drblpush -c /etc/drbl/drblpush and restart one of your nodes. SSH into the node when it boots and confirm that sgeexecd is running with: ps ax | grep sge. If it isn't, check for startup messages in /tmp/execd*. Back on the master, rerun qhost -q | more. The node that you have rebooted should be listed with more information than the others, such as free memory, number of processors, and load. You can now boot the rest of your nodes to create a diskless cluster with GridEngine and DRBL!

Setting up MPI/MPICH support on GridEngine

If you want to configure MPI/MPICH support, there are two steps which need to be completed. First, you must add a new parallel environment to GridEngine which sets up the mpi/mpich startup and shutdown scripts that are provided with SGE for use with MPI/MPICH jobs. Second, you must add your new parallel environment to each of your already existing job queues. For this example we will be setting up mpich as our parallel environment. If you want non-tight MPI support, just replace mpich with mpi throughout:

  • Make a copy of the mpich.template in the mpi directory
  • Fill in the information that is missing (marked with <>) in the template, such as number of slots (just put 999) and the SGE root dir (for me it's /opt/n1ge6)
  • Now add the parallel environment with: qconf -Ap /path/to/modified/mpich.template You can check it exists with qconf -spl and qconf -sp mpich
  • Next, modify the node.queue.template that we created in a previous step, and change the 'pe_list' from 'make' to 'make mpich'. This will enable parallel jobs to be run on the queue.
  • Finally, update all our queues (one for each DRBL node) with: update_sge_nodequeues.py /path/to/modified/node.queue.template
  • You can then check the updated queue definitions with: qconf -sq <queuename> where queuename is one of the queues on a DRBL node.
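The pe_list change can also be scripted rather than edited by hand. The sketch below shows one way to do it on the text of a queue definition. It is my own illustration, not part of update_sge_nodequeues.py, and the add_pe name and the node01.q queue are assumptions.

```python
def add_pe(queue_text, pe_name):
    """Append pe_name to the pe_list line of a queue definition, if missing."""
    out = []
    for line in queue_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "pe_list":
            # A queue with no parallel environments lists NONE.
            pes = [p for p in fields[1:] if p != "NONE"]
            if pe_name not in pes:
                pes.append(pe_name)
            line = "pe_list               " + " ".join(pes)
        out.append(line)
    return "\n".join(out) + "\n"

queue = "qname                 node01.q\npe_list               make\n"
updated = add_pe(queue, "mpich")
print(updated)
```

Running it a second time on its own output leaves the text unchanged, so it is safe to apply to a template that has already been modified.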

Using QMon

QMon is the graphical interface to the Sun GridEngine system. To run it, you need some extra libraries installed. Unfortunately, there isn't (and probably won't be) a release candidate of them for Debian, so you need to get the package manually from the Debian servers:

  • Download the package for your install here.
  • Install with dpkg -i package-name.deb
  • Type qmon to start it.

29 Comments

Aug 14, 2007
Steven Shiau said...
Happened to find this nice article, excellent!
Do you mind if we put a link in drbl.sf.net ?
Thanks in advance.
Aug 14, 2007
padraig said...
Sure! Thanks for the great software :-)
Aug 16, 2007
Steven Shiau said...
It's in http://drbl.sourceforge.net/related_article/ now.
Thanks!
Sep 05, 2007
padraig said...
Those scripts are provided by me, but the link to the scripts in the requirements section wasn't clear enough. I've added a new section 'Download the script files' which contains a link to the scripts.
Feb 19, 2008
Newbie said...
Hi

I'm trying to get this to work on Scientific Linux 5 (SL5), which is a Red Hat Enterprise Linux 5 variant.

I've followed all the instructions above up to the MPI point, apart from the command "update-rc.d -n sgeexecd remove", which does not work on SL5, but it looks like ignoring this doesn't present major problems.

Initially sgeexecd appeared to be working on the node as per the suggested test, but after rebooting and re testing "ps ax | grep sge" on the node, it is not running. Looking at /tmp/execd* on the node gives messages that the "local configuration is not defined" and "can't create directory" on the node. Any hints on how to resolve this?

Many thanks!

Apr 20, 2008
Santiago said...
Hello,

I have been trying to setup a cluster following the instructions on your blog. Everything seems to be ok, but I have a problem with the scripts that create a new GridEngine configuration for each of the DRBL nodes, and no configuration files are being created. Could you give some hints about creating the configuration files for each of the nodes? maybe a sample configuration file and where exactly should the file be located?

Thanks.

Apr 20, 2008
Marco Antonio Silveira de Souza said...
Sorry, the link for 'Download the script files' does not work.
How can I get them?

thanks!

Apr 20, 2008
Phil said...
Hello,
I'm the chap from Uni that came and bothered you;-), I got the scripts from here. Thanks for the help.
Apr 21, 2008
Pádraig said...
@Marco: I just checked the link and it seems to be working. The files are hosted by Google so they should be available...

@Phil: glad you got them. Let me know if you have any issues...

Apr 21, 2008
Pádraig said...
@Santiago: the scripts don't create configuration files, they alter the SGE configuration which I believe is stored in some form of database. You can check the configuration with the qconf command.
Apr 21, 2008
Pádraig said...
@Newbie: that error message occurs because there is no configuration for that machine on the server. You need to make sure that when you follow the steps in the article, you list the hostnames of all your machines accurately, as SGE uses the hostname of a machine to figure out what its configuration is. If the hostnames don't match, it won't find the config and the execution daemon won't run. Good luck!
Apr 21, 2008
Santiago said...
Hi again,

I still don't get it, bear with me please, because I'm new to this. I still can't configure the nodes. On the master node everything seems to be working OK, but I get a bit lost at the "Install the execution host on the master server" step of this document. Everything seems to go well until I get to the step where you have to run the script:

create_sge_nodeconfig.py /path/to/node.conf.template

I run the script but I get the following output:
value == NULL for attribute "servidor.local:" in configuration list of "nodo150"
value == NULL for attribute "servidor.local:" in configuration list of "nodo151"
value == NULL for attribute "servidor.local:" in configuration list of "nodo152"
value == NULL for attribute "servidor.local:" in configuration list of "nodo153"
value == NULL for attribute "servidor.local:" in configuration list of "nodo250"
value == NULL for attribute "servidor.local:" in configuration list of "nodo251"
value == NULL for attribute "servidor.local:" in configuration list of "nodo252"
value == NULL for attribute "sevidor.local:" in configuration list of "nodo253"

Where servidor.local is the master node and nodoNNN are the different nodes of the cluster. I'm not sure if this is the output that is expected for this script.

Do I also have to run the execution host setup in all the nodes?

Thanks in advance for your help.

Jun 21, 2008
Pádraig said...
This is probably down to my dodgy scripting ;-) They were written pretty fast just to solve my particular setup and may contain some bugs.

Have you checked to see if the node configurations have been stored? You can do this using qconf.

You don't need to run anything on the nodes, as all the configuration for SGE is stored on the master.

Good luck (again)!

Jul 01, 2008
Alex said...
Hello,

A small note : condor changed their license to "Apache license version 2.0" for
the latest versions.

Jul 01, 2008
Pádraig said...
Thanks. Have updated the article to reflect that.
Aug 07, 2008
Josh said...
I ran into the same problem as Santiago when running the node config script. Turns out when you do qconf -sconf hostname it outputs something like this

hostname:
mailer /bin/mail
xterm /usr/bin/X11/xterm
qlogin_daemon /usr/sbin/in.telnetd
rlogin_daemon /usr/sbin/in.rlogind
execd_spool_dir /var/tmp/spool

The script doesn't like that first line, and doesn't end up making the configs for each node. Remove that first line, and it works fine. Thanks for the tips. I think this will come in real useful to me.

Sep 05, 2008
Mendes, Jorge said...
Greetings,
I'm having problems with the SGE installation. When I try to install the qmaster following Sun's docs on Debian Etch, I receive this error message:
util/arch: line 234: srtings: command not found
Architeture UNSUPPORTTED-lx24-GLIBC-x86 not supported by this procedure!
Sep 22, 2008
intoniatonA said...
favorited this one, guy
Oct 24, 2008
Felipe Munarin said...
Very, very good article. I think that this procedure can be used in several clusters around the world. However, I did not get the drbl_sge_scripts.zip. The link looks broken. Can you help me making able to get it or sending by email? I am trying to solve a big problem.

Again, congratulation for this article. It is the unique in all google data about drbl and sge together.
thanks

Oct 30, 2008
Steve said...
Hello!

Thank you for the nice article! Unfortunately, the link for 'Download the script files' doesn't seem to be working. Is there any way to get the files?

Thanks!

Nov 05, 2008
Steve said...
Hi!

A few days ago I mentioned here that the link above to your scripts is out of order. I would be pleased if you would repair the link, or simply send me the scripts so I can give them a try.

Thank you very much, again!

Jan 11, 2009
billy said...
The link is not working, could you provide us with a direct link to it
Feb 04, 2009
Pádraig said...
Sorry for the broken link. I've put the scripts back online again and have cleaned them up in the process. I've also implemented the fix suggested by Josh.
Feb 07, 2009
pienteecest said...
Nice post thanks for the information, keep it.
Greetings!
Feb 17, 2010
ranner said...
First of all, I want to thank you for this nice post.

I've found a problem: the /etc/hosts file generated by drblpush (both for the master and the client nodes) shows a modified hostname for the master. drblpush seems to append an "-eth1" suffix to the hostname, making the master's original hostname unreachable for the clients so sgeexecd does not work.

Do you have any idea?

Thanks again.

May 16, 2010
GamerCasino said...
good review thank
Jun 18, 2010
gregor said...
Hi!
The link for scripts is not working. I would be pleased, if you would repair the link, or simply send me the scripts above to let me have a try with them. Otherwise it is very nice post.

Greetings!

Jan 09, 2011
puppian said...
http://murga-linux.com/puppy/viewtopic.php?t=63705

get custom puppy mpich cluster system, no need to install anything to try cluster.

Jan 25, 2011
John Clark said...
~:` I am really thankful to this topic because it really gives up to date information '*~
