
Birmingham GRID BlueBEAR System Changes

This page documents the changes to the BlueBEAR setup which were needed to implement GRID support and which required admin privileges to put into effect.

Changes required to allow job submission network calls from outside the cluster

NOTE: none of these nodes use the /bb or the /projects file systems

In order to support Grid job submission from the Computing Element (CE) in Physics, it is necessary to allow machines there to talk to the qmaster. The qmaster, however, has no external IP address (on the 147.188 network). The export server, on the other hand, has an external IP address, and already has networking rules allowing worker clients to access the outside world, so it is a small extension to let it act as the communication path for job-related network requests.

Preliminary tests using ShoreWall were not successful: Alex of ClusterVision was of the view that ShoreWall had bugs and could not generate the right raw iptables rules. (This may change in future versions: if ShoreWall can in future reproduce the same additional rules that we have implemented as an add-on, then it can take over that job.) For the present, we have additional raw iptables rules, tuned to work with the Particle Physics subnets but easily reconfigurable, as will be clear. They are implemented as follows:

On export server, /etc/rc.d/rc.local contains the extra lines:

# Perform post-shorewall fixups for routing to qmaster, if required
if [ -x /root/nat-qmaster.sh ]; then /root/nat-qmaster.sh; fi

And on that server, /root/nat-qmaster.sh contains the lines:

# L.S.Lowe at bham.ac.uk
# Rules for forwarding torque/moab packets: these rules are on bbexport only, applied after shorewall etc.
# Torque/Moab packets from campus to bbexport are translated to have a destination address of qmaster.
# Implicitly, Torque/Moab packets from qmaster to campus are translated to have a source address of bbexport.
# We assume that the route on qmaster for packets to 147.188.0.0/16 is via gateway 10.141.245.101 (bbexport).
# Note: bbexport is (147.188.126.18 and 10.14x.245.101) and qmaster is (10.14x.255.250) where x is 1 and 3.
# These settings allow a communication path for any client within the PREROUTING source mask:
# the pbs_server itself is configured (via qmgr) to narrow-down which of those clients it will accept or reject.

src=147.188.46.0/23
dst=10.141.255.250
port1=15001
port2=42559
ext=eth2
set -u

if [ $# -eq 0 ] && iptables -t nat --list PREROUTING | grep -q 15001; then exit 0; fi # exit if done already

/sbin/iptables -t nat -I PREROUTING -p tcp -i $ext -s $src --dport $port1 -j DNAT --to-destination $dst
/sbin/iptables -t nat -I PREROUTING -p tcp -i $ext -s $src --dport $port2 -j DNAT --to-destination $dst

/sbin/iptables -t filter -I FORWARD -p tcp -i $ext -s $src -d $dst --dport $port1 -j ACCEPT
/sbin/iptables -t filter -I FORWARD -p tcp -i $ext -s $src -d $dst --dport $port2 -j ACCEPT
/sbin/iptables -t filter -I FORWARD -p tcp -o $ext -d $src -s $dst --sport $port1 -j ACCEPT
/sbin/iptables -t filter -I FORWARD -p tcp -o $ext -d $src -s $dst --sport $port2 -j ACCEPT
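
As a quick sanity check after boot, the active rules can be listed on bbexport; this is just a sketch, with the grep patterns being the port numbers and qmaster address defined above:

# On bbexport: confirm the DNAT rules and FORWARD exceptions are in place
/sbin/iptables -t nat -L PREROUTING -n | grep -E '15001|42559'
/sbin/iptables -t filter -L FORWARD -n | grep '10.141.255.250'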

On the qmaster server, we need packets returning to campus machines, in particular the machines in Particle Physics, to go via that same export server, so we have the following additional lines in /etc/rc.d/rc.local:

# L.S.Lowe. For qsub/qstat/showq for grid submission from Physics,
# so packets NATted from export to qmaster return on that same path.
# This could be defined instead in route-eth0, but it's here for now:
route add -net 147.188.46.0 netmask 255.255.254.0 gw 10.141.245.101
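
For reference, a persistent equivalent in route-eth0 would look like the following (a sketch, assuming a RHEL-style /etc/sysconfig/network-scripts layout and that eth0 is the relevant internal interface on qmaster):

# /etc/sysconfig/network-scripts/route-eth0 on qmaster
147.188.46.0/23 via 10.141.245.101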

Changes to the Torque setup

The above section deals with getting external packets in and out of the qmaster. This section deals with the additional requirements in the Torque configuration itself.

The grid nodes (at the time of writing) are u4n081-u4n128. These are allocated to queues using the acl_hosts technique in Torque's qmgr setup, just as is done for other existing queues. There are two grid queues, glong and gshort, with different walltime and cput limits. The glong queue has access to fewer nodes than gshort (two fewer, at the time of writing), the idea being that gshort jobs thereby get a better turnaround. Queue selection is done by the external request broker: there is currently no need for a routing queue to feed these two queues. Both queues use acl_groups settings to limit who can submit to them; this prevents local users from submitting to those queues.

# Create and define queue glong
create queue glong
set queue glong queue_type = Execution
set queue glong max_user_queuable = 5000
set queue glong resources_max.cput = 48:00:00
set queue glong resources_max.walltime = 72:00:00
set queue glong resources_default.cput = 48:00:00
set queue glong resources_default.walltime = 72:00:00
set queue glong resources_default.nodes = 1:ppn=1
set queue glong resources_default.pmem = 1996mb
set queue glong enabled = True
set queue glong started = True
set queue glong acl_group_enable = True
set queue glong acl_groups = g-atlas
set queue glong acl_groups += g-atlasp
set queue glong acl_groups += g-atlass
... etc ...
set queue glong acl_host_enable = False
set queue glong acl_hosts = u4n083.cvos.cluster
set queue glong acl_hosts += u4n084.cvos.cluster
set queue glong acl_hosts += u4n085.cvos.cluster
... etc ...

# Create and define queue gshort
create queue gshort
set queue gshort queue_type = Execution
set queue gshort max_user_queuable = 5000
set queue gshort resources_max.cput = 00:20:00
set queue gshort resources_max.walltime = 00:30:00
set queue gshort resources_default.cput = 00:20:00
set queue gshort resources_default.walltime = 00:30:00
set queue gshort resources_default.nodes = 1:ppn=1
set queue gshort resources_default.pmem = 1996mb
set queue gshort enabled = True
set queue gshort started = True
set queue gshort acl_group_enable = True
set queue gshort acl_groups = g-atlas
set queue gshort acl_groups += g-atlasp
set queue gshort acl_groups += g-atlass
... etc ...
set queue gshort acl_host_enable = False
set queue gshort acl_hosts = u4n081.cvos.cluster
set queue gshort acl_hosts += u4n082.cvos.cluster
set queue gshort acl_hosts += u4n083.cvos.cluster
... etc ... 
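Once defined, the queue settings can be inspected from qmaster or any valid submit host, for example:

# Show the full attribute list for both grid queues
qstat -Q -f glong gshort
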

In order that particular hosts are accepted as valid submitters of jobs, it is necessary for them to be made known to Torque. There are two ways within Torque to allow hosts to be valid submitters: one is via qmgr:

set server submit_hosts += cename

The other method is to add a line for the submit host to the file /etc/hosts.equiv on the server running pbs_server, namely qmaster. I did the latter, so our CE which handles BB jobs is added to that file.
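
For illustration, the relevant /etc/hosts.equiv entry on qmaster is simply the CE's hostname on a line of its own, where cename stands for the CE's fully-qualified name as in the qmgr alternative above:

cename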

In order for information subsystems on the CE to be able to issue certain privileged sorts of Torque commands, they need to be added as operators in the Torque sense, using qmgr:

set server operators += edginfo@cename 
set server operators += edguser@cename
set server operators += rgma@cename
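
The result can be verified afterwards, for example:

# Confirm the operator additions took effect
qmgr -c 'print server' | grep operators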

Changes to the Torque prologue/epilogue

Some changes to the Torque prologue and epilogue, provided by LSL, were already in place to give non-grid users well-formatted additional information at the start and end of a job.

Further changes were added for Grid simply to ensure that the BB system could not be accidentally or deliberately abused by local or grid users. The prologue checks that a job submitted to a grid queue has a grid unix group, and vice versa. If a job is found in the wrong queue, the prologue cancels it without retry. In practice it is not actually possible (with our setup) for a grid job submitted via WMS to end up in a non-grid queue, or for a local BB user to submit to a grid queue (because of the use of acl_groups), so this is largely belt and braces. This code would also be superfluous if all BB queues (that is, including the non-grid ones) used acl_groups to limit the userids/groups which could submit to them, but that would be an extra burden on the BB administrators, who would have to keep pace with all the different non-grid groups.

# Check queue and user/group and disallow invalid combinations. Also see qmgr acl_groups.
# Standard Torque prologue arguments: $3 is the job's group name, $6 is the job's queue.
# Grid queues begin with "g" (glong, gshort); grid unix groups begin with "g-".
# Exit code 1 makes Torque abort the job without retry.
case "$6" in
g*) case "$3" in g-*) : ;; ?*) echo Invalid job queue for non-grid user; exit 1;; esac;;
?*) case "$3" in g-*) echo Invalid job queue for NGS/WLCG user; exit 1;; ?*) : ;; esac;;
esac

Changes to the Moab setup

Additional lines were requested to be added to the Moab scheduler setup in order to apply the appropriate fair-share and job-throttling policies for grid userids/groups. For example (nn being a placeholder for the chosen processor limit):

GROUPCFG[g-atlasp]   FSTARGET=20 MAXPROC=nn
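
Such lines live in the Moab configuration (moab.cfg) and are maintained by the BB admins; the effect on a group's fair-share can be checked from a client with Moab's diagnostics, for example (assuming the Moab client tools are on the path):

# Show the fair-share configuration and current usage per group/user
mdiag -f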

Changes to support grid userids and groups

A number of grid users and groups were added to the LDAP database to support GRID. All grid users and groups begin with "g-" to ensure that they can never clash with local userid naming rules. All grid uids and gids are above 100000 for similar reasons. The length of userids was kept within 8 characters, so that the output of commands like qstat and showq continues to look nicely formatted.

The IDs are generated by my /home/lcgdata/make-users-groups/makeug* script. Running this script produces three files: /etc/passwd compatible entries, /etc/group compatible entries, and a script which uses the useradd/mod and groupadd/mod commands. These can be passed to the BB technical team for them to add to the BB LDAP system. The resulting changes should be checked once they have been made, to ensure there are no discrepancies between the request and the result.
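
For illustration only, here is a minimal sketch of the kind of generation makeug performs; the group, name pattern, uid/gid bases and count are hypothetical, not the script's actual values:

#!/bin/bash
# Hypothetical sketch: generate pool accounts g-atl001..g-atl050 with
# uids above 100000, emitting passwd/group entries plus useradd commands.
base_uid=100100; gid=100010; group=g-atlas
echo "$group:x:$gid:" > group.grid
: > passwd.grid; : > makeusers.sh
for i in $(seq -f '%03g' 1 50); do
    u="g-atl$i"                          # 8 characters, keeps qstat/showq tidy
    uid=$((base_uid + 10#$i))
    echo "$u:x:$uid:$gid::/home/$u:/bin/bash" >> passwd.grid
    echo "useradd -u $uid -g $gid -m $u" >> makeusers.sh
done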

Changes to help with maintenance of grid user accounts

So that our local grid administrators can maintain the contents of grid users' home areas, additions were made to the /etc/sudoers file on the login nodes and worker nodes. The aliases and rule below allow those grid admins to run a bash shell in the accounts of grid users only. A command suitable for this purpose is, for example, sudo -H -s -u g-atl001.

User_Alias  GRIDADMINS = list-of-pp-GRID-admins
Runas_Alias GRIDUSERS = %g-atlas,%g-atlasp,%g-atlass,%g-cms,%g-cmsp,%g-cmss,\
            %g-lhcb,%g-lhcbp,%g-lhcbs,%g-dteam,%g-dteamp,%g-dteams,%g-ops,%g-opsp,\
            %g-opss,%g-hone,%g-honep,%g-hones,%g-na48,%g-na48p,%g-na48s,%g-ngs,%g-ngsp,%g-ngss
GRIDADMINS ALL = (GRIDUSERS) /bin/bash
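
For example, a member of GRIDADMINS can list what sudo permits and then open a shell in a grid account (g-atl001 being the example user mentioned above):

# List the commands sudo will allow for the invoking user
sudo -l
# Start a bash shell as grid user g-atl001, with HOME set to that user's home
sudo -H -s -u g-atl001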

-- LawrenceLowe - 22 Dec 2009
