Birmingham CAGE Test Bench

A description of the CreamCE/ARGUS/GLEXEC_wn/lcg-CE (CAGE) test installation at Birmingham.

Context

ALICE have requested that Birmingham install a CreamCE, which should be able to submit jobs to all WNs supporting the ALICE VO. ATLAS does not yet support CreamCE submission, so the same WNs must also accept jobs from a conventional lcg-CE.

ATLAS does, however, support multi-user pilot jobs, so the WNs must be able to execute glexec. Other GridPP sites have tested this functionality against an SCAS server; this test bench instead uses ARGUS to decide authorisation requests.

Installation

Installation of all test nodes is managed by the cfengine server on epgmo1. Below are instructions for completing the installation manually.

GLEXEC_wn

  • Setup the glite-ARGUS, glite-WN, glite-GLEXEC_wn, glite-TORQUE_client and lcg-CA yum repos.
  • Install the following:
     
          yum -y install lcg-CA 
          yum -y groupinstall glite-WN 
          yum -y install glite-GLEXEC_wn 
          yum -y install glite-TORQUE_client 
          yum -y install glite-authz-pep-c 
          yum -y install glite-authz-pep-c-cli 
    
  • In order to get glexec working properly with lcmaps, you will also need to download and install the glite-security-lcmaps-plugins-c-pep package.
  • If ATLAS jobs are to be supported, all the normal additional libraries should also be installed.
  • The node is configured with the yaim command /opt/glite/yaim/bin/yaim -c -s /root/yaim-conf/site-info.def -n WN -n GLEXEC_wn -n TORQUE_client. The relevant yaim variables can be found here.
  • The files /opt/glite/etc/glexec.conf, /opt/glite/etc/lcmaps/lcmaps-glexec.db and /opt/glite/etc/lcas/lcas-glexec.db are not properly set by yaim-core 4.0.11 and should be updated manually to reflect the settings detailed here.
  • In addition to all the normal glite-WN communication ports, the GLEXEC_wn also requires access (both INPUT and OUTPUT) on port 8154 to the ARGUS server.
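Once configured, glexec can be smoke-tested from a pool account on the WN. A minimal sketch follows, assuming a valid proxy at the conventional /tmp/x509up_u<uid> location (the voms-proxy-init default); the glexec invocation itself is shown commented, since it only succeeds on a fully wired node.

```shell
# Point glexec at the proxies it should use. The proxy path is an
# assumption (the usual voms-proxy-init default), not site-specific fact.
export GLEXEC_CLIENT_CERT=/tmp/x509up_u$(id -u)   # identity to map
export GLEXEC_SOURCE_PROXY=/tmp/x509up_u$(id -u)  # proxy handed to the target account
# On a correctly configured node this prints the mapped pool account,
# not the invoking user:
#   /opt/glite/sbin/glexec /usr/bin/id
echo "GLEXEC_CLIENT_CERT=$GLEXEC_CLIENT_CERT"
```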

ARGUS

  • Setup the glite-ARGUS and lcg-CA yum repos.
  • Install the following:
     
          yum -y install jdk-1.6.0_12-fcs.x86_64
          yum -y install lcg-CA 
          yum -y install jpackage-utils
          yum -y install glite-ARGUS
    
  • The node is configured with the yaim command /opt/glite/yaim/bin/yaim -c -s /root/yaim-conf/site-info.def -n ARGUS_server. The relevant yaim variables can be found here.
  • In order for dteam to use glexec, the appropriate policies have to be defined. Full details can be found here. A simple policy could be:
          resource "http://authz-interop.org/xacml/resource/resource-type/wn" {
            action "http://authz-interop.org/xacml/action/action-type/execute-now" {
              rule permit { vo = dteam }
            }
          }
    
    which, if stored in the file dteam_policy, can be loaded using the command pap-admin apf dteam_policy.
  • The ARGUS server should be able to communicate with itself (i.e. localhost) on ports 8150-8153 (INPUT only), and to accept connections from any node requesting authorisation services on port 8154 (INPUT only).
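The policy step above can be written out as concrete commands. The policy text is exactly the one quoted; the pap-admin calls are shown commented since they require a running PAP service (pap-admin lp lists the loaded policies), and the /tmp path is purely for illustration.

```shell
# Write the example dteam policy to a file (here /tmp, for illustration).
cat > /tmp/dteam_policy <<'EOF'
resource "http://authz-interop.org/xacml/resource/resource-type/wn" {
  action "http://authz-interop.org/xacml/action/action-type/execute-now" {
    rule permit { vo = dteam }
  }
}
EOF
# Load it into the PAP and list the result (needs the PAP service running):
#   pap-admin apf /tmp/dteam_policy
#   pap-admin lp
```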

Cream CE

  • Setup the glite-CREAM, glite-TORQUE_server, glite-TORQUE_utils and lcg-CA yum repos.
  • Install the following:
     
          yum -y install lcg-CA 
          yum -y install xml-commons-apis
          yum -y --enablerepo=dag install glite-CREAM
          yum -y install glite-TORQUE_server
          yum -y install glite-TORQUE_utils
    
  • The node is configured with the yaim command /opt/glite/yaim/bin/yaim -c -s /root/yaim-conf/site-info.def -n creamCE -n glite-TORQUE_server -n glite-TORQUE_utils. The relevant yaim variables can be found here.
  • After configuring with yaim, the blparser must be configured with the command /opt/glite/yaim/bin/yaim -r -s /root/yaim-conf/site-info.def -n creamCE -f config_cream_blparser. The tomcat5 service should then be restarted.
  • The Cream CE hosts the torque server used by the normal lcg-CE to submit jobs to the WN. In order for this to work, the file /etc/hosts.equiv must contain the line epgr08.ph.bham.ac.uk +. In addition, the directory /var/spool/pbs/server_priv/accounting is NFS exported to the lcg-CE, for the purposes of APEL accounting. This requires that portmap and mountd services be added for the lcg-CE in /etc/hosts.allow.
  • A full list of ports used by the CreamCE can be found here. In addition, the lcg-CE must also be allowed to connect over the normal torque and NFS ports.
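The lcg-CE interoperation bullets above amount to three small files on the CreamCE. A sketch follows, written to /tmp so it is runnable anywhere; the real targets are /etc/hosts.equiv, /etc/exports and /etc/hosts.allow, and the (ro,sync) export options are an assumption, not taken from the original setup.

```shell
# Sketch of the CreamCE-side files needed for lcg-CE interoperation.
# epgr08.ph.bham.ac.uk is the lcg-CE named in the text.

# /etc/hosts.equiv: let the lcg-CE submit to the torque server:
echo 'epgr08.ph.bham.ac.uk +' > /tmp/hosts.equiv

# /etc/exports: share the accounting logs for APEL (options assumed):
echo '/var/spool/pbs/server_priv/accounting epgr08.ph.bham.ac.uk(ro,sync)' > /tmp/exports

# /etc/hosts.allow: permit the lcg-CE to reach portmap and mountd:
cat > /tmp/hosts.allow <<'EOF'
portmap: epgr08.ph.bham.ac.uk
mountd: epgr08.ph.bham.ac.uk
EOF
```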

lcg-CE

  • Setup the lcg-CA, lcg-CE, jpackage and glite-TORQUE_utils yum repos.
  • Install the following:
     
          yum -y install lcg-CA 
          yum -y install lcg-CE
          yum -y install glite-TORQUE_utils
          yum -y install glite-info-provider-ldap
    
  • The node is configured with the yaim command /opt/glite/yaim/bin/yaim -c -s /root/yaim-conf/site-info.def -n lcg-CE -n glite-TORQUE_utils. The relevant yaim variables can be found here.
  • All of the normal lcg-CE ports should be open. In addition, the lcg-CE NFS mounts the directory cream_ce:/var/spool/pbs/server_priv/accounting for the purposes of APEL accounting.
  • The lcg-CE and Cream CE must publish the same software tags. For this reason, the lcg-CE NFS mounts the directory cream_ce:/opt/edg/var/info. It is believed that the tag hierarchy in lcg_ce:/opt/glite/var/info/ is not currently used, so this area is not (yet) shared.
  • The worker node must be able to scp files back to the CE without the need for a password, so the lcg-CE public key needs to be loaded (the lcg-CE should have the WN public keys after running yaim). This can be achieved by adding the hostname of the lcg-CE to the file /opt/edg/etc/edg-pbs-knownhosts.conf on the WN and then running the script /opt/edg/sbin/edg-pbs-knownhosts.
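The last step can be sketched as follows. The NODES = line is an assumption about the edg-pbs-knownhosts.conf format, so check the existing file before editing; the sketch works against a /tmp copy so it is runnable, whereas on a real WN the file is /opt/edg/etc/edg-pbs-knownhosts.conf.

```shell
# Sketch: register the lcg-CE in the knownhosts config so scp back to
# the CE works without a password. /tmp copy for illustration only;
# the 'NODES =' format is an assumption about this config file.
CONF=/tmp/edg-pbs-knownhosts.conf
echo 'NODES = epgr08.ph.bham.ac.uk' > "$CONF"
# Then regenerate the ssh known-hosts entries on the WN:
#   /opt/edg/sbin/edg-pbs-knownhosts
```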

Testing

Dteam Job Submission

A dteam HelloWorld job ran successfully on both the Cream and LCG CEs. OPS jobs also ran on both CEs automatically after they were enabled in the GOCDB.

ATLAS Job Submission

A HelloWorld job ran successfully on both the Cream and LCG CEs. An LCG Ganga job submitted to the lcg-CE completed successfully. Panda functionality has not been tested, and Ganga submission to the CreamCE has not been tested.

GLExec Functionality

Renaming pool accounts

This is potentially difficult. It is assumed that pool accounts on resources (WNs, CEs etc.) must have the same format, UID and GID as the pool accounts on the ARGUS server in order for authorisation to work.

Assuming no work is being done by a node (GridFTP transfers, job submission/execution etc.), it is relatively simple to delete the existing pool accounts, clear the grid-map files and directories from /etc/grid-security, and create new pool accounts with yaim. The one foreseeable complication is that the software areas must be readable by a particular GID.

The pool accounts defined on BlueBEAR resources take a different format, and different IDs, from those defined on local resources. As the BlueBEAR accounts are much more difficult to change, the solution would be to redefine all local pool accounts using the same format as those used on BlueBEAR. In addition, all files in the software areas would have to be assigned to the appropriate group. This is a potentially risky strategy, and it would require some thought before executing (especially the day before collisions!). As it is not yet known whether BlueBEAR will support glexec functionality, or whether a TAR_GLEXEC_wn release will be made available, the current implementation of the ARGUS server will contain pool accounts defined in the local cluster style.

-- ChristopherCurtis - 24 Mar 2010

Topic revision: r7 - 08 Apr 2010 - 11:14:56 - ChristopherCurtis
 