Resource for running ALICE MC production and analysis on BlueBEAR
Getting Started
Register on BlueBEAR via the web interface page which can be reached here.
You log in using ssh with your usual University (ADF) username, eg abc123 for students or smitha for staff. Login is only possible from machines with bham.ac.uk addresses, eg eprexa.
Use the password notified to you. It can be changed with the standard
passwd command.
Send a Helpdesk request, type 'Request', Subject area 'BEAR', sub-area 'Other', asking to be added to the "p-alice" group. This will allow you access to the shared software (and eventually, we hope, storage) area.
When you first login you will have a home area
/bb/phys/abc123 and hopefully a directory in the not backed up space
/nbu/phys/abc123. If you do not have the latter then again send a Helpdesk request as above but with sub-area 'Storage' asking for your not backed up space to be created as you anticipate exceeding the usual backed-up quota (currently 50 GB). As the name implies the area is not backed-up so
never store code, macros, scripts etc. there, only data which you could re-create by re-running.
Finally, change the permissions on your home directory so that it is group readable and searchable, using the usual chmod command, eg '
chmod g+r ~ ' and '
chmod g+x ~ '. This will enable us to help each other effectively, eg by copying examples, so do it now and there will be no delays in the future.
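The two chmod invocations can be combined into one, and it is worth checking the result; a quick sketch (the ls output shown in the comment is what a typical Linux system would give):

```shell
# Make your home directory group-readable and group-searchable in one step:
chmod g+rx ~
# Verify: the group permission field should now show r and x, eg drwxr-x---
ls -ld ~
```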
Some General Hints
These are mainly extracted from the documentation.
Don't run jobs on the login node(s). They will be de-prioritised and eventually killed by the system. Use the login nodes only for light tasks such as submitting jobs, editing code etc. Other interactive tasks can be accomplished by submitting an interactive batch job with '
qsub -I ' (the argument is an upper case i and not a lower case 'ell'). This gives you a shell on a worker node.
You can check your disk usage against your quota by using the
bbquota command.
[Please add your own hints …]
ALICE Software setup
The ALICE software area is /apps/hep/alice
The currently installed ALICE releases are listed below. If you add another release then please document it here.
| ROOT version | GEANT version | AliRoot | Date | By |
| root-v5-24-00 | geant-v1-11 | AliRoot-v4-17-Release | 2009-09-04 | Plamen |
You can use a release by sourcing the script setting the environment. Eg
source /apps/hep/alice/SetEnv/SetEnvAliRoot-v4-17-Release
Currently we are compiling using SL4 and therefore the SetEnv script starts with #!/bin4/bash so that we run using SL4 too.
There is an issue which currently prevents compilation 'out of the box'. The line:
cout << "aliroot " << ALIROOT_SVN_REVISION << " " << ALIROOT_SVN_BRANCH << endl;
must be commented out of the file
$ALICE_ROOT/ALIROOT/aliroot.cxx
The explanation is that
ALIROOT_SVN_REVISION and
ALIROOT_SVN_BRANCH should be substituted into the code during the make process, but if svn is not available they become empty strings and the code then cannot compile (the line is left with consecutive << operators and nothing between them). Solutions would be for the AliRoot makefile to be fixed and/or for svn to be made available throughout BlueBEAR (and in SL4).
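One way to comment the line out with sed rather than by hand, shown here on a scratch copy rather than the real file (the pattern assumes the line appears exactly as quoted above; check your AliRoot version first and keep a backup):

```shell
# Work on a scratch copy; on BlueBEAR you would edit
# $ALICE_ROOT/ALIROOT/aliroot.cxx instead, after taking a backup.
mkdir -p /tmp/alifix && cd /tmp/alifix
cat > aliroot.cxx <<'EOF'
  cout << "aliroot " << ALIROOT_SVN_REVISION << " " << ALIROOT_SVN_BRANCH << endl;
EOF
cp aliroot.cxx aliroot.cxx.bak                     # backup
sed -i 's|^\( *cout << "aliroot ".*\)$|// \1|' aliroot.cxx
grep '^//' aliroot.cxx                             # confirm the line is commented out
```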
Running MC production
In initial testing very good throughput was achieved, with 50,000 events produced (generated, simulated and reconstructed) in less than 16 hours. This is because a large number of nodes can be employed in parallel. Longer runs are more efficient still, as more nodes usually become available over time.
To do this two things are needed. First, a script which can be given to qsub (the batch submission command). This script needs to: source the AliRoot environment, copy the necessary .C and .sh files from
$ALICE_ROOT/test/fpprod to a working directory, run the .sh file, and finally copy the necessary .root and .log files back to some data area. Secondly, a small script to run a loop submitting N jobs, where N is the number of events needed divided by the number of events per job. Examples of each are
/bb/phy/barnbyl/prodtest/runfptest.sh and
/bb/phy/barnbyl/prodtest/submit.pl
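A minimal sketch of such a pair of scripts, assuming PBS-style batch variables ($TMPDIR, $PBS_JOBID); all names, paths and event counts are illustrative rather than copies of the actual runfptest.sh and submit.pl:

```shell
# Write a job script of the kind described above (names/paths illustrative):
cat > runfptest.sh <<'EOF'
#!/bin/bash
source /apps/hep/alice/SetEnv/SetEnvAliRoot-v4-17-Release   # AliRoot environment
WORK=$TMPDIR/prod_$PBS_JOBID && mkdir -p "$WORK" && cd "$WORK"
cp $ALICE_ROOT/test/fpprod/*.C $ALICE_ROOT/test/fpprod/*.sh .
./sim.sh                                   # run the production (name illustrative)
cp *.root *.log /nbu/phys/abc123/prod/     # copy results to your data area
EOF
chmod +x runfptest.sh

# Submit N jobs, where N = events needed / events per job:
EVENTS_NEEDED=50000
EVENTS_PER_JOB=500
N=$(( EVENTS_NEEDED / EVENTS_PER_JOB ))
for i in $(seq 1 "$N"); do
    echo qsub -N "prod_$i" runfptest.sh    # drop 'echo' to really submit
done
```

With the echo removed this would submit 100 jobs of 500 events each; adjust the two numbers to suit.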
This scheme has a couple of advantages:
- Each node only opens files on its local disk for the duration of the job (~90 minutes), copying them to the filestore at the end (a few seconds).
- Any files not required such as *.Digits.root, *.Hits.root and raw.root can be deleted before the copy which saves a factor of 4 in disk space.
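For the second point, the deletion can go in the job script just before the copy back; a sketch (file names as listed above):

```shell
# Delete outputs that are not needed before copying results back
# (saves a factor of ~4 in disk space).
rm -f *.Digits.root *.Hits.root raw.root
```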
This is only a basic scheme and one can add other possibilities: use modified macros copied from your home area; use a specially configured AliRoot with eg PYTHIA 8; copy the data at the end of the job directly to a different machine, bypassing the BlueBEAR filestore.
--
LeeBarnby - 19 Oct 2009