PLEASE NOTE: This guide is very out of date. Please see https://twiki.cern.ch/twiki/bin/view/AtlasComputing/FullGangaAtlasTutorial instead

Introduction

This tutorial will go through the Atlas module of the Ganga job management tool. It has been written to follow on from the rolling CERN Distributed Analysis tutorials but will cover significantly more material than is covered there.

0 HOW TO PREVENT QUOTA ISSUES

If you are doing this tutorial as part of the ATLAS offline software tutorial, you will likely run into quota issues as you proceed through the following steps. To prevent this, please remove your athena test area before beginning, and proceed through section 1 below to start again without the large BPhys user area.

1 Grid Certificate Setup

1.1 Preparing your Grid Certificate

To run on the grid, you must have a valid grid certificate. Hopefully, you will all have this already! However, it may currently only be loaded in your browser, in which case you need to go through a few steps to export it to your user area:

1) Set up the grid environment:

source /afs/cern.ch/project/gd/LCG-share/current/etc/profile.d/grid_env.sh

2) Create a .globus directory:

cd $HOME
mkdir ~/.globus
cd ~/.globus

3) Export the grid certificate. This is browser dependent, but in Firefox do:

Edit -> Preferences -> Advanced -> View Certificates

Select your certificate and click on 'Backup'. You will then be asked where to save your certificate. Save it to the .globus directory you created above as 'grid.p12'.

4) Now the certificate must be converted to the correct form by doing the following:

openssl pkcs12 -in grid.p12 -clcerts -nokeys -out usercert.pem
openssl pkcs12 -in grid.p12 -nocerts -out userkey.pem
chmod 400 userkey.pem
chmod 600 usercert.pem

5) Everything should now be ready. To check that it all works, do the following:

voms-proxy-init --voms atlas

which should give something like this:

Cannot find file or dir: /home/slater/.glite/vomses
Enter GRID pass phrase:
Your identity: /C=UK/O=eScience/OU=Birmingham/L=ParticlePhysics/CN=mark slater
Creating proxy ........................................................................................................ Done
Your proxy is valid until Sun Dec 14 03:34:50 2008
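If creating the proxy fails, one thing worth checking is the file permissions from step 4. The following Python sketch is not part of the grid tools, and the function name check_globus_permissions is hypothetical; it simply checks that userkey.pem and usercert.pem exist in ~/.globus with exactly the modes set by the chmod commands above:

```python
import os
import stat

# Permission bits expected for each file, matching the chmod commands in step 4.
EXPECTED_MODES = {"userkey.pem": 0o400, "usercert.pem": 0o600}


def check_globus_permissions(globus_dir=None):
    """Return a list of human-readable problems with the certificate files."""
    if globus_dir is None:
        globus_dir = os.path.expanduser("~/.globus")
    problems = []
    for name, expected in EXPECTED_MODES.items():
        path = os.path.join(globus_dir, name)
        if not os.path.exists(path):
            problems.append(f"{name}: missing")
            continue
        # S_IMODE strips the file-type bits, leaving only the permission bits.
        mode = stat.S_IMODE(os.stat(path).st_mode)
        if mode != expected:
            problems.append(f"{name}: mode {oct(mode)}, expected {oct(expected)}")
    return problems


if __name__ == "__main__":
    for problem in check_globus_permissions():
        print(problem)
```

An empty result means both files are present with the expected modes.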

1.2 Some help on Useful Commands Outside Ganga

Though almost everything you would want to do can be done within Ganga, at times it can be useful to know some of the grid UI commands so you don't have to start up Ganga just to check something. Some of the more useful ones are:

voms-proxy-init --voms <VONAME>
Create a grid proxy for the given VO

voms-proxy-info --all
View all information about the current proxy

glite-wms-job-status https://...
Query the resource broker for information about the given job. The job ID is the https URL, which for a given grid job in Ganga can be found with:
j.backend.id
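When scripting around these commands, it can be handy to check how long the current proxy has left before submitting anything. This is a minimal sketch, assuming that `voms-proxy-info --timeleft` prints the remaining lifetime as a single number of seconds (check `voms-proxy-info --help` on your UI); the function names are hypothetical:

```python
import subprocess


def format_timeleft(seconds):
    """Format a number of seconds as H:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def proxy_timeleft():
    """Return the remaining proxy lifetime in seconds, or 0 if it cannot be determined."""
    try:
        # Assumption: --timeleft prints a single integer number of seconds.
        result = subprocess.run(["voms-proxy-info", "--timeleft"],
                                capture_output=True, text=True)
    except OSError:
        # voms-proxy-info is not on the PATH (grid environment not set up).
        return 0
    if result.returncode != 0:
        return 0
    try:
        return int(result.stdout.strip())
    except ValueError:
        return 0


if __name__ == "__main__":
    remaining = proxy_timeleft()
    if remaining < 3600:
        print("Proxy missing or nearly expired: run voms-proxy-init --voms atlas")
    else:
        print("Proxy valid for", format_timeleft(remaining))
```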

2 Setting up CMT and Athena

2.1 Hello World with Athena

If you already have Athena set up, you can skip ahead to step 8 to set up the environment and check that it works. However, do ensure that you then have the UserAnalysis package installed (section 2.2).

1. Create a cmthome directory in your home directory:

cd $HOME 
mkdir cmthome 
cd cmthome

2. Make your login requirements file - use your favourite editor to create a file called requirements and paste into it these lines:

set CMTSITE CERN 
set SITEROOT /afs/cern.ch 
macro ATLAS_DIST_AREA ${SITEROOT}/atlas/software/dist 
macro ATLAS_TEST_AREA ${HOME}/scratch0/ 
use AtlasLogin AtlasLogin-* $(ATLAS_DIST_AREA) 

3. Save and exit this file

4. Now set up CMT as follows:

source /afs/cern.ch/sw/contrib/CMT/v1r20p20080222/mgr/setup.sh 
cmt config 

5. Now set up the ATLAS software. For this part we'll use release 14.5.0. See the introductory talk or the WorkBook for information on release numbers.

cd $HOME 
source cmthome/setup.sh -tag=14.5.0,setup,32 
(Note that you'll get a complaint about directories not existing - don't worry about this - it will only happen once)
echo $TestArea 
mkdir -p $TestArea 

6. Go into your $TestArea, and get ready to run the ATLAS software!

7. IMPORTANT: Make sure that there is no directory called "run" in the $TestArea. This avoids a potential problem later in this tutorial. If 'run' is in your TestArea, simply rename it for now:

cd $TestArea 
mv run run.bak

8. We will now set up the runtime environment - NOTE that this is all you would need to do from a clean shell!

cd $HOME 
source cmthome/setup.sh -tag=14.5.0,setup,32 
cd $TestArea 

9. The last thing to do is to test the setup. For this we will use a basic set of Hello World job options.

get_files -jo HelloWorldOptions.py
athena HelloWorldOptions.py

If all is well, you should see a lot of output showing that Athena ran successfully!

2.2 Setting up a Basic Analysis with Athena

Before completing the Athena setup, you need to check out and compile the UserAnalysis package:

cd $TestArea
cmt co -r UserAnalysis-00-13-09 PhysicsAnalysis/AnalysisCommon/UserAnalysis

Set up and compile the package against release 14.5.0 with:

cd PhysicsAnalysis/AnalysisCommon/UserAnalysis/cmt
source setup.sh
cmt make

After compilation:

cd ../run 
cp ../share/AnalysisSkeleton_topOptions.py .

We now need some data to run on. For this, we will use the dq2 tools to grab a single file from one of the mc08 datasets:

source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh 
dq2-get -f AOD.026663._00002.pool.root.1 mc08.105003.pythia_sdiff.recon.AOD.e344_s456_r545

Next, edit the 'AnalysisSkeleton_topOptions.py' file to point to this downloaded AOD. The appropriate change is:

ServiceMgr.EventSelector.InputCollections = [ "AOD.pool.root" ]

changed to:

ServiceMgr.EventSelector.InputCollections = [ "mc08.105003.pythia_sdiff.recon.AOD.e344_s456_r545/AOD.026663._00002.pool.root.1" ]

Save the change and then attempt to run the UserAnalysis job using these basic options:

athena AnalysisSkeleton_topOptions.py

If all is well, the job should complete over this single file and output a root file.
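If you switch input files often, the InputCollections edit above can be scripted. This is a minimal Python sketch (the function name set_input_collections is hypothetical) that assumes the assignment sits on a single, unindented line, as it does in AnalysisSkeleton_topOptions.py here:

```python
import re

# Matches a whole, unindented InputCollections assignment line.
_INPUT_RE = re.compile(
    r"^ServiceMgr\.EventSelector\.InputCollections\s*=.*$", re.MULTILINE)


def set_input_collections(jo_text, files):
    """Return jo_text with the InputCollections assignment pointing at files."""
    new_line = "ServiceMgr.EventSelector.InputCollections = %r" % (list(files),)
    # A function replacement avoids backslashes in paths being treated as escapes.
    new_text, count = _INPUT_RE.subn(lambda match: new_line, jo_text)
    if count == 0:
        raise ValueError("no InputCollections assignment found")
    return new_text
```

To use it, read AnalysisSkeleton_topOptions.py into a string, pass it through set_input_collections together with the path of the downloaded AOD, and write the result back to the file.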

3 First Steps with Ganga

NOTE This section is only designed to give a brief overview of basic Ganga functionality. For more information, see the Ganga website!

3.1 Starting Ganga

If you are working on lxplus at CERN, there is a script that sets the correct environment and finds the latest stable release of Ganga. To set up Ganga on lxplus, just type the following command from a clean shell:

source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh  (or .csh)

You can also specify which version of Ganga to use by issuing this command instead:

source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh 5.1.4

Before running Ganga properly, we will create a configuration script that you can use to alter the way Ganga behaves:

ganga -g

This creates a file '.gangarc'. The options given in here will override anything else (except the command line). You are now ready to start Ganga properly:

ganga

This should present you with:

*** Welcome to Ganga ***
Version: Ganga-5-1-4
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.

This is free software (GPL), and you are welcome to redistribute it
under certain conditions; type license() for details.


ATLAS Distributed Analysis Support is provided by the "Distributed Analysis Help" HyperNews forum. You can find the forum at
    https://hypernews.cern.ch/HyperNews/Atlas/get/distAnalysisHelp.html
or you can send an email to hn-atlas-dist-analysis-help@cern.ch

GangaAtlas                         : INFO     Starting for first launch - Creating new Task list.
Ganga.GPIDev.Lib.JobRegistry       : INFO     Found 0 jobs in "jobs", completed in 0 seconds
Ganga.GPIDev.Lib.JobRegistry       : INFO     Found 0 jobs in "templates", completed in 0 seconds


In [1]:

You can quit Ganga at any point using Ctrl-D.

3.2 Getting Help

Ganga is based completely on Python and so the usual Python commands can be entered at the IPython prompt. For the specific Ganga related parts, however, there is an online help system that can be accessed using:

In [1]: help() 
************************************

*** Welcome to Ganga ***
Version: Ganga-5-1-4
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.

This is free software (GPL), and you are welcome to redistribute it
under certain conditions; type license() for details.


This is an interactive help based on standard pydoc help.

Type 'index'  to see GPI help index.
Type 'python' to see standard python help screen.
Type 'interactive' to get online interactive help from an expert.
Type 'quit'   to return to Ganga.
************************************

help>

Type 'index' at the prompt to see the Class list available. Then type the name of the particular object you're interested in to see the associated help. You can use 'q' to quit the entry you're currently viewing (though there is currently a bug that displays help on a 'NoneType' object!). You can also do this directly from the IPython prompt using:

In [1]: help(Job)

You might find it useful at this point to have a look in the help system about the following classes that we will be using:

Job
Athena
AthenaMC
ATLASLocalDataset
ATLASOutputDataset
AthenaSplitterJob
DQ2Dataset
DQ2OutputDataset
DQ2JobSplitter
LCG
Panda
NG

3.3 Your First Job

We will start with a very basic Hello World job that will run on the machine you are currently logged in on. This will hopefully start getting you used to the way Ganga works. Create a basic job object with default options and view it:

In [1]: j = Job()
In [2]: j
 Out[2]: Job (
 status = 'new' ,
 name = '' ,
 inputdir = '/home/slater/gangadir/workspace/mws/LocalAMGA/0/input/' ,
 outputdir = '/home/slater/gangadir/workspace/mws/LocalAMGA/0/output/' ,
 outputsandbox = [] ,
 id = 0 ,
 info = JobInfo (
    submit_counter = 0
    ) ,
 inputdata = None ,
 merger = None ,
 inputsandbox = [] ,
 application = Executable (
    exe = 'echo' ,
    env = {} ,
    args = ['Hello World']
    ) ,
 outputdata = None ,
 splitter = None ,
 subjobs = 'Job slice:  jobs(0).subjobs (0 jobs)
' ,
 backend = Local (
    actualCE = '' ,
    workdir = '' ,
    nice = 0 ,
    id = -1 ,
    exitcode = None
    )
 )

Note that by just typing the job variable ('j'), IPython tries to print the information regarding it. For the job object, this is a summary of the object that Ganga uses to manage your job. These include the following parts:

  • application - The type of application to run
  • backend - Where to run
  • inputsandbox/outputsandbox - The files required for input and output that will be sent with the job
  • inputdata/outputdata - The required dataset files to be accessed by the job
  • splitter - How to split the job up into several subjobs
  • merger - How to merge the completed subjobs

For this job, we will be using a basic 'Executable' application ('echo') with the arguments 'Hello World'. There is no input or output data, so these are not set. We'll now submit the job:

In [3]: j.submit()

If all is well, the job will be submitted and you can then check its progress using the following:

In [4]: jobs

This will show a summary of all your jobs. Your basic Hello World job will go through the following stages: 'submitted', 'running', 'completing' and 'completed'. When your job has reached the completed state, the standard output and error output are transferred to the output directory of the job (as listed in the job object). There are several ways to check this output. First, we will use the 'ls' shell command to see if it's present:

In [5]: !ls $j.outputdir

where we have used the exclamation mark (!) to go into 'shell' mode and the dollar sign ($) to return to 'python' mode. Check that the output is valid by using one of the following:

In [6]: !emacs $j.outputdir/stdout
In [7]: j.peek("stdout", "emacs")

The peek command in the last line can be used to run a command (the second argument) on a file in the output directory (the first argument). These two lines are equivalent. With any luck, you will see Hello World in the stdout. Congratulations, you've run your first job!

3.4 Submitting to Different Backends

We'll now look at the different backends that can be used. A 'backend' for Ganga describes where you want your job to actually run. Up until this point, we have been using the 'Local' backend, which refers to the computer you're currently using. Consequently, your computer name can be seen in the 'ActualCE' field when running the 'jobs' command. We'll now try using the LSF (LxBatch) system to send jobs:

In [1]: j = Job()
In [2]: j.backend = LSF()
In [3]: j.submit()

Again, your job starts in the submitted state and then will go through running and completed states. That is all there is to running on the LSF batch system.

Running on the grid is just as simple with the only change being the backend - in this case, LCG. For the most basic job, do the following:

In [1]: j = Job()
In [2]: j.backend = LCG()
In [3]: j.submit()

After the job completes, there is a slight difference in accessing the standard output. To save on the transfer, the stdout and stderr files are zipped up. However, the 'peek' method takes this into account. Consequently, you can do something like:

In [7]: j.peek("stdout.gz", "emacs")

3.5 Creating Submission Scripts

Clearly, it would be very tedious if you had to keep typing out the same text to submit a job, so Ganga supports scripting. To test this, copy the code for the three Hello World jobs above and paste it, one job after another, into a file called 'first_job.py'. Then, from within Ganga, you can use the 'execfile' command to execute the script:

In [1]: execfile('first_job.py')

You can also run Ganga in batch mode by doing the following:

ganga first_job.py

At this point, just to show the persistency of your jobs, quit and restart Ganga. Your jobs will be preserved just as you left them!

3.6 More Advanced Job Manipulation

To finish off, we will cover some useful features of managing jobs. This is a fairly brief overview and a more complete list can be found at:

http://ganga.web.cern.ch/ganga/user/html/GangaIntroduction/

Copying Jobs

You can copy a job regardless of its status using the following:

j = Job()
j2 = j.copy()

The copied job can be submitted regardless of the original job's status. Consequently, you can, for example, copy a completed job and submit the copy:

j = jobs(2).copy()
j.submit()

Job Status

Jobs can be killed and then resubmitted using the following:

j.kill()
j.resubmit()

The status of a job can be forced (e.g. if you think it has hung and you want to set it to failed) using the following:

j.force_status('failed')

Removing Jobs

To clean up your job repository, you can remove jobs using the 'remove' method:

j.remove()

Configuration Options

You can supply different configuration options for Ganga at startup through the .gangarc file. If you wish to change things on the fly however, you can use:

config[<section>][<parameter>] = value

To show the current settings, just type the config section on its own, as with jobs:

config[<section>]
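Assuming .gangarc uses the INI-style layout of [section] headers followed by 'option = value' lines (the layout produced by 'ganga -g'), you can also inspect it outside Ganga. This minimal sketch uses Python's standard configparser; it only shows options that are uncommented in the file, not Ganga's built-in defaults, and the function name read_gangarc is hypothetical:

```python
import configparser
import os


def read_gangarc(path=None):
    """Return {section: {option: value}} for the options set in a .gangarc file."""
    if path is None:
        path = os.path.expanduser("~/.gangarc")
    # Interpolation is disabled so values containing '%' are read verbatim,
    # and strict=False tolerates repeated options instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    # Keep option names case-sensitive instead of lower-casing them.
    parser.optionxform = str
    parser.read(path)  # a missing file simply yields no sections
    return {section: dict(parser[section]) for section in parser.sections()}
```

The result mirrors the config[<section>][<parameter>] lookup above, but only for values you have set explicitly.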

4 Running Athena on the LSF Batch System

To run Athena jobs within Ganga, there are a number of objects to get used to. The main one is the Athena application class that tells Ganga such things as the version of Athena to use and your job options. It is also used to prepare your job to be sent to the batch system or grid. In this section we will concentrate on using Athena on the LSF batch system.

4.1 Hello World

We will start by running the basic Athena Hello World example from above, only this time using Ganga. First, make sure you have set up Athena and your UserAnalysis package as before, and then change to the run directory, just as if you were going to run it locally as above:

cd $HOME 
source cmthome/setup.sh -tag=14.5.0,setup,32 
cd $TestArea/PhysicsAnalysis/AnalysisCommon/UserAnalysis/cmt
source setup.sh
cd ../run 

The job is set up in a similar way as before, but the application should be changed:

j = Job()
j.application = Athena()
j.application.atlas_release = '14.5.0'
j.application.option_file = '$HOME/scratch0/AtlasOffline-14.5.0/HelloWorldOptions.py'
j.application.max_events = 10
j.backend = LSF()
j.submit()

After the job has run, you can look at the stdout using the 'peek' method as before, and you will see the same Athena output that you had previously. Note that because we were using 'vanilla' Athena, we didn't use the 'prepare' method (see below) and so needed to specify the release explicitly.

4.3 Local Datasets

ATLAS uses the idea of datasets to encapsulate the data files, and these have been carried over to Ganga. For local jobs (e.g. LSF), we will use the ATLASLocalDataset and the ATLASOutputDataset.

4.3.1 ATLASLocalDataset

For the dataset input, we will use 'ATLASLocalDataset'. This should only be used for local jobs (on the local machine or the LSF batch system) and not for grid jobs, as it refers to datasets and files present on the local (or shared) file system. It gives several ways of specifying the dataset files to run over; be aware that they are cumulative:

* Directory + wildcard: Supply the directory containing the files and a wildcard pattern (e.g. 'AOD.026663.*') matching the files you want to include

d.get_dataset(<dir>, <pattern>)

* File list: Give a text file that simply lists the filenames to run over. This can also handle wildcards

d.get_dataset_from_list(<filelist>)

* Using the 'names' list directly: There is an array within the object that can be added to directly

d.names.append('<filename>')

To show these in action, try the following:

d = ATLASLocalDataset()

# should give 3 files in the list
d.get_dataset('/afs/cern.ch/user/m/mslater/public/mc08.105003.pythia_sdiff.recon.AOD.e344_s456_r545', 'AOD.026663.*')
len(d.names)
full_print(d.names)

# a previously created filelist with 4 files from the ddiff sample - now 7 files in the dataset
d.get_dataset_from_list('/afs/cern.ch/user/m/mslater/public/ddiff_filelist.txt')
len(d.names)
full_print(d.names)

# now we'll just add 2 more from the minbias sample
d.names.append('/afs/cern.ch/user/m/mslater/public/mc08.105001.pythia_minbias.recon.AOD.e349_s462_r541/AOD.026320._00808.pool.root')
d.names.append('/afs/cern.ch/user/m/mslater/public/mc08.105001.pythia_minbias.recon.AOD.e349_s462_r541/AOD.026320._01037.pool.root')
len(d.names)
full_print(d.names)

When given to a job, these files will overwrite anything specified in the job options.
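To make the cumulative behaviour concrete, here is a small pure-Python stand-in. It is not Ganga's ATLASLocalDataset, and the class name LocalDatasetSketch is hypothetical. It assumes the patterns are shell-style wildcards (which is what the examples above look like) and that each line of a file list is a path, possibly containing wildcards:

```python
import fnmatch
import glob
import os


class LocalDatasetSketch:
    """Hypothetical stand-in for ATLASLocalDataset's file selection."""

    def __init__(self):
        self.names = []  # every call below appends to this list

    def get_dataset(self, directory, pattern):
        # Add files in `directory` whose base name matches the wildcard pattern.
        for name in sorted(os.listdir(directory)):
            if fnmatch.fnmatchcase(name, pattern):
                self.names.append(os.path.join(directory, name))

    def get_dataset_from_list(self, filelist):
        # Add one entry per non-empty line; lines with wildcards are expanded.
        with open(filelist) as fh:
            for line in fh:
                entry = line.strip()
                if not entry:
                    continue
                if any(ch in entry for ch in "*?["):
                    self.names.extend(sorted(glob.glob(entry)))
                else:
                    self.names.append(entry)
```

Because both methods only ever append to names, calling them in sequence accumulates files exactly as in the 3, 7, 9 file walk-through above.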

4.3.2 ATLASOutputDataset

We will be using the ATLASOutputDataset to store the output. The output dataset is a bit easier to deal with, as all we need to do is specify which files are going to be added to the dataset (similar to the output sandbox) and where to place them in local storage. As an example:

j.outputdata=ATLASOutputDataset()
j.outputdata.outputdata=['AnalysisSkeleton.aan.root']
j.outputdata.location = '$HOME/athena/output'

4.4 Running a UserAnalysis Job on LSF Using Ganga

Now that we know about local datasets, we're in a position to send an Athena UserAnalysis job to LSF. This is similar to running the Athena Hello World example above, but to ensure that our modified UserAnalysis module is used, we must 'prepare' the package before submitting the job. This tells Ganga to tar up the user area and send it with the job. Note that we have also used the 'exclude_from_user_area' option so that the ROOT file you downloaded previously is not included unnecessarily in the tarball. In addition, we need to specify the dataset to run on and where to place the output. The following script creates a job that runs Athena over the files of the MC dataset above:

j = Job()
j.application = Athena()
j.application.exclude_from_user_area=["*.o","*.root*","*.exe"]
j.application.prepare(athena_compile=False)
j.application.option_file='AnalysisSkeleton_topOptions.py'
j.inputdata=ATLASLocalDataset()
j.inputdata.get_dataset('/afs/cern.ch/user/m/mslater/public/mc08.105003.pythia_sdiff.recon.AOD.e344_s456_r545','AOD*.root.*')
j.outputdata=ATLASOutputDataset()
j.outputdata.outputdata=['AnalysisSkeleton.aan.root']
j.outputdata.location = '$HOME/athena/output'
j.backend = LSF()
j.submit()

Note that in the 'prepare' method, we have specified 'athena_compile=False'. This was done because we had already compiled the package on the machine we're working on, so only the compiled libraries needed to be sent. Compiling on the worker node instead can be useful when dealing with many or heavily modified packages, and also helps to prevent incompatibilities. Submit a new job, but this time tell Ganga to compile the code on the worker node by replacing the prepare method above with:

j.application.prepare(athena_compile=True)

When complete, examining the stdout file should show you that your code was compiled.
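Before submitting, it can be useful to check what 'exclude_from_user_area' will keep out of the tarball. This sketch assumes the patterns are shell-style wildcards matched against each file's base name, which is how the examples above read; Ganga's own matching may differ in detail, and the function names is_excluded and files_to_ship are hypothetical:

```python
import fnmatch
import os

# The same patterns passed to exclude_from_user_area above.
EXCLUDE = ["*.o", "*.root*", "*.exe"]


def is_excluded(filename, patterns=EXCLUDE):
    """True if the base name of filename matches any exclusion pattern."""
    base = os.path.basename(filename)
    return any(fnmatch.fnmatchcase(base, pattern) for pattern in patterns)


def files_to_ship(top, patterns=EXCLUDE):
    """List paths under top (relative to it) that would not be excluded."""
    kept = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            if not is_excluded(name, patterns):
                kept.append(os.path.relpath(os.path.join(dirpath, name), top))
    return sorted(kept)
```

Running files_to_ship on your test area should, for example, not list the AOD file downloaded with dq2-get, since its name ends in '.pool.root.1' and so matches '*.root*'.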

4.5 Splitting and Merging Athena Jobs

The last thing to look at before we move on to running Athena on the Grid is combining what we have just done with splitting and merging. This is where the power of Ganga for running over large datasets becomes apparent. By adding a splitter to the job definition, the job will be split into a number of subjobs that each run on only a few of the specified files. After these subjobs are complete, the results can be merged, so it is as if you had run over all the files in one job. The splitter and merger we will use for this local job are AthenaSplitterJob and AthenaOutputMerger.

The splitting of the job is fairly simple. The only thing you need to specify is the number of subjobs to split the job into (the other options don't work at the moment!). The merger is even easier, as the only thing you would need to specify is the output directory to store the combined files in. As an example, create a submission script that loads in the selection of data we built in section 4.3.1 (containing sdiff, ddiff and minbias contributions), and then add the following to the submission script from section 4.4 (remembering to change the inputdata to the dataset you've just created!):

j.splitter = AthenaSplitterJob()
j.splitter.numsubjobs = 5
j.merger = AthenaOutputMerger()

Each split job contains a variable 'subjobs' that behaves exactly like jobs, e.g. you can view a summary of the subjobs using:

jobs(<id>).subjobs

or view an individual subjob using:

jobs(<id>).subjobs(<sid>)

Once complete, you should have the individual files for each job. Unfortunately, due to a conflict with the Athena environment you will probably get an error for the merger. However, you can still do the merging by running Ganga from a clean shell and doing:

jobs(<id>).merger.merge()
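The idea behind splitting can be illustrated with a few lines of Python that distribute a file list as evenly as possible over a requested number of subjobs. This is only an illustration of the concept, not AthenaSplitterJob's actual algorithm, and the function name split_files is hypothetical:

```python
def split_files(files, numsubjobs):
    """Split files into at most numsubjobs contiguous chunks of near-equal size."""
    # Never ask for more subjobs than there are files, and always at least one.
    numsubjobs = max(1, min(numsubjobs, len(files)))
    base, extra = divmod(len(files), numsubjobs)
    chunks, start = [], 0
    for i in range(numsubjobs):
        # The first `extra` chunks take one additional file each.
        size = base + (1 if i < extra else 0)
        chunks.append(files[start:start + size])
        start += size
    return chunks
```

With the 9 files from section 4.3.1 and numsubjobs = 5, this gives subjobs of 2, 2, 2, 2 and 1 files.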

5 Running Athena on the EGEE Grid

5.1 The Atlas Computing Model

The Atlas Computing Model has been introduced in order to ensure the most efficient use of the resources on the Grid. The most important concept for this is that of jobs going to data and NOT data going to jobs. This means that, though it may seem the simplest thing to do, you should not just download datasets to local storage and run on them there. When scaled up with several thousand users and many different datasets, everything would grind to a halt, never mind whether you would have enough storage space! The way to analyse data is to set up your job description (as we did in the previous section), send this job to the sites that have your data and then only transfer back the (much smaller) analysis data.

Having established this method of working, we will now look at how this has been implemented. The data in Atlas comes in various flavours going from raw data through ESD (event summary data) to AODs (Analysis Object Data). There is also the DPD level below this which is even more stripped down, but at present, most analysis is being done on AODs. This data is replicated at many sites across the world allowing many users to run jobs at many different sites. The system that organises all this is DQ2 for which there are useful commands that can be used to list and retrieve datasets. A DQ2 dataset consists of a unique name (e.g. mc08.105003.pythia_sdiff.recon.AOD.e344_s456_r545) that should describe the data to some degree. Much like a directory, this dataset will 'contain' the associated files for that dataset. The files themselves can be stored at different storage elements, but usually a dataset will be complete (all files in one location) in at least one place. Just to complicate matters a little more there are also datasets that contain other datasets. These are indicated by a trailing forward slash '/' at the end of the dataset name. There is currently a move to start using just these containers, but for the moment, we will use the simple datasets.

Finally, there are several systems (backends for Ganga) that are used to submit jobs to the grid. There is the LCG system, which is based primarily in Europe; the Panda system, which started in North America but is now starting to be used in Europe as well; and finally the NorduGrid system, which handles the Grid in the Nordic countries.

5.2 Running using the LCG Backend

In order to run a basic Athena Hello World job on the grid, as before, all you need to do is specify the LCG backend. However, this simple approach only works for jobs that do not require input data. The following example shows this in action:

j = Job()
j.application = Athena()
j.application.atlas_release = '14.5.0'
j.application.option_file = '$HOME/scratch0/AtlasOffline-14.5.0/HelloWorldOptions.py'
j.application.max_events = 10
j.backend = LCG()
j.submit()

As with the previous LCG jobs, your job will go to somewhere in the world, run and hopefully complete. After it is finished, there is a slight difference to accessing the output as the stdout and stderr files are gzipped to save transfer. However, you can still use very similar commands:

j.peek("stdout.gz", "emacs")

The last (but quite important!) thing that you need to know when running on the LCG backend is how to specify the sites your job should run at. This is done differently from the previous examples, as Atlas uses a special list of site+space token names that are then mapped to CE names. These sites are further organised into 'clouds' (e.g. CERN, UK, IT), which you can also submit to. There is unfortunately no easy way at present of linking sites to clouds, but you can get a good idea using the following web pages and Ganga commands:

TiersOfAtlas:

http://atlas.web.cern.ch/Atlas/GROUPS/DATABASE/project/ddm/releases/TiersOfATLASCache.py

Ganga Robot:

http://gangarobot.cern.ch/index_200812.html

Ganga:

r = AtlasLCGRequirements()
r.list_clouds()
r.list_sites()

We will go over finding the site your data is at below, but you can specify the site or cloud you want to run on, and those you wish to exclude, by using the AtlasLCGRequirements object:

r = AtlasLCGRequirements()
r.sites = ['<sitename>']
r.excluded_sites = ['<sitename>']
r.cloud = '<cloudname>'

As an example, try submitting the test job above to the Italian cloud (IT) and the Oxford site 'UKI-SOUTHGRID-OX-HEP_DATADISK'.
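The site names used here combine a site with a space token, e.g. 'UKI-SOUTHGRID-OX-HEP_DATADISK' or 'CERN-PROD_MCDISK'. When sorting through lists of such names, a helper like the following can separate the two parts. It assumes the space token is whatever follows the last underscore, which holds for the names in this tutorial but is not guaranteed in general; the function names are hypothetical:

```python
from collections import defaultdict


def split_site_token(name):
    """Split 'SITE_TOKEN' into ('SITE', 'TOKEN'); no underscore gives token ''."""
    site, sep, token = name.rpartition("_")
    if not sep:
        return name, ""
    return site, token


def group_by_site(names):
    """Map each site to the sorted list of space tokens seen for it."""
    groups = defaultdict(list)
    for name in names:
        site, token = split_site_token(name)
        groups[site].append(token)
    return {site: sorted(tokens) for site, tokens in groups.items()}
```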

5.3 Finding your Data

To do an analysis, you will need to find your data. There are several ways to do this, but I will cover two of the more common ways. First, there is the Atlas Metadata Interface (AMI). This can be accessed from the following:

https://atlastagcollector.in2p3.fr:8443/AMI/servlet/net.hep.atlas.Database.Bookkeeping.AMI.Servlet.Command?linkId=62

(or follow the links from http://ami3.in2p3.fr:8080/opencms/opencms/AMI/www/index.html)

From here, you can retrieve a large amount of information on datasets by entering wildcard searches. For example, to find the single diffraction example dataset we've been using, try the search term:

%sdiff%aod%

Try some other search terms with AMI to see what is available!

A similar thing can be accomplished (though without the additional information) using the DQ2Clients tool:

voms-proxy-init --voms atlas
source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh

This will give you access to several dq2-related commands. I'll go over the most useful here, but do have a look at the DQ2Clients Twiki for more information:

https://twiki.cern.ch/twiki/bin/view/Atlas/DQ2Clients

dq2-ls

This is a basic dq2 version of the ls command. It also takes wildcards and so, to do the same as we did with AMI:

dq2-ls '*sdiff*aod*'

Again, have a go with other search terms.

dq2-list-files

This lists the files associated with the dataset:

dq2-list-files <dataset_name>

dq2-list-dataset-replicas

This will list the sites where this dataset is available. All the main production and fdr2 AODs should be available at many sites in most clouds. This information will be needed when specifying the jobs later on.

dq2-list-dataset-replicas <dataset_name>

dq2-get

USE WITH CAUTION!! As mentioned above, though dq2-get is very useful and you will need it, you are not supposed to download many GB of data per day!

dq2-get [-n <number of files>] <dataset_name>
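AMI search terms use SQL-style wildcards ('%' for any run of characters), while dq2-ls takes shell-style ones ('*'). The following pure-Python sketch does not talk to AMI or DQ2; it translates one style into the other and filters a list of dataset names you already have, e.g. one saved from dq2-ls. Treating '_' as a single-character wildcard follows SQL LIKE, and the case-insensitive match mirrors the '%sdiff%aod%' example finding AOD datasets; both are assumptions about how you want to match, and the function names are hypothetical:

```python
import fnmatch


def ami_to_glob(term):
    """Translate an SQL LIKE-style term ('%', '_') into a shell-style glob."""
    return term.replace("%", "*").replace("_", "?")


def filter_datasets(names, term):
    """Return the names matching an AMI-style term, ignoring case."""
    pattern = ami_to_glob(term).lower()
    return [name for name in names if fnmatch.fnmatchcase(name.lower(), pattern)]
```

For instance, ami_to_glob('%sdiff%aod%') gives '*sdiff*aod*', the same pattern used with dq2-ls above.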

5.4 Using DQ2Datasets in Ganga

As with the local datasets used previously, there are both input and output datasets that use the DQ2 system.

5.4.1 DQ2Dataset

For the input dataset, we will use the DQ2Dataset class. This provides the interface to the DQ2 system in Ganga. The basic usage of the class is fairly simple:

d = DQ2Dataset()
d.dataset = "mc08.105003.pythia_sdiff.recon.AOD.e344_s456_r545"

The following methods can then be used to find out information about the dataset:

d.list_locations()   # show the locations of the dataset
d.list_contents()   # show the files within the dataset

5.4.2 DQ2OutputDataset

The output dataset for Grid use is the DQ2OutputDataset. This saves your output data as a DQ2 dataset that you can retrieve afterwards, and also provides several ways of controlling this output. Here is an example:

d = DQ2OutputDataset()
d.outputdata=['AnalysisSkeleton.aan.root']    # output files from job
d.datasetname='MarkSlater.BasicAnaTest'  # not necessary - Ganga will create it if you don't supply one
d.location = 'UKI-SOUTHGRID-OX-HEP_DATADISK'    # not necessary - the site name and space token to save your data. Defaults to the nearest SE

5.5 Splitting on the Grid

As with the local and batch backends above, there is a specific class available to split a job over the input files. When running on the grid, you should use the DQ2JobSplitter class to achieve this. The options available for the DQ2JobSplitter are:

numsubjobs - The number of subjobs per site (NOTE: not the maximum number of subjobs!)
numfiles - The number of files per subjob to process
use_blacklist - Use the automatic blacklist service (probably best to leave this as True!)

5.6 Running a Full Analysis on the Grid

We now have all the elements to run a complete analysis. We will use the UserAnalysis package from before and the single diffraction MC sample. This will take you through the typical steps from beginning to end so you get the basic idea - obviously, these will depend greatly on the actual analysis you want to do, but the general workflow shouldn't change much!

1) First, we'll set up Athena as before from a clean shell:

cd $HOME
source cmthome/setup.sh -tag=14.5.0,setup,32
cd $TestArea/PhysicsAnalysis/AnalysisCommon/UserAnalysis/cmt
source setup.sh
cd ../run

2) Next, we will use Ganga to find out where our data is:

d = DQ2Dataset()
d.dataset = "mc08.105003.pythia_sdiff.recon.AOD.e344_s456_r545"
d.list_locations()
d.list_contents()

3) From this list we see that the data is present at CERN-PROD_MCDISK, which is in the 'CERN' cloud. The dataset also contains 1000 files, so we will want to split it over ~25 subjobs. This gives us the following job script, which will tar up our UserAnalysis module, run over this input data and then output to a DQ2Dataset:

j = Job()
j.application = Athena()
j.application.exclude_from_user_area=["*.o","*.root*","*.exe"]
j.application.prepare(athena_compile=False)
j.application.option_file = 'AnalysisSkeleton_topOptions.py'
j.application.max_events=-1
j.inputdata = DQ2Dataset()
j.inputdata.dataset = "mc08.105003.pythia_sdiff.recon.AOD.e344_s456_r545"
j.outputdata=DQ2OutputDataset()
j.outputdata.outputdata=['AnalysisSkeleton.aan.root']
j.backend=LCG()
j.backend.requirements=AtlasLCGRequirements()
j.backend.requirements.cloud = 'CERN'
j.splitter = DQ2JobSplitter()
j.splitter.numsubjobs = 25
j.submit()

4) When the job has completed, exit Ganga and do the following to retrieve your dataset (the dataset name can be found by typing 'j.outputdata'):

source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh
dq2-get <dataset_name>

6 UserAnalysis on the other Grid Backends

Submission to both the Panda and NorduGrid systems is very similar to the LCG backend. Here we will try both.

6.1 Submission to the Panda System

Submitting to the Panda system is very similar to submitting to the LCG backend. The only difference is that the cloud or site must be specified in the backend. A brief example is given here; for more information please see:

* https://twiki.cern.ch/twiki/bin/view/Atlas/GangaPanda

j = Job()
j.application=Athena()
j.application.exclude_from_user_area=["*.o","*.root*","*.exe"]
j.application.prepare()
j.application.option_file=['AnalysisSkeleton_topOptions.py' ]
j.inputdata=DQ2Dataset()
j.inputdata.dataset="mc08.105003.pythia_sdiff.recon.AOD.e344_s456_r545"
j.outputdata=DQ2OutputDataset()
j.splitter=DQ2JobSplitter()
j.splitter.numsubjobs = 25
j.backend=Panda()
j.backend.cloud = 'US'
j.submit() 

With your job submitted, you can keep track of its status using Ganga or the Panda Monitor:

* http://pandamon.usatlas.bnl.gov:25880/server/pandamon/query

Note that to use the Panda system, there must be an available queue at the site. The Panda Monitor page above lists all the active queues.

6.2 Submission to the NorduGrid System

Examples and tutorials for submitting to NorduGrid can be found at:

* https://twiki.cern.ch/twiki/bin/view/Atlas/GangaNGTutorial430

7 AthenaRootAccess Using Ganga

We will now show you how to run AthenaRootAccess (ARA) jobs using Ganga. This involves running python or ROOT scripts within the Athena environment. Though it is easy to do, there are a few differences between the three backends. However, for all three we will use the same example from the package PhysicsAnalysis/AthenaROOTAccessExamples, though different scripts are needed for each. They all take advantage of the same Ganga feature, which provides a list ('input.txt') of the input files on the worker node that can be loaded into ROOT or python.
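The 'input.txt' list can be read from a python ARA script with a few lines like the following. This is a minimal sketch that assumes the file holds one input file path per line; check the file produced on your worker node before relying on that. The function name read_input_list is hypothetical, and the returned list would then be handed to whatever ARA code opens the files.

```python
import os


def read_input_list(path="input.txt"):
    """Return the input file paths listed in `path`, one per line, skipping blank lines.

    Assumption: the file contains one path per line.
    """
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


if __name__ == "__main__":
    # Only try to read the list if it is present (i.e. on the worker node).
    if os.path.exists("input.txt"):
        for name in read_input_list():
            print(name)
```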

7.1 ARA on the LCG Backend

On the LCG backend, there is a simple switch required in the application plugin. Before we get to the job description though, first copy the ARA script that we will run:

cd $TestArea
cp /afs/cern.ch/user/e/elmsheus/public/gangaARAExample_502.py .

Now we can submit this as a job options file using 'vanilla' Athena, as we did for the Hello World exercise, but with the new switch:

j = Job()
j.application=Athena()
j.application.option_file=['gangaARAExample_502.py' ]
j.application.atlas_release='14.5.0'
j.application.atlas_exetype='PYARA'
j.inputdata=DQ2Dataset()
j.inputdata.dataset="fdr08_run2.0052283.physics_Muon.merge.AOD.o3_f8_m10"
j.outputdata=DQ2OutputDataset()
j.outputdata.outputdata=['histos.root' ]
j.splitter=DQ2JobSplitter()
j.splitter.numsubjobs=6
j.backend=LCG()
j.backend.requirements.cloud='DE'
j.submit() 

The 'atlas_exetype' option determines what will be run. The options are:

ATHENA - Run Athena (default)
PYARA - Run python
ROOT - Run ROOT

7.2 ARA on the Panda Backend

The Panda backend has different requirements for running an ARA analysis. This time, before running Ganga, you must also set up Athena as you normally would. As an example, follow the instructions below:

cd $HOME 
source cmthome/setup.sh -tag=14.5.0,setup,32 
cd $TestArea
cp /afs/cern.ch/user/d/dvanders/public/gangaPandaARAExample_502.py .

After setting up Athena and copying the script, we can now run Ganga and submit the following job:

j = Job()
j.application=Athena()
j.application.option_file=['gangaPandaARAExample_502.py']
j.application.atlas_release='14.5.0'
j.inputdata=DQ2Dataset()
j.inputdata.dataset="fdr08_run2.0052283.physics_Muon.merge.AOD.o3_f8_m10"
j.outputdata=DQ2OutputDataset()
j.outputdata.outputdata=['out1.root']
j.splitter = DQ2JobSplitter() 
j.splitter.numsubjobs = 6
j.backend=Panda()
j.backend.ara=True   # This is what activates ARA
j.submit()

Note that in this ".py" example the Panda backend will invoke python remotely, but it will invoke ROOT instead if your analysis script ends in ".C".
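The rule in the note above amounts to a choice based on the script's file extension. As a tiny sketch (the function name is hypothetical and this is not Panda's actual code):

```python
import os


def panda_ara_interpreter(script_name):
    """Mimic the described rule: scripts ending in '.C' run in ROOT, others in python."""
    _, ext = os.path.splitext(script_name)
    return "root" if ext == ".C" else "python"


print(panda_ara_interpreter("gangaPandaARAExample_502.py"))  # python
print(panda_ara_interpreter("myAnalysis.C"))                 # root (hypothetical script)
```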

8 Using TAG Files with Ganga

TAGs are intended to support efficient identification and selection of events of interest for a given analysis. Please familiarise yourself with TAG analysis as described on the Wiki page for TAG event selection.

8.1 TAG database and TNT

The job examples described in sections 3.4.2/3 use ROOT-file-based TAG analysis. A second way to do a TAG analysis is to use the TAG database, via the GangaTNT plugin. Examples are given on the following tutorial webpages:
  • TNT (TagNavigatorTool) Ganga Plugin
  • StreamsTestDistributedAnalysisExamples

8.2 TAG ROOT files and AANT output

Performing a TAG analysis using ROOT files requires two datasets: the usual AOD dataset and the corresponding TAG dataset, e.g. trig1_misal1_mc12.005672.PythiaWZFusionChL1150lnjj.merge.AOD.v12000604_tid010411 and trig1_misal1_mc12.005672.PythiaWZFusionChL1150lnjj.merge.TAG.v12000604_tid010411. These names must be given as the job configuration parameters j.inputdata.dataset and j.inputdata.tagdataset, respectively (see below). To perform a fast TAG-based selection with AANT output, use the option j.inputdata.type='TAG'.

Create a new jobOptions file AnalysisSkeleton_topOptions_TAG.py based on the previously used AnalysisSkeleton_topOptions.py and add the following three lines:

# TAG
EventSelector.Query="MissingET>20000&&NLooseElectron>=2"
EventSelector.CollectionType="ExplicitROOT"

Please put this file into $HOME/athena/testarea/12.0.6/PhysicsAnalysis/AnalysisCommon/UserAnalysis/run/.
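Conceptually, EventSelector.Query keeps only the events whose TAG attributes satisfy every condition joined by &&. The sketch below evaluates such a simple query against a Python dict of TAG values to show the logic. It is an illustration only: Athena evaluates the query itself and supports a richer syntax than this regex-based parser, and the attribute values in the example are made up.

```python
import operator
import re

# Comparison operators handled by this sketch (an assumed subset of the query syntax).
_OPS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, "<": operator.lt,
}
# One term such as 'MissingET>20000'; two-character operators are tried first.
_TERM = re.compile(r"^\s*(\w+)\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")


def passes(query, event):
    """Return True if `event` (a dict of TAG attributes) satisfies every '&&' term."""
    for term in query.split("&&"):
        match = _TERM.match(term)
        if match is None:
            raise ValueError("unsupported term: %r" % term)
        name, op, value = match.groups()
        if not _OPS[op](event[name], float(value)):
            return False
    return True


query = "MissingET>20000&&NLooseElectron>=2"
print(passes(query, {"MissingET": 25000.0, "NLooseElectron": 2}))  # True
print(passes(query, {"MissingET": 15000.0, "NLooseElectron": 3}))  # False
```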

The following job performs a TAG analysis at FZK with AANT output:

j = Job()
j.name='athena_lcg_dq2_1206_tag1'
j.application=Athena()
j.application.prepare(athena_compile=False)
j.application.option_file='$HOME/athena/testarea/12.0.6/PhysicsAnalysis/AnalysisCommon/UserAnalysis/run/AnalysisSkeleton_topOptions_TAG.py'
j.inputdata=DQ2Dataset()
j.inputdata.dataset='trig1_misal1_mc12.005672.PythiaWZFusionChL1150lnjj.merge.AOD.v12000604_tid010411'
j.inputdata.tagdataset='trig1_misal1_mc12.005672.PythiaWZFusionChL1150lnjj.merge.TAG.v12000604_tid010411'
j.inputdata.type='TAG'
j.outputdata=DQ2OutputDataset()
j.outputdata.outputdata=['AnalysisSkeleton.aan.root']
j.backend=LCG()
j.backend.CE='ce-fzk.gridka.de:2119/jobmanager-pbspro-atlasS'
j.submit()


8.3 TAG ROOT files and AOD output

TAGs can also be used for AOD production. If no corresponding TAG dataset exists, TAG files can also be uploaded with the job inputsandbox as demonstrated below. The archive file containing the TAG files must be named tag.tar.gz.

The jobOptions file to run a TAG-based preselection with AOD output should look like the following TagBasedEventSelection_topOptions.py. Please put this file into $HOME/athena/testarea/12.0.6/PhysicsAnalysis/AnalysisCommon/UserAnalysis/run/:

####################################################
DetDescrVersion="ATLAS-CSC-01-01-00"
AllAlgs = False
doHist = False
doCBNT = False
doWriteTAG = False
doWriteESD = False
readAOD = True
doAOD = False
doWriteAOD = True
# read the TAG as input to the job
readColl = True

# Number of Events
EvtMax = 100

# define also the selection criteria
CollInputQuery="MissingET>20000 && NLooseMuon>1"

#Create a new RDO, ESD and AOD that contain only selected events
PoolAODOutput = "misalg_csc11.005300.PythiaH130zz4l.recon.AOD.TagSel.v12003104.root"

# main reconstruction job jobOptions
include ("RecExCommon/RecExCommon_topOptions.py")
#######################################################

As TAG input there are two choices: either use the centrally produced TAG dataset corresponding to the AOD dataset with j.inputdata.tagdataset='csc11.datasetname.TAG.xxxx', or provide previously produced TAG ROOT files in an archive file called tag.tar.gz that is uploaded with the grid job. You can use the archive directly from /afs/cern.ch/user/e/elmsheus/public/tag.tar.gz, as is done in the exercise below. It contains TAG ROOT files for the first 5 files of the dataset misalg_csc11.005300.PythiaH130zz4l.recon.AOD.v12003104.
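If you produce your own TAG files (see the optional MakeTag_jobOptions.py example further down), they have to be shipped as tag.tar.gz. A minimal sketch using Python's tarfile module follows. It assumes a flat archive of TAG ROOT files with no directories; compare with the contents of /afs/cern.ch/user/e/elmsheus/public/tag.tar.gz (e.g. with tar tzf) before relying on that layout. The helper name is hypothetical.

```python
import os
import tarfile


def make_tag_archive(tag_files, archive="tag.tar.gz"):
    """Pack TAG ROOT files into a gzipped tarball, stored flat (no directories).

    Assumption: the job expects the TAG files at the top level of the archive.
    """
    with tarfile.open(archive, "w:gz") as tar:
        for path in tag_files:
            tar.add(path, arcname=os.path.basename(path))
    return archive
```

The resulting file is then listed in j.inputsandbox, as in the job script below.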

Use the following Ganga script to submit the job to LCG:

j = Job()
j.application=Athena()
j.application.prepare()
j.application.option_file='$HOME/athena/testarea/12.0.6/PhysicsAnalysis/AnalysisCommon/UserAnalysis/run/TagBasedEventSelection_topOptions.py'
j.inputdata=DQ2Dataset()
j.inputdata.dataset="misalg_csc11.005300.PythiaH130zz4l.recon.AOD.v12003104"
j.inputdata.names=['misalg_csc11.005300.PythiaH130zz4l.recon.AOD.v12003104_tid004174._00001.pool.root.7','misalg_csc11.005300.PythiaH130zz4l.recon.AOD.v12003104_tid004174._00002.pool.root.12']
j.inputdata.type='TAG_REC'
j.outputdata=ATLASOutputDataset()
j.outputdata.outputdata=['misalg_csc11.005300.PythiaH130zz4l.recon.AOD.TagSel.v12003104.root']
j.backend=LCG()
j.inputsandbox=['/afs/cern.ch/user/e/elmsheus/public/tag.tar.gz']
j.backend.CE='ce-fzk.gridka.de:2119/jobmanager-pbspro-atlasS'
#j.backend.CE='ce102.cern.ch:2119/jobmanager-lcglsf-grid_2nh_atlas'
j.submit()

The job produces an AOD output file misalg_csc11.005300.PythiaH130zz4l.recon.AOD.TagSel.v12003104.root based on the selection CollInputQuery.

  • Optional: You can produce TAG ROOT files like those used in the previous example with the following jobOptions file, MakeTag_jobOptions.py. To ease the demonstration, example TAG files have already been produced and archived in /afs/cern.ch/user/e/elmsheus/public/tag.tar.gz
    ####################################################################
    AllAlgs = False
    EvtMax=-1
    readAOD = True
    doWriteTAG=True
    doWriteESD=False
    doAOD = False
    doWriteAOD=False
    doCBNT=False
    doHist=False
    
    # AOD input files
    PoolAODInput = [ "/pathto/misalg_csc11.005300.PythiaH130zz4l.recon.AOD.v12003104_tid004174._00007.pool.root.13"]
    
    # TAG output file name and location
    CollOutput = "misalg_csc11.005300.PythiaH130zz4l.recon.TAG.v12003104_tid004174._00007.pool"
    
    # Switch OFF all detector descriptions - not needed for AOD -> TAG
    include ("RecExCommon/RecExCommon_flags.py")
    DetFlags.ID_setOff()
    DetFlags.Calo_setOff()
    DetFlags.Muon_setOff()
    
    # Reconstruction Top Options
    include ("RecExCommon/RecExCommon_topOptions.py")
    ####################################################################
    

9 Using production job transformations with the AthenaMC module

9.1 Introduction

MC production is run using a dedicated set of generic scripts called transformations. These transformations use a preformatted subset of input variables and files (event weight, jobOption fragments) which depend mostly on the chosen processing step and sometimes on the Monte Carlo generator used.
The transformations and input files needed by the production system are archived and released for every ATLAS release version.
These packages are named AtlasProduction for ATLAS release 12 and above.
More information can be found at this wiki page.
The existing AtlasProduction archives are located at this site.

The MC generation is split into three separate steps:

  • evgen: 4-Vector generation with Pythia
  • simul+digit: Geant4 simulation and Digitization, RDO output
  • recon: Reconstruction, AOD, ESD, NTUP output

In real life, each step can only be started after all jobs of the previous step have finished successfully, as the output of these jobs is used as input to the next step.
In this tutorial, though, input data for simul+digit and recon has already been prepared, so you do not need to wait.

Transformations are also used for MC generation with special conditions (pileup, full chain in one go, full chain with only a subset of detectors, and so on).
To take these special needs into account, Ganga 5.0.0 introduced a new "production mode" called template. This is a "catch all" mode that tries to cover all transformations not covered by the standard chain.

9.2 Evgen Transformation

9.2.1 Job preparation

This exercise aims to generate one evgen file with ATLAS release 13.0.30 and the single particle event generator, reusing a jobOption file used by prodsys during the CSC production. We are going to generate 30 events with a single electron with a transverse energy of 40 GeV.
Start Ganga in any directory.
At the Ganga prompt, type:
j=Job()
j.application=AthenaMC()
j.application.evgen_job_option='CSC.007004.singlepart_e_Et40.py'
j.application.production_name='tutorial'   
j.application.process_name='single_e_Et40'
j.application.run_number='000001'
j.application.firstevent=1
j.application.partition_number=1 # new to 5.1.3
j.application.number_events_job=30
j.application.atlas_release='14.5.0.1'
j.application.mode='evgen'
j.application.se_name='UKI-SCOTGRID-GLASGOW_USERDISK'
j.backend=LCG()
j.outputdata=AthenaMCOutputDatasets()
j.submit()
Notes:
  • The random number used as a seed for event generation is set to 1 by default and incremented by one unit for each subsequent subjob of the task (if you decide to generate more than one file). The default value for the job or the first subjob can be changed with:

j.application.random_seed='1102362401'

  • You might want to use these requirements to help your job get through faster:
    j.backend.requirements.other= ['other.GlueCEStateWaitingJobs==0', 'other.GlueCEStateStatus=="Production"']
    

  • atlas_release is now extended to 4 digits so that your jobs can use the pre-installed production cache. In Ganga 4, only the "job option part" of the production cache was shipped with the job.
    Using 4 digits in atlas_release will force your jobs to sites where the complete production cache (job options AND patched libraries) is deployed, granting access to the full patch.
    With 4 digits, you do not need to specify a production archive in j.application.transform_archive.
    The former production mode (3 digits and archive name in j.application.transform_archive) is still supported. See GangaTutorial44 about this.

  • j.application.se_name is used to force the output data to be written to a given site's storage.
    You can find valid site names in this file.
    Only use site names ending in "USERDISK", as Ganga does not currently support writing output to LOCALGROUPDISK space tokens. This setting is only needed at the evgen stage, as there is no input data to tell Ganga where to send the job.
    Jobs with input data will automatically be sent to sites in the region (cloud) where the input data is stored.
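Following the seed note above, the seed for each subjob is the starting value plus the subjob's index. The sketch below just spells out that arithmetic; j.application.random_seed is given as a string, so it is converted to an integer first. The function name is hypothetical.

```python
def subjob_seeds(random_seed, n_subjobs):
    """Seed used by each subjob: the first value, then one more per subsequent subjob."""
    first = int(random_seed)  # j.application.random_seed is set as a string
    return [first + i for i in range(n_subjobs)]


print(subjob_seeds('1102362401', 3))  # [1102362401, 1102362402, 1102362403]
```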

9.2.2 Exercise variations

  • Ganga takes care of output file and directory naming. By default, output data files are named as follows:
    {body}._{partition_number}.pool.root.{date}.{jobid}
    

where {body} is fully tunable and by default is built as {production_name}.{run_number}.{mode}.{process_name}.{version},
with all subcomponents defined above as j.application members.
The suffix part ({partition_number}, {date} and {jobid}) is entirely controlled by Ganga's job splitter and cannot be altered by hand. {partition_number} is the output data file number and {jobid} is the Ganga job id (to distinguish between resubmissions of the same job). {date} has been added to protect against repository migration, which would reset {jobid} and produce non-unique file names. It is a 6-digit date: DDMMYY.

In this particular case, the default name for the output of the evgen job described above would be:

tutorial.000001.evgen.single_e_Et40._00001.pool.root.DDMMYY.X 

where X is the jobid provided by ganga.
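The naming scheme above can be reproduced in a few lines of Python, which is handy for predicting which file a given subjob will write. This is a hypothetical helper, not part of Ganga: it assumes the partition number is zero-padded to five digits (as in _00001) and that an empty version is left out of {body}, which matches the example name above.

```python
import datetime


def output_file_name(production_name, run_number, mode, process_name,
                     partition_number, jobid, version="", date=None):
    """Assemble {body}._{partition_number}.pool.root.{date}.{jobid}."""
    parts = [production_name, run_number, mode, process_name]
    if version:  # assumption: an empty version is simply left out of {body}
        parts.append(version)
    body = ".".join(parts)
    date = date or datetime.date.today()
    return "%s._%05d.pool.root.%s.%s" % (body, partition_number,
                                         date.strftime("%d%m%y"), jobid)


print(output_file_name("tutorial", "000001", "evgen", "single_e_Et40",
                       1, 42, date=datetime.date(2009, 3, 5)))
# tutorial.000001.evgen.single_e_Et40._00001.pool.root.050309.42
```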

The entire {body} can be changed by giving an arbitrary value to j.outputdata.outrootfile:

j.outputdata.outrootfile='myfile'

will rename the output evgen file to myfile._00001.pool.root.DDMMYY.X

Similarly, default naming convention for the associated output directory is:

/user/{username}/ganga/datafiles/{body}

and matching DQ2 dataset is:

users.{username}.ganga.datafiles.{body}

where {username} is usually {firstname}{lastname}, taken from your grid proxy (and not your login name).
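Similarly, the default directory and DQ2 dataset names follow directly from {username} and {body}. A hypothetical helper that builds both strings from the conventions just described:

```python
def output_locations(username, body):
    """Return (output directory, DQ2 dataset name) under the default naming conventions."""
    directory = "/user/%s/ganga/datafiles/%s" % (username, body)
    dataset = "users.%s.ganga.datafiles.%s" % (username, body)
    return directory, dataset


print(output_locations("fredericbrochu", "tutorial.000001.evgen.single_e_Et40"))
```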

  • To change the output directory:
    j.outputdata.outdirectory='/my/directory/tree/'
    
    to obtain the output directory /my/directory/tree/{outrootfile}. For datasets not registered in DQ2, this directory is used internally as the "dataset name".

  • To alter the name of the output dataset registered in DQ2:
    j.outputdata.output_dataset='mydataset'
    
    will result in the DQ2 dataset users.{username}.mydataset. N.B.: DQ2 dataset naming conventions expect private user datasets to start with users.{username}.

  • To change the output partition number (for instance, to extend an existing dataset):
    j.outputdata.output_firstfile=X
       
    will set the output partition number to _0000X. This offset applies to all subjobs if you are generating several output files: the first output partition number will be _0000X, the second _0000(X+1), and so on.

  • You can now include your own modified files (job options, event weights for event generators, or even a modified transform) directly in the input sandbox instead of including them in a production archive. Do not forget to declare them as transformation parameters using j.application.extraArgs.
    - Example 1: adding an event weight data file:
    j.inputsandbox=['mcatnlo31.005200.ttbar._00227.tar.gz']
    j.application.extraArgs='inputGeneratorFile="mcatnlo31.005200.ttbar._00227.tar.gz"'
    
    - Example 2: using your locally modified job option:
    j.application.evgen_job_option='CSC.007004.modified.py'
    j.inputsandbox=['CSC.007004.modified.py']
    
    - Example 3: using your locally modified transformation:
    j.application.transform_script='my_csc_evgen_trf.py'
    j.inputsandbox=['my_csc_evgen_trf.py']
    j.application.extraArgs='...' # insert here all extra arguments needed by your modified transformation
    
    If the extra arguments include an input or output file name, it is better to use the new "template" mode (see section 9.5).
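To see how j.outputdata.output_firstfile shifts the partition numbers when several output files are generated, here is a small sketch (a hypothetical helper; it assumes the five-digit zero padding shown in the file names above).

```python
def partition_suffixes(output_firstfile, n_subjobs):
    """Zero-padded partition suffix for each subjob, starting at output_firstfile."""
    return ["_%05d" % (output_firstfile + i) for i in range(n_subjobs)]


print(partition_suffixes(7, 3))  # ['_00007', '_00008', '_00009']
```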

9.2.3 Job status, output and debug.

  • You can follow the job's status with j.status, or by typing jobs at the Ganga prompt.
  • All output is registered in DQ2 by default. To access your output once the job has completed, exit your Ganga session, then set up DQ2 like this:
    source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh
    voms-proxy-init -voms atlas
    
    Then check your output with dq2-ls and collect it with dq2-get:
    > dq2-ls *fredericbrochu*evgen*v13003004
    users.fredericbrochu.ganga.datafiles.tutorial.000001.single_e_Et40.evgen.EVNT.v13003004
    users.fredericbrochu.ganga.logfiles.tutorial.000001.single_e_Et40.evgen.LOG.v13003004
    
    
    (replace fredericbrochu by your own name).

  • If any file listed above is missing, then there might have been something wrong in the job processing. You can have a look at the job output (once the job is completed) with j.peek().
    In [5]:j.peek()
    total 378K
    -rw-r--r--    1 fbrochu  zp            138 Jan 16 18:03 output_guids
    -rw-r--r--    1 fbrochu  zp            570 Jan 16 18:03 output_data
    -rw-r--r--    1 fbrochu  zp              4 Jan 16 18:03 output_location
    -rw-r--r--    1 fbrochu  zp            399 Jan 16 18:08 _output_sandbox.tgz
    -rw-r--r--    1 fbrochu  zp            304 Jan 16 18:08 __jobscript__.log
    -rw-r--r--    1 fbrochu  zp            984 Jan 16 18:08 stderr
    -rw-r--r--    1 fbrochu  zp           371K Jan 16 18:08 stdout
    In [6]:j.peek("stdout")
    
    The last command opens an editor to inspect the file stdout.

9.2.4 Going further: job templates.

If you want to generate several physics samples with nearly identical parameters, you do not need to retype the whole set of parameters mentioned in section 9.2.1. You can either clone a job (using j.copy()) and edit the parameters that need to change, or save a job as a template and reuse the template in the same way as the copy. For instance, to produce a single electron sample with Et = 60 GeV to complement the initial sample (with Et = 40 GeV):

evgentp=JobTemplate(j,name='Evgen-13.0.30')
print templates
j2=Job(templates[1]) # or j2=Job(evgentp)
j2.application.evgen_job_option='DC3.007005.singlepart_e_Et60.py'
j2.application.process_name='single_e_Et60'
j2.application.run_number='000002'
j2.submit()
You can replace the first three lines with a simple j2=j.copy(), as long as j exists. The main difference is that the template is a separate object: it does not take any disk space the way a submitted job does (input and output sandbox files, archives, and so on).
If j is deleted by accident, you can no longer do j.copy(), but the template will remain.

9.3 Simul+Digit Transformation

9.3.1 Job preparation

The simul+digit step will run several LCG jobs in parallel, since the simulation step takes much longer. The setup uses the single input file from before (matching is done via filenames) and sends out 3 jobs with 1 event each.

j=Job()
j.application=AthenaMC()
j.application.production_name='tutorial'   
j.application.process_name='single_e_Et40'
j.application.run_number='000001'
j.application.number_events_job='1'
j.application.atlas_release='13.0.30.4'
j.application.version='v13003004'
j.application.mode='simul'
j.application.geometryTag = 'ATLAS-CSC-02-00-00'
# needed in release 13 and beyond.
j.application.extraIncArgs="digiSeedOffset1=1 digiSeedOffset2=1" 
j.backend=LCG()
j.backend.middleware='EDG'
j.inputdata=AthenaMCInputDatasets()
j.inputdata.DQ2dataset='users.fredericbrochu.ganga.datafiles.tutorial.000001.single_e_Et40.evgen.EVNT.v13003004'
j.inputdata.datasetType='DQ2'
j.outputdata=AthenaMCOutputDatasets()
j.splitter=AthenaMCSplitterJob()
j.splitter.nsubjobs_inputfile = 3
j.splitter.numsubjobs = 3
j.submit()

9.3.2 Exercise general note and job splitting

  • The input dataset used in this exercise is a predefined one. To use the dataset generated in the previous exercise (the default), just set:
    j.inputdata=AthenaMCInputDatasets()
    
    and do not set j.inputdata.DQ2dataset or j.inputdata.datasetType.

  • If you target a specific file or subset of files in a given DQ2 dataset (for instance the files trig1_misal1_csc11.005200.T1_McAtNlo_Jimmy.recon.AOD.v12000402_tid004577._00002.pool.root.10 and trig1_misal1_csc11.005200.T1_McAtNlo_Jimmy.recon.AOD.v12000402_tid004577._00003.pool.root.1), you can use either:
    j.inputdata.DQ2dataset='trig1_misal1_csc11.005200.T1_McAtNlo_Jimmy.recon.AOD.v12000402'
    j.inputdata.datasetType='DQ2'
    j.inputdata.inputfiles=['trig1_misal1_csc11.005200.T1_McAtNlo_Jimmy.recon.AOD.v12000402_tid004577._00002.pool.root.10',
    'trig1_misal1_csc11.005200.T1_McAtNlo_Jimmy.recon.AOD.v12000402_tid004577._00003.pool.root.1']
    
    or, if you are sure that these files are unique (no other numbered version such as trig1_misal1_csc11.005200.T1_McAtNlo_Jimmy.recon.AOD.v12000402_tid004577._00002.pool.root.7), you can use a lazy partition number search:
    j.inputdata.DQ2dataset='trig1_misal1_csc11.005200.T1_McAtNlo_Jimmy.recon.AOD.v12000402'
    j.inputdata.datasetType='DQ2'
    j.inputdata.inputpartitions='2-3'
    
  • If your dataset is not registered in DQ2 but available in LCG, you should use the following combination:
    j.inputdata.LFCpath='/grid/atlas/users/{username}/{pathtofile}'
    j.inputdata.datasetType='private'
    
    followed by the optional j.inputdata.inputfiles or j.inputdata.inputpartitions as mentioned above.
    LFCpath should include the name of the LFC server if it differs from CERN's. Example:
    My input file is registered on the LFC lfc0448.gridpp.rl.ac.uk under the LFN /grid/atlas/users/griduser/data/myfile.pool.root
    To use it:
    j.inputdata.LFCpath='lfc:lfc0448.gridpp.rl.ac.uk:/grid/atlas/users/griduser/data'
    and then feel free to set j.inputdata.inputfiles=['myfile.pool.root'], especially if there are several files registered under this logical directory.
    

  • Job splitting is performed by the module AthenaMCSplitterJob(). It is quite simple at the moment and allows only these two operations:
    - splitting 1 to N (one input file feeds N subjobs, with 1 output file of each expected type per subjob)
    - splitting N to M (with M >= N)
    The parameters that control the behaviour of the job splitter are j.application.number_inputfiles and j.splitter.numsubjobs.

  • Starting with Ganga 4.4, we enforce a "job-to-data" policy by default. This means that the whole processing chain takes place at the site where the evgen file, the first output file, has been registered. To override this, set both j.application.se_name and j.backend.CE.
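The lazy partition search with j.inputdata.inputpartitions='2-3' can be pictured as matching the _NNNNN partition number in each file name. The sketch below uses hypothetical helpers, not Ganga's implementation, and only handles a single number or one N-M range. It also shows why the files must be unique: two versions of partition 2 are both selected.

```python
import re

# Matches the partition number in names like '..._tid004577._00002.pool.root.10'.
_PARTITION = re.compile(r"\._(\d+)\.pool\.root")


def parse_partitions(spec):
    """Turn '3' or '2-3' into the set of partition numbers it covers."""
    if "-" in spec:
        low, high = spec.split("-")
        return set(range(int(low), int(high) + 1))
    return {int(spec)}


def select_by_partition(filenames, spec):
    """Keep every file whose partition number lies in spec, in input order."""
    wanted = parse_partitions(spec)
    selected = []
    for name in filenames:
        match = _PARTITION.search(name)
        if match and int(match.group(1)) in wanted:
            selected.append(name)
    return selected


files = [
    "AOD.v12000402_tid004577._00001.pool.root.1",
    "AOD.v12000402_tid004577._00002.pool.root.10",
    "AOD.v12000402_tid004577._00002.pool.root.7",
    "AOD.v12000402_tid004577._00003.pool.root.1",
]
# Partition 2 is picked up twice (versions .10 and .7).
print(select_by_partition(files, "2-3"))
```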

9.3.3 Job Output.

Similar to the previous evgen job, every simulation+digitization job should have an output RDO file registered in DQ2:
> dq2-ls *fredericbrochu*simul*v13003004
users.fredericbrochu.ganga.datafiles.tutorial.000001.single_e_Et40.simul.HITS.v13003004
users.fredericbrochu.ganga.datafiles.tutorial.000001.single_e_Et40.simul.RDO.v13003004
users.fredericbrochu.ganga.logfiles.tutorial.000001.single_e_Et40.simul.LOG.v13003004

....
Otherwise, debugging is similar to 9.2.3, with a small twist as we are dealing with subjobs. Let's assume that subjob #1 did not return an RDO file. To inspect the stdout file of this subjob:
sj=j.subjobs[1]
sj.peek("stdout")

9.4 Recon Transformation

This part is left as a free exercise, but it is useful for learning how to perform N-to-N splitting. Each recon subjob takes exactly one input RDO file and produces one file of each of the following types: AOD, ESD (CBNT files are no longer produced in python transformations).

j=Job()
j.application=AthenaMC()
j.application.production_name='tutorial'   
j.application.process_name='single_e_Et40'
j.application.run_number='000001'
j.application.number_events_job='1'
j.application.atlas_release='13.0.30.4'
j.application.version='v13003004'
j.application.geometryTag = 'ATLAS-CSC-02-00-00'
j.application.mode='recon'
j.backend=LCG()
j.backend.middleware='EDG'
j.inputdata=AthenaMCInputDatasets()
j.inputdata.DQ2dataset = 'users.fredericbrochu.ganga.datafiles.tutorial.000001.single_e_Et40.simul.RDO.v13003004'
j.inputdata.datasetType='DQ2'
j.outputdata=AthenaMCOutputDatasets()
j.splitter=AthenaMCSplitterJob()
j.splitter.numsubjobs = 3
j.submit()

  • Merging jobs (jobs processing more than one input file) are supported. Use:
    j.inputdata.n_infiles_job=X
    
    to set the number of input files per job to X. The default value is 1. Check the contents of the input dataset and set j.splitter.numsubjobs accordingly.

  • Reconstruction, 12.0.4 and beyond (not in release 13?): no AODs are created because of "Unknown type" errors while creating the ESDs. This is a transformation feature: the job is killed if random errors are detected.
    To prevent this, open the archive AtlasProduction_12_0_4_1_noarch.tar.gz, then edit the file 12.0.4.1/InstallArea/share/atlas_error_ignore.db and activate the "emergency hack" by uncommenting the following line (remove the four # signs):
    ####ALL,\S+,  ERROR .*
    
  • AOD and ESD files appear under their own dataset, so you can select which type to download:
    >  dq2-ls *fredericbrochu*recon*v13003004
    users.fredericbrochu.ganga.datafiles.tutorial.000001.single_e_Et40.recon.AOD.v13003004
    users.fredericbrochu.ganga.datafiles.tutorial.000001.single_e_Et40.recon.ESD.v13003004
    users.fredericbrochu.ganga.datafiles.tutorial.000001.single_e_Et40.recon.NTUP.v13003004
    users.fredericbrochu.ganga.logfiles.tutorial.000001.single_e_Et40.recon.LOG.v13003004
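When n_infiles_job is larger than 1, the number of subjobs needed to cover the whole input dataset is a ceiling division, and the last subjob may receive fewer files. A short sketch of that arithmetic for choosing j.splitter.numsubjobs, assuming every input file should be processed exactly once (the function is hypothetical):

```python
def merge_plan(n_files, n_infiles_job=1):
    """Return (numsubjobs, files in the last subjob) so each input file is used once."""
    if n_infiles_job < 1:
        raise ValueError("n_infiles_job must be at least 1")
    if n_files == 0:
        return 0, 0
    numsubjobs = (n_files + n_infiles_job - 1) // n_infiles_job  # ceiling division
    last = n_files - (numsubjobs - 1) * n_infiles_job
    return numsubjobs, last


print(merge_plan(10, 3))  # (4, 1): three subjobs of 3 files and one of 1
```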
    

9.5 New "Template" Transformation

This new "catch all" mode has been written to allow the use of most transformations that do not fit in the standard chain. To use it, follow this recipe:
  • get the full list of arguments to fill by running the requested transform with the help flag, -h. E.g., for the pileup transform: csc_digi_trf.py -h
  • select j.application.mode='template'
  • fill in j.application.atlas_release and j.application.number_events_job as usual.
  • If you do not fill in j.application.production_name, j.application.process_name and j.application.run_number, then you MUST set j.outputdata.logfile.
  • set j.application.transform_script to the name of the transform you want to use. If this transformation is only available locally, add it to j.inputsandbox as well.
  • The parameters of the transformation must be declared exclusively in j.application.extraArgs and j.application.extraIncArgs (the latter for arguments that must change between subjobs, like the random seed).
  • Input and output files must first be declared as a j.inputdata / j.outputdata member, then as a transformation parameter in j.application.extraArgs, as follows.

- Example 1: case of input data file

j.inputdata=AthenaMCInputDatasets()
j.inputdata.DQ2dataset = 'users.fredericbrochu.ganga.datafiles.tutorial.000001.single_e_Et40.simul.RDO.v12000401'
j.inputdata.inputfiles = [ 'input_file1' ,'input_file2' ]
j.application.extraArgs='myInputFile=$inputfile'
$inputfile is a dedicated keyword for using input files from the DQ2 dataset declared in j.inputdata. The job splitting settings determine which subjob(s) will process input_file1 and which will process input_file2.
For pileup, you need up to two extra datasets: one for the minimum bias events and one for the cavern noise. These two datasets are covered by two new members of AthenaMCInputDatasets(), minbias and cavern. You will also have to set how many input files are used per job or subjob, with the related members n_minbias_files_job and n_cavern_files_job.
Here is an example:
j.inputdata.minbias = 'misal1_csc11.005001.pythia_minbias.simul.HITS.v12003104'
j.inputdata.n_minbias_files_job = 4 
j.inputdata.cavern = 'misal1_csc11.007903.cavernbg_sf05.simul.HITS.v12003105'
j.inputdata.n_cavern_files_job = 4
The matching keywords for j.application.extraArgs are $cavern and $minbias.
These datasets can also be used in other transforms as extra input datasets.

Please note some important differences with the primary input dataset:

  • The secondary datasets MUST be DQ2 datasets
  • You cannot pick a subset of files. Instead, the whole dataset file list is shuffled at random for each subjob and the first N files are picked for that subjob (where N is the value of the related n_{dataset}_files_job member).
  • You can always select a definite subset of files by replacing $cavern/$minbias by a comma-separated list of files, but the same list will be used by every subjob.
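The random pick described for secondary datasets can be sketched as follows: for every subjob, shuffle a copy of the full file list and keep the first N entries. This illustrates the described behaviour and is not Ganga's code; the seed argument is an addition of this sketch to make the example reproducible and does not correspond to a Ganga option, and the file names are made up.

```python
import random


def pick_secondary_files(dataset_files, n_files_job, n_subjobs, seed=None):
    """For each subjob, shuffle the full file list and keep the first n_files_job files."""
    rng = random.Random(seed)
    picks = []
    for _ in range(n_subjobs):
        shuffled = list(dataset_files)  # copy so the caller's list is left untouched
        rng.shuffle(shuffled)
        picks.append(shuffled[:n_files_job])
    return picks


minbias = ["minbias._%05d.pool.root" % n for n in range(1, 21)]
for files_for_subjob in pick_secondary_files(minbias, 4, 3, seed=1):
    print(files_for_subjob)
```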

- Example 2: case of output files

j.outputdata=AthenaMCOutputDatasets()
j.outputdata.outrootfile='myoutroot_dataset'
j.outputdata.outaodfile='myaod_dataset'
j.application.extraArgs='OutputHitsFile=$outrootfile OutputAODFile=$outaodfile'
The following members of AthenaMCOutputDatasets() can be used as keywords in extraArgs, giving up to 6 possible output file types: outhistfile, outesdfile, outaodfile, outntuplefile, outrdofile, outrootfile.
Ganga will automatically generate the output file names, based on the dataset names provided in j.outputdata and the job splitting information.

  • If you have a transformation argument which sets the first event number, set it to $first in j.application.extraArgs. Same for skipEvents: set it to $skip in j.application.extraArgs. $number_events_job must be used to declare the number of events to be processed in a single job.
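The $-keywords behave like template placeholders that Ganga fills in separately for each subjob. The sketch below uses Python's string.Template to show the idea. The per-subjob values are assumptions made for illustration (the first event advancing by number_events_job per subjob, skip counting the events already handled), and the output file name is made up; Ganga computes the real values itself.

```python
from string import Template


def expand_args(extra_args, subjob_index, firstevent, number_events_job, outrootfile):
    """Fill the $first, $skip, $number_events_job and $outrootfile keywords for one subjob.

    Assumed values: each subjob starts number_events_job events after the previous one.
    Unknown keywords (e.g. $inputfile) are left untouched by safe_substitute.
    """
    values = {
        "first": firstevent + subjob_index * number_events_job,
        "skip": subjob_index * number_events_job,
        "number_events_job": number_events_job,
        "outrootfile": outrootfile,
    }
    return Template(extra_args).safe_substitute(values)


args = "runNumber=000001 firstEvent=$first maxEvents=$number_events_job outputEvgenFile=$outrootfile"
print(expand_args(args, 1, 1, 30, "test.000001.singlepart_e_Et40.HITS._00002.pool.root"))
```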

A complete example is given below:

j=Job()
j.application=AthenaMC()
j.application.mode='template'
j.application.atlas_release = '13.0.40.3'
j.application.transform_script = 'csc_evgen_trf.py'
j.application.number_events_job = 30
j.application.extraArgs='runNumber=000001 firstEvent=$first maxEvents=$number_events_job jobConfig=CSC.007004.singlepart_e_Et40.py outputEvgenFile=$outrootfile'
j.application.extraIncArgs='randomSeed=1'
j.outputdata=AthenaMCOutputDatasets()
j.outputdata.outrootfile='test.000001.singlepart_e_Et40.HITS'
j.outputdata.logfile='test.000001.singlepart_e_Et40.log'
j.application.se_name='FZKDISK'
j.backend=LCG()
j.backend.middleware='EDG'
j.submit()

9.5.1 Advanced template example: Running AtlfastII

The following job is an example of running AtlfastII on a SUSY GMSB6 evgen file in release 14. It uses the template mode of AthenaMC to run the csc_simul_reco_trf.py transform with all the AtlfastII-specific job options. This transform runs simulation, digitisation and reconstruction in one go.

j=Job()
#
j.name='atlfastII_GMSB6_14022201'
#
j.application=AthenaMC()
j.application.production_name='test_atlfastII_mc08'
j.application.process_name='GMSB6_jimmy_susy'
j.application.run_number='005415'
j.application.number_events_job='20'
j.application.atlas_release='14.2.22.1'
j.application.version='v14022201'
j.application.mode='template'
j.application.transform_script='csc_simul_reco_trf.py'
j.application.extraArgs = 'inputEvgenFile=$inputfile outputAODFile=$outaodfile maxEvents=$number_events_job skipEvents=$skip geometryVersion="ATLAS-GEO-02-01-00" physicsList="QGSP_BERT" simuJobConfig="jobConfig.FastIDKiller.py" recoJobConfig="FastSimulationJobTransforms/FastCaloSimAddCellsRecConfig.py" triggerConfig="NONE" outputHitsFile=$outrootfile outputRDOFile=$outrdofile outputESDFile=$outesdfile'
j.application.extraIncArgs='randomSeed=0 digiSeedOffset1=0 digiSeedOffset2=0'
j.application.se_name='DESY-HH_USERDISK'
#
j.inputdata=AthenaMCInputDatasets()
j.inputdata.DQ2dataset='mc08.105415.GMSB6_jimmy_susy.evgen.EVNT.e352'
#j.inputdata.inputfiles = ['mc13.005415.GMSB6_jimmy_susy.evgen.EVNT.v13004004._00001.pool.root.200408.17']
j.inputdata.datasetType='DQ2'
j.inputdata.inputpartitions='1'
#
j.outputdata=AthenaMCOutputDatasets()
j.outputdata.outrootfile = '%s.%s.%s.simul.HITS.%s' % (j.application.production_name, j.application.run_number, j.application.process_name, j.application.version)
j.outputdata.outrdofile  = '%s.%s.%s.simul.RDO.%s'  % (j.application.production_name, j.application.run_number, j.application.process_name, j.application.version)
j.outputdata.outesdfile  = '%s.%s.%s.recon.ESD.%s'  % (j.application.production_name, j.application.run_number, j.application.process_name, j.application.version)
j.outputdata.outaodfile  = '%s.%s.%s.recon.AOD.%s'  % (j.application.production_name, j.application.run_number, j.application.process_name, j.application.version)
#
j.backend=LCG()
j.backend.requirements=AtlasLCGRequirements()
j.backend.requirements.sites= ['DESY-HH_USERDISK','DESY-ZN_USERDISK']
#
j.splitter=AthenaMCSplitterJob()
j.splitter.nsubjobs_inputfile = 1
j.splitter.numsubjobs = 1
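The four output dataset names in the job above are built with Python's % string formatting from the application attributes. Evaluating the same expression outside Ganga, with the values copied from the job definition into a plain dict (the dict and the dataset_name helper are only for illustration), shows what the names look like.

```python
# Values copied from the AtlfastII job definition above.
app = {
    "production_name": "test_atlfastII_mc08",
    "run_number": "005415",
    "process_name": "GMSB6_jimmy_susy",
    "version": "v14022201",
}

def dataset_name(step, fmt):
    """Build <production>.<run>.<process>.<step>.<format>.<version>."""
    return "%s.%s.%s.%s.%s.%s" % (app["production_name"], app["run_number"],
                                  app["process_name"], step, fmt, app["version"])

outputs = {
    "outrootfile": dataset_name("simul", "HITS"),
    "outrdofile": dataset_name("simul", "RDO"),
    "outesdfile": dataset_name("recon", "ESD"),
    "outaodfile": dataset_name("recon", "AOD"),
}
# outputs["outaodfile"] == "test_atlfastII_mc08.005415.GMSB6_jimmy_susy.recon.AOD.v14022201"
for member, name in outputs.items():
    print(member, name)
```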

Here are some remarks:

  • for a description of AtlfastII (general and job options), see the AtlfastII TWiki
  • all possible output (HITS, RDO, ESD, AOD) is stored in DQ2 datasets. If you are not interested in all the formats, remove the corresponding lines from the AthenaMCOutputDatasets object and the matching references in the extraArgs argument of the AthenaMC object.
  • the job will run either at DESY-HH or DESY-ZN, and all output will go to the ATLASUSERDISK at DESY-HH
  • for test purposes, only 20 events are processed in one subjob. Read the instructions about AthenaMCSplitterJob for non-trivial arguments

9.6 Important notes and useful generic tricks for running a small-scale production on the grid

  • Simul+Digitisation, release 13 and beyond:
    The transformation takes two new mandatory input parameters, digiSeedOffset1 and digiSeedOffset2.
    To run the "simul" step with release 13, please use the extraArgs/extraIncArgs members of the application:
          j.application.extraArgs='digiSeedOffset1=0 digiSeedOffset2=0'
          
    to have the same values applied to all subjobs of your job. To make these values change between subjobs:
          j.application.extraIncArgs='digiSeedOffset1={i} digiSeedOffset2={j}'
          
    where {i} and {j} are the values to apply to the first subjob. These values are incremented by 1 for each subsequent subjob: i+1, j+1 for subjob 2, i+2, j+2 for subjob 3, and so on.
  • You can also use these extra requirements to ensure that your job goes to a site where it will not be killed by a timeout or by excessive memory consumption:
           j.backend.requirements.cputime={cputime} 
           #({cputime} in minutes, generally 600 would be a good value)
           j.backend.requirements.memory= {ram} 
           # ({ram} in MB: 512 for EVGEN, 800 for SIMUL and 1300 for RECON)
          
  • The use of small pilot jobs is good practice, especially before launching a mass submission of 20+ jobs. Define a pilot job in the same way you define a normal job, changing:
         j.application.production_name='pilot' 
         
    and reducing the number of subjobs (j.splitter.numsubjobs) to a maximum of 3-5. Then set:
         j.backend.requirements.other= ['other.GlueCEStateWaitingJobs==0', 'other.GlueCEStateStatus=="Production"'] 
        
    to select an empty site and submit.
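The per-subjob increment of extraIncArgs described above can be sketched as follows. This only illustrates the stated rule (each value grows by 1 per subsequent subjob); the function name is hypothetical, and the parsing assumes whitespace-separated key=integer pairs like the ones in the examples.

```python
def increment_args(extra_inc_args, subjob_index):
    """Hypothetical expansion of extraIncArgs for subjob_index (0 = first subjob).

    Each 'key=value' pair must hold an integer, which is increased by
    subjob_index: subjob 2 gets i+1, subjob 3 gets i+2, and so on.
    """
    pairs = []
    for token in extra_inc_args.split():
        key, value = token.split("=", 1)
        pairs.append("%s=%d" % (key, int(value) + subjob_index))
    return " ".join(pairs)

for k in range(3):
    print("subjob", k + 1, increment_args("digiSeedOffset1=10 digiSeedOffset2=20", k))
```

Putting a seed in extraIncArgs rather than extraArgs is what keeps the random streams of different subjobs independent.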
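The resource hints and the pilot-job recipe above can be collected into one small lookup so that every job of a production uses consistent values. This is a sketch: the numbers and requirement strings are taken from the notes above, but the helper and its name are hypothetical and only return plain values. You would still assign them yourself to j.backend.requirements.cputime, j.backend.requirements.memory, j.backend.requirements.other, j.splitter.numsubjobs and j.application.production_name.

```python
# Suggested limits quoted in the notes above (cputime in minutes, memory in MB).
CPUTIME_MINUTES = 600
MEMORY_MB = {"EVGEN": 512, "SIMUL": 800, "RECON": 1300}
PILOT_MAX_SUBJOBS = 5  # "a maximum of 3-5" subjobs for a pilot

def suggested_settings(step, numsubjobs, pilot=False):
    """Hypothetical helper returning plain values for a job's requirements."""
    settings = {
        "cputime": CPUTIME_MINUTES,
        "memory": MEMORY_MB[step.upper()],
        "numsubjobs": min(numsubjobs, PILOT_MAX_SUBJOBS) if pilot else numsubjobs,
    }
    if pilot:
        settings["production_name"] = "pilot"
        # Select a site with no waiting jobs that is in production.
        settings["other"] = ['other.GlueCEStateWaitingJobs==0',
                             'other.GlueCEStateStatus=="Production"']
    return settings

print(suggested_settings("simul", 50, pilot=True))
```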
Topic revision: r8 - 02 Dec 2015 - 08:50:49 - MarkSlater
 