Athena and Ganga

We will now use Ganga to create some Athena jobs on the local batch system.

Setting up Ganga for Athena

In order to run Athena jobs within Ganga, you must run a different script that sets up the GangaAtlas environment. It is also worth regenerating your .gangarc file to add the ATLAS-related configuration options to it:

/home/slater/runGanga_Atlas -g

Your previous .gangarc will be stored so you can copy any settings from it to the new one. Now, starting Ganga using the runGanga_Atlas script will give you access to the ATLAS-specific additions. To check all is well, do the following:

/home/slater/runGanga_Atlas

help(Athena)
help(ATLASLocalDataset)
help(ATLASOutputDataset)

Hello World Revisited

We will start by running the basic Athena Hello World example from the previous session, only this time using Ganga. The job is set up in a similar way as before, but the application should be changed:

j = Job()
j.application = Athena()
j.application.atlas_release = '14.2.10'
j.application.option_file = '$HOME/Athena/AtlasOffline-14.2.10/run/HelloWorldOptions.py'
j.application.max_events = 10
j.submit()

If you now look at the stdout using the 'peek' method as before, you will see the Athena output that you had previously.

Input and Output datasets

To bring the example closer to reality, we will now introduce local input and output data to the job. The datasets we will use are the ATLASLocalDataset and the ATLASOutputDataset. As before, have a quick look at the help to get an idea of these objects.

Input Dataset

For the dataset input, we will use 'ATLASLocalDataset'. This should only be used for local jobs (on the batch system or LSF) and not for grid jobs. It gives several ways of specifying the dataset files to run over. Be aware that they are cumulative: each call adds to the files already in the dataset.

* Directory + wildcard: Supply the directory containing the files and a regular expression for the files you want to include

d.get_dataset(<dir>, <reg_exp>)

* File list: Give a text file that simply lists the filenames to run over. This can also handle wildcards.

d.get_dataset_from_list(<filelist>)

* Using the 'names' list directly: There is an array within the object that can be added to directly

d.names.append('<filename>')
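
The text file used by the file list option can be written by hand, or generated with a few lines of plain Python. The sketch below is not part of Ganga: the helper name write_filelist is hypothetical, the pattern is a shell-style wildcard as understood by Python's glob module, and the one-path-per-line layout is an assumption about what "simply lists the filenames" means.

```python
import glob
import os


def write_filelist(directory, pattern, filelist_path):
    """Write the sorted paths matching `pattern` in `directory`, one per line.

    `pattern` is a shell-style wildcard as understood by the standard
    `glob` module. The one-path-per-line layout is an assumption.
    Returns the number of paths written.
    """
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    with open(filelist_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    return len(paths)
```

The resulting file could then be handed to the file list method shown above.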

To show these in action, try the following:

d = ATLASLocalDataset()

# should give 3 files in the list
d.get_dataset('/home/slater/mc08.105003.pythia_sdiff.recon.AOD.e344_s456_r545', 'AOD.026664._004*')
len(d.names)
full_print(d.names)

# a previously created filelist with 4 files from the ddiff sample - now 7 files in the dataset
d.get_dataset_from_list('/home/slater/ddiff_filelist.txt')
len(d.names)
full_print(d.names)

# now we'll just add 2 more from the minbias sample
d.names.append('/home/slater/mc08.105001.pythia_minbias.recon.AOD.e349_s462_r541/AOD.025874._00185.pool.root')
d.names.append('/home/slater/mc08.105001.pythia_minbias.recon.AOD.e349_s462_r541/AOD.026320._01874.pool.root')
len(d.names)
full_print(d.names)
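
Because the three methods add to the same list, a mistyped path or a file added twice may go unnoticed until the job runs. The following is a minimal sketch in plain Python for checking the list beforehand. It is not part of Ganga: the helper names missing_files and duplicate_files are hypothetical, and the only assumption is that d.names can be iterated like a list of path strings, as it is used above.

```python
import os


def missing_files(names):
    """Return the entries of `names` that do not point to an existing file.

    `names` is assumed to be an iterable of path strings. This helper is
    hypothetical and is not part of Ganga.
    """
    return [name for name in names if not os.path.isfile(name)]


def duplicate_files(names):
    """Return the paths that appear more than once, in first-seen order."""
    seen = set()
    duplicates = []
    for name in names:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates
```

Inside a Ganga session, this would be called as missing_files(d.names) and duplicate_files(d.names); both returning an empty list suggests the dataset is ready to use.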

Output Dataset

We will be using the ATLASOutputDataset to store the output. The output dataset is a bit easier to deal with, as all we need to do is specify which files are going to be added to the dataset (similar to the output sandbox) and where to place them. As an example:

j.outputdata=ATLASOutputDataset()
j.outputdata.outputdata=['AnalysisSkeleton.aan.root']
j.outputdata.location = '$HOME/athena/output'
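
Whether the output location has to exist before the job runs is not stated here, so as a precaution the directory can be created up front. This is a minimal sketch in plain Python, not part of Ganga; the helper name ensure_output_dir is hypothetical.

```python
import os


def ensure_output_dir(location):
    """Expand environment variables in `location` and create the directory.

    Returns the expanded absolute path. An existing directory is left
    untouched. This helper is hypothetical and is not part of Ganga.
    """
    path = os.path.abspath(os.path.expanduser(os.path.expandvars(location)))
    if not os.path.isdir(path):
        os.makedirs(path)
    return path
```

For the location used above, this would be called as ensure_output_dir('$HOME/athena/output').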

Athena jobs involving user packages

Now that you've seen the basic running of Athena through Ganga and how to deal with local datasets, we will move on to running the modified UserAnalysis package from the previous session on the local PBS system. This is similar to running Athena above, but to ensure that our modified UserAnalysis module is used, we must 'prepare' the package. This will tell Ganga to tar up the user area and send it with the job. The first step is to set up Athena as if we were going to run the package as before. From a clean shell:

source cmthome/setup.sh -tag=14.2.10,32,setup
cd $TestArea/PhysicsAnalysis/AnalysisCommon/UserAnalysis/cmt
source setup.sh
cd ../run

We can now run Ganga as before and then run the following submission script:

j = Job()
j.application = Athena()
j.application.prepare(athena_compile=False)
j.application.option_file='AnalysisSkeleton_topOptions.py'
j.inputdata=ATLASLocalDataset()
j.inputdata.get_dataset('/home/slater/mc08.105003.pythia_sdiff.recon.AOD.e344_s456_r545','AOD*.root.*')
j.outputdata=ATLASOutputDataset()
j.outputdata.outputdata=['AnalysisSkeleton.aan.root']
j.outputdata.location = '$HOME/athena/output'
j.backend = PBS()
j.submit()

Splitting and Merging Athena Jobs

The last thing to look at before we move on to running Athena on the Grid is combining what we have just done with splitting and merging. This will speed up the run time, as each subjob will only have a few files to go over and they should all run in parallel. The splitter and merger we will use for this local job are AthenaSplitterJob and AthenaOutputMerger. As before, check the help so you get an idea of what we'll be doing.

The splitting of the job is fairly simple. The only thing you need to specify is the number of subjobs you want (the other options don't work at the moment!). The merger is even easier, as the only thing you would need to specify here is the output directory to store the combined files. As an example, create a submission script that loads in the selection of data we did above (containing sdiff, ddiff and minbias contributions) and then add the following to the above submission script (remembering to change the inputdata to the dataset you've just created!):

j.splitter = AthenaSplitterJob()
j.splitter.numsubjobs = 5
j.merger = AthenaOutputMerger()
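
To get a feel for what splitting means for the input files, the sketch below divides a list of files into a given number of groups in plain Python. It is only an illustration: this page does not describe how AthenaSplitterJob actually distributes files, so the contiguous, near-equal grouping is an assumption, and the function name split_files and the placeholder file names are hypothetical.

```python
def split_files(names, numsubjobs):
    """Divide `names` into at most `numsubjobs` contiguous, near-equal groups.

    Illustration only: the real splitter may distribute files differently.
    """
    if numsubjobs < 1:
        raise ValueError("numsubjobs must be at least 1")
    names = list(names)
    base, extra = divmod(len(names), numsubjobs)
    groups = []
    start = 0
    for i in range(numsubjobs):
        # The first `extra` groups take one additional file each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer files than subjobs: do not emit empty groups
        groups.append(names[start:start + size])
        start += size
    return groups


# Nine placeholder names standing in for the 3 sdiff + 4 ddiff + 2 minbias
# files selected earlier, split five ways.
files = ["file_%d.pool.root" % i for i in range(9)]
for index, group in enumerate(split_files(files, 5)):
    print(index, len(group))
```

With nine files and five subjobs, no group under this scheme holds more than two files, which is why the subjobs finish much sooner than a single job over all nine.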

Once complete, you should have the individual files for each job. Unfortunately, due to a conflict with the Athena environment you will probably get an error for the merger. However, you can still do the merging by running Ganga from a clean shell and doing:

jobs(<id>).merger.merge()

-- MarkSlater - 10 Dec 2008

Topic revision: r4 - 14 Dec 2008 - /C=UK/O=eScience/OU=Birmingham/L=ParticlePhysics/CN=mark slater
