Athena on the Grid

The Atlas Computing Model

The Atlas Computing Model was introduced to ensure the most efficient use of resources on the Grid. Its most important concept is that jobs go to data, NOT data to jobs. This means that, though it may seem the simplest thing to do, you should not just download datasets to local storage and run on them there. Scaled up to several thousand users and many different datasets, everything would grind to a halt, never mind whether you would have enough storage space! The way to analyse data is to set up your job description (as we did in the previous section), send the job to the sites that hold your data and then transfer back only the (much smaller) analysis output.

Having established this way of working, we will now look at how it has been implemented. Data in Atlas comes in various flavours, going from raw data through ESD (Event Summary Data) to AOD (Analysis Object Data). There is also the even more stripped-down DPD level below this, but at present most analysis is done on AODs. This data is replicated at many sites across the world, allowing many users to run jobs at many different sites. The system that organises all this is DQ2, which provides useful commands for listing and retrieving datasets. A DQ2 dataset has a unique name (e.g. mc08.105003.pythia_sdiff.recon.AOD.e344_s456_r545) that should describe the data to some degree. Much like a directory, the dataset 'contains' its associated files. The files themselves can be stored at different storage elements, but usually a dataset will be complete (all files in one location) in at least one place. To complicate matters a little more, there are also datasets that contain other datasets. These containers are indicated by a trailing forward slash '/' at the end of the dataset name. There is currently a move towards using just these containers but, for the moment, we will use the simple datasets.
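The dot-separated structure of the example name above can be illustrated with a short Python sketch. The field names used here (project, dataset ID, physics short name, production step, data type, tags) follow the usual Atlas naming convention, but treat this as an illustration rather than an official parser:

```python
def parse_dataset_name(name):
    """Split a DQ2 dataset name into its conventional dot-separated fields."""
    # A trailing '/' marks a dataset container rather than a simple dataset.
    is_container = name.endswith('/')
    fields = name.rstrip('/').split('.')
    keys = ['project', 'dataset_id', 'physics_short', 'prod_step', 'data_type', 'tags']
    info = dict(zip(keys, fields))
    info['container'] = is_container
    return info

info = parse_dataset_name('mc08.105003.pythia_sdiff.recon.AOD.e344_s456_r545')
print(info['data_type'])   # AOD
print(info['container'])   # False
```

For the container form, parse_dataset_name('...AOD.e344_s456_r545/') would return the same fields with container set to True.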

Finally, there are several systems (backends for Ganga) that are used to submit jobs to the Grid. There is the LCG system, based primarily in Europe; the Panda system, which started in North America but is now being adopted in Europe as well; and finally the NorduGrid system, which covers the Nordic countries. Due to lack of time, we will only cover the LCG backend, which should allow you to do most analyses. The other backends work in a very similar way, but there are some important differences; have a look at their help entries and the long Ganga tutorial on the web:

https://twiki.cern.ch/twiki/bin/view/Atlas/GangaTutorial5

Using the LCG Backend with Athena

To run a basic Athena Hello World job on the Grid, all you need to do (as before) is specify the LCG backend. Note that this simple setup only works for jobs that require no input data. The following example shows this in action:

j = Job()
j.application = Athena()
j.application.atlas_release = '14.2.10'
j.application.option_file = '$HOME/Athena/AtlasOffline-14.2.10/run/HelloWorldOptions.py'
j.application.max_events = 10
j.backend = LCG()
j.submit()

As with the previous LCG jobs, your job will go somewhere in the world, run and hopefully complete. Accessing the output afterwards is slightly different because the stdout and stderr files are gzipped to reduce the transfer size. However, you can still use very similar commands:

j.peek("stdout.gz", "emacs")
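If you would rather inspect a retrieved log yourself instead of going through peek, the gzipped files can be read directly with Python's standard gzip module. This is a minimal sketch; the file path is illustrative (here a temporary file stands in for a real job log):

```python
import gzip
import os
import tempfile

def read_gzipped_log(path):
    """Return the decompressed text of a gzipped log file such as stdout.gz."""
    with gzip.open(path, 'rt') as f:
        return f.read()

# Demo: write a small gzipped 'log' to a temporary path and read it back.
# In practice, the path would point at the stdout.gz in your job's output directory.
tmp = os.path.join(tempfile.gettempdir(), 'stdout.gz')
with gzip.open(tmp, 'wt') as f:
    f.write('Hello World\n')
print(read_gzipped_log(tmp))
```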

The last (but quite important!) thing you need to know when running on the LCG backend is how to specify the sites your job should run at. This is done differently from the previous examples: Atlas uses a special list of site+space token names that are mapped to CE names. These are further organised into 'clouds' (e.g. CERN, UK, IT), which you can also submit to. There is unfortunately no easy way at present to link sites to clouds, but you can get a good idea using the following sites and Ganga commands:

TiersOfAtlas:

http://atlas.web.cern.ch/Atlas/GROUPS/DATABASE/project/ddm/releases/TiersOfATLASCache.py

Ganga Robot:

http://gangarobot.cern.ch/index_200812.html

Ganga:

r = AtlasLCGRequirements()
r.list_clouds()
r.list_sites()
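The site/cloud relationship can be pictured as a simple mapping. The sketch below is purely illustrative: the cloud assignments are examples (only the CERN and Oxford entries appear in this tutorial; the authoritative mapping lives in the TiersOfATLAS file):

```python
# Illustrative sketch of how Atlas sites group into clouds.
# Entries are examples only; consult TiersOfATLAS for the real mapping.
CLOUDS = {
    'CERN': ['CERN-PROD_MCDISK', 'CERN-PROD_DATADISK'],
    'UK':   ['UKI-SOUTHGRID-OX-HEP_DATADISK'],
}

def cloud_of(site):
    """Return the cloud a site belongs to, or None if it is not in the mapping."""
    for cloud, sites in CLOUDS.items():
        if site in sites:
            return cloud
    return None

print(cloud_of('UKI-SOUTHGRID-OX-HEP_DATADISK'))  # UK
```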

We will go over finding the site your data is at below, but you can specify the site or cloud you want to run on, and those you wish to exclude, using the AtlasLCGRequirements object:

r = AtlasLCGRequirements()
r.sites = ['<sitename>']
r.excluded_sites = ['<sitename>']
r.cloud = '<cloudname>'

As an example, try submitting the test job above to the Italian cloud (IT) and the Oxford site 'UKI-SOUTHGRID-OX-HEP_DATADISK'.

Finding your Data

To do an analysis, you will need to find your data. There are several ways to do this, but I will cover two of the more common ways. First, there is the Atlas Metadata Interface (AMI). This can be accessed from the following:

https://atlastagcollector.in2p3.fr:8443/AMI/servlet/net.hep.atlas.Database.Bookkeeping.AMI.Servlet.Command?linkId=62

(or follow the links from http://ami3.in2p3.fr:8080/opencms/opencms/AMI/www/index.html)

From here, you can retrieve a large amount of information on datasets by entering wildcard searches. For example, to find the single diffraction example dataset we've been using, try the search term:

%sdiff%aod%

Try some other search terms with AMI to see what is available!
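Note that AMI uses SQL-style '%' wildcards, whereas the dq2 tools below use shell-style '*'. The translation between the two is straightforward, as this sketch shows using Python's fnmatch module (the second dataset name is hypothetical, included only to show a non-match):

```python
import fnmatch

datasets = [
    'mc08.105003.pythia_sdiff.recon.AOD.e344_s456_r545',    # from the text
    'mc08.105001.pythia_minbias.recon.ESD.e344_s456_r545',  # hypothetical example
]

def ami_to_shell(pattern):
    """Translate an SQL-style '%' wildcard pattern to a shell-style '*' one."""
    return pattern.replace('%', '*')

# AMI searches are case-insensitive, so lower-case the names before matching.
pattern = ami_to_shell('%sdiff%aod%')
matches = [d for d in datasets if fnmatch.fnmatch(d.lower(), pattern)]
print(matches)
```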

A similar search can be done (though without the additional information) using the DQ2Clients tools. These are accessed by first generating a grid proxy, setting the local site ID and then sourcing the DQ2Clients setup script on AFS:

voms-proxy-init --voms atlas
export DQ2_LOCAL_SITE_ID=OXF
source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh

This will give you access to several dq2-related commands. I'll go over the most useful ones here, but do have a look at the DQ2Clients Twiki for more information:

https://twiki.cern.ch/twiki/bin/view/Atlas/DQ2Clients

dq2-ls

This is a basic dq2 version of the ls command. It also takes wildcards and so, to do the same as we did with AMI:

dq2-ls '*sdiff*aod*'

Again, have a go with other search terms.

dq2-list-files

This lists the files associated with the dataset:

dq2-list-files <dataset_name>

dq2-list-dataset-replicas

This will list the sites where this dataset is available. All the main production and fdr2 AODs should be available at many sites in most clouds. This information will be needed when specifying the jobs later on.

dq2-list-dataset-replicas <dataset_name>

dq2-get

USE WITH CAUTION!! As mentioned above, though dq2-get is very useful and you will need it, you are not supposed to download many GB of data per day!

dq2-get [-n <number of files>] <dataset_name>

Using the DQ2Dataset in Ganga

As with the local datasets used previously, there are both input and output datasets:

Input Dataset

For the input dataset, we will use the DQ2Dataset class. This provides the interface to the DQ2 system in Ganga. Basic usage of the class is fairly simple:

d = DQ2Dataset()
d.dataset = "mc08.105003.pythia_sdiff.recon.AOD.e344_s456_r545"

The following methods can then be used to find out information about the dataset:

d.list_locations()   # show the locations of the dataset
d.list_contents()   # show the files within the dataset

Output Dataset

The output dataset for Grid use is the DQ2OutputDataset. This saves your output data as a DQ2 dataset that you can retrieve afterwards, and also provides several ways of controlling the output. Here is an example:

d = DQ2OutputDataset()
d.outputdata = ['AnalysisSkeleton.aan.root']    # output files from the job
d.datasetname = 'MarkSlater.BasicAnaTest'       # optional - Ganga will create a name if you don't supply one
d.location = 'UKI-SOUTHGRID-OX-HEP_DATADISK'    # optional - the site name and space token to save your data at; defaults to the nearest SE

Running a Full Analysis

We now have all the elements to run a complete analysis. We will use the modified UserAnalysis package from before and the single diffraction MC sample. I will take you through the typical steps from beginning to end so you get the basic idea. Obviously, the details will depend greatly on the actual analysis you want to do, but the general workflow shouldn't change much!

1) First, set up Athena as before from a clean shell:

source cmthome/setup.sh -tag=14.2.10,32,setup
cd $TestArea/PhysicsAnalysis/AnalysisCommon/UserAnalysis/cmt
source setup.sh
cd ../run

2) Next, we will use Ganga to find out where our data is:

d = DQ2Dataset()
d.dataset = "mc08.105003.pythia_sdiff.recon.AOD.e344_s456_r545"
d.list_locations()
d.list_contents()

3) From this list we see that the data is present at CERN-PROD_MCDISK, which is in the 'CERN' cloud. The dataset contains 1000 files, so we will want to split over ~25 subjobs. This gives us the following job script that will tar up our UserAnalysis package, run over the input data and then write the output to a DQ2 dataset:

j = Job()
j.application = Athena()
j.application.prepare(athena_compile=False)
j.application.option_file = '$HOME/Athena/AtlasOffline-14.2.10/run/AnalysisSkeleton_topOptions.py'   # your analysis job options, not the HelloWorld ones
j.application.max_events=-1
j.inputdata = DQ2Dataset()
j.inputdata.dataset = "mc08.105003.pythia_sdiff.recon.AOD.e344_s456_r545"
j.outputdata=DQ2OutputDataset()
j.outputdata.outputdata=['AnalysisSkeleton.aan.root']
j.backend = LCG()
j.backend.requirements = AtlasLCGRequirements()
j.backend.requirements.cloud = 'CERN'
j.splitter = DQ2JobSplitter()
j.splitter.numsubjobs = 25
j.submit()
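The choice of 25 subjobs above is just the total file count divided into manageable chunks; each subjob then processes about 40 files. A quick sketch of the arithmetic (DQ2JobSplitter does this internally, and in a more sophisticated, location-aware way):

```python
import math

def files_per_subjob(n_files, n_subjobs):
    """Maximum number of input files each subjob processes."""
    return math.ceil(n_files / n_subjobs)

print(files_per_subjob(1000, 25))  # 40
```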

4) When the job has completed, exit Ganga and do the following to retrieve your dataset (the dataset name can be found with 'j.outputdata'):

export DQ2_LOCAL_SITE_ID=OXF
source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh
dq2-get <dataset_name>

-- MarkSlater - 11 Dec 2008

Topic revision: r6 - 16 Dec 2008 - 08:13:15 - MarkSlater
 