---+ Athena on the Grid

---+++ The Atlas Computing Model

The Atlas Computing Model was introduced to ensure that resources on the Grid are used as efficiently as possible. Its most important principle is that jobs go to the data and NOT data to the jobs. This means that, though it may seem the simplest thing to do, you should not just download datasets to local storage and run on them there. Scaled up to several thousand users and many different datasets, that approach would grind everything to a halt, never mind whether you would have enough storage space! Instead, to analyse data you set up your job description (as we did in the previous section), send the job to the sites that hold your data, and transfer back only the (much smaller) analysis output.

Having established this method of working, we will now look at how it has been implemented. The data in Atlas comes in various flavours, going from raw data through ESD (Event Summary Data) to AOD (Analysis Object Data). Below this there is also the DPD (Derived Physics Data) level, which is even more stripped down, but at present most analysis is done on AODs. This data is replicated at many sites across the world, which allows many users to run jobs at many different sites.

The system that organises all this is DQ2, which provides commands for listing and retrieving datasets. A DQ2 dataset has a unique name (e.g. mc08.105003.pythia_sdiff.recon.AOD.e344_s456_r545) that should describe the data to some degree. Much like a directory, the dataset 'contains' its associated files. The files themselves can be stored at different storage elements, but a dataset will usually be complete (all files in one location) in at least one place. Just to complicate matters a little more, there are also datasets that contain other datasets. These containers are indicated by a trailing forward slash '/' at the end of the dataset name. There is currently a move towards using only these containers, but for the moment we will use the simple datasets.
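As a rough illustration of the naming and completeness ideas above, the following plain Python sketch splits a dataset name into its dot-separated fields, flags containers by their trailing '/', and picks out the sites that hold a complete copy of a dataset. It does not use DQ2 or Ganga at all. The field labels (project, dataset number, and so on), the function names and the site and file names in the usage note are assumptions made for this example only.

```python
from typing import Dict, List, NamedTuple, Set


class DatasetName(NamedTuple):
    """Fields of a dataset name. The field labels are assumptions."""
    project: str         # e.g. "mc08"
    dataset_number: str  # e.g. "105003"
    physics_short: str   # e.g. "pythia_sdiff"
    prod_step: str       # e.g. "recon"
    data_type: str       # e.g. "AOD"
    version: str         # e.g. "e344_s456_r545"
    is_container: bool   # True if the name ends with '/'


def parse_dataset_name(name: str) -> DatasetName:
    """Split a dot-separated dataset name into six labelled fields."""
    is_container = name.endswith("/")
    fields = name.rstrip("/").split(".")
    if len(fields) != 6:
        raise ValueError(
            f"expected 6 dot-separated fields, got {len(fields)}: {name!r}"
        )
    return DatasetName(*fields, is_container=is_container)


def complete_sites(dataset_files: Set[str],
                   replicas: Dict[str, Set[str]]) -> List[str]:
    """Return the sorted names of sites that hold every file of the dataset.

    `replicas` maps a (hypothetical) site name to the set of files it stores.
    """
    return sorted(
        site for site, files in replicas.items() if dataset_files <= files
    )
```

For example, =parse_dataset_name("mc08.105003.pythia_sdiff.recon.AOD.e344_s456_r545")= gives a record whose =data_type= is "AOD" and whose =is_container= is False. Calling =complete_sites= with a made-up replica map in which only one site stores all the files returns just that site, which is where a job would be sent.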

Finally, there are several systems (backends for Ganga) that are used to submit jobs to the Grid: the LCG system, which is based primarily in Europe; the Panda system, which started in North America but is now being deployed in Europe as well; and the NorduGrid system, which handles the Grid in the Nordic countries.
Due to lack of time, we will only cover the LCG backend, which should allow you to do most analyses. The other backends work in a very similar way, but there are some important differences. If you want to start using them as well, have a look at their help entries and the long Ganga tutorial on the web:

https://twiki.cern.ch/twiki/bin/view/Atlas/GangaTutorial5

---+++ DQ2Datasets

---+++ Finding the Correct Cloud/Site

---+++ Running the UserAnalysis Job

-- Main.MarkSlater - 11 Dec 2008

Topic revision: r2 - 14 Dec 2008 - /C=UK/O=eScience/OU=Birmingham/L=ParticlePhysics/CN=mark slater