
OxfordGangaTutorialSession1 (13 Dec 2008, /C=UK/O=eScience/OU=Birmingham/L=ParticlePhysics/CN=mark slater)


Session 1 - General Ganga Usage

This session goes through the main principles of Ganga and its general usage, independent of any particular experiment-specific flavour.

Setting up and Running Ganga

Before running Ganga properly, you should set up your own configuration file. To do this, run Ganga as follows:

/home/slater/runGanga -g --disableLCG

The -g option creates a file '.gangarc' in your home directory with a generic flavour. The options given in this file override all other settings except those given on the command line. The --disableLCG option (specific to this script) turns off the Grid portion of Ganga. You are now ready to start Ganga properly:

/home/slater/runGanga --disableLCG

If all is well, you should be presented with an IPython prompt (if you've never used IPython before, you will need to press return to get past the welcome message):

Ganga version not specified. Using LATEST version.
Running Ganga LATEST with the following parameters:
-o[LCG]EDG_ENABLE=False -o[LCG]GLITE_ENABLE=False


*** Welcome to Ganga ***
Version: Ganga-5-1-1
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.

This is free software (GPL), and you are welcome to redistribute it
under certain conditions; type license() for details.

GangaAtlas                         : INFO     Tasks read from file
Ganga.GPIDev.Lib.JobRegistry       : INFO     Found 0 jobs in "jobs", completed in 0 seconds
Ganga.GPIDev.Lib.JobRegistry       : INFO     Found 0 jobs in "templates", completed in 0 seconds


In [1]:

Just to reiterate, this sets up Ganga in its 'flavourless' mode; we will come on to using Atlas/LHCb software later!
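The precedence rule described above (built-in defaults, then your .gangarc, then command-line options such as the `-o[LCG]EDG_ENABLE=False` shown in the startup message) can be sketched in plain Python 3. This is a toy model rather than Ganga's own code: it assumes .gangarc is an INI-style file that Python's configparser can read, and the DEFAULTS values are invented for illustration.

```python
import configparser

# Toy model of option precedence (NOT Ganga's implementation):
# defaults < ~/.gangarc < "-o[Section]OPTION=value" on the command line.
# The default values below are invented for illustration.
DEFAULTS = {"LCG": {"EDG_ENABLE": "True", "GLITE_ENABLE": "True"}}

# Assumed INI-style content of a user's .gangarc.
GANGARC_TEXT = """
[LCG]
GLITE_ENABLE = False
"""


def parse_cmdline_option(opt):
    """Split '-o[Section]NAME=value' into (section, name, value)."""
    body = opt[len("-o"):] if opt.startswith("-o") else opt
    section, rest = body[1:].split("]", 1)  # drop the leading '['
    name, value = rest.split("=", 1)
    return section, name, value


def effective_config(defaults, gangarc_text, cmdline_opts):
    # Lowest precedence: a copy of the built-in defaults.
    config = {sec: dict(opts) for sec, opts in defaults.items()}
    # Next: the user's configuration file.
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(gangarc_text)
    for sec in parser.sections():
        config.setdefault(sec, {}).update(parser.items(sec))
    # Highest precedence: command-line options.
    for opt in cmdline_opts:
        sec, name, value = parse_cmdline_option(opt)
        config.setdefault(sec, {})[name] = value
    return config
```

For example, `effective_config(DEFAULTS, GANGARC_TEXT, ["-o[LCG]EDG_ENABLE=False"])` ends up with both LCG options set to "False": one from the file, one from the command line.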

Getting Help

Ganga is built entirely on Python, so the usual Python commands can be entered at the IPython prompt. For the Ganga-specific parts, however, there is an online help system that can be accessed using:

In [1]: help() 
************************************

*** Welcome to Ganga ***
Version: Ganga-5-1-1
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.

This is free software (GPL), and you are welcome to redistribute it
under certain conditions; type license() for details.


This is an interactive help based on standard pydoc help.

Type 'index'  to see GPI help index.
Type 'python' to see standard python help screen.
Type 'interactive' to get online interactive help from an expert.
Type 'quit'   to return to Ganga.
************************************

help>

Type 'index' at the prompt to see the list of available classes. Then type the name of the object you're interested in to see its help. You can use 'q' to quit the entry you're currently viewing (though there is currently a bug that then displays help on a 'NoneType' object!). You can also do this directly from the IPython prompt using:

In [1]: help(Job)

In preparation for the next exercise, have a look at the help entries for the Job, Executable and Local objects.

Your First Job

We will start with a very basic Hello World job that will run on the machine you are currently logged in on. Create a basic job object with default options and view it:

In [1]: j = Job()
In [2]: j
 Out[2]: Job (
 status = 'new' ,
 name = '' ,
 inputdir = '/home/slater/gangadir/workspace/mws/LocalAMGA/0/input/' ,
 outputdir = '/home/slater/gangadir/workspace/mws/LocalAMGA/0/output/' ,
 outputsandbox = [] ,
 id = 0 ,
 info = JobInfo (
    submit_counter = 0
    ) ,
 inputdata = None ,
 merger = None ,
 inputsandbox = [] ,
 application = Executable (
    exe = 'echo' ,
    env = {} ,
    args = ['Hello World']
    ) ,
 outputdata = None ,
 splitter = None ,
 subjobs = 'Job slice:  jobs(0).subjobs (0 jobs)
' ,
 backend = Local (
    actualCE = '' ,
    workdir = '' ,
    nice = 0 ,
    id = -1 ,
    exitcode = None
    )
 )

Note that by just typing the job variable ('j'), IPython prints information about it. For a job object, this is a summary of the object that Ganga uses to manage your job. Its main parts are:

application - The type of application to run
backend - Where to run
inputsandbox/outputsandbox - The files required for input and output that will be sent with the job
inputdata/outputdata - The required dataset files to be accessed by the job
splitter - How to split the job up into several subjobs
merger - How to merge the completed subjobs

For this job, we will be using a basic 'Executable' application ('echo') with the arguments 'Hello World'. There is no input or output data, so these are not set. We'll now submit the job:

In [3]: j.submit()

If all is well, the job will be submitted and you can then check its progress using the following:

In [4]: jobs

This shows a summary of all the jobs in your repository. Your basic Hello World job will go through the following stages: 'submitted', 'running', 'completing' and 'completed'. When your job has reached the completed state, the standard output and standard error are transferred to the output directory of the job (as listed in the job object). There are several ways to check this output. First, we will use the 'ls' shell command to see if the files are present:

In [5]: !ls $j.outputdir

where we have used the exclamation mark (!) to pass the line to the shell and the dollar sign ($) to expand a Python expression inside it. Check that the output is valid using one of the following:

In [6]: !emacs $j.outputdir/stdout
In [7]: j.peek("stdout", "emacs")

The peek method in the last line runs a command (the second argument) on a file in the job's output directory (the first argument), so these two lines are equivalent. With any luck, you will see 'Hello World' in stdout. Congratulations, you've run your first job!
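To make the equivalence between the two lines concrete, here is a hypothetical plain-Python 3 stand-in for `peek` that runs a command on a file in an output directory. It sketches the behaviour described above and is not Ganga's implementation; the function signature and the `cat` default are assumptions.

```python
import os
import shlex
import subprocess


def peek(outputdir, filename, command="cat"):
    """Hypothetical stand-in for j.peek(filename, command): run `command`
    with the path of `filename` inside `outputdir` appended as its last
    argument, and return what the command prints."""
    path = os.path.join(outputdir, filename)
    argv = shlex.split(command) + [path]  # e.g. ["grep", "Hello", ".../stdout"]
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

With an interactive command such as emacs you would not capture the output, but the path is built the same way as in `!emacs $j.outputdir/stdout`.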

More on the Executable Application

The Executable is the most basic Ganga application. For the job above, view its details using:

In [8]: j.application  
Executable (
    exe = 'echo' ,
    env = {} ,
    args = ['Hello World']
    )

As explained above, you can see the executable to run ('echo'), any environment variables to set ('env', none in this case) and the arguments to give to the executable ('Hello World'). We'll now make a copy of this job, change these parameters and submit the copy:

In [9]: j = jobs(0).copy()
In [10]: j.application.exe = 'env'
In [11]: j.application.env['MYENVVAR'] = 'my environment variable'
In [12]: j.application.args = []
In [13]: j.submit()

After completion, check the output and your environment variable should be there, e.g.

In [14]: j.peek("stdout", "grep MYENVVAR")
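Conceptually, running an Executable on the Local backend amounts to starting the program with its arguments plus the extra environment variables. The plain-Python 3 sketch below reproduces the two jobs run so far; `run_executable` is a hypothetical helper, and it assumes the variables in `env` are added on top of the current environment. It illustrates the idea and is not how Ganga itself launches jobs.

```python
import os
import subprocess


def run_executable(exe, args=(), env=None):
    """Run `exe` with `args`, adding `env` on top of the current
    environment, and return its standard output (illustrative only)."""
    full_env = dict(os.environ)
    full_env.update(env or {})
    result = subprocess.run([exe, *args], env=full_env,
                            capture_output=True, text=True, check=True)
    return result.stdout


# The first job: exe='echo', args=['Hello World'].
hello = run_executable("echo", ["Hello World"])
# The modified copy: exe='env', no args, one extra variable.
env_dump = run_executable("env", [], {"MYENVVAR": "my environment variable"})
```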

You can also use the Executable application to run your own scripts. First, write the following into a file 'myscript.sh' and make it executable (e.g. with chmod +x). You can do this without exiting Ganga by using the exclamation mark to run shell commands:

 
#!/usr/bin/env bash
echo "A simple script to list the directory contents:"

ls
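If you prefer to stay in Python rather than use shell escapes, the script can also be written and made executable with the standard library. This is a minimal Python 3 sketch; `write_executable_script` is a hypothetical helper, and it uses the portable `#!/usr/bin/env bash` shebang.

```python
import os
import stat

# Contents of myscript.sh as given above.
SCRIPT = """#!/usr/bin/env bash
echo "A simple script to list the directory contents:"

ls
"""


def write_executable_script(path, text=SCRIPT):
    """Write `text` to `path` and add execute permission for the owner,
    the Python equivalent of creating the file and running chmod u+x."""
    with open(path, "w") as f:
        f.write(text)
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)
    return path
```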

Now, we'll create a new Executable application:

In [1]: exe_app = Executable()
In [2]: exe_app.exe = File('myscript.sh')

Note that we wrapped the script in a File object so that it is copied along with the job and run properly. Now we can create a job object, assign the Executable application we've just created and submit the job:

In [3]: j = Job()
In [4]: j.application = exe_app
In [5]: j.submit()

If all is well, after the job has completed, the stdout file will contain a listing of the temporary area where the job was run.

At this point, to demonstrate that job objects persist, exit Ganga (Ctrl-D), restart it and view your jobs with the 'jobs' command as before. All your jobs are stored and can be accessed using commands such as:

jobs(<id>)

Submitting to Different Backends

Having seen what can be done with the Executable application, we'll now look at the different backends that can be used. A 'backend' in Ganga describes where you want your job to actually run. Up to this point, we have been using the 'Local' backend, which refers to the computer you're currently using; consequently, your computer name can be seen in the 'actualCE' field when running the 'jobs' command. We'll now try submitting jobs to the local PBS batch system:

In [1]: j = Job()
In [2]: j.backend = PBS()
In [3]: j.submit()

Again, your job starts in the submitted state and then goes through the running and completed states. That is all there is to running on the local batch system. Similarly, if you log in to lxplus and do the following, you can run on the LSF batch system there:

/afs/cern.ch/sw/ganga/install/5.1.2/bin/ganga
In [1]: j = Job()
In [2]: j.backend = LSF()
In [3]: j.submit()

Finally, it is also possible to submit a job to lxplus from Oxford using the Remote backend. By giving the host, your username, the Ganga command and a working directory on the remote machine, you can define a job locally and submit it to the LSF queue at CERN (NOTE: change the username and ganga_dir to your own!):

In [1]: j = Job()
In [2]: j.backend = Remote()
In [3]: j.backend.host = "lxplus.cern.ch"
In [4]: j.backend.username = "mslater"
In [5]: j.backend.ganga_cmd = "/afs/cern.ch/sw/ganga/install/5.1.2/bin/ganga"
In [6]: j.backend.ganga_dir = "/afs/cern.ch/user/m/mslater/gangadir/remote_jobs"
In [7]: j.backend.remote_backend = LSF()
In [8]: j.submit()

Be aware that with this backend, when specifying input data, you are specifying data at CERN, not locally! For practice, try submitting the previous jobs above (not just the basic Hello World job) to these other backends.

Basic Job Splitting and Merging

Ganga becomes really powerful with job splitting, which allows a single job to be defined but several subjobs to be created from it. The splitting is controlled by the job's splitter object; for these generic Executable examples, we'll use the ArgSplitter, which takes a list of argument sets and creates one subjob for each. We will also use the 'TextMerger' object, which combines the output text files from a set of subjobs into one text file. For more information on these, use the help system:

help(ArgSplitter)
help(TextMerger)

Here is an example that takes the basic 'echo' job and splits it into three subjobs with different arguments:

In [1]: s = ArgSplitter()
In [2]: s.args = [ ['A'], ['B'], ['C'] ]
In [3]: j = Job()
In [4]: j.backend = PBS()
In [5]: j.merger = TextMerger()
In [6]: j.merger.files = ['stdout']
In [7]: j.merger.ignorefailed = True
In [8]: j.splitter = s
In [9]: j.submit()
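The effect of ArgSplitter can be pictured with a small toy model: each entry of `s.args` produces one subjob whose arguments replace those of the master application. `ToyExecutable` and `arg_split` below are hypothetical stand-ins written in plain Python 3, not Ganga classes.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyExecutable:
    # Hypothetical stand-in for Ganga's Executable; the defaults mirror
    # the Hello World job shown earlier.
    exe: str = "echo"
    args: list = field(default_factory=lambda: ["Hello World"])


def arg_split(master_app, arg_sets):
    """One subjob per entry in `arg_sets`: each is a copy of the master
    application with its args replaced by that entry."""
    subjobs = []
    for args in arg_sets:
        app = copy.deepcopy(master_app)  # leave the master untouched
        app.args = list(args)
        subjobs.append(app)
    return subjobs
```

So `arg_split(ToyExecutable(), [['A'], ['B'], ['C']])` yields three 'echo' applications printing A, B and C respectively.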

While running, to view the subjobs (rather than the overall master job), use the following:

j.subjobs

After the job has completed, the text merger runs automatically and you can view the merged output in the parent (master) job's output directory as before.
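A text merge of this kind is essentially a concatenation of the same file from each subjob's output directory. The toy Python 3 function below is hypothetical; it assumes that `ignorefailed=True` means subjobs without the file are skipped, and the real TextMerger may handle failures and formatting differently.

```python
import os


def merge_text(subjob_dirs, filename, merged_path, ignorefailed=True):
    """Concatenate `filename` from each subjob output directory into
    `merged_path`. With ignorefailed=True a missing file is skipped;
    otherwise the merge stops with FileNotFoundError."""
    with open(merged_path, "w") as out:
        for d in subjob_dirs:
            path = os.path.join(d, filename)
            if not os.path.exists(path):
                if ignorefailed:
                    continue  # treat as a failed subjob and move on
                raise FileNotFoundError(path)
            with open(path) as f:
                out.write(f.read())
    return merged_path
```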

Creating Submission Scripts

Clearly, it would be very tedious to keep typing out the same commands to submit a job, so Ganga supports scripting. To test this, copy the code for the splitting and merging above (without the 'In [n]:' prompts) into a file called 'test_split.py'. Then, from within Ganga, you can use the 'execfile' command to execute the script:

In [1]: execfile('test_split.py')

You can also run Ganga in batch mode by doing the following:

/home/slater/runGanga --disableLCG test_split.py
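If you copy the commands for test_split.py straight from the IPython transcript, the prompts must be removed before the file is valid Python. A small Python 3 helper for that is sketched below; `strip_prompts` is hypothetical and only handles single-line `In [n]:` prompts (IPython continuation prompts are not handled).

```python
import re

# Matches IPython input prompts such as "In [4]: ", with optional leading spaces.
PROMPT = re.compile(r"^\s*In \[\d+\]:\s?")


def strip_prompts(transcript):
    """Turn a pasted IPython session into a plain script by removing the
    'In [n]:' prompts; lines without a prompt are kept unchanged."""
    lines = [PROMPT.sub("", line) for line in transcript.splitlines()]
    return "\n".join(lines) + "\n"
```

Writing `strip_prompts(pasted_text)` to 'test_split.py' then gives a script suitable for execfile or batch mode.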

ROOT Application

As well as the Executable application, there is also a general Root application for running Root scripts. To have a look at this, first access the help:

help(Root)

The Root application allows the version of ROOT to be specified as well as the script to run. Arguments can also be passed to the script. As a simple example, we will create a ROOT script that outputs a histogram with a Gaussian distribution, taking the number of events as its argument.

First, create the following script and call it 'analysis.C':

void analysis(int events) {
    std::cout << "Creating histogram with Gaussian and " << events << " Events.\n";

    TH1D *hist = new TH1D("gaus", "gaus", 50, 0, 100);

    for (int i = 0; i < events; i++)
    {
        hist->Fill(gRandom->Gaus(50, 30));
    }

    TFile output("out.root", "RECREATE");
    hist->Write();
    output.Close();
}
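Before submitting, it can be worth checking the macro by hand. With ROOT's standard command line, arguments are passed inside the macro call, and `-b -q` runs it in batch mode and quits afterwards. The hypothetical Python 3 helpers below only build that command (running it needs ROOT installed); whether Ganga's Root application invokes ROOT in exactly this way is not stated here and should not be assumed. Only numeric arguments are handled, since string arguments would need extra quoting inside the call.

```python
import shlex


def root_batch_command(script, args):
    """Argument list that runs a ROOT macro non-interactively, passing the
    arguments inside the macro call, e.g. analysis.C(100).
    -b: batch mode (no graphics), -q: quit when the macro finishes."""
    call = "{}({})".format(script, ", ".join(str(a) for a in args))
    return ["root", "-b", "-q", call]


def root_shell_line(script, args):
    # Quote for a POSIX shell: the parentheses would otherwise be
    # interpreted by the shell.
    return shlex.join(root_batch_command(script, args))
```

For example, `root_shell_line("analysis.C", [100])` gives a line you can paste into a terminal (or run with `!` from Ganga) to produce out.root locally.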

Now create the following submission script for Ganga:

r = Root()
r.version = '5.14.00b'
r.script = File('analysis.C')
r.args = [100]
j = Job()
j.application = r
j.backend = PBS()
j.outputsandbox = ['out.root']
j.submit()

Note that we have specified the output file in the output sandbox to ensure it gets copied back with the job.

When the job is complete, you should have a ROOT file containing the histogram in the job's outputdir!

For a more involved job, we will now combine the above with a splitter and a merger to combine several histograms. Add the following to the job submission script (before the j.submit() line):

j.splitter = ArgSplitter(args=[[100], [100], [10000]])
j.merger = RootMerger()
j.merger.files = ['out.root']

Running this will create 3 subjobs. After they complete, the parent job will have a combined ROOT file containing a histogram with 10200 (100 + 100 + 10000) entries. Note that the parent job may take longer to complete than the subjobs, because the merging is only done once they have finished.
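The 10200 figure follows from adding the subjob histograms bin by bin. The plain-Python 3 imitation of analysis.C below (all function names are hypothetical) also illustrates a detail of the macro: with mean 50 and width 30 on a 0 to 100 range, roughly a tenth of the values land in the underflow or overflow, yet, as with ROOT's entry count, every fill still counts as an entry.

```python
import random


def fill_gaussian(events, nbins=50, low=0.0, high=100.0,
                  mean=50.0, sigma=30.0, seed=None):
    """Imitate analysis.C: fill a fixed-range histogram with Gaussian
    values. Returns (bins, underflow, overflow)."""
    rng = random.Random(seed)
    bins = [0] * nbins
    under = over = 0
    width = (high - low) / nbins
    for _ in range(events):
        x = rng.gauss(mean, sigma)
        if x < low:
            under += 1
        elif x >= high:
            over += 1
        else:
            bins[min(int((x - low) / width), nbins - 1)] += 1
    return bins, under, over


def merge_histograms(histos):
    """Add histograms bin by bin, including under/overflow."""
    bins = [sum(col) for col in zip(*(h[0] for h in histos))]
    return bins, sum(h[1] for h in histos), sum(h[2] for h in histos)


def entries(h):
    # Every fill counts, including those outside the histogram range.
    bins, under, over = h
    return sum(bins) + under + over


# One histogram per subjob, as with ArgSplitter(args=[[100], [100], [10000]]).
parts = [fill_gaussian(n, seed=i) for i, n in enumerate([100, 100, 10000])]
total = merge_histograms(parts)
```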

More Advanced Job Manipulation

To finish off, we will cover some useful features of managing jobs. This is a fairly brief overview and a more complete list can be found at:

http://ganga.web.cern.ch/ganga/user/html/GangaIntroduction/

Copying Jobs

You can copy a job regardless of its status using the following:

j = Job()
j2 = j.copy()

The copy starts in the 'new' state, so it can be submitted regardless of the original job's status. Consequently, you can rerun an old job by copying it:

j = jobs(2).copy()
j.submit()
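The rule that a copy can always be submitted can be illustrated with a toy class. `ToyJob` is hypothetical plain Python 3 and only mimics the behaviour described here: a copy gets a fresh id and starts as 'new', while, in this model, a job that has already been submitted refuses a second submit.

```python
import copy
import itertools

_next_id = itertools.count()  # toy job-id counter


class ToyJob:
    """Hypothetical stand-in for a Ganga Job, only to illustrate copy()."""

    def __init__(self, args=None):
        self.id = next(_next_id)
        self.status = "new"
        self.args = list(args or [])

    def copy(self):
        # Duplicate everything, then give the copy a fresh identity and state.
        dup = copy.deepcopy(self)
        dup.id = next(_next_id)
        dup.status = "new"
        return dup

    def submit(self):
        if self.status != "new":
            raise RuntimeError("job %d is '%s' and cannot be submitted"
                               % (self.id, self.status))
        self.status = "submitted"
```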

Job Status

Jobs can be killed and then resubmitted using the following:

j.kill()
j.resubmit()

The status of a job can be forced (e.g. if you think it has hung and you want to set it to failed) using the following:

j.force_status('failed')

Removing Jobs

To clean up your job repository, you can remove jobs using the 'remove' method:

j.remove()

Configuration Options

You can supply different configuration options for Ganga at startup through the .gangarc file. If you wish to change settings on the fly, however, you can use:

config[<section>][<parameter>] = value

To show the current settings of a section, just type its config entry on its own, as with 'jobs':

config[<section>]

-- MarkSlater - 08 Dec 2008

Topic revision: r5 - 13 Dec 2008 - /C=UK/O=eScience/OU=Birmingham/L=ParticlePhysics/CN=mark slater
