PROOF in FairRoot
The changes to FairRoot that allow the use of PROOF were added on 08.03.2012.
In order to run FairRunAna on the PROOF cluster, one should edit the macro file, and change:
FairRunAna* fRun = new FairRunAna(); to FairRunAna* fRun = new FairRunAna("proof");
The example macros that show the usage of PROOF are located in macro/global/.
Create several input files with:
karabowi@lxi047:pandaroot_trunk:~/pandaroot_trunk/trunk/macro/global$ date; root -l -q 'sim_BARREL_1000.C(0)' &> p0.dat; date ;root -l -q 'sim_BARREL_1000.C(1)' &> p1.dat; date; root -l -q 'sim_BARREL_1000.C(2)' &> p2.dat; date; root -l -q 'sim_BARREL_1000.C(3)' &> p3.dat; date; root -l -q 'sim_BARREL_1000.C(4)' &> p4.dat; date; root -l -q 'sim_BARREL_1000.C(5)' &> p5.dat; date; root -l -q 'sim_BARREL_1000.C(6)' &> p6.dat; date; root -l -q 'sim_BARREL_1000.C(7)' &> p7.dat; date; root -l -q 'sim_BARREL_1000.C(8)' &> p8.dat; date; root -l -q 'sim_BARREL_1000.C(9)' &> p9.dat; date;
Analyze the files in PROOF:
karabowi@lxi047:pandaroot_trunk:~/pandaroot_trunk/trunk/macro/global$ root -l -q 'tracks_BARREL_1000.C("proof",10)'
You may also analyze the files locally:
karabowi@lxi047:pandaroot_trunk:~/pandaroot_trunk/trunk/macro/global$ root -l -q 'tracks_BARREL_1000.C("local",10)'
Running on PROOF
The most important difference when running on PROOF is the dialog window that appears once PROOF starts working.
It shows the progress of the analysis in graphical form.
When you look at the screen printouts, you will notice that they differ from what you are used to.
The major differences include:
FairRunAna::RunOnProof(0,10000): running FairAnaSelector on proof server: "workers=3" with PAR file name = "$VMCWORKDIR/gconfig/libFairRoot.par".
+++++++ T P R O O F +++++++++++++++++++++++++++++++++
creating TProof* proof = TProof::Open("workers=3");
+++ Starting PROOF-Lite with 3 workers +++
Opening connections to workers: OK (3 workers)
Setting up worker servers: OK (3 workers)
PROOF set to parallel mode (3 workers)
+++++++ C R E A T E D +++++++++++++++++++++++++++++++
drwxr-x--- 3 karabowi had1 22 31. Okt 13:34 libFairRoot
lrwxrwxrwx 1 karabowi had1 60 9. Mär 11:57 libFairRoot.par -> /misc/karabowi/pandaroot_trunk/trunk/gconfig/libFairRoot.par
FairRunAna::RunOnProof(): The chain seems to have 10000 entries.
FairRunAna::RunOnProof(): There are 10 files in the chain.
FairRunAna::RunOnProof(): Starting inChain->Process("FairAnaSelector","",10000,0)
Info in <TProofLite::SetQueryRunning>: starting query: 1
Info in <TProofQueryResult::SetRunning>: nwrks: 3
Error in <TGSpeedo::TGSpeedo::Build>: speedo.gif not found
-I- FairAnaSelector::Begin()
Looking up for exact location of files: OK (10 files)
Looking up for exact location of files: OK (10 files)
Info in <TPacketizerAdaptive::TPacketizerAdaptive>: Setting max number of workers per node to 3
Validating files: OK (10 files)
Info in <TPacketizerAdaptive::InitStats>: fraction of remote files 1.000000
-I- FairAnaSelector::Terminate(): fOutput->ls() still sending)
OBJ: TSelectorList TSelectorList Special TList used in the TSelector : 0
OBJ: TList MissingFiles Doubly linked list : 0
OBJ: TStatus PROOF_Status : 0 at: 0x3c2bd30
OBJ: TOutputListSelectorDataMap PROOF_TOutputListSelectorDataMap_object Converter from output list to TSelector data members : 0 at: 0x2e53450
-I- FairAnaSelector::Terminate(): -------------
Lite-0: all output objects have been merged
FairRunAna::RunOnProof(): inChain->Process DONE
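The printouts from the workers are not shown on the screen but are written to log files located at:
$HOME/.proof/pandaroot_installation_directory_trunk-macro-yourmacrodir/session-$hostname-$jobID-$pJobID/worker-0.workerNumber.log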
To make it clearer, here is the location of my log files:
karabowi@lxi047:pandaroot_trunk:~/pandaroot_trunk/trunk/macro/global$ ls ~/.proof/pandaroot_trunk-trunk-macro-global/session-lxi047-1331290627-14341/worker-0.*.log -l
-rw-r----- 1 karabowi had1 439090 9. Mar 11:57 /misc/karabowi/.proof/pandaroot_trunk-trunk-macro-global/session-lxi047-1331290627-14341/worker-0.0.log
-rw-r----- 1 karabowi had1 375985 9. Mar 11:57 /misc/karabowi/.proof/pandaroot_trunk-trunk-macro-global/session-lxi047-1331290627-14341/worker-0.1.log
-rw-r----- 1 karabowi had1 440103 9. Mar 11:57 /misc/karabowi/.proof/pandaroot_trunk-trunk-macro-global/session-lxi047-1331290627-14341/worker-0.2.log
The log files include all the printouts from the workers.
Moreover, there are more output files than you might expect:
karabowi@lxi047:pandaroot_trunk:~/pandaroot_trunk/trunk/macro/global$ ls -ltrh tracks_*.root
-rw-r----- 1 karabowi had1 378 9. Mar 12:37 tracks_22Part_n10000.root
-rw-r--r-- 1 karabowi had1 1009K 9. Mar 12:37 tracks_22Part_n10000_worker_0.0.root
-rw-r--r-- 1 karabowi had1 1013K 9. Mar 12:37 tracks_22Part_n10000_worker_0.2.root
-rw-r--r-- 1 karabowi had1 1,3M 9. Mar 12:37 tracks_22Part_n10000_worker_0.1.root
Each worker creates its own output file, and the trees/files from different workers are not merged. The file one would expect, tracks_22Part_n10000.root, is also created, but it is empty. This should be changed so that the empty file is not created at all.
Depending on the number of input files, number of workers and the time spent for analysis of one event, the PROOF Packetizer, responsible for sending events to workers, will behave differently, but in general:
Whatever the case, you will find that there is no correspondence in event order between the input files and the output files. One can, however, still match the events between input and output using the fRunId and fMCEntryNo stored in the EventHeader for each event.
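The matching by fRunId and fMCEntryNo can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: the EventId struct and the functions buildIndex and matchToInput are hypothetical names, not part of FairRoot, and the two header values are assumed to have been read from the trees already.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the two EventHeader values named in the text.
struct EventId {
    unsigned int runId;  // plays the role of fRunId
    int mcEntryNo;       // plays the role of fMCEntryNo
};

// Key that identifies an event independently of its position in a file.
using EventKey = std::pair<unsigned int, int>;

// Build a lookup table from (runId, mcEntryNo) to the position in the list.
std::map<EventKey, std::size_t> buildIndex(const std::vector<EventId>& events) {
    std::map<EventKey, std::size_t> index;
    for (std::size_t i = 0; i < events.size(); ++i) {
        index[EventKey(events[i].runId, events[i].mcEntryNo)] = i;
    }
    return index;
}

// For every output event return the position of the matching input event,
// or -1 if the input contains no event with the same key.
std::vector<long> matchToInput(const std::vector<EventId>& input,
                               const std::vector<EventId>& output) {
    const std::map<EventKey, std::size_t> index = buildIndex(input);
    std::vector<long> matches;
    matches.reserve(output.size());
    for (const EventId& ev : output) {
        const auto it = index.find(EventKey(ev.runId, ev.mcEntryNo));
        matches.push_back(it == index.end() ? -1L
                                            : static_cast<long>(it->second));
    }
    return matches;
}
```

Because the lookup uses the key pair rather than the entry number, the result does not depend on the order in which the workers wrote their events.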
Restrictions
There are a few restrictions when using PROOF:
There are now two constructors of the FairRunAna:
The changes to the SVN include:
To developers
The following notes are for developers who need to make their classes usable in PROOF:
Int_t fVerbosity; // verbosity of the task
will be streamed and the value passed to the worker nodes.
Int_t fVerbosity; //! verbosity of the task
will not be streamed and the value will not be passed to the worker nodes.
Because of that, when checking a task in PROOF I always use more files than workers, so that the workers have to call ReInit(). I also analyze the whole set locally and compare the size of the output files, the number of objects created by the task I am debugging, and the shape of the distributions. In case a/ you will have fewer objects than expected, in case b/ more, and in case c/ you will see peaks in the distribution.
Error in <TPacketizerAdaptive::SplitPerHost>: The input list contains no elements
Info in <TPacketizerAdaptive::InitStats>: fraction of remote files 1.000000
Info in <TProofLite::MarkBad>:
+++ Message from master at lxi047.gsi.de : marking lxi047:-1 (0.2) as bad
+++ Reason: undefined message in TProof::CollectInputFrom(...)
+++ Message from master at lxi047.gsi.de : marking lxi047:-1 (0.2) as bad
+++ Reason: undefined message in TProof::CollectInputFrom(...)
+++ Most likely your code crashed
+++ Please check the session logs for error messages either using
+++ the 'Show logs' button or executing
+++
+++ root [] TProof::Mgr("lxi047.gsi.de")->GetSessionLogs()->Display("*")
You may try the suggested method, but it prints the last 20 lines of each worker log. With a segmentation violation the error usually occurs earlier, so you will get many lines that do not contain any information useful for debugging.
For me the quickest way to access the logs was:
karabowi@lxi047:pandaroot_trunk:~/pandaroot_trunk/trunk/macro/global$ less ~/.proof/pandaroot_trunk-trunk-macro-global/last-lite-session/worker-0.2.log
Note that PROOF transforms the name of the running directory (pandaroot_trunk/trunk/macro/global) by changing each "/" into "-" (pandaroot_trunk-trunk-macro-global).
Remember also to take the log from the worker that crashed (in this case worker 0.2).
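The directory naming described above can be sketched as a small helper that builds the path of a worker log in the last PROOF-Lite session. The function names proofDirName and lastSessionWorkerLog are hypothetical, and the sketch assumes that the running directory is given relative to the home directory and that the worker logs are named worker-0.N.log, as in the listings on this page.

```cpp
#include <algorithm>
#include <string>

// Turn a running directory, given relative to the home directory, into the
// directory name used under ~/.proof by changing every '/' into '-'.
std::string proofDirName(std::string relativeRunDir) {
    std::replace(relativeRunDir.begin(), relativeRunDir.end(), '/', '-');
    return relativeRunDir;
}

// Build the path of the log written by the given worker in the last
// PROOF-Lite session (assumed layout, taken from the examples on this page).
std::string lastSessionWorkerLog(const std::string& home,
                                 const std::string& relativeRunDir,
                                 int worker) {
    return home + "/.proof/" + proofDirName(relativeRunDir) +
           "/last-lite-session/worker-0." + std::to_string(worker) + ".log";
}
```

For the session shown above, lastSessionWorkerLog("/misc/karabowi", "pandaroot_trunk/trunk/macro/global", 2) gives the path of the log of the worker that crashed.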
If you cannot fix your problem, please contact me at r.karabowicz@gsi.de
Known issues
The parameter file: to analyze trees in FairRoot you need to specify a ROOT file containing the parameters that are necessary for reconstruction and were stored in the simulation phase. To reconstruct a chain of several input trees, the matching parameters have to be stored in ONE file. The implication is that you cannot create several input files concurrently, since doing so would corrupt the parameter file. We are planning to fix this "feature" as soon as possible.
The PROOF ARchive (PAR) file produces an error on the master. In your macro you load and execute the rootlogon.C macro to make the FairRoot classes known to ROOT's CINT. When you run the analysis on PROOF, the default PROOF ARchive libFairRoot.par also loads and executes the same rootlogon.C macro. An annoying but harmless error is printed on the screen:
+++++++ T P R O O F +++++++++++++++++++++++++++++++++
creating TProof* proof = TProof::Open("");
+++ Starting PROOF-Lite with 8 workers +++
Opening connections to workers: OK (8 workers)
Setting up worker servers: OK (8 workers)
PROOF set to parallel mode (8 workers)
+++++++ C R E A T E D +++++++++++++++++++++++++++++++
EXECUTING libFairRoot.par/SETUP.C without includes
Function SETUP_c7827247() busy. loaded after "/misc/karabowi/pandaroot_trunk/trunk/gconfig/rootlogon.C"
Error: G__unloadfile() Can not unload "/misc/karabowi/pandaroot_trunk/trunk/gconfig/rootlogon.C", file busy /tmp/SETUP_c7827247.C:20:
Note: File "/misc/karabowi/pandaroot_trunk/trunk/gconfig/rootlogon.C" already loaded
*** Interpreter error recovered ***
DONT MIND THE ERRORS HERE, ITS EXECUTED.
In the meantime I have added the line "DONT MIND THE ERRORS HERE, ITS EXECUTED" and am working on proper loading/unloading of the file.