Performance measurement using G-PM
In the following section it is assumed that a Grid-enabled application has already been prepared for use under OCM-G’s control. See the OCM-G tutorial for a description of how to prepare the application. The OCM-G tutorial also provides more detail on the start-up procedure of OCM-G.
Installation of G-PM requires a working installation of OCM-G, as the G-PM RPM depends on the OCM-G RPM package. Please refer to the OCM-G tutorial and documentation for more details.
The G-PM tool is distributed as a binary RPM. It can be obtained from:
http://savannah.fzk.de/distribution/crossgrid/autobuilt/i386-rh7.3-gcc2.95.2/wp2/RPMS/cg-gpm-0.4.0-1.i386.rpm
The RPM is installed with a typical rpm invocation:
rpm -ivh cg-gpm-0.4.0-1.i386.rpm
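Because the G-PM RPM depends on the OCM-G RPM, it can be useful to inspect the requirements declared by the package file before installing it. The following is only a minimal sketch and is not part of the G-PM distribution: the function name show_rpm_requires is hypothetical, and the sketch relies solely on the standard rpm query options -q, -p and -R.

```shell
# Sketch (hypothetical helper, not shipped with G-PM): list the packages
# that an RPM file declares as requirements, without installing it.
show_rpm_requires() {
    pkg=$1
    if [ ! -f "$pkg" ]; then
        echo "package file not found: $pkg" >&2
        return 1
    fi
    if ! command -v rpm >/dev/null 2>&1; then
        echo "rpm command not available on this machine" >&2
        return 127
    fi
    # -q query, -p operate on a package file, -R list its requirements
    rpm -qpR "$pkg"
}
```

For example, show_rpm_requires cg-gpm-0.4.0-1.i386.rpm prints one requirement per line; if the listed OCM-G package is not yet installed, the rpm -ivh call is expected to fail with a dependency error.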
Before the application or G-PM can be started, the OCM-G Main Service Manager must be running. It can execute on any machine with inbound connectivity in the port range used by Globus-I/O. Usually, the UI machine will be used.
Main SM connection string: 8d6383e4:800d
The string 8d6383e4:800d is called the Main SM connection string. It is needed later to tell the application and G-PM where to connect. Note that this string will be different each time you invoke the Main Service Manager.
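Since the connection string changes with every start of the Main Service Manager, scripts may want to capture it rather than copy it by hand. The sketch below assumes that the Main SM output has been saved or piped somewhere and contains a line of the form shown above; the function name extract_mainsm is hypothetical.

```shell
# Sketch (hypothetical helper): read text on stdin and print the value that
# follows "Main SM connection string:" on the first matching line.
# Prints nothing if no such line is present.
extract_mainsm() {
    sed -n 's/^.*Main SM connection string:[[:space:]]*\([^[:space:]]*\).*$/\1/p' |
        head -n 1
}
```

A possible use is MAINSM=$(extract_mainsm < mainsm.log), where mainsm.log is an assumed file holding the Main SM output; $MAINSM can then be passed wherever 8d6383e4:800d appears in the commands of this tutorial.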
In principle, the application can be started in any suitable way, provided that you add
--ocmg-appname flood --ocmg-mainsm 8d6383e4:800d
to the command line arguments of each process. In addition, a valid Grid proxy certificate must exist on each node used by the application. In the following, two examples are given for DaveF, which is an MPI application based on MPICH-P4.
a. Log in to the cluster frontend and go to the directory where the DaveF application files are located.
b. Set up an MPICH machines file, called machines. The file contains the names of the cluster nodes to be used, one per line. In the tutorial, the DaveF kernel is run with 2 processes, so the file should list at least two machines.
c. Ensure that there is a Grid proxy certificate on each of the cluster nodes. This can be achieved by creating a Grid proxy certificate on the frontend and copying it to all nodes listed in the machines file. The following bash commands automate this job:
# create the proxy certificate on the frontend
grid-proxy-init
# copy it to every node listed in the machines file
for i in `cat machines`
do
    scp "/tmp/x509up_u$UID" "$i:/tmp"
done
d. Run the application using the mpirun command:
mpirun -machinefile machines -np 2 davef horne1.prj \
    --ocmg-appname flood --ocmg-mainsm 8d6383e4:800d
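Steps b and d above can be wrapped in a small script that refuses to start the run when the machines file lists too few hosts. This is a sketch under stated assumptions, not part of the DaveF or OCM-G distributions: the helper names count_hosts, check_machines and run_davef are hypothetical, and the proxy distribution of step c is assumed to have been done already.

```shell
# Sketch (hypothetical helpers) for steps b and d above.

# Print the number of non-blank lines (host names) in a machines file.
count_hosts() {
    grep -c '[^[:space:]]' "$1"
}

# Succeed only if the machines file $1 lists at least $2 hosts.
check_machines() {
    file=$1
    np=$2
    [ -f "$file" ] || { echo "missing machines file: $file" >&2; return 1; }
    # grep -c exits non-zero when it counts 0 lines, so fall back to 0
    n=$(count_hosts "$file") || n=0
    if [ "$n" -lt "$np" ]; then
        echo "$file lists $n host(s), need at least $np" >&2
        return 1
    fi
}

# Run DaveF under OCM-G monitoring; $1 is the Main SM connection string.
run_davef() {
    check_machines machines 2 || return 1
    mpirun -machinefile machines -np 2 davef horne1.prj \
        --ocmg-appname flood --ocmg-mainsm "$1"
}
```

With these definitions loaded, run_davef 8d6383e4:800d performs the same mpirun call as in step d, but only after the host count has been checked.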
a. Log in to the UI machine and go to the directory where the DaveF application files are located.
b. Make sure that you have a valid Grid proxy certificate.
c. Edit the davef-ocmg.jdl file: you must specify the correct Main SM connection string in the --ocmg-mainsm argument (i.e. 8d6383e4:800d in this example).
d. Submit the job using
edg-job-submit davef-ocmg.jdl
The JDL file makes use of a special wrapper script (wrapper.sh) to ensure that the Grid proxy certificate is available on each worker node.
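Because the Main SM connection string changes with every start of the Main Service Manager, the JDL file has to be updated before each submission. The sketch below automates that edit. It assumes that the JDL file contains the flag --ocmg-mainsm followed by whitespace and the old connection string; the actual layout of davef-ocmg.jdl is not reproduced in this tutorial, so this is an assumption, and the function name set_mainsm_in_jdl is hypothetical.

```shell
# Sketch (hypothetical helper): replace the value after --ocmg-mainsm in a
# JDL file. Assumes the flag and its value are separated by whitespace and
# that the new value contains no spaces, double quotes, '/' or '&'.
set_mainsm_in_jdl() {
    jdl=$1
    new=$2
    tmp="$jdl.tmp.$$"
    # write to a temporary file first, then replace the original
    sed "s/\(--ocmg-mainsm[[:space:]][[:space:]]*\)[^[:space:]\"]*/\1$new/" \
        "$jdl" > "$tmp" && mv "$tmp" "$jdl"
}
```

A typical sequence would be set_mainsm_in_jdl davef-ocmg.jdl 8d6383e4:800d followed by edg-job-submit davef-ocmg.jdl.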
Before you can start the G-PM tool, you must
· have launched the OCM-G Main Service Manager as described in Sect. 2,
· have a valid Grid proxy certificate on the machine where you want to start G-PM. If necessary, create a proxy with grid-proxy-init.
Then, you should start G-PM with the following command:
gpm flood --num-procs 2 --terminate --ocmg-mainsm 8d6383e4:800d
where
· flood is the application identifier specified during the submission of DaveF.
· --num-procs 2 instructs G-PM to wait until 2 processes have started. This allows G-PM to be started even before the application job is actually running.
· --terminate tells G-PM to shut down the monitoring system when it exits (the application, however, continues; it just can no longer be monitored).
Caution: Linux also has a system program called gpm. Be sure that you either invoke gpm with its full path name, or put the CrossGrid directory at the very beginning of your PATH.
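To guard against accidentally starting the Linux system program of the same name, the resolution of gpm on the PATH can be checked before launching the tool. The following is a minimal sketch; the function name gpm_resolves_to is hypothetical, and the directory passed to it stands for wherever the CrossGrid gpm binary is installed on your machine.

```shell
# Sketch (hypothetical helper): succeed only if the first gpm found on PATH
# lives in the directory given as $1 (the CrossGrid directory).
gpm_resolves_to() {
    want_dir=$1
    found=$(command -v gpm 2>/dev/null) || {
        echo "gpm not found on PATH" >&2
        return 1
    }
    if [ "$(dirname "$found")" != "$want_dir" ]; then
        echo "gpm resolves to $found, not to $want_dir/gpm" >&2
        return 1
    fi
}
```

If the check fails, either call G-PM by its full path name or move the CrossGrid directory to the front of PATH, as described in the caution above.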
After G-PM has successfully connected to all application processes, it prints a message like:
Attached to application.
Init complete. Got 4 processes.
After a while (~ 30 seconds), the Main Window should appear. The window is depicted in Fig. 1.
Fig. 1 G-PM main window
To create a performance measurement, the user should choose Measurements->New from the main menu. The Measurement Definition Window should appear; it is depicted in Fig. 2. Once this window is displayed, the user should specify which performance value should be measured, where it should be measured, and how consecutive results should be integrated. This is done in the following steps:
In this exercise a simultaneous measurement of two performance properties is presented. Thus, after the first performance measurement is created, the user should create a second one. It is recommended that the second performance measurement be identical to the first one, except that the process on the other host (i.e. process 1) is measured. This is depicted in Fig. 3.
Fig. 2 Measurement definition window - options for the first performance measurement are specified
Fig. 3 Measurement definition window - options for the second performance measurement are specified
After the two performance measurements are specified, the user should create a new visualization window. This is done in the following steps.
Fig. 4 Visualization Definition Window
In this exercise, default values are used for all of these parameters. The only exception is the type of the visualization window – see step 3.
By default, an application started with OCM-G monitoring waits at the beginning until it is started by the G-PM tool. This makes it possible to monitor the performance behavior from the very beginning of the execution. To start the application, select ???? from the menu.
The MultiCurve visualization window is depicted in Fig. 5. The measurement is initially enabled, so the performance values should be measured and curves should be redisplayed at regular intervals.
Fig. 5 MultiCurve performance visualization window
In the current version of the multi-curve window, only the functionality of the following widgets is implemented:
Among other things, user-defined metrics allow G-PM to display scalar data produced by the application in near real time. As an example, the DaveF application simulates the evolution of water stage in a river over time. The application’s main loop, which is the simulation’s time loop, has been instrumented with a probe at the beginning of each iteration. This probe receives the current simulation time as a parameter. Via a user-defined metric, G-PM can display the solver’s current simulation time, e.g. in the form of a bar graph, resulting in an on-line progress bar.
The specification of this metric is:
Simulation_Time(Process p, VirtualTime vt)
{
    PROBE loop_start(Process, VirtualTime, double);
    double simtime;
    return simtime AT loop_start(p, vt, simtime);
}
This metric applies to a single process only. It is based on virtual time, i.e. there is a result for each occurrence of a probe event. The first line of the body declares the probe, whose last parameter is a floating point value containing the simulation time. The last line of the body states that the result of this metric is the value of the probe’s last parameter at the point in time when the probe is executed. Altogether, the metric returns the value of the simulation time for each execution of the probe, i.e. for each execution of DaveF’s time loop.
There are two methods to define this metric in G-PM:
Fig. 6 Metrics specification window
After this step, select Measurements->New from the menu to open the measurement definition window. You will find a new metric, Simulation_Time, in the metrics list. In a similar way as outlined in Sec. 3.5, you should now define a measurement of this metric for process 0 of the DaveF application (see Fig. 7). Since DaveF’s time loop is synchronized between processes, the result of measuring the metric for process 1 would be the same.
Fig. 7 Definition of a measurement of the user-defined metric Simulation_Time
Now, create a bar graph visualization window, following the steps explained in Sect. 6. You should set the upper boundary of the scale to the value 10. The resulting bar graph (see Fig. 8) indicates the progress of the application in terms of its current simulation time. In the example, DaveF has just simulated 1.45 hours of a flood.
Fig. 8 Bar graph showing DaveF’s current simulation time
To close the performance measurement session, the user should close all open visualization windows using the Quit buttons and exit the tool with File->Exit.