Performance measurement using G-PM

 

The following sections assume that a Grid-enabled application has already been prepared for use under OCM-G control. See the OCM-G tutorial for a description of how to prepare the application. The OCM-G tutorial also provides more detail on the start-up procedure of OCM-G.

1      Installation of the G-PM

Installation of G-PM requires a working installation of OCM-G, as the G-PM RPM is dependent on the OCM-G RPM package. Please refer to the OCM-G tutorial and documentation for more details.

 

The G-PM tool is distributed as a binary RPM. It can be obtained from:

http://savannah.fzk.de/distribution/crossgrid/autobuilt/i386-rh7.3-gcc2.95.2/wp2/RPMS/cg-gpm-0.4.0-1.i386.rpm

The RPM is installed with a typical rpm invocation:

rpm -ivh cg-gpm-0.4.0-1.i386.rpm

 

2      Starting the OCM-G

Before the application or G-PM can be started, the OCM-G Main Service Manager must be running. It can execute on any machine with inbound connectivity in the port range used by Globus-I/O. Usually, the UI machine will be used.

  1. Log in to the UI machine.
  2. Ensure that you have a valid Grid proxy certificate. If necessary, run grid-proxy-init to create one.
  3. Invoke cg-ocmg-monitor. This prints a line like

            Main SM connection string: 8d6383e4:800d

The string 8d6383e4:800d is called the Main SM connection string. It is needed later to tell the application and G-PM where to connect. Note that this string will be different each time you invoke the Main Service Manager.
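
Because the connection string changes with every start of the Main Service Manager, it can be convenient to keep it in a shell variable instead of retyping it. The following is a minimal sketch, assuming the output line has exactly the form shown above. The helper name extract_main_sm and the variable MAINSM are illustrative and not part of OCM-G; how the line is captured from the running monitor (for example, by copying it from the terminal) is left open here.

```shell
# Hypothetical helper (not part of OCM-G): read text on stdin and print the
# value that follows the "Main SM connection string:" label.
extract_main_sm() {
    sed -n 's/^[[:space:]]*Main SM connection string:[[:space:]]*\([^[:space:]]*\).*$/\1/p' | head -n 1
}

# Example using the sample line from this tutorial.
MAINSM=$(printf '%s\n' '            Main SM connection string: 8d6383e4:800d' | extract_main_sm)
echo "$MAINSM"
```

The variable can then be used wherever the tutorial shows the literal string 8d6383e4:800d.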

3      Starting the application

In principle, the application can be started in any suitable way, provided only that you add

--ocmg-appname flood --ocmg-mainsm 8d6383e4:800d

to the command line arguments of each process. In addition, a valid Grid proxy certificate must exist on each node used by the application. In the following, two examples are given for DaveF, which is an MPI application based on MPICH-P4.

  1. Starting on a local cluster:

a.    Login to the cluster frontend and go to the directory where the DaveF application files are located.

b.    Set up an MPICH machines file, called machines. The file contains the names of the cluster nodes to be used, one per line. In the tutorial, the DaveF kernel is run with 2 processes; thus, the file should list at least two machines.

c.    Ensure that there is a Grid proxy certificate on each of the cluster nodes. This can be achieved by creating a Grid proxy certificate on the frontend and copying it to all nodes listed in the machines file. The following bash commands automate this job:

grid-proxy-init
for i in $(cat machines)
do
      scp "/tmp/x509up_u$UID" "$i:/tmp"
done

d.    Run the application using the mpirun command:

mpirun -machinefile machines -np 2 davef horne1.prj \
       --ocmg-appname flood --ocmg-mainsm 8d6383e4:800d

 

  2. Using EDG job submission commands:

a.    Log in to the UI machine and go to the directory where the DaveF application files are located.

b.    Be sure that you have a valid Grid proxy certificate.

c.    Edit the davef-ocmg.jdl file: you must specify the correct Main SM connection string in the --ocmg-mainsm argument (i.e. 8d6383e4:800d in this example).

d.    Submit the job using

edg-job-submit davef-ocmg.jdl

The JDL file makes use of a special wrapper script (wrapper.sh) to ensure that the Grid proxy certificate is available on each worker node.
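
Since every process must receive the same application name and Main SM connection string, one way to avoid typing mistakes is to build the two OCM-G arguments from shell variables. This is only a sketch: the variable names APPNAME, MAINSM and OCMG_ARGS are illustrative, while the two options themselves are the ones described in this section.

```shell
APPNAME=flood            # application identifier used in this tutorial
MAINSM=8d6383e4:800d     # replace with the string printed by your Main SM

# Neither value contains whitespace, so a plain string is sufficient here.
OCMG_ARGS="--ocmg-appname $APPNAME --ocmg-mainsm $MAINSM"

# Print the resulting mpirun command line instead of executing it.
echo "mpirun -machinefile machines -np 2 davef horne1.prj $OCMG_ARGS"
```

For a real run, remove the echo and the surrounding quotes so that the shell splits $OCMG_ARGS into four separate arguments.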

           

4      Starting the G-PM tool

Before you can start the G-PM tool, you must

·        have launched the OCM-G Main Service Manager as described in Sect. 2,

·        have a valid Grid proxy certificate on the machine where you want to start G-PM. If necessary, create a proxy with grid-proxy-init.
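
The proxy check can also be scripted. The sketch below assumes that the Globus command grid-proxy-info is installed and supports the -exists test, which returns success only if a valid proxy is present. The wrapper name ensure_proxy is illustrative.

```shell
# Hypothetical wrapper: create a proxy only if no valid one exists.
ensure_proxy() {
    if grid-proxy-info -exists >/dev/null 2>&1; then
        echo "valid proxy found"
    else
        grid-proxy-init
    fi
}

# Usage: call ensure_proxy before starting cg-ocmg-monitor, the application, or G-PM.
```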

Then, you should start G-PM with the following command:

gpm flood --num-procs 2 --terminate --ocmg-mainsm 8d6383e4:800d

where

·        flood is the application identifier specified during the submission of DaveF.

·        --num-procs 2 instructs G-PM to wait until 2 processes have started. This allows G-PM to be started even before the application job is actually running.

·        --terminate tells G-PM to shut down the monitoring system when it exits (the application, however, continues; it just can no longer be monitored).

Caution: Linux also has a system program called gpm. Be sure that you either invoke gpm with its full path name, or put the CrossGrid directory at the very beginning of your PATH.
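
A quick way to see which gpm the shell would run is to ask it with command -v. The following sketch wraps that check in a small function. The function name gpm_resolves_in and the directory /opt/cg/bin are assumptions for illustration; substitute the directory in which G-PM is actually installed on your system.

```shell
# Hypothetical helper: succeed only if the first "gpm" found on PATH
# lives directly in the directory given as $1.
gpm_resolves_in() {
    resolved=$(command -v gpm) || return 1
    [ "$resolved" = "$1/gpm" ]
}

CG_BIN=/opt/cg/bin   # assumption: adjust to your G-PM installation directory

if gpm_resolves_in "$CG_BIN"; then
    echo "gpm resolves to G-PM"
else
    echo "gpm does not resolve to $CG_BIN/gpm; use the full path"
fi
```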

 

After G-PM has successfully connected to all application processes, it prints a message like:

Attached to application.

Init complete. Got 4 processes.

After a while (~ 30 seconds), a Main Window should appear. The window is depicted in Fig. 1.

 

Fig. 1 G-PM main window

5      Creating a performance measurement with built-in metrics

To create a performance measurement, the user should choose Measurements->New from the main menu. The Measurement Definition Window should appear. It is depicted in Fig. 2. After the Measurement Definition Window is displayed, the user should specify which performance value should be measured, where it should be measured, and how consecutive results should be integrated. This is done in the following steps:

 

  1. The metric is selected from the leftmost panel. In this exercise, the MPI_Send_volume metric is selected.
  2. The objects on which the measurement should be done are specified on the next panel. In this example, the Selected Object option from the pull-down list at the top of the panel is selected. This enables the monitoring of particular processes, as opposed to the Whole Application option, which is used to monitor the selected metric for the whole application.
  3. The sites, hosts and processes are chosen. In this example, LocalSite is chosen, limiting our options to two hosts. The zeus24.cyf-kr.edu.pl host is then chosen and the process that is deployed on this host, process 0, is selected. The process identifier 0 here is the MPI rank of the process.
  4. The integration mode should be specified in the bottom-left panel of the window. In this example, a mean value of the send volume for each update interval is measured; thus the Time Derivative option is checked.
  5. After the parameters are specified, the OK button creates the performance measurement.

 

In this exercise, a simultaneous measurement of two performance properties is presented. Thus, after the first performance measurement is created, the user should create a second one. It is recommended to create a second performance measurement that is identical to the first one, except that the process on the other host (i.e. process 1) is measured. This is depicted in Fig. 3.

 

Fig. 2 Measurement definition window - options for the first performance measurement are specified

 

Fig. 3 Measurement definition window - options for the second performance measurement are specified

6      Creating the visualization window

After the two performance measurements are specified, the user should create a new visualization window. This is done in the following steps.

  1. Select the defined measurements in the upper panel of the G-PM main window and select Displays->New display from the main menu.
  2. The Visualization Definition Window should appear. It is depicted in  Fig. 4.
  3. Specify a desired visualization type in the upper-left panel of the window. Currently, two visualization types are usable: MultiCurve and BarGraph. In this exercise the MultiCurve visualization is used – the user should select it.
  4. Additional parameters of the visualization can be specified if necessary:
    1. Time Mode of the display – currently only the Real Time is implemented.
    2. Partition of the axis scale: it can be either Linear or Logarithmic. In the latter case the values are displayed against a log10-type scale.
    3. Behavior of the axis when displayed values exceed current scale boundaries. Two modes are possible: Variable – the axis is rescaled to the new values and Fixed – the axis is not rescaled and the displayed curve/bar spans the whole display range.
    4. Lower/Upper boundary of the scale. These are the initial scale boundaries. However, if Fixed is checked, they are preserved during the whole measurement process.
    5. Update interval – i.e. the interval between consecutive updates to the display window.

 

Fig. 4 Visualization Definition Window

 

In this exercise, default values for all of these parameters are used. The only exception is the type of the visualization window – see step 3.

  5. After the parameters of the visualization are specified, click the OK button. A new visualization window should appear.

7      Starting the application

By default, an application started with OCM-G monitoring waits at the beginning until it is started by the G-PM tool. This makes it possible to monitor the performance behavior from the very beginning of the execution. To start the application, select ???? from the menu.

8      Observing the measured values

The MultiCurve visualization window is depicted in Fig. 5. The measurement is initially enabled, so the performance values should be measured and curves should be redisplayed at regular intervals.

 

Fig. 5 MultiCurve performance visualization window

 

In the current version of the multi-curve window, only the following widgets are functional:

  1. ZoomIn/ZoomOut – increases/decreases the resolution of the time axis.
  2. Focus – if selected, it disables the automatic refreshing of the curve area, i.e. the performance values are measured, but the curves are not updated.
  3. Quit – disables performance monitoring and closes the window.
  4. Scrollbar at the bottom of the curve area – scrolls the curves back to previously measured performance values.
  5. List of performance measurements that are used in the visualization window. It is displayed below the curve area. The highlighted performance measurement has its curve displayed in red.

9      Creating a performance measurement with user-defined metrics

Among other things, user-defined metrics allow G-PM to display scalar data produced by the application in near real time. As an example, the DaveF application simulates the evolution of water stage in a river over time. The application’s main loop, which is the simulation’s time loop, has been instrumented with a probe at the beginning of each iteration. This probe receives the current simulation time as a parameter. Through a user-defined metric, G-PM can display the solver’s current simulation time, e.g. in the form of a bar graph, resulting in an on-line progress bar.

The specification of this metric is:

Simulation_Time(Process p, VirtualTime vt)
{
    PROBE loop_start(Process, VirtualTime, double);
    double simtime;
    return simtime AT loop_start(p, vt, simtime);
}

This metric applies to a single process only. It is based on virtual time, i.e. there is a result for each occurrence of a probe event. The first line of the body declares the probe, whose last parameter is a floating-point value containing the simulation time. The last line of the body states that the result of the metric is the value of the probe’s last parameter at the point in time when the probe is executed. Altogether, the metric returns the value of the simulation time for each execution of the probe, i.e. for each iteration of DaveF’s time loop.

There are two methods to define this metric in G-PM:

 

Fig. 6 Metrics specification window

After this step, select Measurements->New from the menu to open the measurement definition window. You will find a new metric, Simulation_Time, in the metrics list. In a similar way as outlined in Sect. 5, you should now define a measurement of this metric for process 0 of the DaveF application (see Fig. 7). Since DaveF’s time loop is synchronized between processes, the result of measuring the metric for process 1 would be the same.

 

Fig. 7 Definition of a measurement of the user-defined metric Simulation_Time

 

Now, create a bar graph visualization window, following the steps explained in Sect. 6. You should set the upper boundary of the scale to the value 10. The resulting bar graph (see Fig. 8) indicates the progress of the application in terms of its current simulation time. In the example, DaveF has just simulated 1.45 hours of a flood.

 

Fig. 8 Bar graph showing DaveF’s current simulation time

 

10     Finalizing the performance measurement session

To close the performance measurement session, the user should close all open visualization windows using the Quit buttons and exit the tool with File->Exit.