SYNOPSIS

  runacbea -config config.acbea ...


DESCRIPTION

Aspects of runacbea's operation related to the benchmark parameter space and the tools used to explore it are specified by an XML-format configuration file, conventionally named config.acbea. However, the file may have any name, which may include any (or no) filename extension.

The file contains a single acbea_config element with the following content:

acbea

An optional empty element specifying the version of the DTD to which the file conforms.

description

A required empty element having attributes that describe the benchmark run specified by the file.

prime

A required empty element having attributes that describe the benchmark execution environment. (The somewhat irrelevant name of this element is inherited from the ACOVEA package upon which ACBEA is based.)

parameters

A required element specifying the names and allowed values for the benchmark parameters that may be varied by ACOVEA's evolutionary algorithm.

The required elements are described in detail in the subsections that follow. When specifying values for attributes, be aware that characters that are special to XML must be represented by entities: &lt;, &gt;, and &amp; stand for <, >, and & respectively. This matters particularly for attributes whose values are shell commands.
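As a brief illustration of the escaping rule, the fragment below sketches how a shell command containing redirection operators might be written as an attribute value. The redirection is an assumption added purely for illustration; only mpdallexit comes from this page, and the other prime attributes are omitted.

```xml
<!-- Intended shell command:  mpdallexit > /dev/null 2>&1  -->
<prime mpishutdown="mpdallexit &gt; /dev/null 2&gt;&amp;1" />
```

A double quote inside a double-quoted attribute value would likewise need to be written as &quot;.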

description

The description element has the following attributes:

value (required)

A short description of the ACBEA run parameterized by this file, for example ``Evaluation of Acme Institute PetaGiant cluster, 2010-06-05''. This description appears in reports.

version (default 1.0.0)

The version of the benchmark addressed by the file. This information is currently unused, other than in reports.

header (required)

A short description of the function of the file, for example ``HPLinpack discrete problems benchmark input for ACBEA''. This information appears in reports. Configuration files generated by runacbea for subsequent runs of the program will have ``(auto-generated)'' appended to this value.
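Put together, a description element using the example strings given above might look like the following sketch. The version attribute is shown with its default value and could be omitted.

```xml
<description
    value="Evaluation of Acme Institute PetaGiant cluster, 2010-06-05"
    version="1.0.0"
    header="HPLinpack discrete problems benchmark input for ACBEA" />
```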

prime

The prime element has the following attributes:

batchcommand (default sh)

The command to submit a shell script for execution on one or more nodes. Where there is no batch scheduler, this may simply be sh; otherwise it should be a script that schedules a job, then waits for it to complete. (The name of the scheduler command itself is unlikely to be what's needed here.) The util/run-oar.sh script in the ACBEA distribution performs this function for the OAR batch job scheduler, and may be used as a template for the development of scripts suitable for other schedulers. The command may be on the search path; otherwise a pathname relative to runacbea's current directory, or a full pathname, may be used.

hostselect (default ^node-)

A Perl regular expression matching the addresses of cluster nodes that may be used for execution. The default is likely to work on homogeneous clusters, but more complex expressions may be used to select nodes of a particular class in a heterogeneous cluster, or to avoid particular dead or degraded nodes. To give a complex example,

 ^chinqchint-([1-689][^0-9]|1[0-35-9]|2[01267]|3[0-35-8])

selects 30 hosts, namely those having fully-qualified domain names starting with the following strings: chinqchint-1. to chinqchint-6., chinqchint-8., chinqchint-9., chinqchint-10 to chinqchint-13, chinqchint-15 to chinqchint-19, chinqchint-20 to chinqchint-22, chinqchint-26, chinqchint-27, chinqchint-30 to chinqchint-33, and chinqchint-35 to chinqchint-38.

mpisetup (default empty)

This is the command line (if any) needed to set up the MPI environment prior to using mpirun; for example,

 mpdboot -n $(uniq $OAR_NODEFILE | wc -l) --file=$OAR_NODEFILE --rsh=oarsh

is needed to start daemons if MPICH2 is being used in a cluster managed by OAR.

mpirun (default mpirun)

The command to run an MPI-aware application.

mpirunflags (default empty)

This parameter specifies the options that are needed to make mpirun work, and work well. These may call out a communications fabric and the parameters that it requires. For example,

 -hostfile $OAR_NODEFILE -mca pls_rsh_agent oarsh -mca btl ^openib

gives good results with OpenMPI v1.2 on a small gigabit-Ethernet-connected cluster managed by OAR.

mpishutdown (default empty)

This parameter gives the command line (if any) needed to shut down the MPI environment after using mpirun; for example, mpdallexit terminates MPICH2 daemons.

processes (default 1)

This parameter gives the number of MPI processes to use; this should be equal to the number of compute cores available to the benchmark, that is, the number of cores per node multiplied by the number of nodes. (Note that ACBEA cannot currently run benchmarks that use partial nodes.) Configuration files generated by runacbea for subsequent runs of the program multiply this value by the scaling factor (see runacbea).

nodes (default 1)

This parameter specifies the number of cluster nodes across which to distribute processes. Configuration files generated by runacbea for subsequent runs of the program multiply this value by the scaling factor (see runacbea).

benchcommand (default dhpl)

Benchcommand is the command that runs the benchmark. It should be dhpl, or some other command that understands dhpl control files. Because processes run on compute nodes by a job manager typically have a default search path, rather than the search path that is seen on the head node by runacbea, this must be a pathname relative to the working directory for runacbea, or a full pathname.

benchflags (default -f)

If benchcommand requires command-line flags, they are given here. The name of the parameter file generated by runacbea is appended to these flags when benchcommand is run.

resultfilter (default awk '/^D[RC].*[0-9]$/{print $NF}')

This attribute gives a shell command that isolates GFlops performance figures from the output of the benchmark command. The default value works for dhpl.

resultverify (default awk '/ (failed|skipped) /{c+=$1}END{exit c}')

This shell command returns non-zero status on finding error indicators in the output of the benchmark command. The default value works for dhpl. On the assumption that calculation problems will make themselves apparent early in a runacbea run, verification takes place only in the first generation of evolution unless debugging information has been requested. Output from this command (if any) appears in runacbea's debugging output.
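The following sketch shows how the attributes above might be combined for a hypothetical cluster of nine four-core nodes managed by OAR and running OpenMPI v1.2. The node and process counts, and the assumptions that runacbea is started from the top of the ACBEA distribution and that dhpl sits in the working directory, are illustrative only. Attributes not shown (mpisetup, mpishutdown, resultfilter and resultverify) take their default values.

```xml
<!-- Hypothetical cluster: 9 nodes of 4 cores each, OAR, OpenMPI v1.2 -->
<prime
    batchcommand="util/run-oar.sh"
    hostselect="^node-"
    mpirun="mpirun"
    mpirunflags="-hostfile $OAR_NODEFILE -mca pls_rsh_agent oarsh -mca btl ^openib"
    processes="36"
    nodes="9"
    benchcommand="./dhpl"
    benchflags="-f" />
```

Note that processes is the product of nodes and the number of cores per node, as required above.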

parameters

The parameters element contains one or more parameter elements. In practice exactly the parameters named below must be used, and in the order given.

A parameter element requires a name attribute to give the parameter a name, and may have a description attribute briefly describing its function. A parameter must be of one of four kinds, specified by its type attribute:

global

The parameter applies unchanged to all benchmarks run by a single invocation of dhpl. Its value, which may be integer, floating point or string, is specified by the value attribute.

dummy

The parameter value for a particular benchmark will be supplied by runacbea, based on the values of other parameters, rather than by the configuration file.

enum

The integer parameter value for a particular benchmark may assume any one of a number of values specified as a vertical bar-separated list by the value attribute, for example value="1|2|4|8". As a degenerate case, a parameter with a fixed value for all benchmarks may be specified with a list containing a single value, as in value="1".

tuning

The integer parameter value for a particular benchmark may assume any one of a number of values between inclusive lower and upper bounds specified by the min and max attributes respectively, and separated by the difference specified by the step attribute. For example, min="2" max="8" step="2" would allow a parameter to assume the values 2, 4, 6 and 8.
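The four kinds can be sketched as follows, using parameter names and values that are described later in this page. The description text is invented for illustration, and the elements are shown out of their required order.

```xml
<!-- global: one fixed value for every benchmark -->
<parameter type="global" name="THRSH" value="16.0"
           description="Error margin for sanity checks" />
<!-- dummy: value computed by runacbea -->
<parameter type="dummy" name="Q" />
<!-- enum: any one value from the list -->
<parameter type="enum" name="PMAP" value="0|1" />
<!-- tuning: 0, 32, 64, 96 or 128 -->
<parameter type="tuning" name="TSWAP" min="0" max="128" step="32" />
```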

The parameters required by dhpl are as follows:

OUTF (global)

The name of the benchmark result output file. The value is ignored: runacbea generates file names for each invocation of benchcommand.

OUTD (global)

This parameter tells dhpl where to direct its output: 6=stdout, 7=stderr, other value=file. Runacbea requires that its value be set to 1, indicating output to a file.

THRSH (global)

Dhpl runs sanity checks on benchmark results if this floating-point parameter, which specifies an error margin, has a value that is greater than zero. 16.0 is a reasonable value. Note that, by default, runacbea suppresses sanity checks in the second and subsequent generations in order to reduce run-time.

NP (global)

This parameter specifies the number of problems specified by a dhpl control file. The value is ignored: runacbea generates a value for each invocation of benchcommand.

N (enum with single possible value)

This parameter specifies the HPL problem size; that is, the number of rows and columns in the square matrix of simultaneous equations solved by the benchmark. A suitable starting value may be determined by running ten-sec-n.pl. (See also runacbea/Benchmarking a cluster.)

NB (tuning)

This parameter specifies the dimension of the small sub-matrices into which the problem is ultimately decomposed. It has been found to be the most critical parameter in obtaining optimum benchmark results. The range specified in the sample configuration file,

 min="32" max="128" step="8"

is a good starting point, although it may be profitable to increase max to 256 in some cases. Step should be kept relatively small to avoid the danger of missing a sharp peak in performance. Odd values seem never to give good results, and should be avoided.
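Because min, max and step are the attributes of a tuning parameter, the range quoted above would appear in the configuration file roughly as sketched below. Raising max to 256 extends the search at the cost of more candidate values.

```xml
<!-- Candidate block sizes: 32, 40, 48, ... 128 (all even) -->
<parameter type="tuning" name="NB" min="32" max="128" step="8" />
```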

PMAP (enum)

This parameter specifies whether the problem matrix is mapped into memory in row- or column-major form (values 0 and 1 respectively). The sample configuration's value="0|1" should be used.

P (enum)

P gives the number of rows in the matrix of compute cores across which the problem is partitioned for solution. Allowed values should be factors of the number of compute cores specified by the processes attribute of prime (see above). For example, for 36 cores, value="1|2|3|4|6|9|12|18|36" gives an exhaustive list of allowable values for P. In practice, matrices that are considerably ``over-square'' are likely to give poor results, so value="1|2|4|6|9" may be a better list in this case. (Note that specifying an allowed value that is not a factor of the number of cores may not result in an error, just poor results for those benchmarks that use it.)

Q (dummy)

Q gives the number of columns in the matrix of compute cores across which the problem is partitioned. Runacbea calculates its value by dividing the number of cores by the current value of P.
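For the 36-core case discussed under P, the pair of elements might be sketched as follows. The list shown is the full set of factors of 36; a shorter list may well be preferable in practice.

```xml
<!-- 36 processes: every factor of 36 is a legal row count -->
<parameter type="enum" name="P" value="1|2|3|4|6|9|12|18|36" />
<!-- Q is derived: P=4 gives Q=9, P=6 gives Q=6, and so on -->
<parameter type="dummy" name="Q" />
```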

PFACT (enum)

PFACT specifies the panel factorization algorithm used. There are three possibilities: left-looking (0), Crout's method (1), and right-looking (2). The sample configuration's value="0|1|2" should be used.

NBMIN (enum)

The panel factorization algorithm is recursive, dividing the panel into ever-smaller sub-panels. NBMIN specifies the stopping criterion: recursion terminates when the number of columns in any sub-panel is less than or equal to this number, which must be greater than one. As values greater than four are seldom worth investigating, the sample configuration's value="2|3|4" is probably a good choice.

NDIV (enum)

The factorization algorithm divides the panel into NDIV sub-panels at each step. Small positive powers of two, as specified by the sample configuration's value="2|4|8", should be investigated.

RFACT (enum)

Analogously to PFACT, RFACT specifies left-looking (0), Crout's method (1), or right-looking (2) recursive factorization. Again, the sample configuration's value="0|1|2" should be used.

BCAST (enum)

This parameter specifies the inter-node broadcast algorithm, which may be:

  0. Increasing ring

  1. Increasing ring (modified)

  2. Increasing two-ring

  3. Increasing two-ring (modified)

  4. Long (bandwidth reducing)

  5. Long (bandwidth reducing modified)

value="0|1|2|3|4|5", as specified in the sample configuration, will explore all possibilities, although, in practice on small- to medium-sized fully-connected clusters with high-bandwidth interconnect, increasing ring or increasing ring (modified) are likely to give the best results.
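To restrict the search to the two ring algorithms suggested above, a parameter element along the following lines could be used. This sketch assumes that the algorithms are numbered from 0 in the order listed, which is consistent with the sample configuration's range of 0 to 5.

```xml
<!-- 0 = increasing ring, 1 = increasing ring (modified) -->
<parameter type="enum" name="BCAST" value="0|1" />
```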

DEPTH (enum)

The DEPTH parameter controls the depth to which the factorization algorithm looks ahead through multiple panels. When DEPTH is zero, there is no look-ahead; when one, there is one panel's worth of look-ahead, and so on. HPL's tuning notes state that 0 or 1 are likely to give the best results, and that look-ahead of depths 3 and larger will probably not give you better results, making the sample configuration's value="0|1|2|3" perhaps a little conservative.

FSWAP (enum)

The FSWAP parameter controls the algorithm used to update the trailing sub-matrix. There are two primary choices: binary exchange (0); and long (1), also known as spread-roll, which is a bandwidth-reducing variant of binary exchange. A third possibility, mixed (2), switches from binary exchange to long if the number of columns being processed exceeds the TSWAP value -- see below. The sample configuration specifies value="0|1|2", exploring all three possibilities.

TSWAP (tuning)

The TSWAP parameter is used only when FSWAP is 2, and specifies the column count threshold at which the switch between algorithms takes place. The sample configuration's min="0" max="128" step="32" has been found to give reasonable results. Note that a value of zero results in the long algorithm being used exclusively, while a large value that is never reached results in exclusive use of binary exchange. Thus an alternative starting point would be always to make the value of FSWAP 2 (mixed), and allow extreme values of TSWAP to explore the no-switch cases.
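The alternative starting point described above might be sketched as follows. Whether the upper bound of 128 is large enough never to be reached, and so to stand in for pure binary exchange, depends on the problem and is an assumption here.

```xml
<!-- Always use the mixed algorithm ... -->
<parameter type="enum" name="FSWAP" value="2" />
<!-- ... and let the threshold range from 0 (long only) upwards -->
<parameter type="tuning" name="TSWAP" min="0" max="128" step="32" />
```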

L1 (enum)

The L1 parameter determines how the upper triangle of the matrix being solved is stored in memory: 0 selects a transposed form; 1, non-transposed. The example parameters allow either value.

U (enum)

Like L1, U selects a transposed (0) or non-transposed (1) storage layout, this time for panel rows. Again, the example parameters allow either value.

E (enum)

If E is 1, an equilibration phase is added to the long swapping algorithm -- see TSWAP and FSWAP above; if it is zero, equilibration is skipped. Equilibration takes time, but is likely to result in better distribution of work across compute nodes. The example parameters allow either value.

A (enum)

Memory is allocated by HPL at addresses that are a multiple of A. The fixed value of 8, the size in bytes of a double-precision floating point number, specified by the sample configuration, is probably adequate, although value="8|16" might also be investigated.
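Drawing the sections above together, a complete file might look like the following sketch. It assumes a hypothetical single four-core node with no batch scheduler, so that most prime attributes keep their defaults. The values given for OUTF and NP are placeholders, since runacbea ignores them, and the value of N is likewise a placeholder that should be replaced by one suited to the machine. Whether runacbea expects a document type declaration at the head of the file is not stated on this page; none is shown.

```xml
<?xml version="1.0"?>
<acbea_config>
  <acbea version="1.0" />
  <description
      value="Evaluation of Acme Institute PetaGiant cluster, 2010-06-05"
      header="HPLinpack discrete problems benchmark input for ACBEA" />
  <prime processes="4" nodes="1" benchcommand="./dhpl" />
  <parameters>
    <parameter type="global" name="OUTF"  value="HPL.out" />
    <parameter type="global" name="OUTD"  value="1" />
    <parameter type="global" name="THRSH" value="16.0" />
    <parameter type="global" name="NP"    value="1" />
    <parameter type="enum"   name="N"     value="10000" />
    <parameter type="tuning" name="NB"    min="32" max="128" step="8" />
    <parameter type="enum"   name="PMAP"  value="0|1" />
    <parameter type="enum"   name="P"     value="1|2|4" />
    <parameter type="dummy"  name="Q" />
    <parameter type="enum"   name="PFACT" value="0|1|2" />
    <parameter type="enum"   name="NBMIN" value="2|3|4" />
    <parameter type="enum"   name="NDIV"  value="2|4|8" />
    <parameter type="enum"   name="RFACT" value="0|1|2" />
    <parameter type="enum"   name="BCAST" value="0|1|2|3|4|5" />
    <parameter type="enum"   name="DEPTH" value="0|1|2|3" />
    <parameter type="enum"   name="FSWAP" value="0|1|2" />
    <parameter type="tuning" name="TSWAP" min="0" max="128" step="32" />
    <parameter type="enum"   name="L1"    value="0|1" />
    <parameter type="enum"   name="U"     value="0|1" />
    <parameter type="enum"   name="E"     value="0|1" />
    <parameter type="enum"   name="A"     value="8" />
  </parameters>
</acbea_config>
```

The parameter elements appear in the order in which they are described above, as runacbea currently requires.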


DTD

The Document Type Definition (DTD) for config.acbea is as follows:

    <!DOCTYPE acbea_config [
    <!ELEMENT acbea_config 
              (acbea?, description, prime, parameters) >
    <!ELEMENT acbea EMPTY>
    <!ELEMENT description EMPTY>
    <!ELEMENT prime EMPTY>
    <!ELEMENT parameters
              (parameter, parameter*) >
    <!ELEMENT parameter EMPTY>
    <!ATTLIST acbea
              version      CDATA "1.0" >
    <!ATTLIST description
              value        CDATA #REQUIRED
              version      CDATA "1.0.0"
              header       CDATA #REQUIRED >
    <!ATTLIST prime
              batchcommand CDATA   "sh"
              hostselect   CDATA   "^node-"
              mpisetup     CDATA   ""
              mpirun       CDATA   "mpirun"
              mpishutdown  CDATA   ""
              mpirunflags  CDATA   ""
              processes    CDATA   "1"
              nodes        CDATA   "1"
              benchcommand CDATA   "dhpl"
              benchflags   CDATA   "-f"
              resultfilter CDATA   "awk '/^D[RC].*[0-9]$/{print $NF}'"
              resultverify CDATA   
                  "awk '/ (failed|skipped) /{c+=$1}END{exit c}'" >
    <!ATTLIST parameter
              type         (global|dummy|enum|tuning) #REQUIRED
              name         CDATA                      #REQUIRED
              value        CDATA                      #IMPLIED
              description  CDATA                      #IMPLIED
              min          CDATA                      #IMPLIED
              max          CDATA                      #IMPLIED
              step         CDATA                      #IMPLIED >
    ]>


BUGS

It should not be necessary to list the parameters in the correct order (which reflects the required order of the parameters in xhpl's control file): runacbea should be capable of sorting things out.

The description of the hostselect parameter assumes that the batch job management system has some means of selecting compute nodes based on a Perl regular expression. While this is true of OAR because the underlying MySQL database manager can handle such regular expressions, it is probably not true of all job managers. If you use ACBEA with another job manager, you may need to change the interpretation of hostselect.


SEE ALSO

runacbea(1), ten-sec-n.pl(1), perlre(1), http://www.netlib.org/benchmark/hpl/algorithm.html, http://oar.imag.fr.


AUTHOR

Dominic Dunlop, mailto:dominic.dunlop@uni.lu


COPYRIGHT AND LICENCE

Copyright (C) 2009 by Dominic Dunlop

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; for details see http://www.gnu.org/copyleft/fdl.html.