mpirun -np n dhpl [-f] [config.dat]
Dhpl runs the High-Performance LINPACK (HPL) benchmarks specified by its configuration file. It is typically run under an MPI (Message Passing Interface) launcher such as mpirun, which allows the benchmark to use multiple processors.
Supply a configuration file, which conventionally has the extension .dat, although any extension (or none) may be used. If no file name is given, DHPL.dat is used.
The configuration file has the following format. Header and HPL parameter information must be supplied in full, and in the exact order shown. Blank lines and comment lines having "#" in the first column are ignored, as is any additional text following the first white space on a parameter line.
    HPLinpack benchmark input file (Discrete problems version)
    Site name here -- Test run name here

    # The four following parameters are defined only once
    HPL.out  Global output file name (if any)
    6        Global device out (6=stdout,7=stderr,file)
    16.0     Global threshold for error detection
    2        Number of problems defined in this file

    # The 17 parameters that follow are redefined for each problem
    # Problem 1
    3300     N: Problem size
    32       NB: Block size
    0        PMAP: Process mapping (0=Row-,1=Column-major)
    1        P: Process matrix rows
    1        Q: Process matrix columns
    1        PFACT: Panel factorization algorithm (0=left, 1=Crout, 2=Right)
    2        NBMIN: Recursive stopping criterion (>= 1)
    8        NDIV: Number of panels in recursion
    0        RFACT: Recursive panel factorization algorithm (0=left, 1=Crout, 2=Right)
    5        BCAST: Broadcast algorithm (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
    1        DEPTH: Lookahead depth (>=0)
    1        FSWAP: Swapping algorithm (0=bin-exch,1=long,2=mix)
    96       TSWAP: Swapping threshold for mix algorithm
    1        L1: Upper triangle storage form (0=transposed,1=no-transposed)
    1        U: Panel row storage form (0=transposed,1=no-transposed)
    1        E: Swap broadcast equilibration (0=no,1=yes)
    8        A: Memory alignment in double (> 0)

    # Problem 2 (Values for same variable parameters in same order as above)
    3300
    120
    1
    1
    1
    0
    3
    8
    2
    2
    2
    2
    32
    0
    0
    1
    8
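The format rules above can be sketched as a small reader. This is an illustration only, not dhpl's own input code: the function name parse_dhpl_config and the returned dictionary layout are hypothetical, and the sketch assumes the two header lines are literally the first two lines of the file.

```python
# Names of the 17 per-problem parameters, in the order the file supplies them.
PARAM_NAMES = ["N", "NB", "PMAP", "P", "Q", "PFACT", "NBMIN", "NDIV", "RFACT",
               "BCAST", "DEPTH", "FSWAP", "TSWAP", "L1", "U", "E", "A"]


def parse_dhpl_config(text):
    """Parse the text of a dhpl configuration file (hypothetical helper)."""
    lines = text.splitlines()
    if len(lines) < 2:
        raise ValueError("missing header lines")
    # Assumption: lines one and two are free-text header and run identifier.
    header, run_id = lines[0], lines[1]

    # Ignore blank lines and lines with '#' in the first column; on every
    # other line keep only the text before the first white space.
    values = [ln.split()[0] for ln in lines[2:]
              if ln.strip() and not ln.startswith("#")]
    if len(values) < 4:
        raise ValueError("missing global parameters")

    out_file, device, threshold, count = values[:4]
    count = int(count)
    rest = values[4:]
    width = len(PARAM_NAMES)
    if len(rest) != count * width:
        raise ValueError(f"expected {count * width} problem values, "
                         f"found {len(rest)}")

    problems = [dict(zip(PARAM_NAMES, (int(v) for v in rest[i:i + width])))
                for i in range(0, len(rest), width)]
    return {"header": header, "run_id": run_id, "out_file": out_file,
            "device": device, "threshold": float(threshold),
            "problems": problems}
```

Because values are matched to parameters purely by position, the sketch can only detect a missing value as a wrong total count, which mirrors the behaviour described for dhpl itself.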
The format of the output file is as follows:
A fixed header block
    ============================================================================
    HPLinpack 1.0a -- High-Performance Linpack benchmark -- January 20, 2004
    Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK
    Discrete test driver: Dominic Dunlop, University of Luxembourg, May 30, 2008
    ============================================================================
Information from line two of the configuration file
    Run identifier: Site name here -- Test run name here
Documentation for the result lines
    An explanation of the input/output parameters follows:
    T/V    : Wall time / encoded variant (see below).
    ST     : Swapping threshold.
    N      : The order of the coefficient matrix A.
    NB     : The partitioning blocking factor.
    P      : The number of process rows.
    Q      : The number of process columns.
    Time   : Time in seconds to solve the linear system.
    Gflops : Rate of execution for solving the linear system.
Encoding of first field (c: letter, d: digit):
    Dcdccdcdccccdd
     |||||||||||+----- Memory alignment (double-precision words)
     ||||||||||+------ Swap-broadcast equilibration? (Y)es/(N)o
     |||||||||+------- Panel row storage (T)ransposed/(N)ot transposed
     ||||||||+-------- Upper triangle storage (T)ransposed/(N)ot transposed
     |||||||+--------- Swapping: (B)inary exchange/(L)ong/(M)ixed
     ||||||+---------- Recursive stopping criterion (>=1)
     |||||+----------- Panel factorization: (L)eft/(C)rout/(R)ight
     ||||+------------ Number of panels in recursion
     |||+------------- Recursive panel factorization: (L)eft/(C)rout/(R)ight
     ||+-------------- Communication topology (1-5)
     |+--------------- Lookahead depth (>0)
     +---------------- Process mapping (R)ow/(C)olumn-major
    ----------------------------------------------------------------------------

    - The matrix A is randomly generated for each test.
    - The following scaled residual checks will be computed:
       1) ||Ax-b||_oo / ( eps * ||A||_1  * N        )
       2) ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  )
       3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
Precision of host's floating-point calculation
    - The relative machine precision (eps) is taken to be 1.110223e-16
Global threshold for error detection from input file
    - Computational tests pass if scaled residuals are less than 16.0
Header followed by information for first problem
    ============================================================================
    T/V             ST      N    NB     P     Q       Time      Gflops
    ----------------------------------------------------------------------------
    DR15L8C2LNNY08  96   3300    32     1     1      15.15   1.582e+00
    ----------------------------------------------------------------------------
Results of residuals check for first problem
    ||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0216893 ...... PASSED
    ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0258833 ...... PASSED
    ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0054144 ...... PASSED
Information for second problem
    ============================================================================
    T/V             ST      N    NB     P     Q       Time      Gflops
    ----------------------------------------------------------------------------
    DC22R8L3MTTY08  32   3300   120     1     1      11.11   2.159e+00
    ----------------------------------------------------------------------------
    ||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0238992 ...... PASSED
    ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0285205 ...... PASSED
    ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0059661 ...... PASSED
    ============================================================================
Summary of results
    Finished 2 tests with the following results:
        2 tests completed and passed residual checks,
        0 tests completed and failed residual checks,
        0 tests skipped because of illegal input values.
    ----------------------------------------------------------------------------

    End of Tests.
    ============================================================================
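As an aid to reading result lines, the following sketch decodes the T/V field and recomputes the Gflops figure. The helper names decode_variant and hpl_gflops are hypothetical. The character positions are inferred from the legend and the two sample result lines, and each of NDIV, NBMIN, DEPTH and BCAST is assumed to occupy a single digit. The rate calculation assumes that dhpl keeps the usual HPL operation count of 2/3 N^3 + 3/2 N^2, which this page does not state.

```python
FACT = {"L": "left", "C": "Crout", "R": "right"}
STORAGE = {"T": "transposed", "N": "not transposed"}


def decode_variant(tv):
    """Decode a 14-character T/V field such as 'DR15L8C2LNNY08'."""
    if len(tv) != 14:
        raise ValueError("expected a 14-character variant code")
    # tv[0] is the fixed leading letter and is not interpreted here.
    return {
        "PMAP": {"R": "row-major", "C": "column-major"}[tv[1]],
        "DEPTH": int(tv[2]),
        "BCAST": int(tv[3]),
        "RFACT": FACT[tv[4]],
        "NDIV": int(tv[5]),
        "PFACT": FACT[tv[6]],
        "NBMIN": int(tv[7]),
        "FSWAP": {"B": "binary exchange", "L": "long", "M": "mixed"}[tv[8]],
        "L1": STORAGE[tv[9]],
        "U": STORAGE[tv[10]],
        "E": tv[11] == "Y",
        "A": int(tv[12:14]),
    }


def hpl_gflops(n, seconds):
    """Rate in Gflops, assuming the 2/3 N^3 + 3/2 N^2 operation count."""
    flops = (2.0 / 3.0) * n ** 3 + 1.5 * n ** 2
    return flops / seconds / 1e9


first = decode_variant("DR15L8C2LNNY08")
print(first["PMAP"], first["BCAST"], first["PFACT"])
print(round(hpl_gflops(3300, 15.15), 3))
```

For the two sample problems the recomputed rates agree with the reported Gflops values to within the rounding of the printed times.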
Errors in input file format, or out-of-range parameter values, result in an immediate exit with non-zero status. (Every value in the input file is checked prior to the running of any benchmark.) Since values are assigned to parameters according to their sequence in the input file, a missing value does not cause an error until some subsequent assignment fails.
Parameter combinations that prevent a benchmark from running -- for example, if the problem size specified by N is too large to fit in virtual memory, or if P*Q is greater than the number of processes allocated to the problem by MPI -- result in dhpl aborting with non-zero status at the start of the evaluation of the affected problem.
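Both conditions can be checked before a long run is submitted. The sketch below is a rough pre-flight test, not part of dhpl: the function name check_problem and its arguments are hypothetical, and the memory estimate counts only the N by N matrix of 8-byte doubles, so it is a lower bound that ignores workspace and per-process overhead.

```python
def check_problem(n, p, q, processes, memory_bytes):
    """Return a list of reasons why a problem is unlikely to start."""
    reasons = []
    # The process grid must fit within the processes that MPI provides.
    if p * q > processes:
        reasons.append(f"P*Q = {p * q} exceeds the {processes} MPI processes")
    # Lower bound on storage: N x N coefficients of 8 bytes each.
    matrix_bytes = 8 * n * n
    if matrix_bytes > memory_bytes:
        reasons.append(f"matrix needs at least {matrix_bytes} bytes, "
                       f"only {memory_bytes} available")
    return reasons


# Problem 1 from the sample file, on one process with 1 GiB available.
print(check_problem(3300, 1, 1, processes=1, memory_bytes=2 ** 30))
```

An empty list means only that these two checks passed; dhpl may still abort for reasons the sketch does not model.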
If the parameter defining global threshold for error detection is greater than zero, dhpl checks the correctness of its results following the solution of each problem. Any failure is reported in dhpl's output file, but does not result in an error exit.
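Because a failed residual check does not change the exit status, a script that needs to detect failures has to read the output. A minimal sketch follows, assuming result lines have the layout shown in the sample output; the function name residual_failures is hypothetical. It treats any status other than PASSED as a failure, so it does not depend on the exact word that dhpl prints for a failed check.

```python
def residual_failures(output_text):
    """Return the residual-check result lines whose status is not PASSED."""
    failures = []
    for line in output_text.splitlines():
        stripped = line.strip()
        # Result lines start with the residual expression and contain '=';
        # the numbered lines that merely document the checks do not.
        if stripped.startswith("||Ax-b||") and "=" in stripped:
            if stripped.split()[-1] != "PASSED":
                failures.append(stripped)
    return failures
```

A caller would typically read the global output file named in the configuration, pass its contents to this function, and treat a non-empty result as a failed run.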
Dhpl is based closely upon xhpl, the benchmark application from version 1.0a of the HPL suite. Only the input processing and result reporting code has been changed; the code that executes the benchmark is unaltered.
mpirun(1), http://www.netlib.org/benchmark/hpl/
The web site for hpc-ga-bench, the project of which dhpl is a part, may be found at https://gforge.uni.lu/projects/hpc-ga-bench. The site provides for downloading, event tracking, and notifications.
Dominic Dunlop, mailto:dominic.dunlop@uni.lu
Copyright (C) 2009 by Dominic Dunlop
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; for details see http://www.gnu.org/copyleft/fdl.html.