NOTE: This document has NOT been kept up-to-date as new features have
been added.  It should still be mostly correct, but is incomplete.

The UPC test harness scripts
============================

Files:
  * harness.pl = the main test harness PERL script
  * runjobs.pl = the script that runs the test applications
  * sysconf    = template for a harness system configuration
  * alvarez    = sample system configuration file for alvarez.nersc.gov
  * seaborg    = sample system configuration file for seaborg.nersc.gov
  * flyer      = sample system configuration file for flyer.cse.mtu.edu

The intent of the harness is to automate the compilation and execution
of the test codes in the UPC test suites.  Of the various test suites in
the bupc-testsuite directory, currently only "gwu" and "mupc" have been
configured to work with this harness.

The harness requires a certain amount of system configuration in order
to work.  A template (sysconf) and sample configuration files (alvarez,
seaborg and flyer) are provided.  In addition, various command line
options can be specified to over-ride configuration file values.

In particular, the harness gets the following info from the system
configuration file:

  harness_dir:       where does the test harness live?
  testsuite_dir:     where is the bupc-testsuite?
  testsuites:        which suites (subdirs) of bupc-testsuite to use?
  logroot:           where to place the logs, reports, batch scripts, etc.
  network:           which network (e.g. GASNet conduit) to use?
  batch_sys:         which batch system to use (pbs, loadleveler or
                     interactive)
  queues:            which batch queues to use, what are their properties
                     and in what order of preference?
  repository:        which repository to charge when running in the batch
                     system?  (may be ignored on systems that don't charge
                     for time).
  nthread_default:   Number of UPC threads to use when running a test.
                     Individual tests can over-ride this default.
  max_proc_per_node: The maximum number of processes per node that will
                     be used when running the test.

Note: read the header in the system configuration file to understand the
format.  Basically, you can construct structures similar to PERL data
structures.  Note also that strings of the form %NAME% will be replaced
by ENV{NAME} when the file is read.

Command line options:
  -nocompile       Do not compile the test suite.
  -norun           Generate the run scripts, but do not submit them to run.
  -sysconf=file    [REQUIRED] Specify the system configuration file.
  -threads=N       Specify default number of UPC threads.
  -pthreads=N      Specify number of pthreads per process.
  -ppn=N           Specify max number of processes per node.
  -network=s       Specify the network [e.g. GASNet conduit].
  -suite=name      Specify name of test suite(s) to run.
                   [NOTE: May be a comma separated list or this option
                   may be specified multiple times]
  -repo=name       Specify name of accounting repository.
  -filepat=string  Specify a filename pattern.  Only tests in the suite(s)
                   that match this pattern will be compiled/run.
  -runlimit=N      Default runtime limit (in seconds) for each test
  -features=str    Comma-separated list of features supported by
                   compilation environment
                   Set automatically when running within upcr
  -add_feature     Comma-separated list of features to be added without
                   removing any automatically set -features= list
  -del_feature     Comma-separated list of features to be removed
                   Mixed -add_feature and -del_feature are processed in order

Note that some of these values (threads, ppn, network, suite, repo,
upccdir) will over-ride the values in the system configuration file.

What does harness.pl do?
=======================

(1) reads the system configuration file and parses the command line
    options.

(2) constructs a date and timestamped subdirectory of logroot.
    The directory name will be of the form YYYYMMDD_HHMMSS.
    This directory will contain files named:
       log         = harness log file.  Look here if problems occur.
       compile.rpt = status of each test_app compilation
       run.rpt     = status of each test_app execution
       qscript_XXX = automatically generated batch queue script used
                     to run a set of test_apps.
       runlist_XXX = list of test_apps that have yet to be run by
                     qscript_XXX.

(3) for each specified test suite, the harness will:
    (a) read the harness configuration file (harness.conf).
        This file specifies what compilation/run tests to perform and
        how to determine success/failure of each test.
    (b) for each test in harness.conf:
        * compile the test (unless the -nocompile command line flag was
          specified) and search for the strings "error" and "warning"
          coming back to stdout/stderr.
        * record the success/failure in the compile.rpt file.
        * if the test was supposed to pass compilation and did, then
          add the test to a runlist.  The harness will select a list
          based on the total number of processes required to run the
          application.  The harness computes the number of processes
          required by dividing the number of UPC threads by the number
          of pthreads (if specified), then selects a batch queue to use
          and decides on the number of processes that will run on each
          node and the number of nodes to use.
          All the jobs requiring the same number of processes will be
          put on the same runlist.  A self-submitting queue script will
          be generated for each runlist.

(4) After all suites have been processed, the harness will submit each
    of the queue scripts to the batch system, unless the -norun command
    line option was specified.

(5) At some point, the job is run.  The queue script executes the
    "runjobs.pl" PERL script with the runlist as one of its arguments.
    It also tells runjobs.pl the total number of seconds allowed by
    this queue.  The runjobs.pl script selects applications from the
    runlist that are expected to complete before the queue limits
    expire.  The applications are executed under a watchdog that will
    kill the job if it exceeds its specified time limit.
    After the application completes, runjobs.pl determines the success
    or failure of the run and writes an entry into the "run.rpt" file.
    If the runjobs.pl script is running out of time without having run
    all the jobs in the queue, it terminates with a special error code.
    The queue script captures this error code and re-submits itself to
    be run again if not all the apps were processed.

The Testsuite configuration files:
=================================

Each testsuite requires a configuration file named "harness.conf".
As an example, here is part of harness.conf for the gwu suite:

    BEGIN_DEFAULT_CONFIG
    Flags:
    Files:          $TESTNAME$.upc
    DynamicThreads: $DEFAULT$
    StaticThreads:  $DEFAULT$
    CompileResult:  pass
    PassExpr:       ^Success:
    FailExpr:       ^Failure:
    ExitCode:       0
    BuildCmd:       upcc
    AppArgs:
    TimeLimit:      $DEFAULT$
    END_DEFAULT_CONFIG

    # ------------------------------------------------------------
    WildCard:       <*>.upc
    # ------------------------------------------------------------
    TestName:       I_case_i
    CompileResult:  fail
    # ------------------------------------------------------------
    TestName:       I_case1_ii
    CompileResult:  fail

Each test is defined by a series of stanzas beginning with the name
"TestName".  All subsequent stanzas define the attributes for this test.
A special set of stanzas, between the BEGIN_DEFAULT_CONFIG and
END_DEFAULT_CONFIG markers, will apply to all tests unless specifically
re-defined by the test.

In the above example, test I_case_i will inherit all the attributes from
the default section, except that the value of CompileResult will be
replaced with 'fail'.

Note the "WildCard" stanza is special, and allows file globbing to
auto-generate test configurations rather than having to list them all.
In this case, every file in the directory with a name of the form *.upc
will generate a test configuration.  The test configuration name will be
generated by stripping the '.upc' from the name.  For example, the file
'foo.upc' will generate a config named 'foo'.
Note that the angle brackets in the wildcard expression determine which
portion of the name to keep.

Stanza definitions:
  * BuildCmd:       specify whether to use 'make' or 'upcc'
  * Flags:          additional flags to be passed to upcc or make
  * Files:          list of files that need to be compiled in order to
                    build the app.  This is only needed if using upcc
                    directly.
  * CompileResult:  Is the test supposed to compile (pass) or is it a
                    negative test, and supposed to generate a compiler
                    error (fail).
  * DynamicThreads: list of UPC thread counts.  The application will be
                    run once for each thread count.  Note that a
                    dynamic-thread binary will not be built if the only
                    number in the list is zero.
  * StaticThreads:  list of UPC thread counts.  A static-thread binary
                    will be compiled for each value.  No static binaries
                    will be built if the only number in the list is zero.
  * PassExpr:       A PERL regular expression.  0 => ignore.  See below.
  * FailExpr:       A PERL regular expression.  0 => ignore.  See below.
  * ExitCode:       The expected exit code when running the application,
                    or the string 'ignore'.
  * RunCmd:         command to use for running application, defaults to
                    upcrun
                    can be preceded by an optional "feature-expression ;"
  * RunCmdArgs:     additional arguments to pass to RunCmd
  * AppArgs:        run-time arguments needed by the application
                    can be preceded by an optional "feature-expression ;"
  * AppEnv:         run-time environment variables needed by the
                    application, in format suitable for env,
                    ie: VAR="val" VAR2="val2"
                    can be preceded by an optional "feature-expression ;"
  * BenchmarkResult: marks harness tests which should report performance
                    information, and is a perl regex that is used to
                    extract the performance metric and units from the
                    program output.  BenchmarkResult also implies
                    SaveOutput.
  * RequireFeature: Marks a test as relying on a comma-separated list of
                    features that must all be provided by the compilation
                    environment (otherwise the test is skipped)
                    May alternately be a feature-expression, where the
                    test is selected iff the expression evaluates to true.
                    Interesting feature values include:
                      trans_bupc,upc_io,upc_collective,upc_memcpy_async,
                      upc_memcpy_vis,[no]debug,[no]trace,[no]stats,
                      packedsptr,structsptr,[no]pthreads,
                      network_[networktype]
                      - see harness log for complete list
  * ProhibitFeature: Marks a test as not supporting a comma-separated
                    list of features
                    If the compilation environment includes any of these
                    features, the test is skipped
                    May alternately be a feature-expression, where the
                    test is skipped iff the expression evaluates to true.
  * KnownFailure:   Marks a test which is known to fail in a particular
                    way (see below)
  * WarningFilter:  Gives a perl regex to be ignored in compiler output
                    (see below)

Special strings (not all valid in every context):
  * $TESTNAME$ will be replaced with the name of the test.
  * $DEFAULT$  will be replaced with the default number value specified
               in the harness.pl script.
  * $THREADS$  will be replaced with the number of UPC threads
               valid in Limit and TimeLimit
  * $VARNAME$  will be replaced with the value of $VARNAME in the
               environment, with an error if the variable is not set
  * !VARNAME!  will be replaced with the value of $VARNAME in the
               environment, or empty (without error) if the variable is
               not set

Runtime pass/fail:
Three status values are recorded in the run.rpt file for each
application: Timeout, ExitCode and Match.  The most useful is the Match
result.  The harness scans the standard output of the test run looking
for strings that match the FailExpr and PassExpr.  If FailExpr is
defined (non-zero) and a match is found, the test fails.  If no FailExpr
match is found and a PassExpr is defined and found, the test passes.  If
PassExpr is defined but not found, the test fails.

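The Match decision described above can be sketched as follows.  This is
an illustrative Python sketch only, not the harness's actual PERL code.
The function name match_status is hypothetical.  It assumes that an
empty expression is treated like "0" (ignore), that the output is
searched with line-anchored matching, and that a test with only a
FailExpr defined passes when that expression is not found.  Python's
regular expression syntax also differs from PERL's in places.

```python
import re


def match_status(output, pass_expr="0", fail_expr="0"):
    """Return 'ignore', 'ok' or 'FAILED' for one test's standard output."""
    # "0" means ignore; treating "" the same way is an assumption.
    use_pass = pass_expr not in ("", "0")
    use_fail = fail_expr not in ("", "0")

    if not use_pass and not use_fail:
        return "ignore"  # no regular expressions to match

    # A FailExpr match fails the test regardless of PassExpr.
    # re.MULTILINE lets ^ anchor at the start of every output line
    # (an assumption about how the harness scans the output).
    if use_fail and re.search(fail_expr, output, re.MULTILINE):
        return "FAILED"

    if use_pass:
        found = re.search(pass_expr, output, re.MULTILINE)
        return "ok" if found else "FAILED"

    # Assumption: only FailExpr was defined and it did not match.
    return "ok"
```

With the gwu defaults shown earlier (PassExpr "^Success:" and FailExpr
"^Failure:"), output containing a line that starts with "Failure:" is
reported as FAILED even if another line starts with "Success:".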
Definitions:

Timeout
========
ok       The test ran within the allowed limit
FAILED   The watchdog killed the job because it ran for too long

ExitCode
========
na       The watchdog killed the job, no exit code
ignore   Told to ignore the exit code
ok       The exit code matched ExitCode.
FAILED   The exit code differed from ExitCode

Match
========
ignore   no regular expressions to match
ok       no FailExpr is found, and PassExpr is found
FAILED   FailExpr is found, or no PassExpr is found

KnownFailure
============

The KnownFailure syntax is very expressive to allow for precise
definitions of known problems:

  KnownFailure: [mode[,mode...]]; feature-expression; bug desc...

Any given test may have multiple KnownFailure lines, and they are
processed in order - the first matching KnownFailure line (if any) will
be the one whose description is reported for a failure.  The mode list
and feature-expression are optional, and default to a match, but the
semicolons must be present.  One exception: for backwards-compatibility
this:
  KnownFailure: desc...
is accepted as a shorthand for this:
  KnownFailure: ; ; desc...

mode: comma-delimited list of failure modes to which this known failure
  applies.  This should be set as specific as possible while still
  capturing all the known failure modes for the known bug described in
  the description.
  Legal values are:
    all             - all failures
    compile-all     - all compile-time failures
    compile-warning - compilation succeeds as expected, with an
                      unexpected warning
    compile-failure - all other forms of compilation failures (errors,
                      unexpected success, etc)
    run-all         - all runtime failures
    run-match       - output match failure at runtime
    run-crash       - crash at runtime
    run-time        - timeout at runtime
    run-mem         - memory exhaustion at runtime
    run-exit        - bad exit code at runtime

feature-expression: boolean expression of compiler features
  (eg "network_lapi && (cpu_64 || os_aix) && nodebug") that must return
  true for a configuration known to exhibit the bug described in the
  description.
  Consult upcc -version to see the list of features active for a given
  configuration.
  The token "_threads" expands to the number of UPC threads.

WarningFilter
=============

The WarningFilter syntax allows for specific lines of output from the
compile step to be ignored when detecting compile-warning or
compile-failure.

  WarningFilter: [feature-expression;] perl regex
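
The effect of a WarningFilter can be sketched as follows.  This is an
illustrative Python sketch only, not the harness's actual PERL code.
The function name unfiltered_problems is hypothetical, the optional
feature-expression prefix is not handled, and the case-insensitive
search for "error" and "warning" is an assumption.  Python's regular
expression syntax also differs from PERL's in places.

```python
import re


def unfiltered_problems(compile_output, warning_filters):
    """Return compiler output lines that still look like problems.

    warning_filters is a list of regex strings, one per WarningFilter
    stanza, with any feature-expression prefix already removed.
    """
    filters = [re.compile(pattern) for pattern in warning_filters]
    problems = []
    for line in compile_output.splitlines():
        # Lines matching any filter are ignored entirely.
        if any(f.search(line) for f in filters):
            continue
        # Assumption: the check for "error"/"warning" ignores case.
        if re.search(r"error|warning", line, re.IGNORECASE):
            problems.append(line)
    return problems
```

For example, a filter of "unused variable" would hide a line such as
"foo.c:3: warning: unused variable 'x'" while leaving any other warning
or error lines in place to be reported.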