MITgcm Development HOWTO Ed Hill III
eh3@mit.edu
0.01 2003-08-07 eh3 Initial version. This document describes how to develop software for the MITgcm project.
Introduction The purpose of this document is to help new developers get "up to speed" with MITgcm development. New Versions of This Document You can obtain the latest version of this document online in various formats. Feedback and corrections If you have questions or comments about this document, please feel free to contact the authors. Background User Manual Before jumping into development, please familiarize yourself with the MITgcm user manual . This document contains volumes of useful information and is included here by reference. Prerequisites To develop for MITgcm project you will need a UNIX or UNIX-like set of build tools including the following:
CVS client make or (preferably) GNU make FORTRAN compiler C compiler [ba]sh and [t]csh shells PERL LaTeX and LaTeX2HTML
Essentially all of the work described here has been tested on recent versions of Red Hat Linux (eg. 7.3 through 9). Except where noted, all shell commands will be provided using bash syntax.
CVS Repository Layout Unlike many open source projects, the MITgcm CVS tree does not follow a simple "src", "docs", "share", and "test" directory layout. Instead, there are multiple higher-level directories that each, to some extent, depend upon the presence of the others. The tree currently resembles: gcmpack/ MITgcm-contrib contributed code CS-regrid goes into utils cvspolicy.html -save- CVSROOT -save- development experimental stuff manual -save- misc -?- MITgcm code adjoint fold into genmake bin stub for ecco build compare01 old from 20th century diags timeave f77 in pkgs now doc tags -- connect to real docs? eesupp cnh? exe ecco user build ,- jobs runtime shell scripts for | various platforms | lsopt line search m| model main dynamics (core) e| optimization_drivers ? r| optim line search interface g| pkg alternate and optional numerics, etc. e|- tools ?| tutorial_examples documented tests | only populated on release1 branch | and not validated during "testscript" '- utils verification std tests mitgcmdoc -> manual -remove- mitgcm.org build web site models -?- packages -?- preprocess -?- tmp -?- Efforts are underway to reduce the complexity. Branches As shown in the online ViewCVS-generated tree, the MITgcm codebase is split into to two branches or "lines" under which development proceeds. These two lines are referred to as the "MAIN" and "ecco" versions of the code. While not identical, the bulk of the MAIN and ecco lines are composed of files from the same codebase. Periodically, a "Release" branch is formed from the "MAIN" development branch. This is done in order to create a relatively stable reference point for both users and developers. The intent is that once a relese branch has been created, only bug-fixes will be added to it. Meanwhile, development (which might "break" or otherwise render invalid the documentation, tutorials, and/or examples contained within a release branch) is allowed to continue along the MAIN and ecco lines. Tagging The intent of tagging is to create "known-good" checkpoints that developers can use as references. Traditionally, MITgcm tagging has maintained the following conventions: Developer checks out code into a local CVS-managed directory, makes various changes/additions, tests these edits, and eventually reaches a point where (s)he is satisfied that the changes form a new "useful" point in the evolution of the code. The developer then runs the testscript shell script to see if any problems are introduced. While not intended to be exhaustive, the test cases within the verification directory do provide some indication whether gross errors have been introduced. Having satisfied him- or herself that the changes are ready to be committed to the CVS repository, the developer then: adds a "checkpointXY_pre" comment (where X is a checkpoint number and Y is a letter) to the tag-index file and checks it into the CVS repository submits the set of changes to the CVS repository and adds comments to tag-index describing what the changes are along with a matching "checkpointXY_post" entry The result of this tagging procedure is a sequence of development checkpoints with comments which resembles: checkpoint50e_post o make KPP work with PTRACERS - fix gad_calc_rhs to call new routine kpp_transport_ptr, which is nearly a copy of kpp_transport_s - there is no analogue to SurfaceTendencyS, so I have to use gPtr(of the surface layer) instead o add a new platform SunFire+mpi (SunFire 15000) to genmake checkpoint50e_pre checkpoint50d_post o change kpp output from multiple-record state files to single-record state files analogous to write_state.F o reduce the output frequency of cg3d-related stuff to the monitor frequency, analogous to the cg2d-related output. o fix small problem with in ptracers_write_checkpoint.F: len(suff)=512, so that writing to internal file fn (with length 512) fails. checkpoint50d_pre This information can be used to refer to various stages of the code development. For example, bugs can be traced to individual sets of CVS checkins based upon their first appearance when comparing the results from different checkpoints. Editing the Documentation Getting the Docs and Code The first step towards editing the documentation is to checkout a copy of code, docs, and build scripts from the CVS server using: $ export CVS_RSH=ssh $ export CVSROOT=':ext:NAME@mitgcm.org:/u/gcmpack' $ mkdir scratch $ cvs co MITgcm manual mitgcm.org These commands extract the necessary information from the CVS server and create a temporary (called scratch) directory for the storage of the HTML and other files that will be created. Please note that you must either create scratch as shown or edit the various Makefiles and scripts used to create the documentation. Editing the Documentation The documentation is contained in the manual directory in a raw LaTeX format. The main document is manual.tex and it uses \input{}s to include the chapters and subsections. Since the same LaTeX source is used to produce PostScript, PDF, and HTML output, care should be taken to follow certain conventions. Two of the most important are the usage of the \filelink{}{} and \varlink{}{} commands. Both of these commands have been defined to simplify the connection between the automatically generated ("code browser") HTML and the HTML version of the manual produced by LaTeX2HTML. They each take two arguments (corresponding to the contents of the two sets of curly braces) which are the text that the author wishes to be "wrapped" within the link, and a specially formatted link thats relative to the MITgcm directory within the CVS tree. The result is a command that resembles either a reference to a variable or subroutine name such as \varlink{tRef}{tRef}, or a reference to a file such as \varlink{tRef}{path-to-the-file_name.F} where the absolute path to the file is of the form /foo/MITgcm/path/to/the/file_name.F (please note how the leading "/foo/MITgcm" component of the path is dropped leaving the path relative to the head of the code directory and each directory separator "/" is turned into a "-") Building the Documentation Given the directory structure of , the entire documentation for the web site can be built using: $ cd mitgcm.org/devel/buildweb $ make All Which builds the PDF from the LaTeX source, creates the HTML output from the LaTeX source, parses the FORTRAN code base to produce a hyperlinked HTML version of the source, and then determines the cross-linking between the various HTML components. If there are no errors, the result of the build process (which can take 30+ minutes on a P4/2.5Ghz) will be contained within a single directory called scratch/dev_docs. This is a freshly built version of the entire on-line users manual. If you have the correct permissions, it can be directly copied to the web server area: $ mv scratch/dev_docs /u/u0/httpd/html and the update is complete. Coding for MITgcm Build Tools Many Open Source projects use the "GNU Autotools" to help streamline the build process for various Unix and Unix-like architectures. For a user, the result is the common "configure" (that is, "./configure && make && make install") commands. For MITgcm, the process is similar. Typical commands are: $ genmake -mods=../code $ make depend $ make The following sections describe the individual steps in the build process. The <filename>genmake2</> Utility Please note that the older genmake is deprecated and will eventually be replaced by genmake2. This HOWTO only describes the newer tool. The first step in any MITgcm build is to create a Unix-style Makefile which will be parsed by make to specify how to compile the MITgcm source files. For more detailed descriptions of what the make tools are and how they are used, please see: http://www.gnu.org/software/make/make.html http://www.oreilly.com/catalog/make2/ Genmake can often be invoked successfully with a command line as simple as: $ genmake2 -mods=../code However, some systems (particularly commercial Unixes that lack a more modern "/bin/sh" implementation or that have shells installed in odd locations) may require an explicit shell invocation such as one of the following: $ /usr/bin/sh genmake2 -make=gmake -mods=../code $ /opt/gnu/bin/bash genmake2 -ieee -make=/usr/local/bin/gmake -mods=../code The genmake2 code has been written in a Bourne and BASH (v1) compatible syntax so it should work with most "sh" and all recent "bash" implementations. As the name implies, genmake2 generates a Makefile. It does so by first parsing the information supplied from the following sources a gemake_local file in the current directory directly from command-line options an "options file" as specified by the command-line option -optfile='FILENAME' then checking certain dependency rules (the package dependencies), and finally writing a Makefile based upon the source code that it finds. For convenience within various Unix shells, genmake2 supports both "long"- and "short"-style options. A complete list of the available options can be obtained from: $ genmake2 -help The most important options for genmake2 are: --optfile=/PATH/FILENAME This specifies the "options file" that should be used for a particular build. The options file is a convenient and machine-indepenent way of specifying parameters such as the FORTRAN compiler (FC=), FORTRAN compiler optimization flags (FFLAGS=), and the locations of various platform- and/or machine-specific tools (eg. MAKEDEPEND=). As with genmake2, all options files should be written to be compatible with Bourne--shell ("sh" or "BASH v1") syntax. Examples of various options files can be found in $ROOTDIR/tools/build_options. If no "optfile" is specified (either through the command lin or the environment variable), genmake2 will try to make a reasonable guess from the list provided in $ROOTDIR/tools/build_options. The method used for making this guess is to first determine the combination of operating system and hardware (eg. "linux_ia32") and then find a working Fortran compiler within the user's path. When these three items have been identified, genmake2 will try to find an optfile that has a matching name. Everyone is encouraged to submit their options files to the MITgcm project for inclusion (please send to MITgcm-support@mitgcm.org). We are particularly grateful for options files tested on new or unique platforms! -pdepend=/PATH/FILENAME This specifies the dependency file used for packages. If not specified, the default dependency file is $ROOTDIR/pkg/pkg_depend. The syntax for this file is parsed on a line-by-line basis where each line containes either a comment ("#") or a simple "PKGNAME1 (+|-)PKGNAME2" pairwise rule where the "+" or "-" symbol specifies a "must be used with" or a "must not be used with" relationship, respectively. If no rule is specified, then it is assumed that the two packages are compatible and will function either with or without each other. -pdefault=PKG -pdefault='PKG1 [PKG2 PKG3 ...]' This option specifies the default set of packages to be used. If not set, the default package list will be read from $ROOTDIR/pkg/pkg_default. -adof=/path/to/file -adoptfile=/path/to/file This option specifies the "adjoint" or automatic differentiation options file to be used. The file is analogous to the "optfile" defined above but it specifies information for the AD build process. The default file is located in $ROOTDIR/tools/adjoint_options/adjoint_default and it defines the "TAF" and "TAMC" compilers. An alternate version is also available at $ROOTDIR/tools/adjoint_options/adjoint_staf that selects the newer "STAF" compiler. As with any compilers, it is helpful to have their directories listed in your $PATH environment variable. -mods=DIR -mods='DIR1 [DIR2 ...]' This option specifies a list of directories containing "modifications". These directories contain files with names that may (or may not) exist in the main MITgcm source tree but will be overridden by any identically-named sources within the "MODS" directories. The order of precedence for this "name-hiding" is as follows: "MODS" directories (in the order given) Packages either explicitly specified or provided by default (in the order given) Packages included due to package dependencies (in the order that that package dependencies are parsed) The "standard dirs" (which may have been specified by the "-standarddirs" option) -make=/path/to/gmake Due to the poor handling of soft-links and other bugs common with the make versions provided by commercial Unix vendors, GNU make (sometimes called gmake) should be preferred. This option provides a means for specifying the make program to be used. A successful run of genmake2 will produce a Makefile, a PACKAGES_CONFIG.h file, and various convenience files used for the automatic differentiation process. In general, it is best to use genmake2 on a "clean" directory that is free of all source (*.[F,f],*.[F,f]90) and header (*.h,*.inc) files. Generally, this can be accomplished in an "un-clean" directory by running "make CLEAN" followed by "make makefile". Using the <filename>Makefile</> Once a Makefile has been created using genmake2, one can build a "standard" (forward simulator) executable using: $ make CLEAN $ make depend $ make The "make CLEAN" step will remove any stale source files, include files, and links. It is strongly recommended for "un-clean" directories which may contain the (perhaps partial) results of previous builds. Such "debris" can interfere with the next stage of the build. The "make depend" step will create a large number of symbolic links from the local directory to the source file locations. It also parses these files and creates an extensive list of dependencies within the Makefile itself. The links that exist at this stage are mostly "large F" files (*.F and *.F90) that need to be processed by a C preprocessor ("CPP"). Since "make depend" edits the Makefile, it is important not to skip this step! The final "make" invokes the C preprocessor to produce the "little f" files (*.f and *.f90) and then compiles them to object code using the specified FORTRAN compiler and options. An intermediate script is often used during this stage to further process (usually, make simple substitutions) custom definitions such as variable types within the source files. This additional stage is necessary in order to overcome some of the inconsistencies in the sizes of objects (bytes) between different compilers. The result of the build process is an executable with the name mitgcmuv. In addition to the forward simulator described above, the Makefile also has a number of targets that can be used to produce various adjoint and tangent-linear builds for optimization and other parameter-sensitivity problems. The additional targets within the Makefile are: make adall This target produces an mitgcmuv_ad executable using the taf or staf adjoint compiler. See the genmake2 "-adof" option for compiler selection. make ftlall Similar to make adall above, this produces... Please report any compilation failures or other build problems to the MITgcm-support@mitgcm.org list. The Verification Suite The MITgcm CVS tree (within the $ROOTDIR/verification/ directory) includes more than a dozen examples intended for regression testing. Each one of these example directories contains "known-good" output files along with all the input (including both code and data files) required for their re-calculation. These example directories are further broken down into sets of subdirectories (eg. /input, /code) intended to expedite the testing process. The <filename>testreport</> Utility Also included in $ROOTDIR/verification/ are shell scripts for automated testing. The newest script (which was written to work with genmake2) is called testreport. This script can be used to build different versions of the MITgcm code, run the various examples, compare the output, and (if specified) email the results of each one of these tests to a central repository. On some systems, the testreport script can be run with a command line as simple as: $ cd verification $ ./testreport -ieee However, some systems (those lacking or wiht a broken "/bin/sh") may require an explicit shell invocation such as: $ sh ./testreport -ieee -t 'exp0 exp4' $ /some/path/to/bash ./testreport -ieee -t 'ideal_2D_oce lab_sea natl_box' The testreport script accepts a number of command-line options which can be listed using the -help option. The most important ones are: -ieee If allowed by the compiler (as defined in the "optfile"), use IEEE arithmetic. This option, along with the GCC compiler, is how the standard results were produced. -tdir TESTDIR -tdir 'TDIR1 TDIR2 [...]' This option specifies the test directory or list of test directories that should be used. Each of these entries should exactly (note: they are case sensitive!) match the names of directries in $ROOTDIR/verification/. If this option is omitted, then all directories that are properly formatted (that is, containing an input sub-directory and a results/output.txt file) will be used. -optfile=/PATH/FILENAME -optfile '/PATH/F1 [/PATH/F2 ...]' This specifies a list of "options files" that will be passed to genmake2. If multiple options files are used (say, to test different compilers or different sets of options for the same compiler), then each options file will be used with each of the test directories. -addr EMAIL -addr 'EMAIL1 EMAIL2 [...]' Send the results (namely, output.txt, genmake_local, genmake_state, and Makefile) to the specified email addresses. The results are gzipped, placed in a tar file, MIME encoded, and sent to the specified address. If no email addresses are specified, no mail is sent. -mpi If the necessary files (TESTDIR/code/CPP_EEOPTIONS.h_mpi and TESTDIR/code/SIZE.h_mpi) exist, then use them for an MPI--enabled run. Note that the use of MPI typically requires a special command option (see "-command" below) to invoke the MPI executable. Examples of PBS scripts using MPI with testreport can be found in the MITgcm-contrib area -command='some command to run' For some tests, particularly MPI runs, the default "make output.txt" is not sufficient. This option allows a more general command (or shell script) to be invoked. Examples of PBS scripts using MPI with testreport can be found in the MITgcm-contrib area The testreport script will write progress to the screen (stdout) as it runs. In addition, it will create a tr_out.txt file that contains a brief comparison of the current output with the "known-good" output. Creating MITgcm Packages Optional parts of code have been separated from the MITgcmUV core driver code and organised into packages. The packaging structure provides a mechanism for maintaining suites of code, specific to particular classes of problems, in a way that is cleanly separated from the generic fluid dynamical engine. The MITgcmUV packaging structure is described below using generic package names ${pkg}. A concrete examples of a package is the code for implementing GM/Redi mixing. This code uses the package name Chris's Notes... MITgcmUV Packages ================= Optional parts of code are separated from the MITgcmUV core driver code and organised into packages. The packaging structure provides a mechanism for maintaining suites of code, specific to particular classes of problem, in a way that is cleanly separated from the generic fluid dynamical engine. The MITgcmUV packaging structure is describe below using generic package names ${pkg}. A concrete examples of a package is the code for implementing GM/Redi mixing. This code uses the package name * ${PKG} = GMREDI * ${pkg} = gmredi * ${Pkg} = gmRedi Package states ============== Packages can be any one of four states, included, excluded, enabled, disabled as follows: included(excluded) compile time state which includes(excludes) package code and routine calls from compilation/linking etc... enabled(disabled) run-time state which enables(disables) package code execution. Every call to a ${pkg}_... routine from outside the package should be placed within both a #ifdef ALLOW_${PKG} ... block and a if ( use${Pkg} ) ... then block. Package states are generally not expected to change during a model run. Package structure ================= o Each package gets its runtime configuration parameters from a file named "data.${pkg}" Package runtime config. options are imported into a common block held in a header file called "${PKG}.h". o The core driver part of the model can check for runtime enabling or disabling of individual packages through logical flags use${Pkg}. The information is loaded from a global package setup file called "data.pkg". The use${Pkg} flags are not used within individual packages. o Included in "${PKG}.h" is a logical flag called ${Pkg}IsOn. The "${PKG}.h" header file can be imported by other packages to check dependencies and requirements from other packages ( see "Package Boot Sequence" section). NOTE: This procedure is not presently implemented, ----- neither for kpp nor for gmRedi. CPP Flags ========= 1. Within the core driver code flags of the form ALLOW_${PKG} are used to include or exclude whole packages. The ALLOW_${PKG} flags are included from a PKG_CPP_OPTIONS block which is currently held in-line in the CPP_OPTIONS.h header file. e.g. Core model code ..... #include "CPP_OPTIONS.h" : : : #ifdef ALLOW_${PKG} if ( use${Pkg} ) CALL ${PKG}_DO_SOMETHING(...) #endif 2. Within an individual package a header file, "${PKG}_OPTIONS.h", is used to set CPP flags specific to that package. It is not recommended to include this file in "CPP_OPTIONS.h". Package Boot Sequence ===================== Calls to package routines within the core code timestepping loop can vary. However, all packages follow a required "boot" sequence outlined here: 1. S/R PACKAGES_BOOT() : CALL OPEN_COPY_DATA_FILE( 'data.pkg', 'PACKAGES_BOOT', ... ) 2. S/R PACKAGES_READPARMS() : #ifdef ALLOW_${PKG} if ( use${Pkg} ) & CALL ${PKG}_READPARMS( retCode ) #endif 3. S/R PACKAGES_INIT_FIXED() : #ifdef ALLOW_${PKG} if ( use${Pkg} ) & CALL ${PKG}_INIT_FIXED( retCode ) #endif 4. S/R PACKAGES_CHECK() : #ifdef ALLOW_${PKG} if ( use${Pkg} ) & CALL ${PKG}_CHECK( retCode ) #else if ( use${Pkg} ) & CALL PACKAGES_CHECK_ERROR('${PKG}') #endif 5. S/R PACKAGES_INIT_VARIABLES() : #ifdef ALLOW_${PKG} if ( use${Pkg} ) & CALL ${PKG}_INIT_VARIA( ) #endif Package Output ============== 6. S/R DO_THE_MODEL_IO #ifdef ALLOW_${PKG} if ( use${Pkg} ) & CALL ${PKG}_DIAGS( ) #endif 7. S/R PACKAGES_WRITE_PICKUP() #ifdef ALLOW_${PKG} if ( use${Pkg} ) & CALL ${PKG}_WRITE_PICKUP( ) #endif Description =========== - ${PKG}_READPARMS() is responsible for reading in the package parameters file data.${pkg}, and storing the package parameters in "${PKG}.h". -> called from INITIALISE_FIXED in PACKAGES_READPARMS - ${PKG}_INIT_FIXED() is responsible for completing the internal setup of a package. -> called from INITIALISE_FIXED in PACKAGES_INIT_FIXED note: 1) some pkg use instead: CALL ${PKG}_INITIALISE ( or the old form CALL ${PKG}_INIT ) 2) for simple pkg setup, this part is done inside ${PKG}_READPARMS - ${PKG}_CHECK() is responsible for validating basic package setup and inter-package dependencies. ${PKG}_CHECK can import other package parameters it may need to check. This is done through header files "${PKG}.h". It is assumed that parameters owned by other packages will not be reset during ${PKG}_CHECK(). -> called from INITIALISE_FIXED in PACKAGES_CHECK - ${PKG}_INIT_VARIA() is responsible for fill-in all package variables with an initial value. Contains eventually a call to ${PKG}_READ_PICKUP that will read from a pickup file the package variables required to restart the model. This routine is called after the core model state has been completely initialised but before the core model timestepping starts. -> called from INITIALISE_VARIA in PACKAGES_INIT_VARIABLES note: the name ${PKG}_INIT_VARIA is not yet standard and some pkg use for e.g. ${PKG}_INI_VARS, ${PKG}_INIT_VARIABLES, or the old form ${PKG}_INIT - ${PKG}_DIAGS() is responsible for writing time-average diagnostics to output files (but the cumulating step is done within the package main S/R). Can also contain other diagnostics (.e.g. CALL ${PKG}_MONITOR) and write snap-shot fields that are hold in common blocks. Other temporary fields are directly dump to file where they are available. NOTE: this part does not yet have a standard form and should be called from a package dedicated S/R such as PACKAGE_WRITE_DIAGS -> called within DO_THE_MODEL_IO - ${PKG}_WRITE_PICKUP() is responsible for writing a package pickup file when necessary for a restart. (found also the old name: ${PKG}_WRITE_CHECKPOINT ) -> called from FORWARD_STEP and THE_MODEL_MAIN in PACKAGES_WRITE_PICKUP Summary ======= - CPP options: ----------------------- * ALLOW_${PKG} include/exclude package for compilation - FORTRAN logical: ----------------------- * use${Pkg} enable package for execution at runtime -> declared in PARAMS.h * ${Pkg}IsOn for package cross-dependency check -> declared in ${PKG}.h N.B.: Not presently used! - header files ----------------------- * ${PKG}_OPTIONS.h has further package-specific CPP options * ${PKG}.h package-specific common block variables, fields - FORTRAN source files ----------------------- * ${pkg}_readparms.F reads parameters from file data.${pkg} * ${pkg}_init_fixed.F complete the package setup * ${pkg}_check.F checks package dependencies and consistencies * ${pkg}_init_varia.F initialises package-related fields * ${pkg}_... .F package source code * ${pkg}_diags.F write diagnostics to file. * ${pkg}_write_pickup.F write a package pickup file to restart the model - parameter file ----------------------- * data.${pkg} parameter file