% $Header: /home/ubuntu/mnt/e9_copy/manual/s_phys_pkgs/Attic/mnc.tex,v 1.16 2005/07/18 20:45:27 molod dead $
% $Name: $

\section{NetCDF I/O Integration: MNC}
\label{sec:pkg:mnc}
\begin{rawhtml}
\end{rawhtml}

The \texttt{mnc} package is a set of convenience routines written to
expedite the process of creating, appending, and reading NetCDF files.
NetCDF is an increasingly popular self-describing file format
\cite{rew:97} intended primarily for scientific data sets. An
extensive collection of NetCDF reference papers, user guides,
software, FAQs, and other information can be obtained from UCAR's web
site at:
\begin{rawhtml}
\end{rawhtml}
\begin{verbatim}
http://www.unidata.ucar.edu/packages/netcdf/
\end{verbatim}
\begin{rawhtml}
\end{rawhtml}

Since it is a ``wrapper'' for netCDF, MNC depends upon the Fortran-77
interface included with the standard netCDF v3.x library which is
often called \texttt{libnetcdf.a}. Please contact your local systems
administrators or the
\begin{rawhtml}
\end{rawhtml}
MITgcm-support
\begin{rawhtml}
\end{rawhtml}
list for help building and installing netCDF for your particular
platform.

\subsection{Using MNC}

\subsubsection{MNC Configuration}

As with all MITgcm packages, MNC can be turned on or off at compile
time using the \texttt{packages.conf} file or the \texttt{genmake2}
\texttt{-enable=mnc} or \texttt{-disable=mnc} switches.

While MNC is likely to work ``as is'', there are a few compile--time
constants that may need to be increased for simulations that employ
large numbers of tiles within each process. Note that the important
quantity is the maximum number of tiles \textbf{per process}. Since
MPI configurations tend to distribute large numbers of tiles over
relatively large numbers of MPI processes, these constants will rarely
need to be increased.

If MNC runs out of space within its ``lookup'' tables during a
simulation, then it will provide an error message along with a
recommendation of which parameter to increase.
The parameters are all located within \filelink{pkg/mnc/mnc\_common.h}{pkg-mnc-mnc_common.h} and the ones that may need to be increased are: \begin{center} {\footnotesize \begin{tabular}[htb]{|l|r|l|}\hline \textbf{Name} & \textbf{Default} & \textbf{Description} \\\hline & & \\ \texttt{MNC\_MAX\_ID} & 1000 & \textbf{IDs for various low-level entities} \\ \texttt{MNC\_MAX\_INFO} & 400 & \textbf{IDs (mostly for object sizes)} \\ \texttt{MNC\_CW\_MAX\_I} & 150 & \textbf{IDs for the ``wrapper'' layer} \\\hline \end{tabular} } \end{center} In those rare cases where MNC ``out-of-memory'' error messages are encountered, it is a good idea to increase the too-small parameter by a factor of \textbf{2--10} in order to avoid wasting time on an iterative compile--test sequence. \subsubsection{MNC Inputs} Like most MITgcm packages, all of MNC can be turned on/off at runtime using a single flag in \texttt{data.pkg} \begin{center} {\footnotesize \begin{tabular}[htb]{|l|c|l|l|}\hline \textbf{Name} & \textbf{T} & \textbf{Default} & \textbf{Description} \\\hline & & & \\ \texttt{useMNC} & L & \texttt{.FALSE.} & overall MNC ON/OFF switch \\\hline \end{tabular} } \end{center} One important MNC--related flag is present in the main \texttt{data} namelist file in the \texttt{PARM03} section and it is: \begin{center} {\footnotesize \begin{tabular}[htb]{|l|c|l|l|}\hline \textbf{Name} & \textbf{T} & \textbf{Default} & \textbf{Description} \\\hline & & & \\ \texttt{outputTypesInclusive} & L & \texttt{.FALSE.} & use all available output ``types'' \\\hline \end{tabular} } \end{center} which specifies that turning on MNC for a particular type of output should not simultaneously turn off the default output method as it normally does. Usually, this option is only used for debugging purposes since it is inefficient to write output types using both MNC and MDSIO or ASCII output. This option can also be helpful when transitioning from MDSIO to MNC since the output can be readily compared. 
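
As an illustration, the two switches described above might be set as
in the following sketch. The \texttt{PACKAGES} namelist group name,
the \texttt{\#} comment lines, and the \texttt{\&} group terminator
are assumptions based on common MITgcm input files and may need to be
adjusted for a particular setup (standard Fortran terminates a
namelist group with \texttt{/}). All other \texttt{PARM03} entries are
omitted:
{\footnotesize
\begin{verbatim}
# data.pkg (sketch): turn MNC on at run time
 &PACKAGES
 useMNC=.TRUE.,
 &

# data (sketch): keep the default output alongside MNC output
 &PARM03
 outputTypesInclusive=.TRUE.,
 &
\end{verbatim}
}
With both flags set, any output type routed through MNC would also be
written by the default method, which is convenient for comparison but
inefficient for production runs.
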
For run-time configuration, most of the MNC--related model parameters
are contained within a Fortran namelist file called
\texttt{data.mnc}. The available parameters currently include:
\begin{center}
  {\footnotesize
    \begin{tabular}[htb]{|l|c|l|l|}\hline
      \textbf{Name} & \textbf{T} & \textbf{Default} & \textbf{Description} \\\hline
      & & & \\
      \texttt{mnc\_use\_outdir} & L & \texttt{.FALSE.} & create a directory for output \\
      \ \ \texttt{mnc\_outdir\_str} & S & \texttt{'mnc\_'} & output directory name \\
      \ \ \texttt{mnc\_outdir\_date} & L & \texttt{.FALSE.} & embed date in the outdir name \\
      \ \ \texttt{mnc\_outdir\_num} & L & \texttt{.FALSE.} & optional \\
      \texttt{pickup\_write\_mnc} & L & \texttt{.FALSE.} & use MNC to write pickup files \\
      \texttt{pickup\_read\_mnc} & L & \texttt{.FALSE.} & use MNC to read pickup files \\
      \texttt{mnc\_use\_indir} & L & \texttt{.FALSE.} & use a directory (path) for input \\
      \ \ \texttt{mnc\_indir\_str} & S & \texttt{''} & input directory (or path) name \\
      \texttt{snapshot\_mnc} & L & \texttt{.FALSE.} & write \texttt{snapshot} output w/MNC \\
      \texttt{monitor\_mnc} & L & \texttt{.FALSE.} & write \texttt{monitor} output w/MNC \\
      \texttt{timeave\_mnc} & L & \texttt{.FALSE.} & write \texttt{timeave} output w/MNC \\
      \texttt{autodiff\_mnc} & L & \texttt{.FALSE.} & write \texttt{autodiff} output w/MNC \\
      \texttt{mnc\_max\_fsize} & R & 2.1e+09 & max allowable file size \\
      \texttt{readgrid\_mnc} & L & \texttt{.FALSE.} & read grid quantities using MNC \\
      \texttt{mnc\_echo\_gvtypes} & L & \texttt{.FALSE.} & list pre-defined ``types'' (debug) \\\hline
    \end{tabular}
  }
\end{center}

Unlike the older MDSIO method, MNC has the ability to create or use
existing output directories. If either \texttt{mnc\_outdir\_date} or
\texttt{mnc\_outdir\_num} is true, then MNC will try to create
directories on a \textit{PER PROCESS} basis for its output.
This means that a single directory will be created for a non-MPI run
and multiple directories (one per MPI process) will be created for an
MPI run. This approach was chosen since it works safely on both
shared global file systems (such as NFS and AFS) and on local
(per-compute-node) file systems. If both \texttt{mnc\_outdir\_date}
and \texttt{mnc\_outdir\_num} are false, then the MNC package will
assume that the directory specified in \texttt{mnc\_outdir\_str}
already exists and will use it. This allows the user to create and
specify directories outside of the model.

For input, MNC can use a single global input directory. This is just
a convenience that allows MNC to gather all of its input files from a
path other than the current working directory. As with MDSIO, the
default is to use the current working directory.

The flags \texttt{snapshot\_mnc}, \texttt{monitor\_mnc},
\texttt{timeave\_mnc}, and \texttt{autodiff\_mnc} allow the user to
turn on MNC for particular ``types'' of output. If a type is
selected, then MNC will be used for all output that matches that type.
This applies to output from the main model and from all of the
optional MITgcm packages. Mostly, the names used here correspond to
the names used for the output frequencies in the main \texttt{data}
namelist file.

The \texttt{mnc\_max\_fsize} parameter is a convenience added to help
users work around common file size limitations. On many computer
systems, the operating system, the file system(s), and/or the netCDF
libraries are unable to handle files greater than two or four
gigabytes in size. The MNC package is able to work within this
limitation by creating new files which grow along the netCDF
``unlimited'' (usually, time) dimension. The default value for this
parameter is just slightly less than 2GB which is safe on virtually
all operating systems. Essentially, this feature is a way to
intelligently and automatically split files output along the unlimited
dimension.
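
A \texttt{data.mnc} file combining several of the parameters above
might look like the following sketch. The namelist group name
\texttt{MNC\_01}, the \texttt{\&} terminator, the directory prefix,
and the chosen file size are illustrative assumptions and are not
taken from this section:
{\footnotesize
\begin{verbatim}
# data.mnc (sketch)
 &MNC_01
 mnc_use_outdir=.TRUE.,
 mnc_outdir_str='mnc_out_',
 mnc_outdir_num=.TRUE.,
 snapshot_mnc=.TRUE.,
 timeave_mnc=.TRUE.,
 mnc_max_fsize=1.0e+09,
 &
\end{verbatim}
}
With settings of this kind, each process would try to create its own
output directory, snapshot and time-averaged output would be written
with MNC, and a new file in the sequence would be started before any
single file grows beyond the chosen limit (assumed here to be given in
bytes, consistent with the default of 2.1e+09).
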
On systems that support large file sizes, these splits can be readily
concatenated (that is, un-done) using tools such as the netCDF
Operators (with \texttt{ncrcat}) which are available at:
\begin{rawhtml}
\end{rawhtml}
\begin{verbatim}
http://nco.sourceforge.net/
\end{verbatim}
\begin{rawhtml}
\end{rawhtml}

Additional MNC--related parameters may be contained within each
package. Please see the individual packages for descriptions of their
use of MNC.

\subsubsection{MNC Output}

Depending upon the flags used, MNC will produce zero or more
directories containing one or more netCDF files as output. These
files are either mostly or entirely compliant with the netCDF ``CF''
convention (v1.0) and any conformance issues will be fixed over time.
The pattern used for file names is:
\begin{center}
  \texttt{BASENAME.nIter0.tileNum.seqNum.nc}
\end{center}
and an example is:
\begin{center}
  \texttt{grid.0000000000.000001.0000.nc}
\end{center}
where \texttt{BASENAME} is the name selected to represent a set of
variables written together, \texttt{nIter0} is the starting iteration
number as specified in the main \texttt{data} namelist input file and
written in a zero-filled 10-digit format, \texttt{tileNum} is the
six-digit zero-filled tile number, \texttt{seqNum} is a four-digit
zero-filled sequence number used when the maximum allowable file size
is too small to contain all of the output for a particular type within
one run (new files are created with sequential numbers as files reach
the maximum file size limit), and \texttt{.nc} is the file suffix
specified by the current netCDF ``CF'' conventions.

Some example \texttt{BASENAME} values are:
\begin{description}
\item[grid] contains the variables that describe the various grid
  constants related to locations, lengths, areas, etc.
\item[state] contains the variables output at the snapshot or
  \texttt{dumpFreq} time frequency.
\item[pickup.ckptA, pickup.ckptB] are the ``rolling'' checkpoint
  files.
\item[tave] contains the time-averaged quantities from the main model.
\end{description}

All MNC output is currently done in a ``file-per-tile'' fashion since
most NetCDF v3.x implementations cannot write safely within MPI or
multi-threaded environments. This tiling is done in a global fashion
and the tile numbers are appended to the base names as described
above. Some scripts to manipulate MNC output are available at
\texttt{MITgcm/utils/matlab/} which includes a spatial ``assembly''
script called \texttt{MITgcm/utils/matlab/mnc\_assembly.m}. More
general manipulations can be performed on netCDF files with
\begin{rawhtml}
\end{rawhtml}
\begin{verbatim}
the NetCDF Operators (``NCO'') at http://nco.sourceforge.net
\end{verbatim}
\begin{rawhtml}
\end{rawhtml}
or with
\begin{rawhtml}
\end{rawhtml}
\begin{verbatim}
the Climate Data Operators (``CDO'') at http://www.mpimet.mpg.de/~cdo/
\end{verbatim}
\begin{rawhtml}
\end{rawhtml}

Unlike the older MDSIO routines, MNC reads and writes variables on
different ``grids'' depending upon their location on, for instance, an
Arakawa C--grid. The following table provides examples:
\begin{center}
  {\footnotesize
    \begin{tabular}[htb]{|l|c|c|c|}\hline
      \textbf{Name} & \textbf{C--grid location} & \textbf{\# in X} & \textbf{\# in Y} \\\hline
      Temperature & mass & \texttt{sNx} & \texttt{sNy} \\
      Salinity & mass & \texttt{sNx} & \texttt{sNy} \\
      U velocity & U & \texttt{sNx+1} & \texttt{sNy} \\
      V velocity & V & \texttt{sNx} & \texttt{sNy+1} \\
      Vorticity & vorticity & \texttt{sNx+1} & \texttt{sNy+1} \\\hline
    \end{tabular}
  }
\end{center}
and the intent is two--fold:
\begin{enumerate}
\item For some grid topologies it is impossible to output all
  quantities using only \texttt{sNx,sNy} arrays for every tile.
  Two examples of this failure are the missing corners problem for
  vorticity values on the cubesphere and the velocity edge values for
  some open--boundary domains.
\item Writing quantities located on velocity or vorticity points with
  the above scheme introduces a very small data redundancy. However,
  any slight inconvenience is easily offset by the ease with which one
  can, on every individual tile, interpolate these values to mass
  points without having to perform an ``exchange'' (or
  ``halo-filling'') operation to collect the values from neighboring
  tiles. This makes the most common post--processing operations much
  easier to implement.
\end{enumerate}

\subsection{MNC Troubleshooting}

\subsubsection{Build Troubleshooting}

In order to build MITgcm with MNC enabled, the netCDF v3.x Fortran-77
(not Fortran-90) library must be available. This library consists of
a single header file (called \texttt{netcdf.inc}) and a single library
file (usually called \texttt{libnetcdf.a}), and it must be built with
the same compiler (or a binary-compatible compiler), and with
compatible compiler options, as the one used to build MITgcm.

For more details concerning the netCDF build and install process,
please visit the netCDF home page at:
\begin{rawhtml}
\end{rawhtml}
\begin{verbatim}
http://www.unidata.ucar.edu/packages/netcdf/
\end{verbatim}
\begin{rawhtml}
\end{rawhtml}
which includes an extensive list of known--good netCDF configurations
for various platforms.

\subsubsection{Runtime Troubleshooting}

Please be aware of the following:
\begin{itemize}
\item As a safety feature, the MNC package does not, by default, allow
  pre-existing files to be appended to or overwritten. This is in
  contrast to the older MDSIO package which will, without any warning,
  overwrite existing files. If MITgcm aborts with an error message
  about the inability to open or write to a netCDF file, please check
  \textbf{first} whether you are attempting to overwrite files from a
  previous run.
\item The constraints placed upon the ``unlimited'' (or ``record'')
  dimension inherent in NetCDF v3.x make it very inefficient to put
  variables written at potentially different intervals within the same
  file. For this reason, MNC output is split into groups of files
  which attempt to reflect the nature of their content.
\item On many systems, netCDF has practical file size limits on the
  order of 2--4GB (the maximum memory addressable with 32-bit pointers
  or pointer differences) due to a lack of operating system, compiler,
  and/or library support. The latest revisions of netCDF v3.x have
  large file support and, on some operating systems, file sizes are
  only limited by available disk space.
\item There is an 80 character limit to the total length of all file
  names. This limit includes the directory (or path) since paths and
  file names are internally appended. Generally, file names will not
  exceed the limit and paths can usually be shortened using, for
  example, soft links.
\item MNC does not (yet) provide a mechanism for reading information
  from a single ``global'' file as can be done with the MDSIO package.
  This is in progress.
\end{itemize}

\subsection{MNC Internals}

The \texttt{mnc} package is a two-level convenience library (or
``wrapper'') for most of the NetCDF Fortran API. Its purpose is to
streamline the user interface to NetCDF by maintaining internal
relations (look-up tables) keyed with strings (or names) and entities
such as NetCDF files, variables, and attributes.

The two levels of the \texttt{mnc} package are:
\begin{description}
\item[Upper level] \

  The upper level contains information about two kinds of
  associations:
  \begin{description}
  \item[grid type] is a lookup table indexed with a grid type name.
    Each grid type name is associated with a number of dimensions, the
    dimension sizes (one of which may be unlimited), and starting and
    ending index arrays.
    The intent is to store all the necessary size and shape
    information for the Fortran arrays containing MITgcm--style
    ``tile'' variables (that is, a central region surrounded by a
    variably-sized ``halo'' or exchange region as shown in Figures
    \ref{fig:communication_primitives} and
    \ref{fig:tiling-strategy}).
  \item[variable type] is a lookup table indexed by a variable type
    name. For each name, the table contains a reference to a grid
    type for the variable and the names and values of various
    attributes.
  \end{description}

  Within the upper level, these associations are not permanently tied
  to any particular NetCDF file. This allows the information to be
  re-used over multiple file reads and writes.

\item[Lower level] \

  In the lower (or internal) level, associations are stored for NetCDF
  files and many of the entities that they contain including
  dimensions, variables, and global attributes. All associations are
  on a per-file basis. Thus, each entity is tied to a unique NetCDF
  file and will be created or destroyed when files are, respectively,
  opened or closed.
\end{description}

\subsubsection{MNC Grid--Types and Variable--Types}

As a convenience for users, the MNC package includes numerous routines
to aid in the writing of data to NetCDF format. Probably the biggest
convenience is the use of pre-defined ``grid types'' and ``variable
types''. These ``types'' are simply look-up tables that store
dimensions, indices, attributes, and other information that can all be
retrieved using a single character string.

The ``grid types'' are a way of mapping variables within MITgcm to
NetCDF arrays.
Within MITgcm, most spatial variables are defined using two-- or
three--dimensional arrays with ``overlap'' regions (see Figures
\ref{fig:communication_primitives} and \ref{fig:tiling-strategy}), a
possible vertical index, and tile indices, such as the following ``U''
velocity:
\begin{verbatim}
      _RL  uVel (1-OLx:sNx+OLx,1-OLy:sNy+OLy,Nr,nSx,nSy)
\end{verbatim}
as defined in \filelink{model/inc/DYNVARS.h}{model-inc-DYNVARS.h}.

The grid type is a character string that encodes the presence and
types associated with the four possible dimensions. The character
string follows the format
\begin{center}
  \texttt{H0\_H1\_H2\_\_V\_\_T}
\end{center}
where the terms \textit{H0}, \textit{H1}, \textit{H2}, \textit{V},
\textit{T} can be almost any combination of the following:
\begin{center}
  \begin{tabular}[h]{|ccc|c|c|}\hline
    \multicolumn{3}{|c|}{Horizontal} & Vertical & Time \\
    \textbf{H0}: location & \textbf{H1}: dimensions & \textbf{H2}: halo & \textbf{V}: location & \textbf{T}: level \\\hline
    \texttt{-} & xy & Hn & \texttt{-} & \texttt{-} \\
    U & x & Hy & i & t \\
    V & y & & c & \\
    Cen & & & & \\
    Cor & & & & \\\hline
  \end{tabular}
\end{center}
An example list of all pre-defined combinations is contained in the
file
\begin{center}
  \texttt{pkg/mnc/pre-defined\_grids.txt}.
\end{center}

The variable type is an association between a variable type name and
the following items:
\begin{center}
  \begin{tabular}[h]{|l|l|}\hline
    \textbf{Item} & \textbf{Purpose} \\\hline
    grid type & defines the in-memory arrangement \\
    \texttt{bi,bj} dimensions & tiling indices, if present \\\hline
  \end{tabular}
\end{center}
and is used by the \texttt{mnc\_cw\_*\_[R|W]} subroutines for reading
and writing variables.

\subsubsection{Using MNC: Examples}

Writing variables to NetCDF files can be accomplished in as few as two
function calls.
The first function call defines a variable type, associates it with a
name (character string), and provides additional information about the
indices for the tile (\texttt{bi},\texttt{bj}) dimensions. The second
function call will write the data at, if necessary, the current time
level within the model.

Examples of the initialization calls can be found in the file
\filelink{model/src/ini\_mnc\_io.F}{model-src-ini_mnc_io.F} where
these function calls:
{\footnotesize
\begin{verbatim}
C     Create MNC definitions for DYNVARS.h variables
      CALL MNC_CW_ADD_VNAME('iter', '-_-_--__-__t', 0,0, myThid)
      CALL MNC_CW_ADD_VATTR_TEXT('iter',1,
     &     'long_name','iteration_count', myThid)

      CALL MNC_CW_ADD_VNAME('model_time', '-_-_--__-__t', 0,0, myThid)
      CALL MNC_CW_ADD_VATTR_TEXT('model_time',1,
     &     'long_name','Model Time', myThid)
      CALL MNC_CW_ADD_VATTR_TEXT('model_time',1,'units','s', myThid)

      CALL MNC_CW_ADD_VNAME('U', 'U_xy_Hn__C__t', 4,5, myThid)
      CALL MNC_CW_ADD_VATTR_TEXT('U',1,'units','m/s', myThid)
      CALL MNC_CW_ADD_VATTR_TEXT('U',1,
     &     'coordinates','XU YU RC iter', myThid)

      CALL MNC_CW_ADD_VNAME('T', 'Cen_xy_Hn__C__t', 4,5, myThid)
      CALL MNC_CW_ADD_VATTR_TEXT('T',1,'units','degC', myThid)
      CALL MNC_CW_ADD_VATTR_TEXT('T',1,'long_name',
     &     'potential_temperature', myThid)
      CALL MNC_CW_ADD_VATTR_TEXT('T',1,
     &     'coordinates','XC YC RC iter', myThid)
\end{verbatim}
}
{\noindent initialize four \texttt{VNAME}s and add one or more NetCDF
  attributes to each.}

The four variables defined above are subsequently written at specific
time steps within
\filelink{model/src/write\_state.F}{model-src-write_state.F} using the
function calls:
{\footnotesize
\begin{verbatim}
C     Write dynvars using the MNC package
      CALL MNC_CW_SET_UDIM('state', -1, myThid)
      CALL MNC_CW_I_W('I','state',0,0,'iter', myIter, myThid)
      CALL MNC_CW_SET_UDIM('state', 0, myThid)
      CALL MNC_CW_RL_W('D','state',0,0,'model_time',myTime, myThid)
      CALL MNC_CW_RL_W('D','state',0,0,'U', uVel, myThid)
      CALL MNC_CW_RL_W('D','state',0,0,'T', theta, myThid)
\end{verbatim}
}

While it is easiest to write variables within typical 2D and 3D fields
where all data is known at a given time, it is also possible to write
fields where only a portion (\textit{e.g.} a ``slab'' or ``slice'') is
known at a given instant. An example is provided within
\filelink{pkg/mom\_vecinv/mom\_vecinv.F}{pkg-mom_vecinv-mom_vecinv.F}
where an offset vector is used:
{\footnotesize
\begin{verbatim}
      IF (useMNC .AND. snapshot_mnc) THEN
        CALL MNC_CW_RL_W_OFFSET('D','mom_vi',bi,bj, 'fV', uCf,
     &       offsets, myThid)
        CALL MNC_CW_RL_W_OFFSET('D','mom_vi',bi,bj, 'fU', vCf,
     &       offsets, myThid)
      ENDIF
\end{verbatim}
}
to write a 3D field one depth slice at a time.

Each element in the offset vector corresponds (in order) to the
dimensions of the ``full'' (or virtual) array and specifies which are
known at the time of the call. A zero within the offset array means
that all values along that dimension are available while a positive
integer means that only values along that index of the dimension are
available. In all cases, the array passed is assumed to start (that
is, have an in-memory structure) coinciding with the start of the
specified slice. Thus, using this offset array mechanism, a slice can
be written along any single dimension or combinations of dimensions.
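
As a sketch of how the offset vector might be prepared for a call like
the one above, the following fragment marks every dimension as fully
available and then selects a single vertical level. It is a fragment
rather than a complete routine: the array length of 9, the loop
variable \texttt{i}, and the level index \texttt{k} are assumptions
made for illustration, and the surrounding loop over levels is not
shown.
{\footnotesize
\begin{verbatim}
C     Sketch only: declarations assumed for illustration
      INTEGER offsets(9)
      INTEGER i, k

C     Zero: all values along that dimension are available
      DO i = 1,9
        offsets(i) = 0
      ENDDO
C     Only level k of the third (vertical) dimension is known
      offsets(3) = k
      CALL MNC_CW_RL_W_OFFSET('D','mom_vi',bi,bj, 'fV', uCf,
     &     offsets, myThid)
\end{verbatim}
}
Repeating such a call for each value of \texttt{k} would then fill the
3D field one depth slice at a time.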