--- manual/s_phys_pkgs/mnc.tex 2004/01/29 21:12:31 1.3 +++ manual/s_phys_pkgs/mnc.tex 2005/07/18 14:00:00 1.15 @@ -1,8 +1,11 @@ -% $Header: /home/ubuntu/mnt/e9_copy/manual/s_phys_pkgs/Attic/mnc.tex,v 1.3 2004/01/29 21:12:31 edhill Exp $ +% $Header: /home/ubuntu/mnt/e9_copy/manual/s_phys_pkgs/Attic/mnc.tex,v 1.15 2005/07/18 14:00:00 edhill Exp $ % $Name: $ -\section{NetCDF I/O Integration} +\section{NetCDF I/O Integration: MNC} \label{sec:pkg:mnc} +\begin{rawhtml} + +\end{rawhtml} The \texttt{mnc} package is a set of convenience routines written to expedite the process of creating, appending, and reading NetCDF files. @@ -17,14 +20,351 @@ \end{verbatim} \begin{rawhtml} \end{rawhtml} +Since it is a ``wrapper'' for netCDF, MNC depends upon the Fortran-77 +interface included with the standard netCDF v3.x library which is +often called \texttt{libnetcdf.a}. Please contact your local systems +administrators or the +\begin{rawhtml} \end{rawhtml} +MITgcm-support +\begin{rawhtml} \end{rawhtml} +list for help building and installing netCDF for your particular +platform. + + +\subsection{Using MNC} + +\subsubsection{MNC Configuration} + +As with all MITgcm packages, MNC can be turned on or off at compile time +using the \texttt{packages.conf} file or the \texttt{genmake2} +\texttt{-enable=mnc} or \texttt{-disable=mnc} switches. + +While MNC is likely to work ``as is'', there are a few compile--time +constants that may need to be increased for simulations that employ +large numbers of tiles within each process. Note that the important +quantity is the maximum number of tiles \textbf{per process}. Since +MPI configurations tend to distribute large numbers of tiles over +relatively large numbers of MPI processes, these constants will rarely +need to be increased. + +If MNC runs out of space within its ``lookup'' tables during a +simulation, then it will provide an error message along with a +recommendation of which parameter to increase. The parameters are all +located within \filelink{pkg/mnc/mnc\_common.h}{pkg-mnc-mnc_common.h} +and the ones that may need to be increased are: + +\begin{center} + {\footnotesize + \begin{tabular}[htb]{|l|r|l|}\hline + \textbf{Name} & + \textbf{Default} & \textbf{Description} \\\hline + & & \\ + \texttt{MNC\_MAX\_ID} & 1000 & + \textbf{IDs for various low-level entities} \\ + \texttt{MNC\_MAX\_INFO} & 400 & + \textbf{IDs (mostly for object sizes)} \\ + \texttt{MNC\_CW\_MAX\_I} & 150 & + \textbf{IDs for the ``wrapper'' layer} \\\hline + \end{tabular} + } +\end{center} + +In those rare cases where MNC ``out-of-memory'' error messages are +encountered, it is a good idea to increase the too-small parameter by +a factor of \textbf{2--10} in order to avoid wasting time on an +iterative compile--test sequence. + + +\subsubsection{MNC Inputs} + +Like most MITgcm packages, all of MNC can be turned on/off at runtime +using a single flag in \texttt{data.pkg} +\begin{center} + {\footnotesize + \begin{tabular}[htb]{|l|c|l|l|}\hline + \textbf{Name} & \textbf{T} & + \textbf{Default} & \textbf{Description} \\\hline + & & & \\ + \texttt{useMNC} & L & \texttt{.FALSE.} & + overall MNC ON/OFF switch \\\hline + \end{tabular} + } +\end{center} + +One important MNC--related flag is present in the main \texttt{data} +namelist file in the \texttt{PARM03} section and it is: +\begin{center} + {\footnotesize + \begin{tabular}[htb]{|l|c|l|l|}\hline + \textbf{Name} & \textbf{T} & + \textbf{Default} & \textbf{Description} \\\hline + & & & \\ + \texttt{outputTypesInclusive} & L & \texttt{.FALSE.} & + use all available output ``types'' \\\hline + \end{tabular} + } +\end{center} +which specifies that turning on MNC for a particular type of output +should not simultaneously turn off the default output method as it +normally does. Usually, this option is only used for debugging +purposes since it is inefficient to write output types using both MNC +and MDSIO or ASCII output. This option can also be helpful when +transitioning from MDSIO to MNC since the output can be readily +compared. + +For run-time configuration, most of the MNC--related model parameters +are contained within a Fortran namelist file called +\texttt{data.mnc}. The availabe parameters currently include: +\begin{center} + {\footnotesize + \begin{tabular}[htb]{|l|c|l|l|}\hline + \textbf{Name} & \textbf{T} & + \textbf{Default} & \textbf{Description} \\\hline + & & & \\ + \texttt{mnc\_use\_outdir} & L & \texttt{.FALSE.} & + create a directory for output \\ + \ \ \texttt{mnc\_outdir\_str} & S & \texttt{'mnc\_'} & + output directory name \\ + \ \ \texttt{mnc\_outdir\_date} & L & \texttt{.FALSE.} & + embed date in the outdir name \\ + \ \ \texttt{mnc\_outdir\_num} & L & \texttt{.FALSE.} & + optional \\ + \texttt{pickup\_write\_mnc} & L & \texttt{.FALSE.} & + use MNC to write pickup files \\ + \texttt{pickup\_read\_mnc} & L & \texttt{.FALSE.} & + use MNC to read pickup files \\ + \texttt{mnc\_use\_indir} & L & \texttt{.FALSE.} & + use a directory (path) for input \\ + \ \ \texttt{mnc\_indir\_str} & S & \texttt{''} & + input directory (or path) name \\ + \texttt{snapshot\_mnc} & L & \texttt{.FALSE.} & + write \texttt{snapshot} output w/MNC \\ + \texttt{monitor\_mnc} & L & \texttt{.FALSE.} & + write \texttt{monitor} output w/MNC \\ + \texttt{timeave\_mnc} & L & \texttt{.FALSE.} & + write \texttt{timeave} output w/MNC \\ + \texttt{autodiff\_mnc} & L & \texttt{.FALSE.} & + write \texttt{autodiff} output w/MNC \\ + \texttt{mnc\_max\_fsize} & R & 2.1e+09 & + max allowable file size \\ + \texttt{readgrid\_mnc} & L & \texttt{.FALSE.} & + read grid quantities using MNC \\ + \texttt{mnc\_echo\_gvtypes} & L & \texttt{.FALSE.} & + list pre-defined ``types'' (debug) \\\hline + \end{tabular} + } +\end{center} + +Unlike the older MDSIO method, MNC has the ability to create or use +existing output directories. If either \texttt{mnc\_outdir\_date} or +\texttt{mnc\_outdir\_num} is true, then MNC will try to create +directories on a \textit{PER PROCESS} basis for its output. This +means that a single directory will be created for a non-MPI run and +multiple directories (one per MPI process) will be created for an MPI +run. This approach was chosen since it works safely on both shared +global file systems (such as NFS and AFS) and on local +(per-compute-node) file systems. And if both +\texttt{mnc\_outdir\_date} and \texttt{mnc\_outdir\_num} are false, +then the MNC package will assume that the directory specified in +\texttt{mnc\_outdir\_str} already exists and will use it. This allows +the user to create and specify directories outside of the model. + +For input, MNC can use a single global input directory. This is a +just convenience that allows MNC to gather all of its input files from a +path other than the current working directory. As with MDSIO, the +default is to use the current working directory. + +The flags \texttt{snapshot\_mnc}, \texttt{monitor\_mnc}, +\texttt{timeave\_mnc}, and \texttt{autodiff\_mnc} allow the user to +turn on MNC for particular ``types'' of output. If a type is +selected, then MNC will be used for all output that matches that type. +This applies to output from the main model and from all of the +optional MITgcm packages. Mostly, the names used here correspond to +the names used for the output frequencies in the main \texttt{data} +namelist file. + +The \texttt{mnc\_max\_fsize} parameter is a convenience added to help +users work around common file size limitations. On many computer +systems, either the opterating system, the file system(s), and/or the +netCDF libraries are unable to handle files greater than two or four +gigabytes in size. The MNC package is able to work within this +limitation by creating new files which grow along the netCDF +``unlimited'' (usually, time) dimension. The default value for this +parameter is just slightly less than 2GB which is safe on virtually +all operating systems. Essentially, this feature is a way to +intelligently and automatically split files output along the unlimited +dimension. On systems that support large file sizes, these splits can +be readily concatenated (that is, un-done) using tools such as the +netCDF Operators (with \texttt{ncrcat}) which is available at: +\begin{rawhtml} \end{rawhtml} +\begin{verbatim} +http://nco.sourceforge.net/ +\end{verbatim} +\begin{rawhtml} \end{rawhtml} + +Additional MNC--related parameters may be contained within each +package. Please see the individual packages for descriptions of their +use of MNC. + + +\subsubsection{MNC Output} + +Depending upon the flags used, MNC will produce zero or more +directories containing one or more netCDF files as output. These +files are either mostly or entirely compliant with the netCDF ``CF'' +convention (v1.0) and any conformance issues will be fixed over time. +The patterns used for file names are: +\begin{center} +\texttt{BASENAME.nIter0.tileNum.seqNum.nc} +\end{center} +and an example is: +\begin{center} +\texttt{grid.0000000000.000001.0000.nc} +\end{center} +where \texttt{BASENAME} is the name selected to represent a set of +variables written together, \texttt{nIter0} is the starting iteration +number as specified in the main \texttt{data} namelist input file and +written in a zero-filled 10-digit format, \texttt{tileNum} is the +six-digit zero-filled tile number, \texttt{seqnum} is a four-digit +zero-filled sequence number used when maximum allowable files sizes +are too small to contain all of the output for a particular type +within one run (new files are created with sequential numbers as files +reach the maximum file size limit), and \texttt{.nc} is the file +suffix specified by the current netCDF ``CF'' conventions. + +Some example \texttt{BASENAME} values are: +\begin{description} +\item[grid] contains the variables that describe the various grid + constants related to locations, lengths, areas, etc. +\item[state] contains the variables output at the snapshot or + \texttt{dumpFreq} time frequency +\item[pickup.ckptA, pickup.ckptB] are the ``rolling'' checkpoint files +\item[tave] contains the time-averaged quantities from the main model +\end{description} + +All MNC output is currently done in a ``file-per-tile'' fashion since +most NetCDF v3.x implementions cannot write safely within MPI or +multi-threaded environments. This tiling is done in a global fashion +and the tile numbers are appended to the base names as described +above. Some scripts to manipulate MNC output are available at +\texttt{MITgcm/utils/matlab/} which includes a spatial ``assembly'' +script called \texttt{MITgcm/utils/matlab/mnc\_assembly.m}. + +More general manipulations can be performed on netCDF files with +\begin{rawhtml} \end{rawhtml} +\begin{verbatim} +the NetCDF Operators (``NCO'') +at http://nco.sourceforge.net +\end{verbatim} +\begin{rawhtml} \end{rawhtml} +or with +\begin{rawhtml} \end{rawhtml} +\begin{verbatim} +the Climate Data Operators (``CDO'') +at http://www.mpimet.mpg.de/~cdo/ +\end{verbatim} +\begin{rawhtml} \end{rawhtml} -\subsection{Introduction} +Unlike the older MDSIO routines, MNC reads and writes variables on +different ``grids'' depending upon their location on, for instance, an +Arakawa C--grid. The following table provides examples: +\begin{center} + {\footnotesize + \begin{tabular}[htb]{|l|c|c|c|}\hline + \textbf{Name} & \textbf{C--grid location} & + \textbf{\# in X} & \textbf{\# in Y} \\\hline + Temperature & mass & \texttt{sNx} & \texttt{sNy} \\ + Salinity & mass & \texttt{sNx} & \texttt{sNy} \\ + U velocity & U & \texttt{sNx+1} & \texttt{sNy} \\ + V velocity & V & \texttt{sNx} & \texttt{sNy+1} \\ + Vorticity & vorticity & \texttt{sNx+1} & \texttt{sNy+1} \\\hline + \end{tabular} + } +\end{center} +and the intent is two--fold: +\begin{enumerate} +\item For some grid topologies it is impossible to output all + quantities using only \texttt{sNx,sNy} arrays for every tile. Two + examples of this failure are the missing corners problem for + vorticity values on the cubesphere and the velocity edge values for + some open--boundary domains. +\item Writing quantities located on velocity or vorticity points with + the above scheme introduces a very small data redundancy. However, + any slight inconvenience is easily offset by the ease with which one + can, on every individual tile, interpolate these values to mass + points without having to perform an ``exchange'' (or + ``halo-filling'') operation to collect the values from neighboring + tiles. This makes the most common post--processing operations much + easier to implement. +\end{enumerate} + + +\subsection{MNC Troubleshooting} + +\subsubsection{Build Troubleshooting} + +In order to build MITgcm with MNC enabled, the netCDF v3.x Fortran-77 +(not Fortran-90) library must be available. This library is compposed +of a single header file (called \texttt{netcdf.inc}) and a single +library file (usually called \texttt{libnetcdf.a}) and it must be +built with the same compiler (or a binary-compatible compiler) with +compatible compiler options as the one used to build MITgcm. + +For more details concerning the netCDF build and install process, +please visit the netCDF home page at: +\begin{rawhtml} \end{rawhtml} +\begin{verbatim} +http://www.unidata.ucar.edu/packages/netcdf/ +\end{verbatim} +\begin{rawhtml} \end{rawhtml} +which includes an extensive list of known--good netCDF configurations +for various platforms + +\subsubsection{Runtime Troubleshooting} + +Please be aware of the following: + +\begin{itemize} +\item As a safety feature, the MNC package does not, by default, allow + pre-existing files to be appended to or overwritten. This is in + contrast to the older MDSIO package which will, without any warning, + overwrite existing files. If MITgcm aborts with an error message + about the inability to open or write to a netCDF file, please check + \textbf{first} whether you are attempting to overwrite files from a + previous run. + +\item The constraints placed upon the ``unlimited'' (or ``record'') + dimension inherent with NetCDF v3.x make it very inefficient to put + variables written at potentially different intervals within the same + file. For this reason, MNC output is split into groups of files + which attempt to reflect the nature of their content. + +\item On many systems, netCDF has practical file size limits on the + order of 2--4GB (the maximium memory addressable with 32bit pointers + or pointer differences) due to a lack of operating system, compiler, + and/or library support. The latest revisions of netCDF v3.x have + large file support and, on some operating systems, file sizes are + only limited by available disk space. + +\item There is an 80 character limit to the total length of all file + names. This limit includes the directory (or path) since paths and + file names are internally appended. Generally, file names will not + exceed the limit and paths can usually be shortened using, for + example, soft links. + +\item MNC does not (yet) provide a mechanism for reading information + from a single ``global'' file as can be done with the MDSIO + package. This is in progress. +\end{itemize} + + +\subsection{MNC Internals} The \texttt{mnc} package is a two-level convenience library (or ``wrapper'') for most of the NetCDF Fortran API. Its purpose is to streamline the user interface to NetCDF by maintaining internal -relations (``look-up tables'') keyed with strings (or ``names'') and -entities such as NetCDF files, variables, and attributes. +relations (look-up tables) keyed with strings (or names) and entities +such as NetCDF files, variables, and attributes. The two levels of the \texttt{mnc} package are: \begin{description} @@ -34,7 +374,7 @@ The upper level contains information about two kinds of associations: \begin{description} - \item[Grid type] is lookup table indexed with a grid type name. + \item[grid type] is lookup table indexed with a grid type name. Each grid type name is associated with a number of dimensions, the dimension sizes (one of which may be unlimited), and starting and ending index arrays. The intent is to store all the necessary @@ -44,7 +384,7 @@ shown in Figures \ref{fig:communication_primitives} and \ref{fig:tiling-strategy}). - \item[Variable type] is a lookup table indexed by a variable type + \item[variable type] is a lookup table indexed by a variable type name. For each name, the table contains a reference to a grid type for the variable and the names and values of various attributes. @@ -66,11 +406,146 @@ \end{description} -\subsection{Key subroutines, parameters and files} +\subsubsection{MNC Grid--Types and Variable--Types} -All of the variables used to implement the lookup tables are described -in \filelink{pkg/mnc/mnc\_common.h}{pkg/mnc/mnc_common.h}. +As a convenience for users, the MNC package includes numerous routines +to aid in the writing of data to NetCDF format. Probably the biggest +convenience is the use of pre-defined ``grid types'' and ``variable +types''. These ``types'' are simply look-up tables that store +dimensions, indicies, attributes, and other information that can all +be retrieved using a single character string. + +The ``grid types'' are a way of mapping variables within MITgcm to +NetCDF arrays. Within MITgcm, most spatial variables are defined +using two-- or three--dimensional arrays with ``overlap'' regions (see +Figures \ref{fig:communication_primitives}, a possible vertical index, +and \ref{fig:tiling-strategy}) and tile indicies such as the following +``U'' velocity: +\begin{verbatim} + _RL uVel (1-OLx:sNx+OLx,1-OLy:sNy+OLy,Nr,nSx,nSy) +\end{verbatim} +as defined in \filelink{model/inc/DYNVARS.h}{model-inc-DYNVARS.h} +The grid type is a character string that encodes the presence and +types associated with the four possible dimensions. The character +string follows the format +\begin{center} + \texttt{H0\_H1\_H2\_\_V\_\_T} +\end{center} +where the terms \textit{H0}, \textit{H1}, \textit{H2}, \textit{V}, +\textit{T} can be almost any combination of the following: +\begin{center} + \begin{tabular}[h]{|ccc|c|c|}\hline + \multicolumn{3}{|c|}{Horizontal} & Vertical & Time \\ + \textbf{H0}: location & \textbf{H1}: dimensions & \textbf{H2}: halo + & \textbf{V}: location & \textbf{T}: level \\\hline + \texttt{-} & xy & Hn & \texttt{-} & \texttt{-} \\ + U & x & Hy & i & t \\ + V & y & & c & \\ + Cen & & & & \\ + Cor & & & & \\\hline + \end{tabular} +\end{center} +A example list of all pre-defined combinations is contained in the +file +\begin{center} + \texttt{pkg/mnc/pre-defined\_grids.txt}. +\end{center} + +The variable type is an association between a variable type name and the +following items: +\begin{center} + \begin{tabular}[h]{|l|l|}\hline + \textbf{Item} & \textbf{Purpose} \\\hline + grid type & defines the in-memory arrangement \\ + \texttt{bi,bj} dimensions & tiling indices, if present \\\hline + \end{tabular} +\end{center} +and is used by the \texttt{mnc\_cw\_*\_[R|W]} subroutines for reading +and writing variables. + + +\subsubsection{Using MNC: Examples} + +Writing variables to NetCDF files can be accomplished in as few as two +function calls. The first function call defines a variable type, +associates it with a name (character string), and provides additional +information about the indicies for the tile (\texttt{bi},\texttt{bj}) +dimensions. The second function call will write the data at, if +necessary, the current time level within the model. + +Examples of the initialization calls can be found in the file +\filelink{model/src/ini\_mnc\_io.F}{model-src-ini_mnc_io.F} +where these function calls: +{\footnotesize +\begin{verbatim} +C Create MNC definitions for DYNVARS.h variables + CALL MNC_CW_ADD_VNAME('iter', '-_-_--__-__t', 0,0, myThid) + CALL MNC_CW_ADD_VATTR_TEXT('iter',1, + & 'long_name','iteration_count', myThid) + + CALL MNC_CW_ADD_VNAME('model_time', '-_-_--__-__t', 0,0, myThid) + CALL MNC_CW_ADD_VATTR_TEXT('model_time',1, + & 'long_name','Model Time', myThid) + CALL MNC_CW_ADD_VATTR_TEXT('model_time',1,'units','s', myThid) + + CALL MNC_CW_ADD_VNAME('U', 'U_xy_Hn__C__t', 4,5, myThid) + CALL MNC_CW_ADD_VATTR_TEXT('U',1,'units','m/s', myThid) + CALL MNC_CW_ADD_VATTR_TEXT('U',1, + & 'coordinates','XU YU RC iter', myThid) + + CALL MNC_CW_ADD_VNAME('T', 'Cen_xy_Hn__C__t', 4,5, myThid) + CALL MNC_CW_ADD_VATTR_TEXT('T',1,'units','degC', myThid) + CALL MNC_CW_ADD_VATTR_TEXT('T',1,'long_name', + & 'potential_temperature', myThid) + CALL MNC_CW_ADD_VATTR_TEXT('T',1, + & 'coordinates','XC YC RC iter', myThid) +\end{verbatim} +} +{\noindent initialize four \texttt{VNAME}s and add one or more NetCDF + attributes to each.} + +The four variables defined above are subsequently written at specific +time steps within +\filelink{model/src/write\_state.F}{model-src-write_state.F} +using the function calls: +{\footnotesize +\begin{verbatim} +C Write dynvars using the MNC package + CALL MNC_CW_SET_UDIM('state', -1, myThid) + CALL MNC_CW_I_W('I','state',0,0,'iter', myIter, myThid) + CALL MNC_CW_SET_UDIM('state', 0, myThid) + CALL MNC_CW_RL_W('D','state',0,0,'model_time',myTime, myThid) + CALL MNC_CW_RL_W('D','state',0,0,'U', uVel, myThid) + CALL MNC_CW_RL_W('D','state',0,0,'T', theta, myThid) +\end{verbatim} +} + +While it is easiest to write variables within typical 2D and 3D fields +where all data is known at a given time, it is also possible to write +fields where only a portion (\textit{eg.} a ``slab'' or ``slice'') is +known at a given instant. An example is provided within +\filelink{pkg/mom\_vecinv/mom\_vecinv.F}{pkg-mom_vecinv-mom_vecinv.F} +where an offset vector is used: {\footnotesize +\begin{verbatim} + IF (useMNC .AND. snapshot_mnc) THEN + CALL MNC_CW_RL_W_OFFSET('D','mom_vi',bi,bj, 'fV', uCf, + & offsets, myThid) + CALL MNC_CW_RL_W_OFFSET('D','mom_vi',bi,bj, 'fU', vCf, + & offsets, myThid) + ENDIF +\end{verbatim} +} +to write a 3D field one depth slice at a time. +Each element in the offset vector corresponds (in order) to the +dimensions of the ``full'' (or virtual) array and specifies which are +known at the time of the call. A zero within the offset array means +that all values along that dimension are available while a positive +integer means that only values along that index of the dimension are +available. In all cases, the matrix passed is assumed to start (that +is, have an in-memory structure) coinciding with the start of the +specified slice. Thus, using this offset array mechanism, a slice +can be written along any single dimension or combinations of +dimensions. -\subsection{Package Reference}