--- manual/s_phys_pkgs/mnc.tex 2004/01/28 17:44:35 1.1 +++ manual/s_phys_pkgs/mnc.tex 2004/12/11 22:03:32 1.14 @@ -1,34 +1,393 @@ -% $Header: /home/ubuntu/mnt/e9_copy/manual/s_phys_pkgs/Attic/mnc.tex,v 1.1 2004/01/28 17:44:35 edhill Exp $ +% $Header: /home/ubuntu/mnt/e9_copy/manual/s_phys_pkgs/Attic/mnc.tex,v 1.14 2004/12/11 22:03:32 edhill Exp $ % $Name: $ -%% * Introduction -%% o what it does, citations (refs go into mitgcm_manual.bib, -%% preferably in alphabetic order) -%% o Equations -%% * Key subroutines and parameters -%% * Reference material (auto generated from Protex and structured comments) -%% o automatically inserted at \section{Reference} - - -\section{MNC: the MITgcm NetCDF Package} - -\subsection{Introduction} - -The MNC package is a set of convenience routines written to expedite -the process of creating, appending, and reading NetCDF files. NetCDF -is a self-describing file format \cite{rew:97} intended primarily for -scientific data. NetCDF reference papers, user guides, FAQs, and other -information can be obtained from UCAR's web site at: +\section{NetCDF I/O Integration: MNC} +\label{sec:pkg:mnc} +\begin{rawhtml} + +\end{rawhtml} + +The \texttt{mnc} package is a set of convenience routines written to +expedite the process of creating, appending, and reading NetCDF files. +NetCDF is an increasingly popular self-describing file format +\cite{rew:97} intended primarily for scientific data sets. An +extensive collection of NetCDF reference papers, user guides, +software, FAQs, and other information can be obtained from UCAR's web +site at: +\begin{rawhtml} \end{rawhtml} +\begin{verbatim} +http://www.unidata.ucar.edu/packages/netcdf/ +\end{verbatim} +\begin{rawhtml} \end{rawhtml} + + +\subsection{Using MNC} + +\subsubsection{MNC Configuration} + +As with all MITgcm packages, MNC can be turned on or off at compile time +using the \texttt{packages.conf} file or the \texttt{genmake2} +\texttt{-enable=mnc} or \texttt{-disable=mnc} switches. 
+ +While MNC is likely to work ``as is'', there are a few compile--time +constants that may need to be increased for simulations that employ +large numbers of tiles within each process. Note that the important +quantity is the maximum number of tiles \textbf{per process}. Since +MPI configurations tend to distribute large numbers of tiles over +relatively large numbers of MPI processes, these constants will rarely +need to be increased. + +If MNC runs out of space within its ``lookup'' tables during a +simulation, then it will provide an error message along with a +recommendation of which parameter to increase. The parameters are all +located within \filelink{pkg/mnc/mnc\_common.h}{pkg-mnc-mnc_common.h} +and the ones that may need to be increased are: + +\begin{center} + {\footnotesize + \begin{tabular}[htb]{|l|r|l|}\hline + \textbf{Name} & + \textbf{Default} & \textbf{Description} \\\hline + & & \\ + \texttt{MNC\_MAX\_ID} & 1000 & + \textbf{IDs for various low-level entities} \\ + \texttt{MNC\_MAX\_INFO} & 400 & + \textbf{IDs (mostly for object sizes)} \\ + \texttt{MNC\_CW\_MAX\_I} & 150 & + \textbf{IDs for the ``wrapper'' layer} \\\hline + \end{tabular} + } +\end{center} + +In those rare cases where MNC ``out-of-memory'' error messages are +encountered, it is a good idea to increase the too-small parameter by +a factor of \textbf{2--10} in order to avoid wasting time on an +iterative compile--test sequence. + + +\subsubsection{MNC Inputs} + +For run-time configuration, most of the MNC--related model parameters +are contained within a Fortran namelist file called \texttt{data.mnc}. +If this file does not exist, then the MNC package will interpret that +as an indication that it is not to be used. 
If the \texttt{data.mnc} +file does exist, then it may contain the following parameters: + +\begin{center} + {\footnotesize + \begin{tabular}[htb]{|l|c|l|l|}\hline + \textbf{Name} & \textbf{T} & + \textbf{Default} & \textbf{Description} \\\hline + & & & \\ + \texttt{useMNC} & L & \texttt{.FALSE.} & + \textbf{overall MNC ON/OFF switch} \\ + \texttt{mnc\_echo\_gvtypes} & L & \texttt{.FALSE.} & + echo pre-defined ``types'' (debugging) \\ + \texttt{mnc\_use\_outdir} & L & \texttt{.FALSE.} & + create a directory for output \\ + \texttt{mnc\_outdir\_str} & S & \texttt{'mnc\_'} & + output directory name \\ + \texttt{mnc\_outdir\_date} & L & \texttt{.FALSE.} & + embed date in the output dir name \\ + \texttt{pickup\_write\_mnc} & L & \texttt{.FALSE.} & + use MNC to write (create) pickup files \\ + \texttt{pickup\_read\_mnc} & L & \texttt{.FALSE.} & + use MNC to read pickup files \\ + \texttt{mnc\_use\_indir} & L & \texttt{.FALSE.} & + use a directory (path) for input \\ + \texttt{mnc\_indir\_str} & S & \texttt{''} & + input directory (or path) name \\ + \texttt{snapshot\_mnc} & L & \texttt{.FALSE.} & + write \texttt{snapshot} (instantaneous) w/MNC \\ + \texttt{monitor\_mnc} & L & \texttt{.FALSE.} & + write \texttt{monitor} w/MNC \\ + \texttt{timeave\_mnc} & L & \texttt{.FALSE.} & + write \texttt{timeave} w/MNC \\ + \texttt{autodiff\_mnc} & L & \texttt{.FALSE.} & + write \texttt{autodiff} w/MNC \\\hline + \end{tabular} + } +\end{center} + +Additional MNC--related parameters are contained within the main +\texttt{data} namelist file and in some of the namelist files for +individual packages. 
These options are: +\begin{center} + {\footnotesize + \begin{tabular}[htb]{|l|c|l|l|}\hline + \textbf{Name} & \textbf{T} & + \textbf{Default} & \textbf{Description} \\\hline + \multicolumn{4}{|c|}{\ } \\ + \multicolumn{4}{|c|}{Main namelist file: + ``\textbf{data}''} \\\hline + \texttt{snapshot\_ioinc} & L & \texttt{.FALSE.} & + write \texttt{snapshot} ``inclusively'' \\ + \texttt{timeave\_ioinc} & L & \texttt{.FALSE.} & + write \texttt{timeave} ``inclusively'' \\ + \texttt{monitor\_ioinc} & L & \texttt{.FALSE.} & + write \texttt{monitor} ``inclusively'' \\ + \texttt{the\_run\_name} & C & ``name...'' & + name is included in all MNC output \\\hline + \multicolumn{4}{|c|}{\ } \\ + \multicolumn{4}{|c|}{Diagnostics namelist file: + ``\textbf{data.diagnostics}''} \\\hline + \texttt{diag\_mnc} & L & \texttt{.FALSE.} & + write \texttt{diagnostics} w/MNC \\ + \texttt{diag\_ioinc} & L & \texttt{.FALSE.} & + write \texttt{diagnostics} ``inclusively'' \\\hline + \end{tabular} + } +\end{center} + +By default, turning on MNC for a particular output type will result in +turning off all the corresponding (usually, default) MDSIO or STDOUT +output mechanisms. In other words, output defaults to being an +exclusive selection. To enable multiple kinds of simultaneous output, +flags of the form \texttt{NAME\_ioinc} have been created where +\texttt{NAME} corresponds to the various MNC output flags. When a +\texttt{NAME\_ioinc} flag is set to \texttt{.TRUE.}, then multiple +simultaneous forms of output are allowed for the \texttt{NAME} output +mechanism. The intent of this design is that typical users will only +want one kind of output while people debugging the code (particularly +the I/O routines) may want simultaneous types of output. + +This ``inclusive'' versus ``exclusive'' design is easily applied in +cases where three or more kinds of output may be generated. Thus, it +can be readily extended to additional new output types (e.g., HDF5).
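+
+As a hedged illustration of these settings, a \texttt{data.mnc} file
+that routes snapshot and monitor output through MNC might resemble the
+sketch below. The group name \texttt{MNC\_01} and the directory prefix
+\texttt{'mnc\_out\_'} are assumptions made for this example; the
+parameter names are those listed in the \texttt{data.mnc} table.
+{\footnotesize
+\begin{verbatim}
+# Sketch only: the group name MNC_01 and the value 'mnc_out_' are assumed
+ &MNC_01
+  useMNC=.TRUE.,
+  mnc_use_outdir=.TRUE.,
+  mnc_outdir_str='mnc_out_',
+  snapshot_mnc=.TRUE.,
+  monitor_mnc=.TRUE.
+ /
+\end{verbatim}
+}
+With only these flags set, the corresponding MDSIO and STDOUT output is
+switched off; setting \texttt{snapshot\_ioinc} to \texttt{.TRUE.} in
+\texttt{data} would keep both forms of snapshot output.
+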
-\begin{itemize} -\item http://www.unidata.ucar.edu/packages/netcdf/ -\end{itemize} +Input types are always exclusive. +\subsubsection{MNC Output} +While NetCDF files are supposed to be ``self-describing'', it is +helpful to note the following: +\begin{itemize} +\item The constraints placed upon the ``unlimited'' (or ``record'') + dimension inherent with NetCDF v3.x make it very inefficient to put + variables written at potentially different intervals within the same + file. For this reason, MNC output is split into a few file ``base + names'' which try to reflect the nature of their content. + +\item All MNC output is currently done in a ``tile-per-file'' fashion + since most NetCDF v3.x implementations cannot write safely within MPI + or multi-threaded environments. This tiling is done in a global + fashion and the tile numbers are appended to the base names + described above. Some scripts to ``assemble'' output are available + (\texttt{MITgcm/utils/matlab}). More general manipulations can be + accomplished with the + \begin{rawhtml} + + \end{rawhtml} +\begin{verbatim} +NetCDF Operators (or ``NCO'') at http://nco.sourceforge.net +\end{verbatim} + \begin{rawhtml} \end{rawhtml} + which is a very powerful and convenient set of tools for working + with all NetCDF files. + +\item On many systems, NetCDF has practical file size limits on the + order of 2--4GB (the maximum memory addressable with 32-bit + pointers) due to a lack of operating system, compiler, and/or + library support. In cases where this limit is reached, it is + generally a good idea to reduce write frequencies or restart from + pickups. + +\item MNC does not (yet) provide a mechanism for reading information + from a single ``global'' file as can be done with the MDSIO + package. This is in progress. + +\end{itemize} -\subsection{Key Routines} +\subsection{MNC Internals} +The \texttt{mnc} package is a two-level convenience library (or +``wrapper'') for most of the NetCDF Fortran API.
Its purpose is to +streamline the user interface to NetCDF by maintaining internal +relations (look-up tables) keyed with strings (or names) and entities +such as NetCDF files, variables, and attributes. + +The two levels of the \texttt{mnc} package are: +\begin{description} + +\item[Upper level] \ + + The upper level contains information about two kinds of + associations: + \begin{description} + \item[grid type] is a lookup table indexed by a grid type name. + Each grid type name is associated with a number of dimensions, the + dimension sizes (one of which may be unlimited), and starting and + ending index arrays. The intent is to store all the necessary + size and shape information for the Fortran arrays containing + MITgcm--style ``tile'' variables (that is, a central region + surrounded by a variably-sized ``halo'' or exchange region as + shown in Figures \ref{fig:communication_primitives} and + \ref{fig:tiling-strategy}). + + \item[variable type] is a lookup table indexed by a variable type + name. For each name, the table contains a reference to a grid + type for the variable and the names and values of various + attributes. + \end{description} + + Within the upper level, these associations are not permanently tied + to any particular NetCDF file. This allows the information to be + re-used over multiple file reads and writes. + +\item[Lower level] \ + + In the lower (or internal) level, associations are stored for NetCDF + files and many of the entities that they contain including + dimensions, variables, and global attributes. All associations are + on a per-file basis. Thus, each entity is tied to a unique NetCDF + file and will be created or destroyed when files are, respectively, + opened or closed. + +\end{description} + + +\subsubsection{MNC Grid--Types and Variable--Types} + +As a convenience for users, the MNC package includes numerous routines +to aid in the writing of data to NetCDF format.
Probably the biggest +convenience is the use of pre-defined ``grid types'' and ``variable +types''. These ``types'' are simply look-up tables that store +dimensions, indices, attributes, and other information that can all +be retrieved using a single character string. + +The ``grid types'' are a way of mapping variables within MITgcm to +NetCDF arrays. Within MITgcm, most spatial variables are defined +using two-- or three--dimensional arrays with ``overlap'' regions (see +Figures \ref{fig:communication_primitives} and +\ref{fig:tiling-strategy}), a possible vertical index, and tile +indices such as the following ``U'' velocity: +\begin{verbatim} + _RL uVel (1-OLx:sNx+OLx,1-OLy:sNy+OLy,Nr,nSx,nSy) +\end{verbatim} +as defined in \filelink{model/inc/DYNVARS.h}{model-inc-DYNVARS.h}. + +The grid type is a character string that encodes the presence and +types associated with the four possible dimensions. The character +string follows the format +\begin{center} + \texttt{H0\_H1\_H2\_\_V\_\_T} +\end{center} +where the terms \textit{H0}, \textit{H1}, \textit{H2}, \textit{V}, +\textit{T} can be almost any combination of the following: +\begin{center} + \begin{tabular}[h]{|ccc|c|c|}\hline + \multicolumn{3}{|c|}{Horizontal} & Vertical & Time \\ + \textbf{H0}: location & \textbf{H1}: dimensions & \textbf{H2}: halo + & \textbf{V}: location & \textbf{T}: level \\\hline + \texttt{-} & xy & Hn & \texttt{-} & \texttt{-} \\ + U & x & Hy & i & t \\ + V & y & & c & \\ + Cen & & & & \\ + Cor & & & & \\\hline + \end{tabular} +\end{center} +An example list of all pre-defined combinations is contained in the +file +\begin{center} + \texttt{pkg/mnc/pre-defined\_grids.txt}.
+\end{center} + +The variable type is an association between a variable type name and the +following items: +\begin{center} + \begin{tabular}[h]{|l|l|}\hline + \textbf{Item} & \textbf{Purpose} \\\hline + grid type & defines the in-memory arrangement \\ + \texttt{bi,bj} dimensions & tiling indices, if present \\\hline + \end{tabular} +\end{center} +and is used by the \texttt{mnc\_cw\_*\_[R|W]} subroutines for reading +and writing variables. + + +\subsubsection{Using MNC: Examples} + +Writing variables to NetCDF files can be accomplished in as few as two +function calls. The first function call defines a variable type, +associates it with a name (character string), and provides additional +information about the indices for the tile (\texttt{bi},\texttt{bj}) +dimensions. The second function call will write the data at, if +necessary, the current time level within the model. + +Examples of the initialization calls can be found in the file +\filelink{model/src/ini\_mnc\_io.F}{model-src-ini_mnc_io.F} +where these function calls: +{\footnotesize +\begin{verbatim} +C Create MNC definitions for DYNVARS.h variables + CALL MNC_CW_ADD_VNAME('iter', '-_-_--__-__t', 0,0, myThid) + CALL MNC_CW_ADD_VATTR_TEXT('iter',1, + & 'long_name','iteration_count', myThid) + + CALL MNC_CW_ADD_VNAME('model_time', '-_-_--__-__t', 0,0, myThid) + CALL MNC_CW_ADD_VATTR_TEXT('model_time',1, + & 'long_name','Model Time', myThid) + CALL MNC_CW_ADD_VATTR_TEXT('model_time',1,'units','s', myThid) + + CALL MNC_CW_ADD_VNAME('U', 'U_xy_Hn__C__t', 4,5, myThid) + CALL MNC_CW_ADD_VATTR_TEXT('U',1,'units','m/s', myThid) + CALL MNC_CW_ADD_VATTR_TEXT('U',1, + & 'coordinates','XU YU RC iter', myThid) + + CALL MNC_CW_ADD_VNAME('T', 'Cen_xy_Hn__C__t', 4,5, myThid) + CALL MNC_CW_ADD_VATTR_TEXT('T',1,'units','degC', myThid) + CALL MNC_CW_ADD_VATTR_TEXT('T',1,'long_name', + & 'potential_temperature', myThid) + CALL MNC_CW_ADD_VATTR_TEXT('T',1, + & 'coordinates','XC YC RC iter', myThid) +\end{verbatim} +} +{\noindent
initialize four \texttt{VNAME}s and add one or more NetCDF + attributes to each.} + +The four variables defined above are subsequently written at specific +time steps within +\filelink{model/src/write\_state.F}{model-src-write_state.F} +using the function calls: +{\footnotesize +\begin{verbatim} +C Write dynvars using the MNC package + CALL MNC_CW_SET_UDIM('state', -1, myThid) + CALL MNC_CW_I_W('I','state',0,0,'iter', myIter, myThid) + CALL MNC_CW_SET_UDIM('state', 0, myThid) + CALL MNC_CW_RL_W('D','state',0,0,'model_time',myTime, myThid) + CALL MNC_CW_RL_W('D','state',0,0,'U', uVel, myThid) + CALL MNC_CW_RL_W('D','state',0,0,'T', theta, myThid) +\end{verbatim} +} + +While it is easiest to write variables within typical 2D and 3D fields +where all data is known at a given time, it is also possible to write +fields where only a portion (\textit{e.g.} a ``slab'' or ``slice'') is +known at a given instant. An example is provided within +\filelink{pkg/mom\_vecinv/mom\_vecinv.F}{pkg-mom_vecinv-mom_vecinv.F} +where an offset vector is used: {\footnotesize +\begin{verbatim} + IF (useMNC .AND. snapshot_mnc) THEN + CALL MNC_CW_RL_W_OFFSET('D','mom_vi',bi,bj, 'fV', uCf, + & offsets, myThid) + CALL MNC_CW_RL_W_OFFSET('D','mom_vi',bi,bj, 'fU', vCf, + & offsets, myThid) + ENDIF +\end{verbatim} +} +to write a 3D field one depth slice at a time. + +Each element in the offset vector corresponds (in order) to the +dimensions of the ``full'' (or virtual) array and specifies which are +known at the time of the call. A zero within the offset array means +that all values along that dimension are available while a positive +integer means that only values along that index of the dimension are +available. In all cases, the matrix passed is assumed to start (that +is, have an in-memory structure) coinciding with the start of the +specified slice. Thus, using this offset array mechanism, a slice +can be written along any single dimension or combinations of +dimensions.
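+
+The offset convention can be sketched with a small stand-alone
+program. This is an illustration only and is not taken from the MITgcm
+sources: the array sizes, the level \texttt{k}, and the names
+\texttt{dimLen}, \texttt{iStart}, and \texttt{iCount} are assumptions,
+and the translation into a first index and an extent per dimension is
+one plausible reading of the rules above rather than a description of
+what MNC does internally.
+{\footnotesize
+\begin{verbatim}
+      PROGRAM OFFDEMO
+      IMPLICIT NONE
+C     Hypothetical "full" (virtual) array with x, y, depth dimensions
+      INTEGER nDims
+      PARAMETER ( nDims = 3 )
+      INTEGER dimLen(nDims), offsets(nDims)
+      INTEGER iStart(nDims), iCount(nDims)
+      INTEGER i, k
+      DATA dimLen / 30, 20, 15 /
+C     Assumed depth level that is known at the time of the call
+      k = 4
+C     Zero: every value along the dimension is available
+      DO i = 1, nDims
+        offsets(i) = 0
+      ENDDO
+C     Positive: only index k of the third dimension is available
+      offsets(3) = k
+C     Translate each offset into a first index and an extent
+      DO i = 1, nDims
+        IF ( offsets(i) .EQ. 0 ) THEN
+          iStart(i) = 1
+          iCount(i) = dimLen(i)
+        ELSE
+          iStart(i) = offsets(i)
+          iCount(i) = 1
+        ENDIF
+      ENDDO
+      WRITE(*,*) 'start =', iStart
+      WRITE(*,*) 'count =', iCount
+      END
+\end{verbatim}
+}
+Setting more than one element to a positive value selects a slice
+along a combination of dimensions in the same way.
+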
-\subsection{References}