% $Header: /home/ubuntu/mnt/e9_copy/manual/s_phys_pkgs/Attic/mnc.tex,v 1.16 2005/07/18 20:45:27 molod dead $
% $Name:  $

\section{NetCDF I/O Integration: MNC}
\label{sec:pkg:mnc}
\begin{rawhtml}
<!-- CMIREDIR:package_mnc: -->
\end{rawhtml}

The \texttt{mnc} package is a set of convenience routines written to
expedite the process of creating, appending, and reading NetCDF files.
NetCDF is an increasingly popular self-describing file format
\cite{rew:97} intended primarily for scientific data sets.  An
extensive collection of NetCDF reference papers, user guides,
software, FAQs, and other information can be obtained from UCAR's web
site at:
\begin{rawhtml} <A href="http://www.unidata.ucar.edu/packages/netcdf/"> \end{rawhtml}
\begin{verbatim}
http://www.unidata.ucar.edu/packages/netcdf/
\end{verbatim}
\begin{rawhtml} </A> \end{rawhtml}

Since it is a ``wrapper'' for netCDF, MNC depends upon the Fortran-77
interface included with the standard netCDF v3.x library, which is
often called \texttt{libnetcdf.a}.  Please contact your local systems
administrators or the
\begin{rawhtml} <A href="mailto:mitgcm-support@mitgcm.org"> \end{rawhtml}
MITgcm-support
\begin{rawhtml} </A> \end{rawhtml}
list for help building and installing netCDF for your particular
platform.


\subsection{Using MNC}

\subsubsection{MNC Configuration}

As with all MITgcm packages, MNC can be turned on or off at compile time
using the \texttt{packages.conf} file or the \texttt{genmake2}
\texttt{-enable=mnc} or \texttt{-disable=mnc} switches.

While MNC is likely to work ``as is'', there are a few compile--time
constants that may need to be increased for simulations that employ
large numbers of tiles within each process.  Note that the important
quantity is the maximum number of tiles \textbf{per process}.  Since
MPI configurations tend to distribute large numbers of tiles over
relatively large numbers of MPI processes, these constants will rarely
need to be increased.

If MNC runs out of space within its ``lookup'' tables during a
simulation, then it will provide an error message along with a
recommendation of which parameter to increase.  The parameters are all
located within \filelink{pkg/mnc/mnc\_common.h}{pkg-mnc-mnc_common.h}
and the ones that may need to be increased are:

\begin{center}
  {\footnotesize
    \begin{tabular}[htb]{|l|r|l|}\hline
      \textbf{Name}  &  
      \textbf{Default}  &  \textbf{Description}  \\\hline
      &  &  \\
      \texttt{MNC\_MAX\_ID}  &  1000  & 
      IDs for various low-level entities  \\
      \texttt{MNC\_MAX\_INFO}  &   400  & 
      IDs (mostly for object sizes)  \\
      \texttt{MNC\_CW\_MAX\_I}  &  150  & 
      IDs for the ``wrapper'' layer  \\\hline
    \end{tabular}
  }
\end{center}

In those rare cases where MNC ``out-of-memory'' error messages are
encountered, it is a good idea to increase the too-small parameter by
a factor of \textbf{2--10} in order to avoid wasting time on an
iterative compile--test sequence.


\subsubsection{MNC Inputs}

Like most MITgcm packages, all of MNC can be turned on/off at runtime
using a single flag in \texttt{data.pkg}
\begin{center}
  {\footnotesize
    \begin{tabular}[htb]{|l|c|l|l|}\hline
      \textbf{Name}  &  \textbf{T}  &  
      \textbf{Default}  &  \textbf{Description}  \\\hline
      &  &  &  \\
      \texttt{useMNC}  &  L  & \texttt{.FALSE.}  &  
      overall MNC ON/OFF switch  \\\hline
    \end{tabular}
  }
\end{center}
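
As a minimal sketch, MNC could be switched on at run time with a
\texttt{data.pkg} fragment such as the one below.  The namelist group
name \texttt{PACKAGES}, the \texttt{\#} comment line, and the closing
\texttt{\&} terminator are assumptions based on common MITgcm usage
and are not specified in this section, so they should be checked
against the \texttt{data.pkg} file of an existing experiment.
{\footnotesize
\begin{verbatim}
# Sketch only: the group name and terminator are assumed
 &PACKAGES
 useMNC=.TRUE.,
 &
\end{verbatim}
}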

One important MNC--related flag is present in the main \texttt{data}
namelist file in the \texttt{PARM03} section and it is:
\begin{center}
  {\footnotesize
    \begin{tabular}[htb]{|l|c|l|l|}\hline
      \textbf{Name}  &  \textbf{T}  &  
      \textbf{Default}  &  \textbf{Description}  \\\hline
      &  &  &  \\
      \texttt{outputTypesInclusive}  &  L  & \texttt{.FALSE.}  &  
      use all available output ``types''  \\\hline
    \end{tabular}
  }
\end{center}
which specifies that turning on MNC for a particular type of output
should not simultaneously turn off the default output method as it
normally does.  Usually, this option is only used for debugging
purposes since it is inefficient to write output types using both MNC
and MDSIO or ASCII output.  This option can also be helpful when
transitioning from MDSIO to MNC since the output can be readily
compared.
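
For example, during a transition from MDSIO to MNC, a fragment such as
the following in the main \texttt{data} file would ask for both output
methods to remain active.  Only the relevant entry is shown; the other
\texttt{PARM03} entries of a real experiment are omitted, and the
\texttt{\#} comment line and closing \texttt{\&} terminator are
assumptions.
{\footnotesize
\begin{verbatim}
# Sketch only: other PARM03 entries omitted, terminator assumed
 &PARM03
 outputTypesInclusive=.TRUE.,
 &
\end{verbatim}
}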

For run-time configuration, most of the MNC--related model parameters
are contained within a Fortran namelist file called
\texttt{data.mnc}.  The available parameters currently include:
\begin{center}
  {\footnotesize
    \begin{tabular}[htb]{|l|c|l|l|}\hline
      \textbf{Name}  &  \textbf{T}  &  
      \textbf{Default}  &  \textbf{Description}  \\\hline
      &  &  &  \\
      \texttt{mnc\_use\_outdir}  &  L  & \texttt{.FALSE.}  &  
      create a directory for output  \\
      \ \ \texttt{mnc\_outdir\_str}  &  S  & \texttt{'mnc\_'}  &  
      output directory name \\
      \ \ \texttt{mnc\_outdir\_date}  &  L  & \texttt{.FALSE.}  &  
      embed date in the outdir name  \\
      \ \ \texttt{mnc\_outdir\_num}  &  L  & \texttt{.FALSE.}  &  
      optional  \\
      \texttt{pickup\_write\_mnc}  &  L  & \texttt{.FALSE.}  &  
      use MNC to write pickup files  \\
      \texttt{pickup\_read\_mnc}  &  L  & \texttt{.FALSE.}  &  
      use MNC to read pickup files  \\
      \texttt{mnc\_use\_indir}  &  L  & \texttt{.FALSE.}  &  
      use a directory (path) for input  \\
      \ \ \texttt{mnc\_indir\_str}  &  S  & \texttt{''}  &  
      input directory (or path) name  \\
      \texttt{snapshot\_mnc}  &  L  & \texttt{.FALSE.}  &  
      write \texttt{snapshot} output w/MNC  \\
      \texttt{monitor\_mnc}  &  L  & \texttt{.FALSE.}  &  
      write \texttt{monitor} output w/MNC  \\
      \texttt{timeave\_mnc}  &  L  & \texttt{.FALSE.}  &  
      write \texttt{timeave} output w/MNC  \\
      \texttt{autodiff\_mnc}  &  L  & \texttt{.FALSE.}  &  
      write \texttt{autodiff} output w/MNC  \\
      \texttt{mnc\_max\_fsize}  &  R  & 2.1e+09  &  
      max allowable file size  \\
      \texttt{readgrid\_mnc}  &  L  &  \texttt{.FALSE.}  &  
      read grid quantities using MNC  \\
      \texttt{mnc\_echo\_gvtypes}  &  L  & \texttt{.FALSE.}  &  
      list pre-defined ``types'' (debug)   \\\hline
    \end{tabular}
  }
\end{center}

Unlike the older MDSIO method, MNC has the ability to create or use
existing output directories.  If either \texttt{mnc\_outdir\_date} or
\texttt{mnc\_outdir\_num} is true, then MNC will try to create
directories on a \textit{PER PROCESS} basis for its output.  This
means that a single directory will be created for a non-MPI run and
multiple directories (one per MPI process) will be created for an MPI
run.  This approach was chosen since it works safely on both shared
global file systems (such as NFS and AFS) and on local
(per-compute-node) file systems.  Conversely, if both
\texttt{mnc\_outdir\_date} and \texttt{mnc\_outdir\_num} are false,
then the MNC package will assume that the directory specified in
\texttt{mnc\_outdir\_str} already exists and will use it.  This allows
the user to create and specify directories outside of the model.
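
The following \texttt{data.mnc} fragment is a sketch of the first
case, in which MNC is asked to create its own per-process output
directories.  The directory prefix \texttt{'mnc\_out\_'} is a
hypothetical value, and the namelist group name \texttt{MNC\_01} and
the closing \texttt{\&} terminator are assumptions that should be
checked against an existing \texttt{data.mnc} file.
{\footnotesize
\begin{verbatim}
# Sketch only: group name, terminator, and prefix are assumed
 &MNC_01
 mnc_use_outdir=.TRUE.,
 mnc_outdir_str='mnc_out_',
 mnc_outdir_num=.TRUE.,
 &
\end{verbatim}
}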

For input, MNC can use a single global input directory.  This is just
a convenience that allows MNC to gather all of its input files from a
path other than the current working directory.  As with MDSIO, the
default is to use the current working directory.
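
A sketch of a \texttt{data.mnc} fragment that points MNC at a separate
input directory is shown below.  The path \texttt{'../input\_nc'} is
hypothetical, the group name \texttt{MNC\_01} and the closing
\texttt{\&} terminator are assumptions, and whether a trailing path
separator is required is not stated in this section.
{\footnotesize
\begin{verbatim}
# Sketch only: group name, terminator, and path are assumed
 &MNC_01
 mnc_use_indir=.TRUE.,
 mnc_indir_str='../input_nc',
 &
\end{verbatim}
}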

The flags \texttt{snapshot\_mnc}, \texttt{monitor\_mnc},
\texttt{timeave\_mnc}, and \texttt{autodiff\_mnc} allow the user to
turn on MNC for particular ``types'' of output.  If a type is
selected, then MNC will be used for all output that matches that type.
This applies to output from the main model and from all of the
optional MITgcm packages.  Mostly, the names used here correspond to
the names used for the output frequencies in the main \texttt{data}
namelist file.
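
As an illustration, the fragment below sketches a \texttt{data.mnc}
file that selects MNC for snapshot and time-averaged output while
leaving monitor output on its default method.  The group name
\texttt{MNC\_01} and the closing \texttt{\&} terminator are
assumptions, and the particular combination of flags is only an
example.
{\footnotesize
\begin{verbatim}
# Sketch only: group name and terminator are assumed
 &MNC_01
 snapshot_mnc=.TRUE.,
 timeave_mnc=.TRUE.,
 monitor_mnc=.FALSE.,
 &
\end{verbatim}
}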

The \texttt{mnc\_max\_fsize} parameter is a convenience added to help
users work around common file size limitations.  On many computer
systems, the operating system, the file system(s), and/or the netCDF
libraries are unable to handle files larger than two or four
gigabytes in size.  The MNC package is able to work within this
limitation by creating new files which grow along the netCDF
``unlimited'' (usually, time) dimension.  The default value for this
parameter is just slightly less than 2GB which is safe on virtually
all operating systems.  Essentially, this feature is a way to
intelligently and automatically split files output along the unlimited
dimension.  On systems that support large file sizes, these splits can
be readily concatenated (that is, un-done) using tools such as the
netCDF Operators (for example, \texttt{ncrcat}), which are available at:
\begin{rawhtml} <A href="http://nco.sourceforge.net/"> \end{rawhtml}
\begin{verbatim}
http://nco.sourceforge.net/
\end{verbatim}
\begin{rawhtml} </A> \end{rawhtml}
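
As a rough and purely illustrative estimate of how often such splits
might occur, consider a hypothetical tile with \texttt{sNx}$\,=30$,
\texttt{sNy}$\,=20$, and \texttt{Nr}$\,=15$ that writes four
three-dimensional fields at 8 bytes per value in each record.
Neglecting the file header, coordinate variables, and the extra points
on velocity grids, the arithmetic is approximately:
\[
  \mbox{bytes per record} \approx 4 \times 30 \times 20 \times 15
  \times 8 = 288000
\]
\[
  \mbox{records per file} \approx \frac{2.1 \times 10^{9}}{288000}
  \approx 7290
\]
Under these assumptions a new file would be started only after several
thousand records, so splitting matters mainly for large tiles or very
long runs.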

Additional MNC--related parameters may be contained within each
package.  Please see the individual packages for descriptions of their
use of MNC.


\subsubsection{MNC Output}

Depending upon the flags used, MNC will produce zero or more
directories containing one or more netCDF files as output.  These
files are either mostly or entirely compliant with the netCDF ``CF''
convention (v1.0) and any conformance issues will be fixed over time.
The pattern used for file names is:
\begin{center}
\texttt{BASENAME.nIter0.tileNum.seqNum.nc}
\end{center}
and an example is:
\begin{center}
\texttt{grid.0000000000.000001.0000.nc}
\end{center}
where \texttt{BASENAME} is the name selected to represent a set of
variables written together, \texttt{nIter0} is the starting iteration
number as specified in the main \texttt{data} namelist input file and
written in a zero-filled 10-digit format, \texttt{tileNum} is the
six-digit zero-filled tile number, \texttt{seqNum} is a four-digit
zero-filled sequence number used when maximum allowable file sizes
are too small to contain all of the output for a particular type
within one run (new files are created with sequential numbers as files
reach the maximum file size limit), and \texttt{.nc} is the file
suffix specified by the current netCDF ``CF'' conventions.

Some example \texttt{BASENAME} values are:
\begin{description}
\item[grid] contains the variables that describe the various grid
  constants related to locations, lengths, areas, etc.
\item[state] contains the variables output at the snapshot or
  \texttt{dumpFreq} time frequency
\item[pickup.ckptA, pickup.ckptB] are the ``rolling'' checkpoint files
\item[tave] contains the time-averaged quantities from the main model
\end{description}
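
To illustrate the naming pattern, a hypothetical two-tile run starting
at iteration zero, in which the \texttt{state} output has been split
once by the file size limit, might produce names such as those below.
The listing is constructed from the pattern above and is not taken
from an actual run; in particular, it assumes that sequence numbers
start at \texttt{0000} and increase by one.
{\footnotesize
\begin{verbatim}
grid.0000000000.000001.0000.nc
grid.0000000000.000002.0000.nc
state.0000000000.000001.0000.nc
state.0000000000.000001.0001.nc
state.0000000000.000002.0000.nc
state.0000000000.000002.0001.nc
\end{verbatim}
}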

All MNC output is currently done in a ``file-per-tile'' fashion since
most NetCDF v3.x implementations cannot write safely within MPI or
multi-threaded environments.  This tiling is done in a global fashion
and the tile numbers are appended to the base names as described
above.  Some scripts to manipulate MNC output are available at
\texttt{MITgcm/utils/matlab/} which includes a spatial ``assembly''
script called \texttt{MITgcm/utils/matlab/mnc\_assembly.m}.

More general manipulations can be performed on netCDF files with the
NetCDF Operators (``NCO'') at
\begin{rawhtml} <A href="http://nco.sourceforge.net"> \end{rawhtml}
\begin{verbatim}
http://nco.sourceforge.net
\end{verbatim}
\begin{rawhtml} </A> \end{rawhtml}
or with the Climate Data Operators (``CDO'') at
\begin{rawhtml} <A href="http://www.mpimet.mpg.de/~cdo/"> \end{rawhtml}
\begin{verbatim}
http://www.mpimet.mpg.de/~cdo/
\end{verbatim}
\begin{rawhtml} </A> \end{rawhtml}

Unlike the older MDSIO routines, MNC reads and writes variables on
different ``grids'' depending upon their location on, for instance, an
Arakawa C--grid.  The following table provides examples:
\begin{center}
  {\footnotesize
    \begin{tabular}[htb]{|l|c|c|c|}\hline
      \textbf{Name}  &  \textbf{C--grid location}  &  
      \textbf{\# in X}  &  \textbf{\# in Y}  \\\hline
      Temperature & mass & \texttt{sNx} & \texttt{sNy} \\
      Salinity & mass & \texttt{sNx} & \texttt{sNy} \\
      U velocity & U & \texttt{sNx+1} & \texttt{sNy} \\
      V velocity & V & \texttt{sNx} & \texttt{sNy+1} \\
      Vorticity & vorticity & \texttt{sNx+1} & \texttt{sNy+1} \\\hline
    \end{tabular}
  }
\end{center}
and the intent is two--fold:
\begin{enumerate}
\item For some grid topologies it is impossible to output all
  quantities using only \texttt{sNx,sNy} arrays for every tile.  Two
  examples of this failure are the missing corners problem for
  vorticity values on the cubesphere and the velocity edge values for
  some open--boundary domains.
\item Writing quantities located on velocity or vorticity points with
  the above scheme introduces a very small data redundancy.  However,
  any slight inconvenience is easily offset by the ease with which one
  can, on every individual tile, interpolate these values to mass
  points without having to perform an ``exchange'' (or
  ``halo-filling'') operation to collect the values from neighboring
  tiles.  This makes the most common post--processing operations much
  easier to implement.
\end{enumerate}
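
To make the table above concrete, consider a hypothetical tile with
\texttt{sNx}$\,=30$ and \texttt{sNy}$\,=20$ (values chosen only for
illustration).  The per-tile array sizes written by MNC would then be:
\begin{center}
  {\footnotesize
    \begin{tabular}{|l|c|c|}\hline
      \textbf{Name}  &  \textbf{\# in X}  &  \textbf{\# in Y}  \\\hline
      Temperature & 30 & 20 \\
      U velocity & 31 & 20 \\
      V velocity & 30 & 21 \\
      Vorticity & 31 & 21 \\\hline
    \end{tabular}
  }
\end{center}
Under the same assumptions, a simple unweighted average of the U
velocity onto mass points needs only values stored for that tile,
\[
  u^{c}_{i,j} = \frac{1}{2} \left( u_{i,j} + u_{i+1,j} \right),
  \qquad i = 1,\ldots,30, \quad j = 1,\ldots,20,
\]
because the value at $i+1 = 31$ is present in the tile's own file.
The unweighted average is an illustrative choice that ignores grid
spacing and land masks.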


\subsection{MNC Troubleshooting}

\subsubsection{Build Troubleshooting}

In order to build MITgcm with MNC enabled, the netCDF v3.x Fortran-77
(not Fortran-90) library must be available.  This library is composed
of a single header file (called \texttt{netcdf.inc}) and a single
library file (usually called \texttt{libnetcdf.a}) and it must be
built with the same compiler (or a binary-compatible compiler) with
compatible compiler options as the one used to build MITgcm.

For more details concerning the netCDF build and install process,
please visit the netCDF home page at:
\begin{rawhtml} <A href="http://www.unidata.ucar.edu/packages/netcdf/"> \end{rawhtml}
\begin{verbatim}
http://www.unidata.ucar.edu/packages/netcdf/
\end{verbatim}
\begin{rawhtml} </A> \end{rawhtml} 
which includes an extensive list of known--good netCDF configurations
for various platforms.

\subsubsection{Runtime Troubleshooting}

Please be aware of the following:

\begin{itemize}
\item As a safety feature, the MNC package does not, by default, allow
  pre-existing files to be appended to or overwritten.  This is in
  contrast to the older MDSIO package which will, without any warning,
  overwrite existing files.  If MITgcm aborts with an error message
  about the inability to open or write to a netCDF file, please check
  \textbf{first} whether you are attempting to overwrite files from a
  previous run.

\item The constraints placed upon the ``unlimited'' (or ``record'')
  dimension inherent with NetCDF v3.x make it very inefficient to put
  variables written at potentially different intervals within the same
  file.  For this reason, MNC output is split into groups of files
  which attempt to reflect the nature of their content.
  
\item On many systems, netCDF has practical file size limits on the
  order of 2--4GB (the maximum memory addressable with 32-bit pointers
  or pointer differences) due to a lack of operating system, compiler,
  and/or library support.  The latest revisions of netCDF v3.x have
  large file support and, on some operating systems, file sizes are
  only limited by available disk space.
  
\item There is an 80-character limit on the total length of all file
  names.  This limit includes the directory (or path) since paths and
  file names are internally appended.  Generally, file names will not
  exceed the limit and paths can usually be shortened using, for
  example, soft links.
  
\item MNC does not (yet) provide a mechanism for reading information
  from a single ``global'' file as can be done with the MDSIO
  package.  This is in progress.
\end{itemize}


\subsection{MNC Internals}

The \texttt{mnc} package is a two-level convenience library (or
``wrapper'') for most of the NetCDF Fortran API.  Its purpose is to
streamline the user interface to NetCDF by maintaining internal
relations (look-up tables) keyed with strings (or names) and entities
such as NetCDF files, variables, and attributes.

The two levels of the \texttt{mnc} package are:
\begin{description}

\item[Upper level] \ 
  
  The upper level contains information about two kinds of
  associations:
  \begin{description}
  \item[grid type] is a lookup table indexed by a grid type name.
    Each grid type name is associated with a number of dimensions, the
    dimension sizes (one of which may be unlimited), and starting and
    ending index arrays.  The intent is to store all the necessary
    size and shape information for the Fortran arrays containing
    MITgcm--style ``tile'' variables (that is, a central region
    surrounded by a variably-sized ``halo'' or exchange region as
    shown in Figures \ref{fig:communication_primitives} and
    \ref{fig:tiling-strategy}).
  
  \item[variable type] is a lookup table indexed by a variable type
    name.  For each name, the table contains a reference to a grid
    type for the variable and the names and values of various
    attributes.
  \end{description}
  
  Within the upper level, these associations are not permanently tied
  to any particular NetCDF file.  This allows the information to be
  re-used over multiple file reads and writes.

\item[Lower level] \ 
  
  In the lower (or internal) level, associations are stored for NetCDF
  files and many of the entities that they contain including
  dimensions, variables, and global attributes.  All associations are
  on a per-file basis.  Thus, each entity is tied to a unique NetCDF
  file and will be created or destroyed when files are, respectively,
  opened or closed.

\end{description}


\subsubsection{MNC Grid--Types and Variable--Types}

As a convenience for users, the MNC package includes numerous routines
to aid in the writing of data to NetCDF format.  Probably the biggest
convenience is the use of pre-defined ``grid types'' and ``variable
types''.  These ``types'' are simply look-up tables that store
dimensions, indices, attributes, and other information that can all
be retrieved using a single character string.

The ``grid types'' are a way of mapping variables within MITgcm to
NetCDF arrays.  Within MITgcm, most spatial variables are defined
using two-- or three--dimensional arrays with ``overlap'' regions (see
Figures \ref{fig:communication_primitives} and
\ref{fig:tiling-strategy}), a possible vertical index, and tile
indices such as the following ``U'' velocity:
\begin{verbatim}
      _RL  uVel (1-OLx:sNx+OLx,1-OLy:sNy+OLy,Nr,nSx,nSy)
\end{verbatim}
as defined in \filelink{model/inc/DYNVARS.h}{model-inc-DYNVARS.h}.

The grid type is a character string that encodes the presence and
types associated with the four possible dimensions.  The character
string follows the format
\begin{center}
  \texttt{H0\_H1\_H2\_\_V\_\_T}
\end{center}
where the terms \textit{H0}, \textit{H1}, \textit{H2}, \textit{V},
\textit{T} can be almost any combination of the following:
\begin{center}
  \begin{tabular}[h]{|ccc|c|c|}\hline
    \multicolumn{3}{|c|}{Horizontal} & Vertical & Time \\
    \textbf{H0}: location & \textbf{H1}: dimensions & \textbf{H2}: halo 
          & \textbf{V}: location & \textbf{T}: level  \\\hline
    \texttt{-} & xy & Hn & \texttt{-} & \texttt{-} \\
    U  &  x  &  Hy  &  i  &  t  \\
    V  &  y  &      &  c  &     \\
    Cen  &   &      &     &     \\
    Cor  &   &      &     &     \\\hline
  \end{tabular}
\end{center}
An example list of all pre-defined combinations is contained in the
file
\begin{center}
  \texttt{pkg/mnc/pre-defined\_grids.txt}.
\end{center}
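
As an illustration, the grid type strings that appear in the
initialization calls of this section can be split by position
according to the format above.  This is a reading of the naming
convention and not a description of how MNC parses the strings
internally; note that the vertical term is written in upper case in
those calls.
\begin{center}
  {\footnotesize
    \begin{tabular}{|l|c|c|c|c|c|}\hline
      \textbf{Grid type string}  &  \textbf{H0}  &  \textbf{H1}  &
      \textbf{H2}  &  \textbf{V}  &  \textbf{T}  \\\hline
      \texttt{U\_xy\_Hn\_\_C\_\_t}  &  \texttt{U}  &  \texttt{xy}  &
      \texttt{Hn}  &  \texttt{C}  &  \texttt{t}  \\
      \texttt{Cen\_xy\_Hn\_\_C\_\_t}  &  \texttt{Cen}  &  \texttt{xy}  &
      \texttt{Hn}  &  \texttt{C}  &  \texttt{t}  \\
      \texttt{-\_-\_{-}{-}\_\_-\_\_t}  &  \texttt{-}  &  \texttt{-}  &
      \texttt{{-}{-}}  &  \texttt{-}  &  \texttt{t}  \\\hline
    \end{tabular}
  }
\end{center}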

The variable type is an association between a variable type name and the
following items:
\begin{center}
  \begin{tabular}[h]{|l|l|}\hline
    \textbf{Item}  & \textbf{Purpose}  \\\hline
    grid type  &  defines the in-memory arrangement  \\
    \texttt{bi,bj} dimensions  &  tiling indices, if present  \\\hline
  \end{tabular}
\end{center}
and is used by the \texttt{mnc\_cw\_*\_[R|W]} subroutines for reading
and writing variables.


\subsubsection{Using MNC: Examples}

Writing variables to NetCDF files can be accomplished in as few as two
function calls.  The first function call defines a variable type,
associates it with a name (character string), and provides additional
information about the indices for the tile (\texttt{bi},\texttt{bj})
dimensions.  The second function call will write the data at, if
necessary, the current time level within the model.

Examples of the initialization calls can be found in the file 
\filelink{model/src/ini\_mnc\_io.F}{model-src-ini_mnc_io.F}
where these function calls:
{\footnotesize
\begin{verbatim}
C     Create MNC definitions for DYNVARS.h variables
      CALL MNC_CW_ADD_VNAME('iter', '-_-_--__-__t', 0,0, myThid)
      CALL MNC_CW_ADD_VATTR_TEXT('iter',1,
     &     'long_name','iteration_count', myThid)

      CALL MNC_CW_ADD_VNAME('model_time', '-_-_--__-__t', 0,0, myThid)
      CALL MNC_CW_ADD_VATTR_TEXT('model_time',1,
     &     'long_name','Model Time', myThid)
      CALL MNC_CW_ADD_VATTR_TEXT('model_time',1,'units','s', myThid)

      CALL MNC_CW_ADD_VNAME('U', 'U_xy_Hn__C__t', 4,5, myThid)
      CALL MNC_CW_ADD_VATTR_TEXT('U',1,'units','m/s', myThid)
      CALL MNC_CW_ADD_VATTR_TEXT('U',1,
     &     'coordinates','XU YU RC iter', myThid)

      CALL MNC_CW_ADD_VNAME('T', 'Cen_xy_Hn__C__t', 4,5, myThid)
      CALL MNC_CW_ADD_VATTR_TEXT('T',1,'units','degC', myThid)
      CALL MNC_CW_ADD_VATTR_TEXT('T',1,'long_name',
     &     'potential_temperature', myThid)
      CALL MNC_CW_ADD_VATTR_TEXT('T',1,
     &     'coordinates','XC YC RC iter', myThid)
\end{verbatim}
}
{\noindent initialize four \texttt{VNAME}s and add one or more NetCDF
  attributes to each.}
    
The four variables defined above are subsequently written at specific
time steps within
\filelink{model/src/write\_state.F}{model-src-write_state.F}
using the function calls:
{\footnotesize
\begin{verbatim}
C       Write dynvars using the MNC package
        CALL MNC_CW_SET_UDIM('state', -1, myThid)
        CALL MNC_CW_I_W('I','state',0,0,'iter', myIter, myThid)
        CALL MNC_CW_SET_UDIM('state', 0, myThid)
        CALL MNC_CW_RL_W('D','state',0,0,'model_time',myTime, myThid)
        CALL MNC_CW_RL_W('D','state',0,0,'U', uVel, myThid)
        CALL MNC_CW_RL_W('D','state',0,0,'T', theta, myThid)
\end{verbatim}
}

While it is easiest to write variables within typical 2D and 3D fields
where all data is known at a given time, it is also possible to write
fields where only a portion (\textit{e.g.}, a ``slab'' or ``slice'') is
known at a given instant.  An example is provided within
\filelink{pkg/mom\_vecinv/mom\_vecinv.F}{pkg-mom_vecinv-mom_vecinv.F}
where an offset vector is used: {\footnotesize
\begin{verbatim}
       IF (useMNC .AND. snapshot_mnc) THEN
         CALL MNC_CW_RL_W_OFFSET('D','mom_vi',bi,bj, 'fV', uCf,
     &        offsets, myThid)
         CALL MNC_CW_RL_W_OFFSET('D','mom_vi',bi,bj, 'fU', vCf,
     &        offsets, myThid)
       ENDIF
\end{verbatim}
}
to write a 3D field one depth slice at a time.

Each element in the offset vector corresponds (in order) to the
dimensions of the ``full'' (or virtual) array and specifies which are
known at the time of the call.  A zero within the offset array means
that all values along that dimension are available while a positive
integer means that only values along that index of the dimension are
available.  In all cases, the matrix passed is assumed to start (that
is, have an in-memory structure) coinciding with the start of the
specified slice.  Thus, using this offset array mechanism, a slice
can be written along any single dimension or combinations of
dimensions.
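
As a hedged illustration of these rules, suppose the full array has
dimensions ordered as (X, Y, Z).  The leading elements of the offset
vector would then be read as follows.  The total length of the vector
and the treatment of any further dimensions are not specified here, so
only the first three elements are shown, and the index values are
arbitrary:
\begin{center}
  {\footnotesize
    \begin{tabular}{|c|l|}\hline
      \textbf{Offsets (X, Y, Z)}  &
      \textbf{Portion supplied in the call}  \\\hline
      (0, 0, 0)  &  the complete 3D field  \\
      (0, 0, 3)  &  all X and Y values at the third vertical level  \\
      (0, 5, 0)  &  all X and Z values at the fifth Y index  \\
      (2, 0, 4)  &  all Y values at the second X index and fourth
      vertical level  \\\hline
    \end{tabular}
  }
\end{center}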