\section{ECCO: model-data comparisons using gridded data sets} \label{sec:pkg:ecco} \begin{rawhtml} \end{rawhtml} \def\mitgcmCheckpointVersion{65z} The functionalities implemented in \texttt{pkg/ecco} are: (1) output time-averaged model fields to compare with gridded data sets; (2) compute normalized model-data distances (i.e., cost functions); (3) compute averages and transports (i.e., integrals). The former is achieved as the model runs forwards in time whereas the others occur after time-integration has completed. Following \cite{for-eta:15} the total cost function is formulated generically as \begin{align} \mathcal{J}(\vec{u}) &= \sum_i \alpha_i \left(\vec{d}_i^T R_i^{-1} \vec{d}_i\right) + \sum_j \beta_j \vec{u}^T\vec{u}, \label{eq:Jtotal} \\ \vec{d}_i &= \mathcal{P}(\vec{m}_i - \vec{o}_i), \label{eq:Jposproc} \\ \vec{m}_i &= \mathcal{S}\mathcal{D}\mathcal{M}(\vec{v}), \label{eq:Jpreproc} \\ \vec{v} &= \mathcal{Q}(\vec{u}), \label{eq:Upreproc} \\ \vec{u} &= \mathcal{R}(\vec{u}') \label{eq:Uprecond} \end{align} using symbols defined in table~\ref{tbl:gencost_symbols}. Per Eq.~\eqref{eq:Jpreproc} model counterparts ($\vec{m}_i$) to observational data ($\vec{o}_i$) derive from adjustable model parameters ($\vec{v}$) through model dynamics integration ($\mathcal{M}$), diagnostic calculations ($\mathcal{D}$), and averaging in space and time ($\mathcal{S}$). Alternatively $\mathcal{S}$ stands for subsampling in space and time (section~\ref{sec:pkg:profiles}). Plain model-data misfits ($\vec{m}_i-\vec{o}_i$) can be penalized directly in Eq.~\eqref{eq:Jtotal} but penalized misfits ($\vec{d}_i$) more generally derive from $\vec{m}_i-\vec{o}_i$ through the generic $\mathcal{P}$ post-processor (Eq. \eqref{eq:Jposproc}). Eqs.~\eqref{eq:Upreproc}-\eqref{eq:Uprecond} pertain to model control parameter adjustment capabilities described in section~\ref{sec:pkg:ctrl}. \begin{table}[!ht] \centering \begin{tabular}{rl} symbol & definition \\ \hline $\vec{u}$ & vector of nondimensional control variables \\ $\vec{v}$ & vector of dimensional control variables \\ $\alpha_i, \beta_j$ & misfit and control cost function multipliers (1 by default) \\ $R_i$ & data error covariance matrix ($R_i^{-1}$ are weights) \\ $\vec{d}_i$ & a set of model-data differences \\ $\vec{o}_i$ & observational data vector \\ $\vec{m}_i$ & model counterpart to $\vec{o}_i$ \\ $\mathcal{P}$ & post-processing operator (e.g., a smoother) \\ $\mathcal{M}$ & forward model dynamics operator \\ $\mathcal{D}$ & diagnostic computation operator \\ $\mathcal{S}$ & averaging/subsampling operator \\ $\mathcal{Q}$ & Pre-processing operator \\ $\mathcal{R}$ & Pre-conditioning operator \end{tabular} \caption{Symbol definitions for pkg/ecco and pkg/ctrl generic cost functions.} \label{tbl:gencost_symbols} \end{table} \subsection{Generic Cost Function} \label{costgen} The parameters available for configuring generic cost function terms in \texttt{data.ecco} are given in table~\ref{tbl:gencost_ecco_params} and examples of possible specifications are available in: \begin{itemize} \itemsep0em \item MITgcm\_contrib/verification\_other/global\_oce\_cs32/input/data.ecco \item MITgcm\_contrib/verification\_other/global\_oce\_cs32/input\_ad.sens/data.ecco \item MITgcm\_contrib/gael/verification/global\_oce\_llc90/input.ecco\_v4/data.ecco \end{itemize} \noindent The gridded observation file name is specified by \texttt{gencost\_datafile}. Observational time series may be provided as on big file or split into yearly files finishing in `\_1992', `\_1993', etc. The corresponding $\vec{m}_i$ physical variable is specified via the \texttt{gencost\_barfile} root (see table~\ref{tbl:gencost_ecco_barfile}). A file named as specified by \texttt{gencost\_barfile} gets created where averaged fields are written progressively as the model steps forward in time. After the final time step this file is re-read by \texttt{cost\_generic.F} to compute the corresponding cost function term. If \texttt{gencost\_outputlevel} = 1 and \texttt{gencost\_name}=`foo' then \texttt{cost\_generic.F} outputs model-data misfit fields (i.e., $\vec{d}_i$) to a file named `misfit\_foo.data' for offline analysis and visualization. In the current implementation, model-data error covariance matrices $R_i$ omit non-diagonal terms. Specifying $R_i$ thus boils down to providing uncertainty fields ($\sigma_i$ such that $R_i=\sigma_i^2$) in a file specified via \texttt{gencost\_errfile}. By default $\sigma_i$ is assumed to be time-invariant but a $\sigma_i$ time series of the same length as the $\vec{o}_i$ time series can be provided using the \texttt{variaweight} option (table~\ref{tbl:gencost_ecco_preproc}). By default cost functions are quadratic but $\vec{d}_i^T R_i^{-1} \vec{d}_i$ can be replaced with $R_i^{-1/2} \vec{d}_i$ using the \texttt{nosumsq} option (table~\ref{tbl:gencost_ecco_preproc}). In principle, any averaging frequency should be possible, but only {`day'}, {`month'}, {`step'}, and {`const'} are implemented for \texttt{gencost\_avgperiod}. If two different averaging frequencies are needed for a variable used in multiple cost function terms (e.g., daily and monthly) then an extension starting with `\_' should be added to \texttt{gencost\_barfile} (such as `\_day' and `\_mon'). \footnote{ecco\_check may be missing a test for conflicting names...} If two cost function terms use the same variable and frequency, however, then using a common \texttt{gencost\_barfile} saves disk space. Climatologies of $\vec{m}_i$ can be formed from the time series of model averages in order to compare with climatologies of $\vec{o}_i$ by activating the `clim' option via \texttt{gencost\_preproc} and setting the corresponding \texttt{gencost\_preproc\_i} integer parameter to the number of records (i.e., a \# of months, days, or time steps) per climatological cycle. The generic post-processor ($\mathcal{P}$ in Eq.~\eqref{eq:Jposproc}) also allows model-data misfits to be, for example, smoothed in space by setting \texttt{gencost\_posproc} to {`smooth'} and specifying the smoother parameters via \texttt{gencost\_posproc\_c} and \texttt{gencost\_posproc\_i} (see table~\ref{tbl:gencost_ecco_preproc}). Other options associated with the computation of Eq.~\eqref{eq:Jtotal} are summarized in table~\ref{tbl:gencost_ecco_preproc} and further discussed below. Multiple \texttt{gencost\_preproc} / \texttt{gencost\_posproc} options may be specified per cost term. In general the specification of \texttt{gencost\_name} is optional, has no impact on the end-result, and only serves to distinguish between cost function terms amongst the model output (STDOUT.0000, STDERR.0000, costfunction000, misfit*.data). Exceptions listed in table~\ref{tbl:gencost_ecco_name} however activate alternative cost function codes (in place of \texttt{cost\_generic.F}) described in section~\ref{v4custom}. In this section and in table~\ref{tbl:gencost_ecco_barfile} (unlike in other parts of the manual) `zonal' / `meridional' are to be taken literally and these components are centered (i.e., not at the staggered model velocity points). Preparing gridded velocity data sets for use in cost functions thus boils down to interpolating them to XC / YC. \begin{table}[!ht] \centering \begin{tabular}{lll} parameter & type & function \\ \hline %\texttt{using\_gencost} & logical & Turns specified generic cost term on. \\ \texttt{gencost\_name} & character(*) & Name of cost term \\ \texttt{gencost\_barfile} & character(*) & File to receive model counterpart $\vec{m}_i$ (see table~\ref{tbl:gencost_ecco_barfile}) \\ \texttt{gencost\_datafile} & character(*) & File containing observational data $\vec{o}_i$ \\ \texttt{gencost\_avgperiod} & character(5) & Averaging period for $\vec{o}_i$ and $\vec{m}_i$ (see text) \\ \texttt{gencost\_outputlevel} & integer & Greater than 0 will output misfit fields\\ \texttt{gencost\_errfile} & character(*) & Uncertainty field name (not used in section~\ref{intgen})\\ \texttt{gencost\_mask} & character(*) & Mask file name root (used only in section~\ref{intgen}) \\ \texttt{mult\_gencost} & real & Multiplier $\alpha_i$ (default: 1) \\ \hline \texttt{gencost\_preproc} & character(*) & Preprocessor names \\ \texttt{gencost\_preproc\_c} & character(*) & Preprocessor character arguments \\ \texttt{gencost\_preproc\_i} & integer(*) & Preprocessor integer arguments \\ \texttt{gencost\_preproc\_r} & real(*) & Preprocessor real arguments \\ \texttt{gencost\_posproc} & character(*) & Post-processor names \\ \texttt{gencost\_posproc\_c} & character(*) & Post-processor character arguments \\ \texttt{gencost\_posproc\_i} & integer(*) & Post-processor integer arguments \\ \texttt{gencost\_posproc\_r} & real(*) & Post-processor real arguments \\ \hline \texttt{gencost\_spmin} & real & Data less than this value will be omitted \\ \texttt{gencost\_spmax} & real & Data greater than this value will be omitted \\ \texttt{gencost\_spzero} & real & Data points equal to this value will be omitted \\ \texttt{gencost\_startdate1} & integer & Start date of observations (YYYMMDD) \\ \texttt{gencost\_startdate2} & integer & Start date of observations (HHMMSS) \\ \texttt{gencost\_is3d} & logical & Needs to be true for 3D fields \\ \hline \texttt{gencost\_enddate1} & integer & Not fully implemented (used only in sec.~\ref{v4custom})\\ \texttt{gencost\_enddate2} & integer & Not fully implemented (used only in sec.~\ref{v4custom})\\ \end{tabular} \caption{Parameters in \texttt{ecco\_gencost\_nml} namelist in \texttt{data.ecco}. All parameters are vectors of length \texttt{NGENCOST} (the \# of available cost terms) except for \texttt{gencost\_*proc*} are arrays of size \texttt{NGENPPROC}$\times$\texttt{NGENCOST}. Notes: \texttt{gencost\_is3d} is automatically reset to true in all 3D cases in table~\ref{tbl:gencost_ecco_barfile}; NGENCOST (20) and NGENPPROC (10) can be changed in ecco.h only at compile time.} \label{tbl:gencost_ecco_params} \end{table} \begin{table}[!ht] \centering \begin{tabular}{lll} variable name & description & remarks \\ \hline\hline \texttt{m\_eta} & sea surface height & free surface + ice + global steric correction \\ \texttt{m\_sst} & sea surface temperature & first level potential temperature \\ \texttt{m\_sss} & sea surface salinity & first level salinity \\ \texttt{m\_bp} & bottom pressure & phiHydLow\\ \texttt{m\_siarea} & sea-ice area & from pkg/seaice \\ \texttt{m\_siheff} & sea-ice effective thickness & from pkg/seaice \\ \texttt{m\_sihsnow} & snow effective thickness & from pkg/seaice \\ \hline \texttt{m\_theta} & potential temperature & three-dimensional \\ \texttt{m\_salt} & salinity & three-dimensional \\ \texttt{m\_UE} & zonal velocity & three-dimensional \\ \texttt{m\_VN} & meridional velocity & three-dimensional \\ \hline \texttt{m\_ustress} & zonal wind stress & from pkg/exf \\ \texttt{m\_vstress} & meridional wind stress & from pkg/exf\\ \texttt{m\_uwind} & zonal wind & from pkg/exf\\ \texttt{m\_vwind} & meridional wind & from pkg/exf\\ \texttt{m\_atemp} & atmospheric temperature & from pkg/exf\\ \texttt{m\_aqh} & atmospheric specific humidity & from pkg/exf\\ \texttt{m\_precip} & precipitation & from pkg/exf\\ \texttt{m\_swdown} & downward shortwave & from pkg/exf\\ \texttt{m\_lwdown} & downward longwave & from pkg/exf\\ \texttt{m\_wspeed} & wind speed & from pkg/exf\\ \hline \texttt{m\_diffkr} & vertical/diapycnal diffusivity & three-dimensional, constant \\ \texttt{m\_kapgm} & GM diffusivity & three-dimensional, constant \\ \texttt{m\_kapredi} & isopycnal diffusivity & three-dimensional, constant \\ \texttt{m\_geothermalflux} & geothermal heat flux & constant \\ \texttt{m\_bottomdrag} & bottom drag & constant \\ \end{tabular} \caption{Implemented \texttt{gencost\_barfile} options (as of checkpoint \mitgcmCheckpointVersion) that can be used via \texttt{cost\_generic.F} (section~\ref{costgen}). An extension starting with `\_' can be appended at the end of the variable name to distinguish between separate cost function terms. Note: the `m\_eta' formula depends on the \texttt{ATMOSPHERIC\_LOADING} and \texttt{ALLOW\_PSBAR\_STERIC} compile time options and `useRealFreshWaterFlux' run time parameter.} \label{tbl:gencost_ecco_barfile} \end{table} \begin{table}[!ht] \centering \begin{tabular}{lll} name & description & specs needed via \texttt{gencost\_preproc\_i}, \texttt{\_r}, or \texttt{\_c} \\ \hline\hline \texttt{gencost\_preproc} \\ \hline \texttt{clim} & Use climatological misfits & integer: no.\ of records per climatological cycle \\ \texttt{mean} & Use time mean of misfits & --- \\ \texttt{anom} & Use anomalies from time mean & --- \\ \texttt{variaweight} & Use time-varying weight $W_i$& --- \\ \texttt{nosumsq} & Use linear misfits & --- \\ \texttt{factor} & Multiply $\vec{m}_i$ by a scaling factor & real: the scaling factor \\ \hline \hline \texttt{gencost\_posproc} \\ \hline \texttt{smooth} & Smooth misfits & character: smoothing scale file\\ & & integer: smoother \# of time steps \\ \end{tabular} \caption{\texttt{gencost\_preproc} and \texttt{gencost\_posproc} options implemented as of checkpoint \mitgcmCheckpointVersion. Note: the distinction between \texttt{gencost\_preproc} and \texttt{gencost\_posproc} seems unclear and may be revisited in the future.} \label{tbl:gencost_ecco_preproc} \end{table} \clearpage \subsection{Generic Integral Function} \label{intgen} The functionality described in this section is operated by \texttt{cost\_gencost\_boxmean.F}. It is primarily aimed at obtaining a mechanistic understanding of a chosen physical variable via adjoint sensitivity computations (see Chapter~\ref{chap:autodiff}) as done for example in \cite{maro-eta:99,heim-eta:11,fuku-etal:14}. Thus the quadratic term in Eq.~\ref{eq:Jtotal} ($\vec{d}_i^T R_i^{-1} \vec{d}_i$) is by default replaced with a $d_i$ scalar\footnote{The quadratic option in fact does not yet exist in \texttt{cost\_gencost\_boxmean.F}...} that derives from model fields through a generic integral formula (Eq.~\ref{eq:Jpreproc}). The specification of \texttt{gencost\_barfile} again selects the physical variable type. Current valid options to use \texttt{cost\_gencost\_boxmean.F} are reported in table~\ref{tbl:genint_ecco_barfile}. A suffix starting with \texttt{`\_'} can again be appended to \texttt{gencost\_barfile}. % and the basic averaging frequency is specified via \texttt{gencost\_avgperiod}. The integral formula is defined by masks provided via binary files which names are specified via \texttt{gencost\_mask}. There are two cases: (1) if \texttt{gencost\_mask = `foo\_mask'} and \texttt{gencost\_barfile} is of the `m\_boxmean*' type then the model will search for horizontal, vertical, and temporal mask files named \texttt{foo\_maskC}, \texttt{foo\_maskK}, and \texttt{foo\_maskT}; (2) if instead \texttt{gencost\_barfile} is of the `m\_horflux\_*' type then the model will search for \texttt{foo\_maskW}, \texttt{foo\_maskS}, \texttt{foo\_maskK}, and \texttt{foo\_maskT}. The `C' mask or the `W' / `S' masks are expected to be two-dimensional fields. The `K' and `T' masks (both optional; all 1 by default) are expected to be one-dimensional vectors. The `K' vector length should match Nr. The `T' vector length should match the \# of records that the specification of \texttt{gencost\_avgperiod} implies but there is no restriction on its values. In case \#1 (`m\_boxmean*') the `C' and `K' masks should consists of +1 and 0 values and a volume average will be computed accordingly. In case \#2 (`m\_horflux*') the `W', `S', and `K' masks should consists of +1, -1, and 0 values and an integrated horizontal transport (or overturn) will be computed accordingly. \begin{table}[!ht] \centering \begin{tabular}{lll} variable name & description & remarks \\ \hline\hline \texttt{m\_boxmean\_theta} & mean of theta over box & specify box \\ \texttt{m\_boxmean\_salt} & mean of salt over box & specify box \\ \texttt{m\_boxmean\_eta} & mean of SSH over box & specify box \\ \hline \texttt{m\_horflux\_vol} & volume transport through section & specify transect \\ \end{tabular} \caption{Implemented \texttt{gencost\_barfile} options (as of checkpoint \mitgcmCheckpointVersion) that can be used via \texttt{cost\_gencost\_boxmean.F} (section~\ref{intgen}).} \label{tbl:genint_ecco_barfile} \end{table} \subsection{Custom Cost Functions} \label{v4custom} This section (very much a work in progress...) pertains to the special cases of \texttt{cost\_gencost\_bpv4.F}, \texttt{cost\_gencost\_seaicev4.F}, \texttt{cost\_gencost\_sshv4.F}, \texttt{cost\_gencost\_sstv4.F}, and \texttt{cost\_gencost\_transp.F}. The cost\_gencost\_transp.F function can be used to compute a transport of volume, heat, or salt through a specified section (non quadratic cost function). To this end one sets \texttt{gencost\_name = `transp*'}, where \texttt{*} is an optional suffix starting with \texttt{`\_'}, and set \texttt{gencost\_barfile} to one of \texttt{m\_trVol}, \texttt{m\_trHeat}, and \texttt{m\_trSalt}. \begin{table}[!ht] \centering \begin{tabular}{lll} name & description & remarks \\ \hline\hline \texttt{sshv4-mdt} & sea surface height & mean dynamic topography (SSH - geod) \\ \texttt{sshv4-tp} & sea surface height & Along-Track Topex/Jason SLA (level 3) \\ \texttt{sshv4-ers} & sea surface height & Along-Track ERS/Envisat SLA (level 3)\\ \texttt{sshv4-gfo} & sea surface height & Along-Track GFO class SLA (level 3)\\ \texttt{sshv4-lsc} & sea surface height & Large-Scale SLA (from the above)\\ \texttt{sshv4-gmsl} & sea surface height & Global-Mean SLA (from the above)\\ \hline \texttt{bpv4-grace} & bottom pressure & GRACE maps (level 4) \\ \hline \texttt{sstv4-amsre} & sea surface temperature & Along-Swath SST (level 3)\\ \texttt{sstv4-amsre-lsc} & sea surface temperature & Large-Scale SST (from the above)\\ \hline \texttt{si4-cons} & sea ice concentration & needs sea-ice adjoint (level 4)\\ \texttt{si4-deconc} & model sea ice deficiency & proxy penalty (from the above)\\ \texttt{si4-exconc} & model sea ice excess & proxy penalty (from the above)\\ \hline \texttt{transp\_trVol} & volume transport & specify section as in section~\ref{intgen}\\ \texttt{transp\_trHeat} & heat transport & specify section as in section~\ref{intgen} \\ \texttt{transp\_trSalt} & salt transport & specify section as in section~\ref{intgen} \\ \end{tabular} \caption{Pre-defined \texttt{gencost\_name} special cases (as of checkpoint \mitgcmCheckpointVersion; section~\ref{v4custom}).} \label{tbl:gencost_ecco_name} \end{table} \subsection{Key Routines} TBA... \bigskip \texttt{ecco\_readparms.F}, \texttt{ecco\_check.F}, \texttt{ecco\_summary.F}, ... \texttt{cost\_generic.F}, \texttt{cost\_gencost\_boxmean.F}, \texttt{ecco\_toolbox.F}, ... \texttt{ecco\_phys.F}, \texttt{cost\_gencost\_customize.F}, \texttt{cost\_averagesfields.F}, ... \subsection{Compile Options} TBA... \bigskip ALLOW\_GENCOST\_CONTRIBUTION, ALLOW\_GENCOST3D, ... ALLOW\_PSBAR\_STERIC, ALLOW\_SHALLOW\_ALTIMETRY, ALLOW\_HIGHLAT\_ALTIMETRY, ... ALLOW\_PROFILES\_CONTRIBUTION, ... ALLOW\_ECCO\_OLD\_FC\_PRINT, ... ECCO\_CTRL\_DEPRECATED, ... \bigskip packages required for some functionalities: smooth, profiles, ctrl