\section{ECCO: model-data comparisons using gridded data sets}
\label{sec:pkg:ecco}
\begin{rawhtml}
<!-- CMIREDIR:package_ecco: -->
\end{rawhtml}

\def\mitgcmCheckpointVersion{65x}

The functionalities implemented in \texttt{pkg/ecco} are: (1) output time-averaged model fields to compare with gridded data sets; (2) compute normalized model-data distances (i.e., cost functions); (3) compute averages and transports (i.e., integrals). The first is achieved as the model runs forward in time whereas the others occur after time integration has completed. Following \cite{for-eta:15} the total cost function is formulated generically as
\begin{align}
\mathcal{J}(\vec{u}) &= \sum_i \alpha_i \left(\vec{d}_i^T R_i^{-1} \vec{d}_i\right) + \sum_j \beta_j \vec{u}^T\vec{u}, \label{eq:Jtotal} \\
\vec{d}_i &= \mathcal{P}(\vec{m}_i - \vec{o}_i), \label{eq:Jposproc} \\
\vec{m}_i &= \mathcal{S}\mathcal{D}\mathcal{M}(\vec{v}), \label{eq:Jpreproc} \\
\vec{v} &= \mathcal{Q}(\vec{u}), \label{eq:Upreproc} \\
\vec{u} &= \mathcal{R}(\vec{u}') \label{eq:Uprecond}
\end{align}
using symbols defined in table~\ref{tbl:gencost_symbols}. Per Eq.~\eqref{eq:Jpreproc} model counterparts ($\vec{m}_i$) to observational data ($\vec{o}_i$) derive from adjustable model parameters ($\vec{v}$) through model dynamics integration ($\mathcal{M}$), diagnostic calculations ($\mathcal{D}$), and averaging in space and time ($\mathcal{S}$). Alternatively $\mathcal{S}$ stands for subsampling in space and time (section~\ref{sec:pkg:profiles}). Plain model-data misfits ($\vec{m}_i-\vec{o}_i$) can be penalized directly in Eq.~\eqref{eq:Jtotal} but penalized misfits ($\vec{d}_i$) more generally derive from $\vec{m}_i-\vec{o}_i$ through the generic $\mathcal{P}$ post-processor (Eq.~\eqref{eq:Jposproc}). Eqs.~\eqref{eq:Upreproc}--\eqref{eq:Uprecond} pertain to model control parameter adjustment capabilities described in section~\ref{sec:pkg:ctrl}.

\begin{table}[!ht]
\centering
\begin{tabular}{rl}
symbol & definition \\ \hline
$\vec{u}$ & vector of nondimensional control variables \\
$\vec{v}$ & vector of dimensional control variables \\
$\alpha_i, \beta_j$ & misfit and control cost function multipliers (1 by default) \\
$R_i$ & data error covariance matrix ($R_i^{-1}$ are weights) \\
$\vec{d}_i$ & a set of model-data differences \\
$\vec{o}_i$ & observational data vector \\
$\vec{m}_i$ & model counterpart to $\vec{o}_i$ \\
$\mathcal{P}$ & post-processing operator (e.g., a smoother) \\
$\mathcal{M}$ & forward model dynamics operator \\
$\mathcal{D}$ & diagnostic computation operator \\
$\mathcal{S}$ & averaging/subsampling operator \\
$\mathcal{Q}$ & pre-processing operator \\
$\mathcal{R}$ & pre-conditioning operator
\end{tabular}
\caption{Symbol definitions for pkg/ecco and pkg/ctrl generic cost functions.}
\label{tbl:gencost_symbols}
\end{table}

\subsection{Generic Cost Function} \label{costgen}

The parameters available for configuring generic cost function terms in \texttt{data.ecco} are given in table~\ref{tbl:gencost_ecco_params} and examples of possible specifications are available in:
\begin{itemize}
\itemsep0em
\item MITgcm\_contrib/verification\_other/global\_oce\_cs32/input/data.ecco
\item MITgcm\_contrib/verification\_other/global\_oce\_cs32/input\_ad.sens/data.ecco
\item MITgcm\_contrib/gael/verification/global\_oce\_llc90/input.ecco\_v4/data.ecco
\end{itemize}

\noindent
The gridded observation file name is specified by \texttt{gencost\_datafile}. Observational time series may be provided as one big file or split into yearly files ending in `\_1992', `\_1993', etc. The corresponding $\vec{m}_i$ physical variable is specified via the \texttt{gencost\_barfile} root (see table~\ref{tbl:gencost_ecco_barfile}). A file named as specified by \texttt{gencost\_barfile} is created, and averaged fields are written to it progressively as the model steps forward in time. After the final time step this file is re-read by \texttt{cost\_generic.F} to compute the corresponding cost function term. If \texttt{gencost\_outputlevel} = 1 and \texttt{gencost\_name} = `foo' then \texttt{cost\_generic.F} outputs model-data misfit fields (i.e., $\vec{d}_i$) to a file named `misfit\_foo.data' for offline analysis and visualization.
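
\noindent
As an illustration, a single cost term comparing monthly mean model temperature with a gridded data set might be specified as sketched below. This is a minimal sketch rather than a tested configuration: the file names \texttt{some\_T\_data} and \texttt{some\_T\_sigma} and the term name \texttt{thetaexample} are hypothetical placeholders, the index \texttt{(1)} assumes this is the first of the \texttt{NGENCOST} terms, and the layout assumes the Fortran namelist style of the example \texttt{data.ecco} files listed above.
\begin{verbatim}
 &ECCO_GENCOST_NML
 gencost_name(1)        = 'thetaexample',
 gencost_barfile(1)     = 'm_theta',
 gencost_datafile(1)    = 'some_T_data',
 gencost_errfile(1)     = 'some_T_sigma',
 gencost_avgperiod(1)   = 'month',
 gencost_outputlevel(1) = 1,
 mult_gencost(1)        = 1.,
 &
\end{verbatim}
With these settings, and following the naming rule stated above, misfit fields would be expected in `misfit\_thetaexample.data'.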

In the current implementation, model-data error covariance matrices $R_i$ omit non-diagonal terms. Specifying $R_i$ thus boils down to providing uncertainty fields ($\sigma_i$ such that $R_i=\sigma_i^2$) in a file specified via \texttt{gencost\_errfile}. By default $\sigma_i$ is assumed to be time-invariant but a $\sigma_i$ time series of the same length as the $\vec{o}_i$ time series can be provided using the \texttt{variaweight} option (table~\ref{tbl:gencost_ecco_preproc}). By default cost functions are quadratic but $\vec{d}_i^T R_i^{-1} \vec{d}_i$ can be replaced with $R_i^{-1/2} \vec{d}_i$ using the \texttt{nosumsq} option (table~\ref{tbl:gencost_ecco_preproc}).
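
To make the diagonal assumption concrete, the quadratic term can be written out element by element. In this sketch the index $k$ is introduced here for illustration only and runs over all grid points and time records of cost term $i$; $d_{i,k}$ and $\sigma_{i,k}$ denote the corresponding elements of $\vec{d}_i$ and of the uncertainty field. With $R_i$ diagonal, the quadratic term in Eq.~\eqref{eq:Jtotal} reduces to a sum of squared, normalized misfits:
\[
\vec{d}_i^T R_i^{-1} \vec{d}_i = \sum_k \left( \frac{d_{i,k}}{\sigma_{i,k}} \right)^2 .
\]
For example, a misfit of $0.5$ against an uncertainty of $0.25$ (in the same units) contributes $(0.5/0.25)^2 = 4$ to the sum. The elements of $R_i^{-1/2} \vec{d}_i$ are the normalized misfits $d_{i,k}/\sigma_{i,k}$ themselves, without the square.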

In principle, any averaging frequency should be possible, but only `day', `month', `step', and `const' are implemented for \texttt{gencost\_avgperiod}. If two different averaging frequencies are needed for a variable used in multiple cost function terms (e.g., daily and monthly) then an extension starting with `\_' should be added to \texttt{gencost\_barfile} (such as `\_day' and `\_mon').\footnote{\texttt{ecco\_check} may be missing a test for conflicting names.} If two cost function terms use the same variable and frequency, however, then using a common \texttt{gencost\_barfile} saves disk space.
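
For instance, daily and monthly averages of the same variable could be kept apart as in the following namelist fragment. This is a sketch only: the term indices \texttt{(1)} and \texttt{(2)} are arbitrary, and the remaining parameters of each term (data file, error file, etc.) are omitted.
\begin{verbatim}
 gencost_barfile(1)   = 'm_sst_day',
 gencost_avgperiod(1) = 'day',
 gencost_barfile(2)   = 'm_sst_mon',
 gencost_avgperiod(2) = 'month',
\end{verbatim}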

Climatologies of $\vec{m}_i$ can be formed from the time series of model averages in order to compare with climatologies of $\vec{o}_i$ by activating the `clim' option via \texttt{gencost\_preproc} and setting the corresponding \texttt{gencost\_preproc\_i} integer parameter to the number of records (i.e., the number of months, days, or time steps) per climatological cycle. The generic post-processor ($\mathcal{P}$ in Eq.~\eqref{eq:Jposproc}) also allows model-data misfits to be, for example, smoothed in space by setting \texttt{gencost\_posproc} to `smooth' and specifying the smoother parameters via \texttt{gencost\_posproc\_c} and \texttt{gencost\_posproc\_i} (see table~\ref{tbl:gencost_ecco_preproc}). Other options associated with the computation of Eq.~\eqref{eq:Jtotal} are summarized in table~\ref{tbl:gencost_ecco_preproc} and further discussed below. Multiple \texttt{gencost\_preproc} / \texttt{gencost\_posproc} options may be specified per cost term.
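
As a sketch of how these options might be combined for one cost term based on monthly averages, the following namelist fragment requests a 12-record (annual) climatology and spatial smoothing of the misfits. The smoothing scale file name \texttt{some\_smoothing\_scale\_file} and the value 300 are hypothetical placeholders, and the \texttt{(1,1)} indices assume the first option slot of the first cost term.
\begin{verbatim}
 gencost_avgperiod(1)   = 'month',
 gencost_preproc(1,1)   = 'clim',
 gencost_preproc_i(1,1) = 12,
 gencost_posproc(1,1)   = 'smooth',
 gencost_posproc_c(1,1) = 'some_smoothing_scale_file',
 gencost_posproc_i(1,1) = 300,
\end{verbatim}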

In general the specification of \texttt{gencost\_name} is optional, has no impact on the end result, and only serves to distinguish between cost function terms amongst the model output (STDOUT.0000, STDERR.0000, costfunction000, misfit*.data). Exceptions listed in table~\ref{tbl:gencost_ecco_name}, however, activate alternative cost function codes (in place of \texttt{cost\_generic.F}) described in section~\ref{v4custom}. The specification of \texttt{gencost\_mask}\footnote{This should be renamed \texttt{gencost\_loc} or \texttt{gencost\_point}...} allows the user to specify whether the gridded data input and model counterparts are located at tracer points (`c'; the default) or velocity points (`w' or `s'). However, the `c' option (not `w' or `s') should be used when gridded velocity data is provided as zonal/meridional components at tracer points (e.g., for all vector cases listed in table~\ref{tbl:gencost_ecco_barfile}).

\begin{table}[!ht]
\centering
\begin{tabular}{lll}
parameter & type & function \\ \hline
%\texttt{using\_gencost} & logical & Turns specified generic cost term on. \\
\texttt{gencost\_name} & character(*) & Name of cost term \\
\texttt{gencost\_barfile} & character(*) & File to receive model counterpart $\vec{m}_i$ (see table~\ref{tbl:gencost_ecco_barfile}) \\
\texttt{gencost\_datafile} & character(*) & File containing observational data $\vec{o}_i$ \\
\texttt{gencost\_avgperiod} & character(5) & Averaging period for $\vec{o}_i$ and $\vec{m}_i$ (see text) \\
\texttt{gencost\_outputlevel} & integer & Greater than 0 will output misfit fields \\
\texttt{gencost\_errfile} & character(*) & File containing diagonal of error matrix $R_i$ \\
\texttt{mult\_gencost} & real & Multiplier $\alpha_i$ (default: 1) \\
\hline
\texttt{gencost\_preproc} & character(*) & Preprocessor names \\
\texttt{gencost\_preproc\_c} & character(*) & Preprocessor character arguments \\
\texttt{gencost\_preproc\_i} & integer(*) & Preprocessor integer arguments \\
\texttt{gencost\_preproc\_r} & real(*) & Preprocessor real arguments \\
\texttt{gencost\_posproc} & character(*) & Post-processor names \\
\texttt{gencost\_posproc\_c} & character(*) & Post-processor character arguments \\
\texttt{gencost\_posproc\_i} & integer(*) & Post-processor integer arguments \\
\texttt{gencost\_posproc\_r} & real(*) & Post-processor real arguments \\
\hline
\texttt{gencost\_mask} & character(1) & Location of $\vec{m}_i$ \\
\texttt{gencost\_spmin} & real & Data less than this value will be omitted \\
\texttt{gencost\_spmax} & real & Data greater than this value will be omitted \\
\texttt{gencost\_spzero} & real & Data points equal to this value will be omitted \\
\texttt{gencost\_startdate1} & integer & Start date of observations (YYYYMMDD) \\
\texttt{gencost\_startdate2} & integer & Start time of observations (HHMMSS) \\
\texttt{gencost\_is3d} & logical & Needs to be true for 3D fields \\
\hline
\texttt{gencost\_enddate1} & integer & Not fully implemented (sec.~\ref{v4custom} only) \\
\texttt{gencost\_enddate2} & integer & Not fully implemented (sec.~\ref{v4custom} only) \\
\end{tabular}
\caption{Parameters in \texttt{ecco\_gencost\_nml} namelist in \texttt{data.ecco}. All parameters are vectors of length \texttt{NGENCOST} (the \# of available cost terms) except for \texttt{gencost\_*proc*}, which are arrays of size \texttt{NGENPPROC}$\times$\texttt{NGENCOST}. Notes: \texttt{gencost\_is3d} will automatically be reset to true in cases listed as 3D in table~\ref{tbl:gencost_ecco_barfile}; the last group of parameters should be disregarded except for the section~\ref{v4custom} special cases.}
\label{tbl:gencost_ecco_params}
\end{table}

\begin{table}[!ht]
\centering
\begin{tabular}{lll}
variable name & description & remarks \\ \hline\hline
\texttt{m\_eta} & sea surface height & free surface + ice + global steric correction \\
\texttt{m\_sst} & sea surface temperature & first level potential temperature \\
\texttt{m\_sss} & sea surface salinity & first level salinity \\
\texttt{m\_bp} & bottom pressure & phiHydLow \\
\texttt{m\_siarea} & sea-ice area & from pkg/seaice \\
\texttt{m\_siheff} & sea-ice effective thickness & from pkg/seaice \\
\texttt{m\_sihsnow} & snow effective thickness & from pkg/seaice \\ \hline
\texttt{m\_theta} & potential temperature & three-dimensional \\
\texttt{m\_salt} & salinity & three-dimensional \\
\texttt{m\_UE} & zonal velocity & three-dimensional \\
\texttt{m\_VN} & meridional velocity & three-dimensional \\ \hline
\texttt{m\_ustress} & zonal wind stress & from pkg/exf \\
\texttt{m\_vstress} & meridional wind stress & from pkg/exf \\
\texttt{m\_uwind} & zonal wind & from pkg/exf \\
\texttt{m\_vwind} & meridional wind & from pkg/exf \\
\texttt{m\_atemp} & atmospheric temperature & from pkg/exf \\
\texttt{m\_aqh} & atmospheric specific humidity & from pkg/exf \\
\texttt{m\_precip} & precipitation & from pkg/exf \\
\texttt{m\_swdown} & downward shortwave & from pkg/exf \\
\texttt{m\_lwdown} & downward longwave & from pkg/exf \\
\texttt{m\_wspeed} & wind speed & from pkg/exf \\ \hline
\texttt{m\_diffkr} & vertical/diapycnal diffusivity & three-dimensional, constant \\
\texttt{m\_kapgm} & GM diffusivity & three-dimensional, constant \\
\texttt{m\_kapredi} & isopycnal diffusivity & three-dimensional, constant \\
\texttt{m\_geothermalflux} & geothermal heat flux & constant \\
\texttt{m\_bottomdrag} & bottom drag & constant \\
\end{tabular}
\caption{Implemented \texttt{gencost\_barfile} options (as of checkpoint \mitgcmCheckpointVersion) that can be used via \texttt{cost\_generic.F} (section~\ref{costgen}). An extension starting with `\_' can be appended at the end of the variable name to distinguish between separate cost function terms. Notes: here `zonal' / `meridional' are to be taken literally (unlike in other parts of the manual) and these components are centered (i.e., not at the staggered C-grid velocity points); the `m\_eta' formula depends on the \texttt{ATMOSPHERIC\_LOADING} and \texttt{ALLOW\_PSBAR\_STERIC} compile time options and the \texttt{useRealFreshWaterFlux} run time parameter.}
\label{tbl:gencost_ecco_barfile}
\end{table}

\begin{table}[!ht]
\centering
\begin{tabular}{lll}
name & description & specs needed via \texttt{gencost\_preproc\_i}, \texttt{\_r}, or \texttt{\_c} \\ \hline\hline
\texttt{gencost\_preproc} \\ \hline
\texttt{clim} & Use climatological misfits & integer: no.\ of records per climatological cycle \\
\texttt{mean} & Use time mean of misfits & --- \\
\texttt{anom} & Use anomalies from time mean & --- \\
\texttt{variaweight} & Use time-varying weights $R_i^{-1}$ & --- \\
\texttt{nosumsq} & Use linear misfits & --- \\
\texttt{factor} & Multiply $\vec{m}_i$ by a scaling factor & real: the scaling factor \\ \hline \hline
\texttt{gencost\_posproc} \\ \hline
\texttt{smooth} & Smooth misfits & character: smoothing scale file \\
 & & integer: smoother \# of time steps \\
\end{tabular}
\caption{\texttt{gencost\_preproc} and \texttt{gencost\_posproc} options implemented as of checkpoint \mitgcmCheckpointVersion. Note: the distinction between \texttt{gencost\_preproc} and \texttt{gencost\_posproc} may be revisited in the future.}
\label{tbl:gencost_ecco_preproc}
\end{table}

\clearpage

\subsection{Generic Integral Function} \label{intgen}

The functionality described in this section is operated by \texttt{cost\_gencost\_boxmean.F}. It is primarily aimed at obtaining a mechanistic understanding of a chosen physical variable via adjoint sensitivity computations (see Chapter~\ref{chap:autodiff}) as done for example in \cite{maro-eta:99,heim-eta:11,fuku-etal:14}. Thus the quadratic term in Eq.~\eqref{eq:Jtotal} ($\vec{d}_i^T R_i^{-1} \vec{d}_i$) is by default replaced with a $d_i$ scalar\footnote{The quadratic option in fact does not yet exist in \texttt{cost\_gencost\_boxmean.F}.} that derives from model fields through a generic integral formula (Eq.~\eqref{eq:Jpreproc}). The specification of \texttt{gencost\_barfile} again selects the physical variable type. Current valid options to use \texttt{cost\_gencost\_boxmean.F} are reported in table~\ref{tbl:genint_ecco_barfile}. A suffix starting with \texttt{`\_'} can again be appended to \texttt{gencost\_barfile}.
% and the basic averaging frequency is specified via \texttt{gencost\_avgperiod}.

The integral formula is defined by masks provided via binary files whose names are specified via \texttt{gencost\_errfile}. There are two cases: (1) if \texttt{gencost\_errfile = `foo\_mask'} and \texttt{gencost\_barfile} is of the `m\_boxmean*' type then the model will search for horizontal, vertical, and temporal mask files named \texttt{foo\_maskC}, \texttt{foo\_maskK}, and \texttt{foo\_maskT}; (2) if instead \texttt{gencost\_barfile} is of the `m\_horflux\_*' type then the model will search for \texttt{foo\_maskW}, \texttt{foo\_maskS}, \texttt{foo\_maskK}, and \texttt{foo\_maskT}.

The `C' mask or the `W' / `S' masks are expected to be two-dimensional fields. The `K' and `T' masks (both optional; all 1 by default) are expected to be one-dimensional vectors. The `K' vector length should match \texttt{Nr}. The `T' vector length should match the number of records that the specification of \texttt{gencost\_avgperiod} implies but there is no restriction on its values. In case \#1 (`m\_boxmean*') the `C' and `K' masks should consist of +1 and 0 values and a volume average will be computed accordingly. In case \#2 (`m\_horflux*') the `W', `S', and `K' masks should consist of +1, -1, and 0 values and an integrated horizontal transport (or overturn) will be computed accordingly.
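
As an illustration of case \#1, a box-mean temperature term might be requested with the namelist fragment sketched below. The term index \texttt{(1)} and the choice of monthly averaging are arbitrary, and \texttt{foo\_mask} is the placeholder root used in the text above.
\begin{verbatim}
 gencost_barfile(1)   = 'm_boxmean_theta',
 gencost_errfile(1)   = 'foo_mask',
 gencost_avgperiod(1) = 'month',
\end{verbatim}
Following the rules above, the model would then search for a two-dimensional \texttt{foo\_maskC} file of +1 and 0 values and, optionally, for the one-dimensional \texttt{foo\_maskK} (length \texttt{Nr}) and \texttt{foo\_maskT} (one value per monthly record) files.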

\begin{table}[!ht]
\centering
\begin{tabular}{lll}
variable name & description & remarks \\ \hline\hline
\texttt{m\_boxmean\_theta} & mean of theta over box & specify box \\
\texttt{m\_boxmean\_salt} & mean of salt over box & specify box \\
\texttt{m\_boxmean\_eta} & mean of SSH over box & specify box \\
\hline
\texttt{m\_horflux\_vol} & volume transport through section & specify transect \\
\end{tabular}
\caption{Implemented \texttt{gencost\_barfile} options (as of checkpoint \mitgcmCheckpointVersion) that can be used via \texttt{cost\_gencost\_boxmean.F} (section~\ref{intgen}).}
\label{tbl:genint_ecco_barfile}
\end{table}

\subsection{Custom Cost Functions} \label{v4custom}

This section (very much a work in progress...) pertains to the special cases of \texttt{cost\_gencost\_bpv4.F}, \texttt{cost\_gencost\_seaicev4.F}, \texttt{cost\_gencost\_sshv4.F}, \texttt{cost\_gencost\_sstv4.F}, and \texttt{cost\_gencost\_transp.F}. The \texttt{cost\_gencost\_transp.F} function can be used to compute a transport of volume, heat, or salt through a specified section (a non-quadratic cost function). To this end one sets \texttt{gencost\_name = `transp*'}, where \texttt{*} is an optional suffix starting with \texttt{`\_'}, and sets \texttt{gencost\_barfile} to one of \texttt{m\_trVol}, \texttt{m\_trHeat}, and \texttt{m\_trSalt}.
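
For example, a volume transport term might be requested with the namelist fragment sketched below, in which the term index \texttt{(1)} is arbitrary. The section itself would still have to be specified, as indicated in table~\ref{tbl:gencost_ecco_name}.
\begin{verbatim}
 gencost_name(1)    = 'transp_trVol',
 gencost_barfile(1) = 'm_trVol',
\end{verbatim}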

\begin{table}[!ht]
\centering
\begin{tabular}{lll}
name & description & remarks \\ \hline\hline
\texttt{sshv4-mdt} & sea surface height & mean dynamic topography (SSH - geod) \\
\texttt{sshv4-tp} & sea surface height & Along-Track Topex/Jason SLA (level 3) \\
\texttt{sshv4-ers} & sea surface height & Along-Track ERS/Envisat SLA (level 3) \\
\texttt{sshv4-gfo} & sea surface height & Along-Track GFO class SLA (level 3) \\
\texttt{sshv4-lsc} & sea surface height & Large-Scale SLA (from the above) \\
\texttt{sshv4-gmsl} & sea surface height & Global-Mean SLA (from the above) \\ \hline
\texttt{bpv4-grace} & bottom pressure & GRACE maps (level 4) \\ \hline
\texttt{sstv4-amsre} & sea surface temperature & Along-Swath SST (level 3) \\
\texttt{sstv4-amsre-lsc} & sea surface temperature & Large-Scale SST (from the above) \\ \hline
\texttt{si4-cons} & sea ice concentration & needs sea-ice adjoint (level 4) \\
\texttt{si4-deconc} & model sea ice deficiency & proxy penalty (from the above) \\
\texttt{si4-exconc} & model sea ice excess & proxy penalty (from the above) \\ \hline
\texttt{transp\_trVol} & volume transport & specify section as in section~\ref{intgen} \\
\texttt{transp\_trHeat} & heat transport & specify section as in section~\ref{intgen} \\
\texttt{transp\_trSalt} & salt transport & specify section as in section~\ref{intgen} \\
\end{tabular}
\caption{Pre-defined \texttt{gencost\_name} special cases (as of checkpoint \mitgcmCheckpointVersion; section~\ref{v4custom}).}
\label{tbl:gencost_ecco_name}
\end{table}

\subsection{Key Routines}

TBA... \texttt{cost\_generic.F}, \texttt{cost\_gencost\_boxmean.F}, \texttt{ecco\_phys.F}, \texttt{cost\_gencost\_customize.F}, \texttt{cost\_averagesfields.F} ... \texttt{ecco\_readparms.F}, \texttt{ecco\_check.F}, \texttt{ecco\_summary.F}, ...

\subsection{Compile Options}

TBA... \texttt{ALLOW\_GENCOST3D}, \texttt{ALLOW\_PSBAR\_STERIC}, \texttt{ECCO\_CTRL\_DEPRECATED}, ...