1 |
% $Header: /u/gcmpack/manual/part5/doc_ad_2.tex,v 1.17 2004/10/16 03:40:17 edhill Exp $ |
2 |
% $Name: $ |
3 |
|
4 |
Author: Patrick Heimbach |
5 |
|
6 |
{\sf Automatic differentiation} (AD), also referred to as algorithmic |
7 |
(or, more loosely, computational) differentiation, involves |
8 |
automatically deriving code to calculate |
9 |
partial derivatives from an existing fully non-linear prognostic code. |
10 |
(see \cite{gri:00}). |
11 |
A software tool is used that parses and transforms source files |
12 |
according to a set of linguistic and mathematical rules. |
13 |
AD tools are like source-to-source translators in that |
14 |
they parse a program code as input and produce a new program code |
15 |
as output. |
16 |
However, unlike a pure source-to-source translation, the output program |
17 |
represents a new algorithm, such as the evaluation of the |
18 |
Jacobian, the Hessian, or higher derivative operators. |
19 |
In principle, a variety of derived algorithms |
20 |
can be generated automatically in this way. |
21 |
|
22 |
The MITGCM has been adapted for use with the |
23 |
Tangent linear and Adjoint Model Compiler (TAMC) and its successor TAF |
24 |
(Transformation of Algorithms in Fortran), developed |
25 |
by Ralf Giering (\cite{gie-kam:98}, \cite{gie:99,gie:00}). |
26 |
The first application of the adjoint of the MITGCM for sensitivity |
27 |
studies has been published by \cite{maro-eta:99}. |
28 |
\cite{sta-eta:97,sta-eta:01} use the MITGCM and its adjoint |
29 |
for ocean state estimation studies. |
30 |
In the following we shall refer to TAMC and TAF synonymously, |
31 |
except were explicitly stated otherwise. |
32 |
|
33 |
TAMC exploits the chain rule for computing the first |
34 |
derivative of a function with |
35 |
respect to a set of input variables. |
36 |
Treating a given forward code as a composition of operations -- |
37 |
each line representing a compositional element, the chain rule is |
38 |
rigorously applied to the code, line by line. The resulting |
39 |
tangent linear or adjoint code, |
40 |
then, may be thought of as the composition in |
41 |
forward or reverse order, respectively, of the |
42 |
Jacobian matrices of the forward code's compositional elements. |
43 |
|
44 |
%********************************************************************** |
45 |
\section{Some basic algebra} |
46 |
\label{sec_ad_algebra} |
47 |
\begin{rawhtml} |
48 |
<!-- CMIREDIR:sec_ad_algebra: --> |
49 |
\end{rawhtml} |
50 |
%********************************************************************** |
51 |
|
52 |
Let $ \cal{M} $ be a general nonlinear, model, i.e. a |
53 |
mapping from the $m$-dimensional space |
54 |
$U \subset I\!\!R^m$ of input variables |
55 |
$\vec{u}=(u_1,\ldots,u_m)$ |
56 |
(model parameters, initial conditions, boundary conditions |
57 |
such as forcing functions) to the $n$-dimensional space |
58 |
$V \subset I\!\!R^n$ of |
59 |
model output variable $\vec{v}=(v_1,\ldots,v_n)$ |
60 |
(model state, model diagnostics, objective function, ...) |
61 |
under consideration, |
62 |
% |
63 |
\begin{equation} |
64 |
\begin{split} |
65 |
{\cal M} \, : & \, U \,\, \longrightarrow \, V \\ |
66 |
~ & \, \vec{u} \,\, \longmapsto \, \vec{v} \, = \, |
67 |
{\cal M}(\vec{u}) |
68 |
\label{fulloperator} |
69 |
\end{split} |
70 |
\end{equation} |
71 |
% |
72 |
The vectors $ \vec{u} \in U $ and $ v \in V $ may be represented w.r.t. |
73 |
some given basis vectors |
74 |
$ {\rm span} (U) = \{ {\vec{e}_i} \}_{i = 1, \ldots , m} $ and |
75 |
$ {\rm span} (V) = \{ {\vec{f}_j} \}_{j = 1, \ldots , n} $ as |
76 |
\[ |
77 |
\vec{u} \, = \, \sum_{i=1}^{m} u_i \, {\vec{e}_i}, |
78 |
\qquad |
79 |
\vec{v} \, = \, \sum_{j=1}^{n} v_j \, {\vec{f}_j} |
80 |
\] |
81 |
|
82 |
Two routes may be followed to determine the sensitivity of the |
83 |
output variable $\vec{v}$ to its input $\vec{u}$. |
84 |
|
85 |
\subsection{Forward or direct sensitivity} |
86 |
% |
87 |
Consider a perturbation to the input variables $\delta \vec{u}$ |
88 |
(typically a single component |
89 |
$\delta \vec{u} = \delta u_{i} \, {\vec{e}_{i}}$). |
90 |
Their effect on the output may be obtained via the linear |
91 |
approximation of the model $ {\cal M}$ in terms of its Jacobian matrix |
92 |
$ M $, evaluated in the point $u^{(0)}$ according to |
93 |
% |
94 |
\begin{equation} |
95 |
\delta \vec{v} \, = \, M |_{\vec{u}^{(0)}} \, \delta \vec{u} |
96 |
\label{tangent_linear} |
97 |
\end{equation} |
98 |
with resulting output perturbation $\delta \vec{v}$. |
99 |
In components |
100 |
$M_{j i} \, = \, \partial {\cal M}_{j} / \partial u_{i} $, |
101 |
it reads |
102 |
% |
103 |
\begin{equation} |
104 |
\delta v_{j} \, = \, \sum_{i} |
105 |
\left. \frac{\partial {\cal M}_{j}}{\partial u_{i}} \right|_{u^{(0)}} \, |
106 |
\delta u_{i} |
107 |
\label{jacobi_matrix} |
108 |
\end{equation} |
109 |
% |
110 |
Eq. (\ref{tangent_linear}) is the {\sf tangent linear model (TLM)}. |
111 |
In contrast to the full nonlinear model $ {\cal M} $, the operator |
112 |
$ M $ is just a matrix |
113 |
which can readily be used to find the forward sensitivity of $\vec{v}$ to |
114 |
perturbations in $u$, |
115 |
but if there are very many input variables $(\gg O(10^{6})$ for |
116 |
large-scale oceanographic application), it quickly becomes |
117 |
prohibitive to proceed directly as in (\ref{tangent_linear}), |
118 |
if the impact of each component $ {\bf e_{i}} $ is to be assessed. |
119 |
|
120 |
\subsection{Reverse or adjoint sensitivity} |
121 |
% |
122 |
Let us consider the special case of a |
123 |
scalar objective function ${\cal J}(\vec{v})$ of the model output (e.g. |
124 |
the total meridional heat transport, |
125 |
the total uptake of $CO_{2}$ in the Southern |
126 |
Ocean over a time interval, |
127 |
or a measure of some model-to-data misfit) |
128 |
% |
129 |
\begin{eqnarray} |
130 |
\begin{array}{cccccc} |
131 |
{\cal J} \, : & U & |
132 |
\longrightarrow & V & |
133 |
\longrightarrow & I \!\! R \\ |
134 |
~ & \vec{u} & \longmapsto & \vec{v}={\cal M}(\vec{u}) & |
135 |
\longmapsto & {\cal J}(\vec{u}) = {\cal J}({\cal M}(\vec{u})) |
136 |
\end{array} |
137 |
\label{compo} |
138 |
\end{eqnarray} |
139 |
% |
140 |
The perturbation of $ {\cal J} $ around a fixed point $ {\cal J}_0 $, |
141 |
\[ |
142 |
{\cal J} \, = \, {\cal J}_0 \, + \, \delta {\cal J} |
143 |
\] |
144 |
can be expressed in both bases of $ \vec{u} $ and $ \vec{v} $ |
145 |
w.r.t. their corresponding inner product |
146 |
$\left\langle \,\, , \,\, \right\rangle $ |
147 |
% |
148 |
\begin{equation} |
149 |
\begin{split} |
150 |
{\cal J} & = \, |
151 |
{\cal J} |_{\vec{u}^{(0)}} \, + \, |
152 |
\left\langle \, \nabla _{u}{\cal J}^T |_{\vec{u}^{(0)}} \, , \, \delta \vec{u} \, \right\rangle |
153 |
\, + \, O(\delta \vec{u}^2) \\ |
154 |
~ & = \, |
155 |
{\cal J} |_{\vec{v}^{(0)}} \, + \, |
156 |
\left\langle \, \nabla _{v}{\cal J}^T |_{\vec{v}^{(0)}} \, , \, \delta \vec{v} \, \right\rangle |
157 |
\, + \, O(\delta \vec{v}^2) |
158 |
\end{split} |
159 |
\label{deljidentity} |
160 |
\end{equation} |
161 |
% |
162 |
(note, that the gradient $ \nabla f $ is a co-vector, therefore |
163 |
its transpose is required in the above inner product). |
164 |
Then, using the representation of |
165 |
$ \delta {\cal J} = |
166 |
\left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle $, |
167 |
the definition |
168 |
of an adjoint operator $ A^{\ast} $ of a given operator $ A $, |
169 |
\[ |
170 |
\left\langle \, A^{\ast} \vec{x} \, , \, \vec{y} \, \right\rangle = |
171 |
\left\langle \, \vec{x} \, , \, A \vec{y} \, \right\rangle |
172 |
\] |
173 |
which for finite-dimensional vector spaces is just the |
174 |
transpose of $ A $, |
175 |
\[ |
176 |
A^{\ast} \, = \, A^T |
177 |
\] |
178 |
and from eq. (\ref{tangent_linear}), (\ref{deljidentity}), |
179 |
we note that |
180 |
(omitting $|$'s): |
181 |
% |
182 |
\begin{equation} |
183 |
\delta {\cal J} |
184 |
\, = \, |
185 |
\left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle |
186 |
\, = \, |
187 |
\left\langle \, \nabla _{v}{\cal J}^T \, , \, M \, \delta \vec{u} \, \right\rangle |
188 |
\, = \, |
189 |
\left\langle \, M^T \, \nabla _{v}{\cal J}^T \, , \, |
190 |
\delta \vec{u} \, \right\rangle |
191 |
\label{inner} |
192 |
\end{equation} |
193 |
% |
194 |
With the identity (\ref{deljidentity}), we then find that |
195 |
the gradient $ \nabla _{u}{\cal J} $ can be readily inferred by |
196 |
invoking the adjoint $ M^{\ast } $ of the tangent linear model $ M $ |
197 |
% |
198 |
\begin{equation} |
199 |
\begin{split} |
200 |
\nabla _{u}{\cal J}^T |_{\vec{u}} & |
201 |
= \, M^T |_{\vec{u}} \cdot \nabla _{v}{\cal J}^T |_{\vec{v}} \\ |
202 |
~ & = \, M^T |_{\vec{u}} \cdot \delta \vec{v}^{\ast} \\ |
203 |
~ & = \, \delta \vec{u}^{\ast} |
204 |
\end{split} |
205 |
\label{adjoint} |
206 |
\end{equation} |
207 |
% |
208 |
Eq. (\ref{adjoint}) is the {\sf adjoint model (ADM)}, |
209 |
in which $M^T$ is the adjoint (here, the transpose) of the |
210 |
tangent linear operator $M$, $ \delta \vec{v}^{\ast} $ |
211 |
the adjoint variable of the model state $ \vec{v} $, and |
212 |
$ \delta \vec{u}^{\ast} $ the adjoint variable of the control variable $ \vec{u} $. |
213 |
|
214 |
The {\sf reverse} nature of the adjoint calculation can be readily |
215 |
seen as follows. |
216 |
Consider a model integration which consists of $ \Lambda $ |
217 |
consecutive operations |
218 |
$ {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} ( |
219 |
...... ( {\cal M}_{\lambda} ( |
220 |
...... |
221 |
( {\cal M}_{1} ( {\cal M}_{0}(\vec{u}) )))) $, |
222 |
where the ${\cal M}$'s could be the elementary steps, i.e. single lines |
223 |
in the code of the model, or successive time steps of the |
224 |
model integration, |
225 |
starting at step 0 and moving up to step $\Lambda$, with intermediate |
226 |
${\cal M}_{\lambda} (\vec{u}) = \vec{v}^{(\lambda+1)}$ and final |
227 |
${\cal M}_{\Lambda} (\vec{u}) = \vec{v}^{(\Lambda+1)} = \vec{v}$. |
228 |
Let ${\cal J}$ be a cost function which explicitly depends on the |
229 |
final state $\vec{v}$ only |
230 |
(this restriction is for clarity reasons only). |
231 |
% |
232 |
${\cal J}(u)$ may be decomposed according to: |
233 |
% |
234 |
\begin{equation} |
235 |
{\cal J}({\cal M}(\vec{u})) \, = \, |
236 |
{\cal J} ( {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} ( |
237 |
...... ( {\cal M}_{\lambda} ( |
238 |
...... |
239 |
( {\cal M}_{1} ( {\cal M}_{0}(\vec{u}) ))))) |
240 |
\label{compos} |
241 |
\end{equation} |
242 |
% |
243 |
Then, according to the chain rule, the forward calculation reads, |
244 |
in terms of the Jacobi matrices |
245 |
(we've omitted the $ | $'s which, nevertheless are important |
246 |
to the aspect of {\it tangent} linearity; |
247 |
note also that by definition |
248 |
$ \langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \rangle |
249 |
= \nabla_v {\cal J} \cdot \delta \vec{v} $ ) |
250 |
% |
251 |
\begin{equation} |
252 |
\begin{split} |
253 |
\nabla_v {\cal J} (M(\delta \vec{u})) & = \, |
254 |
\nabla_v {\cal J} \cdot M_{\Lambda} |
255 |
\cdot ...... \cdot M_{\lambda} \cdot ...... \cdot |
256 |
M_{1} \cdot M_{0} \cdot \delta \vec{u} \\ |
257 |
~ & = \, \nabla_v {\cal J} \cdot \delta \vec{v} \\ |
258 |
\end{split} |
259 |
\label{forward} |
260 |
\end{equation} |
261 |
% |
262 |
whereas in reverse mode we have |
263 |
% |
264 |
\begin{equation} |
265 |
\boxed{ |
266 |
\begin{split} |
267 |
M^T ( \nabla_v {\cal J}^T) & = \, |
268 |
M_{0}^T \cdot M_{1}^T |
269 |
\cdot ...... \cdot M_{\lambda}^T \cdot ...... \cdot |
270 |
M_{\Lambda}^T \cdot \nabla_v {\cal J}^T \\ |
271 |
~ & = \, M_{0}^T \cdot M_{1}^T |
272 |
\cdot ...... \cdot |
273 |
\nabla_{v^{(\lambda)}} {\cal J}^T \\ |
274 |
~ & = \, \nabla_u {\cal J}^T |
275 |
\end{split} |
276 |
} |
277 |
\label{reverse} |
278 |
\end{equation} |
279 |
% |
280 |
clearly expressing the reverse nature of the calculation. |
281 |
Eq. (\ref{reverse}) is at the heart of automatic adjoint compilers. |
282 |
If the intermediate steps $\lambda$ in |
283 |
eqn. (\ref{compos}) -- (\ref{reverse}) |
284 |
represent the model state (forward or adjoint) at each |
285 |
intermediate time step as noted above, then correspondingly, |
286 |
$ M^T (\delta \vec{v}^{(\lambda) \, \ast}) = |
287 |
\delta \vec{v}^{(\lambda-1) \, \ast} $ for the adjoint variables. |
288 |
It thus becomes evident that the adjoint calculation also |
289 |
yields the adjoint of each model state component |
290 |
$ \vec{v}^{(\lambda)} $ at each intermediate step $ \lambda $, namely |
291 |
% |
292 |
\begin{equation} |
293 |
\boxed{ |
294 |
\begin{split} |
295 |
\nabla_{v^{(\lambda)}} {\cal J}^T |_{\vec{v}^{(\lambda)}} |
296 |
& = \, |
297 |
M_{\lambda}^T |_{\vec{v}^{(\lambda)}} \cdot ...... \cdot |
298 |
M_{\Lambda}^T |_{\vec{v}^{(\lambda)}} \cdot \delta \vec{v}^{\ast} \\ |
299 |
~ & = \, \delta \vec{v}^{(\lambda) \, \ast} |
300 |
\end{split} |
301 |
} |
302 |
\end{equation} |
303 |
% |
304 |
in close analogy to eq. (\ref{adjoint}) |
305 |
We note in passing that that the $\delta \vec{v}^{(\lambda) \, \ast}$ |
306 |
are the Lagrange multipliers of the model equations which determine |
307 |
$ \vec{v}^{(\lambda)}$. |
308 |
|
309 |
In components, eq. (\ref{adjoint}) reads as follows. |
310 |
Let |
311 |
\[ |
312 |
\begin{array}{rclcrcl} |
313 |
\delta \vec{u} & = & |
314 |
\left( \delta u_1,\ldots, \delta u_m \right)^T , & \qquad & |
315 |
\delta \vec{u}^{\ast} \,\, = \,\, \nabla_u {\cal J}^T & = & |
316 |
\left( |
317 |
\frac{\partial {\cal J}}{\partial u_1},\ldots, |
318 |
\frac{\partial {\cal J}}{\partial u_m} |
319 |
\right)^T \\ |
320 |
\delta \vec{v} & = & |
321 |
\left( \delta v_1,\ldots, \delta u_n \right)^T , & \qquad & |
322 |
\delta \vec{v}^{\ast} \,\, = \,\, \nabla_v {\cal J}^T & = & |
323 |
\left( |
324 |
\frac{\partial {\cal J}}{\partial v_1},\ldots, |
325 |
\frac{\partial {\cal J}}{\partial v_n} |
326 |
\right)^T \\ |
327 |
\end{array} |
328 |
\] |
329 |
denote the perturbations in $\vec{u}$ and $\vec{v}$, respectively, |
330 |
and their adjoint variables; |
331 |
further |
332 |
\[ |
333 |
M \, = \, \left( |
334 |
\begin{array}{ccc} |
335 |
\frac{\partial {\cal M}_1}{\partial u_1} & \ldots & |
336 |
\frac{\partial {\cal M}_1}{\partial u_m} \\ |
337 |
\vdots & ~ & \vdots \\ |
338 |
\frac{\partial {\cal M}_n}{\partial u_1} & \ldots & |
339 |
\frac{\partial {\cal M}_n}{\partial u_m} \\ |
340 |
\end{array} |
341 |
\right) |
342 |
\] |
343 |
is the Jacobi matrix of $ {\cal M} $ |
344 |
(an $ n \times m $ matrix) |
345 |
such that $ \delta \vec{v} = M \cdot \delta \vec{u} $, or |
346 |
\[ |
347 |
\delta v_{j} |
348 |
\, = \, \sum_{i=1}^m M_{ji} \, \delta u_{i} |
349 |
\, = \, \sum_{i=1}^m \, \frac{\partial {\cal M}_{j}}{\partial u_{i}} |
350 |
\delta u_{i} |
351 |
\] |
352 |
% |
353 |
Then eq. (\ref{adjoint}) takes the form |
354 |
\[ |
355 |
\delta u_{i}^{\ast} |
356 |
\, = \, \sum_{j=1}^n M_{ji} \, \delta v_{j}^{\ast} |
357 |
\, = \, \sum_{j=1}^n \, \frac{\partial {\cal M}_{j}}{\partial u_{i}} |
358 |
\delta v_{j}^{\ast} |
359 |
\] |
360 |
% |
361 |
or |
362 |
% |
363 |
\[ |
364 |
\left( |
365 |
\begin{array}{c} |
366 |
\left. \frac{\partial}{\partial u_1} {\cal J} \right|_{\vec{u}^{(0)}} \\ |
367 |
\vdots \\ |
368 |
\left. \frac{\partial}{\partial u_m} {\cal J} \right|_{\vec{u}^{(0)}} \\ |
369 |
\end{array} |
370 |
\right) |
371 |
\, = \, |
372 |
\left( |
373 |
\begin{array}{ccc} |
374 |
\left. \frac{\partial {\cal M}_1}{\partial u_1} \right|_{\vec{u}^{(0)}} |
375 |
& \ldots & |
376 |
\left. \frac{\partial {\cal M}_n}{\partial u_1} \right|_{\vec{u}^{(0)}} \\ |
377 |
\vdots & ~ & \vdots \\ |
378 |
\left. \frac{\partial {\cal M}_1}{\partial u_m} \right|_{\vec{u}^{(0)}} |
379 |
& \ldots & |
380 |
\left. \frac{\partial {\cal M}_n}{\partial u_m} \right|_{\vec{u}^{(0)}} \\ |
381 |
\end{array} |
382 |
\right) |
383 |
\cdot |
384 |
\left( |
385 |
\begin{array}{c} |
386 |
\left. \frac{\partial}{\partial v_1} {\cal J} \right|_{\vec{v}} \\ |
387 |
\vdots \\ |
388 |
\left. \frac{\partial}{\partial v_n} {\cal J} \right|_{\vec{v}} \\ |
389 |
\end{array} |
390 |
\right) |
391 |
\] |
392 |
% |
393 |
Furthermore, the adjoint $ \delta v^{(\lambda) \, \ast} $ |
394 |
of any intermediate state $ v^{(\lambda)} $ |
395 |
may be obtained, using the intermediate Jacobian |
396 |
(an $ n_{\lambda+1} \times n_{\lambda} $ matrix) |
397 |
% |
398 |
\[ |
399 |
M_{\lambda} \, = \, |
400 |
\left( |
401 |
\begin{array}{ccc} |
402 |
\frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_1} |
403 |
& \ldots & |
404 |
\frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_{n_{\lambda}}} \\ |
405 |
\vdots & ~ & \vdots \\ |
406 |
\frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_1} |
407 |
& \ldots & |
408 |
\frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\ |
409 |
\end{array} |
410 |
\right) |
411 |
\] |
412 |
% |
413 |
and the shorthand notation for the adjoint variables |
414 |
$ \delta v^{(\lambda) \, \ast}_{j} = \frac{\partial}{\partial v^{(\lambda)}_{j}} |
415 |
{\cal J}^T $, $ j = 1, \ldots , n_{\lambda} $, |
416 |
for intermediate components, yielding |
417 |
\begin{equation} |
418 |
\small |
419 |
\begin{split} |
420 |
\left( |
421 |
\begin{array}{c} |
422 |
\delta v^{(\lambda) \, \ast}_1 \\ |
423 |
\vdots \\ |
424 |
\delta v^{(\lambda) \, \ast}_{n_{\lambda}} \\ |
425 |
\end{array} |
426 |
\right) |
427 |
\, = & |
428 |
\left( |
429 |
\begin{array}{ccc} |
430 |
\frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_1} |
431 |
& \ldots \,\, \ldots & |
432 |
\frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_1} \\ |
433 |
\vdots & ~ & \vdots \\ |
434 |
\frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_{n_{\lambda}}} |
435 |
& \ldots \,\, \ldots & |
436 |
\frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\ |
437 |
\end{array} |
438 |
\right) |
439 |
\cdot |
440 |
% |
441 |
\\ ~ & ~ |
442 |
\\ ~ & |
443 |
% |
444 |
\left( |
445 |
\begin{array}{ccc} |
446 |
\frac{\partial ({\cal M}_{\lambda+1})_1}{\partial v^{(\lambda+1)}_1} |
447 |
& \ldots & |
448 |
\frac{\partial ({\cal M}_{\lambda+1})_{n_{\lambda+2}}}{\partial v^{(\lambda+1)}_1} \\ |
449 |
\vdots & ~ & \vdots \\ |
450 |
\vdots & ~ & \vdots \\ |
451 |
\frac{\partial ({\cal M}_{\lambda+1})_1}{\partial v^{(\lambda+1)}_{n_{\lambda+1}}} |
452 |
& \ldots & |
453 |
\frac{\partial ({\cal M}_{\lambda+1})_{n_{\lambda+2}}}{\partial v^{(\lambda+1)}_{n_{\lambda+1}}} \\ |
454 |
\end{array} |
455 |
\right) |
456 |
\cdot \, \ldots \, \cdot |
457 |
\left( |
458 |
\begin{array}{c} |
459 |
\delta v^{\ast}_1 \\ |
460 |
\vdots \\ |
461 |
\delta v^{\ast}_{n} \\ |
462 |
\end{array} |
463 |
\right) |
464 |
\end{split} |
465 |
\end{equation} |
466 |
|
467 |
Eq. (\ref{forward}) and (\ref{reverse}) are perhaps clearest in |
468 |
showing the advantage of the reverse over the forward mode |
469 |
if the gradient $\nabla _{u}{\cal J}$, i.e. the sensitivity of the |
470 |
cost function $ {\cal J} $ with respect to {\it all} input |
471 |
variables $u$ |
472 |
(or the sensitivity of the cost function with respect to |
473 |
{\it all} intermediate states $ \vec{v}^{(\lambda)} $) are sought. |
474 |
In order to be able to solve for each component of the gradient |
475 |
$ \partial {\cal J} / \partial u_{i} $ in (\ref{forward}) |
476 |
a forward calculation has to be performed for each component separately, |
477 |
i.e. $ \delta \vec{u} = \delta u_{i} {\vec{e}_{i}} $ |
478 |
for the $i$-th forward calculation. |
479 |
Then, (\ref{forward}) represents the |
480 |
projection of $ \nabla_u {\cal J} $ onto the $i$-th component. |
481 |
The full gradient is retrieved from the $ m $ forward calculations. |
482 |
In contrast, eq. (\ref{reverse}) yields the full |
483 |
gradient $\nabla _{u}{\cal J}$ (and all intermediate gradients |
484 |
$\nabla _{v^{(\lambda)}}{\cal J}$) within a single reverse calculation. |
485 |
|
486 |
Note, that if $ {\cal J} $ is a vector-valued function |
487 |
of dimension $ l > 1 $, |
488 |
eq. (\ref{reverse}) has to be modified according to |
489 |
\[ |
490 |
M^T \left( \nabla_v {\cal J}^T \left(\delta \vec{J}\right) \right) |
491 |
\, = \, |
492 |
\nabla_u {\cal J}^T \cdot \delta \vec{J} |
493 |
\] |
494 |
where now $ \delta \vec{J} \in I\!\!R^l $ is a vector of |
495 |
dimension $ l $. |
496 |
In this case $ l $ reverse simulations have to be performed |
497 |
for each $ \delta J_{k}, \,\, k = 1, \ldots, l $. |
498 |
Then, the reverse mode is more efficient as long as |
499 |
$ l < n $, otherwise the forward mode is preferable. |
500 |
Strictly, the reverse mode is called adjoint mode only for |
501 |
$ l = 1 $. |
502 |
|
503 |
A detailed analysis of the underlying numerical operations |
504 |
shows that the computation of $\nabla _{u}{\cal J}$ in this way |
505 |
requires about 2 to 5 times the computation of the cost function. |
506 |
Alternatively, the gradient vector could be approximated |
507 |
by finite differences, requiring $m$ computations |
508 |
of the perturbed cost function. |
509 |
|
510 |
To conclude we give two examples of commonly used types |
511 |
of cost functions: |
512 |
|
513 |
\paragraph{Example 1: |
514 |
$ {\cal J} = v_{j} (T) $} ~ \\ |
515 |
The cost function consists of the $j$-th component of the model state |
516 |
$ \vec{v} $ at time $T$. |
517 |
Then $ \nabla_v {\cal J}^T = {\vec{f}_{j}} $ is just the $j$-th |
518 |
unit vector. The $ \nabla_u {\cal J}^T $ |
519 |
is the projection of the adjoint |
520 |
operator onto the $j$-th component ${\bf f_{j}}$, |
521 |
\[ |
522 |
\nabla_u {\cal J}^T |
523 |
\, = \, M^T \cdot \nabla_v {\cal J}^T |
524 |
\, = \, \sum_{i} M^T_{ji} \, {\vec{e}_{i}} |
525 |
\] |
526 |
|
527 |
\paragraph{Example 2: |
528 |
$ {\cal J} = \langle \, {\cal H}(\vec{v}) - \vec{d} \, , |
529 |
\, {\cal H}(\vec{v}) - \vec{d} \, \rangle $} ~ \\ |
530 |
The cost function represents the quadratic model vs. data misfit. |
531 |
Here, $ \vec{d} $ is the data vector and $ {\cal H} $ represents the |
532 |
operator which maps the model state space onto the data space. |
533 |
Then, $ \nabla_v {\cal J} $ takes the form |
534 |
% |
535 |
\begin{equation*} |
536 |
\begin{split} |
537 |
\nabla_v {\cal J}^T & = \, 2 \, \, H \cdot |
538 |
\left( \, {\cal H}(\vec{v}) - \vec{d} \, \right) \\ |
539 |
~ & = \, 2 \sum_{j} \left\{ \sum_k |
540 |
\frac{\partial {\cal H}_k}{\partial v_{j}} |
541 |
\left( {\cal H}_k (\vec{v}) - d_k \right) |
542 |
\right\} \, {\vec{f}_{j}} \\ |
543 |
\end{split} |
544 |
\end{equation*} |
545 |
% |
546 |
where $H_{kj} = \partial {\cal H}_k / \partial v_{j} $ is the |
547 |
Jacobi matrix of the data projection operator. |
548 |
Thus, the gradient $ \nabla_u {\cal J} $ is given by the |
549 |
adjoint operator, |
550 |
driven by the model vs. data misfit: |
551 |
\[ |
552 |
\nabla_u {\cal J}^T \, = \, 2 \, M^T \cdot |
553 |
H \cdot \left( {\cal H}(\vec{v}) - \vec{d} \, \right) |
554 |
\] |
555 |
|
556 |
\subsection{Storing vs. recomputation in reverse mode} |
557 |
\label{checkpointing} |
558 |
|
559 |
We note an important aspect of the forward vs. reverse |
560 |
mode calculation. |
561 |
Because of the local character of the derivative |
562 |
(a derivative is defined w.r.t. a point along the trajectory), |
563 |
the intermediate results of the model trajectory |
564 |
$\vec{v}^{(\lambda+1)}={\cal M}_{\lambda}(v^{(\lambda)})$ |
565 |
may be required to evaluate the intermediate Jacobian |
566 |
$M_{\lambda}|_{\vec{v}^{(\lambda)}} \, \delta \vec{v}^{(\lambda)} $. |
567 |
This is the case e.g. for nonlinear expressions |
568 |
(momentum advection, nonlinear equation of state), state-dependent |
569 |
conditional statements (parameterization schemes). |
570 |
In the forward mode, the intermediate results are required |
571 |
in the same order as computed by the full forward model ${\cal M}$, |
572 |
but in the reverse mode they are required in the reverse order. |
573 |
Thus, in the reverse mode the trajectory of the forward model |
574 |
integration ${\cal M}$ has to be stored to be available in the reverse |
575 |
calculation. Alternatively, the complete model state up to the |
576 |
point of evaluation has to be recomputed whenever its value is required. |
577 |
|
578 |
A method to balance the amount of recomputations vs. |
579 |
storage requirements is called {\sf checkpointing} |
580 |
(e.g. \cite{gri:92}, \cite{res-eta:98}). |
581 |
It is depicted in \ref{fig:3levelcheck} for a 3-level checkpointing |
582 |
[as an example, we give explicit numbers for a 3-day |
583 |
integration with a 1-hourly timestep in square brackets]. |
584 |
\begin{itemize} |
585 |
% |
586 |
\item [$lev3$] |
587 |
In a first step, the model trajectory is subdivided into |
588 |
$ {n}^{lev3} $ subsections [$ {n}^{lev3} $=3 1-day intervals], |
589 |
with the label $lev3$ for this outermost loop. |
590 |
The model is then integrated along the full trajectory, |
591 |
and the model state stored to disk only at every $ k_{i}^{lev3} $-th timestep |
592 |
[i.e. 3 times, at |
593 |
$ i = 0,1,2 $ corresponding to $ k_{i}^{lev3} = 0, 24, 48 $]. |
594 |
In addition, the cost function is computed, if needed. |
595 |
% |
596 |
\item [$lev2$] |
597 |
In a second step each subsection itself is divided into |
598 |
$ {n}^{lev2} $ subsections |
599 |
[$ {n}^{lev2} $=4 6-hour intervals per subsection]. |
600 |
The model picks up at the last outermost dumped state |
601 |
$ v_{k_{n}^{lev3}} $ and is integrated forward in time along |
602 |
the last subsection, with the label $lev2$ for this |
603 |
intermediate loop. |
604 |
The model state is now stored to disk at every $ k_{i}^{lev2} $-th |
605 |
timestep |
606 |
[i.e. 4 times, at |
607 |
$ i = 0,1,2,3 $ corresponding to $ k_{i}^{lev2} = 48, 54, 60, 66 $]. |
608 |
% |
609 |
\item [$lev1$] |
610 |
Finally, the model picks up at the last intermediate dump state |
611 |
$ v_{k_{n}^{lev2}} $ and is integrated forward in time along |
612 |
the last subsection, with the label $lev1$ for this |
613 |
intermediate loop. |
614 |
Within this sub-subsection only, parts of the model state is stored |
615 |
to memory at every timestep |
616 |
[i.e. every hour $ i=0,...,5$ corresponding to |
617 |
$ k_{i}^{lev1} = 66, 67, \ldots, 71 $]. |
618 |
The final state $ v_n = v_{k_{n}^{lev1}} $ is reached |
619 |
and the model state of all preceding timesteps along the last |
620 |
innermost subsection are available, enabling integration backwards |
621 |
in time along the last subsection. |
622 |
The adjoint can thus be computed along this last |
623 |
subsection $k_{n}^{lev2}$. |
624 |
% |
625 |
\end{itemize} |
626 |
% |
627 |
This procedure is repeated consecutively for each previous |
628 |
subsection $k_{n-1}^{lev2}, \ldots, k_{1}^{lev2} $ |
629 |
carrying the adjoint computation to the initial time |
630 |
of the subsection $k_{n}^{lev3}$. |
631 |
Then, the procedure is repeated for the previous subsection |
632 |
$k_{n-1}^{lev3}$ |
633 |
carrying the adjoint computation to the initial time |
634 |
$k_{1}^{lev3}$. |
635 |
|
636 |
For the full model trajectory of |
637 |
$ n^{lev3} \cdot n^{lev2} \cdot n^{lev1} $ timesteps |
638 |
the required storing of the model state was significantly reduced to |
639 |
$ n^{lev2} + n^{lev3} $ to disk and roughly $ n^{lev1} $ to memory |
640 |
[i.e. for the 3-day integration with a total oof 72 timesteps |
641 |
the model state was stored 7 times to disk and roughly 6 times |
642 |
to memory]. |
643 |
This saving in memory comes at a cost of a required |
644 |
3 full forward integrations of the model (one for each |
645 |
checkpointing level). |
646 |
The optimal balance of storage vs. recomputation certainly depends |
647 |
on the computing resources available and may be adjusted by |
648 |
adjusting the partitioning among the |
649 |
$ n^{lev3}, \,\, n^{lev2}, \,\, n^{lev1} $. |
650 |
|
651 |
\begin{figure}[t!] |
652 |
\begin{center} |
653 |
%\psdraft |
654 |
%\psfrag{v_k1^lev3}{\mathinfigure{v_{k_{1}^{lev3}}}} |
655 |
%\psfrag{v_kn-1^lev3}{\mathinfigure{v_{k_{n-1}^{lev3}}}} |
656 |
%\psfrag{v_kn^lev3}{\mathinfigure{v_{k_{n}^{lev3}}}} |
657 |
%\psfrag{v_k1^lev2}{\mathinfigure{v_{k_{1}^{lev2}}}} |
658 |
%\psfrag{v_kn-1^lev2}{\mathinfigure{v_{k_{n-1}^{lev2}}}} |
659 |
%\psfrag{v_kn^lev2}{\mathinfigure{v_{k_{n}^{lev2}}}} |
660 |
%\psfrag{v_k1^lev1}{\mathinfigure{v_{k_{1}^{lev1}}}} |
661 |
%\psfrag{v_kn^lev1}{\mathinfigure{v_{k_{n}^{lev1}}}} |
662 |
%\mbox{\epsfig{file=part5/checkpointing.eps, width=0.8\textwidth}} |
663 |
\resizebox{5.5in}{!}{\includegraphics{part5/checkpointing.eps}} |
664 |
%\psfull |
665 |
\end{center} |
666 |
\caption{ |
667 |
Schematic view of intermediate dump and restart for |
668 |
3-level checkpointing.} |
669 |
\label{fig:3levelcheck} |
670 |
\end{figure} |
671 |
|
672 |
% \subsection{Optimal perturbations} |
673 |
% \label{sec_optpert} |
674 |
|
675 |
|
676 |
% \subsection{Error covariance estimate and Hessian matrix} |
677 |
% \label{sec_hessian} |
678 |
|
679 |
\newpage |
680 |
|
681 |
%********************************************************************** |
682 |
\section{TLM and ADM generation in general} |
683 |
\label{sec_ad_setup_gen} |
684 |
\begin{rawhtml} |
685 |
<!-- CMIREDIR:sec_ad_setup_gen: --> |
686 |
\end{rawhtml} |
687 |
%********************************************************************** |
688 |
|
689 |
In this section we describe in a general fashion |
690 |
the parts of the code that are relevant for automatic |
691 |
differentiation using the software tool TAF. |
692 |
|
693 |
\input{part5/doc_ad_the_model} |
694 |
|
695 |
The basic flow is depicted in \ref{fig:adthemodel}. |
696 |
If CPP option {\tt ALLOW\_AUTODIFF\_TAMC} is defined, the driver routine |
697 |
{\it the\_model\_main}, instead of calling {\it the\_main\_loop}, |
698 |
invokes the adjoint of this routine, {\it adthe\_main\_loop}, |
699 |
which is the toplevel routine in terms of automatic differentiation. |
700 |
The routine {\it adthe\_main\_loop} has been generated by TAF. |
701 |
It contains both the forward integration of the full model, the |
702 |
cost function calculation, |
703 |
any additional storing that is required for efficient checkpointing, |
704 |
and the reverse integration of the adjoint model. |
705 |
|
706 |
[DESCRIBE IN A SEPARATE SECTION THE WORKING OF THE TLM] |
707 |
|
708 |
In Fig. \ref{fig:adthemodel} |
709 |
the structure of {\it adthe\_main\_loop} has been strongly |
710 |
simplified to focus on the essentials; in particular, no checkpointing |
711 |
procedures are shown here. |
712 |
Prior to the call of {\it adthe\_main\_loop}, the routine |
713 |
{\it ctrl\_unpack} is invoked to unpack the control vector |
714 |
or initialise the control variables. |
715 |
Following the call of {\it adthe\_main\_loop}, |
716 |
the routine {\it ctrl\_pack} |
717 |
is invoked to pack the control vector |
718 |
(cf. Section \ref{section_ctrl}). |
719 |
If gradient checks are to be performed, the option |
720 |
{\tt ALLOW\_GRADIENT\_CHECK} is defined. In this case |
721 |
the driver routine {\it grdchk\_main} is called after |
722 |
the gradient has been computed via the adjoint |
723 |
(cf. Section \ref{section_grdchk}). |
724 |
|
725 |
%------------------------------------------------------------------ |
726 |
|
727 |
\subsection{General setup |
728 |
\label{section_ad_setup}} |
729 |
|
730 |
In order to configure AD-related setups the following packages need |
731 |
to be enabled: |
732 |
{\it |
733 |
\begin{table}[h!] |
734 |
\begin{tabular}{l} |
735 |
autodiff \\ |
736 |
ctrl \\ |
737 |
cost \\ |
738 |
grdchk \\ |
739 |
\end{tabular} |
740 |
\end{table} |
741 |
} |
742 |
The packages are enabled by adding them to your experiment-specific |
743 |
configuration file |
744 |
{\it packages.conf} (see Section ???). |
745 |
|
746 |
The following AD-specific CPP option files need to be customized: |
747 |
% |
748 |
\begin{itemize} |
749 |
% |
750 |
\item {\it ECCO\_CPPOPTIONS.h} \\ |
751 |
This header file collects CPP options for the packages |
752 |
{\it autodiff, cost, ctrl} as well as AD-unrelated options for |
753 |
the external forcing package {\it exf}. |
754 |
\footnote{NOTE: These options are not set in their package-specific |
755 |
headers such as {\it COST\_CPPOPTIONS.h}, but are instead collected |
756 |
in the single header file {\it ECCO\_CPPOPTIONS.h}. |
757 |
The package-specific header files serve as simple |
758 |
placeholders at this point.} |
759 |
% |
760 |
\item {\it tamc.h} \\ |
761 |
This header configures the splitting of the time stepping loop |
762 |
w.r.t. the 3-level checkpointing (see section ???). |
763 |
|
764 |
% |
765 |
\end{itemize} |
766 |
|
767 |
%------------------------------------------------------------------ |
768 |
|
769 |
\subsection{Building the AD code |
770 |
\label{section_ad_build}} |
771 |
|
772 |
The build process of an AD code is very similar to building |
773 |
the forward model. However, depending on which AD code one wishes |
774 |
to generate, and on which AD tool is available (TAF or TAMC), |
775 |
the following {\tt make} targets are available: |
776 |
|
777 |
\begin{table}[h!] |
778 |
{\footnotesize |
779 |
\begin{tabular}{ccll} |
780 |
~ & {\it AD-target} & {\it output} & {\it description} \\ |
781 |
\hline |
782 |
\hline |
783 |
(1) & {\tt <MODE><TOOL>only} & {\tt <MODE>\_<TOOL>\_output.f} & |
784 |
generates code for $<$MODE$>$ using $<$TOOL$>$ \\ |
785 |
~ & ~ & ~ & no {\tt make} dependencies on {\tt .F .h} \\ |
786 |
~ & ~ & ~ & useful for compiling on remote platforms \\ |
787 |
\hline |
788 |
(2) & {\tt <MODE><TOOL>} & {\tt <MODE>\_<TOOL>\_output.f} & |
789 |
generates code for $<$MODE$>$ using $<$TOOL$>$ \\ |
790 |
~ & ~ & ~ & includes {\tt make} dependencies on {\tt .F .h} \\ |
791 |
~ & ~ & ~ & i.e. input for $<$TOOL$>$ may be re-generated \\ |
792 |
\hline |
793 |
(3) & {\tt <MODE>all} & {\tt mitgcmuv\_<MODE>} & |
794 |
generates code for $<$MODE$>$ using $<$TOOL$>$ \\ |
795 |
~ & ~ & ~ & and compiles all code \\ |
796 |
~ & ~ & ~ & (use of TAF is set as default) \\ |
797 |
\hline |
798 |
\hline |
799 |
\end{tabular} |
800 |
} |
801 |
\end{table} |
802 |
% |
803 |
Here, the following placeholders are used |
804 |
% |
805 |
\begin{itemize} |
806 |
% |
807 |
\item [$<$TOOL$>$] |
808 |
% |
809 |
\begin{itemize} |
810 |
% |
811 |
\item {\tt TAF} |
812 |
\item {\tt TAMC} |
813 |
% |
814 |
\end{itemize} |
815 |
% |
816 |
\item [$<$MODE$>$] |
817 |
% |
818 |
\begin{itemize} |
819 |
% |
820 |
\item {\tt ad} generates the adjoint model (ADM) |
821 |
\item {\tt ftl} generates the tangent linear model (TLM) |
822 |
\item {\tt svd} generates both ADM and TLM for \\ |
823 |
singular value decomposition (SVD) type calculations |
824 |
% |
825 |
\end{itemize} |
826 |
% |
827 |
\end{itemize} |
828 |
|
829 |
For example, to generate the adjoint model using TAF after routines ({\tt .F}) |
830 |
or headers ({\tt .h}) have been modified, but without compilation, |
831 |
type {\tt make adtaf}; |
832 |
or, to generate the tangent linear model using TAMC without |
833 |
re-generating the input code, type {\tt make ftltamconly}. |
834 |
|
835 |
|
836 |
A typical full build process to generate the ADM via TAF would |
837 |
look like follows: |
838 |
\begin{verbatim} |
839 |
% mkdir build |
840 |
% cd build |
841 |
% ../../../tools/genmake2 -mods=../code_ad |
842 |
% make depend |
843 |
% make adall |
844 |
\end{verbatim} |
845 |
|
846 |
%------------------------------------------------------------------ |
847 |
|
848 |
\subsection{The AD build process in detail |
849 |
\label{section_ad_build_detail}} |
850 |
|
851 |
The {\tt make <MODE>all} target consists of the following procedures: |
852 |
|
853 |
\begin{enumerate} |
854 |
% |
855 |
\item |
856 |
A header file {\tt AD\_CONFIG.h} is generated which contains a CPP option |
857 |
on which code ought to be generated. Depending on the {\tt make} target, |
858 |
the contents is |
859 |
\begin{itemize} |
860 |
\item |
861 |
{\tt \#define ALLOW\_ADJOINT\_RUN} |
862 |
\item |
863 |
{\tt \#define ALLOW\_TANGENTLINEAR\_RUN} |
864 |
\item |
865 |
{\tt \#define ALLOW\_ECCO\_OPTIMIZATION} |
866 |
\end{itemize} |
867 |
% |
868 |
\item |
869 |
A single file {\tt <MODE>\_input\_code.f} is concatenated |
870 |
consisting of all {\tt .f} files that are part of the list {\bf AD\_FILES} |
871 |
and all {\tt .flow} files that are part of the list {\bf AD\_FLOW\_FILES}. |
872 |
% |
873 |
\item |
874 |
The AD tool is invoked with the {\bf <MODE>\_<TOOL>\_FLAGS}. |
875 |
The default AD tool flags in {\tt genmake2} can be overrwritten by |
876 |
an {\tt adjoint\_options} file (similar to the platform-specific |
877 |
{\tt build\_options}, see Section ???. |
878 |
The AD tool writes the resulting AD code into the file |
879 |
{\tt <MODE>\_input\_code\_ad.f} |
880 |
% |
881 |
\item |
882 |
A short sed script {\tt adjoint\_sed} is applied to |
883 |
{\tt <MODE>\_input\_code\_ad.f} |
884 |
to reinstate {\bf myThid} into the CALL argument list of active file I/O. |
885 |
The result is written to file {\tt <MODE>\_<TOOL>\_output.f}. |
886 |
% |
887 |
\item |
888 |
All routines are compiled and an executable is generated |
889 |
(see Table ???). |
890 |
% |
891 |
\end{enumerate} |
892 |
|
893 |
\subsubsection{The list AD\_FILES and {\tt .list} files} |
894 |
|
895 |
Not all routines are presented to the AD tool. |
896 |
Routines typically hidden are diagnostics routines which |
897 |
do not influence the cost function, but may create |
898 |
artificial flow dependencies such as I/O of active variables. |
899 |
|
900 |
{\tt genmake2} generates a list (or variable) {\bf AD\_FILES} |
901 |
which contains all routines that are shown to the AD tool. |
902 |
This list is put together from all files with suffix {\tt .list} |
903 |
that {\tt genmake2} finds in its search directories. |
904 |
The list file for the core MITgcm routines is in {\tt model/src/} |
905 |
is called {\tt model\_ad\_diff.list}. |
906 |
Note that no wrapper routine is shown to TAF. These are either |
907 |
not visible at all to the AD code, or hand-written AD code |
908 |
is available (see next section). |
909 |
|
910 |
Each package directory contains its package-specific |
911 |
list file {\tt <PKG>\_ad\_diff.list}. For example, |
912 |
{\tt pkg/ptracers/} contains the file {\tt ptracers\_ad\_diff.list}. |
913 |
Thus, enabling a package will automatically extend the |
914 |
{\bf AD\_FILES} list of {\tt genmake2} to incorporate the |
915 |
package-specific routines. |
916 |
Note that you will need to regenerate the {\tt Makefile} if |
917 |
you enable a package (e.g. by adding it to {\tt packages.conf}) |
918 |
and a {\tt Makefile} already exists. |
919 |
|
920 |
\subsubsection{The list AD\_FLOW\_FILES and {\tt .flow} files} |
921 |
|
922 |
TAMC and TAF can evaluate user-specified directives |
923 |
that start with a specific syntax ({\tt CADJ}, {\tt C\$TAF}, {\tt !\$TAF}). |
924 |
The main categories of directives are STORE directives and |
925 |
FLOW directives. Here, we are concerned with flow directives, |
926 |
store directives are treated elsewhere. |
927 |
|
928 |
Flow directives enable the AD tool to evaluate how it should treat |
929 |
routines that are 'hidden' by the user, i.e. routines which are |
930 |
not contained in the {\bf AD\_FILES} list (see previous section), |
931 |
but which are called in part of the code that the AD tool does see. |
932 |
The flow directive tell the AD tool |
933 |
% |
934 |
\begin{itemize} |
935 |
% |
936 |
\item which subroutine arguments are input/output |
937 |
\item which subroutine arguments are active |
938 |
\item which subroutine arguments are required to compute the cost |
939 |
\item which subroutine arguments are dependent |
940 |
% |
941 |
\end{itemize} |
942 |
% |
943 |
The syntax for the flow directives can be found in the |
944 |
AD tool manuals. |
945 |
|
946 |
{\tt genmake2} generates a list (or variable) {\bf AD\_FLOW\_FILES} |
947 |
which contains all files with suffix{\tt .flow} that it finds |
948 |
in its search directories. |
949 |
The flow directives for the core MITgcm routines of |
950 |
{\tt eesupp/src/} and {\tt model/src/} |
951 |
reside in {\tt pkg/autodiff/}. |
952 |
This directory also contains hand-written adjoint code |
953 |
for the MITgcm WRAPPER (see Section ???). |
954 |
|
955 |
Flow directives for package-specific routines are contained in |
956 |
the corresponding package directories in the file |
957 |
{\tt <PKG>\_ad.flow}, e.g. ptracers-specific directives are in |
958 |
{\tt ptracers\_ad.flow}. |
959 |
|
960 |
\subsubsection{Store directives for 3-level checkpointing} |
961 |
|
962 |
The storing that is required at each period of the |
963 |
3-level checkpointing is controled by three |
964 |
top-level headers. |
965 |
|
966 |
\begin{verbatim} |
967 |
do ilev_3 = 1, nchklev_3 |
968 |
# include ``checkpoint_lev3.h'' |
969 |
do ilev_2 = 1, nchklev_2 |
970 |
# include ``checkpoint_lev2.h'' |
971 |
do ilev_1 = 1, nchklev_1 |
972 |
# include ``checkpoint_lev1.h'' |
973 |
|
974 |
... |
975 |
|
976 |
end do |
977 |
end do |
978 |
end do |
979 |
\end{verbatim} |
980 |
|
981 |
All files {\tt checkpoint\_lev?.h} are contained in directory |
982 |
{\tt pkg/autodiff/}. |
983 |
|
984 |
|
985 |
\subsubsection{Changing the default AD tool flags: ad\_options files} |
986 |
|
987 |
|
988 |
\subsubsection{Hand-written adjoint code} |
989 |
|
990 |
%------------------------------------------------------------------ |
991 |
|
992 |
\subsection{The cost function (dependent variable) |
993 |
\label{section_cost}} |
994 |
|
995 |
The cost function $ {\cal J} $ is referred to as the {\sf dependent variable}. |
996 |
It is a function of the input variables $ \vec{u} $ via the composition |
997 |
$ {\cal J}(\vec{u}) \, = \, {\cal J}(M(\vec{u})) $. |
998 |
The input are referred to as the |
999 |
{\sf independent variables} or {\sf control variables}. |
1000 |
All aspects relevant to the treatment of the cost function $ {\cal J} $ |
1001 |
(parameter setting, initialization, accumulation, |
1002 |
final evaluation), are controlled by the package {\it pkg/cost}. |
1003 |
The aspects relevant to the treatment of the independent variables |
1004 |
are controlled by the package {\it pkg/ctrl} and will be treated |
1005 |
in the next section. |
1006 |
|
1007 |
\input{part5/doc_cost_flow} |
1008 |
|
1009 |
\subsubsection{Enabling the package} |
1010 |
|
1011 |
\fbox{ |
1012 |
\begin{minipage}{12cm} |
1013 |
{\it packages.conf}, {\it ECCO\_CPPOPTIONS.h} |
1014 |
\end{minipage} |
1015 |
} |
1016 |
\begin{itemize} |
1017 |
% |
1018 |
\item |
1019 |
The package is enabled by adding {\it cost} to your file {\it packages.conf} |
1020 |
(see Section ???) |
1021 |
% |
1022 |
\item |
1023 |
|
1024 |
|
1025 |
\end{itemize} |
1026 |
% |
1027 |
|
1028 |
N.B.: In general the following packages ought to be enabled |
1029 |
simultaneously: {\it autodiff, cost, ctrl}. |
1030 |
The basic CPP option to enable the cost function is {\bf ALLOW\_COST}. |
1031 |
Each specific cost function contribution has its own option. |
1032 |
For the present example the option is {\bf ALLOW\_COST\_TRACER}. |
1033 |
All cost-specific options are set in {\it ECCO\_CPPOPTIONS.h} |
1034 |
Since the cost function is usually used in conjunction with |
1035 |
automatic differentiation, the CPP option |
1036 |
{\bf ALLOW\_ADJOINT\_RUN} (file {\it CPP\_OPTIONS.h}) and |
1037 |
{\bf ALLOW\_AUTODIFF\_TAMC} (file {\it ECCO\_CPPOPTIONS.h}) |
1038 |
should be defined. |
1039 |
|
1040 |
\subsubsection{Initialization} |
1041 |
% |
1042 |
The initialization of the {\it cost} package is readily enabled |
1043 |
as soon as the CPP option {\bf ALLOW\_COST} is defined. |
1044 |
% |
1045 |
\begin{itemize} |
1046 |
% |
1047 |
\item |
1048 |
\fbox{ |
1049 |
\begin{minipage}{12cm} |
1050 |
Parameters: {\it cost\_readparms} |
1051 |
\end{minipage} |
1052 |
} |
1053 |
\\ |
1054 |
This S/R |
1055 |
reads runtime flags and parameters from file {\it data.cost}. |
1056 |
For the present example the only relevant parameter read |
1057 |
is {\bf mult\_tracer}. This multiplier enables different |
1058 |
cost function contributions to be switched on |
1059 |
( = 1.) or off ( = 0.) at runtime. |
1060 |
For more complex cost functions which involve model vs. data |
1061 |
misfits, the corresponding data filenames and data |
1062 |
specifications (start date and time, period, ...) are read |
1063 |
in this S/R. |
1064 |
% |
1065 |
\item |
1066 |
\fbox{ |
1067 |
\begin{minipage}{12cm} |
1068 |
Variables: {\it cost\_init} |
1069 |
\end{minipage} |
1070 |
} |
1071 |
\\ |
1072 |
This S/R |
1073 |
initializes the different cost function contributions. |
1074 |
The contribution for the present example is {\bf objf\_tracer} |
1075 |
which is defined on each tile (bi,bj). |
1076 |
% |
1077 |
\end{itemize} |
1078 |
% |
1079 |
\subsubsection{Accumulation} |
1080 |
% |
1081 |
\begin{itemize} |
1082 |
% |
1083 |
\item |
1084 |
\fbox{ |
1085 |
\begin{minipage}{12cm} |
1086 |
{\it cost\_tile}, {\it cost\_tracer} |
1087 |
\end{minipage} |
1088 |
} |
1089 |
\end{itemize} |
1090 |
% |
1091 |
The 'driver' routine |
1092 |
{\it cost\_tile} is called at the end of each time step. |
1093 |
Within this 'driver' routine, S/R are called for each of |
1094 |
the chosen cost function contributions. |
1095 |
In the present example ({\bf ALLOW\_COST\_TRACER}), |
1096 |
S/R {\it cost\_tracer} is called. |
1097 |
It accumulates {\bf objf\_tracer} according to eqn. (\ref{???}). |
1098 |
% |
1099 |
\subsubsection{Finalize all contributions} |
1100 |
% |
1101 |
\begin{itemize} |
1102 |
% |
1103 |
\item |
1104 |
\fbox{ |
1105 |
\begin{minipage}{12cm} |
1106 |
{\it cost\_final} |
1107 |
\end{minipage} |
1108 |
} |
1109 |
\end{itemize} |
1110 |
% |
1111 |
At the end of the forward integration S/R {\it cost\_final} |
1112 |
is called. It accumulates the total cost function {\bf fc} |
1113 |
from each contribution and sums over all tiles: |
1114 |
\begin{equation} |
1115 |
{\cal J} \, = \, |
1116 |
{\rm fc} \, = \, |
1117 |
{\rm mult\_tracer} \sum_{\text{global sum}} \sum_{bi,\,bj}^{nSx,\,nSy} |
1118 |
{\rm objf\_tracer}(bi,bj) \, + \, ... |
1119 |
\end{equation} |
1120 |
% |
1121 |
The total cost function {\bf fc} will be the |
1122 |
'dependent' variable in the argument list for TAMC, i.e. |
1123 |
\begin{verbatim} |
1124 |
tamc -output 'fc' ... |
1125 |
\end{verbatim} |
1126 |
|
1127 |
%%%% \end{document} |
1128 |
|
1129 |
\input{part5/doc_ad_the_main} |
1130 |
|
1131 |
\subsection{The control variables (independent variables) |
1132 |
\label{section_ctrl}} |
1133 |
|
1134 |
The control variables are a subset of the model input |
1135 |
(initial conditions, boundary conditions, model parameters). |
1136 |
Here we identify them with the variable $ \vec{u} $. |
1137 |
All intermediate variables whose derivative w.r.t. control |
1138 |
variables do not vanish are called {\sf active variables}. |
1139 |
All subroutines whose derivative w.r.t. the control variables |
1140 |
don't vanish are called {\sf active routines}. |
1141 |
Read and write operations from and to file can be viewed |
1142 |
as variable assignments. Therefore, files to which |
1143 |
active variables are written and from which active variables |
1144 |
are read are called {\sf active files}. |
1145 |
All aspects relevant to the treatment of the control variables |
1146 |
(parameter setting, initialization, perturbation) |
1147 |
are controlled by the package {\it pkg/ctrl}. |
1148 |
|
1149 |
\input{part5/doc_ctrl_flow} |
1150 |
|
1151 |
\subsubsection{genmake and CPP options} |
1152 |
% |
1153 |
\begin{itemize} |
1154 |
% |
1155 |
\item |
1156 |
\fbox{ |
1157 |
\begin{minipage}{12cm} |
1158 |
{\it genmake}, {\it CPP\_OPTIONS.h}, {\it ECCO\_CPPOPTIONS.h} |
1159 |
\end{minipage} |
1160 |
} |
1161 |
\end{itemize} |
1162 |
% |
1163 |
To enable the directory to be included to the compile list, |
1164 |
{\bf ctrl} has to be added to the {\bf enable} list in |
1165 |
{\it .genmakerc} or in {\it genmake} itself (analogous to {\it cost} |
1166 |
package, cf. previous section). |
1167 |
Each control variable is enabled via its own CPP option |
1168 |
in {\it ECCO\_CPPOPTIONS.h}. |
1169 |
|
1170 |
\subsubsection{Initialization} |
1171 |
% |
1172 |
\begin{itemize} |
1173 |
% |
1174 |
\item |
1175 |
\fbox{ |
1176 |
\begin{minipage}{12cm} |
1177 |
Parameters: {\it ctrl\_readparms} |
1178 |
\end{minipage} |
1179 |
} |
1180 |
\\ |
1181 |
% |
1182 |
This S/R |
1183 |
reads runtime flags and parameters from file {\it data.ctrl}. |
1184 |
For the present example the file contains the file names |
1185 |
of each control variable that is used. |
1186 |
In addition, the number of wet points for each control |
1187 |
variable and the net dimension of the space of control |
1188 |
variables (counting wet points only) {\bf nvarlength} |
1189 |
is determined. |
1190 |
Masks for wet points for each tile {\bf (bi,\,bj)} |
1191 |
and vertical layer {\bf k} are generated for the three |
1192 |
relevant categories on the C-grid: |
1193 |
{\bf nWetCtile} for tracer fields, |
1194 |
{\bf nWetWtile} for zonal velocity fields, |
1195 |
{\bf nWetStile} for meridional velocity fields. |
1196 |
% |
1197 |
\item |
1198 |
\fbox{ |
1199 |
\begin{minipage}{12cm} |
1200 |
Control variables, control vector, |
1201 |
and their gradients: {\it ctrl\_unpack} |
1202 |
\end{minipage} |
1203 |
} |
1204 |
\\ |
1205 |
% |
1206 |
Two important issues related to the handling of the control |
1207 |
variables in the MITGCM need to be addressed. |
1208 |
First, in order to save memory, the control variable arrays |
1209 |
are not kept in memory, but rather read from file and added |
1210 |
to the initial fields during the model initialization phase. |
1211 |
Similarly, the corresponding adjoint fields which represent |
1212 |
the gradient of the cost function w.r.t. the control variables |
1213 |
are written to file at the end of the adjoint integration. |
1214 |
Second, in addition to the files holding the 2-dim. and 3-dim. |
1215 |
control variables and the corresponding cost gradients, |
1216 |
a 1-dim. {\sf control vector} |
1217 |
and {\sf gradient vector} are written to file. They contain |
1218 |
only the wet points of the control variables and the corresponding |
1219 |
gradient. |
1220 |
This leads to a significant data compression. |
1221 |
Furthermore, an option is available |
1222 |
({\tt ALLOW\_NONDIMENSIONAL\_CONTROL\_IO}) to |
1223 |
non-dimensionalise the control and gradient vector, |
1224 |
which otherwise would contain different pieces of different |
1225 |
magnitudes and units. |
1226 |
Finally, the control and gradient vector can be passed to a |
1227 |
minimization routine if an update of the control variables |
1228 |
is sought as part of a minimization exercise. |
1229 |
|
1230 |
The files holding fields and vectors of the control variables |
1231 |
and gradient are generated and initialised in S/R {\it ctrl\_unpack}. |
1232 |
% |
1233 |
\end{itemize} |
1234 |
|
1235 |
\subsubsection{Perturbation of the independent variables} |
1236 |
% |
1237 |
The dependency flow for differentiation w.r.t. the controls |
1238 |
starts with adding a perturbation onto the input variable, |
1239 |
thus defining the independent or control variables for TAMC. |
1240 |
Three types of controls may be considered: |
1241 |
% |
1242 |
\begin{itemize} |
1243 |
% |
1244 |
\item |
1245 |
\fbox{ |
1246 |
\begin{minipage}{12cm} |
1247 |
{\it ctrl\_map\_ini} (initial value sensitivity): |
1248 |
\end{minipage} |
1249 |
} |
1250 |
\\ |
1251 |
% |
1252 |
Consider as an example the initial tracer distribution |
1253 |
{\bf tr1} as control variable. |
1254 |
After {\bf tr1} has been initialised in |
1255 |
{\it ini\_tr1} (dynamical variables such as |
1256 |
temperature and salinity are initialised in {\it ini\_fields}), |
1257 |
a perturbation anomaly is added to the field in S/R |
1258 |
{\it ctrl\_map\_ini} |
1259 |
% |
1260 |
\begin{equation} |
1261 |
\begin{split} |
1262 |
u & = \, u_{[0]} \, + \, \Delta u \\ |
1263 |
{\bf tr1}(...) & = \, {\bf tr1_{ini}}(...) \, + \, {\bf xx\_tr1}(...) |
1264 |
\label{perturb} |
1265 |
\end{split} |
1266 |
\end{equation} |
1267 |
% |
1268 |
{\bf xx\_tr1} is a 3-dim. global array |
1269 |
holding the perturbation. In the case of a simple |
1270 |
sensitivity study this array is identical to zero. |
1271 |
However, it's specification is essential in the context |
1272 |
of automatic differentiation since TAMC |
1273 |
treats the corresponding line in the code symbolically |
1274 |
when determining the differentiation chain and its origin. |
1275 |
Thus, the variable names are part of the argument list |
1276 |
when calling TAMC: |
1277 |
% |
1278 |
\begin{verbatim} |
1279 |
tamc -input 'xx_tr1 ...' ... |
1280 |
\end{verbatim} |
1281 |
% |
1282 |
Now, as mentioned above, the MITGCM avoids maintaining |
1283 |
an array for each control variable by reading the |
1284 |
perturbation to a temporary array from file. |
1285 |
To ensure the symbolic link to be recognized by TAMC, a scalar |
1286 |
dummy variable {\bf xx\_tr1\_dummy} is introduced |
1287 |
and an 'active read' routine of the adjoint support |
1288 |
package {\it pkg/autodiff} is invoked. |
1289 |
The read-procedure is tagged with the variable |
1290 |
{\bf xx\_tr1\_dummy} enabling TAMC to recognize the |
1291 |
initialization of the perturbation. |
1292 |
The modified call of TAMC thus reads |
1293 |
% |
1294 |
\begin{verbatim} |
1295 |
tamc -input 'xx_tr1_dummy ...' ... |
1296 |
\end{verbatim} |
1297 |
% |
1298 |
and the modified operation to (\ref{perturb}) |
1299 |
in the code takes on the form |
1300 |
% |
1301 |
\begin{verbatim} |
1302 |
call active_read_xyz( |
1303 |
& ..., tmpfld3d, ..., xx_tr1_dummy, ... ) |
1304 |
|
1305 |
tr1(...) = tr1(...) + tmpfld3d(...) |
1306 |
\end{verbatim} |
1307 |
% |
1308 |
Note, that reading an active variable corresponds |
1309 |
to a variable assignment. Its derivative corresponds |
1310 |
to a write statement of the adjoint variable, followed by |
1311 |
a reset. |
1312 |
The 'active file' routines have been designed |
1313 |
to support active read and corresponding adjoint active write |
1314 |
operations (and vice versa). |
1315 |
% |
1316 |
\item |
1317 |
\fbox{ |
1318 |
\begin{minipage}{12cm} |
1319 |
{\it ctrl\_map\_forcing} (boundary value sensitivity): |
1320 |
\end{minipage} |
1321 |
} |
1322 |
\\ |
1323 |
% |
1324 |
The handling of boundary values as control variables |
1325 |
proceeds exactly analogous to the initial values |
1326 |
with the symbolic perturbation taking place in S/R |
1327 |
{\it ctrl\_map\_forcing}. |
1328 |
Note however an important difference: |
1329 |
Since the boundary values are time dependent with a new |
1330 |
forcing field applied at each time steps, |
1331 |
the general problem may be thought of as |
1332 |
a new control variable at each time step |
1333 |
(or, if the perturbation is averaged over a certain period, |
1334 |
at each $ N $ timesteps), i.e. |
1335 |
\[ |
1336 |
u_{\rm forcing} \, = \, |
1337 |
\{ \, u_{\rm forcing} ( t_n ) \, \}_{ |
1338 |
n \, = \, 1, \ldots , {\rm nTimeSteps} } |
1339 |
\] |
1340 |
% |
1341 |
In the current example an equilibrium state is considered, |
1342 |
and only an initial perturbation to |
1343 |
surface forcing is applied with respect to the |
1344 |
equilibrium state. |
1345 |
A time dependent treatment of the surface forcing is |
1346 |
implemented in the ECCO environment, involving the |
1347 |
calendar ({\it cal}~) and external forcing ({\it exf}~) packages. |
1348 |
% |
1349 |
\item |
1350 |
\fbox{ |
1351 |
\begin{minipage}{12cm} |
1352 |
{\it ctrl\_map\_params} (parameter sensitivity): |
1353 |
\end{minipage} |
1354 |
} |
1355 |
\\ |
1356 |
% |
1357 |
This routine is not yet implemented, but would proceed |
1358 |
proceed along the same lines as the initial value sensitivity. |
1359 |
The mixing parameters {\bf diffkr} and {\bf kapgm} |
1360 |
are currently added as controls in {\it ctrl\_map\_ini.F}. |
1361 |
% |
1362 |
\end{itemize} |
1363 |
% |
1364 |
|
1365 |
\subsubsection{Output of adjoint variables and gradient} |
1366 |
% |
1367 |
Several ways exist to generate output of adjoint fields. |
1368 |
% |
1369 |
\begin{itemize} |
1370 |
% |
1371 |
\item |
1372 |
\fbox{ |
1373 |
\begin{minipage}{12cm} |
1374 |
{\it ctrl\_map\_ini, ctrl\_map\_forcing}: |
1375 |
\end{minipage} |
1376 |
} |
1377 |
\\ |
1378 |
\begin{itemize} |
1379 |
% |
1380 |
\item {\bf xx\_...}: the control variable fields \\ |
1381 |
Before the forward integration, the control |
1382 |
variables are read from file {\bf xx\_ ...} and added to |
1383 |
the model field. |
1384 |
% |
1385 |
\item {\bf adxx\_...}: the adjoint variable fields, i.e. the gradient |
1386 |
$ \nabla _{u}{\cal J} $ for each control variable \\ |
1387 |
After the adjoint integration the corresponding adjoint |
1388 |
variables are written to {\bf adxx\_ ...}. |
1389 |
% |
1390 |
\end{itemize} |
1391 |
% |
1392 |
\item |
1393 |
\fbox{ |
1394 |
\begin{minipage}{12cm} |
1395 |
{\it ctrl\_unpack, ctrl\_pack}: |
1396 |
\end{minipage} |
1397 |
} |
1398 |
\\ |
1399 |
% |
1400 |
\begin{itemize} |
1401 |
% |
1402 |
\item {\bf vector\_ctrl}: the control vector \\ |
1403 |
At the very beginning of the model initialization, |
1404 |
the updated compressed control vector is read (or initialised) |
1405 |
and distributed to 2-dim. and 3-dim. control variable fields. |
1406 |
% |
1407 |
\item {\bf vector\_grad}: the gradient vector \\ |
1408 |
At the very end of the adjoint integration, |
1409 |
the 2-dim. and 3-dim. adjoint variables are read, |
1410 |
compressed to a single vector and written to file. |
1411 |
% |
1412 |
\end{itemize} |
1413 |
% |
1414 |
\item |
1415 |
\fbox{ |
1416 |
\begin{minipage}{12cm} |
1417 |
{\it addummy\_in\_stepping}: |
1418 |
\end{minipage} |
1419 |
} |
1420 |
\\ |
1421 |
In addition to writing the gradient at the end of the |
1422 |
forward/adjoint integration, many more adjoint variables |
1423 |
of the model state |
1424 |
at intermediate times can be written using S/R |
1425 |
{\it addummy\_in\_stepping}. |
1426 |
This routine is part of the adjoint support package |
1427 |
{\it pkg/autodiff} (cf.f. below). |
1428 |
The procedure is enabled using via the CPP-option |
1429 |
{\bf ALLOW\_AUTODIFF\_MONITOR} (file {\it ECCO\_CPPOPTIONS.h}). |
1430 |
To be part of the adjoint code, the corresponding S/R |
1431 |
{\it dummy\_in\_stepping} has to be called in the forward |
1432 |
model (S/R {\it the\_main\_loop}) at the appropriate place. |
1433 |
The adjoint common blocks are extracted from the adjoint code |
1434 |
via the header file {\it adcommon.h}. |
1435 |
|
1436 |
{\it dummy\_in\_stepping} is essentially empty, |
1437 |
the corresponding adjoint routine is hand-written rather |
1438 |
than generated automatically. |
1439 |
Appropriate flow directives ({\it dummy\_in\_stepping.flow}) |
1440 |
ensure that TAMC does not automatically |
1441 |
generate {\it addummy\_in\_stepping} by trying to differentiate |
1442 |
{\it dummy\_in\_stepping}, but instead refers to |
1443 |
the hand-written routine. |
1444 |
|
1445 |
{\it dummy\_in\_stepping} is called in the forward code |
1446 |
at the beginning of each |
1447 |
timestep, before the call to {\it dynamics}, thus ensuring |
1448 |
that {\it addummy\_in\_stepping} is called at the end of |
1449 |
each timestep in the adjoint calculation, after the call to |
1450 |
{\it addynamics}. |
1451 |
|
1452 |
{\it addummy\_in\_stepping} includes the header files |
1453 |
{\it adcommon.h}. |
1454 |
This header file is also hand-written. It contains |
1455 |
the common blocks |
1456 |
{\bf /addynvars\_r/}, {\bf /addynvars\_cd/}, |
1457 |
{\bf /addynvars\_diffkr/}, {\bf /addynvars\_kapgm/}, |
1458 |
{\bf /adtr1\_r/}, {\bf /adffields/}, |
1459 |
which have been extracted from the adjoint code to enable |
1460 |
access to the adjoint variables. |
1461 |
|
1462 |
{\bf WARNING:} If the structure of the common blocks |
1463 |
{\bf /dynvars\_r/}, {\bf /dynvars\_cd/}, etc., changes |
1464 |
similar changes will occur in the adjoint common blocks. |
1465 |
Therefore, consistency between the TAMC-generated common blocks |
1466 |
and those in {\it adcommon.h} have to be checked. |
1467 |
% |
1468 |
\end{itemize} |
1469 |
|
1470 |
|
1471 |
\subsubsection{Control variable handling for |
1472 |
optimization applications} |
1473 |
|
1474 |
In optimization mode the cost function $ {\cal J}(u) $ is sought |
1475 |
to be minimized with respect to a set of control variables |
1476 |
$ \delta {\cal J} \, = \, 0 $, in an iterative manner. |
1477 |
The gradient $ \nabla _{u}{\cal J} |_{u_{[k]}} $ together |
1478 |
with the value of the cost function itself $ {\cal J}(u_{[k]}) $ |
1479 |
at iteration step $ k $ serve |
1480 |
as input to a minimization routine (e.g. quasi-Newton method, |
1481 |
conjugate gradient, ... \cite{gil-lem:89}) |
1482 |
to compute an update in the |
1483 |
control variable for iteration step $k+1$ |
1484 |
\[ |
1485 |
u_{[k+1]} \, = \, u_{[0]} \, + \, \Delta u_{[k+1]} |
1486 |
\quad \mbox{satisfying} \quad |
1487 |
{\cal J} \left( u_{[k+1]} \right) \, < \, {\cal J} \left( u_{[k]} \right) |
1488 |
\] |
1489 |
$ u_{[k+1]} $ then serves as input for a forward/adjoint run |
1490 |
to determine $ {\cal J} $ and $ \nabla _{u}{\cal J} $ at iteration step |
1491 |
$ k+1 $. |
1492 |
Tab. \ref{???} sketches the flow between forward/adjoint model |
1493 |
and the minimization routine. |
1494 |
|
1495 |
\begin{eqnarray*} |
1496 |
\scriptsize |
1497 |
\begin{array}{ccccc} |
1498 |
u_{[0]} \,\, , \,\, \Delta u_{[k]} & ~ & ~ & ~ & ~ \\ |
1499 |
{\Big\downarrow} |
1500 |
& ~ & ~ & ~ & ~ \\ |
1501 |
~ & ~ & ~ & ~ & ~ \\ |
1502 |
\hline |
1503 |
\multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\ |
1504 |
\multicolumn{1}{|c}{ |
1505 |
u_{[k]} = u_{[0]} + \Delta u_{[k]}} & |
1506 |
\stackrel{\bf forward}{\bf \longrightarrow} & |
1507 |
v_{[k]} = M \left( u_{[k]} \right) & |
1508 |
\stackrel{\bf forward}{\bf \longrightarrow} & |
1509 |
\multicolumn{1}{c|}{ |
1510 |
{\cal J}_{[k]} = {\cal J} \left( M \left( u_{[k]} \right) \right)} \\ |
1511 |
\multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\ |
1512 |
\hline |
1513 |
\multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\ |
1514 |
\multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{{\Big\downarrow}} \\ |
1515 |
\multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\ |
1516 |
\hline |
1517 |
\multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\ |
1518 |
\multicolumn{1}{|c}{ |
1519 |
\nabla_u {\cal J}_{[k]} (\delta {\cal J}) = |
1520 |
T^{\ast} \cdot \nabla_v {\cal J} |_{v_{[k]}} (\delta {\cal J})} & |
1521 |
\stackrel{\bf adjoint}{\mathbf \longleftarrow} & |
1522 |
ad \, v_{[k]} (\delta {\cal J}) = |
1523 |
\nabla_v {\cal J} |_{v_{[k]}} (\delta {\cal J}) & |
1524 |
\stackrel{\bf adjoint}{\mathbf \longleftarrow} & |
1525 |
\multicolumn{1}{c|}{ ad \, {\cal J} = \delta {\cal J}} \\ |
1526 |
\multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\ |
1527 |
\hline |
1528 |
~ & ~ & ~ & ~ & ~ \\ |
1529 |
\hspace*{15ex}{\Bigg\downarrow} |
1530 |
\quad {\cal J}_{[k]}, \quad \nabla_u {\cal J}_{[k]} |
1531 |
& ~ & ~ & ~ & ~ \\ |
1532 |
~ & ~ & ~ & ~ & ~ \\ |
1533 |
\hline |
1534 |
\multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\ |
1535 |
\multicolumn{1}{|c}{ |
1536 |
{\cal J}_{[k]} \,\, , \,\, \nabla_u {\cal J}_{[k]}} & |
1537 |
{\mathbf \longrightarrow} & \text{\bf minimisation} & |
1538 |
{\mathbf \longrightarrow} & |
1539 |
\multicolumn{1}{c|}{ \Delta u_{[k+1]}} \\ |
1540 |
\multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\ |
1541 |
\hline |
1542 |
~ & ~ & ~ & ~ & ~ \\ |
1543 |
~ & ~ & ~ & ~ & \Big\downarrow \\ |
1544 |
~ & ~ & ~ & ~ & \Delta u_{[k+1]} \\ |
1545 |
\end{array} |
1546 |
\end{eqnarray*} |
1547 |
|
1548 |
The routines {\it ctrl\_unpack} and {\it ctrl\_pack} provide |
1549 |
the link between the model and the minimization routine. |
1550 |
As described in Section \ref{???} |
1551 |
the {\it unpack} and {\it pack} routines read and write |
1552 |
control and gradient {\it vectors} which are compressed |
1553 |
to contain only wet points, in addition to the full |
1554 |
2-dim. and 3-dim. fields. |
1555 |
The corresponding I/O flow looks as follows: |
1556 |
|
1557 |
\vspace*{0.5cm} |
1558 |
|
1559 |
{\scriptsize |
1560 |
\begin{tabular}{ccccc} |
1561 |
{\bf vector\_ctrl\_$<$k$>$ } & ~ & ~ & ~ & ~ \\ |
1562 |
{\big\downarrow} & ~ & ~ & ~ & ~ \\ |
1563 |
\cline{1-1} |
1564 |
\multicolumn{1}{|c|}{\it ctrl\_unpack} & ~ & ~ & ~ & ~ \\ |
1565 |
\cline{1-1} |
1566 |
{\big\downarrow} & ~ & ~ & ~ & ~ \\ |
1567 |
\cline{3-3} |
1568 |
\multicolumn{1}{l}{\bf xx\_theta0...$<$k$>$} & ~ & |
1569 |
\multicolumn{1}{|c|}{~} & ~ & ~ \\ |
1570 |
\multicolumn{1}{l}{\bf xx\_salt0...$<$k$>$} & |
1571 |
$\stackrel{\mbox{read}}{\longrightarrow}$ & |
1572 |
\multicolumn{1}{|c|}{forward integration} & ~ & ~ \\ |
1573 |
\multicolumn{1}{l}{\bf \vdots} & ~ & \multicolumn{1}{|c|}{~} |
1574 |
& ~ & ~ \\ |
1575 |
\cline{3-3} |
1576 |
~ & ~ & $\downarrow$ & ~ & ~ \\ |
1577 |
\cline{3-3} |
1578 |
~ & ~ & |
1579 |
\multicolumn{1}{|c|}{~} & ~ & |
1580 |
\multicolumn{1}{l}{\bf adxx\_theta0...$<$k$>$} \\ |
1581 |
~ & ~ & \multicolumn{1}{|c|}{adjoint integration} & |
1582 |
$\stackrel{\mbox{write}}{\longrightarrow}$ & |
1583 |
\multicolumn{1}{l}{\bf adxx\_salt0...$<$k$>$} \\ |
1584 |
~ & ~ & \multicolumn{1}{|c|}{~} |
1585 |
& ~ & \multicolumn{1}{l}{\bf \vdots} \\ |
1586 |
\cline{3-3} |
1587 |
~ & ~ & ~ & ~ & {\big\downarrow} \\ |
1588 |
\cline{5-5} |
1589 |
~ & ~ & ~ & ~ & \multicolumn{1}{|c|}{\it ctrl\_pack} \\ |
1590 |
\cline{5-5} |
1591 |
~ & ~ & ~ & ~ & {\big\downarrow} \\ |
1592 |
~ & ~ & ~ & ~ & {\bf vector\_grad\_$<$k$>$ } \\ |
1593 |
\end{tabular} |
1594 |
} |
1595 |
|
1596 |
\vspace*{0.5cm} |
1597 |
|
1598 |
|
1599 |
{\it ctrl\_unpack} reads the updated control vector |
1600 |
{\bf vector\_ctrl\_$<$k$>$}. |
1601 |
It distributes the different control variables to |
1602 |
2-dim. and 3-dim. files {\it xx\_...$<$k$>$}. |
1603 |
At the start of the forward integration the control variables |
1604 |
are read from {\it xx\_...$<$k$>$} and added to the |
1605 |
field. |
1606 |
Correspondingly, at the end of the adjoint integration |
1607 |
the adjoint fields are written |
1608 |
to {\it adxx\_...$<$k$>$}, again via the active file routines. |
1609 |
Finally, {\it ctrl\_pack} collects all adjoint files |
1610 |
and writes them to the compressed vector file |
1611 |
{\bf vector\_grad\_$<$k$>$}. |