% $Header: /u/gcmpack/mitgcmdoc/part5/doc_ad_2.tex,v 1.14 2002/02/28 19:32:20 cnh Exp $
% $Name: $
% Revision 1.15 (Wed Apr 24 11:01:46 2002 UTC) by heimbach

{\sf Automatic differentiation} (AD), also referred to as algorithmic
(or, more loosely, computational) differentiation, involves
automatically deriving code to calculate
partial derivatives from an existing fully non-linear prognostic code
(see \cite{gri:00}).
A software tool is used that parses and transforms source files
according to a set of linguistic and mathematical rules.
AD tools are like source-to-source translators in that
they parse a program code as input and produce a new program code
as output.
However, unlike a pure source-to-source translation, the output program
represents a new algorithm, such as the evaluation of the
Jacobian, the Hessian, or higher derivative operators.
In principle, a variety of derived algorithms
can be generated automatically in this way.

The MITGCM has been adapted for use with the
Tangent linear and Adjoint Model Compiler (TAMC) and its successor TAF
(Transformation of Algorithms in Fortran), developed
by Ralf Giering (\cite{gie-kam:98}, \cite{gie:99,gie:00}).
The first application of the adjoint of the MITGCM for sensitivity
studies was published by \cite{maro-eta:99}.
\cite{sta-eta:97,sta-eta:01} use the MITGCM and its adjoint
for ocean state estimation studies.
In the following we shall refer to TAMC and TAF synonymously,
except where explicitly stated otherwise.

TAMC exploits the chain rule for computing the first
derivative of a function with
respect to a set of input variables.
Treating a given forward code as a composition of operations --
each line representing a compositional element -- the chain rule is
rigorously applied to the code, line by line. The resulting
tangent linear or adjoint code
may then be thought of as the composition, in
forward or reverse order, respectively, of the
Jacobian matrices of the forward code's compositional elements.

%**********************************************************************
\section{Some basic algebra}
\label{sec_ad_algebra}
%**********************************************************************

Let $ \cal{M} $ be a general nonlinear model, i.e. a
mapping from the $m$-dimensional space
$U \subset I\!\!R^m$ of input variables
$\vec{u}=(u_1,\ldots,u_m)$
(model parameters, initial conditions, boundary conditions
such as forcing functions) to the $n$-dimensional space
$V \subset I\!\!R^n$ of
model output variables $\vec{v}=(v_1,\ldots,v_n)$
(model state, model diagnostics, objective function, ...)
under consideration,
%
\begin{equation}
\begin{split}
{\cal M} \, : & \, U \,\, \longrightarrow \, V \\
~ & \, \vec{u} \,\, \longmapsto \, \vec{v} \, = \,
{\cal M}(\vec{u})
\label{fulloperator}
\end{split}
\end{equation}
%
The vectors $ \vec{u} \in U $ and $ \vec{v} \in V $ may be represented w.r.t.
some given basis vectors
$ {\rm span} (U) = \{ {\vec{e}_i} \}_{i = 1, \ldots , m} $ and
$ {\rm span} (V) = \{ {\vec{f}_j} \}_{j = 1, \ldots , n} $ as
\[
\vec{u} \, = \, \sum_{i=1}^{m} u_i \, {\vec{e}_i},
\qquad
\vec{v} \, = \, \sum_{j=1}^{n} v_j \, {\vec{f}_j}
\]

Two routes may be followed to determine the sensitivity of the
output variable $\vec{v}$ to its input $\vec{u}$.

\subsection{Forward or direct sensitivity}
%
Consider a perturbation to the input variables $\delta \vec{u}$
(typically a single component
$\delta \vec{u} = \delta u_{i} \, {\vec{e}_{i}}$).
Its effect on the output may be obtained via the linear
approximation of the model $ {\cal M}$ in terms of its Jacobian matrix
$ M $, evaluated at the point $\vec{u}^{(0)}$, according to
%
\begin{equation}
\delta \vec{v} \, = \, M |_{\vec{u}^{(0)}} \, \delta \vec{u}
\label{tangent_linear}
\end{equation}
with resulting output perturbation $\delta \vec{v}$.
In components
$M_{j i} \, = \, \partial {\cal M}_{j} / \partial u_{i} $,
it reads
%
\begin{equation}
\delta v_{j} \, = \, \sum_{i}
\left. \frac{\partial {\cal M}_{j}}{\partial u_{i}} \right|_{u^{(0)}} \,
\delta u_{i}
\label{jacobi_matrix}
\end{equation}
%
Eq. (\ref{tangent_linear}) is the {\sf tangent linear model (TLM)}.
In contrast to the full nonlinear model $ {\cal M} $, the operator
$ M $ is just a matrix
which can readily be used to find the forward sensitivity of $\vec{v}$ to
perturbations in $\vec{u}$.
However, if there are very many input variables ($\gg O(10^{6})$ for
large-scale oceanographic applications), it quickly becomes
prohibitive to proceed directly as in (\ref{tangent_linear})
if the impact of each component $ {\vec{e}_{i}} $ is to be assessed.

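As an illustration of eq. (\ref{tangent_linear}), consider a small hand-written tangent linear model. This is a Python sketch with an invented toy model (the MITgcm's TLM is generated by TAMC, not written this way); each TLM run with a unit perturbation $\vec{e}_i$ recovers one column of $M$, so the full Jacobian costs $m$ runs:

```python
import math

def model(u):
    # toy nonlinear model M: R^3 -> R^2 (purely illustrative)
    u1, u2, u3 = u
    return [u1 * u2 + math.sin(u3), u2 * u3]

def tlm(u, du):
    # tangent linear model: delta_v = M|_u . delta_u, derived line by line
    u1, u2, u3 = u
    du1, du2, du3 = du
    return [u2 * du1 + u1 * du2 + math.cos(u3) * du3,
            u3 * du2 + u2 * du3]

u = [0.5, 1.5, 0.2]
# one TLM run per unit vector e_i gives one column of the Jacobian;
# assessing the impact of every input component needs m = 3 runs
jac_cols = [tlm(u, [1.0 if j == i else 0.0 for j in range(3)])
            for i in range(3)]

# check the first column against a finite difference
eps = 1.0e-7
fd0 = [(a - b) / eps for a, b in zip(model([0.5 + eps, 1.5, 0.2]), model(u))]
assert all(abs(a - b) < 1.0e-5 for a, b in zip(jac_cols[0], fd0))
```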
\subsection{Reverse or adjoint sensitivity}
%
Let us consider the special case of a
scalar objective function ${\cal J}(\vec{v})$ of the model output (e.g.
the total meridional heat transport,
the total uptake of $CO_{2}$ in the Southern
Ocean over a time interval,
or a measure of some model-to-data misfit)
%
\begin{eqnarray}
\begin{array}{cccccc}
{\cal J} \, : & U &
\longrightarrow & V &
\longrightarrow & I \!\! R \\
~ & \vec{u} & \longmapsto & \vec{v}={\cal M}(\vec{u}) &
\longmapsto & {\cal J}(\vec{u}) = {\cal J}({\cal M}(\vec{u}))
\end{array}
\label{compo}
\end{eqnarray}
%
The perturbation of $ {\cal J} $ around a fixed point $ {\cal J}_0 $,
\[
{\cal J} \, = \, {\cal J}_0 \, + \, \delta {\cal J}
\]
can be expressed in both bases of $ \vec{u} $ and $ \vec{v} $
w.r.t. their corresponding inner product
$\left\langle \,\, , \,\, \right\rangle $
%
\begin{equation}
\begin{split}
{\cal J} & = \,
{\cal J} |_{\vec{u}^{(0)}} \, + \,
\left\langle \, \nabla _{u}{\cal J}^T |_{\vec{u}^{(0)}} \, , \, \delta \vec{u} \, \right\rangle
\, + \, O(\delta \vec{u}^2) \\
~ & = \,
{\cal J} |_{\vec{v}^{(0)}} \, + \,
\left\langle \, \nabla _{v}{\cal J}^T |_{\vec{v}^{(0)}} \, , \, \delta \vec{v} \, \right\rangle
\, + \, O(\delta \vec{v}^2)
\end{split}
\label{deljidentity}
\end{equation}
%
(note that the gradient $ \nabla f $ is a co-vector; therefore
its transpose is required in the above inner product).
Then, using the representation
$ \delta {\cal J} =
\left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle $
together with the definition
of the adjoint operator $ A^{\ast} $ of a given operator $ A $,
\[
\left\langle \, A^{\ast} \vec{x} \, , \, \vec{y} \, \right\rangle =
\left\langle \, \vec{x} \, , \, A \vec{y} \, \right\rangle
\]
which for finite-dimensional vector spaces is just the
transpose of $ A $,
\[
A^{\ast} \, = \, A^T
\]
and eqs. (\ref{tangent_linear}) and (\ref{deljidentity}),
we note
(omitting $|$'s):
%
\begin{equation}
\delta {\cal J}
\, = \,
\left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle
\, = \,
\left\langle \, \nabla _{v}{\cal J}^T \, , \, M \, \delta \vec{u} \, \right\rangle
\, = \,
\left\langle \, M^T \, \nabla _{v}{\cal J}^T \, , \,
\delta \vec{u} \, \right\rangle
\label{inner}
\end{equation}
%
With the identity (\ref{deljidentity}), we then find that
the gradient $ \nabla _{u}{\cal J} $ can be readily inferred by
invoking the adjoint $ M^{\ast } $ of the tangent linear model $ M $
%
\begin{equation}
\begin{split}
\nabla _{u}{\cal J}^T |_{\vec{u}} &
= \, M^T |_{\vec{u}} \cdot \nabla _{v}{\cal J}^T |_{\vec{v}} \\
~ & = \, M^T |_{\vec{u}} \cdot \delta \vec{v}^{\ast} \\
~ & = \, \delta \vec{u}^{\ast}
\end{split}
\label{adjoint}
\end{equation}
%
Eq. (\ref{adjoint}) is the {\sf adjoint model (ADM)},
in which $M^T$ is the adjoint (here, the transpose) of the
tangent linear operator $M$, $ \delta \vec{v}^{\ast} $
the adjoint variable of the model state $ \vec{v} $, and
$ \delta \vec{u}^{\ast} $ the adjoint variable of the control variable $ \vec{u} $.

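The defining property $\left\langle M^T \vec{x}, \vec{y} \right\rangle = \left\langle \vec{x}, M \vec{y} \right\rangle$, and with it eq. (\ref{adjoint}), can be checked numerically for any explicit matrix. The following Python fragment (the $2 \times 3$ operator and all numbers are invented) applies $M$ and $M^T$ as matrix-vector products and verifies the identity to machine precision:

```python
# an arbitrary 2x3 "tangent linear operator" M (illustrative numbers)
M = [[1.0, 2.0, -0.5],
     [0.0, 1.5,  3.0]]

def matvec(A, y):
    # v = A y
    return [sum(a * b for a, b in zip(row, y)) for row in A]

def matvec_T(A, x):
    # u = A^T x, applying the transpose without forming it explicitly
    return [sum(A[j][i] * x[j] for j in range(len(A)))
            for i in range(len(A[0]))]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

x, y = [0.3, -1.2], [1.0, 0.5, 2.0]
# <M^T x, y> == <x, M y>: the finite-dimensional adjoint is the transpose
assert abs(dot(matvec_T(M, x), y) - dot(x, matvec(M, y))) < 1.0e-12

# eq. (adjoint): delta_u* = M^T delta_v*
dv_star = [2.0, -1.0]
du_star = matvec_T(M, dv_star)
```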
The {\sf reverse} nature of the adjoint calculation can be readily
seen as follows.
Consider a model integration which consists of $ \Lambda $
consecutive operations
$ {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} (
\ldots {\cal M}_{\lambda} (
\ldots
{\cal M}_{1} ( {\cal M}_{0} ( \vec{u} ) ) \ldots ) \ldots ) ) $,
where the ${\cal M}$'s could be the elementary steps, i.e. single lines
in the code of the model, or successive time steps of the
model integration,
starting at step 0 and moving up to step $\Lambda$, with intermediate
${\cal M}_{\lambda} (\vec{u}) = \vec{v}^{(\lambda+1)}$ and final
${\cal M}_{\Lambda} (\vec{u}) = \vec{v}^{(\Lambda+1)} = \vec{v}$.
Let ${\cal J}$ be a cost function which explicitly depends on the
final state $\vec{v}$ only
(this restriction is for clarity only).
%
${\cal J}(\vec{u})$ may be decomposed according to:
%
\begin{equation}
{\cal J}({\cal M}(\vec{u})) \, = \,
{\cal J} ( {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} (
\ldots {\cal M}_{\lambda} (
\ldots
{\cal M}_{1} ( {\cal M}_{0} ( \vec{u} ) ) \ldots ) \ldots ) ) )
\label{compos}
\end{equation}
%
Then, according to the chain rule, the forward calculation reads,
in terms of the Jacobi matrices
(we have omitted the $ | $'s which, nevertheless, are important
for the aspect of {\it tangent} linearity;
note also that by definition
$ \langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \rangle
= \nabla_v {\cal J} \cdot \delta \vec{v} $ )
%
\begin{equation}
\begin{split}
\nabla_v {\cal J} (M(\delta \vec{u})) & = \,
\nabla_v {\cal J} \cdot M_{\Lambda}
\cdot \ldots \cdot M_{\lambda} \cdot \ldots \cdot
M_{1} \cdot M_{0} \cdot \delta \vec{u} \\
~ & = \, \nabla_v {\cal J} \cdot \delta \vec{v} \\
\end{split}
\label{forward}
\end{equation}
%
whereas in reverse mode we have
%
\begin{equation}
\boxed{
\begin{split}
M^T ( \nabla_v {\cal J}^T) & = \,
M_{0}^T \cdot M_{1}^T
\cdot \ldots \cdot M_{\lambda}^T \cdot \ldots \cdot
M_{\Lambda}^T \cdot \nabla_v {\cal J}^T \\
~ & = \, M_{0}^T \cdot M_{1}^T
\cdot \ldots \cdot
\nabla_{v^{(\lambda)}} {\cal J}^T \\
~ & = \, \nabla_u {\cal J}^T
\end{split}
}
\label{reverse}
\end{equation}
%
clearly expressing the reverse nature of the calculation.
Eq. (\ref{reverse}) is at the heart of automatic adjoint compilers.
If the intermediate steps $\lambda$ in
eqns. (\ref{compos}) -- (\ref{reverse})
represent the model state (forward or adjoint) at each
intermediate time step as noted above, then correspondingly,
$ M^T (\delta \vec{v}^{(\lambda) \, \ast}) =
\delta \vec{v}^{(\lambda-1) \, \ast} $ for the adjoint variables.
It thus becomes evident that the adjoint calculation also
yields the adjoint of each model state component
$ \vec{v}^{(\lambda)} $ at each intermediate step $ \lambda $, namely
%
\begin{equation}
\boxed{
\begin{split}
\nabla_{v^{(\lambda)}} {\cal J}^T |_{\vec{v}^{(\lambda)}}
& = \,
M_{\lambda}^T |_{\vec{v}^{(\lambda)}} \cdot \ldots \cdot
M_{\Lambda}^T |_{\vec{v}^{(\Lambda)}} \cdot \delta \vec{v}^{\ast} \\
~ & = \, \delta \vec{v}^{(\lambda) \, \ast}
\end{split}
}
\end{equation}
%
in close analogy to eq. (\ref{adjoint}).
We note in passing that the $\delta \vec{v}^{(\lambda) \, \ast}$
are the Lagrange multipliers of the model equations which determine
$ \vec{v}^{(\lambda)}$.

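The reverse sweep and its by-product, the intermediate adjoint states, can be mimicked with a chain of small linear steps (made-up $2\times 2$ Jacobians standing in for the $M_\lambda$): a single backward pass through the transposed Jacobians produces $\delta\vec{v}^{(\lambda)\,\ast}$ at every $\lambda$, ending with $\nabla_u {\cal J}^T$. A Python sketch:

```python
# three "timestep Jacobians" M_0, M_1, M_2 (arbitrary illustrative 2x2 blocks)
steps = [[[1.0, 0.2], [0.0, 1.0]],
         [[0.9, 0.0], [0.1, 1.1]],
         [[1.0, -0.3], [0.2, 1.0]]]
grad_v_J = [1.0, 0.0]           # nabla_v J^T at the final state

def matvec(A, y):
    return [sum(a * b for a, b in zip(row, y)) for row in A]

def matvec_T(A, x):
    return [sum(A[j][i] * x[j] for j in range(len(A)))
            for i in range(len(A[0]))]

# reverse sweep: one pass yields the adjoint state at *every* lambda
adj = grad_v_J
adjoints = []                   # delta v^(lambda)*, collected backwards
for M in reversed(steps):
    adj = matvec_T(M, adj)
    adjoints.append(adj)
grad_u_J = adjoints[-1]         # nabla_u J^T after the full sweep

# cross-check component i against a forward run with unit perturbation e_i
for i in range(2):
    v = [1.0 if j == i else 0.0 for j in range(2)]
    for M in steps:
        v = matvec(M, v)        # M_2 . M_1 . M_0 . e_i
    fwd = sum(g * c for g, c in zip(grad_v_J, v))
    assert abs(fwd - grad_u_J[i]) < 1.0e-12
```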
In components, eq. (\ref{adjoint}) reads as follows.
Let
\[
\begin{array}{rclcrcl}
\delta \vec{u} & = &
\left( \delta u_1,\ldots, \delta u_m \right)^T , & \qquad &
\delta \vec{u}^{\ast} \,\, = \,\, \nabla_u {\cal J}^T & = &
\left(
\frac{\partial {\cal J}}{\partial u_1},\ldots,
\frac{\partial {\cal J}}{\partial u_m}
\right)^T \\
\delta \vec{v} & = &
\left( \delta v_1,\ldots, \delta v_n \right)^T , & \qquad &
\delta \vec{v}^{\ast} \,\, = \,\, \nabla_v {\cal J}^T & = &
\left(
\frac{\partial {\cal J}}{\partial v_1},\ldots,
\frac{\partial {\cal J}}{\partial v_n}
\right)^T \\
\end{array}
\]
denote the perturbations in $\vec{u}$ and $\vec{v}$, respectively,
and their adjoint variables;
further
\[
M \, = \, \left(
\begin{array}{ccc}
\frac{\partial {\cal M}_1}{\partial u_1} & \ldots &
\frac{\partial {\cal M}_1}{\partial u_m} \\
\vdots & ~ & \vdots \\
\frac{\partial {\cal M}_n}{\partial u_1} & \ldots &
\frac{\partial {\cal M}_n}{\partial u_m} \\
\end{array}
\right)
\]
is the Jacobi matrix of $ {\cal M} $
(an $ n \times m $ matrix)
such that $ \delta \vec{v} = M \cdot \delta \vec{u} $, or
\[
\delta v_{j}
\, = \, \sum_{i=1}^m M_{ji} \, \delta u_{i}
\, = \, \sum_{i=1}^m \, \frac{\partial {\cal M}_{j}}{\partial u_{i}}
\delta u_{i}
\]
%
Then eq. (\ref{adjoint}) takes the form
\[
\delta u_{i}^{\ast}
\, = \, \sum_{j=1}^n M_{ji} \, \delta v_{j}^{\ast}
\, = \, \sum_{j=1}^n \, \frac{\partial {\cal M}_{j}}{\partial u_{i}}
\delta v_{j}^{\ast}
\]
%
or
%
\[
\left(
\begin{array}{c}
\left. \frac{\partial}{\partial u_1} {\cal J} \right|_{\vec{u}^{(0)}} \\
\vdots \\
\left. \frac{\partial}{\partial u_m} {\cal J} \right|_{\vec{u}^{(0)}} \\
\end{array}
\right)
\, = \,
\left(
\begin{array}{ccc}
\left. \frac{\partial {\cal M}_1}{\partial u_1} \right|_{\vec{u}^{(0)}}
& \ldots &
\left. \frac{\partial {\cal M}_n}{\partial u_1} \right|_{\vec{u}^{(0)}} \\
\vdots & ~ & \vdots \\
\left. \frac{\partial {\cal M}_1}{\partial u_m} \right|_{\vec{u}^{(0)}}
& \ldots &
\left. \frac{\partial {\cal M}_n}{\partial u_m} \right|_{\vec{u}^{(0)}} \\
\end{array}
\right)
\cdot
\left(
\begin{array}{c}
\left. \frac{\partial}{\partial v_1} {\cal J} \right|_{\vec{v}} \\
\vdots \\
\left. \frac{\partial}{\partial v_n} {\cal J} \right|_{\vec{v}} \\
\end{array}
\right)
\]
%
Furthermore, the adjoint $ \delta v^{(\lambda) \, \ast} $
of any intermediate state $ v^{(\lambda)} $
may be obtained, using the intermediate Jacobian
(an $ n_{\lambda+1} \times n_{\lambda} $ matrix)
%
\[
M_{\lambda} \, = \,
\left(
\begin{array}{ccc}
\frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_1}
& \ldots &
\frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
\vdots & ~ & \vdots \\
\frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_1}
& \ldots &
\frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
\end{array}
\right)
\]
%
and the shorthand notation for the adjoint variables
$ \delta v^{(\lambda) \, \ast}_{j} = \frac{\partial}{\partial v^{(\lambda)}_{j}}
{\cal J}^T $, $ j = 1, \ldots , n_{\lambda} $,
for intermediate components, yielding
\begin{equation}
\small
\begin{split}
\left(
\begin{array}{c}
\delta v^{(\lambda) \, \ast}_1 \\
\vdots \\
\delta v^{(\lambda) \, \ast}_{n_{\lambda}} \\
\end{array}
\right)
\, = &
\left(
\begin{array}{ccc}
\frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_1}
& \ldots \,\, \ldots &
\frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_1} \\
\vdots & ~ & \vdots \\
\frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_{n_{\lambda}}}
& \ldots \,\, \ldots &
\frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
\end{array}
\right)
\cdot
%
\\ ~ & ~
\\ ~ &
%
\left(
\begin{array}{ccc}
\frac{\partial ({\cal M}_{\lambda+1})_1}{\partial v^{(\lambda+1)}_1}
& \ldots &
\frac{\partial ({\cal M}_{\lambda+1})_{n_{\lambda+2}}}{\partial v^{(\lambda+1)}_1} \\
\vdots & ~ & \vdots \\
\vdots & ~ & \vdots \\
\frac{\partial ({\cal M}_{\lambda+1})_1}{\partial v^{(\lambda+1)}_{n_{\lambda+1}}}
& \ldots &
\frac{\partial ({\cal M}_{\lambda+1})_{n_{\lambda+2}}}{\partial v^{(\lambda+1)}_{n_{\lambda+1}}} \\
\end{array}
\right)
\cdot \, \ldots \, \cdot
\left(
\begin{array}{c}
\delta v^{\ast}_1 \\
\vdots \\
\delta v^{\ast}_{n} \\
\end{array}
\right)
\end{split}
\end{equation}

Eqs. (\ref{forward}) and (\ref{reverse}) are perhaps clearest in
showing the advantage of the reverse over the forward mode
if the gradient $\nabla _{u}{\cal J}$, i.e. the sensitivity of the
cost function $ {\cal J} $ with respect to {\it all} input
variables $u$
(or the sensitivity of the cost function with respect to
{\it all} intermediate states $ \vec{v}^{(\lambda)} $), is sought.
In order to solve for each component of the gradient
$ \partial {\cal J} / \partial u_{i} $ in (\ref{forward}),
a forward calculation has to be performed for each component separately,
i.e. $ \delta \vec{u} = \delta u_{i} {\vec{e}_{i}} $
for the $i$-th forward calculation.
Then, (\ref{forward}) represents the
projection of $ \nabla_u {\cal J} $ onto the $i$-th component.
The full gradient is retrieved from the $ m $ forward calculations.
In contrast, eq. (\ref{reverse}) yields the full
gradient $\nabla _{u}{\cal J}$ (and all intermediate gradients
$\nabla _{v^{(\lambda)}}{\cal J}$) within a single reverse calculation.
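This cost comparison can be made explicit for a linear example (a Python sketch; the $2\times 4$ Jacobian and all numbers are invented): forward mode needs one run per unit perturbation, i.e. $m$ runs, while a single application of $M^T$ delivers the same gradient:

```python
# Jacobian M of a toy model with n = 2 outputs and m = 4 inputs (illustrative)
M = [[1.0, 0.5, -1.0,  2.0],
     [0.3, 0.0,  1.5, -0.5]]
grad_v_J = [2.0, -1.0]          # nabla_v J^T

def matvec(A, u):
    return [sum(a * b for a, b in zip(row, u)) for row in A]

def matvec_T(A, v):
    return [sum(A[j][i] * v[j] for j in range(len(A)))
            for i in range(len(A[0]))]

# forward mode: m = 4 tangent runs, one per unit perturbation e_i,
# each projecting nabla_u J onto a single component
m = len(M[0])
fwd_grad = [sum(g * c for g, c in zip(grad_v_J,
                                      matvec(M, [1.0 if j == i else 0.0
                                                 for j in range(m)])))
            for i in range(m)]

# reverse mode: the full gradient from a single adjoint application
rev_grad = matvec_T(M, grad_v_J)
assert all(abs(a - b) < 1.0e-12 for a, b in zip(fwd_grad, rev_grad))
```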

Note that if $ {\cal J} $ is a vector-valued function
of dimension $ l > 1 $,
eq. (\ref{reverse}) has to be modified according to
\[
M^T \left( \nabla_v {\cal J}^T \left(\delta \vec{J}\right) \right)
\, = \,
\nabla_u {\cal J}^T \cdot \delta \vec{J}
\]
where now $ \delta \vec{J} \in I\!\!R^l $ is a vector of
dimension $ l $.
In this case $ l $ reverse simulations have to be performed,
one for each $ \delta J_{k}, \,\, k = 1, \ldots, l $.
Then, the reverse mode is more efficient as long as
$ l < n $; otherwise the forward mode is preferable.
Strictly, the reverse mode is called adjoint mode only for
$ l = 1 $.

A detailed analysis of the underlying numerical operations
shows that the computation of $\nabla _{u}{\cal J}$ in this way
requires about 2 to 5 times the cost of evaluating the cost
function itself.
Alternatively, the gradient vector could be approximated
by finite differences, requiring $m$ computations
of the perturbed cost function.

To conclude, we give two examples of commonly used types
of cost functions:

\paragraph{Example 1:
$ {\cal J} = v_{j} (T) $} ~ \\
The cost function consists of the $j$-th component of the model state
$ \vec{v} $ at time $T$.
Then $ \nabla_v {\cal J}^T = {\vec{f}_{j}} $ is just the $j$-th
unit vector, and $ \nabla_u {\cal J}^T $
is the projection of the adjoint
operator onto the $j$-th component $ {\vec{f}_{j}} $,
\[
\nabla_u {\cal J}^T
\, = \, M^T \cdot \nabla_v {\cal J}^T
\, = \, \sum_{i} M_{ji} \, {\vec{e}_{i}}
\]

\paragraph{Example 2:
$ {\cal J} = \langle \, {\cal H}(\vec{v}) - \vec{d} \, ,
\, {\cal H}(\vec{v}) - \vec{d} \, \rangle $} ~ \\
The cost function represents the quadratic model vs. data misfit.
Here, $ \vec{d} $ is the data vector and $ {\cal H} $ represents the
operator which maps the model state space onto the data space.
Then, $ \nabla_v {\cal J} $ takes the form
%
\begin{equation*}
\begin{split}
\nabla_v {\cal J}^T & = \, 2 \, \, H^T \cdot
\left( \, {\cal H}(\vec{v}) - \vec{d} \, \right) \\
~ & = \, 2 \sum_{j} \left\{ \sum_k
\frac{\partial {\cal H}_k}{\partial v_{j}}
\left( {\cal H}_k (\vec{v}) - d_k \right)
\right\} \, {\vec{f}_{j}} \\
\end{split}
\end{equation*}
%
where $H_{kj} = \partial {\cal H}_k / \partial v_{j} $ is the
Jacobi matrix of the data projection operator.
Thus, the gradient $ \nabla_u {\cal J} $ is given by the
adjoint operator,
driven by the model vs. data misfit:
\[
\nabla_u {\cal J}^T \, = \, 2 \, M^T \cdot
H^T \cdot \left( {\cal H}(\vec{v}) - \vec{d} \, \right)
\]
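For a linear observation operator the misfit gradient $2\,H^T({\cal H}(\vec{v})-\vec{d})$ can be verified directly against finite differences (a Python sketch; the $2\times 3$ operator, data, and state are invented):

```python
# data projection operator H (2 observations, 3 state variables) and data d
H = [[1.0, 0.0, 0.5],
     [0.0, 2.0, 0.0]]
d = [1.0, 0.5]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def matvec_T(A, y):
    return [sum(A[j][i] * y[j] for j in range(len(A)))
            for i in range(len(A[0]))]

def cost(v):
    # J = <Hv - d, Hv - d>
    r = [hv - dk for hv, dk in zip(matvec(H, v), d)]
    return sum(rk * rk for rk in r)

v = [0.3, 0.4, -0.2]
residual = [hv - dk for hv, dk in zip(matvec(H, v), d)]
grad = [2.0 * g for g in matvec_T(H, residual)]   # 2 H^T (Hv - d)

# finite-difference check of each gradient component
eps = 1.0e-7
for i in range(3):
    vp = [vj + (eps if i == j else 0.0) for j, vj in enumerate(v)]
    assert abs((cost(vp) - cost(v)) / eps - grad[i]) < 1.0e-5
```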

\subsection{Storing vs. recomputation in reverse mode}
\label{checkpointing}

We note an important aspect of the forward vs. reverse
mode calculation.
Because of the local character of the derivative
(a derivative is defined w.r.t. a point along the trajectory),
the intermediate results of the model trajectory
$\vec{v}^{(\lambda+1)}={\cal M}_{\lambda}(\vec{v}^{(\lambda)})$
may be required to evaluate the product
$M_{\lambda}|_{\vec{v}^{(\lambda)}} \, \delta \vec{v}^{(\lambda)} $.
This is the case, e.g., for nonlinear expressions
(momentum advection, nonlinear equation of state) and state-dependent
conditional statements (parameterization schemes).
In the forward mode, the intermediate results are required
in the same order as computed by the full forward model ${\cal M}$,
but in the reverse mode they are required in the reverse order.
Thus, in the reverse mode the trajectory of the forward model
integration ${\cal M}$ has to be stored to be available in the reverse
calculation. Alternatively, the complete model state up to the
point of evaluation has to be recomputed whenever its value is required.

A method to balance the amount of recomputation vs.
storage requirements is called {\sf checkpointing}
(e.g. \cite{gri:92}, \cite{res-eta:98}).
It is depicted in Fig. \ref{fig:3levelcheck} for 3-level checkpointing
[as an example, we give explicit numbers for a 3-day
integration with a 1-hourly timestep in square brackets].
\begin{itemize}
%
\item [$lev3$]
In a first step, the model trajectory is subdivided into
$ {n}^{lev3} $ subsections [$ {n}^{lev3} $=3 1-day intervals],
with the label $lev3$ for this outermost loop.
The model is then integrated along the full trajectory,
and the model state stored to disk only at every $ k_{i}^{lev3} $-th timestep
[i.e. 3 times, at
$ i = 0,1,2 $ corresponding to $ k_{i}^{lev3} = 0, 24, 48 $].
In addition, the cost function is computed, if needed.
%
\item [$lev2$]
In a second step each subsection itself is divided into
$ {n}^{lev2} $ subsections
[$ {n}^{lev2} $=4 6-hour intervals per subsection].
The model picks up at the last outermost dumped state
$ v_{k_{n}^{lev3}} $ and is integrated forward in time along
the last subsection, with the label $lev2$ for this
intermediate loop.
The model state is now stored to disk at every $ k_{i}^{lev2} $-th
timestep
[i.e. 4 times, at
$ i = 0,1,2,3 $ corresponding to $ k_{i}^{lev2} = 48, 54, 60, 66 $].
%
\item [$lev1$]
Finally, the model picks up at the last intermediate dump state
$ v_{k_{n}^{lev2}} $ and is integrated forward in time along
the last subsection, with the label $lev1$ for this
innermost loop.
Within this sub-subsection only, parts of the model state are stored
to memory at every timestep
[i.e. every hour $ i=0,...,5$ corresponding to
$ k_{i}^{lev1} = 66, 67, \ldots, 71 $].
The final state $ v_n = v_{k_{n}^{lev1}} $ is reached,
and the model states of all preceding timesteps along the last
innermost subsection are available, enabling integration backwards
in time along the last subsection.
The adjoint can thus be computed along this last
subsection $k_{n}^{lev2}$.
%
\end{itemize}
%
This procedure is repeated consecutively for each previous
subsection $k_{n-1}^{lev2}, \ldots, k_{1}^{lev2} $,
carrying the adjoint computation to the initial time
of the subsection $k_{n}^{lev3}$.
Then, the procedure is repeated for the previous subsection
$k_{n-1}^{lev3}$,
carrying the adjoint computation to the initial time
$k_{1}^{lev3}$.

For the full model trajectory of
$ n^{lev3} \cdot n^{lev2} \cdot n^{lev1} $ timesteps
the required storing of the model state is thus significantly reduced to
$ n^{lev2} + n^{lev3} $ states on disk and roughly $ n^{lev1} $ states in memory
[i.e. for the 3-day integration with a total of 72 timesteps
the model state was stored 7 times to disk and roughly 6 times
to memory].
This saving in storage comes at the cost of
3 full forward integrations of the model (one for each
checkpointing level).
The optimal balance of storage vs. recomputation certainly depends
on the computing resources available and may be tuned by
adjusting the partitioning among
$ n^{lev3}, \,\, n^{lev2}, \,\, n^{lev1} $.
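The trade-off can be illustrated with a scalar toy model and a single checkpointing level (a Python sketch; the timestep function and all numbers are invented): only every $k$-th state is stored, each segment is recomputed from its checkpoint during the reverse sweep, and the result agrees with the adjoint obtained from a fully stored trajectory:

```python
import math

def step(v):
    # one nonlinear "timestep" of a scalar toy model (illustrative)
    return math.sin(v) + 0.1 * v

def dstep(v):
    # its local Jacobian (a 1x1 matrix here)
    return math.cos(v) + 0.1

N, k = 12, 4                    # 12 timesteps, checkpoint every 4th state

# forward sweep: store checkpoints only
v, ckpt = 0.3, {0: 0.3}
for t in range(N):
    v = step(v)
    if (t + 1) % k == 0:
        ckpt[t + 1] = v

# reverse sweep: recompute each segment from its checkpoint, then apply
# the transposed Jacobians in reverse order (J = final state, so adj = 1)
adj = 1.0
for seg_start in range(N - k, -1, -k):
    seg = [ckpt[seg_start]]
    for _ in range(k - 1):
        seg.append(step(seg[-1]))       # recomputation within the segment
    for vt in reversed(seg):
        adj *= dstep(vt)

# reference: adjoint computed with the full trajectory kept in memory
v, traj = 0.3, [0.3]
for t in range(N):
    v = step(v)
    traj.append(v)
ref = 1.0
for vt in reversed(traj[:-1]):
    ref *= dstep(vt)
assert abs(adj - ref) < 1.0e-12
```

Storing grows like $N/k$ checkpoints plus one segment of length $k$, while the recomputation adds roughly one extra forward sweep, mirroring the multi-level scheme described above.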

\begin{figure}[t!]
\begin{center}
%\psdraft
%\psfrag{v_k1^lev3}{\mathinfigure{v_{k_{1}^{lev3}}}}
%\psfrag{v_kn-1^lev3}{\mathinfigure{v_{k_{n-1}^{lev3}}}}
%\psfrag{v_kn^lev3}{\mathinfigure{v_{k_{n}^{lev3}}}}
%\psfrag{v_k1^lev2}{\mathinfigure{v_{k_{1}^{lev2}}}}
%\psfrag{v_kn-1^lev2}{\mathinfigure{v_{k_{n-1}^{lev2}}}}
%\psfrag{v_kn^lev2}{\mathinfigure{v_{k_{n}^{lev2}}}}
%\psfrag{v_k1^lev1}{\mathinfigure{v_{k_{1}^{lev1}}}}
%\psfrag{v_kn^lev1}{\mathinfigure{v_{k_{n}^{lev1}}}}
%\mbox{\epsfig{file=part5/checkpointing.eps, width=0.8\textwidth}}
\resizebox{5.5in}{!}{\includegraphics{part5/checkpointing.eps}}
%\psfull
\end{center}
\caption{
Schematic view of intermediate dump and restart for
3-level checkpointing.}
\label{fig:3levelcheck}
\end{figure}

% \subsection{Optimal perturbations}
% \label{sec_optpert}


% \subsection{Error covariance estimate and Hessian matrix}
% \label{sec_hessian}

\newpage

%**********************************************************************
\section{TLM and ADM generation in general}
\label{sec_ad_setup_gen}
%**********************************************************************

In this section we describe in a general fashion
the parts of the code that are relevant for automatic
differentiation using the software tool TAMC.

\input{part5/doc_ad_the_model}

The basic flow is depicted in Fig. \ref{fig:adthemodel}.
If the option {\tt ALLOW\_AUTODIFF\_TAMC} is defined, the driver routine
{\it the\_model\_main}, instead of calling {\it the\_main\_loop},
invokes the adjoint of this routine, {\it adthe\_main\_loop},
which is the top-level routine in terms of reverse mode computation.
The routine {\it adthe\_main\_loop} has been generated by TAMC.
It contains the forward integration of the full model,
any additional storing that is required for efficient checkpointing,
and the reverse integration of the adjoint model.
The structure of {\it adthe\_main\_loop} has been greatly
simplified for clarity; in particular, no checkpointing
procedures are shown here.
Prior to the call of {\it adthe\_main\_loop}, the routine
{\it ctrl\_unpack} is invoked to unpack the control vector
or initialise the control variables.
Following the call of {\it adthe\_main\_loop},
the routine {\it ctrl\_pack}
is invoked to pack the control vector
(cf. Section \ref{section_ctrl}).
If gradient checks are to be performed, the option
{\tt ALLOW\_GRADIENT\_CHECK} is defined. In this case
the driver routine {\it grdchk\_main} is called after
the gradient has been computed via the adjoint
(cf. Section \ref{section_grdchk}).

\subsection{The cost function (dependent variable)
\label{section_cost}}

The cost function $ {\cal J} $ is referred to as the {\sf dependent variable}.
It is a function of the input variables $ \vec{u} $ via the composition
$ {\cal J}(\vec{u}) \, = \, {\cal J}(M(\vec{u})) $.
The inputs are referred to as the
{\sf independent variables} or {\sf control variables}.
All aspects relevant to the treatment of the cost function $ {\cal J} $
(parameter setting, initialization, accumulation,
final evaluation) are controlled by the package {\it pkg/cost}.
The aspects relevant to the treatment of the independent variables
are controlled by the package {\it pkg/ctrl} and will be treated
in the next section.

\input{part5/doc_cost_flow}

\subsubsection{genmake and CPP options}
%
\begin{itemize}
%
\item
\fbox{
\begin{minipage}{12cm}
{\it genmake}, {\it CPP\_OPTIONS.h}, {\it ECCO\_CPPOPTIONS.h}
\end{minipage}
}
\end{itemize}
%
The directory {\it pkg/cost} can be included in the
compile list in 3 different ways (cf. Section \ref{???}):
%
\begin{enumerate}
%
\item {\it genmake}: \\
Change the default settings in the file {\it genmake} by adding
{\bf cost} to the {\bf enable} list (not recommended).
%
\item {\it .genmakerc}: \\
Customize the settings of {\bf enable}, {\bf disable} which are
appropriate for your experiment in the file {\it .genmakerc}
and add the file to your compile directory.
%
\item genmake-options: \\
Call {\it genmake} with the option
{\tt genmake -enable=cost}.
%
\end{enumerate}
N.B.: In general the following packages ought to be enabled
simultaneously: {\it autodiff, cost, ctrl}.
The basic CPP option to enable the cost function is {\bf ALLOW\_COST}.
Each specific cost function contribution has its own option.
For the present example the option is {\bf ALLOW\_COST\_TRACER}.
All cost-specific options are set in {\it ECCO\_CPPOPTIONS.h}.
Since the cost function is usually used in conjunction with
automatic differentiation, the CPP options
{\bf ALLOW\_ADJOINT\_RUN} (file {\it CPP\_OPTIONS.h}) and
{\bf ALLOW\_AUTODIFF\_TAMC} (file {\it ECCO\_CPPOPTIONS.h})
should be defined.

\subsubsection{Initialization}
%
The initialization of the {\it cost} package is readily enabled
as soon as the CPP option {\bf ALLOW\_COST} is defined.
%
\begin{itemize}
%
\item
\fbox{
\begin{minipage}{12cm}
Parameters: {\it cost\_readparms}
\end{minipage}
}
\\
This S/R
reads runtime flags and parameters from file {\it data.cost}.
For the present example the only relevant parameter read
is {\bf mult\_tracer}. This multiplier enables different
cost function contributions to be switched on
(= 1.) or off (= 0.) at runtime.
For more complex cost functions which involve model vs. data
misfits, the corresponding data filenames and data
specifications (start date and time, period, ...) are read
in this S/R.
%
\item
\fbox{
\begin{minipage}{12cm}
Variables: {\it cost\_init}
\end{minipage}
}
\\
This S/R
initializes the different cost function contributions.
The contribution for the present example is {\bf objf\_tracer},
which is defined on each tile (bi,bj).
%
\end{itemize}
%
\subsubsection{Accumulation}
%
\begin{itemize}
%
\item
\fbox{
\begin{minipage}{12cm}
{\it cost\_tile}, {\it cost\_tracer}
\end{minipage}
}
\end{itemize}
%
The 'driver' routine
{\it cost\_tile} is called at the end of each time step.
Within this 'driver' routine, S/R's are called for each of
the chosen cost function contributions.
In the present example ({\bf ALLOW\_COST\_TRACER}),
S/R {\it cost\_tracer} is called.
It accumulates {\bf objf\_tracer} according to eqn. (\ref{???}).
%
\subsubsection{Finalize all contributions}
%
\begin{itemize}
%
\item
\fbox{
\begin{minipage}{12cm}
{\it cost\_final}
\end{minipage}
}
\end{itemize}
%
At the end of the forward integration S/R {\it cost\_final}
is called. It accumulates the total cost function {\bf fc}
from each contribution and sums over all tiles:
\begin{equation}
{\cal J} \, = \,
{\rm fc} \, = \,
{\rm mult\_tracer} \sum_{\text{global sum}} \sum_{bi,\,bj}^{nSx,\,nSy}
{\rm objf\_tracer}(bi,bj) \, + \, ...
\end{equation}
%
The total cost function {\bf fc} will be the
`dependent' variable in the argument list for TAMC, i.e.
\begin{verbatim}
tamc -output 'fc' ...
\end{verbatim}

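The accumulation and final summation above can be sketched as follows. This is a minimal Python illustration, not MITgcm code; the tile counts and the per-tile contribution values are invented for the example:

```python
# Hypothetical sketch of how cost_final assembles the total cost
# function fc: per-tile contributions objf_tracer(bi,bj) are weighted
# by the runtime multiplier mult_tracer and summed over all tiles
# (in a parallel run, the per-process sums would be globally summed).
nSx, nSy = 2, 2                      # tiles per process in x and y (invented)
mult_tracer = 1.0                    # runtime multiplier from data.cost

# pretend per-tile contributions accumulated during the integration
objf_tracer = {(bi, bj): 0.25 for bi in range(nSx) for bj in range(nSy)}

def cost_final(objf, mult):
    """Sum the weighted contributions over all tiles."""
    fc = 0.0
    for (bi, bj), val in objf.items():
        fc += mult * val
    return fc                        # would then enter the global sum

fc = cost_final(objf_tracer, mult_tracer)   # 4 tiles times 0.25 each
```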
%%%% \end{document}

\input{part5/doc_ad_the_main}

\subsection{The control variables (independent variables)
\label{section_ctrl}}

The control variables are a subset of the model input
(initial conditions, boundary conditions, model parameters).
Here we identify them with the variable $ \vec{u} $.
All intermediate variables whose derivatives w.r.t. the control
variables do not vanish are called {\sf active variables}.
All subroutines whose derivatives w.r.t. the control variables
do not vanish are called {\sf active routines}.
Read and write operations from and to file can be viewed
as variable assignments. Therefore, files to which
active variables are written and from which active variables
are read are called {\sf active files}.
All aspects relevant to the treatment of the control variables
(parameter setting, initialization, perturbation)
are controlled by the package {\it pkg/ctrl}.

\input{part5/doc_ctrl_flow}

\subsubsection{genmake and CPP options}
%
\begin{itemize}
%
\item
\fbox{
\begin{minipage}{12cm}
{\it genmake}, {\it CPP\_OPTIONS.h}, {\it ECCO\_CPPOPTIONS.h}
\end{minipage}
}
\end{itemize}
%
For the directory to be included in the compile list,
{\bf ctrl} has to be added to the {\bf enable} list in
{\it .genmakerc} or in {\it genmake} itself (analogous to the {\it cost}
package, cf. the previous section).
Each control variable is enabled via its own CPP option
in {\it ECCO\_CPPOPTIONS.h}.

\subsubsection{Initialization}
%
\begin{itemize}
%
\item
\fbox{
\begin{minipage}{12cm}
Parameters: {\it ctrl\_readparms}
\end{minipage}
}
\\
%
This S/R
reads runtime flags and parameters from file {\it data.ctrl}.
For the present example the file contains the file names
of each control variable that is used.
In addition, the number of wet points for each control
variable and the net dimension of the space of control
variables (counting wet points only), {\bf nvarlength},
are determined.
Masks for wet points for each tile {\bf (bi,\,bj)}
and vertical layer {\bf k} are generated for the three
relevant categories on the C-grid:
{\bf nWetCtile} for tracer fields,
{\bf nWetWtile} for zonal velocity fields, and
{\bf nWetStile} for meridional velocity fields.
%
\item
\fbox{
\begin{minipage}{12cm}
Control variables, control vector,
and their gradients: {\it ctrl\_unpack}
\end{minipage}
}
\\
%
Two important issues related to the handling of the control
variables in the MITGCM need to be addressed.
First, in order to save memory, the control variable arrays
are not kept in memory, but rather read from file and added
to the initial fields during the model initialization phase.
Similarly, the corresponding adjoint fields, which represent
the gradient of the cost function w.r.t. the control variables,
are written to file at the end of the adjoint integration.
Second, in addition to the files holding the 2-dim. and 3-dim.
control variables and the corresponding cost gradients,
a 1-dim. {\sf control vector}
and {\sf gradient vector} are written to file. They contain
only the wet points of the control variables and the corresponding
gradient.
This leads to a significant data compression.
Furthermore, an option is available
({\tt ALLOW\_NONDIMENSIONAL\_CONTROL\_IO}) to
non-dimensionalise the control and gradient vector,
which otherwise would contain pieces of different
magnitudes and units.
Finally, the control and gradient vector can be passed to a
minimization routine if an update of the control variables
is sought as part of a minimization exercise.

The files holding fields and vectors of the control variables
and gradient are generated and initialised in S/R {\it ctrl\_unpack}.
%
\end{itemize}
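The wet-point compression underlying the control and gradient vectors can be sketched as follows. This is a hypothetical Python illustration: the mask, the field values, and the helper names {\tt pack}/{\tt unpack} are invented and are not the actual {\it pkg/ctrl} routines.

```python
# Hypothetical sketch of wet-point compression: only ocean ("wet")
# points of a control field are copied into a 1-dim. vector, whose
# total length plays the role of nvarlength.
mask = [1, 1, 0, 1, 0, 1]            # 1 = wet point, 0 = land (invented)
field = [10.0, 11.0, -9e9, 12.0, -9e9, 13.0]   # land points hold fill values

def pack(field, mask):
    """Compress a control field to its wet points only."""
    return [v for v, m in zip(field, mask) if m == 1]

def unpack(vector, mask, fill=0.0):
    """Distribute a compressed vector back onto the full grid."""
    it = iter(vector)
    return [next(it) if m == 1 else fill for m in mask]

ctrl_vector = pack(field, mask)      # only the 4 wet values survive
nvarlength = len(ctrl_vector)        # net dimension of the control space
```

Packing after unpacking recovers the same vector, which is why the compressed representation loses no information over the wet domain.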

\subsubsection{Perturbation of the independent variables}
%
The dependency flow for differentiation w.r.t. the controls
starts with adding a perturbation onto the input variable,
thus defining the independent or control variables for TAMC.
Three types of controls may be considered:
%
\begin{itemize}
%
\item
\fbox{
\begin{minipage}{12cm}
{\it ctrl\_map\_ini} (initial value sensitivity):
\end{minipage}
}
\\
%
Consider as an example the initial tracer distribution
{\bf tr1} as control variable.
After {\bf tr1} has been initialised in
{\it ini\_tr1} (dynamical variables such as
temperature and salinity are initialised in {\it ini\_fields}),
a perturbation anomaly is added to the field in S/R
{\it ctrl\_map\_ini}
%
\begin{equation}
\begin{split}
u & = \, u_{[0]} \, + \, \Delta u \\
{\bf tr1}(...) & = \, {\bf tr1_{ini}}(...) \, + \, {\bf xx\_tr1}(...)
\label{perturb}
\end{split}
\end{equation}
%
{\bf xx\_tr1} is a 3-dim. global array
holding the perturbation. In the case of a simple
sensitivity study this array is identically zero.
However, its specification is essential in the context
of automatic differentiation, since TAMC
treats the corresponding line in the code symbolically
when determining the differentiation chain and its origin.
Thus, the variable names are part of the argument list
when calling TAMC:
%
\begin{verbatim}
tamc -input 'xx_tr1 ...' ...
\end{verbatim}
%
Now, as mentioned above, the MITGCM avoids maintaining
an array for each control variable by reading the
perturbation to a temporary array from file.
To ensure that the symbolic link is recognized by TAMC, a scalar
dummy variable {\bf xx\_tr1\_dummy} is introduced
and an `active read' routine of the adjoint support
package {\it pkg/autodiff} is invoked.
The read procedure is tagged with the variable
{\bf xx\_tr1\_dummy}, enabling TAMC to recognize the
initialization of the perturbation.
The modified call of TAMC thus reads
%
\begin{verbatim}
tamc -input 'xx_tr1_dummy ...' ...
\end{verbatim}
%
and the modified operation to (\ref{perturb})
in the code takes on the form
%
\begin{verbatim}
      call active_read_xyz(
     &     ..., tmpfld3d, ..., xx_tr1_dummy, ... )

      tr1(...) = tr1(...) + tmpfld3d(...)
\end{verbatim}
%
Note that reading an active variable corresponds
to a variable assignment. Its derivative corresponds
to a write statement of the adjoint variable, followed by
a reset.
The `active file' routines have been designed
to support active read and corresponding adjoint active write
operations (and vice versa).
%
\item
\fbox{
\begin{minipage}{12cm}
{\it ctrl\_map\_forcing} (boundary value sensitivity):
\end{minipage}
}
\\
%
The handling of boundary values as control variables
proceeds exactly analogously to that of the initial values,
with the symbolic perturbation taking place in S/R
{\it ctrl\_map\_forcing}.
Note, however, an important difference:
since the boundary values are time dependent, with a new
forcing field applied at each time step,
the general problem may be thought of as
a new control variable at each time step
(or, if the perturbation is averaged over a certain period,
every $ N $ time steps), i.e.
\[
u_{\rm forcing} \, = \,
\{ \, u_{\rm forcing} ( t_n ) \, \}_{
n \, = \, 1, \ldots , {\rm nTimeSteps} }
\]
%
In the current example an equilibrium state is considered,
and only an initial perturbation to the
surface forcing is applied with respect to the
equilibrium state.
A time-dependent treatment of the surface forcing is
implemented in the ECCO environment, involving the
calendar ({\it cal}~) and external forcing ({\it exf}~) packages.
%
\item
\fbox{
\begin{minipage}{12cm}
{\it ctrl\_map\_params} (parameter sensitivity):
\end{minipage}
}
\\
%
This routine is not yet implemented, but would
proceed along the same lines as the initial value sensitivity.
The mixing parameters {\bf diffkr} and {\bf kapgm}
are currently added as controls in {\it ctrl\_map\_ini.F}.
%
\end{itemize}
%
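The correspondence noted above between an active read and its adjoint can be sketched as follows. This is a hypothetical Python illustration; the helper names are invented and merely stand in for the {\it pkg/autodiff} active file routines.

```python
# Hypothetical sketch of 'active read' semantics. The forward read
# acts like an assignment from file; its adjoint is a *write* of the
# adjoint variable to the adjoint file (accumulation), followed by a
# reset of the adjoint of the temporary variable.
def active_read(active_file):
    # forward: assignment from file contents to a temporary field
    return list(active_file)

def adjoint_active_read(ad_tmp, ad_file):
    # adjoint of the assignment: accumulate into the adjoint file,
    # then reset the adjoint of the temporary field to zero
    for i, v in enumerate(ad_tmp):
        ad_file[i] += v
    return [0.0] * len(ad_tmp), ad_file

ad_tmp = [1.0, 2.0]                  # adjoint of the temporary field
ad_file = [0.0, 0.0]                 # adjoint ("gradient") file contents
ad_tmp, ad_file = adjoint_active_read(ad_tmp, ad_file)
# ad_file now holds the gradient contribution; ad_tmp has been reset
```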

\subsubsection{Output of adjoint variables and gradient}
%
Several ways exist to generate output of adjoint fields.
%
\begin{itemize}
%
\item
\fbox{
\begin{minipage}{12cm}
{\it ctrl\_map\_ini, ctrl\_map\_forcing}:
\end{minipage}
}
\\
\begin{itemize}
%
\item {\bf xx\_...}: the control variable fields \\
Before the forward integration, the control
variables are read from file {\bf xx\_ ...} and added to
the model field.
%
\item {\bf adxx\_...}: the adjoint variable fields, i.e. the gradient
$ \nabla _{u}{\cal J} $ for each control variable \\
After the adjoint integration the corresponding adjoint
variables are written to {\bf adxx\_ ...}.
%
\end{itemize}
%
\item
\fbox{
\begin{minipage}{12cm}
{\it ctrl\_unpack, ctrl\_pack}:
\end{minipage}
}
\\
%
\begin{itemize}
%
\item {\bf vector\_ctrl}: the control vector \\
At the very beginning of the model initialization,
the updated compressed control vector is read (or initialised)
and distributed to 2-dim. and 3-dim. control variable fields.
%
\item {\bf vector\_grad}: the gradient vector \\
At the very end of the adjoint integration,
the 2-dim. and 3-dim. adjoint variables are read,
compressed to a single vector and written to file.
%
\end{itemize}
%
\item
\fbox{
\begin{minipage}{12cm}
{\it addummy\_in\_stepping}:
\end{minipage}
}
\\
In addition to writing the gradient at the end of the
forward/adjoint integration, many more adjoint variables
of the model state
at intermediate times can be written using S/R
{\it addummy\_in\_stepping}.
This routine is part of the adjoint support package
{\it pkg/autodiff} (cf. below).
The procedure is enabled via the CPP option
{\bf ALLOW\_AUTODIFF\_MONITOR} (file {\it ECCO\_CPPOPTIONS.h}).
To be part of the adjoint code, the corresponding S/R
{\it dummy\_in\_stepping} has to be called in the forward
model (S/R {\it the\_main\_loop}) at the appropriate place.
The adjoint common blocks are extracted from the adjoint code
via the header file {\it adcommon.h}.

{\it dummy\_in\_stepping} is essentially empty;
the corresponding adjoint routine is hand-written rather
than generated automatically.
Appropriate flow directives ({\it dummy\_in\_stepping.flow})
ensure that TAMC does not automatically
generate {\it addummy\_in\_stepping} by trying to differentiate
{\it dummy\_in\_stepping}, but instead refers to
the hand-written routine.

{\it dummy\_in\_stepping} is called in the forward code
at the beginning of each
timestep, before the call to {\it dynamics}, thus ensuring
that {\it addummy\_in\_stepping} is called at the end of
each timestep in the adjoint calculation, after the call to
{\it addynamics}.

{\it addummy\_in\_stepping} includes the header file
{\it adcommon.h}.
This header file is also hand-written. It contains
the common blocks
{\bf /addynvars\_r/}, {\bf /addynvars\_cd/},
{\bf /addynvars\_diffkr/}, {\bf /addynvars\_kapgm/},
{\bf /adtr1\_r/}, {\bf /adffields/},
which have been extracted from the adjoint code to enable
access to the adjoint variables.

{\bf WARNING:} If the structure of the common blocks
{\bf /dynvars\_r/}, {\bf /dynvars\_cd/}, etc., changes,
corresponding changes will occur in the adjoint common blocks.
Therefore, consistency between the TAMC-generated common blocks
and those in {\it adcommon.h} has to be checked.
%
\end{itemize}


\subsubsection{Control variable handling for
optimization applications}

In optimization mode the cost function $ {\cal J}(u) $ is
minimized with respect to a set of control variables, i.e. a
stationary point $ \delta {\cal J} \, = \, 0 $ is sought in an
iterative manner.
The gradient $ \nabla _{u}{\cal J} |_{u_{[k]}} $, together
with the value of the cost function itself, $ {\cal J}(u_{[k]}) $,
at iteration step $ k $, serves
as input to a minimization routine (e.g. quasi-Newton method,
conjugate gradient, ... \cite{gil-lem:89})
to compute an update in the
control variable for iteration step $k+1$,
\[
u_{[k+1]} \, = \, u_{[0]} \, + \, \Delta u_{[k+1]}
\quad \mbox{satisfying} \quad
{\cal J} \left( u_{[k+1]} \right) \, < \, {\cal J} \left( u_{[k]} \right)
\]
$ u_{[k+1]} $ then serves as input for a forward/adjoint run
to determine $ {\cal J} $ and $ \nabla _{u}{\cal J} $ at iteration step
$ k+1 $.
Tab. \ref{???} sketches the flow between forward/adjoint model
and the minimization routine.

\begin{eqnarray*}
\scriptsize
\begin{array}{ccccc}
u_{[0]} \,\, , \,\, \Delta u_{[k]} & ~ & ~ & ~ & ~ \\
{\Big\downarrow}
& ~ & ~ & ~ & ~ \\
~ & ~ & ~ & ~ & ~ \\
\hline
\multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
\multicolumn{1}{|c}{
u_{[k]} = u_{[0]} + \Delta u_{[k]}} &
\stackrel{\bf forward}{\bf \longrightarrow} &
v_{[k]} = M \left( u_{[k]} \right) &
\stackrel{\bf forward}{\bf \longrightarrow} &
\multicolumn{1}{c|}{
{\cal J}_{[k]} = {\cal J} \left( M \left( u_{[k]} \right) \right)} \\
\multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
\hline
\multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
\multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{{\Big\downarrow}} \\
\multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
\hline
\multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
\multicolumn{1}{|c}{
\nabla_u {\cal J}_{[k]} (\delta {\cal J}) =
T^{\ast} \cdot \nabla_v {\cal J} |_{v_{[k]}} (\delta {\cal J})} &
\stackrel{\bf adjoint}{\mathbf \longleftarrow} &
ad \, v_{[k]} (\delta {\cal J}) =
\nabla_v {\cal J} |_{v_{[k]}} (\delta {\cal J}) &
\stackrel{\bf adjoint}{\mathbf \longleftarrow} &
\multicolumn{1}{c|}{ ad \, {\cal J} = \delta {\cal J}} \\
\multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
\hline
~ & ~ & ~ & ~ & ~ \\
\hspace*{15ex}{\Bigg\downarrow}
\quad {\cal J}_{[k]}, \quad \nabla_u {\cal J}_{[k]}
& ~ & ~ & ~ & ~ \\
~ & ~ & ~ & ~ & ~ \\
\hline
\multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
\multicolumn{1}{|c}{
{\cal J}_{[k]} \,\, , \,\, \nabla_u {\cal J}_{[k]}} &
{\mathbf \longrightarrow} & \text{\bf minimisation} &
{\mathbf \longrightarrow} &
\multicolumn{1}{c|}{ \Delta u_{[k+1]}} \\
\multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
\hline
~ & ~ & ~ & ~ & ~ \\
~ & ~ & ~ & ~ & \Big\downarrow \\
~ & ~ & ~ & ~ & \Delta u_{[k+1]} \\
\end{array}
\end{eqnarray*}

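The iteration sketched above can be illustrated as follows. This is a toy Python sketch: a one-variable quadratic cost stands in for the forward/adjoint model, and a plain gradient-descent step stands in for the quasi-Newton or conjugate-gradient routine; all names and values are invented.

```python
# Hypothetical sketch of the optimization loop: each iteration runs the
# forward/adjoint model to obtain J(u_k) and grad J(u_k), then calls a
# minimization routine to propose the next control value u_{k+1}.
def forward_adjoint(u):
    # toy stand-in for the model: J(u) = (u - 3)^2, grad J = 2 (u - 3)
    J = (u - 3.0) ** 2
    grad = 2.0 * (u - 3.0)
    return J, grad

def minimise(u, grad, step=0.25):
    # stand-in for the quasi-Newton / conjugate-gradient update
    return u - step * grad

u = 0.0                              # first-guess control u_[0]
for k in range(20):
    J, grad = forward_adjoint(u)     # forward/adjoint run at step k
    u = minimise(u, grad)            # control update for step k+1
# u has converged toward the minimum of the toy cost at u = 3
```

Each update is accepted only because the toy cost is convex; the real minimization routine additionally enforces the descent condition ${\cal J}(u_{[k+1]}) < {\cal J}(u_{[k]})$ via line search.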
The routines {\it ctrl\_unpack} and {\it ctrl\_pack} provide
the link between the model and the minimization routine.
As described in Section \ref{???},
the {\it unpack} and {\it pack} routines read and write
control and gradient {\it vectors}, which are compressed
to contain only wet points, in addition to the full
2-dim. and 3-dim. fields.
The corresponding I/O flow looks as follows:

\vspace*{0.5cm}

{\scriptsize
\begin{tabular}{ccccc}
{\bf vector\_ctrl\_$<$k$>$ } & ~ & ~ & ~ & ~ \\
{\big\downarrow} & ~ & ~ & ~ & ~ \\
\cline{1-1}
\multicolumn{1}{|c|}{\it ctrl\_unpack} & ~ & ~ & ~ & ~ \\
\cline{1-1}
{\big\downarrow} & ~ & ~ & ~ & ~ \\
\cline{3-3}
\multicolumn{1}{l}{\bf xx\_theta0...$<$k$>$} & ~ &
\multicolumn{1}{|c|}{~} & ~ & ~ \\
\multicolumn{1}{l}{\bf xx\_salt0...$<$k$>$} &
$\stackrel{\mbox{read}}{\longrightarrow}$ &
\multicolumn{1}{|c|}{forward integration} & ~ & ~ \\
\multicolumn{1}{l}{\bf \vdots} & ~ & \multicolumn{1}{|c|}{~}
& ~ & ~ \\
\cline{3-3}
~ & ~ & $\downarrow$ & ~ & ~ \\
\cline{3-3}
~ & ~ &
\multicolumn{1}{|c|}{~} & ~ &
\multicolumn{1}{l}{\bf adxx\_theta0...$<$k$>$} \\
~ & ~ & \multicolumn{1}{|c|}{adjoint integration} &
$\stackrel{\mbox{write}}{\longrightarrow}$ &
\multicolumn{1}{l}{\bf adxx\_salt0...$<$k$>$} \\
~ & ~ & \multicolumn{1}{|c|}{~}
& ~ & \multicolumn{1}{l}{\bf \vdots} \\
\cline{3-3}
~ & ~ & ~ & ~ & {\big\downarrow} \\
\cline{5-5}
~ & ~ & ~ & ~ & \multicolumn{1}{|c|}{\it ctrl\_pack} \\
\cline{5-5}
~ & ~ & ~ & ~ & {\big\downarrow} \\
~ & ~ & ~ & ~ & {\bf vector\_grad\_$<$k$>$ } \\
\end{tabular}
}

\vspace*{0.5cm}

{\it ctrl\_unpack} reads the updated control vector
{\bf vector\_ctrl\_$<$k$>$}.
It distributes the different control variables to
2-dim. and 3-dim. files {\it xx\_...$<$k$>$}.
At the start of the forward integration the control variables
are read from {\it xx\_...$<$k$>$} and added to the
model fields.
Correspondingly, at the end of the adjoint integration
the adjoint fields are written
to {\it adxx\_...$<$k$>$}, again via the active file routines.
Finally, {\it ctrl\_pack} collects all adjoint files
and writes them to the compressed vector file
{\bf vector\_grad\_$<$k$>$}.
