1 % $Header: /u/gcmpack/manual/part5/doc_ad_2.tex,v 1.17 2004/10/16 03:40:17 edhill Exp $
2 % $Name: $
3
4 Author: Patrick Heimbach
5
{\sf Automatic differentiation} (AD), also referred to as algorithmic
(or, more loosely, computational) differentiation, involves
automatically deriving code that calculates
partial derivatives from an existing fully non-linear prognostic code
(see \cite{gri:00}).
11 A software tool is used that parses and transforms source files
12 according to a set of linguistic and mathematical rules.
13 AD tools are like source-to-source translators in that
14 they parse a program code as input and produce a new program code
15 as output.
16 However, unlike a pure source-to-source translation, the output program
17 represents a new algorithm, such as the evaluation of the
18 Jacobian, the Hessian, or higher derivative operators.
19 In principle, a variety of derived algorithms
20 can be generated automatically in this way.
21
The MITgcm has been adapted for use with the
Tangent linear and Adjoint Model Compiler (TAMC) and its successor TAF
(Transformation of Algorithms in Fortran), developed
by Ralf Giering (\cite{gie-kam:98}, \cite{gie:99,gie:00}).
The first application of the adjoint of the MITgcm for sensitivity
studies was published by \cite{maro-eta:99}.
\cite{sta-eta:97,sta-eta:01} use the MITgcm and its adjoint
for ocean state estimation studies.
In the following we shall refer to TAMC and TAF synonymously,
except where explicitly stated otherwise.
32
33 TAMC exploits the chain rule for computing the first
34 derivative of a function with
35 respect to a set of input variables.
Treating a given forward code as a composition of operations --
each line representing a compositional element -- the chain rule is
rigorously applied to the code, line by line. The resulting
39 tangent linear or adjoint code,
40 then, may be thought of as the composition in
41 forward or reverse order, respectively, of the
42 Jacobian matrices of the forward code's compositional elements.
43
44 %**********************************************************************
45 \section{Some basic algebra}
46 \label{sec_ad_algebra}
47 \begin{rawhtml}
48 <!-- CMIREDIR:sec_ad_algebra: -->
49 \end{rawhtml}
50 %**********************************************************************
51
Let $ {\cal M} $ be a general nonlinear model, i.e. a
mapping from the $m$-dimensional space
$U \subset I\!\!R^m$ of input variables
$\vec{u}=(u_1,\ldots,u_m)$
(model parameters, initial conditions, boundary conditions
such as forcing functions) to the $n$-dimensional space
$V \subset I\!\!R^n$ of
model output variables $\vec{v}=(v_1,\ldots,v_n)$
(model state, model diagnostics, objective function, ...)
under consideration,
62 %
63 \begin{equation}
64 \begin{split}
65 {\cal M} \, : & \, U \,\, \longrightarrow \, V \\
66 ~ & \, \vec{u} \,\, \longmapsto \, \vec{v} \, = \,
67 {\cal M}(\vec{u})
68 \label{fulloperator}
69 \end{split}
70 \end{equation}
71 %
The vectors $ \vec{u} \in U $ and $ \vec{v} \in V $ may be represented w.r.t.
73 some given basis vectors
74 $ {\rm span} (U) = \{ {\vec{e}_i} \}_{i = 1, \ldots , m} $ and
75 $ {\rm span} (V) = \{ {\vec{f}_j} \}_{j = 1, \ldots , n} $ as
76 \[
77 \vec{u} \, = \, \sum_{i=1}^{m} u_i \, {\vec{e}_i},
78 \qquad
79 \vec{v} \, = \, \sum_{j=1}^{n} v_j \, {\vec{f}_j}
80 \]
81
82 Two routes may be followed to determine the sensitivity of the
83 output variable $\vec{v}$ to its input $\vec{u}$.
84
85 \subsection{Forward or direct sensitivity}
86 %
Consider a perturbation to the input variables $\delta \vec{u}$
(typically a single component
$\delta \vec{u} = \delta u_{i} \, {\vec{e}_{i}}$).
Its effect on the output may be obtained via the linear
approximation of the model $ {\cal M}$ in terms of its Jacobian matrix
$ M $, evaluated at the point $\vec{u}^{(0)}$ according to
93 %
94 \begin{equation}
95 \delta \vec{v} \, = \, M |_{\vec{u}^{(0)}} \, \delta \vec{u}
96 \label{tangent_linear}
97 \end{equation}
98 with resulting output perturbation $\delta \vec{v}$.
99 In components
100 $M_{j i} \, = \, \partial {\cal M}_{j} / \partial u_{i} $,
101 it reads
102 %
103 \begin{equation}
104 \delta v_{j} \, = \, \sum_{i}
105 \left. \frac{\partial {\cal M}_{j}}{\partial u_{i}} \right|_{u^{(0)}} \,
106 \delta u_{i}
107 \label{jacobi_matrix}
108 \end{equation}
109 %
Eq. (\ref{tangent_linear}) is the {\sf tangent linear model (TLM)}.
In contrast to the full nonlinear model $ {\cal M} $, the operator
$ M $ is just a matrix
which can readily be used to find the forward sensitivity of $\vec{v}$ to
perturbations in $\vec{u}$.
However, if there are very many input variables ($\gg O(10^{6})$ for
large-scale oceanographic applications), it quickly becomes
prohibitive to proceed directly as in (\ref{tangent_linear})
if the impact of each component $ {\vec{e}_{i}} $ is to be assessed.
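
As a minimal illustration of eq. (\ref{tangent_linear}), the following Python
sketch (not MITgcm code; it assumes {\tt numpy} and uses a hypothetical toy
model) applies the tangent linear model as a Jacobian-vector product and checks
it against a finite-difference perturbation:
\begin{verbatim}
import numpy as np

def model(u):                    # a small nonlinear "forward model" M(u)
    return np.array([u[0] * u[1], np.sin(u[2]), u[0] + u[2] ** 2])

def jacobian(u):                 # its Jacobian matrix M, evaluated at u
    return np.array([[u[1], u[0], 0.0],
                     [0.0, 0.0, np.cos(u[2])],
                     [1.0, 0.0, 2.0 * u[2]]])

u0 = np.array([1.0, 2.0, 0.5])   # linearization point u^(0)
du = np.array([0.1, -0.2, 0.3])  # input perturbation delta u

dv_tlm = jacobian(u0) @ du                           # TLM: delta v = M delta u
dv_fd = (model(u0 + 1e-7 * du) - model(u0)) / 1e-7   # finite-difference check
print(np.allclose(dv_tlm, dv_fd, atol=1e-5))         # -> True
\end{verbatim}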
119
120 \subsection{Reverse or adjoint sensitivity}
121 %
122 Let us consider the special case of a
123 scalar objective function ${\cal J}(\vec{v})$ of the model output (e.g.
124 the total meridional heat transport,
125 the total uptake of $CO_{2}$ in the Southern
126 Ocean over a time interval,
127 or a measure of some model-to-data misfit)
128 %
129 \begin{eqnarray}
130 \begin{array}{cccccc}
131 {\cal J} \, : & U &
132 \longrightarrow & V &
133 \longrightarrow & I \!\! R \\
134 ~ & \vec{u} & \longmapsto & \vec{v}={\cal M}(\vec{u}) &
135 \longmapsto & {\cal J}(\vec{u}) = {\cal J}({\cal M}(\vec{u}))
136 \end{array}
137 \label{compo}
138 \end{eqnarray}
139 %
140 The perturbation of $ {\cal J} $ around a fixed point $ {\cal J}_0 $,
141 \[
142 {\cal J} \, = \, {\cal J}_0 \, + \, \delta {\cal J}
143 \]
144 can be expressed in both bases of $ \vec{u} $ and $ \vec{v} $
145 w.r.t. their corresponding inner product
146 $\left\langle \,\, , \,\, \right\rangle $
147 %
148 \begin{equation}
149 \begin{split}
150 {\cal J} & = \,
151 {\cal J} |_{\vec{u}^{(0)}} \, + \,
152 \left\langle \, \nabla _{u}{\cal J}^T |_{\vec{u}^{(0)}} \, , \, \delta \vec{u} \, \right\rangle
153 \, + \, O(\delta \vec{u}^2) \\
154 ~ & = \,
155 {\cal J} |_{\vec{v}^{(0)}} \, + \,
156 \left\langle \, \nabla _{v}{\cal J}^T |_{\vec{v}^{(0)}} \, , \, \delta \vec{v} \, \right\rangle
157 \, + \, O(\delta \vec{v}^2)
158 \end{split}
159 \label{deljidentity}
160 \end{equation}
161 %
(note that the gradient $ \nabla f $ is a co-vector; therefore
its transpose is required in the above inner product).
164 Then, using the representation of
165 $ \delta {\cal J} =
166 \left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle $,
167 the definition
168 of an adjoint operator $ A^{\ast} $ of a given operator $ A $,
169 \[
170 \left\langle \, A^{\ast} \vec{x} \, , \, \vec{y} \, \right\rangle =
171 \left\langle \, \vec{x} \, , \, A \vec{y} \, \right\rangle
172 \]
173 which for finite-dimensional vector spaces is just the
174 transpose of $ A $,
175 \[
176 A^{\ast} \, = \, A^T
177 \]
and from eqs. (\ref{tangent_linear}) and (\ref{deljidentity}),
we note that
(omitting the $|$'s):
181 %
182 \begin{equation}
183 \delta {\cal J}
184 \, = \,
185 \left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle
186 \, = \,
187 \left\langle \, \nabla _{v}{\cal J}^T \, , \, M \, \delta \vec{u} \, \right\rangle
188 \, = \,
189 \left\langle \, M^T \, \nabla _{v}{\cal J}^T \, , \,
190 \delta \vec{u} \, \right\rangle
191 \label{inner}
192 \end{equation}
193 %
194 With the identity (\ref{deljidentity}), we then find that
195 the gradient $ \nabla _{u}{\cal J} $ can be readily inferred by
196 invoking the adjoint $ M^{\ast } $ of the tangent linear model $ M $
197 %
198 \begin{equation}
199 \begin{split}
200 \nabla _{u}{\cal J}^T |_{\vec{u}} &
201 = \, M^T |_{\vec{u}} \cdot \nabla _{v}{\cal J}^T |_{\vec{v}} \\
202 ~ & = \, M^T |_{\vec{u}} \cdot \delta \vec{v}^{\ast} \\
203 ~ & = \, \delta \vec{u}^{\ast}
204 \end{split}
205 \label{adjoint}
206 \end{equation}
207 %
208 Eq. (\ref{adjoint}) is the {\sf adjoint model (ADM)},
209 in which $M^T$ is the adjoint (here, the transpose) of the
210 tangent linear operator $M$, $ \delta \vec{v}^{\ast} $
211 the adjoint variable of the model state $ \vec{v} $, and
212 $ \delta \vec{u}^{\ast} $ the adjoint variable of the control variable $ \vec{u} $.
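
A minimal numerical sketch of eqs. (\ref{inner}) and (\ref{adjoint})
(a Python/{\tt numpy} illustration, not MITgcm code; all quantities are
hypothetical placeholders) verifies that, in finite dimensions, the adjoint is
the transpose of the tangent linear operator:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
M    = rng.standard_normal((4, 3))  # tangent linear operator (n x m Jacobian)
dJdv = rng.standard_normal(4)       # nabla_v J, gradient w.r.t. the output
du   = rng.standard_normal(3)       # an arbitrary input perturbation

dJdu = M.T @ dJdv                   # adjoint model: nabla_u J^T = M^T nabla_v J^T

# adjoint identity:  < nabla_v J , M du > = < M^T nabla_v J , du >
print(np.allclose(dJdv @ (M @ du), dJdu @ du))   # -> True
\end{verbatim}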
213
214 The {\sf reverse} nature of the adjoint calculation can be readily
215 seen as follows.
216 Consider a model integration which consists of $ \Lambda $
217 consecutive operations
218 $ {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} (
219 ...... ( {\cal M}_{\lambda} (
220 ......
221 ( {\cal M}_{1} ( {\cal M}_{0}(\vec{u}) )))) $,
222 where the ${\cal M}$'s could be the elementary steps, i.e. single lines
223 in the code of the model, or successive time steps of the
224 model integration,
225 starting at step 0 and moving up to step $\Lambda$, with intermediate
226 ${\cal M}_{\lambda} (\vec{u}) = \vec{v}^{(\lambda+1)}$ and final
227 ${\cal M}_{\Lambda} (\vec{u}) = \vec{v}^{(\Lambda+1)} = \vec{v}$.
228 Let ${\cal J}$ be a cost function which explicitly depends on the
229 final state $\vec{v}$ only
230 (this restriction is for clarity reasons only).
231 %
232 ${\cal J}(u)$ may be decomposed according to:
233 %
234 \begin{equation}
235 {\cal J}({\cal M}(\vec{u})) \, = \,
236 {\cal J} ( {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} (
237 ...... ( {\cal M}_{\lambda} (
238 ......
239 ( {\cal M}_{1} ( {\cal M}_{0}(\vec{u}) )))))
240 \label{compos}
241 \end{equation}
242 %
243 Then, according to the chain rule, the forward calculation reads,
244 in terms of the Jacobi matrices
(we have omitted the $ | $'s which, nevertheless, are important
for the aspect of {\it tangent} linearity;
247 note also that by definition
248 $ \langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \rangle
249 = \nabla_v {\cal J} \cdot \delta \vec{v} $ )
250 %
251 \begin{equation}
252 \begin{split}
253 \nabla_v {\cal J} (M(\delta \vec{u})) & = \,
254 \nabla_v {\cal J} \cdot M_{\Lambda}
255 \cdot ...... \cdot M_{\lambda} \cdot ...... \cdot
256 M_{1} \cdot M_{0} \cdot \delta \vec{u} \\
257 ~ & = \, \nabla_v {\cal J} \cdot \delta \vec{v} \\
258 \end{split}
259 \label{forward}
260 \end{equation}
261 %
262 whereas in reverse mode we have
263 %
264 \begin{equation}
265 \boxed{
266 \begin{split}
267 M^T ( \nabla_v {\cal J}^T) & = \,
268 M_{0}^T \cdot M_{1}^T
269 \cdot ...... \cdot M_{\lambda}^T \cdot ...... \cdot
270 M_{\Lambda}^T \cdot \nabla_v {\cal J}^T \\
271 ~ & = \, M_{0}^T \cdot M_{1}^T
272 \cdot ...... \cdot
273 \nabla_{v^{(\lambda)}} {\cal J}^T \\
274 ~ & = \, \nabla_u {\cal J}^T
275 \end{split}
276 }
277 \label{reverse}
278 \end{equation}
279 %
280 clearly expressing the reverse nature of the calculation.
281 Eq. (\ref{reverse}) is at the heart of automatic adjoint compilers.
282 If the intermediate steps $\lambda$ in
283 eqn. (\ref{compos}) -- (\ref{reverse})
284 represent the model state (forward or adjoint) at each
285 intermediate time step as noted above, then correspondingly,
$ M_{\lambda-1}^T (\delta \vec{v}^{(\lambda) \, \ast}) =
\delta \vec{v}^{(\lambda-1) \, \ast} $ for the adjoint variables.
288 It thus becomes evident that the adjoint calculation also
289 yields the adjoint of each model state component
290 $ \vec{v}^{(\lambda)} $ at each intermediate step $ \lambda $, namely
291 %
292 \begin{equation}
293 \boxed{
294 \begin{split}
295 \nabla_{v^{(\lambda)}} {\cal J}^T |_{\vec{v}^{(\lambda)}}
296 & = \,
297 M_{\lambda}^T |_{\vec{v}^{(\lambda)}} \cdot ...... \cdot
M_{\Lambda}^T |_{\vec{v}^{(\Lambda)}} \cdot \delta \vec{v}^{\ast} \\
299 ~ & = \, \delta \vec{v}^{(\lambda) \, \ast}
300 \end{split}
301 }
302 \end{equation}
303 %
in close analogy to eq. (\ref{adjoint}).
We note in passing that the $\delta \vec{v}^{(\lambda) \, \ast}$
306 are the Lagrange multipliers of the model equations which determine
307 $ \vec{v}^{(\lambda)}$.
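
The following sketch (Python/{\tt numpy}, not MITgcm code; the elementary
Jacobians are random placeholders) illustrates eqs. (\ref{forward}) and
(\ref{reverse}): a single reverse sweep with the transposed Jacobians yields
$\nabla_u {\cal J}$ and, along the way, every intermediate adjoint state,
whereas forward mode needs one sweep per input component:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
m = 3                                        # number of control variables
Ms = [rng.standard_normal((m, m)) for _ in range(4)]  # Jacobians M_0 .. M_Lambda
dJdv = rng.standard_normal(m)                # nabla_v J at the final state

# reverse mode: one sweep, applying M_lambda^T in reverse order
adj = dJdv.copy()
intermediate_adjoints = []
for M in reversed(Ms):
    adj = M.T @ adj                          # delta v^(lambda)* at each step
    intermediate_adjoints.append(adj)
dJdu_reverse = adj                           # nabla_u J after the full sweep

# forward mode: m sweeps, one per unit perturbation e_i
dJdu_forward = np.empty(m)
for i in range(m):
    dv = np.eye(m)[:, i]                     # delta u = e_i
    for M in Ms:
        dv = M @ dv                          # M_Lambda ... M_0 e_i
    dJdu_forward[i] = dJdv @ dv              # projection onto nabla_v J

print(np.allclose(dJdu_reverse, dJdu_forward))   # -> True
\end{verbatim}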
308
309 In components, eq. (\ref{adjoint}) reads as follows.
310 Let
311 \[
312 \begin{array}{rclcrcl}
313 \delta \vec{u} & = &
314 \left( \delta u_1,\ldots, \delta u_m \right)^T , & \qquad &
315 \delta \vec{u}^{\ast} \,\, = \,\, \nabla_u {\cal J}^T & = &
316 \left(
317 \frac{\partial {\cal J}}{\partial u_1},\ldots,
318 \frac{\partial {\cal J}}{\partial u_m}
319 \right)^T \\
320 \delta \vec{v} & = &
\left( \delta v_1,\ldots, \delta v_n \right)^T , & \qquad &
322 \delta \vec{v}^{\ast} \,\, = \,\, \nabla_v {\cal J}^T & = &
323 \left(
324 \frac{\partial {\cal J}}{\partial v_1},\ldots,
325 \frac{\partial {\cal J}}{\partial v_n}
326 \right)^T \\
327 \end{array}
328 \]
329 denote the perturbations in $\vec{u}$ and $\vec{v}$, respectively,
330 and their adjoint variables;
331 further
332 \[
333 M \, = \, \left(
334 \begin{array}{ccc}
335 \frac{\partial {\cal M}_1}{\partial u_1} & \ldots &
336 \frac{\partial {\cal M}_1}{\partial u_m} \\
337 \vdots & ~ & \vdots \\
338 \frac{\partial {\cal M}_n}{\partial u_1} & \ldots &
339 \frac{\partial {\cal M}_n}{\partial u_m} \\
340 \end{array}
341 \right)
342 \]
343 is the Jacobi matrix of $ {\cal M} $
344 (an $ n \times m $ matrix)
345 such that $ \delta \vec{v} = M \cdot \delta \vec{u} $, or
346 \[
347 \delta v_{j}
348 \, = \, \sum_{i=1}^m M_{ji} \, \delta u_{i}
349 \, = \, \sum_{i=1}^m \, \frac{\partial {\cal M}_{j}}{\partial u_{i}}
350 \delta u_{i}
351 \]
352 %
353 Then eq. (\ref{adjoint}) takes the form
354 \[
355 \delta u_{i}^{\ast}
356 \, = \, \sum_{j=1}^n M_{ji} \, \delta v_{j}^{\ast}
357 \, = \, \sum_{j=1}^n \, \frac{\partial {\cal M}_{j}}{\partial u_{i}}
358 \delta v_{j}^{\ast}
359 \]
360 %
361 or
362 %
363 \[
364 \left(
365 \begin{array}{c}
366 \left. \frac{\partial}{\partial u_1} {\cal J} \right|_{\vec{u}^{(0)}} \\
367 \vdots \\
368 \left. \frac{\partial}{\partial u_m} {\cal J} \right|_{\vec{u}^{(0)}} \\
369 \end{array}
370 \right)
371 \, = \,
372 \left(
373 \begin{array}{ccc}
374 \left. \frac{\partial {\cal M}_1}{\partial u_1} \right|_{\vec{u}^{(0)}}
375 & \ldots &
376 \left. \frac{\partial {\cal M}_n}{\partial u_1} \right|_{\vec{u}^{(0)}} \\
377 \vdots & ~ & \vdots \\
378 \left. \frac{\partial {\cal M}_1}{\partial u_m} \right|_{\vec{u}^{(0)}}
379 & \ldots &
380 \left. \frac{\partial {\cal M}_n}{\partial u_m} \right|_{\vec{u}^{(0)}} \\
381 \end{array}
382 \right)
383 \cdot
384 \left(
385 \begin{array}{c}
386 \left. \frac{\partial}{\partial v_1} {\cal J} \right|_{\vec{v}} \\
387 \vdots \\
388 \left. \frac{\partial}{\partial v_n} {\cal J} \right|_{\vec{v}} \\
389 \end{array}
390 \right)
391 \]
392 %
393 Furthermore, the adjoint $ \delta v^{(\lambda) \, \ast} $
394 of any intermediate state $ v^{(\lambda)} $
395 may be obtained, using the intermediate Jacobian
396 (an $ n_{\lambda+1} \times n_{\lambda} $ matrix)
397 %
398 \[
399 M_{\lambda} \, = \,
400 \left(
401 \begin{array}{ccc}
402 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_1}
403 & \ldots &
404 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
405 \vdots & ~ & \vdots \\
406 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_1}
407 & \ldots &
408 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
409 \end{array}
410 \right)
411 \]
412 %
413 and the shorthand notation for the adjoint variables
414 $ \delta v^{(\lambda) \, \ast}_{j} = \frac{\partial}{\partial v^{(\lambda)}_{j}}
415 {\cal J}^T $, $ j = 1, \ldots , n_{\lambda} $,
416 for intermediate components, yielding
417 \begin{equation}
418 \small
419 \begin{split}
420 \left(
421 \begin{array}{c}
422 \delta v^{(\lambda) \, \ast}_1 \\
423 \vdots \\
424 \delta v^{(\lambda) \, \ast}_{n_{\lambda}} \\
425 \end{array}
426 \right)
427 \, = &
428 \left(
429 \begin{array}{ccc}
430 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_1}
431 & \ldots \,\, \ldots &
432 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_1} \\
433 \vdots & ~ & \vdots \\
434 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_{n_{\lambda}}}
435 & \ldots \,\, \ldots &
436 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
437 \end{array}
438 \right)
439 \cdot
440 %
441 \\ ~ & ~
442 \\ ~ &
443 %
444 \left(
445 \begin{array}{ccc}
446 \frac{\partial ({\cal M}_{\lambda+1})_1}{\partial v^{(\lambda+1)}_1}
447 & \ldots &
448 \frac{\partial ({\cal M}_{\lambda+1})_{n_{\lambda+2}}}{\partial v^{(\lambda+1)}_1} \\
449 \vdots & ~ & \vdots \\
450 \vdots & ~ & \vdots \\
451 \frac{\partial ({\cal M}_{\lambda+1})_1}{\partial v^{(\lambda+1)}_{n_{\lambda+1}}}
452 & \ldots &
453 \frac{\partial ({\cal M}_{\lambda+1})_{n_{\lambda+2}}}{\partial v^{(\lambda+1)}_{n_{\lambda+1}}} \\
454 \end{array}
455 \right)
456 \cdot \, \ldots \, \cdot
457 \left(
458 \begin{array}{c}
459 \delta v^{\ast}_1 \\
460 \vdots \\
461 \delta v^{\ast}_{n} \\
462 \end{array}
463 \right)
464 \end{split}
465 \end{equation}
466
467 Eq. (\ref{forward}) and (\ref{reverse}) are perhaps clearest in
468 showing the advantage of the reverse over the forward mode
469 if the gradient $\nabla _{u}{\cal J}$, i.e. the sensitivity of the
470 cost function $ {\cal J} $ with respect to {\it all} input
471 variables $u$
472 (or the sensitivity of the cost function with respect to
473 {\it all} intermediate states $ \vec{v}^{(\lambda)} $) are sought.
474 In order to be able to solve for each component of the gradient
475 $ \partial {\cal J} / \partial u_{i} $ in (\ref{forward})
476 a forward calculation has to be performed for each component separately,
477 i.e. $ \delta \vec{u} = \delta u_{i} {\vec{e}_{i}} $
478 for the $i$-th forward calculation.
479 Then, (\ref{forward}) represents the
480 projection of $ \nabla_u {\cal J} $ onto the $i$-th component.
481 The full gradient is retrieved from the $ m $ forward calculations.
482 In contrast, eq. (\ref{reverse}) yields the full
483 gradient $\nabla _{u}{\cal J}$ (and all intermediate gradients
484 $\nabla _{v^{(\lambda)}}{\cal J}$) within a single reverse calculation.
485
Note that if $ {\cal J} $ is a vector-valued function
487 of dimension $ l > 1 $,
488 eq. (\ref{reverse}) has to be modified according to
489 \[
490 M^T \left( \nabla_v {\cal J}^T \left(\delta \vec{J}\right) \right)
491 \, = \,
492 \nabla_u {\cal J}^T \cdot \delta \vec{J}
493 \]
494 where now $ \delta \vec{J} \in I\!\!R^l $ is a vector of
495 dimension $ l $.
In this case $ l $ reverse simulations have to be performed,
one for each $ \delta J_{k}, \,\, k = 1, \ldots, l $.
Then, the reverse mode is more efficient as long as
$ l < m $; otherwise the forward mode is preferable.
500 Strictly, the reverse mode is called adjoint mode only for
501 $ l = 1 $.
502
A detailed analysis of the underlying numerical operations
shows that the computation of $\nabla _{u}{\cal J}$ in this way
requires about 2 to 5 times the cost of evaluating the cost function itself.
506 Alternatively, the gradient vector could be approximated
507 by finite differences, requiring $m$ computations
508 of the perturbed cost function.
509
510 To conclude we give two examples of commonly used types
511 of cost functions:
512
513 \paragraph{Example 1:
514 $ {\cal J} = v_{j} (T) $} ~ \\
515 The cost function consists of the $j$-th component of the model state
516 $ \vec{v} $ at time $T$.
Then $ \nabla_v {\cal J}^T = {\vec{f}_{j}} $ is just the $j$-th
unit vector, and $ \nabla_u {\cal J}^T $
is the projection of the adjoint
operator onto the $j$-th component ${\vec{f}_{j}}$,
\[
\nabla_u {\cal J}^T
\, = \, M^T \cdot \nabla_v {\cal J}^T
\, = \, \sum_{i} M_{ji} \, {\vec{e}_{i}}
\]
526
527 \paragraph{Example 2:
528 $ {\cal J} = \langle \, {\cal H}(\vec{v}) - \vec{d} \, ,
529 \, {\cal H}(\vec{v}) - \vec{d} \, \rangle $} ~ \\
530 The cost function represents the quadratic model vs. data misfit.
531 Here, $ \vec{d} $ is the data vector and $ {\cal H} $ represents the
532 operator which maps the model state space onto the data space.
533 Then, $ \nabla_v {\cal J} $ takes the form
534 %
535 \begin{equation*}
536 \begin{split}
\nabla_v {\cal J}^T & = \, 2 \, \, H^T \cdot
\left( \, {\cal H}(\vec{v}) - \vec{d} \, \right) \\
539 ~ & = \, 2 \sum_{j} \left\{ \sum_k
540 \frac{\partial {\cal H}_k}{\partial v_{j}}
541 \left( {\cal H}_k (\vec{v}) - d_k \right)
542 \right\} \, {\vec{f}_{j}} \\
543 \end{split}
544 \end{equation*}
545 %
546 where $H_{kj} = \partial {\cal H}_k / \partial v_{j} $ is the
547 Jacobi matrix of the data projection operator.
548 Thus, the gradient $ \nabla_u {\cal J} $ is given by the
549 adjoint operator,
550 driven by the model vs. data misfit:
551 \[
\nabla_u {\cal J}^T \, = \, 2 \, M^T \cdot
H^T \cdot \left( {\cal H}(\vec{v}) - \vec{d} \, \right)
554 \]
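
A minimal numerical sketch of this example (Python/{\tt numpy}, not MITgcm
code; the model and observation operators are taken to be linear and all
matrices are hypothetical) confirms the gradient by a finite-difference check:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 3))   # tangent linear model (n x m)
H = rng.standard_normal((4, 5))   # linear observation operator (data x state)
u = rng.standard_normal(3)        # control vector
d = rng.standard_normal(4)        # data vector

v = M @ u                         # model state (the model is linear here)
J = (H @ v - d) @ (H @ v - d)     # cost function  J = <Hv - d, Hv - d>

dJdv = 2.0 * H.T @ (H @ v - d)    # nabla_v J^T
dJdu = M.T @ dJdv                 # nabla_u J^T = 2 M^T H^T (Hv - d)

eps = 1e-6                        # finite-difference check of one component
e0 = np.eye(3)[:, 0]
J_pert = (H @ M @ (u + eps * e0) - d) @ (H @ M @ (u + eps * e0) - d)
print(np.allclose(dJdu[0], (J_pert - J) / eps, atol=1e-4))   # -> True
\end{verbatim}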
555
556 \subsection{Storing vs. recomputation in reverse mode}
557 \label{checkpointing}
558
559 We note an important aspect of the forward vs. reverse
560 mode calculation.
561 Because of the local character of the derivative
562 (a derivative is defined w.r.t. a point along the trajectory),
563 the intermediate results of the model trajectory
564 $\vec{v}^{(\lambda+1)}={\cal M}_{\lambda}(v^{(\lambda)})$
may be required to evaluate the action of the intermediate Jacobian,
$M_{\lambda}|_{\vec{v}^{(\lambda)}} \, \delta \vec{v}^{(\lambda)} $.
This is the case, e.g., for nonlinear expressions
(momentum advection, nonlinear equation of state) and state-dependent
conditional statements (parameterization schemes).
570 In the forward mode, the intermediate results are required
571 in the same order as computed by the full forward model ${\cal M}$,
572 but in the reverse mode they are required in the reverse order.
573 Thus, in the reverse mode the trajectory of the forward model
574 integration ${\cal M}$ has to be stored to be available in the reverse
575 calculation. Alternatively, the complete model state up to the
576 point of evaluation has to be recomputed whenever its value is required.
577
578 A method to balance the amount of recomputations vs.
579 storage requirements is called {\sf checkpointing}
580 (e.g. \cite{gri:92}, \cite{res-eta:98}).
It is depicted in Fig. \ref{fig:3levelcheck} for 3-level checkpointing
[as an example, we give explicit numbers for a 3-day
integration with a 1-hourly timestep in square brackets].
584 \begin{itemize}
585 %
586 \item [$lev3$]
587 In a first step, the model trajectory is subdivided into
588 $ {n}^{lev3} $ subsections [$ {n}^{lev3} $=3 1-day intervals],
589 with the label $lev3$ for this outermost loop.
590 The model is then integrated along the full trajectory,
591 and the model state stored to disk only at every $ k_{i}^{lev3} $-th timestep
592 [i.e. 3 times, at
593 $ i = 0,1,2 $ corresponding to $ k_{i}^{lev3} = 0, 24, 48 $].
594 In addition, the cost function is computed, if needed.
595 %
596 \item [$lev2$]
597 In a second step each subsection itself is divided into
598 $ {n}^{lev2} $ subsections
599 [$ {n}^{lev2} $=4 6-hour intervals per subsection].
600 The model picks up at the last outermost dumped state
601 $ v_{k_{n}^{lev3}} $ and is integrated forward in time along
602 the last subsection, with the label $lev2$ for this
603 intermediate loop.
604 The model state is now stored to disk at every $ k_{i}^{lev2} $-th
605 timestep
606 [i.e. 4 times, at
607 $ i = 0,1,2,3 $ corresponding to $ k_{i}^{lev2} = 48, 54, 60, 66 $].
608 %
609 \item [$lev1$]
610 Finally, the model picks up at the last intermediate dump state
611 $ v_{k_{n}^{lev2}} $ and is integrated forward in time along
the last subsection, with the label $lev1$ for this
innermost loop.
Within this sub-subsection only, parts of the model state are stored
to memory at every timestep
[i.e. every hour $ i=0,...,5$ corresponding to
$ k_{i}^{lev1} = 66, 67, \ldots, 71 $].
The final state $ v_n = v_{k_{n}^{lev1}} $ is reached
and the model states of all preceding timesteps along the last
innermost subsection are available, enabling integration backwards
in time along the last subsection.
622 The adjoint can thus be computed along this last
623 subsection $k_{n}^{lev2}$.
624 %
625 \end{itemize}
626 %
627 This procedure is repeated consecutively for each previous
628 subsection $k_{n-1}^{lev2}, \ldots, k_{1}^{lev2} $
629 carrying the adjoint computation to the initial time
630 of the subsection $k_{n}^{lev3}$.
631 Then, the procedure is repeated for the previous subsection
632 $k_{n-1}^{lev3}$
633 carrying the adjoint computation to the initial time
634 $k_{1}^{lev3}$.
635
636 For the full model trajectory of
637 $ n^{lev3} \cdot n^{lev2} \cdot n^{lev1} $ timesteps
638 the required storing of the model state was significantly reduced to
639 $ n^{lev2} + n^{lev3} $ to disk and roughly $ n^{lev1} $ to memory
[i.e. for the 3-day integration with a total of 72 timesteps
641 the model state was stored 7 times to disk and roughly 6 times
642 to memory].
This saving in storage comes at the cost of requiring
3 full forward integrations of the model (one for each
checkpointing level).
The optimal balance of storage vs. recomputation certainly depends
on the computing resources available and may be tuned by
adjusting the partitioning among
$ n^{lev3}, \,\, n^{lev2}, \,\, n^{lev1} $.
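
The following sketch (Python/{\tt numpy}, not MITgcm code) illustrates the
storing vs. recomputation idea with two checkpointing levels only; the
time-stepping routine, its adjoint, the cost function and all dimensions are
hypothetical stand-ins for the scheme described above:
\begin{verbatim}
import numpy as np

def step(v):                # one forward time step  v^(lambda+1) = M_lambda(v^(lambda))
    return np.tanh(v) + 0.1 * v

def step_adj(v, adj):       # adjoint of one step: multiply by the transposed Jacobian
    return (1.0 - np.tanh(v) ** 2 + 0.1) * adj   # the Jacobian is diagonal here

n_seg, seg_len = 3, 4       # 3 outer segments of 4 inner steps = 12 steps in total
v = np.array([0.3, -0.7])   # initial state (the control u)

snapshots = []              # coarse-level checkpoints ("stored to disk")
for s in range(n_seg):      # forward sweep, one snapshot per segment
    snapshots.append(v.copy())
    for _ in range(seg_len):
        v = step(v)

adj = np.ones_like(v)       # nabla_v J at the final state (here J = sum of final state)
for s in reversed(range(n_seg)):        # reverse sweep over the segments
    states = [snapshots[s].copy()]      # recompute the segment from its checkpoint
    for _ in range(seg_len - 1):
        states.append(step(states[-1])) # intermediate states kept in memory
    for vk in reversed(states):         # adjoint steps in reverse order
        adj = step_adj(vk, adj)

print(adj)                  # nabla_u J, the sensitivity to the initial state
\end{verbatim}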
650
651 \begin{figure}[t!]
652 \begin{center}
653 %\psdraft
654 %\psfrag{v_k1^lev3}{\mathinfigure{v_{k_{1}^{lev3}}}}
655 %\psfrag{v_kn-1^lev3}{\mathinfigure{v_{k_{n-1}^{lev3}}}}
656 %\psfrag{v_kn^lev3}{\mathinfigure{v_{k_{n}^{lev3}}}}
657 %\psfrag{v_k1^lev2}{\mathinfigure{v_{k_{1}^{lev2}}}}
658 %\psfrag{v_kn-1^lev2}{\mathinfigure{v_{k_{n-1}^{lev2}}}}
659 %\psfrag{v_kn^lev2}{\mathinfigure{v_{k_{n}^{lev2}}}}
660 %\psfrag{v_k1^lev1}{\mathinfigure{v_{k_{1}^{lev1}}}}
661 %\psfrag{v_kn^lev1}{\mathinfigure{v_{k_{n}^{lev1}}}}
662 %\mbox{\epsfig{file=part5/checkpointing.eps, width=0.8\textwidth}}
663 \resizebox{5.5in}{!}{\includegraphics{part5/checkpointing.eps}}
664 %\psfull
665 \end{center}
666 \caption{
667 Schematic view of intermediate dump and restart for
668 3-level checkpointing.}
669 \label{fig:3levelcheck}
670 \end{figure}
671
672 % \subsection{Optimal perturbations}
673 % \label{sec_optpert}
674
675
676 % \subsection{Error covariance estimate and Hessian matrix}
677 % \label{sec_hessian}
678
679 \newpage
680
681 %**********************************************************************
682 \section{TLM and ADM generation in general}
683 \label{sec_ad_setup_gen}
684 \begin{rawhtml}
685 <!-- CMIREDIR:sec_ad_setup_gen: -->
686 \end{rawhtml}
687 %**********************************************************************
688
689 In this section we describe in a general fashion
690 the parts of the code that are relevant for automatic
691 differentiation using the software tool TAF.
692
693 \input{part5/doc_ad_the_model}
694
The basic flow is depicted in Fig. \ref{fig:adthemodel}.
If the CPP option {\tt ALLOW\_AUTODIFF\_TAMC} is defined, the driver routine
{\it the\_model\_main}, instead of calling {\it the\_main\_loop},
invokes the adjoint of this routine, {\it adthe\_main\_loop},
which is the top-level routine in terms of automatic differentiation.
The routine {\it adthe\_main\_loop} has been generated by TAF.
It contains the forward integration of the full model, the
cost function calculation,
any additional storing that is required for efficient checkpointing,
and the reverse integration of the adjoint model.
705
706 [DESCRIBE IN A SEPARATE SECTION THE WORKING OF THE TLM]
707
708 In Fig. \ref{fig:adthemodel}
709 the structure of {\it adthe\_main\_loop} has been strongly
710 simplified to focus on the essentials; in particular, no checkpointing
711 procedures are shown here.
712 Prior to the call of {\it adthe\_main\_loop}, the routine
713 {\it ctrl\_unpack} is invoked to unpack the control vector
714 or initialise the control variables.
715 Following the call of {\it adthe\_main\_loop},
716 the routine {\it ctrl\_pack}
717 is invoked to pack the control vector
718 (cf. Section \ref{section_ctrl}).
719 If gradient checks are to be performed, the option
720 {\tt ALLOW\_GRADIENT\_CHECK} is defined. In this case
721 the driver routine {\it grdchk\_main} is called after
722 the gradient has been computed via the adjoint
723 (cf. Section \ref{section_grdchk}).
724
725 %------------------------------------------------------------------
726
727 \subsection{General setup
728 \label{section_ad_setup}}
729
730 In order to configure AD-related setups the following packages need
731 to be enabled:
732 {\it
733 \begin{table}[h!]
734 \begin{tabular}{l}
735 autodiff \\
736 ctrl \\
737 cost \\
738 grdchk \\
739 \end{tabular}
740 \end{table}
741 }
742 The packages are enabled by adding them to your experiment-specific
743 configuration file
744 {\it packages.conf} (see Section ???).
745
746 The following AD-specific CPP option files need to be customized:
747 %
748 \begin{itemize}
749 %
750 \item {\it ECCO\_CPPOPTIONS.h} \\
751 This header file collects CPP options for the packages
752 {\it autodiff, cost, ctrl} as well as AD-unrelated options for
753 the external forcing package {\it exf}.
754 \footnote{NOTE: These options are not set in their package-specific
755 headers such as {\it COST\_CPPOPTIONS.h}, but are instead collected
756 in the single header file {\it ECCO\_CPPOPTIONS.h}.
757 The package-specific header files serve as simple
758 placeholders at this point.}
759 %
760 \item {\it tamc.h} \\
761 This header configures the splitting of the time stepping loop
762 w.r.t. the 3-level checkpointing (see section ???).
763
764 %
765 \end{itemize}
766
767 %------------------------------------------------------------------
768
769 \subsection{Building the AD code
770 \label{section_ad_build}}
771
772 The build process of an AD code is very similar to building
773 the forward model. However, depending on which AD code one wishes
774 to generate, and on which AD tool is available (TAF or TAMC),
775 the following {\tt make} targets are available:
776
777 \begin{table}[h!]
778 {\footnotesize
779 \begin{tabular}{ccll}
780 ~ & {\it AD-target} & {\it output} & {\it description} \\
781 \hline
782 \hline
783 (1) & {\tt <MODE><TOOL>only} & {\tt <MODE>\_<TOOL>\_output.f} &
784 generates code for $<$MODE$>$ using $<$TOOL$>$ \\
785 ~ & ~ & ~ & no {\tt make} dependencies on {\tt .F .h} \\
786 ~ & ~ & ~ & useful for compiling on remote platforms \\
787 \hline
788 (2) & {\tt <MODE><TOOL>} & {\tt <MODE>\_<TOOL>\_output.f} &
789 generates code for $<$MODE$>$ using $<$TOOL$>$ \\
790 ~ & ~ & ~ & includes {\tt make} dependencies on {\tt .F .h} \\
791 ~ & ~ & ~ & i.e. input for $<$TOOL$>$ may be re-generated \\
792 \hline
793 (3) & {\tt <MODE>all} & {\tt mitgcmuv\_<MODE>} &
794 generates code for $<$MODE$>$ using $<$TOOL$>$ \\
795 ~ & ~ & ~ & and compiles all code \\
796 ~ & ~ & ~ & (use of TAF is set as default) \\
797 \hline
798 \hline
799 \end{tabular}
800 }
801 \end{table}
802 %
803 Here, the following placeholders are used
804 %
805 \begin{itemize}
806 %
807 \item [$<$TOOL$>$]
808 %
809 \begin{itemize}
810 %
811 \item {\tt TAF}
812 \item {\tt TAMC}
813 %
814 \end{itemize}
815 %
816 \item [$<$MODE$>$]
817 %
818 \begin{itemize}
819 %
820 \item {\tt ad} generates the adjoint model (ADM)
821 \item {\tt ftl} generates the tangent linear model (TLM)
822 \item {\tt svd} generates both ADM and TLM for \\
823 singular value decomposition (SVD) type calculations
824 %
825 \end{itemize}
826 %
827 \end{itemize}
828
829 For example, to generate the adjoint model using TAF after routines ({\tt .F})
830 or headers ({\tt .h}) have been modified, but without compilation,
831 type {\tt make adtaf};
832 or, to generate the tangent linear model using TAMC without
833 re-generating the input code, type {\tt make ftltamconly}.
834
835
836 A typical full build process to generate the ADM via TAF would
look as follows:
838 \begin{verbatim}
839 % mkdir build
840 % cd build
841 % ../../../tools/genmake2 -mods=../code_ad
842 % make depend
843 % make adall
844 \end{verbatim}
845
846 %------------------------------------------------------------------
847
848 \subsection{The AD build process in detail
849 \label{section_ad_build_detail}}
850
851 The {\tt make <MODE>all} target consists of the following procedures:
852
853 \begin{enumerate}
854 %
855 \item
A header file {\tt AD\_CONFIG.h} is generated which contains a CPP option
specifying which code is to be generated. Depending on the {\tt make} target,
its content is one of
859 \begin{itemize}
860 \item
861 {\tt \#define ALLOW\_ADJOINT\_RUN}
862 \item
863 {\tt \#define ALLOW\_TANGENTLINEAR\_RUN}
864 \item
865 {\tt \#define ALLOW\_ECCO\_OPTIMIZATION}
866 \end{itemize}
867 %
868 \item
A single file {\tt <MODE>\_input\_code.f} is concatenated,
consisting of all {\tt .f} files that are part of the list {\bf AD\_FILES}
and all {\tt .flow} files that are part of the list {\bf AD\_FLOW\_FILES}.
872 %
873 \item
The AD tool is invoked with the {\bf <MODE>\_<TOOL>\_FLAGS}.
The default AD tool flags in {\tt genmake2} can be overwritten by
an {\tt adjoint\_options} file (similar to the platform-specific
{\tt build\_options}, see Section ???).
The AD tool writes the resulting AD code into the file
{\tt <MODE>\_input\_code\_ad.f}.
880 %
881 \item
882 A short sed script {\tt adjoint\_sed} is applied to
883 {\tt <MODE>\_input\_code\_ad.f}
884 to reinstate {\bf myThid} into the CALL argument list of active file I/O.
885 The result is written to file {\tt <MODE>\_<TOOL>\_output.f}.
886 %
887 \item
888 All routines are compiled and an executable is generated
889 (see Table ???).
890 %
891 \end{enumerate}
892
893 \subsubsection{The list AD\_FILES and {\tt .list} files}
894
895 Not all routines are presented to the AD tool.
Routines typically hidden are diagnostic routines which
897 do not influence the cost function, but may create
898 artificial flow dependencies such as I/O of active variables.
899
900 {\tt genmake2} generates a list (or variable) {\bf AD\_FILES}
901 which contains all routines that are shown to the AD tool.
902 This list is put together from all files with suffix {\tt .list}
903 that {\tt genmake2} finds in its search directories.
The list file for the core MITgcm routines, {\tt model\_ad\_diff.list},
resides in {\tt model/src/}.
906 Note that no wrapper routine is shown to TAF. These are either
907 not visible at all to the AD code, or hand-written AD code
908 is available (see next section).
909
910 Each package directory contains its package-specific
911 list file {\tt <PKG>\_ad\_diff.list}. For example,
912 {\tt pkg/ptracers/} contains the file {\tt ptracers\_ad\_diff.list}.
913 Thus, enabling a package will automatically extend the
914 {\bf AD\_FILES} list of {\tt genmake2} to incorporate the
915 package-specific routines.
916 Note that you will need to regenerate the {\tt Makefile} if
917 you enable a package (e.g. by adding it to {\tt packages.conf})
918 and a {\tt Makefile} already exists.
919
920 \subsubsection{The list AD\_FLOW\_FILES and {\tt .flow} files}
921
922 TAMC and TAF can evaluate user-specified directives
923 that start with a specific syntax ({\tt CADJ}, {\tt C\$TAF}, {\tt !\$TAF}).
924 The main categories of directives are STORE directives and
FLOW directives. Here, we are concerned with flow directives;
store directives are treated elsewhere.
927
928 Flow directives enable the AD tool to evaluate how it should treat
929 routines that are 'hidden' by the user, i.e. routines which are
930 not contained in the {\bf AD\_FILES} list (see previous section),
931 but which are called in part of the code that the AD tool does see.
The flow directives tell the AD tool
933 %
934 \begin{itemize}
935 %
936 \item which subroutine arguments are input/output
937 \item which subroutine arguments are active
938 \item which subroutine arguments are required to compute the cost
939 \item which subroutine arguments are dependent
940 %
941 \end{itemize}
942 %
943 The syntax for the flow directives can be found in the
944 AD tool manuals.
945
946 {\tt genmake2} generates a list (or variable) {\bf AD\_FLOW\_FILES}
which contains all files with suffix {\tt .flow} that it finds
948 in its search directories.
949 The flow directives for the core MITgcm routines of
950 {\tt eesupp/src/} and {\tt model/src/}
951 reside in {\tt pkg/autodiff/}.
952 This directory also contains hand-written adjoint code
953 for the MITgcm WRAPPER (see Section ???).
954
955 Flow directives for package-specific routines are contained in
956 the corresponding package directories in the file
957 {\tt <PKG>\_ad.flow}, e.g. ptracers-specific directives are in
958 {\tt ptracers\_ad.flow}.
959
960 \subsubsection{Store directives for 3-level checkpointing}
961
The storing that is required at each level of the
3-level checkpointing is controlled by three
top-level headers.
965
\begin{verbatim}
do ilev_3 = 1, nchklev_3
#  include "checkpoint_lev3.h"
   do ilev_2 = 1, nchklev_2
#     include "checkpoint_lev2.h"
      do ilev_1 = 1, nchklev_1
#        include "checkpoint_lev1.h"

         ...

      end do
   end do
end do
\end{verbatim}
980
981 All files {\tt checkpoint\_lev?.h} are contained in directory
982 {\tt pkg/autodiff/}.
983
984
985 \subsubsection{Changing the default AD tool flags: ad\_options files}
986
987
988 \subsubsection{Hand-written adjoint code}
989
990 %------------------------------------------------------------------
991
992 \subsection{The cost function (dependent variable)
993 \label{section_cost}}
994
995 The cost function $ {\cal J} $ is referred to as the {\sf dependent variable}.
It is a function of the input variables $ \vec{u} $ via the composition
$ {\cal J}(\vec{u}) \, = \, {\cal J}({\cal M}(\vec{u})) $.
The inputs are referred to as the
{\sf independent variables} or {\sf control variables}.
1000 All aspects relevant to the treatment of the cost function $ {\cal J} $
1001 (parameter setting, initialization, accumulation,
1002 final evaluation), are controlled by the package {\it pkg/cost}.
1003 The aspects relevant to the treatment of the independent variables
1004 are controlled by the package {\it pkg/ctrl} and will be treated
1005 in the next section.
1006
1007 \input{part5/doc_cost_flow}
1008
1009 \subsubsection{Enabling the package}
1010
1011 \fbox{
1012 \begin{minipage}{12cm}
1013 {\it packages.conf}, {\it ECCO\_CPPOPTIONS.h}
1014 \end{minipage}
1015 }
1016 \begin{itemize}
1017 %
1018 \item
1019 The package is enabled by adding {\it cost} to your file {\it packages.conf}
(see Section ???).
%
1025 \end{itemize}
1026 %
1027
1028 N.B.: In general the following packages ought to be enabled
1029 simultaneously: {\it autodiff, cost, ctrl}.
1030 The basic CPP option to enable the cost function is {\bf ALLOW\_COST}.
1031 Each specific cost function contribution has its own option.
1032 For the present example the option is {\bf ALLOW\_COST\_TRACER}.
All cost-specific options are set in {\it ECCO\_CPPOPTIONS.h}.
Since the cost function is usually used in conjunction with
1035 automatic differentiation, the CPP option
1036 {\bf ALLOW\_ADJOINT\_RUN} (file {\it CPP\_OPTIONS.h}) and
1037 {\bf ALLOW\_AUTODIFF\_TAMC} (file {\it ECCO\_CPPOPTIONS.h})
1038 should be defined.
1039
1040 \subsubsection{Initialization}
1041 %
1042 The initialization of the {\it cost} package is readily enabled
1043 as soon as the CPP option {\bf ALLOW\_COST} is defined.
1044 %
1045 \begin{itemize}
1046 %
1047 \item
1048 \fbox{
1049 \begin{minipage}{12cm}
1050 Parameters: {\it cost\_readparms}
1051 \end{minipage}
1052 }
1053 \\
1054 This S/R
1055 reads runtime flags and parameters from file {\it data.cost}.
1056 For the present example the only relevant parameter read
1057 is {\bf mult\_tracer}. This multiplier enables different
1058 cost function contributions to be switched on
1059 ( = 1.) or off ( = 0.) at runtime.
1060 For more complex cost functions which involve model vs. data
1061 misfits, the corresponding data filenames and data
1062 specifications (start date and time, period, ...) are read
1063 in this S/R.
1064 %
1065 \item
1066 \fbox{
1067 \begin{minipage}{12cm}
1068 Variables: {\it cost\_init}
1069 \end{minipage}
1070 }
1071 \\
1072 This S/R
1073 initializes the different cost function contributions.
1074 The contribution for the present example is {\bf objf\_tracer}
1075 which is defined on each tile (bi,bj).
1076 %
1077 \end{itemize}
1078 %
1079 \subsubsection{Accumulation}
1080 %
1081 \begin{itemize}
1082 %
1083 \item
1084 \fbox{
1085 \begin{minipage}{12cm}
1086 {\it cost\_tile}, {\it cost\_tracer}
1087 \end{minipage}
1088 }
1089 \end{itemize}
1090 %
1091 The 'driver' routine
1092 {\it cost\_tile} is called at the end of each time step.
Within this 'driver' routine, S/Rs are called for each of
1094 the chosen cost function contributions.
1095 In the present example ({\bf ALLOW\_COST\_TRACER}),
1096 S/R {\it cost\_tracer} is called.
1097 It accumulates {\bf objf\_tracer} according to eqn. (\ref{???}).
1098 %
1099 \subsubsection{Finalize all contributions}
1100 %
1101 \begin{itemize}
1102 %
1103 \item
1104 \fbox{
1105 \begin{minipage}{12cm}
1106 {\it cost\_final}
1107 \end{minipage}
1108 }
1109 \end{itemize}
1110 %
1111 At the end of the forward integration S/R {\it cost\_final}
1112 is called. It accumulates the total cost function {\bf fc}
1113 from each contribution and sums over all tiles:
1114 \begin{equation}
1115 {\cal J} \, = \,
1116 {\rm fc} \, = \,
1117 {\rm mult\_tracer} \sum_{\text{global sum}} \sum_{bi,\,bj}^{nSx,\,nSy}
1118 {\rm objf\_tracer}(bi,bj) \, + \, ...
1119 \end{equation}
1120 %
1121 The total cost function {\bf fc} will be the
1122 'dependent' variable in the argument list for TAMC, i.e.
1123 \begin{verbatim}
1124 tamc -output 'fc' ...
1125 \end{verbatim}
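
A minimal sketch of this accumulation pattern (Python/{\tt numpy}, not MITgcm
code; tile counts, the tracer field and the weight are hypothetical) is:
\begin{verbatim}
import numpy as np

nSx, nSy, n_steps = 2, 2, 10
mult_tracer = 1.0
objf_tracer = np.zeros((nSx, nSy))       # one accumulator per tile (bi, bj)

rng = np.random.default_rng(3)
for _ in range(n_steps):                 # "cost_tile" is called every time step
    tracer = rng.standard_normal((nSx, nSy, 4, 4))
    objf_tracer += (tracer ** 2).sum(axis=(2, 3))   # "cost_tracer" contribution

fc = mult_tracer * objf_tracer.sum()     # "cost_final": global sum over all tiles
print(fc)
\end{verbatim}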
1126
1127 %%%% \end{document}
1128
1129 \input{part5/doc_ad_the_main}
1130
1131 \subsection{The control variables (independent variables)
1132 \label{section_ctrl}}
1133
1134 The control variables are a subset of the model input
1135 (initial conditions, boundary conditions, model parameters).
1136 Here we identify them with the variable $ \vec{u} $.
All intermediate variables whose derivatives w.r.t. the control
variables do not vanish are called {\sf active variables}.
All subroutines whose derivatives w.r.t. the control variables
do not vanish are called {\sf active routines}.
1141 Read and write operations from and to file can be viewed
1142 as variable assignments. Therefore, files to which
1143 active variables are written and from which active variables
1144 are read are called {\sf active files}.
1145 All aspects relevant to the treatment of the control variables
1146 (parameter setting, initialization, perturbation)
1147 are controlled by the package {\it pkg/ctrl}.
1148
1149 \input{part5/doc_ctrl_flow}
1150
1151 \subsubsection{genmake and CPP options}
1152 %
1153 \begin{itemize}
1154 %
1155 \item
1156 \fbox{
1157 \begin{minipage}{12cm}
1158 {\it genmake}, {\it CPP\_OPTIONS.h}, {\it ECCO\_CPPOPTIONS.h}
1159 \end{minipage}
1160 }
1161 \end{itemize}
1162 %
To enable the directory to be included in the compile list,
{\bf ctrl} has to be added to the {\bf enable} list in
{\it .genmakerc} or in {\it genmake} itself (analogous to the {\it cost}
package, cf. previous section).
1167 Each control variable is enabled via its own CPP option
1168 in {\it ECCO\_CPPOPTIONS.h}.
1169
1170 \subsubsection{Initialization}
1171 %
1172 \begin{itemize}
1173 %
1174 \item
1175 \fbox{
1176 \begin{minipage}{12cm}
1177 Parameters: {\it ctrl\_readparms}
1178 \end{minipage}
1179 }
1180 \\
1181 %
1182 This S/R
1183 reads runtime flags and parameters from file {\it data.ctrl}.
1184 For the present example the file contains the file names
1185 of each control variable that is used.
1186 In addition, the number of wet points for each control
1187 variable and the net dimension of the space of control
1188 variables (counting wet points only) {\bf nvarlength}
1189 is determined.
1190 Masks for wet points for each tile {\bf (bi,\,bj)}
1191 and vertical layer {\bf k} are generated for the three
1192 relevant categories on the C-grid:
1193 {\bf nWetCtile} for tracer fields,
1194 {\bf nWetWtile} for zonal velocity fields,
1195 {\bf nWetStile} for meridional velocity fields.
1196 %
1197 \item
1198 \fbox{
1199 \begin{minipage}{12cm}
1200 Control variables, control vector,
1201 and their gradients: {\it ctrl\_unpack}
1202 \end{minipage}
1203 }
1204 \\
1205 %
Two important issues related to the handling of the control
variables in the MITgcm need to be addressed.
1208 First, in order to save memory, the control variable arrays
1209 are not kept in memory, but rather read from file and added
1210 to the initial fields during the model initialization phase.
1211 Similarly, the corresponding adjoint fields which represent
1212 the gradient of the cost function w.r.t. the control variables
1213 are written to file at the end of the adjoint integration.
1214 Second, in addition to the files holding the 2-dim. and 3-dim.
1215 control variables and the corresponding cost gradients,
1216 a 1-dim. {\sf control vector}
1217 and {\sf gradient vector} are written to file. They contain
1218 only the wet points of the control variables and the corresponding
1219 gradient.
1220 This leads to a significant data compression.
1221 Furthermore, an option is available
1222 ({\tt ALLOW\_NONDIMENSIONAL\_CONTROL\_IO}) to
1223 non-dimensionalise the control and gradient vector,
1224 which otherwise would contain different pieces of different
1225 magnitudes and units.
1226 Finally, the control and gradient vector can be passed to a
1227 minimization routine if an update of the control variables
1228 is sought as part of a minimization exercise.
1229
The files holding fields and vectors of the control variables
and gradient are generated and initialised in S/R {\it ctrl\_unpack}
(a minimal sketch of the wet-point compression is given below,
after this list).
1232 %
1233 \end{itemize}
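
A minimal sketch of the wet-point compression performed by {\it ctrl\_pack}
and {\it ctrl\_unpack} (Python/{\tt numpy}, not MITgcm code; the grid size and
the land-sea mask are hypothetical) is:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(4)
mask = rng.random((5, 6)) > 0.3         # True on wet points of a small 2-dim. grid
field = np.where(mask, rng.standard_normal((5, 6)), 0.0)   # a 2-dim. control field

vector_ctrl = field[mask]               # "pack": keep wet points only
print(vector_ctrl.size, "of", field.size, "points kept")

unpacked = np.zeros_like(field)         # "unpack": scatter back onto the grid
unpacked[mask] = vector_ctrl
print(np.allclose(unpacked, field))     # -> True
\end{verbatim}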
1234
1235 \subsubsection{Perturbation of the independent variables}
1236 %
1237 The dependency flow for differentiation w.r.t. the controls
1238 starts with adding a perturbation onto the input variable,
1239 thus defining the independent or control variables for TAMC.
1240 Three types of controls may be considered:
1241 %
1242 \begin{itemize}
1243 %
1244 \item
1245 \fbox{
1246 \begin{minipage}{12cm}
1247 {\it ctrl\_map\_ini} (initial value sensitivity):
1248 \end{minipage}
1249 }
1250 \\
1251 %
1252 Consider as an example the initial tracer distribution
1253 {\bf tr1} as control variable.
1254 After {\bf tr1} has been initialised in
1255 {\it ini\_tr1} (dynamical variables such as
1256 temperature and salinity are initialised in {\it ini\_fields}),
1257 a perturbation anomaly is added to the field in S/R
1258 {\it ctrl\_map\_ini}
1259 %
1260 \begin{equation}
1261 \begin{split}
1262 u & = \, u_{[0]} \, + \, \Delta u \\
1263 {\bf tr1}(...) & = \, {\bf tr1_{ini}}(...) \, + \, {\bf xx\_tr1}(...)
1264 \label{perturb}
1265 \end{split}
1266 \end{equation}
1267 %
1268 {\bf xx\_tr1} is a 3-dim. global array
holding the perturbation. In the case of a simple
sensitivity study this array is identically zero.
However, its specification is essential in the context
1272 of automatic differentiation since TAMC
1273 treats the corresponding line in the code symbolically
1274 when determining the differentiation chain and its origin.
1275 Thus, the variable names are part of the argument list
1276 when calling TAMC:
1277 %
1278 \begin{verbatim}
1279 tamc -input 'xx_tr1 ...' ...
1280 \end{verbatim}
1281 %
Now, as mentioned above, the MITgcm avoids maintaining
an array for each control variable by reading the
perturbation into a temporary array from file.
To ensure that the symbolic link is recognized by TAMC, a scalar
dummy variable {\bf xx\_tr1\_dummy} is introduced
and an 'active read' routine of the adjoint support
package {\it pkg/autodiff} is invoked.
The read procedure is tagged with the variable
{\bf xx\_tr1\_dummy}, enabling TAMC to recognize the
initialization of the perturbation.
1292 The modified call of TAMC thus reads
1293 %
1294 \begin{verbatim}
1295 tamc -input 'xx_tr1_dummy ...' ...
1296 \end{verbatim}
1297 %
and the modified operation corresponding to (\ref{perturb})
in the code takes the form
1300 %
1301 \begin{verbatim}
1302 call active_read_xyz(
1303 & ..., tmpfld3d, ..., xx_tr1_dummy, ... )
1304
1305 tr1(...) = tr1(...) + tmpfld3d(...)
1306 \end{verbatim}
1307 %
Note that reading an active variable corresponds
1309 to a variable assignment. Its derivative corresponds
1310 to a write statement of the adjoint variable, followed by
1311 a reset.
1312 The 'active file' routines have been designed
1313 to support active read and corresponding adjoint active write
1314 operations (and vice versa).
1315 %
1316 \item
1317 \fbox{
1318 \begin{minipage}{12cm}
1319 {\it ctrl\_map\_forcing} (boundary value sensitivity):
1320 \end{minipage}
1321 }
1322 \\
1323 %
The handling of boundary values as control variables
proceeds exactly analogously to that of the initial values,
with the symbolic perturbation taking place in S/R
{\it ctrl\_map\_forcing}.
Note, however, an important difference:
since the boundary values are time dependent, with a new
forcing field applied at each time step,
the general problem may be thought of as
a new control variable at each time step
(or, if the perturbation is averaged over a certain period,
every $ N $ timesteps), i.e.
1335 \[
1336 u_{\rm forcing} \, = \,
1337 \{ \, u_{\rm forcing} ( t_n ) \, \}_{
1338 n \, = \, 1, \ldots , {\rm nTimeSteps} }
1339 \]
1340 %
1341 In the current example an equilibrium state is considered,
1342 and only an initial perturbation to
1343 surface forcing is applied with respect to the
1344 equilibrium state.
1345 A time dependent treatment of the surface forcing is
1346 implemented in the ECCO environment, involving the
1347 calendar ({\it cal}~) and external forcing ({\it exf}~) packages.
1348 %
1349 \item
1350 \fbox{
1351 \begin{minipage}{12cm}
1352 {\it ctrl\_map\_params} (parameter sensitivity):
1353 \end{minipage}
1354 }
1355 \\
1356 %
This routine is not yet implemented, but would proceed
along the same lines as the initial value sensitivity.
1359 The mixing parameters {\bf diffkr} and {\bf kapgm}
1360 are currently added as controls in {\it ctrl\_map\_ini.F}.
1361 %
1362 \end{itemize}
1363 %
1364
1365 \subsubsection{Output of adjoint variables and gradient}
1366 %
1367 Several ways exist to generate output of adjoint fields.
1368 %
1369 \begin{itemize}
1370 %
1371 \item
1372 \fbox{
1373 \begin{minipage}{12cm}
1374 {\it ctrl\_map\_ini, ctrl\_map\_forcing}:
1375 \end{minipage}
1376 }
1377 \\
1378 \begin{itemize}
1379 %
1380 \item {\bf xx\_...}: the control variable fields \\
1381 Before the forward integration, the control
1382 variables are read from file {\bf xx\_ ...} and added to
1383 the model field.
1384 %
1385 \item {\bf adxx\_...}: the adjoint variable fields, i.e. the gradient
1386 $ \nabla _{u}{\cal J} $ for each control variable \\
1387 After the adjoint integration the corresponding adjoint
1388 variables are written to {\bf adxx\_ ...}.
1389 %
1390 \end{itemize}
1391 %
1392 \item
1393 \fbox{
1394 \begin{minipage}{12cm}
1395 {\it ctrl\_unpack, ctrl\_pack}:
1396 \end{minipage}
1397 }
1398 \\
1399 %
1400 \begin{itemize}
1401 %
1402 \item {\bf vector\_ctrl}: the control vector \\
1403 At the very beginning of the model initialization,
1404 the updated compressed control vector is read (or initialised)
1405 and distributed to 2-dim. and 3-dim. control variable fields.
1406 %
1407 \item {\bf vector\_grad}: the gradient vector \\
1408 At the very end of the adjoint integration,
1409 the 2-dim. and 3-dim. adjoint variables are read,
1410 compressed to a single vector and written to file.
1411 %
1412 \end{itemize}
1413 %
1414 \item
1415 \fbox{
1416 \begin{minipage}{12cm}
1417 {\it addummy\_in\_stepping}:
1418 \end{minipage}
1419 }
1420 \\
1421 In addition to writing the gradient at the end of the
1422 forward/adjoint integration, many more adjoint variables
1423 of the model state
1424 at intermediate times can be written using S/R
1425 {\it addummy\_in\_stepping}.
This routine is part of the adjoint support package
{\it pkg/autodiff} (cf. below).
The procedure is enabled via the CPP option
{\bf ALLOW\_AUTODIFF\_MONITOR} (file {\it ECCO\_CPPOPTIONS.h}).
1430 To be part of the adjoint code, the corresponding S/R
1431 {\it dummy\_in\_stepping} has to be called in the forward
1432 model (S/R {\it the\_main\_loop}) at the appropriate place.
1433 The adjoint common blocks are extracted from the adjoint code
1434 via the header file {\it adcommon.h}.
1435
{\it dummy\_in\_stepping} is essentially empty;
the corresponding adjoint routine is hand-written rather
than generated automatically.
1439 Appropriate flow directives ({\it dummy\_in\_stepping.flow})
1440 ensure that TAMC does not automatically
1441 generate {\it addummy\_in\_stepping} by trying to differentiate
1442 {\it dummy\_in\_stepping}, but instead refers to
1443 the hand-written routine.
1444
1445 {\it dummy\_in\_stepping} is called in the forward code
1446 at the beginning of each
1447 timestep, before the call to {\it dynamics}, thus ensuring
1448 that {\it addummy\_in\_stepping} is called at the end of
1449 each timestep in the adjoint calculation, after the call to
1450 {\it addynamics}.
1451
{\it addummy\_in\_stepping} includes the header file
{\it adcommon.h}.
1454 This header file is also hand-written. It contains
1455 the common blocks
1456 {\bf /addynvars\_r/}, {\bf /addynvars\_cd/},
1457 {\bf /addynvars\_diffkr/}, {\bf /addynvars\_kapgm/},
1458 {\bf /adtr1\_r/}, {\bf /adffields/},
1459 which have been extracted from the adjoint code to enable
1460 access to the adjoint variables.
1461
{\bf WARNING:} If the structure of the common blocks
{\bf /dynvars\_r/}, {\bf /dynvars\_cd/}, etc., changes,
similar changes will occur in the adjoint common blocks.
Therefore, consistency between the TAMC-generated common blocks
and those in {\it adcommon.h} has to be checked.
1467 %
1468 \end{itemize}
1469
1470
1471 \subsubsection{Control variable handling for
1472 optimization applications}
1473
In optimization mode the cost function $ {\cal J}(u) $ is to be
minimized with respect to a set of control variables
(i.e. one seeks $ \delta {\cal J} \, = \, 0 $), in an iterative manner.
1477 The gradient $ \nabla _{u}{\cal J} |_{u_{[k]}} $ together
1478 with the value of the cost function itself $ {\cal J}(u_{[k]}) $
1479 at iteration step $ k $ serve
1480 as input to a minimization routine (e.g. quasi-Newton method,
1481 conjugate gradient, ... \cite{gil-lem:89})
1482 to compute an update in the
1483 control variable for iteration step $k+1$
1484 \[
1485 u_{[k+1]} \, = \, u_{[0]} \, + \, \Delta u_{[k+1]}
1486 \quad \mbox{satisfying} \quad
1487 {\cal J} \left( u_{[k+1]} \right) \, < \, {\cal J} \left( u_{[k]} \right)
1488 \]
1489 $ u_{[k+1]} $ then serves as input for a forward/adjoint run
1490 to determine $ {\cal J} $ and $ \nabla _{u}{\cal J} $ at iteration step
1491 $ k+1 $.
1492 Tab. \ref{???} sketches the flow between forward/adjoint model
1493 and the minimization routine.
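
A minimal sketch of this iterative loop (Python/{\tt numpy}, not MITgcm code;
a plain gradient-descent step stands in for the quasi-Newton or
conjugate-gradient routine, and a quadratic cost is a hypothetical stand-in
for the model) is:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))          # stand-in for the (linearized) model
d = rng.standard_normal(6)               # stand-in for the data

def forward_and_adjoint(u):              # one forward/adjoint pair
    r = A @ u - d
    J = r @ r                            # cost function J(u_[k])
    dJdu = 2.0 * A.T @ r                 # gradient from the adjoint sweep
    return J, dJdu

u = np.zeros(4)                          # first guess u_[0]
step = 0.01                              # "minimisation" step size
for k in range(50):
    J, dJdu = forward_and_adjoint(u)     # input to the minimization routine
    u = u - step * dJdu                  # control update u_[k+1] = u_[k] + Delta u_[k+1]
print(J)                                 # the cost decreases towards the minimum
\end{verbatim}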
1494
1495 \begin{eqnarray*}
1496 \scriptsize
1497 \begin{array}{ccccc}
1498 u_{[0]} \,\, , \,\, \Delta u_{[k]} & ~ & ~ & ~ & ~ \\
1499 {\Big\downarrow}
1500 & ~ & ~ & ~ & ~ \\
1501 ~ & ~ & ~ & ~ & ~ \\
1502 \hline
1503 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1504 \multicolumn{1}{|c}{
1505 u_{[k]} = u_{[0]} + \Delta u_{[k]}} &
1506 \stackrel{\bf forward}{\bf \longrightarrow} &
1507 v_{[k]} = M \left( u_{[k]} \right) &
1508 \stackrel{\bf forward}{\bf \longrightarrow} &
1509 \multicolumn{1}{c|}{
1510 {\cal J}_{[k]} = {\cal J} \left( M \left( u_{[k]} \right) \right)} \\
1511 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1512 \hline
1513 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1514 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{{\Big\downarrow}} \\
1515 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1516 \hline
1517 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1518 \multicolumn{1}{|c}{
1519 \nabla_u {\cal J}_{[k]} (\delta {\cal J}) =
1520 T^{\ast} \cdot \nabla_v {\cal J} |_{v_{[k]}} (\delta {\cal J})} &
1521 \stackrel{\bf adjoint}{\mathbf \longleftarrow} &
1522 ad \, v_{[k]} (\delta {\cal J}) =
1523 \nabla_v {\cal J} |_{v_{[k]}} (\delta {\cal J}) &
1524 \stackrel{\bf adjoint}{\mathbf \longleftarrow} &
1525 \multicolumn{1}{c|}{ ad \, {\cal J} = \delta {\cal J}} \\
1526 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1527 \hline
1528 ~ & ~ & ~ & ~ & ~ \\
1529 \hspace*{15ex}{\Bigg\downarrow}
1530 \quad {\cal J}_{[k]}, \quad \nabla_u {\cal J}_{[k]}
1531 & ~ & ~ & ~ & ~ \\
1532 ~ & ~ & ~ & ~ & ~ \\
1533 \hline
1534 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1535 \multicolumn{1}{|c}{
1536 {\cal J}_{[k]} \,\, , \,\, \nabla_u {\cal J}_{[k]}} &
1537 {\mathbf \longrightarrow} & \text{\bf minimisation} &
1538 {\mathbf \longrightarrow} &
1539 \multicolumn{1}{c|}{ \Delta u_{[k+1]}} \\
1540 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1541 \hline
1542 ~ & ~ & ~ & ~ & ~ \\
1543 ~ & ~ & ~ & ~ & \Big\downarrow \\
1544 ~ & ~ & ~ & ~ & \Delta u_{[k+1]} \\
1545 \end{array}
1546 \end{eqnarray*}
1547
1548 The routines {\it ctrl\_unpack} and {\it ctrl\_pack} provide
1549 the link between the model and the minimization routine.
1550 As described in Section \ref{???}
1551 the {\it unpack} and {\it pack} routines read and write
1552 control and gradient {\it vectors} which are compressed
1553 to contain only wet points, in addition to the full
1554 2-dim. and 3-dim. fields.
1555 The corresponding I/O flow looks as follows:
1556
1557 \vspace*{0.5cm}
1558
1559 {\scriptsize
1560 \begin{tabular}{ccccc}
1561 {\bf vector\_ctrl\_$<$k$>$ } & ~ & ~ & ~ & ~ \\
1562 {\big\downarrow} & ~ & ~ & ~ & ~ \\
1563 \cline{1-1}
1564 \multicolumn{1}{|c|}{\it ctrl\_unpack} & ~ & ~ & ~ & ~ \\
1565 \cline{1-1}
1566 {\big\downarrow} & ~ & ~ & ~ & ~ \\
1567 \cline{3-3}
1568 \multicolumn{1}{l}{\bf xx\_theta0...$<$k$>$} & ~ &
1569 \multicolumn{1}{|c|}{~} & ~ & ~ \\
1570 \multicolumn{1}{l}{\bf xx\_salt0...$<$k$>$} &
1571 $\stackrel{\mbox{read}}{\longrightarrow}$ &
1572 \multicolumn{1}{|c|}{forward integration} & ~ & ~ \\
1573 \multicolumn{1}{l}{\bf \vdots} & ~ & \multicolumn{1}{|c|}{~}
1574 & ~ & ~ \\
1575 \cline{3-3}
1576 ~ & ~ & $\downarrow$ & ~ & ~ \\
1577 \cline{3-3}
1578 ~ & ~ &
1579 \multicolumn{1}{|c|}{~} & ~ &
1580 \multicolumn{1}{l}{\bf adxx\_theta0...$<$k$>$} \\
1581 ~ & ~ & \multicolumn{1}{|c|}{adjoint integration} &
1582 $\stackrel{\mbox{write}}{\longrightarrow}$ &
1583 \multicolumn{1}{l}{\bf adxx\_salt0...$<$k$>$} \\
1584 ~ & ~ & \multicolumn{1}{|c|}{~}
1585 & ~ & \multicolumn{1}{l}{\bf \vdots} \\
1586 \cline{3-3}
1587 ~ & ~ & ~ & ~ & {\big\downarrow} \\
1588 \cline{5-5}
1589 ~ & ~ & ~ & ~ & \multicolumn{1}{|c|}{\it ctrl\_pack} \\
1590 \cline{5-5}
1591 ~ & ~ & ~ & ~ & {\big\downarrow} \\
1592 ~ & ~ & ~ & ~ & {\bf vector\_grad\_$<$k$>$ } \\
1593 \end{tabular}
1594 }
1595
1596 \vspace*{0.5cm}
1597
1598
1599 {\it ctrl\_unpack} reads the updated control vector
1600 {\bf vector\_ctrl\_$<$k$>$}.
1601 It distributes the different control variables to
1602 2-dim. and 3-dim. files {\it xx\_...$<$k$>$}.
At the start of the forward integration the control variables
are read from {\it xx\_...$<$k$>$} and added to the
corresponding model fields.
1606 Correspondingly, at the end of the adjoint integration
1607 the adjoint fields are written
1608 to {\it adxx\_...$<$k$>$}, again via the active file routines.
1609 Finally, {\it ctrl\_pack} collects all adjoint files
1610 and writes them to the compressed vector file
1611 {\bf vector\_grad\_$<$k$>$}.
