/[MITgcm]/manual/s_autodiff/text/doc_ad_2.tex
ViewVC logotype

Contents of /manual/s_autodiff/text/doc_ad_2.tex

Parent Directory Parent Directory | Revision Log Revision Log | View Revision Graph Revision Graph


Revision 1.19 - (show annotations) (download) (as text)
Tue Aug 2 22:26:58 2005 UTC (19 years, 11 months ago) by heimbach
Branch: MAIN
Changes since 1.18: +10 -5 lines
File MIME type: application/x-tex
Updating

1 % $Header: /u/gcmpack/manual/part5/doc_ad_2.tex,v 1.18 2005/08/01 22:31:36 heimbach Exp $
2 % $Name: $
3
4 Author: Patrick Heimbach
5
6 {\sf Automatic differentiation} (AD), also referred to as algorithmic
7 (or, more loosely, computational) differentiation, involves
8 automatically deriving code to calculate
9 partial derivatives from an existing fully non-linear prognostic code.
10 (see \cite{gri:00}).
11 A software tool is used that parses and transforms source files
12 according to a set of linguistic and mathematical rules.
13 AD tools are like source-to-source translators in that
14 they parse a program code as input and produce a new program code
15 as output.
16 However, unlike a pure source-to-source translation, the output program
17 represents a new algorithm, such as the evaluation of the
18 Jacobian, the Hessian, or higher derivative operators.
19 In principle, a variety of derived algorithms
20 can be generated automatically in this way.
21
22 The MITGCM has been adapted for use with the
23 Tangent linear and Adjoint Model Compiler (TAMC) and its successor TAF
24 (Transformation of Algorithms in Fortran), developed
25 by Ralf Giering (\cite{gie-kam:98}, \cite{gie:99,gie:00}).
26 The first application of the adjoint of the MITGCM for sensitivity
27 studies has been published by \cite{maro-eta:99}.
28 \cite{sta-eta:97,sta-eta:01} use the MITGCM and its adjoint
29 for ocean state estimation studies.
30 In the following we shall refer to TAMC and TAF synonymously,
31 except were explicitly stated otherwise.
32
33 TAMC exploits the chain rule for computing the first
34 derivative of a function with
35 respect to a set of input variables.
36 Treating a given forward code as a composition of operations --
37 each line representing a compositional element, the chain rule is
38 rigorously applied to the code, line by line. The resulting
39 tangent linear or adjoint code,
40 then, may be thought of as the composition in
41 forward or reverse order, respectively, of the
42 Jacobian matrices of the forward code's compositional elements.
43
44 %**********************************************************************
45 \section{Some basic algebra}
46 \label{sec_ad_algebra}
47 \begin{rawhtml}
48 <!-- CMIREDIR:sec_ad_algebra: -->
49 \end{rawhtml}
50 %**********************************************************************
51
52 Let $ \cal{M} $ be a general nonlinear, model, i.e. a
53 mapping from the $m$-dimensional space
54 $U \subset I\!\!R^m$ of input variables
55 $\vec{u}=(u_1,\ldots,u_m)$
56 (model parameters, initial conditions, boundary conditions
57 such as forcing functions) to the $n$-dimensional space
58 $V \subset I\!\!R^n$ of
59 model output variable $\vec{v}=(v_1,\ldots,v_n)$
60 (model state, model diagnostics, objective function, ...)
61 under consideration,
62 %
63 \begin{equation}
64 \begin{split}
65 {\cal M} \, : & \, U \,\, \longrightarrow \, V \\
66 ~ & \, \vec{u} \,\, \longmapsto \, \vec{v} \, = \,
67 {\cal M}(\vec{u})
68 \label{fulloperator}
69 \end{split}
70 \end{equation}
71 %
72 The vectors $ \vec{u} \in U $ and $ v \in V $ may be represented w.r.t.
73 some given basis vectors
74 $ {\rm span} (U) = \{ {\vec{e}_i} \}_{i = 1, \ldots , m} $ and
75 $ {\rm span} (V) = \{ {\vec{f}_j} \}_{j = 1, \ldots , n} $ as
76 \[
77 \vec{u} \, = \, \sum_{i=1}^{m} u_i \, {\vec{e}_i},
78 \qquad
79 \vec{v} \, = \, \sum_{j=1}^{n} v_j \, {\vec{f}_j}
80 \]
81
82 Two routes may be followed to determine the sensitivity of the
83 output variable $\vec{v}$ to its input $\vec{u}$.
84
85 \subsection{Forward or direct sensitivity}
86 %
87 Consider a perturbation to the input variables $\delta \vec{u}$
88 (typically a single component
89 $\delta \vec{u} = \delta u_{i} \, {\vec{e}_{i}}$).
90 Their effect on the output may be obtained via the linear
91 approximation of the model $ {\cal M}$ in terms of its Jacobian matrix
92 $ M $, evaluated in the point $u^{(0)}$ according to
93 %
94 \begin{equation}
95 \delta \vec{v} \, = \, M |_{\vec{u}^{(0)}} \, \delta \vec{u}
96 \label{tangent_linear}
97 \end{equation}
98 with resulting output perturbation $\delta \vec{v}$.
99 In components
100 $M_{j i} \, = \, \partial {\cal M}_{j} / \partial u_{i} $,
101 it reads
102 %
103 \begin{equation}
104 \delta v_{j} \, = \, \sum_{i}
105 \left. \frac{\partial {\cal M}_{j}}{\partial u_{i}} \right|_{u^{(0)}} \,
106 \delta u_{i}
107 \label{jacobi_matrix}
108 \end{equation}
109 %
110 Eq. (\ref{tangent_linear}) is the {\sf tangent linear model (TLM)}.
111 In contrast to the full nonlinear model $ {\cal M} $, the operator
112 $ M $ is just a matrix
113 which can readily be used to find the forward sensitivity of $\vec{v}$ to
114 perturbations in $u$,
115 but if there are very many input variables $(\gg O(10^{6})$ for
116 large-scale oceanographic application), it quickly becomes
117 prohibitive to proceed directly as in (\ref{tangent_linear}),
118 if the impact of each component $ {\bf e_{i}} $ is to be assessed.
119
120 \subsection{Reverse or adjoint sensitivity}
121 %
122 Let us consider the special case of a
123 scalar objective function ${\cal J}(\vec{v})$ of the model output (e.g.
124 the total meridional heat transport,
125 the total uptake of $CO_{2}$ in the Southern
126 Ocean over a time interval,
127 or a measure of some model-to-data misfit)
128 %
129 \begin{eqnarray}
130 \begin{array}{cccccc}
131 {\cal J} \, : & U &
132 \longrightarrow & V &
133 \longrightarrow & I \!\! R \\
134 ~ & \vec{u} & \longmapsto & \vec{v}={\cal M}(\vec{u}) &
135 \longmapsto & {\cal J}(\vec{u}) = {\cal J}({\cal M}(\vec{u}))
136 \end{array}
137 \label{compo}
138 \end{eqnarray}
139 %
140 The perturbation of $ {\cal J} $ around a fixed point $ {\cal J}_0 $,
141 \[
142 {\cal J} \, = \, {\cal J}_0 \, + \, \delta {\cal J}
143 \]
144 can be expressed in both bases of $ \vec{u} $ and $ \vec{v} $
145 w.r.t. their corresponding inner product
146 $\left\langle \,\, , \,\, \right\rangle $
147 %
148 \begin{equation}
149 \begin{split}
150 {\cal J} & = \,
151 {\cal J} |_{\vec{u}^{(0)}} \, + \,
152 \left\langle \, \nabla _{u}{\cal J}^T |_{\vec{u}^{(0)}} \, , \, \delta \vec{u} \, \right\rangle
153 \, + \, O(\delta \vec{u}^2) \\
154 ~ & = \,
155 {\cal J} |_{\vec{v}^{(0)}} \, + \,
156 \left\langle \, \nabla _{v}{\cal J}^T |_{\vec{v}^{(0)}} \, , \, \delta \vec{v} \, \right\rangle
157 \, + \, O(\delta \vec{v}^2)
158 \end{split}
159 \label{deljidentity}
160 \end{equation}
161 %
162 (note, that the gradient $ \nabla f $ is a co-vector, therefore
163 its transpose is required in the above inner product).
164 Then, using the representation of
165 $ \delta {\cal J} =
166 \left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle $,
167 the definition
168 of an adjoint operator $ A^{\ast} $ of a given operator $ A $,
169 \[
170 \left\langle \, A^{\ast} \vec{x} \, , \, \vec{y} \, \right\rangle =
171 \left\langle \, \vec{x} \, , \, A \vec{y} \, \right\rangle
172 \]
173 which for finite-dimensional vector spaces is just the
174 transpose of $ A $,
175 \[
176 A^{\ast} \, = \, A^T
177 \]
178 and from eq. (\ref{tangent_linear}), (\ref{deljidentity}),
179 we note that
180 (omitting $|$'s):
181 %
182 \begin{equation}
183 \delta {\cal J}
184 \, = \,
185 \left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle
186 \, = \,
187 \left\langle \, \nabla _{v}{\cal J}^T \, , \, M \, \delta \vec{u} \, \right\rangle
188 \, = \,
189 \left\langle \, M^T \, \nabla _{v}{\cal J}^T \, , \,
190 \delta \vec{u} \, \right\rangle
191 \label{inner}
192 \end{equation}
193 %
194 With the identity (\ref{deljidentity}), we then find that
195 the gradient $ \nabla _{u}{\cal J} $ can be readily inferred by
196 invoking the adjoint $ M^{\ast } $ of the tangent linear model $ M $
197 %
198 \begin{equation}
199 \begin{split}
200 \nabla _{u}{\cal J}^T |_{\vec{u}} &
201 = \, M^T |_{\vec{u}} \cdot \nabla _{v}{\cal J}^T |_{\vec{v}} \\
202 ~ & = \, M^T |_{\vec{u}} \cdot \delta \vec{v}^{\ast} \\
203 ~ & = \, \delta \vec{u}^{\ast}
204 \end{split}
205 \label{adjoint}
206 \end{equation}
207 %
208 Eq. (\ref{adjoint}) is the {\sf adjoint model (ADM)},
209 in which $M^T$ is the adjoint (here, the transpose) of the
210 tangent linear operator $M$, $ \delta \vec{v}^{\ast} $
211 the adjoint variable of the model state $ \vec{v} $, and
212 $ \delta \vec{u}^{\ast} $ the adjoint variable of the control variable $ \vec{u} $.
213
214 The {\sf reverse} nature of the adjoint calculation can be readily
215 seen as follows.
216 Consider a model integration which consists of $ \Lambda $
217 consecutive operations
218 $ {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} (
219 ...... ( {\cal M}_{\lambda} (
220 ......
221 ( {\cal M}_{1} ( {\cal M}_{0}(\vec{u}) )))) $,
222 where the ${\cal M}$'s could be the elementary steps, i.e. single lines
223 in the code of the model, or successive time steps of the
224 model integration,
225 starting at step 0 and moving up to step $\Lambda$, with intermediate
226 ${\cal M}_{\lambda} (\vec{u}) = \vec{v}^{(\lambda+1)}$ and final
227 ${\cal M}_{\Lambda} (\vec{u}) = \vec{v}^{(\Lambda+1)} = \vec{v}$.
228 Let ${\cal J}$ be a cost function which explicitly depends on the
229 final state $\vec{v}$ only
230 (this restriction is for clarity reasons only).
231 %
232 ${\cal J}(u)$ may be decomposed according to:
233 %
234 \begin{equation}
235 {\cal J}({\cal M}(\vec{u})) \, = \,
236 {\cal J} ( {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} (
237 ...... ( {\cal M}_{\lambda} (
238 ......
239 ( {\cal M}_{1} ( {\cal M}_{0}(\vec{u}) )))))
240 \label{compos}
241 \end{equation}
242 %
243 Then, according to the chain rule, the forward calculation reads,
244 in terms of the Jacobi matrices
245 (we've omitted the $ | $'s which, nevertheless are important
246 to the aspect of {\it tangent} linearity;
247 note also that by definition
248 $ \langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \rangle
249 = \nabla_v {\cal J} \cdot \delta \vec{v} $ )
250 %
251 \begin{equation}
252 \begin{split}
253 \nabla_v {\cal J} (M(\delta \vec{u})) & = \,
254 \nabla_v {\cal J} \cdot M_{\Lambda}
255 \cdot ...... \cdot M_{\lambda} \cdot ...... \cdot
256 M_{1} \cdot M_{0} \cdot \delta \vec{u} \\
257 ~ & = \, \nabla_v {\cal J} \cdot \delta \vec{v} \\
258 \end{split}
259 \label{forward}
260 \end{equation}
261 %
262 whereas in reverse mode we have
263 %
264 \begin{equation}
265 \boxed{
266 \begin{split}
267 M^T ( \nabla_v {\cal J}^T) & = \,
268 M_{0}^T \cdot M_{1}^T
269 \cdot ...... \cdot M_{\lambda}^T \cdot ...... \cdot
270 M_{\Lambda}^T \cdot \nabla_v {\cal J}^T \\
271 ~ & = \, M_{0}^T \cdot M_{1}^T
272 \cdot ...... \cdot
273 \nabla_{v^{(\lambda)}} {\cal J}^T \\
274 ~ & = \, \nabla_u {\cal J}^T
275 \end{split}
276 }
277 \label{reverse}
278 \end{equation}
279 %
280 clearly expressing the reverse nature of the calculation.
281 Eq. (\ref{reverse}) is at the heart of automatic adjoint compilers.
282 If the intermediate steps $\lambda$ in
283 eqn. (\ref{compos}) -- (\ref{reverse})
284 represent the model state (forward or adjoint) at each
285 intermediate time step as noted above, then correspondingly,
286 $ M^T (\delta \vec{v}^{(\lambda) \, \ast}) =
287 \delta \vec{v}^{(\lambda-1) \, \ast} $ for the adjoint variables.
288 It thus becomes evident that the adjoint calculation also
289 yields the adjoint of each model state component
290 $ \vec{v}^{(\lambda)} $ at each intermediate step $ \lambda $, namely
291 %
292 \begin{equation}
293 \boxed{
294 \begin{split}
295 \nabla_{v^{(\lambda)}} {\cal J}^T |_{\vec{v}^{(\lambda)}}
296 & = \,
297 M_{\lambda}^T |_{\vec{v}^{(\lambda)}} \cdot ...... \cdot
298 M_{\Lambda}^T |_{\vec{v}^{(\lambda)}} \cdot \delta \vec{v}^{\ast} \\
299 ~ & = \, \delta \vec{v}^{(\lambda) \, \ast}
300 \end{split}
301 }
302 \end{equation}
303 %
304 in close analogy to eq. (\ref{adjoint})
305 We note in passing that that the $\delta \vec{v}^{(\lambda) \, \ast}$
306 are the Lagrange multipliers of the model equations which determine
307 $ \vec{v}^{(\lambda)}$.
308
309 In components, eq. (\ref{adjoint}) reads as follows.
310 Let
311 \[
312 \begin{array}{rclcrcl}
313 \delta \vec{u} & = &
314 \left( \delta u_1,\ldots, \delta u_m \right)^T , & \qquad &
315 \delta \vec{u}^{\ast} \,\, = \,\, \nabla_u {\cal J}^T & = &
316 \left(
317 \frac{\partial {\cal J}}{\partial u_1},\ldots,
318 \frac{\partial {\cal J}}{\partial u_m}
319 \right)^T \\
320 \delta \vec{v} & = &
321 \left( \delta v_1,\ldots, \delta u_n \right)^T , & \qquad &
322 \delta \vec{v}^{\ast} \,\, = \,\, \nabla_v {\cal J}^T & = &
323 \left(
324 \frac{\partial {\cal J}}{\partial v_1},\ldots,
325 \frac{\partial {\cal J}}{\partial v_n}
326 \right)^T \\
327 \end{array}
328 \]
329 denote the perturbations in $\vec{u}$ and $\vec{v}$, respectively,
330 and their adjoint variables;
331 further
332 \[
333 M \, = \, \left(
334 \begin{array}{ccc}
335 \frac{\partial {\cal M}_1}{\partial u_1} & \ldots &
336 \frac{\partial {\cal M}_1}{\partial u_m} \\
337 \vdots & ~ & \vdots \\
338 \frac{\partial {\cal M}_n}{\partial u_1} & \ldots &
339 \frac{\partial {\cal M}_n}{\partial u_m} \\
340 \end{array}
341 \right)
342 \]
343 is the Jacobi matrix of $ {\cal M} $
344 (an $ n \times m $ matrix)
345 such that $ \delta \vec{v} = M \cdot \delta \vec{u} $, or
346 \[
347 \delta v_{j}
348 \, = \, \sum_{i=1}^m M_{ji} \, \delta u_{i}
349 \, = \, \sum_{i=1}^m \, \frac{\partial {\cal M}_{j}}{\partial u_{i}}
350 \delta u_{i}
351 \]
352 %
353 Then eq. (\ref{adjoint}) takes the form
354 \[
355 \delta u_{i}^{\ast}
356 \, = \, \sum_{j=1}^n M_{ji} \, \delta v_{j}^{\ast}
357 \, = \, \sum_{j=1}^n \, \frac{\partial {\cal M}_{j}}{\partial u_{i}}
358 \delta v_{j}^{\ast}
359 \]
360 %
361 or
362 %
363 \[
364 \left(
365 \begin{array}{c}
366 \left. \frac{\partial}{\partial u_1} {\cal J} \right|_{\vec{u}^{(0)}} \\
367 \vdots \\
368 \left. \frac{\partial}{\partial u_m} {\cal J} \right|_{\vec{u}^{(0)}} \\
369 \end{array}
370 \right)
371 \, = \,
372 \left(
373 \begin{array}{ccc}
374 \left. \frac{\partial {\cal M}_1}{\partial u_1} \right|_{\vec{u}^{(0)}}
375 & \ldots &
376 \left. \frac{\partial {\cal M}_n}{\partial u_1} \right|_{\vec{u}^{(0)}} \\
377 \vdots & ~ & \vdots \\
378 \left. \frac{\partial {\cal M}_1}{\partial u_m} \right|_{\vec{u}^{(0)}}
379 & \ldots &
380 \left. \frac{\partial {\cal M}_n}{\partial u_m} \right|_{\vec{u}^{(0)}} \\
381 \end{array}
382 \right)
383 \cdot
384 \left(
385 \begin{array}{c}
386 \left. \frac{\partial}{\partial v_1} {\cal J} \right|_{\vec{v}} \\
387 \vdots \\
388 \left. \frac{\partial}{\partial v_n} {\cal J} \right|_{\vec{v}} \\
389 \end{array}
390 \right)
391 \]
392 %
393 Furthermore, the adjoint $ \delta v^{(\lambda) \, \ast} $
394 of any intermediate state $ v^{(\lambda)} $
395 may be obtained, using the intermediate Jacobian
396 (an $ n_{\lambda+1} \times n_{\lambda} $ matrix)
397 %
398 \[
399 M_{\lambda} \, = \,
400 \left(
401 \begin{array}{ccc}
402 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_1}
403 & \ldots &
404 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
405 \vdots & ~ & \vdots \\
406 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_1}
407 & \ldots &
408 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
409 \end{array}
410 \right)
411 \]
412 %
413 and the shorthand notation for the adjoint variables
414 $ \delta v^{(\lambda) \, \ast}_{j} = \frac{\partial}{\partial v^{(\lambda)}_{j}}
415 {\cal J}^T $, $ j = 1, \ldots , n_{\lambda} $,
416 for intermediate components, yielding
417 \begin{equation}
418 \small
419 \begin{split}
420 \left(
421 \begin{array}{c}
422 \delta v^{(\lambda) \, \ast}_1 \\
423 \vdots \\
424 \delta v^{(\lambda) \, \ast}_{n_{\lambda}} \\
425 \end{array}
426 \right)
427 \, = &
428 \left(
429 \begin{array}{ccc}
430 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_1}
431 & \ldots \,\, \ldots &
432 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_1} \\
433 \vdots & ~ & \vdots \\
434 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_{n_{\lambda}}}
435 & \ldots \,\, \ldots &
436 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
437 \end{array}
438 \right)
439 \cdot
440 %
441 \\ ~ & ~
442 \\ ~ &
443 %
444 \left(
445 \begin{array}{ccc}
446 \frac{\partial ({\cal M}_{\lambda+1})_1}{\partial v^{(\lambda+1)}_1}
447 & \ldots &
448 \frac{\partial ({\cal M}_{\lambda+1})_{n_{\lambda+2}}}{\partial v^{(\lambda+1)}_1} \\
449 \vdots & ~ & \vdots \\
450 \vdots & ~ & \vdots \\
451 \frac{\partial ({\cal M}_{\lambda+1})_1}{\partial v^{(\lambda+1)}_{n_{\lambda+1}}}
452 & \ldots &
453 \frac{\partial ({\cal M}_{\lambda+1})_{n_{\lambda+2}}}{\partial v^{(\lambda+1)}_{n_{\lambda+1}}} \\
454 \end{array}
455 \right)
456 \cdot \, \ldots \, \cdot
457 \left(
458 \begin{array}{c}
459 \delta v^{\ast}_1 \\
460 \vdots \\
461 \delta v^{\ast}_{n} \\
462 \end{array}
463 \right)
464 \end{split}
465 \end{equation}
466
467 Eq. (\ref{forward}) and (\ref{reverse}) are perhaps clearest in
468 showing the advantage of the reverse over the forward mode
469 if the gradient $\nabla _{u}{\cal J}$, i.e. the sensitivity of the
470 cost function $ {\cal J} $ with respect to {\it all} input
471 variables $u$
472 (or the sensitivity of the cost function with respect to
473 {\it all} intermediate states $ \vec{v}^{(\lambda)} $) are sought.
474 In order to be able to solve for each component of the gradient
475 $ \partial {\cal J} / \partial u_{i} $ in (\ref{forward})
476 a forward calculation has to be performed for each component separately,
477 i.e. $ \delta \vec{u} = \delta u_{i} {\vec{e}_{i}} $
478 for the $i$-th forward calculation.
479 Then, (\ref{forward}) represents the
480 projection of $ \nabla_u {\cal J} $ onto the $i$-th component.
481 The full gradient is retrieved from the $ m $ forward calculations.
482 In contrast, eq. (\ref{reverse}) yields the full
483 gradient $\nabla _{u}{\cal J}$ (and all intermediate gradients
484 $\nabla _{v^{(\lambda)}}{\cal J}$) within a single reverse calculation.
485
486 Note, that if $ {\cal J} $ is a vector-valued function
487 of dimension $ l > 1 $,
488 eq. (\ref{reverse}) has to be modified according to
489 \[
490 M^T \left( \nabla_v {\cal J}^T \left(\delta \vec{J}\right) \right)
491 \, = \,
492 \nabla_u {\cal J}^T \cdot \delta \vec{J}
493 \]
494 where now $ \delta \vec{J} \in I\!\!R^l $ is a vector of
495 dimension $ l $.
496 In this case $ l $ reverse simulations have to be performed
497 for each $ \delta J_{k}, \,\, k = 1, \ldots, l $.
498 Then, the reverse mode is more efficient as long as
499 $ l < n $, otherwise the forward mode is preferable.
500 Strictly, the reverse mode is called adjoint mode only for
501 $ l = 1 $.
502
503 A detailed analysis of the underlying numerical operations
504 shows that the computation of $\nabla _{u}{\cal J}$ in this way
505 requires about 2 to 5 times the computation of the cost function.
506 Alternatively, the gradient vector could be approximated
507 by finite differences, requiring $m$ computations
508 of the perturbed cost function.
509
510 To conclude we give two examples of commonly used types
511 of cost functions:
512
513 \paragraph{Example 1:
514 $ {\cal J} = v_{j} (T) $} ~ \\
515 The cost function consists of the $j$-th component of the model state
516 $ \vec{v} $ at time $T$.
517 Then $ \nabla_v {\cal J}^T = {\vec{f}_{j}} $ is just the $j$-th
518 unit vector. The $ \nabla_u {\cal J}^T $
519 is the projection of the adjoint
520 operator onto the $j$-th component ${\bf f_{j}}$,
521 \[
522 \nabla_u {\cal J}^T
523 \, = \, M^T \cdot \nabla_v {\cal J}^T
524 \, = \, \sum_{i} M^T_{ji} \, {\vec{e}_{i}}
525 \]
526
527 \paragraph{Example 2:
528 $ {\cal J} = \langle \, {\cal H}(\vec{v}) - \vec{d} \, ,
529 \, {\cal H}(\vec{v}) - \vec{d} \, \rangle $} ~ \\
530 The cost function represents the quadratic model vs. data misfit.
531 Here, $ \vec{d} $ is the data vector and $ {\cal H} $ represents the
532 operator which maps the model state space onto the data space.
533 Then, $ \nabla_v {\cal J} $ takes the form
534 %
535 \begin{equation*}
536 \begin{split}
537 \nabla_v {\cal J}^T & = \, 2 \, \, H \cdot
538 \left( \, {\cal H}(\vec{v}) - \vec{d} \, \right) \\
539 ~ & = \, 2 \sum_{j} \left\{ \sum_k
540 \frac{\partial {\cal H}_k}{\partial v_{j}}
541 \left( {\cal H}_k (\vec{v}) - d_k \right)
542 \right\} \, {\vec{f}_{j}} \\
543 \end{split}
544 \end{equation*}
545 %
546 where $H_{kj} = \partial {\cal H}_k / \partial v_{j} $ is the
547 Jacobi matrix of the data projection operator.
548 Thus, the gradient $ \nabla_u {\cal J} $ is given by the
549 adjoint operator,
550 driven by the model vs. data misfit:
551 \[
552 \nabla_u {\cal J}^T \, = \, 2 \, M^T \cdot
553 H \cdot \left( {\cal H}(\vec{v}) - \vec{d} \, \right)
554 \]
555
556 \subsection{Storing vs. recomputation in reverse mode}
557 \label{checkpointing}
558
559 We note an important aspect of the forward vs. reverse
560 mode calculation.
561 Because of the local character of the derivative
562 (a derivative is defined w.r.t. a point along the trajectory),
563 the intermediate results of the model trajectory
564 $\vec{v}^{(\lambda+1)}={\cal M}_{\lambda}(v^{(\lambda)})$
565 may be required to evaluate the intermediate Jacobian
566 $M_{\lambda}|_{\vec{v}^{(\lambda)}} \, \delta \vec{v}^{(\lambda)} $.
567 This is the case e.g. for nonlinear expressions
568 (momentum advection, nonlinear equation of state), state-dependent
569 conditional statements (parameterization schemes).
570 In the forward mode, the intermediate results are required
571 in the same order as computed by the full forward model ${\cal M}$,
572 but in the reverse mode they are required in the reverse order.
573 Thus, in the reverse mode the trajectory of the forward model
574 integration ${\cal M}$ has to be stored to be available in the reverse
575 calculation. Alternatively, the complete model state up to the
576 point of evaluation has to be recomputed whenever its value is required.
577
578 A method to balance the amount of recomputations vs.
579 storage requirements is called {\sf checkpointing}
580 (e.g. \cite{gri:92}, \cite{res-eta:98}).
581 It is depicted in \ref{fig:3levelcheck} for a 3-level checkpointing
582 [as an example, we give explicit numbers for a 3-day
583 integration with a 1-hourly timestep in square brackets].
584 \begin{itemize}
585 %
586 \item [$lev3$]
587 In a first step, the model trajectory is subdivided into
588 $ {n}^{lev3} $ subsections [$ {n}^{lev3} $=3 1-day intervals],
589 with the label $lev3$ for this outermost loop.
590 The model is then integrated along the full trajectory,
591 and the model state stored to disk only at every $ k_{i}^{lev3} $-th timestep
592 [i.e. 3 times, at
593 $ i = 0,1,2 $ corresponding to $ k_{i}^{lev3} = 0, 24, 48 $].
594 In addition, the cost function is computed, if needed.
595 %
596 \item [$lev2$]
597 In a second step each subsection itself is divided into
598 $ {n}^{lev2} $ subsections
599 [$ {n}^{lev2} $=4 6-hour intervals per subsection].
600 The model picks up at the last outermost dumped state
601 $ v_{k_{n}^{lev3}} $ and is integrated forward in time along
602 the last subsection, with the label $lev2$ for this
603 intermediate loop.
604 The model state is now stored to disk at every $ k_{i}^{lev2} $-th
605 timestep
606 [i.e. 4 times, at
607 $ i = 0,1,2,3 $ corresponding to $ k_{i}^{lev2} = 48, 54, 60, 66 $].
608 %
609 \item [$lev1$]
610 Finally, the model picks up at the last intermediate dump state
611 $ v_{k_{n}^{lev2}} $ and is integrated forward in time along
612 the last subsection, with the label $lev1$ for this
613 intermediate loop.
614 Within this sub-subsection only, parts of the model state is stored
615 to memory at every timestep
616 [i.e. every hour $ i=0,...,5$ corresponding to
617 $ k_{i}^{lev1} = 66, 67, \ldots, 71 $].
618 The final state $ v_n = v_{k_{n}^{lev1}} $ is reached
619 and the model state of all preceding timesteps along the last
620 innermost subsection are available, enabling integration backwards
621 in time along the last subsection.
622 The adjoint can thus be computed along this last
623 subsection $k_{n}^{lev2}$.
624 %
625 \end{itemize}
626 %
627 This procedure is repeated consecutively for each previous
628 subsection $k_{n-1}^{lev2}, \ldots, k_{1}^{lev2} $
629 carrying the adjoint computation to the initial time
630 of the subsection $k_{n}^{lev3}$.
631 Then, the procedure is repeated for the previous subsection
632 $k_{n-1}^{lev3}$
633 carrying the adjoint computation to the initial time
634 $k_{1}^{lev3}$.
635
636 For the full model trajectory of
637 $ n^{lev3} \cdot n^{lev2} \cdot n^{lev1} $ timesteps
638 the required storing of the model state was significantly reduced to
639 $ n^{lev2} + n^{lev3} $ to disk and roughly $ n^{lev1} $ to memory
640 [i.e. for the 3-day integration with a total oof 72 timesteps
641 the model state was stored 7 times to disk and roughly 6 times
642 to memory].
643 This saving in memory comes at a cost of a required
644 3 full forward integrations of the model (one for each
645 checkpointing level).
646 The optimal balance of storage vs. recomputation certainly depends
647 on the computing resources available and may be adjusted by
648 adjusting the partitioning among the
649 $ n^{lev3}, \,\, n^{lev2}, \,\, n^{lev1} $.
650
651 \begin{figure}[t!]
652 \begin{center}
653 %\psdraft
654 %\psfrag{v_k1^lev3}{\mathinfigure{v_{k_{1}^{lev3}}}}
655 %\psfrag{v_kn-1^lev3}{\mathinfigure{v_{k_{n-1}^{lev3}}}}
656 %\psfrag{v_kn^lev3}{\mathinfigure{v_{k_{n}^{lev3}}}}
657 %\psfrag{v_k1^lev2}{\mathinfigure{v_{k_{1}^{lev2}}}}
658 %\psfrag{v_kn-1^lev2}{\mathinfigure{v_{k_{n-1}^{lev2}}}}
659 %\psfrag{v_kn^lev2}{\mathinfigure{v_{k_{n}^{lev2}}}}
660 %\psfrag{v_k1^lev1}{\mathinfigure{v_{k_{1}^{lev1}}}}
661 %\psfrag{v_kn^lev1}{\mathinfigure{v_{k_{n}^{lev1}}}}
662 %\mbox{\epsfig{file=part5/checkpointing.eps, width=0.8\textwidth}}
663 \resizebox{5.5in}{!}{\includegraphics{part5/checkpointing.eps}}
664 %\psfull
665 \end{center}
666 \caption{
667 Schematic view of intermediate dump and restart for
668 3-level checkpointing.}
669 \label{fig:3levelcheck}
670 \end{figure}
671
672 % \subsection{Optimal perturbations}
673 % \label{sec_optpert}
674
675
676 % \subsection{Error covariance estimate and Hessian matrix}
677 % \label{sec_hessian}
678
679 \newpage
680
681 %**********************************************************************
682 \section{TLM and ADM generation in general}
683 \label{sec_ad_setup_gen}
684 \begin{rawhtml}
685 <!-- CMIREDIR:sec_ad_setup_gen: -->
686 \end{rawhtml}
687 %**********************************************************************
688
689 In this section we describe in a general fashion
690 the parts of the code that are relevant for automatic
691 differentiation using the software tool TAF.
692
693 \input{part5/doc_ad_the_model}
694
695 The basic flow is depicted in \ref{fig:adthemodel}.
696 If CPP option \texttt{ALLOW\_AUTODIFF\_TAMC} is defined,
697 the driver routine
698 {\it the\_model\_main}, instead of calling {\it the\_main\_loop},
699 invokes the adjoint of this routine, {\it adthe\_main\_loop}
700 (case \texttt{\#define ALLOW\_ADJOINT\_RUN}), or
701 the tangent linear of this routine {\it g\_the\_main\_loop}
702 (case \texttt{\#define ALLOW\_TANGENTLINEAR\_RUN}),
703 which are the toplevel routines in terms of automatic differentiation.
704 The routines {\it adthe\_main\_loop} or {\it g\_the\_main\_loop}
705 are generated by TAF.
706 It contains both the forward integration of the full model, the
707 cost function calculation,
708 any additional storing that is required for efficient checkpointing,
709 and the reverse integration of the adjoint model.
710
711 [DESCRIBE IN A SEPARATE SECTION THE WORKING OF THE TLM]
712
713 In Fig. \ref{fig:adthemodel}
714 the structure of {\it adthe\_main\_loop} has been strongly
715 simplified to focus on the essentials; in particular, no checkpointing
716 procedures are shown here.
717 Prior to the call of {\it adthe\_main\_loop}, the routine
718 {\it ctrl\_unpack} is invoked to unpack the control vector
719 or initialise the control variables.
720 Following the call of {\it adthe\_main\_loop},
721 the routine {\it ctrl\_pack}
722 is invoked to pack the control vector
723 (cf. Section \ref{section_ctrl}).
724 If gradient checks are to be performed, the option
725 {\tt ALLOW\_GRADIENT\_CHECK} is defined. In this case
726 the driver routine {\it grdchk\_main} is called after
727 the gradient has been computed via the adjoint
728 (cf. Section \ref{section_grdchk}).
729
730 %------------------------------------------------------------------
731
732 \subsection{General setup
733 \label{section_ad_setup}}
734
735 In order to configure AD-related setups the following packages need
736 to be enabled:
737 {\it
738 \begin{table}[h!]
739 \begin{tabular}{l}
740 autodiff \\
741 ctrl \\
742 cost \\
743 grdchk \\
744 \end{tabular}
745 \end{table}
746 }
747 The packages are enabled by adding them to your experiment-specific
748 configuration file
749 {\it packages.conf} (see Section ???).
750
751 The following AD-specific CPP option files need to be customized:
752 %
753 \begin{itemize}
754 %
755 \item {\it ECCO\_CPPOPTIONS.h} \\
756 This header file collects CPP options for the packages
757 {\it autodiff, cost, ctrl} as well as AD-unrelated options for
758 the external forcing package {\it exf}.
759 \footnote{NOTE: These options are not set in their package-specific
760 headers such as {\it COST\_CPPOPTIONS.h}, but are instead collected
761 in the single header file {\it ECCO\_CPPOPTIONS.h}.
762 The package-specific header files serve as simple
763 placeholders at this point.}
764 %
765 \item {\it tamc.h} \\
766 This header configures the splitting of the time stepping loop
767 w.r.t. the 3-level checkpointing (see section ???).
768
769 %
770 \end{itemize}
771
772 %------------------------------------------------------------------
773
774 \subsection{Building the AD code
775 \label{section_ad_build}}
776
777 The build process of an AD code is very similar to building
778 the forward model. However, depending on which AD code one wishes
779 to generate, and on which AD tool is available (TAF or TAMC),
780 the following {\tt make} targets are available:
781
782 \begin{table}[h!]
783 {\footnotesize
784 \begin{tabular}{ccll}
785 ~ & {\it AD-target} & {\it output} & {\it description} \\
786 \hline
787 \hline
788 (1) & {\tt <MODE><TOOL>only} & {\tt <MODE>\_<TOOL>\_output.f} &
789 generates code for $<$MODE$>$ using $<$TOOL$>$ \\
790 ~ & ~ & ~ & no {\tt make} dependencies on {\tt .F .h} \\
791 ~ & ~ & ~ & useful for compiling on remote platforms \\
792 \hline
793 (2) & {\tt <MODE><TOOL>} & {\tt <MODE>\_<TOOL>\_output.f} &
794 generates code for $<$MODE$>$ using $<$TOOL$>$ \\
795 ~ & ~ & ~ & includes {\tt make} dependencies on {\tt .F .h} \\
796 ~ & ~ & ~ & i.e. input for $<$TOOL$>$ may be re-generated \\
797 \hline
798 (3) & {\tt <MODE>all} & {\tt mitgcmuv\_<MODE>} &
799 generates code for $<$MODE$>$ using $<$TOOL$>$ \\
800 ~ & ~ & ~ & and compiles all code \\
801 ~ & ~ & ~ & (use of TAF is set as default) \\
802 \hline
803 \hline
804 \end{tabular}
805 }
806 \end{table}
807 %
808 Here, the following placeholders are used
809 %
810 \begin{itemize}
811 %
812 \item [$<$TOOL$>$]
813 %
814 \begin{itemize}
815 %
816 \item {\tt TAF}
817 \item {\tt TAMC}
818 %
819 \end{itemize}
820 %
821 \item [$<$MODE$>$]
822 %
823 \begin{itemize}
824 %
825 \item {\tt ad} generates the adjoint model (ADM)
826 \item {\tt ftl} generates the tangent linear model (TLM)
827 \item {\tt svd} generates both ADM and TLM for \\
828 singular value decomposition (SVD) type calculations
829 %
830 \end{itemize}
831 %
832 \end{itemize}
833
834 For example, to generate the adjoint model using TAF after routines ({\tt .F})
835 or headers ({\tt .h}) have been modified, but without compilation,
836 type {\tt make adtaf};
837 or, to generate the tangent linear model using TAMC without
838 re-generating the input code, type {\tt make ftltamconly}.
839
840
841 A typical full build process to generate the ADM via TAF would
842 look like follows:
843 \begin{verbatim}
844 % mkdir build
845 % cd build
846 % ../../../tools/genmake2 -mods=../code_ad
847 % make depend
848 % make adall
849 \end{verbatim}
850
851 %------------------------------------------------------------------
852
853 \subsection{The AD build process in detail
854 \label{section_ad_build_detail}}
855
856 The {\tt make <MODE>all} target consists of the following procedures:
857
858 \begin{enumerate}
859 %
860 \item
861 A header file {\tt AD\_CONFIG.h} is generated which contains a CPP option
862 on which code ought to be generated. Depending on the {\tt make} target,
863 the contents is
864 \begin{itemize}
865 \item
866 {\tt \#define ALLOW\_ADJOINT\_RUN}
867 \item
868 {\tt \#define ALLOW\_TANGENTLINEAR\_RUN}
869 \item
870 {\tt \#define ALLOW\_ECCO\_OPTIMIZATION}
871 \end{itemize}
872 %
873 \item
874 A single file {\tt <MODE>\_input\_code.f} is concatenated
875 consisting of all {\tt .f} files that are part of the list {\bf AD\_FILES}
876 and all {\tt .flow} files that are part of the list {\bf AD\_FLOW\_FILES}.
877 %
878 \item
879 The AD tool is invoked with the {\bf <MODE>\_<TOOL>\_FLAGS}.
880 The default AD tool flags in {\tt genmake2} can be overrwritten by
881 an {\tt adjoint\_options} file (similar to the platform-specific
882 {\tt build\_options}, see Section ???.
883 The AD tool writes the resulting AD code into the file
884 {\tt <MODE>\_input\_code\_ad.f}
885 %
886 \item
887 A short sed script {\tt adjoint\_sed} is applied to
888 {\tt <MODE>\_input\_code\_ad.f}
889 to reinstate {\bf myThid} into the CALL argument list of active file I/O.
890 The result is written to file {\tt <MODE>\_<TOOL>\_output.f}.
891 %
892 \item
893 All routines are compiled and an executable is generated
894 (see Table ???).
895 %
896 \end{enumerate}
897
898 \subsubsection{The list AD\_FILES and {\tt .list} files}
899
900 Not all routines are presented to the AD tool.
901 Routines typically hidden are diagnostics routines which
902 do not influence the cost function, but may create
903 artificial flow dependencies such as I/O of active variables.
904
905 {\tt genmake2} generates a list (or variable) {\bf AD\_FILES}
906 which contains all routines that are shown to the AD tool.
907 This list is put together from all files with suffix {\tt .list}
908 that {\tt genmake2} finds in its search directories.
909 The list file for the core MITgcm routines is in {\tt model/src/}
910 is called {\tt model\_ad\_diff.list}.
911 Note that no wrapper routine is shown to TAF. These are either
912 not visible at all to the AD code, or hand-written AD code
913 is available (see next section).
914
915 Each package directory contains its package-specific
916 list file {\tt <PKG>\_ad\_diff.list}. For example,
917 {\tt pkg/ptracers/} contains the file {\tt ptracers\_ad\_diff.list}.
918 Thus, enabling a package will automatically extend the
919 {\bf AD\_FILES} list of {\tt genmake2} to incorporate the
920 package-specific routines.
921 Note that you will need to regenerate the {\tt Makefile} if
922 you enable a package (e.g. by adding it to {\tt packages.conf})
923 and a {\tt Makefile} already exists.
924
925 \subsubsection{The list AD\_FLOW\_FILES and {\tt .flow} files}
926
927 TAMC and TAF can evaluate user-specified directives
928 that start with a specific syntax ({\tt CADJ}, {\tt C\$TAF}, {\tt !\$TAF}).
929 The main categories of directives are STORE directives and
930 FLOW directives. Here, we are concerned with flow directives,
931 store directives are treated elsewhere.
932
933 Flow directives enable the AD tool to evaluate how it should treat
934 routines that are 'hidden' by the user, i.e. routines which are
935 not contained in the {\bf AD\_FILES} list (see previous section),
936 but which are called in part of the code that the AD tool does see.
937 The flow directive tell the AD tool
938 %
939 \begin{itemize}
940 %
941 \item which subroutine arguments are input/output
942 \item which subroutine arguments are active
943 \item which subroutine arguments are required to compute the cost
944 \item which subroutine arguments are dependent
945 %
946 \end{itemize}
947 %
948 The syntax for the flow directives can be found in the
949 AD tool manuals.
950
951 {\tt genmake2} generates a list (or variable) {\bf AD\_FLOW\_FILES}
952 which contains all files with suffix{\tt .flow} that it finds
953 in its search directories.
954 The flow directives for the core MITgcm routines of
955 {\tt eesupp/src/} and {\tt model/src/}
956 reside in {\tt pkg/autodiff/}.
957 This directory also contains hand-written adjoint code
958 for the MITgcm WRAPPER (see Section ???).
959
960 Flow directives for package-specific routines are contained in
961 the corresponding package directories in the file
962 {\tt <PKG>\_ad.flow}, e.g. ptracers-specific directives are in
963 {\tt ptracers\_ad.flow}.
964
965 \subsubsection{Store directives for 3-level checkpointing}
966
967 The storing that is required at each period of the
968 3-level checkpointing is controled by three
969 top-level headers.
970
971 \begin{verbatim}
972 do ilev_3 = 1, nchklev_3
973 # include ``checkpoint_lev3.h''
974 do ilev_2 = 1, nchklev_2
975 # include ``checkpoint_lev2.h''
976 do ilev_1 = 1, nchklev_1
977 # include ``checkpoint_lev1.h''
978
979 ...
980
981 end do
982 end do
983 end do
984 \end{verbatim}
985
986 All files {\tt checkpoint\_lev?.h} are contained in directory
987 {\tt pkg/autodiff/}.
988
989
990 \subsubsection{Changing the default AD tool flags: ad\_options files}
991
992
993 \subsubsection{Hand-written adjoint code}
994
995 %------------------------------------------------------------------
996
997 \subsection{The cost function (dependent variable)
998 \label{section_cost}}
999
1000 The cost function $ {\cal J} $ is referred to as the {\sf dependent variable}.
1001 It is a function of the input variables $ \vec{u} $ via the composition
1002 $ {\cal J}(\vec{u}) \, = \, {\cal J}(M(\vec{u})) $.
1003 The input are referred to as the
1004 {\sf independent variables} or {\sf control variables}.
1005 All aspects relevant to the treatment of the cost function $ {\cal J} $
1006 (parameter setting, initialization, accumulation,
1007 final evaluation), are controlled by the package {\it pkg/cost}.
1008 The aspects relevant to the treatment of the independent variables
1009 are controlled by the package {\it pkg/ctrl} and will be treated
1010 in the next section.
1011
1012 \input{part5/doc_cost_flow}
1013
1014 \subsubsection{Enabling the package}
1015
1016 \fbox{
1017 \begin{minipage}{12cm}
1018 {\it packages.conf}, {\it ECCO\_CPPOPTIONS.h}
1019 \end{minipage}
1020 }
1021 \begin{itemize}
1022 %
1023 \item
1024 The package is enabled by adding {\it cost} to your file {\it packages.conf}
1025 (see Section ???)
1026 %
1027 \item
1028
1029
1030 \end{itemize}
1031 %
1032
1033 N.B.: In general the following packages ought to be enabled
1034 simultaneously: {\it autodiff, cost, ctrl}.
1035 The basic CPP option to enable the cost function is {\bf ALLOW\_COST}.
1036 Each specific cost function contribution has its own option.
1037 For the present example the option is {\bf ALLOW\_COST\_TRACER}.
1038 All cost-specific options are set in {\it ECCO\_CPPOPTIONS.h}
1039 Since the cost function is usually used in conjunction with
1040 automatic differentiation, the CPP option
1041 {\bf ALLOW\_ADJOINT\_RUN} (file {\it CPP\_OPTIONS.h}) and
1042 {\bf ALLOW\_AUTODIFF\_TAMC} (file {\it ECCO\_CPPOPTIONS.h})
1043 should be defined.
1044
1045 \subsubsection{Initialization}
1046 %
1047 The initialization of the {\it cost} package is readily enabled
1048 as soon as the CPP option {\bf ALLOW\_COST} is defined.
1049 %
1050 \begin{itemize}
1051 %
1052 \item
1053 \fbox{
1054 \begin{minipage}{12cm}
1055 Parameters: {\it cost\_readparms}
1056 \end{minipage}
1057 }
1058 \\
1059 This S/R
1060 reads runtime flags and parameters from file {\it data.cost}.
1061 For the present example the only relevant parameter read
1062 is {\bf mult\_tracer}. This multiplier enables different
1063 cost function contributions to be switched on
1064 ( = 1.) or off ( = 0.) at runtime.
1065 For more complex cost functions which involve model vs. data
1066 misfits, the corresponding data filenames and data
1067 specifications (start date and time, period, ...) are read
1068 in this S/R.
1069 %
1070 \item
1071 \fbox{
1072 \begin{minipage}{12cm}
1073 Variables: {\it cost\_init}
1074 \end{minipage}
1075 }
1076 \\
1077 This S/R
1078 initializes the different cost function contributions.
1079 The contribution for the present example is {\bf objf\_tracer}
1080 which is defined on each tile (bi,bj).
1081 %
1082 \end{itemize}
1083 %
1084 \subsubsection{Accumulation}
1085 %
1086 \begin{itemize}
1087 %
1088 \item
1089 \fbox{
1090 \begin{minipage}{12cm}
1091 {\it cost\_tile}, {\it cost\_tracer}
1092 \end{minipage}
1093 }
1094 \end{itemize}
1095 %
1096 The 'driver' routine
1097 {\it cost\_tile} is called at the end of each time step.
1098 Within this 'driver' routine, S/R are called for each of
1099 the chosen cost function contributions.
1100 In the present example ({\bf ALLOW\_COST\_TRACER}),
1101 S/R {\it cost\_tracer} is called.
1102 It accumulates {\bf objf\_tracer} according to eqn. (\ref{???}).
1103 %
1104 \subsubsection{Finalize all contributions}
1105 %
1106 \begin{itemize}
1107 %
1108 \item
1109 \fbox{
1110 \begin{minipage}{12cm}
1111 {\it cost\_final}
1112 \end{minipage}
1113 }
1114 \end{itemize}
1115 %
1116 At the end of the forward integration S/R {\it cost\_final}
1117 is called. It accumulates the total cost function {\bf fc}
1118 from each contribution and sums over all tiles:
1119 \begin{equation}
1120 {\cal J} \, = \,
1121 {\rm fc} \, = \,
1122 {\rm mult\_tracer} \sum_{\text{global sum}} \sum_{bi,\,bj}^{nSx,\,nSy}
1123 {\rm objf\_tracer}(bi,bj) \, + \, ...
1124 \end{equation}
1125 %
1126 The total cost function {\bf fc} will be the
1127 'dependent' variable in the argument list for TAMC, i.e.
1128 \begin{verbatim}
1129 tamc -output 'fc' ...
1130 \end{verbatim}
1131
1132 %%%% \end{document}
1133
1134 \input{part5/doc_ad_the_main}
1135
1136 \subsection{The control variables (independent variables)
1137 \label{section_ctrl}}
1138
1139 The control variables are a subset of the model input
1140 (initial conditions, boundary conditions, model parameters).
1141 Here we identify them with the variable $ \vec{u} $.
1142 All intermediate variables whose derivative w.r.t. control
1143 variables do not vanish are called {\sf active variables}.
1144 All subroutines whose derivative w.r.t. the control variables
1145 don't vanish are called {\sf active routines}.
1146 Read and write operations from and to file can be viewed
1147 as variable assignments. Therefore, files to which
1148 active variables are written and from which active variables
1149 are read are called {\sf active files}.
1150 All aspects relevant to the treatment of the control variables
1151 (parameter setting, initialization, perturbation)
1152 are controlled by the package {\it pkg/ctrl}.
1153
1154 \input{part5/doc_ctrl_flow}
1155
1156 \subsubsection{genmake and CPP options}
1157 %
1158 \begin{itemize}
1159 %
1160 \item
1161 \fbox{
1162 \begin{minipage}{12cm}
1163 {\it genmake}, {\it CPP\_OPTIONS.h}, {\it ECCO\_CPPOPTIONS.h}
1164 \end{minipage}
1165 }
1166 \end{itemize}
1167 %
1168 To enable the directory to be included to the compile list,
1169 {\bf ctrl} has to be added to the {\bf enable} list in
1170 {\it .genmakerc} or in {\it genmake} itself (analogous to {\it cost}
1171 package, cf. previous section).
1172 Each control variable is enabled via its own CPP option
1173 in {\it ECCO\_CPPOPTIONS.h}.
1174
1175 \subsubsection{Initialization}
1176 %
1177 \begin{itemize}
1178 %
1179 \item
1180 \fbox{
1181 \begin{minipage}{12cm}
1182 Parameters: {\it ctrl\_readparms}
1183 \end{minipage}
1184 }
1185 \\
1186 %
1187 This S/R
1188 reads runtime flags and parameters from file {\it data.ctrl}.
1189 For the present example the file contains the file names
1190 of each control variable that is used.
1191 In addition, the number of wet points for each control
1192 variable and the net dimension of the space of control
1193 variables (counting wet points only) {\bf nvarlength}
1194 is determined.
1195 Masks for wet points for each tile {\bf (bi,\,bj)}
1196 and vertical layer {\bf k} are generated for the three
1197 relevant categories on the C-grid:
1198 {\bf nWetCtile} for tracer fields,
1199 {\bf nWetWtile} for zonal velocity fields,
1200 {\bf nWetStile} for meridional velocity fields.
1201 %
1202 \item
1203 \fbox{
1204 \begin{minipage}{12cm}
1205 Control variables, control vector,
1206 and their gradients: {\it ctrl\_unpack}
1207 \end{minipage}
1208 }
1209 \\
1210 %
1211 Two important issues related to the handling of the control
1212 variables in the MITGCM need to be addressed.
1213 First, in order to save memory, the control variable arrays
1214 are not kept in memory, but rather read from file and added
1215 to the initial fields during the model initialization phase.
1216 Similarly, the corresponding adjoint fields which represent
1217 the gradient of the cost function w.r.t. the control variables
1218 are written to file at the end of the adjoint integration.
1219 Second, in addition to the files holding the 2-dim. and 3-dim.
1220 control variables and the corresponding cost gradients,
1221 a 1-dim. {\sf control vector}
1222 and {\sf gradient vector} are written to file. They contain
1223 only the wet points of the control variables and the corresponding
1224 gradient.
1225 This leads to a significant data compression.
1226 Furthermore, an option is available
1227 ({\tt ALLOW\_NONDIMENSIONAL\_CONTROL\_IO}) to
1228 non-dimensionalise the control and gradient vector,
1229 which otherwise would contain different pieces of different
1230 magnitudes and units.
1231 Finally, the control and gradient vector can be passed to a
1232 minimization routine if an update of the control variables
1233 is sought as part of a minimization exercise.
1234
1235 The files holding fields and vectors of the control variables
1236 and gradient are generated and initialised in S/R {\it ctrl\_unpack}.
1237 %
1238 \end{itemize}
1239
1240 \subsubsection{Perturbation of the independent variables}
1241 %
1242 The dependency flow for differentiation w.r.t. the controls
1243 starts with adding a perturbation onto the input variable,
1244 thus defining the independent or control variables for TAMC.
1245 Three types of controls may be considered:
1246 %
1247 \begin{itemize}
1248 %
1249 \item
1250 \fbox{
1251 \begin{minipage}{12cm}
1252 {\it ctrl\_map\_ini} (initial value sensitivity):
1253 \end{minipage}
1254 }
1255 \\
1256 %
1257 Consider as an example the initial tracer distribution
1258 {\bf tr1} as control variable.
1259 After {\bf tr1} has been initialised in
1260 {\it ini\_tr1} (dynamical variables such as
1261 temperature and salinity are initialised in {\it ini\_fields}),
1262 a perturbation anomaly is added to the field in S/R
1263 {\it ctrl\_map\_ini}
1264 %
1265 \begin{equation}
1266 \begin{split}
1267 u & = \, u_{[0]} \, + \, \Delta u \\
1268 {\bf tr1}(...) & = \, {\bf tr1_{ini}}(...) \, + \, {\bf xx\_tr1}(...)
1269 \label{perturb}
1270 \end{split}
1271 \end{equation}
1272 %
1273 {\bf xx\_tr1} is a 3-dim. global array
1274 holding the perturbation. In the case of a simple
1275 sensitivity study this array is identical to zero.
1276 However, it's specification is essential in the context
1277 of automatic differentiation since TAMC
1278 treats the corresponding line in the code symbolically
1279 when determining the differentiation chain and its origin.
1280 Thus, the variable names are part of the argument list
1281 when calling TAMC:
1282 %
1283 \begin{verbatim}
1284 tamc -input 'xx_tr1 ...' ...
1285 \end{verbatim}
1286 %
1287 Now, as mentioned above, the MITGCM avoids maintaining
1288 an array for each control variable by reading the
1289 perturbation to a temporary array from file.
1290 To ensure the symbolic link to be recognized by TAMC, a scalar
1291 dummy variable {\bf xx\_tr1\_dummy} is introduced
1292 and an 'active read' routine of the adjoint support
1293 package {\it pkg/autodiff} is invoked.
1294 The read-procedure is tagged with the variable
1295 {\bf xx\_tr1\_dummy} enabling TAMC to recognize the
1296 initialization of the perturbation.
1297 The modified call of TAMC thus reads
1298 %
1299 \begin{verbatim}
1300 tamc -input 'xx_tr1_dummy ...' ...
1301 \end{verbatim}
1302 %
1303 and the modified operation to (\ref{perturb})
1304 in the code takes on the form
1305 %
1306 \begin{verbatim}
1307 call active_read_xyz(
1308 & ..., tmpfld3d, ..., xx_tr1_dummy, ... )
1309
1310 tr1(...) = tr1(...) + tmpfld3d(...)
1311 \end{verbatim}
1312 %
1313 Note, that reading an active variable corresponds
1314 to a variable assignment. Its derivative corresponds
1315 to a write statement of the adjoint variable, followed by
1316 a reset.
1317 The 'active file' routines have been designed
1318 to support active read and corresponding adjoint active write
1319 operations (and vice versa).
1320 %
1321 \item
1322 \fbox{
1323 \begin{minipage}{12cm}
1324 {\it ctrl\_map\_forcing} (boundary value sensitivity):
1325 \end{minipage}
1326 }
1327 \\
1328 %
1329 The handling of boundary values as control variables
1330 proceeds exactly analogous to the initial values
1331 with the symbolic perturbation taking place in S/R
1332 {\it ctrl\_map\_forcing}.
1333 Note however an important difference:
1334 Since the boundary values are time dependent with a new
1335 forcing field applied at each time steps,
1336 the general problem may be thought of as
1337 a new control variable at each time step
1338 (or, if the perturbation is averaged over a certain period,
1339 at each $ N $ timesteps), i.e.
1340 \[
1341 u_{\rm forcing} \, = \,
1342 \{ \, u_{\rm forcing} ( t_n ) \, \}_{
1343 n \, = \, 1, \ldots , {\rm nTimeSteps} }
1344 \]
1345 %
1346 In the current example an equilibrium state is considered,
1347 and only an initial perturbation to
1348 surface forcing is applied with respect to the
1349 equilibrium state.
1350 A time dependent treatment of the surface forcing is
1351 implemented in the ECCO environment, involving the
1352 calendar ({\it cal}~) and external forcing ({\it exf}~) packages.
1353 %
1354 \item
1355 \fbox{
1356 \begin{minipage}{12cm}
1357 {\it ctrl\_map\_params} (parameter sensitivity):
1358 \end{minipage}
1359 }
1360 \\
1361 %
1362 This routine is not yet implemented, but would proceed
1363 proceed along the same lines as the initial value sensitivity.
1364 The mixing parameters {\bf diffkr} and {\bf kapgm}
1365 are currently added as controls in {\it ctrl\_map\_ini.F}.
1366 %
1367 \end{itemize}
1368 %
1369
1370 \subsubsection{Output of adjoint variables and gradient}
1371 %
1372 Several ways exist to generate output of adjoint fields.
1373 %
1374 \begin{itemize}
1375 %
1376 \item
1377 \fbox{
1378 \begin{minipage}{12cm}
1379 {\it ctrl\_map\_ini, ctrl\_map\_forcing}:
1380 \end{minipage}
1381 }
1382 \\
1383 \begin{itemize}
1384 %
1385 \item {\bf xx\_...}: the control variable fields \\
1386 Before the forward integration, the control
1387 variables are read from file {\bf xx\_ ...} and added to
1388 the model field.
1389 %
1390 \item {\bf adxx\_...}: the adjoint variable fields, i.e. the gradient
1391 $ \nabla _{u}{\cal J} $ for each control variable \\
1392 After the adjoint integration the corresponding adjoint
1393 variables are written to {\bf adxx\_ ...}.
1394 %
1395 \end{itemize}
1396 %
1397 \item
1398 \fbox{
1399 \begin{minipage}{12cm}
1400 {\it ctrl\_unpack, ctrl\_pack}:
1401 \end{minipage}
1402 }
1403 \\
1404 %
1405 \begin{itemize}
1406 %
1407 \item {\bf vector\_ctrl}: the control vector \\
1408 At the very beginning of the model initialization,
1409 the updated compressed control vector is read (or initialised)
1410 and distributed to 2-dim. and 3-dim. control variable fields.
1411 %
1412 \item {\bf vector\_grad}: the gradient vector \\
1413 At the very end of the adjoint integration,
1414 the 2-dim. and 3-dim. adjoint variables are read,
1415 compressed to a single vector and written to file.
1416 %
1417 \end{itemize}
1418 %
1419 \item
1420 \fbox{
1421 \begin{minipage}{12cm}
1422 {\it addummy\_in\_stepping}:
1423 \end{minipage}
1424 }
1425 \\
1426 In addition to writing the gradient at the end of the
1427 forward/adjoint integration, many more adjoint variables
1428 of the model state
1429 at intermediate times can be written using S/R
1430 {\it addummy\_in\_stepping}.
1431 This routine is part of the adjoint support package
1432 {\it pkg/autodiff} (cf.f. below).
1433 The procedure is enabled using via the CPP-option
1434 {\bf ALLOW\_AUTODIFF\_MONITOR} (file {\it ECCO\_CPPOPTIONS.h}).
1435 To be part of the adjoint code, the corresponding S/R
1436 {\it dummy\_in\_stepping} has to be called in the forward
1437 model (S/R {\it the\_main\_loop}) at the appropriate place.
1438 The adjoint common blocks are extracted from the adjoint code
1439 via the header file {\it adcommon.h}.
1440
1441 {\it dummy\_in\_stepping} is essentially empty,
1442 the corresponding adjoint routine is hand-written rather
1443 than generated automatically.
1444 Appropriate flow directives ({\it dummy\_in\_stepping.flow})
1445 ensure that TAMC does not automatically
1446 generate {\it addummy\_in\_stepping} by trying to differentiate
1447 {\it dummy\_in\_stepping}, but instead refers to
1448 the hand-written routine.
1449
1450 {\it dummy\_in\_stepping} is called in the forward code
1451 at the beginning of each
1452 timestep, before the call to {\it dynamics}, thus ensuring
1453 that {\it addummy\_in\_stepping} is called at the end of
1454 each timestep in the adjoint calculation, after the call to
1455 {\it addynamics}.
1456
1457 {\it addummy\_in\_stepping} includes the header files
1458 {\it adcommon.h}.
1459 This header file is also hand-written. It contains
1460 the common blocks
1461 {\bf /addynvars\_r/}, {\bf /addynvars\_cd/},
1462 {\bf /addynvars\_diffkr/}, {\bf /addynvars\_kapgm/},
1463 {\bf /adtr1\_r/}, {\bf /adffields/},
1464 which have been extracted from the adjoint code to enable
1465 access to the adjoint variables.
1466
1467 {\bf WARNING:} If the structure of the common blocks
1468 {\bf /dynvars\_r/}, {\bf /dynvars\_cd/}, etc., changes
1469 similar changes will occur in the adjoint common blocks.
1470 Therefore, consistency between the TAMC-generated common blocks
1471 and those in {\it adcommon.h} have to be checked.
1472 %
1473 \end{itemize}
1474
1475
1476 \subsubsection{Control variable handling for
1477 optimization applications}
1478
1479 In optimization mode the cost function $ {\cal J}(u) $ is sought
1480 to be minimized with respect to a set of control variables
1481 $ \delta {\cal J} \, = \, 0 $, in an iterative manner.
1482 The gradient $ \nabla _{u}{\cal J} |_{u_{[k]}} $ together
1483 with the value of the cost function itself $ {\cal J}(u_{[k]}) $
1484 at iteration step $ k $ serve
1485 as input to a minimization routine (e.g. quasi-Newton method,
1486 conjugate gradient, ... \cite{gil-lem:89})
1487 to compute an update in the
1488 control variable for iteration step $k+1$
1489 \[
1490 u_{[k+1]} \, = \, u_{[0]} \, + \, \Delta u_{[k+1]}
1491 \quad \mbox{satisfying} \quad
1492 {\cal J} \left( u_{[k+1]} \right) \, < \, {\cal J} \left( u_{[k]} \right)
1493 \]
1494 $ u_{[k+1]} $ then serves as input for a forward/adjoint run
1495 to determine $ {\cal J} $ and $ \nabla _{u}{\cal J} $ at iteration step
1496 $ k+1 $.
1497 Tab. \ref{???} sketches the flow between forward/adjoint model
1498 and the minimization routine.
1499
1500 \begin{eqnarray*}
1501 \scriptsize
1502 \begin{array}{ccccc}
1503 u_{[0]} \,\, , \,\, \Delta u_{[k]} & ~ & ~ & ~ & ~ \\
1504 {\Big\downarrow}
1505 & ~ & ~ & ~ & ~ \\
1506 ~ & ~ & ~ & ~ & ~ \\
1507 \hline
1508 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1509 \multicolumn{1}{|c}{
1510 u_{[k]} = u_{[0]} + \Delta u_{[k]}} &
1511 \stackrel{\bf forward}{\bf \longrightarrow} &
1512 v_{[k]} = M \left( u_{[k]} \right) &
1513 \stackrel{\bf forward}{\bf \longrightarrow} &
1514 \multicolumn{1}{c|}{
1515 {\cal J}_{[k]} = {\cal J} \left( M \left( u_{[k]} \right) \right)} \\
1516 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1517 \hline
1518 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1519 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{{\Big\downarrow}} \\
1520 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1521 \hline
1522 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1523 \multicolumn{1}{|c}{
1524 \nabla_u {\cal J}_{[k]} (\delta {\cal J}) =
1525 T^{\ast} \cdot \nabla_v {\cal J} |_{v_{[k]}} (\delta {\cal J})} &
1526 \stackrel{\bf adjoint}{\mathbf \longleftarrow} &
1527 ad \, v_{[k]} (\delta {\cal J}) =
1528 \nabla_v {\cal J} |_{v_{[k]}} (\delta {\cal J}) &
1529 \stackrel{\bf adjoint}{\mathbf \longleftarrow} &
1530 \multicolumn{1}{c|}{ ad \, {\cal J} = \delta {\cal J}} \\
1531 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1532 \hline
1533 ~ & ~ & ~ & ~ & ~ \\
1534 \hspace*{15ex}{\Bigg\downarrow}
1535 \quad {\cal J}_{[k]}, \quad \nabla_u {\cal J}_{[k]}
1536 & ~ & ~ & ~ & ~ \\
1537 ~ & ~ & ~ & ~ & ~ \\
1538 \hline
1539 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1540 \multicolumn{1}{|c}{
1541 {\cal J}_{[k]} \,\, , \,\, \nabla_u {\cal J}_{[k]}} &
1542 {\mathbf \longrightarrow} & \text{\bf minimisation} &
1543 {\mathbf \longrightarrow} &
1544 \multicolumn{1}{c|}{ \Delta u_{[k+1]}} \\
1545 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1546 \hline
1547 ~ & ~ & ~ & ~ & ~ \\
1548 ~ & ~ & ~ & ~ & \Big\downarrow \\
1549 ~ & ~ & ~ & ~ & \Delta u_{[k+1]} \\
1550 \end{array}
1551 \end{eqnarray*}
1552
1553 The routines {\it ctrl\_unpack} and {\it ctrl\_pack} provide
1554 the link between the model and the minimization routine.
1555 As described in Section \ref{???}
1556 the {\it unpack} and {\it pack} routines read and write
1557 control and gradient {\it vectors} which are compressed
1558 to contain only wet points, in addition to the full
1559 2-dim. and 3-dim. fields.
1560 The corresponding I/O flow looks as follows:
1561
1562 \vspace*{0.5cm}
1563
1564 {\scriptsize
1565 \begin{tabular}{ccccc}
1566 {\bf vector\_ctrl\_$<$k$>$ } & ~ & ~ & ~ & ~ \\
1567 {\big\downarrow} & ~ & ~ & ~ & ~ \\
1568 \cline{1-1}
1569 \multicolumn{1}{|c|}{\it ctrl\_unpack} & ~ & ~ & ~ & ~ \\
1570 \cline{1-1}
1571 {\big\downarrow} & ~ & ~ & ~ & ~ \\
1572 \cline{3-3}
1573 \multicolumn{1}{l}{\bf xx\_theta0...$<$k$>$} & ~ &
1574 \multicolumn{1}{|c|}{~} & ~ & ~ \\
1575 \multicolumn{1}{l}{\bf xx\_salt0...$<$k$>$} &
1576 $\stackrel{\mbox{read}}{\longrightarrow}$ &
1577 \multicolumn{1}{|c|}{forward integration} & ~ & ~ \\
1578 \multicolumn{1}{l}{\bf \vdots} & ~ & \multicolumn{1}{|c|}{~}
1579 & ~ & ~ \\
1580 \cline{3-3}
1581 ~ & ~ & $\downarrow$ & ~ & ~ \\
1582 \cline{3-3}
1583 ~ & ~ &
1584 \multicolumn{1}{|c|}{~} & ~ &
1585 \multicolumn{1}{l}{\bf adxx\_theta0...$<$k$>$} \\
1586 ~ & ~ & \multicolumn{1}{|c|}{adjoint integration} &
1587 $\stackrel{\mbox{write}}{\longrightarrow}$ &
1588 \multicolumn{1}{l}{\bf adxx\_salt0...$<$k$>$} \\
1589 ~ & ~ & \multicolumn{1}{|c|}{~}
1590 & ~ & \multicolumn{1}{l}{\bf \vdots} \\
1591 \cline{3-3}
1592 ~ & ~ & ~ & ~ & {\big\downarrow} \\
1593 \cline{5-5}
1594 ~ & ~ & ~ & ~ & \multicolumn{1}{|c|}{\it ctrl\_pack} \\
1595 \cline{5-5}
1596 ~ & ~ & ~ & ~ & {\big\downarrow} \\
1597 ~ & ~ & ~ & ~ & {\bf vector\_grad\_$<$k$>$ } \\
1598 \end{tabular}
1599 }
1600
1601 \vspace*{0.5cm}
1602
1603
1604 {\it ctrl\_unpack} reads the updated control vector
1605 {\bf vector\_ctrl\_$<$k$>$}.
1606 It distributes the different control variables to
1607 2-dim. and 3-dim. files {\it xx\_...$<$k$>$}.
1608 At the start of the forward integration the control variables
1609 are read from {\it xx\_...$<$k$>$} and added to the
1610 field.
1611 Correspondingly, at the end of the adjoint integration
1612 the adjoint fields are written
1613 to {\it adxx\_...$<$k$>$}, again via the active file routines.
1614 Finally, {\it ctrl\_pack} collects all adjoint files
1615 and writes them to the compressed vector file
1616 {\bf vector\_grad\_$<$k$>$}.

  ViewVC Help
Powered by ViewVC 1.1.22