/[MITgcm]/manual/s_autodiff/text/doc_ad_2.tex
ViewVC logotype

Contents of /manual/s_autodiff/text/doc_ad_2.tex

Parent Directory Parent Directory | Revision Log Revision Log | View Revision Graph Revision Graph


Revision 1.16 - (show annotations) (download) (as text)
Tue May 11 21:55:14 2004 UTC (21 years, 2 months ago) by heimbach
Branch: MAIN
Changes since 1.15: +292 -32 lines
File MIME type: application/x-tex
update

1 % $Header: /u/gcmpack/manual/part5/doc_ad_2.tex,v 1.15 2002/04/24 11:01:46 heimbach Exp $
2 % $Name: $
3
4 {\sf Automatic differentiation} (AD), also referred to as algorithmic
5 (or, more loosely, computational) differentiation, involves
6 automatically deriving code to calculate
7 partial derivatives from an existing fully non-linear prognostic code.
8 (see \cite{gri:00}).
9 A software tool is used that parses and transforms source files
10 according to a set of linguistic and mathematical rules.
11 AD tools are like source-to-source translators in that
12 they parse a program code as input and produce a new program code
13 as output.
14 However, unlike a pure source-to-source translation, the output program
15 represents a new algorithm, such as the evaluation of the
16 Jacobian, the Hessian, or higher derivative operators.
17 In principle, a variety of derived algorithms
18 can be generated automatically in this way.
19
20 The MITGCM has been adapted for use with the
21 Tangent linear and Adjoint Model Compiler (TAMC) and its successor TAF
22 (Transformation of Algorithms in Fortran), developed
23 by Ralf Giering (\cite{gie-kam:98}, \cite{gie:99,gie:00}).
24 The first application of the adjoint of the MITGCM for sensitivity
25 studies has been published by \cite{maro-eta:99}.
26 \cite{sta-eta:97,sta-eta:01} use the MITGCM and its adjoint
27 for ocean state estimation studies.
28 In the following we shall refer to TAMC and TAF synonymously,
29 except were explicitly stated otherwise.
30
31 TAMC exploits the chain rule for computing the first
32 derivative of a function with
33 respect to a set of input variables.
34 Treating a given forward code as a composition of operations --
35 each line representing a compositional element, the chain rule is
36 rigorously applied to the code, line by line. The resulting
37 tangent linear or adjoint code,
38 then, may be thought of as the composition in
39 forward or reverse order, respectively, of the
40 Jacobian matrices of the forward code's compositional elements.
41
42 %**********************************************************************
43 \section{Some basic algebra}
44 \label{sec_ad_algebra}
45 %**********************************************************************
46
47 Let $ \cal{M} $ be a general nonlinear, model, i.e. a
48 mapping from the $m$-dimensional space
49 $U \subset I\!\!R^m$ of input variables
50 $\vec{u}=(u_1,\ldots,u_m)$
51 (model parameters, initial conditions, boundary conditions
52 such as forcing functions) to the $n$-dimensional space
53 $V \subset I\!\!R^n$ of
54 model output variable $\vec{v}=(v_1,\ldots,v_n)$
55 (model state, model diagnostics, objective function, ...)
56 under consideration,
57 %
58 \begin{equation}
59 \begin{split}
60 {\cal M} \, : & \, U \,\, \longrightarrow \, V \\
61 ~ & \, \vec{u} \,\, \longmapsto \, \vec{v} \, = \,
62 {\cal M}(\vec{u})
63 \label{fulloperator}
64 \end{split}
65 \end{equation}
66 %
67 The vectors $ \vec{u} \in U $ and $ v \in V $ may be represented w.r.t.
68 some given basis vectors
69 $ {\rm span} (U) = \{ {\vec{e}_i} \}_{i = 1, \ldots , m} $ and
70 $ {\rm span} (V) = \{ {\vec{f}_j} \}_{j = 1, \ldots , n} $ as
71 \[
72 \vec{u} \, = \, \sum_{i=1}^{m} u_i \, {\vec{e}_i},
73 \qquad
74 \vec{v} \, = \, \sum_{j=1}^{n} v_j \, {\vec{f}_j}
75 \]
76
77 Two routes may be followed to determine the sensitivity of the
78 output variable $\vec{v}$ to its input $\vec{u}$.
79
80 \subsection{Forward or direct sensitivity}
81 %
82 Consider a perturbation to the input variables $\delta \vec{u}$
83 (typically a single component
84 $\delta \vec{u} = \delta u_{i} \, {\vec{e}_{i}}$).
85 Their effect on the output may be obtained via the linear
86 approximation of the model $ {\cal M}$ in terms of its Jacobian matrix
87 $ M $, evaluated in the point $u^{(0)}$ according to
88 %
89 \begin{equation}
90 \delta \vec{v} \, = \, M |_{\vec{u}^{(0)}} \, \delta \vec{u}
91 \label{tangent_linear}
92 \end{equation}
93 with resulting output perturbation $\delta \vec{v}$.
94 In components
95 $M_{j i} \, = \, \partial {\cal M}_{j} / \partial u_{i} $,
96 it reads
97 %
98 \begin{equation}
99 \delta v_{j} \, = \, \sum_{i}
100 \left. \frac{\partial {\cal M}_{j}}{\partial u_{i}} \right|_{u^{(0)}} \,
101 \delta u_{i}
102 \label{jacobi_matrix}
103 \end{equation}
104 %
105 Eq. (\ref{tangent_linear}) is the {\sf tangent linear model (TLM)}.
106 In contrast to the full nonlinear model $ {\cal M} $, the operator
107 $ M $ is just a matrix
108 which can readily be used to find the forward sensitivity of $\vec{v}$ to
109 perturbations in $u$,
110 but if there are very many input variables $(\gg O(10^{6})$ for
111 large-scale oceanographic application), it quickly becomes
112 prohibitive to proceed directly as in (\ref{tangent_linear}),
113 if the impact of each component $ {\bf e_{i}} $ is to be assessed.
114
115 \subsection{Reverse or adjoint sensitivity}
116 %
117 Let us consider the special case of a
118 scalar objective function ${\cal J}(\vec{v})$ of the model output (e.g.
119 the total meridional heat transport,
120 the total uptake of $CO_{2}$ in the Southern
121 Ocean over a time interval,
122 or a measure of some model-to-data misfit)
123 %
124 \begin{eqnarray}
125 \begin{array}{cccccc}
126 {\cal J} \, : & U &
127 \longrightarrow & V &
128 \longrightarrow & I \!\! R \\
129 ~ & \vec{u} & \longmapsto & \vec{v}={\cal M}(\vec{u}) &
130 \longmapsto & {\cal J}(\vec{u}) = {\cal J}({\cal M}(\vec{u}))
131 \end{array}
132 \label{compo}
133 \end{eqnarray}
134 %
135 The perturbation of $ {\cal J} $ around a fixed point $ {\cal J}_0 $,
136 \[
137 {\cal J} \, = \, {\cal J}_0 \, + \, \delta {\cal J}
138 \]
139 can be expressed in both bases of $ \vec{u} $ and $ \vec{v} $
140 w.r.t. their corresponding inner product
141 $\left\langle \,\, , \,\, \right\rangle $
142 %
143 \begin{equation}
144 \begin{split}
145 {\cal J} & = \,
146 {\cal J} |_{\vec{u}^{(0)}} \, + \,
147 \left\langle \, \nabla _{u}{\cal J}^T |_{\vec{u}^{(0)}} \, , \, \delta \vec{u} \, \right\rangle
148 \, + \, O(\delta \vec{u}^2) \\
149 ~ & = \,
150 {\cal J} |_{\vec{v}^{(0)}} \, + \,
151 \left\langle \, \nabla _{v}{\cal J}^T |_{\vec{v}^{(0)}} \, , \, \delta \vec{v} \, \right\rangle
152 \, + \, O(\delta \vec{v}^2)
153 \end{split}
154 \label{deljidentity}
155 \end{equation}
156 %
157 (note, that the gradient $ \nabla f $ is a co-vector, therefore
158 its transpose is required in the above inner product).
159 Then, using the representation of
160 $ \delta {\cal J} =
161 \left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle $,
162 the definition
163 of an adjoint operator $ A^{\ast} $ of a given operator $ A $,
164 \[
165 \left\langle \, A^{\ast} \vec{x} \, , \, \vec{y} \, \right\rangle =
166 \left\langle \, \vec{x} \, , \, A \vec{y} \, \right\rangle
167 \]
168 which for finite-dimensional vector spaces is just the
169 transpose of $ A $,
170 \[
171 A^{\ast} \, = \, A^T
172 \]
173 and from eq. (\ref{tangent_linear}), (\ref{deljidentity}),
174 we note that
175 (omitting $|$'s):
176 %
177 \begin{equation}
178 \delta {\cal J}
179 \, = \,
180 \left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle
181 \, = \,
182 \left\langle \, \nabla _{v}{\cal J}^T \, , \, M \, \delta \vec{u} \, \right\rangle
183 \, = \,
184 \left\langle \, M^T \, \nabla _{v}{\cal J}^T \, , \,
185 \delta \vec{u} \, \right\rangle
186 \label{inner}
187 \end{equation}
188 %
189 With the identity (\ref{deljidentity}), we then find that
190 the gradient $ \nabla _{u}{\cal J} $ can be readily inferred by
191 invoking the adjoint $ M^{\ast } $ of the tangent linear model $ M $
192 %
193 \begin{equation}
194 \begin{split}
195 \nabla _{u}{\cal J}^T |_{\vec{u}} &
196 = \, M^T |_{\vec{u}} \cdot \nabla _{v}{\cal J}^T |_{\vec{v}} \\
197 ~ & = \, M^T |_{\vec{u}} \cdot \delta \vec{v}^{\ast} \\
198 ~ & = \, \delta \vec{u}^{\ast}
199 \end{split}
200 \label{adjoint}
201 \end{equation}
202 %
203 Eq. (\ref{adjoint}) is the {\sf adjoint model (ADM)},
204 in which $M^T$ is the adjoint (here, the transpose) of the
205 tangent linear operator $M$, $ \delta \vec{v}^{\ast} $
206 the adjoint variable of the model state $ \vec{v} $, and
207 $ \delta \vec{u}^{\ast} $ the adjoint variable of the control variable $ \vec{u} $.
208
209 The {\sf reverse} nature of the adjoint calculation can be readily
210 seen as follows.
211 Consider a model integration which consists of $ \Lambda $
212 consecutive operations
213 $ {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} (
214 ...... ( {\cal M}_{\lambda} (
215 ......
216 ( {\cal M}_{1} ( {\cal M}_{0}(\vec{u}) )))) $,
217 where the ${\cal M}$'s could be the elementary steps, i.e. single lines
218 in the code of the model, or successive time steps of the
219 model integration,
220 starting at step 0 and moving up to step $\Lambda$, with intermediate
221 ${\cal M}_{\lambda} (\vec{u}) = \vec{v}^{(\lambda+1)}$ and final
222 ${\cal M}_{\Lambda} (\vec{u}) = \vec{v}^{(\Lambda+1)} = \vec{v}$.
223 Let ${\cal J}$ be a cost function which explicitly depends on the
224 final state $\vec{v}$ only
225 (this restriction is for clarity reasons only).
226 %
227 ${\cal J}(u)$ may be decomposed according to:
228 %
229 \begin{equation}
230 {\cal J}({\cal M}(\vec{u})) \, = \,
231 {\cal J} ( {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} (
232 ...... ( {\cal M}_{\lambda} (
233 ......
234 ( {\cal M}_{1} ( {\cal M}_{0}(\vec{u}) )))))
235 \label{compos}
236 \end{equation}
237 %
238 Then, according to the chain rule, the forward calculation reads,
239 in terms of the Jacobi matrices
240 (we've omitted the $ | $'s which, nevertheless are important
241 to the aspect of {\it tangent} linearity;
242 note also that by definition
243 $ \langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \rangle
244 = \nabla_v {\cal J} \cdot \delta \vec{v} $ )
245 %
246 \begin{equation}
247 \begin{split}
248 \nabla_v {\cal J} (M(\delta \vec{u})) & = \,
249 \nabla_v {\cal J} \cdot M_{\Lambda}
250 \cdot ...... \cdot M_{\lambda} \cdot ...... \cdot
251 M_{1} \cdot M_{0} \cdot \delta \vec{u} \\
252 ~ & = \, \nabla_v {\cal J} \cdot \delta \vec{v} \\
253 \end{split}
254 \label{forward}
255 \end{equation}
256 %
257 whereas in reverse mode we have
258 %
259 \begin{equation}
260 \boxed{
261 \begin{split}
262 M^T ( \nabla_v {\cal J}^T) & = \,
263 M_{0}^T \cdot M_{1}^T
264 \cdot ...... \cdot M_{\lambda}^T \cdot ...... \cdot
265 M_{\Lambda}^T \cdot \nabla_v {\cal J}^T \\
266 ~ & = \, M_{0}^T \cdot M_{1}^T
267 \cdot ...... \cdot
268 \nabla_{v^{(\lambda)}} {\cal J}^T \\
269 ~ & = \, \nabla_u {\cal J}^T
270 \end{split}
271 }
272 \label{reverse}
273 \end{equation}
274 %
275 clearly expressing the reverse nature of the calculation.
276 Eq. (\ref{reverse}) is at the heart of automatic adjoint compilers.
277 If the intermediate steps $\lambda$ in
278 eqn. (\ref{compos}) -- (\ref{reverse})
279 represent the model state (forward or adjoint) at each
280 intermediate time step as noted above, then correspondingly,
281 $ M^T (\delta \vec{v}^{(\lambda) \, \ast}) =
282 \delta \vec{v}^{(\lambda-1) \, \ast} $ for the adjoint variables.
283 It thus becomes evident that the adjoint calculation also
284 yields the adjoint of each model state component
285 $ \vec{v}^{(\lambda)} $ at each intermediate step $ \lambda $, namely
286 %
287 \begin{equation}
288 \boxed{
289 \begin{split}
290 \nabla_{v^{(\lambda)}} {\cal J}^T |_{\vec{v}^{(\lambda)}}
291 & = \,
292 M_{\lambda}^T |_{\vec{v}^{(\lambda)}} \cdot ...... \cdot
293 M_{\Lambda}^T |_{\vec{v}^{(\lambda)}} \cdot \delta \vec{v}^{\ast} \\
294 ~ & = \, \delta \vec{v}^{(\lambda) \, \ast}
295 \end{split}
296 }
297 \end{equation}
298 %
299 in close analogy to eq. (\ref{adjoint})
300 We note in passing that that the $\delta \vec{v}^{(\lambda) \, \ast}$
301 are the Lagrange multipliers of the model equations which determine
302 $ \vec{v}^{(\lambda)}$.
303
304 In components, eq. (\ref{adjoint}) reads as follows.
305 Let
306 \[
307 \begin{array}{rclcrcl}
308 \delta \vec{u} & = &
309 \left( \delta u_1,\ldots, \delta u_m \right)^T , & \qquad &
310 \delta \vec{u}^{\ast} \,\, = \,\, \nabla_u {\cal J}^T & = &
311 \left(
312 \frac{\partial {\cal J}}{\partial u_1},\ldots,
313 \frac{\partial {\cal J}}{\partial u_m}
314 \right)^T \\
315 \delta \vec{v} & = &
316 \left( \delta v_1,\ldots, \delta u_n \right)^T , & \qquad &
317 \delta \vec{v}^{\ast} \,\, = \,\, \nabla_v {\cal J}^T & = &
318 \left(
319 \frac{\partial {\cal J}}{\partial v_1},\ldots,
320 \frac{\partial {\cal J}}{\partial v_n}
321 \right)^T \\
322 \end{array}
323 \]
324 denote the perturbations in $\vec{u}$ and $\vec{v}$, respectively,
325 and their adjoint variables;
326 further
327 \[
328 M \, = \, \left(
329 \begin{array}{ccc}
330 \frac{\partial {\cal M}_1}{\partial u_1} & \ldots &
331 \frac{\partial {\cal M}_1}{\partial u_m} \\
332 \vdots & ~ & \vdots \\
333 \frac{\partial {\cal M}_n}{\partial u_1} & \ldots &
334 \frac{\partial {\cal M}_n}{\partial u_m} \\
335 \end{array}
336 \right)
337 \]
338 is the Jacobi matrix of $ {\cal M} $
339 (an $ n \times m $ matrix)
340 such that $ \delta \vec{v} = M \cdot \delta \vec{u} $, or
341 \[
342 \delta v_{j}
343 \, = \, \sum_{i=1}^m M_{ji} \, \delta u_{i}
344 \, = \, \sum_{i=1}^m \, \frac{\partial {\cal M}_{j}}{\partial u_{i}}
345 \delta u_{i}
346 \]
347 %
348 Then eq. (\ref{adjoint}) takes the form
349 \[
350 \delta u_{i}^{\ast}
351 \, = \, \sum_{j=1}^n M_{ji} \, \delta v_{j}^{\ast}
352 \, = \, \sum_{j=1}^n \, \frac{\partial {\cal M}_{j}}{\partial u_{i}}
353 \delta v_{j}^{\ast}
354 \]
355 %
356 or
357 %
358 \[
359 \left(
360 \begin{array}{c}
361 \left. \frac{\partial}{\partial u_1} {\cal J} \right|_{\vec{u}^{(0)}} \\
362 \vdots \\
363 \left. \frac{\partial}{\partial u_m} {\cal J} \right|_{\vec{u}^{(0)}} \\
364 \end{array}
365 \right)
366 \, = \,
367 \left(
368 \begin{array}{ccc}
369 \left. \frac{\partial {\cal M}_1}{\partial u_1} \right|_{\vec{u}^{(0)}}
370 & \ldots &
371 \left. \frac{\partial {\cal M}_n}{\partial u_1} \right|_{\vec{u}^{(0)}} \\
372 \vdots & ~ & \vdots \\
373 \left. \frac{\partial {\cal M}_1}{\partial u_m} \right|_{\vec{u}^{(0)}}
374 & \ldots &
375 \left. \frac{\partial {\cal M}_n}{\partial u_m} \right|_{\vec{u}^{(0)}} \\
376 \end{array}
377 \right)
378 \cdot
379 \left(
380 \begin{array}{c}
381 \left. \frac{\partial}{\partial v_1} {\cal J} \right|_{\vec{v}} \\
382 \vdots \\
383 \left. \frac{\partial}{\partial v_n} {\cal J} \right|_{\vec{v}} \\
384 \end{array}
385 \right)
386 \]
387 %
388 Furthermore, the adjoint $ \delta v^{(\lambda) \, \ast} $
389 of any intermediate state $ v^{(\lambda)} $
390 may be obtained, using the intermediate Jacobian
391 (an $ n_{\lambda+1} \times n_{\lambda} $ matrix)
392 %
393 \[
394 M_{\lambda} \, = \,
395 \left(
396 \begin{array}{ccc}
397 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_1}
398 & \ldots &
399 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
400 \vdots & ~ & \vdots \\
401 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_1}
402 & \ldots &
403 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
404 \end{array}
405 \right)
406 \]
407 %
408 and the shorthand notation for the adjoint variables
409 $ \delta v^{(\lambda) \, \ast}_{j} = \frac{\partial}{\partial v^{(\lambda)}_{j}}
410 {\cal J}^T $, $ j = 1, \ldots , n_{\lambda} $,
411 for intermediate components, yielding
412 \begin{equation}
413 \small
414 \begin{split}
415 \left(
416 \begin{array}{c}
417 \delta v^{(\lambda) \, \ast}_1 \\
418 \vdots \\
419 \delta v^{(\lambda) \, \ast}_{n_{\lambda}} \\
420 \end{array}
421 \right)
422 \, = &
423 \left(
424 \begin{array}{ccc}
425 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_1}
426 & \ldots \,\, \ldots &
427 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_1} \\
428 \vdots & ~ & \vdots \\
429 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_{n_{\lambda}}}
430 & \ldots \,\, \ldots &
431 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
432 \end{array}
433 \right)
434 \cdot
435 %
436 \\ ~ & ~
437 \\ ~ &
438 %
439 \left(
440 \begin{array}{ccc}
441 \frac{\partial ({\cal M}_{\lambda+1})_1}{\partial v^{(\lambda+1)}_1}
442 & \ldots &
443 \frac{\partial ({\cal M}_{\lambda+1})_{n_{\lambda+2}}}{\partial v^{(\lambda+1)}_1} \\
444 \vdots & ~ & \vdots \\
445 \vdots & ~ & \vdots \\
446 \frac{\partial ({\cal M}_{\lambda+1})_1}{\partial v^{(\lambda+1)}_{n_{\lambda+1}}}
447 & \ldots &
448 \frac{\partial ({\cal M}_{\lambda+1})_{n_{\lambda+2}}}{\partial v^{(\lambda+1)}_{n_{\lambda+1}}} \\
449 \end{array}
450 \right)
451 \cdot \, \ldots \, \cdot
452 \left(
453 \begin{array}{c}
454 \delta v^{\ast}_1 \\
455 \vdots \\
456 \delta v^{\ast}_{n} \\
457 \end{array}
458 \right)
459 \end{split}
460 \end{equation}
461
462 Eq. (\ref{forward}) and (\ref{reverse}) are perhaps clearest in
463 showing the advantage of the reverse over the forward mode
464 if the gradient $\nabla _{u}{\cal J}$, i.e. the sensitivity of the
465 cost function $ {\cal J} $ with respect to {\it all} input
466 variables $u$
467 (or the sensitivity of the cost function with respect to
468 {\it all} intermediate states $ \vec{v}^{(\lambda)} $) are sought.
469 In order to be able to solve for each component of the gradient
470 $ \partial {\cal J} / \partial u_{i} $ in (\ref{forward})
471 a forward calculation has to be performed for each component separately,
472 i.e. $ \delta \vec{u} = \delta u_{i} {\vec{e}_{i}} $
473 for the $i$-th forward calculation.
474 Then, (\ref{forward}) represents the
475 projection of $ \nabla_u {\cal J} $ onto the $i$-th component.
476 The full gradient is retrieved from the $ m $ forward calculations.
477 In contrast, eq. (\ref{reverse}) yields the full
478 gradient $\nabla _{u}{\cal J}$ (and all intermediate gradients
479 $\nabla _{v^{(\lambda)}}{\cal J}$) within a single reverse calculation.
480
481 Note, that if $ {\cal J} $ is a vector-valued function
482 of dimension $ l > 1 $,
483 eq. (\ref{reverse}) has to be modified according to
484 \[
485 M^T \left( \nabla_v {\cal J}^T \left(\delta \vec{J}\right) \right)
486 \, = \,
487 \nabla_u {\cal J}^T \cdot \delta \vec{J}
488 \]
489 where now $ \delta \vec{J} \in I\!\!R^l $ is a vector of
490 dimension $ l $.
491 In this case $ l $ reverse simulations have to be performed
492 for each $ \delta J_{k}, \,\, k = 1, \ldots, l $.
493 Then, the reverse mode is more efficient as long as
494 $ l < n $, otherwise the forward mode is preferable.
495 Strictly, the reverse mode is called adjoint mode only for
496 $ l = 1 $.
497
498 A detailed analysis of the underlying numerical operations
499 shows that the computation of $\nabla _{u}{\cal J}$ in this way
500 requires about 2 to 5 times the computation of the cost function.
501 Alternatively, the gradient vector could be approximated
502 by finite differences, requiring $m$ computations
503 of the perturbed cost function.
504
505 To conclude we give two examples of commonly used types
506 of cost functions:
507
508 \paragraph{Example 1:
509 $ {\cal J} = v_{j} (T) $} ~ \\
510 The cost function consists of the $j$-th component of the model state
511 $ \vec{v} $ at time $T$.
512 Then $ \nabla_v {\cal J}^T = {\vec{f}_{j}} $ is just the $j$-th
513 unit vector. The $ \nabla_u {\cal J}^T $
514 is the projection of the adjoint
515 operator onto the $j$-th component ${\bf f_{j}}$,
516 \[
517 \nabla_u {\cal J}^T
518 \, = \, M^T \cdot \nabla_v {\cal J}^T
519 \, = \, \sum_{i} M^T_{ji} \, {\vec{e}_{i}}
520 \]
521
522 \paragraph{Example 2:
523 $ {\cal J} = \langle \, {\cal H}(\vec{v}) - \vec{d} \, ,
524 \, {\cal H}(\vec{v}) - \vec{d} \, \rangle $} ~ \\
525 The cost function represents the quadratic model vs. data misfit.
526 Here, $ \vec{d} $ is the data vector and $ {\cal H} $ represents the
527 operator which maps the model state space onto the data space.
528 Then, $ \nabla_v {\cal J} $ takes the form
529 %
530 \begin{equation*}
531 \begin{split}
532 \nabla_v {\cal J}^T & = \, 2 \, \, H \cdot
533 \left( \, {\cal H}(\vec{v}) - \vec{d} \, \right) \\
534 ~ & = \, 2 \sum_{j} \left\{ \sum_k
535 \frac{\partial {\cal H}_k}{\partial v_{j}}
536 \left( {\cal H}_k (\vec{v}) - d_k \right)
537 \right\} \, {\vec{f}_{j}} \\
538 \end{split}
539 \end{equation*}
540 %
541 where $H_{kj} = \partial {\cal H}_k / \partial v_{j} $ is the
542 Jacobi matrix of the data projection operator.
543 Thus, the gradient $ \nabla_u {\cal J} $ is given by the
544 adjoint operator,
545 driven by the model vs. data misfit:
546 \[
547 \nabla_u {\cal J}^T \, = \, 2 \, M^T \cdot
548 H \cdot \left( {\cal H}(\vec{v}) - \vec{d} \, \right)
549 \]
550
551 \subsection{Storing vs. recomputation in reverse mode}
552 \label{checkpointing}
553
554 We note an important aspect of the forward vs. reverse
555 mode calculation.
556 Because of the local character of the derivative
557 (a derivative is defined w.r.t. a point along the trajectory),
558 the intermediate results of the model trajectory
559 $\vec{v}^{(\lambda+1)}={\cal M}_{\lambda}(v^{(\lambda)})$
560 may be required to evaluate the intermediate Jacobian
561 $M_{\lambda}|_{\vec{v}^{(\lambda)}} \, \delta \vec{v}^{(\lambda)} $.
562 This is the case e.g. for nonlinear expressions
563 (momentum advection, nonlinear equation of state), state-dependent
564 conditional statements (parameterization schemes).
565 In the forward mode, the intermediate results are required
566 in the same order as computed by the full forward model ${\cal M}$,
567 but in the reverse mode they are required in the reverse order.
568 Thus, in the reverse mode the trajectory of the forward model
569 integration ${\cal M}$ has to be stored to be available in the reverse
570 calculation. Alternatively, the complete model state up to the
571 point of evaluation has to be recomputed whenever its value is required.
572
573 A method to balance the amount of recomputations vs.
574 storage requirements is called {\sf checkpointing}
575 (e.g. \cite{gri:92}, \cite{res-eta:98}).
576 It is depicted in \ref{fig:3levelcheck} for a 3-level checkpointing
577 [as an example, we give explicit numbers for a 3-day
578 integration with a 1-hourly timestep in square brackets].
579 \begin{itemize}
580 %
581 \item [$lev3$]
582 In a first step, the model trajectory is subdivided into
583 $ {n}^{lev3} $ subsections [$ {n}^{lev3} $=3 1-day intervals],
584 with the label $lev3$ for this outermost loop.
585 The model is then integrated along the full trajectory,
586 and the model state stored to disk only at every $ k_{i}^{lev3} $-th timestep
587 [i.e. 3 times, at
588 $ i = 0,1,2 $ corresponding to $ k_{i}^{lev3} = 0, 24, 48 $].
589 In addition, the cost function is computed, if needed.
590 %
591 \item [$lev2$]
592 In a second step each subsection itself is divided into
593 $ {n}^{lev2} $ subsections
594 [$ {n}^{lev2} $=4 6-hour intervals per subsection].
595 The model picks up at the last outermost dumped state
596 $ v_{k_{n}^{lev3}} $ and is integrated forward in time along
597 the last subsection, with the label $lev2$ for this
598 intermediate loop.
599 The model state is now stored to disk at every $ k_{i}^{lev2} $-th
600 timestep
601 [i.e. 4 times, at
602 $ i = 0,1,2,3 $ corresponding to $ k_{i}^{lev2} = 48, 54, 60, 66 $].
603 %
604 \item [$lev1$]
605 Finally, the model picks up at the last intermediate dump state
606 $ v_{k_{n}^{lev2}} $ and is integrated forward in time along
607 the last subsection, with the label $lev1$ for this
608 intermediate loop.
609 Within this sub-subsection only, parts of the model state is stored
610 to memory at every timestep
611 [i.e. every hour $ i=0,...,5$ corresponding to
612 $ k_{i}^{lev1} = 66, 67, \ldots, 71 $].
613 The final state $ v_n = v_{k_{n}^{lev1}} $ is reached
614 and the model state of all preceding timesteps along the last
615 innermost subsection are available, enabling integration backwards
616 in time along the last subsection.
617 The adjoint can thus be computed along this last
618 subsection $k_{n}^{lev2}$.
619 %
620 \end{itemize}
621 %
622 This procedure is repeated consecutively for each previous
623 subsection $k_{n-1}^{lev2}, \ldots, k_{1}^{lev2} $
624 carrying the adjoint computation to the initial time
625 of the subsection $k_{n}^{lev3}$.
626 Then, the procedure is repeated for the previous subsection
627 $k_{n-1}^{lev3}$
628 carrying the adjoint computation to the initial time
629 $k_{1}^{lev3}$.
630
631 For the full model trajectory of
632 $ n^{lev3} \cdot n^{lev2} \cdot n^{lev1} $ timesteps
633 the required storing of the model state was significantly reduced to
634 $ n^{lev2} + n^{lev3} $ to disk and roughly $ n^{lev1} $ to memory
635 [i.e. for the 3-day integration with a total oof 72 timesteps
636 the model state was stored 7 times to disk and roughly 6 times
637 to memory].
638 This saving in memory comes at a cost of a required
639 3 full forward integrations of the model (one for each
640 checkpointing level).
641 The optimal balance of storage vs. recomputation certainly depends
642 on the computing resources available and may be adjusted by
643 adjusting the partitioning among the
644 $ n^{lev3}, \,\, n^{lev2}, \,\, n^{lev1} $.
645
646 \begin{figure}[t!]
647 \begin{center}
648 %\psdraft
649 %\psfrag{v_k1^lev3}{\mathinfigure{v_{k_{1}^{lev3}}}}
650 %\psfrag{v_kn-1^lev3}{\mathinfigure{v_{k_{n-1}^{lev3}}}}
651 %\psfrag{v_kn^lev3}{\mathinfigure{v_{k_{n}^{lev3}}}}
652 %\psfrag{v_k1^lev2}{\mathinfigure{v_{k_{1}^{lev2}}}}
653 %\psfrag{v_kn-1^lev2}{\mathinfigure{v_{k_{n-1}^{lev2}}}}
654 %\psfrag{v_kn^lev2}{\mathinfigure{v_{k_{n}^{lev2}}}}
655 %\psfrag{v_k1^lev1}{\mathinfigure{v_{k_{1}^{lev1}}}}
656 %\psfrag{v_kn^lev1}{\mathinfigure{v_{k_{n}^{lev1}}}}
657 %\mbox{\epsfig{file=part5/checkpointing.eps, width=0.8\textwidth}}
658 \resizebox{5.5in}{!}{\includegraphics{part5/checkpointing.eps}}
659 %\psfull
660 \end{center}
661 \caption{
662 Schematic view of intermediate dump and restart for
663 3-level checkpointing.}
664 \label{fig:3levelcheck}
665 \end{figure}
666
667 % \subsection{Optimal perturbations}
668 % \label{sec_optpert}
669
670
671 % \subsection{Error covariance estimate and Hessian matrix}
672 % \label{sec_hessian}
673
674 \newpage
675
676 %**********************************************************************
677 \section{TLM and ADM generation in general}
678 \label{sec_ad_setup_gen}
679 %**********************************************************************
680
681 In this section we describe in a general fashion
682 the parts of the code that are relevant for automatic
683 differentiation using the software tool TAF.
684
685 \input{part5/doc_ad_the_model}
686
687 The basic flow is depicted in \ref{fig:adthemodel}.
688 If CPP option {\tt ALLOW\_AUTODIFF\_TAMC} is defined, the driver routine
689 {\it the\_model\_main}, instead of calling {\it the\_main\_loop},
690 invokes the adjoint of this routine, {\it adthe\_main\_loop},
691 which is the toplevel routine in terms of automatic differentiation.
692 The routine {\it adthe\_main\_loop} has been generated by TAF.
693 It contains both the forward integration of the full model, the
694 cost function calculation,
695 any additional storing that is required for efficient checkpointing,
696 and the reverse integration of the adjoint model.
697
698 [DESCRIBE IN A SEPARATE SECTION THE WORKING OF THE TLM]
699
700 In Fig. \ref{fig:adthemodel}
701 the structure of {\it adthe\_main\_loop} has been strongly
702 simplified to focus on the essentials; in particular, no checkpointing
703 procedures are shown here.
704 Prior to the call of {\it adthe\_main\_loop}, the routine
705 {\it ctrl\_unpack} is invoked to unpack the control vector
706 or initialise the control variables.
707 Following the call of {\it adthe\_main\_loop},
708 the routine {\it ctrl\_pack}
709 is invoked to pack the control vector
710 (cf. Section \ref{section_ctrl}).
711 If gradient checks are to be performed, the option
712 {\tt ALLOW\_GRADIENT\_CHECK} is defined. In this case
713 the driver routine {\it grdchk\_main} is called after
714 the gradient has been computed via the adjoint
715 (cf. Section \ref{section_grdchk}).
716
717 %------------------------------------------------------------------
718
719 \subsection{General setup
720 \label{section_ad_setup}}
721
722 In order to configure AD-related setups the following packages need
723 to be enabled:
724 {\it
725 \begin{table}[h!]
726 \begin{tabular}{l}
727 autodiff \\
728 ctrl \\
729 cost \\
730 grdchk \\
731 \end{tabular}
732 \end{table}
733 }
734 The packages are enabled by adding them to your experiment-specific
735 configuration file
736 {\it packages.conf} (see Section ???).
737
738 The following AD-specific CPP option files need to be customized:
739 %
740 \begin{itemize}
741 %
742 \item {\it ECCO\_CPPOPTIONS.h} \\
743 This header file collects CPP options for the packages
744 {\it autodiff, cost, ctrl} as well as AD-unrelated options for
745 the external forcing package {\it exf}.
746 \footnote{NOTE: These options are not set in their package-specific
747 headers such as {\it COST\_CPPOPTIONS.h}, but are instead collected
748 in the single header file {\it ECCO\_CPPOPTIONS.h}.
749 The package-specific header files serve as simple
750 placeholders at this point.}
751 %
752 \item {\it tamc.h} \\
753 This header configures the splitting of the time stepping loop
754 w.r.t. the 3-level checkpointing (see section ???).
755
756 %
757 \end{itemize}
758
759 %------------------------------------------------------------------
760
761 \subsection{Building the AD code
762 \label{section_ad_build}}
763
764 The build process of an AD code is very similar to building
765 the forward model. However, depending on which AD code one wishes
766 to generate, and on which AD tool is available (TAF or TAMC),
767 the following {\tt make} targets are available:
768
769 \begin{table}[h!]
770 {\footnotesize
771 \begin{tabular}{ccll}
772 ~ & {\it AD-target} & {\it output} & {\it description} \\
773 \hline
774 \hline
775 (1) & {\tt <MODE><TOOL>only} & {\tt <MODE>\_<TOOL>\_output.f} &
776 generates code for $<$MODE$>$ using $<$TOOL$>$ \\
777 ~ & ~ & ~ & no {\tt make} dependencies on {\tt .F .h} \\
778 ~ & ~ & ~ & useful for compiling on remote platforms \\
779 \hline
780 (2) & {\tt <MODE><TOOL>} & {\tt <MODE>\_<TOOL>\_output.f} &
781 generates code for $<$MODE$>$ using $<$TOOL$>$ \\
782 ~ & ~ & ~ & includes {\tt make} dependencies on {\tt .F .h} \\
783 ~ & ~ & ~ & i.e. input for $<$TOOL$>$ may be re-generated \\
784 \hline
785 (3) & {\tt <MODE>all} & {\tt mitgcmuv\_<MODE>} &
786 generates code for $<$MODE$>$ using $<$TOOL$>$ \\
787 ~ & ~ & ~ & and compiles all code \\
788 ~ & ~ & ~ & (use of TAF is set as default) \\
789 \hline
790 \hline
791 \end{tabular}
792 }
793 \end{table}
794 %
795 Here, the following placeholders are used
796 %
797 \begin{itemize}
798 %
799 \item [$<$TOOL$>$]
800 %
801 \begin{itemize}
802 %
803 \item {\tt TAF}
804 \item {\tt TAMC}
805 %
806 \end{itemize}
807 %
808 \item [$<$MODE$>$]
809 %
810 \begin{itemize}
811 %
812 \item {\tt ad} generates the adjoint model (ADM)
813 \item {\tt ftl} generates the tangent linear model (TLM)
814 \item {\tt svd} generates both ADM and TLM for \\
815 singular value decomposition (SVD) type calculations
816 %
817 \end{itemize}
818 %
819 \end{itemize}
820
821 For example, to generate the adjoint model using TAF after routines ({\tt .F})
822 or headers ({\tt .h}) have been modified, but without compilation,
823 type {\tt make adtaf};
824 or, to generate the tangent linear model using TAMC without
825 re-generating the input code, type {\tt make ftltamconly}.
826
827
828 A typical full build process to generate the ADM via TAF would
829 look like follows:
830 \begin{verbatim}
831 % mkdir build
832 % cd build
833 % ../../../tools/genmake2 -mods=../code_ad
834 % make depend
835 % make adall
836 \end{verbatim}
837
838 %------------------------------------------------------------------
839
840 \subsection{The AD build process in detail
841 \label{section_ad_build_detail}}
842
843 The {\tt make <MODE>all} target consists of the following procedures:
844
845 \begin{enumerate}
846 %
847 \item
848 A header file {\tt AD\_CONFIG.h} is generated which contains a CPP option
849 on which code ought to be generated. Depending on the {\tt make} target,
850 the contents is
851 \begin{itemize}
852 \item
853 {\tt \#define ALLOW\_ADJOINT\_RUN}
854 \item
855 {\tt \#define ALLOW\_TANGENTLINEAR\_RUN}
856 \item
857 {\tt \#define ALLOW\_ECCO\_OPTIMIZATION}
858 \end{itemize}
859 %
860 \item
861 A single file {\tt <MODE>\_input\_code.f} is concatenated
862 consisting of all {\tt .f} files that are part of the list {\bf AD\_FILES}
863 and all {\tt .flow} files that are part of the list {\bf AD\_FLOW\_FILES}.
864 %
865 \item
866 The AD tool is invoked with the {\bf <MODE>\_<TOOL>\_FLAGS}.
867 The default AD tool flags in {\tt genmake2} can be overrwritten by
868 an {\tt adjoint\_options} file (similar to the platform-specific
869 {\tt build\_options}, see Section ???.
870 The AD tool writes the resulting AD code into the file
871 {\tt <MODE>\_input\_code\_ad.f}
872 %
873 \item
874 A short sed script {\tt adjoint\_sed} is applied to
875 {\tt <MODE>\_input\_code\_ad.f}
876 to reinstate {\bf myThid} into the CALL argument list of active file I/O.
877 The result is written to file {\tt <MODE>\_<TOOL>\_output.f}.
878 %
879 \item
880 All routines are compiled and an executable is generated
881 (see Table ???).
882 %
883 \end{enumerate}
884
885 \subsubsection{The list AD\_FILES and {\tt .list} files}
886
887 Not all routines are presented to the AD tool.
888 Routines typically hidden are diagnostics routines which
889 do not influence the cost function, but may create
890 artificial flow dependencies such as I/O of active variables.
891
892 {\tt genmake2} generates a list (or variable) {\bf AD\_FILES}
893 which contains all routines that are shown to the AD tool.
894 This list is put together from all files with suffix {\tt .list}
895 that {\tt genmake2} finds in its search directories.
896 The list file for the core MITgcm routines is in {\tt model/src/}
897 is called {\tt model\_ad\_diff.list}.
898 Note that no wrapper routine is shown to TAF. These are either
899 not visible at all to the AD code, or hand-written AD code
900 is available (see next section).
901
902 Each package directory contains its package-specific
903 list file {\tt <PKG>\_ad\_diff.list}. For example,
904 {\tt pkg/ptracers/} contains the file {\tt ptracers\_ad\_diff.list}.
905 Thus, enabling a package will automatically extend the
906 {\bf AD\_FILES} list of {\tt genmake2} to incorporate the
907 package-specific routines.
908 Note that you will need to regenerate the {\tt Makefile} if
909 you enable a package (e.g. by adding it to {\tt packages.conf})
910 and a {\tt Makefile} already exists.
911
912 \subsubsection{The list AD\_FLOW\_FILES and {\tt .flow} files}
913
914 TAMC and TAF can evaluate user-specified directives
915 that start with a specific syntax ({\tt CADJ}, {\tt C\$TAF}, {\tt !\$TAF}).
916 The main categories of directives are STORE directives and
917 FLOW directives. Here, we are concerned with flow directives,
918 store directives are treated elsewhere.
919
920 Flow directives enable the AD tool to evaluate how it should treat
921 routines that are 'hidden' by the user, i.e. routines which are
922 not contained in the {\bf AD\_FILES} list (see previous section),
923 but which are called in part of the code that the AD tool does see.
924 The flow directive tell the AD tool
925 %
926 \begin{itemize}
927 %
928 \item which subroutine arguments are input/output
929 \item which subroutine arguments are active
930 \item which subroutine arguments are required to compute the cost
931 \item which subroutine arguments are dependent
932 %
933 \end{itemize}
934 %
935 The syntax for the flow directives can be found in the
936 AD tool manuals.
937
938 {\tt genmake2} generates a list (or variable) {\bf AD\_FLOW\_FILES}
939 which contains all files with suffix{\tt .flow} that it finds
940 in its search directories.
941 The flow directives for the core MITgcm routines of
942 {\tt eesupp/src/} and {\tt model/src/}
943 reside in {\tt pkg/autodiff/}.
944 This directory also contains hand-written adjoint code
945 for the MITgcm WRAPPER (see Section ???).
946
947 Flow directives for package-specific routines are contained in
948 the corresponding package directories in the file
949 {\tt <PKG>\_ad.flow}, e.g. ptracers-specific directives are in
950 {\tt ptracers\_ad.flow}.
951
952 \subsubsection{Store directives for 3-level checkpointing}
953
954 The storing that is required at each period of the
955 3-level checkpointing is controled by three
956 top-level headers.
957
958 \begin{verbatim}
959 do ilev_3 = 1, nchklev_3
960 # include ``checkpoint_lev3.h''
961 do ilev_2 = 1, nchklev_2
962 # include ``checkpoint_lev2.h''
963 do ilev_1 = 1, nchklev_1
964 # include ``checkpoint_lev1.h''
965
966 ...
967
968 end do
969 end do
970 end do
971 \end{verbatim}
972
973 All files {\tt checkpoint\_lev?.h} are contained in directory
974 {\tt pkg/autodiff/}.
975
976
977 \subsubsection{Changing the default AD tool flags: ad\_options files}
978
979
980 \subsubsection{Hand-written adjoint code}
981
982 %------------------------------------------------------------------
983
984 \subsection{The cost function (dependent variable)
985 \label{section_cost}}
986
987 The cost function $ {\cal J} $ is referred to as the {\sf dependent variable}.
988 It is a function of the input variables $ \vec{u} $ via the composition
989 $ {\cal J}(\vec{u}) \, = \, {\cal J}(M(\vec{u})) $.
990 The input are referred to as the
991 {\sf independent variables} or {\sf control variables}.
992 All aspects relevant to the treatment of the cost function $ {\cal J} $
993 (parameter setting, initialization, accumulation,
994 final evaluation), are controlled by the package {\it pkg/cost}.
995 The aspects relevant to the treatment of the independent variables
996 are controlled by the package {\it pkg/ctrl} and will be treated
997 in the next section.
998
999 \input{part5/doc_cost_flow}
1000
1001 \subsubsection{Enabling the package}
1002
1003 \fbox{
1004 \begin{minipage}{12cm}
1005 {\it packages.conf}, {\it ECCO\_CPPOPTIONS.h}
1006 \end{minipage}
1007 }
1008 \begin{itemize}
1009 %
1010 \item
1011 The package is enabled by adding {\it cost} to your file {\it packages.conf}
1012 (see Section ???)
1013 %
1014 \item
1015
1016
1017 \end{itemize}
1018 %
1019
1020 N.B.: In general the following packages ought to be enabled
1021 simultaneously: {\it autodiff, cost, ctrl}.
1022 The basic CPP option to enable the cost function is {\bf ALLOW\_COST}.
1023 Each specific cost function contribution has its own option.
1024 For the present example the option is {\bf ALLOW\_COST\_TRACER}.
1025 All cost-specific options are set in {\it ECCO\_CPPOPTIONS.h}
1026 Since the cost function is usually used in conjunction with
1027 automatic differentiation, the CPP option
1028 {\bf ALLOW\_ADJOINT\_RUN} (file {\it CPP\_OPTIONS.h}) and
1029 {\bf ALLOW\_AUTODIFF\_TAMC} (file {\it ECCO\_CPPOPTIONS.h})
1030 should be defined.
1031
1032 \subsubsection{Initialization}
1033 %
1034 The initialization of the {\it cost} package is readily enabled
1035 as soon as the CPP option {\bf ALLOW\_COST} is defined.
1036 %
1037 \begin{itemize}
1038 %
1039 \item
1040 \fbox{
1041 \begin{minipage}{12cm}
1042 Parameters: {\it cost\_readparms}
1043 \end{minipage}
1044 }
1045 \\
1046 This S/R
1047 reads runtime flags and parameters from file {\it data.cost}.
1048 For the present example the only relevant parameter read
1049 is {\bf mult\_tracer}. This multiplier enables different
1050 cost function contributions to be switched on
1051 ( = 1.) or off ( = 0.) at runtime.
1052 For more complex cost functions which involve model vs. data
1053 misfits, the corresponding data filenames and data
1054 specifications (start date and time, period, ...) are read
1055 in this S/R.
1056 %
1057 \item
1058 \fbox{
1059 \begin{minipage}{12cm}
1060 Variables: {\it cost\_init}
1061 \end{minipage}
1062 }
1063 \\
1064 This S/R
1065 initializes the different cost function contributions.
1066 The contribution for the present example is {\bf objf\_tracer}
1067 which is defined on each tile (bi,bj).
1068 %
1069 \end{itemize}
1070 %
1071 \subsubsection{Accumulation}
1072 %
1073 \begin{itemize}
1074 %
1075 \item
1076 \fbox{
1077 \begin{minipage}{12cm}
1078 {\it cost\_tile}, {\it cost\_tracer}
1079 \end{minipage}
1080 }
1081 \end{itemize}
1082 %
1083 The 'driver' routine
1084 {\it cost\_tile} is called at the end of each time step.
1085 Within this 'driver' routine, S/R are called for each of
1086 the chosen cost function contributions.
1087 In the present example ({\bf ALLOW\_COST\_TRACER}),
1088 S/R {\it cost\_tracer} is called.
1089 It accumulates {\bf objf\_tracer} according to eqn. (\ref{???}).
1090 %
1091 \subsubsection{Finalize all contributions}
1092 %
1093 \begin{itemize}
1094 %
1095 \item
1096 \fbox{
1097 \begin{minipage}{12cm}
1098 {\it cost\_final}
1099 \end{minipage}
1100 }
1101 \end{itemize}
1102 %
1103 At the end of the forward integration S/R {\it cost\_final}
1104 is called. It accumulates the total cost function {\bf fc}
1105 from each contribution and sums over all tiles:
1106 \begin{equation}
1107 {\cal J} \, = \,
1108 {\rm fc} \, = \,
1109 {\rm mult\_tracer} \sum_{\text{global sum}} \sum_{bi,\,bj}^{nSx,\,nSy}
1110 {\rm objf\_tracer}(bi,bj) \, + \, ...
1111 \end{equation}
1112 %
1113 The total cost function {\bf fc} will be the
1114 'dependent' variable in the argument list for TAMC, i.e.
1115 \begin{verbatim}
1116 tamc -output 'fc' ...
1117 \end{verbatim}
1118
1119 %%%% \end{document}
1120
1121 \input{part5/doc_ad_the_main}
1122
1123 \subsection{The control variables (independent variables)
1124 \label{section_ctrl}}
1125
1126 The control variables are a subset of the model input
1127 (initial conditions, boundary conditions, model parameters).
1128 Here we identify them with the variable $ \vec{u} $.
1129 All intermediate variables whose derivative w.r.t. control
1130 variables do not vanish are called {\sf active variables}.
1131 All subroutines whose derivative w.r.t. the control variables
1132 don't vanish are called {\sf active routines}.
1133 Read and write operations from and to file can be viewed
1134 as variable assignments. Therefore, files to which
1135 active variables are written and from which active variables
1136 are read are called {\sf active files}.
1137 All aspects relevant to the treatment of the control variables
1138 (parameter setting, initialization, perturbation)
1139 are controlled by the package {\it pkg/ctrl}.
1140
1141 \input{part5/doc_ctrl_flow}
1142
1143 \subsubsection{genmake and CPP options}
1144 %
1145 \begin{itemize}
1146 %
1147 \item
1148 \fbox{
1149 \begin{minipage}{12cm}
1150 {\it genmake}, {\it CPP\_OPTIONS.h}, {\it ECCO\_CPPOPTIONS.h}
1151 \end{minipage}
1152 }
1153 \end{itemize}
1154 %
1155 To enable the directory to be included to the compile list,
1156 {\bf ctrl} has to be added to the {\bf enable} list in
1157 {\it .genmakerc} or in {\it genmake} itself (analogous to {\it cost}
1158 package, cf. previous section).
1159 Each control variable is enabled via its own CPP option
1160 in {\it ECCO\_CPPOPTIONS.h}.
1161
1162 \subsubsection{Initialization}
1163 %
1164 \begin{itemize}
1165 %
1166 \item
1167 \fbox{
1168 \begin{minipage}{12cm}
1169 Parameters: {\it ctrl\_readparms}
1170 \end{minipage}
1171 }
1172 \\
1173 %
1174 This S/R
1175 reads runtime flags and parameters from file {\it data.ctrl}.
1176 For the present example the file contains the file names
1177 of each control variable that is used.
1178 In addition, the number of wet points for each control
1179 variable and the net dimension of the space of control
1180 variables (counting wet points only) {\bf nvarlength}
1181 is determined.
1182 Masks for wet points for each tile {\bf (bi,\,bj)}
1183 and vertical layer {\bf k} are generated for the three
1184 relevant categories on the C-grid:
1185 {\bf nWetCtile} for tracer fields,
1186 {\bf nWetWtile} for zonal velocity fields,
1187 {\bf nWetStile} for meridional velocity fields.
1188 %
1189 \item
1190 \fbox{
1191 \begin{minipage}{12cm}
1192 Control variables, control vector,
1193 and their gradients: {\it ctrl\_unpack}
1194 \end{minipage}
1195 }
1196 \\
1197 %
1198 Two important issues related to the handling of the control
1199 variables in the MITGCM need to be addressed.
1200 First, in order to save memory, the control variable arrays
1201 are not kept in memory, but rather read from file and added
1202 to the initial fields during the model initialization phase.
1203 Similarly, the corresponding adjoint fields which represent
1204 the gradient of the cost function w.r.t. the control variables
1205 are written to file at the end of the adjoint integration.
1206 Second, in addition to the files holding the 2-dim. and 3-dim.
1207 control variables and the corresponding cost gradients,
1208 a 1-dim. {\sf control vector}
1209 and {\sf gradient vector} are written to file. They contain
1210 only the wet points of the control variables and the corresponding
1211 gradient.
1212 This leads to a significant data compression.
1213 Furthermore, an option is available
1214 ({\tt ALLOW\_NONDIMENSIONAL\_CONTROL\_IO}) to
1215 non-dimensionalise the control and gradient vector,
1216 which otherwise would contain different pieces of different
1217 magnitudes and units.
1218 Finally, the control and gradient vector can be passed to a
1219 minimization routine if an update of the control variables
1220 is sought as part of a minimization exercise.
1221
1222 The files holding fields and vectors of the control variables
1223 and gradient are generated and initialised in S/R {\it ctrl\_unpack}.
1224 %
1225 \end{itemize}
1226
1227 \subsubsection{Perturbation of the independent variables}
1228 %
1229 The dependency flow for differentiation w.r.t. the controls
1230 starts with adding a perturbation onto the input variable,
1231 thus defining the independent or control variables for TAMC.
1232 Three types of controls may be considered:
1233 %
1234 \begin{itemize}
1235 %
1236 \item
1237 \fbox{
1238 \begin{minipage}{12cm}
1239 {\it ctrl\_map\_ini} (initial value sensitivity):
1240 \end{minipage}
1241 }
1242 \\
1243 %
1244 Consider as an example the initial tracer distribution
1245 {\bf tr1} as control variable.
1246 After {\bf tr1} has been initialised in
1247 {\it ini\_tr1} (dynamical variables such as
1248 temperature and salinity are initialised in {\it ini\_fields}),
1249 a perturbation anomaly is added to the field in S/R
1250 {\it ctrl\_map\_ini}
1251 %
1252 \begin{equation}
1253 \begin{split}
1254 u & = \, u_{[0]} \, + \, \Delta u \\
1255 {\bf tr1}(...) & = \, {\bf tr1_{ini}}(...) \, + \, {\bf xx\_tr1}(...)
1256 \label{perturb}
1257 \end{split}
1258 \end{equation}
1259 %
1260 {\bf xx\_tr1} is a 3-dim. global array
1261 holding the perturbation. In the case of a simple
1262 sensitivity study this array is identical to zero.
1263 However, it's specification is essential in the context
1264 of automatic differentiation since TAMC
1265 treats the corresponding line in the code symbolically
1266 when determining the differentiation chain and its origin.
1267 Thus, the variable names are part of the argument list
1268 when calling TAMC:
1269 %
1270 \begin{verbatim}
1271 tamc -input 'xx_tr1 ...' ...
1272 \end{verbatim}
1273 %
1274 Now, as mentioned above, the MITGCM avoids maintaining
1275 an array for each control variable by reading the
1276 perturbation to a temporary array from file.
1277 To ensure the symbolic link to be recognized by TAMC, a scalar
1278 dummy variable {\bf xx\_tr1\_dummy} is introduced
1279 and an 'active read' routine of the adjoint support
1280 package {\it pkg/autodiff} is invoked.
1281 The read-procedure is tagged with the variable
1282 {\bf xx\_tr1\_dummy} enabling TAMC to recognize the
1283 initialization of the perturbation.
1284 The modified call of TAMC thus reads
1285 %
1286 \begin{verbatim}
1287 tamc -input 'xx_tr1_dummy ...' ...
1288 \end{verbatim}
1289 %
1290 and the modified operation to (\ref{perturb})
1291 in the code takes on the form
1292 %
1293 \begin{verbatim}
1294 call active_read_xyz(
1295 & ..., tmpfld3d, ..., xx_tr1_dummy, ... )
1296
1297 tr1(...) = tr1(...) + tmpfld3d(...)
1298 \end{verbatim}
1299 %
1300 Note, that reading an active variable corresponds
1301 to a variable assignment. Its derivative corresponds
1302 to a write statement of the adjoint variable, followed by
1303 a reset.
1304 The 'active file' routines have been designed
1305 to support active read and corresponding adjoint active write
1306 operations (and vice versa).
1307 %
1308 \item
1309 \fbox{
1310 \begin{minipage}{12cm}
1311 {\it ctrl\_map\_forcing} (boundary value sensitivity):
1312 \end{minipage}
1313 }
1314 \\
1315 %
1316 The handling of boundary values as control variables
1317 proceeds exactly analogous to the initial values
1318 with the symbolic perturbation taking place in S/R
1319 {\it ctrl\_map\_forcing}.
1320 Note however an important difference:
1321 Since the boundary values are time dependent with a new
1322 forcing field applied at each time steps,
1323 the general problem may be thought of as
1324 a new control variable at each time step
1325 (or, if the perturbation is averaged over a certain period,
1326 at each $ N $ timesteps), i.e.
1327 \[
1328 u_{\rm forcing} \, = \,
1329 \{ \, u_{\rm forcing} ( t_n ) \, \}_{
1330 n \, = \, 1, \ldots , {\rm nTimeSteps} }
1331 \]
1332 %
1333 In the current example an equilibrium state is considered,
1334 and only an initial perturbation to
1335 surface forcing is applied with respect to the
1336 equilibrium state.
1337 A time dependent treatment of the surface forcing is
1338 implemented in the ECCO environment, involving the
1339 calendar ({\it cal}~) and external forcing ({\it exf}~) packages.
1340 %
1341 \item
1342 \fbox{
1343 \begin{minipage}{12cm}
1344 {\it ctrl\_map\_params} (parameter sensitivity):
1345 \end{minipage}
1346 }
1347 \\
1348 %
1349 This routine is not yet implemented, but would proceed
1350 proceed along the same lines as the initial value sensitivity.
1351 The mixing parameters {\bf diffkr} and {\bf kapgm}
1352 are currently added as controls in {\it ctrl\_map\_ini.F}.
1353 %
1354 \end{itemize}
1355 %
1356
1357 \subsubsection{Output of adjoint variables and gradient}
1358 %
1359 Several ways exist to generate output of adjoint fields.
1360 %
1361 \begin{itemize}
1362 %
1363 \item
1364 \fbox{
1365 \begin{minipage}{12cm}
1366 {\it ctrl\_map\_ini, ctrl\_map\_forcing}:
1367 \end{minipage}
1368 }
1369 \\
1370 \begin{itemize}
1371 %
1372 \item {\bf xx\_...}: the control variable fields \\
1373 Before the forward integration, the control
1374 variables are read from file {\bf xx\_ ...} and added to
1375 the model field.
1376 %
1377 \item {\bf adxx\_...}: the adjoint variable fields, i.e. the gradient
1378 $ \nabla _{u}{\cal J} $ for each control variable \\
1379 After the adjoint integration the corresponding adjoint
1380 variables are written to {\bf adxx\_ ...}.
1381 %
1382 \end{itemize}
1383 %
1384 \item
1385 \fbox{
1386 \begin{minipage}{12cm}
1387 {\it ctrl\_unpack, ctrl\_pack}:
1388 \end{minipage}
1389 }
1390 \\
1391 %
1392 \begin{itemize}
1393 %
1394 \item {\bf vector\_ctrl}: the control vector \\
1395 At the very beginning of the model initialization,
1396 the updated compressed control vector is read (or initialised)
1397 and distributed to 2-dim. and 3-dim. control variable fields.
1398 %
1399 \item {\bf vector\_grad}: the gradient vector \\
1400 At the very end of the adjoint integration,
1401 the 2-dim. and 3-dim. adjoint variables are read,
1402 compressed to a single vector and written to file.
1403 %
1404 \end{itemize}
1405 %
1406 \item
1407 \fbox{
1408 \begin{minipage}{12cm}
1409 {\it addummy\_in\_stepping}:
1410 \end{minipage}
1411 }
1412 \\
1413 In addition to writing the gradient at the end of the
1414 forward/adjoint integration, many more adjoint variables
1415 of the model state
1416 at intermediate times can be written using S/R
1417 {\it addummy\_in\_stepping}.
1418 This routine is part of the adjoint support package
1419 {\it pkg/autodiff} (cf.f. below).
1420 The procedure is enabled using via the CPP-option
1421 {\bf ALLOW\_AUTODIFF\_MONITOR} (file {\it ECCO\_CPPOPTIONS.h}).
1422 To be part of the adjoint code, the corresponding S/R
1423 {\it dummy\_in\_stepping} has to be called in the forward
1424 model (S/R {\it the\_main\_loop}) at the appropriate place.
1425 The adjoint common blocks are extracted from the adjoint code
1426 via the header file {\it adcommon.h}.
1427
1428 {\it dummy\_in\_stepping} is essentially empty,
1429 the corresponding adjoint routine is hand-written rather
1430 than generated automatically.
1431 Appropriate flow directives ({\it dummy\_in\_stepping.flow})
1432 ensure that TAMC does not automatically
1433 generate {\it addummy\_in\_stepping} by trying to differentiate
1434 {\it dummy\_in\_stepping}, but instead refers to
1435 the hand-written routine.
1436
1437 {\it dummy\_in\_stepping} is called in the forward code
1438 at the beginning of each
1439 timestep, before the call to {\it dynamics}, thus ensuring
1440 that {\it addummy\_in\_stepping} is called at the end of
1441 each timestep in the adjoint calculation, after the call to
1442 {\it addynamics}.
1443
1444 {\it addummy\_in\_stepping} includes the header files
1445 {\it adcommon.h}.
1446 This header file is also hand-written. It contains
1447 the common blocks
1448 {\bf /addynvars\_r/}, {\bf /addynvars\_cd/},
1449 {\bf /addynvars\_diffkr/}, {\bf /addynvars\_kapgm/},
1450 {\bf /adtr1\_r/}, {\bf /adffields/},
1451 which have been extracted from the adjoint code to enable
1452 access to the adjoint variables.
1453
1454 {\bf WARNING:} If the structure of the common blocks
1455 {\bf /dynvars\_r/}, {\bf /dynvars\_cd/}, etc., changes
1456 similar changes will occur in the adjoint common blocks.
1457 Therefore, consistency between the TAMC-generated common blocks
1458 and those in {\it adcommon.h} have to be checked.
1459 %
1460 \end{itemize}
1461
1462
1463 \subsubsection{Control variable handling for
1464 optimization applications}
1465
1466 In optimization mode the cost function $ {\cal J}(u) $ is sought
1467 to be minimized with respect to a set of control variables
1468 $ \delta {\cal J} \, = \, 0 $, in an iterative manner.
1469 The gradient $ \nabla _{u}{\cal J} |_{u_{[k]}} $ together
1470 with the value of the cost function itself $ {\cal J}(u_{[k]}) $
1471 at iteration step $ k $ serve
1472 as input to a minimization routine (e.g. quasi-Newton method,
1473 conjugate gradient, ... \cite{gil-lem:89})
1474 to compute an update in the
1475 control variable for iteration step $k+1$
1476 \[
1477 u_{[k+1]} \, = \, u_{[0]} \, + \, \Delta u_{[k+1]}
1478 \quad \mbox{satisfying} \quad
1479 {\cal J} \left( u_{[k+1]} \right) \, < \, {\cal J} \left( u_{[k]} \right)
1480 \]
1481 $ u_{[k+1]} $ then serves as input for a forward/adjoint run
1482 to determine $ {\cal J} $ and $ \nabla _{u}{\cal J} $ at iteration step
1483 $ k+1 $.
1484 Tab. \ref{???} sketches the flow between forward/adjoint model
1485 and the minimization routine.
1486
1487 \begin{eqnarray*}
1488 \scriptsize
1489 \begin{array}{ccccc}
1490 u_{[0]} \,\, , \,\, \Delta u_{[k]} & ~ & ~ & ~ & ~ \\
1491 {\Big\downarrow}
1492 & ~ & ~ & ~ & ~ \\
1493 ~ & ~ & ~ & ~ & ~ \\
1494 \hline
1495 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1496 \multicolumn{1}{|c}{
1497 u_{[k]} = u_{[0]} + \Delta u_{[k]}} &
1498 \stackrel{\bf forward}{\bf \longrightarrow} &
1499 v_{[k]} = M \left( u_{[k]} \right) &
1500 \stackrel{\bf forward}{\bf \longrightarrow} &
1501 \multicolumn{1}{c|}{
1502 {\cal J}_{[k]} = {\cal J} \left( M \left( u_{[k]} \right) \right)} \\
1503 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1504 \hline
1505 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1506 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{{\Big\downarrow}} \\
1507 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1508 \hline
1509 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1510 \multicolumn{1}{|c}{
1511 \nabla_u {\cal J}_{[k]} (\delta {\cal J}) =
1512 T^{\ast} \cdot \nabla_v {\cal J} |_{v_{[k]}} (\delta {\cal J})} &
1513 \stackrel{\bf adjoint}{\mathbf \longleftarrow} &
1514 ad \, v_{[k]} (\delta {\cal J}) =
1515 \nabla_v {\cal J} |_{v_{[k]}} (\delta {\cal J}) &
1516 \stackrel{\bf adjoint}{\mathbf \longleftarrow} &
1517 \multicolumn{1}{c|}{ ad \, {\cal J} = \delta {\cal J}} \\
1518 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1519 \hline
1520 ~ & ~ & ~ & ~ & ~ \\
1521 \hspace*{15ex}{\Bigg\downarrow}
1522 \quad {\cal J}_{[k]}, \quad \nabla_u {\cal J}_{[k]}
1523 & ~ & ~ & ~ & ~ \\
1524 ~ & ~ & ~ & ~ & ~ \\
1525 \hline
1526 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1527 \multicolumn{1}{|c}{
1528 {\cal J}_{[k]} \,\, , \,\, \nabla_u {\cal J}_{[k]}} &
1529 {\mathbf \longrightarrow} & \text{\bf minimisation} &
1530 {\mathbf \longrightarrow} &
1531 \multicolumn{1}{c|}{ \Delta u_{[k+1]}} \\
1532 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1533 \hline
1534 ~ & ~ & ~ & ~ & ~ \\
1535 ~ & ~ & ~ & ~ & \Big\downarrow \\
1536 ~ & ~ & ~ & ~ & \Delta u_{[k+1]} \\
1537 \end{array}
1538 \end{eqnarray*}
1539
1540 The routines {\it ctrl\_unpack} and {\it ctrl\_pack} provide
1541 the link between the model and the minimization routine.
1542 As described in Section \ref{???}
1543 the {\it unpack} and {\it pack} routines read and write
1544 control and gradient {\it vectors} which are compressed
1545 to contain only wet points, in addition to the full
1546 2-dim. and 3-dim. fields.
1547 The corresponding I/O flow looks as follows:
1548
1549 \vspace*{0.5cm}
1550
1551 {\scriptsize
1552 \begin{tabular}{ccccc}
1553 {\bf vector\_ctrl\_$<$k$>$ } & ~ & ~ & ~ & ~ \\
1554 {\big\downarrow} & ~ & ~ & ~ & ~ \\
1555 \cline{1-1}
1556 \multicolumn{1}{|c|}{\it ctrl\_unpack} & ~ & ~ & ~ & ~ \\
1557 \cline{1-1}
1558 {\big\downarrow} & ~ & ~ & ~ & ~ \\
1559 \cline{3-3}
1560 \multicolumn{1}{l}{\bf xx\_theta0...$<$k$>$} & ~ &
1561 \multicolumn{1}{|c|}{~} & ~ & ~ \\
1562 \multicolumn{1}{l}{\bf xx\_salt0...$<$k$>$} &
1563 $\stackrel{\mbox{read}}{\longrightarrow}$ &
1564 \multicolumn{1}{|c|}{forward integration} & ~ & ~ \\
1565 \multicolumn{1}{l}{\bf \vdots} & ~ & \multicolumn{1}{|c|}{~}
1566 & ~ & ~ \\
1567 \cline{3-3}
1568 ~ & ~ & $\downarrow$ & ~ & ~ \\
1569 \cline{3-3}
1570 ~ & ~ &
1571 \multicolumn{1}{|c|}{~} & ~ &
1572 \multicolumn{1}{l}{\bf adxx\_theta0...$<$k$>$} \\
1573 ~ & ~ & \multicolumn{1}{|c|}{adjoint integration} &
1574 $\stackrel{\mbox{write}}{\longrightarrow}$ &
1575 \multicolumn{1}{l}{\bf adxx\_salt0...$<$k$>$} \\
1576 ~ & ~ & \multicolumn{1}{|c|}{~}
1577 & ~ & \multicolumn{1}{l}{\bf \vdots} \\
1578 \cline{3-3}
1579 ~ & ~ & ~ & ~ & {\big\downarrow} \\
1580 \cline{5-5}
1581 ~ & ~ & ~ & ~ & \multicolumn{1}{|c|}{\it ctrl\_pack} \\
1582 \cline{5-5}
1583 ~ & ~ & ~ & ~ & {\big\downarrow} \\
1584 ~ & ~ & ~ & ~ & {\bf vector\_grad\_$<$k$>$ } \\
1585 \end{tabular}
1586 }
1587
1588 \vspace*{0.5cm}
1589
1590
1591 {\it ctrl\_unpack} reads the updated control vector
1592 {\bf vector\_ctrl\_$<$k$>$}.
1593 It distributes the different control variables to
1594 2-dim. and 3-dim. files {\it xx\_...$<$k$>$}.
1595 At the start of the forward integration the control variables
1596 are read from {\it xx\_...$<$k$>$} and added to the
1597 field.
1598 Correspondingly, at the end of the adjoint integration
1599 the adjoint fields are written
1600 to {\it adxx\_...$<$k$>$}, again via the active file routines.
1601 Finally, {\it ctrl\_pack} collects all adjoint files
1602 and writes them to the compressed vector file
1603 {\bf vector\_grad\_$<$k$>$}.

  ViewVC Help
Powered by ViewVC 1.1.22