/[MITgcm]/manual/s_autodiff/text/doc_ad_2.tex
ViewVC logotype

Contents of /manual/s_autodiff/text/doc_ad_2.tex

Parent Directory Parent Directory | Revision Log Revision Log | View Revision Graph Revision Graph


Revision 1.24 - (show annotations) (download) (as text)
Tue Aug 31 20:56:21 2010 UTC (14 years, 10 months ago) by jmc
Branch: MAIN
CVS Tags: checkpoint01, HEAD
Changes since 1.23: +4 -5 lines
File MIME type: application/x-tex
change "\ref{???}" to "ref:ask-the-author" so that we know it's unfinished
but not broken (+ avoid latex warnings).

1 % $Header: /u/gcmpack/manual/s_autodiff/text/doc_ad_2.tex,v 1.23 2010/08/30 23:09:19 jmc Exp $
2 % $Name: $
3
4 Author: Patrick Heimbach
5
6 {\sf Automatic differentiation} (AD), also referred to as algorithmic
7 (or, more loosely, computational) differentiation, involves
8 automatically deriving code to calculate partial derivatives from an
9 existing fully non-linear prognostic code. (see \cite{gri:00}). A
10 software tool is used that parses and transforms source files
11 according to a set of linguistic and mathematical rules. AD tools are
12 like source-to-source translators in that they parse a program code as
13 input and produce a new program code as output
14 (we restrict our discussion to source-to-source tools, ignoring
15 operator-overloading tools). However, unlike a
16 pure source-to-source translation, the output program represents a new
17 algorithm, such as the evaluation of the Jacobian, the Hessian, or
18 higher derivative operators. In principle, a variety of derived
19 algorithms can be generated automatically in this way.
20
21 MITgcm has been adapted for use with the Tangent linear and Adjoint
22 Model Compiler (TAMC) and its successor TAF (Transformation of
23 Algorithms in Fortran), developed by Ralf Giering (\cite{gie-kam:98},
24 \cite{gie:99,gie:00}). The first application of the adjoint of MITgcm
25 for sensitivity studies has been published by \cite{maro-eta:99}.
26 \cite{stam-etal:97,stam-etal:02} use MITgcm and its adjoint for ocean
27 state estimation studies. In the following we shall refer to TAMC and
28 TAF synonymously, except were explicitly stated otherwise.
29
30 As of mid-2007 we are also able to generate fairly efficient
31 adjoint code of the MITgcm using a new, open-source AD tool,
32 called OpenAD (see \cite{naum-etal:06,utke-etal:08}.
33 This enables us for the first time to compare adjoint models
34 generated from different AD tools, providing an additional
35 accuracy check, complementary to finite-difference gradient checks.
36 OpenAD and its application to MITgcm is described in detail
37 in section \ref{sec_ad_openad}.
38
39 The AD tool exploits the chain rule for computing the first derivative of a
40 function with respect to a set of input variables. Treating a given
41 forward code as a composition of operations -- each line representing
42 a compositional element, the chain rule is rigorously applied to the
43 code, line by line. The resulting tangent linear or adjoint code,
44 then, may be thought of as the composition in forward or reverse
45 order, respectively, of the Jacobian matrices of the forward code's
46 compositional elements.
47
48 %**********************************************************************
49 \section{Some basic algebra}
50 \label{sec_ad_algebra}
51 \begin{rawhtml}
52 <!-- CMIREDIR:sec_ad_algebra: -->
53 \end{rawhtml}
54 %**********************************************************************
55
56 Let $ \cal{M} $ be a general nonlinear, model, i.e. a
57 mapping from the $m$-dimensional space
58 $U \subset I\!\!R^m$ of input variables
59 $\vec{u}=(u_1,\ldots,u_m)$
60 (model parameters, initial conditions, boundary conditions
61 such as forcing functions) to the $n$-dimensional space
62 $V \subset I\!\!R^n$ of
63 model output variable $\vec{v}=(v_1,\ldots,v_n)$
64 (model state, model diagnostics, objective function, ...)
65 under consideration,
66 %
67 \begin{equation}
68 \begin{aligned}
69 {\cal M} \, : & \, U \,\, \longrightarrow \, V \\
70 ~ & \, \vec{u} \,\, \longmapsto \, \vec{v} \, = \,
71 {\cal M}(\vec{u})
72 \label{fulloperator}
73 \end{aligned}
74 \end{equation}
75 %
76 The vectors $ \vec{u} \in U $ and $ v \in V $ may be represented w.r.t.
77 some given basis vectors
78 $ {\rm span} (U) = \{ {\vec{e}_i} \}_{i = 1, \ldots , m} $ and
79 $ {\rm span} (V) = \{ {\vec{f}_j} \}_{j = 1, \ldots , n} $ as
80 \[
81 \vec{u} \, = \, \sum_{i=1}^{m} u_i \, {\vec{e}_i},
82 \qquad
83 \vec{v} \, = \, \sum_{j=1}^{n} v_j \, {\vec{f}_j}
84 \]
85
86 Two routes may be followed to determine the sensitivity of the
87 output variable $\vec{v}$ to its input $\vec{u}$.
88
89 \subsection{Forward or direct sensitivity}
90 %
91 Consider a perturbation to the input variables $\delta \vec{u}$
92 (typically a single component
93 $\delta \vec{u} = \delta u_{i} \, {\vec{e}_{i}}$).
94 Their effect on the output may be obtained via the linear
95 approximation of the model $ {\cal M}$ in terms of its Jacobian matrix
96 $ M $, evaluated in the point $u^{(0)}$ according to
97 %
98 \begin{equation}
99 \delta \vec{v} \, = \, M |_{\vec{u}^{(0)}} \, \delta \vec{u}
100 \label{tangent_linear}
101 \end{equation}
102 with resulting output perturbation $\delta \vec{v}$.
103 In components
104 $M_{j i} \, = \, \partial {\cal M}_{j} / \partial u_{i} $,
105 it reads
106 %
107 \begin{equation}
108 \delta v_{j} \, = \, \sum_{i}
109 \left. \frac{\partial {\cal M}_{j}}{\partial u_{i}} \right|_{u^{(0)}} \,
110 \delta u_{i}
111 \label{jacobi_matrix}
112 \end{equation}
113 %
114 Eq. (\ref{tangent_linear}) is the {\sf tangent linear model (TLM)}.
115 In contrast to the full nonlinear model $ {\cal M} $, the operator
116 $ M $ is just a matrix
117 which can readily be used to find the forward sensitivity of $\vec{v}$ to
118 perturbations in $u$,
119 but if there are very many input variables $(\gg O(10^{6})$ for
120 large-scale oceanographic application), it quickly becomes
121 prohibitive to proceed directly as in (\ref{tangent_linear}),
122 if the impact of each component $ {\bf e_{i}} $ is to be assessed.
123
124 \subsection{Reverse or adjoint sensitivity}
125 %
126 Let us consider the special case of a
127 scalar objective function ${\cal J}(\vec{v})$ of the model output (e.g.
128 the total meridional heat transport,
129 the total uptake of $CO_{2}$ in the Southern
130 Ocean over a time interval,
131 or a measure of some model-to-data misfit)
132 %
133 \begin{eqnarray}
134 \begin{array}{cccccc}
135 {\cal J} \, : & U &
136 \longrightarrow & V &
137 \longrightarrow & I \!\! R \\
138 ~ & \vec{u} & \longmapsto & \vec{v}={\cal M}(\vec{u}) &
139 \longmapsto & {\cal J}(\vec{u}) = {\cal J}({\cal M}(\vec{u}))
140 \end{array}
141 \label{compo}
142 \end{eqnarray}
143 %
144 The perturbation of $ {\cal J} $ around a fixed point $ {\cal J}_0 $,
145 \[
146 {\cal J} \, = \, {\cal J}_0 \, + \, \delta {\cal J}
147 \]
148 can be expressed in both bases of $ \vec{u} $ and $ \vec{v} $
149 w.r.t. their corresponding inner product
150 $\left\langle \,\, , \,\, \right\rangle $
151 %
152 \begin{equation}
153 \begin{aligned}
154 {\cal J} & = \,
155 {\cal J} |_{\vec{u}^{(0)}} \, + \,
156 \left\langle \, \nabla _{u}{\cal J}^T |_{\vec{u}^{(0)}} \, , \, \delta \vec{u} \, \right\rangle
157 \, + \, O(\delta \vec{u}^2) \\
158 ~ & = \,
159 {\cal J} |_{\vec{v}^{(0)}} \, + \,
160 \left\langle \, \nabla _{v}{\cal J}^T |_{\vec{v}^{(0)}} \, , \, \delta \vec{v} \, \right\rangle
161 \, + \, O(\delta \vec{v}^2)
162 \end{aligned}
163 \label{deljidentity}
164 \end{equation}
165 %
166 (note, that the gradient $ \nabla f $ is a co-vector, therefore
167 its transpose is required in the above inner product).
168 Then, using the representation of
169 $ \delta {\cal J} =
170 \left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle $,
171 the definition
172 of an adjoint operator $ A^{\ast} $ of a given operator $ A $,
173 \[
174 \left\langle \, A^{\ast} \vec{x} \, , \, \vec{y} \, \right\rangle =
175 \left\langle \, \vec{x} \, , \, A \vec{y} \, \right\rangle
176 \]
177 which for finite-dimensional vector spaces is just the
178 transpose of $ A $,
179 \[
180 A^{\ast} \, = \, A^T
181 \]
182 and from eq. (\ref{tangent_linear}), (\ref{deljidentity}),
183 we note that
184 (omitting $|$'s):
185 %
186 \begin{equation}
187 \delta {\cal J}
188 \, = \,
189 \left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle
190 \, = \,
191 \left\langle \, \nabla _{v}{\cal J}^T \, , \, M \, \delta \vec{u} \, \right\rangle
192 \, = \,
193 \left\langle \, M^T \, \nabla _{v}{\cal J}^T \, , \,
194 \delta \vec{u} \, \right\rangle
195 \label{inner}
196 \end{equation}
197 %
198 With the identity (\ref{deljidentity}), we then find that
199 the gradient $ \nabla _{u}{\cal J} $ can be readily inferred by
200 invoking the adjoint $ M^{\ast } $ of the tangent linear model $ M $
201 %
202 \begin{equation}
203 \begin{aligned}
204 \nabla _{u}{\cal J}^T |_{\vec{u}} &
205 = \, M^T |_{\vec{u}} \cdot \nabla _{v}{\cal J}^T |_{\vec{v}} \\
206 ~ & = \, M^T |_{\vec{u}} \cdot \delta \vec{v}^{\ast} \\
207 ~ & = \, \delta \vec{u}^{\ast}
208 \end{aligned}
209 \label{adjoint}
210 \end{equation}
211 %
212 Eq. (\ref{adjoint}) is the {\sf adjoint model (ADM)},
213 in which $M^T$ is the adjoint (here, the transpose) of the
214 tangent linear operator $M$, $ \delta \vec{v}^{\ast} $
215 the adjoint variable of the model state $ \vec{v} $, and
216 $ \delta \vec{u}^{\ast} $ the adjoint variable of the control variable $ \vec{u} $.
217
218 The {\sf reverse} nature of the adjoint calculation can be readily
219 seen as follows.
220 Consider a model integration which consists of $ \Lambda $
221 consecutive operations
222 $ {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} (
223 ...... ( {\cal M}_{\lambda} (
224 ......
225 ( {\cal M}_{1} ( {\cal M}_{0}(\vec{u}) )))) $,
226 where the ${\cal M}$'s could be the elementary steps, i.e. single lines
227 in the code of the model, or successive time steps of the
228 model integration,
229 starting at step 0 and moving up to step $\Lambda$, with intermediate
230 ${\cal M}_{\lambda} (\vec{u}) = \vec{v}^{(\lambda+1)}$ and final
231 ${\cal M}_{\Lambda} (\vec{u}) = \vec{v}^{(\Lambda+1)} = \vec{v}$.
232 Let ${\cal J}$ be a cost function which explicitly depends on the
233 final state $\vec{v}$ only
234 (this restriction is for clarity reasons only).
235 %
236 ${\cal J}(u)$ may be decomposed according to:
237 %
238 \begin{equation}
239 {\cal J}({\cal M}(\vec{u})) \, = \,
240 {\cal J} ( {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} (
241 ...... ( {\cal M}_{\lambda} (
242 ......
243 ( {\cal M}_{1} ( {\cal M}_{0}(\vec{u}) )))))
244 \label{compos}
245 \end{equation}
246 %
247 Then, according to the chain rule, the forward calculation reads,
248 in terms of the Jacobi matrices
249 (we've omitted the $ | $'s which, nevertheless are important
250 to the aspect of {\it tangent} linearity;
251 note also that by definition
252 $ \langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \rangle
253 = \nabla_v {\cal J} \cdot \delta \vec{v} $ )
254 %
255 \begin{equation}
256 \begin{aligned}
257 \nabla_v {\cal J} (M(\delta \vec{u})) & = \,
258 \nabla_v {\cal J} \cdot M_{\Lambda}
259 \cdot ...... \cdot M_{\lambda} \cdot ...... \cdot
260 M_{1} \cdot M_{0} \cdot \delta \vec{u} \\
261 ~ & = \, \nabla_v {\cal J} \cdot \delta \vec{v} \\
262 \end{aligned}
263 \label{forward}
264 \end{equation}
265 %
266 whereas in reverse mode we have
267 %
268 \begin{equation}
269 \boxed{
270 \begin{aligned}
271 M^T ( \nabla_v {\cal J}^T) & = \,
272 M_{0}^T \cdot M_{1}^T
273 \cdot ...... \cdot M_{\lambda}^T \cdot ...... \cdot
274 M_{\Lambda}^T \cdot \nabla_v {\cal J}^T \\
275 ~ & = \, M_{0}^T \cdot M_{1}^T
276 \cdot ...... \cdot
277 \nabla_{v^{(\lambda)}} {\cal J}^T \\
278 ~ & = \, \nabla_u {\cal J}^T
279 \end{aligned}
280 }
281 \label{reverse}
282 \end{equation}
283 %
284 clearly expressing the reverse nature of the calculation.
285 Eq. (\ref{reverse}) is at the heart of automatic adjoint compilers.
286 If the intermediate steps $\lambda$ in
287 eqn. (\ref{compos}) -- (\ref{reverse})
288 represent the model state (forward or adjoint) at each
289 intermediate time step as noted above, then correspondingly,
290 $ M^T (\delta \vec{v}^{(\lambda) \, \ast}) =
291 \delta \vec{v}^{(\lambda-1) \, \ast} $ for the adjoint variables.
292 It thus becomes evident that the adjoint calculation also
293 yields the adjoint of each model state component
294 $ \vec{v}^{(\lambda)} $ at each intermediate step $ \lambda $, namely
295 %
296 \begin{equation}
297 \boxed{
298 \begin{aligned}
299 \nabla_{v^{(\lambda)}} {\cal J}^T |_{\vec{v}^{(\lambda)}}
300 & = \,
301 M_{\lambda}^T |_{\vec{v}^{(\lambda)}} \cdot ...... \cdot
302 M_{\Lambda}^T |_{\vec{v}^{(\lambda)}} \cdot \delta \vec{v}^{\ast} \\
303 ~ & = \, \delta \vec{v}^{(\lambda) \, \ast}
304 \end{aligned}
305 }
306 \end{equation}
307 %
308 in close analogy to eq. (\ref{adjoint})
309 We note in passing that that the $\delta \vec{v}^{(\lambda) \, \ast}$
310 are the Lagrange multipliers of the model equations which determine
311 $ \vec{v}^{(\lambda)}$.
312
313 In components, eq. (\ref{adjoint}) reads as follows.
314 Let
315 \[
316 \begin{array}{rclcrcl}
317 \delta \vec{u} & = &
318 \left( \delta u_1,\ldots, \delta u_m \right)^T , & \qquad &
319 \delta \vec{u}^{\ast} \,\, = \,\, \nabla_u {\cal J}^T & = &
320 \left(
321 \frac{\partial {\cal J}}{\partial u_1},\ldots,
322 \frac{\partial {\cal J}}{\partial u_m}
323 \right)^T \\
324 \delta \vec{v} & = &
325 \left( \delta v_1,\ldots, \delta u_n \right)^T , & \qquad &
326 \delta \vec{v}^{\ast} \,\, = \,\, \nabla_v {\cal J}^T & = &
327 \left(
328 \frac{\partial {\cal J}}{\partial v_1},\ldots,
329 \frac{\partial {\cal J}}{\partial v_n}
330 \right)^T \\
331 \end{array}
332 \]
333 denote the perturbations in $\vec{u}$ and $\vec{v}$, respectively,
334 and their adjoint variables;
335 further
336 \[
337 M \, = \, \left(
338 \begin{array}{ccc}
339 \frac{\partial {\cal M}_1}{\partial u_1} & \ldots &
340 \frac{\partial {\cal M}_1}{\partial u_m} \\
341 \vdots & ~ & \vdots \\
342 \frac{\partial {\cal M}_n}{\partial u_1} & \ldots &
343 \frac{\partial {\cal M}_n}{\partial u_m} \\
344 \end{array}
345 \right)
346 \]
347 is the Jacobi matrix of $ {\cal M} $
348 (an $ n \times m $ matrix)
349 such that $ \delta \vec{v} = M \cdot \delta \vec{u} $, or
350 \[
351 \delta v_{j}
352 \, = \, \sum_{i=1}^m M_{ji} \, \delta u_{i}
353 \, = \, \sum_{i=1}^m \, \frac{\partial {\cal M}_{j}}{\partial u_{i}}
354 \delta u_{i}
355 \]
356 %
357 Then eq. (\ref{adjoint}) takes the form
358 \[
359 \delta u_{i}^{\ast}
360 \, = \, \sum_{j=1}^n M_{ji} \, \delta v_{j}^{\ast}
361 \, = \, \sum_{j=1}^n \, \frac{\partial {\cal M}_{j}}{\partial u_{i}}
362 \delta v_{j}^{\ast}
363 \]
364 %
365 or
366 %
367 \[
368 \left(
369 \begin{array}{c}
370 \left. \frac{\partial}{\partial u_1} {\cal J} \right|_{\vec{u}^{(0)}} \\
371 \vdots \\
372 \left. \frac{\partial}{\partial u_m} {\cal J} \right|_{\vec{u}^{(0)}} \\
373 \end{array}
374 \right)
375 \, = \,
376 \left(
377 \begin{array}{ccc}
378 \left. \frac{\partial {\cal M}_1}{\partial u_1} \right|_{\vec{u}^{(0)}}
379 & \ldots &
380 \left. \frac{\partial {\cal M}_n}{\partial u_1} \right|_{\vec{u}^{(0)}} \\
381 \vdots & ~ & \vdots \\
382 \left. \frac{\partial {\cal M}_1}{\partial u_m} \right|_{\vec{u}^{(0)}}
383 & \ldots &
384 \left. \frac{\partial {\cal M}_n}{\partial u_m} \right|_{\vec{u}^{(0)}} \\
385 \end{array}
386 \right)
387 \cdot
388 \left(
389 \begin{array}{c}
390 \left. \frac{\partial}{\partial v_1} {\cal J} \right|_{\vec{v}} \\
391 \vdots \\
392 \left. \frac{\partial}{\partial v_n} {\cal J} \right|_{\vec{v}} \\
393 \end{array}
394 \right)
395 \]
396 %
397 Furthermore, the adjoint $ \delta v^{(\lambda) \, \ast} $
398 of any intermediate state $ v^{(\lambda)} $
399 may be obtained, using the intermediate Jacobian
400 (an $ n_{\lambda+1} \times n_{\lambda} $ matrix)
401 %
402 \[
403 M_{\lambda} \, = \,
404 \left(
405 \begin{array}{ccc}
406 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_1}
407 & \ldots &
408 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
409 \vdots & ~ & \vdots \\
410 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_1}
411 & \ldots &
412 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
413 \end{array}
414 \right)
415 \]
416 %
417 and the shorthand notation for the adjoint variables
418 $ \delta v^{(\lambda) \, \ast}_{j} = \frac{\partial}{\partial v^{(\lambda)}_{j}}
419 {\cal J}^T $, $ j = 1, \ldots , n_{\lambda} $,
420 for intermediate components, yielding
421 {\small
422 \begin{equation}
423 \begin{aligned}
424 \left(
425 \begin{array}{c}
426 \delta v^{(\lambda) \, \ast}_1 \\
427 \vdots \\
428 \delta v^{(\lambda) \, \ast}_{n_{\lambda}} \\
429 \end{array}
430 \right)
431 \, = &
432 \left(
433 \begin{array}{ccc}
434 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_1}
435 & \ldots \,\, \ldots &
436 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_1} \\
437 \vdots & ~ & \vdots \\
438 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_{n_{\lambda}}}
439 & \ldots \,\, \ldots &
440 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
441 \end{array}
442 \right)
443 \cdot
444 %
445 \\ ~ & ~
446 \\ ~ &
447 %
448 \left(
449 \begin{array}{ccc}
450 \frac{\partial ({\cal M}_{\lambda+1})_1}{\partial v^{(\lambda+1)}_1}
451 & \ldots &
452 \frac{\partial ({\cal M}_{\lambda+1})_{n_{\lambda+2}}}{\partial v^{(\lambda+1)}_1} \\
453 \vdots & ~ & \vdots \\
454 \vdots & ~ & \vdots \\
455 \frac{\partial ({\cal M}_{\lambda+1})_1}{\partial v^{(\lambda+1)}_{n_{\lambda+1}}}
456 & \ldots &
457 \frac{\partial ({\cal M}_{\lambda+1})_{n_{\lambda+2}}}{\partial v^{(\lambda+1)}_{n_{\lambda+1}}} \\
458 \end{array}
459 \right)
460 \cdot \, \ldots \, \cdot
461 \left(
462 \begin{array}{c}
463 \delta v^{\ast}_1 \\
464 \vdots \\
465 \delta v^{\ast}_{n} \\
466 \end{array}
467 \right)
468 \end{aligned}
469 \end{equation}
470 }
471
472 Eq. (\ref{forward}) and (\ref{reverse}) are perhaps clearest in
473 showing the advantage of the reverse over the forward mode
474 if the gradient $\nabla _{u}{\cal J}$, i.e. the sensitivity of the
475 cost function $ {\cal J} $ with respect to {\it all} input
476 variables $u$
477 (or the sensitivity of the cost function with respect to
478 {\it all} intermediate states $ \vec{v}^{(\lambda)} $) are sought.
479 In order to be able to solve for each component of the gradient
480 $ \partial {\cal J} / \partial u_{i} $ in (\ref{forward})
481 a forward calculation has to be performed for each component separately,
482 i.e. $ \delta \vec{u} = \delta u_{i} {\vec{e}_{i}} $
483 for the $i$-th forward calculation.
484 Then, (\ref{forward}) represents the
485 projection of $ \nabla_u {\cal J} $ onto the $i$-th component.
486 The full gradient is retrieved from the $ m $ forward calculations.
487 In contrast, eq. (\ref{reverse}) yields the full
488 gradient $\nabla _{u}{\cal J}$ (and all intermediate gradients
489 $\nabla _{v^{(\lambda)}}{\cal J}$) within a single reverse calculation.
490
491 Note, that if $ {\cal J} $ is a vector-valued function
492 of dimension $ l > 1 $,
493 eq. (\ref{reverse}) has to be modified according to
494 \[
495 M^T \left( \nabla_v {\cal J}^T \left(\delta \vec{J}\right) \right)
496 \, = \,
497 \nabla_u {\cal J}^T \cdot \delta \vec{J}
498 \]
499 where now $ \delta \vec{J} \in I\!\!R^l $ is a vector of
500 dimension $ l $.
501 In this case $ l $ reverse simulations have to be performed
502 for each $ \delta J_{k}, \,\, k = 1, \ldots, l $.
503 Then, the reverse mode is more efficient as long as
504 $ l < n $, otherwise the forward mode is preferable.
505 Strictly, the reverse mode is called adjoint mode only for
506 $ l = 1 $.
507
508 A detailed analysis of the underlying numerical operations
509 shows that the computation of $\nabla _{u}{\cal J}$ in this way
510 requires about 2 to 5 times the computation of the cost function.
511 Alternatively, the gradient vector could be approximated
512 by finite differences, requiring $m$ computations
513 of the perturbed cost function.
514
515 To conclude we give two examples of commonly used types
516 of cost functions:
517
518 \paragraph{Example 1:
519 $ {\cal J} = v_{j} (T) $} ~ \\
520 The cost function consists of the $j$-th component of the model state
521 $ \vec{v} $ at time $T$.
522 Then $ \nabla_v {\cal J}^T = {\vec{f}_{j}} $ is just the $j$-th
523 unit vector. The $ \nabla_u {\cal J}^T $
524 is the projection of the adjoint
525 operator onto the $j$-th component ${\bf f_{j}}$,
526 \[
527 \nabla_u {\cal J}^T
528 \, = \, M^T \cdot \nabla_v {\cal J}^T
529 \, = \, \sum_{i} M^T_{ji} \, {\vec{e}_{i}}
530 \]
531
532 \paragraph{Example 2:
533 $ {\cal J} = \langle \, {\cal H}(\vec{v}) - \vec{d} \, ,
534 \, {\cal H}(\vec{v}) - \vec{d} \, \rangle $} ~ \\
535 The cost function represents the quadratic model vs. data misfit.
536 Here, $ \vec{d} $ is the data vector and $ {\cal H} $ represents the
537 operator which maps the model state space onto the data space.
538 Then, $ \nabla_v {\cal J} $ takes the form
539 %
540 \begin{equation*}
541 \begin{aligned}
542 \nabla_v {\cal J}^T & = \, 2 \, \, H \cdot
543 \left( \, {\cal H}(\vec{v}) - \vec{d} \, \right) \\
544 ~ & = \, 2 \sum_{j} \left\{ \sum_k
545 \frac{\partial {\cal H}_k}{\partial v_{j}}
546 \left( {\cal H}_k (\vec{v}) - d_k \right)
547 \right\} \, {\vec{f}_{j}} \\
548 \end{aligned}
549 \end{equation*}
550 %
551 where $H_{kj} = \partial {\cal H}_k / \partial v_{j} $ is the
552 Jacobi matrix of the data projection operator.
553 Thus, the gradient $ \nabla_u {\cal J} $ is given by the
554 adjoint operator,
555 driven by the model vs. data misfit:
556 \[
557 \nabla_u {\cal J}^T \, = \, 2 \, M^T \cdot
558 H \cdot \left( {\cal H}(\vec{v}) - \vec{d} \, \right)
559 \]
560
561 \subsection{Storing vs. recomputation in reverse mode}
562 \label{checkpointing}
563
564 We note an important aspect of the forward vs. reverse
565 mode calculation.
566 Because of the local character of the derivative
567 (a derivative is defined w.r.t. a point along the trajectory),
568 the intermediate results of the model trajectory
569 $\vec{v}^{(\lambda+1)}={\cal M}_{\lambda}(v^{(\lambda)})$
570 may be required to evaluate the intermediate Jacobian
571 $M_{\lambda}|_{\vec{v}^{(\lambda)}} \, \delta \vec{v}^{(\lambda)} $.
572 This is the case e.g. for nonlinear expressions
573 (momentum advection, nonlinear equation of state), state-dependent
574 conditional statements (parameterization schemes).
575 In the forward mode, the intermediate results are required
576 in the same order as computed by the full forward model ${\cal M}$,
577 but in the reverse mode they are required in the reverse order.
578 Thus, in the reverse mode the trajectory of the forward model
579 integration ${\cal M}$ has to be stored to be available in the reverse
580 calculation. Alternatively, the complete model state up to the
581 point of evaluation has to be recomputed whenever its value is required.
582
583 A method to balance the amount of recomputations vs.
584 storage requirements is called {\sf checkpointing}
585 (e.g. \cite{gri:92}, \cite{res-eta:98}).
586 It is depicted in \ref{fig:3levelcheck} for a 3-level checkpointing
587 [as an example, we give explicit numbers for a 3-day
588 integration with a 1-hourly timestep in square brackets].
589 \begin{itemize}
590 %
591 \item [$lev3$]
592 In a first step, the model trajectory is subdivided into
593 $ {n}^{lev3} $ subsections [$ {n}^{lev3} $=3 1-day intervals],
594 with the label $lev3$ for this outermost loop.
595 The model is then integrated along the full trajectory,
596 and the model state stored to disk only at every $ k_{i}^{lev3} $-th timestep
597 [i.e. 3 times, at
598 $ i = 0,1,2 $ corresponding to $ k_{i}^{lev3} = 0, 24, 48 $].
599 In addition, the cost function is computed, if needed.
600 %
601 \item [$lev2$]
602 In a second step each subsection itself is divided into
603 $ {n}^{lev2} $ subsections
604 [$ {n}^{lev2} $=4 6-hour intervals per subsection].
605 The model picks up at the last outermost dumped state
606 $ v_{k_{n}^{lev3}} $ and is integrated forward in time along
607 the last subsection, with the label $lev2$ for this
608 intermediate loop.
609 The model state is now stored to disk at every $ k_{i}^{lev2} $-th
610 timestep
611 [i.e. 4 times, at
612 $ i = 0,1,2,3 $ corresponding to $ k_{i}^{lev2} = 48, 54, 60, 66 $].
613 %
614 \item [$lev1$]
615 Finally, the model picks up at the last intermediate dump state
616 $ v_{k_{n}^{lev2}} $ and is integrated forward in time along
617 the last subsection, with the label $lev1$ for this
618 intermediate loop.
619 Within this sub-subsection only, parts of the model state is stored
620 to memory at every timestep
621 [i.e. every hour $ i=0,...,5$ corresponding to
622 $ k_{i}^{lev1} = 66, 67, \ldots, 71 $].
623 The final state $ v_n = v_{k_{n}^{lev1}} $ is reached
624 and the model state of all preceding timesteps along the last
625 innermost subsection are available, enabling integration backwards
626 in time along the last subsection.
627 The adjoint can thus be computed along this last
628 subsection $k_{n}^{lev2}$.
629 %
630 \end{itemize}
631 %
632 This procedure is repeated consecutively for each previous
633 subsection $k_{n-1}^{lev2}, \ldots, k_{1}^{lev2} $
634 carrying the adjoint computation to the initial time
635 of the subsection $k_{n}^{lev3}$.
636 Then, the procedure is repeated for the previous subsection
637 $k_{n-1}^{lev3}$
638 carrying the adjoint computation to the initial time
639 $k_{1}^{lev3}$.
640
641 For the full model trajectory of
642 $ n^{lev3} \cdot n^{lev2} \cdot n^{lev1} $ timesteps
643 the required storing of the model state was significantly reduced to
644 $ n^{lev2} + n^{lev3} $ to disk and roughly $ n^{lev1} $ to memory
645 [i.e. for the 3-day integration with a total oof 72 timesteps
646 the model state was stored 7 times to disk and roughly 6 times
647 to memory].
648 This saving in memory comes at a cost of a required
649 3 full forward integrations of the model (one for each
650 checkpointing level).
651 The optimal balance of storage vs. recomputation certainly depends
652 on the computing resources available and may be adjusted by
653 adjusting the partitioning among the
654 $ n^{lev3}, \,\, n^{lev2}, \,\, n^{lev1} $.
655
656 \begin{figure}[t!]
657 \begin{center}
658 %\psdraft
659 %\psfrag{v_k1^lev3}{\mathinfigure{v_{k_{1}^{lev3}}}}
660 %\psfrag{v_kn-1^lev3}{\mathinfigure{v_{k_{n-1}^{lev3}}}}
661 %\psfrag{v_kn^lev3}{\mathinfigure{v_{k_{n}^{lev3}}}}
662 %\psfrag{v_k1^lev2}{\mathinfigure{v_{k_{1}^{lev2}}}}
663 %\psfrag{v_kn-1^lev2}{\mathinfigure{v_{k_{n-1}^{lev2}}}}
664 %\psfrag{v_kn^lev2}{\mathinfigure{v_{k_{n}^{lev2}}}}
665 %\psfrag{v_k1^lev1}{\mathinfigure{v_{k_{1}^{lev1}}}}
666 %\psfrag{v_kn^lev1}{\mathinfigure{v_{k_{n}^{lev1}}}}
667 %\mbox{\epsfig{file=s_autodiff/figs/checkpointing.eps, width=0.8\textwidth}}
668 \resizebox{5.5in}{!}{\includegraphics{s_autodiff/figs/checkpointing.eps}}
669 %\psfull
670 \end{center}
671 \caption{
672 Schematic view of intermediate dump and restart for
673 3-level checkpointing.}
674 \label{fig:3levelcheck}
675 \end{figure}
676
677 % \subsection{Optimal perturbations}
678 % \label{sec_optpert}
679
680
681 % \subsection{Error covariance estimate and Hessian matrix}
682 % \label{sec_hessian}
683
684 \newpage
685
686 %**********************************************************************
687 \section{TLM and ADM generation in general}
688 \label{sec_ad_setup_gen}
689 \begin{rawhtml}
690 <!-- CMIREDIR:sec_ad_setup_gen: -->
691 \end{rawhtml}
692 %**********************************************************************
693
694 In this section we describe in a general fashion
695 the parts of the code that are relevant for automatic
696 differentiation using the software tool TAF.
697 Modifications to use OpenAD are described in \ref{sec_ad_openad}.
698
699 \input{s_autodiff/text/doc_ad_the_model}
700
701 The basic flow is depicted in \ref{fig:adthemodel}.
702 If CPP option \texttt{ALLOW\_AUTODIFF\_TAMC} is defined,
703 the driver routine
704 {\it the\_model\_main}, instead of calling {\it the\_main\_loop},
705 invokes the adjoint of this routine, {\it adthe\_main\_loop}
706 (case \texttt{\#define ALLOW\_ADJOINT\_RUN}), or
707 the tangent linear of this routine {\it g\_the\_main\_loop}
708 (case \texttt{\#define ALLOW\_TANGENTLINEAR\_RUN}),
709 which are the toplevel routines in terms of automatic differentiation.
710 The routines {\it adthe\_main\_loop} or {\it g\_the\_main\_loop}
711 are generated by TAF.
712 It contains both the forward integration of the full model, the
713 cost function calculation,
714 any additional storing that is required for efficient checkpointing,
715 and the reverse integration of the adjoint model.
716
717 [DESCRIBE IN A SEPARATE SECTION THE WORKING OF THE TLM]
718
719 In Fig. \ref{fig:adthemodel}
720 the structure of {\it adthe\_main\_loop} has been strongly
721 simplified to focus on the essentials; in particular, no checkpointing
722 procedures are shown here.
723 Prior to the call of {\it adthe\_main\_loop}, the routine
724 {\it ctrl\_unpack} is invoked to unpack the control vector
725 or initialise the control variables.
726 Following the call of {\it adthe\_main\_loop},
727 the routine {\it ctrl\_pack}
728 is invoked to pack the control vector
729 (cf. Section \ref{section_ctrl}).
730 If gradient checks are to be performed, the option
731 {\tt ALLOW\_GRADIENT\_CHECK} is defined. In this case
732 the driver routine {\it grdchk\_main} is called after
733 the gradient has been computed via the adjoint
734 (cf. Section \ref{sec:ad_gradient_check}).
735
736 %------------------------------------------------------------------
737
738 \subsection{General setup
739 \label{section_ad_setup}}
740
741 In order to configure AD-related setups the following packages need
742 to be enabled:
743 {\it
744 \begin{table}[!ht]
745 \begin{tabular}{l}
746 autodiff \\
747 ctrl \\
748 cost \\
749 grdchk \\
750 \end{tabular}
751 \end{table}
752 }
753 The packages are enabled by adding them to your experiment-specific
754 configuration file
755 {\it packages.conf} (see Section ???).
756
757 The following AD-specific CPP option files need to be customized:
758 %
759 \begin{itemize}
760 %
761 \item {\it ECCO\_CPPOPTIONS.h} \\
762 This header file collects CPP options for the packages
763 {\it autodiff, cost, ctrl} as well as AD-unrelated options for
764 the external forcing package {\it exf}.
765 \footnote{NOTE: These options are not set in their package-specific
766 headers such as {\it COST\_CPPOPTIONS.h}, but are instead collected
767 in the single header file {\it ECCO\_CPPOPTIONS.h}.
768 The package-specific header files serve as simple
769 placeholders at this point.}
770 %
771 \item {\it tamc.h} \\
772 This header configures the splitting of the time stepping loop
773 w.r.t. the 3-level checkpointing (see section ???).
774
775 %
776 \end{itemize}
777
778 %------------------------------------------------------------------
779
780 \subsection{Building the AD code using TAF
781 \label{section_ad_build}}
782
783 The build process of an AD code is very similar to building
784 the forward model. However, depending on which AD code one wishes
785 to generate, and on which AD tool is available (TAF or TAMC),
786 the following {\tt make} targets are available:
787
788 \begin{table}[!ht]
789 {\footnotesize
790 \begin{tabular}{|ccll|}
791 \hline
792 ~ & {\it AD-target} & {\it output} & {\it description} \\
793 \hline
794 \hline
795 (1) & {\tt <MODE><TOOL>only} & {\tt <MODE>\_<TOOL>\_output.f} &
796 generates code for $<$MODE$>$ using $<$TOOL$>$ \\
797 ~ & ~ & ~ & no {\tt make} dependencies on {\tt .F .h} \\
798 ~ & ~ & ~ & useful for compiling on remote platforms \\
799 \hline
800 (2) & {\tt <MODE><TOOL>} & {\tt <MODE>\_<TOOL>\_output.f} &
801 generates code for $<$MODE$>$ using $<$TOOL$>$ \\
802 ~ & ~ & ~ & includes {\tt make} dependencies on {\tt .F .h} \\
803 ~ & ~ & ~ & i.e. input for $<$TOOL$>$ may be re-generated \\
804 \hline
805 (3) & {\tt <MODE>all} & {\tt mitgcmuv\_<MODE>} &
806 generates code for $<$MODE$>$ using $<$TOOL$>$ \\
807 ~ & ~ & ~ & and compiles all code \\
808 ~ & ~ & ~ & (use of TAF is set as default) \\
809 \hline
810 \end{tabular}
811 }
812 \end{table}
813 %
814 Here, the following placeholders are used
815 %
816 \begin{itemize}
817 %
818 \item $<$TOOL$>$
819 %
820 \begin{itemize}
821 %
822 \item {\tt TAF}
823 \item {\tt TAMC}
824 %
825 \end{itemize}
826 %
827 \item $<$MODE$>$
828 %
829 \begin{itemize}
830 %
831 \item {\tt ad} generates the adjoint model (ADM)
832 \item {\tt ftl} generates the tangent linear model (TLM)
833 \item {\tt svd} generates both ADM and TLM for \\
834 singular value decomposition (SVD) type calculations
835 %
836 \end{itemize}
837 %
838 \end{itemize}
839
840 For example, to generate the adjoint model using TAF after routines ({\tt .F})
841 or headers ({\tt .h}) have been modified, but without compilation,
842 type {\tt make adtaf};
843 or, to generate the tangent linear model using TAMC without
844 re-generating the input code, type {\tt make ftltamconly}.
845
846
847 A typical full build process to generate the ADM via TAF would
848 look like follows:
849 \begin{verbatim}
850 % mkdir build
851 % cd build
852 % ../../../tools/genmake2 -mods=../code_ad
853 % make depend
854 % make adall
855 \end{verbatim}
856
857 %------------------------------------------------------------------
858
859 \subsection{The AD build process in detail
860 \label{section_ad_build_detail}}
861
862 The {\tt make <MODE>all} target consists of the following procedures:
863
864 \begin{enumerate}
865 %
866 \item
867 A header file {\tt AD\_CONFIG.h} is generated which contains a CPP option
868 on which code ought to be generated. Depending on the {\tt make} target,
869 the contents is one of the following:
870 \begin{itemize}
871 \item
872 {\tt \#define ALLOW\_ADJOINT\_RUN}
873 \item
874 {\tt \#define ALLOW\_TANGENTLINEAR\_RUN}
875 \item
876 {\tt \#define ALLOW\_ECCO\_OPTIMIZATION}
877 \end{itemize}
878 %
879 \item
880 A single file {\tt <MODE>\_input\_code.f} is concatenated
881 consisting of all {\tt .f} files that are part of the list {\bf AD\_FILES}
882 and all {\tt .flow} files that are part of the list {\bf AD\_FLOW\_FILES}.
883 %
884 \item
885 The AD tool is invoked with the {\tt <MODE>\_<TOOL>\_FLAGS}.
886 The default AD tool flags in {\tt genmake2} can be overrwritten by
887 an {\tt adjoint\_options} file (similar to the platform-specific
888 {\tt build\_options}, see Section ???.
889 The AD tool writes the resulting AD code into the file
890 {\tt <MODE>\_input\_code\_ad.f}
891 %
892 \item
893 A short sed script {\tt adjoint\_sed} is applied to
894 {\tt <MODE>\_input\_code\_ad.f}
895 to reinstate {\bf myThid} into the CALL argument list of active file I/O.
896 The result is written to file {\tt <MODE>\_<TOOL>\_output.f}.
897 %
898 \item
899 All routines are compiled and an executable is generated
900 (see Table ???).
901 %
902 \end{enumerate}
903
904 \subsubsection{The list AD\_FILES and {\tt .list} files}
905
906 Not all routines are presented to the AD tool.
907 Routines typically hidden are diagnostics routines which
908 do not influence the cost function, but may create
909 artificial flow dependencies such as I/O of active variables.
910
911 {\tt genmake2} generates a list (or variable) {\bf AD\_FILES}
912 which contains all routines that are shown to the AD tool.
913 This list is put together from all files with suffix {\tt .list}
914 that {\tt genmake2} finds in its search directories.
915 The list file for the core MITgcm routines is in {\tt model/src/}
916 is called {\tt model\_ad\_diff.list}.
917 Note that no wrapper routine is shown to TAF. These are either
918 not visible at all to the AD code, or hand-written AD code
919 is available (see next section).
920
921 Each package directory contains its package-specific
922 list file {\tt <PKG>\_ad\_diff.list}. For example,
923 {\tt pkg/ptracers/} contains the file {\tt ptracers\_ad\_diff.list}.
924 Thus, enabling a package will automatically extend the
925 {\bf AD\_FILES} list of {\tt genmake2} to incorporate the
926 package-specific routines.
927 Note that you will need to regenerate the {\tt Makefile} if
928 you enable a package (e.g. by adding it to {\tt packages.conf})
929 and a {\tt Makefile} already exists.
930
931 \subsubsection{The list AD\_FLOW\_FILES and {\tt .flow} files}
932
933 TAMC and TAF can evaluate user-specified directives
934 that start with a specific syntax ({\tt CADJ}, {\tt C\$TAF}, {\tt !\$TAF}).
935 The main categories of directives are STORE directives and
936 FLOW directives. Here, we are concerned with flow directives,
937 store directives are treated elsewhere.
938
939 Flow directives enable the AD tool to evaluate how it should treat
940 routines that are 'hidden' by the user, i.e. routines which are
941 not contained in the {\bf AD\_FILES} list (see previous section),
942 but which are called in part of the code that the AD tool does see.
943 The flow directive tell the AD tool
944 %
945 \begin{itemize}
946 %
947 \item which subroutine arguments are input/output
948 \item which subroutine arguments are active
949 \item which subroutine arguments are required to compute the cost
950 \item which subroutine arguments are dependent
951 %
952 \end{itemize}
953 %
954 The syntax for the flow directives can be found in the
955 AD tool manuals.
956
957 {\tt genmake2} generates a list (or variable) {\bf AD\_FLOW\_FILES}
958 which contains all files with suffix{\tt .flow} that it finds
959 in its search directories.
960 The flow directives for the core MITgcm routines of
961 {\tt eesupp/src/} and {\tt model/src/}
962 reside in {\tt pkg/autodiff/}.
963 This directory also contains hand-written adjoint code
964 for the MITgcm WRAPPER (section \ref{chap:sarch}).
965
966 Flow directives for package-specific routines are contained in
967 the corresponding package directories in the file
968 {\tt <PKG>\_ad.flow}, e.g. ptracers-specific directives are in
969 {\tt ptracers\_ad.flow}.
970
971 \subsubsection{Store directives for 3-level checkpointing}
972
973 The storing that is required at each period of the
974 3-level checkpointing is controled by three
975 top-level headers.
976
977 \begin{verbatim}
978 do ilev_3 = 1, nchklev_3
979 # include ``checkpoint_lev3.h''
980 do ilev_2 = 1, nchklev_2
981 # include ``checkpoint_lev2.h''
982 do ilev_1 = 1, nchklev_1
983 # include ``checkpoint_lev1.h''
984
985 ...
986
987 end do
988 end do
989 end do
990 \end{verbatim}
991
992 All files {\tt checkpoint\_lev?.h} are contained in directory
993 {\tt pkg/autodiff/}.
994
995
996 \subsubsection{Changing the default AD tool flags: ad\_options files}
997
998
999 \subsubsection{Hand-written adjoint code}
1000
1001 %------------------------------------------------------------------
1002
1003 \subsection{The cost function (dependent variable)
1004 \label{section_cost}}
1005
1006 The cost function $ {\cal J} $ is referred to as the {\sf dependent variable}.
1007 It is a function of the input variables $ \vec{u} $ via the composition
1008 $ {\cal J}(\vec{u}) \, = \, {\cal J}(M(\vec{u})) $.
1009 The input are referred to as the
1010 {\sf independent variables} or {\sf control variables}.
1011 All aspects relevant to the treatment of the cost function $ {\cal J} $
1012 (parameter setting, initialization, accumulation,
1013 final evaluation), are controlled by the package {\it pkg/cost}.
1014 The aspects relevant to the treatment of the independent variables
1015 are controlled by the package {\it pkg/ctrl} and will be treated
1016 in the next section.
1017
1018 \input{s_autodiff/text/doc_cost_flow}
1019
1020 \subsubsection{Enabling the package}
1021
1022 \fbox{
1023 \begin{minipage}{12cm}
1024 {\it packages.conf}, {\it ECCO\_CPPOPTIONS.h}
1025 \end{minipage}
1026 }
1027 \begin{itemize}
1028 %
1029 \item
1030 The package is enabled by adding {\it cost} to your file {\it packages.conf}
1031 (see Section ???)
1032 %
1033 \item
1034
1035
1036 \end{itemize}
1037 %
1038
1039 N.B.: In general the following packages ought to be enabled
1040 simultaneously: {\it autodiff, cost, ctrl}.
1041 The basic CPP option to enable the cost function is {\bf ALLOW\_COST}.
1042 Each specific cost function contribution has its own option.
1043 For the present example the option is {\bf ALLOW\_COST\_TRACER}.
1044 All cost-specific options are set in {\it ECCO\_CPPOPTIONS.h}
1045 Since the cost function is usually used in conjunction with
1046 automatic differentiation, the CPP option
1047 {\bf ALLOW\_ADJOINT\_RUN} (file {\it CPP\_OPTIONS.h}) and
1048 {\bf ALLOW\_AUTODIFF\_TAMC} (file {\it ECCO\_CPPOPTIONS.h})
1049 should be defined.
1050
1051 \subsubsection{Initialization}
1052 %
1053 The initialization of the {\it cost} package is readily enabled
1054 as soon as the CPP option {\bf ALLOW\_COST} is defined.
1055 %
1056 \begin{itemize}
1057 %
1058 \item
1059 \fbox{
1060 \begin{minipage}{12cm}
1061 Parameters: {\it cost\_readparms}
1062 \end{minipage}
1063 }
1064 \\
1065 This S/R
1066 reads runtime flags and parameters from file {\it data.cost}.
1067 For the present example the only relevant parameter read
1068 is {\bf mult\_tracer}. This multiplier enables different
1069 cost function contributions to be switched on
1070 ( = 1.) or off ( = 0.) at runtime.
1071 For more complex cost functions which involve model vs. data
1072 misfits, the corresponding data filenames and data
1073 specifications (start date and time, period, ...) are read
1074 in this S/R.
1075 %
1076 \item
1077 \fbox{
1078 \begin{minipage}{12cm}
1079 Variables: {\it cost\_init}
1080 \end{minipage}
1081 }
1082 \\
1083 This S/R
1084 initializes the different cost function contributions.
1085 The contribution for the present example is {\bf objf\_tracer}
1086 which is defined on each tile (bi,bj).
1087 %
1088 \end{itemize}
1089 %
1090 \subsubsection{Accumulation}
1091 %
1092 \begin{itemize}
1093 %
1094 \item
1095 \fbox{
1096 \begin{minipage}{12cm}
1097 {\it cost\_tile}, {\it cost\_tracer}
1098 \end{minipage}
1099 }
1100 \end{itemize}
1101 %
1102 The 'driver' routine
1103 {\it cost\_tile} is called at the end of each time step.
1104 Within this 'driver' routine, S/R are called for each of
1105 the chosen cost function contributions.
1106 In the present example ({\bf ALLOW\_COST\_TRACER}),
1107 S/R {\it cost\_tracer} is called.
1108 It accumulates {\bf objf\_tracer} according to eqn. (ref:ask-the-author).
1109 %
1110 \subsubsection{Finalize all contributions}
1111 %
1112 \begin{itemize}
1113 %
1114 \item
1115 \fbox{
1116 \begin{minipage}{12cm}
1117 {\it cost\_final}
1118 \end{minipage}
1119 }
1120 \end{itemize}
1121 %
1122 At the end of the forward integration S/R {\it cost\_final}
1123 is called. It accumulates the total cost function {\bf fc}
1124 from each contribution and sums over all tiles:
1125 \begin{equation}
1126 {\cal J} \, = \,
1127 {\rm fc} \, = \,
1128 {\rm mult\_tracer} \sum_{\text{global sum}} \sum_{bi,\,bj}^{nSx,\,nSy}
1129 {\rm objf\_tracer}(bi,bj) \, + \, ...
1130 \end{equation}
1131 %
1132 The total cost function {\bf fc} will be the
1133 'dependent' variable in the argument list for TAF, i.e.
1134 \begin{verbatim}
1135 taf -output 'fc' ...
1136 \end{verbatim}
1137
1138 %%%% \end{document}
1139
1140 \input{s_autodiff/text/doc_ad_the_main}
1141
1142 \subsection{The control variables (independent variables)
1143 \label{section_ctrl}}
1144
1145 The control variables are a subset of the model input
1146 (initial conditions, boundary conditions, model parameters).
1147 Here we identify them with the variable $ \vec{u} $.
1148 All intermediate variables whose derivative w.r.t. control
1149 variables do not vanish are called {\sf active variables}.
1150 All subroutines whose derivative w.r.t. the control variables
1151 don't vanish are called {\sf active routines}.
1152 Read and write operations from and to file can be viewed
1153 as variable assignments. Therefore, files to which
1154 active variables are written and from which active variables
1155 are read are called {\sf active files}.
1156 All aspects relevant to the treatment of the control variables
1157 (parameter setting, initialization, perturbation)
1158 are controlled by the package {\it pkg/ctrl}.
1159
1160 \input{s_autodiff/text/doc_ctrl_flow}
1161
1162 \subsubsection{genmake and CPP options}
1163 %
1164 \begin{itemize}
1165 %
1166 \item
1167 \fbox{
1168 \begin{minipage}{12cm}
1169 {\it genmake}, {\it CPP\_OPTIONS.h}, {\it ECCO\_CPPOPTIONS.h}
1170 \end{minipage}
1171 }
1172 \end{itemize}
1173 %
1174 To enable the directory to be included to the compile list,
1175 {\bf ctrl} has to be added to the {\bf enable} list in
1176 {\it .genmakerc} or in {\it genmake} itself (analogous to {\it cost}
1177 package, cf. previous section).
1178 Each control variable is enabled via its own CPP option
1179 in {\it ECCO\_CPPOPTIONS.h}.
1180
1181 \subsubsection{Initialization}
1182 %
1183 \begin{itemize}
1184 %
1185 \item
1186 \fbox{
1187 \begin{minipage}{12cm}
1188 Parameters: {\it ctrl\_readparms}
1189 \end{minipage}
1190 }
1191 \\
1192 %
1193 This S/R
1194 reads runtime flags and parameters from file {\it data.ctrl}.
1195 For the present example the file contains the file names
1196 of each control variable that is used.
1197 In addition, the number of wet points for each control
1198 variable and the net dimension of the space of control
1199 variables (counting wet points only) {\bf nvarlength}
1200 is determined.
1201 Masks for wet points for each tile {\bf (bi,\,bj)}
1202 and vertical layer {\bf k} are generated for the three
1203 relevant categories on the C-grid:
1204 {\bf nWetCtile} for tracer fields,
1205 {\bf nWetWtile} for zonal velocity fields,
1206 {\bf nWetStile} for meridional velocity fields.
1207 %
1208 \item
1209 \fbox{
1210 \begin{minipage}{12cm}
1211 Control variables, control vector,
1212 and their gradients: {\it ctrl\_unpack}
1213 \end{minipage}
1214 }
1215 \\
1216 %
1217 Two important issues related to the handling of the control
1218 variables in MITgcm need to be addressed.
1219 First, in order to save memory, the control variable arrays
1220 are not kept in memory, but rather read from file and added
1221 to the initial fields during the model initialization phase.
1222 Similarly, the corresponding adjoint fields which represent
1223 the gradient of the cost function w.r.t. the control variables
1224 are written to file at the end of the adjoint integration.
1225 Second, in addition to the files holding the 2-dim. and 3-dim.
1226 control variables and the corresponding cost gradients,
1227 a 1-dim. {\sf control vector}
1228 and {\sf gradient vector} are written to file. They contain
1229 only the wet points of the control variables and the corresponding
1230 gradient.
1231 This leads to a significant data compression.
1232 Furthermore, an option is available
1233 ({\tt ALLOW\_NONDIMENSIONAL\_CONTROL\_IO}) to
1234 non-dimensionalise the control and gradient vector,
1235 which otherwise would contain different pieces of different
1236 magnitudes and units.
1237 Finally, the control and gradient vector can be passed to a
1238 minimization routine if an update of the control variables
1239 is sought as part of a minimization exercise.
1240
1241 The files holding fields and vectors of the control variables
1242 and gradient are generated and initialised in S/R {\it ctrl\_unpack}.
1243 %
1244 \end{itemize}
1245
1246 \subsubsection{Perturbation of the independent variables}
1247 %
1248 The dependency flow for differentiation w.r.t. the controls
1249 starts with adding a perturbation onto the input variable,
1250 thus defining the independent or control variables for TAF.
1251 Three types of controls may be considered:
1252 %
1253 \begin{itemize}
1254 %
1255 \item
1256 \fbox{
1257 \begin{minipage}{12cm}
1258 {\it ctrl\_map\_ini} (initial value sensitivity):
1259 \end{minipage}
1260 }
1261 \\
1262 %
1263 Consider as an example the initial tracer distribution
1264 {\bf tr1} as control variable.
1265 After {\bf tr1} has been initialised in
1266 {\it ini\_tr1} (dynamical variables such as
1267 temperature and salinity are initialised in {\it ini\_fields}),
1268 a perturbation anomaly is added to the field in S/R
1269 {\it ctrl\_map\_ini}
1270 %
1271 %\begin{eqnarray}
1272 \begin{equation}
1273 \begin{aligned}
1274 u & = \, u_{[0]} \, + \, \Delta u \\
1275 {\bf tr1}(...) & = \, {\bf tr1_{ini}}(...) \, + \, {\bf xx\_tr1}(...)
1276 \label{perturb}
1277 \end{aligned}
1278 \end{equation}
1279 %\end{eqnarray}
1280 %
1281 {\bf xx\_tr1} is a 3-dim. global array
1282 holding the perturbation. In the case of a simple
1283 sensitivity study this array is identical to zero.
1284 However, it's specification is essential in the context
1285 of automatic differentiation since TAF
1286 treats the corresponding line in the code symbolically
1287 when determining the differentiation chain and its origin.
1288 Thus, the variable names are part of the argument list
1289 when calling TAF:
1290 %
1291 \begin{verbatim}
1292 taf -input 'xx_tr1 ...' ...
1293 \end{verbatim}
1294 %
1295 Now, as mentioned above, MITgcm avoids maintaining
1296 an array for each control variable by reading the
1297 perturbation to a temporary array from file.
1298 To ensure the symbolic link to be recognized by TAF, a scalar
1299 dummy variable {\bf xx\_tr1\_dummy} is introduced
1300 and an 'active read' routine of the adjoint support
1301 package {\it pkg/autodiff} is invoked.
1302 The read-procedure is tagged with the variable
1303 {\bf xx\_tr1\_dummy} enabling TAF to recognize the
1304 initialization of the perturbation.
1305 The modified call of TAF thus reads
1306 %
1307 \begin{verbatim}
1308 taf -input 'xx_tr1_dummy ...' ...
1309 \end{verbatim}
1310 %
1311 and the modified operation to (\ref{perturb})
1312 in the code takes on the form
1313 %
1314 \begin{verbatim}
1315 call active_read_xyz(
1316 & ..., tmpfld3d, ..., xx_tr1_dummy, ... )
1317
1318 tr1(...) = tr1(...) + tmpfld3d(...)
1319 \end{verbatim}
1320 %
1321 Note, that reading an active variable corresponds
1322 to a variable assignment. Its derivative corresponds
1323 to a write statement of the adjoint variable, followed by
1324 a reset.
1325 The 'active file' routines have been designed
1326 to support active read and corresponding adjoint active write
1327 operations (and vice versa).
1328 %
1329 \item
1330 \fbox{
1331 \begin{minipage}{12cm}
1332 {\it ctrl\_map\_forcing} (boundary value sensitivity):
1333 \end{minipage}
1334 }
1335 \\
1336 %
1337 The handling of boundary values as control variables
1338 proceeds exactly analogous to the initial values
1339 with the symbolic perturbation taking place in S/R
1340 {\it ctrl\_map\_forcing}.
1341 Note however an important difference:
1342 Since the boundary values are time dependent with a new
1343 forcing field applied at each time steps,
1344 the general problem may be thought of as
1345 a new control variable at each time step
1346 (or, if the perturbation is averaged over a certain period,
1347 at each $ N $ timesteps), i.e.
1348 \[
1349 u_{\rm forcing} \, = \,
1350 \{ \, u_{\rm forcing} ( t_n ) \, \}_{
1351 n \, = \, 1, \ldots , {\rm nTimeSteps} }
1352 \]
1353 %
1354 In the current example an equilibrium state is considered,
1355 and only an initial perturbation to
1356 surface forcing is applied with respect to the
1357 equilibrium state.
1358 A time dependent treatment of the surface forcing is
1359 implemented in the ECCO environment, involving the
1360 calendar ({\it cal}~) and external forcing ({\it exf}~) packages.
1361 %
1362 \item
1363 \fbox{
1364 \begin{minipage}{12cm}
1365 {\it ctrl\_map\_params} (parameter sensitivity):
1366 \end{minipage}
1367 }
1368 \\
1369 %
1370 This routine is not yet implemented, but would proceed
1371 proceed along the same lines as the initial value sensitivity.
1372 The mixing parameters {\bf diffkr} and {\bf kapgm}
1373 are currently added as controls in {\it ctrl\_map\_ini.F}.
1374 %
1375 \end{itemize}
1376 %
1377
1378 \subsubsection{Output of adjoint variables and gradient}
1379 %
1380 Several ways exist to generate output of adjoint fields.
1381 %
1382 \begin{itemize}
1383 %
1384 \item
1385 \fbox{
1386 \begin{minipage}{12cm}
1387 {\it ctrl\_map\_ini, ctrl\_map\_forcing}:
1388 \end{minipage}
1389 }
1390 \\
1391 \begin{itemize}
1392 %
1393 \item {\bf xx\_...}: the control variable fields \\
1394 Before the forward integration, the control
1395 variables are read from file {\bf xx\_ ...} and added to
1396 the model field.
1397 %
1398 \item {\bf adxx\_...}: the adjoint variable fields, i.e. the gradient
1399 $ \nabla _{u}{\cal J} $ for each control variable \\
1400 After the adjoint integration the corresponding adjoint
1401 variables are written to {\bf adxx\_ ...}.
1402 %
1403 \end{itemize}
1404 %
1405 \item
1406 \fbox{
1407 \begin{minipage}{12cm}
1408 {\it ctrl\_unpack, ctrl\_pack}:
1409 \end{minipage}
1410 }
1411 \\
1412 %
1413 \begin{itemize}
1414 %
1415 \item {\bf vector\_ctrl}: the control vector \\
1416 At the very beginning of the model initialization,
1417 the updated compressed control vector is read (or initialised)
1418 and distributed to 2-dim. and 3-dim. control variable fields.
1419 %
1420 \item {\bf vector\_grad}: the gradient vector \\
1421 At the very end of the adjoint integration,
1422 the 2-dim. and 3-dim. adjoint variables are read,
1423 compressed to a single vector and written to file.
1424 %
1425 \end{itemize}
1426 %
1427 \item
1428 \fbox{
1429 \begin{minipage}{12cm}
1430 {\it addummy\_in\_stepping}:
1431 \end{minipage}
1432 }
1433 \\
1434 In addition to writing the gradient at the end of the
1435 forward/adjoint integration, many more adjoint variables
1436 of the model state
1437 at intermediate times can be written using S/R
1438 {\it addummy\_in\_stepping}.
1439 This routine is part of the adjoint support package
1440 {\it pkg/autodiff} (cf.f. below).
1441 The procedure is enabled using via the CPP-option
1442 {\bf ALLOW\_AUTODIFF\_MONITOR} (file {\it ECCO\_CPPOPTIONS.h}).
1443 To be part of the adjoint code, the corresponding S/R
1444 {\it dummy\_in\_stepping} has to be called in the forward
1445 model (S/R {\it the\_main\_loop}) at the appropriate place.
1446 The adjoint common blocks are extracted from the adjoint code
1447 via the header file {\it adcommon.h}.
1448
1449 {\it dummy\_in\_stepping} is essentially empty,
1450 the corresponding adjoint routine is hand-written rather
1451 than generated automatically.
1452 Appropriate flow directives ({\it dummy\_in\_stepping.flow})
1453 ensure that TAMC does not automatically
1454 generate {\it addummy\_in\_stepping} by trying to differentiate
1455 {\it dummy\_in\_stepping}, but instead refers to
1456 the hand-written routine.
1457
1458 {\it dummy\_in\_stepping} is called in the forward code
1459 at the beginning of each
1460 timestep, before the call to {\it dynamics}, thus ensuring
1461 that {\it addummy\_in\_stepping} is called at the end of
1462 each timestep in the adjoint calculation, after the call to
1463 {\it addynamics}.
1464
1465 {\it addummy\_in\_stepping} includes the header files
1466 {\it adcommon.h}.
1467 This header file is also hand-written. It contains
1468 the common blocks
1469 {\bf /addynvars\_r/}, {\bf /addynvars\_cd/},
1470 {\bf /addynvars\_diffkr/}, {\bf /addynvars\_kapgm/},
1471 {\bf /adtr1\_r/}, {\bf /adffields/},
1472 which have been extracted from the adjoint code to enable
1473 access to the adjoint variables.
1474
1475 {\bf WARNING:} If the structure of the common blocks
1476 {\bf /dynvars\_r/}, {\bf /dynvars\_cd/}, etc., changes
1477 similar changes will occur in the adjoint common blocks.
1478 Therefore, consistency between the TAMC-generated common blocks
1479 and those in {\it adcommon.h} have to be checked.
1480 %
1481 \end{itemize}
1482
1483
1484 \subsubsection{Control variable handling for
1485 optimization applications}
1486
1487 In optimization mode the cost function $ {\cal J}(u) $ is sought
1488 to be minimized with respect to a set of control variables
1489 $ \delta {\cal J} \, = \, 0 $, in an iterative manner.
1490 The gradient $ \nabla _{u}{\cal J} |_{u_{[k]}} $ together
1491 with the value of the cost function itself $ {\cal J}(u_{[k]}) $
1492 at iteration step $ k $ serve
1493 as input to a minimization routine (e.g. quasi-Newton method,
1494 conjugate gradient, ... \cite{gil-lem:89})
1495 to compute an update in the
1496 control variable for iteration step $k+1$
1497 \[
1498 u_{[k+1]} \, = \, u_{[0]} \, + \, \Delta u_{[k+1]}
1499 \quad \mbox{satisfying} \quad
1500 {\cal J} \left( u_{[k+1]} \right) \, < \, {\cal J} \left( u_{[k]} \right)
1501 \]
1502 $ u_{[k+1]} $ then serves as input for a forward/adjoint run
1503 to determine $ {\cal J} $ and $ \nabla _{u}{\cal J} $ at iteration step
1504 $ k+1 $.
1505 Tab. ref:ask-the-author sketches the flow between forward/adjoint model
1506 and the minimization routine.
1507
1508 {\scriptsize
1509 \begin{eqnarray*}
1510 \begin{array}{ccccc}
1511 u_{[0]} \,\, , \,\, \Delta u_{[k]} & ~ & ~ & ~ & ~ \\
1512 {\Big\downarrow}
1513 & ~ & ~ & ~ & ~ \\
1514 ~ & ~ & ~ & ~ & ~ \\
1515 \hline
1516 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1517 \multicolumn{1}{|c}{
1518 u_{[k]} = u_{[0]} + \Delta u_{[k]}} &
1519 \stackrel{\bf forward}{\bf \longrightarrow} &
1520 v_{[k]} = M \left( u_{[k]} \right) &
1521 \stackrel{\bf forward}{\bf \longrightarrow} &
1522 \multicolumn{1}{c|}{
1523 {\cal J}_{[k]} = {\cal J} \left( M \left( u_{[k]} \right) \right)} \\
1524 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1525 \hline
1526 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1527 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{{\Big\downarrow}} \\
1528 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1529 \hline
1530 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1531 \multicolumn{1}{|c}{
1532 \nabla_u {\cal J}_{[k]} (\delta {\cal J}) =
1533 T^{\ast} \cdot \nabla_v {\cal J} |_{v_{[k]}} (\delta {\cal J})} &
1534 \stackrel{\bf adjoint}{\mathbf \longleftarrow} &
1535 ad \, v_{[k]} (\delta {\cal J}) =
1536 \nabla_v {\cal J} |_{v_{[k]}} (\delta {\cal J}) &
1537 \stackrel{\bf adjoint}{\mathbf \longleftarrow} &
1538 \multicolumn{1}{c|}{ ad \, {\cal J} = \delta {\cal J}} \\
1539 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1540 \hline
1541 ~ & ~ & ~ & ~ & ~ \\
1542 \hspace*{15ex}{\Bigg\downarrow}
1543 \quad {\cal J}_{[k]}, \quad \nabla_u {\cal J}_{[k]}
1544 & ~ & ~ & ~ & ~ \\
1545 ~ & ~ & ~ & ~ & ~ \\
1546 \hline
1547 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1548 \multicolumn{1}{|c}{
1549 {\cal J}_{[k]} \,\, , \,\, \nabla_u {\cal J}_{[k]}} &
1550 {\mathbf \longrightarrow} & \text{\bf minimisation} &
1551 {\mathbf \longrightarrow} &
1552 \multicolumn{1}{c|}{ \Delta u_{[k+1]}} \\
1553 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1554 \hline
1555 ~ & ~ & ~ & ~ & ~ \\
1556 ~ & ~ & ~ & ~ & \Big\downarrow \\
1557 ~ & ~ & ~ & ~ & \Delta u_{[k+1]} \\
1558 \end{array}
1559 \end{eqnarray*}
1560 }
1561
1562 The routines {\it ctrl\_unpack} and {\it ctrl\_pack} provide
1563 the link between the model and the minimization routine.
1564 As described in Section ref:ask-the-author
1565 the {\it unpack} and {\it pack} routines read and write
1566 control and gradient {\it vectors} which are compressed
1567 to contain only wet points, in addition to the full
1568 2-dim. and 3-dim. fields.
1569 The corresponding I/O flow looks as follows:
1570
1571 \vspace*{0.5cm}
1572
1573 {\scriptsize
1574 \begin{tabular}{ccccc}
1575 {\bf vector\_ctrl\_$<$k$>$ } & ~ & ~ & ~ & ~ \\
1576 {\big\downarrow} & ~ & ~ & ~ & ~ \\
1577 \cline{1-1}
1578 \multicolumn{1}{|c|}{\it ctrl\_unpack} & ~ & ~ & ~ & ~ \\
1579 \cline{1-1}
1580 {\big\downarrow} & ~ & ~ & ~ & ~ \\
1581 \cline{3-3}
1582 \multicolumn{1}{l}{\bf xx\_theta0...$<$k$>$} & ~ &
1583 \multicolumn{1}{|c|}{~} & ~ & ~ \\
1584 \multicolumn{1}{l}{\bf xx\_salt0...$<$k$>$} &
1585 $\stackrel{\mbox{read}}{\longrightarrow}$ &
1586 \multicolumn{1}{|c|}{forward integration} & ~ & ~ \\
1587 \multicolumn{1}{l}{\bf \vdots} & ~ & \multicolumn{1}{|c|}{~}
1588 & ~ & ~ \\
1589 \cline{3-3}
1590 ~ & ~ & $\downarrow$ & ~ & ~ \\
1591 \cline{3-3}
1592 ~ & ~ &
1593 \multicolumn{1}{|c|}{~} & ~ &
1594 \multicolumn{1}{l}{\bf adxx\_theta0...$<$k$>$} \\
1595 ~ & ~ & \multicolumn{1}{|c|}{adjoint integration} &
1596 $\stackrel{\mbox{write}}{\longrightarrow}$ &
1597 \multicolumn{1}{l}{\bf adxx\_salt0...$<$k$>$} \\
1598 ~ & ~ & \multicolumn{1}{|c|}{~}
1599 & ~ & \multicolumn{1}{l}{\bf \vdots} \\
1600 \cline{3-3}
1601 ~ & ~ & ~ & ~ & {\big\downarrow} \\
1602 \cline{5-5}
1603 ~ & ~ & ~ & ~ & \multicolumn{1}{|c|}{\it ctrl\_pack} \\
1604 \cline{5-5}
1605 ~ & ~ & ~ & ~ & {\big\downarrow} \\
1606 ~ & ~ & ~ & ~ & {\bf vector\_grad\_$<$k$>$ } \\
1607 \end{tabular}
1608 }
1609
1610 \vspace*{0.5cm}
1611
1612
1613 {\it ctrl\_unpack} reads the updated control vector
1614 {\bf vector\_ctrl\_$<$k$>$}.
1615 It distributes the different control variables to
1616 2-dim. and 3-dim. files {\it xx\_...$<$k$>$}.
1617 At the start of the forward integration the control variables
1618 are read from {\it xx\_...$<$k$>$} and added to the
1619 field.
1620 Correspondingly, at the end of the adjoint integration
1621 the adjoint fields are written
1622 to {\it adxx\_...$<$k$>$}, again via the active file routines.
1623 Finally, {\it ctrl\_pack} collects all adjoint files
1624 and writes them to the compressed vector file
1625 {\bf vector\_grad\_$<$k$>$}.

  ViewVC Help
Powered by ViewVC 1.1.22