/[MITgcm]/manual/s_autodiff/text/doc_ad_2.tex
ViewVC logotype

Contents of /manual/s_autodiff/text/doc_ad_2.tex

Parent Directory Parent Directory | Revision Log Revision Log | View Revision Graph Revision Graph


Revision 1.22 - (show annotations) (download) (as text)
Fri Aug 27 13:09:40 2010 UTC (14 years, 10 months ago) by jmc
Branch: MAIN
Changes since 1.21: +7 -7 lines
File MIME type: application/x-tex
changed for new directory path:
 s/input{part5\//input{s_autodiff\/text\//g
 /\\includegraphics.*{part5\//s/{part5\//{s_autodiff\/figs\//g
 /\\epsfig.*{.*part5\//s/part5\//s_autodiff\/figs\//g

1 % $Header: /u/gcmpack/manual/s_autodiff/text/doc_ad_2.tex,v 1.21 2008/01/17 22:32:06 heimbach Exp $
2 % $Name: $
3
4 Author: Patrick Heimbach
5
6 {\sf Automatic differentiation} (AD), also referred to as algorithmic
7 (or, more loosely, computational) differentiation, involves
8 automatically deriving code to calculate partial derivatives from an
9 existing fully non-linear prognostic code. (see \cite{gri:00}). A
10 software tool is used that parses and transforms source files
11 according to a set of linguistic and mathematical rules. AD tools are
12 like source-to-source translators in that they parse a program code as
13 input and produce a new program code as output
14 (we restrict our discussion to source-to-source tools, ignoring
15 operator-overloading tools). However, unlike a
16 pure source-to-source translation, the output program represents a new
17 algorithm, such as the evaluation of the Jacobian, the Hessian, or
18 higher derivative operators. In principle, a variety of derived
19 algorithms can be generated automatically in this way.
20
21 MITgcm has been adapted for use with the Tangent linear and Adjoint
22 Model Compiler (TAMC) and its successor TAF (Transformation of
23 Algorithms in Fortran), developed by Ralf Giering (\cite{gie-kam:98},
24 \cite{gie:99,gie:00}). The first application of the adjoint of MITgcm
25 for sensitivity studies has been published by \cite{maro-eta:99}.
26 \cite{stam-etal:97,stam-etal:02} use MITgcm and its adjoint for ocean
27 state estimation studies. In the following we shall refer to TAMC and
28 TAF synonymously, except were explicitly stated otherwise.
29
30 As of mid-2007 we are also able to generate fairly efficient
31 adjoint code of the MITgcm using a new, open-source AD tool,
32 called OpenAD (see \cite{naum-etal:06,utke-etal:08}.
33 This enables us for the first time to compare adjoint models
34 generated from different AD tools, providing an additional
35 accuracy check, complementary to finite-difference gradient checks.
36 OpenAD and its application to MITgcm is described in detail
37 in section \ref{sec_ad_openad}.
38
39 The AD tool exploits the chain rule for computing the first derivative of a
40 function with respect to a set of input variables. Treating a given
41 forward code as a composition of operations -- each line representing
42 a compositional element, the chain rule is rigorously applied to the
43 code, line by line. The resulting tangent linear or adjoint code,
44 then, may be thought of as the composition in forward or reverse
45 order, respectively, of the Jacobian matrices of the forward code's
46 compositional elements.
47
48 %**********************************************************************
49 \section{Some basic algebra}
50 \label{sec_ad_algebra}
51 \begin{rawhtml}
52 <!-- CMIREDIR:sec_ad_algebra: -->
53 \end{rawhtml}
54 %**********************************************************************
55
56 Let $ \cal{M} $ be a general nonlinear, model, i.e. a
57 mapping from the $m$-dimensional space
58 $U \subset I\!\!R^m$ of input variables
59 $\vec{u}=(u_1,\ldots,u_m)$
60 (model parameters, initial conditions, boundary conditions
61 such as forcing functions) to the $n$-dimensional space
62 $V \subset I\!\!R^n$ of
63 model output variable $\vec{v}=(v_1,\ldots,v_n)$
64 (model state, model diagnostics, objective function, ...)
65 under consideration,
66 %
67 \begin{equation}
68 \begin{split}
69 {\cal M} \, : & \, U \,\, \longrightarrow \, V \\
70 ~ & \, \vec{u} \,\, \longmapsto \, \vec{v} \, = \,
71 {\cal M}(\vec{u})
72 \label{fulloperator}
73 \end{split}
74 \end{equation}
75 %
76 The vectors $ \vec{u} \in U $ and $ v \in V $ may be represented w.r.t.
77 some given basis vectors
78 $ {\rm span} (U) = \{ {\vec{e}_i} \}_{i = 1, \ldots , m} $ and
79 $ {\rm span} (V) = \{ {\vec{f}_j} \}_{j = 1, \ldots , n} $ as
80 \[
81 \vec{u} \, = \, \sum_{i=1}^{m} u_i \, {\vec{e}_i},
82 \qquad
83 \vec{v} \, = \, \sum_{j=1}^{n} v_j \, {\vec{f}_j}
84 \]
85
86 Two routes may be followed to determine the sensitivity of the
87 output variable $\vec{v}$ to its input $\vec{u}$.
88
89 \subsection{Forward or direct sensitivity}
90 %
91 Consider a perturbation to the input variables $\delta \vec{u}$
92 (typically a single component
93 $\delta \vec{u} = \delta u_{i} \, {\vec{e}_{i}}$).
94 Their effect on the output may be obtained via the linear
95 approximation of the model $ {\cal M}$ in terms of its Jacobian matrix
96 $ M $, evaluated in the point $u^{(0)}$ according to
97 %
98 \begin{equation}
99 \delta \vec{v} \, = \, M |_{\vec{u}^{(0)}} \, \delta \vec{u}
100 \label{tangent_linear}
101 \end{equation}
102 with resulting output perturbation $\delta \vec{v}$.
103 In components
104 $M_{j i} \, = \, \partial {\cal M}_{j} / \partial u_{i} $,
105 it reads
106 %
107 \begin{equation}
108 \delta v_{j} \, = \, \sum_{i}
109 \left. \frac{\partial {\cal M}_{j}}{\partial u_{i}} \right|_{u^{(0)}} \,
110 \delta u_{i}
111 \label{jacobi_matrix}
112 \end{equation}
113 %
114 Eq. (\ref{tangent_linear}) is the {\sf tangent linear model (TLM)}.
115 In contrast to the full nonlinear model $ {\cal M} $, the operator
116 $ M $ is just a matrix
117 which can readily be used to find the forward sensitivity of $\vec{v}$ to
118 perturbations in $u$,
119 but if there are very many input variables $(\gg O(10^{6})$ for
120 large-scale oceanographic application), it quickly becomes
121 prohibitive to proceed directly as in (\ref{tangent_linear}),
122 if the impact of each component $ {\bf e_{i}} $ is to be assessed.
123
124 \subsection{Reverse or adjoint sensitivity}
125 %
126 Let us consider the special case of a
127 scalar objective function ${\cal J}(\vec{v})$ of the model output (e.g.
128 the total meridional heat transport,
129 the total uptake of $CO_{2}$ in the Southern
130 Ocean over a time interval,
131 or a measure of some model-to-data misfit)
132 %
133 \begin{eqnarray}
134 \begin{array}{cccccc}
135 {\cal J} \, : & U &
136 \longrightarrow & V &
137 \longrightarrow & I \!\! R \\
138 ~ & \vec{u} & \longmapsto & \vec{v}={\cal M}(\vec{u}) &
139 \longmapsto & {\cal J}(\vec{u}) = {\cal J}({\cal M}(\vec{u}))
140 \end{array}
141 \label{compo}
142 \end{eqnarray}
143 %
144 The perturbation of $ {\cal J} $ around a fixed point $ {\cal J}_0 $,
145 \[
146 {\cal J} \, = \, {\cal J}_0 \, + \, \delta {\cal J}
147 \]
148 can be expressed in both bases of $ \vec{u} $ and $ \vec{v} $
149 w.r.t. their corresponding inner product
150 $\left\langle \,\, , \,\, \right\rangle $
151 %
152 \begin{equation}
153 \begin{split}
154 {\cal J} & = \,
155 {\cal J} |_{\vec{u}^{(0)}} \, + \,
156 \left\langle \, \nabla _{u}{\cal J}^T |_{\vec{u}^{(0)}} \, , \, \delta \vec{u} \, \right\rangle
157 \, + \, O(\delta \vec{u}^2) \\
158 ~ & = \,
159 {\cal J} |_{\vec{v}^{(0)}} \, + \,
160 \left\langle \, \nabla _{v}{\cal J}^T |_{\vec{v}^{(0)}} \, , \, \delta \vec{v} \, \right\rangle
161 \, + \, O(\delta \vec{v}^2)
162 \end{split}
163 \label{deljidentity}
164 \end{equation}
165 %
166 (note, that the gradient $ \nabla f $ is a co-vector, therefore
167 its transpose is required in the above inner product).
168 Then, using the representation of
169 $ \delta {\cal J} =
170 \left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle $,
171 the definition
172 of an adjoint operator $ A^{\ast} $ of a given operator $ A $,
173 \[
174 \left\langle \, A^{\ast} \vec{x} \, , \, \vec{y} \, \right\rangle =
175 \left\langle \, \vec{x} \, , \, A \vec{y} \, \right\rangle
176 \]
177 which for finite-dimensional vector spaces is just the
178 transpose of $ A $,
179 \[
180 A^{\ast} \, = \, A^T
181 \]
182 and from eq. (\ref{tangent_linear}), (\ref{deljidentity}),
183 we note that
184 (omitting $|$'s):
185 %
186 \begin{equation}
187 \delta {\cal J}
188 \, = \,
189 \left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle
190 \, = \,
191 \left\langle \, \nabla _{v}{\cal J}^T \, , \, M \, \delta \vec{u} \, \right\rangle
192 \, = \,
193 \left\langle \, M^T \, \nabla _{v}{\cal J}^T \, , \,
194 \delta \vec{u} \, \right\rangle
195 \label{inner}
196 \end{equation}
197 %
198 With the identity (\ref{deljidentity}), we then find that
199 the gradient $ \nabla _{u}{\cal J} $ can be readily inferred by
200 invoking the adjoint $ M^{\ast } $ of the tangent linear model $ M $
201 %
202 \begin{equation}
203 \begin{split}
204 \nabla _{u}{\cal J}^T |_{\vec{u}} &
205 = \, M^T |_{\vec{u}} \cdot \nabla _{v}{\cal J}^T |_{\vec{v}} \\
206 ~ & = \, M^T |_{\vec{u}} \cdot \delta \vec{v}^{\ast} \\
207 ~ & = \, \delta \vec{u}^{\ast}
208 \end{split}
209 \label{adjoint}
210 \end{equation}
211 %
212 Eq. (\ref{adjoint}) is the {\sf adjoint model (ADM)},
213 in which $M^T$ is the adjoint (here, the transpose) of the
214 tangent linear operator $M$, $ \delta \vec{v}^{\ast} $
215 the adjoint variable of the model state $ \vec{v} $, and
216 $ \delta \vec{u}^{\ast} $ the adjoint variable of the control variable $ \vec{u} $.
217
218 The {\sf reverse} nature of the adjoint calculation can be readily
219 seen as follows.
220 Consider a model integration which consists of $ \Lambda $
221 consecutive operations
222 $ {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} (
223 ...... ( {\cal M}_{\lambda} (
224 ......
225 ( {\cal M}_{1} ( {\cal M}_{0}(\vec{u}) )))) $,
226 where the ${\cal M}$'s could be the elementary steps, i.e. single lines
227 in the code of the model, or successive time steps of the
228 model integration,
229 starting at step 0 and moving up to step $\Lambda$, with intermediate
230 ${\cal M}_{\lambda} (\vec{u}) = \vec{v}^{(\lambda+1)}$ and final
231 ${\cal M}_{\Lambda} (\vec{u}) = \vec{v}^{(\Lambda+1)} = \vec{v}$.
232 Let ${\cal J}$ be a cost function which explicitly depends on the
233 final state $\vec{v}$ only
234 (this restriction is for clarity reasons only).
235 %
236 ${\cal J}(u)$ may be decomposed according to:
237 %
238 \begin{equation}
239 {\cal J}({\cal M}(\vec{u})) \, = \,
240 {\cal J} ( {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} (
241 ...... ( {\cal M}_{\lambda} (
242 ......
243 ( {\cal M}_{1} ( {\cal M}_{0}(\vec{u}) )))))
244 \label{compos}
245 \end{equation}
246 %
247 Then, according to the chain rule, the forward calculation reads,
248 in terms of the Jacobi matrices
249 (we've omitted the $ | $'s which, nevertheless are important
250 to the aspect of {\it tangent} linearity;
251 note also that by definition
252 $ \langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \rangle
253 = \nabla_v {\cal J} \cdot \delta \vec{v} $ )
254 %
255 \begin{equation}
256 \begin{split}
257 \nabla_v {\cal J} (M(\delta \vec{u})) & = \,
258 \nabla_v {\cal J} \cdot M_{\Lambda}
259 \cdot ...... \cdot M_{\lambda} \cdot ...... \cdot
260 M_{1} \cdot M_{0} \cdot \delta \vec{u} \\
261 ~ & = \, \nabla_v {\cal J} \cdot \delta \vec{v} \\
262 \end{split}
263 \label{forward}
264 \end{equation}
265 %
266 whereas in reverse mode we have
267 %
268 \begin{equation}
269 \boxed{
270 \begin{split}
271 M^T ( \nabla_v {\cal J}^T) & = \,
272 M_{0}^T \cdot M_{1}^T
273 \cdot ...... \cdot M_{\lambda}^T \cdot ...... \cdot
274 M_{\Lambda}^T \cdot \nabla_v {\cal J}^T \\
275 ~ & = \, M_{0}^T \cdot M_{1}^T
276 \cdot ...... \cdot
277 \nabla_{v^{(\lambda)}} {\cal J}^T \\
278 ~ & = \, \nabla_u {\cal J}^T
279 \end{split}
280 }
281 \label{reverse}
282 \end{equation}
283 %
284 clearly expressing the reverse nature of the calculation.
285 Eq. (\ref{reverse}) is at the heart of automatic adjoint compilers.
286 If the intermediate steps $\lambda$ in
287 eqn. (\ref{compos}) -- (\ref{reverse})
288 represent the model state (forward or adjoint) at each
289 intermediate time step as noted above, then correspondingly,
290 $ M^T (\delta \vec{v}^{(\lambda) \, \ast}) =
291 \delta \vec{v}^{(\lambda-1) \, \ast} $ for the adjoint variables.
292 It thus becomes evident that the adjoint calculation also
293 yields the adjoint of each model state component
294 $ \vec{v}^{(\lambda)} $ at each intermediate step $ \lambda $, namely
295 %
296 \begin{equation}
297 \boxed{
298 \begin{split}
299 \nabla_{v^{(\lambda)}} {\cal J}^T |_{\vec{v}^{(\lambda)}}
300 & = \,
301 M_{\lambda}^T |_{\vec{v}^{(\lambda)}} \cdot ...... \cdot
302 M_{\Lambda}^T |_{\vec{v}^{(\lambda)}} \cdot \delta \vec{v}^{\ast} \\
303 ~ & = \, \delta \vec{v}^{(\lambda) \, \ast}
304 \end{split}
305 }
306 \end{equation}
307 %
308 in close analogy to eq. (\ref{adjoint})
309 We note in passing that that the $\delta \vec{v}^{(\lambda) \, \ast}$
310 are the Lagrange multipliers of the model equations which determine
311 $ \vec{v}^{(\lambda)}$.
312
313 In components, eq. (\ref{adjoint}) reads as follows.
314 Let
315 \[
316 \begin{array}{rclcrcl}
317 \delta \vec{u} & = &
318 \left( \delta u_1,\ldots, \delta u_m \right)^T , & \qquad &
319 \delta \vec{u}^{\ast} \,\, = \,\, \nabla_u {\cal J}^T & = &
320 \left(
321 \frac{\partial {\cal J}}{\partial u_1},\ldots,
322 \frac{\partial {\cal J}}{\partial u_m}
323 \right)^T \\
324 \delta \vec{v} & = &
325 \left( \delta v_1,\ldots, \delta u_n \right)^T , & \qquad &
326 \delta \vec{v}^{\ast} \,\, = \,\, \nabla_v {\cal J}^T & = &
327 \left(
328 \frac{\partial {\cal J}}{\partial v_1},\ldots,
329 \frac{\partial {\cal J}}{\partial v_n}
330 \right)^T \\
331 \end{array}
332 \]
333 denote the perturbations in $\vec{u}$ and $\vec{v}$, respectively,
334 and their adjoint variables;
335 further
336 \[
337 M \, = \, \left(
338 \begin{array}{ccc}
339 \frac{\partial {\cal M}_1}{\partial u_1} & \ldots &
340 \frac{\partial {\cal M}_1}{\partial u_m} \\
341 \vdots & ~ & \vdots \\
342 \frac{\partial {\cal M}_n}{\partial u_1} & \ldots &
343 \frac{\partial {\cal M}_n}{\partial u_m} \\
344 \end{array}
345 \right)
346 \]
347 is the Jacobi matrix of $ {\cal M} $
348 (an $ n \times m $ matrix)
349 such that $ \delta \vec{v} = M \cdot \delta \vec{u} $, or
350 \[
351 \delta v_{j}
352 \, = \, \sum_{i=1}^m M_{ji} \, \delta u_{i}
353 \, = \, \sum_{i=1}^m \, \frac{\partial {\cal M}_{j}}{\partial u_{i}}
354 \delta u_{i}
355 \]
356 %
357 Then eq. (\ref{adjoint}) takes the form
358 \[
359 \delta u_{i}^{\ast}
360 \, = \, \sum_{j=1}^n M_{ji} \, \delta v_{j}^{\ast}
361 \, = \, \sum_{j=1}^n \, \frac{\partial {\cal M}_{j}}{\partial u_{i}}
362 \delta v_{j}^{\ast}
363 \]
364 %
365 or
366 %
367 \[
368 \left(
369 \begin{array}{c}
370 \left. \frac{\partial}{\partial u_1} {\cal J} \right|_{\vec{u}^{(0)}} \\
371 \vdots \\
372 \left. \frac{\partial}{\partial u_m} {\cal J} \right|_{\vec{u}^{(0)}} \\
373 \end{array}
374 \right)
375 \, = \,
376 \left(
377 \begin{array}{ccc}
378 \left. \frac{\partial {\cal M}_1}{\partial u_1} \right|_{\vec{u}^{(0)}}
379 & \ldots &
380 \left. \frac{\partial {\cal M}_n}{\partial u_1} \right|_{\vec{u}^{(0)}} \\
381 \vdots & ~ & \vdots \\
382 \left. \frac{\partial {\cal M}_1}{\partial u_m} \right|_{\vec{u}^{(0)}}
383 & \ldots &
384 \left. \frac{\partial {\cal M}_n}{\partial u_m} \right|_{\vec{u}^{(0)}} \\
385 \end{array}
386 \right)
387 \cdot
388 \left(
389 \begin{array}{c}
390 \left. \frac{\partial}{\partial v_1} {\cal J} \right|_{\vec{v}} \\
391 \vdots \\
392 \left. \frac{\partial}{\partial v_n} {\cal J} \right|_{\vec{v}} \\
393 \end{array}
394 \right)
395 \]
396 %
397 Furthermore, the adjoint $ \delta v^{(\lambda) \, \ast} $
398 of any intermediate state $ v^{(\lambda)} $
399 may be obtained, using the intermediate Jacobian
400 (an $ n_{\lambda+1} \times n_{\lambda} $ matrix)
401 %
402 \[
403 M_{\lambda} \, = \,
404 \left(
405 \begin{array}{ccc}
406 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_1}
407 & \ldots &
408 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
409 \vdots & ~ & \vdots \\
410 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_1}
411 & \ldots &
412 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
413 \end{array}
414 \right)
415 \]
416 %
417 and the shorthand notation for the adjoint variables
418 $ \delta v^{(\lambda) \, \ast}_{j} = \frac{\partial}{\partial v^{(\lambda)}_{j}}
419 {\cal J}^T $, $ j = 1, \ldots , n_{\lambda} $,
420 for intermediate components, yielding
421 \begin{equation}
422 \small
423 \begin{split}
424 \left(
425 \begin{array}{c}
426 \delta v^{(\lambda) \, \ast}_1 \\
427 \vdots \\
428 \delta v^{(\lambda) \, \ast}_{n_{\lambda}} \\
429 \end{array}
430 \right)
431 \, = &
432 \left(
433 \begin{array}{ccc}
434 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_1}
435 & \ldots \,\, \ldots &
436 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_1} \\
437 \vdots & ~ & \vdots \\
438 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_{n_{\lambda}}}
439 & \ldots \,\, \ldots &
440 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
441 \end{array}
442 \right)
443 \cdot
444 %
445 \\ ~ & ~
446 \\ ~ &
447 %
448 \left(
449 \begin{array}{ccc}
450 \frac{\partial ({\cal M}_{\lambda+1})_1}{\partial v^{(\lambda+1)}_1}
451 & \ldots &
452 \frac{\partial ({\cal M}_{\lambda+1})_{n_{\lambda+2}}}{\partial v^{(\lambda+1)}_1} \\
453 \vdots & ~ & \vdots \\
454 \vdots & ~ & \vdots \\
455 \frac{\partial ({\cal M}_{\lambda+1})_1}{\partial v^{(\lambda+1)}_{n_{\lambda+1}}}
456 & \ldots &
457 \frac{\partial ({\cal M}_{\lambda+1})_{n_{\lambda+2}}}{\partial v^{(\lambda+1)}_{n_{\lambda+1}}} \\
458 \end{array}
459 \right)
460 \cdot \, \ldots \, \cdot
461 \left(
462 \begin{array}{c}
463 \delta v^{\ast}_1 \\
464 \vdots \\
465 \delta v^{\ast}_{n} \\
466 \end{array}
467 \right)
468 \end{split}
469 \end{equation}
470
471 Eq. (\ref{forward}) and (\ref{reverse}) are perhaps clearest in
472 showing the advantage of the reverse over the forward mode
473 if the gradient $\nabla _{u}{\cal J}$, i.e. the sensitivity of the
474 cost function $ {\cal J} $ with respect to {\it all} input
475 variables $u$
476 (or the sensitivity of the cost function with respect to
477 {\it all} intermediate states $ \vec{v}^{(\lambda)} $) are sought.
478 In order to be able to solve for each component of the gradient
479 $ \partial {\cal J} / \partial u_{i} $ in (\ref{forward})
480 a forward calculation has to be performed for each component separately,
481 i.e. $ \delta \vec{u} = \delta u_{i} {\vec{e}_{i}} $
482 for the $i$-th forward calculation.
483 Then, (\ref{forward}) represents the
484 projection of $ \nabla_u {\cal J} $ onto the $i$-th component.
485 The full gradient is retrieved from the $ m $ forward calculations.
486 In contrast, eq. (\ref{reverse}) yields the full
487 gradient $\nabla _{u}{\cal J}$ (and all intermediate gradients
488 $\nabla _{v^{(\lambda)}}{\cal J}$) within a single reverse calculation.
489
490 Note, that if $ {\cal J} $ is a vector-valued function
491 of dimension $ l > 1 $,
492 eq. (\ref{reverse}) has to be modified according to
493 \[
494 M^T \left( \nabla_v {\cal J}^T \left(\delta \vec{J}\right) \right)
495 \, = \,
496 \nabla_u {\cal J}^T \cdot \delta \vec{J}
497 \]
498 where now $ \delta \vec{J} \in I\!\!R^l $ is a vector of
499 dimension $ l $.
500 In this case $ l $ reverse simulations have to be performed
501 for each $ \delta J_{k}, \,\, k = 1, \ldots, l $.
502 Then, the reverse mode is more efficient as long as
503 $ l < n $, otherwise the forward mode is preferable.
504 Strictly, the reverse mode is called adjoint mode only for
505 $ l = 1 $.
506
507 A detailed analysis of the underlying numerical operations
508 shows that the computation of $\nabla _{u}{\cal J}$ in this way
509 requires about 2 to 5 times the computation of the cost function.
510 Alternatively, the gradient vector could be approximated
511 by finite differences, requiring $m$ computations
512 of the perturbed cost function.
513
514 To conclude we give two examples of commonly used types
515 of cost functions:
516
517 \paragraph{Example 1:
518 $ {\cal J} = v_{j} (T) $} ~ \\
519 The cost function consists of the $j$-th component of the model state
520 $ \vec{v} $ at time $T$.
521 Then $ \nabla_v {\cal J}^T = {\vec{f}_{j}} $ is just the $j$-th
522 unit vector. The $ \nabla_u {\cal J}^T $
523 is the projection of the adjoint
524 operator onto the $j$-th component ${\bf f_{j}}$,
525 \[
526 \nabla_u {\cal J}^T
527 \, = \, M^T \cdot \nabla_v {\cal J}^T
528 \, = \, \sum_{i} M^T_{ji} \, {\vec{e}_{i}}
529 \]
530
531 \paragraph{Example 2:
532 $ {\cal J} = \langle \, {\cal H}(\vec{v}) - \vec{d} \, ,
533 \, {\cal H}(\vec{v}) - \vec{d} \, \rangle $} ~ \\
534 The cost function represents the quadratic model vs. data misfit.
535 Here, $ \vec{d} $ is the data vector and $ {\cal H} $ represents the
536 operator which maps the model state space onto the data space.
537 Then, $ \nabla_v {\cal J} $ takes the form
538 %
539 \begin{equation*}
540 \begin{split}
541 \nabla_v {\cal J}^T & = \, 2 \, \, H \cdot
542 \left( \, {\cal H}(\vec{v}) - \vec{d} \, \right) \\
543 ~ & = \, 2 \sum_{j} \left\{ \sum_k
544 \frac{\partial {\cal H}_k}{\partial v_{j}}
545 \left( {\cal H}_k (\vec{v}) - d_k \right)
546 \right\} \, {\vec{f}_{j}} \\
547 \end{split}
548 \end{equation*}
549 %
550 where $H_{kj} = \partial {\cal H}_k / \partial v_{j} $ is the
551 Jacobi matrix of the data projection operator.
552 Thus, the gradient $ \nabla_u {\cal J} $ is given by the
553 adjoint operator,
554 driven by the model vs. data misfit:
555 \[
556 \nabla_u {\cal J}^T \, = \, 2 \, M^T \cdot
557 H \cdot \left( {\cal H}(\vec{v}) - \vec{d} \, \right)
558 \]
559
560 \subsection{Storing vs. recomputation in reverse mode}
561 \label{checkpointing}
562
563 We note an important aspect of the forward vs. reverse
564 mode calculation.
565 Because of the local character of the derivative
566 (a derivative is defined w.r.t. a point along the trajectory),
567 the intermediate results of the model trajectory
568 $\vec{v}^{(\lambda+1)}={\cal M}_{\lambda}(v^{(\lambda)})$
569 may be required to evaluate the intermediate Jacobian
570 $M_{\lambda}|_{\vec{v}^{(\lambda)}} \, \delta \vec{v}^{(\lambda)} $.
571 This is the case e.g. for nonlinear expressions
572 (momentum advection, nonlinear equation of state), state-dependent
573 conditional statements (parameterization schemes).
574 In the forward mode, the intermediate results are required
575 in the same order as computed by the full forward model ${\cal M}$,
576 but in the reverse mode they are required in the reverse order.
577 Thus, in the reverse mode the trajectory of the forward model
578 integration ${\cal M}$ has to be stored to be available in the reverse
579 calculation. Alternatively, the complete model state up to the
580 point of evaluation has to be recomputed whenever its value is required.
581
582 A method to balance the amount of recomputations vs.
583 storage requirements is called {\sf checkpointing}
584 (e.g. \cite{gri:92}, \cite{res-eta:98}).
585 It is depicted in \ref{fig:3levelcheck} for a 3-level checkpointing
586 [as an example, we give explicit numbers for a 3-day
587 integration with a 1-hourly timestep in square brackets].
588 \begin{itemize}
589 %
590 \item [$lev3$]
591 In a first step, the model trajectory is subdivided into
592 $ {n}^{lev3} $ subsections [$ {n}^{lev3} $=3 1-day intervals],
593 with the label $lev3$ for this outermost loop.
594 The model is then integrated along the full trajectory,
595 and the model state stored to disk only at every $ k_{i}^{lev3} $-th timestep
596 [i.e. 3 times, at
597 $ i = 0,1,2 $ corresponding to $ k_{i}^{lev3} = 0, 24, 48 $].
598 In addition, the cost function is computed, if needed.
599 %
600 \item [$lev2$]
601 In a second step each subsection itself is divided into
602 $ {n}^{lev2} $ subsections
603 [$ {n}^{lev2} $=4 6-hour intervals per subsection].
604 The model picks up at the last outermost dumped state
605 $ v_{k_{n}^{lev3}} $ and is integrated forward in time along
606 the last subsection, with the label $lev2$ for this
607 intermediate loop.
608 The model state is now stored to disk at every $ k_{i}^{lev2} $-th
609 timestep
610 [i.e. 4 times, at
611 $ i = 0,1,2,3 $ corresponding to $ k_{i}^{lev2} = 48, 54, 60, 66 $].
612 %
613 \item [$lev1$]
614 Finally, the model picks up at the last intermediate dump state
615 $ v_{k_{n}^{lev2}} $ and is integrated forward in time along
616 the last subsection, with the label $lev1$ for this
617 intermediate loop.
618 Within this sub-subsection only, parts of the model state is stored
619 to memory at every timestep
620 [i.e. every hour $ i=0,...,5$ corresponding to
621 $ k_{i}^{lev1} = 66, 67, \ldots, 71 $].
622 The final state $ v_n = v_{k_{n}^{lev1}} $ is reached
623 and the model state of all preceding timesteps along the last
624 innermost subsection are available, enabling integration backwards
625 in time along the last subsection.
626 The adjoint can thus be computed along this last
627 subsection $k_{n}^{lev2}$.
628 %
629 \end{itemize}
630 %
631 This procedure is repeated consecutively for each previous
632 subsection $k_{n-1}^{lev2}, \ldots, k_{1}^{lev2} $
633 carrying the adjoint computation to the initial time
634 of the subsection $k_{n}^{lev3}$.
635 Then, the procedure is repeated for the previous subsection
636 $k_{n-1}^{lev3}$
637 carrying the adjoint computation to the initial time
638 $k_{1}^{lev3}$.
639
640 For the full model trajectory of
641 $ n^{lev3} \cdot n^{lev2} \cdot n^{lev1} $ timesteps
642 the required storing of the model state was significantly reduced to
643 $ n^{lev2} + n^{lev3} $ to disk and roughly $ n^{lev1} $ to memory
644 [i.e. for the 3-day integration with a total oof 72 timesteps
645 the model state was stored 7 times to disk and roughly 6 times
646 to memory].
647 This saving in memory comes at a cost of a required
648 3 full forward integrations of the model (one for each
649 checkpointing level).
650 The optimal balance of storage vs. recomputation certainly depends
651 on the computing resources available and may be adjusted by
652 adjusting the partitioning among the
653 $ n^{lev3}, \,\, n^{lev2}, \,\, n^{lev1} $.
654
655 \begin{figure}[t!]
656 \begin{center}
657 %\psdraft
658 %\psfrag{v_k1^lev3}{\mathinfigure{v_{k_{1}^{lev3}}}}
659 %\psfrag{v_kn-1^lev3}{\mathinfigure{v_{k_{n-1}^{lev3}}}}
660 %\psfrag{v_kn^lev3}{\mathinfigure{v_{k_{n}^{lev3}}}}
661 %\psfrag{v_k1^lev2}{\mathinfigure{v_{k_{1}^{lev2}}}}
662 %\psfrag{v_kn-1^lev2}{\mathinfigure{v_{k_{n-1}^{lev2}}}}
663 %\psfrag{v_kn^lev2}{\mathinfigure{v_{k_{n}^{lev2}}}}
664 %\psfrag{v_k1^lev1}{\mathinfigure{v_{k_{1}^{lev1}}}}
665 %\psfrag{v_kn^lev1}{\mathinfigure{v_{k_{n}^{lev1}}}}
666 %\mbox{\epsfig{file=s_autodiff/figs/checkpointing.eps, width=0.8\textwidth}}
667 \resizebox{5.5in}{!}{\includegraphics{s_autodiff/figs/checkpointing.eps}}
668 %\psfull
669 \end{center}
670 \caption{
671 Schematic view of intermediate dump and restart for
672 3-level checkpointing.}
673 \label{fig:3levelcheck}
674 \end{figure}
675
676 % \subsection{Optimal perturbations}
677 % \label{sec_optpert}
678
679
680 % \subsection{Error covariance estimate and Hessian matrix}
681 % \label{sec_hessian}
682
683 \newpage
684
685 %**********************************************************************
686 \section{TLM and ADM generation in general}
687 \label{sec_ad_setup_gen}
688 \begin{rawhtml}
689 <!-- CMIREDIR:sec_ad_setup_gen: -->
690 \end{rawhtml}
691 %**********************************************************************
692
693 In this section we describe in a general fashion
694 the parts of the code that are relevant for automatic
695 differentiation using the software tool TAF.
696 Modifications to use OpenAD are described in \ref{sec_ad_openad}.
697
698 \input{s_autodiff/text/doc_ad_the_model}
699
700 The basic flow is depicted in \ref{fig:adthemodel}.
701 If CPP option \texttt{ALLOW\_AUTODIFF\_TAMC} is defined,
702 the driver routine
703 {\it the\_model\_main}, instead of calling {\it the\_main\_loop},
704 invokes the adjoint of this routine, {\it adthe\_main\_loop}
705 (case \texttt{\#define ALLOW\_ADJOINT\_RUN}), or
706 the tangent linear of this routine {\it g\_the\_main\_loop}
707 (case \texttt{\#define ALLOW\_TANGENTLINEAR\_RUN}),
708 which are the toplevel routines in terms of automatic differentiation.
709 The routines {\it adthe\_main\_loop} or {\it g\_the\_main\_loop}
710 are generated by TAF.
711 It contains both the forward integration of the full model, the
712 cost function calculation,
713 any additional storing that is required for efficient checkpointing,
714 and the reverse integration of the adjoint model.
715
716 [DESCRIBE IN A SEPARATE SECTION THE WORKING OF THE TLM]
717
718 In Fig. \ref{fig:adthemodel}
719 the structure of {\it adthe\_main\_loop} has been strongly
720 simplified to focus on the essentials; in particular, no checkpointing
721 procedures are shown here.
722 Prior to the call of {\it adthe\_main\_loop}, the routine
723 {\it ctrl\_unpack} is invoked to unpack the control vector
724 or initialise the control variables.
725 Following the call of {\it adthe\_main\_loop},
726 the routine {\it ctrl\_pack}
727 is invoked to pack the control vector
728 (cf. Section \ref{section_ctrl}).
729 If gradient checks are to be performed, the option
730 {\tt ALLOW\_GRADIENT\_CHECK} is defined. In this case
731 the driver routine {\it grdchk\_main} is called after
732 the gradient has been computed via the adjoint
733 (cf. Section \ref{section_grdchk}).
734
735 %------------------------------------------------------------------
736
737 \subsection{General setup
738 \label{section_ad_setup}}
739
740 In order to configure AD-related setups the following packages need
741 to be enabled:
742 {\it
743 \begin{table}[h!]
744 \begin{tabular}{l}
745 autodiff \\
746 ctrl \\
747 cost \\
748 grdchk \\
749 \end{tabular}
750 \end{table}
751 }
752 The packages are enabled by adding them to your experiment-specific
753 configuration file
754 {\it packages.conf} (see Section ???).
755
756 The following AD-specific CPP option files need to be customized:
757 %
758 \begin{itemize}
759 %
760 \item {\it ECCO\_CPPOPTIONS.h} \\
761 This header file collects CPP options for the packages
762 {\it autodiff, cost, ctrl} as well as AD-unrelated options for
763 the external forcing package {\it exf}.
764 \footnote{NOTE: These options are not set in their package-specific
765 headers such as {\it COST\_CPPOPTIONS.h}, but are instead collected
766 in the single header file {\it ECCO\_CPPOPTIONS.h}.
767 The package-specific header files serve as simple
768 placeholders at this point.}
769 %
770 \item {\it tamc.h} \\
771 This header configures the splitting of the time stepping loop
772 w.r.t. the 3-level checkpointing (see section ???).
773
774 %
775 \end{itemize}
776
777 %------------------------------------------------------------------
778
779 \subsection{Building the AD code using TAF
780 \label{section_ad_build}}
781
782 The build process of an AD code is very similar to building
783 the forward model. However, depending on which AD code one wishes
784 to generate, and on which AD tool is available (TAF or TAMC),
785 the following {\tt make} targets are available:
786
787 \begin{table}[h!]
788 {\footnotesize
789 \begin{tabular}{|ccll|}
790 \hline
791 ~ & {\it AD-target} & {\it output} & {\it description} \\
792 \hline
793 \hline
794 (1) & {\tt <MODE><TOOL>only} & {\tt <MODE>\_<TOOL>\_output.f} &
795 generates code for $<$MODE$>$ using $<$TOOL$>$ \\
796 ~ & ~ & ~ & no {\tt make} dependencies on {\tt .F .h} \\
797 ~ & ~ & ~ & useful for compiling on remote platforms \\
798 \hline
799 (2) & {\tt <MODE><TOOL>} & {\tt <MODE>\_<TOOL>\_output.f} &
800 generates code for $<$MODE$>$ using $<$TOOL$>$ \\
801 ~ & ~ & ~ & includes {\tt make} dependencies on {\tt .F .h} \\
802 ~ & ~ & ~ & i.e. input for $<$TOOL$>$ may be re-generated \\
803 \hline
804 (3) & {\tt <MODE>all} & {\tt mitgcmuv\_<MODE>} &
805 generates code for $<$MODE$>$ using $<$TOOL$>$ \\
806 ~ & ~ & ~ & and compiles all code \\
807 ~ & ~ & ~ & (use of TAF is set as default) \\
808 \hline
809 \end{tabular}
810 }
811 \end{table}
812 %
813 Here, the following placeholders are used
814 %
815 \begin{itemize}
816 %
817 \item $<$TOOL$>$
818 %
819 \begin{itemize}
820 %
821 \item {\tt TAF}
822 \item {\tt TAMC}
823 %
824 \end{itemize}
825 %
826 \item $<$MODE$>$
827 %
828 \begin{itemize}
829 %
830 \item {\tt ad} generates the adjoint model (ADM)
831 \item {\tt ftl} generates the tangent linear model (TLM)
832 \item {\tt svd} generates both ADM and TLM for \\
833 singular value decomposition (SVD) type calculations
834 %
835 \end{itemize}
836 %
837 \end{itemize}
838
839 For example, to generate the adjoint model using TAF after routines ({\tt .F})
840 or headers ({\tt .h}) have been modified, but without compilation,
841 type {\tt make adtaf};
842 or, to generate the tangent linear model using TAMC without
843 re-generating the input code, type {\tt make ftltamconly}.
844
845
846 A typical full build process to generate the ADM via TAF would
847 look like follows:
848 \begin{verbatim}
849 % mkdir build
850 % cd build
851 % ../../../tools/genmake2 -mods=../code_ad
852 % make depend
853 % make adall
854 \end{verbatim}
855
856 %------------------------------------------------------------------
857
858 \subsection{The AD build process in detail
859 \label{section_ad_build_detail}}
860
861 The {\tt make <MODE>all} target consists of the following procedures:
862
863 \begin{enumerate}
864 %
865 \item
866 A header file {\tt AD\_CONFIG.h} is generated which contains a CPP option
867 on which code ought to be generated. Depending on the {\tt make} target,
868 the contents is one of the following:
869 \begin{itemize}
870 \item
871 {\tt \#define ALLOW\_ADJOINT\_RUN}
872 \item
873 {\tt \#define ALLOW\_TANGENTLINEAR\_RUN}
874 \item
875 {\tt \#define ALLOW\_ECCO\_OPTIMIZATION}
876 \end{itemize}
877 %
878 \item
879 A single file {\tt <MODE>\_input\_code.f} is concatenated
880 consisting of all {\tt .f} files that are part of the list {\bf AD\_FILES}
881 and all {\tt .flow} files that are part of the list {\bf AD\_FLOW\_FILES}.
882 %
883 \item
884 The AD tool is invoked with the {\tt <MODE>\_<TOOL>\_FLAGS}.
885 The default AD tool flags in {\tt genmake2} can be overrwritten by
886 an {\tt adjoint\_options} file (similar to the platform-specific
887 {\tt build\_options}, see Section ???.
888 The AD tool writes the resulting AD code into the file
889 {\tt <MODE>\_input\_code\_ad.f}
890 %
891 \item
892 A short sed script {\tt adjoint\_sed} is applied to
893 {\tt <MODE>\_input\_code\_ad.f}
894 to reinstate {\bf myThid} into the CALL argument list of active file I/O.
895 The result is written to file {\tt <MODE>\_<TOOL>\_output.f}.
896 %
897 \item
898 All routines are compiled and an executable is generated
899 (see Table ???).
900 %
901 \end{enumerate}
902
903 \subsubsection{The list AD\_FILES and {\tt .list} files}
904
905 Not all routines are presented to the AD tool.
906 Routines typically hidden are diagnostics routines which
907 do not influence the cost function, but may create
908 artificial flow dependencies such as I/O of active variables.
909
910 {\tt genmake2} generates a list (or variable) {\bf AD\_FILES}
911 which contains all routines that are shown to the AD tool.
912 This list is put together from all files with suffix {\tt .list}
913 that {\tt genmake2} finds in its search directories.
914 The list file for the core MITgcm routines is in {\tt model/src/}
915 is called {\tt model\_ad\_diff.list}.
916 Note that no wrapper routine is shown to TAF. These are either
917 not visible at all to the AD code, or hand-written AD code
918 is available (see next section).
919
920 Each package directory contains its package-specific
921 list file {\tt <PKG>\_ad\_diff.list}. For example,
922 {\tt pkg/ptracers/} contains the file {\tt ptracers\_ad\_diff.list}.
923 Thus, enabling a package will automatically extend the
924 {\bf AD\_FILES} list of {\tt genmake2} to incorporate the
925 package-specific routines.
926 Note that you will need to regenerate the {\tt Makefile} if
927 you enable a package (e.g. by adding it to {\tt packages.conf})
928 and a {\tt Makefile} already exists.
929
930 \subsubsection{The list AD\_FLOW\_FILES and {\tt .flow} files}
931
932 TAMC and TAF can evaluate user-specified directives
933 that start with a specific syntax ({\tt CADJ}, {\tt C\$TAF}, {\tt !\$TAF}).
934 The main categories of directives are STORE directives and
935 FLOW directives. Here, we are concerned with flow directives,
936 store directives are treated elsewhere.
937
938 Flow directives enable the AD tool to evaluate how it should treat
939 routines that are 'hidden' by the user, i.e. routines which are
940 not contained in the {\bf AD\_FILES} list (see previous section),
941 but which are called in part of the code that the AD tool does see.
942 The flow directive tell the AD tool
943 %
944 \begin{itemize}
945 %
946 \item which subroutine arguments are input/output
947 \item which subroutine arguments are active
948 \item which subroutine arguments are required to compute the cost
949 \item which subroutine arguments are dependent
950 %
951 \end{itemize}
952 %
953 The syntax for the flow directives can be found in the
954 AD tool manuals.
955
956 {\tt genmake2} generates a list (or variable) {\bf AD\_FLOW\_FILES}
957 which contains all files with suffix{\tt .flow} that it finds
958 in its search directories.
959 The flow directives for the core MITgcm routines of
960 {\tt eesupp/src/} and {\tt model/src/}
961 reside in {\tt pkg/autodiff/}.
962 This directory also contains hand-written adjoint code
963 for the MITgcm WRAPPER (section \ref{chap:sarch}).
964
965 Flow directives for package-specific routines are contained in
966 the corresponding package directories in the file
967 {\tt <PKG>\_ad.flow}, e.g. ptracers-specific directives are in
968 {\tt ptracers\_ad.flow}.
969
970 \subsubsection{Store directives for 3-level checkpointing}
971
972 The storing that is required at each period of the
973 3-level checkpointing is controled by three
974 top-level headers.
975
976 \begin{verbatim}
977 do ilev_3 = 1, nchklev_3
978 # include ``checkpoint_lev3.h''
979 do ilev_2 = 1, nchklev_2
980 # include ``checkpoint_lev2.h''
981 do ilev_1 = 1, nchklev_1
982 # include ``checkpoint_lev1.h''
983
984 ...
985
986 end do
987 end do
988 end do
989 \end{verbatim}
990
991 All files {\tt checkpoint\_lev?.h} are contained in directory
992 {\tt pkg/autodiff/}.
993
994
995 \subsubsection{Changing the default AD tool flags: ad\_options files}
996
997
998 \subsubsection{Hand-written adjoint code}
999
1000 %------------------------------------------------------------------
1001
1002 \subsection{The cost function (dependent variable)
1003 \label{section_cost}}
1004
1005 The cost function $ {\cal J} $ is referred to as the {\sf dependent variable}.
1006 It is a function of the input variables $ \vec{u} $ via the composition
1007 $ {\cal J}(\vec{u}) \, = \, {\cal J}(M(\vec{u})) $.
1008 The input are referred to as the
1009 {\sf independent variables} or {\sf control variables}.
1010 All aspects relevant to the treatment of the cost function $ {\cal J} $
1011 (parameter setting, initialization, accumulation,
1012 final evaluation), are controlled by the package {\it pkg/cost}.
1013 The aspects relevant to the treatment of the independent variables
1014 are controlled by the package {\it pkg/ctrl} and will be treated
1015 in the next section.
1016
1017 \input{s_autodiff/text/doc_cost_flow}
1018
1019 \subsubsection{Enabling the package}
1020
1021 \fbox{
1022 \begin{minipage}{12cm}
1023 {\it packages.conf}, {\it ECCO\_CPPOPTIONS.h}
1024 \end{minipage}
1025 }
1026 \begin{itemize}
1027 %
1028 \item
1029 The package is enabled by adding {\it cost} to your file {\it packages.conf}
1030 (see Section ???)
1031 %
1032 \item
1033
1034
1035 \end{itemize}
1036 %
1037
1038 N.B.: In general the following packages ought to be enabled
1039 simultaneously: {\it autodiff, cost, ctrl}.
1040 The basic CPP option to enable the cost function is {\bf ALLOW\_COST}.
1041 Each specific cost function contribution has its own option.
1042 For the present example the option is {\bf ALLOW\_COST\_TRACER}.
1043 All cost-specific options are set in {\it ECCO\_CPPOPTIONS.h}
1044 Since the cost function is usually used in conjunction with
1045 automatic differentiation, the CPP option
1046 {\bf ALLOW\_ADJOINT\_RUN} (file {\it CPP\_OPTIONS.h}) and
1047 {\bf ALLOW\_AUTODIFF\_TAMC} (file {\it ECCO\_CPPOPTIONS.h})
1048 should be defined.
1049
1050 \subsubsection{Initialization}
1051 %
1052 The initialization of the {\it cost} package is readily enabled
1053 as soon as the CPP option {\bf ALLOW\_COST} is defined.
1054 %
1055 \begin{itemize}
1056 %
1057 \item
1058 \fbox{
1059 \begin{minipage}{12cm}
1060 Parameters: {\it cost\_readparms}
1061 \end{minipage}
1062 }
1063 \\
1064 This S/R
1065 reads runtime flags and parameters from file {\it data.cost}.
1066 For the present example the only relevant parameter read
1067 is {\bf mult\_tracer}. This multiplier enables different
1068 cost function contributions to be switched on
1069 ( = 1.) or off ( = 0.) at runtime.
1070 For more complex cost functions which involve model vs. data
1071 misfits, the corresponding data filenames and data
1072 specifications (start date and time, period, ...) are read
1073 in this S/R.
1074 %
1075 \item
1076 \fbox{
1077 \begin{minipage}{12cm}
1078 Variables: {\it cost\_init}
1079 \end{minipage}
1080 }
1081 \\
1082 This S/R
1083 initializes the different cost function contributions.
1084 The contribution for the present example is {\bf objf\_tracer}
1085 which is defined on each tile (bi,bj).
1086 %
1087 \end{itemize}
1088 %
1089 \subsubsection{Accumulation}
1090 %
1091 \begin{itemize}
1092 %
1093 \item
1094 \fbox{
1095 \begin{minipage}{12cm}
1096 {\it cost\_tile}, {\it cost\_tracer}
1097 \end{minipage}
1098 }
1099 \end{itemize}
1100 %
1101 The 'driver' routine
1102 {\it cost\_tile} is called at the end of each time step.
1103 Within this 'driver' routine, S/R are called for each of
1104 the chosen cost function contributions.
1105 In the present example ({\bf ALLOW\_COST\_TRACER}),
1106 S/R {\it cost\_tracer} is called.
1107 It accumulates {\bf objf\_tracer} according to eqn. (\ref{???}).
1108 %
1109 \subsubsection{Finalize all contributions}
1110 %
1111 \begin{itemize}
1112 %
1113 \item
1114 \fbox{
1115 \begin{minipage}{12cm}
1116 {\it cost\_final}
1117 \end{minipage}
1118 }
1119 \end{itemize}
1120 %
1121 At the end of the forward integration S/R {\it cost\_final}
1122 is called. It accumulates the total cost function {\bf fc}
1123 from each contribution and sums over all tiles:
1124 \begin{equation}
1125 {\cal J} \, = \,
1126 {\rm fc} \, = \,
1127 {\rm mult\_tracer} \sum_{\text{global sum}} \sum_{bi,\,bj}^{nSx,\,nSy}
1128 {\rm objf\_tracer}(bi,bj) \, + \, ...
1129 \end{equation}
1130 %
1131 The total cost function {\bf fc} will be the
1132 'dependent' variable in the argument list for TAF, i.e.
1133 \begin{verbatim}
1134 taf -output 'fc' ...
1135 \end{verbatim}
1136
1137 %%%% \end{document}
1138
1139 \input{s_autodiff/text/doc_ad_the_main}
1140
1141 \subsection{The control variables (independent variables)
1142 \label{section_ctrl}}
1143
1144 The control variables are a subset of the model input
1145 (initial conditions, boundary conditions, model parameters).
1146 Here we identify them with the variable $ \vec{u} $.
1147 All intermediate variables whose derivative w.r.t. control
1148 variables do not vanish are called {\sf active variables}.
1149 All subroutines whose derivative w.r.t. the control variables
1150 don't vanish are called {\sf active routines}.
1151 Read and write operations from and to file can be viewed
1152 as variable assignments. Therefore, files to which
1153 active variables are written and from which active variables
1154 are read are called {\sf active files}.
1155 All aspects relevant to the treatment of the control variables
1156 (parameter setting, initialization, perturbation)
1157 are controlled by the package {\it pkg/ctrl}.
1158
1159 \input{s_autodiff/text/doc_ctrl_flow}
1160
1161 \subsubsection{genmake and CPP options}
1162 %
1163 \begin{itemize}
1164 %
1165 \item
1166 \fbox{
1167 \begin{minipage}{12cm}
1168 {\it genmake}, {\it CPP\_OPTIONS.h}, {\it ECCO\_CPPOPTIONS.h}
1169 \end{minipage}
1170 }
1171 \end{itemize}
1172 %
1173 To enable the directory to be included to the compile list,
1174 {\bf ctrl} has to be added to the {\bf enable} list in
1175 {\it .genmakerc} or in {\it genmake} itself (analogous to {\it cost}
1176 package, cf. previous section).
1177 Each control variable is enabled via its own CPP option
1178 in {\it ECCO\_CPPOPTIONS.h}.
1179
1180 \subsubsection{Initialization}
1181 %
1182 \begin{itemize}
1183 %
1184 \item
1185 \fbox{
1186 \begin{minipage}{12cm}
1187 Parameters: {\it ctrl\_readparms}
1188 \end{minipage}
1189 }
1190 \\
1191 %
1192 This S/R
1193 reads runtime flags and parameters from file {\it data.ctrl}.
1194 For the present example the file contains the file names
1195 of each control variable that is used.
1196 In addition, the number of wet points for each control
1197 variable and the net dimension of the space of control
1198 variables (counting wet points only) {\bf nvarlength}
1199 is determined.
1200 Masks for wet points for each tile {\bf (bi,\,bj)}
1201 and vertical layer {\bf k} are generated for the three
1202 relevant categories on the C-grid:
1203 {\bf nWetCtile} for tracer fields,
1204 {\bf nWetWtile} for zonal velocity fields,
1205 {\bf nWetStile} for meridional velocity fields.
1206 %
1207 \item
1208 \fbox{
1209 \begin{minipage}{12cm}
1210 Control variables, control vector,
1211 and their gradients: {\it ctrl\_unpack}
1212 \end{minipage}
1213 }
1214 \\
1215 %
1216 Two important issues related to the handling of the control
1217 variables in MITgcm need to be addressed.
1218 First, in order to save memory, the control variable arrays
1219 are not kept in memory, but rather read from file and added
1220 to the initial fields during the model initialization phase.
1221 Similarly, the corresponding adjoint fields which represent
1222 the gradient of the cost function w.r.t. the control variables
1223 are written to file at the end of the adjoint integration.
1224 Second, in addition to the files holding the 2-dim. and 3-dim.
1225 control variables and the corresponding cost gradients,
1226 a 1-dim. {\sf control vector}
1227 and {\sf gradient vector} are written to file. They contain
1228 only the wet points of the control variables and the corresponding
1229 gradient.
1230 This leads to a significant data compression.
1231 Furthermore, an option is available
1232 ({\tt ALLOW\_NONDIMENSIONAL\_CONTROL\_IO}) to
1233 non-dimensionalise the control and gradient vector,
1234 which otherwise would contain different pieces of different
1235 magnitudes and units.
1236 Finally, the control and gradient vector can be passed to a
1237 minimization routine if an update of the control variables
1238 is sought as part of a minimization exercise.
1239
1240 The files holding fields and vectors of the control variables
1241 and gradient are generated and initialised in S/R {\it ctrl\_unpack}.
1242 %
1243 \end{itemize}
1244
1245 \subsubsection{Perturbation of the independent variables}
1246 %
1247 The dependency flow for differentiation w.r.t. the controls
1248 starts with adding a perturbation onto the input variable,
1249 thus defining the independent or control variables for TAF.
1250 Three types of controls may be considered:
1251 %
1252 \begin{itemize}
1253 %
1254 \item
1255 \fbox{
1256 \begin{minipage}{12cm}
1257 {\it ctrl\_map\_ini} (initial value sensitivity):
1258 \end{minipage}
1259 }
1260 \\
1261 %
1262 Consider as an example the initial tracer distribution
1263 {\bf tr1} as control variable.
1264 After {\bf tr1} has been initialised in
1265 {\it ini\_tr1} (dynamical variables such as
1266 temperature and salinity are initialised in {\it ini\_fields}),
1267 a perturbation anomaly is added to the field in S/R
1268 {\it ctrl\_map\_ini}
1269 %
1270 \begin{equation}
1271 \begin{split}
1272 u & = \, u_{[0]} \, + \, \Delta u \\
1273 {\bf tr1}(...) & = \, {\bf tr1_{ini}}(...) \, + \, {\bf xx\_tr1}(...)
1274 \label{perturb}
1275 \end{split}
1276 \end{equation}
1277 %
1278 {\bf xx\_tr1} is a 3-dim. global array
1279 holding the perturbation. In the case of a simple
1280 sensitivity study this array is identical to zero.
1281 However, it's specification is essential in the context
1282 of automatic differentiation since TAF
1283 treats the corresponding line in the code symbolically
1284 when determining the differentiation chain and its origin.
1285 Thus, the variable names are part of the argument list
1286 when calling TAF:
1287 %
1288 \begin{verbatim}
1289 taf -input 'xx_tr1 ...' ...
1290 \end{verbatim}
1291 %
1292 Now, as mentioned above, MITgcm avoids maintaining
1293 an array for each control variable by reading the
1294 perturbation to a temporary array from file.
1295 To ensure the symbolic link to be recognized by TAF, a scalar
1296 dummy variable {\bf xx\_tr1\_dummy} is introduced
1297 and an 'active read' routine of the adjoint support
1298 package {\it pkg/autodiff} is invoked.
1299 The read-procedure is tagged with the variable
1300 {\bf xx\_tr1\_dummy} enabling TAF to recognize the
1301 initialization of the perturbation.
1302 The modified call of TAF thus reads
1303 %
1304 \begin{verbatim}
1305 taf -input 'xx_tr1_dummy ...' ...
1306 \end{verbatim}
1307 %
1308 and the modified operation to (\ref{perturb})
1309 in the code takes on the form
1310 %
1311 \begin{verbatim}
1312 call active_read_xyz(
1313 & ..., tmpfld3d, ..., xx_tr1_dummy, ... )
1314
1315 tr1(...) = tr1(...) + tmpfld3d(...)
1316 \end{verbatim}
1317 %
1318 Note, that reading an active variable corresponds
1319 to a variable assignment. Its derivative corresponds
1320 to a write statement of the adjoint variable, followed by
1321 a reset.
1322 The 'active file' routines have been designed
1323 to support active read and corresponding adjoint active write
1324 operations (and vice versa).
1325 %
1326 \item
1327 \fbox{
1328 \begin{minipage}{12cm}
1329 {\it ctrl\_map\_forcing} (boundary value sensitivity):
1330 \end{minipage}
1331 }
1332 \\
1333 %
1334 The handling of boundary values as control variables
1335 proceeds exactly analogous to the initial values
1336 with the symbolic perturbation taking place in S/R
1337 {\it ctrl\_map\_forcing}.
1338 Note however an important difference:
1339 Since the boundary values are time dependent with a new
1340 forcing field applied at each time steps,
1341 the general problem may be thought of as
1342 a new control variable at each time step
1343 (or, if the perturbation is averaged over a certain period,
1344 at each $ N $ timesteps), i.e.
1345 \[
1346 u_{\rm forcing} \, = \,
1347 \{ \, u_{\rm forcing} ( t_n ) \, \}_{
1348 n \, = \, 1, \ldots , {\rm nTimeSteps} }
1349 \]
1350 %
1351 In the current example an equilibrium state is considered,
1352 and only an initial perturbation to
1353 surface forcing is applied with respect to the
1354 equilibrium state.
1355 A time dependent treatment of the surface forcing is
1356 implemented in the ECCO environment, involving the
1357 calendar ({\it cal}~) and external forcing ({\it exf}~) packages.
1358 %
1359 \item
1360 \fbox{
1361 \begin{minipage}{12cm}
1362 {\it ctrl\_map\_params} (parameter sensitivity):
1363 \end{minipage}
1364 }
1365 \\
1366 %
1367 This routine is not yet implemented, but would proceed
1368 proceed along the same lines as the initial value sensitivity.
1369 The mixing parameters {\bf diffkr} and {\bf kapgm}
1370 are currently added as controls in {\it ctrl\_map\_ini.F}.
1371 %
1372 \end{itemize}
1373 %
1374
1375 \subsubsection{Output of adjoint variables and gradient}
1376 %
1377 Several ways exist to generate output of adjoint fields.
1378 %
1379 \begin{itemize}
1380 %
1381 \item
1382 \fbox{
1383 \begin{minipage}{12cm}
1384 {\it ctrl\_map\_ini, ctrl\_map\_forcing}:
1385 \end{minipage}
1386 }
1387 \\
1388 \begin{itemize}
1389 %
1390 \item {\bf xx\_...}: the control variable fields \\
1391 Before the forward integration, the control
1392 variables are read from file {\bf xx\_ ...} and added to
1393 the model field.
1394 %
1395 \item {\bf adxx\_...}: the adjoint variable fields, i.e. the gradient
1396 $ \nabla _{u}{\cal J} $ for each control variable \\
1397 After the adjoint integration the corresponding adjoint
1398 variables are written to {\bf adxx\_ ...}.
1399 %
1400 \end{itemize}
1401 %
1402 \item
1403 \fbox{
1404 \begin{minipage}{12cm}
1405 {\it ctrl\_unpack, ctrl\_pack}:
1406 \end{minipage}
1407 }
1408 \\
1409 %
1410 \begin{itemize}
1411 %
1412 \item {\bf vector\_ctrl}: the control vector \\
1413 At the very beginning of the model initialization,
1414 the updated compressed control vector is read (or initialised)
1415 and distributed to 2-dim. and 3-dim. control variable fields.
1416 %
1417 \item {\bf vector\_grad}: the gradient vector \\
1418 At the very end of the adjoint integration,
1419 the 2-dim. and 3-dim. adjoint variables are read,
1420 compressed to a single vector and written to file.
1421 %
1422 \end{itemize}
1423 %
1424 \item
1425 \fbox{
1426 \begin{minipage}{12cm}
1427 {\it addummy\_in\_stepping}:
1428 \end{minipage}
1429 }
1430 \\
1431 In addition to writing the gradient at the end of the
1432 forward/adjoint integration, many more adjoint variables
1433 of the model state
1434 at intermediate times can be written using S/R
1435 {\it addummy\_in\_stepping}.
1436 This routine is part of the adjoint support package
1437 {\it pkg/autodiff} (cf.f. below).
1438 The procedure is enabled using via the CPP-option
1439 {\bf ALLOW\_AUTODIFF\_MONITOR} (file {\it ECCO\_CPPOPTIONS.h}).
1440 To be part of the adjoint code, the corresponding S/R
1441 {\it dummy\_in\_stepping} has to be called in the forward
1442 model (S/R {\it the\_main\_loop}) at the appropriate place.
1443 The adjoint common blocks are extracted from the adjoint code
1444 via the header file {\it adcommon.h}.
1445
1446 {\it dummy\_in\_stepping} is essentially empty,
1447 the corresponding adjoint routine is hand-written rather
1448 than generated automatically.
1449 Appropriate flow directives ({\it dummy\_in\_stepping.flow})
1450 ensure that TAMC does not automatically
1451 generate {\it addummy\_in\_stepping} by trying to differentiate
1452 {\it dummy\_in\_stepping}, but instead refers to
1453 the hand-written routine.
1454
1455 {\it dummy\_in\_stepping} is called in the forward code
1456 at the beginning of each
1457 timestep, before the call to {\it dynamics}, thus ensuring
1458 that {\it addummy\_in\_stepping} is called at the end of
1459 each timestep in the adjoint calculation, after the call to
1460 {\it addynamics}.
1461
1462 {\it addummy\_in\_stepping} includes the header files
1463 {\it adcommon.h}.
1464 This header file is also hand-written. It contains
1465 the common blocks
1466 {\bf /addynvars\_r/}, {\bf /addynvars\_cd/},
1467 {\bf /addynvars\_diffkr/}, {\bf /addynvars\_kapgm/},
1468 {\bf /adtr1\_r/}, {\bf /adffields/},
1469 which have been extracted from the adjoint code to enable
1470 access to the adjoint variables.
1471
1472 {\bf WARNING:} If the structure of the common blocks
1473 {\bf /dynvars\_r/}, {\bf /dynvars\_cd/}, etc., changes
1474 similar changes will occur in the adjoint common blocks.
1475 Therefore, consistency between the TAMC-generated common blocks
1476 and those in {\it adcommon.h} have to be checked.
1477 %
1478 \end{itemize}
1479
1480
1481 \subsubsection{Control variable handling for
1482 optimization applications}
1483
1484 In optimization mode the cost function $ {\cal J}(u) $ is sought
1485 to be minimized with respect to a set of control variables
1486 $ \delta {\cal J} \, = \, 0 $, in an iterative manner.
1487 The gradient $ \nabla _{u}{\cal J} |_{u_{[k]}} $ together
1488 with the value of the cost function itself $ {\cal J}(u_{[k]}) $
1489 at iteration step $ k $ serve
1490 as input to a minimization routine (e.g. quasi-Newton method,
1491 conjugate gradient, ... \cite{gil-lem:89})
1492 to compute an update in the
1493 control variable for iteration step $k+1$
1494 \[
1495 u_{[k+1]} \, = \, u_{[0]} \, + \, \Delta u_{[k+1]}
1496 \quad \mbox{satisfying} \quad
1497 {\cal J} \left( u_{[k+1]} \right) \, < \, {\cal J} \left( u_{[k]} \right)
1498 \]
1499 $ u_{[k+1]} $ then serves as input for a forward/adjoint run
1500 to determine $ {\cal J} $ and $ \nabla _{u}{\cal J} $ at iteration step
1501 $ k+1 $.
1502 Tab. \ref{???} sketches the flow between forward/adjoint model
1503 and the minimization routine.
1504
1505 \begin{eqnarray*}
1506 \scriptsize
1507 \begin{array}{ccccc}
1508 u_{[0]} \,\, , \,\, \Delta u_{[k]} & ~ & ~ & ~ & ~ \\
1509 {\Big\downarrow}
1510 & ~ & ~ & ~ & ~ \\
1511 ~ & ~ & ~ & ~ & ~ \\
1512 \hline
1513 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1514 \multicolumn{1}{|c}{
1515 u_{[k]} = u_{[0]} + \Delta u_{[k]}} &
1516 \stackrel{\bf forward}{\bf \longrightarrow} &
1517 v_{[k]} = M \left( u_{[k]} \right) &
1518 \stackrel{\bf forward}{\bf \longrightarrow} &
1519 \multicolumn{1}{c|}{
1520 {\cal J}_{[k]} = {\cal J} \left( M \left( u_{[k]} \right) \right)} \\
1521 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1522 \hline
1523 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1524 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{{\Big\downarrow}} \\
1525 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1526 \hline
1527 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1528 \multicolumn{1}{|c}{
1529 \nabla_u {\cal J}_{[k]} (\delta {\cal J}) =
1530 T^{\ast} \cdot \nabla_v {\cal J} |_{v_{[k]}} (\delta {\cal J})} &
1531 \stackrel{\bf adjoint}{\mathbf \longleftarrow} &
1532 ad \, v_{[k]} (\delta {\cal J}) =
1533 \nabla_v {\cal J} |_{v_{[k]}} (\delta {\cal J}) &
1534 \stackrel{\bf adjoint}{\mathbf \longleftarrow} &
1535 \multicolumn{1}{c|}{ ad \, {\cal J} = \delta {\cal J}} \\
1536 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1537 \hline
1538 ~ & ~ & ~ & ~ & ~ \\
1539 \hspace*{15ex}{\Bigg\downarrow}
1540 \quad {\cal J}_{[k]}, \quad \nabla_u {\cal J}_{[k]}
1541 & ~ & ~ & ~ & ~ \\
1542 ~ & ~ & ~ & ~ & ~ \\
1543 \hline
1544 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1545 \multicolumn{1}{|c}{
1546 {\cal J}_{[k]} \,\, , \,\, \nabla_u {\cal J}_{[k]}} &
1547 {\mathbf \longrightarrow} & \text{\bf minimisation} &
1548 {\mathbf \longrightarrow} &
1549 \multicolumn{1}{c|}{ \Delta u_{[k+1]}} \\
1550 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1551 \hline
1552 ~ & ~ & ~ & ~ & ~ \\
1553 ~ & ~ & ~ & ~ & \Big\downarrow \\
1554 ~ & ~ & ~ & ~ & \Delta u_{[k+1]} \\
1555 \end{array}
1556 \end{eqnarray*}
1557
1558 The routines {\it ctrl\_unpack} and {\it ctrl\_pack} provide
1559 the link between the model and the minimization routine.
1560 As described in Section \ref{???}
1561 the {\it unpack} and {\it pack} routines read and write
1562 control and gradient {\it vectors} which are compressed
1563 to contain only wet points, in addition to the full
1564 2-dim. and 3-dim. fields.
1565 The corresponding I/O flow looks as follows:
1566
1567 \vspace*{0.5cm}
1568
1569 {\scriptsize
1570 \begin{tabular}{ccccc}
1571 {\bf vector\_ctrl\_$<$k$>$ } & ~ & ~ & ~ & ~ \\
1572 {\big\downarrow} & ~ & ~ & ~ & ~ \\
1573 \cline{1-1}
1574 \multicolumn{1}{|c|}{\it ctrl\_unpack} & ~ & ~ & ~ & ~ \\
1575 \cline{1-1}
1576 {\big\downarrow} & ~ & ~ & ~ & ~ \\
1577 \cline{3-3}
1578 \multicolumn{1}{l}{\bf xx\_theta0...$<$k$>$} & ~ &
1579 \multicolumn{1}{|c|}{~} & ~ & ~ \\
1580 \multicolumn{1}{l}{\bf xx\_salt0...$<$k$>$} &
1581 $\stackrel{\mbox{read}}{\longrightarrow}$ &
1582 \multicolumn{1}{|c|}{forward integration} & ~ & ~ \\
1583 \multicolumn{1}{l}{\bf \vdots} & ~ & \multicolumn{1}{|c|}{~}
1584 & ~ & ~ \\
1585 \cline{3-3}
1586 ~ & ~ & $\downarrow$ & ~ & ~ \\
1587 \cline{3-3}
1588 ~ & ~ &
1589 \multicolumn{1}{|c|}{~} & ~ &
1590 \multicolumn{1}{l}{\bf adxx\_theta0...$<$k$>$} \\
1591 ~ & ~ & \multicolumn{1}{|c|}{adjoint integration} &
1592 $\stackrel{\mbox{write}}{\longrightarrow}$ &
1593 \multicolumn{1}{l}{\bf adxx\_salt0...$<$k$>$} \\
1594 ~ & ~ & \multicolumn{1}{|c|}{~}
1595 & ~ & \multicolumn{1}{l}{\bf \vdots} \\
1596 \cline{3-3}
1597 ~ & ~ & ~ & ~ & {\big\downarrow} \\
1598 \cline{5-5}
1599 ~ & ~ & ~ & ~ & \multicolumn{1}{|c|}{\it ctrl\_pack} \\
1600 \cline{5-5}
1601 ~ & ~ & ~ & ~ & {\big\downarrow} \\
1602 ~ & ~ & ~ & ~ & {\bf vector\_grad\_$<$k$>$ } \\
1603 \end{tabular}
1604 }
1605
1606 \vspace*{0.5cm}
1607
1608
1609 {\it ctrl\_unpack} reads the updated control vector
1610 {\bf vector\_ctrl\_$<$k$>$}.
1611 It distributes the different control variables to
1612 2-dim. and 3-dim. files {\it xx\_...$<$k$>$}.
1613 At the start of the forward integration the control variables
1614 are read from {\it xx\_...$<$k$>$} and added to the
1615 field.
1616 Correspondingly, at the end of the adjoint integration
1617 the adjoint fields are written
1618 to {\it adxx\_...$<$k$>$}, again via the active file routines.
1619 Finally, {\it ctrl\_pack} collects all adjoint files
1620 and writes them to the compressed vector file
1621 {\bf vector\_grad\_$<$k$>$}.

  ViewVC Help
Powered by ViewVC 1.1.22