1 % $Header: /u/gcmpack/manual/part5/doc_ad_2.tex,v 1.19 2005/08/02 22:26:58 heimbach Exp $
2 % $Name: $
3
4 Author: Patrick Heimbach
5
6 {\sf Automatic differentiation} (AD), also referred to as algorithmic
7 (or, more loosely, computational) differentiation, involves
8 automatically deriving code to calculate partial derivatives from an
existing, fully nonlinear prognostic code (see \cite{gri:00}). A
10 software tool is used that parses and transforms source files
11 according to a set of linguistic and mathematical rules. AD tools are
12 like source-to-source translators in that they parse a program code as
13 input and produce a new program code as output. However, unlike a
14 pure source-to-source translation, the output program represents a new
15 algorithm, such as the evaluation of the Jacobian, the Hessian, or
16 higher derivative operators. In principle, a variety of derived
17 algorithms can be generated automatically in this way.
18
19 MITgcm has been adapted for use with the Tangent linear and Adjoint
20 Model Compiler (TAMC) and its successor TAF (Transformation of
21 Algorithms in Fortran), developed by Ralf Giering (\cite{gie-kam:98},
22 \cite{gie:99,gie:00}). The first application of the adjoint of MITgcm
23 for sensitivity studies has been published by \cite{maro-eta:99}.
24 \cite{sta-eta:97,sta-eta:01} use MITgcm and its adjoint for ocean
25 state estimation studies. In the following we shall refer to TAMC and
TAF synonymously, except where explicitly stated otherwise.
27
28 TAMC exploits the chain rule for computing the first derivative of a
29 function with respect to a set of input variables. Treating a given
forward code as a composition of operations -- each line representing
a compositional element -- the chain rule is rigorously applied to the
code, line by line. The resulting tangent linear or adjoint code,
33 then, may be thought of as the composition in forward or reverse
34 order, respectively, of the Jacobian matrices of the forward code's
35 compositional elements.
36
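To illustrate this line-by-line construction, consider the following
schematic two-line code fragment and its derivative codes (an
illustrative sketch only, not actual TAMC/TAF output; following the
TAMC/TAF convention, tangent linear variables carry the prefix
{\tt g\_} and adjoint variables the prefix {\tt ad}):
\begin{verbatim}
c     forward code
      v = sin(u)
      w = v*v

c     tangent linear code (same order as the forward code)
      g_v = cos(u)*g_u
      v   = sin(u)
      g_w = 2.*v*g_v
      w   = v*v

c     adjoint code (reverse order; v is recomputed since it is
c     needed to evaluate the local Jacobian)
      v   = sin(u)
      adv = adv + 2.*v*adw
      adw = 0.
      adu = adu + cos(u)*adv
      adv = 0.
\end{verbatim}
Each derivative statement is the differential of the corresponding
forward statement; the adjoint statements appear in reverse order and
accumulate into the adjoint variables, in accordance with the
composition of Jacobian matrices described above.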
37 %**********************************************************************
38 \section{Some basic algebra}
39 \label{sec_ad_algebra}
40 \begin{rawhtml}
41 <!-- CMIREDIR:sec_ad_algebra: -->
42 \end{rawhtml}
43 %**********************************************************************
44
Let $ {\cal M} $ be a general nonlinear model, i.e. a
46 mapping from the $m$-dimensional space
47 $U \subset I\!\!R^m$ of input variables
48 $\vec{u}=(u_1,\ldots,u_m)$
49 (model parameters, initial conditions, boundary conditions
50 such as forcing functions) to the $n$-dimensional space
51 $V \subset I\!\!R^n$ of
model output variables $\vec{v}=(v_1,\ldots,v_n)$
53 (model state, model diagnostics, objective function, ...)
54 under consideration,
55 %
56 \begin{equation}
57 \begin{split}
58 {\cal M} \, : & \, U \,\, \longrightarrow \, V \\
59 ~ & \, \vec{u} \,\, \longmapsto \, \vec{v} \, = \,
60 {\cal M}(\vec{u})
61 \label{fulloperator}
62 \end{split}
63 \end{equation}
64 %
The vectors $ \vec{u} \in U $ and $ \vec{v} \in V $ may be represented w.r.t.
66 some given basis vectors
67 $ {\rm span} (U) = \{ {\vec{e}_i} \}_{i = 1, \ldots , m} $ and
68 $ {\rm span} (V) = \{ {\vec{f}_j} \}_{j = 1, \ldots , n} $ as
69 \[
70 \vec{u} \, = \, \sum_{i=1}^{m} u_i \, {\vec{e}_i},
71 \qquad
72 \vec{v} \, = \, \sum_{j=1}^{n} v_j \, {\vec{f}_j}
73 \]
74
75 Two routes may be followed to determine the sensitivity of the
76 output variable $\vec{v}$ to its input $\vec{u}$.
77
78 \subsection{Forward or direct sensitivity}
79 %
80 Consider a perturbation to the input variables $\delta \vec{u}$
81 (typically a single component
82 $\delta \vec{u} = \delta u_{i} \, {\vec{e}_{i}}$).
Its effect on the output may be obtained via the linear
approximation of the model $ {\cal M} $ in terms of its Jacobian matrix
$ M $, evaluated at the point $ \vec{u}^{(0)} $, according to
86 %
87 \begin{equation}
88 \delta \vec{v} \, = \, M |_{\vec{u}^{(0)}} \, \delta \vec{u}
89 \label{tangent_linear}
90 \end{equation}
91 with resulting output perturbation $\delta \vec{v}$.
92 In components
93 $M_{j i} \, = \, \partial {\cal M}_{j} / \partial u_{i} $,
94 it reads
95 %
96 \begin{equation}
97 \delta v_{j} \, = \, \sum_{i}
98 \left. \frac{\partial {\cal M}_{j}}{\partial u_{i}} \right|_{u^{(0)}} \,
99 \delta u_{i}
100 \label{jacobi_matrix}
101 \end{equation}
102 %
103 Eq. (\ref{tangent_linear}) is the {\sf tangent linear model (TLM)}.
104 In contrast to the full nonlinear model $ {\cal M} $, the operator
105 $ M $ is just a matrix
106 which can readily be used to find the forward sensitivity of $\vec{v}$ to
perturbations in $\vec{u}$.
However, if there are very many input variables ($\gg O(10^{6})$ for
large-scale oceanographic applications), it quickly becomes
prohibitive to proceed directly as in (\ref{tangent_linear})
if the impact of each component $ {\vec{e}_{i}} $ is to be assessed.
112
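As a minimal illustration (a toy example, not taken from MITgcm), consider
$ {\cal M}(u_1,u_2) \, = \, u_1 u_2 $ with $ m=2 $ and $ n=1 $.
Its Jacobian at $ \vec{u}^{(0)} $ is the $ 1 \times 2 $ matrix
$ M |_{\vec{u}^{(0)}} = ( u^{(0)}_2 \,\,\, u^{(0)}_1 ) $, so that
(\ref{tangent_linear}) reads
\[
\delta v \, = \, u^{(0)}_2 \, \delta u_1 \, + \, u^{(0)}_1 \, \delta u_2 .
\]
Obtaining both partial derivatives in this way requires two forward
perturbation calculations, one for
$ \delta \vec{u} = \delta u_1 \, \vec{e}_1 $ and one for
$ \delta \vec{u} = \delta u_2 \, \vec{e}_2 $.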
113 \subsection{Reverse or adjoint sensitivity}
114 %
115 Let us consider the special case of a
116 scalar objective function ${\cal J}(\vec{v})$ of the model output (e.g.
117 the total meridional heat transport,
118 the total uptake of $CO_{2}$ in the Southern
119 Ocean over a time interval,
120 or a measure of some model-to-data misfit)
121 %
122 \begin{eqnarray}
123 \begin{array}{cccccc}
124 {\cal J} \, : & U &
125 \longrightarrow & V &
126 \longrightarrow & I \!\! R \\
127 ~ & \vec{u} & \longmapsto & \vec{v}={\cal M}(\vec{u}) &
128 \longmapsto & {\cal J}(\vec{u}) = {\cal J}({\cal M}(\vec{u}))
129 \end{array}
130 \label{compo}
131 \end{eqnarray}
132 %
133 The perturbation of $ {\cal J} $ around a fixed point $ {\cal J}_0 $,
134 \[
135 {\cal J} \, = \, {\cal J}_0 \, + \, \delta {\cal J}
136 \]
137 can be expressed in both bases of $ \vec{u} $ and $ \vec{v} $
138 w.r.t. their corresponding inner product
139 $\left\langle \,\, , \,\, \right\rangle $
140 %
141 \begin{equation}
142 \begin{split}
143 {\cal J} & = \,
144 {\cal J} |_{\vec{u}^{(0)}} \, + \,
145 \left\langle \, \nabla _{u}{\cal J}^T |_{\vec{u}^{(0)}} \, , \, \delta \vec{u} \, \right\rangle
146 \, + \, O(\delta \vec{u}^2) \\
147 ~ & = \,
148 {\cal J} |_{\vec{v}^{(0)}} \, + \,
149 \left\langle \, \nabla _{v}{\cal J}^T |_{\vec{v}^{(0)}} \, , \, \delta \vec{v} \, \right\rangle
150 \, + \, O(\delta \vec{v}^2)
151 \end{split}
152 \label{deljidentity}
153 \end{equation}
154 %
(note that the gradient $ \nabla f $ is a co-vector; therefore
its transpose is required in the above inner product).
157 Then, using the representation of
158 $ \delta {\cal J} =
159 \left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle $,
160 the definition
161 of an adjoint operator $ A^{\ast} $ of a given operator $ A $,
162 \[
163 \left\langle \, A^{\ast} \vec{x} \, , \, \vec{y} \, \right\rangle =
164 \left\langle \, \vec{x} \, , \, A \vec{y} \, \right\rangle
165 \]
166 which for finite-dimensional vector spaces is just the
167 transpose of $ A $,
168 \[
169 A^{\ast} \, = \, A^T
170 \]
and from eqs. (\ref{tangent_linear}) and (\ref{deljidentity}),
we note that
(omitting the $|$'s):
174 %
175 \begin{equation}
176 \delta {\cal J}
177 \, = \,
178 \left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle
179 \, = \,
180 \left\langle \, \nabla _{v}{\cal J}^T \, , \, M \, \delta \vec{u} \, \right\rangle
181 \, = \,
182 \left\langle \, M^T \, \nabla _{v}{\cal J}^T \, , \,
183 \delta \vec{u} \, \right\rangle
184 \label{inner}
185 \end{equation}
186 %
187 With the identity (\ref{deljidentity}), we then find that
188 the gradient $ \nabla _{u}{\cal J} $ can be readily inferred by
189 invoking the adjoint $ M^{\ast } $ of the tangent linear model $ M $
190 %
191 \begin{equation}
192 \begin{split}
193 \nabla _{u}{\cal J}^T |_{\vec{u}} &
194 = \, M^T |_{\vec{u}} \cdot \nabla _{v}{\cal J}^T |_{\vec{v}} \\
195 ~ & = \, M^T |_{\vec{u}} \cdot \delta \vec{v}^{\ast} \\
196 ~ & = \, \delta \vec{u}^{\ast}
197 \end{split}
198 \label{adjoint}
199 \end{equation}
200 %
201 Eq. (\ref{adjoint}) is the {\sf adjoint model (ADM)},
202 in which $M^T$ is the adjoint (here, the transpose) of the
203 tangent linear operator $M$, $ \delta \vec{v}^{\ast} $
204 the adjoint variable of the model state $ \vec{v} $, and
205 $ \delta \vec{u}^{\ast} $ the adjoint variable of the control variable $ \vec{u} $.
206
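Continuing the toy example of the previous subsection (again purely
illustrative), take $ {\cal J} = v = u_1 u_2 $, so that
$ \nabla_v {\cal J} = 1 $. A single application of the adjoint model
(\ref{adjoint}) then yields the complete gradient at once,
\[
\delta \vec{u}^{\ast} \, = \, \nabla_u {\cal J}^T \, = \,
M^T |_{\vec{u}^{(0)}} \cdot 1 \, = \,
\left( \begin{array}{c} u^{(0)}_2 \\ u^{(0)}_1 \end{array} \right) ,
\]
whereas the forward mode required one calculation per component of $ \vec{u} $.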
207 The {\sf reverse} nature of the adjoint calculation can be readily
208 seen as follows.
209 Consider a model integration which consists of $ \Lambda $
210 consecutive operations
211 $ {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} (
212 ...... ( {\cal M}_{\lambda} (
213 ......
214 ( {\cal M}_{1} ( {\cal M}_{0}(\vec{u}) )))) $,
215 where the ${\cal M}$'s could be the elementary steps, i.e. single lines
216 in the code of the model, or successive time steps of the
217 model integration,
218 starting at step 0 and moving up to step $\Lambda$, with intermediate
219 ${\cal M}_{\lambda} (\vec{u}) = \vec{v}^{(\lambda+1)}$ and final
220 ${\cal M}_{\Lambda} (\vec{u}) = \vec{v}^{(\Lambda+1)} = \vec{v}$.
221 Let ${\cal J}$ be a cost function which explicitly depends on the
222 final state $\vec{v}$ only
223 (this restriction is for clarity reasons only).
224 %
225 ${\cal J}(u)$ may be decomposed according to:
226 %
227 \begin{equation}
228 {\cal J}({\cal M}(\vec{u})) \, = \,
229 {\cal J} ( {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} (
230 ...... ( {\cal M}_{\lambda} (
231 ......
232 ( {\cal M}_{1} ( {\cal M}_{0}(\vec{u}) )))))
233 \label{compos}
234 \end{equation}
235 %
236 Then, according to the chain rule, the forward calculation reads,
237 in terms of the Jacobi matrices
(we have omitted the $ | $'s which, nevertheless, are important
for the aspect of {\it tangent} linearity;
240 note also that by definition
241 $ \langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \rangle
242 = \nabla_v {\cal J} \cdot \delta \vec{v} $ )
243 %
244 \begin{equation}
245 \begin{split}
246 \nabla_v {\cal J} (M(\delta \vec{u})) & = \,
247 \nabla_v {\cal J} \cdot M_{\Lambda}
248 \cdot ...... \cdot M_{\lambda} \cdot ...... \cdot
249 M_{1} \cdot M_{0} \cdot \delta \vec{u} \\
250 ~ & = \, \nabla_v {\cal J} \cdot \delta \vec{v} \\
251 \end{split}
252 \label{forward}
253 \end{equation}
254 %
255 whereas in reverse mode we have
256 %
257 \begin{equation}
258 \boxed{
259 \begin{split}
260 M^T ( \nabla_v {\cal J}^T) & = \,
261 M_{0}^T \cdot M_{1}^T
262 \cdot ...... \cdot M_{\lambda}^T \cdot ...... \cdot
263 M_{\Lambda}^T \cdot \nabla_v {\cal J}^T \\
264 ~ & = \, M_{0}^T \cdot M_{1}^T
265 \cdot ...... \cdot
266 \nabla_{v^{(\lambda)}} {\cal J}^T \\
267 ~ & = \, \nabla_u {\cal J}^T
268 \end{split}
269 }
270 \label{reverse}
271 \end{equation}
272 %
273 clearly expressing the reverse nature of the calculation.
274 Eq. (\ref{reverse}) is at the heart of automatic adjoint compilers.
275 If the intermediate steps $\lambda$ in
276 eqn. (\ref{compos}) -- (\ref{reverse})
277 represent the model state (forward or adjoint) at each
278 intermediate time step as noted above, then correspondingly,
279 $ M^T (\delta \vec{v}^{(\lambda) \, \ast}) =
280 \delta \vec{v}^{(\lambda-1) \, \ast} $ for the adjoint variables.
281 It thus becomes evident that the adjoint calculation also
282 yields the adjoint of each model state component
283 $ \vec{v}^{(\lambda)} $ at each intermediate step $ \lambda $, namely
284 %
285 \begin{equation}
286 \boxed{
287 \begin{split}
288 \nabla_{v^{(\lambda)}} {\cal J}^T |_{\vec{v}^{(\lambda)}}
289 & = \,
290 M_{\lambda}^T |_{\vec{v}^{(\lambda)}} \cdot ...... \cdot
291 M_{\Lambda}^T |_{\vec{v}^{(\lambda)}} \cdot \delta \vec{v}^{\ast} \\
292 ~ & = \, \delta \vec{v}^{(\lambda) \, \ast}
293 \end{split}
294 }
295 \end{equation}
296 %
in close analogy to eq. (\ref{adjoint}).
We note in passing that the $\delta \vec{v}^{(\lambda) \, \ast}$
299 are the Lagrange multipliers of the model equations which determine
300 $ \vec{v}^{(\lambda)}$.
301
302 In components, eq. (\ref{adjoint}) reads as follows.
303 Let
304 \[
305 \begin{array}{rclcrcl}
306 \delta \vec{u} & = &
307 \left( \delta u_1,\ldots, \delta u_m \right)^T , & \qquad &
308 \delta \vec{u}^{\ast} \,\, = \,\, \nabla_u {\cal J}^T & = &
309 \left(
310 \frac{\partial {\cal J}}{\partial u_1},\ldots,
311 \frac{\partial {\cal J}}{\partial u_m}
312 \right)^T \\
313 \delta \vec{v} & = &
\left( \delta v_1,\ldots, \delta v_n \right)^T , & \qquad &
315 \delta \vec{v}^{\ast} \,\, = \,\, \nabla_v {\cal J}^T & = &
316 \left(
317 \frac{\partial {\cal J}}{\partial v_1},\ldots,
318 \frac{\partial {\cal J}}{\partial v_n}
319 \right)^T \\
320 \end{array}
321 \]
322 denote the perturbations in $\vec{u}$ and $\vec{v}$, respectively,
323 and their adjoint variables;
324 further
325 \[
326 M \, = \, \left(
327 \begin{array}{ccc}
328 \frac{\partial {\cal M}_1}{\partial u_1} & \ldots &
329 \frac{\partial {\cal M}_1}{\partial u_m} \\
330 \vdots & ~ & \vdots \\
331 \frac{\partial {\cal M}_n}{\partial u_1} & \ldots &
332 \frac{\partial {\cal M}_n}{\partial u_m} \\
333 \end{array}
334 \right)
335 \]
336 is the Jacobi matrix of $ {\cal M} $
337 (an $ n \times m $ matrix)
338 such that $ \delta \vec{v} = M \cdot \delta \vec{u} $, or
339 \[
340 \delta v_{j}
341 \, = \, \sum_{i=1}^m M_{ji} \, \delta u_{i}
342 \, = \, \sum_{i=1}^m \, \frac{\partial {\cal M}_{j}}{\partial u_{i}}
343 \delta u_{i}
344 \]
345 %
346 Then eq. (\ref{adjoint}) takes the form
347 \[
348 \delta u_{i}^{\ast}
349 \, = \, \sum_{j=1}^n M_{ji} \, \delta v_{j}^{\ast}
350 \, = \, \sum_{j=1}^n \, \frac{\partial {\cal M}_{j}}{\partial u_{i}}
351 \delta v_{j}^{\ast}
352 \]
353 %
354 or
355 %
356 \[
357 \left(
358 \begin{array}{c}
359 \left. \frac{\partial}{\partial u_1} {\cal J} \right|_{\vec{u}^{(0)}} \\
360 \vdots \\
361 \left. \frac{\partial}{\partial u_m} {\cal J} \right|_{\vec{u}^{(0)}} \\
362 \end{array}
363 \right)
364 \, = \,
365 \left(
366 \begin{array}{ccc}
367 \left. \frac{\partial {\cal M}_1}{\partial u_1} \right|_{\vec{u}^{(0)}}
368 & \ldots &
369 \left. \frac{\partial {\cal M}_n}{\partial u_1} \right|_{\vec{u}^{(0)}} \\
370 \vdots & ~ & \vdots \\
371 \left. \frac{\partial {\cal M}_1}{\partial u_m} \right|_{\vec{u}^{(0)}}
372 & \ldots &
373 \left. \frac{\partial {\cal M}_n}{\partial u_m} \right|_{\vec{u}^{(0)}} \\
374 \end{array}
375 \right)
376 \cdot
377 \left(
378 \begin{array}{c}
379 \left. \frac{\partial}{\partial v_1} {\cal J} \right|_{\vec{v}} \\
380 \vdots \\
381 \left. \frac{\partial}{\partial v_n} {\cal J} \right|_{\vec{v}} \\
382 \end{array}
383 \right)
384 \]
385 %
386 Furthermore, the adjoint $ \delta v^{(\lambda) \, \ast} $
387 of any intermediate state $ v^{(\lambda)} $
388 may be obtained, using the intermediate Jacobian
389 (an $ n_{\lambda+1} \times n_{\lambda} $ matrix)
390 %
391 \[
392 M_{\lambda} \, = \,
393 \left(
394 \begin{array}{ccc}
395 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_1}
396 & \ldots &
397 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
398 \vdots & ~ & \vdots \\
399 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_1}
400 & \ldots &
401 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
402 \end{array}
403 \right)
404 \]
405 %
406 and the shorthand notation for the adjoint variables
407 $ \delta v^{(\lambda) \, \ast}_{j} = \frac{\partial}{\partial v^{(\lambda)}_{j}}
408 {\cal J}^T $, $ j = 1, \ldots , n_{\lambda} $,
409 for intermediate components, yielding
410 \begin{equation}
411 \small
412 \begin{split}
413 \left(
414 \begin{array}{c}
415 \delta v^{(\lambda) \, \ast}_1 \\
416 \vdots \\
417 \delta v^{(\lambda) \, \ast}_{n_{\lambda}} \\
418 \end{array}
419 \right)
420 \, = &
421 \left(
422 \begin{array}{ccc}
423 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_1}
424 & \ldots \,\, \ldots &
425 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_1} \\
426 \vdots & ~ & \vdots \\
427 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_{n_{\lambda}}}
428 & \ldots \,\, \ldots &
429 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
430 \end{array}
431 \right)
432 \cdot
433 %
434 \\ ~ & ~
435 \\ ~ &
436 %
437 \left(
438 \begin{array}{ccc}
439 \frac{\partial ({\cal M}_{\lambda+1})_1}{\partial v^{(\lambda+1)}_1}
440 & \ldots &
441 \frac{\partial ({\cal M}_{\lambda+1})_{n_{\lambda+2}}}{\partial v^{(\lambda+1)}_1} \\
442 \vdots & ~ & \vdots \\
443 \vdots & ~ & \vdots \\
444 \frac{\partial ({\cal M}_{\lambda+1})_1}{\partial v^{(\lambda+1)}_{n_{\lambda+1}}}
445 & \ldots &
446 \frac{\partial ({\cal M}_{\lambda+1})_{n_{\lambda+2}}}{\partial v^{(\lambda+1)}_{n_{\lambda+1}}} \\
447 \end{array}
448 \right)
449 \cdot \, \ldots \, \cdot
450 \left(
451 \begin{array}{c}
452 \delta v^{\ast}_1 \\
453 \vdots \\
454 \delta v^{\ast}_{n} \\
455 \end{array}
456 \right)
457 \end{split}
458 \end{equation}
459
Eqs. (\ref{forward}) and (\ref{reverse}) are perhaps clearest in
showing the advantage of the reverse over the forward mode
if the gradient $\nabla _{u}{\cal J}$, i.e. the sensitivity of the
cost function $ {\cal J} $ with respect to {\it all} input
variables $\vec{u}$
(or the sensitivity of the cost function with respect to
{\it all} intermediate states $ \vec{v}^{(\lambda)} $), is sought.
In order to solve for each component of the gradient
$ \partial {\cal J} / \partial u_{i} $ in (\ref{forward}),
a forward calculation has to be performed for each component separately,
i.e. $ \delta \vec{u} = \delta u_{i} {\vec{e}_{i}} $
for the $i$-th forward calculation.
472 Then, (\ref{forward}) represents the
473 projection of $ \nabla_u {\cal J} $ onto the $i$-th component.
474 The full gradient is retrieved from the $ m $ forward calculations.
475 In contrast, eq. (\ref{reverse}) yields the full
476 gradient $\nabla _{u}{\cal J}$ (and all intermediate gradients
477 $\nabla _{v^{(\lambda)}}{\cal J}$) within a single reverse calculation.
478
Note that if $ {\cal J} $ is a vector-valued function
480 of dimension $ l > 1 $,
481 eq. (\ref{reverse}) has to be modified according to
482 \[
483 M^T \left( \nabla_v {\cal J}^T \left(\delta \vec{J}\right) \right)
484 \, = \,
485 \nabla_u {\cal J}^T \cdot \delta \vec{J}
486 \]
487 where now $ \delta \vec{J} \in I\!\!R^l $ is a vector of
488 dimension $ l $.
In this case $ l $ reverse simulations have to be performed,
one for each $ \delta J_{k}, \,\, k = 1, \ldots, l $.
The reverse mode is then more efficient as long as
$ l < m $; otherwise the forward mode is preferable.
493 Strictly, the reverse mode is called adjoint mode only for
494 $ l = 1 $.
495
496 A detailed analysis of the underlying numerical operations
shows that computing $\nabla _{u}{\cal J}$ in this way
requires about 2 to 5 times the cost of evaluating the cost function itself.
Alternatively, the gradient vector could be approximated
by finite differences, requiring $m$ evaluations
of the perturbed cost function (plus one unperturbed reference evaluation).
502
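For reference, the simplest (one-sided) finite-difference approximation
underlying the latter approach reads
\[
\frac{\partial {\cal J}}{\partial u_{i}} \, \approx \,
\frac{ {\cal J}(\vec{u} + \epsilon \, \vec{e}_{i}) \, - \, {\cal J}(\vec{u}) }
     {\epsilon} ,
\qquad i \, = \, 1, \ldots, m ,
\]
for a suitably small increment $ \epsilon $. A comparison of this type
between the adjoint-derived gradient and finite-difference approximations
is the basis of the gradient checks described in
Section \ref{section_grdchk}.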
503 To conclude we give two examples of commonly used types
504 of cost functions:
505
506 \paragraph{Example 1:
507 $ {\cal J} = v_{j} (T) $} ~ \\
508 The cost function consists of the $j$-th component of the model state
509 $ \vec{v} $ at time $T$.
510 Then $ \nabla_v {\cal J}^T = {\vec{f}_{j}} $ is just the $j$-th
unit vector. The gradient $ \nabla_u {\cal J}^T $
is the projection of the adjoint
operator onto the $j$-th component ${\vec{f}_{j}}$,
514 \[
515 \nabla_u {\cal J}^T
516 \, = \, M^T \cdot \nabla_v {\cal J}^T
\, = \, \sum_{i} M_{ji} \, {\vec{e}_{i}}
518 \]
519
520 \paragraph{Example 2:
521 $ {\cal J} = \langle \, {\cal H}(\vec{v}) - \vec{d} \, ,
522 \, {\cal H}(\vec{v}) - \vec{d} \, \rangle $} ~ \\
523 The cost function represents the quadratic model vs. data misfit.
524 Here, $ \vec{d} $ is the data vector and $ {\cal H} $ represents the
525 operator which maps the model state space onto the data space.
526 Then, $ \nabla_v {\cal J} $ takes the form
527 %
528 \begin{equation*}
529 \begin{split}
\nabla_v {\cal J}^T & = \, 2 \, H^T \cdot
531 \left( \, {\cal H}(\vec{v}) - \vec{d} \, \right) \\
532 ~ & = \, 2 \sum_{j} \left\{ \sum_k
533 \frac{\partial {\cal H}_k}{\partial v_{j}}
534 \left( {\cal H}_k (\vec{v}) - d_k \right)
535 \right\} \, {\vec{f}_{j}} \\
536 \end{split}
537 \end{equation*}
538 %
539 where $H_{kj} = \partial {\cal H}_k / \partial v_{j} $ is the
540 Jacobi matrix of the data projection operator.
541 Thus, the gradient $ \nabla_u {\cal J} $ is given by the
542 adjoint operator,
543 driven by the model vs. data misfit:
544 \[
\nabla_u {\cal J}^T \, = \, 2 \, M^T \cdot
H^T \cdot \left( {\cal H}(\vec{v}) - \vec{d} \, \right)
547 \]
548
549 \subsection{Storing vs. recomputation in reverse mode}
550 \label{checkpointing}
551
552 We note an important aspect of the forward vs. reverse
553 mode calculation.
554 Because of the local character of the derivative
555 (a derivative is defined w.r.t. a point along the trajectory),
the intermediate results of the model trajectory
$\vec{v}^{(\lambda+1)}={\cal M}_{\lambda}(\vec{v}^{(\lambda)})$
may be required to evaluate the intermediate Jacobian product
$M_{\lambda}|_{\vec{v}^{(\lambda)}} \, \delta \vec{v}^{(\lambda)} $.
This is the case, e.g., for nonlinear expressions
(momentum advection, nonlinear equation of state) and for state-dependent
conditional statements (parameterization schemes).
563 In the forward mode, the intermediate results are required
564 in the same order as computed by the full forward model ${\cal M}$,
565 but in the reverse mode they are required in the reverse order.
566 Thus, in the reverse mode the trajectory of the forward model
567 integration ${\cal M}$ has to be stored to be available in the reverse
568 calculation. Alternatively, the complete model state up to the
569 point of evaluation has to be recomputed whenever its value is required.
570
A method to balance the amount of recomputation vs.
storage requirements is called {\sf checkpointing}
(e.g. \cite{gri:92}, \cite{res-eta:98}).
It is depicted in Fig. \ref{fig:3levelcheck} for 3-level checkpointing
[as an example, explicit numbers for a 3-day
integration with a 1-hourly timestep are given in square brackets].
577 \begin{itemize}
578 %
579 \item [$lev3$]
580 In a first step, the model trajectory is subdivided into
581 $ {n}^{lev3} $ subsections [$ {n}^{lev3} $=3 1-day intervals],
582 with the label $lev3$ for this outermost loop.
583 The model is then integrated along the full trajectory,
584 and the model state stored to disk only at every $ k_{i}^{lev3} $-th timestep
585 [i.e. 3 times, at
586 $ i = 0,1,2 $ corresponding to $ k_{i}^{lev3} = 0, 24, 48 $].
587 In addition, the cost function is computed, if needed.
588 %
589 \item [$lev2$]
590 In a second step each subsection itself is divided into
591 $ {n}^{lev2} $ subsections
592 [$ {n}^{lev2} $=4 6-hour intervals per subsection].
593 The model picks up at the last outermost dumped state
594 $ v_{k_{n}^{lev3}} $ and is integrated forward in time along
595 the last subsection, with the label $lev2$ for this
596 intermediate loop.
597 The model state is now stored to disk at every $ k_{i}^{lev2} $-th
598 timestep
599 [i.e. 4 times, at
600 $ i = 0,1,2,3 $ corresponding to $ k_{i}^{lev2} = 48, 54, 60, 66 $].
601 %
602 \item [$lev1$]
603 Finally, the model picks up at the last intermediate dump state
604 $ v_{k_{n}^{lev2}} $ and is integrated forward in time along
605 the last subsection, with the label $lev1$ for this
606 intermediate loop.
Within this sub-subsection only, parts of the model state are stored
in memory at every timestep
[i.e. every hour $ i=0,...,5$ corresponding to
$ k_{i}^{lev1} = 66, 67, \ldots, 71 $].
The final state $ v_n = v_{k_{n}^{lev1}} $ is reached,
and the model state at all preceding timesteps along the last
innermost subsection is available, enabling integration backwards
in time along the last subsection.
615 The adjoint can thus be computed along this last
616 subsection $k_{n}^{lev2}$.
617 %
618 \end{itemize}
619 %
620 This procedure is repeated consecutively for each previous
621 subsection $k_{n-1}^{lev2}, \ldots, k_{1}^{lev2} $
622 carrying the adjoint computation to the initial time
623 of the subsection $k_{n}^{lev3}$.
624 Then, the procedure is repeated for the previous subsection
625 $k_{n-1}^{lev3}$
626 carrying the adjoint computation to the initial time
627 $k_{1}^{lev3}$.
628
629 For the full model trajectory of
630 $ n^{lev3} \cdot n^{lev2} \cdot n^{lev1} $ timesteps
the required storing of the model state is significantly reduced to
$ n^{lev2} + n^{lev3} $ states on disk and roughly $ n^{lev1} $ states in memory
[i.e. for the 3-day integration with a total of 72 timesteps
the model state is stored 7 times to disk and roughly 6 times
in memory].
This saving in storage comes at the cost of
3 full forward integrations of the model (one for each
checkpointing level).
The optimal balance of storage vs. recomputation depends
on the computing resources available and may be tuned by
adjusting the partitioning among
$ n^{lev3}, \,\, n^{lev2}, \,\, n^{lev1} $.
643
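To put the trade-off in simple terms: if disk and memory storage were
equally expensive, the total number of stored states,
$ n^{lev1} + n^{lev2} + n^{lev3} $, for a trajectory of
$ n^{lev1} \cdot n^{lev2} \cdot n^{lev3} $ timesteps would be smallest
for roughly equal factors. In the example above, the 72 states that a
plain store-everything reverse sweep would have to hold are reduced to
$ 6 + 4 + 3 \, = \, 13 $ stored states; in practice the partitioning is
chosen according to the relative cost of memory, disk, and the
additional forward computations.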
644 \begin{figure}[t!]
645 \begin{center}
646 %\psdraft
647 %\psfrag{v_k1^lev3}{\mathinfigure{v_{k_{1}^{lev3}}}}
648 %\psfrag{v_kn-1^lev3}{\mathinfigure{v_{k_{n-1}^{lev3}}}}
649 %\psfrag{v_kn^lev3}{\mathinfigure{v_{k_{n}^{lev3}}}}
650 %\psfrag{v_k1^lev2}{\mathinfigure{v_{k_{1}^{lev2}}}}
651 %\psfrag{v_kn-1^lev2}{\mathinfigure{v_{k_{n-1}^{lev2}}}}
652 %\psfrag{v_kn^lev2}{\mathinfigure{v_{k_{n}^{lev2}}}}
653 %\psfrag{v_k1^lev1}{\mathinfigure{v_{k_{1}^{lev1}}}}
654 %\psfrag{v_kn^lev1}{\mathinfigure{v_{k_{n}^{lev1}}}}
655 %\mbox{\epsfig{file=part5/checkpointing.eps, width=0.8\textwidth}}
656 \resizebox{5.5in}{!}{\includegraphics{part5/checkpointing.eps}}
657 %\psfull
658 \end{center}
659 \caption{
660 Schematic view of intermediate dump and restart for
661 3-level checkpointing.}
662 \label{fig:3levelcheck}
663 \end{figure}
664
665 % \subsection{Optimal perturbations}
666 % \label{sec_optpert}
667
668
669 % \subsection{Error covariance estimate and Hessian matrix}
670 % \label{sec_hessian}
671
672 \newpage
673
674 %**********************************************************************
675 \section{TLM and ADM generation in general}
676 \label{sec_ad_setup_gen}
677 \begin{rawhtml}
678 <!-- CMIREDIR:sec_ad_setup_gen: -->
679 \end{rawhtml}
680 %**********************************************************************
681
682 In this section we describe in a general fashion
683 the parts of the code that are relevant for automatic
684 differentiation using the software tool TAF.
685
686 \input{part5/doc_ad_the_model}
687
The basic flow is depicted in Fig. \ref{fig:adthemodel}.
689 If CPP option \texttt{ALLOW\_AUTODIFF\_TAMC} is defined,
690 the driver routine
691 {\it the\_model\_main}, instead of calling {\it the\_main\_loop},
692 invokes the adjoint of this routine, {\it adthe\_main\_loop}
693 (case \texttt{\#define ALLOW\_ADJOINT\_RUN}), or
694 the tangent linear of this routine {\it g\_the\_main\_loop}
695 (case \texttt{\#define ALLOW\_TANGENTLINEAR\_RUN}),
which are the top-level routines in terms of automatic differentiation.
The routines {\it adthe\_main\_loop} and {\it g\_the\_main\_loop}
are generated by TAF.
They contain the forward integration of the full model, the
cost function calculation,
any additional storing that is required for efficient checkpointing,
and the reverse integration of the adjoint model
(or the tangent linear integration, respectively).
703
704 [DESCRIBE IN A SEPARATE SECTION THE WORKING OF THE TLM]
705
706 In Fig. \ref{fig:adthemodel}
707 the structure of {\it adthe\_main\_loop} has been strongly
708 simplified to focus on the essentials; in particular, no checkpointing
709 procedures are shown here.
710 Prior to the call of {\it adthe\_main\_loop}, the routine
711 {\it ctrl\_unpack} is invoked to unpack the control vector
712 or initialise the control variables.
713 Following the call of {\it adthe\_main\_loop},
714 the routine {\it ctrl\_pack}
715 is invoked to pack the control vector
716 (cf. Section \ref{section_ctrl}).
717 If gradient checks are to be performed, the option
718 {\tt ALLOW\_GRADIENT\_CHECK} is defined. In this case
719 the driver routine {\it grdchk\_main} is called after
720 the gradient has been computed via the adjoint
721 (cf. Section \ref{section_grdchk}).
722
723 %------------------------------------------------------------------
724
725 \subsection{General setup
726 \label{section_ad_setup}}
727
728 In order to configure AD-related setups the following packages need
729 to be enabled:
730 {\it
731 \begin{table}[h!]
732 \begin{tabular}{l}
733 autodiff \\
734 ctrl \\
735 cost \\
736 grdchk \\
737 \end{tabular}
738 \end{table}
739 }
740 The packages are enabled by adding them to your experiment-specific
741 configuration file
742 {\it packages.conf} (see Section ???).
743
744 The following AD-specific CPP option files need to be customized:
745 %
746 \begin{itemize}
747 %
748 \item {\it ECCO\_CPPOPTIONS.h} \\
749 This header file collects CPP options for the packages
750 {\it autodiff, cost, ctrl} as well as AD-unrelated options for
751 the external forcing package {\it exf}.
752 \footnote{NOTE: These options are not set in their package-specific
753 headers such as {\it COST\_CPPOPTIONS.h}, but are instead collected
754 in the single header file {\it ECCO\_CPPOPTIONS.h}.
755 The package-specific header files serve as simple
756 placeholders at this point.}
757 %
758 \item {\it tamc.h} \\
759 This header configures the splitting of the time stepping loop
760 w.r.t. the 3-level checkpointing (see section ???).
761
762 %
763 \end{itemize}
764
765 %------------------------------------------------------------------
766
767 \subsection{Building the AD code
768 \label{section_ad_build}}
769
770 The build process of an AD code is very similar to building
771 the forward model. However, depending on which AD code one wishes
772 to generate, and on which AD tool is available (TAF or TAMC),
773 the following {\tt make} targets are available:
774
775 \begin{table}[h!]
776 {\footnotesize
777 \begin{tabular}{ccll}
778 ~ & {\it AD-target} & {\it output} & {\it description} \\
779 \hline
780 \hline
781 (1) & {\tt <MODE><TOOL>only} & {\tt <MODE>\_<TOOL>\_output.f} &
782 generates code for $<$MODE$>$ using $<$TOOL$>$ \\
783 ~ & ~ & ~ & no {\tt make} dependencies on {\tt .F .h} \\
784 ~ & ~ & ~ & useful for compiling on remote platforms \\
785 \hline
786 (2) & {\tt <MODE><TOOL>} & {\tt <MODE>\_<TOOL>\_output.f} &
787 generates code for $<$MODE$>$ using $<$TOOL$>$ \\
788 ~ & ~ & ~ & includes {\tt make} dependencies on {\tt .F .h} \\
789 ~ & ~ & ~ & i.e. input for $<$TOOL$>$ may be re-generated \\
790 \hline
791 (3) & {\tt <MODE>all} & {\tt mitgcmuv\_<MODE>} &
792 generates code for $<$MODE$>$ using $<$TOOL$>$ \\
793 ~ & ~ & ~ & and compiles all code \\
794 ~ & ~ & ~ & (use of TAF is set as default) \\
795 \hline
796 \hline
797 \end{tabular}
798 }
799 \end{table}
800 %
801 Here, the following placeholders are used
802 %
803 \begin{itemize}
804 %
805 \item [$<$TOOL$>$]
806 %
807 \begin{itemize}
808 %
809 \item {\tt TAF}
810 \item {\tt TAMC}
811 %
812 \end{itemize}
813 %
814 \item [$<$MODE$>$]
815 %
816 \begin{itemize}
817 %
818 \item {\tt ad} generates the adjoint model (ADM)
819 \item {\tt ftl} generates the tangent linear model (TLM)
820 \item {\tt svd} generates both ADM and TLM for \\
821 singular value decomposition (SVD) type calculations
822 %
823 \end{itemize}
824 %
825 \end{itemize}
826
827 For example, to generate the adjoint model using TAF after routines ({\tt .F})
828 or headers ({\tt .h}) have been modified, but without compilation,
829 type {\tt make adtaf};
830 or, to generate the tangent linear model using TAMC without
831 re-generating the input code, type {\tt make ftltamconly}.
832
833
834 A typical full build process to generate the ADM via TAF would
look as follows:
836 \begin{verbatim}
837 % mkdir build
838 % cd build
839 % ../../../tools/genmake2 -mods=../code_ad
840 % make depend
841 % make adall
842 \end{verbatim}
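The corresponding tangent linear executable is built analogously by
replacing the last step:
\begin{verbatim}
% make ftlall
\end{verbatim}
which, according to the table above, compiles the TLM into {\tt mitgcmuv\_ftl}.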
843
844 %------------------------------------------------------------------
845
846 \subsection{The AD build process in detail
847 \label{section_ad_build_detail}}
848
849 The {\tt make <MODE>all} target consists of the following procedures:
850
851 \begin{enumerate}
852 %
853 \item
A header file {\tt AD\_CONFIG.h} is generated which contains a CPP option
specifying which code is to be generated. Depending on the {\tt make} target,
its content is one of
857 \begin{itemize}
858 \item
859 {\tt \#define ALLOW\_ADJOINT\_RUN}
860 \item
861 {\tt \#define ALLOW\_TANGENTLINEAR\_RUN}
862 \item
863 {\tt \#define ALLOW\_ECCO\_OPTIMIZATION}
864 \end{itemize}
865 %
866 \item
A single file {\tt <MODE>\_input\_code.f} is concatenated
from all {\tt .f} files that are part of the list {\bf AD\_FILES}
869 and all {\tt .flow} files that are part of the list {\bf AD\_FLOW\_FILES}.
870 %
871 \item
872 The AD tool is invoked with the {\bf <MODE>\_<TOOL>\_FLAGS}.
The default AD tool flags in {\tt genmake2} can be overridden by
an {\tt adjoint\_options} file (similar to the platform-specific
{\tt build\_options}, see Section ???).
The AD tool writes the resulting AD code into the file
{\tt <MODE>\_input\_code\_ad.f}.
878 %
879 \item
880 A short sed script {\tt adjoint\_sed} is applied to
881 {\tt <MODE>\_input\_code\_ad.f}
882 to reinstate {\bf myThid} into the CALL argument list of active file I/O.
883 The result is written to file {\tt <MODE>\_<TOOL>\_output.f}.
884 %
885 \item
886 All routines are compiled and an executable is generated
887 (see Table ???).
888 %
889 \end{enumerate}
890
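As an illustration, for the adjoint model generated with TAF
({\tt make adall}), the placeholders above translate into the following
sequence of files (the names follow directly from the conventions listed):
\begin{verbatim}
ad_input_code.f       concatenation of all routines in AD_FILES and
                      all flow-directive files in AD_FLOW_FILES
ad_input_code_ad.f    adjoint code returned by TAF
ad_taf_output.f       final adjoint code after the adjoint_sed step
mitgcmuv_ad           adjoint model executable
\end{verbatim}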
891 \subsubsection{The list AD\_FILES and {\tt .list} files}
892
893 Not all routines are presented to the AD tool.
894 Routines typically hidden are diagnostics routines which
895 do not influence the cost function, but may create
896 artificial flow dependencies such as I/O of active variables.
897
898 {\tt genmake2} generates a list (or variable) {\bf AD\_FILES}
899 which contains all routines that are shown to the AD tool.
900 This list is put together from all files with suffix {\tt .list}
901 that {\tt genmake2} finds in its search directories.
The list file for the core MITgcm routines, {\tt model\_ad\_diff.list},
resides in {\tt model/src/}.
Note that no wrapper routines are shown to TAF. These are either
905 not visible at all to the AD code, or hand-written AD code
906 is available (see next section).
907
908 Each package directory contains its package-specific
909 list file {\tt <PKG>\_ad\_diff.list}. For example,
910 {\tt pkg/ptracers/} contains the file {\tt ptracers\_ad\_diff.list}.
911 Thus, enabling a package will automatically extend the
912 {\bf AD\_FILES} list of {\tt genmake2} to incorporate the
913 package-specific routines.
914 Note that you will need to regenerate the {\tt Makefile} if
915 you enable a package (e.g. by adding it to {\tt packages.conf})
916 and a {\tt Makefile} already exists.
917
918 \subsubsection{The list AD\_FLOW\_FILES and {\tt .flow} files}
919
920 TAMC and TAF can evaluate user-specified directives
921 that start with a specific syntax ({\tt CADJ}, {\tt C\$TAF}, {\tt !\$TAF}).
922 The main categories of directives are STORE directives and
FLOW directives. Here we are concerned with flow directives;
store directives are treated elsewhere.
925
926 Flow directives enable the AD tool to evaluate how it should treat
927 routines that are 'hidden' by the user, i.e. routines which are
928 not contained in the {\bf AD\_FILES} list (see previous section),
929 but which are called in part of the code that the AD tool does see.
The flow directives tell the AD tool
931 %
932 \begin{itemize}
933 %
934 \item which subroutine arguments are input/output
935 \item which subroutine arguments are active
936 \item which subroutine arguments are required to compute the cost
937 \item which subroutine arguments are dependent
938 %
939 \end{itemize}
940 %
941 The syntax for the flow directives can be found in the
942 AD tool manuals.
943
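As a schematic example (illustrative only; the exact directives used in
MITgcm are collected in the {\tt .flow} files discussed below, and the
precise syntax is documented in the TAF manual), the flow directives for
a hypothetical hidden routine {\tt foo} with three arguments might read:
\begin{verbatim}
C$TAF SUBROUTINE foo INPUT  = 1, 2, 3
C$TAF SUBROUTINE foo OUTPUT = 2
C$TAF SUBROUTINE foo ACTIVE = 2
C$TAF SUBROUTINE foo DEPEND = 1, 3
\end{verbatim}
declaring which arguments are read and returned, which of them are
active, and on which arguments the result depends.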
944 {\tt genmake2} generates a list (or variable) {\bf AD\_FLOW\_FILES}
which contains all files with suffix {\tt .flow} that it finds
946 in its search directories.
947 The flow directives for the core MITgcm routines of
948 {\tt eesupp/src/} and {\tt model/src/}
949 reside in {\tt pkg/autodiff/}.
950 This directory also contains hand-written adjoint code
951 for the MITgcm WRAPPER (section \ref{chap:sarch}).
952
953 Flow directives for package-specific routines are contained in
954 the corresponding package directories in the file
955 {\tt <PKG>\_ad.flow}, e.g. ptracers-specific directives are in
956 {\tt ptracers\_ad.flow}.
957
958 \subsubsection{Store directives for 3-level checkpointing}
959
960 The storing that is required at each period of the
3-level checkpointing is controlled by three
962 top-level headers.
963
964 \begin{verbatim}
      do ilev_3 = 1, nchklev_3
# include "checkpoint_lev3.h"
       do ilev_2 = 1, nchklev_2
# include "checkpoint_lev2.h"
        do ilev_1 = 1, nchklev_1
# include "checkpoint_lev1.h"

         ...

        end do
       end do
      end do
977 \end{verbatim}
978
979 All files {\tt checkpoint\_lev?.h} are contained in directory
980 {\tt pkg/autodiff/}.
981
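Inside these headers, the storing is requested by means of TAMC/TAF store
directives of the generic form (schematic; the actual variable, tape, and
key names are those defined in the {\tt checkpoint\_lev?.h} files):
\begin{verbatim}
CADJ STORE theta = tapelev3, key = ilev_3
CADJ STORE salt  = tapelev3, key = ilev_3
\end{verbatim}
instructing the AD tool to store the listed fields on the corresponding
checkpointing tape, indexed by the loop counter of that level.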
982
983 \subsubsection{Changing the default AD tool flags: ad\_options files}
984
985
986 \subsubsection{Hand-written adjoint code}
987
988 %------------------------------------------------------------------
989
990 \subsection{The cost function (dependent variable)
991 \label{section_cost}}
992
993 The cost function $ {\cal J} $ is referred to as the {\sf dependent variable}.
994 It is a function of the input variables $ \vec{u} $ via the composition
$ {\cal J}(\vec{u}) \, = \, {\cal J}({\cal M}(\vec{u})) $.
The inputs are referred to as the
997 {\sf independent variables} or {\sf control variables}.
998 All aspects relevant to the treatment of the cost function $ {\cal J} $
999 (parameter setting, initialization, accumulation,
1000 final evaluation), are controlled by the package {\it pkg/cost}.
1001 The aspects relevant to the treatment of the independent variables
1002 are controlled by the package {\it pkg/ctrl} and will be treated
1003 in the next section.
1004
1005 \input{part5/doc_cost_flow}
1006
1007 \subsubsection{Enabling the package}
1008
1009 \fbox{
1010 \begin{minipage}{12cm}
1011 {\it packages.conf}, {\it ECCO\_CPPOPTIONS.h}
1012 \end{minipage}
1013 }
1014 \begin{itemize}
1015 %
1016 \item
1017 The package is enabled by adding {\it cost} to your file {\it packages.conf}
1018 (see Section ???)
1019 %
1020 \item
1021
1022
1023 \end{itemize}
1024 %
1025
1026 N.B.: In general the following packages ought to be enabled
1027 simultaneously: {\it autodiff, cost, ctrl}.
1028 The basic CPP option to enable the cost function is {\bf ALLOW\_COST}.
1029 Each specific cost function contribution has its own option.
1030 For the present example the option is {\bf ALLOW\_COST\_TRACER}.
All cost-specific options are set in {\it ECCO\_CPPOPTIONS.h}.
Since the cost function is usually used in conjunction with
1033 automatic differentiation, the CPP option
1034 {\bf ALLOW\_ADJOINT\_RUN} (file {\it CPP\_OPTIONS.h}) and
1035 {\bf ALLOW\_AUTODIFF\_TAMC} (file {\it ECCO\_CPPOPTIONS.h})
1036 should be defined.
1037
1038 \subsubsection{Initialization}
1039 %
1040 The initialization of the {\it cost} package is readily enabled
1041 as soon as the CPP option {\bf ALLOW\_COST} is defined.
1042 %
1043 \begin{itemize}
1044 %
1045 \item
1046 \fbox{
1047 \begin{minipage}{12cm}
1048 Parameters: {\it cost\_readparms}
1049 \end{minipage}
1050 }
1051 \\
1052 This S/R
1053 reads runtime flags and parameters from file {\it data.cost}.
1054 For the present example the only relevant parameter read
1055 is {\bf mult\_tracer}. This multiplier enables different
1056 cost function contributions to be switched on
1057 ( = 1.) or off ( = 0.) at runtime.
1058 For more complex cost functions which involve model vs. data
1059 misfits, the corresponding data filenames and data
1060 specifications (start date and time, period, ...) are read
1061 in this S/R.
1062 %
1063 \item
1064 \fbox{
1065 \begin{minipage}{12cm}
1066 Variables: {\it cost\_init}
1067 \end{minipage}
1068 }
1069 \\
1070 This S/R
1071 initializes the different cost function contributions.
1072 The contribution for the present example is {\bf objf\_tracer}
1073 which is defined on each tile (bi,bj).
1074 %
1075 \end{itemize}
1076 %
1077 \subsubsection{Accumulation}
1078 %
1079 \begin{itemize}
1080 %
1081 \item
1082 \fbox{
1083 \begin{minipage}{12cm}
1084 {\it cost\_tile}, {\it cost\_tracer}
1085 \end{minipage}
1086 }
1087 \end{itemize}
1088 %
1089 The 'driver' routine
1090 {\it cost\_tile} is called at the end of each time step.
1091 Within this 'driver' routine, S/R are called for each of
1092 the chosen cost function contributions.
1093 In the present example ({\bf ALLOW\_COST\_TRACER}),
1094 S/R {\it cost\_tracer} is called.
1095 It accumulates {\bf objf\_tracer} according to eqn. (\ref{???}).
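Schematically, and purely as an illustration (the actual expression
depends on the chosen cost function; the loop bounds {\tt sNx}, {\tt sNy},
{\tt Nr} are the standard MITgcm tile and level dimensions), such an
accumulation over a tile {\tt (bi,bj)} has the form:
\begin{verbatim}
      do k = 1, Nr
       do j = 1, sNy
        do i = 1, sNx
         objf_tracer(bi,bj) = objf_tracer(bi,bj)
     &                      + tr1(i,j,k,bi,bj)
        enddo
       enddo
      enddo
\end{verbatim}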
1096 %
1097 \subsubsection{Finalize all contributions}
1098 %
1099 \begin{itemize}
1100 %
1101 \item
1102 \fbox{
1103 \begin{minipage}{12cm}
1104 {\it cost\_final}
1105 \end{minipage}
1106 }
1107 \end{itemize}
1108 %
1109 At the end of the forward integration S/R {\it cost\_final}
1110 is called. It accumulates the total cost function {\bf fc}
1111 from each contribution and sums over all tiles:
1112 \begin{equation}
1113 {\cal J} \, = \,
1114 {\rm fc} \, = \,
1115 {\rm mult\_tracer} \sum_{\text{global sum}} \sum_{bi,\,bj}^{nSx,\,nSy}
1116 {\rm objf\_tracer}(bi,bj) \, + \, ...
1117 \end{equation}
1118 %
1119 The total cost function {\bf fc} will be the
1120 'dependent' variable in the argument list for TAMC, i.e.
1121 \begin{verbatim}
1122 tamc -output 'fc' ...
1123 \end{verbatim}
1124
1125 %%%% \end{document}
1126
1127 \input{part5/doc_ad_the_main}
1128
1129 \subsection{The control variables (independent variables)
1130 \label{section_ctrl}}
1131
1132 The control variables are a subset of the model input
1133 (initial conditions, boundary conditions, model parameters).
1134 Here we identify them with the variable $ \vec{u} $.
All intermediate variables whose derivatives w.r.t. the control
variables do not vanish are called {\sf active variables}.
All subroutines whose derivatives w.r.t. the control variables
do not vanish are called {\sf active routines}.
1139 Read and write operations from and to file can be viewed
1140 as variable assignments. Therefore, files to which
1141 active variables are written and from which active variables
1142 are read are called {\sf active files}.
1143 All aspects relevant to the treatment of the control variables
1144 (parameter setting, initialization, perturbation)
1145 are controlled by the package {\it pkg/ctrl}.
1146
1147 \input{part5/doc_ctrl_flow}
1148
1149 \subsubsection{genmake and CPP options}
1150 %
1151 \begin{itemize}
1152 %
1153 \item
1154 \fbox{
1155 \begin{minipage}{12cm}
1156 {\it genmake}, {\it CPP\_OPTIONS.h}, {\it ECCO\_CPPOPTIONS.h}
1157 \end{minipage}
1158 }
1159 \end{itemize}
1160 %
To enable the directory to be included in the compile list,
{\bf ctrl} has to be added to the {\bf enable} list in
{\it .genmakerc} or in {\it genmake} itself (analogous to the {\it cost}
package, cf. the previous section).
1165 Each control variable is enabled via its own CPP option
1166 in {\it ECCO\_CPPOPTIONS.h}.
1167
1168 \subsubsection{Initialization}
1169 %
1170 \begin{itemize}
1171 %
1172 \item
1173 \fbox{
1174 \begin{minipage}{12cm}
1175 Parameters: {\it ctrl\_readparms}
1176 \end{minipage}
1177 }
1178 \\
1179 %
1180 This S/R
1181 reads runtime flags and parameters from file {\it data.ctrl}.
1182 For the present example the file contains the file names
1183 of each control variable that is used.
1184 In addition, the number of wet points for each control
1185 variable and the net dimension of the space of control
1186 variables (counting wet points only) {\bf nvarlength}
1187 is determined.
1188 Masks for wet points for each tile {\bf (bi,\,bj)}
1189 and vertical layer {\bf k} are generated for the three
1190 relevant categories on the C-grid:
1191 {\bf nWetCtile} for tracer fields,
1192 {\bf nWetWtile} for zonal velocity fields,
1193 {\bf nWetStile} for meridional velocity fields.
1194 %
1195 \item
1196 \fbox{
1197 \begin{minipage}{12cm}
1198 Control variables, control vector,
1199 and their gradients: {\it ctrl\_unpack}
1200 \end{minipage}
1201 }
1202 \\
1203 %
1204 Two important issues related to the handling of the control
1205 variables in MITgcm need to be addressed.
1206 First, in order to save memory, the control variable arrays
1207 are not kept in memory, but rather read from file and added
1208 to the initial fields during the model initialization phase.
1209 Similarly, the corresponding adjoint fields which represent
1210 the gradient of the cost function w.r.t. the control variables
1211 are written to file at the end of the adjoint integration.
1212 Second, in addition to the files holding the 2-dim. and 3-dim.
1213 control variables and the corresponding cost gradients,
1214 a 1-dim. {\sf control vector}
1215 and {\sf gradient vector} are written to file. They contain
1216 only the wet points of the control variables and the corresponding
1217 gradient.
1218 This leads to a significant data compression.
1219 Furthermore, an option is available
1220 ({\tt ALLOW\_NONDIMENSIONAL\_CONTROL\_IO}) to
1221 non-dimensionalise the control and gradient vector,
which otherwise would contain components of very different
magnitudes and units.
1224 Finally, the control and gradient vector can be passed to a
1225 minimization routine if an update of the control variables
1226 is sought as part of a minimization exercise.
1227
1228 The files holding fields and vectors of the control variables
1229 and gradient are generated and initialised in S/R {\it ctrl\_unpack}.
1230 %
1231 \end{itemize}
1232
1233 \subsubsection{Perturbation of the independent variables}
1234 %
1235 The dependency flow for differentiation w.r.t. the controls
1236 starts with adding a perturbation onto the input variable,
1237 thus defining the independent or control variables for TAMC.
1238 Three types of controls may be considered:
1239 %
1240 \begin{itemize}
1241 %
1242 \item
1243 \fbox{
1244 \begin{minipage}{12cm}
1245 {\it ctrl\_map\_ini} (initial value sensitivity):
1246 \end{minipage}
1247 }
1248 \\
1249 %
1250 Consider as an example the initial tracer distribution
1251 {\bf tr1} as control variable.
1252 After {\bf tr1} has been initialised in
1253 {\it ini\_tr1} (dynamical variables such as
1254 temperature and salinity are initialised in {\it ini\_fields}),
1255 a perturbation anomaly is added to the field in S/R
1256 {\it ctrl\_map\_ini}
1257 %
1258 \begin{equation}
1259 \begin{split}
1260 u & = \, u_{[0]} \, + \, \Delta u \\
1261 {\bf tr1}(...) & = \, {\bf tr1_{ini}}(...) \, + \, {\bf xx\_tr1}(...)
1262 \label{perturb}
1263 \end{split}
1264 \end{equation}
1265 %
1266 {\bf xx\_tr1} is a 3-dim. global array
1267 holding the perturbation. In the case of a simple
sensitivity study this array is identically zero.
However, its specification is essential in the context
of automatic differentiation since TAMC
1271 treats the corresponding line in the code symbolically
1272 when determining the differentiation chain and its origin.
1273 Thus, the variable names are part of the argument list
1274 when calling TAMC:
1275 %
1276 \begin{verbatim}
1277 tamc -input 'xx_tr1 ...' ...
1278 \end{verbatim}
1279 %
1280 Now, as mentioned above, MITgcm avoids maintaining
an array for each control variable by reading the
perturbation into a temporary array from file.
To ensure that the symbolic link is recognized by TAMC, a scalar
1284 dummy variable {\bf xx\_tr1\_dummy} is introduced
1285 and an 'active read' routine of the adjoint support
1286 package {\it pkg/autodiff} is invoked.
1287 The read-procedure is tagged with the variable
1288 {\bf xx\_tr1\_dummy} enabling TAMC to recognize the
1289 initialization of the perturbation.
1290 The modified call of TAMC thus reads
1291 %
1292 \begin{verbatim}
1293 tamc -input 'xx_tr1_dummy ...' ...
1294 \end{verbatim}
1295 %
1296 and the modified operation to (\ref{perturb})
1297 in the code takes on the form
1298 %
1299 \begin{verbatim}
1300 call active_read_xyz(
1301 & ..., tmpfld3d, ..., xx_tr1_dummy, ... )
1302
1303 tr1(...) = tr1(...) + tmpfld3d(...)
1304 \end{verbatim}
1305 %
Note that reading an active variable corresponds
1307 to a variable assignment. Its derivative corresponds
1308 to a write statement of the adjoint variable, followed by
1309 a reset.
1310 The 'active file' routines have been designed
1311 to support active read and corresponding adjoint active write
1312 operations (and vice versa).
1313 %
1314 \item
1315 \fbox{
1316 \begin{minipage}{12cm}
1317 {\it ctrl\_map\_forcing} (boundary value sensitivity):
1318 \end{minipage}
1319 }
1320 \\
1321 %
The handling of boundary values as control variables
proceeds exactly analogously to that of the initial values,
with the symbolic perturbation taking place in S/R
{\it ctrl\_map\_forcing}.
Note, however, an important difference:
since the boundary values are time dependent, with a new
forcing field applied at each time step,
the general problem may be thought of as
a new control variable at each time step
(or, if the perturbation is averaged over a certain period,
every $ N $ timesteps), i.e.
1333 \[
1334 u_{\rm forcing} \, = \,
1335 \{ \, u_{\rm forcing} ( t_n ) \, \}_{
1336 n \, = \, 1, \ldots , {\rm nTimeSteps} }
1337 \]
1338 %
1339 In the current example an equilibrium state is considered,
1340 and only an initial perturbation to
1341 surface forcing is applied with respect to the
1342 equilibrium state.
1343 A time dependent treatment of the surface forcing is
1344 implemented in the ECCO environment, involving the
1345 calendar ({\it cal}~) and external forcing ({\it exf}~) packages.
1346 %
1347 \item
1348 \fbox{
1349 \begin{minipage}{12cm}
1350 {\it ctrl\_map\_params} (parameter sensitivity):
1351 \end{minipage}
1352 }
1353 \\
1354 %
This routine is not yet implemented, but would proceed
along the same lines as the initial value sensitivity.
1357 The mixing parameters {\bf diffkr} and {\bf kapgm}
1358 are currently added as controls in {\it ctrl\_map\_ini.F}.
1359 %
1360 \end{itemize}
1361 %
1362
1363 \subsubsection{Output of adjoint variables and gradient}
1364 %
1365 Several ways exist to generate output of adjoint fields.
1366 %
1367 \begin{itemize}
1368 %
1369 \item
1370 \fbox{
1371 \begin{minipage}{12cm}
1372 {\it ctrl\_map\_ini, ctrl\_map\_forcing}:
1373 \end{minipage}
1374 }
1375 \\
1376 \begin{itemize}
1377 %
1378 \item {\bf xx\_...}: the control variable fields \\
1379 Before the forward integration, the control
1380 variables are read from file {\bf xx\_ ...} and added to
1381 the model field.
1382 %
1383 \item {\bf adxx\_...}: the adjoint variable fields, i.e. the gradient
1384 $ \nabla _{u}{\cal J} $ for each control variable \\
1385 After the adjoint integration the corresponding adjoint
1386 variables are written to {\bf adxx\_ ...}.
1387 %
1388 \end{itemize}
1389 %
1390 \item
1391 \fbox{
1392 \begin{minipage}{12cm}
1393 {\it ctrl\_unpack, ctrl\_pack}:
1394 \end{minipage}
1395 }
1396 \\
1397 %
1398 \begin{itemize}
1399 %
1400 \item {\bf vector\_ctrl}: the control vector \\
1401 At the very beginning of the model initialization,
1402 the updated compressed control vector is read (or initialised)
1403 and distributed to 2-dim. and 3-dim. control variable fields.
1404 %
1405 \item {\bf vector\_grad}: the gradient vector \\
1406 At the very end of the adjoint integration,
1407 the 2-dim. and 3-dim. adjoint variables are read,
1408 compressed to a single vector and written to file.
1409 %
1410 \end{itemize}
1411 %
1412 \item
1413 \fbox{
1414 \begin{minipage}{12cm}
1415 {\it addummy\_in\_stepping}:
1416 \end{minipage}
1417 }
1418 \\
1419 In addition to writing the gradient at the end of the
1420 forward/adjoint integration, many more adjoint variables
1421 of the model state
1422 at intermediate times can be written using S/R
1423 {\it addummy\_in\_stepping}.
1424 This routine is part of the adjoint support package
{\it pkg/autodiff} (cf. below).
The procedure is enabled via the CPP option
{\bf ALLOW\_AUTODIFF\_MONITOR} (file {\it ECCO\_CPPOPTIONS.h}).
1428 To be part of the adjoint code, the corresponding S/R
1429 {\it dummy\_in\_stepping} has to be called in the forward
1430 model (S/R {\it the\_main\_loop}) at the appropriate place.
1431 The adjoint common blocks are extracted from the adjoint code
1432 via the header file {\it adcommon.h}.
1433
{\it dummy\_in\_stepping} is essentially empty;
the corresponding adjoint routine is hand-written rather
1436 than generated automatically.
1437 Appropriate flow directives ({\it dummy\_in\_stepping.flow})
1438 ensure that TAMC does not automatically
1439 generate {\it addummy\_in\_stepping} by trying to differentiate
1440 {\it dummy\_in\_stepping}, but instead refers to
1441 the hand-written routine.
1442
1443 {\it dummy\_in\_stepping} is called in the forward code
1444 at the beginning of each
1445 timestep, before the call to {\it dynamics}, thus ensuring
1446 that {\it addummy\_in\_stepping} is called at the end of
1447 each timestep in the adjoint calculation, after the call to
1448 {\it addynamics}.
1449
{\it addummy\_in\_stepping} includes the header file
{\it adcommon.h}.
1452 This header file is also hand-written. It contains
1453 the common blocks
1454 {\bf /addynvars\_r/}, {\bf /addynvars\_cd/},
1455 {\bf /addynvars\_diffkr/}, {\bf /addynvars\_kapgm/},
1456 {\bf /adtr1\_r/}, {\bf /adffields/},
1457 which have been extracted from the adjoint code to enable
1458 access to the adjoint variables.
1459
1460 {\bf WARNING:} If the structure of the common blocks
{\bf /dynvars\_r/}, {\bf /dynvars\_cd/}, etc., changes,
similar changes will occur in the adjoint common blocks.
Therefore, consistency between the TAMC-generated common blocks
and those in {\it adcommon.h} has to be checked.
1465 %
1466 \end{itemize}
1467
1468
1469 \subsubsection{Control variable handling for
1470 optimization applications}
1471
In optimization mode, the cost function $ {\cal J}(u) $ is minimized
iteratively with respect to the set of control variables, i.e. a point
with $ \delta {\cal J} \, = \, 0 $ is sought.
The gradient $ \nabla _{u}{\cal J} |_{u_{[k]}} $ together
with the value of the cost function itself $ {\cal J}(u_{[k]}) $
at iteration step $ k $ serves
as input to a minimization routine (e.g. quasi-Newton method,
conjugate gradient, ... \cite{gil-lem:89})
to compute an update of the
control variables for iteration step $k+1$,
1482 \[
1483 u_{[k+1]} \, = \, u_{[0]} \, + \, \Delta u_{[k+1]}
1484 \quad \mbox{satisfying} \quad
1485 {\cal J} \left( u_{[k+1]} \right) \, < \, {\cal J} \left( u_{[k]} \right)
1486 \]
1487 $ u_{[k+1]} $ then serves as input for a forward/adjoint run
1488 to determine $ {\cal J} $ and $ \nabla _{u}{\cal J} $ at iteration step
1489 $ k+1 $.
1490 Tab. \ref{???} sketches the flow between forward/adjoint model
1491 and the minimization routine.
1492
1493 \begin{eqnarray*}
1494 \scriptsize
1495 \begin{array}{ccccc}
1496 u_{[0]} \,\, , \,\, \Delta u_{[k]} & ~ & ~ & ~ & ~ \\
1497 {\Big\downarrow}
1498 & ~ & ~ & ~ & ~ \\
1499 ~ & ~ & ~ & ~ & ~ \\
1500 \hline
1501 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1502 \multicolumn{1}{|c}{
1503 u_{[k]} = u_{[0]} + \Delta u_{[k]}} &
1504 \stackrel{\bf forward}{\bf \longrightarrow} &
1505 v_{[k]} = M \left( u_{[k]} \right) &
1506 \stackrel{\bf forward}{\bf \longrightarrow} &
1507 \multicolumn{1}{c|}{
1508 {\cal J}_{[k]} = {\cal J} \left( M \left( u_{[k]} \right) \right)} \\
1509 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1510 \hline
1511 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1512 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{{\Big\downarrow}} \\
1513 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1514 \hline
1515 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1516 \multicolumn{1}{|c}{
1517 \nabla_u {\cal J}_{[k]} (\delta {\cal J}) =
1518 T^{\ast} \cdot \nabla_v {\cal J} |_{v_{[k]}} (\delta {\cal J})} &
1519 \stackrel{\bf adjoint}{\mathbf \longleftarrow} &
1520 ad \, v_{[k]} (\delta {\cal J}) =
1521 \nabla_v {\cal J} |_{v_{[k]}} (\delta {\cal J}) &
1522 \stackrel{\bf adjoint}{\mathbf \longleftarrow} &
1523 \multicolumn{1}{c|}{ ad \, {\cal J} = \delta {\cal J}} \\
1524 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1525 \hline
1526 ~ & ~ & ~ & ~ & ~ \\
1527 \hspace*{15ex}{\Bigg\downarrow}
1528 \quad {\cal J}_{[k]}, \quad \nabla_u {\cal J}_{[k]}
1529 & ~ & ~ & ~ & ~ \\
1530 ~ & ~ & ~ & ~ & ~ \\
1531 \hline
1532 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1533 \multicolumn{1}{|c}{
1534 {\cal J}_{[k]} \,\, , \,\, \nabla_u {\cal J}_{[k]}} &
1535 {\mathbf \longrightarrow} & \text{\bf minimisation} &
1536 {\mathbf \longrightarrow} &
1537 \multicolumn{1}{c|}{ \Delta u_{[k+1]}} \\
1538 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1539 \hline
1540 ~ & ~ & ~ & ~ & ~ \\
1541 ~ & ~ & ~ & ~ & \Big\downarrow \\
1542 ~ & ~ & ~ & ~ & \Delta u_{[k+1]} \\
1543 \end{array}
1544 \end{eqnarray*}
1545
1546 The routines {\it ctrl\_unpack} and {\it ctrl\_pack} provide
1547 the link between the model and the minimization routine.
1548 As described in Section \ref{???}
1549 the {\it unpack} and {\it pack} routines read and write
1550 control and gradient {\it vectors} which are compressed
1551 to contain only wet points, in addition to the full
1552 2-dim. and 3-dim. fields.
1553 The corresponding I/O flow looks as follows:
1554
1555 \vspace*{0.5cm}
1556
1557 {\scriptsize
1558 \begin{tabular}{ccccc}
1559 {\bf vector\_ctrl\_$<$k$>$ } & ~ & ~ & ~ & ~ \\
1560 {\big\downarrow} & ~ & ~ & ~ & ~ \\
1561 \cline{1-1}
1562 \multicolumn{1}{|c|}{\it ctrl\_unpack} & ~ & ~ & ~ & ~ \\
1563 \cline{1-1}
1564 {\big\downarrow} & ~ & ~ & ~ & ~ \\
1565 \cline{3-3}
1566 \multicolumn{1}{l}{\bf xx\_theta0...$<$k$>$} & ~ &
1567 \multicolumn{1}{|c|}{~} & ~ & ~ \\
1568 \multicolumn{1}{l}{\bf xx\_salt0...$<$k$>$} &
1569 $\stackrel{\mbox{read}}{\longrightarrow}$ &
1570 \multicolumn{1}{|c|}{forward integration} & ~ & ~ \\
1571 \multicolumn{1}{l}{\bf \vdots} & ~ & \multicolumn{1}{|c|}{~}
1572 & ~ & ~ \\
1573 \cline{3-3}
1574 ~ & ~ & $\downarrow$ & ~ & ~ \\
1575 \cline{3-3}
1576 ~ & ~ &
1577 \multicolumn{1}{|c|}{~} & ~ &
1578 \multicolumn{1}{l}{\bf adxx\_theta0...$<$k$>$} \\
1579 ~ & ~ & \multicolumn{1}{|c|}{adjoint integration} &
1580 $\stackrel{\mbox{write}}{\longrightarrow}$ &
1581 \multicolumn{1}{l}{\bf adxx\_salt0...$<$k$>$} \\
1582 ~ & ~ & \multicolumn{1}{|c|}{~}
1583 & ~ & \multicolumn{1}{l}{\bf \vdots} \\
1584 \cline{3-3}
1585 ~ & ~ & ~ & ~ & {\big\downarrow} \\
1586 \cline{5-5}
1587 ~ & ~ & ~ & ~ & \multicolumn{1}{|c|}{\it ctrl\_pack} \\
1588 \cline{5-5}
1589 ~ & ~ & ~ & ~ & {\big\downarrow} \\
1590 ~ & ~ & ~ & ~ & {\bf vector\_grad\_$<$k$>$ } \\
1591 \end{tabular}
1592 }
1593
1594 \vspace*{0.5cm}
1595
1596
1597 {\it ctrl\_unpack} reads the updated control vector
1598 {\bf vector\_ctrl\_$<$k$>$}.
1599 It distributes the different control variables to
1600 2-dim. and 3-dim. files {\it xx\_...$<$k$>$}.
1601 At the start of the forward integration the control variables
are read from {\it xx\_...$<$k$>$} and added to the
model fields.
1604 Correspondingly, at the end of the adjoint integration
1605 the adjoint fields are written
1606 to {\it adxx\_...$<$k$>$}, again via the active file routines.
1607 Finally, {\it ctrl\_pack} collects all adjoint files
1608 and writes them to the compressed vector file
1609 {\bf vector\_grad\_$<$k$>$}.
