/[MITgcm]/manual/s_autodiff/text/doc_ad_2.tex
ViewVC logotype

Contents of /manual/s_autodiff/text/doc_ad_2.tex

Parent Directory Parent Directory | Revision Log Revision Log | View Revision Graph Revision Graph


Revision 1.1.1.1 - (show annotations) (download) (as text) (vendor branch)
Wed Aug 8 16:16:26 2001 UTC (23 years, 11 months ago) by adcroft
Branch: dummy
CVS Tags: Import
Changes since 1.1: +0 -0 lines
File MIME type: application/x-tex
Error occurred while calculating annotation data.
Importing model documentation into CVS

1 % $Header: $
2 % $Name: $
3
4 {\sf Automatic differentiation} (AD), also referred to as algorithmic
5 (or, more loosely, computational) differentiation, involves
6 automatically deriving code to calculate
7 partial derivatives from an existing fully non-linear prognostic code.
8 (see \cite{gri:00}).
9 A software tool is used that parses and transforms source files
10 according to a set of linguistic and mathematical rules.
11 AD tools are like source-to-source translators in that
12 they parse a program code as input and produce a new program code
13 as output.
14 However, unlike a pure source-to-source translation, the output program
15 represents a new algorithm, such as the evaluation of the
16 Jacobian, the Hessian, or higher derivative operators.
17 In principle, a variety of derived algorithms
18 can be generated automatically in this way.
19
20 The MITGCM has been adapted for use with the
21 Tangent linear and Adjoint Model Compiler (TAMC) and its succssor TAF
22 (Transformation of Algorithms in Fortran), developed
23 by Ralf Giering (\cite{gie-kam:98}, \cite{gie:99,gie:00}).
24 The first application of the adjoint of the MITGCM for senistivity
25 studies has been published by \cite{maro-eta:99}.
26 \cite{sta-eta:97,sta-eta:01} use the MITGCM and its adjoint
27 for ocean state estimation studies.
28
29 TAMC exploits the chain rule for computing the first
30 derivative of a function with
31 respect to a set of input variables.
32 Treating a given forward code as a composition of operations --
33 each line representing a compositional element -- the chain rule is
34 rigorously applied to the code, line by line. The resulting
35 tangent linear or adjoint code,
36 then, may be thought of as the composition in
37 forward or reverse order, respectively, of the
38 Jacobian matrices of the forward code compositional elements.
39
40 %**********************************************************************
41 \section{Some basic algebra}
42 \label{sec_ad_algebra}
43 %**********************************************************************
44
45 Let $ \cal{M} $ be a general nonlinear, model, i.e. a
46 mapping from the $m$-dimensional space
47 $U \subset I\!\!R^m$ of input variables
48 $\vec{u}=(u_1,\ldots,u_m)$
49 (model parameters, initial conditions, boundary conditions
50 such as forcing functions) to the $n$-dimensional space
51 $V \subset I\!\!R^n$ of
52 model output variable $\vec{v}=(v_1,\ldots,v_n)$
53 (model state, model diagnostcs, objective function, ...)
54 under consideration,
55 %
56 \begin{equation}
57 \begin{split}
58 {\cal M} \, : & \, U \,\, \longrightarrow \, V \\
59 ~ & \, \vec{u} \,\, \longmapsto \, \vec{v} \, = \,
60 {\cal M}(\vec{u})
61 \label{fulloperator}
62 \end{split}
63 \end{equation}
64 %
65 The vectors $ \vec{u} \in U $ and $ v \in V $ may be represented w.r.t.
66 some given basis vectors
67 $ {\rm span} (U) = \{ {\vec{e}_i} \}_{i = 1, \ldots , m} $ and
68 $ {\rm span} (V) = \{ {\vec{f}_j} \}_{j = 1, \ldots , n} $ as
69 \[
70 \vec{u} \, = \, \sum_{i=1}^{m} u_i \, {\vec{e}_i},
71 \qquad
72 \vec{v} \, = \, \sum_{j=1}^{n} v_j \, {\vec{f}_j}
73 \]
74
75 Two routes may be followed to determine the sensitivity of the
76 output variable $\vec{v}$ to its input $\vec{u}$.
77
78 \subsection{Forward or direct sensitivity}
79 %
80 Consider a perturbation to the input variables $\delta \vec{u}$
81 (typically a single component
82 $\delta \vec{u} = \delta u_{i} \, {\vec{e}_{i}}$).
83 Their effect on the output may be obtained via the linear
84 approximation of the model $ {\cal M}$ in terms of its Jacobian matrix
85 $ M $, evaluated in the point $u^{(0)}$ according to
86 %
87 \begin{equation}
88 \delta \vec{v} \, = \, M |_{\vec{u}^{(0)}} \, \delta \vec{u}
89 \label{tangent_linear}
90 \end{equation}
91 with resulting output perturbation $\delta \vec{v}$.
92 In components
93 $M_{j i} \, = \, \partial {\cal M}_{j} / \partial u_{i} $,
94 it reads
95 %
96 \begin{equation}
97 \delta v_{j} \, = \, \sum_{i}
98 \left. \frac{\partial {\cal M}_{j}}{\partial u_{i}} \right|_{u^{(0)}} \,
99 \delta u_{i}
100 \label{jacobi_matrix}
101 \end{equation}
102 %
103 Eq. (\ref{tangent_linear}) is the {\sf tangent linear model (TLM)}.
104 In contrast to the full nonlinear model $ {\cal M} $, the operator
105 $ M $ is just a matrix
106 which can readily be used to find the forward sensitivity of $\vec{v}$ to
107 perturbations in $u$,
108 but if there are very many input variables $(>>O(10^{6})$ for
109 large-scale oceanographic application), it quickly becomes
110 prohibitive to proceed directly as in (\ref{tangent_linear}),
111 if the impact of each component $ {\bf e_{i}} $ is to be assessed.
112
113 \subsection{Reverse or adjoint sensitivity}
114 %
115 Let us consider the special case of a
116 scalar objective function ${\cal J}(\vec{v})$ of the model output (e.g.
117 the total meridional heat transport,
118 the total uptake of $CO_{2}$ in the Southern
119 Ocean over a time interval,
120 or a measure of some model-to-data misfit)
121 %
122 \begin{eqnarray}
123 \begin{array}{cccccc}
124 {\cal J} \, : & U &
125 \longrightarrow & V &
126 \longrightarrow & I \!\! R \\
127 ~ & \vec{u} & \longmapsto & \vec{v}={\cal M}(\vec{u}) &
128 \longmapsto & {\cal J}(\vec{u}) = {\cal J}({\cal M}(\vec{u}))
129 \end{array}
130 \label{compo}
131 \end{eqnarray}
132 %
133 The linear approximation of $ {\cal J} $,
134 \[
135 {\cal J} \, \approx \, {\cal J}_0 \, + \, \delta {\cal J}
136 \]
137 can be expressed in both bases of $ \vec{u} $ and $ \vec{v} $
138 w.r.t. their corresponding inner product
139 $\left\langle \,\, , \,\, \right\rangle $
140 %
141 \begin{equation}
142 \begin{split}
143 {\cal J} & = \,
144 {\cal J} |_{\vec{u}^{(0)}} \, + \,
145 \left\langle \, \nabla _{u}{\cal J}^T |_{\vec{u}^{(0)}} \, , \, \delta \vec{u} \, \right\rangle
146 \, + \, O(\delta \vec{u}^2) \\
147 ~ & = \,
148 {\cal J} |_{\vec{v}^{(0)}} \, + \,
149 \left\langle \, \nabla _{v}{\cal J}^T |_{\vec{v}^{(0)}} \, , \, \delta \vec{v} \, \right\rangle
150 \, + \, O(\delta \vec{v}^2)
151 \end{split}
152 \label{deljidentity}
153 \end{equation}
154 %
155 (note, that the gradient $ \nabla f $ is a pseudo-vector, therefore
156 its transpose is required in the above inner product).
157 Then, using the representation of
158 $ \delta {\cal J} =
159 \left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle $,
160 the definition
161 of an adjoint operator $ A^{\ast} $ of a given operator $ A $,
162 \[
163 \left\langle \, A^{\ast} \vec{x} \, , \, \vec{y} \, \right\rangle =
164 \left\langle \, \vec{x} \, , \, A \vec{y} \, \right\rangle
165 \]
166 which for finite-dimensional vector spaces is just the
167 transpose of $ A $,
168 \[
169 A^{\ast} \, = \, A^T
170 \]
171 and from eq. (\ref{tangent_linear}), we note that
172 (omitting $|$'s):
173 %
174 \begin{equation}
175 \delta {\cal J}
176 \, = \,
177 \left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle
178 \, = \,
179 \left\langle \, \nabla _{v}{\cal J}^T \, , \, M \, \delta \vec{u} \, \right\rangle
180 \, = \,
181 \left\langle \, M^T \, \nabla _{v}{\cal J}^T \, , \,
182 \delta \vec{u} \, \right\rangle
183 \label{inner}
184 \end{equation}
185 %
186 With the identity (\ref{deljidentity}), we then find that
187 the gradient $ \nabla _{u}{\cal J} $ can be readily inferred by
188 invoking the adjoint $ M^{\ast } $ of the tangent linear model $ M $
189 %
190 \begin{equation}
191 \begin{split}
192 \nabla _{u}{\cal J}^T |_{\vec{u}} &
193 = \, M^T |_{\vec{u}} \cdot \nabla _{v}{\cal J}^T |_{\vec{v}} \\
194 ~ & = \, M^T |_{\vec{u}} \cdot \delta \vec{v}^{\ast} \\
195 ~ & = \, \delta \vec{u}^{\ast}
196 \end{split}
197 \label{adjoint}
198 \end{equation}
199 %
200 Eq. (\ref{adjoint}) is the {\sf adjoint model (ADM)},
201 in which $M^T$ is the adjoint (here, the transpose) of the
202 tangent linear operator $M$, $ \delta \vec{v}^{\ast} $
203 the adjoint variable of the model state $ \vec{v} $, and
204 $ \delta \vec{u}^{\ast} $ the adjoint variable of the control variable $ \vec{u} $.
205
206 The {\sf reverse} nature of the adjoint calculation can be readily
207 seen as follows. Let us decompose ${\cal J}(u)$, thus:
208 %
209 \begin{equation}
210 {\cal J}({\cal M}(\vec{u})) \, = \,
211 {\cal J} ( {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} (
212 ...... ( {\cal M}_{\lambda} (
213 ......
214 ( {\cal M}_{1} ( {\cal M}_{0}(\vec{u}) )))))
215 \label{compos}
216 \end{equation}
217 %
218 where the ${\cal M}$'s could be the elementary steps, i.e. single lines
219 in the code of the model,
220 starting at step 0 and moving up to step $\Lambda$, with intermediate
221 ${\cal M}_{\lambda} (\vec{u}) = \vec{v}^{(\lambda+1)}$ and final
222 ${\cal M}_{\Lambda} (\vec{u}) = \vec{v}^{(\Lambda+1)} = \vec{v}$
223 Then, according to the chain rule the forward calculation reads in
224 terms of the Jacobi matrices
225 (we've omitted the $ | $'s which, nevertheless are important
226 to the aspect of {\it tangent} linearity;
227 note also that per definition
228 $ \langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \rangle
229 = \nabla_v {\cal J} \cdot \delta \vec{v} $ )
230 %
231 \begin{equation}
232 \begin{split}
233 \nabla_v {\cal J} (M(\delta \vec{u})) & = \,
234 \nabla_v {\cal J} \cdot M_{\Lambda}
235 \cdot ...... \cdot M_{\lambda} \cdot ...... \cdot
236 M_{1} \cdot M_{0} \cdot \delta \vec{u} \\
237 ~ & = \, \nabla_v {\cal J} \cdot \delta \vec{v} \\
238 \end{split}
239 \label{forward}
240 \end{equation}
241 %
242 whereas in reverse mode we have
243 %
244 \begin{equation}
245 \boxed{
246 \begin{split}
247 M^T ( \nabla_v {\cal J}^T) & = \,
248 M_{0}^T \cdot M_{1}^T
249 \cdot ...... \cdot M_{\lambda}^T \cdot ...... \cdot
250 M_{\Lambda}^T \cdot \nabla_v {\cal J}^T \\
251 ~ & = \, M_{0}^T \cdot M_{1}^T
252 \cdot ...... \cdot
253 \nabla_{v^{(\lambda)}} {\cal J}^T \\
254 ~ & = \, \nabla_u {\cal J}^T
255 \end{split}
256 }
257 \label{reverse}
258 \end{equation}
259 %
260 clearly expressing the reverse nature of the calculation.
261 Eq. (\ref{reverse}) is at the heart of automatic adjoint compilers.
262 The intermediate steps $\lambda$ in
263 eqn. (\ref{compos}) -- (\ref{reverse})
264 could represent the model state (forward or adjoint) at each
265 intermediate time step in which case
266 $ {\cal M}(\vec{v}^{(\lambda)}) = \vec{v}^{(\lambda+1)} $, and correspondingly,
267 $ M^T (\delta \vec{v}^{(\lambda) \, \ast}) = \delta \vec{v}^{(\lambda-1) \, \ast} $,
268 but they can also be viewed more generally as
269 single lines of code in the numerical algorithm.
270 In both cases it becomes evident that the adjoint calculation
271 yields at the same time the adjoint of each model state component
272 $ \vec{v}^{(\lambda)} $ at each intermediate step $ l $, namely
273 %
274 \begin{equation}
275 \boxed{
276 \begin{split}
277 \nabla_{v^{(\lambda)}} {\cal J}^T |_{\vec{v}^{(\lambda)}}
278 & = \,
279 M_{\lambda}^T |_{\vec{v}^{(\lambda)}} \cdot ...... \cdot
280 M_{\Lambda}^T |_{\vec{v}^{(\lambda)}} \cdot \delta \vec{v}^{\ast} \\
281 ~ & = \, \delta \vec{v}^{(\lambda) \, \ast}
282 \end{split}
283 }
284 \end{equation}
285 %
286 in close analogy to eq. (\ref{adjoint})
287 We note in passing that that the $\delta \vec{v}^{(\lambda) \, \ast}$
288 are the Lagrange multipliers of the model state $ \vec{v}^{(\lambda)}$.
289
290 In coponents, eq. (\ref{adjoint}) reads as follows.
291 Let
292 \[
293 \begin{array}{rclcrcl}
294 \delta \vec{u} & = &
295 \left( \delta u_1,\ldots, \delta u_m \right)^T , & \qquad &
296 \delta \vec{u}^{\ast} \,\, = \,\, \nabla_u {\cal J}^T & = &
297 \left(
298 \frac{\partial {\cal J}}{\partial u_1},\ldots,
299 \frac{\partial {\cal J}}{\partial u_m}
300 \right)^T \\
301 \delta \vec{v} & = &
302 \left( \delta v_1,\ldots, \delta u_n \right)^T , & \qquad &
303 \delta \vec{v}^{\ast} \,\, = \,\, \nabla_v {\cal J}^T & = &
304 \left(
305 \frac{\partial {\cal J}}{\partial v_1},\ldots,
306 \frac{\partial {\cal J}}{\partial v_n}
307 \right)^T \\
308 \end{array}
309 \]
310 denote the perturbations in $\vec{u}$ and $\vec{v}$, respectively,
311 and their adjoint varaiables;
312 further
313 \[
314 M \, = \, \left(
315 \begin{array}{ccc}
316 \frac{\partial {\cal M}_1}{\partial u_1} & \ldots &
317 \frac{\partial {\cal M}_1}{\partial u_m} \\
318 \vdots & ~ & \vdots \\
319 \frac{\partial {\cal M}_n}{\partial u_1} & \ldots &
320 \frac{\partial {\cal M}_n}{\partial u_m} \\
321 \end{array}
322 \right)
323 \]
324 is the Jacobi matrix of $ {\cal M} $
325 (an $ n \times m $ matrix)
326 such that $ \delta \vec{v} = M \cdot \delta \vec{u} $, or
327 \[
328 \delta v_{j}
329 \, = \, \sum_{i=1}^m M_{ji} \, \delta u_{i}
330 \, = \, \sum_{i=1}^m \, \frac{\partial {\cal M}_{j}}{\partial u_{i}}
331 \delta u_{i}
332 \]
333 %
334 Then eq. (\ref{adjoint}) takes the form
335 \[
336 \delta u_{i}^{\ast}
337 \, = \, \sum_{j=1}^n M_{ji} \, \delta v_{j}^{\ast}
338 \, = \, \sum_{j=1}^n \, \frac{\partial {\cal M}_{j}}{\partial u_{i}}
339 \delta v_{j}^{\ast}
340 \]
341 %
342 or
343 %
344 \[
345 \left(
346 \begin{array}{c}
347 \left. \frac{\partial}{\partial u_1} {\cal J} \right|_{\vec{u}^{(0)}} \\
348 \vdots \\
349 \left. \frac{\partial}{\partial u_m} {\cal J} \right|_{\vec{u}^{(0)}} \\
350 \end{array}
351 \right)
352 \, = \,
353 \left(
354 \begin{array}{ccc}
355 \left. \frac{\partial {\cal M}_1}{\partial u_1} \right|_{\vec{u}^{(0)}}
356 & \ldots &
357 \left. \frac{\partial {\cal M}_n}{\partial u_1} \right|_{\vec{u}^{(0)}} \\
358 \vdots & ~ & \vdots \\
359 \left. \frac{\partial {\cal M}_1}{\partial u_m} \right|_{\vec{u}^{(0)}}
360 & \ldots &
361 \left. \frac{\partial {\cal M}_n}{\partial u_m} \right|_{\vec{u}^{(0)}} \\
362 \end{array}
363 \right)
364 \cdot
365 \left(
366 \begin{array}{c}
367 \left. \frac{\partial}{\partial v_1} {\cal J} \right|_{\vec{v}} \\
368 \vdots \\
369 \left. \frac{\partial}{\partial v_n} {\cal J} \right|_{\vec{v}} \\
370 \end{array}
371 \right)
372 \]
373 %
374 Furthermore, the adjoint $ \delta v^{(\lambda) \, \ast} $
375 of any intermediate state $ v^{(\lambda)} $
376 may be obtained, using the intermediate Jacobian
377 (an $ n_{\lambda+1} \times n_{\lambda} $ matrix)
378 %
379 \[
380 M_{\lambda} \, = \,
381 \left(
382 \begin{array}{ccc}
383 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_1}
384 & \ldots &
385 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
386 \vdots & ~ & \vdots \\
387 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_1}
388 & \ldots &
389 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
390 \end{array}
391 \right)
392 \]
393 %
394 and the shorthand notation for the adjoint variables
395 $ \delta v^{(\lambda) \, \ast}_{j} = \frac{\partial}{\partial v^{(\lambda)}_{j}}
396 {\cal J}^T $, $ j = 1, \ldots , n_{\lambda} $,
397 for intermediate components, yielding
398 \[
399 \footnotesize
400 \left(
401 \begin{array}{c}
402 \delta v^{(\lambda) \, \ast}_1 \\
403 \vdots \\
404 \delta v^{(\lambda) \, \ast}_{n_{\lambda}} \\
405 \end{array}
406 \right)
407 \, = \,
408 \left(
409 \begin{array}{ccc}
410 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_1}
411 & \ldots &
412 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_1} \\
413 \vdots & ~ & \vdots \\
414 \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_{n_{\lambda}}}
415 & \ldots &
416 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
417 \end{array}
418 \right)
419 %
420 \cdot
421 %
422 \left(
423 \begin{array}{ccc}
424 \frac{\partial ({\cal M}_{\lambda+1})_1}{\partial v^{(\lambda+1)}_1}
425 & \ldots &
426 \frac{\partial ({\cal M}_{\lambda+1})_{n_{\lambda+2}}}{\partial v^{(\lambda+1)}_1} \\
427 \vdots & ~ & \vdots \\
428 \vdots & ~ & \vdots \\
429 \frac{\partial ({\cal M}_{\lambda+1})_1}{\partial v^{(\lambda+1)}_{n_{\lambda+1}}}
430 & \ldots &
431 \frac{\partial ({\cal M}_{\lambda+1})_{n_{\lambda+2}}}{\partial v^{(\lambda+1)}_{n_{\lambda+1}}} \\
432 \end{array}
433 \right)
434 \cdot \ldots \ldots \cdot
435 \left(
436 \begin{array}{c}
437 \delta v^{\ast}_1 \\
438 \vdots \\
439 \delta v^{\ast}_{n} \\
440 \end{array}
441 \right)
442 \]
443
444 Eq. (\ref{forward}) and (\ref{reverse}) are perhaps clearest in
445 showing the advantage of the reverse over the forward mode
446 if the gradient $\nabla _{u}{\cal J}$, i.e. the sensitivity of the
447 cost function $ {\cal J} $ with respect to {\it all} input
448 variables $u$
449 (or the sensitivity of the cost function with respect to
450 {\it all} intermediate states $ \vec{v}^{(\lambda)} $) are sought.
451 In order to be able to solve for each component of the gradient
452 $ \partial {\cal J} / \partial u_{i} $ in (\ref{forward})
453 a forward calulation has to be performed for each component seperately,
454 i.e. $ \delta \vec{u} = \delta u_{i} {\vec{e}_{i}} $
455 for the $i$-th forward calculation.
456 Then, (\ref{forward}) represents the
457 projection of $ \nabla_u {\cal J} $ onto the $i$-th component.
458 The full gradient is retrieved from the $ m $ forward calculations.
459 In contrast, eq. (\ref{reverse}) yields the full
460 gradient $\nabla _{u}{\cal J}$ (and all intermediate gradients
461 $\nabla _{v^{(\lambda)}}{\cal J}$) within a single reverse calculation.
462
463 Note, that in case $ {\cal J} $ is a vector-valued function
464 of dimension $ l > 1 $,
465 eq. (\ref{reverse}) has to be modified according to
466 \[
467 M^T \left( \nabla_v {\cal J}^T \left(\delta \vec{J}\right) \right)
468 \, = \,
469 \nabla_u {\cal J}^T \cdot \delta \vec{J}
470 \]
471 where now $ \delta \vec{J} \in I\!\!R $ is a vector of dimenison $ l $.
472 In this case $ l $ reverse simulations have to be performed
473 for each $ \delta J_{k}, \,\, k = 1, \ldots, l $.
474 Then, the reverse mode is more efficient as long as
475 $ l < n $, otherwise the forward mode is preferable.
476 Stricly, the reverse mode is called adjoint mode only for
477 $ l = 1 $.
478
479 A detailed analysis of the underlying numerical operations
480 shows that the computation of $\nabla _{u}{\cal J}$ in this way
481 requires about 2 to 5 times the computation of the cost function.
482 Alternatively, the gradient vector could be approximated
483 by finite differences, requiring $m$ computations
484 of the perturbed cost function.
485
486 To conclude we give two examples of commonly used types
487 of cost functions:
488
489 \paragraph{Example 1:
490 $ {\cal J} = v_{j} (T) $} ~ \\
491 The cost function consists of the $j$-th component of the model state
492 $ \vec{v} $ at time $T$.
493 Then $ \nabla_v {\cal J}^T = {\vec{f}_{j}} $ is just the $j$-th
494 unit vector. The $ \nabla_u {\cal J}^T $
495 is the projection of the adjoint
496 operator onto the $j$-th component ${\bf f_{j}}$,
497 \[
498 \nabla_u {\cal J}^T
499 \, = \, M^T \cdot \nabla_v {\cal J}^T
500 \, = \, \sum_{i} M^T_{ji} \, {\vec{e}_{i}}
501 \]
502
503 \paragraph{Example 2:
504 $ {\cal J} = \langle \, {\cal H}(\vec{v}) - \vec{d} \, ,
505 \, {\cal H}(\vec{v}) - \vec{d} \, \rangle $} ~ \\
506 The cost function represents the quadratic model vs.data misfit.
507 Here, $ \vec{d} $ is the data vector and $ {\cal H} $ represents the
508 operator which maps the model state space onto the data space.
509 Then, $ \nabla_v {\cal J} $ takes the form
510 %
511 \begin{equation*}
512 \begin{split}
513 \nabla_v {\cal J}^T & = \, 2 \, \, H \cdot
514 \left( \, {\cal H}(\vec{v}) - \vec{d} \, \right) \\
515 ~ & = \, 2 \sum_{j} \left\{ \sum_k
516 \frac{\partial {\cal H}_k}{\partial v_{j}}
517 \left( {\cal H}_k (\vec{v}) - d_k \right)
518 \right\} \, {\vec{f}_{j}} \\
519 \end{split}
520 \end{equation*}
521 %
522 where $H_{kj} = \partial {\cal H}_k / \partial v_{j} $ is the
523 Jacobi matrix of the data projection operator.
524 Thus, the gradient $ \nabla_u {\cal J} $ is given by the
525 adjoint operator,
526 driven by the model vs. data misfit:
527 \[
528 \nabla_u {\cal J}^T \, = \, 2 \, M^T \cdot
529 H \cdot \left( {\cal H}(\vec{v}) - \vec{d} \, \right)
530 \]
531
532 \subsection{Storing vs. recomputation in reverse mode}
533 \label{checkpointing}
534
535 We note an important aspect of the forward vs. reverse
536 mode calculation.
537 Because of the locality of the derivative,
538 the intermediate results of the model trajectory
539 $\vec{v}^{(\lambda+1)}={\cal M}_{\lambda}(v^{(\lambda)})$
540 are needed to evaluate the intermediate Jacobian
541 $M_{\lambda}|_{\vec{v}^{(\lambda)}} \, \delta \vec{v}^{(\lambda)} $.
542 In the forward mode, the intermediate results are required
543 in the same order as computed by the full forward model ${\cal M}$,
544 in the reverse mode they are required in the reverse order.
545 Thus, in the reverse mode the trajectory of the forward model
546 integration ${\cal M}$ has to be stored to be available in the reverse
547 calculation. Alternatively, the model state would have to be
548 recomputed whenever its value is required.
549
550 A method to balance the amount of recomputations vs.
551 storage requirements is called {\sf checkpointing}
552 (e.g. \cite{res-eta:98}).
553 It is depicted in Fig. ... for a 3-level checkpointing
554 [as concrete example, we give explicit numbers for a 3-day
555 integration with a 1-hourly timestep in square brackets].
556 \begin{itemize}
557 %
558 \item [$lev3$]
559 In a first step, the model trajectory is subdivided into
560 $ {n}^{lev3} $ subsections [$ {n}^{lev3} $=3 1-day intervals],
561 with the label $lev3$ for this outermost loop.
562 The model is then integrated over the full trajectory,
563 and the model state stored only at every $ k_{i}^{lev3} $-th timestep
564 [i.e. 3 times, at
565 $ i = 0,1,2 $ corresponding to $ k_{i}^{lev3} = 0, 24, 48 $].
566 %
567 \item [$lev2$]
568 In a second step each subsection is itself divided into
569 $ {n}^{lev2} $ subsubsections
570 [$ {n}^{lev2} $=4 6-hour intervals per subsection].
571 The model picks up at the last outermost dumped state
572 $ v_{k_{n}^{lev3}} $ and is integrated forward in time over
573 the last subsection, with the label $lev2$ for this
574 intermediate loop.
575 The model state is now stored only at every $ k_{i}^{lev2} $-th
576 timestep
577 [i.e. 4 times, at
578 $ i = 0,1,2,3 $ corresponding to $ k_{i}^{lev2} = 48, 54, 60, 66 $].
579 %
580 \item [$lev1$]
581 Finally, the mode picks up at the last intermediate dump state
582 $ v_{k_{n}^{lev2}} $ and is integrated forward in time over
583 the last subsubsection, with the label $lev1$ for this
584 intermediate loop.
585 Within this subsubsection only, the model state is stored
586 at every timestep
587 [i.e. every hour $ i=0,...,5$ corresponding to
588 $ k_{i}^{lev1} = 66, 67, \ldots, 71 $].
589 Thus, the final state $ v_n = v_{k_{n}^{lev1}} $ is reached
590 and the model state of all peceeding timesteps over the last
591 subsubsections are available, enabling integration backwards
592 in time over the last subsubsection.
593 Thus, the adjoint can be computed over this last
594 subsubsection $k_{n}^{lev2}$.
595 %
596 \end{itemize}
597 %
598 This procedure is repeated consecutively for each previous
599 subsubsection $k_{n-1}^{lev2}, \ldots, k_{1}^{lev2} $
600 carrying the adjoint computation to the initial time
601 of the subsection $k_{n}^{lev3}$.
602 Then, the procedure is repeated for the previous subsection
603 $k_{n-1}^{lev3}$
604 carrying the adjoint computation to the initial time
605 $k_{1}^{lev3}$.
606
607 For the full model trajectory of
608 $ n^{lev3} \cdot n^{lev2} \cdot n^{lev1} $ timesteps
609 the required storing of the model state was significantly reduced to
610 $ n^{lev1} + n^{lev2} + n^{lev3} $
611 [i.e. for the 3-day integration with a total oof 72 timesteps
612 the model state was stored 13 times].
613 This saving in memory comes at a cost of a required
614 3 full forward integrations of the model (one for each
615 checkpointing level).
616 The balance of storage vs. recomputation certainly depends
617 on the computing resources available.
618
619 \begin{figure}[t!]
620 \centering
621 %\psdraft
622 \psfrag{v_k1^lev3}{\mathinfigure{v_{k_{1}^{lev3}}}}
623 \psfrag{v_kn-1^lev3}{\mathinfigure{v_{k_{n-1}^{lev3}}}}
624 \psfrag{v_kn^lev3}{\mathinfigure{v_{k_{n}^{lev3}}}}
625 \psfrag{v_k1^lev2}{\mathinfigure{v_{k_{1}^{lev2}}}}
626 \psfrag{v_kn-1^lev2}{\mathinfigure{v_{k_{n-1}^{lev2}}}}
627 \psfrag{v_kn^lev2}{\mathinfigure{v_{k_{n}^{lev2}}}}
628 \psfrag{v_k1^lev1}{\mathinfigure{v_{k_{1}^{lev1}}}}
629 \psfrag{v_kn^lev1}{\mathinfigure{v_{k_{n}^{lev1}}}}
630 \mbox{\epsfig{file=part5/checkpointing.eps, width=0.8\textwidth}}
631 %\psfull
632 \caption
633 {Schematic view of intermediate dump and restart for
634 3-level checkpointing.}
635 \label{fig:erswns}
636 \end{figure}
637
638 \subsection{Optimal perturbations}
639 \label{optpert}
640
641
642 \subsection{Error covariance estimate and Hessian matrix}
643 \label{sec_hessian}
644
645 \newpage
646
647 %**********************************************************************
648 \section{AD-specific setup by example: sensitivity of carbon sequestration}
649 \label{sec_ad_setup_ex}
650 %**********************************************************************
651
652 The MITGCM has been adapted to enable AD using TAMC or TAF
653 (we'll refer to TAMC and TAF interchangeably, except where
654 distinctions are explicitly mentioned).
655 The present description, therefore, is specific to the
656 use of TAMC as AD tool.
657 The following sections describe the steps which are necessary to
658 generate a tangent linear or adjoint model of the MITGCM.
659 We take as an example the sensitivity of carbon sequestration
660 in the ocean.
661 The AD-relevant hooks in the code are sketched in
662 \reffig{adthemodel}, \reffig{adthemain}.
663
664 \subsection{Overview of the experiment}
665
666 We describe an adjoint sensitivity analysis of outgassing from
667 the ocean into the atmosphere of a carbon like tracer injected
668 into the ocean interior (see \cite{hil-eta:01}).
669
670 \subsubsection{Passive tracer equation}
671
672 For this work the MITGCM was augmented with a thermodynamically
673 inactive tracer, $C$. Tracer residing in the ocean
674 model surface layer is outgassed according to a relaxation time scale,
675 $\mu$. Within the ocean interior, the tracer is passively advected
676 by the ocean model currents. The full equation for the time evolution
677 %
678 \begin{equation}
679 \label{carbon_ddt}
680 \frac{\partial C}{\partial t} \, = \,
681 -U\cdot \nabla C \, - \, \mu C \, + \, \Gamma(C) \,+ \, S
682 \end{equation}
683 %
684 also includes a source term $S$. This term
685 represents interior sources of $C$ such as would arise due to
686 direct injection.
687 The velocity term, $U$, is the sum of the
688 model Eulerian circulation and an eddy-induced velocity, the latter
689 parameterized according to Gent/McWilliams (\cite{gen:90, dan:95}).
690 The convection function, $\Gamma$, mixes $C$ vertically wherever the
691 fluid is locally statically unstable.
692
693 The outgassing time scale, $\mu$, in eqn. (\ref{carbon_ddt})
694 is set so that \( 1/\mu \sim 1 \ \mathrm{year} \) for the surface
695 ocean and $\mu=0$ elsewhere. With this value, eqn. (\ref{carbon_ddt})
696 is valid as a prognostic equation for small perturbations in oceanic
697 carbon concentrations. This configuration provides a
698 powerful tool for examining the impact of large-scale ocean circulation
699 on $ CO_2 $ outgassing due to interior injections.
700 As source we choose a constant in time injection of
701 $ S = 1 \,\, {\rm mol / s}$.
702
703 \subsubsection{Model configuration}
704
705 The model configuration employed has a constant
706 $4^\circ \times 4^\circ$ resolution horizontal grid and realistic
707 geography and bathymetry. Twenty vertical layers are used with
708 vertical spacing ranging
709 from 50 m near the surface to 815 m at depth.
710 Driven to steady-state by climatalogical wind-stress, heat and
711 fresh-water forcing the model reproduces well known large-scale
712 features of the ocean general circulation.
713
714 \subsubsection{Outgassing cost function}
715
716 To quantify and understand outgassing due to injections of $C$
717 in eqn. (\ref{carbon_ddt}),
718 we define a cost function $ {\cal J} $ that measures the total amount of
719 tracer outgassed at each timestep:
720 %
721 \begin{equation}
722 \label{cost_tracer}
723 {\cal J}(t=T)=\int_{t=0}^{t=T}\int_{A} \mu C \, dA \, dt
724 \end{equation}
725 %
726 Equation(\ref{cost_tracer}) integrates the outgassing term, $\mu C$,
727 from (\ref{carbon_ddt})
728 over the entire ocean surface area, $A$, and accumulates it
729 up to time $T$.
730 Physically, ${\cal J}$ can be thought of as representing the amount of
731 $CO_2$ that our model predicts would be outgassed following an
732 injection at rate $S$.
733 The sensitivity of ${\cal J}$ to the spatial location of $S$,
734 $\frac{\partial {\cal J}}{\partial S}$,
735 can be used to identify regions from which circulation
736 would cause $CO_2$ to rapidly outgas following injection
737 and regions in which $CO_2$ injections would remain effectively
738 sequesterd within the ocean.
739
740 \subsection{Code configuration}
741
742 The model configuration for this experiment resides under the
743 directory {\it verification/carbon/}.
744 The code customisation routines are in {\it verification/carbon/code/}:
745 %
746 \begin{itemize}
747 %
748 \item {\it .genmakerc}
749 %
750 \item {\it COST\_CPPOPTIONS.h}
751 %
752 \item {\it CPP\_EEOPTIONS.h}
753 %
754 \item {\it CPP\_OPTIONS.h}
755 %
756 \item {\it CTRL\_OPTIONS.h}
757 %
758 \item {\it ECCO\_OPTIONS.h}
759 %
760 \item {\it SIZE.h}
761 %
762 \item {\it adcommon.h}
763 %
764 \item {\it tamc.h}
765 %
766 \end{itemize}
767 %
768 The runtime flag and parameters settings are contained in
769 {\it verification/carbon/input/},
770 together with the forcing fields and and restart files:
771 %
772 \begin{itemize}
773 %
774 \item {\it data}
775 %
776 \item {\it data.cost}
777 %
778 \item {\it data.ctrl}
779 %
780 \item {\it data.pkg}
781 %
782 \item {\it eedata}
783 %
784 \item {\it topog.bin}
785 %
786 \item {\it windx.bin, windy.bin}
787 %
788 \item {\it salt.bin, theta.bin}
789 %
790 \item {\it SSS.bin, SST.bin}
791 %
792 \item {\it pickup*}
793 %
794 \end{itemize}
795 %
796 Finally, the file to generate the adjoint code resides in
797 $ adjoint/ $:
798 %
799 \begin{itemize}
800 %
801 \item {\it makefile}
802 %
803 \end{itemize}
804 %
805
806 Below we describe the customisations of this files which are
807 specific to this experiment.
808
809 \subsubsection{File {\it .genmakerc}}
810 This file overwites default settings of {\it genmake}.
811 In the present example it is used to switch on the following
812 packages which are related to automatic differentiation
813 and are disabled by default: \\
814 \hspace*{4ex} {\tt set ENABLE=( autodiff cost ctrl ecco )} \\
815 Other packages which are not needed are switched off: \\
816 \hspace*{4ex} {\tt set DISABLE=( aim obcs zonal\_filt shap\_filt cal exf )}
817
818 \subsubsection{File {\it COST\_CPPOPTIONS.h, CTRL\_OPTIONS.h}}
819
820 These files used to contain package-specific CPP-options
821 (see Section \ref{???}).
822 For technical reasons those options have been grouped together
823 in the file {\it ECCO\_OPTIONS.h}.
824 To retain the modularity, the files have been kept and contain
825 the standard include of the {\it CPP\_OPTIONS.h} file.
826
827 \subsubsection{File {\it CPP\_EEOPTIONS.h}}
828
829 This file contains 'wrapper'-specific CPP options.
830 It only needs to be changed if the code is to be run
831 in parallel environment (see Section \ref{???}).
832
833 \subsubsection{File {\it CPP\_OPTIONS.h}}
834
835 This file contains model-specific CPP options
836 (see Section \ref{???}).
837 Most options are related to the forward model setup.
838 They are identical to the global steady circulation setup of
839 {\it verification/exp2/}.
840 The option specific to this experiment is \\
841 \hspace*{4ex} {\tt \#define ALLOW\_MIT\_ADJOINT\_RUN} \\
842 This flag enables the inclusion of some AD-related fields
843 concerning initialisation, link between control variables
844 and forward model variables, and the call to the top-level
845 forward/adjoint subroutine {\it adthe\_main\_loop}
846 instead of {\it the\_main\_loop}.
847
848 \subsubsection{File {\it ECCO\_OPTIONS.h}}
849
850 The CPP options of several AD-related packages are grouped
851 in this file:
852 %
853 \begin{itemize}
854 %
855 \item
856 Adjoint support package: {\it pkg/autodiff/} \\
857 This package contains hand-written adjoint code such as
858 active file handling, flow directives for files which must not
859 be differentiated, and TAMC-specific header files. \\
860 \hspace*{4ex} {\tt \#define ALLOW\_AUTODIFF\_TAMC} \\
861 defines TAMC-related features in the code. \\
862 \hspace*{4ex} {\tt \#define ALLOW\_TAMC\_CHECKPOINTING} \\
863 enables the checkpointing feature of TAMC
864 (see Section \ref{???}).
865 In the present example a 3-level checkpointing is implemented.
866 The code contains the relevant store directives, common block
867 and tape initialisations, storing key computation,
868 and loop index handling.
869 The checkpointing length at each level is defined in
870 file {\it tamc.h}, cf. below.
871 %
872 \item Cost function package: {\it pkg/cost/} \\
873 This package contains all relevant routines for
874 initialising, accumulating and finalizing the cost function
875 (see Section \ref{???}). \\
876 \hspace*{4ex} {\tt \#define ALLOW\_COST} \\
877 enables all general aspects of the cost function handling,
878 in particular the hooks in the foorward code for
879 initialising, accumulating and finalizing the cost function. \\
880 \hspace*{4ex} {\tt \#define ALLOW\_COST\_TRACER} \\
881 includes the subroutine with the cost function for this
882 particular experiment, eqn. (\ref{cost_tracer}).
883 %
884 \item Control variable package: {\it pkg/ctrl/} \\
885 This package contains all relevant routines for
886 the handling of the control vector.
887 Each control variable can be enabled/disabled with its own flag: \\
888 \begin{tabular}{ll}
889 \hspace*{2ex} {\tt \#define ALLOW\_THETA0\_CONTROL} &
890 initial temperature \\
891 \hspace*{2ex} {\tt \#define ALLOW\_SALT0\_CONTROL} &
892 initial salinity \\
893 \hspace*{2ex} {\tt \#define ALLOW\_TR0\_CONTROL} &
894 initial passive tracer concentration \\
895 \hspace*{2ex} {\tt \#define ALLOW\_TAUU0\_CONTROL} &
896 zonal wind stress \\
897 \hspace*{2ex} {\tt \#define ALLOW\_TAUV0\_CONTROL} &
898 meridional wind stress \\
899 \hspace*{2ex} {\tt \#define ALLOW\_SFLUX0\_CONTROL} &
900 freshwater flux \\
901 \hspace*{2ex} {\tt \#define ALLOW\_HFLUX0\_CONTROL} &
902 heat flux \\
903 \hspace*{2ex} {\tt \#undef ALLOW\_DIFFKR\_CONTROL} &
904 diapycnal diffusivity \\
905 \hspace*{2ex} {\tt \#undef ALLOW\_KAPPAGM\_CONTROL} &
906 isopycnal diffusivity \\
907 \end{tabular}
908 %
909 \end{itemize}
910
911 \subsubsection{File {\it SIZE.h}}
912
913 The file contains the grid point dimensions of the forward
914 model. It is identical to the {\it verification/exp2/}: \\
915 \hspace*{4ex} {\tt sNx = 90} \\
916 \hspace*{4ex} {\tt sNy = 40} \\
917 \hspace*{4ex} {\tt Nr = 20} \\
918 It correpsponds to a single-tile/single-processor setup:
919 {\tt nSx = nSy = 1, nPx = nPy = 1},
920 with standard overlap dimensioning
921 {\tt OLx = OLy = 3}.
922
923 \subsubsection{File {\it adcommon.h}}
924
925 This file contains common blocks of some adjoint variables
926 that are generated by TAMC.
927 The common blocks are used by the adjoint support routine
928 {\it addummy\_in\_stepping} which needs to access those variables:
929
930 \begin{tabular}{ll}
931 \hspace*{4ex} {\tt common /addynvars\_r/} &
932 \hspace*{4ex} is related to {\it DYNVARS.h} \\
933 \hspace*{4ex} {\tt common /addynvars\_cd/} &
934 \hspace*{4ex} is related to {\it DYNVARS.h} \\
935 \hspace*{4ex} {\tt common /adtr1\_r/} &
936 \hspace*{4ex} is related to {\it TR1.h} \\
937 \hspace*{4ex} {\tt common /adffields/} &
938 \hspace*{4ex} is related to {\it FFIELDS.h}\\
939 \end{tabular}
940
941 Note that if the structure of the common block changes in the
942 above header files of the forward code, the structure
943 of the adjoint common blocks will change accordingly.
944 Thus, it has to be made sure that the structure of the
945 adjoint common block in the hand-written file {\it adcommon.h}
946 complies with the automatically generated adjoint common blocks
947 in {\it adjoint\_model.F}.
948
949 \subsubsection{File {\it tamc.h}}
950
951 This routine contains the dimensions for TAMC checkpointing.
952 %
953 \begin{itemize}
954 %
955 \item {\tt \#ifdef ALLOW\_TAMC\_CHECKPOINTING} \\
956 3-level checkpointing is enabled, i.e. the timestepping
957 is divided into three different levels (see Section \ref{???}).
958 The model state of the outermost ({\tt nchklev\_3}) and the
959 itermediate ({\tt nchklev\_2}) timestepping loop are stored to file
960 (handled in {\it the\_main\_loop}).
961 The innermost loop ({\tt nchklev\_1})
962 avoids I/O by storing all required variables
963 to common blocks. This storing may also be necessary if
964 no checkpointing is chosen
965 (nonlinear functions, if-statements, iterative loops, ...).
966 In the present example the dimensions are chosen as follows: \\
967 \hspace*{4ex} {\tt nchklev\_1 = 36 } \\
968 \hspace*{4ex} {\tt nchklev\_2 = 30 } \\
969 \hspace*{4ex} {\tt nchklev\_3 = 60 } \\
970 To guarantee that the checkpointing intervals span the entire
971 integration period the relation \\
972 \hspace*{4ex} {\tt nchklev\_1*nchklev\_2*nchklev\_3 $ \ge $ nTimeSteps} \\
973 where {\tt nTimeSteps} is either specified in {\it data}
974 or computed via \\
975 \hspace*{4ex} {\tt nTimeSteps = (endTime-startTime)/deltaTClock }.
976 %
977 \item {\tt \#undef ALLOW\_TAMC\_CHECKPOINTING} \\
978 No checkpointing is enabled.
979 In this case the relevant counter is {\tt nchklev\_0}.
980 Similar to above, the following relation has to be satisfied \\
981 \hspace*{4ex} {\tt nchklev\_0 $ \ge $ nTimeSteps}.
982 %
983 \end{itemize}
984
985 \subsubsection{File {\it makefile}}
986
987 This file contains all relevant paramter flags and
988 lists to run TAMC.
989 It is assumed that TAMC is available to you, either locally,
990 being installed on your network, or remotely through the 'TAMC Utility'.
991 TAMC is called with the command {\tt tamc} followed by a
992 number of options. They are described in detail in the
993 TAMC manual \cite{gie:99}.
994 Here we briefly discuss the main flags used in the {\it makefile}
995 %
996 \begin{itemize}
997 \item [{\tt tamc}] {\tt
998 -input <variable names>
999 -output <variable name> ... \\
1000 -toplevel <S/R name> -reverse <file names>
1001 }
1002 \end{itemize}
1003 %
1004 \begin{itemize}
1005 %
1006 \item {\tt -toplevel <S/R name>} \\
1007 Name of the toplevel routine, with respect to which the
1008 control flow analysis is performed.
1009 %
1010 \item {\tt -input <variable names>} \\
1011 List of independent variables $ u $ with respect to which the
1012 dependent variable $ J $ is differentiated.
1013 %
1014 \item {\tt -output <variable name>} \\
1015 Dependent variable $ J $ which is to be differentiated.
1016 %
1017 \item {\tt -reverse <file names>} \\
1018 Adjoint code is generated to compute the sensitivity of an
1019 independent variable w.r.t. many dependent variables.
1020 The generated adjoint top-level routine computes the product
1021 of the transposed Jacobian matrix $ M^T $ times
1022 the gradient vector $ \nabla_v J $.
1023 \\
1024 {\tt <file names>} refers to the list of files {\it .f} which are to be
1025 analyzed by TAMC. This list is generally smaller than the full list
1026 of code to be compiled. The files not contained are either
1027 above the top-level routine (some initialisations), or are
1028 deliberately hidden from TAMC, either because hand-written
1029 adjoint routines exist, or the routines must not (or don't have to)
1030 be differentiated. For each routine which is part of the flow tree
1031 of the top-level routine, but deliberately hidden from TAMC,
1032 a corresponding file {\it .flow} exists containing flow directives
1033 for TAMC.
1034 %
1035 \end{itemize}
1036
1037
1038 \subsubsection{File {\it data}}
1039
1040 \subsubsection{File {\it data.cost}}
1041
1042 \subsubsection{File {\it data.ctrl}}
1043
1044 \subsubsection{File {\it data.pkg}}
1045
1046 \subsubsection{File {\it eedata}}
1047
1048 \subsubsection{File {\it topog.bin}}
1049
1050 \subsubsection{File {\it windx.bin, windy.bin}}
1051
1052 \subsubsection{File {\it salt.bin, theta.bin}}
1053
1054 \subsubsection{File {\it SSS.bin, SST.bin}}
1055
1056 \subsubsection{File {\it pickup*}}
1057
1058 \subsection{Compiling the model and its adjoint}
1059
1060 \newpage
1061
1062 %**********************************************************************
1063 \section{TLM and ADM code generation in general}
1064 \label{sec_ad_setup_gen}
1065 %**********************************************************************
1066
1067 In this section we describe in a general fashion
1068 the parts of the code that are relevant for automatic
1069 differentiation using the software tool TAMC.
1070
1071 \subsection{The cost function (dependent variable)}
1072
1073 The cost function $ {\cal J} $ is referred to as the {\sf dependent variable}.
1074 It is a function of the input variables $ \vec{u} $ via the composition
1075 $ {\cal J}(\vec{u}) \, = \, {\cal J}(M(\vec{u})) $.
1076 The input is referred to as the
1077 {\sf independent variables} or {\sf control variables}.
1078 All aspects relevant to the treatment of the cost function $ {\cal J} $
1079 (parameter setting, initialisation, incrementation,
1080 final evaluation), are controled by the package {\it pkg/cost}.
1081
1082 \subsubsection{genmake and CPP options}
1083 %
1084 \begin{itemize}
1085 %
1086 \item
1087 \fbox{
1088 \begin{minipage}{12cm}
1089 {\it genmake}, {\it CPP\_OPTIONS.h}, {\it ECCO\_CPPOPTIONS.h}
1090 \end{minipage}
1091 }
1092 \end{itemize}
1093 %
1094 The directory {\it pkg/cost} can be included to the
1095 compile list in 3 different ways (cf. Section \ref{???}):
1096 %
1097 \begin{enumerate}
1098 %
1099 \item {\it genmake}: \\
1100 Change the default settngs in the file {\it genmake} by adding
1101 {\bf cost} to the {\bf enable} list (not recommended).
1102 %
1103 \item {\it .genmakerc}: \\
1104 Customize the settings of {\bf enable}, {\bf disable} which are
1105 appropriate for your experiment in the file {\it .genmakerc}
1106 and add the file to your compile directory.
1107 %
1108 \item genmake-options: \\
1109 Call {\it genmake} with the option
1110 {\tt genmake -enable=cost}.
1111 %
1112 \end{enumerate}
1113 Since the cost function is usually used in conjunction with
1114 automatic differentiation, the CPP option
1115 {\bf ALLOW\_ADJOINT\_RUN} should be defined
1116 (file {\it CPP\_OPTIONS.h}).
1117 The basic CPP option to enable the cost function is {\bf ALLOW\_COST}.
1118 Each specific cost function contribution has its own option.
1119 For the present example the option is {\bf ALLOW\_COST\_TRACER}.
1120 All cost-specific options are set in {\it ECCO\_CPPOPTIONS.h}
1121
1122 \subsubsection{Initialisation}
1123 %
1124 The initialisation of the {\it cost} package is readily enabled
1125 as soon as the CPP option {\bf ALLOW\_ADJOINT\_RUN} is defined.
1126 %
1127 \begin{itemize}
1128 %
1129 \item
1130 \fbox{
1131 \begin{minipage}{12cm}
1132 Parameters: {\it cost\_readparms}
1133 \end{minipage}
1134 }
1135 \\
1136 This S/R
1137 reads runtime flags and parameters from file {\it data.cost}.
1138 For the present example the only relevant parameter read
1139 is {\bf mult\_tracer}. This multiplier enables different
1140 cost function contributions to be switched on
1141 ( = 1.) or off ( = 0.) at runtime.
1142 For more complex cost functions which involve model vs. data
1143 misfits, the corresponding data filenames and data
1144 specifications (start date and time, period, ...) are read
1145 in this S/R.
1146 %
1147 \item
1148 \fbox{
1149 \begin{minipage}{12cm}
1150 Variables: {\it cost\_init}
1151 \end{minipage}
1152 }
1153 \\
1154 This S/R
1155 initialises the different cost function contributions.
1156 The contribtion for the present example is {\bf objf\_tracer}
1157 which is defined on each tile (bi,bj).
1158 %
1159 \end{itemize}
1160 %
1161 \subsubsection{Incrementation}
1162 %
1163 \begin{itemize}
1164 %
1165 \item
1166 \fbox{
1167 \begin{minipage}{12cm}
1168 {\it cost\_tile}, {\it cost\_tracer}
1169 \end{minipage}
1170 }
1171 \end{itemize}
1172 %
1173 The 'driver' routine
1174 {\it cost\_tile} is called at the end of each time step.
1175 Within this 'driver' routine, S/R are called for each of
1176 the chosen cost function contributions.
1177 In the present example ({\bf ALLOW\_COST\_TRACER}),
1178 S/R {\it cost\_tracer} is called.
1179 It accumulates {\bf objf\_tracer} according to eqn. (\ref{???}).
1180 %
1181 \subsubsection{Finalize all contributions}
1182 %
1183 \begin{itemize}
1184 %
1185 \item
1186 \fbox{
1187 \begin{minipage}{12cm}
1188 {\it cost\_final}
1189 \end{minipage}
1190 }
1191 \end{itemize}
1192 %
1193 At the end of the forward integration S/R {\it cost\_final}
1194 is called. It accumulates the total cost function {\bf fc}
1195 from each contribution and sums over all tiles:
1196 \begin{equation}
1197 {\cal J} \, = \,
1198 {\rm fc} \, = \,
1199 {\rm mult\_tracer} \sum_{bi,\,bj}^{nSx,\,nSy}
1200 {\rm objf\_tracer}(bi,bj) \, + \, ...
1201 \end{equation}
1202 %
1203 The total cost function {\bf fc} will be the
1204 'dependent' variable in the argument list for TAMC, i.e.
1205 \begin{verbatim}
1206 tamc -output 'fc' ...
1207 \end{verbatim}
1208
1209 \begin{figure}[t!]
1210 \input{part5/doc_ad_the_model}
1211 \label{fig:adthemodel}
1212 \caption{~}
1213 \end{figure}
1214
1215 \begin{figure}
1216 \input{part5/doc_ad_the_main}
1217 \label{fig:adthemain}
1218 \caption{~}
1219 \end{figure}
1220
1221 \subsection{The control variables (independent variables)}
1222
1223 The control variables are a subset of the model input
1224 (initial conditions, boundary conditions, model parameters).
1225 Here we identify them with the variable $ \vec{u} $.
1226 All intermediate variables whose derivative w.r.t. control
1227 variables don't vanish are called {\sf active variables}.
1228 All subroutines whose derivative w.r.t. the control variables
1229 don't vanish are called {\sf active routines}.
1230 Read and write operations from and to file can be viewed
1231 as variable assignments. Therefore, files to which
1232 active variables are written and from which active variables
1233 are read are called {\sf active files}.
1234 All aspects relevant to the treatment of the control variables
1235 (parameter setting, initialisation, perturbation)
1236 are controled by the package {\it pkg/ctrl}.
1237
1238 \subsubsection{genmake and CPP options}
1239 %
1240 \begin{itemize}
1241 %
1242 \item
1243 \fbox{
1244 \begin{minipage}{12cm}
1245 {\it genmake}, {\it CPP\_OPTIONS.h}, {\it ECCO\_CPPOPTIONS.h}
1246 \end{minipage}
1247 }
1248 \end{itemize}
1249 %
1250 To enable the directory to be included to the compile list,
1251 {\bf ctrl} has to be added to the {\bf enable} list in
1252 {\it .genmakerc} (or {\it genmake} itself).
1253 Each control variable is enabled via its own CPP option
1254 in {\it ECCO\_CPPOPTIONS.h}.
1255
1256 \subsubsection{Initialisation}
1257 %
1258 \begin{itemize}
1259 %
1260 \item
1261 \fbox{
1262 \begin{minipage}{12cm}
1263 Parameters: {\it ctrl\_readparms}
1264 \end{minipage}
1265 }
1266 \\
1267 %
1268 This S/R
1269 reads runtime flags and parameters from file {\it data.ctrl}.
1270 For the present example the file contains the file names
1271 of each control variable that is used.
1272 In addition, the number of wet points for each control
1273 variable and the net dimension of the space of control
1274 variables (counting wet points only) {\bf nvarlength}
1275 is determined.
1276 Masks for wet points for each tile {\bf (bi,\,bj)}
1277 and vertical layer {\bf k} are generated for the three
1278 relevant categories on the C-grid:
1279 {\bf nWetCtile} for tracer fields,
1280 {\bf nWetWtile} for zonal velocity fields,
1281 {\bf nWetStile} for meridional velocity fields.
1282 %
1283 \item
1284 \fbox{
1285 \begin{minipage}{12cm}
1286 Control variables, control vector,
1287 and their gradients: {\it ctrl\_unpack}
1288 \end{minipage}
1289 }
1290 \\
1291 %
1292 Two important issues related to the handling of the control
1293 variables in the MITGCM need to be addressed.
1294 First, in order to save memory, the control variable arrays
1295 are not kept in memory, but rather read from file and added
1296 to the initial (or first guess) fields.
1297 Similarly, the corresponding adjoint fields which represent
1298 the gradient of the cost function w.r.t. the control variables
1299 are written to to file.
1300 Second, in addition to the files holding the 2-dim. and 3-dim.
1301 control variables and the gradient, a 1-dim. {\sf control vector}
1302 and {\sf gradient vector} are written to file. They contain
1303 only the wet points of the control variables and the corresponding
1304 gradient.
1305 This leads to a significant data compression.
1306 Furthermore, the control and the gradient vector can be passed to a
1307 minimization routine if an update of the control variables
1308 is sought as part of a minimization exercise.
1309
1310 The files holding fields and vectors of the control variables
1311 and gradient are generated and initialised in S/R {\it ctrl\_unpack}.
1312 %
1313 \end{itemize}
1314
1315 \subsubsection{Perturbation of the independent variables}
1316 %
1317 The dependency chain for differentiation starts
1318 with adding a perturbation onto the the input variable,
1319 thus defining the independent or control variables for TAMC.
1320 Three classes of controls may be considered:
1321 %
1322 \begin{itemize}
1323 %
1324 \item
1325 \fbox{
1326 \begin{minipage}{12cm}
1327 {\it ctrl\_map\_ini} (initial value sensitivity):
1328 \end{minipage}
1329 }
1330 \\
1331 %
1332 Consider as an example the initial tracer distribution
1333 {\bf tr1} as control variable.
1334 After {\bf tr1} has been initialised in
1335 {\it ini\_tr1} (dynamical variables including
1336 temperature and salinity are initialised in {\it ini\_fields}),
1337 a perturbation anomaly is added to the field in S/R
1338 {\it ctrl\_map\_ini}
1339 %
1340 \begin{equation}
1341 \begin{split}
1342 u & = \, u_{[0]} \, + \, \Delta u \\
1343 {\bf tr1}(...) & = \, {\bf tr1_{ini}}(...) \, + \, {\bf xx\_tr1}(...)
1344 \label{perturb}
1345 \end{split}
1346 \end{equation}
1347 %
1348 In principle {\bf xx\_tr1} is a 3-dim. global array
1349 holding the perturbation. In the case of a simple
1350 sensitivity study this array is identical to zero.
1351 However, it's specification is essential since TAMC
1352 treats the corresponding line in the code symbolically
1353 when determining the differentiation chain and its origin.
1354 Thus, the variable names are part of the argument list
1355 when calling TAMC:
1356 %
1357 \begin{verbatim}
1358 tamc -input 'xx_tr1 ...' ...
1359 \end{verbatim}
1360 %
1361 Now, as mentioned above, the MITGCM avoids maintaining
1362 an array for each control variable by reading the
1363 perturbation to a temporary array from file.
1364 To ensure the symbolic link to be recognized by TAMC, a scalar
1365 dummy variable {\bf xx\_tr1\_dummy} is introduced
1366 and an 'active read' routine of the adjoint support
1367 package {\it pkg/autodiff} is invoked.
1368 The read-procedure is tagged with the variable
1369 {\bf xx\_tr1\_dummy} enabbling TAMC to recognize the
1370 initialisation of the perturbation.
1371 The modified call of TAMC thus reads
1372 %
1373 \begin{verbatim}
1374 tamc -input 'xx_tr1_dummy ...' ...
1375 \end{verbatim}
1376 %
1377 and the modified operation to (\ref{perturb})
1378 in the code takes on the form
1379 %
1380 \begin{verbatim}
1381 call active_read_xyz(
1382 & ..., tmpfld3d, ..., xx_tr1_dummy, ... )
1383
1384 tr1(...) = tr1(...) + tmpfld3d(...)
1385 \end{verbatim}
1386 %
1387 Note, that reading an active variable corresponds
1388 to a variable assignment. Its derivative corresponds
1389 to a write statement of the adjoint variable.
1390 The 'active file' routines have been designed
1391 to support active read and corresponding active write
1392 operations.
1393 %
1394 \item
1395 \fbox{
1396 \begin{minipage}{12cm}
1397 {\it ctrl\_map\_forcing} (boundary value sensitivity):
1398 \end{minipage}
1399 }
1400 \\
1401 %
1402 The handling of boundary values as control variables
1403 proceeds exactly analogous to the initial values
1404 with the symbolic perturbation taking place in S/R
1405 {\it ctrl\_map\_forcing}.
1406 Note however an important difference:
1407 Since the boundary values are time dependent with a new
1408 forcing field applied at each time steps,
1409 the general problem may be be thought of as
1410 a new control variable at each time step, i.e.
1411 \[
1412 u_{\rm forcing} \, = \,
1413 \{ \, u_{\rm forcing} ( t_n ) \, \}_{
1414 n \, = \, 1, \ldots , {\rm nTimeSteps} }
1415 \]
1416 %
1417 In the current example an equilibrium state is considered,
1418 and only an initial perturbation to
1419 surface forcing is applied with respect to the
1420 equilibrium state.
1421 A time dependent treatment of the surface forcing is
1422 implemented in the ECCO environment, involving the
1423 calendar ({\it cal}~) and external forcing ({\it exf}~) packages.
1424 %
1425 \item
1426 \fbox{
1427 \begin{minipage}{12cm}
1428 {\it ctrl\_map\_params} (parameter sensitivity):
1429 \end{minipage}
1430 }
1431 \\
1432 %
1433 This routine is not yet implemented, but would proceed
1434 proceed along the same lines as the initial value sensitivity.
1435 %
1436 \end{itemize}
1437 %
1438
1439 \subsubsection{Output of adjoint variables and gradient}
1440 %
1441 Two ways exist to generate output of adjoint fields.
1442 %
1443 \begin{itemize}
1444 %
1445 \item
1446 \fbox{
1447 \begin{minipage}{12cm}
1448 {\it ctrl\_pack}:
1449 \end{minipage}
1450 }
1451 \\
1452 At the end of the forward/adjoint integration, the S/R
1453 {\it ctrl\_pack} is called which mirrors S/R {\it ctrl\_unpack}.
1454 It writes the following files:
1455 %
1456 \begin{itemize}
1457 %
1458 \item {\bf xx\_...}: the control variable fields
1459 %
1460 \item {\bf adxx\_...}: the adjoint variable fields, i.e. the gradient
1461 $ \nabla _{u}{\cal J} $ for each control variable,
1462 %
1463 \item {\bf vector\_ctrl}: the control vector
1464 %
1465 \item {\bf vector\_grad}: the gradient vector
1466 %
1467 \end{itemize}
1468 %
1469 \item
1470 \fbox{
1471 \begin{minipage}{12cm}
1472 {\it addummy\_in\_stepping}:
1473 \end{minipage}
1474 }
1475 \\
1476 In addition to writing the gradient at the end of the
1477 forward/adjoint integration, many more adjoint variables,
1478 representing the Lagrange multipliers of the model state
1479 w.r.t. the model state
1480 at different times can be written using S/R
1481 {\it addummy\_in\_stepping}.
1482 This routine is part of the adjoint support package
1483 {\it pkg/autodiff} (cf.f. below).
1484 To be part of the adjoint code, the corresponding S/R
1485 {\it dummy\_in\_stepping} has to be called in the forward
1486 model (S/R {\it the\_main\_loop}) at the appropriate place.
1487
1488 {\it dummy\_in\_stepping} is essentially empty,
1489 the corresponding adjoint routine is hand-written rather
1490 than generated automatically.
1491 Appropriate flow directives ({\it dummy\_in\_stepping.flow})
1492 ensure that TAMC does not automatically
1493 generate {\it addummy\_in\_stepping} by trying to differentiate
1494 {\it dummy\_in\_stepping}, but rather takes the hand-written routine.
1495
1496 {\it dummy\_in\_stepping} is called in the forward code
1497 at the beginning of each
1498 timestep, before the call to {\it dynamics}, thus ensuring
1499 that {\it addummy\_in\_stepping} is called at the end of
1500 each timestep in the adjoint calculation, after the call to
1501 {\it addynamics}.
1502
1503 {\it addummy\_in\_stepping} includes the header files
1504 {\it adffields.h, addynamics.h, adtr1.h}.
1505 These header files are also hand-written. They contain
1506 the common blocks {\bf /addynvars\_r/}, {\bf /addynvars\_cd/},
1507 {\bf /adtr1\_r/}, {\bf /adffields/},
1508 which have been extracted from the adjoint code to enable
1509 access to the adjoint variables.
1510 %
1511 \end{itemize}
1512
1513
1514 \subsubsection{Control variable handling for
1515 optimization applications}
1516
1517 In optimization mode the cost function $ {\cal J}(u) $ is sought
1518 to be minimized with respect to a set of control variables
1519 $ \delta {\cal J} \, = \, 0 $, in an iterative manner.
1520 The gradient $ \nabla _{u}{\cal J} |_{u_{[k]}} $ together
1521 with the value of the cost function itself $ {\cal J}(u_{[k]}) $
1522 at iteration step $ k $ serve
1523 as input to a minimization routine (e.g. quasi-Newton method,
1524 conjugate gradient, ...) to compute an update in the
1525 control variable for iteration step $k+1$
1526 \[
1527 u_{[k+1]} \, = \, u_{[0]} \, + \, \Delta u_{[k+1]}
1528 \quad \mbox{satisfying} \quad
1529 {\cal J} \left( u_{[k+1]} \right) \, < \, {\cal J} \left( u_{[k]} \right)
1530 \]
1531 $ u_{[k+1]} $ then serves as input for a forward/adjoint run
1532 to determine $ {\cal J} $ and $ \nabla _{u}{\cal J} $ at iteration step
1533 $ k+1 $.
1534 Tab. \ref{???} sketches the flow between forward/adjoint model
1535 and the minimization routine.
1536
1537 \begin{eqnarray*}
1538 \footnotesize
1539 \begin{array}{ccccc}
1540 u_{[0]} \,\, , \,\, \Delta u_{[k]} & ~ & ~ & ~ & ~ \\
1541 {\Big\downarrow}
1542 & ~ & ~ & ~ & ~ \\
1543 ~ & ~ & ~ & ~ & ~ \\
1544 \hline
1545 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1546 \multicolumn{1}{|c}{
1547 u_{[k]} = u_{[0]} + \Delta u_{[k]}} &
1548 \stackrel{\bf forward}{\bf \longrightarrow} &
1549 v_{[k]} = M \left( u_{[k]} \right) &
1550 \stackrel{\bf forward}{\bf \longrightarrow} &
1551 \multicolumn{1}{c|}{
1552 {\cal J}_{[k]} = {\cal J} \left( M \left( u_{[k]} \right) \right)} \\
1553 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1554 \hline
1555 \hline
1556 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1557 \multicolumn{1}{|c}{
1558 \nabla_u {\cal J}_{[k]} (\delta {\cal J}) =
1559 T\!\!^{\ast} \cdot \nabla_v {\cal J} |_{v_{[k]}} (\delta {\cal J})} &
1560 \stackrel{\bf adjoint}{\mathbf \longleftarrow} &
1561 ad \, v_{[k]} (\delta {\cal J}) =
1562 \nabla_v {\cal J} |_{v_{[k]}} (\delta {\cal J}) &
1563 \stackrel{\bf adjoint}{\mathbf \longleftarrow} &
1564 \multicolumn{1}{c|}{ ad \, {\cal J} = \delta {\cal J}} \\
1565 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1566 \hline
1567 ~ & ~ & ~ & ~ & ~ \\
1568 ~ & ~ &
1569 {\cal J}_{[k]} \qquad {\Bigg\downarrow} \qquad \nabla_u {\cal J}_{[k]}
1570 & ~ & ~ \\
1571 ~ & ~ & ~ & ~ & ~ \\
1572 \hline
1573 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1574 \multicolumn{1}{|c}{
1575 {\cal J}_{[k]} \,\, , \,\, \nabla_u {\cal J}_{[k]}} &
1576 {\mathbf \longrightarrow} & \text{\bf minimisation} &
1577 {\mathbf \longrightarrow} &
1578 \multicolumn{1}{c|}{ \Delta u_{[k+1]}} \\
1579 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1580 \hline
1581 ~ & ~ & ~ & ~ & ~ \\
1582 ~ & ~ & ~ & ~ & \Big\downarrow \\
1583 ~ & ~ & ~ & ~ & \Delta u_{[k+1]} \\
1584 \end{array}
1585 \end{eqnarray*}
1586
1587 The routines {\it ctrl\_unpack} and {\it ctrl\_pack} provide
1588 the link between the model and the minimization routine.
1589 As described in Section \ref{???}
1590 the {\it unpack} and {\it pack} routines read and write
1591 control and gradient {\it vectors} which are compressed
1592 to contain only wet points, in addition to the full
1593 2-dim. and 3-dim. fields.
1594 The corresponding I/O flow looks as follows:
1595
1596 \vspace*{0.5cm}
1597
1598 \begin{tabular}{ccccc}
1599 {\bf vector\_ctrl\_$<$k$>$ } & ~ & ~ & ~ & ~ \\
1600 {\big\downarrow} & ~ & ~ & ~ & ~ \\
1601 \cline{1-1}
1602 \multicolumn{1}{|c|}{\it ctrl\_unpack} & ~ & ~ & ~ & ~ \\
1603 \cline{1-1}
1604 {\big\downarrow} & ~ & ~ & ~ & ~ \\
1605 \cline{3-3}
1606 \multicolumn{1}{l}{\bf xx\_theta0...$<$k$>$} & ~ &
1607 \multicolumn{1}{|c|}{~} & ~ & ~ \\
1608 \multicolumn{1}{l}{\bf xx\_salt0...$<$k$>$} & $\longrightarrow$ &
1609 \multicolumn{1}{|c|}{forward integration} & ~ & ~ \\
1610 \multicolumn{1}{l}{\bf \vdots} & ~ & \multicolumn{1}{|c|}{~}
1611 & ~ & ~ \\
1612 \cline{3-3}
1613 ~ & ~ & ~ & ~ & ~ \\
1614 \cline{3-3}
1615 ~ & ~ &
1616 \multicolumn{1}{|c|}{~} & ~ &
1617 \multicolumn{1}{l}{\bf adxx\_theta0...$<$k$>$} \\
1618 ~ & ~ & \multicolumn{1}{|c|}{adjoint integration} &
1619 $\longrightarrow$ &
1620 \multicolumn{1}{l}{\bf adxx\_salt0...$<$k$>$} \\
1621 ~ & ~ & \multicolumn{1}{|c|}{~}
1622 & ~ & \multicolumn{1}{l}{\bf \vdots} \\
1623 \cline{3-3}
1624 ~ & ~ & ~ & ~ & {\big\downarrow} \\
1625 \cline{5-5}
1626 ~ & ~ & ~ & ~ & \multicolumn{1}{|c|}{\it ctrl\_pack} \\
1627 \cline{5-5}
1628 ~ & ~ & ~ & ~ & {\big\downarrow} \\
1629 ~ & ~ & ~ & ~ & {\bf vector\_grad\_$<$k$>$ } \\
1630 \end{tabular}
1631
1632 \vspace*{0.5cm}
1633
1634
1635 {\it ctrl\_unpack} reads in the updated control vector
1636 {\bf vector\_ctrl\_$<$k$>$}.
1637 It distributes the different control variables to
1638 2-dim. and 3-dim. files {\it xx\_...$<$k$>$}.
1639 During the forward integration the control variables
1640 are read from {\it xx\_...$<$k$>$}.
1641 Correspondingly, the adjoint fields are written
1642 to {\it adxx\_...$<$k$>$}, again via the active file routines.
1643 Finally, {\it ctrl\_pack} collects all adjoint field files
1644 and writes them to the compressed vector file
1645 {\bf vector\_grad\_$<$k$>$}.
1646
1647 \subsection{TLM and ADM generation via TAMC}
1648
1649
1650
1651 \subsection{Flow directives and adjoint support routines}
1652
1653 \subsection{Store directives and checkpointing}
1654
1655 \subsection{Gradient checks}
1656
1657 \subsection{Second derivative generation via TAMC}
1658
1659 \section{Example of adjoint code}

  ViewVC Help
Powered by ViewVC 1.1.22