% doc_ad_2.tex, revision 1.14 (branch MAIN), Thu Feb 28 19:32:20 2002 UTC, by cnh
% Log message: Updates for special on-line version with hyperlinked and
% animated figures. Separating tutorials and reference material.

% $Header: /u/u0/gcmpack/manual/part5/doc_ad_2.tex,v 1.13 2002/01/18 22:56:45 heimbach Exp $
% $Name: $

{\sf Automatic differentiation} (AD), also referred to as algorithmic
(or, more loosely, computational) differentiation, involves
automatically deriving code to calculate
partial derivatives from an existing fully non-linear prognostic code
(see \cite{gri:00}).
9     A software tool is used that parses and transforms source files
10     according to a set of linguistic and mathematical rules.
11     AD tools are like source-to-source translators in that
12     they parse a program code as input and produce a new program code
13     as output.
14     However, unlike a pure source-to-source translation, the output program
15     represents a new algorithm, such as the evaluation of the
16     Jacobian, the Hessian, or higher derivative operators.
17     In principle, a variety of derived algorithms
18     can be generated automatically in this way.
19    
The MITGCM has been adapted for use with the
Tangent linear and Adjoint Model Compiler (TAMC) and its successor TAF
(Transformation of Algorithms in Fortran), developed
by Ralf Giering (\cite{gie-kam:98}, \cite{gie:99,gie:00}).
The first application of the adjoint of the MITGCM for sensitivity
studies was published by \cite{maro-eta:99}.
\cite{sta-eta:97,sta-eta:01} use the MITGCM and its adjoint
for ocean state estimation studies.
In the following we shall refer to TAMC and TAF synonymously,
except where explicitly stated otherwise.
30 adcroft 1.1
TAMC exploits the chain rule for computing the first
derivative of a function with
respect to a set of input variables.
Treating a given forward code as a composition of operations,
with each line representing a compositional element, the chain rule is
rigorously applied to the code, line by line. The resulting
tangent linear or adjoint code
may then be thought of as the composition, in
forward or reverse order, respectively, of the
Jacobian matrices of the forward code's compositional elements.
41 adcroft 1.1
42     %**********************************************************************
43     \section{Some basic algebra}
44     \label{sec_ad_algebra}
45     %**********************************************************************
46    
Let $ {\cal M} $ be a general nonlinear model, i.e. a
mapping from the $m$-dimensional space
$U \subset I\!\!R^m$ of input variables
$\vec{u}=(u_1,\ldots,u_m)$
(model parameters, initial conditions, boundary conditions
such as forcing functions) to the $n$-dimensional space
$V \subset I\!\!R^n$ of
model output variables $\vec{v}=(v_1,\ldots,v_n)$
(model state, model diagnostics, objective function, ...)
under consideration,
57     %
58     \begin{equation}
59     \begin{split}
60     {\cal M} \, : & \, U \,\, \longrightarrow \, V \\
61     ~ & \, \vec{u} \,\, \longmapsto \, \vec{v} \, = \,
62     {\cal M}(\vec{u})
63     \label{fulloperator}
64     \end{split}
65     \end{equation}
66     %
The vectors $ \vec{u} \in U $ and $ \vec{v} \in V $ may be represented w.r.t.
some given basis vectors
69     $ {\rm span} (U) = \{ {\vec{e}_i} \}_{i = 1, \ldots , m} $ and
70     $ {\rm span} (V) = \{ {\vec{f}_j} \}_{j = 1, \ldots , n} $ as
71     \[
72     \vec{u} \, = \, \sum_{i=1}^{m} u_i \, {\vec{e}_i},
73     \qquad
74     \vec{v} \, = \, \sum_{j=1}^{n} v_j \, {\vec{f}_j}
75     \]
76    
77     Two routes may be followed to determine the sensitivity of the
78     output variable $\vec{v}$ to its input $\vec{u}$.
79    
80     \subsection{Forward or direct sensitivity}
81     %
Consider a perturbation to the input variables $\delta \vec{u}$
(typically a single component
$\delta \vec{u} = \delta u_{i} \, {\vec{e}_{i}}$).
Its effect on the output may be obtained via the linear
approximation of the model $ {\cal M}$ in terms of its Jacobian matrix
$ M $, evaluated at the point $\vec{u}^{(0)}$ according to
88     %
89     \begin{equation}
90     \delta \vec{v} \, = \, M |_{\vec{u}^{(0)}} \, \delta \vec{u}
91     \label{tangent_linear}
92     \end{equation}
93     with resulting output perturbation $\delta \vec{v}$.
94     In components
95     $M_{j i} \, = \, \partial {\cal M}_{j} / \partial u_{i} $,
96     it reads
97     %
98     \begin{equation}
99     \delta v_{j} \, = \, \sum_{i}
100     \left. \frac{\partial {\cal M}_{j}}{\partial u_{i}} \right|_{u^{(0)}} \,
101     \delta u_{i}
102     \label{jacobi_matrix}
103     \end{equation}
104     %
Eq. (\ref{tangent_linear}) is the {\sf tangent linear model (TLM)}.
In contrast to the full nonlinear model $ {\cal M} $, the operator
$ M $ is just a matrix
which can readily be used to find the forward sensitivity of $\vec{v}$ to
perturbations in $\vec{u}$.
However, if there are very many input variables ($\gg O(10^{6})$ for
large-scale oceanographic applications), it quickly becomes
prohibitive to proceed directly as in (\ref{tangent_linear})
if the impact of each component $ \vec{e}_{i} $ is to be assessed,
since one tangent linear calculation is needed per component.
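
As a small illustration (a hypothetical two-variable model chosen only
for this example, not taken from the MITGCM), let $m = n = 2$ and
\[
{\cal M}(\vec{u}) \, = \,
\left( \begin{array}{c} u_1 u_2 \\ u_1^2 + u_2 \end{array} \right) ,
\qquad
M \, = \,
\left( \begin{array}{cc} u_2 & u_1 \\ 2 u_1 & 1 \end{array} \right)
\]
At $\vec{u}^{(0)} = (2,3)$ the Jacobian is
$ M |_{\vec{u}^{(0)}} =
\left( \begin{array}{cc} 3 & 2 \\ 4 & 1 \end{array} \right) $,
so that a perturbation of the first input only,
$\delta \vec{u} = \delta u_1 \, \vec{e}_1$, gives
\[
\delta \vec{v} \, = \, M |_{\vec{u}^{(0)}} \, \delta \vec{u}
\, = \, \left( 3 \, \delta u_1 , \, 4 \, \delta u_1 \right)^T
\]
i.e. one tangent linear calculation yields one column of $M$;
assessing both inputs requires a second calculation with
$\delta \vec{u} = \delta u_2 \, \vec{e}_2$.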
114    
115     \subsection{Reverse or adjoint sensitivity}
116     %
117     Let us consider the special case of a
118     scalar objective function ${\cal J}(\vec{v})$ of the model output (e.g.
119     the total meridional heat transport,
120     the total uptake of $CO_{2}$ in the Southern
121     Ocean over a time interval,
122     or a measure of some model-to-data misfit)
123     %
124     \begin{eqnarray}
125     \begin{array}{cccccc}
126     {\cal J} \, : & U &
127     \longrightarrow & V &
128     \longrightarrow & I \!\! R \\
129     ~ & \vec{u} & \longmapsto & \vec{v}={\cal M}(\vec{u}) &
130     \longmapsto & {\cal J}(\vec{u}) = {\cal J}({\cal M}(\vec{u}))
131     \end{array}
132     \label{compo}
133     \end{eqnarray}
134     %
135 heimbach 1.4 The perturbation of $ {\cal J} $ around a fixed point $ {\cal J}_0 $,
136 adcroft 1.1 \[
137 heimbach 1.4 {\cal J} \, = \, {\cal J}_0 \, + \, \delta {\cal J}
138 adcroft 1.1 \]
139     can be expressed in both bases of $ \vec{u} $ and $ \vec{v} $
140     w.r.t. their corresponding inner product
141     $\left\langle \,\, , \,\, \right\rangle $
142     %
143     \begin{equation}
144     \begin{split}
145     {\cal J} & = \,
146     {\cal J} |_{\vec{u}^{(0)}} \, + \,
147     \left\langle \, \nabla _{u}{\cal J}^T |_{\vec{u}^{(0)}} \, , \, \delta \vec{u} \, \right\rangle
148     \, + \, O(\delta \vec{u}^2) \\
149     ~ & = \,
150     {\cal J} |_{\vec{v}^{(0)}} \, + \,
151     \left\langle \, \nabla _{v}{\cal J}^T |_{\vec{v}^{(0)}} \, , \, \delta \vec{v} \, \right\rangle
152     \, + \, O(\delta \vec{v}^2)
153     \end{split}
154     \label{deljidentity}
155     \end{equation}
156     %
(note that the gradient $ \nabla f $ is a co-vector, therefore
its transpose is required in the above inner product).
159     Then, using the representation of
160     $ \delta {\cal J} =
161     \left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle $,
162     the definition
163     of an adjoint operator $ A^{\ast} $ of a given operator $ A $,
164     \[
165     \left\langle \, A^{\ast} \vec{x} \, , \, \vec{y} \, \right\rangle =
166     \left\langle \, \vec{x} \, , \, A \vec{y} \, \right\rangle
167     \]
168     which for finite-dimensional vector spaces is just the
169     transpose of $ A $,
170     \[
171     A^{\ast} \, = \, A^T
172     \]
173 heimbach 1.4 and from eq. (\ref{tangent_linear}), (\ref{deljidentity}),
174     we note that
175 adcroft 1.1 (omitting $|$'s):
176     %
177     \begin{equation}
178     \delta {\cal J}
179     \, = \,
180     \left\langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \right\rangle
181     \, = \,
182     \left\langle \, \nabla _{v}{\cal J}^T \, , \, M \, \delta \vec{u} \, \right\rangle
183     \, = \,
184     \left\langle \, M^T \, \nabla _{v}{\cal J}^T \, , \,
185     \delta \vec{u} \, \right\rangle
186     \label{inner}
187     \end{equation}
188     %
189     With the identity (\ref{deljidentity}), we then find that
190     the gradient $ \nabla _{u}{\cal J} $ can be readily inferred by
191     invoking the adjoint $ M^{\ast } $ of the tangent linear model $ M $
192     %
193     \begin{equation}
194     \begin{split}
195     \nabla _{u}{\cal J}^T |_{\vec{u}} &
196     = \, M^T |_{\vec{u}} \cdot \nabla _{v}{\cal J}^T |_{\vec{v}} \\
197     ~ & = \, M^T |_{\vec{u}} \cdot \delta \vec{v}^{\ast} \\
198     ~ & = \, \delta \vec{u}^{\ast}
199     \end{split}
200     \label{adjoint}
201     \end{equation}
202     %
203     Eq. (\ref{adjoint}) is the {\sf adjoint model (ADM)},
204     in which $M^T$ is the adjoint (here, the transpose) of the
205     tangent linear operator $M$, $ \delta \vec{v}^{\ast} $
206     the adjoint variable of the model state $ \vec{v} $, and
207     $ \delta \vec{u}^{\ast} $ the adjoint variable of the control variable $ \vec{u} $.
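
As a sketch with a hypothetical two-variable model (chosen only for
illustration, not taken from the MITGCM), take
$ {\cal M}(\vec{u}) = (u_1 u_2 , \, u_1^2 + u_2)^T $ and the objective
function $ {\cal J}(\vec{v}) = v_1 + 2 v_2 $, so that
$ \delta \vec{v}^{\ast} = \nabla_v {\cal J}^T = (1, 2)^T $.
At $\vec{u}^{(0)} = (2,3)$ the adjoint model gives
\[
\delta \vec{u}^{\ast} \, = \, M^T |_{\vec{u}^{(0)}} \cdot \delta \vec{v}^{\ast}
\, = \,
\left( \begin{array}{cc} 3 & 4 \\ 2 & 1 \end{array} \right)
\left( \begin{array}{c} 1 \\ 2 \end{array} \right)
\, = \,
\left( \begin{array}{c} 11 \\ 4 \end{array} \right)
\]
which agrees with differentiating
$ {\cal J}(\vec{u}) = u_1 u_2 + 2 u_1^2 + 2 u_2 $ directly:
$ \partial {\cal J} / \partial u_1 = u_2 + 4 u_1 = 11 $ and
$ \partial {\cal J} / \partial u_2 = u_1 + 2 = 4 $.
A single adjoint calculation thus yields the sensitivity to both inputs.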
208    
209     The {\sf reverse} nature of the adjoint calculation can be readily
210 heimbach 1.4 seen as follows.
Consider a model integration which consists of $ \Lambda + 1 $
consecutive operations
$ {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} (
...... ( {\cal M}_{\lambda} (
......
( {\cal M}_{1} ( {\cal M}_{0}(\vec{u}) )) ...... )) ...... )) $,
where the ${\cal M}$'s could be the elementary steps, i.e. single lines
in the code of the model, or successive time steps of the
model integration,
starting at step 0 and moving up to step $\Lambda$, with
$\vec{v}^{(0)} = \vec{u}$, intermediate
${\cal M}_{\lambda} (\vec{v}^{(\lambda)}) = \vec{v}^{(\lambda+1)}$ and final
${\cal M}_{\Lambda} (\vec{v}^{(\Lambda)}) = \vec{v}^{(\Lambda+1)} = \vec{v}$.
Let ${\cal J}$ be a cost function which explicitly depends on the
final state $\vec{v}$ only
(this restriction is for clarity only).
%
${\cal J}(\vec{u})$ may be decomposed according to:
228 adcroft 1.1 %
229     \begin{equation}
230     {\cal J}({\cal M}(\vec{u})) \, = \,
231     {\cal J} ( {\cal M}_{\Lambda} ( {\cal M}_{\Lambda-1} (
232     ...... ( {\cal M}_{\lambda} (
233     ......
234     ( {\cal M}_{1} ( {\cal M}_{0}(\vec{u}) )))))
235     \label{compos}
236     \end{equation}
237     %
Then, according to the chain rule, the forward calculation reads,
in terms of the Jacobi matrices
(we have omitted the $ | $'s which are nevertheless important
to the aspect of {\it tangent} linearity;
note also that by definition
$ \langle \, \nabla _{v}{\cal J}^T \, , \, \delta \vec{v} \, \rangle
= \nabla_v {\cal J} \cdot \delta \vec{v} $ )
245     %
246     \begin{equation}
247     \begin{split}
248     \nabla_v {\cal J} (M(\delta \vec{u})) & = \,
249     \nabla_v {\cal J} \cdot M_{\Lambda}
250     \cdot ...... \cdot M_{\lambda} \cdot ...... \cdot
251     M_{1} \cdot M_{0} \cdot \delta \vec{u} \\
252     ~ & = \, \nabla_v {\cal J} \cdot \delta \vec{v} \\
253     \end{split}
254     \label{forward}
255     \end{equation}
256     %
257     whereas in reverse mode we have
258     %
259     \begin{equation}
260     \boxed{
261     \begin{split}
262     M^T ( \nabla_v {\cal J}^T) & = \,
263     M_{0}^T \cdot M_{1}^T
264     \cdot ...... \cdot M_{\lambda}^T \cdot ...... \cdot
265     M_{\Lambda}^T \cdot \nabla_v {\cal J}^T \\
266     ~ & = \, M_{0}^T \cdot M_{1}^T
267     \cdot ...... \cdot
268     \nabla_{v^{(\lambda)}} {\cal J}^T \\
269     ~ & = \, \nabla_u {\cal J}^T
270     \end{split}
271     }
272     \label{reverse}
273     \end{equation}
274     %
275     clearly expressing the reverse nature of the calculation.
276     Eq. (\ref{reverse}) is at the heart of automatic adjoint compilers.
If the intermediate steps $\lambda$ in
eqn. (\ref{compos}) -- (\ref{reverse})
represent the model state (forward or adjoint) at each
intermediate time step as noted above, then correspondingly,
$ M_{\lambda-1}^T (\delta \vec{v}^{(\lambda) \, \ast}) =
\delta \vec{v}^{(\lambda-1) \, \ast} $ for the adjoint variables.
It thus becomes evident that the adjoint calculation also
yields the adjoint of each model state component
$ \vec{v}^{(\lambda)} $ at each intermediate step $ \lambda $, namely
286 adcroft 1.1 %
287     \begin{equation}
288     \boxed{
289     \begin{split}
290     \nabla_{v^{(\lambda)}} {\cal J}^T |_{\vec{v}^{(\lambda)}}
291     & = \,
M_{\lambda}^T |_{\vec{v}^{(\lambda)}} \cdot ...... \cdot
M_{\Lambda}^T |_{\vec{v}^{(\Lambda)}} \cdot \delta \vec{v}^{\ast} \\
294     ~ & = \, \delta \vec{v}^{(\lambda) \, \ast}
295     \end{split}
296     }
297     \end{equation}
298     %
in close analogy to eq. (\ref{adjoint}).
We note in passing that the $\delta \vec{v}^{(\lambda) \, \ast}$
are the Lagrange multipliers of the model equations which determine
$ \vec{v}^{(\lambda)}$.
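
Read from right to left, eq. (\ref{reverse}) amounts to a recursion
that involves only matrix-vector products. As a sketch, assuming the
convention
$ \vec{v}^{(\lambda+1)} = {\cal M}_{\lambda}(\vec{v}^{(\lambda)}) $ with
$ \vec{v}^{(0)} = \vec{u} $ and $ \vec{v}^{(\Lambda+1)} = \vec{v} $,
it may be written as
\[
\delta \vec{v}^{(\Lambda+1) \, \ast} \, = \, \nabla_v {\cal J}^T ,
\qquad
\delta \vec{v}^{(\lambda) \, \ast} \, = \,
M_{\lambda}^T |_{\vec{v}^{(\lambda)}} \cdot
\delta \vec{v}^{(\lambda+1) \, \ast} ,
\quad \lambda = \Lambda, \Lambda-1, \ldots, 0 ,
\qquad
\delta \vec{u}^{\ast} \, = \, \delta \vec{v}^{(0) \, \ast}
\]
Each step needs the Jacobian evaluated at the forward state
$ \vec{v}^{(\lambda)} $, and these states are visited in descending
order of $ \lambda $.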
303 adcroft 1.1
304 cnh 1.7 In components, eq. (\ref{adjoint}) reads as follows.
305 adcroft 1.1 Let
306     \[
307     \begin{array}{rclcrcl}
308     \delta \vec{u} & = &
309     \left( \delta u_1,\ldots, \delta u_m \right)^T , & \qquad &
310     \delta \vec{u}^{\ast} \,\, = \,\, \nabla_u {\cal J}^T & = &
311     \left(
312     \frac{\partial {\cal J}}{\partial u_1},\ldots,
313     \frac{\partial {\cal J}}{\partial u_m}
314     \right)^T \\
315     \delta \vec{v} & = &
\left( \delta v_1,\ldots, \delta v_n \right)^T , & \qquad &
317     \delta \vec{v}^{\ast} \,\, = \,\, \nabla_v {\cal J}^T & = &
318     \left(
319     \frac{\partial {\cal J}}{\partial v_1},\ldots,
320     \frac{\partial {\cal J}}{\partial v_n}
321     \right)^T \\
322     \end{array}
323     \]
324     denote the perturbations in $\vec{u}$ and $\vec{v}$, respectively,
325 cnh 1.7 and their adjoint variables;
326 adcroft 1.1 further
327     \[
328     M \, = \, \left(
329     \begin{array}{ccc}
330     \frac{\partial {\cal M}_1}{\partial u_1} & \ldots &
331     \frac{\partial {\cal M}_1}{\partial u_m} \\
332     \vdots & ~ & \vdots \\
333     \frac{\partial {\cal M}_n}{\partial u_1} & \ldots &
334     \frac{\partial {\cal M}_n}{\partial u_m} \\
335     \end{array}
336     \right)
337     \]
338     is the Jacobi matrix of $ {\cal M} $
339     (an $ n \times m $ matrix)
340     such that $ \delta \vec{v} = M \cdot \delta \vec{u} $, or
341     \[
342     \delta v_{j}
343     \, = \, \sum_{i=1}^m M_{ji} \, \delta u_{i}
344     \, = \, \sum_{i=1}^m \, \frac{\partial {\cal M}_{j}}{\partial u_{i}}
345     \delta u_{i}
346     \]
347     %
348     Then eq. (\ref{adjoint}) takes the form
349     \[
350     \delta u_{i}^{\ast}
351     \, = \, \sum_{j=1}^n M_{ji} \, \delta v_{j}^{\ast}
352     \, = \, \sum_{j=1}^n \, \frac{\partial {\cal M}_{j}}{\partial u_{i}}
353     \delta v_{j}^{\ast}
354     \]
355     %
356     or
357     %
358     \[
359     \left(
360     \begin{array}{c}
361     \left. \frac{\partial}{\partial u_1} {\cal J} \right|_{\vec{u}^{(0)}} \\
362     \vdots \\
363     \left. \frac{\partial}{\partial u_m} {\cal J} \right|_{\vec{u}^{(0)}} \\
364     \end{array}
365     \right)
366     \, = \,
367     \left(
368     \begin{array}{ccc}
369     \left. \frac{\partial {\cal M}_1}{\partial u_1} \right|_{\vec{u}^{(0)}}
370     & \ldots &
371     \left. \frac{\partial {\cal M}_n}{\partial u_1} \right|_{\vec{u}^{(0)}} \\
372     \vdots & ~ & \vdots \\
373     \left. \frac{\partial {\cal M}_1}{\partial u_m} \right|_{\vec{u}^{(0)}}
374     & \ldots &
375     \left. \frac{\partial {\cal M}_n}{\partial u_m} \right|_{\vec{u}^{(0)}} \\
376     \end{array}
377     \right)
378     \cdot
379     \left(
380     \begin{array}{c}
381     \left. \frac{\partial}{\partial v_1} {\cal J} \right|_{\vec{v}} \\
382     \vdots \\
383     \left. \frac{\partial}{\partial v_n} {\cal J} \right|_{\vec{v}} \\
384     \end{array}
385     \right)
386     \]
387     %
388     Furthermore, the adjoint $ \delta v^{(\lambda) \, \ast} $
389     of any intermediate state $ v^{(\lambda)} $
390     may be obtained, using the intermediate Jacobian
391     (an $ n_{\lambda+1} \times n_{\lambda} $ matrix)
392     %
393     \[
394     M_{\lambda} \, = \,
395     \left(
396     \begin{array}{ccc}
397     \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_1}
398     & \ldots &
399     \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
400     \vdots & ~ & \vdots \\
401     \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_1}
402     & \ldots &
403     \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
404     \end{array}
405     \right)
406     \]
407     %
408     and the shorthand notation for the adjoint variables
409     $ \delta v^{(\lambda) \, \ast}_{j} = \frac{\partial}{\partial v^{(\lambda)}_{j}}
410     {\cal J}^T $, $ j = 1, \ldots , n_{\lambda} $,
411     for intermediate components, yielding
412 heimbach 1.4 \begin{equation}
413     \small
414     \begin{split}
415 adcroft 1.1 \left(
416     \begin{array}{c}
417     \delta v^{(\lambda) \, \ast}_1 \\
418     \vdots \\
419     \delta v^{(\lambda) \, \ast}_{n_{\lambda}} \\
420     \end{array}
421     \right)
422 heimbach 1.4 \, = &
423 adcroft 1.1 \left(
424     \begin{array}{ccc}
425     \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_1}
426 heimbach 1.4 & \ldots \,\, \ldots &
427 adcroft 1.1 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_1} \\
428     \vdots & ~ & \vdots \\
429     \frac{\partial ({\cal M}_{\lambda})_1}{\partial v^{(\lambda)}_{n_{\lambda}}}
430 heimbach 1.4 & \ldots \,\, \ldots &
431 adcroft 1.1 \frac{\partial ({\cal M}_{\lambda})_{n_{\lambda+1}}}{\partial v^{(\lambda)}_{n_{\lambda}}} \\
432     \end{array}
433     \right)
434 heimbach 1.4 \cdot
435 adcroft 1.1 %
436 heimbach 1.4 \\ ~ & ~
437     \\ ~ &
438 adcroft 1.1 %
439     \left(
440     \begin{array}{ccc}
441     \frac{\partial ({\cal M}_{\lambda+1})_1}{\partial v^{(\lambda+1)}_1}
442     & \ldots &
443     \frac{\partial ({\cal M}_{\lambda+1})_{n_{\lambda+2}}}{\partial v^{(\lambda+1)}_1} \\
444     \vdots & ~ & \vdots \\
445     \vdots & ~ & \vdots \\
446     \frac{\partial ({\cal M}_{\lambda+1})_1}{\partial v^{(\lambda+1)}_{n_{\lambda+1}}}
447     & \ldots &
448     \frac{\partial ({\cal M}_{\lambda+1})_{n_{\lambda+2}}}{\partial v^{(\lambda+1)}_{n_{\lambda+1}}} \\
449     \end{array}
450     \right)
451 heimbach 1.4 \cdot \, \ldots \, \cdot
452 adcroft 1.1 \left(
453     \begin{array}{c}
454     \delta v^{\ast}_1 \\
455     \vdots \\
456     \delta v^{\ast}_{n} \\
457     \end{array}
458     \right)
459 heimbach 1.4 \end{split}
460     \end{equation}
461 adcroft 1.1
Eq. (\ref{forward}) and (\ref{reverse}) are perhaps clearest in
showing the advantage of the reverse over the forward mode
if the gradient $\nabla _{u}{\cal J}$, i.e. the sensitivity of the
cost function $ {\cal J} $ with respect to {\it all} input
variables $\vec{u}$
(or the sensitivity of the cost function with respect to
{\it all} intermediate states $ \vec{v}^{(\lambda)} $), is sought.
In order to be able to solve for each component of the gradient
$ \partial {\cal J} / \partial u_{i} $ in (\ref{forward}),
a forward calculation has to be performed for each component separately,
i.e. $ \delta \vec{u} = \delta u_{i} {\vec{e}_{i}} $
for the $i$-th forward calculation.
Then, (\ref{forward}) represents the
projection of $ \nabla_u {\cal J} $ onto the $i$-th component.
The full gradient is retrieved from the $ m $ forward calculations.
In contrast, eq. (\ref{reverse}) yields the full
gradient $\nabla _{u}{\cal J}$ (and all intermediate gradients
$\nabla _{v^{(\lambda)}}{\cal J}$) within a single reverse calculation.
480    
Note that if $ {\cal J} $ is a vector-valued function
of dimension $ l > 1 $,
eq. (\ref{reverse}) has to be modified according to
\[
M^T \left( \nabla_v {\cal J}^T \left(\delta \vec{J}\right) \right)
\, = \,
\nabla_u {\cal J}^T \cdot \delta \vec{J}
\]
where now $ \delta \vec{J} \in I\!\!R^l $ is a vector of
dimension $ l $.
In this case $ l $ reverse simulations have to be performed,
one for each $ \delta J_{k}, \,\, k = 1, \ldots, l $.
Then, the reverse mode is more efficient as long as
$ l < m $, i.e. as long as $ {\cal J} $ has fewer components than
there are input variables; otherwise the forward mode is preferable.
Strictly, the reverse mode is called adjoint mode only for
$ l = 1 $.
497    
498     A detailed analysis of the underlying numerical operations
499     shows that the computation of $\nabla _{u}{\cal J}$ in this way
500     requires about 2 to 5 times the computation of the cost function.
501     Alternatively, the gradient vector could be approximated
502     by finite differences, requiring $m$ computations
503     of the perturbed cost function.
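
For comparison, a one-sided finite-difference approximation can be
sketched as follows (the perturbation size $\epsilon$ is an assumption
of this illustration and has to be chosen by the user):
\[
\left. \frac{\partial {\cal J}}{\partial u_{i}} \right|_{\vec{u}^{(0)}}
\, \approx \,
\frac{ {\cal J}(\vec{u}^{(0)} + \epsilon \, \vec{e}_{i})
 - {\cal J}(\vec{u}^{(0)}) }{\epsilon} ,
\qquad i = 1, \ldots, m
\]
It requires one unperturbed and $m$ perturbed evaluations of the cost
function, so its cost grows linearly with $m$, whereas the cost of the
adjoint calculation quoted above (about 2 to 5 cost function
evaluations) does not depend on $m$.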
504    
505     To conclude we give two examples of commonly used types
506     of cost functions:
507    
\paragraph{Example 1:
$ {\cal J} = v_{j} (T) $} ~ \\
The cost function consists of the $j$-th component of the model state
$ \vec{v} $ at time $T$.
Then $ \nabla_v {\cal J}^T = {\vec{f}_{j}} $ is just the $j$-th
unit vector. The gradient $ \nabla_u {\cal J}^T $
is the projection of the adjoint
operator onto the $j$-th component ${\vec{f}_{j}}$,
\[
\nabla_u {\cal J}^T
\, = \, M^T \cdot \nabla_v {\cal J}^T
\, = \, \sum_{i} M_{ji} \, {\vec{e}_{i}}
\]
521    
522     \paragraph{Example 2:
523     $ {\cal J} = \langle \, {\cal H}(\vec{v}) - \vec{d} \, ,
524     \, {\cal H}(\vec{v}) - \vec{d} \, \rangle $} ~ \\
The cost function represents the quadratic model vs. data misfit.
Here, $ \vec{d} $ is the data vector and $ {\cal H} $ represents the
operator which maps the model state space onto the data space.
Then, $ \nabla_v {\cal J} $ takes the form
%
\begin{equation*}
\begin{split}
\nabla_v {\cal J}^T & = \, 2 \, \, H^T \cdot
\left( \, {\cal H}(\vec{v}) - \vec{d} \, \right) \\
~ & = \, 2 \sum_{j} \left\{ \sum_k
\frac{\partial {\cal H}_k}{\partial v_{j}}
\left( {\cal H}_k (\vec{v}) - d_k \right)
\right\} \, {\vec{f}_{j}}
\end{split}
\end{equation*}
%
where $H_{kj} = \partial {\cal H}_k / \partial v_{j} $ is the
Jacobi matrix of the data projection operator.
Thus, the gradient $ \nabla_u {\cal J} $ is given by the
adjoint operator,
driven by the model vs. data misfit:
\[
\nabla_u {\cal J}^T \, = \, 2 \, M^T \cdot
H^T \cdot \left( {\cal H}(\vec{v}) - \vec{d} \, \right)
\]
550    
551     \subsection{Storing vs. recomputation in reverse mode}
552     \label{checkpointing}
553    
554     We note an important aspect of the forward vs. reverse
555     mode calculation.
556 heimbach 1.4 Because of the local character of the derivative
557     (a derivative is defined w.r.t. a point along the trajectory),
558 adcroft 1.1 the intermediate results of the model trajectory
$\vec{v}^{(\lambda+1)}={\cal M}_{\lambda}(\vec{v}^{(\lambda)})$
560     are needed to evaluate the intermediate Jacobian
561     $M_{\lambda}|_{\vec{v}^{(\lambda)}} \, \delta \vec{v}^{(\lambda)} $.
562     In the forward mode, the intermediate results are required
563     in the same order as computed by the full forward model ${\cal M}$,
564 heimbach 1.4 but in the reverse mode they are required in the reverse order.
565 adcroft 1.1 Thus, in the reverse mode the trajectory of the forward model
566     integration ${\cal M}$ has to be stored to be available in the reverse
567 heimbach 1.4 calculation. Alternatively, the complete model state up to the
568     point of evaluation has to be recomputed whenever its value is required.
569 adcroft 1.1
A method to balance the amount of recomputation vs.
storage requirements is called {\sf checkpointing}
(e.g. \cite{res-eta:98}).
It is depicted in Fig. \ref{fig:3levelcheck} for 3-level checkpointing
[as an example, we give explicit numbers for a 3-day
integration with a 1-hourly timestep in square brackets].
576     \begin{itemize}
577     %
578     \item [$lev3$]
579     In a first step, the model trajectory is subdivided into
580     $ {n}^{lev3} $ subsections [$ {n}^{lev3} $=3 1-day intervals],
581     with the label $lev3$ for this outermost loop.
582 heimbach 1.4 The model is then integrated along the full trajectory,
583 adcroft 1.1 and the model state stored only at every $ k_{i}^{lev3} $-th timestep
584     [i.e. 3 times, at
585     $ i = 0,1,2 $ corresponding to $ k_{i}^{lev3} = 0, 24, 48 $].
586     %
587     \item [$lev2$]
588 heimbach 1.4 In a second step each subsection itself is divided into
589     $ {n}^{lev2} $ sub-subsections
590 adcroft 1.1 [$ {n}^{lev2} $=4 6-hour intervals per subsection].
591     The model picks up at the last outermost dumped state
592 heimbach 1.4 $ v_{k_{n}^{lev3}} $ and is integrated forward in time along
593 adcroft 1.1 the last subsection, with the label $lev2$ for this
594     intermediate loop.
595 heimbach 1.4 The model state is now stored at every $ k_{i}^{lev2} $-th
596 adcroft 1.1 timestep
597     [i.e. 4 times, at
598     $ i = 0,1,2,3 $ corresponding to $ k_{i}^{lev2} = 48, 54, 60, 66 $].
599     %
\item [$lev1$]
Finally, the model picks up at the last intermediate dump state
$ v_{k_{n}^{lev2}} $ and is integrated forward in time along
the last sub-subsection, with the label $lev1$ for this
innermost loop.
Within this sub-subsection only, the model state is stored
at every timestep
[i.e. every hour $ i=0,...,5$ corresponding to
$ k_{i}^{lev1} = 66, 67, \ldots, 71 $].
Thus, the final state $ v_n = v_{k_{n}^{lev1}} $ is reached
and the model states of all preceding timesteps along the last
sub-subsection are available, enabling integration backwards
in time along the last sub-subsection.
The adjoint can thus be computed along this last
sub-subsection $k_{n}^{lev2}$.
615 adcroft 1.1 %
616     \end{itemize}
617     %
This procedure is repeated consecutively for each previous
sub-subsection $k_{n-1}^{lev2}, \ldots, k_{1}^{lev2} $,
carrying the adjoint computation to the initial time
of the subsection $k_{n}^{lev3}$.
Then, the procedure is repeated for each previous subsection
$k_{n-1}^{lev3}, \ldots, k_{1}^{lev3}$,
carrying the adjoint computation to the initial time
$k_{1}^{lev3}$.
626    
For the full model trajectory of
$ n^{lev3} \cdot n^{lev2} \cdot n^{lev1} $ timesteps
the required storing of the model state is significantly reduced to
$ n^{lev1} + n^{lev2} + n^{lev3} $
[i.e. for the 3-day integration with a total of 72 timesteps
the model state is stored 13 times].
This saving in memory comes at the cost of
3 full forward integrations of the model (one for each
checkpointing level).
The balance of storage vs. recomputation depends
on the computing resources available.
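
As a worked check of the numbers in the example above,
\[
n^{lev3} \cdot n^{lev2} \cdot n^{lev1} \, = \, 3 \cdot 4 \cdot 6 \, = \, 72 ,
\qquad
n^{lev1} + n^{lev2} + n^{lev3} \, = \, 6 + 4 + 3 \, = \, 13
\]
More generally (a side remark, not specific to the MITGCM
implementation), for a fixed number of timesteps
$ N = n^{lev3} \cdot n^{lev2} \cdot n^{lev1} $ the inequality of
arithmetic and geometric means gives
\[
n^{lev1} + n^{lev2} + n^{lev3} \, \geq \, 3 \, N^{1/3}
\]
so the number of stored states is smallest when the three levels have
roughly equal length. For $ N = 72 $ the bound is
$ 3 \cdot 72^{1/3} \approx 12.5 $, so the 13 stored states of the
example are close to this bound.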
638    
639     \begin{figure}[t!]
640 adcroft 1.6 \begin{center}
641 adcroft 1.1 %\psdraft
642 adcroft 1.6 %\psfrag{v_k1^lev3}{\mathinfigure{v_{k_{1}^{lev3}}}}
643     %\psfrag{v_kn-1^lev3}{\mathinfigure{v_{k_{n-1}^{lev3}}}}
644     %\psfrag{v_kn^lev3}{\mathinfigure{v_{k_{n}^{lev3}}}}
645     %\psfrag{v_k1^lev2}{\mathinfigure{v_{k_{1}^{lev2}}}}
646     %\psfrag{v_kn-1^lev2}{\mathinfigure{v_{k_{n-1}^{lev2}}}}
647     %\psfrag{v_kn^lev2}{\mathinfigure{v_{k_{n}^{lev2}}}}
648     %\psfrag{v_k1^lev1}{\mathinfigure{v_{k_{1}^{lev1}}}}
649     %\psfrag{v_kn^lev1}{\mathinfigure{v_{k_{n}^{lev1}}}}
650     %\mbox{\epsfig{file=part5/checkpointing.eps, width=0.8\textwidth}}
651     \resizebox{5.5in}{!}{\includegraphics{part5/checkpointing.eps}}
652 adcroft 1.1 %\psfull
653 adcroft 1.6 \end{center}
654     \caption{
655     Schematic view of intermediate dump and restart for
656 adcroft 1.1 3-level checkpointing.}
657 heimbach 1.4 \label{fig:3levelcheck}
658 adcroft 1.1 \end{figure}
659    
660 heimbach 1.4 % \subsection{Optimal perturbations}
661     % \label{sec_optpert}
662 adcroft 1.1
663    
664 heimbach 1.4 % \subsection{Error covariance estimate and Hessian matrix}
665     % \label{sec_hessian}
666 adcroft 1.1
667     \newpage
668    
669     %**********************************************************************
670 heimbach 1.4 \section{TLM and ADM generation in general}
671 adcroft 1.1 \label{sec_ad_setup_gen}
672     %**********************************************************************
673    
674     In this section we describe in a general fashion
675     the parts of the code that are relevant for automatic
676     differentiation using the software tool TAMC.
677    
678 heimbach 1.4 \input{part5/doc_ad_the_model}
679    
The basic flow is depicted in Fig. \ref{fig:adthemodel}.
If the option {\tt ALLOW\_AUTODIFF\_TAMC} is defined, the driver routine
{\it the\_model\_main}, instead of calling {\it the\_main\_loop},
invokes the adjoint of this routine, {\it adthe\_main\_loop},
which is the toplevel routine in terms of reverse mode computation.
The routine {\it adthe\_main\_loop} has been generated using TAMC.
It contains the forward integration of the full model,
any additional storing that is required for efficient checkpointing,
and the reverse integration of the adjoint model.
The structure of {\it adthe\_main\_loop} has been strongly
simplified for clarity; in particular, no checkpointing
procedures are shown here.
692     Prior to the call of {\it adthe\_main\_loop}, the routine
693     {\it ctrl\_unpack} is invoked to unpack the control vector,
694     and following that call, the routine {\it ctrl\_pack}
695     is invoked to pack the control vector
696     (cf. Section \ref{section_ctrl}).
697     If gradient checks are to be performed, the option
698     {\tt ALLOW\_GRADIENT\_CHECK} is defined. In this case
699     the driver routine {\it grdchk\_main} is called after
700     the gradient has been computed via the adjoint
701     (cf. Section \ref{section_grdchk}).
702    
703     \subsection{The cost function (dependent variable)
704     \label{section_cost}}
705 adcroft 1.1
706     The cost function $ {\cal J} $ is referred to as the {\sf dependent variable}.
707     It is a function of the input variables $ \vec{u} $ via the composition
708     $ {\cal J}(\vec{u}) \, = \, {\cal J}(M(\vec{u})) $.
709     The input is referred to as the
710     {\sf independent variables} or {\sf control variables}.
All aspects relevant to the treatment of the cost function $ {\cal J} $
(parameter setting, initialization, accumulation,
final evaluation) are controlled by the package {\it pkg/cost}.
714    
715     \input{part5/doc_cost_flow}
716 adcroft 1.1
717     \subsubsection{genmake and CPP options}
718     %
719     \begin{itemize}
720     %
721     \item
722     \fbox{
723     \begin{minipage}{12cm}
724     {\it genmake}, {\it CPP\_OPTIONS.h}, {\it ECCO\_CPPOPTIONS.h}
725     \end{minipage}
726     }
727     \end{itemize}
728     %
The directory {\it pkg/cost} can be included in the
compile list in three different ways (cf. Section \ref{???}):
731     %
732     \begin{enumerate}
733     %
734     \item {\it genmake}: \\
735 heimbach 1.4 Change the default settings in the file {\it genmake} by adding
736 adcroft 1.1 {\bf cost} to the {\bf enable} list (not recommended).
737     %
738     \item {\it .genmakerc}: \\
Set {\bf enable} and {\bf disable} as appropriate
for your experiment in the file {\it .genmakerc}
and add the file to your compile directory.
742     %
743     \item genmake-options: \\
744     Call {\it genmake} with the option
745     {\tt genmake -enable=cost}.
746     %
747     \end{enumerate}
748 heimbach 1.4 The basic CPP option to enable the cost function is {\bf ALLOW\_COST}.
749     Each specific cost function contribution has its own option.
750     For the present example the option is {\bf ALLOW\_COST\_TRACER}.
All cost-specific options are set in {\it ECCO\_CPPOPTIONS.h}.
Since the cost function is usually used in conjunction with
automatic differentiation, the CPP option
{\bf ALLOW\_ADJOINT\_RUN} should also be defined
(file {\it CPP\_OPTIONS.h}).
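
Taken together, the options named above amount to the following
preprocessor settings. This is only a sketch of the relevant lines
for the present example; the remaining content of the two header
files is not shown.
\begin{verbatim}
/* in ECCO_CPPOPTIONS.h */
#define ALLOW_COST
#define ALLOW_COST_TRACER

/* in CPP_OPTIONS.h */
#define ALLOW_ADJOINT_RUN
\end{verbatim}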
756    
757 cnh 1.7 \subsubsection{Initialization}
758 adcroft 1.1 %
759 cnh 1.7 The initialization of the {\it cost} package is readily enabled
760 adcroft 1.1 as soon as the CPP option {\bf ALLOW\_ADJOINT\_RUN} is defined.
761     %
762     \begin{itemize}
763     %
764     \item
765     \fbox{
766     \begin{minipage}{12cm}
767     Parameters: {\it cost\_readparms}
768     \end{minipage}
769     }
770     \\
771     This S/R
772     reads runtime flags and parameters from file {\it data.cost}.
773     For the present example the only relevant parameter read
774     is {\bf mult\_tracer}. This multiplier enables different
775     cost function contributions to be switched on
776     ( = 1.) or off ( = 0.) at runtime.
777     For more complex cost functions which involve model vs. data
778     misfits, the corresponding data filenames and data
779     specifications (start date and time, period, ...) are read
780     in this S/R.
781     %
782     \item
783     \fbox{
784     \begin{minipage}{12cm}
785     Variables: {\it cost\_init}
786     \end{minipage}
787     }
788     \\
789     This S/R
790 cnh 1.7 initializes the different cost function contributions.
791     The contribution for the present example is {\bf objf\_tracer}
792 adcroft 1.1 which is defined on each tile (bi,bj).
793     %
794     \end{itemize}
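
A minimal {\it data.cost} for the present example might look as
follows. Only the parameter {\bf mult\_tracer} is taken from the
description above; the namelist group name {\tt COST\_NML} is an
assumption made for this sketch and should be checked against
{\it cost\_readparms}.
\begin{verbatim}
 &COST_NML
 mult_tracer = 1.,
 /
\end{verbatim}
Setting {\bf mult\_tracer} to 0. instead would switch the tracer
contribution off at runtime without recompiling.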
795     %
796 heimbach 1.4 \subsubsection{Accumulation}
797 adcroft 1.1 %
798     \begin{itemize}
799     %
800     \item
801     \fbox{
802     \begin{minipage}{12cm}
803     {\it cost\_tile}, {\it cost\_tracer}
804     \end{minipage}
805     }
806     \end{itemize}
807     %
The `driver' routine
{\it cost\_tile} is called at the end of each time step.
Within this `driver' routine, one S/R is called for each of
the chosen cost function contributions.
In the present example ({\bf ALLOW\_COST\_TRACER}),
S/R {\it cost\_tracer} is called.
It accumulates {\bf objf\_tracer} according to eqn. (\ref{???}).
815     %
816     \subsubsection{Finalize all contributions}
817     %
818     \begin{itemize}
819     %
820     \item
821     \fbox{
822     \begin{minipage}{12cm}
823     {\it cost\_final}
824     \end{minipage}
825     }
826     \end{itemize}
827     %
828     At the end of the forward integration S/R {\it cost\_final}
829     is called. It accumulates the total cost function {\bf fc}
830     from each contribution and sums over all tiles:
831     \begin{equation}
832     {\cal J} \, = \,
833     {\rm fc} \, = \,
834     {\rm mult\_tracer} \sum_{bi,\,bj}^{nSx,\,nSy}
835     {\rm objf\_tracer}(bi,bj) \, + \, ...
836     \end{equation}
837     %
838     The total cost function {\bf fc} will be the
839     'dependent' variable in the argument list for TAMC, i.e.
840     \begin{verbatim}
tamc -output 'fc' ...
842     \end{verbatim}
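
The summation over tiles performed in {\it cost\_final} can be
sketched as a small standalone Fortran program. This is an
illustration only, not the MITGCM source: the tile counts and the
per-tile values are made up, and only the names {\bf fc},
{\bf mult\_tracer}, {\bf objf\_tracer}, {\bf nSx} and {\bf nSy}
follow the equation above.
\begin{verbatim}
program cost_final_sketch
  implicit none
  ! Hypothetical tile counts, chosen for this sketch only
  integer, parameter :: nSx = 2, nSy = 2
  double precision :: objf_tracer(nSx, nSy), mult_tracer, fc
  integer :: bi, bj

  ! Made-up per-tile contributions, standing in for the
  ! values accumulated by cost_tracer
  objf_tracer = reshape((/ 1.0d0, 2.0d0, 3.0d0, 4.0d0 /), &
                        (/ nSx, nSy /))
  mult_tracer = 1.0d0   ! 1. switches the contribution on, 0. off

  ! Total cost: multiplier times the sum over all tiles
  fc = 0.0d0
  do bj = 1, nSy
    do bi = 1, nSx
      fc = fc + mult_tracer * objf_tracer(bi, bj)
    end do
  end do

  print *, 'fc = ', fc
end program cost_final_sketch
\end{verbatim}
Further contributions would each add one more term of the same form,
weighted by its own multiplier.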
843    
844 cnh 1.3 %%%% \end{document}
845 adcroft 1.1
846     \input{part5/doc_ad_the_main}
847    
848 heimbach 1.4 \subsection{The control variables (independent variables)
849     \label{section_ctrl}}
850 adcroft 1.1
851     The control variables are a subset of the model input
852     (initial conditions, boundary conditions, model parameters).
853     Here we identify them with the variable $ \vec{u} $.
All intermediate variables whose derivatives w.r.t. the control
variables do not vanish are called {\sf active variables}.
All subroutines whose derivatives w.r.t. the control variables
do not vanish are called {\sf active routines}.
858     Read and write operations from and to file can be viewed
859     as variable assignments. Therefore, files to which
860     active variables are written and from which active variables
861     are read are called {\sf active files}.
862     All aspects relevant to the treatment of the control variables
863 cnh 1.7 (parameter setting, initialization, perturbation)
864     are controlled by the package {\it pkg/ctrl}.
865 adcroft 1.1
866 heimbach 1.4 \input{part5/doc_ctrl_flow}
867    
868 adcroft 1.1 \subsubsection{genmake and CPP options}
869     %
870     \begin{itemize}
871     %
872     \item
873     \fbox{
874     \begin{minipage}{12cm}
875     {\it genmake}, {\it CPP\_OPTIONS.h}, {\it ECCO\_CPPOPTIONS.h}
876     \end{minipage}
877     }
878     \end{itemize}
879     %
To include the directory in the compile list,
{\bf ctrl} has to be added to the {\bf enable} list in
{\it .genmakerc} (or {\it genmake} itself).
883     Each control variable is enabled via its own CPP option
884     in {\it ECCO\_CPPOPTIONS.h}.
885    
886 cnh 1.7 \subsubsection{Initialization}
887 adcroft 1.1 %
888     \begin{itemize}
889     %
890     \item
891     \fbox{
892     \begin{minipage}{12cm}
893     Parameters: {\it ctrl\_readparms}
894     \end{minipage}
895     }
896     \\
897     %
898     This S/R
899     reads runtime flags and parameters from file {\it data.ctrl}.
For the present example the file contains the file name
of each control variable that is used.
In addition, the number of wet points for each control
variable and the net dimension of the space of control
variables (counting wet points only), {\bf nvarlength},
are determined.
906     Masks for wet points for each tile {\bf (bi,\,bj)}
907     and vertical layer {\bf k} are generated for the three
908     relevant categories on the C-grid:
909     {\bf nWetCtile} for tracer fields,
910     {\bf nWetWtile} for zonal velocity fields,
911     {\bf nWetStile} for meridional velocity fields.
912     %
913     \item
914     \fbox{
915     \begin{minipage}{12cm}
916     Control variables, control vector,
917     and their gradients: {\it ctrl\_unpack}
918     \end{minipage}
919     }
920     \\
921     %
922     Two important issues related to the handling of the control
923     variables in the MITGCM need to be addressed.
924     First, in order to save memory, the control variable arrays
925     are not kept in memory, but rather read from file and added
926 cnh 1.7 to the initial fields during the model initialization phase.
927 adcroft 1.1 Similarly, the corresponding adjoint fields which represent
928     the gradient of the cost function w.r.t. the control variables
929 heimbach 1.4 are written to file at the end of the adjoint integration.
930 adcroft 1.1 Second, in addition to the files holding the 2-dim. and 3-dim.
931 heimbach 1.4 control variables and the corresponding cost gradients,
932     a 1-dim. {\sf control vector}
933 adcroft 1.1 and {\sf gradient vector} are written to file. They contain
934     only the wet points of the control variables and the corresponding
935     gradient.
936     This leads to a significant data compression.
937 heimbach 1.4 Furthermore, an option is available
938     ({\tt ALLOW\_NONDIMENSIONAL\_CONTROL\_IO}) to
939     non-dimensionalise the control and gradient vector,
940     which otherwise would contain different pieces of different
941     magnitudes and units.
942     Finally, the control and gradient vector can be passed to a
943 adcroft 1.1 minimization routine if an update of the control variables
944     is sought as part of a minimization exercise.
945    
946     The files holding fields and vectors of the control variables
947     and gradient are generated and initialised in S/R {\it ctrl\_unpack}.
948     %
949     \end{itemize}
950    
951     \subsubsection{Perturbation of the independent variables}
952     %
953 heimbach 1.4 The dependency flow for differentiation w.r.t. the controls
954     starts with adding a perturbation onto the input variable,
955 adcroft 1.1 thus defining the independent or control variables for TAMC.
956 heimbach 1.4 Three types of controls may be considered:
957 adcroft 1.1 %
958     \begin{itemize}
959     %
960     \item
961     \fbox{
962     \begin{minipage}{12cm}
963     {\it ctrl\_map\_ini} (initial value sensitivity):
964     \end{minipage}
965     }
966     \\
967     %
968     Consider as an example the initial tracer distribution
969     {\bf tr1} as control variable.
970     After {\bf tr1} has been initialised in
971 heimbach 1.4 {\it ini\_tr1} (dynamical variables such as
972 adcroft 1.1 temperature and salinity are initialised in {\it ini\_fields}),
973     a perturbation anomaly is added to the field in S/R
974     {\it ctrl\_map\_ini}
975     %
976     \begin{equation}
977     \begin{split}
978     u & = \, u_{[0]} \, + \, \Delta u \\
979     {\bf tr1}(...) & = \, {\bf tr1_{ini}}(...) \, + \, {\bf xx\_tr1}(...)
980     \label{perturb}
981     \end{split}
982     \end{equation}
983     %
{\bf xx\_tr1} is a 3-dim. global array
holding the perturbation. In the case of a simple
sensitivity study this array is identically zero.
However, its specification is essential in the context
of automatic differentiation since TAMC
treats the corresponding line in the code symbolically
when determining the differentiation chain and its origin.
991     Thus, the variable names are part of the argument list
992     when calling TAMC:
993     %
994     \begin{verbatim}
tamc -input 'xx_tr1 ...' ...
996     \end{verbatim}
997     %
Now, as mentioned above, the MITGCM avoids maintaining
an array for each control variable by reading the
perturbation from file into a temporary array.
To ensure that the symbolic link is recognized by TAMC, a scalar
dummy variable {\bf xx\_tr1\_dummy} is introduced
and an `active read' routine of the adjoint support
package {\it pkg/autodiff} is invoked.
The read procedure is tagged with the variable
{\bf xx\_tr1\_dummy}, enabling TAMC to recognize the
initialization of the perturbation.
The modified call of TAMC thus reads
1009     %
1010     \begin{verbatim}
tamc -input 'xx_tr1_dummy ...' ...
1012     \end{verbatim}
1013     %
and the modified form of operation (\ref{perturb})
in the code is
1016     %
1017     \begin{verbatim}
      call active_read_xyz(
     &     ..., tmpfld3d, ..., xx_tr1_dummy, ... )

      tr1(...) = tr1(...) + tmpfld3d(...)
1022     \end{verbatim}
1023     %
Note that reading an active variable corresponds
to a variable assignment. Its derivative corresponds
to a write statement of the adjoint variable.
The `active file' routines have been designed
to support active read and corresponding adjoint active write
operations (and vice versa).
1030 adcroft 1.1 %
1031     \item
1032     \fbox{
1033     \begin{minipage}{12cm}
1034     {\it ctrl\_map\_forcing} (boundary value sensitivity):
1035     \end{minipage}
1036     }
1037     \\
1038     %
The handling of boundary values as control variables
proceeds in exact analogy to the initial values,
with the symbolic perturbation taking place in S/R
{\it ctrl\_map\_forcing}.
Note, however, an important difference:
since the boundary values are time dependent, with a new
forcing field applied at each time step,
the general problem may be thought of as having
a new control variable at each time step
(or, if the perturbation is averaged over a certain period,
every $ N $ time steps), i.e.
1050 adcroft 1.1 \[
1051     u_{\rm forcing} \, = \,
1052     \{ \, u_{\rm forcing} ( t_n ) \, \}_{
1053     n \, = \, 1, \ldots , {\rm nTimeSteps} }
1054     \]
1055     %
1056     In the current example an equilibrium state is considered,
1057     and only an initial perturbation to
1058     surface forcing is applied with respect to the
1059     equilibrium state.
1060     A time dependent treatment of the surface forcing is
1061     implemented in the ECCO environment, involving the
1062     calendar ({\it cal}~) and external forcing ({\it exf}~) packages.
1063     %
1064     \item
1065     \fbox{
1066     \begin{minipage}{12cm}
1067     {\it ctrl\_map\_params} (parameter sensitivity):
1068     \end{minipage}
1069     }
1070     \\
1071     %
This routine is not yet implemented, but would
proceed along the same lines as the initial value sensitivity.
The mixing parameters {\bf diffkr} and {\bf kapgm}
are currently added as controls in {\it ctrl\_map\_ini.F}.
1076 adcroft 1.1 %
1077     \end{itemize}
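
The correspondence between an active read and an adjoint write,
noted above for {\it ctrl\_map\_ini}, can be made explicit by
treating the read as an assignment and applying the usual
statement-level adjoint rules in reverse order.
The following is a schematic sketch, not generated code; the adjoint
names simply follow the {\bf ad} prefix convention used in this
section and are assumptions as far as the actual TAMC output is
concerned.
\[
\begin{array}{ll}
\mbox{forward:} &
  {\rm tmpfld3d} \, = \, {\rm xx\_tr1}
  \quad \mbox{(read from file)} \\
 & {\rm tr1} \, = \, {\rm tr1} \, + \, {\rm tmpfld3d} \\[1ex]
\mbox{adjoint:} &
  {\rm adtmpfld3d} \, = \, {\rm adtmpfld3d} \, + \, {\rm adtr1} \\
 & {\rm adxx\_tr1} \, = \, {\rm adxx\_tr1} \, + \, {\rm adtmpfld3d}
  \quad \mbox{(write to file)} \\
 & {\rm adtmpfld3d} \, = \, 0
\end{array}
\]
The file written in the adjoint sweep thus ends up holding the
gradient of the cost function with respect to the perturbation.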
1078     %
1079    
1080     \subsubsection{Output of adjoint variables and gradient}
1081     %
1082 heimbach 1.4 Several ways exist to generate output of adjoint fields.
1083 adcroft 1.1 %
1084     \begin{itemize}
1085     %
1086     \item
1087     \fbox{
1088     \begin{minipage}{12cm}
1089 heimbach 1.4 {\it ctrl\_map\_ini, ctrl\_map\_forcing}:
1090 adcroft 1.1 \end{minipage}
1091     }
1092     \\
1093     \begin{itemize}
1094     %
1095 heimbach 1.4 \item {\bf xx\_...}: the control variable fields \\
1096     Before the forward integration, the control
1097     variables are read from file {\bf xx\_ ...} and added to
1098     the model field.
1099 adcroft 1.1 %
1100     \item {\bf adxx\_...}: the adjoint variable fields, i.e. the gradient
1101 heimbach 1.4 $ \nabla _{u}{\cal J} $ for each control variable \\
1102     After the adjoint integration the corresponding adjoint
1103     variables are written to {\bf adxx\_ ...}.
1104 adcroft 1.1 %
1105 heimbach 1.4 \end{itemize}
1106 adcroft 1.1 %
1107 heimbach 1.4 \item
1108     \fbox{
1109     \begin{minipage}{12cm}
1110     {\it ctrl\_unpack, ctrl\_pack}:
1111     \end{minipage}
1112     }
1113     \\
1114     %
1115     \begin{itemize}
1116     %
1117     \item {\bf vector\_ctrl}: the control vector \\
1118 cnh 1.7 At the very beginning of the model initialization,
1119 heimbach 1.4 the updated compressed control vector is read (or initialised)
1120     and distributed to 2-dim. and 3-dim. control variable fields.
1121     %
1122     \item {\bf vector\_grad}: the gradient vector \\
1123     At the very end of the adjoint integration,
1124     the 2-dim. and 3-dim. adjoint variables are read,
1125     compressed to a single vector and written to file.
1126 adcroft 1.1 %
1127     \end{itemize}
1128     %
1129     \item
1130     \fbox{
1131     \begin{minipage}{12cm}
1132     {\it addummy\_in\_stepping}:
1133     \end{minipage}
1134     }
1135     \\
1136     In addition to writing the gradient at the end of the
1137 heimbach 1.4 forward/adjoint integration, many more adjoint variables
1138     of the model state
1139     at intermediate times can be written using S/R
1140 adcroft 1.1 {\it addummy\_in\_stepping}.
This routine is part of the adjoint support package
{\it pkg/autodiff} (cf. below).
1143     To be part of the adjoint code, the corresponding S/R
1144     {\it dummy\_in\_stepping} has to be called in the forward
1145     model (S/R {\it the\_main\_loop}) at the appropriate place.
1146    
{\it dummy\_in\_stepping} is essentially empty;
the corresponding adjoint routine is hand-written rather
than generated automatically.
1150     Appropriate flow directives ({\it dummy\_in\_stepping.flow})
1151     ensure that TAMC does not automatically
1152     generate {\it addummy\_in\_stepping} by trying to differentiate
1153 heimbach 1.4 {\it dummy\_in\_stepping}, but instead refers to
1154     the hand-written routine.
1155 adcroft 1.1
1156     {\it dummy\_in\_stepping} is called in the forward code
1157     at the beginning of each
1158     timestep, before the call to {\it dynamics}, thus ensuring
1159     that {\it addummy\_in\_stepping} is called at the end of
1160     each timestep in the adjoint calculation, after the call to
1161     {\it addynamics}.
1162    
{\it addummy\_in\_stepping} includes the header file
{\it adcommon.h}.
This header file is also hand-written. It contains
the common blocks
1167     {\bf /addynvars\_r/}, {\bf /addynvars\_cd/},
1168     {\bf /addynvars\_diffkr/}, {\bf /addynvars\_kapgm/},
1169 adcroft 1.1 {\bf /adtr1\_r/}, {\bf /adffields/},
1170     which have been extracted from the adjoint code to enable
1171     access to the adjoint variables.
1172     %
1173     \end{itemize}
1174    
1175    
1176     \subsubsection{Control variable handling for
1177     optimization applications}
1178    
In optimization mode the cost function $ {\cal J}(u) $ is
minimized iteratively with respect to a set of control variables,
i.e. a stationary point $ \delta {\cal J} \, = \, 0 $ is sought.
1182     The gradient $ \nabla _{u}{\cal J} |_{u_{[k]}} $ together
1183     with the value of the cost function itself $ {\cal J}(u_{[k]}) $
1184     at iteration step $ k $ serve
1185     as input to a minimization routine (e.g. quasi-Newton method,
1186 heimbach 1.9 conjugate gradient, ... \cite{gil-lem:89})
1187 heimbach 1.4 to compute an update in the
1188 adcroft 1.1 control variable for iteration step $k+1$
1189     \[
1190     u_{[k+1]} \, = \, u_{[0]} \, + \, \Delta u_{[k+1]}
1191     \quad \mbox{satisfying} \quad
1192     {\cal J} \left( u_{[k+1]} \right) \, < \, {\cal J} \left( u_{[k]} \right)
1193     \]
1194     $ u_{[k+1]} $ then serves as input for a forward/adjoint run
1195     to determine $ {\cal J} $ and $ \nabla _{u}{\cal J} $ at iteration step
1196     $ k+1 $.
1197     Tab. \ref{???} sketches the flow between forward/adjoint model
1198     and the minimization routine.
1199    
1200     \begin{eqnarray*}
1201 heimbach 1.4 \scriptsize
1202 adcroft 1.1 \begin{array}{ccccc}
1203     u_{[0]} \,\, , \,\, \Delta u_{[k]} & ~ & ~ & ~ & ~ \\
1204     {\Big\downarrow}
1205     & ~ & ~ & ~ & ~ \\
1206     ~ & ~ & ~ & ~ & ~ \\
1207     \hline
1208     \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1209     \multicolumn{1}{|c}{
1210     u_{[k]} = u_{[0]} + \Delta u_{[k]}} &
1211     \stackrel{\bf forward}{\bf \longrightarrow} &
1212     v_{[k]} = M \left( u_{[k]} \right) &
1213     \stackrel{\bf forward}{\bf \longrightarrow} &
1214     \multicolumn{1}{c|}{
1215     {\cal J}_{[k]} = {\cal J} \left( M \left( u_{[k]} \right) \right)} \\
1216     \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1217     \hline
1218 heimbach 1.4 \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1219     \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{{\Big\downarrow}} \\
1220     \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1221 adcroft 1.1 \hline
1222     \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1223     \multicolumn{1}{|c}{
1224     \nabla_u {\cal J}_{[k]} (\delta {\cal J}) =
1225 heimbach 1.4 T^{\ast} \cdot \nabla_v {\cal J} |_{v_{[k]}} (\delta {\cal J})} &
1226 adcroft 1.1 \stackrel{\bf adjoint}{\mathbf \longleftarrow} &
1227     ad \, v_{[k]} (\delta {\cal J}) =
1228     \nabla_v {\cal J} |_{v_{[k]}} (\delta {\cal J}) &
1229     \stackrel{\bf adjoint}{\mathbf \longleftarrow} &
1230     \multicolumn{1}{c|}{ ad \, {\cal J} = \delta {\cal J}} \\
1231     \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1232     \hline
1233     ~ & ~ & ~ & ~ & ~ \\
1234 heimbach 1.4 \hspace*{15ex}{\Bigg\downarrow}
1235     \quad {\cal J}_{[k]}, \quad \nabla_u {\cal J}_{[k]}
1236     & ~ & ~ & ~ & ~ \\
1237 adcroft 1.1 ~ & ~ & ~ & ~ & ~ \\
1238     \hline
1239     \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1240     \multicolumn{1}{|c}{
1241     {\cal J}_{[k]} \,\, , \,\, \nabla_u {\cal J}_{[k]}} &
1242     {\mathbf \longrightarrow} & \text{\bf minimisation} &
1243     {\mathbf \longrightarrow} &
1244     \multicolumn{1}{c|}{ \Delta u_{[k+1]}} \\
1245     \multicolumn{1}{|c}{~} & ~ & ~ & ~ & \multicolumn{1}{c|}{~} \\
1246     \hline
1247     ~ & ~ & ~ & ~ & ~ \\
1248     ~ & ~ & ~ & ~ & \Big\downarrow \\
1249     ~ & ~ & ~ & ~ & \Delta u_{[k+1]} \\
1250     \end{array}
1251     \end{eqnarray*}
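
The minimisation step in the scheme above is left unspecified.
As the simplest possible illustration, consider steepest descent
with a fixed step size $ \alpha > 0 $. Both the method and
$ \alpha $ are assumptions made here for the sake of a worked
formula; this is not the quasi-Newton or conjugate gradient method
cited above. The update would then read
\[
\Delta u_{[k+1]} \, = \, \Delta u_{[k]} \, - \,
\alpha \, \nabla _{u}{\cal J} |_{u_{[k]}}
\quad \Longrightarrow \quad
u_{[k+1]} \, = \, u_{[0]} \, + \, \Delta u_{[k+1]}
\, = \, u_{[k]} \, - \, \alpha \, \nabla _{u}{\cal J} |_{u_{[k]}} .
\]
For the toy cost function $ {\cal J}(u) = \frac{1}{2} u^2 $ the
gradient is $ u $, so that $ u_{[k+1]} = (1 - \alpha) \, u_{[k]} $ and
$ {\cal J}(u_{[k+1]}) = (1 - \alpha)^2 \, {\cal J}(u_{[k]}) $,
which satisfies the descent condition for $ 0 < \alpha < 2 $.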
1252    
1253     The routines {\it ctrl\_unpack} and {\it ctrl\_pack} provide
1254     the link between the model and the minimization routine.
As described in Section \ref{???},
the {\it unpack} and {\it pack} routines read and write
control and gradient {\it vectors}, which are compressed
to contain only wet points, in addition to the full
2-dim. and 3-dim. fields.
1260     The corresponding I/O flow looks as follows:
1261    
1262     \vspace*{0.5cm}
1263    
1264 heimbach 1.4 {\scriptsize
1265 adcroft 1.1 \begin{tabular}{ccccc}
1266     {\bf vector\_ctrl\_$<$k$>$ } & ~ & ~ & ~ & ~ \\
1267     {\big\downarrow} & ~ & ~ & ~ & ~ \\
1268     \cline{1-1}
1269     \multicolumn{1}{|c|}{\it ctrl\_unpack} & ~ & ~ & ~ & ~ \\
1270     \cline{1-1}
1271     {\big\downarrow} & ~ & ~ & ~ & ~ \\
1272     \cline{3-3}
1273     \multicolumn{1}{l}{\bf xx\_theta0...$<$k$>$} & ~ &
1274     \multicolumn{1}{|c|}{~} & ~ & ~ \\
1275 heimbach 1.4 \multicolumn{1}{l}{\bf xx\_salt0...$<$k$>$} &
1276     $\stackrel{\mbox{read}}{\longrightarrow}$ &
1277 adcroft 1.1 \multicolumn{1}{|c|}{forward integration} & ~ & ~ \\
1278     \multicolumn{1}{l}{\bf \vdots} & ~ & \multicolumn{1}{|c|}{~}
1279     & ~ & ~ \\
1280     \cline{3-3}
1281 heimbach 1.4 ~ & ~ & $\downarrow$ & ~ & ~ \\
1282 adcroft 1.1 \cline{3-3}
1283     ~ & ~ &
1284     \multicolumn{1}{|c|}{~} & ~ &
1285     \multicolumn{1}{l}{\bf adxx\_theta0...$<$k$>$} \\
1286     ~ & ~ & \multicolumn{1}{|c|}{adjoint integration} &
1287 heimbach 1.4 $\stackrel{\mbox{write}}{\longrightarrow}$ &
1288 adcroft 1.1 \multicolumn{1}{l}{\bf adxx\_salt0...$<$k$>$} \\
1289     ~ & ~ & \multicolumn{1}{|c|}{~}
1290     & ~ & \multicolumn{1}{l}{\bf \vdots} \\
1291     \cline{3-3}
1292     ~ & ~ & ~ & ~ & {\big\downarrow} \\
1293     \cline{5-5}
1294     ~ & ~ & ~ & ~ & \multicolumn{1}{|c|}{\it ctrl\_pack} \\
1295     \cline{5-5}
1296     ~ & ~ & ~ & ~ & {\big\downarrow} \\
1297     ~ & ~ & ~ & ~ & {\bf vector\_grad\_$<$k$>$ } \\
1298     \end{tabular}
1299 heimbach 1.4 }
1300 adcroft 1.1
1301     \vspace*{0.5cm}
1302    
1303    
1304 heimbach 1.4 {\it ctrl\_unpack} reads the updated control vector
1305 adcroft 1.1 {\bf vector\_ctrl\_$<$k$>$}.
1306     It distributes the different control variables to
1307     2-dim. and 3-dim. files {\it xx\_...$<$k$>$}.
At the start of the forward integration the control variables
are read from {\it xx\_...$<$k$>$} and added to the
model fields.
1311     Correspondingly, at the end of the adjoint integration
1312     the adjoint fields are written
1313 adcroft 1.1 to {\it adxx\_...$<$k$>$}, again via the active file routines.
1314 heimbach 1.4 Finally, {\it ctrl\_pack} collects all adjoint files
1315 adcroft 1.1 and writes them to the compressed vector file
1316     {\bf vector\_grad\_$<$k$>$}.
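
The compression to wet points can be sketched as follows.
This standalone Fortran program is an illustration under stated
assumptions, not the code of {\it ctrl\_pack}: the array sizes, the
field values, the mask and all variable names are hypothetical, and
a single 2-dim. field stands in for the full set of control
variables.
\begin{verbatim}
program pack_wet_points
  implicit none
  ! Hypothetical grid size for this sketch
  integer, parameter :: nx = 4, ny = 3
  double precision :: field(nx, ny), vec(nx*ny)
  integer :: mask(nx, ny)     ! 1 = wet point, 0 = dry (land)
  integer :: i, j, nwet

  ! Made-up field values and an all-wet mask ...
  do j = 1, ny
    do i = 1, nx
      field(i, j) = dble(i + 10*j)
      mask(i, j) = 1
    end do
  end do
  ! ... with two points declared dry
  mask(2, 1) = 0
  mask(3, 2) = 0

  ! Pack: copy only the wet points into the 1-dim. vector
  nwet = 0
  do j = 1, ny
    do i = 1, nx
      if (mask(i, j) == 1) then
        nwet = nwet + 1
        vec(nwet) = field(i, j)
      end if
    end do
  end do

  print *, 'wet points: ', nwet, ' of ', nx*ny
end program pack_wet_points
\end{verbatim}
Unpacking runs the same loops with the assignment reversed, which is
why the same mask has to be used on both sides.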
