

\section{Adjoint dump \& restart -- divided adjoint (DIVA)
\label{sec_ad_diva}}
\begin{rawhtml}
<!-- CMIREDIR:sec_ad_diva: -->
\end{rawhtml}

{\it Patrick Heimbach \& Geoffrey Gebbie, MIT/EAPS, 07-Mar-2003}

{\bf 
NOTE: \\
THIS SECTION IS SUBJECT TO CHANGE.
IT REFERS TO TAF-1.4.26.

Previous TAF versions are incomplete and have problems 
with both TAF options '-pure' and '-mpi'.

The code which is tuned to the DIVA implementation 
of this TAF version
is {\it checkpoint50} (MITgcm) and {\it ecco\_c50\_e28} (ECCO).
}

\subsection{Introduction}

Most high performance computing (HPC) centres require the use
of batch jobs for code execution. 
Limits in maximum available CPU time and memory may prevent
the adjoint code execution from fitting into any of the available
queues. This presents a serious limit for large scale /
long time adjoint ocean and climate model integrations.
The MITgcm itself enables the split of the total model 
integration into sub-intervals through standard dump/restart 
of/from the full model state.
For a similar procedure to run in reverse mode,
the adjoint model requires, in addition to the model state, 
the adjoint model state,
i.e. all variables with derivative information
which are needed in an adjoint restart.
This adjoint dump \& restart is also termed 'divided adjoint' (DIVA).

For this to work in conjunction with automatic differentiation, 
an AD tool needs to perform the following tasks:
%
\begin{enumerate}
%
\item
%
identify an adjoint state, i.e. those sensitivities whose
accumulation is interrupted by a dump/restart and which influence
the outcome of the gradient.
Ideally, this state consists of 
%
\begin{itemize}
%
\item
the adjoint of the model state,
%
\item
the adjoint of other intermediate results (such as control variables,
cost function contributions, etc.)
%
\item
bookkeeping indices (such as loop indices, etc.)
%
\end{itemize}
%
\item
%
generate code for storing and reading adjoint state variables
%
\item
generate code for bookkeeping, i.e. maintaining a file
with index information
%
\item
generate a suitable adjoint loop to propagate adjoint values
for dump/restart with a minimum overhead of adjoint intermediate
values.
%
\end{enumerate}
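
As a rough illustration of the second task, the following stand-alone
Fortran sketch dumps an adjoint state vector to an unformatted file
and reads it back.
It is not the code that TAF generates: the routine names
{\tt snap\_write} and {\tt snap\_read}, the flat vector {\tt adstate},
and the record layout (vector length followed by the data) are
assumptions made for this example only.

{\footnotesize
\begin{verbatim}
      program snapdemo
c     Hypothetical demo, not TAF output: write an adjoint state
c     vector to file 'snapshot' and read it back.
      implicit none
      integer n
      parameter ( n = 3 )
      double precision ad(n), adnew(n)
      ad(1) = 1.d0
      ad(2) = 2.d0
      ad(3) = 3.d0
      call snap_write( n, ad )
      call snap_read( n, adnew )
      print *, 'restored adjoint state: ', adnew
      end

      subroutine snap_write( n, adstate )
c     Dump the vector length and the adjoint state (assumed layout).
      implicit none
      integer n, ios
      double precision adstate(n)
      open( unit=77, file='snapshot', status='unknown',
     &      form='unformatted', iostat=ios )
      if ( ios .ne. 0 ) stop 'snap_write: cannot open snapshot'
      write(77) n
      write(77) adstate
      close(77)
      return
      end

      subroutine snap_read( n, adstate )
c     Restore the adjoint state; refuse a file of the wrong size.
      implicit none
      integer n, nfile, ios
      double precision adstate(n)
      open( unit=77, file='snapshot', status='old',
     &      form='unformatted', iostat=ios )
      if ( ios .ne. 0 ) stop 'snap_read: cannot open snapshot'
      read(77) nfile
      if ( nfile .ne. n ) stop 'snap_read: size mismatch'
      read(77) adstate
      close(77)
      return
      end
\end{verbatim}
}

Storing the vector length ahead of the data lets the restart detect
a snapshot written by a differently configured run before any
adjoint values are overwritten.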

TAF (but not TAMC!)
generates adjoint code which performs the above specified
tasks. It is closely tied to the adjoint multi-level checkpointing.
The adjoint state is dumped (and restarted) at each step of the
outermost checkpointing level and adjoint integration is performed
over one outermost checkpointing interval.
Prior to the adjoint computations, a full forward sweep is performed to 
generate the outermost (forward state) tapes and to calculate
the cost function.
In the current implementation, the forward sweep is
immediately followed by the first adjoint leg.
Thus, in theory, the following steps are performed (automatically):
%
\begin{itemize}
%
\item {\bf 1st model call:} \\
This is the case if file {\tt costfinal} does {\it not} exist.
S/R {\tt mdthe\_main\_loop} is called.
%
\begin{enumerate}
%
\item
calculate forward trajectory and dump model state after each
outermost checkpointing interval to files {\tt tapelev3}
%
\item
calculate cost function {\tt fc} and write it to file
{\tt costfinal}
%
\end{enumerate}
%
\item{\bf 2nd and all remaining model calls:} \\
This is the case if file {\tt costfinal} {\it does} exist.
S/R {\tt adthe\_main\_loop} is called.
%
\begin{enumerate}
%
\item
(the forward run and the cost function call are avoided
since all values are known)
%
\begin{itemize}
%
\item
if 1st adjoint leg: \\
create index file {\tt divided.ctrl} which contains
info on current checkpointing index $ilev3$
%
\item
if $i$-th adjoint leg ($i > 1$): \\
adjoint picks up at $ilev3 = nlev3-i+1$ and runs to $nlev3 - i$
%
\end{itemize}
%
\item
perform adjoint leg from $nlev3-i+1$ to $nlev3 - i$
%
\item
dump adjoint state to file {\tt snapshot}
%
\item
dump index file {\tt divided.ctrl} for next adjoint leg
%
\item
in the last step the gradient is written.
%
\end{enumerate}
%
\end{itemize}
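
The dispatch on the file {\tt costfinal} and the index range of each
adjoint leg can be sketched as follows.
This is an illustration only, not TAF output: the program name
{\tt divademo}, the variable {\tt ilegend}, and the value
{\tt nchklev\_3 = 3} are assumptions, while {\tt idivbeg} and
{\tt nchklev\_3} are borrowed from the generated code excerpt
in Recipe 1.

{\footnotesize
\begin{verbatim}
      program divademo
c     Hypothetical sketch, not TAF output.
      implicit none
      integer nchklev_3, ileg, idivbeg, ilegend
      parameter ( nchklev_3 = 3 )
      logical lexist

c     Which routine would this model call run?
      inquire( file='costfinal', exist=lexist )
      if ( .not. lexist ) then
        print *, 'costfinal missing: forward sweep'
      else
        print *, 'costfinal found: adjoint leg'
      endif

c     Index range covered by each adjoint leg i = 1, ..., nlev3
      do ileg = 1, nchklev_3
        idivbeg = nchklev_3 - ileg + 1
        ilegend = nchklev_3 - ileg
        print *, 'leg', ileg, ': ilev3 from', idivbeg, ' to', ilegend
      enddo
      end
\end{verbatim}
}

With $nlev3 = 3$ the loop covers three legs:
$3 \to 2$, $2 \to 1$ and $1 \to 0$.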

A few modifications were made to the forward code:
obvious ones, such as adding the corresponding TAF directive
at the appropriate place, and less obvious ones
(avoiding some re-initializations when in an intermediate
adjoint integration interval).

[For TAF-1.4.20 a number of hand-modifications were necessary
to compensate for TAF bugs.
Since we refer to TAF-1.4.26 onwards,
these modifications are not documented here].

\subsection{Recipe 1: single processor}


\begin{enumerate}

\item
In {\tt ECCO\_CPPOPTIONS.h} set:
%
{\footnotesize
\begin{verbatim}
      #define ALLOW_DIVIDED_ADJOINT
      #undef  ALLOW_DIVIDED_ADJOINT_MPI
\end{verbatim}
}

\item
Generate adjoint code. 
Using the TAF option '{\tt -pure}', two codes are generated:
%
\begin{itemize}
%
\item {\tt mdthe\_main\_loop}: \\
Is responsible for the forward trajectory, storing of outermost
checkpoint levels to file, computation of cost function, and
storing of cost function to file (1st step).
%
\item {\tt adthe\_main\_loop}: \\
Is responsible for computing one adjoint leg, dumping the adjoint state
to file, and writing index info to file (2nd and consecutive steps).

For adjoint code generation, add '{\tt -pure}' to the
TAF option list and run:
{\footnotesize
\begin{verbatim}
    make adtaf
\end{verbatim}
}
%

\item
One modification needs to be made to the adjoint code in
S/R {\tt adecco\_the\_main\_loop}:

There's a remaining issue with the '{\tt -pure}' option.
The '{\tt call ad...}' statements
that precede the read of the {\tt snapshot} file
should be executed only in the first adjoint leg, between
$nlev3$ and $nlev3-1$.
In the ecco-branch, the following lines should be 
bracketed by an {\tt if (idivbeg .GE. nchklev\_3) then} ... {\tt endif}, thus:

{\footnotesize
\begin{verbatim}

...
      xx_psbar_mean_dummy = onetape_xx_psbar_mean_dummy_3h(1)
      xx_tbar_mean_dummy = onetape_xx_tbar_mean_dummy_4h(1)
      xx_sbar_mean_dummy = onetape_xx_sbar_mean_dummy_5h(1)
      call barrier( mythid )
cAdd(
      if (idivbeg .GE. nchklev_3) then
cAdd)

      call adcost_final( mythid )
      call barrier( mythid )
      call adcost_sst( mythid )
      call adcost_ssh( mythid )
      call adcost_hyd( mythid )
      call adcost_averagesfields( mytime,myiter,mythid )
      call barrier( mythid )
cAdd(
      endif
cAdd)

C----------------------------------------------
C read snapshot
C----------------------------------------------
      if (idivbeg .lt. nchklev_3) then
        open(unit=77,file='snapshot',status='old',form='unformatted',
     $iostat=iers)
...

\end{verbatim}
}

For the main code, in all likelihood the block which needs to
be bracketed consists of {\tt adcost\_final} only.

\item
Now the code can be copied as usual to {\tt adjoint\_model.F}:
%
{\footnotesize
\begin{verbatim}
    make adchange
\end{verbatim}
}
and then compiled.

\end{itemize}

\end{enumerate}

\subsection{Recipe 2: multi processor (MPI)}


\begin{enumerate}

\item
On the machine where you execute the code
(most likely not the machine where you run TAF),
find the includes directory for MPI containing {\tt mpif.h}.
Either copy {\tt mpif.h} to the machine where you generate the
{\tt .f} files before TAF-ing, or add the path to the includes
directory to your genmake {\tt platform} setup.
TAF needs some MPI parameter settings 
(essentially {\tt mpi\_comm\_world} and {\tt mpi\_integer})
to incorporate them in the adjoint code.

\item
In {\tt ECCO\_CPPOPTIONS.h} set
%
{\footnotesize
\begin{verbatim}
      #define ALLOW_DIVIDED_ADJOINT
      #define ALLOW_DIVIDED_ADJOINT_MPI
\end{verbatim}
}
%
This will include the header file {\tt mpif.h}
into the top level routine for TAF.

\item
Add the TAF option '{\tt -mpi}' to the TAF argument list in the makefile.

\item
Follow the same steps as in {\bf Recipe 1} (previous section).

\end{enumerate}
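
Before running TAF with '{\tt -mpi}', it can help to confirm that
{\tt mpif.h} is actually found on the include path.
The small test program below is a sketch for that purpose and is not
part of MITgcm; the program name {\tt mpichk} is made up.
It only prints the two parameters mentioned in step 1 and makes no
MPI calls.

{\footnotesize
\begin{verbatim}
      program mpichk
c     Hypothetical check, not part of MITgcm: compiles only if
c     mpif.h is found on the include path.
      implicit none
      include 'mpif.h'
      print *, 'MPI_COMM_WORLD = ', MPI_COMM_WORLD
      print *, 'MPI_INTEGER    = ', MPI_INTEGER
      end
\end{verbatim}
}

If compilation fails at the {\tt include} line, the path to the MPI
includes directory is most likely still missing from the build setup.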

That's it. Good luck \& have fun.
