

\section{Adjoint dump \& restart -- divided adjoint (DIVA)
\label{sec_ad_diva}}
\begin{rawhtml}
<!-- CMIREDIR:sec_ad_diva: -->
\end{rawhtml}

{\it Patrick Heimbach \& Geoffrey Gebbie, MIT/EAPS, 07-Mar-2003}

{\bf 
NOTE: \\
THIS SECTION IS SUBJECT TO CHANGE.
IT REFERS TO TAF-1.4.26.

Previous TAF versions are incomplete and have problems 
with both TAF options '-pure' and '-mpi'.

The code which is tuned to the DIVA implementation 
of this TAF version
is {\it checkpoint50} (MITgcm) and {\it ecco\_c50\_e28} (ECCO).
}

\subsection{Introduction}

Most high performance computing (HPC) centres require the use
of batch jobs for code execution. 
Limits in maximum available CPU time and memory may prevent
the adjoint code execution from fitting into any of the available
queues. This presents a serious limit for large scale /
long time adjoint ocean and climate model integrations.
The MITgcm itself enables the split of the total model 
integration into sub-intervals through standard dump/restart 
of/from the full model state.
For a similar procedure to run in reverse mode,
the adjoint model requires, in addition to the model state, 
the adjoint model state,
i.e. all variables with derivative information
which are needed in an adjoint restart.
This adjoint dump \& restart is also termed `divided adjoint' (DIVA).

For this to work in conjunction with automatic differentiation, 
an AD tool needs to perform the following tasks:
%
\begin{enumerate}
%
\item
%
identify an adjoint state, i.e. those sensitivities whose
accumulation is interrupted by a dump/restart and which influence
the outcome of the gradient.
Ideally, this state consists of 
%
\begin{itemize}
%
\item
the adjoint of the model state,
%
\item
the adjoint of other intermediate results (such as control variables,
cost function contributions, etc.)
%
\item
bookkeeping indices (such as loop indices, etc.)
%
\end{itemize}
%
\item
%
generate code for storing and reading adjoint state variables
%
\item
generate code for bookkeeping, i.e. maintaining a file
with index information
%
\item
generate a suitable adjoint loop to propagate adjoint values
for dump/restart with a minimum overhead of adjoint intermediate
values.
%
\end{enumerate}
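
The notion of an adjoint state can be made concrete with a schematic
example. The notation below is introduced here for illustration only;
it is an assumption of this sketch and is not taken from TAF or the MITgcm code.
Assume the model steps as $x_n = M_n(x_{n-1}, u)$, $n = 1, \ldots, N$,
with control variables $u$, and that the cost function is
$J = \sum_{n=1}^{N} J_n(x_n)$.
The reverse sweep starts from $\lambda_N = \nabla_{x_N} J_N$ and
$g^{(N+1)} = 0$, and evaluates for $n = N, \ldots, 1$
\[
g^{(n)} = g^{(n+1)} +
\left( \frac{\partial M_n}{\partial u} \right)^{T} \lambda_n ,
\qquad
\lambda_{n-1} =
\left( \frac{\partial M_n}{\partial x_{n-1}} \right)^{T} \lambda_n
+ \nabla_{x_{n-1}} J_{n-1} ,
\]
with $J_0 \equiv 0$, so that $g^{(1)} = \nabla_u J$ is the gradient.
If the sweep is interrupted after step $n$ has been processed,
a restart needs exactly the pair $(\lambda_{n-1}, g^{(n)})$ and the index $n$:
the adjoint of the model state, the partially accumulated adjoint of the
controls, and the bookkeeping index.
Everything else can be recomputed from the forward tapes.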

TAF (but not TAMC!)
generates adjoint code which performs the tasks specified above.
It is closely tied to the adjoint multi-level checkpointing.
The adjoint state is dumped (and restarted) at each step of the
outermost checkpointing level, and adjoint integration is performed
over one outermost checkpointing interval.
Prior to the adjoint computations, a full forward sweep is performed to 
generate the outermost (forward state) tapes and to calculate
the cost function.
In the current implementation, the forward sweep is
immediately followed by the first adjoint leg.
Thus, in theory, the following steps are performed (automatically):
%
\begin{itemize}
%
\item {\bf 1st model call:} \\
This is the case if file {\tt costfinal} does {\it not} exist.
S/R {\tt mdthe\_main\_loop} is called.
%
\begin{enumerate}
%
\item
calculate forward trajectory and dump model state after each
outermost checkpointing interval to files {\tt tapelev3}
%
\item
calculate cost function {\tt fc} and write it to file
{\tt costfinal}
%
\end{enumerate}
%
\item {\bf 2nd and all remaining model calls:} \\
This is the case if file {\tt costfinal} {\it does} exist.
S/R {\tt adthe\_main\_loop} is called.
%
\begin{enumerate}
%
\item
(forward run and cost function call are avoided
since all values are known)
%
\begin{itemize}
%
\item
if 1st adjoint leg: \\
create index file {\tt divided.ctrl} which contains
info on current checkpointing index $ilev3$
%
\item
if $i$-th adjoint leg ($i > 1$): \\
adjoint picks up at $ilev3 = nlev3-i+1$ and runs to $nlev3 - i$
%
\end{itemize}
%
\item
perform adjoint leg from $nlev3-i+1$ to $nlev3 - i$
%
\item
dump adjoint state to file {\tt snapshot}
%
\item
dump index file {\tt divided.ctrl} for next adjoint leg
%
\item
in the last step the gradient is written.
%
\end{enumerate}
%
\end{itemize}
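
As an illustration of the sequence above, assume $nlev3 = 3$
outermost checkpointing intervals (a value chosen here purely as an example)
and that the idealized sequence is followed, with one adjoint leg per
model call until index 0 is reached.
The legs $i = 1, 2, 3$ then cover $ilev3 = nlev3-i+1 \rightarrow nlev3-i$:
%
\begin{center}
\begin{tabular}{cllp{6.5cm}}
call & {\tt costfinal} & S/R called & action \\
\hline
1 & absent  & {\tt mdthe\_main\_loop} &
forward sweep, write {\tt tapelev3} and {\tt costfinal} \\
2 & present & {\tt adthe\_main\_loop} &
leg $i=1$: $ilev3 = 3 \rightarrow 2$, create {\tt divided.ctrl},
dump {\tt snapshot} \\
3 & present & {\tt adthe\_main\_loop} &
leg $i=2$: $ilev3 = 2 \rightarrow 1$, dump {\tt snapshot},
update {\tt divided.ctrl} \\
4 & present & {\tt adthe\_main\_loop} &
leg $i=3$: $ilev3 = 1 \rightarrow 0$, write gradient \\
\end{tabular}
\end{center}
%
Under these assumptions a complete gradient calculation takes
$nlev3 + 1 = 4$ model calls.
If the first adjoint leg runs in the same call as the forward sweep,
as stated above for the current implementation,
calls 1 and 2 merge and the count is reduced by one.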

A few modifications were made to the forward code:
obvious ones, such as adding the corresponding TAF directive
at the appropriate place, and less obvious ones
(avoiding some re-initializations when in an intermediate
adjoint integration interval).

[For TAF-1.4.20 a number of hand-modifications were necessary
to compensate for TAF bugs.
Since we refer to TAF-1.4.26 onwards,
these modifications are not documented here].

\subsection{Recipe 1: single processor}


\begin{enumerate}

\item
In {\tt ECCO\_CPPOPTIONS.h} set:
%
{\footnotesize
\begin{verbatim}
      #define ALLOW_DIVIDED_ADJOINT
      #undef  ALLOW_DIVIDED_ADJOINT_MPI
\end{verbatim}
}

\item
Generate adjoint code. 
Using the TAF option '{\tt -pure}', two codes are generated:
%
\begin{itemize}
%
\item {\tt mdthe\_main\_loop}: \\
Is responsible for the forward trajectory, storing of outermost
checkpoint levels to file, computation of cost function, and
storing of cost function to file (1st step).
%
\item {\tt adthe\_main\_loop}: \\
Is responsible for computing one adjoint leg, dumping the adjoint state
to file, and writing index info to file (2nd and consecutive steps).
%
\end{itemize}
%
To generate the adjoint code, add '{\tt -pure}' to the
TAF option list and run:
{\footnotesize
\begin{verbatim}
    make adtaf
\end{verbatim}
}
%

\item
One modification needs to be made to the adjoint code in
S/R {\tt adecco\_the\_main\_loop}:

There is a remaining issue with the '{\tt -pure}' option.
The '{\tt call ad...}' statements that immediately precede
the read of the {\tt snapshot} file
should be executed only in the first adjoint leg, between
$nlev3$ and $nlev3-1$.
In the ecco-branch, the following lines should be 
bracketed by an {\tt if (idivbeg .GE. nchklev\_3) then}, thus:

{\footnotesize
\begin{verbatim}

...
      xx_psbar_mean_dummy = onetape_xx_psbar_mean_dummy_3h(1)
      xx_tbar_mean_dummy = onetape_xx_tbar_mean_dummy_4h(1)
      xx_sbar_mean_dummy = onetape_xx_sbar_mean_dummy_5h(1)
      call barrier( mythid )
cAdd(
      if (idivbeg .GE. nchklev_3) then
cAdd)

      call adcost_final( mythid )
      call barrier( mythid )
      call adcost_sst( mythid )
      call adcost_ssh( mythid )
      call adcost_hyd( mythid )
      call adcost_averagesfields( mytime,myiter,mythid )
      call barrier( mythid )
cAdd(
      endif
cAdd)

C----------------------------------------------
C read snapshot
C----------------------------------------------
      if (idivbeg .lt. nchklev_3) then
        open(unit=77,file='snapshot',status='old',form='unformatted',
     $iostat=iers)
...

\end{verbatim}
}

For the main code, in all likelihood the block which needs to
be bracketed consists of {\tt adcost\_final} only.

\item
Now the code can be copied as usual to {\tt adjoint\_model.F}
with
%
{\footnotesize
\begin{verbatim}
    make adchange
\end{verbatim}
}
%
and then be compiled.

\end{enumerate}

\subsection{Recipe 2: multi processor (MPI)}


\begin{enumerate}

\item
On the machine where you execute the code
(most likely not the machine where you run TAF),
find the include directory for MPI containing {\tt mpif.h}.
Either copy {\tt mpif.h} to the machine where you generate the
{\tt .f} files before TAF-ing, or add the path to the include
directory to your genmake {\tt platform} setup.
TAF needs some MPI parameter settings 
(essentially {\tt mpi\_comm\_world} and {\tt mpi\_integer})
to incorporate them in the adjoint code.

\item
In {\tt ECCO\_CPPOPTIONS.h} set
%
{\footnotesize
\begin{verbatim}
      #define ALLOW_DIVIDED_ADJOINT
      #define ALLOW_DIVIDED_ADJOINT_MPI
\end{verbatim}
}
%
This will include the header file {\tt mpif.h}
into the top level routine for TAF.

\item
Add the TAF option '{\tt -mpi}' to the TAF argument list in the makefile.

\item
Follow the same steps as in {\bf Recipe 1} (previous section).

\end{enumerate}

That's it. Good luck \& have fun.
