

\section{Adjoint dump \& restart -- divided adjoint (DIVA)
\label{sec_ad_diva}}

{\it Patrick Heimbach \& Geoffrey Gebbie, MIT/EAPS, 07-Mar-2003}

{\bf 
NOTE: \\
THIS SECTION IS SUBJECT TO CHANGE.
IT REFERS TO TAF-1.4.26.

Previous TAF versions are incomplete and have problems
with both TAF options `{\tt -pure}' and `{\tt -mpi}'.

The code versions tuned to the DIVA implementation
of this TAF version
are {\it checkpoint50} (MITgcm) and {\it ecco\_c50\_e28} (ECCO).
}

\subsection{Introduction}

Most high performance computing (HPC) centres require code to be
executed as batch jobs.
Limits on the maximum available CPU time and memory may prevent
an adjoint run from fitting into any of the available
queues. This is a serious restriction for large-scale or
long-time adjoint ocean and climate model integrations.
The MITgcm itself allows the total model
integration to be split into sub-intervals through standard dumping
and restarting of the full model state.
For a similar procedure to work in reverse mode,
the adjoint model requires, in addition to the model state,
the adjoint model state,
i.e. all variables carrying derivative information
that are needed for an adjoint restart.
This adjoint dump \& restart is also termed `divided adjoint' (DIVA).

For this to work in conjunction with automatic differentiation, 
an AD tool needs to perform the following tasks:
%
\begin{enumerate}
%
\item
%
identify an adjoint state, i.e. those sensitivities whose
accumulation is interrupted by a dump/restart and which influence
the outcome of the gradient.
Ideally, this state consists of 
%
\begin{itemize}
%
\item
the adjoint of the model state,
%
\item
the adjoint of other intermediate results (such as control variables,
cost function contributions, etc.)
%
\item
bookkeeping indices (such as loop indices, etc.)
%
\end{itemize}
%
\item
%
generate code for storing and reading adjoint state variables
%
\item
generate code for bookkeeping, i.e. maintaining a file
with index information
%
\item
generate a suitable adjoint loop to propagate adjoint values
across dump/restart with a minimum overhead of adjoint intermediate
values.
%
\end{enumerate}

TAF (but not TAMC!)
generates adjoint code that performs the tasks listed above.
The implementation is closely tied to the adjoint multi-level checkpointing:
the adjoint state is dumped (and restarted) at each step of the
outermost checkpointing level, and each adjoint integration leg
covers one outermost checkpointing interval.
Prior to the adjoint computations, a full forward sweep is performed to
generate the outermost (forward state) tapes and to calculate
the cost function.
In the current implementation, the forward sweep is
immediately followed by the first adjoint leg.
Thus, in theory, the following steps are performed (automatically):
%
\begin{itemize}
%
\item {\bf 1st model call:} \\
This is the case if the file {\tt costfinal} does {\it not} exist;
S/R {\tt mdthe\_main\_loop} is called.
%
\begin{enumerate}
%
\item
calculate the forward trajectory and dump the model state after each
outermost checkpointing interval to the files {\tt tapelev3}
%
\item
calculate the cost function {\tt fc} and write it to the file
{\tt costfinal}
%
\end{enumerate}
%
\item {\bf 2nd and all remaining model calls:} \\
This is the case if the file {\tt costfinal} {\it does} exist;
S/R {\tt adthe\_main\_loop} is called.
%
\begin{enumerate}
%
\item
(the forward run and the cost function call are skipped,
since all required values are already known)
%
\begin{itemize}
%
\item
if 1st adjoint leg: \\
create the index file {\tt divided.ctrl}, which contains
information on the current checkpointing index $ilev3$
%
\item
if $i$-th adjoint leg ($i > 1$): \\
the adjoint picks up at $ilev3 = nlev3-i+1$ and runs to $nlev3 - i$
%
\end{itemize}
%
\item
perform the adjoint leg from $nlev3-i+1$ to $nlev3 - i$
%
\item
dump the adjoint state to the file {\tt snapshot}
%
\item
dump the index file {\tt divided.ctrl} for the next adjoint leg
%
\item
in the last step, the gradient is written.
%
\end{enumerate}
%
\end{itemize}
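The decision logic above can be sketched as a few POSIX shell functions.
This is a minimal, hypothetical sketch that mirrors only what is stated here:
the presence of {\tt costfinal} selects between {\tt mdthe\_main\_loop} and
{\tt adthe\_main\_loop}, and the $i$-th adjoint leg runs from $nlev3-i+1$
back to $nlev3-i$, assuming the legs are numbered $i = 1, \ldots, nlev3$.
The function names are invented for illustration; they do not call the
model and do not parse {\tt divided.ctrl}, whose format is not documented here.

{\footnotesize
\begin{verbatim}
#!/bin/sh
# Hypothetical helpers mirroring the DIVA control flow described above.
# They do NOT run the model; they only reproduce the documented logic.

# Report which routine the next model call would execute, based
# solely on whether the file 'costfinal' exists in directory $1.
diva_next_phase() {
  dir=${1:-.}
  if [ -f "$dir/costfinal" ]; then
    echo "adthe_main_loop"   # 2nd and later calls: one adjoint leg
  else
    echo "mdthe_main_loop"   # 1st call: forward sweep + cost function
  fi
}

# Interval of adjoint leg i for a given nlev3 ($1 = nlev3, $2 = i):
# the leg starts at ilev3 = nlev3-i+1 and runs back to nlev3-i.
diva_leg_interval() {
  echo "$(($1 - $2 + 1)) -> $(($1 - $2))"
}

# Print the full schedule of adjoint legs i = 1..nlev3 ($1 = nlev3).
diva_schedule() {
  _i=1
  while [ "$_i" -le "$1" ]; do
    echo "leg $_i: $(diva_leg_interval "$1" "$_i")"
    _i=$((_i + 1))
  done
}
\end{verbatim}
}

For example, {\tt diva\_schedule 3} lists three legs, covering the
checkpoint intervals from 3 to 2, from 2 to 1, and from 1 to 0.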

A few modifications were made to the forward code:
obvious ones, such as adding the corresponding TAF directive
at the appropriate place, and less obvious ones
(avoiding some re-initializations when in an intermediate
adjoint integration interval).

[For TAF-1.4.20 a number of hand modifications were necessary
to compensate for TAF bugs.
Since this section refers to TAF-1.4.26 onwards,
these modifications are not documented here.]

\subsection{Recipe 1: single processor}


\begin{enumerate}

\item
In {\tt ECCO\_CPPOPTIONS.h} set:
%
{\footnotesize
\begin{verbatim}
      #define ALLOW_DIVIDED_ADJOINT
      #undef  ALLOW_DIVIDED_ADJOINT_MPI
\end{verbatim}
}

\item
Generate the adjoint code.
With the TAF option `{\tt -pure}', two routines are generated:
%
\begin{itemize}
%
\item {\tt mdthe\_main\_loop}: \\
computes the forward trajectory, stores the outermost
checkpoint levels to file, computes the cost function, and
writes the cost function to file (1st step).
%
\item {\tt adthe\_main\_loop}: \\
computes one adjoint leg, dumps the adjoint state
to file, and writes the index information to file
(2nd and subsequent steps).
%
\end{itemize}
%
To do so, add `{\tt -pure}' to the TAF option list
used for adjoint code generation, then run
{\footnotesize
\begin{verbatim}
    make adtaf
\end{verbatim}
}
%

\item
One modification must be made by hand to the adjoint code in
S/R {\tt adecco\_the\_main\_loop},
because of a remaining issue with the `{\tt -pure}' option:
the adjoint cost function calls ({\tt call adcost\_...})
that precede the read of the {\tt snapshot} file
should be executed only in the first adjoint leg, i.e. between
$nlev3$ and $nlev3-1$.
In the ecco-branch, the following lines should therefore be
bracketed by an {\tt if (idivbeg .GE. nchklev\_3) then} \ldots {\tt endif}
block, thus:

{\footnotesize
\begin{verbatim}

...
      xx_psbar_mean_dummy = onetape_xx_psbar_mean_dummy_3h(1)
      xx_tbar_mean_dummy = onetape_xx_tbar_mean_dummy_4h(1)
      xx_sbar_mean_dummy = onetape_xx_sbar_mean_dummy_5h(1)
      call barrier( mythid )
cAdd(
      if (idivbeg .GE. nchklev_3) then
cAdd)

      call adcost_final( mythid )
      call barrier( mythid )
      call adcost_sst( mythid )
      call adcost_ssh( mythid )
      call adcost_hyd( mythid )
      call adcost_averagesfields( mytime,myiter,mythid )
      call barrier( mythid )
cAdd(
      endif
cAdd)

C----------------------------------------------
C read snapshot
C----------------------------------------------
      if (idivbeg .lt. nchklev_3) then
        open(unit=77,file='snapshot',status='old',form='unformatted',
     $iostat=iers)
...

\end{verbatim}
}

For the main code, the block that needs to be bracketed
most likely consists of {\tt adcost\_final} only.

\item
Now the code can be copied as usual to {\tt adjoint\_model.F}
with
%
{\footnotesize
\begin{verbatim}
    make adchange
\end{verbatim}
}
%
and then compiled as usual.

\end{enumerate}
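Once compiled, the executable has to be run repeatedly in the same
directory, one phase per call. The following is a minimal sketch of such
a driver under stated assumptions: the executable path and the total
number of calls are parameters you supply (both hypothetical here), and
the number of calls is left open because it depends on whether the first
call already performs the first adjoint leg. In practice each call would
typically be submitted as a separate batch job rather than run in a loop.
Since the presence of {\tt costfinal} alone decides whether the forward
sweep is performed, stale {\tt costfinal}, {\tt divided.ctrl} and
{\tt snapshot} files from an earlier run presumably need to be removed
before a new gradient computation; the hypothetical {\tt diva\_clean}
does this.

{\footnotesize
\begin{verbatim}
#!/bin/sh
# Hypothetical DIVA driver sketch; run it in the model run directory.

# Remove files left over from a previous DIVA run (assumption: these
# three files fully determine where the next call picks up).
diva_clean() {
  rm -f costfinal divided.ctrl snapshot
}

# Call the executable $1 a total of $2 times in the current directory.
# Each call performs one phase (forward sweep or one adjoint leg).
# Stops at the first failing call so no leg restarts from bad files.
diva_run() {
  exe=$1
  ncalls=$2
  if [ -f costfinal ]; then
    echo "warning: costfinal exists; first call is an adjoint leg" >&2
  fi
  call=1
  while [ "$call" -le "$ncalls" ]; do
    echo "DIVA call $call of $ncalls"
    if ! "$exe"; then
      echo "DIVA call $call failed; stopping" >&2
      return 1
    fi
    call=$((call + 1))
  done
}
\end{verbatim}
}

For example, {\tt diva\_clean; diva\_run ./EXE 5}, with {\tt EXE}
replaced by your executable, would perform five consecutive calls.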

\subsection{Recipe 2: multi processor (MPI)}


\begin{enumerate}

\item
On the machine where you execute the code
(most likely not the machine where you run TAF),
find the MPI include directory containing {\tt mpif.h}.
Either copy {\tt mpif.h} to the machine where you generate the
{\tt .f} files before running TAF, or add the path of the include
directory to your genmake {\tt platform} setup.
TAF needs some MPI parameter settings from this file
(essentially {\tt mpi\_comm\_world} and {\tt mpi\_integer})
in order to incorporate them into the adjoint code.

\item
In {\tt ECCO\_CPPOPTIONS.h} set
%
{\footnotesize
\begin{verbatim}
      #define ALLOW_DIVIDED_ADJOINT
      #define ALLOW_DIVIDED_ADJOINT_MPI
\end{verbatim}
}
%
This will include the header file {\tt mpif.h}
into the top level routine for TAF.

\item
Add the TAF option `{\tt -mpi}' to the TAF argument list in the makefile.

\item
Follow the same steps as in {\bf Recipe 1} (previous section).

\end{enumerate}
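Locating {\tt mpif.h} in the first step can be scripted. The sketch
below only searches a list of candidate directories passed as arguments,
since MPI installation paths differ between machines; no particular path
is prescribed by MITgcm or TAF, and the function name is hypothetical.

{\footnotesize
\begin{verbatim}
#!/bin/sh
# Print the first directory (among the arguments) containing mpif.h.
# Returns non-zero and reports the searched list if none contains it.
find_mpif_dir() {
  for d in "$@"; do
    if [ -f "$d/mpif.h" ]; then
      echo "$d"
      return 0
    fi
  done
  echo "mpif.h not found in: $*" >&2
  return 1
}
\end{verbatim}
}

The printed directory can then be used either to copy {\tt mpif.h}
to the machine running TAF or as the include path in the genmake
{\tt platform} setup.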

That's it. Good luck \& have fun.
