\section{Adjoint dump \& restart -- divided adjoint (DIVA)
\label{sec_ad_diva}}

{\it Patrick Heimbach \& Geoffrey Gebbie, MIT/EAPS, 07-Mar-2003}

{\bf
NOTE: \\
THIS SECTION IS SUBJECT TO CHANGE.
IT REFERS TO TAF-1.4.26.

Previous TAF versions are incomplete and have problems
with both TAF options '-pure' and '-mpi'.

The code which is tuned to the DIVA implementation
of this TAF version
is {\it checkpoint50} (MITgcm) and {\it ecco\_c50\_e28} (ECCO).
}

\subsection{Introduction}

Most high performance computing (HPC) centres require the use
of batch jobs for code execution.
Limits on the maximum available CPU time and memory may prevent
the adjoint code execution from fitting into any of the available
queues. This presents a serious limitation for large-scale,
long-time adjoint ocean and climate model integrations.
The MITgcm itself enables the split of the total model
integration into sub-intervals through standard dump/restart
of/from the full model state.
For a similar procedure to run in reverse mode,
the adjoint model requires, in addition to the model state,
the adjoint model state,
i.e. all variables with derivative information
which are needed in an adjoint restart.
This adjoint dump \& restart is also termed 'divided adjoint' (DIVA).

For this to work in conjunction with automatic differentiation,
an AD tool needs to perform the following tasks:
%
\begin{enumerate}
%
\item
%
identify an adjoint state, i.e. those sensitivities whose
accumulation is interrupted by a dump/restart and which influence
the outcome of the gradient.
Ideally, this state consists of
%
\begin{itemize}
%
\item
the adjoint of the model state,
%
\item
the adjoint of other intermediate results (such as control variables,
cost function contributions, etc.),
%
\item
bookkeeping indices (such as loop indices, etc.);
%
\end{itemize}
%
\item
%
generate code for storing and reading adjoint state variables;
%
\item
generate code for bookkeeping, i.e. maintaining a file
with index information;
%
\item
generate a suitable adjoint loop to propagate adjoint values
for dump/restart with a minimum overhead of adjoint intermediate
values.
%
\end{enumerate}

TAF (but not TAMC!)
generates adjoint code which performs the above specified
tasks. This code is closely tied to the adjoint multi-level checkpointing.
The adjoint state is dumped (and restarted) at each step of the
outermost checkpointing level and adjoint integration is performed
over one outermost checkpointing interval.
Prior to the adjoint computations, a full forward sweep is performed to
generate the outermost (forward state) tapes and to calculate
the cost function.
In the current implementation, the forward sweep is
immediately followed by the first adjoint leg.
Thus, in theory, the following steps are performed (automatically):
%
\begin{itemize}
%
\item {\bf 1st model call:} \\
This is the case if file {\tt costfinal} does {\it not} exist.
S/R {\tt mdthe\_main\_loop} is called.
%
\begin{enumerate}
%
\item
calculate forward trajectory and dump model state after each
outermost checkpointing interval to files {\tt tapelev3}
%
\item
calculate cost function {\tt fc} and write it to file
{\tt costfinal}
%
\end{enumerate}
%
\item {\bf 2nd and all remaining model calls:} \\
This is the case if file {\tt costfinal} {\it does} exist.
S/R {\tt adthe\_main\_loop} is called.
%
\begin{enumerate}
%
\item
(the forward run and the cost function call are avoided
since all values are known)
%
\begin{itemize}
%
\item
if 1st adjoint leg: \\
create index file {\tt divided.ctrl} which contains
info on current checkpointing index $ilev3$
%
\item
if not the 1st but the $i$-th adjoint leg: \\
adjoint picks up at $ilev3 = nlev3-i+1$ and runs to $nlev3 - i$
%
\end{itemize}
%
\item
perform adjoint leg from $nlev3-i+1$ to $nlev3 - i$
%
\item
dump adjoint state to file {\tt snapshot}
%
\item
dump index file {\tt divided.ctrl} for next adjoint leg
%
\item
in the last step the gradient is written.
%
\end{enumerate}
%
\end{itemize}
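
The branching and leg bookkeeping described above can be sketched in a
few lines of shell. This is only an illustration under stated
assumptions, not part of the MITgcm or ECCO code: the function names
{\tt diva\_mode} and {\tt diva\_legs} are hypothetical, and the sketch
assumes that adjoint legs are numbered $i = 1, \ldots, nlev3$.
Only the file name {\tt costfinal} and the relation
$nlev3-i+1 \rightarrow nlev3-i$ are taken from the description above.

{\footnotesize
\begin{verbatim}
#!/bin/sh
# Hypothetical helper: report which branch a model call would take,
# based only on the presence of the file costfinal in directory $1
# (default: current directory).
diva_mode() {
    if [ -f "${1:-.}/costfinal" ]; then
        echo "adjoint"    # costfinal exists: adthe_main_loop branch
    else
        echo "forward"    # costfinal missing: mdthe_main_loop branch
    fi
}

# Hypothetical helper: list the outermost checkpointing interval
# covered by each adjoint leg. Assumes legs i = 1, ..., nlev3.
diva_legs() {
    nlev3=$1
    i=1
    while [ "$i" -le "$nlev3" ]; do
        echo "leg $i: $((nlev3 - i + 1)) -> $((nlev3 - i))"
        i=$((i + 1))
    done
}
\end{verbatim}
}

For instance, {\tt diva\_legs 3} lists three legs, the first running
from 3 to 2 and the last from 1 to 0.
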

A few modifications were made to the forward code:
obvious ones, such as adding the corresponding TAF directive
at the appropriate place, and less obvious ones
(avoiding some re-initializations when in an intermediate
adjoint integration interval).

[For TAF-1.4.20 a number of hand-modifications were necessary
to compensate for TAF bugs.
Since we refer to TAF-1.4.26 onwards,
these modifications are not documented here.]

\subsection{Recipe 1: single processor}


\begin{enumerate}

\item
In {\tt ECCO\_CPPOPTIONS.h} set:
%
{\footnotesize
\begin{verbatim}
#define ALLOW_DIVIDED_ADJOINT
#undef ALLOW_DIVIDED_ADJOINT_MPI
\end{verbatim}
}

\item
Generate adjoint code.
Using the TAF option '{\tt -pure}', two codes are generated:
%
\begin{itemize}
%
\item {\tt mdthe\_main\_loop}: \\
Is responsible for the forward trajectory, storing of outermost
checkpoint levels to file, computation of cost function, and
storing of cost function to file (1st step).
%
\item {\tt adthe\_main\_loop}: \\
Is responsible for computing one adjoint leg, dumping the adjoint state
to file and writing index info to file (2nd and consecutive steps).
%
\end{itemize}
%
For adjoint code generation, add e.g. '{\tt -pure}' to the
TAF option list and run
{\footnotesize
\begin{verbatim}
make adtaf
\end{verbatim}
}
%

\item
One modification needs to be made to the adjoint code in
S/R {\tt adecco\_the\_main\_loop}:

There's a remaining issue with the '{\tt -pure}' option.
The '{\tt call ad...}' statements
between '{\tt call ad...}' and the read of the {\tt snapshot} file
should be executed only in the first adjoint leg, between
$nlev3$ and $nlev3-1$.
In the ecco-branch, the following lines should be
bracketed by an {\tt if (idivbeg .GE. nchklev\_3) then}, thus:

{\footnotesize
\begin{verbatim}

...
      xx_psbar_mean_dummy = onetape_xx_psbar_mean_dummy_3h(1)
      xx_tbar_mean_dummy = onetape_xx_tbar_mean_dummy_4h(1)
      xx_sbar_mean_dummy = onetape_xx_sbar_mean_dummy_5h(1)
      call barrier( mythid )
cAdd(
      if (idivbeg .GE. nchklev_3) then
cAdd)

      call adcost_final( mythid )
      call barrier( mythid )
      call adcost_sst( mythid )
      call adcost_ssh( mythid )
      call adcost_hyd( mythid )
      call adcost_averagesfields( mytime,myiter,mythid )
      call barrier( mythid )
cAdd(
      endif
cAdd)

C----------------------------------------------
C read snapshot
C----------------------------------------------
      if (idivbeg .lt. nchklev_3) then
        open(unit=77,file='snapshot',status='old',form='unformatted',
     $iostat=iers)
...

\end{verbatim}
}

For the main code, in all likelihood the block which needs to
be bracketed consists of {\tt adcost\_final} only.

\item
Now the code can be copied as usual to {\tt adjoint\_model.F}:
%
{\footnotesize
\begin{verbatim}
make adchange
\end{verbatim}
}
%
and then be compiled.

\end{enumerate}
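
Before generating the adjoint code it can help to confirm that the two
CPP flags from the first step are set as intended. The following is a
minimal sketch, not part of the MITgcm or ECCO tools: the function name
{\tt cpp\_flag\_state} is hypothetical, and the sketch assumes that each
flag is set by a line beginning with {\tt \#define} or {\tt \#undef}.

{\footnotesize
\begin{verbatim}
#!/bin/sh
# Hypothetical helper: print how a CPP flag is set in a header file.
# Usage: cpp_flag_state FILE FLAG
# Prints "define", "undef", or "unset" (flag not mentioned).
# Assumes the flag is set by a line starting with "#define FLAG" or
# "#undef FLAG"; if both occur, "define" is reported.
cpp_flag_state() {
    file=$1
    flag=$2
    if grep -Eq "^#[[:space:]]*define[[:space:]]+${flag}([[:space:]]|\$)" \
        "$file"; then
        echo "define"
    elif grep -Eq "^#[[:space:]]*undef[[:space:]]+${flag}([[:space:]]|\$)" \
        "$file"; then
        echo "undef"
    else
        echo "unset"
    fi
}
\end{verbatim}
}

For the single processor recipe one would expect
{\tt ALLOW\_DIVIDED\_ADJOINT} to be reported as {\tt define} and
{\tt ALLOW\_DIVIDED\_ADJOINT\_MPI} as {\tt undef}.
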

\subsection{Recipe 2: multi processor (MPI)}


\begin{enumerate}

\item
On the machine where you execute the code
(most likely not the machine where you run TAF)
find the includes directory for MPI containing {\tt mpif.h}.
Either copy {\tt mpif.h} to the machine where you generate the
{\tt .f} files before TAF-ing, or add the path to the includes
directory to your genmake {\tt platform} setup.
TAF needs some MPI parameter settings
(essentially {\tt mpi\_comm\_world} and {\tt mpi\_integer})
to incorporate those in the adjoint code.

\item
In {\tt ECCO\_CPPOPTIONS.h} set
%
{\footnotesize
\begin{verbatim}
#define ALLOW_DIVIDED_ADJOINT
#define ALLOW_DIVIDED_ADJOINT_MPI
\end{verbatim}
}
%
This will include the header file {\tt mpif.h}
into the top level routine for TAF.

\item
Add the TAF option '{\tt -mpi}' to the TAF argument list in the makefile.

\item
Follow the same steps as in {\bf Recipe 1} (previous section).

\end{enumerate}
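
The search for {\tt mpif.h} in the first step of this recipe can be
scripted. The following is a minimal sketch: the function name
{\tt find\_mpif} is hypothetical, and the candidate directories must be
supplied by the caller, since MPI installation paths differ between
machines and none is assumed here.

{\footnotesize
\begin{verbatim}
#!/bin/sh
# Hypothetical helper: print the first of the given directories that
# contains mpif.h; return non-zero if none of them does.
find_mpif() {
    for dir in "$@"; do
        if [ -f "$dir/mpif.h" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}
\end{verbatim}
}

A call might look like
{\tt find\_mpif /usr/include /usr/local/include},
where the two directories are merely examples.
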

That's it. Good luck \& have fun.
