\section{Adjoint dump \& restart -- divided adjoint (DIVA)
\label{sec_ad_diva}}
\begin{rawhtml}
<!-- CMIREDIR:sec_ad_diva: -->
\end{rawhtml}

{\it Patrick Heimbach \& Geoffrey Gebbie, MIT/EAPS, 07-Mar-2003}

{\bf
NOTE: \\
THIS SECTION IS SUBJECT TO CHANGE.
IT REFERS TO TAF-1.4.26.

Previous TAF versions are incomplete and have problems
with both TAF options '-pure' and '-mpi'.

The code which is tuned to the DIVA implementation
of this TAF version
is {\it checkpoint50} (MITgcm) and {\it ecco\_c50\_e28} (ECCO).
}

\subsection{Introduction}

Most high performance computing (HPC) centres require the use
of batch jobs for code execution.
Limits on the maximum available CPU time and memory may prevent
the adjoint code execution from fitting into any of the available
queues. This is a serious limitation for large-scale,
long-time adjoint ocean and climate model integrations.
The MITgcm itself allows the total model
integration to be split into sub-intervals through standard dump/restart
of/from the full model state.
For a similar procedure to run in reverse mode,
the adjoint model requires, in addition to the model state,
the adjoint model state,
i.e. all variables with derivative information
which are needed in an adjoint restart.
This adjoint dump \& restart is also termed `divided adjoint' (DIVA).

For this to work in conjunction with automatic differentiation,
an AD tool needs to perform the following tasks:
%
\begin{enumerate}
%
\item
%
identify an adjoint state, i.e. those sensitivities whose
accumulation is interrupted by a dump/restart and which influence
the outcome of the gradient.
Ideally, this state consists of
%
\begin{itemize}
%
\item
the adjoint of the model state,
%
\item
the adjoint of other intermediate results (such as control variables,
cost function contributions, etc.),
%
\item
bookkeeping indices (such as loop indices, etc.);
%
\end{itemize}
%
\item
%
generate code for storing and reading adjoint state variables;
%
\item
generate code for bookkeeping, i.e. maintaining a file
with index information;
%
\item
generate a suitable adjoint loop to propagate adjoint values
for dump/restart with a minimum overhead of adjoint intermediate
values.
%
\end{enumerate}
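
The notion of an adjoint state and its dump/restart can be made concrete
on a toy problem. The following Python sketch is purely illustrative and
is not part of MITgcm or TAF: the scalar model {\tt step}, the cost
$J = x_N^2/2$, and all names ({\tt forward\_sweep}, {\tt adjoint\_leg},
{\tt nlegs}, {\tt nsub}, {\tt ileg}) are assumptions made for this example.
A dictionary stands in for the dump file; it holds the adjoint variable
together with one bookkeeping index.

{\footnotesize
\begin{verbatim}
# Toy model only; every name here is hypothetical, not MITgcm/TAF code.
DT = 0.1

def step(x):
    """One forward time step of a nonlinear scalar model."""
    return x - DT * x * x

def adstep(x, adx_next):
    """Adjoint of step(), evaluated at the forward state x."""
    return (1.0 - 2.0 * DT * x) * adx_next

def forward_sweep(x0, nlegs, nsub):
    """Run forward, taping the state at the start of each sub-interval."""
    tape, x = [], x0
    for _ in range(nlegs):
        tape.append(x)
        for _ in range(nsub):
            x = step(x)
    return tape, x

def adjoint_leg(x_start, adx, nsub):
    """Recompute one sub-interval from its taped start, then sweep back."""
    xs = [x_start]
    for _ in range(nsub - 1):
        xs.append(step(xs[-1]))
    for x in reversed(xs):
        adx = adstep(x, adx)
    return adx

def gradient_single_job(x0, nlegs, nsub):
    """Reference: the whole adjoint sweep in one go."""
    tape, x_final = forward_sweep(x0, nlegs, nsub)
    adx = x_final                      # dJ/dx_N for J = 0.5 * x_N**2
    for ileg in range(nlegs, 0, -1):
        adx = adjoint_leg(tape[ileg - 1], adx, nsub)
    return adx

def gradient_divided(x0, nlegs, nsub):
    """Same sweep, interrupted by a dump/restart after every leg."""
    tape, x_final = forward_sweep(x0, nlegs, nsub)
    dump = {"adx": x_final, "ileg": nlegs}   # adjoint state + index
    while dump["ileg"] > 0:                  # each pass = one batch job
        ileg = dump["ileg"]                  # "restart" from the dump
        adx = adjoint_leg(tape[ileg - 1], dump["adx"], nsub)
        dump = {"adx": adx, "ileg": ileg - 1}   # "dump" for the next job
    return dump["adx"]

g_single = gradient_single_job(1.0, nlegs=4, nsub=5)
g_divided = gradient_divided(1.0, nlegs=4, nsub=5)
print(g_single, g_divided)
\end{verbatim}
}

Under these assumptions both routines perform the same arithmetic in the
same order, so the divided gradient agrees with the single-job gradient.
The only extra ingredient is the dump written between two legs.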

TAF (but not TAMC!)
generates adjoint code which performs the tasks specified above.
The procedure is closely tied to the adjoint multi-level checkpointing.
The adjoint state is dumped (and restarted) at each step of the
outermost checkpointing level, and adjoint integration is performed
over one outermost checkpointing interval.
Prior to the adjoint computations, a full forward sweep is performed to
generate the outermost (forward state) tapes and to calculate
the cost function.
In the current implementation, the forward sweep is
immediately followed by the first adjoint leg.
Thus, in theory, the following steps are performed (automatically):
%
\begin{itemize}
%
\item {\bf 1st model call:} \\
This is the case if file {\tt costfinal} does {\it not} exist.
S/R {\tt mdthe\_main\_loop} is called.
%
\begin{enumerate}
%
\item
calculate forward trajectory and dump model state after each
outermost checkpointing interval to files {\tt tapelev3}
%
\item
calculate cost function {\tt fc} and write it to file
{\tt costfinal}
%
\end{enumerate}
%
\item {\bf 2nd and all remaining model calls:} \\
This is the case if file {\tt costfinal} {\it does} exist.
S/R {\tt adthe\_main\_loop} is called.
%
\begin{enumerate}
%
\item
(forward run and cost function call are avoided
since all values are known)
%
\begin{itemize}
%
\item
if 1st adjoint leg: \\
create index file {\tt divided.ctrl} which contains
info on current checkpointing index $ilev3$
%
\item
if $i$-th adjoint leg ($i > 1$): \\
adjoint picks up at $ilev3 = nlev3-i+1$ and runs to $nlev3 - i$
%
\end{itemize}
%
\item
perform adjoint leg from $nlev3-i+1$ to $nlev3 - i$
%
\item
dump adjoint state to file {\tt snapshot}
%
\item
dump index file {\tt divided.ctrl} for next adjoint leg
%
\item
in the last step the gradient is written.
%
\end{enumerate}
%
\end{itemize}
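
The call sequence listed above can be mimicked with a short Python sketch
that only tracks which files exist after each batch job. It is an
illustration under stated assumptions, not the code generated by TAF:
the function {\tt model\_call}, the tape naming {\tt tapelev3.N}, and the
assumption that {\tt divided.ctrl} holds a single integer are all
hypothetical; only the file names {\tt costfinal}, {\tt divided.ctrl}
and {\tt snapshot} are taken from the text.

{\footnotesize
\begin{verbatim}
import os
import tempfile

def model_call(rundir, nlev3):
    """Mimic one batch job and report what it would do."""
    costfinal = os.path.join(rundir, "costfinal")
    ctrl = os.path.join(rundir, "divided.ctrl")

    if not os.path.exists(costfinal):
        # 1st model call: forward sweep, outermost tapes, cost function
        for ilev3 in range(1, nlev3 + 1):
            # hypothetical naming: one empty placeholder per interval
            tape = os.path.join(rundir, "tapelev3.%d" % ilev3)
            open(tape, "w").close()
        with open(costfinal, "w") as f:
            f.write("fc placeholder\n")
        return "forward sweep"

    # 2nd and later calls: forward run skipped, one adjoint leg
    if os.path.exists(ctrl):
        with open(ctrl) as f:
            ilev3 = int(f.read())      # assumed content: one integer
    else:
        ilev3 = nlev3                  # 1st adjoint leg
    if ilev3 == 0:
        return "nothing left to do"

    # stand-in for the adjoint state dump
    open(os.path.join(rundir, "snapshot"), "w").close()
    with open(ctrl, "w") as f:
        f.write(str(ilev3 - 1))        # index for the next leg
    msg = "adjoint leg %d -> %d" % (ilev3, ilev3 - 1)
    if ilev3 == 1:
        msg += ", gradient written"
    return msg

# Usage: with nlev3 = 3, one forward job is followed by three adjoint jobs.
rundir = tempfile.mkdtemp()
for _ in range(4):
    print(model_call(rundir, 3))
\end{verbatim}
}

In this idealized sequence a run with $nlev3$ outermost intervals needs
$nlev3 + 1$ model calls; the current implementation described above
merges the forward sweep with the first adjoint leg.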

A few modifications were made to the forward code:
obvious ones, such as adding the corresponding TAF directive
at the appropriate place, and less obvious ones
(avoiding some re-initializations when in an intermediate
adjoint integration interval).

[For TAF-1.4.20 a number of hand-modifications were necessary
to compensate for TAF bugs.
Since we refer to TAF-1.4.26 onwards,
these modifications are not documented here.]

\subsection{Recipe 1: single processor}


\begin{enumerate}

\item
In {\tt ECCO\_CPPOPTIONS.h} set:
%
{\footnotesize
\begin{verbatim}
#define ALLOW_DIVIDED_ADJOINT
#undef ALLOW_DIVIDED_ADJOINT_MPI
\end{verbatim}
}

\item
Generate adjoint code.
Using the TAF option '{\tt -pure}', two codes are generated:
%
\begin{itemize}
%
\item {\tt mdthe\_main\_loop}: \\
Is responsible for the forward trajectory, storing of outermost
checkpoint levels to file, computation of cost function, and
storing of cost function to file (1st step).
%
\item {\tt adthe\_main\_loop}: \\
Is responsible for computing one adjoint leg, dumping the adjoint state
to file, and writing index info to file (2nd and consecutive steps).

For adjoint code generation, add '{\tt -pure}' to the
TAF option list and run, e.g.,
{\footnotesize
\begin{verbatim}
make adtaf
\end{verbatim}
}
%

\item
One modification needs to be made to the adjoint code in
S/R {\tt adecco\_the\_main\_loop}:

There is a remaining issue with the '{\tt -pure}' option.
The '{\tt call ad...}' statements
that precede the read of the {\tt snapshot} file
should be executed only in the first adjoint leg, between
$nlev3$ and $nlev3-1$.
In the ecco-branch, the following lines should be
bracketed by an {\tt if (idivbeg .GE. nchklev\_3) then ... endif}, thus:

{\footnotesize
\begin{verbatim}

...
      xx_psbar_mean_dummy = onetape_xx_psbar_mean_dummy_3h(1)
      xx_tbar_mean_dummy = onetape_xx_tbar_mean_dummy_4h(1)
      xx_sbar_mean_dummy = onetape_xx_sbar_mean_dummy_5h(1)
      call barrier( mythid )
cAdd(
      if (idivbeg .GE. nchklev_3) then
cAdd)

      call adcost_final( mythid )
      call barrier( mythid )
      call adcost_sst( mythid )
      call adcost_ssh( mythid )
      call adcost_hyd( mythid )
      call adcost_averagesfields( mytime,myiter,mythid )
      call barrier( mythid )
cAdd(
      endif
cAdd)

C----------------------------------------------
C read snapshot
C----------------------------------------------
      if (idivbeg .lt. nchklev_3) then
      open(unit=77,file='snapshot',status='old',form='unformatted',
     $iostat=iers)
...

\end{verbatim}
}

For the main code, in all likelihood the block which needs to
be bracketed consists of {\tt adcost\_final} only.

\item
Now the code can be copied as usual to {\tt adjoint\_model.F}:
%
{\footnotesize
\begin{verbatim}
make adchange
\end{verbatim}
}
and then compiled.

\end{itemize}

\end{enumerate}
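
The effect of the hand-added bracket in the excerpt above can be checked
with a few lines of Python. This is only a sketch of the branch logic:
the function {\tt blocks\_executed} is hypothetical, and it is assumed
here that {\tt idivbeg} equals {\tt nchklev\_3} in the first adjoint leg
and decreases by one in each subsequent leg.

{\footnotesize
\begin{verbatim}
def blocks_executed(idivbeg, nchklev_3):
    """Return which guarded blocks of the excerpt would run."""
    blocks = []
    if idivbeg >= nchklev_3:     # bracket added by hand (cAdd)
        blocks.append("adcost calls")
    if idivbeg < nchklev_3:      # guard already present in the excerpt
        blocks.append("read snapshot")
    return blocks

nchklev_3 = 3
# assumed order of leg start indices: nchklev_3, nchklev_3 - 1, ..., 1
for idivbeg in range(nchklev_3, 0, -1):
    print(idivbeg, blocks_executed(idivbeg, nchklev_3))
\end{verbatim}
}

The two conditions are mutually exclusive: the first leg starts the
adjoint from the cost function contributions, while every later leg
starts from the {\tt snapshot} file instead.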

\subsection{Recipe 2: multi processor (MPI)}


\begin{enumerate}

\item
On the machine where you execute the code
(most likely not the machine where you run TAF)
find the includes directory for MPI containing {\tt mpif.h}.
Either copy {\tt mpif.h} to the machine where you generate the
{\tt .f} files before TAF-ing, or add the path to the includes
directory to your genmake {\tt platform} setup.
TAF needs some MPI parameter settings
(essentially {\tt mpi\_comm\_world} and {\tt mpi\_integer})
to incorporate those in the adjoint code.

\item
In {\tt ECCO\_CPPOPTIONS.h} set
%
{\footnotesize
\begin{verbatim}
#define ALLOW_DIVIDED_ADJOINT
#define ALLOW_DIVIDED_ADJOINT_MPI
\end{verbatim}
}
%
This will include the header file {\tt mpif.h}
into the top level routine for TAF.

\item
Add the TAF option '{\tt -mpi}' to the TAF argument list in the makefile.

\item
Follow the same steps as in {\bf Recipe 1} (previous section).

\end{enumerate}

That's it. Good luck \& have fun.
