
\section{Adjoint dump \& restart -- divided adjoint (DIVA)
\label{sec_ad_diva}}
\begin{rawhtml}
<!-- CMIREDIR:sec_ad_diva: -->
\end{rawhtml}

{\it Patrick Heimbach \& Geoffrey Gebbie, MIT/EAPS, 07-Mar-2003}

{\bf
NOTE: \\
THIS SECTION IS SUBJECT TO CHANGE.
IT REFERS TO TAF-1.4.26.

Previous TAF versions are incomplete and have problems
with both TAF options '-pure' and '-mpi'.

The code which is tuned to the DIVA implementation
of this TAF version
is {\it checkpoint50} (MITgcm) and {\it ecco\_c50\_e28} (ECCO).
}

\subsection{Introduction}

Most high performance computing (HPC) centres require the use
of batch jobs for code execution.
Limits on the maximum available CPU time and memory may prevent
the adjoint code execution from fitting into any of the available
queues. This presents a serious limit for large-scale /
long-time adjoint ocean and climate model integrations.
The MITgcm itself allows the total model
integration to be split into sub-intervals through standard dump/restart
of/from the full model state.
For a similar procedure to run in reverse mode,
the adjoint model requires, in addition to the model state,
the adjoint model state,
i.e. all variables with derivative information
which are needed in an adjoint restart.
This adjoint dump \& restart is also termed 'divided adjoint' (DIVA).

For this to work in conjunction with automatic differentiation,
an AD tool needs to perform the following tasks:
%
\begin{enumerate}
%
\item
%
identify an adjoint state, i.e. those sensitivities whose
accumulation is interrupted by a dump/restart and which influence
the outcome of the gradient.
Ideally, this state consists of
%
\begin{itemize}
%
\item
the adjoint of the model state,
%
\item
the adjoint of other intermediate results (such as control variables,
cost function contributions, etc.),
%
\item
bookkeeping indices (such as loop indices, etc.);
%
\end{itemize}
%
\item
%
generate code for storing and reading adjoint state variables;
%
\item
generate code for bookkeeping, i.e. maintaining a file
with index information;
%
\item
generate a suitable adjoint loop to propagate adjoint values
for dump/restart with a minimum overhead of adjoint intermediate
values.
%
\end{enumerate}

TAF (but not TAMC!)
generates adjoint code which performs the tasks specified above.
It is closely tied to the adjoint multi-level checkpointing.
The adjoint state is dumped (and restarted) at each step of the
outermost checkpointing level, and the adjoint integration is performed
over one outermost checkpointing interval.
Prior to the adjoint computations, a full forward sweep is performed to
generate the outermost (forward state) tapes and to calculate
the cost function.
In the current implementation, the forward sweep is
immediately followed by the first adjoint leg.
Thus, in theory, the following steps are performed (automatically):
%
\begin{itemize}
%
\item {\bf 1st model call:} \\
This is the case if file {\tt costfinal} does {\it not} exist.
S/R {\tt mdthe\_main\_loop} is called.
%
\begin{enumerate}
%
\item
calculate forward trajectory and dump model state after each
outermost checkpointing interval to files {\tt tapelev3}
%
\item
calculate cost function {\tt fc} and write it to file
{\tt costfinal}
%
\end{enumerate}
%
\item {\bf 2nd and all remaining model calls:} \\
This is the case if file {\tt costfinal} {\it does} exist.
S/R {\tt adthe\_main\_loop} is called.
%
\begin{enumerate}
%
\item
(forward run and cost function call are avoided
since all values are known)
%
\begin{itemize}
%
\item
if 1st adjoint leg: \\
create index file {\tt divided.ctrl} which contains
info on current checkpointing index $ilev3$
%
\item
if not 1st, but $i$-th adjoint leg: \\
adjoint picks up at $ilev3 = nlev3-i+1$ and runs to $nlev3 - i$
%
\end{itemize}
%
\item
perform adjoint leg from $nlev3-i+1$ to $nlev3 - i$
%
\item
dump adjoint state to file {\tt snapshot}
%
\item
dump index file {\tt divided.ctrl} for next adjoint leg
%
\item
in the last step the gradient is written.
%
\end{enumerate}
%
\end{itemize}

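As a worked illustration of the index bookkeeping in the steps above,
assume a hypothetical run with $nlev3 = 3$ outermost checkpointing
intervals (the value 3 is chosen purely for illustration).
Substituting $i = 1, 2, 3$ into the leg bounds
$nlev3-i+1$ and $nlev3-i$ gives:
%
\[
\begin{array}{c|c|c}
\mbox{adjoint leg } i &
\mbox{picks up at } ilev3 = nlev3-i+1 &
\mbox{runs to } nlev3-i \\
\hline
1 & 3 & 2 \\
2 & 2 & 1 \\
3 & 1 & 0
\end{array}
\]
%
Each leg thus starts at the index where the previous leg stopped.
In this sketch the third leg reaches index 0, which would presumably
be the last step, i.e. the one in which the gradient is written.
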
A few modifications were made to the forward code:
obvious ones, such as adding the corresponding TAF directive
at the appropriate place, and less obvious ones
(avoiding some re-initializations when in an intermediate
adjoint integration interval).

[For TAF-1.4.20 a number of hand-modifications were necessary
to compensate for TAF bugs.
Since we refer to TAF-1.4.26 onwards,
these modifications are not documented here.]

\subsection{Recipe 1: single processor}


\begin{enumerate}

\item
In {\tt ECCO\_CPPOPTIONS.h} set:
%
{\footnotesize
\begin{verbatim}
#define ALLOW_DIVIDED_ADJOINT
#undef ALLOW_DIVIDED_ADJOINT_MPI
\end{verbatim}
}

\item
Generate adjoint code.
Using the TAF option '{\tt -pure}', two codes are generated:
%
\begin{itemize}
%
\item {\tt mdthe\_main\_loop}: \\
Is responsible for the forward trajectory, storing of outermost
checkpoint levels to file, computation of cost function, and
storing of cost function to file (1st step).
%
\item {\tt adthe\_main\_loop}: \\
Is responsible for computing one adjoint leg, dumping the adjoint state
to file, and writing index info to file (2nd and consecutive steps).
%
\end{itemize}

For adjoint code generation, add '{\tt -pure}' to the
TAF option list, then run e.g.
{\footnotesize
\begin{verbatim}
make adtaf
\end{verbatim}
}
%

\item
One modification needs to be made to the adjoint code in
S/R {\tt adecco\_the\_main\_loop}:

There is a remaining issue with the '{\tt -pure}' option.
The block of '{\tt call ad...}' statements
that precedes the read of the {\tt snapshot} file
should be executed only in the first adjoint leg, between
$nlev3$ and $nlev3-1$.
In the ecco-branch, the following lines should be
bracketed by an {\tt if (idivbeg .GE. nchklev\_3) then}, thus:

{\footnotesize
\begin{verbatim}

      ...
      xx_psbar_mean_dummy = onetape_xx_psbar_mean_dummy_3h(1)
      xx_tbar_mean_dummy = onetape_xx_tbar_mean_dummy_4h(1)
      xx_sbar_mean_dummy = onetape_xx_sbar_mean_dummy_5h(1)
      call barrier( mythid )
cAdd(
      if (idivbeg .GE. nchklev_3) then
cAdd)

      call adcost_final( mythid )
      call barrier( mythid )
      call adcost_sst( mythid )
      call adcost_ssh( mythid )
      call adcost_hyd( mythid )
      call adcost_averagesfields( mytime,myiter,mythid )
      call barrier( mythid )
cAdd(
      endif
cAdd)

C----------------------------------------------
C read snapshot
C----------------------------------------------
      if (idivbeg .lt. nchklev_3) then
      open(unit=77,file='snapshot',status='old',form='unformatted',
     $iostat=iers)
      ...

\end{verbatim}
}

For the main code, in all likelihood the block which needs to
be bracketed consists of {\tt adcost\_final} only.

\item
Now the code can be copied as usual to {\tt adjoint\_model.F}:
%
{\footnotesize
\begin{verbatim}
make adchange
\end{verbatim}
}
Then compile.

\end{enumerate}

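Read together, the two conditionals in the listing of Recipe 1 select
what happens at the start of an adjoint leg.
The following summary is only a sketch; it assumes (this is not stated
explicitly in the recipe) that {\tt idivbeg} holds the outermost
checkpoint index at which the current leg begins:
%
\[
\mbox{start of adjoint leg} \;\longrightarrow\;
\left\{
\begin{array}{ll}
\mbox{call the bracketed {\tt adcost\_...} routines} &
\mbox{if {\tt idivbeg} } \ge \mbox{ {\tt nchklev\_3}} \\
\mbox{read file {\tt snapshot}} &
\mbox{if {\tt idivbeg} } < \mbox{ {\tt nchklev\_3}}
\end{array}
\right.
\]
%
With the added bracket the two branches are mutually exclusive,
since the conditions $\ge$ and $<$ cannot hold at the same time.
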
\subsection{Recipe 2: multi processor (MPI)}


\begin{enumerate}

\item
On the machine where you execute the code
(most likely not the machine where you run TAF)
find the includes directory for MPI containing {\tt mpif.h}.
Either copy {\tt mpif.h} to the machine where you generate the
{\tt .f} files before TAF-ing, or add the path to the includes
directory to your genmake {\tt platform} setup.
TAF needs some MPI parameter settings
(essentially {\tt mpi\_comm\_world} and {\tt mpi\_integer})
to incorporate those in the adjoint code.

\item
In {\tt ECCO\_CPPOPTIONS.h} set
%
{\footnotesize
\begin{verbatim}
#define ALLOW_DIVIDED_ADJOINT
#define ALLOW_DIVIDED_ADJOINT_MPI
\end{verbatim}
}
%
This will include the header file {\tt mpif.h}
into the top level routine for TAF.

\item
Add the TAF option '{\tt -mpi}' to the TAF argument list in the makefile.

\item
Follow the same steps as in {\bf Recipe 1} (previous section).

\end{enumerate}

That's it. Good luck \& have fun.
