\section{Adjoint dump \& restart -- divided adjoint (DIVA)
\label{sec_ad_diva}}
\begin{rawhtml}
<!-- CMIREDIR:sec_ad_diva: -->
\end{rawhtml}

{\it Patrick Heimbach \& Geoffrey Gebbie, MIT/EAPS, 07-Mar-2003}

{\bf
NOTE: \\
THIS SECTION IS SUBJECT TO CHANGE.
IT REFERS TO TAF-1.4.26.

Previous TAF versions are incomplete and have problems
with both TAF options '-pure' and '-mpi'.

The code which is tuned to the DIVA implementation
of this TAF version
is {\it checkpoint50} (MITgcm) and {\it ecco\_c50\_e28} (ECCO).
}

\subsection{Introduction}

Most high performance computing (HPC) centres require the use
of batch jobs for code execution.
Limits on the maximum available CPU time and memory may prevent
the adjoint code execution from fitting into any of the available
queues. This presents a serious limit for large-scale,
long-time adjoint ocean and climate model integrations.
The MITgcm itself enables the split of the total model
integration into sub-intervals through standard dump/restart
of/from the full model state.
For a similar procedure to run in reverse mode,
the adjoint model requires, in addition to the model state,
the adjoint model state,
i.e. all variables with derivative information
which are needed in an adjoint restart.
This adjoint dump \& restart is also termed 'divided adjoint' (DIVA).

For this to work in conjunction with automatic differentiation,
an AD tool needs to perform the following tasks:
%
\begin{enumerate}
%
\item
%
identify an adjoint state, i.e. those sensitivities whose
accumulation is interrupted by a dump/restart and which influence
the outcome of the gradient.
Ideally, this state consists of
%
\begin{itemize}
%
\item
the adjoint of the model state,
%
\item
the adjoint of other intermediate results (such as control variables,
cost function contributions, etc.)
%
\item
bookkeeping indices (such as loop indices, etc.)
%
\end{itemize}
%
\item
%
generate code for storing and reading adjoint state variables
%
\item
generate code for bookkeeping, i.e. maintaining a file
with index information
%
\item
generate a suitable adjoint loop to propagate adjoint values
for dump/restart with a minimum overhead of adjoint intermediate
values.
%
\end{enumerate}
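
A schematic way to see what the adjoint state must carry across a
dump/restart is the following sketch.
The notation is an assumption introduced here for illustration and is
not taken from the code:
$x_k$ is the model state at the end of the $k$-th outermost
checkpointing interval, $\mathcal{M}_k$ is the model operator over that
interval with tangent linear operator $M_k$, and $J$ is a cost function
which, for simplicity, depends on the final state only.
%
\begin{eqnarray*}
x_k & = & \mathcal{M}_k(x_{k-1}), \qquad k = 1, \ldots, nlev3 \\
\lambda_{nlev3} & = & \left( \frac{\partial J}{\partial x_{nlev3}} \right)^T \\
\lambda_{k-1} & = & M_k^T \, \lambda_k, \qquad k = nlev3, \ldots, 1 \\
\nabla_{x_0} J & = & \lambda_0
\end{eqnarray*}
%
In this simplified picture, one adjoint leg evaluates a single step of
the recursion for $\lambda$: it needs $\lambda_k$ from the previous leg
and the forward state $x_{k-1}$ from which $M_k$ is recomputed, and it
hands $\lambda_{k-1}$ to the next leg.
In a realistic setup, cost function contributions and control variables
at intermediate times add further terms, which is why the adjoint state
listed above contains more than the adjoint of the model state.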

TAF (but not TAMC!)
generates adjoint code which performs the tasks specified above.
It is closely tied to the adjoint multi-level checkpointing.
The adjoint state is dumped (and restarted) at each step of the
outermost checkpointing level, and each adjoint integration covers
one outermost checkpointing interval.
Prior to the adjoint computations, a full forward sweep is performed to
generate the outermost (forward state) tapes and to calculate
the cost function.
In the current implementation, the forward sweep is
immediately followed by the first adjoint leg.
Thus, in theory, the following steps are performed (automatically):
%
\begin{itemize}
%
\item {\bf 1st model call:} \\
This is the case if file {\tt costfinal} does {\it not} exist.
S/R {\tt mdthe\_main\_loop} is called.
%
\begin{enumerate}
%
\item
calculate forward trajectory and dump model state after each
outermost checkpointing interval to files {\tt tapelev3}
%
\item
calculate cost function {\tt fc} and write it to file
{\tt costfinal}
%
\end{enumerate}
%
\item {\bf 2nd and all remaining model calls:} \\
This is the case if file {\tt costfinal} {\it does} exist.
S/R {\tt adthe\_main\_loop} is called.
%
\begin{enumerate}
%
\item
(forward run and cost function call are avoided
since all values are known)
%
\begin{itemize}
%
\item
if 1st adjoint leg: \\
create index file {\tt divided.ctrl} which contains
info on current checkpointing index $ilev3$
%
\item
if $i$-th adjoint leg ($i > 1$): \\
adjoint picks up at $ilev3 = nlev3-i+1$ and runs to $nlev3 - i$
%
\end{itemize}
%
\item
perform adjoint leg from $nlev3-i+1$ to $nlev3 - i$
%
\item
dump adjoint state to file {\tt snapshot}
%
\item
dump index file {\tt divided.ctrl} for next adjoint leg
%
\item
in the last step the gradient is written.
%
\end{enumerate}
%
\end{itemize}
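
As an illustration of the call sequence above, consider a hypothetical
setup with $nlev3 = 3$ outermost checkpointing intervals
(the value is chosen for illustration only).
Following the idealized scheme, the model would be called
$nlev3 + 1 = 4$ times:
%
\begin{center}
\begin{tabular}{clcl}
call & routine & leg $i$ & work done \\
\hline
1 & {\tt mdthe\_main\_loop} & -- & forward sweep, writes {\tt tapelev3} and {\tt costfinal} \\
2 & {\tt adthe\_main\_loop} & 1 & $ilev3$: $3 \rightarrow 2$ \\
3 & {\tt adthe\_main\_loop} & 2 & $ilev3$: $2 \rightarrow 1$ \\
4 & {\tt adthe\_main\_loop} & 3 & $ilev3$: $1 \rightarrow 0$, gradient written \\
\end{tabular}
\end{center}
%
Each leg $i$ starts at $ilev3 = nlev3-i+1$ and ends at $nlev3-i$,
so under this formula the last leg ($i = nlev3$) ends at $0$.
If the first adjoint leg directly follows the forward sweep within the
same call, as stated for the current implementation, the number of
separate calls would presumably be one fewer.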

A few modifications were made to the forward code:
obvious ones, such as adding the corresponding TAF directive
at the appropriate place, and less obvious ones
(avoiding some re-initializations when in an intermediate
adjoint integration interval).

[For TAF-1.4.20 a number of hand-modifications were necessary
to compensate for TAF bugs.
Since we refer to TAF-1.4.26 onwards,
these modifications are not documented here.]

\subsection{Recipe 1: single processor}

\begin{enumerate}

\item
In {\tt ECCO\_CPPOPTIONS.h} set:
%
{\footnotesize
\begin{verbatim}
#define ALLOW_DIVIDED_ADJOINT
#undef ALLOW_DIVIDED_ADJOINT_MPI
\end{verbatim}
}

\item
Generate adjoint code.
Using the TAF option '{\tt -pure}', two codes are generated:
%
\begin{itemize}
%
\item {\tt mdthe\_main\_loop}: \\
Is responsible for the forward trajectory, storing of outermost
checkpoint levels to file, computation of cost function, and
storing of cost function to file (1st step).
%
\item {\tt adthe\_main\_loop}: \\
Is responsible for computing one adjoint leg, dumping the adjoint state
to file and writing index info to file (2nd and consecutive steps).
%
\end{itemize}

For adjoint code generation, e.g. add '{\tt -pure}' to the
TAF option list and run
{\footnotesize
\begin{verbatim}
make adtaf
\end{verbatim}
}
%

\item
One modification needs to be made to the adjoint code in
S/R {\tt adecco\_the\_main\_loop}:

There is a remaining issue with the '{\tt -pure}' option.
The '{\tt call ad...}' statements that precede
the read of the {\tt snapshot} file
should be executed only in the first adjoint leg, between
$nlev3$ and $nlev3-1$.
In the ecco-branch, the following lines should be
bracketed by an {\tt if (idivbeg .GE. nchklev\_3) then}, thus:

{\footnotesize
\begin{verbatim}

...
      xx_psbar_mean_dummy = onetape_xx_psbar_mean_dummy_3h(1)
      xx_tbar_mean_dummy = onetape_xx_tbar_mean_dummy_4h(1)
      xx_sbar_mean_dummy = onetape_xx_sbar_mean_dummy_5h(1)
      call barrier( mythid )
cAdd(
      if (idivbeg .GE. nchklev_3) then
cAdd)

      call adcost_final( mythid )
      call barrier( mythid )
      call adcost_sst( mythid )
      call adcost_ssh( mythid )
      call adcost_hyd( mythid )
      call adcost_averagesfields( mytime,myiter,mythid )
      call barrier( mythid )
cAdd(
      endif
cAdd)

C----------------------------------------------
C read snapshot
C----------------------------------------------
      if (idivbeg .lt. nchklev_3) then
        open(unit=77,file='snapshot',status='old',form='unformatted',
     $iostat=iers)
...

\end{verbatim}
}

For the main code, in all likelihood the block which needs to
be bracketed consists of {\tt adcost\_final} only.

\item
Now the code can be copied as usual to {\tt adjoint\_model.F}
%
{\footnotesize
\begin{verbatim}
make adchange
\end{verbatim}
}
%
and then be compiled.

\end{enumerate}
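
The two conditions in the code fragment of the recipe above are
complementary, which can be summarized as follows.
The shorthand $b$ for {\tt idivbeg} and $n$ for {\tt nchklev\_3} is
introduced here for brevity only and is not part of the code;
the reading of the two branches is an interpretation of the fragment,
not a statement from the TAF documentation.
%
\[
\mbox{start of an adjoint leg:} \quad
\left\{
\begin{array}{ll}
\mbox{call the cost function adjoints ({\tt adcost\_final}, \ldots)}
 & \mbox{if } b \geq n \mbox{ (first leg)} \\
\mbox{read the adjoint state from file {\tt snapshot}}
 & \mbox{if } b < n \mbox{ (all later legs)}
\end{array}
\right.
\]
%
Every leg therefore takes exactly one of the two branches:
the first leg initializes the adjoint variables from the cost function,
and every later leg restores them from the previous leg's dump
instead of re-applying the cost function adjoints.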

\subsection{Recipe 2: multi processor (MPI)}

\begin{enumerate}

\item
On the machine where you execute the code
(most likely not the machine where you run TAF)
find the include directory for MPI containing {\tt mpif.h}.
Either copy {\tt mpif.h} to the machine where you generate the
{\tt .f} files before TAF-ing, or add the path to the include
directory to your genmake {\tt platform} setup.
TAF needs some MPI parameter settings
(essentially {\tt mpi\_comm\_world} and {\tt mpi\_integer})
to incorporate those in the adjoint code.

\item
In {\tt ECCO\_CPPOPTIONS.h} set
%
{\footnotesize
\begin{verbatim}
#define ALLOW_DIVIDED_ADJOINT
#define ALLOW_DIVIDED_ADJOINT_MPI
\end{verbatim}
}
%
This will include the header file {\tt mpif.h}
into the top level routine for TAF.

\item
Add the TAF option '{\tt -mpi}' to the TAF argument list in the makefile.

\item
Follow the same steps as in {\bf Recipe 1} (previous section).

\end{enumerate}

That's it. Good luck \& have fun.
