% $Header$

In this chapter we describe the software architecture and
implementation strategy for the MITgcm code. The first part of this
% ...
three-fold

\begin{itemize}
\item We wish to be able to study a very broad range
of interesting and challenging rotating fluid problems.
\item We wish the model code to be readily targeted to
a wide range of platforms.
\item On any given platform we would like to be
able to achieve performance comparable to an implementation
developed and specialized specifically for that platform.
\end{itemize}

These points are summarized in figure \ref{fig:mitgcm_architecture_goals}
% ...
of

\begin{enumerate}
\item A core set of numerical and support code. This is discussed in detail in
section \ref{sec:partII}.
\item A scheme for supporting optional ``pluggable'' {\bf packages} (containing
for example mixed-layer schemes, biogeochemical schemes, atmospheric physics).
These packages are used both to overlay alternate dynamics and to introduce
specialized physical content onto the core numerical code. An overview of
the {\bf package} scheme is given at the start of part \ref{part:packages}.
\item A support framework called {\bf WRAPPER} (Wrappable Application Parallel
Programming Environment Resource), within which the core numerics and pluggable
packages operate.
\end{enumerate}

This chapter focuses on describing the {\bf WRAPPER} environment under which
% ...
starting from an example code and adapting it to suit a particular situation
will be all that is required.


\begin{figure}
\begin{center}
\resizebox{!}{2.5in}{\includegraphics{part4/mitgcm_goals.eps}}
\end{center}
\caption{
The MITgcm architecture is designed to allow simulation of a wide
range of physical problems on a wide range of hardware. The computational
resource requirements of the applications targeted range from around
$10^7$ bytes ($\approx 10$ megabytes) of memory to $10^{11}$ bytes
($\approx 100$ gigabytes). Arithmetic operation counts for the applications of
interest range from $10^{9}$ floating point operations to more than $10^{17}$
floating point operations.}
\label{fig:mitgcm_architecture_goals}
\end{figure}

\section{WRAPPER}
% ...
\ref{fig:fit_in_wrapper} which shows how the WRAPPER serves to insulate code
that fits within it from architectural differences between hardware platforms
and operating systems. This allows numerical code to be easily retargeted.


\begin{figure}
\begin{center}
\resizebox{!}{4.5in}{\includegraphics{part4/fit_in_wrapper.eps}}
\end{center}
\caption{
Numerical code is written to fit within a software support
infrastructure called WRAPPER. The WRAPPER is portable and
can be specialized for a wide range of specific target hardware and
programming environments, without impacting numerical code that fits
within the WRAPPER. Codes that fit within the WRAPPER can generally be
made to run as fast on a particular platform as codes specially
optimized for that platform.}
\label{fig:fit_in_wrapper}
\end{figure}

\subsection{Target hardware}
% ...

\begin{figure}
\begin{center}
\resizebox{5in}{!}{
 \includegraphics{part4/domain_decomp.eps}
}
\end{center}
\caption{ The WRAPPER provides support for one and two dimensional
% ...

\begin{figure}
\begin{center}
\resizebox{5in}{!}{
 \includegraphics{part4/tiled-world.eps}
}
\end{center}
\caption{ A global grid subdivided into tiles.
% ...

\begin{figure}
\begin{center}
\resizebox{5in}{!}{
 \includegraphics{part4/comm-primm.eps}
}
\end{center}
\caption{Three performance critical parallel primitives are provided
% ...

\begin{figure}
\begin{center}
\resizebox{5in}{!}{
 \includegraphics{part4/tiling_detail.eps}
}
\end{center}
\caption{The tiling strategy that the WRAPPER supports allows tiles
% ...

\begin{figure}
\begin{center}
\resizebox{5in}{!}{
 \includegraphics{part4/size_h.eps}
}
\end{center}
\caption{ The three level domain decomposition hierarchy employed by the
% ...
by the application code. The startup calling sequence followed by the
WRAPPER is shown in figure \ref{fig:wrapper_startup}.

\begin{figure}
\begin{verbatim}

% ...
\end{enumerate}


\paragraph{Environment variables}
On most systems multi-threaded execution also requires the setting
of a special environment variable. On many machines this variable
is called PARALLEL and its value should be set to the number
of parallel threads required. Generally the help pages associated
with the multi-threaded compiler on a machine will explain
how to set the required environment variables for that machine.

\paragraph{Runtime input parameters}
Finally, the file {\em eedata} needs to be configured to indicate
the number of threads to be used in the x and y directions.
The variables {\em nTx} and {\em nTy} in this file are used to
specify the information required. The product of {\em nTx} and
{\em nTy} must be equal to the number of threads spawned, i.e.,
the setting of the environment variable PARALLEL.
The value of {\em nTx} must subdivide the number of sub-domains
in x ({\em nSx}) exactly. The value of {\em nTy} must subdivide the
number of sub-domains in y ({\em nSy}) exactly.
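
The three constraints on {\em nTx} and {\em nTy} stated above can be checked
mechanically before a run is launched. The following Python sketch is purely
illustrative and is not part of MITgcm or the WRAPPER: the function name
{\em check\_thread\_layout} and its argument names are hypothetical, and the
thread count is assumed to be whatever value was given to the PARALLEL
environment variable (or its platform-specific equivalent).

```python
def check_thread_layout(nSx, nSy, nTx, nTy, n_threads):
    """Return a list of violated constraints (empty if the layout is valid).

    Hypothetical helper, not part of MITgcm: it only encodes the rules
    stated in the text for nTx, nTy, nSx, nSy and the thread count.
    """
    if nTx < 1 or nTy < 1:
        return ["nTx and nTy must both be at least 1"]
    problems = []
    # Rule 1: the thread grid must account for every spawned thread.
    if nTx * nTy != n_threads:
        problems.append(
            f"nTx*nTy = {nTx * nTy} does not match {n_threads} threads")
    # Rule 2: threads in x must divide the sub-domains in x exactly.
    if nSx % nTx != 0:
        problems.append(f"nTx = {nTx} does not divide nSx = {nSx}")
    # Rule 3: threads in y must divide the sub-domains in y exactly.
    if nSy % nTy != 0:
        problems.append(f"nTy = {nTy} does not divide nSy = {nSy}")
    return problems
```

For instance, a domain with two sub-domains in y run with two threads passes
the check when called as {\em check\_thread\_layout(1, 2, 1, 2, 2)}, which
returns an empty list.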
|
|
|
|
An example of valid settings for the {\em eedata} file for a
domain with two subdomains in y and running with two threads is shown
below
% ...
\end{minipage}
} \\


The multiprocess startup of the MITgcm executable {\em mitgcmuv}
is controlled by the routines {\em EEBOOT\_MINIMAL()} and
{\em INI\_PROCS()}. The first routine performs basic steps required
% ...
output files {\bf STDOUT.0001} and {\bf STDERR.0001} etc. These files
are used for reporting status and configuration information and
for reporting error conditions on a process-by-process basis.
The {\em EEBOOT\_MINIMAL()} procedure also sets the variables
{\em myProcId} and {\em MPI\_COMM\_MODEL}.
These variables are related
to processor identification and are used later in the routine
% ...
The WRAPPER maintains internal information that is used for communication
operations and that can be customized for different platforms. This section
describes the information that is held and used.

\begin{enumerate}
\item {\bf Tile-tile connectivity information} For each tile the WRAPPER
sets a flag that sets the tile number to the north, south, east and
% ...
\end{minipage}
}

|
\item {\bf memsync flags}
As discussed in section \ref{sec:memory_consistency}, when using shared memory,
a low-level system function may be needed to force memory consistency.
% ...
because there has to be enough storage allocated for all tiles.
However, the technique can sometimes be a useful scheme for reducing memory
requirements in complex physical parameterisations.
\end{enumerate}

\begin{figure}
\begin{verbatim}
C--
C-- Parallel directives for MIPS Pro Fortran compiler
C--
C Parallel compiler directives for SGI with IRIX
C$PAR PARALLEL DO
C$PAR& CHUNK=1,MP_SCHEDTYPE=INTERLEAVE,
C$PAR& SHARE(nThreads),LOCAL(myThid,I)
C
      DO I=1,nThreads
        myThid = I

C-- Invoke nThreads instances of the numerical model
        CALL THE_MODEL_MAIN(myThid)

      ENDDO
\end{verbatim}
\caption{Prior to transferring control to
the procedure {\em THE\_MODEL\_MAIN()} the WRAPPER may use
MP directives to spawn multiple threads.
} \label{fig:mp_directives}
\end{figure}
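
For readers unfamiliar with compiler-specific MP directives, the thread
spawning pattern of figure \ref{fig:mp_directives} can be mimicked in a
language-neutral way. The Python sketch below is only an analogue and is not
WRAPPER code: {\em the\_model\_main} is a stand-in that records its thread
number, and {\em spawn\_model\_threads} and the {\em results} list are
hypothetical names introduced for the illustration. As in the Fortran loop,
each instance receives its own thread number, counted from 1.

```python
import threading


def the_model_main(my_thid, results):
    """Stand-in for THE_MODEL_MAIN: records which thread id it was given."""
    results[my_thid - 1] = f"model instance {my_thid} ran"


def spawn_model_threads(n_threads):
    """Start n_threads workers, numbered 1..n_threads as in the DO loop."""
    # One result slot per worker; each worker writes only its own slot.
    results = [None] * n_threads
    workers = []
    for i in range(1, n_threads + 1):
        # The thread id is passed as an argument, so every worker holds a
        # private copy, playing the role of LOCAL(myThid,I) in the directive.
        worker = threading.Thread(target=the_model_main, args=(i, results))
        workers.append(worker)
        worker.start()
    for worker in workers:
        worker.join()  # wait until every model instance has finished
    return results
```

Calling {\em spawn\_model\_threads(2)} starts two instances and returns once
both have completed.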


\subsubsection{Specializing the Communication Code}

The isolation of performance critical communication primitives and the