% $Header$

In this chapter we describe the software architecture and
implementation strategy for the MITgcm code. The first part of this
% [...]
three-fold

\begin{itemize}
\item We wish to be able to study a very broad range
of interesting and challenging rotating fluid problems.
\item We wish the model code to be readily targeted to
a wide range of platforms.
\item On any given platform we would like to be
able to achieve performance comparable to an implementation
developed and specialized for that platform.
\end{itemize}

These points are summarized in figure \ref{fig:mitgcm_architecture_goals}
% [...]
of

\begin{enumerate}
\item A core set of numerical and support code. This is discussed in detail in
section \ref{sec:partII}.
\item A scheme for supporting optional ``pluggable'' {\bf packages} (containing,
for example, mixed-layer schemes, biogeochemical schemes, atmospheric physics).
These packages are used both to overlay alternate dynamics and to introduce
specialized physical content onto the core numerical code. An overview of
the {\bf package} scheme is given at the start of part \ref{part:packages}.
\item A support framework called {\bf WRAPPER} (Wrappable Application Parallel
Programming Environment Resource), within which the core numerics and pluggable
packages operate.
\end{enumerate}

This chapter focuses on describing the {\bf WRAPPER} environment under which
% [...]
starting from an example code and adapting it to suit a particular situation
will be all that is required.


\begin{figure}
\begin{center}
\resizebox{!}{2.5in}{\includegraphics{part4/mitgcm_goals.eps}}
\end{center}
\caption{
The MITgcm architecture is designed to allow simulation of a wide
range of physical problems on a wide range of hardware. The computational
resource requirements of the applications targeted range from around
$10^7$ bytes ( $\approx 10$ megabytes ) of memory to $10^{11}$ bytes
( $\approx 100$ gigabytes). Arithmetic operation counts for the applications of
interest range from $10^{9}$ floating point operations to more than $10^{17}$
floating point operations.}
\label{fig:mitgcm_architecture_goals}
\end{figure}

\section{WRAPPER}
% [...]
\ref{fig:fit_in_wrapper} which shows how the WRAPPER serves to insulate code
that fits within it from architectural differences between hardware platforms
and operating systems. This allows numerical code to be easily retargeted.


\begin{figure}
\begin{center}
\resizebox{!}{4.5in}{\includegraphics{part4/fit_in_wrapper.eps}}
\end{center}
\caption{
Numerical code is written to fit within a software support
infrastructure called WRAPPER. The WRAPPER is portable and
can be specialized for a wide range of specific target hardware and
programming environments, without impacting numerical code that fits
within the WRAPPER. Codes that fit within the WRAPPER can generally be
made to run as fast on a particular platform as codes specially
optimized for that platform.}
\label{fig:fit_in_wrapper}
\end{figure}

\subsection{Target hardware}
% [...]

\begin{figure}
\begin{center}
\resizebox{5in}{!}{
\includegraphics{part4/domain_decomp.eps}
}
\end{center}
\caption{ The WRAPPER provides support for one and two dimensional
% [...]

\begin{figure}
\begin{center}
\resizebox{5in}{!}{
\includegraphics{part4/tiled-world.eps}
}
\end{center}
\caption{ A global grid subdivided into tiles.
% [...]

\begin{figure}
\begin{center}
\resizebox{5in}{!}{
\includegraphics{part4/comm-primm.eps}
}
\end{center}
\caption{Three performance critical parallel primitives are provided
% [...]

\begin{figure}
\begin{center}
\resizebox{5in}{!}{
\includegraphics{part4/tiling_detail.eps}
}
\end{center}
\caption{The tiling strategy that the WRAPPER supports allows tiles
% [...]

\begin{figure}
\begin{center}
\resizebox{5in}{!}{
\includegraphics{part4/size_h.eps}
}
\end{center}
\caption{ The three level domain decomposition hierarchy employed by the
% [...]
by the application code. The startup calling sequence followed by the
WRAPPER is shown in figure \ref{fig:wrapper_startup}.

\begin{figure}
\begin{verbatim}

% [...]
\end{enumerate}

An example of valid settings for the {\em eedata} file for a
domain with two subdomains in y and running with two threads is shown
below
% [...]
\end{minipage}
} \\


\paragraph{Environment variables}
On most systems multi-threaded execution also requires the setting
of a special environment variable. On many machines this variable
is called PARALLEL and its value should be set to the number
of parallel threads required. Generally the help pages associated
with the multi-threaded compiler on a machine will explain
how to set the required environment variables for that machine.

\paragraph{Runtime input parameters}
Finally the file {\em eedata} needs to be configured to indicate
the number of threads to be used in the x and y directions.
The variables {\em nTx} and {\em nTy} in this file are used to
specify the information required. The product of {\em nTx} and
{\em nTy} must be equal to the number of threads spawned, i.e.\
the setting of the environment variable PARALLEL.
The value of {\em nTx} must subdivide the number of sub-domains
in x ({\em nSx}) exactly. The value of {\em nTy} must subdivide the
number of sub-domains in y ({\em nSy}) exactly.
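
The three constraints on {\em nTx} and {\em nTy} can be checked before a run
with a few lines of Fortran. The following is a minimal sketch and not WRAPPER
code: the program name {\em CHKTHR} and the variable {\em nPar} (standing for
the value given to PARALLEL) are hypothetical, and the values are chosen to
match a domain with two sub-domains in y run with two threads, with
{\em nSx} = 1 assumed.

\begin{verbatim}
C     Hypothetical stand-alone check; not part of the WRAPPER source.
      PROGRAM CHKTHR
      IMPLICIT NONE
C     Example values only (assumptions): nSx, nSy are the sub-domain
C     counts, nTx, nTy the eedata thread counts, and nPar the value
C     given to the PARALLEL environment variable.
      INTEGER nSx, nSy, nTx, nTy, nPar
      LOGICAL ok
      PARAMETER ( nSx = 1, nSy = 2 )
      PARAMETER ( nTx = 1, nTy = 2 )
      PARAMETER ( nPar = 2 )
      ok = .TRUE.
C     The product nTx*nTy must equal the number of threads spawned
      IF ( nTx*nTy .NE. nPar ) THEN
        WRITE(*,*) 'nTx*nTy differs from PARALLEL'
        ok = .FALSE.
      ENDIF
C     nTx must divide nSx exactly
      IF ( MOD(nSx,nTx) .NE. 0 ) THEN
        WRITE(*,*) 'nTx does not divide nSx'
        ok = .FALSE.
      ENDIF
C     nTy must divide nSy exactly
      IF ( MOD(nSy,nTy) .NE. 0 ) THEN
        WRITE(*,*) 'nTy does not divide nSy'
        ok = .FALSE.
      ENDIF
      IF ( ok ) WRITE(*,*) 'thread settings are consistent'
      END
\end{verbatim}

With the values shown all three tests pass. Changing {\em nTy} to 3, for
example, would trip both the product test and the test against {\em nSy}.
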
The multiprocess startup of the MITgcm executable {\em mitgcmuv}
is controlled by the routines {\em EEBOOT\_MINIMAL()} and
{\em INI\_PROCS()}. The first routine performs basic steps required
% [...]
output files {\bf STDOUT.0001} and {\bf STDERR.0001} etc. These files
are used for reporting status and configuration information and
for reporting error conditions on a process-by-process basis.
The {\em EEBOOT\_MINIMAL()} procedure also sets the variables
{\em myProcId} and {\em MPI\_COMM\_MODEL}.
These variables are related
to processor identification and are used later in the routine
% [...]
The WRAPPER maintains internal information that is used for communication
operations and that can be customized for different platforms. This section
describes the information that is held and used.

\begin{enumerate}
\item {\bf Tile-tile connectivity information} For each tile the WRAPPER
sets a flag that records the tile number to the north, south, east and
% [...]
\end{minipage}
}

\item {\bf memsync flags}
As discussed in section \ref{sec:memory_consistency}, when using shared memory,
a low-level system function may be needed to force memory consistency.
% [...]
because there has to be enough storage allocated for all tiles.
However, the technique can sometimes be a useful scheme for reducing memory
requirements in complex physical parameterisations.
\end{enumerate}

\begin{figure}
\begin{verbatim}
C--
C-- Parallel directives for MIPS Pro Fortran compiler
C--
C Parallel compiler directives for SGI with IRIX
C$PAR PARALLEL DO
C$PAR& CHUNK=1,MP_SCHEDTYPE=INTERLEAVE,
C$PAR& SHARE(nThreads),LOCAL(myThid,I)
C
      DO I=1,nThreads
        myThid = I

C--     Invoke nThreads instances of the numerical model
        CALL THE_MODEL_MAIN(myThid)

      ENDDO
\end{verbatim}
\caption{Prior to transferring control to
the procedure {\em THE\_MODEL\_MAIN()} the WRAPPER may use
MP directives to spawn multiple threads.
} \label{fig:mp_directives}
\end{figure}
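
On a compiler that supports OpenMP, a loop of the kind shown in
figure \ref{fig:mp_directives} could be written with OpenMP directives
instead. The following is a minimal sketch under that assumption and is not
taken from the WRAPPER source: the program name {\em SPAWN} and the routine
{\em MODEL\_STUB}, which stands in for {\em THE\_MODEL\_MAIN()}, are
hypothetical. The clause {\tt SCHEDULE(STATIC,1)} hands out iterations one at
a time in round-robin order, so each iteration runs on its own thread only if
the OpenMP runtime supplies at least {\em nThreads} threads (for example
through the {\tt OMP\_NUM\_THREADS} environment variable).

\begin{verbatim}
C     Hypothetical OpenMP analogue of the SGI directives; a sketch
C     only. MODEL_STUB stands in for THE_MODEL_MAIN.
      PROGRAM SPAWN
      IMPLICIT NONE
      INTEGER nThreads, myThid, I
C     Example thread count (assumption)
      nThreads = 2
C$OMP PARALLEL DO SCHEDULE(STATIC,1)
C$OMP&  SHARED(nThreads), PRIVATE(myThid,I)
      DO I=1,nThreads
        myThid = I
C--     Each iteration plays the role of one model thread
        CALL MODEL_STUB(myThid)
      ENDDO
C$OMP END PARALLEL DO
      END

      SUBROUTINE MODEL_STUB( myThid )
      IMPLICIT NONE
      INTEGER myThid
C     Placeholder for the numerical model entry point
      WRITE(*,*) 'model instance started for thread', myThid
      RETURN
      END
\end{verbatim}

Compiled without OpenMP support the directives are treated as comments and the
loop simply runs its iterations one after another.
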


\subsubsection{Specializing the Communication Code}

The isolation of performance critical communication primitives and the