both the core numerics and the pluggable packages operate. The description
presented here is intended to be a detailed exposition and contains significant
background material, as well as advanced details on working with the WRAPPER.
The tutorial sections of this manual (see sections
\ref{sect:tutorials} and \ref{sect:tutorialIII})
contain more succinct, step-by-step instructions on running basic numerical
experiments, of various types, both sequentially and in parallel. For many
projects simply starting from an example code and adapting it to suit a
\resizebox{!}{4.5in}{\includegraphics{part4/fit_in_wrapper.eps}}
\end{center}
\caption{
Numerical code is written to fit within a software support
infrastructure called WRAPPER. The WRAPPER is portable and
can be specialized for a wide range of specific target hardware and
programming environments, without impacting numerical code that fits
(UMA) and non-uniform memory access (NUMA) designs. Significant work has also
been undertaken on x86 cluster systems, Alpha processor based clustered SMP
systems, and on cache-coherent NUMA (CC-NUMA) systems from Silicon Graphics.
The MITgcm code, operating within the WRAPPER, is also routinely used on
large scale MPP systems (for example T3E systems and IBM SP systems). In all
cases numerical code, operating within the WRAPPER, performs and scales very
competitively with equivalent numerical code that has been modified to contain
computation is performed concurrently over as many processes and threads
as there are physical processors available to compute.

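The following Python sketch shows one way the tiles held by a process could be shared out among threads. Each thread then loops over its own contiguous range of {\em bi} and {\em bj}. The sketch rests on several assumptions:
{\em nSx} and {\em nSy} are the tiles per process in x and y;
{\em nTx} and {\em nTy} are the threads in x and y;
tile counts divide evenly by thread counts;
thread indices are 0-based.
The helper names {\em tile\_range} and {\em tiles\_for\_thread} are hypothetical and are not taken from the MITgcm source.

```python
def tile_range(n_tiles, n_threads, thread):
    """Return the 1-based (lo, hi) tile indices handled by `thread`.

    Assumption of this sketch: n_tiles is an exact multiple of
    n_threads, so every thread gets the same number of contiguous
    tiles. Thread indices are 0-based.
    """
    if n_tiles % n_threads != 0:
        raise ValueError("tile count must be a multiple of thread count")
    per_thread = n_tiles // n_threads
    lo = thread * per_thread + 1
    return lo, lo + per_thread - 1


def tiles_for_thread(nSx, nSy, nTx, nTy, tx, ty):
    """List the (bi, bj) tiles visited by thread (tx, ty), with bj in
    the outer loop and bi in the inner loop, as in a bi/bj loop nest."""
    bx_lo, bx_hi = tile_range(nSx, nTx, tx)
    by_lo, by_hi = tile_range(nSy, nTy, ty)
    return [(bi, bj)
            for bj in range(by_lo, by_hi + 1)
            for bi in range(bx_lo, bx_hi + 1)]


# Hypothetical layout: 2 x 2 tiles per process, shared by 1 x 2 threads.
for ty in range(2):
    print(ty, tiles_for_thread(nSx=2, nSy=2, nTx=1, nTy=2, tx=0, ty=ty))
```

With two threads in y, each thread receives one row of tiles, so the two rows can be computed concurrently. With a single thread, the same four tiles are simply visited in sequence.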

An exception to the use of {\em bi} and {\em bj} in loops arises in the
exchange routines of the exch2 package when it is used with the cubed
sphere. In this case {\em bj} is generally set to 1 and the loop runs over
{\em bi} only. Within the loop, {\em bi} is used to retrieve the tile number,
which is then used to reference exchange parameters.

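A minimal Python sketch of the loop pattern just described, assuming a process that holds three cubed-sphere tiles. The tile list {\em my\_tile\_list} and the exchange table {\em tile\_neighbors} are hypothetical stand-ins for the parameters that exch2 maintains, and their contents are invented purely for illustration. The function {\em exchange\_partners} is likewise hypothetical.

```python
# Hypothetical per-process data: local tile index bi -> global tile number.
my_tile_list = {1: 7, 2: 8, 3: 12}

# Hypothetical exchange parameters keyed by global tile number: here, the
# global numbers of the four tiles bordering each tile (values invented).
tile_neighbors = {
    7: (6, 8, 1, 13),
    8: (7, 9, 2, 14),
    12: (11, 13, 6, 18),
}


def exchange_partners(n_tiles_local):
    """Loop in the style described for exch2: bj is held at 1 and only
    bi varies. bi recovers the global tile number, which then selects
    that tile's exchange parameters."""
    bj = 1  # fixed; kept only to mirror the (bi, bj) convention
    partners = []
    for bi in range(1, n_tiles_local + 1):
        tile = my_tile_list[bi]
        partners.append((bi, bj, tile, tile_neighbors[tile]))
    return partners


for entry in exchange_partners(len(my_tile_list)):
    print(entry)
```

The two-step lookup keeps the loop itself independent of the cube topology. Only the tables indexed by global tile number need to know how tiles connect.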

The amount of computation that can be embedded in
a single loop over {\em bi} and {\em bj} varies for different parts of the
MITgcm algorithm. Figure \ref{fig:bibj_extract} shows a code extract
forty grid points in y. The two sub-domains in each process will be computed
sequentially if they are given to a single thread within a single process.
Alternatively, if the code is invoked with multiple threads per process,
the two sub-domains in y may be computed concurrently.
\item
\begin{verbatim}
PARAMETER (
WRAPPER is shown in figure \ref{fig:wrapper_startup}.

\begin{figure}
{\footnotesize
\begin{verbatim}

MAIN


\end{verbatim}
}
\caption{Main stages of the WRAPPER startup procedure.
This process precedes transfer of control to application code, which
occurs through the procedure {\em THE\_MODEL\_MAIN()}.
File: {\em eesupp/inc/MAIN\_PDIRECTIVES2.h}\\
File: {\em model/src/THE\_MODEL\_MAIN.F}\\
File: {\em eesupp/src/MAIN.F}\\
File: {\em tools/genmake2}\\
File: {\em eedata}\\
CPP: {\em TARGET\_SUN}\\
CPP: {\em TARGET\_DEC}\\
of controlling and coordinating the startup of a large number
(hundreds and possibly even thousands) of copies of the same
program, MPI is used. The calls to the MPI multi-process startup
routines must be activated at compile time. Currently, the MPI libraries
are selected by specifying the appropriate options file with the
{\tt -of} flag when running the {\em genmake2}
script, which generates the Makefile for compiling and linking MITgcm.
(Previously this was done by setting the {\em ALLOW\_USE\_MPI} and
{\em ALWAYS\_USE\_MPI} flags in the {\em CPP\_EEOPTIONS.h} file.) More
detailed information about the use of {\em genmake2} for specifying
local compiler flags is located in section 3 ??\\
\fbox{
\begin{minipage}{4.75in}
File: {\em tools/genmake2}
\end{minipage}
} \\
\paragraph{\bf Execution} The mechanics of starting a program in
processes holding tiles to the west, east, south and north
of this process. These values are stored in global storage
in the header file {\em EESUPPORT.h} for use by
communication routines. This does not hold when the
exch2 package is used; exch2 instead sets its own parameters to
specify the global indices of tiles and their relationships
to each other.
\\
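The following Python sketch makes the neighbor relationships concrete. It computes the west, east, south and north neighbors of a process in an {\em nPx} $\times$ {\em nPy} grid of processes, where {\em nPx} and {\em nPy} are the numbers of processes in x and y. The sketch assumes 0-based ranks numbered with x varying fastest, and a process grid that is periodic in both directions. The numbering and boundary treatment actually used by the WRAPPER may differ. The function names {\em process\_coords} and {\em neighbor\_ranks} are hypothetical.

```python
def process_coords(rank, nPx, nPy):
    """Map a 0-based process rank to (px, py) in an nPx x nPy grid,
    assuming ranks are numbered with x varying fastest."""
    if not 0 <= rank < nPx * nPy:
        raise ValueError("rank outside the nPx x nPy process grid")
    return rank % nPx, rank // nPx


def neighbor_ranks(rank, nPx, nPy):
    """Return the ranks of the west, east, south and north neighbors,
    assuming the process grid is periodic in both directions."""
    px, py = process_coords(rank, nPx, nPy)

    def to_rank(x, y):
        # Python's % maps -1 to n-1, which gives the periodic wrap-around.
        return (x % nPx) + (y % nPy) * nPx

    return {
        "west": to_rank(px - 1, py),
        "east": to_rank(px + 1, py),
        "south": to_rank(px, py - 1),
        "north": to_rank(px, py + 1),
    }


# Example: a 4 x 2 process grid (eight processes in total).
print(neighbor_ranks(0, nPx=4, nPy=2))
```

For rank 0 in a 4 $\times$ 2 grid, the west neighbor wraps around to rank 3. Because there are only two rows of processes, the south and north neighbors are the same process, rank 4.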

\fbox{

WRAPPER layer.

{\footnotesize
\begin{verbatim}

MAIN
|--THE_MODEL_MAIN :: Numerical code top-level driver routine

\end{verbatim}
}

Core equations plus packages.

{\footnotesize
\begin{verbatim}
C
C
C :: events.
C
\end{verbatim}
}

\subsection{Measuring and Characterizing Performance}
