Diff of /manual/s_software/text/sarch.tex


revision 1.9 by adcroft, Wed Apr 24 19:23:58 2002 UTC revision 1.20 by edhill, Sat Oct 16 03:40:16 2004 UTC
# Line 17  details of the MITgcm implementation and Line 17  details of the MITgcm implementation and
17  features that are employed.  features that are employed.
18    
19  \section{Overall architectural goals}  \section{Overall architectural goals}
20    \begin{rawhtml}
21    <!-- CMIREDIR:overall_architectural_goals: -->
22    \end{rawhtml}
23    
24  Broadly, the goals of the software architecture employed in MITgcm are  Broadly, the goals of the software architecture employed in MITgcm are
25  three-fold  three-fold
# Line 76  floating point operations.} Line 79  floating point operations.}
79  \end{figure}  \end{figure}
80    
81  \section{WRAPPER}  \section{WRAPPER}
82    \begin{rawhtml}
83    <!-- CMIREDIR:wrapper: -->
84    \end{rawhtml}
85    
86  A significant element of the software architecture utilized in  A significant element of the software architecture utilized in
87  MITgcm is a software superstructure and substructure collectively  MITgcm is a software superstructure and substructure collectively
# Line 97  and operating systems. This allows numer Line 103  and operating systems. This allows numer
103  \resizebox{!}{4.5in}{\includegraphics{part4/fit_in_wrapper.eps}}  \resizebox{!}{4.5in}{\includegraphics{part4/fit_in_wrapper.eps}}
104  \end{center}  \end{center}
105  \caption{  \caption{
106  Numerical code is written too fit within a software support  Numerical code is written to fit within a software support
107  infrastructure called WRAPPER. The WRAPPER is portable and  infrastructure called WRAPPER. The WRAPPER is portable and
108  can be specialized for a wide range of specific target hardware and  can be specialized for a wide range of specific target hardware and
109  programming environments, without impacting numerical code that fits  programming environments, without impacting numerical code that fits
# Line 120  uniprocessor and multi-processor Sun sys Line 126  uniprocessor and multi-processor Sun sys
126  (UMA) and non-uniform memory access (NUMA) designs. Significant work has also  (UMA) and non-uniform memory access (NUMA) designs. Significant work has also
127  been undertaken on x86 cluster systems, Alpha processor based clustered SMP  been undertaken on x86 cluster systems, Alpha processor based clustered SMP
128  systems, and on cache-coherent NUMA (CC-NUMA) systems from Silicon Graphics.  systems, and on cache-coherent NUMA (CC-NUMA) systems from Silicon Graphics.
129  The MITgcm code, operating within the WRAPPER, is also used routinely used on  The MITgcm code, operating within the WRAPPER, is also routinely used on
130  large scale MPP systems (for example T3E systems and IBM SP systems). In all  large scale MPP systems (for example T3E systems and IBM SP systems). In all
131  cases numerical code, operating within the WRAPPER, performs and scales very  cases numerical code, operating within the WRAPPER, performs and scales very
132  competitively with equivalent numerical code that has been modified to contain  competitively with equivalent numerical code that has been modified to contain
# Line 150  easily be specialized to fit, in a compu Line 156  easily be specialized to fit, in a compu
156  computer architecture currently available to the scientific computing community.  computer architecture currently available to the scientific computing community.
157    
158  \subsection{Machine model parallelism}  \subsection{Machine model parallelism}
159    \begin{rawhtml}
160    <!-- CMIREDIR:domain_decomp: -->
161    \end{rawhtml}
162    
163   Codes operating under the WRAPPER target an abstract machine that is assumed to   Codes operating under the WRAPPER target an abstract machine that is assumed to
164  consist of one or more logical processors that can compute concurrently.    consist of one or more logical processors that can compute concurrently.  
# Line 537  of almost all successful scientific comp Line 546  of almost all successful scientific comp
546  last 50 years.  last 50 years.
547    
548  \section{Using the WRAPPER}  \section{Using the WRAPPER}
549    \begin{rawhtml}
550    <!-- CMIREDIR:using_the_wrapper: -->
551    \end{rawhtml}
552    
553  In order to support maximum portability the WRAPPER is implemented primarily  In order to support maximum portability the WRAPPER is implemented primarily
554  in sequential Fortran 77. At a practical level the key steps provided by the  in sequential Fortran 77. At a practical level the key steps provided by the
# Line 661  Within a {\em bi}, {\em bj} loop Line 673  Within a {\em bi}, {\em bj} loop
673  computation is performed concurrently over as many processes and threads  computation is performed concurrently over as many processes and threads
674  as there are physical processors available to compute.  as there are physical processors available to compute.
675    
676    An exception to the use of {\em bi} and {\em bj} in loops arises in the
677    exchange routines used when the exch2 package is used with the cubed
678    sphere.  In this case {\em bj} is generally set to 1 and the loop runs from
679    1,{\em bi}.  Within the loop {\em bi} is used to retrieve the tile number,
680    which is then used to reference exchange parameters.
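
As an illustration only, the following stand-alone Fortran sketch shows
the shape such a loop might take. The array {\em tileList} and the
parameter {\em nTiles} are hypothetical names introduced for this sketch;
they are not taken from the exch2 source, which keeps its own lookup tables.
\begin{verbatim}
      PROGRAM BIBJEX
C     Hypothetical stand-alone sketch; not MITgcm source.
      IMPLICIT NONE
      INTEGER nTiles
      PARAMETER ( nTiles = 6 )
C     tileList is an assumed lookup from the local index bi to a
C     global tile number (here simply one tile per cube face).
      INTEGER tileList(nTiles)
      INTEGER bi, bj, thisTile
      DO bi = 1, nTiles
        tileList(bi) = bi
      ENDDO
C     bj is held at 1; only bi is looped over.
      bj = 1
      DO bi = 1, nTiles
C       bi is used to retrieve the tile number, which would then
C       index the exchange parameters for that tile.
        thisTile = tileList(bi)
        WRITE(*,*) 'bi =', bi, ' bj =', bj, ' tile =', thisTile
      ENDDO
      END
\end{verbatim}
The point of the sketch is only that the tile number, rather than the
pair {\em bi}, {\em bj}, becomes the key for looking up exchange
information.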
681    
682  The amount of computation that can be embedded  The amount of computation that can be embedded
683  within a single loop over {\em bi} and {\em bj} varies for different parts of the  within a single loop over {\em bi} and {\em bj} varies for different parts of the
684  MITgcm algorithm. Figure \ref{fig:bibj_extract} shows a code extract  MITgcm algorithm. Figure \ref{fig:bibj_extract} shows a code extract
# Line 781  The global domain size is again ninety g Line 799  The global domain size is again ninety g
799  forty grid points in y. The two sub-domains in each process will be computed  forty grid points in y. The two sub-domains in each process will be computed
800  sequentially if they are given to a single thread within a single process.  sequentially if they are given to a single thread within a single process.
801  Alternatively if the code is invoked with multiple threads per process  Alternatively if the code is invoked with multiple threads per process
802  the two domains in y may be computed on concurrently.  the two domains in y may be computed concurrently.
803  \item  \item
804  \begin{verbatim}  \begin{verbatim}
805        PARAMETER (        PARAMETER (
# Line 929  File: {\em eesupp/inc/MAIN\_PDIRECTIVES1 Line 947  File: {\em eesupp/inc/MAIN\_PDIRECTIVES1
947  File: {\em eesupp/inc/MAIN\_PDIRECTIVES2.h}\\  File: {\em eesupp/inc/MAIN\_PDIRECTIVES2.h}\\
948  File: {\em model/src/THE\_MODEL\_MAIN.F}\\  File: {\em model/src/THE\_MODEL\_MAIN.F}\\
949  File: {\em eesupp/src/MAIN.F}\\  File: {\em eesupp/src/MAIN.F}\\
950  File: {\em tools/genmake}\\  File: {\em tools/genmake2}\\
951  File: {\em eedata}\\  File: {\em eedata}\\
952  CPP:  {\em TARGET\_SUN}\\  CPP:  {\em TARGET\_SUN}\\
953  CPP:  {\em TARGET\_DEC}\\  CPP:  {\em TARGET\_DEC}\\
# Line 968  critical communication. However, in orde Line 986  critical communication. However, in orde
986  of controlling and coordinating the start up of a large number  of controlling and coordinating the start up of a large number
987  (hundreds and possibly even thousands) of copies of the same  (hundreds and possibly even thousands) of copies of the same
988  program, MPI is used. The calls to the MPI multi-process startup  program, MPI is used. The calls to the MPI multi-process startup
989  routines must be activated at compile time. This is done  routines must be activated at compile time.  Currently MPI libraries are
990  by setting the {\em ALLOW\_USE\_MPI} and {\em ALWAYS\_USE\_MPI}  invoked by
991  flags in the {\em CPP\_EEOPTIONS.h} file.\\  specifying the appropriate options file with the
992    {\tt-of} flag when running the {\em genmake2}
993  \fbox{  script, which generates the Makefile for compiling and linking MITgcm.
994  \begin{minipage}{4.75in}  (Previously this was done by setting the {\em ALLOW\_USE\_MPI} and
995  File: {\em eesupp/inc/CPP\_EEOPTIONS.h}\\  {\em ALWAYS\_USE\_MPI} flags in the {\em CPP\_EEOPTIONS.h} file.)  More
996  CPP:  {\em ALLOW\_USE\_MPI}\\  detailed information about the use of {\em genmake2} for specifying
997  CPP:  {\em ALWAYS\_USE\_MPI}\\  local compiler flags is located in section \ref{sect:genmake}.\\  
 Parameter:  {\em nPx}\\  
 Parameter:  {\em nPy}  
 \end{minipage}  
 } \\  
998    
 Additionally, compile time options are required to link in the  
 MPI libraries and header files. Examples of these options  
 can be found in the {\em genmake} script that creates makefiles  
 for compilation. When this script is executed with the {bf -mpi}  
 flag it will generate a makefile that includes  
 paths for search for MPI head files and for linking in  
 MPI libraries. For example the {\bf -mpi} flag on a  
  Silicon Graphics IRIX system causes a  
 Makefile with the compilation command  
 Graphics IRIX system \begin{verbatim}  
 mpif77 -I/usr/local/mpi/include -DALLOW_USE_MPI -DALWAYS_USE_MPI  
 \end{verbatim}  
 to be generated.  
 This is the correct set of options for using the MPICH open-source  
 version of MPI, when it has been installed under the subdirectory  
 /usr/local/mpi.  
 However, on many systems there may be several  
 versions of MPI installed. For example many systems have both  
 the open source MPICH set of libraries and a vendor specific native form  
 of the MPI libraries. The correct setup to use will depend on the  
 local configuration of your system.\\  
999    
1000  \fbox{  \fbox{
1001  \begin{minipage}{4.75in}  \begin{minipage}{4.75in}
1002  File: {\em tools/genmake}  Directory: {\em tools/build\_options}\\
1003    File: {\em tools/genmake2}
1004  \end{minipage}  \end{minipage}
1005  } \\  } \\
1006  \paragraph{\bf Execution} The mechanics of starting a program in  \paragraph{\bf Execution} The mechanics of starting a program in
# Line 1024  product of the processor grid settings o Line 1018  product of the processor grid settings o
1018  in the file {\em SIZE.h}. The parameter {\em mf} specifies that a text file  in the file {\em SIZE.h}. The parameter {\em mf} specifies that a text file
1019  called ``mf'' will be read to get a list of processor names on  called ``mf'' will be read to get a list of processor names on
1020  which the sixty-four processes will execute. The syntax of this file  which the sixty-four processes will execute. The syntax of this file
1021  is specified by the MPI distribution  is specified by the MPI distribution.
1022  \\  \\
1023    
1024  \fbox{  \fbox{
# Line 1075  to processor identification are are used Line 1069  to processor identification are are used
1069  Allocation of processes to tiles is controlled by the routine  Allocation of processes to tiles is controlled by the routine
1070  {\em INI\_PROCS()}. For each process this routine sets  {\em INI\_PROCS()}. For each process this routine sets
1071  the variables {\em myXGlobalLo} and {\em myYGlobalLo}.  the variables {\em myXGlobalLo} and {\em myYGlobalLo}.
1072  These variables specify (in index space) the coordinate  These variables specify in index space the coordinates
1073  of the southern most and western most corner of the  of the southernmost and westernmost corner of the
1074  southern most and western most tile owned by this process.  southernmost and westernmost tile owned by this process.
1075  The variables {\em pidW}, {\em pidE}, {\em pidS} and {\em pidN}  The variables {\em pidW}, {\em pidE}, {\em pidS} and {\em pidN}
1076  are also set in this routine. These are used to identify  are also set in this routine. These are used to identify
1077  processes holding tiles to the west, east, south and north  processes holding tiles to the west, east, south and north
1078  of this process. These values are stored in global storage  of this process. These values are stored in global storage
1079  in the header file {\em EESUPPORT.h} for use by  in the header file {\em EESUPPORT.h} for use by
1080  communication routines.  communication routines.  The above does not hold when the
1081    exch2 package is used -- exch2 sets its own parameters to
1082    specify the global indices of tiles and their relationships
1083    to each other.  See the documentation on the exch2 package
1084    (\ref{sec:exch2})  for
1085    details.
1086  \\  \\
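
A minimal sketch of how such indices could be derived is given below.
It assumes a regular layout in which every process owns a block of
{\em nSx} by {\em nSy} tiles, each of size {\em sNx} by {\em sNy}, on an
{\em nPx} by {\em nPy} process grid. The loop variables {\em iPx},
{\em iPy} and the results {\em xLo}, {\em yLo} are hypothetical names used
only here, the parameter values are illustrative, and the arithmetic is an
illustration rather than a copy of {\em INI\_PROCS()}.
\begin{verbatim}
      PROGRAM GLOBLO
C     Hypothetical stand-alone sketch; not MITgcm source.
      IMPLICIT NONE
      INTEGER sNx, sNy, nSx, nSy, nPx, nPy
      PARAMETER ( sNx = 45, sNy = 20,
     &            nSx = 1,  nSy = 2,
     &            nPx = 2,  nPy = 1 )
      INTEGER iPx, iPy, xLo, yLo
      DO iPy = 1, nPy
        DO iPx = 1, nPx
C         Global index of the south-west corner of the first
C         tile held by process (iPx,iPy), counting from 1.
          xLo = 1 + (iPx-1)*sNx*nSx
          yLo = 1 + (iPy-1)*sNy*nSy
          WRITE(*,*) 'proc', iPx, iPy, ' xLo =', xLo,
     &               ' yLo =', yLo
        ENDDO
      ENDDO
      END
\end{verbatim}
With these illustrative values the process at {\em iPx}=2 starts at x
index 46, directly after the forty-five points held by the first process.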
1087    
1088  \fbox{  \fbox{
# Line 1109  operations and that can be customized fo Line 1108  operations and that can be customized fo
1108  describes the information that is held and used.  describes the information that is held and used.
1109    
1110  \begin{enumerate}  \begin{enumerate}
1111  \item {\bf Tile-tile connectivity information} For each tile the WRAPPER  \item {\bf Tile-tile connectivity information}
1112  sets a flag that sets the tile number to the north, south, east and  For each tile the WRAPPER
1113    sets a flag that sets the tile number to the north,
1114    south, east and
1115  west of that tile. This number is unique over all tiles in a  west of that tile. This number is unique over all tiles in a
1116  configuration. The number is held in the variables {\em tileNo}  configuration. Except when using the cubed sphere and the exch2 package,
1117    the number is held in the variables {\em tileNo}
1118  (this holds the tile's own number), {\em tileNoN}, {\em tileNoS},  (this holds the tile's own number), {\em tileNoN}, {\em tileNoS},
1119  {\em tileNoE} and {\em tileNoW}. A parameter is also stored with each tile  {\em tileNoE} and {\em tileNoW}. A parameter is also stored with each tile
1120  that specifies the type of communication that is used between tiles.  that specifies the type of communication that is used between tiles.
# Line 1135  of the WRAPPER exchange primitive Line 1137  of the WRAPPER exchange primitive
1137  (see figure \ref{fig:communication_primitives}). The routine  (see figure \ref{fig:communication_primitives}). The routine
1138  {\em ini\_communication\_patterns()} is responsible for setting the  {\em ini\_communication\_patterns()} is responsible for setting the
1139  communication mode values for each tile.  communication mode values for each tile.
1140  \\  
1141    When using the cubed sphere configuration with the exch2 package, the
1142    relationships between tiles and their communication methods are set
1143    by the package in other variables.  See the exch2 package documentation
1144    (\ref{sec:exch2}) for details.
1145    
1146    
1147    
1148  \fbox{  \fbox{
1149  \begin{minipage}{4.75in}  \begin{minipage}{4.75in}
# Line 1271  The \_EXCH operation is also central to Line 1279  The \_EXCH operation is also central to
1279  the cube-sphere grid. In this class of grid a rotation may be required  the cube-sphere grid. In this class of grid a rotation may be required
1280  between tiles. Aligning the coordinate requiring rotation with the  between tiles. Aligning the coordinate requiring rotation with the
1281  tile decomposition, allows the coordinate transformation to  tile decomposition, allows the coordinate transformation to
1282  be embedded within a custom form of the \_EXCH primitive.  be embedded within a custom form of the \_EXCH primitive.  In these
1283    cases \_EXCH is mapped to exch2 routines, as detailed in the exch2
1284    package documentation (\ref{sec:exch2}).
1285    
1286  \item {\bf Reverse Mode}  \item {\bf Reverse Mode}
1287  The communication primitives \_EXCH and \_GSUM both employ  The communication primitives \_EXCH and \_GSUM both employ
# Line 1288  operations. However, the routine argumen Line 1298  operations. However, the routine argumen
1298  is set to the value {\em REVERSE\_SIMULATION}. This signifies  is set to the value {\em REVERSE\_SIMULATION}. This signifies
1299  to the low-level routines that the adjoint forms of the  to the low-level routines that the adjoint forms of the
1300  appropriate communication operation should be performed.  appropriate communication operation should be performed.
1301    
1302  \item {\bf MAX\_NO\_THREADS}  \item {\bf MAX\_NO\_THREADS}
1303  The variable {\em MAX\_NO\_THREADS} is used to indicate the  The variable {\em MAX\_NO\_THREADS} is used to indicate the
1304  maximum number of OS threads that a code will use. This  maximum number of OS threads that a code will use. This
# Line 1392  a series of template files, for example Line 1403  a series of template files, for example
1403  This is done to allow a large number of variations on the exchange  This is done to allow a large number of variations on the exchange
1404  process to be maintained. One set of variations supports the  process to be maintained. One set of variations supports the
1405  cube sphere grid. Support for a cube sphere grid in MITgcm is based  cube sphere grid. Support for a cube sphere grid in MITgcm is based
1406  on having each face of the cube as a separate tile (or tiles).  on having each face of the cube as a separate tile or tiles.
1407  The exchange routines are then able to absorb much of the  The exchange routines are then able to absorb much of the
1408  detailed rotation and reorientation required when moving around the  detailed rotation and reorientation required when moving around the
1409  cube grid. The set of {\em \_EXCH} routines that contain the  cube grid. The set of {\em \_EXCH} routines that contain the
# Line 1416  quantities at the C-grid vorticity point Line 1427  quantities at the C-grid vorticity point
1427    
1428    
1429  \section{MITgcm execution under WRAPPER}  \section{MITgcm execution under WRAPPER}
1430    \begin{rawhtml}
1431    <!-- CMIREDIR:mitgcm_wrapper: -->
1432    \end{rawhtml}
1433    
1434  Fitting together the WRAPPER elements, package elements and  Fitting together the WRAPPER elements, package elements and
1435  MITgcm core equation elements of the source code produces calling  MITgcm core equation elements of the source code produces calling
