
Diff of /manual/s_software/text/sarch.tex


revision 1.1 by cnh, Tue Oct 9 10:33:17 2001 UTC revision 1.2 by adcroft, Thu Oct 11 19:12:38 2001 UTC
# Line 1  Line 1 
1    % $Header$
2    
3  In this chapter we describe the software architecture and  In this chapter we describe the software architecture and
4  implementation strategy for the MITgcm code. The first part of this  implementation strategy for the MITgcm code. The first part of this
# Line 11  Broadly, the goals of the software archi Line 12  Broadly, the goals of the software archi
12  three-fold  three-fold
13    
14  \begin{itemize}  \begin{itemize}
   
15  \item We wish to be able to study a very broad range  \item We wish to be able to study a very broad range
16  of interesting and challenging rotating fluids problems.  of interesting and challenging rotating fluids problems.
   
17  \item We wish the model code to be readily targeted to  \item We wish the model code to be readily targeted to
18  a wide range of platforms  a wide range of platforms
   
19  \item On any given platform we would like to be  \item On any given platform we would like to be
20  able to achieve performance comparable to an implementation  able to achieve performance comparable to an implementation
21  developed and specialized specifically for that platform.  developed and specialized specifically for that platform.
   
22  \end{itemize}  \end{itemize}
23    
24  These points are summarized in figure \ref{fig:mitgcm_architecture_goals}  These points are summarized in figure \ref{fig:mitgcm_architecture_goals}
# Line 30  a software architecture which at the hig Line 27  a software architecture which at the hig
27  of  of
28    
29  \begin{enumerate}  \begin{enumerate}
   
30  \item A core set of numerical and support code. This is discussed in detail in  \item A core set of numerical and support code. This is discussed in detail in
31  section \ref{sec:partII}.  section \ref{sec:partII}.
   
32  \item A scheme for supporting optional ``pluggable'' {\bf packages} (containing  \item A scheme for supporting optional ``pluggable'' {\bf packages} (containing
33  for example mixed-layer schemes, biogeochemical schemes, atmospheric physics).  for example mixed-layer schemes, biogeochemical schemes, atmospheric physics).
34  These packages are used both to overlay alternate dynamics and to introduce  These packages are used both to overlay alternate dynamics and to introduce
35  specialized physical content onto the core numerical code. An overview of  specialized physical content onto the core numerical code. An overview of
36  the {\bf package} scheme is given at the start of part \ref{part:packages}.  the {\bf package} scheme is given at the start of part \ref{part:packages}.
   
   
37  \item A support framework called {\bf WRAPPER} (Wrappable Application Parallel  \item A support framework called {\bf WRAPPER} (Wrappable Application Parallel
38  Programming Environment Resource), within which the core numerics and pluggable  Programming Environment Resource), within which the core numerics and pluggable
39  packages operate.  packages operate.
   
40  \end{enumerate}  \end{enumerate}
41    
42  This chapter focuses on describing the {\bf WRAPPER} environment under which  This chapter focuses on describing the {\bf WRAPPER} environment under which
# Line 57  experiments both sequentially and in par Line 49  experiments both sequentially and in par
49  starting from an example code and adapting it to suit a particular situation  starting from an example code and adapting it to suit a particular situation
50  will be all that is required.  will be all that is required.
51    
52    
53  \begin{figure}  \begin{figure}
54  \begin{center}  \begin{center}
55   \resizebox{!}{2.5in}{  \resizebox{!}{2.5in}{\includegraphics{part4/mitgcm_goals.eps}}
   \includegraphics*[1.5in,2.4in][9.5in,6.3in]{part4/mitgcm_goals.eps}  
  }  
56  \end{center}  \end{center}
57  \caption{The MITgcm architecture is designed to allow simulation of a wide  \caption{
58    The MITgcm architecture is designed to allow simulation of a wide
59  range of physical problems on a wide range of hardware. The computational  range of physical problems on a wide range of hardware. The computational
60  resource requirements of the applications targeted range from around  resource requirements of the applications targeted range from around
61  $10^7$ bytes ( $\approx 10$ megabytes ) of memory to $10^{11}$ bytes  $10^7$ bytes ( $\approx 10$ megabytes ) of memory to $10^{11}$ bytes
62  ( $\approx 100$ gigabytes). Arithmetic operation counts for the applications of  ( $\approx 100$ gigabytes). Arithmetic operation counts for the applications of
63  interest range from $10^{9}$ floating point operations to more than $10^{17}$  interest range from $10^{9}$ floating point operations to more than $10^{17}$
64  floating point operations.} \label{fig:mitgcm_architecture_goals}  floating point operations.}
65    \label{fig:mitgcm_architecture_goals}
66  \end{figure}  \end{figure}
67    
68  \section{WRAPPER}  \section{WRAPPER}
# Line 87  The approach taken by the WRAPPER is ill Line 80  The approach taken by the WRAPPER is ill
80  \ref{fig:fit_in_wrapper} which shows how the WRAPPER serves to insulate code  \ref{fig:fit_in_wrapper} which shows how the WRAPPER serves to insulate code
81  that fits within it from architectural differences between hardware platforms  that fits within it from architectural differences between hardware platforms
82  and operating systems. This allows numerical code to be easily retargeted.  and operating systems. This allows numerical code to be easily retargeted.
83    
84    
85  \begin{figure}  \begin{figure}
86  \begin{center}  \begin{center}
87   \resizebox{6in}{4.5in}{  \resizebox{!}{4.5in}{\includegraphics{part4/fit_in_wrapper.eps}}
   \includegraphics*[0.6in,0.7in][9.0in,8.5in]{part4/fit_in_wrapper.eps}  
  }  
88  \end{center}  \end{center}
89  \caption{ Numerical code is written to fit within a software support  \caption{
90    Numerical code is written to fit within a software support
91  infrastructure called WRAPPER. The WRAPPER is portable and  infrastructure called WRAPPER. The WRAPPER is portable and
92  can be specialized for a wide range of specific target hardware and  can be specialized for a wide range of specific target hardware and
93  programming environments, without impacting numerical code that fits  programming environments, without impacting numerical code that fits
94  within the WRAPPER. Codes that fit within the WRAPPER can generally be  within the WRAPPER. Codes that fit within the WRAPPER can generally be
95  made to run as fast on a particular platform as codes specially  made to run as fast on a particular platform as codes specially
96  optimized for that platform.  optimized for that platform.}
97  } \label{fig:fit_in_wrapper}  \label{fig:fit_in_wrapper}
98  \end{figure}  \end{figure}
99    
100  \subsection{Target hardware}  \subsection{Target hardware}
# Line 186  independently of the other tiles, in a s Line 180  independently of the other tiles, in a s
180    
181  \begin{figure}  \begin{figure}
182  \begin{center}  \begin{center}
183   \resizebox{7in}{3in}{   \resizebox{5in}{!}{
184    \includegraphics*[0.5in,2.7in][12.5in,6.4in]{part4/domain_decomp.eps}    \includegraphics{part4/domain_decomp.eps}
185   }   }
186  \end{center}  \end{center}
187  \caption{ The WRAPPER provides support for one and two dimensional  \caption{ The WRAPPER provides support for one and two dimensional
# Line 222  variety of different mechanisms to commu Line 216  variety of different mechanisms to commu
216    
217  \begin{figure}  \begin{figure}
218  \begin{center}  \begin{center}
219   \resizebox{7in}{3in}{   \resizebox{5in}{!}{
220    \includegraphics*[4.5in,3.7in][12.5in,6.7in]{part4/tiled-world.eps}    \includegraphics{part4/tiled-world.eps}
221   }   }
222  \end{center}  \end{center}
223  \caption{ A global grid subdivided into tiles.  \caption{ A global grid subdivided into tiles.
# Line 404  highly optimized library. Line 398  highly optimized library.
398    
399  \begin{figure}  \begin{figure}
400  \begin{center}  \begin{center}
401   \resizebox{5in}{3in}{   \resizebox{5in}{!}{
402    \includegraphics*[1.5in,0.7in][7.9in,4.4in]{part4/comm-primm.eps}    \includegraphics{part4/comm-primm.eps}
403   }   }
404  \end{center}  \end{center}
405  \caption{Three performance critical parallel primitives are provided  \caption{Three performance critical parallel primitives are provided
# Line 485  sub-domains. Line 479  sub-domains.
479    
480  \begin{figure}  \begin{figure}
481  \begin{center}  \begin{center}
482   \resizebox{5in}{3in}{   \resizebox{5in}{!}{
483    \includegraphics*[0.5in,1.3in][7.9in,5.7in]{part4/tiling_detail.eps}    \includegraphics{part4/tiling_detail.eps}
484   }   }
485  \end{center}  \end{center}
486  \caption{The tiling strategy that the WRAPPER supports allows tiles  \caption{The tiling strategy that the WRAPPER supports allows tiles
# Line 589  not cause any other problems. Line 583  not cause any other problems.
583    
584  \begin{figure}  \begin{figure}
585  \begin{center}  \begin{center}
586   \resizebox{5in}{7in}{   \resizebox{5in}{!}{
587    \includegraphics*[0.5in,0.3in][7.9in,10.7in]{part4/size_h.eps}    \includegraphics{part4/size_h.eps}
588   }   }
589  \end{center}  \end{center}
590  \caption{ The three level domain decomposition hierarchy employed by the  \caption{ The three level domain decomposition hierarchy employed by the
# Line 812  to support subsequent calls to communica Line 806  to support subsequent calls to communica
806  by the application code. The startup calling sequence followed by the  by the application code. The startup calling sequence followed by the
807  WRAPPER is shown in figure \ref{fig:wrapper_startup}.  WRAPPER is shown in figure \ref{fig:wrapper_startup}.
808    
   
809  \begin{figure}  \begin{figure}
810  \begin{verbatim}  \begin{verbatim}
811    
# Line 904  parallelization the compiler may otherwi Line 897  parallelization the compiler may otherwi
897  \end{enumerate}  \end{enumerate}
898    
899    
 \paragraph{Environment variables}  
 On most systems multi-threaded execution also requires the setting  
 of a special environment variable. On many machines this variable  
 is called PARALLEL and its value should be set to the number  
 of parallel threads required. Generally the help pages associated  
 with the multi-threaded compiler on a machine will explain  
 how to set the required environment variables for that machine.  
   
 \paragraph{Runtime input parameters}  
 Finally the file {\em eedata} needs to be configured to indicate  
 the number of threads to be used in the x and y directions.  
 The variables {\em nTx} and {\em nTy} in this file are used to  
 specify the information required. The product of {\em nTx} and  
 {\em nTy} must be equal to the number of threads spawned i.e.  
 the setting of the environment variable PARALLEL.  
 The value of {\em nTx} must subdivide the number of sub-domains  
 in x ({\em nSx}) exactly. The value of {\em nTy} must subdivide the  
 number of sub-domains in y ({\em nSy}) exactly.  
   
900  An example of valid settings for the {\em eedata} file for a  An example of valid settings for the {\em eedata} file for a
901  domain with two subdomains in y and running with two threads is shown  domain with two subdomains in y and running with two threads is shown
902  below  below
# Line 1046  Parameter: {\em nPy} Line 1020  Parameter: {\em nPy}
1020  \end{minipage}  \end{minipage}
1021  } \\  } \\
1022    
1023    
1024    \paragraph{Environment variables}
1025    On most systems multi-threaded execution also requires the setting
1026    of a special environment variable. On many machines this variable
1027    is called PARALLEL and its value should be set to the number
1028    of parallel threads required. Generally the help pages associated
1029    with the multi-threaded compiler on a machine will explain
1030    how to set the required environment variables for that machine.
1031    
1032    \paragraph{Runtime input parameters}
1033    Finally the file {\em eedata} needs to be configured to indicate
1034    the number of threads to be used in the x and y directions.
1035    The variables {\em nTx} and {\em nTy} in this file are used to
1036    specify the information required. The product of {\em nTx} and
1037    {\em nTy} must be equal to the number of threads spawned i.e.
1038    the setting of the environment variable PARALLEL.
1039    The value of {\em nTx} must subdivide the number of sub-domains
1040    in x ({\em nSx}) exactly. The value of {\em nTy} must subdivide the
1041    number of sub-domains in y ({\em nSy}) exactly.
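
The three constraints stated above (the product of nTx and nTy equals the number of threads, nTx divides nSx exactly, nTy divides nSy exactly) can be checked mechanically before a run. The following is only a minimal Python sketch and is not part of MITgcm: the function name `check_thread_layout` and its argument names are hypothetical, and the thread count is assumed to be the value given to the PARALLEL environment variable.

```python
def check_thread_layout(nSx, nSy, nTx, nTy, n_threads):
    """Return a list of violated constraints (empty if the layout is valid)."""
    problems = []
    # Thread counts must be positive before any divisibility test makes sense.
    if nTx < 1 or nTy < 1:
        problems.append("nTx and nTy must both be at least 1")
        return problems
    # The product nTx*nTy must equal the number of threads spawned.
    if nTx * nTy != n_threads:
        problems.append("nTx*nTy does not equal the number of threads")
    # nTx must subdivide the number of sub-domains in x exactly.
    if nSx % nTx != 0:
        problems.append("nTx does not divide nSx exactly")
    # nTy must subdivide the number of sub-domains in y exactly.
    if nSy % nTy != 0:
        problems.append("nTy does not divide nSy exactly")
    return problems


# Two sub-domains in y, run with two threads split in y: no violations.
print(check_thread_layout(nSx=1, nSy=2, nTx=1, nTy=2, n_threads=2))
```

An empty list means the settings satisfy all three constraints; each string in a non-empty list names one constraint that the chosen values break.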
1042  The multiprocess startup of the MITgcm executable {\em mitgcmuv}  The multiprocess startup of the MITgcm executable {\em mitgcmuv}
1043  is controlled by the routines {\em EEBOOT\_MINIMAL()} and  is controlled by the routines {\em EEBOOT\_MINIMAL()} and
1044  {\em INI\_PROCS()}. The first routine performs basic steps required  {\em INI\_PROCS()}. The first routine performs basic steps required
# Line 1058  number so that process number 0 will cre Line 1051  number so that process number 0 will cre
1051  output files {\bf STDOUT.0001} and {\bf STDERR.0001} etc... These files  output files {\bf STDOUT.0001} and {\bf STDERR.0001} etc... These files
1052  are used for reporting status and configuration information and  are used for reporting status and configuration information and
1053  for reporting error conditions on a process by process basis.  for reporting error conditions on a process by process basis.
1054  The {{\em EEBOOT\_MINIMAL()} procedure also sets the variables  The {\em EEBOOT\_MINIMAL()} procedure also sets the variables
1055  {\em myProcId} and {\em MPI\_COMM\_MODEL}.  {\em myProcId} and {\em MPI\_COMM\_MODEL}.
1056  These variables are related  These variables are related
1057  to processor identification and are used later in the routine  to processor identification and are used later in the routine
# Line 1099  Parameter: {\em pidN       } Line 1092  Parameter: {\em pidN       }
1092  The WRAPPER maintains internal information that is used for communication  The WRAPPER maintains internal information that is used for communication
1093  operations and that can be customized for different platforms. This section  operations and that can be customized for different platforms. This section
1094  describes the information that is held and used.  describes the information that is held and used.
1095    
1096  \begin{enumerate}  \begin{enumerate}
1097  \item {\bf Tile-tile connectivity information} For each tile the WRAPPER  \item {\bf Tile-tile connectivity information} For each tile the WRAPPER
1098  sets a flag that sets the tile number to the north, south, east and  sets a flag that sets the tile number to the north, south, east and
# Line 1186  Parameter: {\em nTy} \\ Line 1180  Parameter: {\em nTy} \\
1180  \end{minipage}  \end{minipage}
1181  }  }
1182    
 \begin{figure}  
 \begin{verbatim}  
 C--  
 C--  Parallel directives for MIPS Pro Fortran compiler  
 C--  
 C      Parallel compiler directives for SGI with IRIX  
 C$PAR  PARALLEL DO  
 C$PAR&  CHUNK=1,MP_SCHEDTYPE=INTERLEAVE,  
 C$PAR&  SHARE(nThreads),LOCAL(myThid,I)  
 C  
       DO I=1,nThreads  
         myThid = I  
   
 C--     Invoke nThreads instances of the numerical model  
         CALL THE_MODEL_MAIN(myThid)  
   
       ENDDO  
 \end{verbatim}  
 \caption{Prior to transferring control to  
 the procedure {\em THE\_MODEL\_MAIN()} the WRAPPER may use  
 MP directives to spawn multiple threads.  
 } \label{fig:mp_directives}  
 \end{figure}  
   
   
1183  \item {\bf memsync flags}  \item {\bf memsync flags}
1184  As discussed in section \ref{sec:memory_consistency}, when using shared memory,  As discussed in section \ref{sec:memory_consistency}, when using shared memory,
1185  a low-level system function may be needed to force memory consistency.  a low-level system function may be needed to force memory consistency.
# Line 1344  In the case of permanent state informati Line 1313  In the case of permanent state informati
1313  because there has to be enough storage allocated for all tiles.  because there has to be enough storage allocated for all tiles.
1314  However, the technique can sometimes be a useful scheme for reducing memory  However, the technique can sometimes be a useful scheme for reducing memory
1315  requirements in complex physical parameterisations.  requirements in complex physical parameterisations.
   
1316  \end{enumerate}  \end{enumerate}
1317    
1318    \begin{figure}
1319    \begin{verbatim}
1320    C--
1321    C--  Parallel directives for MIPS Pro Fortran compiler
1322    C--
1323    C      Parallel compiler directives for SGI with IRIX
1324    C$PAR  PARALLEL DO
1325    C$PAR&  CHUNK=1,MP_SCHEDTYPE=INTERLEAVE,
1326    C$PAR&  SHARE(nThreads),LOCAL(myThid,I)
1327    C
1328          DO I=1,nThreads
1329            myThid = I
1330    
1331    C--     Invoke nThreads instances of the numerical model
1332            CALL THE_MODEL_MAIN(myThid)
1333    
1334          ENDDO
1335    \end{verbatim}
1336    \caption{Prior to transferring control to
1337    the procedure {\em THE\_MODEL\_MAIN()} the WRAPPER may use
1338    MP directives to spawn multiple threads.
1339    } \label{fig:mp_directives}
1340    \end{figure}
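
The structure of the directive-based loop in the figure above (one loop iteration per thread, a thread-private myThid, a shared nThreads, and one call to the model entry point per thread) can be illustrated in a language-neutral way. The sketch below uses Python's standard `threading` module purely to show that pattern. It is not how the WRAPPER is implemented, and `the_model_main` here is a hypothetical stand-in that only records which thread called it.

```python
import threading


def the_model_main(my_thid, n_threads, results):
    # Hypothetical stand-in for the model entry point: each thread
    # writes only to its own slot, so no locking is needed.
    results[my_thid - 1] = f"thread {my_thid} of {n_threads}"


def spawn_model_threads(n_threads):
    """Start n_threads instances of the model, numbered from 1, and wait for all."""
    results = [None] * n_threads
    threads = [
        threading.Thread(target=the_model_main, args=(i, n_threads, results))
        for i in range(1, n_threads + 1)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(spawn_model_threads(2))
```

As in the Fortran loop, thread numbering starts at 1 and every instance receives its own identifier as an argument rather than reading it from shared state.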
1341    
1342    
1343  \subsubsection{Specializing the Communication Code}  \subsubsection{Specializing the Communication Code}
1344    
1345  The isolation of performance critical communication primitives and the  The isolation of performance critical communication primitives and the

