Diff of /manual/s_software/text/sarch.tex

revision 1.1 by cnh, Tue Oct 9 10:33:17 2001 UTC revision 1.5 by cnh, Tue Nov 13 18:32:33 2001 UTC
# Line 1  Line 1 
1    % $Header$
2    
3  In this chapter we describe the software architecture and  In this chapter we describe the software architecture and
4  implementation strategy for the MITgcm code. The first part of this  implementation strategy for the MITgcm code. The first part of this
# Line 11  Broadly, the goals of the software archi Line 12  Broadly, the goals of the software archi
12  three-fold  three-fold
13    
14  \begin{itemize}  \begin{itemize}
   
15  \item We wish to be able to study a very broad range  \item We wish to be able to study a very broad range
16  of interesting and challenging rotating fluids problems.  of interesting and challenging rotating fluids problems.
   
17  \item We wish the model code to be readily targeted to  \item We wish the model code to be readily targeted to
18  a wide range of platforms  a wide range of platforms
   
19  \item On any given platform we would like to be  \item On any given platform we would like to be
20  able to achieve performance comparable to an implementation  able to achieve performance comparable to an implementation
21  developed and specialized specifically for that platform.  developed and specialized specifically for that platform.
   
22  \end{itemize}  \end{itemize}
23    
24  These points are summarized in figure \ref{fig:mitgcm_architecture_goals}  These points are summarized in figure \ref{fig:mitgcm_architecture_goals}
# Line 30  a software architecture which at the hig Line 27  a software architecture which at the hig
27  of  of
28    
29  \begin{enumerate}  \begin{enumerate}
   
30  \item A core set of numerical and support code. This is discussed in detail in  \item A core set of numerical and support code. This is discussed in detail in
31  section \ref{sec:partII}.  section \ref{sec:partII}.
   
32  \item A scheme for supporting optional "pluggable" {\bf packages} (containing  \item A scheme for supporting optional "pluggable" {\bf packages} (containing
33  for example mixed-layer schemes, biogeochemical schemes, atmospheric physics).  for example mixed-layer schemes, biogeochemical schemes, atmospheric physics).
34  These packages are used both to overlay alternate dynamics and to introduce  These packages are used both to overlay alternate dynamics and to introduce
35  specialized physical content onto the core numerical code. An overview of  specialized physical content onto the core numerical code. An overview of
36  the {\bf package} scheme is given at the start of part \ref{part:packages}.  the {\bf package} scheme is given at the start of part \ref{part:packages}.
   
   
37  \item A support framework called {\bf WRAPPER} (Wrappable Application Parallel  \item A support framework called {\bf WRAPPER} (Wrappable Application Parallel
38  Programming Environment Resource), within which the core numerics and pluggable  Programming Environment Resource), within which the core numerics and pluggable
39  packages operate.  packages operate.
   
40  \end{enumerate}  \end{enumerate}
41    
42  This chapter focuses on describing the {\bf WRAPPER} environment under which  This chapter focuses on describing the {\bf WRAPPER} environment under which
43  both the core numerics and the pluggable packages function. The description  both the core numerics and the pluggable packages function. The description
44  presented here is intended to be a detailed exposistion and contains significant  presented here is intended to be a detailed exposition and contains significant
45  background material, as well as advanced details on working with the WRAPPER.  background material, as well as advanced details on working with the WRAPPER.
46  The examples section of this manual (part \ref{part:example}) contains more  The examples section of this manual (part \ref{part:example}) contains more
47  succinct, step-by-step instructions on running basic numerical  succinct, step-by-step instructions on running basic numerical
# Line 57  experiments both sequentially and in par Line 49  experiments both sequentially and in par
49  starting from an example code and adapting it to suit a particular situation  starting from an example code and adapting it to suit a particular situation
50  will be all that is required.  will be all that is required.
51    
52    
53  \begin{figure}  \begin{figure}
54  \begin{center}  \begin{center}
55   \resizebox{!}{2.5in}{  \resizebox{!}{2.5in}{\includegraphics{part4/mitgcm_goals.eps}}
   \includegraphics*[1.5in,2.4in][9.5in,6.3in]{part4/mitgcm_goals.eps}  
  }  
56  \end{center}  \end{center}
57  \caption{The MITgcm architecture is designed to allow simulation of a wide  \caption{
58    The MITgcm architecture is designed to allow simulation of a wide
59  range of physical problems on a wide range of hardware. The computational  range of physical problems on a wide range of hardware. The computational
60  resource requirements of the applications targeted range from around  resource requirements of the applications targeted range from around
61  $10^7$ bytes ( $\approx 10$ megabytes ) of memory to $10^{11}$ bytes  $10^7$ bytes ( $\approx 10$ megabytes ) of memory to $10^{11}$ bytes
62  ( $\approx 100$ gigabytes). Arithmetic operation counts for the applications of  ( $\approx 100$ gigabytes). Arithmetic operation counts for the applications of
63  interest range from $10^{9}$ floating point operations to more than $10^{17}$  interest range from $10^{9}$ floating point operations to more than $10^{17}$
64  floating point operations.} \label{fig:mitgcm_architecture_goals}  floating point operations.}
65    \label{fig:mitgcm_architecture_goals}
66  \end{figure}  \end{figure}
67    
68  \section{WRAPPER}  \section{WRAPPER}
# Line 87  The approach taken by the WRAPPER is ill Line 80  The approach taken by the WRAPPER is ill
80  \ref{fig:fit_in_wrapper} which shows how the WRAPPER serves to insulate code  \ref{fig:fit_in_wrapper} which shows how the WRAPPER serves to insulate code
81  that fits within it from architectural differences between hardware platforms  that fits within it from architectural differences between hardware platforms
82  and operating systems. This allows numerical code to be easily retargeted.  and operating systems. This allows numerical code to be easily retargeted.
83    
84    
85  \begin{figure}  \begin{figure}
86  \begin{center}  \begin{center}
87   \resizebox{6in}{4.5in}{  \resizebox{!}{4.5in}{\includegraphics{part4/fit_in_wrapper.eps}}
   \includegraphics*[0.6in,0.7in][9.0in,8.5in]{part4/fit_in_wrapper.eps}  
  }  
88  \end{center}  \end{center}
89  \caption{ Numerical code is written to fit within a software support  \caption{
90    Numerical code is written to fit within a software support
91  infrastructure called WRAPPER. The WRAPPER is portable and  infrastructure called WRAPPER. The WRAPPER is portable and
92  can be sepcialized for a wide range of specific target hardware and  can be specialized for a wide range of specific target hardware and
93  programming environments, without impacting numerical code that fits  programming environments, without impacting numerical code that fits
94  within the WRAPPER. Codes that fit within the WRAPPER can generally be  within the WRAPPER. Codes that fit within the WRAPPER can generally be
95  made to run as fast on a particular platform as codes specially  made to run as fast on a particular platform as codes specially
96  optimized for that platform.  optimized for that platform.}
97  } \label{fig:fit_in_wrapper}  \label{fig:fit_in_wrapper}
98  \end{figure}  \end{figure}
99    
100  \subsection{Target hardware}  \subsection{Target hardware}
# Line 142  particular machine (for example an IBM S Line 136  particular machine (for example an IBM S
136  class of machines (for example Parallel Vector Processor Systems). Instead the  class of machines (for example Parallel Vector Processor Systems). Instead the
137  WRAPPER provides applications with an  WRAPPER provides applications with an
138  abstract {\it machine model}. The machine model is very general, however, it can  abstract {\it machine model}. The machine model is very general, however, it can
139  easily be specialized to fit, in a computationally effificent manner, any  easily be specialized to fit, in a computationally efficient manner, any
140  computer architecture currently available to the scientific computing community.  computer architecture currently available to the scientific computing community.
141    
142  \subsection{Machine model parallelism}  \subsection{Machine model parallelism}
143    
144   Codes operating under the WRAPPER target an abstract machine that is assumed to   Codes operating under the WRAPPER target an abstract machine that is assumed to
145  consist of one or more logical processors that can compute concurrently.    consist of one or more logical processors that can compute concurrently.  
146  Computational work is divided amongst the logical  Computational work is divided among the logical
147  processors by allocating ``ownership'' to  processors by allocating ``ownership'' to
148  each processor of a certain set (or sets) of calculations. Each set of  each processor of a certain set (or sets) of calculations. Each set of
149  calculations owned by a particular processor is associated with a specific  calculations owned by a particular processor is associated with a specific
# Line 172  Computationally, associated with each re Line 166  Computationally, associated with each re
166  space allocated to a particular logical processor, there will be data  space allocated to a particular logical processor, there will be data
167  structures (arrays, scalar variables etc...) that hold the simulated state of  structures (arrays, scalar variables etc...) that hold the simulated state of
168  that region. We refer to these data structures as being {\bf owned} by the  that region. We refer to these data structures as being {\bf owned} by the
169  pprocessor to which their  processor to which their
170  associated region of physical space has been allocated. Individual  associated region of physical space has been allocated. Individual
171  regions that are allocated to processors are called {\bf tiles}. A  regions that are allocated to processors are called {\bf tiles}. A
172  processor can own more  processor can own more
# Line 186  independently of the other tiles, in a s Line 180  independently of the other tiles, in a s
180    
181  \begin{figure}  \begin{figure}
182  \begin{center}  \begin{center}
183   \resizebox{7in}{3in}{   \resizebox{5in}{!}{
184    \includegraphics*[0.5in,2.7in][12.5in,6.4in]{part4/domain_decomp.eps}    \includegraphics{part4/domain_decomp.eps}
185   }   }
186  \end{center}  \end{center}
187  \caption{ The WRAPPER provides support for one and two dimensional  \caption{ The WRAPPER provides support for one and two dimensional
# Line 222  variety of different mechanisms to commu Line 216  variety of different mechanisms to commu
216    
217  \begin{figure}  \begin{figure}
218  \begin{center}  \begin{center}
219   \resizebox{7in}{3in}{   \resizebox{5in}{!}{
220    \includegraphics*[4.5in,3.7in][12.5in,6.7in]{part4/tiled-world.eps}    \includegraphics{part4/tiled-world.eps}
221   }   }
222  \end{center}  \end{center}
223  \caption{ A global grid subdivided into tiles.  \caption{ A global grid subdivided into tiles.
# Line 404  highly optimized library. Line 398  highly optimized library.
398    
399  \begin{figure}  \begin{figure}
400  \begin{center}  \begin{center}
401   \resizebox{5in}{3in}{   \resizebox{5in}{!}{
402    \includegraphics*[1.5in,0.7in][7.9in,4.4in]{part4/comm-primm.eps}    \includegraphics{part4/comm-primm.eps}
403   }   }
404  \end{center}  \end{center}
405  \caption{Three performance critical parallel primititives are provided  \caption{Three performance critical parallel primitives are provided
406  by the WRAPPER. These primititives are always used to communicate data  by the WRAPPER. These primitives are always used to communicate data
407  between tiles. The figure shows four tiles. The curved arrows indicate  between tiles. The figure shows four tiles. The curved arrows indicate
408  exchange primitives which transfer data between the overlap regions at tile  exchange primitives which transfer data between the overlap regions at tile
409  edges and interior regions for nearest-neighbor tiles.  edges and interior regions for nearest-neighbor tiles.
# Line 485  sub-domains. Line 479  sub-domains.
479    
480  \begin{figure}  \begin{figure}
481  \begin{center}  \begin{center}
482   \resizebox{5in}{3in}{   \resizebox{5in}{!}{
483    \includegraphics*[0.5in,1.3in][7.9in,5.7in]{part4/tiling_detail.eps}    \includegraphics{part4/tiling_detail.eps}
484   }   }
485  \end{center}  \end{center}
486  \caption{The tiling strategy that the WRAPPER supports allows tiles  \caption{The tiling strategy that the WRAPPER supports allows tiles
# Line 589  not cause any other problems. Line 583  not cause any other problems.
583    
584  \begin{figure}  \begin{figure}
585  \begin{center}  \begin{center}
586   \resizebox{5in}{7in}{   \resizebox{5in}{!}{
587    \includegraphics*[0.5in,0.3in][7.9in,10.7in]{part4/size_h.eps}    \includegraphics{part4/size_h.eps}
588   }   }
589  \end{center}  \end{center}
590  \caption{ The three level domain decomposition hierarchy employed by the  \caption{ The three level domain decomposition hierarchy employed by the
# Line 795  thirty-two grid points, and x and y over Line 789  thirty-two grid points, and x and y over
789  There are six tiles allocated to six separate logical processors ({\em nSx=6}).  There are six tiles allocated to six separate logical processors ({\em nSx=6}).
790  This set of values can be used for a cube sphere calculation.  This set of values can be used for a cube sphere calculation.
791  Each tile of size $32 \times 32$ represents a face of the  Each tile of size $32 \times 32$ represents a face of the
792  cube. Initialising the tile connectivity correctly ( see section  cube. Initializing the tile connectivity correctly ( see section
793  \ref{sec:cube_sphere_communication}) allows the rotations associated with  \ref{sec:cube_sphere_communication}) allows the rotations associated with
794  moving between the six cube faces to be embedded within the  moving between the six cube faces to be embedded within the
795  tile-tile communication code.  tile-tile communication code.
# Line 812  to support subsequent calls to communica Line 806  to support subsequent calls to communica
806  by the application code. The startup calling sequence followed by the  by the application code. The startup calling sequence followed by the
807  WRAPPER is shown in figure \ref{fig:wrapper_startup}.  WRAPPER is shown in figure \ref{fig:wrapper_startup}.
808    
   
809  \begin{figure}  \begin{figure}
810  \begin{verbatim}  \begin{verbatim}
811    
# Line 849  occurs through the procedure {\em THE\_M Line 842  occurs through the procedure {\em THE\_M
842  \end{figure}  \end{figure}
843    
844  \subsubsection{Multi-threaded execution}  \subsubsection{Multi-threaded execution}
845    \label{sec:multi-threaded-execution}
846  Prior to transferring control to the procedure {\em THE\_MODEL\_MAIN()} the  Prior to transferring control to the procedure {\em THE\_MODEL\_MAIN()} the
847  WRAPPER may cause several coarse grain threads to be initialized. The routine  WRAPPER may cause several coarse grain threads to be initialized. The routine
848  {\em THE\_MODEL\_MAIN()} is called once for each thread and is passed a single  {\em THE\_MODEL\_MAIN()} is called once for each thread and is passed a single
# Line 904  parallelization the compiler may otherwi Line 898  parallelization the compiler may otherwi
898  \end{enumerate}  \end{enumerate}
899    
900    
 \paragraph{Environment variables}  
 On most systems multi-threaded execution also requires the setting  
 of a special environment variable. On many machines this variable  
 is called PARALLEL and its values should be set to the number  
 of parallel threads required. Generally the help pages associated  
 with the multi-threaded compiler on a machine will explain  
 how to set the required environment variables for that machine.  
   
 \paragraph{Runtime input parameters}  
 Finally the file {\em eedata} needs to be configured to indicate  
 the number of threads to be used in the x and y directions.  
 The variables {\em nTx} and {\em nTy} in this file are used to  
 specify the information required. The product of {\em nTx} and  
 {\em nTy} must be equal to the number of threads spawned i.e.  
 the setting of the environment variable PARALLEL.  
 The value of {\em nTx} must subdivide the number of sub-domains  
 in x ({\em nSx}) exactly. The value of {\em nTy} must subdivide the  
 number of sub-domains in y ({\em nSy}) exactly.  
   
901  An example of valid settings for the {\em eedata} file for a  An example of valid settings for the {\em eedata} file for a
902  domain with two subdomains in y and running with two threads is shown  domain with two subdomains in y and running with two threads is shown
903  below  below
# Line 955  Parameter:  {\em nTy} Line 930  Parameter:  {\em nTy}
930  } \\  } \\
931    
932  \subsubsection{Multi-process execution}  \subsubsection{Multi-process execution}
933    \label{sec:multi-process-execution}
934    
935  Despite its appealing programming model, multi-threaded execution remains  Despite its appealing programming model, multi-threaded execution remains
936  less common than multi-process execution. One major reason for this  less common than multi-process execution. One major reason for this
# Line 966  models varies between systems. Line 942  models varies between systems.
942    
943  Multi-process execution is more ubiquitous.  Multi-process execution is more ubiquitous.
944  In order to run code in a multi-process configuration a decomposition  In order to run code in a multi-process configuration a decomposition
945  specification is given ( in which the at least one of the  specification ( see section \ref{sec:specifying_a_decomposition})
946    is given ( in which at least one of the
947  parameters {\em nPx} or {\em nPy} will be greater than one)  parameters {\em nPx} or {\em nPy} will be greater than one)
948  and then, as for multi-threaded operation,  and then, as for multi-threaded operation,
949  appropriate compile time and run time steps must be taken.  appropriate compile time and run time steps must be taken.
# Line 1029  using a command such as Line 1006  using a command such as
1006  \begin{verbatim}  \begin{verbatim}
1007  mpirun -np 64 -machinefile mf ./mitgcmuv  mpirun -np 64 -machinefile mf ./mitgcmuv
1008  \end{verbatim}  \end{verbatim}
1009  In this example the text {\em -np 64} specifices the number of processes  In this example the text {\em -np 64} specifies the number of processes
1010  that will be created. The numeric value {\em 64} must be equal to the  that will be created. The numeric value {\em 64} must be equal to the
1011  product of the processor grid settings of {\em nPx} and {\em nPy}  product of the processor grid settings of {\em nPx} and {\em nPy}
1012  in the file {\em SIZE.h}. The parameter {\em mf} specifies that a text file  in the file {\em SIZE.h}. The parameter {\em mf} specifies that a text file
# Line 1046  Parameter: {\em nPy} Line 1023  Parameter: {\em nPy}
1023  \end{minipage}  \end{minipage}
1024  } \\  } \\
1025    
1026    
1027    \paragraph{Environment variables}
1028    On most systems multi-threaded execution also requires the setting
1029    of a special environment variable. On many machines this variable
1030    is called PARALLEL and its values should be set to the number
1031    of parallel threads required. Generally the help pages associated
1032    with the multi-threaded compiler on a machine will explain
1033    how to set the required environment variables for that machine.
1034    
1035    \paragraph{Runtime input parameters}
1036    Finally the file {\em eedata} needs to be configured to indicate
1037    the number of threads to be used in the x and y directions.
1038    The variables {\em nTx} and {\em nTy} in this file are used to
1039    specify the information required. The product of {\em nTx} and
1040    {\em nTy} must be equal to the number of threads spawned i.e.
1041    the setting of the environment variable PARALLEL.
1042    The value of {\em nTx} must subdivide the number of sub-domains
1043    in x ({\em nSx}) exactly. The value of {\em nTy} must subdivide the
1044    number of sub-domains in y ({\em nSy}) exactly.
1045  The multiprocess startup of the MITgcm executable {\em mitgcmuv}  The multiprocess startup of the MITgcm executable {\em mitgcmuv}
1046  is controlled by the routines {\em EEBOOT\_MINIMAL()} and  is controlled by the routines {\em EEBOOT\_MINIMAL()} and
1047  {\em INI\_PROCS()}. The first routine performs basic steps required  {\em INI\_PROCS()}. The first routine performs basic steps required
# Line 1058  number so that process number 0 will cre Line 1054  number so that process number 0 will cre
1054  output files {\bf STDOUT.0001} and {\bf STDERR.0001} etc... These files  output files {\bf STDOUT.0001} and {\bf STDERR.0001} etc... These files
1055  are used for reporting status and configuration information and  are used for reporting status and configuration information and
1056  for reporting error conditions on a process by process basis.  for reporting error conditions on a process by process basis.
1057  The {{\em EEBOOT\_MINIMAL()} procedure also sets the variables  The {\em EEBOOT\_MINIMAL()} procedure also sets the variables
1058  {\em myProcId} and {\em MPI\_COMM\_MODEL}.  {\em myProcId} and {\em MPI\_COMM\_MODEL}.
1059  These variables are related  These variables are related
1060  to processor identification and are used later in the routine  to processor identification and are used later in the routine
# Line 1099  Parameter: {\em pidN       } Line 1095  Parameter: {\em pidN       }
1095  The WRAPPER maintains internal information that is used for communication  The WRAPPER maintains internal information that is used for communication
1096  operations and that can be customized for different platforms. This section  operations and that can be customized for different platforms. This section
1097  describes the information that is held and used.  describes the information that is held and used.
1098    
1099  \begin{enumerate}  \begin{enumerate}
1100  \item {\bf Tile-tile connectivity information} For each tile the WRAPPER  \item {\bf Tile-tile connectivity information} For each tile the WRAPPER
1101  sets a flag that sets the tile number to the north, south, east and  sets a flag that sets the tile number to the north, south, east and
# Line 1112  This information is held in the variable Line 1109  This information is held in the variable
1109  This latter set of variables can take one of the following values  This latter set of variables can take one of the following values
1110  {\em COMM\_NONE}, {\em COMM\_MSG}, {\em COMM\_PUT} and {\em COMM\_GET}.  {\em COMM\_NONE}, {\em COMM\_MSG}, {\em COMM\_PUT} and {\em COMM\_GET}.
1111  A value of {\em COMM\_NONE} is used to indicate that a tile has no  A value of {\em COMM\_NONE} is used to indicate that a tile has no
1112  neighbor to cummnicate with on a particular face. A value  neighbor to communicate with on a particular face. A value
1113  of {\em COMM\_MSG} is used to indicate that some form of distributed  of {\em COMM\_MSG} is used to indicate that some form of distributed
1114  memory communication is required to communicate between  memory communication is required to communicate between
1115  these tile faces ( see section \ref{sec:distributed_memory_communication}).  these tile faces ( see section \ref{sec:distributed_memory_communication}).
# Line 1169  the product of the parameters {\em nTx} Line 1166  the product of the parameters {\em nTx}
1166  are read from the file {\em eedata}. If the value of {\em nThreads}  are read from the file {\em eedata}. If the value of {\em nThreads}
1167  is inconsistent with the number of threads requested from the  is inconsistent with the number of threads requested from the
1168  operating system (for example by using an environment  operating system (for example by using an environment
1169  varialble as described in section \ref{sec:multi_threaded_execution})  variable as described in section \ref{sec:multi_threaded_execution})
1170  then usually an error will be reported by the routine  then usually an error will be reported by the routine
1171  {\em CHECK\_THREADS}.\\  {\em CHECK\_THREADS}.\\
1172    
# Line 1186  Parameter: {\em nTy} \\ Line 1183  Parameter: {\em nTy} \\
1183  \end{minipage}  \end{minipage}
1184  }  }
1185    
 \begin{figure}  
 \begin{verbatim}  
 C--  
 C--  Parallel directives for MIPS Pro Fortran compiler  
 C--  
 C      Parallel compiler directives for SGI with IRIX  
 C$PAR  PARALLEL DO  
 C$PAR&  CHUNK=1,MP_SCHEDTYPE=INTERLEAVE,  
 C$PAR&  SHARE(nThreads),LOCAL(myThid,I)  
 C  
       DO I=1,nThreads  
         myThid = I  
   
 C--     Invoke nThreads instances of the numerical model  
         CALL THE_MODEL_MAIN(myThid)  
   
       ENDDO  
 \end{verbatim}  
 \caption{Prior to transferring control to  
 the procedure {\em THE\_MODEL\_MAIN()} the WRAPPER may use  
 MP directives to spawn multiple threads.  
 } \label{fig:mp_directives}  
 \end{figure}  
   
   
1186  \item {\bf memsync flags}  \item {\bf memsync flags}
1187  As discussed in section \ref{sec:memory_consistency}, when using shared memory,  As discussed in section \ref{sec:memory_consistency}, when using shared memory,
1188  a low-level system function may be needed to force memory consistency.  a low-level system function may be needed to force memory consistency.
# Line 1228  For an Ultra Sparc system the following Line 1200  For an Ultra Sparc system the following
1200  \begin{verbatim}  \begin{verbatim}
1201  asm("membar #LoadStore|#StoreStore");  asm("membar #LoadStore|#StoreStore");
1202  \end{verbatim}  \end{verbatim}
1203  for an Alpha based sytem the euivalent code reads  for an Alpha based system the equivalent code reads
1204  \begin{verbatim}  \begin{verbatim}
1205  asm("mb");  asm("mb");
1206  \end{verbatim}  \end{verbatim}
# Line 1240  asm("lock; addl $0,0(%%esp)": : :"memory Line 1212  asm("lock; addl $0,0(%%esp)": : :"memory
1212  \item {\bf Cache line size}  \item {\bf Cache line size}
1213  As discussed in section \ref{sec:cache_effects_and_false_sharing},  As discussed in section \ref{sec:cache_effects_and_false_sharing},
1214  multi-threaded codes explicitly avoid penalties associated with excessive  multi-threaded codes explicitly avoid penalties associated with excessive
1215  coherence traffic on an SMP system. To do this the sgared memory data structures  coherence traffic on an SMP system. To do this the shared memory data structures
1216  used by the {\em GLOBAL\_SUM}, {\em GLOBAL\_MAX} and {\em BARRIER} routines  used by the {\em GLOBAL\_SUM}, {\em GLOBAL\_MAX} and {\em BARRIER} routines
1217  are padded. The variables that control the padding are set in the  are padded. The variables that control the padding are set in the
1218  header file {\em EEPARAMS.h}. These variables are called  header file {\em EEPARAMS.h}. These variables are called
# Line 1248  header file {\em EEPARAMS.h}. These vari Line 1220  header file {\em EEPARAMS.h}. These vari
1220  {\em lShare8}. The default values should not normally need changing.  {\em lShare8}. The default values should not normally need changing.
1221  \item {\bf \_BARRIER}  \item {\bf \_BARRIER}
1222  This is a CPP macro that is expanded to a call to a routine  This is a CPP macro that is expanded to a call to a routine
1223  which synchronises all the logical processors running under the  which synchronizes all the logical processors running under the
1224  WRAPPER. Using a macro here preserves flexibility to insert  WRAPPER. Using a macro here preserves flexibility to insert
1225  a specialized call in-line into application code. By default this  a specialized call in-line into application code. By default this
1226  resolves to calling the procedure {\em BARRIER()}. The default  resolves to calling the procedure {\em BARRIER()}. The default
# Line 1256  setting for the \_BARRIER macro is given Line 1228  setting for the \_BARRIER macro is given
1228    
1229  \item {\bf \_GSUM}  \item {\bf \_GSUM}
1230  This is a CPP macro that is expanded to a call to a routine  This is a CPP macro that is expanded to a call to a routine
1231  which sums up a floating point numner  which sums up a floating point number
1232  over all the logical processors running under the  over all the logical processors running under the
1233  WRAPPER. Using a macro here provides extra flexibility to insert  WRAPPER. Using a macro here provides extra flexibility to insert
1234  a specialized call in-line into application code. By default this  a specialized call in-line into application code. By default this
1235  resolves to calling the procedure {\em GLOBAL\_SOM\_R8()} ( for  resolves to calling the procedure {\em GLOBAL\_SUM\_R8()} ( for
1236  84=bit floating point operands)  64-bit floating point operands)
1237  or {\em GLOBAL\_SOM\_R4()} (for 32-bit floating point operands). The default  or {\em GLOBAL\_SUM\_R4()} (for 32-bit floating point operands). The default
1238  setting for the \_GSUM macro is given in the file {\em CPP\_EEMACROS.h}.  setting for the \_GSUM macro is given in the file {\em CPP\_EEMACROS.h}.
1239  The \_GSUM macro is a performance critical operation, especially for  The \_GSUM macro is a performance critical operation, especially for
1240  large processor count, small tile size configurations.  large processor count, small tile size configurations.
# Line 1281  in the header file {\em CPP\_EEMACROS.h} Line 1253  in the header file {\em CPP\_EEMACROS.h}
1253  \_EXCH operation plays a crucial role in scaling to small tile,  \_EXCH operation plays a crucial role in scaling to small tile,
1254  large logical and physical processor count configurations.  large logical and physical processor count configurations.
1255  The example in section \ref{sec:jam_example} discusses defining an  The example in section \ref{sec:jam_example} discusses defining an
1256  optimised and specialized form of the \_EXCH operation.  optimized and specialized form of the \_EXCH operation.
1257    
1258  The \_EXCH operation is also central to supporting grids such as  The \_EXCH operation is also central to supporting grids such as
1259  the cube-sphere grid. In this class of grid a rotation may be required  the cube-sphere grid. In this class of grid a rotation may be required
1260  between tiles. Aligning the coordinate requiring rotation with the  between tiles. Aligning the coordinate requiring rotation with the
1261  tile decomposistion, allows the coordinate transformation to  tile decomposition, allows the coordinate transformation to
1262  be embedded within a custom form of the \_EXCH primitive.  be embedded within a custom form of the \_EXCH primitive.
1263    
1264  \item {\bf Reverse Mode}  \item {\bf Reverse Mode}
1265  The communication primitives \_EXCH and \_GSUM both employ  The communication primitives \_EXCH and \_GSUM both employ
1266  hand-written adjoint (or reverse mode) forms.  hand-written adjoint (or reverse mode) forms.
1267  These reverse mode forms can be found in the  These reverse mode forms can be found in the
1268  sourc code directory {\em pkg/autodiff}.  source code directory {\em pkg/autodiff}.
1269  For the global sum primitive the reverse mode form  For the global sum primitive the reverse mode form
1270  calls are to {\em GLOBAL\_ADSUM\_R4} and  calls are to {\em GLOBAL\_ADSUM\_R4} and
1271  {\em GLOBAL\_ADSUM\_R8}. The reverse mode form of the  {\em GLOBAL\_ADSUM\_R8}. The reverse mode form of the
1272  exchamge primitives are found in routines  exchange primitives are found in routines
1273  prefixed {\em ADEXCH}. The exchange routines make calls to  prefixed {\em ADEXCH}. The exchange routines make calls to
1274  the same low-level communication primitives as the forward mode  the same low-level communication primitives as the forward mode
1275  operations. However, the routine argument {\em simulationMode}  operations. However, the routine argument {\em simulationMode}
# Line 1309  The variable {\em MAX\_NO\_THREADS} is u Line 1281  The variable {\em MAX\_NO\_THREADS} is u
1281  maximum number of OS threads that a code will use. This  maximum number of OS threads that a code will use. This
1282  value defaults to thirty-two and is set in the file {\em EEPARAMS.h}.  value defaults to thirty-two and is set in the file {\em EEPARAMS.h}.
1283  For single threaded execution it can be reduced to one if required.  For single threaded execution it can be reduced to one if required.
1284  The va;lue is largely private to the WRAPPER and application code  The value is largely private to the WRAPPER and application code
1285  will not normally reference the value, except in the following scenario.  will not normally reference the value, except in the following scenario.
1286    
1287  For certain physical parametrization schemes it is necessary to have  For certain physical parametrization schemes it is necessary to have
# Line 1324  and {\em nSy} ( as described in section Line 1296  and {\em nSy} ( as described in section
1296  being specified involves many more tiles than OS threads then  being specified involves many more tiles than OS threads then
1297  it can save memory resources to reduce the variable  it can save memory resources to reduce the variable
1298  {\em MAX\_NO\_THREADS} to be equal to the actual number of threads that  {\em MAX\_NO\_THREADS} to be equal to the actual number of threads that
1299  will be used and to declare the physical parameterisation  will be used and to declare the physical parameterization
1300  work arrays with a sinble {\em MAX\_NO\_THREADS} extra dimension.  work arrays with a single {\em MAX\_NO\_THREADS} extra dimension.
1301  An example of this is given in the verification experiment  An example of this is given in the verification experiment
1302  {\em aim.5l\_cs}. Here the default setting of  {\em aim.5l\_cs}. Here the default setting of
1303  {\em MAX\_NO\_THREADS} is altered to  {\em MAX\_NO\_THREADS} is altered to
# Line 1338  created with declarations of the form. Line 1310  created with declarations of the form.
1310  \begin{verbatim}  \begin{verbatim}
1311        common /FORCIN/ sst1(ngp,MAX_NO_THREADS)        common /FORCIN/ sst1(ngp,MAX_NO_THREADS)
1312  \end{verbatim}  \end{verbatim}
1313  This declaration scheme is not used widely, becuase most global data  This declaration scheme is not used widely, because most global data
1314  is used for permanent not temporary storage of state information.  is used for permanent not temporary storage of state information.
1315  In the case of permanent state information this approach cannot be used  In the case of permanent state information this approach cannot be used
1316  because there has to be enough storage allocated for all tiles.  because there has to be enough storage allocated for all tiles.
1317  However, the technique can sometimes be a useful scheme for reducing memory  However, the technique can sometimes be a useful scheme for reducing memory
1318  requirements in complex physical paramterisations.  requirements in complex physical parameterizations.
   
1319  \end{enumerate}  \end{enumerate}
1320    
1321    \begin{figure}
1322    \begin{verbatim}
1323    C--
1324    C--  Parallel directives for MIPS Pro Fortran compiler
1325    C--
1326    C      Parallel compiler directives for SGI with IRIX
1327    C$PAR  PARALLEL DO
1328    C$PAR&  CHUNK=1,MP_SCHEDTYPE=INTERLEAVE,
1329    C$PAR&  SHARE(nThreads),LOCAL(myThid,I)
1330    C
1331          DO I=1,nThreads
1332            myThid = I
1333    
1334    C--     Invoke nThreads instances of the numerical model
1335            CALL THE_MODEL_MAIN(myThid)
1336    
1337          ENDDO
1338    \end{verbatim}
1339    \caption{Prior to transferring control to
1340    the procedure {\em THE\_MODEL\_MAIN()} the WRAPPER may use
1341    MP directives to spawn multiple threads.
1342    } \label{fig:mp_directives}
1343    \end{figure}
1344    
1345    
1346  \subsubsection{Specializing the Communication Code}  \subsubsection{Specializing the Communication Code}
1347    
1348  The isolation of performance critical communication primitives and the  The isolation of performance critical communication primitives and the
1349  sub-division of the simulation domain into tiles is a powerful tool.  sub-division of the simulation domain into tiles is a powerful tool.
1350  Here we show how it can be used to improve application performance and  Here we show how it can be used to improve application performance and
1351  how it can be used to adapt to new gridding approaches.  how it can be used to adapt to new gridding approaches.
1352    
1353  \subsubsection{JAM example}  \subsubsection{JAM example}
1354  \label{sec:jam_example}  \label{sec:jam_example}
# Line 1371  communications library ( see {\em ini\_j Line 1367  communications library ( see {\em ini\_j
1367  \item The {\em \_GSUM} and {\em \_EXCH} macro definitions are replaced  \item The {\em \_GSUM} and {\em \_EXCH} macro definitions are replaced
1368  with calls to custom routines ( see {\em gsum\_jam.F} and {\em exch\_jam.F})  with calls to custom routines ( see {\em gsum\_jam.F} and {\em exch\_jam.F})
1369  \item a highly specialized form of the exchange operator (optimized  \item a highly specialized form of the exchange operator (optimized
1370  for overlap regions of width one) is substitued into the elliptic  for overlap regions of width one) is substituted into the elliptic
1371  solver routine {\em cg2d.F}.  solver routine {\em cg2d.F}.
1372  \end{itemize}  \end{itemize}
1373  Developing specialized code for other libraries follows a similar  Developing specialized code for other libraries follows a similar
# Line 1383  Actual {\em \_EXCH} routine code is gene Line 1379  Actual {\em \_EXCH} routine code is gene
1379  a series of template files, for example {\em exch\_rx.template}.  a series of template files, for example {\em exch\_rx.template}.
1380  This is done to allow a large number of variations on the exchange  This is done to allow a large number of variations on the exchange
1381  process to be maintained. One set of variations supports the  process to be maintained. One set of variations supports the
1382  cube sphere grid. Support for a cube sphere gris in MITgcm is based  cube sphere grid. Support for a cube sphere grid in MITgcm is based
1383  on having each face of the cube as a separate tile (or tiles).  on having each face of the cube as a separate tile (or tiles).
1384  The exchage routines are then able to absorb much of the  The exchange routines are then able to absorb much of the
1385  detailed rotation and reorientation required when moving around the  detailed rotation and reorientation required when moving around the
1386  cube grid. The set of {\em \_EXCH} routines that contain the  cube grid. The set of {\em \_EXCH} routines that contain the
1387  word cube in their name perform these transformations.  word cube in their name perform these transformations.
1388  They are invoked when the run-time logical parameter  They are invoked when the run-time logical parameter
1389  {\em useCubedSphereExchange} is set true. To facilitate the  {\em useCubedSphereExchange} is set true. To facilitate the
1390  transformations on a staggered C-grid, exchange operations are defined  transformations on a staggered C-grid, exchange operations are defined
1391  separately for both vector and scalar quantitities and for  separately for both vector and scalar quantities and for
1392  grid-centered and for grid-face and corner quantities.  grid-centered and for grid-face and corner quantities.
1393  Three sets of exchange routines are defined. Routines  Three sets of exchange routines are defined. Routines
1394  with names of the form {\em exch\_rx} are used to exchange  with names of the form {\em exch\_rx} are used to exchange
# Line 1457  C  : Line 1453  C  :
1453  C  |  C  |
1454  C  |-THE_MODEL_MAIN :: Primary driver for the MITgcm algorithm  C  |-THE_MODEL_MAIN :: Primary driver for the MITgcm algorithm
1455  C    |              :: Called from WRAPPER level numerical  C    |              :: Called from WRAPPER level numerical
1456  C    |              :: code innvocation routine. On entry  C    |              :: code invocation routine. On entry
1457  C    |              :: to THE_MODEL_MAIN separate thread and  C    |              :: to THE_MODEL_MAIN separate thread and
1458  C    |              :: separate processes will have been established.  C    |              :: separate processes will have been established.
1459  C    |              :: Each thread and process will have a unique ID  C    |              :: Each thread and process will have a unique ID
# Line 1471  C    | |-INI_PARMS :: Routine to set ker Line 1467  C    | |-INI_PARMS :: Routine to set ker
1467  C    | |           :: By default kernel parameters are read from file  C    | |           :: By default kernel parameters are read from file
1468  C    | |           :: "data" in directory in which code executes.  C    | |           :: "data" in directory in which code executes.
1469  C    | |  C    | |
1470  C    | |-MON_INIT :: Initialises monitor pacakge ( see pkg/monitor )  C    | |-MON_INIT :: Initializes monitor package ( see pkg/monitor )
1471  C    | |  C    | |
1472  C    | |-INI_GRID :: Control grid array (vert. and hori.) initialisation.  C    | |-INI_GRID :: Control grid array (vert. and hori.) initialization.
1473  C    | | |        :: Grid arrays are held and described in GRID.h.  C    | | |        :: Grid arrays are held and described in GRID.h.
1474  C    | | |  C    | | |
1475  C    | | |-INI_VERTICAL_GRID        :: Initialise vertical grid arrays.  C    | | |-INI_VERTICAL_GRID        :: Initialize vertical grid arrays.
1476  C    | | |  C    | | |
1477  C    | | |-INI_CARTESIAN_GRID       :: Cartesian horiz. grid initialisation  C    | | |-INI_CARTESIAN_GRID       :: Cartesian horiz. grid initialization
1478  C    | | |                          :: (calculate grid from kernel parameters).  C    | | |                          :: (calculate grid from kernel parameters).
1479  C    | | |  C    | | |
1480  C    | | |-INI_SPHERICAL_POLAR_GRID :: Spherical polar horiz. grid  C    | | |-INI_SPHERICAL_POLAR_GRID :: Spherical polar horiz. grid
1481  C    | | |                          :: initialisation (calculate grid from  C    | | |                          :: initialization (calculate grid from
1482  C    | | |                          :: kernel parameters).  C    | | |                          :: kernel parameters).
1483  C    | | |  C    | | |
1484  C    | | |-INI_CURVILINEAR_GRID     :: General orthogonal, structured horiz.  C    | | |-INI_CURVILINEAR_GRID     :: General orthogonal, structured horiz.
1485  C    | |                            :: grid initialisations. ( input from raw  C    | |                            :: grid initializations. ( input from raw
1486  C    | |                            :: grid files, LONC.bin, DXF.bin etc... )  C    | |                            :: grid files, LONC.bin, DXF.bin etc... )
1487  C    | |  C    | |
1488  C    | |-INI_DEPTHS    :: Read (from "bathyFile") or set bathymetry/orography.  C    | |-INI_DEPTHS    :: Read (from "bathyFile") or set bathymetry/orography.
# Line 1497  C    | | Line 1493  C    | |
1493  C    | |-INI_LINEAR_PHISURF :: Set ref. surface Bo_surf  C    | |-INI_LINEAR_PHISURF :: Set ref. surface Bo_surf
1494  C    | |  C    | |
1495  C    | |-INI_CORI          :: Set coriolis term. zero, f-plane, beta-plane,  C    | |-INI_CORI          :: Set coriolis term. zero, f-plane, beta-plane,
1496  C    | |                   :: sphere optins are coded.  C    | |                   :: sphere options are coded.
1497  C    | |  C    | |
1498  C    | |-PACKAGES_BOOT      :: Start up the optional package environment.  C    | |-PACKAGES_BOOT      :: Start up the optional package environment.
1499  C    | |                    :: Runtime selection of active packages.  C    | |                    :: Runtime selection of active packages.
# Line 1518  C    | | Line 1514  C    | |
1514  C    | |-PACKAGES_CHECK  C    | |-PACKAGES_CHECK
1515  C    | | |  C    | | |
1516  C    | | |-KPP_CHECK           :: KPP Package. pkg/kpp  C    | | |-KPP_CHECK           :: KPP Package. pkg/kpp
1517  C    | | |-OBCS_CHECK          :: Open bndy Pacakge. pkg/obcs  C    | | |-OBCS_CHECK          :: Open bndy Package. pkg/obcs
1518  C    | | |-GMREDI_CHECK        :: GM Package. pkg/gmredi  C    | | |-GMREDI_CHECK        :: GM Package. pkg/gmredi
1519  C    | |  C    | |
1520  C    | |-PACKAGES_INIT_FIXED  C    | |-PACKAGES_INIT_FIXED
# Line 1538  C    | Line 1534  C    |
1534  C    |-CTRL_UNPACK :: Control vector support package. see pkg/ctrl  C    |-CTRL_UNPACK :: Control vector support package. see pkg/ctrl
1535  C    |  C    |
1536  C    |-ADTHE_MAIN_LOOP :: Derivative evaluating form of main time stepping loop  C    |-ADTHE_MAIN_LOOP :: Derivative evaluating form of main time stepping loop
1537  C    !                 :: Auotmatically gerenrated by TAMC/TAF.  C    !                 :: Automatically generated by TAMC/TAF.
1538  C    |  C    |
1539  C    |-CTRL_PACK   :: Control vector support package. see pkg/ctrl  C    |-CTRL_PACK   :: Control vector support package. see pkg/ctrl
1540  C    |  C    |
# Line 1552  C    | | | Line 1548  C    | | |
1548  C    | | |-INI_LINEAR_PHISURF :: Set ref. surface Bo_surf  C    | | |-INI_LINEAR_PHISURF :: Set ref. surface Bo_surf
1549  C    | | |  C    | | |
1550  C    | | |-INI_CORI     :: Set coriolis term. zero, f-plane, beta-plane,  C    | | |-INI_CORI     :: Set coriolis term. zero, f-plane, beta-plane,
1551  C    | | |              :: sphere optins are coded.  C    | | |              :: sphere options are coded.
1552  C    | | |  C    | | |
1553  C    | | |-INI_CG2D     :: 2d con. grad solver initialisation.  C    | | |-INI_CG2D     :: 2d con. grad solver initialisation.
1554  C    | | |-INI_CG3D     :: 3d con. grad solver initialisation.  C    | | |-INI_CG3D     :: 3d con. grad solver initialisation.
# Line 1560  C    | | |-INI_MIXING   :: Initialise di Line 1556  C    | | |-INI_MIXING   :: Initialise di
1556  C    | | |-INI_DYNVARS  :: Initialise to zero all DYNVARS.h arrays (dynamical  C    | | |-INI_DYNVARS  :: Initialise to zero all DYNVARS.h arrays (dynamical
1557  C    | | |              :: fields).  C    | | |              :: fields).
1558  C    | | |  C    | | |
1559  C    | | |-INI_FIELDS   :: Control initialising model fields to non-zero  C    | | |-INI_FIELDS   :: Control initializing model fields to non-zero
1560  C    | | | |-INI_VEL    :: Initialize 3D flow field.  C    | | | |-INI_VEL    :: Initialize 3D flow field.
1561  C    | | | |-INI_THETA  :: Set model initial temperature field.  C    | | | |-INI_THETA  :: Set model initial temperature field.
1562  C    | | | |-INI_SALT   :: Set model initial salinity field.  C    | | | |-INI_SALT   :: Set model initial salinity field.
# Line 1638  C/\  | | |-CALC_EXACT_ETA :: Change SSH Line 1634  C/\  | | |-CALC_EXACT_ETA :: Change SSH
1634  C/\  | | |-CALC_SURF_DR   :: Calculate the new surface level thickness.  C/\  | | |-CALC_SURF_DR   :: Calculate the new surface level thickness.
1635  C/\  | | |-EXF_GETFORCING :: External forcing package. ( pkg/exf )  C/\  | | |-EXF_GETFORCING :: External forcing package. ( pkg/exf )
1636  C/\  | | |-EXTERNAL_FIELDS_LOAD :: Control loading time dep. external data.  C/\  | | |-EXTERNAL_FIELDS_LOAD :: Control loading time dep. external data.
1637  C/\  | | | |                    :: Simple interpolcation between end-points  C/\  | | | |                    :: Simple interpolation between end-points
1638  C/\  | | | |                    :: for forcing datasets.  C/\  | | | |                    :: for forcing datasets.
1639  C/\  | | | |                    C/\  | | | |                  
1640  C/\  | | | |-EXCH :: Sync forcing in overlap regions.  C/\  | | | |-EXCH :: Sync forcing in overlap regions.
