% /[MITgcm]/manual/s_software/text/sarch.tex
% revision 1.3 by cnh, Mon Oct 22 03:28:33 2001 UTC -> revision 1.13 by afe, Wed Jan 28 20:27:54 2004 UTC
% $Header$

This chapter focuses on describing the {\bf WRAPPER} environment within which
both the core numerics and the pluggable packages operate. The description
presented here is intended to be a detailed exposition and contains significant
background material, as well as advanced details on working with the WRAPPER.
The tutorial sections of this manual (see sections
\ref{sect:tutorials} and \ref{sect:tutorialIII})
contain more succinct, step-by-step instructions on running basic numerical
experiments, of various types, both sequentially and in parallel. For many
projects simply starting from an example code and adapting it to suit a
particular situation will be all that is required.
The first part of this chapter discusses the MITgcm architecture at an
abstract level. In the second part of the chapter we describe practical
details of the MITgcm implementation and of current tools and operating system
features that are employed.
\section{Overall architectural goals}
% ...

\begin{enumerate}
\item A core set of numerical and support code. This is discussed in detail in
section \ref{sect:partII}.
\item A scheme for supporting optional ``pluggable'' {\bf packages} (containing
for example mixed-layer schemes, biogeochemical schemes, atmospheric physics).
These packages are used both to overlay alternate dynamics and to introduce
% ...

This chapter focuses on describing the {\bf WRAPPER} environment under which
both the core numerics and the pluggable packages function. The description
presented here is intended to be a detailed exposition and contains significant
background material, as well as advanced details on working with the WRAPPER.
The examples section of this manual (part \ref{part:example}) contains more
succinct, step-by-step instructions on running basic numerical
% ...

to ``fit'' within the WRAPPER infrastructure. Writing code to ``fit'' within
the WRAPPER means that coding has to follow certain, relatively
straightforward, rules and conventions (these are discussed further in
section \ref{sect:specifying_a_decomposition}).

The approach taken by the WRAPPER is illustrated in figure
\ref{fig:fit_in_wrapper} which shows how the WRAPPER serves to insulate code
% ...

\resizebox{!}{4.5in}{\includegraphics{part4/fit_in_wrapper.eps}}
\end{center}
\caption{
Numerical code is written to fit within a software support
infrastructure called WRAPPER. The WRAPPER is portable and
can be specialized for a wide range of specific target hardware and
programming environments, without impacting numerical code that fits
within the WRAPPER. Codes that fit within the WRAPPER can generally be
made to run as fast on a particular platform as codes specially
optimized for that platform.}
\end{figure}

\subsection{Target hardware}
\label{sect:target_hardware}

The WRAPPER is designed to target as broad as possible a range of computer
systems. The original development of the WRAPPER took place on a
% ...
(UMA) and non-uniform memory access (NUMA) designs. Significant work has also
been undertaken on x86 cluster systems, Alpha processor based clustered SMP
systems, and on cache-coherent NUMA (CC-NUMA) systems from Silicon Graphics.
The MITgcm code, operating within the WRAPPER, is also routinely used on
large scale MPP systems (for example T3E systems and IBM SP systems). In all
cases numerical code, operating within the WRAPPER, performs and scales very
competitively with equivalent numerical code that has been modified to contain
% ...

\subsection{Supporting hardware neutrality}

The different systems listed in section \ref{sect:target_hardware} can be
categorized in many different ways. For example, one common distinction is
between shared-memory parallel systems (SMP's, PVP's) and distributed memory
parallel systems (for example x86 clusters and large MPP systems). This is one
% ...
class of machines (for example Parallel Vector Processor Systems). Instead the
WRAPPER provides applications with an
abstract {\it machine model}. The machine model is very general; however, it can
easily be specialized to fit, in a computationally efficient manner, any
computer architecture currently available to the scientific computing community.

\subsection{Machine model parallelism}

Codes operating under the WRAPPER target an abstract machine that is assumed to
consist of one or more logical processors that can compute concurrently.
Computational work is divided among the logical
processors by allocating ``ownership'' to
each processor of a certain set (or sets) of calculations. Each set of
calculations owned by a particular processor is associated with a specific
% ...
space allocated to a particular logical processor, there will be data
structures (arrays, scalar variables etc.) that hold the simulated state of
that region. We refer to these data structures as being {\bf owned} by the
processor to which their
associated region of physical space has been allocated. Individual
regions that are allocated to processors are called {\bf tiles}. A
processor can own more
% ...
whenever it requires values that lie outside the domain it owns. Periodically
processors will make calls to WRAPPER functions to communicate data between
tiles, in order to keep the overlap regions up to date (see section
\ref{sect:communication_primitives}). The WRAPPER functions can use a
variety of different mechanisms to communicate data between tiles.

\begin{figure}
% ...
\end{figure}

\subsection{Shared memory communication}
\label{sect:shared_memory_communication}

Under shared memory communication, independent CPU's operate
on the exact same global address space at the application level.
% ...
communication very efficient provided it is used appropriately.

\subsubsection{Memory consistency}
\label{sect:memory_consistency}

When using shared memory communication between
multiple processors the WRAPPER level shields user applications from
% ...
ensure memory consistency for a particular platform.

\subsubsection{Cache effects and false sharing}
\label{sect:cache_effects_and_false_sharing}

Shared-memory machines often have caches local to each processor
which contain mirrored copies of main memory. Automatic cache-coherence
% ...
threads operating within a single process is the standard mechanism for
supporting shared memory that the WRAPPER utilizes. Configuring and launching
code to run in multi-threaded mode on specific platforms is discussed in
section \ref{sect:running_with_threads}.  However, on many systems, potentially
very efficient mechanisms for using shared memory communication between
multiple processes (in contrast to multiple threads within a single
process) also exist. In most cases this works by making a limited region of
% ...
nature.

\subsection{Distributed memory communication}
\label{sect:distributed_memory_communication}
Many parallel systems are not constructed in a way that makes it
possible or practical for an application to use shared memory
for communication. For example cluster systems consist of individual computers
% ...
highly optimized library.

\subsection{Communication primitives}
\label{sect:communication_primitives}

\begin{figure}
\begin{center}
% ...
  \includegraphics{part4/comm-primm.eps}
 }
\end{center}
\caption{Three performance critical parallel primitives are provided
by the WRAPPER. These primitives are always used to communicate data
between tiles. The figure shows four tiles. The curved arrows indicate
exchange primitives which transfer data between the overlap regions at tile
edges and interior regions for nearest-neighbor tiles.
% ...

computing CPU's.
\end{enumerate}
This section describes the details of each of these operations.
Section \ref{sect:specifying_a_decomposition} explains how a domain
decomposition (or composition) is specified. Section
\ref{sect:starting_a_code} describes practical details of running codes
in various parallel modes on contemporary computer systems.
Section \ref{sect:controlling_communication} explains the internal information
that the WRAPPER uses to control how information is communicated between
tiles.

\subsection{Specifying a domain decomposition}
\label{sect:specifying_a_decomposition}

At its heart much of the WRAPPER works only in terms of a collection of tiles
which are interconnected to each other. This is also true of application
% ...

dimensions of {\em sNx} and {\em sNy}. If, when the code is executed, these tiles are
allocated to different threads of a process that are then bound to
different physical processors (see the multi-threaded
execution discussion in section \ref{sect:starting_the_code}) then
computation will be performed concurrently on each tile. However, it is also
possible to run the same decomposition within a process running a single thread on
a single processor. In this case the tiles will be computed sequentially.
% ...

computation is performed concurrently over as many processes and threads
as there are physical processors available to compute.

An exception to the use of {\em bi} and {\em bj} in loops arises in the
exchange routines used when the exch2 package is used with the cubed
sphere.  In this case {\em bj} is generally set to 1 and the loop runs from
1,{\em bi}.  Within the loop {\em bi} is used to retrieve the tile number,
which is then used to reference exchange parameters.

The amount of computation that can be embedded in
a single loop over {\em bi} and {\em bj} varies for different parts of the
MITgcm algorithm. Figure \ref{fig:bibj_extract} shows a code extract
% ...

forty grid points in y. The two sub-domains in each process will be computed
sequentially if they are given to a single thread within a single process.
Alternatively if the code is invoked with multiple threads per process
the two domains in y may be computed concurrently.
\item
\begin{verbatim}
      PARAMETER (
% ...
There are six tiles allocated to six separate logical processors ({\em nSx=6}).
This set of values can be used for a cube sphere calculation.
Each tile of size $32 \times 32$ represents a face of the
cube. Initializing the tile connectivity correctly (see section
\ref{sect:cube_sphere_communication}) allows the rotations associated with
moving between the six cube faces to be embedded within the
tile-tile communication code.
\end{enumerate}

\subsection{Starting the code}
\label{sect:starting_the_code}
When code is started under the WRAPPER, execution begins in a main routine {\em
eesupp/src/main.F} that is owned by the WRAPPER. Control is transferred
to the application through a routine called {\em THE\_MODEL\_MAIN()}
% ...
WRAPPER is shown in figure \ref{fig:wrapper_startup}.

\begin{figure}
{\footnotesize
\begin{verbatim}

       MAIN
% ...

\end{verbatim}
}
\caption{Main stages of the WRAPPER startup procedure.
This process precedes transfer of control to application code, which
occurs through the procedure {\em THE\_MODEL\_MAIN()}.
% ...
\end{figure}

\subsubsection{Multi-threaded execution}
\label{sect:multi-threaded-execution}
Prior to transferring control to the procedure {\em THE\_MODEL\_MAIN()} the
WRAPPER may cause several coarse grain threads to be initialized. The routine
{\em THE\_MODEL\_MAIN()} is called once for each thread and is passed a single
stack argument which is the thread number, stored in the
variable {\em myThid}. In addition to specifying a decomposition with
multiple tiles per process (see section \ref{sect:specifying_a_decomposition}),
configuring and starting a code to run using multiple threads requires the following
steps.\\

% ...

File: {\em eesupp/inc/MAIN\_PDIRECTIVES2.h}\\
File: {\em model/src/THE\_MODEL\_MAIN.F}\\
File: {\em eesupp/src/MAIN.F}\\
File: {\em tools/genmake2}\\
File: {\em eedata}\\
CPP:  {\em TARGET\_SUN}\\
CPP:  {\em TARGET\_DEC}\\
% ...
} \\

\subsubsection{Multi-process execution}
\label{sect:multi-process-execution}

Despite its appealing programming model, multi-threaded execution remains
less common than multi-process execution. One major reason for this
% ...

Multi-process execution is more widespread.
In order to run code in a multi-process configuration a decomposition
specification (see section \ref{sect:specifying_a_decomposition})
is given (in which at least one of the
parameters {\em nPx} or {\em nPy} will be greater than one)
and then, as for multi-threaded operation,
% ...

of controlling and coordinating the start up of a large number
(hundreds and possibly even thousands) of copies of the same
program, MPI is used. The calls to the MPI multi-process startup
routines must be activated at compile time.  Currently MPI libraries are
invoked by specifying the appropriate options file with the
{\em -of} flag when running the {\em genmake2}
script, which generates the Makefile for compiling and linking MITgcm.
(Previously this was done by setting the {\em ALLOW\_USE\_MPI} and
{\em ALWAYS\_USE\_MPI} flags in the {\em CPP\_EEOPTIONS.h} file.)  More
detailed information about the use of {\em genmake2} for specifying
local compiler flags is located in section 3 ??\\

\fbox{
\begin{minipage}{4.75in}
File: {\em tools/genmake2}
\end{minipage}
} \\
\paragraph{\bf Execution} The mechanics of starting a program in
% ...

\begin{verbatim}
mpirun -np 64 -machinefile mf ./mitgcmuv
\end{verbatim}
In this example the text {\em -np 64} specifies the number of processes
that will be created. The numeric value {\em 64} must be equal to the
product of the processor grid settings of {\em nPx} and {\em nPy}
in the file {\em SIZE.h}. The parameter {\em mf} specifies that a text file
# Line 1064  are also set in this routine. These are
processes holding tiles to the west, east, south and north
of this process. These values are stored in global storage
in the header file {\em EESUPPORT.h} for use by
communication routines. The above does not hold when the
exch2 package is used -- exch2 sets its own parameters to
specify the global indices of tiles and their relationships
to each other.
\\
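The neighbor bookkeeping described above can be sketched in C. This is an
illustration only, not the WRAPPER's actual code: it assumes a hypothetical
row-by-row process numbering on an {\em nPx} by {\em nPy} grid with periodic
wrap-around, and hypothetical field names mirroring {\em pe\_W}, {\em pe\_E},
{\em pe\_S} and {\em pe\_N}.

```c
/* Hypothetical sketch of computing the ranks of neighboring processes
 * on an nPx x nPy processor grid with periodic wrap-around.  MITgcm's
 * actual numbering (and the exch2 package) may differ. */
typedef struct {
    int peW, peE, peS, peN;   /* analogous to pe_W, pe_E, pe_S, pe_N */
} Neighbors;

Neighbors find_neighbors(int rank, int nPx, int nPy)
{
    int px = rank % nPx;      /* position in x within the process grid */
    int py = rank / nPx;      /* position in y within the process grid */
    Neighbors nb;
    nb.peW = ((px + nPx - 1) % nPx) + nPx * py;         /* wrap west  */
    nb.peE = ((px + 1) % nPx)       + nPx * py;         /* wrap east  */
    nb.peS = px + nPx * ((py + nPy - 1) % nPy);         /* wrap south */
    nb.peN = px + nPx * ((py + 1) % nPy);               /* wrap north */
    return nb;
}
```

For example, on a 4 by 4 process grid, process 0 has west neighbor 3 and
north neighbor 4 under this numbering.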

\fbox{
# Line 1105  This information is held in the variable
This latter set of variables can take one of the following values
{\em COMM\_NONE}, {\em COMM\_MSG}, {\em COMM\_PUT} and {\em COMM\_GET}.
A value of {\em COMM\_NONE} is used to indicate that a tile has no
neighbor to communicate with on a particular face. A value
of {\em COMM\_MSG} is used to indicate that some form of distributed
memory communication is required to communicate between
these tile faces (see section \ref{sect:distributed_memory_communication}).
A value of {\em COMM\_PUT} or {\em COMM\_GET} is used to indicate
forms of shared memory communication (see section
\ref{sect:shared_memory_communication}). The {\em COMM\_PUT} value indicates
that a CPU should communicate by writing to data structures owned by another
CPU. A {\em COMM\_GET} value indicates that a CPU should communicate by reading
from data structures owned by another CPU. These flags affect the behavior
# Line 1162  the product of the parameters {\em nTx}
are read from the file {\em eedata}. If the value of {\em nThreads}
is inconsistent with the number of threads requested from the
operating system (for example by using an environment
variable as described in section \ref{sect:multi_threaded_execution})
then usually an error will be reported by the routine
{\em CHECK\_THREADS}.\\
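The consistency requirement can be illustrated with a small C sketch. The
routine below is not the actual {\em CHECK\_THREADS} implementation; the
function name and the environment variable used are hypothetical stand-ins.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch of the consistency test performed by
 * CHECK_THREADS: the thread count implied by eedata (nTx*nTy) must
 * match the thread count requested from the operating system, read
 * here from an environment variable purely as an illustration. */
int check_threads(int nTx, int nTy, const char *envVar)
{
    int nThreads = nTx * nTy;                 /* value implied by eedata */
    const char *s = getenv(envVar);
    int osThreads = s ? atoi(s) : 1;          /* default: single threaded */
    if (nThreads != osThreads) {
        fprintf(stderr,
                "CHECK_THREADS-style error: eedata requests %d threads "
                "but the OS was asked for %d\n", nThreads, osThreads);
        return 0;                             /* inconsistent */
    }
    return 1;                                 /* consistent */
}
```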

# Line 1180  Parameter: {\em nTy} \\
}

\item {\bf memsync flags}
As discussed in section \ref{sect:memory_consistency}, when using shared memory,
a low-level system function may be needed to force memory consistency.
The routine {\em MEMSYNC()} is used for this purpose. This routine should
not need modifying and the information below is only provided for
# Line 1196  For an Ultra Sparc system the following
\begin{verbatim}
asm("membar #LoadStore|#StoreStore");
\end{verbatim}
for an Alpha based system the equivalent code reads
\begin{verbatim}
asm("mb");
\end{verbatim}
# Line 1206  asm("lock; addl $0,0(%%esp)": : :"memory
\end{verbatim}
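On current GCC or Clang compilers the same effect can be obtained through a
builtin, which may be easier than maintaining per-platform assembler. The
following is a sketch of such an alternative, not the body of the WRAPPER's
actual {\em MEMSYNC()} routine:

```c
#include <stdatomic.h>

/* Portable stand-ins for the platform-specific assembler fragments
 * above.  Either call issues a full memory barrier, forcing all
 * preceding loads/stores to complete before any that follow. */
static inline void memsync(void)
{
#if defined(__GNUC__)
    __sync_synchronize();                        /* GCC/Clang builtin */
#else
    atomic_thread_fence(memory_order_seq_cst);   /* C11 standard form */
#endif
}
```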

\item {\bf Cache line size}
As discussed in section \ref{sect:cache_effects_and_false_sharing},
multi-threaded codes explicitly avoid penalties associated with excessive
coherence traffic on an SMP system. To do this the shared memory data structures
used by the {\em GLOBAL\_SUM}, {\em GLOBAL\_MAX} and {\em BARRIER} routines
are padded. The variables that control the padding are set in the
header file {\em EEPARAMS.h}. These variables are called
# Line 1216  header file {\em EEPARAMS.h}. These vari
{\em lShare8}. The default values should not normally need changing.
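The padding idea behind these variables can be sketched in C. Assuming
(hypothetically) a 64-byte cache line, giving each thread's entry in a shared
reduction array its own line prevents false sharing:

```c
/* Sketch of the padding controlled by cacheLineSize/lShare*: each
 * thread's slot in a shared reduction array is padded out to a full
 * cache line (64 bytes assumed here) so that updates by different
 * threads never touch the same line, avoiding false sharing. */
#define CACHE_LINE_SIZE 64

typedef struct {
    double value;                                /* per-thread partial sum */
    char pad[CACHE_LINE_SIZE - sizeof(double)];  /* pad slot to a full line */
} PaddedSum;

/* With the padding in place, slot i and slot i+1 of a PaddedSum array
 * are guaranteed to start CACHE_LINE_SIZE bytes apart. */
```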
\item {\bf \_BARRIER}
This is a CPP macro that is expanded to a call to a routine
which synchronizes all the logical processors running under the
WRAPPER. Using a macro here preserves flexibility to insert
a specialized call in-line into application code. By default this
resolves to calling the procedure {\em BARRIER()}. The default
# Line 1224  setting for the \_BARRIER macro is given

\item {\bf \_GSUM}
This is a CPP macro that is expanded to a call to a routine
which sums up a floating point number
over all the logical processors running under the
WRAPPER. Using a macro here provides extra flexibility to insert
a specialized call in-line into application code. By default this
resolves to calling the procedure {\em GLOBAL\_SUM\_R8()} (for
64-bit floating point operands)
or {\em GLOBAL\_SUM\_R4()} (for 32-bit floating point operands). The default
setting for the \_GSUM macro is given in the file {\em CPP\_EEMACROS.h}.
The \_GSUM macro is a performance critical operation, especially for
large processor count, small tile size configurations.
The custom communication example discussed in section \ref{sect:jam_example}
shows how the macro is used to invoke a custom global sum routine
for a specific set of hardware.
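The semantics of a \_GSUM-style primitive can be sketched serially: every
participating thread or process contributes a partial value and all of them
then see the same total. The routine below is illustrative only; the real
{\em GLOBAL\_SUM\_R8()} adds barriers, padded shared buffers and, under MPI,
a distributed reduction.

```c
/* Serial sketch of the logic behind a GLOBAL_SUM-style primitive:
 * every participant deposits a partial sum into a shared array and
 * the entries are combined into a single total that all participants
 * read back.  Real implementations add barriers, cache-line padding
 * (see lShare8) and, across processes, a message-passing reduction. */
double global_sum_r8(const double *partial, int nParticipants)
{
    double total = 0.0;
    for (int i = 0; i < nParticipants; i++)
        total += partial[i];
    return total;  /* under the WRAPPER every thread/process sees this value */
}
```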

# Line 1248  physical fields and whether fields are 3
in the header file {\em CPP\_EEMACROS.h}. As with \_GSUM, the
\_EXCH operation plays a crucial role in scaling to small tile,
large logical and physical processor count configurations.
The example in section \ref{sect:jam_example} discusses defining an
optimized and specialized form of the \_EXCH operation.

The \_EXCH operation is also central to supporting grids such as
the cube-sphere grid. In this class of grid a rotation may be required
between tiles. Aligning the coordinate requiring rotation with the
tile decomposition allows the coordinate transformation to
be embedded within a custom form of the \_EXCH primitive.
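The kind of rotation involved can be sketched for one hypothetical case: a
halo filled from a neighboring cube face whose local axes are rotated by
ninety degrees must have its vector components remapped. The actual mapping
applied by the cube exchange routines depends on which edge is crossed; this
is an illustration only.

```c
/* Sketch of the component rotation a cube-sphere exchange may apply
 * when a halo region is filled from a neighboring face whose local
 * axes are rotated by 90 degrees: the neighbor's (u,v) pair maps to
 * (v,-u) in the receiving tile's coordinates.  The mapping used by
 * the actual exch_* cube routines depends on the edge crossed. */
void rotate_vector_90(double u_in, double v_in, double *u_out, double *v_out)
{
    *u_out =  v_in;
    *v_out = -u_in;
}
```

Four successive quarter-turns recover the original vector, a convenient
sanity check on any such rotation table.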
1259    
1260  \item {\bf Reverse Mode}  \item {\bf Reverse Mode}
1261  The communication primitives \_EXCH and \_GSUM both employ  The communication primitives \_EXCH and \_GSUM both employ
1262  hand-written adjoint forms (or reverse mode) forms.  hand-written adjoint forms (or reverse mode) forms.
1263  These reverse mode forms can be found in the  These reverse mode forms can be found in the
1264  sourc code directory {\em pkg/autodiff}.  source code directory {\em pkg/autodiff}.
1265  For the global sum primitive the reverse mode form  For the global sum primitive the reverse mode form
1266  calls are to {\em GLOBAL\_ADSUM\_R4} and  calls are to {\em GLOBAL\_ADSUM\_R4} and
1267  {\em GLOBAL\_ADSUM\_R8}. The reverse mode form of the  {\em GLOBAL\_ADSUM\_R8}. The reverse mode form of the
1268  exchamge primitives are found in routines  exchange primitives are found in routines
1269  prefixed {\em ADEXCH}. The exchange routines make calls to  prefixed {\em ADEXCH}. The exchange routines make calls to
1270  the same low-level communication primitives as the forward mode  the same low-level communication primitives as the forward mode
1271  operations. However, the routine argument {\em simulationMode}  operations. However, the routine argument {\em simulationMode}
# Line 1281  The variable {\em MAX\_NO\_THREADS} is u Line 1277  The variable {\em MAX\_NO\_THREADS} is u
maximum number of OS threads that a code will use. This
value defaults to thirty-two and is set in the file {\em EEPARAMS.h}.
For single threaded execution it can be reduced to one if required.
The value is largely private to the WRAPPER and application code
will not normally reference the value, except in the following scenario.

For certain physical parametrization schemes it is necessary to have
# Line 1288  This can be achieved using a Fortran 90
if this might be unavailable then the work arrays can be extended
with dimensions using the tile dimensioning scheme of {\em nSx}
and {\em nSy} (as described in section
\ref{sect:specifying_a_decomposition}). However, if the configuration
being specified involves many more tiles than OS threads then
it can save memory resources to reduce the variable
{\em MAX\_NO\_THREADS} to be equal to the actual number of threads that
will be used and to declare the physical parameterization
work arrays with a single {\em MAX\_NO\_THREADS} extra dimension.
An example of this is given in the verification experiment
{\em aim.5l\_cs}. Here the default setting of
{\em MAX\_NO\_THREADS} is altered to
# Line 1306  created with declarations of the form.
\begin{verbatim}
      common /FORCIN/ sst1(ngp,MAX_NO_THREADS)
\end{verbatim}
This declaration scheme is not used widely, because most global data
is used for permanent not temporary storage of state information.
In the case of permanent state information this approach cannot be used
because there has to be enough storage allocated for all tiles.
However, the technique can sometimes be a useful scheme for reducing memory
requirements in complex physical parameterizations.
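The scheme can be sketched in C, where the Fortran common block above plays
the same role as a file-scope static array. Sizes and names here are
illustrative, not those of {\em aim.5l\_cs}:

```c
/* Sketch of the per-thread work array scheme: instead of one scratch
 * array per tile (nSx*nSy of them), one slot per OS thread suffices
 * for temporary storage, since a thread only needs its own scratch
 * space at any instant.  All names and sizes are illustrative. */
#define MAX_NO_THREADS 4
#define NGP 96                 /* number of grid points, illustrative */

/* analogous to: common /FORCIN/ sst1(ngp,MAX_NO_THREADS) */
static double sst1[MAX_NO_THREADS][NGP];

double *thread_work_array(int myThid)   /* myThid: 0-based thread ID */
{
    return sst1[myThid];       /* each thread gets a private slot */
}
```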
\end{enumerate}

\begin{figure}
# Line 1344  MP directives to spawn multiple threads.
The isolation of performance critical communication primitives and the
sub-division of the simulation domain into tiles is a powerful tool.
Here we show how it can be used to improve application performance and
how it can be used to adapt to new gridding approaches.

\subsubsection{JAM example}
\label{sect:jam_example}
On some platforms a big performance boost can be obtained by
binding the communication routines {\em \_EXCH} and
{\em \_GSUM} to specialized native libraries (for example the
# Line 1363  communications library ( see {\em ini\_j
\item The {\em \_GSUM} and {\em \_EXCH} macro definitions are replaced
with calls to custom routines (see {\em gsum\_jam.F} and {\em exch\_jam.F})
\item a highly specialized form of the exchange operator (optimized
for overlap regions of width one) is substituted into the elliptic
solver routine {\em cg2d.F}.
\end{itemize}
Developing specialized code for other libraries follows a similar
pattern.
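The substitution pattern itself can be sketched with CPP macros. All routine
names below are hypothetical stand-ins for the generic routine and for a
JAM-style native binding:

```c
/* Sketch of the macro-rebinding pattern used to substitute a custom
 * communication routine: by default _GSUM expands to a generic
 * routine, but a platform-specific build can redefine it, much as
 * gsum_jam.F is bound in the JAM example.  All names here are
 * illustrative stand-ins; neither routine does real communication. */
double generic_global_sum(double x)     { return x; /* stand-in */ }
double fast_native_global_sum(double x) { return x; /* stand-in */ }

#ifdef USE_FAST_COMMS
#define _GSUM(a) fast_native_global_sum(a)   /* specialized binding */
#else
#define _GSUM(a) generic_global_sum(a)       /* default binding */
#endif

/* Application code only ever sees the macro, so rebinding it changes
 * the communication path without touching any call sites. */
double sum_everywhere(double x) { return _GSUM(x); }
```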

\subsubsection{Cube sphere communication}
\label{sect:cube_sphere_communication}
Actual {\em \_EXCH} routine code is generated automatically from
a series of template files, for example {\em exch\_rx.template}.
This is done to allow a large number of variations on the exchange
process to be maintained. One set of variations supports the
cube sphere grid. Support for a cube sphere grid in MITgcm is based
on having each face of the cube as a separate tile (or tiles).
The exchange routines are then able to absorb much of the
detailed rotation and reorientation required when moving around the
cube grid. The set of {\em \_EXCH} routines that contain the
word cube in their name perform these transformations.
They are invoked when the run-time logical parameter
{\em useCubedSphereExchange} is set true. To facilitate the
transformations on a staggered C-grid, exchange operations are defined
separately for both vector and scalar quantities and for
grid-centered and for grid-face and corner quantities.
Three sets of exchange routines are defined. Routines
with names of the form {\em exch\_rx} are used to exchange
# Line 1403  quantities at the C-grid vorticity point

Fitting together the WRAPPER elements, package elements and
MITgcm core equation elements of the source code produces the calling
sequence shown in section \ref{sect:calling_sequence}.

\subsection{Annotated call tree for MITgcm and WRAPPER}
\label{sect:calling_sequence}

WRAPPER layer.

{\footnotesize
\begin{verbatim}

       MAIN
# Line 1438  WRAPPER layer.
       |--THE_MODEL_MAIN   :: Numerical code top-level driver routine

\end{verbatim}
}

Core equations plus packages.

{\footnotesize
\begin{verbatim}
C
C
# Line 1452  C  :
C  |
C  |-THE_MODEL_MAIN :: Primary driver for the MITgcm algorithm
C    |              :: Called from WRAPPER level numerical
C    |              :: code invocation routine. On entry
C    |              :: to THE_MODEL_MAIN separate thread and
C    |              :: separate processes will have been established.
C    |              :: Each thread and process will have a unique ID
# Line 1466  C    | |-INI_PARMS :: Routine to set ker
C    | |           :: By default kernel parameters are read from file
C    | |           :: "data" in directory in which code executes.
C    | |
C    | |-MON_INIT :: Initializes monitor package (see pkg/monitor)
C    | |
C    | |-INI_GRID :: Control grid array (vert. and hori.) initialization.
C    | | |        :: Grid arrays are held and described in GRID.h.
C    | | |
C    | | |-INI_VERTICAL_GRID        :: Initialize vertical grid arrays.
C    | | |
C    | | |-INI_CARTESIAN_GRID       :: Cartesian horiz. grid initialization
C    | | |                          :: (calculate grid from kernel parameters).
C    | | |
C    | | |-INI_SPHERICAL_POLAR_GRID :: Spherical polar horiz. grid
C    | | |                          :: initialization (calculate grid from
C    | | |                          :: kernel parameters).
C    | | |
C    | | |-INI_CURVILINEAR_GRID     :: General orthogonal, structured horiz.
C    | |                            :: grid initializations. (input from raw
C    | |                            :: grid files, LONC.bin, DXF.bin etc...)
C    | |
C    | |-INI_DEPTHS    :: Read (from "bathyFile") or set bathymetry/orography.
# Line 1492  C    | |
C    | |-INI_LINEAR_PHSURF :: Set ref. surface Bo_surf
C    | |
C    | |-INI_CORI          :: Set coriolis term. zero, f-plane, beta-plane,
C    | |                   :: sphere options are coded.
C    | |
C    | |-PACKAGES_BOOT      :: Start up the optional package environment.
C    | |                    :: Runtime selection of active packages.
# Line 1513  C    | |
C    | |-PACKAGES_CHECK
C    | | |
C    | | |-KPP_CHECK           :: KPP Package. pkg/kpp
C    | | |-OBCS_CHECK          :: Open bndy Package. pkg/obcs
C    | | |-GMREDI_CHECK        :: GM Package. pkg/gmredi
C    | |
C    | |-PACKAGES_INIT_FIXED
# Line 1533  C    |
C    |-CTRL_UNPACK :: Control vector support package. see pkg/ctrl
C    |
C    |-ADTHE_MAIN_LOOP :: Derivative evaluating form of main time stepping loop
C    !                 :: Automatically generated by TAMC/TAF.
C    |
C    |-CTRL_PACK   :: Control vector support package. see pkg/ctrl
C    |
# Line 1547  C    | | |
C    | | |-INI_LINEAR_PHISURF :: Set ref. surface Bo_surf
C    | | |
C    | | |-INI_CORI     :: Set coriolis term. zero, f-plane, beta-plane,
C    | | |              :: sphere options are coded.
C    | | |
C    | | |-INI_CG2D     :: 2d con. grad solver initialization.
C    | | |-INI_CG3D     :: 3d con. grad solver initialization.
# Line 1555  C    | | |-INI_MIXING   :: Initialise di
C    | | |-INI_DYNVARS  :: Initialize to zero all DYNVARS.h arrays (dynamical
C    | | |              :: fields).
C    | | |
C    | | |-INI_FIELDS   :: Control initializing model fields to non-zero
C    | | | |-INI_VEL    :: Initialize 3D flow field.
C    | | | |-INI_THETA  :: Set model initial temperature field.
C    | | | |-INI_SALT   :: Set model initial salinity field.
# Line 1633  C/\  | | |-CALC_EXACT_ETA :: Change SSH
C/\  | | |-CALC_SURF_DR   :: Calculate the new surface level thickness.
C/\  | | |-EXF_GETFORCING :: External forcing package. (pkg/exf)
C/\  | | |-EXTERNAL_FIELDS_LOAD :: Control loading time dep. external data.
C/\  | | | |                    :: Simple interpolation between end-points
C/\  | | | |                    :: for forcing datasets.
C/\  | | | |
C/\  | | | |-EXCH :: Sync forcing in overlap regions.
# Line 1781  C    |-COMM_STATS     :: Summarise inter
C                     :: events.
C
\end{verbatim}
}

\subsection{Measuring and Characterizing Performance}

