Diff of /manual/s_software/text/sarch.tex

-revision 1.2 by adcroft,
Thu Oct 11 19:12:38 2001 UTC
+revision 1.5 by cnh,
Tue Nov 13 18:32:33 2001 UTC
 Line 41 
 packages operate.
  This chapter focuses on describing the {\bf WRAPPER} environment under which
  both the core numerics and the pluggable packages function. The description
- presented here is intended to be a detailed exposistion and contains significant
+ presented here is intended to be a detailed exposition and contains significant
  background material, as well as advanced details on working with the WRAPPER.
  The examples section of this manual (part \ref{part:example}) contains more
  succinct, step-by-step instructions on running basic numerical
 Line 89 
 and operating systems. This allows numer
  \caption{
  Numerical code is written to fit within a software support
  infrastructure called WRAPPER. The WRAPPER is portable and
- can be sepcialized for a wide range of specific target hardware and
+ can be specialized for a wide range of specific target hardware and
  programming environments, without impacting numerical code that fits
  within the WRAPPER. Codes that fit within the WRAPPER can generally be
  made to run as fast on a particular platform as codes specially
 Line 136 
 particular machine (for example an IBM S
  class of machines (for example Parallel Vector Processor Systems). Instead the
  WRAPPER provides applications with an
  abstract {\it machine model}. The machine model is very general; however, it can
- easily be specialized to fit, in a computationally effificent manner, any
+ easily be specialized to fit, in a computationally efficient manner, any
  computer architecture currently available to the scientific computing community.
  \subsection{Machine model parallelism}
   Codes operating under the WRAPPER target an abstract machine that is assumed to
  consist of one or more logical processors that can compute concurrently.
- Computational work is divided amongst the logical
+ Computational work is divided among the logical
  processors by allocating ``ownership'' to
  each processor of a certain set (or sets) of calculations. Each set of
  calculations owned by a particular processor is associated with a specific
 Line 166 
 Computationally, associated with each re
  space allocated to a particular logical processor, there will be data
  structures (arrays, scalar variables etc...) that hold the simulated state of
  that region. We refer to these data structures as being {\bf owned} by the
- pprocessor to which their
+ processor to which their
  associated region of physical space has been allocated. Individual
  regions that are allocated to processors are called {\bf tiles}. A
  processor can own more
 Line 402 
 highly optimized library.
    \includegraphics{part4/comm-primm.eps}
   }
  \end{center}
- \caption{Three performance critical parallel primititives are provided
+ \caption{Three performance critical parallel primitives are provided
- by the WRAPPER. These primititives are always used to communicate data
+ by the WRAPPER. These primitives are always used to communicate data
  between tiles. The figure shows four tiles. The curved arrows indicate
  exchange primitives which transfer data between the overlap regions at tile
  edges and interior regions for nearest-neighbor tiles.
 Line 789 
 thirty-two grid points, and x and y over
  There are six tiles allocated to six separate logical processors ({\em nSx=6}).
  This set of values can be used for a cube sphere calculation.
  Each tile of size $32 \times 32$ represents a face of the
- cube. Initialising the tile connectivity correctly ( see section
+ cube. Initializing the tile connectivity correctly ( see section
  \ref{sec:cube_sphere_communication}) allows the rotations associated with
  moving between the six cube faces to be embedded within the
  tile-tile communication code.
 Line 842 
 occurs through the procedure {\em THE\_M
  \end{figure}
  \subsubsection{Multi-threaded execution}
+ \label{sec:multi-threaded-execution}
  Prior to transferring control to the procedure {\em THE\_MODEL\_MAIN()} the
  WRAPPER may cause several coarse grain threads to be initialized. The routine
  {\em THE\_MODEL\_MAIN()} is called once for each thread and is passed a single
-Line 929 
 Parameter:  {\em nTy}
+Line 930 
 Parameter:  {\em nTy}
  } \\
  \subsubsection{Multi-process execution}
+ \label{sec:multi-process-execution}
  Despite its appealing programming model, multi-threaded execution remains
  less common than multi-process execution. One major reason for this
-Line 940 
 models varies between systems.
+Line 942 
 models varies between systems.
  Multi-process execution is more ubiquitous.
  In order to run code in a multi-process configuration a decomposition
- specification is given ( in which the at least one of the
+ specification ( see section \ref{sec:specifying_a_decomposition})
+ is given ( in which at least one of the
  parameters {\em nPx} or {\em nPy} will be greater than one)
  and then, as for multi-threaded operation,
  appropriate compile time and run time steps must be taken.
-Line 1003 
 using a command such as
+Line 1006 
 using a command such as
  \begin{verbatim}
  mpirun -np 64 -machinefile mf ./mitgcmuv
  \end{verbatim}
- In this example the text {\em -np 64} specifices the number of processes
+ In this example the text {\em -np 64} specifies the number of processes
  that will be created. The numeric value {\em 64} must be equal to the
  product of the processor grid settings of {\em nPx} and {\em nPy}
  in the file {\em SIZE.h}. The parameter {\em mf} specifies that a text file
-Line 1106 
 This information is held in the variable
+Line 1109 
 This information is held in the variable
  This latter set of variables can take one of the following values
  {\em COMM\_NONE}, {\em COMM\_MSG}, {\em COMM\_PUT} and {\em COMM\_GET}.
  A value of {\em COMM\_NONE} is used to indicate that a tile has no
- neighbor to cummnicate with on a particular face. A value
+ neighbor to communicate with on a particular face. A value
  of {\em COMM\_MSG} is used to indicate that some form of distributed
  memory communication is required to communicate between
  these tile faces ( see section \ref{sec:distributed_memory_communication}).
-Line 1163 
 the product of the parameters {\em nTx}
+Line 1166 
 the product of the parameters {\em nTx}
  are read from the file {\em eedata}. If the value of {\em nThreads}
  is inconsistent with the number of threads requested from the
  operating system (for example by using an environment
- varialble as described in section \ref{sec:multi_threaded_execution})
+ variable as described in section \ref{sec:multi-threaded-execution})
  then usually an error will be reported by the routine
  {\em CHECK\_THREADS}.\\
-Line 1197 
 For an Ultra Sparc system the following
+Line 1200 
 For an Ultra Sparc system the following
  \begin{verbatim}
  asm("membar #LoadStore|#StoreStore");
  \end{verbatim}
- for an Alpha based sytem the euivalent code reads
+ for an Alpha based system the equivalent code reads
  \begin{verbatim}
  asm("mb");
  \end{verbatim}
-Line 1209 
 asm("lock; addl $0,0(%%esp)": : :"memory
+Line 1212 
 asm("lock; addl $0,0(%%esp)": : :"memory
  \item {\bf Cache line size}
  As discussed in section \ref{sec:cache_effects_and_false_sharing},
  multi-threaded codes explicitly avoid penalties associated with excessive
- coherence traffic on an SMP system. To do this the sgared memory data structures
+ coherence traffic on an SMP system. To do this the shared memory data structures
  used by the {\em GLOBAL\_SUM}, {\em GLOBAL\_MAX} and {\em BARRIER} routines
  are padded. The variables that control the padding are set in the
  header file {\em EEPARAMS.h}. These variables are called
-Line 1217 
 header file {\em EEPARAMS.h}. These vari
+Line 1220 
 header file {\em EEPARAMS.h}. These vari
  {\em lShare8}. The default values should not normally need changing.
  \item {\bf \_BARRIER}
  This is a CPP macro that is expanded to a call to a routine
- which synchronises all the logical processors running under the
+ which synchronizes all the logical processors running under the
  WRAPPER. Using a macro here preserves flexibility to insert
  a specialized call in-line into application code. By default this
  resolves to calling the procedure {\em BARRIER()}. The default
-Line 1225 
 setting for the \_BARRIER macro is given
+Line 1228 
 setting for the \_BARRIER macro is given
  \item {\bf \_GSUM}
  This is a CPP macro that is expanded to a call to a routine
- which sums up a floating point numner
+ which sums up a floating point number
  over all the logical processors running under the
  WRAPPER. Using a macro here provides extra flexibility to insert
  a specialized call in-line into application code. By default this
- resolves to calling the procedure {\em GLOBAL\_SOM\_R8()} ( for
+ resolves to calling the procedure {\em GLOBAL\_SUM\_R8()} ( for
- 64=bit floating point operands)
+ 64-bit floating point operands)
- or {\em GLOBAL\_SOM\_R4()} (for 32-bit floating point operands). The default
+ or {\em GLOBAL\_SUM\_R4()} (for 32-bit floating point operands). The default
  setting for the \_GSUM macro is given in the file {\em CPP\_EEMACROS.h}.
  The \_GSUM macro is a performance critical operation, especially for
  large processor count, small tile size configurations.
-Line 1250 
 in the header file {\em CPP\_EEMACROS.h}
+Line 1253 
 in the header file {\em CPP\_EEMACROS.h}
  \_EXCH operation plays a crucial role in scaling to small tile,
  large logical and physical processor count configurations.
  The example in section \ref{sec:jam_example} discusses defining an
- optimised and specialized form on the \_EXCH operation.
+ optimized and specialized form of the \_EXCH operation.
  The \_EXCH operation is also central to supporting grids such as
  the cube-sphere grid. In this class of grid a rotation may be required
  between tiles. Aligning the coordinate requiring rotation with the
- tile decomposistion, allows the coordinate transformation to
+ tile decomposition, allows the coordinate transformation to
  be embedded within a custom form of the \_EXCH primitive.
  \item {\bf Reverse Mode}
  The communication primitives \_EXCH and \_GSUM both employ
  hand-written adjoint (or reverse mode) forms.
  These reverse mode forms can be found in the
- sourc code directory {\em pkg/autodiff}.
+ source code directory {\em pkg/autodiff}.
  For the global sum primitive the reverse mode form
  calls are to {\em GLOBAL\_ADSUM\_R4} and
  {\em GLOBAL\_ADSUM\_R8}. The reverse mode form of the
- exchamge primitives are found in routines
+ exchange primitives are found in routines
  prefixed {\em ADEXCH}. The exchange routines make calls to
  the same low-level communication primitives as the forward mode
  operations. However, the routine argument {\em simulationMode}
-Line 1278 
 The variable {\em MAX\_NO\_THREADS} is u
+Line 1281 
 The variable {\em MAX\_NO\_THREADS} is u
  maximum number of OS threads that a code will use. This
  value defaults to thirty-two and is set in the file {\em EEPARAMS.h}.
  For single threaded execution it can be reduced to one if required.
- The va;lue is largely private to the WRAPPER and application code
+ The value is largely private to the WRAPPER and application code
  will not normally reference the value, except in the following scenario.
  For certain physical parametrization schemes it is necessary to have
-Line 1293 
 and {\em nSy} ( as described in section
+Line 1296 
 and {\em nSy} ( as described in section
  being specified involves many more tiles than OS threads then
  it can save memory resources to reduce the variable
  {\em MAX\_NO\_THREADS} to be equal to the actual number of threads that
- will be used and to declare the physical parameterisation
+ will be used and to declare the physical parameterization
- work arrays with a sinble {\em MAX\_NO\_THREADS} extra dimension.
+ work arrays with a single {\em MAX\_NO\_THREADS} extra dimension.
  An example of this is given in the verification experiment
  {\em aim.5l\_cs}. Here the default setting of
  {\em MAX\_NO\_THREADS} is altered to
-Line 1307 
 created with declarations of the form.
+Line 1310 
 created with declarations of the form.
  \begin{verbatim}
        common /FORCIN/ sst1(ngp,MAX_NO_THREADS)
  \end{verbatim}
- This declaration scheme is not used widely, becuase most global data
+ This declaration scheme is not used widely, because most global data
  is used for permanent not temporary storage of state information.
  In the case of permanent state information this approach cannot be used
  because there has to be enough storage allocated for all tiles.
  However, the technique can sometimes be a useful scheme for reducing memory
- requirements in complex physical paramterisations.
+ requirements in complex physical parameterizations.
  \end{enumerate}
  \begin{figure}
-Line 1345 
 MP directives to spawn multiple threads.
+Line 1348 
 MP directives to spawn multiple threads.
  The isolation of performance critical communication primitives and the
  sub-division of the simulation domain into tiles is a powerful tool.
  Here we show how it can be used to improve application performance and
- how it can be used to adapt to new gridding approaches.
+ how it can be used to adapt to new gridding approaches.
  \subsubsection{JAM example}
  \label{sec:jam_example}
-Line 1364 
 communications library ( see {\em ini\_j
+Line 1367 
 communications library ( see {\em ini\_j
  \item The {\em \_GSUM} and {\em \_EXCH} macro definitions are replaced
  with calls to custom routines ( see {\em gsum\_jam.F} and {\em exch\_jam.F})
  \item a highly specialized form of the exchange operator (optimized
- for overlap regions of width one) is substitued into the elliptic
+ for overlap regions of width one) is substituted into the elliptic
  solver routine {\em cg2d.F}.
  \end{itemize}
  Developing specialized code for other libraries follows a similar
-Line 1376 
 Actual {\em \_EXCH} routine code is gene
+Line 1379 
 Actual {\em \_EXCH} routine code is gene
  a series of template files, for example {\em exch\_rx.template}.
  This is done to allow a large number of variations on the exchange
  process to be maintained. One set of variations supports the
- cube sphere grid. Support for a cube sphere gris in MITgcm is based
+ cube sphere grid. Support for a cube sphere grid in MITgcm is based
  on having each face of the cube as a separate tile (or tiles).
- The exchage routines are then able to absorb much of the
+ The exchange routines are then able to absorb much of the
  detailed rotation and reorientation required when moving around the
  cube grid. The set of {\em \_EXCH} routines that contain the
  word cube in their name perform these transformations.
  They are invoked when the run-time logical parameter
  {\em useCubedSphereExchange} is set true. To facilitate the
  transformations on a staggered C-grid, exchange operations are defined
- separately for both vector and scalar quantitities and for
+ separately for both vector and scalar quantities and for
  grid-centered and for grid-face and corner quantities.
  Three sets of exchange routines are defined. Routines
  with names of the form {\em exch\_rx} are used to exchange
-Line 1450 
 C  :
+Line 1453 
 C  :
  C  |
  C  |-THE_MODEL_MAIN :: Primary driver for the MITgcm algorithm
  C    |              :: Called from WRAPPER level numerical
- C    |              :: code innvocation routine. On entry
+ C    |              :: code invocation routine. On entry
  C    |              :: to THE_MODEL_MAIN separate thread and
  C    |              :: separate processes will have been established.
  C    |              :: Each thread and process will have a unique ID
-Line 1464 
 C    | |-INI_PARMS :: Routine to set ker
+Line 1467 
 C    | |-INI_PARMS :: Routine to set ker
  C    | |           :: By default kernel parameters are read from file
  C    | |           :: "data" in directory in which code executes.
  C    | |
- C    | |-MON_INIT :: Initialises monitor pacakge ( see pkg/monitor )
+ C    | |-MON_INIT :: Initializes monitor package ( see pkg/monitor )
  C    | |
- C    | |-INI_GRID :: Control grid array (vert. and hori.) initialisation.
+ C    | |-INI_GRID :: Control grid array (vert. and hori.) initialization.
  C    | | |        :: Grid arrays are held and described in GRID.h.
  C    | | |
- C    | | |-INI_VERTICAL_GRID        :: Initialise vertical grid arrays.
+ C    | | |-INI_VERTICAL_GRID        :: Initialize vertical grid arrays.
  C    | | |
- C    | | |-INI_CARTESIAN_GRID       :: Cartesian horiz. grid initialisation
+ C    | | |-INI_CARTESIAN_GRID       :: Cartesian horiz. grid initialization
  C    | | |                          :: (calculate grid from kernel parameters).
  C    | | |
  C    | | |-INI_SPHERICAL_POLAR_GRID :: Spherical polar horiz. grid
- C    | | |                          :: initialisation (calculate grid from
+ C    | | |                          :: initialization (calculate grid from
  C    | | |                          :: kernel parameters).
  C    | | |
  C    | | |-INI_CURVILINEAR_GRID     :: General orthogonal, structured horiz.
- C    | |                            :: grid initialisations. ( input from raw
+ C    | |                            :: grid initializations. ( input from raw
  C    | |                            :: grid files, LONC.bin, DXF.bin etc... )
  C    | |
  C    | |-INI_DEPTHS    :: Read (from "bathyFile") or set bathymetry/orography.
-Line 1490 
 C    | |
+Line 1493 
 C    | |
  C    | |-INI_LINEAR_PHISURF :: Set ref. surface Bo_surf
  C    | |
  C    | |-INI_CORI          :: Set coriolis term. zero, f-plane, beta-plane,
- C    | |                   :: sphere optins are coded.
+ C    | |                   :: sphere options are coded.
  C    | |
  C    | |-PACKAGES_BOOT      :: Start up the optional package environment.
  C    | |                    :: Runtime selection of active packages.
-Line 1511 
 C    | |
+Line 1514 
 C    | |
  C    | |-PACKAGES_CHECK
  C    | | |
  C    | | |-KPP_CHECK           :: KPP Package. pkg/kpp
- C    | | |-OBCS_CHECK          :: Open bndy Pacakge. pkg/obcs
+ C    | | |-OBCS_CHECK          :: Open bndy Package. pkg/obcs
  C    | | |-GMREDI_CHECK        :: GM Package. pkg/gmredi
  C    | |
  C    | |-PACKAGES_INIT_FIXED
-Line 1531 
 C    |
+Line 1534 
 C    |
  C    |-CTRL_UNPACK :: Control vector support package. see pkg/ctrl
  C    |
  C    |-ADTHE_MAIN_LOOP :: Derivative evaluating form of main time stepping loop
- C    !                 :: Auotmatically gerenrated by TAMC/TAF.
+ C    |                 :: Automatically generated by TAMC/TAF.
  C    |
  C    |-CTRL_PACK   :: Control vector support package. see pkg/ctrl
  C    |
-Line 1545 
 C    | | |
+Line 1548 
 C    | | |
  C    | | |-INI_LINEAR_PHISURF :: Set ref. surface Bo_surf
  C    | | |
  C    | | |-INI_CORI     :: Set coriolis term. zero, f-plane, beta-plane,
- C    | | |              :: sphere optins are coded.
+ C    | | |              :: sphere options are coded.
  C    | | |
  C    | | |-INI_CG2D     :: 2d con. grad solver initialisation.
  C    | | |-INI_CG3D     :: 3d con. grad solver initialisation.
-Line 1553 
 C    | | |-INI_MIXING   :: Initialise di
+Line 1556 
 C    | | |-INI_MIXING   :: Initialise di
  C    | | |-INI_DYNVARS  :: Initialise to zero all DYNVARS.h arrays (dynamical
  C    | | |              :: fields).
  C    | | |
- C    | | |-INI_FIELDS   :: Control initialising model fields to non-zero
+ C    | | |-INI_FIELDS   :: Control initializing model fields to non-zero
  C    | | | |-INI_VEL    :: Initialize 3D flow field.
  C    | | | |-INI_THETA  :: Set model initial temperature field.
  C    | | | |-INI_SALT   :: Set model initial salinity field.
-Line 1631 
 C/\  | | |-CALC_EXACT_ETA :: Change SSH
+Line 1634 
 C/\  | | |-CALC_EXACT_ETA :: Change SSH
  C/\  | | |-CALC_SURF_DR   :: Calculate the new surface level thickness.
  C/\  | | |-EXF_GETFORCING :: External forcing package. ( pkg/exf )
  C/\  | | |-EXTERNAL_FIELDS_LOAD :: Control loading time dep. external data.
- C/\  | | | |                    :: Simple interpolcation between end-points
+ C/\  | | | |                    :: Simple interpolation between end-points
  C/\  | | | |                    :: for forcing datasets.
  C/\  | | | |
  C/\  | | | |-EXCH :: Sync forcing in overlap regions.
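
The multi-process hunk above states that the value passed to mpirun with -np must equal the product of nPx and nPy from SIZE.h. The following is a minimal sketch of that consistency rule, written in Python purely for illustration. The function names are hypothetical and are not part of MITgcm or the WRAPPER, and the 8 x 8 processor grid is only one assumed way of arriving at 64 processes.

```python
# Illustrative sketch only: these helper names are hypothetical and do not
# exist in MITgcm or the WRAPPER.

def required_process_count(nPx: int, nPy: int) -> int:
    """Number of processes implied by the processor grid in SIZE.h."""
    if nPx < 1 or nPy < 1:
        raise ValueError("nPx and nPy must both be at least 1")
    return nPx * nPy


def check_launch(np_flag: int, nPx: int, nPy: int) -> bool:
    """True when the value given to `mpirun -np` equals nPx * nPy."""
    return np_flag == required_process_count(nPx, nPy)


if __name__ == "__main__":
    # Assumed 8 x 8 grid: consistent with `mpirun -np 64`.
    print(check_launch(64, nPx=8, nPy=8))
    # Assumed 4 x 8 grid: inconsistent with `mpirun -np 64`.
    print(check_launch(64, nPx=4, nPy=8))
```

A check of this kind could be run in a job script before launching, so that a mismatch between the compiled decomposition and the requested process count is caught before the model starts.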
