Diff of /manual/s_software/text/sarch.tex


-revision 1.4 by cnh,
Thu Oct 25 18:36:55 2001 UTC
+revision 1.5 by cnh,
Tue Nov 13 18:32:33 2001 UTC
 Line 136 
 particular machine (for example an IBM S
  class of machines (for example Parallel Vector Processor Systems). Instead the
  WRAPPER provides applications with an
  abstract {\it machine model}. The machine model is very general, however, it can
- easily be specialized to fit, in a computationally effificent manner, any
+ easily be specialized to fit, in a computationally efficient manner, any
  computer architecture currently available to the scientific computing community.
  \subsection{Machine model parallelism}
   Codes operating under the WRAPPER target an abstract machine that is assumed to
  consist of one or more logical processors that can compute concurrently.
- Computational work is divided amongst the logical
+ Computational work is divided among the logical
  processors by allocating ``ownership'' to
  each processor of a certain set (or sets) of calculations. Each set of
  calculations owned by a particular processor is associated with a specific
 Line 402 
 highly optimized library.
    \includegraphics{part4/comm-primm.eps}
   }
  \end{center}
- \caption{Three performance critical parallel primititives are provided
+ \caption{Three performance critical parallel primitives are provided
- by the WRAPPER. These primititives are always used to communicate data
+ by the WRAPPER. These primitives are always used to communicate data
  between tiles. The figure shows four tiles. The curved arrows indicate
  exchange primitives which transfer data between the overlap regions at tile
  edges and interior regions for nearest-neighbor tiles.
 Line 1006 
 using a command such as
  \begin{verbatim}
  mpirun -np 64 -machinefile mf ./mitgcmuv
  \end{verbatim}
- In this example the text {\em -np 64} specifices the number of processes
+ In this example the text {\em -np 64} specifies the number of processes
  that will be created. The numeric value {\em 64} must be equal to the
  product of the processor grid settings of {\em nPx} and {\em nPy}
  in the file {\em SIZE.h}. The parameter {\em mf} specifies that a text file
 Line 1212 
 asm("lock; addl $0,0(%%esp)": : :"memory
  \item {\bf Cache line size}
  As discussed in section \ref{sec:cache_effects_and_false_sharing},
  multi-threaded codes explicitly avoid penalties associated with excessive
- coherence traffic on an SMP system. To do this the sgared memory data structures
+ coherence traffic on an SMP system. To do this the shared memory data structures
  used by the {\em GLOBAL\_SUM}, {\em GLOBAL\_MAX} and {\em BARRIER} routines
  are padded. The variables that control the padding are set in the
  header file {\em EEPARAMS.h}. These variables are called
 Line 1220 
 header file {\em EEPARAMS.h}. These vari
  {\em lShare8}. The default values should not normally need changing.
  \item {\bf \_BARRIER}
  This is a CPP macro that is expanded to a call to a routine
- which synchronises all the logical processors running under the
+ which synchronizes all the logical processors running under the
  WRAPPER. Using a macro here preserves flexibility to insert
  a specialized call in-line into application code. By default this
  resolves to calling the procedure {\em BARRIER()}. The default
 Line 1228 
 setting for the \_BARRIER macro is given
  \item {\bf \_GSUM}
  This is a CPP macro that is expanded to a call to a routine
- which sums up a floating point numner
+ which sums up a floating point number
  over all the logical processors running under the
  WRAPPER. Using a macro here provides extra flexibility to insert
  a specialized call in-line into application code. By default this
- resolves to calling the procedure {\em GLOBAL\_SOM\_R8()} ( for
+ resolves to calling the procedure {\em GLOBAL\_SUM\_R8()} ( for
- 64=bit floating point operands)
+ 64-bit floating point operands)
- or {\em GLOBAL\_SOM\_R4()} (for 32-bit floating point operands). The default
+ or {\em GLOBAL\_SUM\_R4()} (for 32-bit floating point operands). The default
  setting for the \_GSUM macro is given in the file {\em CPP\_EEMACROS.h}.
  The \_GSUM macro is a performance critical operation, especially for
  large processor count, small tile size configurations.
 Line 1253 
 in the header file {\em CPP\_EEMACROS.h}
  \_EXCH operation plays a crucial role in scaling to small tile,
  large logical and physical processor count configurations.
  The example in section \ref{sec:jam_example} discusses defining an
- optimised and specialized form on the \_EXCH operation.
+ optimized and specialized form on the \_EXCH operation.
  The \_EXCH operation is also central to supporting grids such as
  the cube-sphere grid. In this class of grid a rotation may be required
  between tiles. Aligning the coordinate requiring rotation with the
- tile decomposistion, allows the coordinate transformation to
+ tile decomposition, allows the coordinate transformation to
  be embedded within a custom form of the \_EXCH primitive.
  \item {\bf Reverse Mode}
  The communication primitives \_EXCH and \_GSUM both employ
  hand-written adjoint forms (or reverse mode) forms.
  These reverse mode forms can be found in the
- sourc code directory {\em pkg/autodiff}.
+ source code directory {\em pkg/autodiff}.
  For the global sum primitive the reverse mode form
  calls are to {\em GLOBAL\_ADSUM\_R4} and
  {\em GLOBAL\_ADSUM\_R8}. The reverse mode form of the
- exchamge primitives are found in routines
+ exchange primitives are found in routines
  prefixed {\em ADEXCH}. The exchange routines make calls to
  the same low-level communication primitives as the forward mode
  operations. However, the routine argument {\em simulationMode}
 Line 1281 
 The variable {\em MAX\_NO\_THREADS} is u
  maximum number of OS threads that a code will use. This
  value defaults to thirty-two and is set in the file {\em EEPARAMS.h}.
  For single threaded execution it can be reduced to one if required.
- The va;lue is largely private to the WRAPPER and application code
+ The value is largely private to the WRAPPER and application code
  will not normally reference the value, except in the following scenario.
  For certain physical parametrization schemes it is necessary to have
 Line 1296 
 and {\em nSy} ( as described in section
  being specified involves many more tiles than OS threads then
  it can save memory resources to reduce the variable
  {\em MAX\_NO\_THREADS} to be equal to the actual number of threads that
- will be used and to declare the physical parameterisation
+ will be used and to declare the physical parameterization
- work arrays with a sinble {\em MAX\_NO\_THREADS} extra dimension.
+ work arrays with a single {\em MAX\_NO\_THREADS} extra dimension.
  An example of this is given in the verification experiment
  {\em aim.5l\_cs}. Here the default setting of
  {\em MAX\_NO\_THREADS} is altered to
 Line 1310 
 created with declarations of the form.
  \begin{verbatim}
        common /FORCIN/ sst1(ngp,MAX_NO_THREADS)
  \end{verbatim}
- This declaration scheme is not used widely, becuase most global data
+ This declaration scheme is not used widely, because most global data
  is used for permanent not temporary storage of state information.
  In the case of permanent state information this approach cannot be used
  because there has to be enough storage allocated for all tiles.
  However, the technique can sometimes be a useful scheme for reducing memory
- requirements in complex physical paramterisations.
+ requirements in complex physical parameterizations.
  \end{enumerate}
  \begin{figure}
 Line 1348 
 MP directives to spawn multiple threads.
  The isolation of performance critical communication primitives and the
  sub-division of the simulation domain into tiles is a powerful tool.
  Here we show how it can be used to improve application performance and
- how it can be used to adapt to new gridding approaches.
+ how it can be used to adapt to new griding approaches.
  \subsubsection{JAM example}
  \label{sec:jam_example}

 Legend:
-Removed from v.1.4
+Added in v.1.5
