--- manual/s_software/text/sarch.tex 2001/10/25 18:36:55 1.4 +++ manual/s_software/text/sarch.tex 2001/11/13 18:32:33 1.5 @@ -1,4 +1,4 @@ -% $Header: /home/ubuntu/mnt/e9_copy/manual/s_software/text/sarch.tex,v 1.4 2001/10/25 18:36:55 cnh Exp $ +% $Header: /home/ubuntu/mnt/e9_copy/manual/s_software/text/sarch.tex,v 1.5 2001/11/13 18:32:33 cnh Exp $ In this chapter we describe the software architecture and implementation strategy for the MITgcm code. The first part of this @@ -136,14 +136,14 @@ class of machines (for example Parallel Vector Processor Systems). Instead the WRAPPER provides applications with an abstract {\it machine model}. The machine model is very general, however, it can -easily be specialized to fit, in a computationally effificent manner, any +easily be specialized to fit, in a computationally efficient manner, any computer architecture currently available to the scientific computing community. \subsection{Machine model parallelism} Codes operating under the WRAPPER target an abstract machine that is assumed to consist of one or more logical processors that can compute concurrently. -Computational work is divided amongst the logical +Computational work is divided among the logical processors by allocating ``ownership'' to each processor of a certain set (or sets) of calculations. Each set of calculations owned by a particular processor is associated with a specific @@ -402,8 +402,8 @@ \includegraphics{part4/comm-primm.eps} } \end{center} -\caption{Three performance critical parallel primititives are provided -by the WRAPPER. These primititives are always used to communicate data +\caption{Three performance critical parallel primitives are provided +by the WRAPPER. These primitives are always used to communicate data between tiles. The figure shows four tiles. The curved arrows indicate exchange primitives which transfer data between the overlap regions at tile edges and interior regions for nearest-neighbor tiles. 
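The exchange primitive described in the caption can be illustrated with a small sketch. The following C fragment is a hypothetical, shared-memory illustration of a one-dimensional exchange between two neighboring tiles; the names (`Tile`, `exch_1d`) and the sizes `SNX` and `OLX` are invented for this example and are not MITgcm source.

```c
#include <string.h>

#define SNX 4   /* interior points per tile (hypothetical size) */
#define OLX 1   /* overlap region width (hypothetical size) */

/* A tile stores OLX halo cells on each side of its SNX interior cells. */
typedef struct { double u[SNX + 2*OLX]; } Tile;

/* Shared-memory sketch of an exchange between two neighboring tiles:
   each tile's edge interior cells are copied into the other tile's
   overlap (halo) region, as the curved arrows in the figure indicate. */
static void exch_1d(Tile *west, Tile *east)
{
    /* east-side halo of `west` receives the west edge of `east` */
    memcpy(&west->u[OLX + SNX], &east->u[OLX], OLX * sizeof(double));
    /* west-side halo of `east` receives the east edge of `west` */
    memcpy(&east->u[0], &west->u[SNX], OLX * sizeof(double));
}
```

On a distributed-memory machine the same copies would be realized with messages rather than `memcpy`, but the tile-plus-overlap data layout is unchanged.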
@@ -1006,7 +1006,7 @@ \begin{verbatim} mpirun -np 64 -machinefile mf ./mitgcmuv \end{verbatim} -In this example the text {\em -np 64} specifices the number of processes +In this example the text {\em -np 64} specifies the number of processes that will be created. The numeric value {\em 64} must be equal to the product of the processor grid settings of {\em nPx} and {\em nPy} in the file {\em SIZE.h}. The parameter {\em mf} specifies that a text file @@ -1212,7 +1212,7 @@ \item {\bf Cache line size} As discussed in section \ref{sec:cache_effects_and_false_sharing}, multi-threaded codes explicitly avoid penalties associated with excessive -coherence traffic on an SMP system. To do this the sgared memory data structures +coherence traffic on an SMP system. To do this the shared memory data structures used by the {\em GLOBAL\_SUM}, {\em GLOBAL\_MAX} and {\em BARRIER} routines are padded. The variables that control the padding are set in the header file {\em EEPARAMS.h}. These variables are called @@ -1220,7 +1220,7 @@ {\em lShare8}. The default values should not normally need changing. \item {\bf \_BARRIER} This is a CPP macro that is expanded to a call to a routine -which synchronises all the logical processors running under the +which synchronizes all the logical processors running under the WRAPPER. Using a macro here preserves flexibility to insert a specialized call in-line into application code. By default this resolves to calling the procedure {\em BARRIER()}. The default @@ -1228,13 +1228,13 @@ \item {\bf \_GSUM} This is a CPP macro that is expanded to a call to a routine -which sums up a floating point numner +which sums up a floating point number over all the logical processors running under the WRAPPER. Using a macro here provides extra flexibility to insert a specialized call in-line into application code. 
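The operation that the \_GSUM macro resolves to can be sketched in serial form. The C fragment below is a hypothetical illustration, not the actual GLOBAL\_SUM\_R8 implementation (which must also synchronize the logical processors): each logical processor contributes a partial result and all of them receive the same global total.

```c
/* Serial sketch of the operation _GSUM resolves to: every logical
   processor contributes a partial result and all of them receive the
   same global total.  (The real GLOBAL_SUM_R8 must also synchronize
   the logical processors; that step is omitted here.) */
static double global_sum_r8(const double *partial, int nProcs)
{
    double total = 0.0;
    for (int p = 0; p < nProcs; p++)
        total += partial[p];
    return total;
}
```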
By default this -resolves to calling the procedure {\em GLOBAL\_SOM\_R8()} ( for -84=bit floating point operands) -or {\em GLOBAL\_SOM\_R4()} (for 32-bit floating point operands). The default +resolves to calling the procedure {\em GLOBAL\_SUM\_R8()} (for +64-bit floating point operands) +or {\em GLOBAL\_SUM\_R4()} (for 32-bit floating point operands). The default setting for the \_GSUM macro is given in the file {\em CPP\_EEMACROS.h}. The \_GSUM macro is a performance critical operation, especially for large processor count, small tile size configurations. @@ -1253,23 +1253,23 @@ \_EXCH operation plays a crucial role in scaling to small tile, large logical and physical processor count configurations. The example in section \ref{sec:jam_example} discusses defining an -optimised and specialized form on the \_EXCH operation. +optimized and specialized form of the \_EXCH operation. The \_EXCH operation is also central to supporting grids such as the cube-sphere grid. In this class of grid a rotation may be required between tiles. Aligning the coordinate requiring rotation with the -tile decomposistion, allows the coordinate transformation to +tile decomposition allows the coordinate transformation to be embedded within a custom form of the \_EXCH primitive. \item {\bf Reverse Mode} The communication primitives \_EXCH and \_GSUM both employ hand-written adjoint (or reverse mode) forms. These reverse mode forms can be found in the -sourc code directory {\em pkg/autodiff}. +source code directory {\em pkg/autodiff}. For the global sum primitive the reverse mode form calls are to {\em GLOBAL\_ADSUM\_R4} and {\em GLOBAL\_ADSUM\_R8}. The reverse mode forms of the -exchamge primitives are found in routines +exchange primitives are found in routines prefixed {\em ADEXCH}. The exchange routines make calls to the same low-level communication primitives as the forward mode operations. 
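The reverse mode form of the global sum has a simple structure that can be sketched directly. Because the forward operation is linear with unit partial derivatives, the adjoint of the total is simply accumulated onto the adjoint of every contribution. The C fragment below is a hypothetical simplification of what GLOBAL\_ADSUM\_R8 computes; it is not the routine from {\em pkg/autodiff} itself.

```c
/* Sketch of the reverse mode (adjoint) of the global sum.  The forward
   operation total = sum_p partial[p] is linear and has unit partial
   derivatives d(total)/d(partial[p]) = 1, so the adjoint of the total
   is accumulated onto the adjoint of every contribution.  This is a
   hypothetical simplification of GLOBAL_ADSUM_R8, not the routine
   from pkg/autodiff itself. */
static void global_adsum_r8(double *adPartial, int nProcs, double adTotal)
{
    for (int p = 0; p < nProcs; p++)
        adPartial[p] += adTotal;
}
```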
However, the routine argument {\em simulationMode} @@ -1281,7 +1281,7 @@ maximum number of OS threads that a code will use. This value defaults to thirty-two and is set in the file {\em EEPARAMS.h}. For single threaded execution it can be reduced to one if required. -The va;lue is largely private to the WRAPPER and application code +The value is largely private to the WRAPPER and application code will not normally reference the value, except in the following scenario. For certain physical parametrization schemes it is necessary to have @@ -1296,8 +1296,8 @@ being specified involves many more tiles than OS threads then it can save memory resources to reduce the variable {\em MAX\_NO\_THREADS} to be equal to the actual number of threads that -will be used and to declare the physical parameterisation -work arrays with a sinble {\em MAX\_NO\_THREADS} extra dimension. +will be used and to declare the physical parameterization +work arrays with a single {\em MAX\_NO\_THREADS} extra dimension. An example of this is given in the verification experiment {\em aim.5l\_cs}. Here the default setting of {\em MAX\_NO\_THREADS} is altered to @@ -1310,12 +1310,12 @@ \begin{verbatim} common /FORCIN/ sst1(ngp,MAX_NO_THREADS) \end{verbatim} -This declaration scheme is not used widely, becuase most global data +This declaration scheme is not used widely, because most global data is used for permanent, not temporary, storage of state information. In the case of permanent state information this approach cannot be used because there has to be enough storage allocated for all tiles. However, the technique can sometimes be a useful scheme for reducing memory -requirements in complex physical paramterisations. +requirements in complex physical parameterizations. \end{enumerate} \begin{figure} @@ -1348,7 +1348,7 @@ The isolation of performance critical communication primitives and the sub-division of the simulation domain into tiles is a powerful tool. 
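The thread-dimensioned work-array technique described above can be sketched in C. The fragment below is a hypothetical illustration mirroring the Fortran {\em common /FORCIN/} declaration; the size {\em NGP}, the reduced {\em MAX\_NO\_THREADS} value, and the helper {\em fill\_work} are invented for this example.

```c
#define NGP 96             /* grid points handled per thread (example value) */
#define MAX_NO_THREADS 6   /* reduced from the default of thirty-two */

/* Temporary work array dimensioned by thread rather than by tile,
   mirroring the Fortran declaration
     common /FORCIN/ sst1(ngp,MAX_NO_THREADS).
   Each OS thread owns one private column, which saves memory when the
   configuration has many more tiles than threads. */
static double sst1[MAX_NO_THREADS][NGP];

/* Each thread writes only its own column; myThid is 1-based,
   following the WRAPPER convention for thread identifiers. */
static void fill_work(int myThid, double value)
{
    for (int i = 0; i < NGP; i++)
        sst1[myThid - 1][i] = value;
}
```

Because the columns are private to their threads, no false sharing occurs as long as the padding described under {\bf Cache line size} keeps each column on its own cache lines.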
Here we show how it can be used to improve application performance and how it can be used to adapt to new gridding approaches. \subsubsection{JAM example} \label{sec:jam_example}