
Contents of /MITgcm/doc/notes


Revision 1.2
Fri Apr 24 02:36:52 1998 UTC (26 years ago) by cnh
Branch: MAIN
Changes since 1.1: +2 -0 lines
Added versioning header to MITgcmUV "notes" file

$Header$

Miscellaneous notes relating to MITgcm UV
=========================================

o Something really weird is happening - variables keep
changing value!

Apart from the usual problems of out-of-bounds array references
and various bugs, it is important to be sure that "stack"
variables really are stack variables in multi-threaded execution.
Some compilers put a subroutine's local variables in static storage.
This can result in an apparently private variable in a local
routine being mysteriously changed by a concurrently executing
thread.
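
The effect can be reproduced without any threads. The sketch below is
free-form Fortran and is not part of MITgcm; the routine name "bump" and the
variable "counter" are hypothetical. It relies on a rule of Fortran 90 and
later: a local variable declared with an initialiser implicitly has the SAVE
attribute, so it lives in static storage even though it looks like an
ordinary local. A compiler option that places all locals in static storage
has the same effect on every local variable.

```fortran
program static_local_sketch
  implicit none
  integer :: first, second

  first  = bump()
  second = bump()
  ! A true stack variable would start from 0 on each call and give 1, 1.
  ! Because "counter" is static, the second call sees the first call's update.
  print *, 'first call  returned', first
  print *, 'second call returned', second

contains

  integer function bump()
    ! Hypothetical routine. The initialiser gives "counter" the SAVE
    ! attribute implicitly, so it is a single static location shared by
    ! every caller (and, in threaded code, by every thread).
    integer :: counter = 0
    counter = counter + 1
    bump = counter
  end function bump

end program static_local_sketch
```

The second call returns 2 rather than 1. In a multi-threaded run the
"other caller" is simply another thread executing the same routine.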

=====================================

o Something really weird is happening - the code gets stuck in
a loop somewhere!

The routines in barrier.F should be compiled without any
optimisation. The routines check variables that are updated by other
threads. Compiler optimisations generally assume that the code being
optimised will obey the sequential semantics of regular Fortran. That
means the compiler will assume that a variable is not going to change
value unless the code it is optimising changes it. A wait loop that
polls such a variable may then never see the update made by another
thread, and spins forever.
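
A minimal sketch of the kind of wait loop at issue, written in free-form
Fortran with OpenMP standing in for the threading layer (an assumption: it
is not the code in barrier.F, and the names "flag" and "local" are
hypothetical). Instead of switching optimisation off, the sketch uses OpenMP
atomic reads and writes (OpenMP 3.1 or later), which oblige the compiler to
access the shared variable on every pass through the loop. Build with
OpenMP enabled, e.g. "gfortran -fopenmp".

```fortran
program spin_wait_sketch
  use omp_lib
  implicit none
  integer :: flag    ! shared: written by thread 0, polled by the others
  integer :: local   ! private copy of flag held by a waiting thread

  flag = 0
  !$omp parallel num_threads(2) default(none) shared(flag) private(local)
  if (omp_get_thread_num() == 0) then
     ! Thread 0 releases the waiting thread.
     !$omp atomic write
     flag = 1
  else
     ! Waiting thread: without the atomic read an optimiser may load
     ! flag once, keep it in a register and loop forever.
     local = 0
     do while (local == 0)
        !$omp atomic read
        local = flag
     end do
  end if
  !$omp end parallel
  print *, 'all threads passed the wait loop, flag =', flag
end program spin_wait_sketch
```

If the runtime supplies only one thread, that thread takes the first branch
and the program still terminates.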

=====================================

o Is the Fortran SAVE statement a problem?

Yes. On the whole the Fortran SAVE statement should not be used
for data in a multi-threaded code. SAVE causes data to be held in
static storage, meaning that all threads will see the same location.
Therefore, generally, if one thread updates the location all other threads
will see it. Note - there is often no specification for what should happen
in this situation in a multi-threaded environment, so this is
not a robust mechanism for sharing data.

For most cases where SAVE might be appropriate either of the following
recipes should be used instead. Both these schemes are potential
performance bottlenecks if they are over-used.

Method 1
********
1. Put the SAVE variable in a common block.
2. Update the SAVE variable in a _BEGIN_MASTER, _END_MASTER block.
3. Include a _BARRIER after the _BEGIN_MASTER, _END_MASTER block.
e.g.
C     nIter - Current iteration counter
      COMMON /PARAMS/ nIter
      INTEGER nIter

      _BEGIN_MASTER(myThid)
      nIter = nIter+1
      _END_MASTER(myThid)
      _BARRIER

Note. The _BARRIER operation is potentially expensive. Be conservative
in your use of this scheme.
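
A self-contained sketch of Method 1 in free-form Fortran. The definitions
of the _BEGIN_MASTER, _END_MASTER and _BARRIER macros are not shown in
these notes, so the sketch makes an assumption: an IF test on the thread
number and an OpenMP barrier stand in for them. The thread count and the
"seen" array are hypothetical and exist only to show the result. Build with
OpenMP enabled, e.g. "gfortran -fopenmp".

```fortran
program master_update_sketch
  use omp_lib
  implicit none
  integer, parameter :: nThreads = 4   ! assumed thread count
  ! nIter - Current iteration counter, held in a common block
  integer :: nIter
  common /params/ nIter
  integer :: seen(nThreads)            ! value of nIter read by each thread
  integer :: myThid

  nIter = 0
  seen  = -1   ! -1 marks threads the runtime did not provide
  !$omp parallel num_threads(nThreads) private(myThid)
  myThid = omp_get_thread_num() + 1
  ! Stand-in for _BEGIN_MASTER / _END_MASTER: only thread 1 updates.
  if (myThid == 1) then
     nIter = nIter + 1
  end if
  ! Stand-in for _BARRIER: nobody reads nIter until the update is done.
  !$omp barrier
  seen(myThid) = nIter
  !$omp end parallel
  print *, 'nIter seen by each thread:', seen
end program master_update_sketch
```

Every thread that took part reports the updated value. Without the barrier
a thread could read nIter before the master thread had incremented it.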

Method 2
********
1. Put the SAVE variable in a common block but with an extra dimension
for the thread number.
2. Change the updates and references to the SAVE variable to a per-thread
basis.
e.g.
C     nIter - Current iteration counter
      COMMON /PARAMS/ nIter
      INTEGER nIter(MAX_NO_THREADS)

      nIter(myThid) = nIter(myThid)+1

Note. nIter(myThid) and nIter(myThid+1) will generally share the same
cache line. The update will cause extra low-level memory
traffic to maintain cache coherence. If the update is in
a tight loop this will be a problem and nIter will need
padding.
In a NUMA system nIter(1:MAX_NO_THREADS) is likely to reside
in a single page of physical memory on a single box. Again, in
a tight loop this would cause lots of remote/far memory references
and would be a problem. Some compilers provide a mechanism
for helping overcome this problem.
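
A sketch of Method 2 with the padding mentioned in the note, in free-form
Fortran with OpenMP standing in for the threading layer. The 64-byte cache
line and 4-byte INTEGER are assumptions, as are the names "lineBytes",
"intBytes" and "nPad". Each thread's counter is given a whole line-sized
column of the array, so counters of different threads are at least one
cache line apart. The sketch does nothing about the NUMA page placement
problem. Build with OpenMP enabled, e.g. "gfortran -fopenmp".

```fortran
program padded_counter_sketch
  use omp_lib
  implicit none
  integer, parameter :: MAX_NO_THREADS = 4
  integer, parameter :: lineBytes = 64   ! assumed cache line size
  integer, parameter :: intBytes  = 4    ! assumed default INTEGER size
  integer, parameter :: nPad = lineBytes/intBytes
  ! nIter(1,t) is the counter of thread t; nIter(2:nPad,t) is padding.
  integer :: nIter(nPad, MAX_NO_THREADS)
  common /params/ nIter
  integer :: myThid, i

  nIter = 0
  !$omp parallel num_threads(MAX_NO_THREADS) private(myThid, i)
  myThid = omp_get_thread_num() + 1
  do i = 1, 1000
     ! Per-thread update: no other thread touches this element or its line.
     nIter(1,myThid) = nIter(1,myThid) + 1
  end do
  !$omp end parallel
  print *, 'per-thread counters:', nIter(1,:)
end program padded_counter_sketch
```

The cost is memory: with these assumed sizes 15 of every 16 integers in
the array are unused padding.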

=====================================

o Can I debug using write statements?

Many systems do not have "thread-safe" Fortran I/O libraries.
On these systems I/O generally works but the output gets a bit intermingled!
Occasionally doing multi-threaded I/O with an unsafe Fortran I/O library
will actually cause the program to fail. Note: SGI has a "thread-safe" Fortran
I/O library.
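
One common way to keep debug output readable is to let only one thread at
a time execute the write. The sketch below does this with a named OpenMP
critical section in free-form Fortran. It is an illustration under the
assumption that OpenMP provides the threads; the section name "debug_io"
is hypothetical. Serialising the writes addresses intermingled output; it
is not a claim that every unsafe I/O library becomes safe this way. Build
with OpenMP enabled, e.g. "gfortran -fopenmp".

```fortran
program serial_write_sketch
  use omp_lib
  implicit none
  integer :: myThid

  !$omp parallel num_threads(4) private(myThid)
  myThid = omp_get_thread_num() + 1
  ! Only one thread at a time may be inside the critical section, so
  ! each line is written completely before the next one starts.
  !$omp critical (debug_io)
  write(*,'(A,I3)') 'debug: hello from thread', myThid
  !$omp end critical (debug_io)
  !$omp end parallel
end program serial_write_sketch
```

The order in which the threads print is not fixed and may change from run
to run. Like a barrier, the critical section is a serialisation point, so
it is best kept to debugging.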

=====================================

o Mapping virtual memory to physical memory.

The current code declares arrays as
      real aW2d (1-OLx:sNx+OLx,1-OLy:sNy+OLy,nSx,nSy)
This raises an issue on shared virtual-memory machines that have
an underlying non-uniform memory subsystem e.g. HP Exemplar, SGI
Origin, DG, Sequent etc. What most machines implement is a scheme
in which the physical memory that backs the virtual memory is allocated
on a page basis at run-time. The OS manages this allocation and, without
exception, pages are assigned to physical memory on the box where the thread
which caused the page-fault is running. Pages are typically 4-8KB in
size. This means that in some environments it would make sense to
declare arrays
      real aW2d (1-OLx:sNx+OLx+PX,1-OLy:sNy+OLy+PY,nSx,nSy)
where PX and PY are chosen so that the divides between near and
far memory will coincide with the boundaries of the virtual memory
regions a thread works on. In principle this is easy but it is
also inelegant and really one would like the OS/hardware to take
care of this issue. Doing it oneself requires PX and PY to be recalculated
whenever the mapping of the nSx, nSy blocks to nTx and nTy threads is
changed. Also different PX and PY are required depending on
  page size
  array element size ( real*4, real*8 )
  array dimensions ( 2d, 3d Nz, 3d Nz+1 ) - in 3d a PZ would also be needed!
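
As an illustration of how PX and PY depend on these quantities, the sketch
below (free-form Fortran) takes one simple reading of the rule: pad each
2d block so that it occupies a whole number of pages, assuming the array
itself starts on a page boundary. The block size, overlap width, page size
and element size are all assumed values, and the brute-force search is a
hypothetical helper, not part of MITgcm.

```fortran
program tile_padding_sketch
  implicit none
  ! All values below are assumptions chosen for illustration.
  integer, parameter :: sNx = 32, sNy = 32   ! block interior size
  integer, parameter :: OLx = 2,  OLy = 2    ! overlap widths
  integer, parameter :: pageBytes = 4096     ! assumed page size
  integer, parameter :: elemBytes = 8        ! real*8 elements
  integer, parameter :: elemsPerPage = pageBytes/elemBytes
  integer :: nx, ny, px, py, bestPx, bestPy, waste, bestWaste

  nx = sNx + 2*OLx   ! unpadded extent of 1-OLx:sNx+OLx
  ny = sNy + 2*OLy   ! unpadded extent of 1-OLy:sNy+OLy
  bestWaste = huge(bestWaste)
  bestPx = -1
  bestPy = -1
  ! Exhaustive search. A solution always exists in this range, e.g. the
  ! px that rounds nx up to a multiple of elemsPerPage.
  do py = 0, elemsPerPage
     do px = 0, elemsPerPage
        if (mod((nx+px)*(ny+py), elemsPerPage) == 0) then
           waste = (nx+px)*(ny+py) - nx*ny
           if (waste < bestWaste) then
              bestWaste = waste
              bestPx = px
              bestPy = py
           end if
        end if
     end do
  end do
  print *, 'PX =', bestPx, ' PY =', bestPy
  print *, 'padded block elements  =', (nx+bestPx)*(ny+bestPy)
  print *, 'wasted elements/block  =', bestWaste
end program tile_padding_sketch
```

Changing pageBytes, elemBytes or the block dimensions changes the answer,
which is why the padding has to be recalculated for each configuration.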
Note: 1. A C implementation would be a lot easier. An F90 implementation
including allocation would also be fairly straightforward.
2. The padding really ought to be between the "collections" of blocks
that all the threads using the same near memory work on, since that
would waste the least memory. The PX, PY, PZ mechanism instead pads
three levels down in the hierarchy. This wastes more memory.
3. For large problems this is less of an issue. For a large problem,
even for a 2d array, there might be say 16 pages per array per processor
and at least 4 processors in a uniform memory access box. Assuming a
sensible mapping of processors to blocks, only one page of the 64 in a
box (about 1.5% of the memory) is referenced by processors in another box.
On the other hand for a very small per processor problem size e.g.
32x32 per processor and again four processors per box as many as
50% of the memory references could be to far memory for 2d fields.
This could be very bad!

=====================================
