# $Id: README,v 1.2 2006/05/12 22:32:02 ce107 Exp $
Benchmarking routine of the CG2D solver in MITgcm (barotropic solve)

To build:

a) Parameterizations (a quick check of these settings is sketched below):
i) of SIZE.h:
   sNx = size of tile in x-direction (ideally fits in cache, 30-60)
   sNy = size of tile in y-direction (ideally fits in cache, 30-60)
   OLx = overlap size in x-direction (1 or 3 usually)
   OLy = overlap size in y-direction (1 or 3 usually)
ii) of ini_parms.F:
   nTimeSteps   = number of pseudo-timesteps to run for
   cg2dMaxIters = maximum number of CG iterations per timestep

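One quick way to double-check the current settings before building (a sketch
only: it assumes the parameter names above appear literally in SIZE.h and
ini_parms.F, and the values in the comments are purely illustrative):

grep -nE 'sNx|sNy|OLx|OLy' SIZE.h                # e.g. sNx=32, sNy=32, OLx=1, OLy=1
grep -nE 'nTimeSteps|cg2dMaxIters' ini_parms.F   # e.g. nTimeSteps=10, cg2dMaxIters=1000
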
b) Compilation
$CC $CFLAGS -c tim.c
$FC $DEFINES $INCLUDES $FCFLAGS -o cg2d *.F tim.o $LIBS -lm

$DEFINES:
1) For single precision add
   -DUSE_SINGLE_PRECISION
2) For mixed precision (single for most ops, double for reductions) add
   -DUSE_MIXED_PRECISION in addition to -DUSE_SINGLE_PRECISION
3) Parallel (MPI) operation
   -DALLOW_MPI -DUSE_MPI_INIT -DUSE_MPI_GSUM -DUSE_MPI_EXCH
4) Use MPI timing routines
   -DUSE_MPI_TIME
5) Use of MPI_Sendrecv() instead of MPI_Isend()/MPI_Irecv()/MPI_Waitall()
   -DUSE_SNDRCV
6) Use of JAM for exchanges (not available without the hardware)
   -DUSE_JAM_EXCH
7) Use of JAM for the global sum (not available without the hardware)
   -DUSE_JAM_GSUM
8) To avoid doing the global sum in MPI, do not define
   -DUSE_MPI_GSUM
   and each processor will see only its own residual instead (dangerous)
9) To avoid doing the exchanges in MPI, do not define
   -DUSE_MPI_EXCH
   and all processors will skip exchanging their shadow regions (dangerous)
10) Performance counters
    -DUSE_PAPI_FLOPS to use PAPI to produce Mflop/s
    or
    -DUSE_PAPI_FLIPS to use PAPI to produce Mflip/s
    To produce this information for every iteration instead of each "timestep",
    add -DPAPI_PER_ITERATION to the above.
11) Extra (nearest neighbor) exchange steps to stress comms
    -DTEN_EXTRA_EXCHS
12) Extra (global) sum steps to stress comms
    -DHUNDRED_EXTRA_SUMS
13) 2D (PxQ) vs 1D decomposition
    -DDECOMP2D
14) To output the residual every iteration:
    -DRESIDUAL_PER_ITERATION

$INCLUDES (if using PAPI)
   -I$PAPI_ROOT/include

$LIBS (if using PAPI - depending on the platform extra libs may be needed)
   -L$PAPI_ROOT/lib -lpapi

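Putting the pieces together, a sketch of one possible MPI build with PAPI flop
counting (the mpicc/mpif77 wrappers, the -O2 flags and the PAPI_ROOT location
are assumptions about the local platform, not requirements of the benchmark):

CC=mpicc                    # illustrative compiler choices
FC=mpif77
DEFINES="-DALLOW_MPI -DUSE_MPI_INIT -DUSE_MPI_GSUM -DUSE_MPI_EXCH -DUSE_MPI_TIME -DDECOMP2D -DUSE_PAPI_FLOPS"
INCLUDES="-I$PAPI_ROOT/include"
LIBS="-L$PAPI_ROOT/lib -lpapi"
$CC -O2 -c tim.c            # -O2 stands in for $CFLAGS/$FCFLAGS
$FC $DEFINES $INCLUDES -O2 -o cg2d *.F tim.o $LIBS -lm
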
c) Running

1) Allowing the system to choose the PxQ decomposition, if set up for it:

mpiexec -n $NPROCS ./cg2d

2) To specify the decomposition yourself, create a decomp.touse file with the
   P and Q dimensions declared as two integers on the first two lines, e.g.:

cat > decomp.touse << EOF
10
20
EOF

mpiexec -n 200 ./cg2d

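As a further illustration (the 4 x 8 grid below is arbitrary), the two
integers are the P and Q extents of the processor grid; as in the
10 x 20 / 200-rank example above, their product matches the number of
MPI ranks requested:

printf '4\n8\n' > decomp.touse   # hypothetical 4 x 8 decomposition
mpiexec -n 32 ./cg2d             # 4 x 8 = 32 ranks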
