Format for raw output files from MITgcmUV
=========================================

Introduction
------------
When running in parallel mode with multiple processes, the MITgcmUV model
operates as N separate programs, each responsible for its "local" region
of the "total" model domain. Synchronisation and sharing of data between
these processes is done explicitly by calls to data exchange and barrier
routines. Consequently, no single program has a view of the whole model
domain while the code is running. Simple I/O can only operate on the
local region of the model domain; I/O operations to and from datasets
that represent the total domain need to address the multiple-process
behavior explicitly. MITgcmUV provides a set of I/O support routines that
mask the details of this process and enable end-users to read and write
datasets in a straightforward manner. The routines use the following
design strategy:

 o Input datasets are for the total domain.
 o Output datasets are for the local domain.
 o A separate program, "joinds", is provided which joins a set of
   local domain datasets together to form a total model domain dataset.

MITgcmUV IO support routines
----------------------------

 o SUBROUTINE READ_FLD_XY_RS( pref, suff, fld, time, thid )
   _RS fld(1-OLx:sNx+OLx,1-OLy:sNy+OLy,nSx,nSy)

 o SUBROUTINE READ_FLD_XY_RL( pref, suff, fld, time, thid )
   _RL fld(1-OLx:sNx+OLx,1-OLy:sNy+OLy,nSx,nSy)

 o SUBROUTINE READ_FLD_XYZ_RS( pref, suff, fld, time, thid )
   _RS fld(1-OLx:sNx+OLx,1-OLy:sNy+OLy,nZ,nSx,nSy)

 o SUBROUTINE READ_FLD_XYZ_RL( pref, suff, fld, time, thid )
   _RL fld(1-OLx:sNx+OLx,1-OLy:sNy+OLy,nZ,nSx,nSy)

 o SUBROUTINE WRITE_FLD_XY_RS( pref, suff, fld, time, thid )
   _RS fld(1-OLx:sNx+OLx,1-OLy:sNy+OLy,nSx,nSy)

 o SUBROUTINE WRITE_FLD_XY_RL( pref, suff, fld, time, thid )
   _RL fld(1-OLx:sNx+OLx,1-OLy:sNy+OLy,nSx,nSy)

 o SUBROUTINE WRITE_FLD_XYZ_RS( pref, suff, fld, time, thid )
   _RS fld(1-OLx:sNx+OLx,1-OLy:sNy+OLy,nZ,nSx,nSy)

 o SUBROUTINE WRITE_FLD_XYZ_RL( pref, suff, fld, time, thid )
   _RL fld(1-OLx:sNx+OLx,1-OLy:sNy+OLy,nZ,nSx,nSy)

 For all routines:
   CHARACTER*(*) pref
   CHARACTER*(*) suff
   INTEGER       time
   INTEGER       thid

 Macros:
   _RS -> REAL*4 or REAL*8
   _RL -> REAL*8

 pref - String used in the prefix part of the file name.
        Examples from MITgcmUV:
          'theta.' = temperature
          'uVel.'  = zonal velocity
          'vVel.'  = meridional velocity
          'salt.'  = salinity
 suff - String used in the suffix part of the file name.
        Examples from MITgcmUV:
          '0000000100' = iteration number
          'ckptA'      = checkpoint file
 fld  - Two or three dimensional REAL*4 or REAL*8 array.
        Examples from MITgcmUV:
          theta  = temperature field
          cg2d_x = surface elevation field
 time - Time level in the calling subroutine
 thid - Thread id of the calling subroutine

Dataset format
--------------
Datasets are written using the standard Fortran 77 sequential binary file
format. The Fortran IO statements in the model code do not specify any
particular format; however, compile-time and run-time flags are used on
some platforms. On DEC platforms the IO form is set to big-endian by
default with a compile-time flag. On CRAY platforms a run-time flag is
normally used to select IEEE representation.

The Fortran 77 sequential binary file format is

   4 byte header
   data
   4 byte terminator

The header and terminator are unsigned integers which give the length of
the data section in bytes. This format is standard over all UNIX
platforms. In Fortran this style of file is generated by code of the form

      REAL A(dim1, dim2, ..... )
      OPEN(unitnumber,FILE=filename,FORM='UNFORMATTED')
      WRITE(unitnumber) A
      END

The data is sequenced in the standard Fortran convention of the left-most
index varying fastest. This convention holds for datasets of any
dimension: one-dimensional, two-dimensional, three-dimensional and
four-dimensional or higher datasets are all written this way.

Multiprocess support
--------------------
The format described above is used for multi-process simulations. In this
case the data is written to separate files, with each process writing the
data that is local to it. To support this approach a file naming
convention is used and a second file of "meta" information accompanies
the data. The naming convention is used to avoid duplicate names and to
make it easy to identify sets of files that together represent the total
domain data. The meta file contains information about the extent of the
sub-domain within each file.

The naming convention used is

   PREF.SUFF.pPNUMBER.tTNUMBER.data
   PREF.SUFF.pPNUMBER.tTNUMBER.meta

where
 PREF    - Is a field identifying the data within the file. For
           temperature PREF is T, for zonal velocity PREF is U etc...
 SUFF    - Is a field identifying the "instance" of the data within the
           file. The instance is typically the time level. In general
           the instance will be a model timestep number.
 PNUMBER - Is a process number used to identify which process of a
           multi-process run generated this data. The number ranges
           from 0 to (number of processes)-1.
 TNUMBER - Is a thread number used to identify which thread of a
           multi-threaded run generated this data. The number ranges
           from 0 to (number of threads)-1.
 the .data suffix identifies the file containing the actual data.
 the .meta suffix identifies the file containing textual information
 indicating the extent of the domain written to the .data file.

.meta file Format
-----------------
This file contains a set of parameters that are specified using the
generic parameter specification format used in GCMPACK software. This
format consists of a sequence of assignments and comments.
Assignments have the form

   keyword = [ val-list ];

where keyword is a text string and val-list is a sequence of one or more
fields separated by commas. Comments are preceded by // or # characters
or contained in /* */ pairs.

The keywords contained in a .meta file are

 id      - This is a numeric identifier. It can be used to verify
           consistency over a set of .meta files.
 nDims   - This is a single integer indicating the dimensionality of
           the data in the .data file.
 dimList - This is a sequence of triplets. There is one triplet for
           each dimension and the triplets are ordered in the same way
           as the dimensions. Each triplet is made of three integers.
           The first integer gives the global domain extent for the
           associated dimension. The second integer gives the low
           coordinate of the values within the .data file for the
           associated dimension. The third integer gives the high
           coordinate of the values within the .data file for the
           associated dimension.

Thus for a .data file containing one quadrant of a global domain of size
90 x 40, covering points 46 to 90 in the first dimension and points 1 to
20 in the second, the .meta file might read

   nDims = [ 2 ];
   dimList = [ 90, 46, 90, 40, 1, 20];

For the same horizontal region of a global domain of size 90 x 40 x 33,
with all 33 levels held in the file, the .meta file would read

   nDims = [ 3 ];
   dimList = [ 90, 46, 90, 40, 1, 20, 33, 1, 33];

Example matlab program to join files
------------------------------------
The following matlab script joins together a collection of files that
were written in split form. The files to join are indicated by a user
defined PREF.SUFF pair, e.g. T.0000002880. The script uses the UNIX ls
command to find all files starting with T.0000002880 and then scans the
.meta files to extract the dimensions.
It then merges all the sections together to form a complete
representation of the global dataset.

>> function [AA] = rdmeta(fname,varargin)
>> %
>> % Read MITgcmUV Meta/Data files
>> %
>> % A = RDMETA(FNAME) reads data described by meta/data file format.
>> % FNAME is a string containing the "head" of the file names.
>> %
>> % eg. To load the meta-data files
>> % T.0000002880.p0000.t0000.meta, T.0000002880.p0000.t0000.data
>> % T.0000002880.p0001.t0000.meta, T.0000002880.p0001.t0000.data
>> % T.0000002880.p0002.t0000.meta, T.0000002880.p0002.t0000.data
>> % T.0000002880.p0003.t0000.meta, T.0000002880.p0003.t0000.data
>> % use
>> %    >> A=rdmeta('T.0000002880');
>> %
>> % A = RDMETA(FNAME,MACHINEFORMAT) allows the machine format to be
>> % specified, where MACHINEFORMAT is one of the following strings:
>> %
>> %   'native'      or 'n' - local machine format - the default
>> %   'ieee-le'     or 'l' - IEEE floating point with little-endian
>> %                          byte ordering
>> %   'ieee-be'     or 'b' - IEEE floating point with big-endian
>> %                          byte ordering
>> %   'vaxd'        or 'd' - VAX D floating point and VAX ordering
>> %   'vaxg'        or 'g' - VAX G floating point and VAX ordering
>> %   'cray'        or 'c' - Cray floating point with big-endian
>> %                          byte ordering
>> %   'ieee-le.l64' or 'a' - IEEE floating point with little-endian
>> %                          byte ordering and 64 bit long data type
>> %   'ieee-be.l64' or 's' - IEEE floating point with big-endian byte
>> %                          ordering and 64 bit long data type.
>> %
>>
>> % Default options
>> ieee='n';
>>
>> % Check optional arguments (strcmp is used because == fails on
>> % strings of different lengths)
>> args=char(varargin);
>> while (size(args,1) > 0)
>>  arg=deblank(args(1,:));
>>  if strcmp(arg,'n') | strcmp(arg,'native')
>>   ieee='n';
>>  elseif strcmp(arg,'l') | strcmp(arg,'ieee-le')
>>   ieee='l';
>>  elseif strcmp(arg,'b') | strcmp(arg,'ieee-be')
>>   ieee='b';
>>  elseif strcmp(arg,'c') | strcmp(arg,'cray')
>>   ieee='c';
>>  elseif strcmp(arg,'a') | strcmp(arg,'ieee-le.l64')
>>   ieee='a';
>>  elseif strcmp(arg,'s') | strcmp(arg,'ieee-be.l64')
>>   ieee='s';
>>  else
>>   error(['Optional argument ' arg ' is unknown'])
>>  end
>>  args=args(2:end,:);
>> end
>>
>> % Match name of all meta-files (uses the UNIX ls command)
>> allfiles=ls([fname '*.meta']);
>>
>> % Beginning and end of each file name within the ls output
>> Iend=strfind(allfiles,'.meta')+4;
>> Ibeg=[1 Iend(1:end-1)+2];
>>
>> % Loop through allfiles
>> for j=1:numel(Ibeg),
>>
>>  % Read meta- and data-file
>>  [A,N] = localrdmeta(allfiles(Ibeg(j):Iend(j)),ieee);
>>
>>  % Rows of N: global extent, low index, high index (one column
>>  % per dimension)
>>  bdims=N(1,:);
>>  r0=N(2,:);
>>  rN=N(3,:);
>>  nd=numel(bdims);
>>  if (nd == 1)
>>   AA(r0(1):rN(1))=A;
>>  elseif (nd == 2)
>>   AA(r0(1):rN(1),r0(2):rN(2))=A;
>>  elseif (nd == 3)
>>   AA(r0(1):rN(1),r0(2):rN(2),r0(3):rN(3))=A;
>>  elseif (nd == 4)
>>   AA(r0(1):rN(1),r0(2):rN(2),r0(3):rN(3),r0(4):rN(4))=A;
>>  else
>>   error('Dimension of data set is larger than currently coded. Sorry!')
>>  end
>>
>> end
>>
>> %-------------------------------------------------------------------------------
>>
>> function [A,N] = localrdmeta(fname,ieee)
>>
>> mname=fname;
>> dname=strrep(mname,'.meta','.data');
>>
>> % Read and interpret Meta file
>> fid = fopen(mname,'r');
>> if (fid == -1)
>>  error(['File ' mname ' could not be opened'])
>> end
>>
>> % Scan each line of the Meta file
>> allstr=' ';
>> keepgoing = 1;
>> while keepgoing > 0,
>>  line = fgetl(fid);
>>  if ~ischar(line)
>>   % fgetl returns -1 at end of file
>>   keepgoing=-1;
>>  else
>>   % Strip out "(PID.TID *.*)" by finding first ")", if there is one
>>   ind=strfind(line,')');
>>   if ~isempty(ind)
>>    line=line(ind(1)+1:end);
>>   end
>>   % Remove comments of form //
>>   line=[line ' //']; ind=strfind(line,'//'); line=line(1:ind(1)-1);
>>   % Add to total string
>>   allstr=[allstr line];
>>  end
>> end
>>
>> % Close meta file
>> fclose(fid);
>>
>> % Strip out comments of form /* ... */
>> ind1=strfind(allstr,'/*'); ind2=strfind(allstr,'*/');
>> if numel(ind1) ~= numel(ind2)
>>  error('The /* ... */ comments are not properly paired')
>> end
>> while size(ind1,2) > 0
>>  % Skip the two characters of the closing "*/"
>>  allstr=[allstr(1:ind1(1)-1) allstr(ind2(1)+2:end)];
>>  ind1=strfind(allstr,'/*'); ind2=strfind(allstr,'*/');
>> end
>>
>> % Evaluate the assignments; this defines the variables id, ndims
>> % and dimlist
>> eval(lower(allstr));
>>
>> % One column per dimension: global extent, low index, high index
>> N=reshape( dimlist , 3 , numel(dimlist)/3 );
>>
>> % Open data file
>> fid=fopen(dname,'r',ieee);
>> if (fid == -1)
>>  error(['File ' dname ' could not be opened'])
>> end
>>
>> % Read record size in bytes
>> recsz=fread(fid,1,'uint32');
>> ldims=N(3,:)-N(2,:)+1;
>> numels=prod(ldims);
>>
>> % Bytes per element tells REAL*4 from REAL*8 data
>> rat=recsz/numels;
>> if rat == 4
>>  A=fread(fid,numels,'real*4');
>> elseif rat == 8
>>  A=fread(fid,numels,'real*8');
>> else
>>  fclose(fid);
>>  error(['Ratio between record size and size in meta-file inconsistent:' ...
>>         sprintf(' implied size in meta-file = %d,', numels ) ...
>>         sprintf(' record size in data-file = %d', recsz )])
>> end
>>
>> erecsz=fread(fid,1,'uint32');
>> if erecsz ~= recsz
>>  disp('WARNING: Record sizes at beginning and end of file are inconsistent')
>> end
>>
>> fclose(fid);
>>
>> % reshape needs at least two sizes, so leave 1-D data as a vector
>> if numel(ldims) > 1
>>  A=reshape(A,ldims);
>> end
>>
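
The record layout and the dimList triplets described above can be checked
without running the model. The following is a minimal sketch, not part of
MITgcmUV: the temporary file, the tile values and the variable names are
made up for illustration, and big-endian REAL*8 data is assumed. It writes
one record by hand (4 byte header, data, 4 byte terminator), reads it
back, and places the tile into a global array using the dimList values
from the two-dimensional .meta example.

```matlab
% Sketch only: the file name and tile values are hypothetical test data.
dname = [tempname '.data'];
tile  = reshape(1:900, 45, 20);         % local region: 45 x 20 points

% Write one Fortran-style sequential record: header, data, terminator.
nbytes = numel(tile)*8;                 % REAL*8 data, length in bytes
fid = fopen(dname, 'w', 'ieee-be');
fwrite(fid, nbytes, 'uint32');
fwrite(fid, tile, 'real*8');            % left-most index varies fastest
fwrite(fid, nbytes, 'uint32');
fclose(fid);

% dimList from the two-dimensional .meta example, reshaped to one
% column per dimension: [global extent; low index; high index].
dimList = [ 90, 46, 90, 40, 1, 20 ];
N = reshape(dimList, 3, numel(dimList)/3);
ldims = N(3,:) - N(2,:) + 1;            % local extents of the tile

% Read the record back and compare the header with the .meta extents.
fid = fopen(dname, 'r', 'ieee-be');
recsz    = fread(fid, 1, 'uint32');
wordsize = recsz / prod(ldims);         % bytes per element
A        = fread(fid, prod(ldims), 'real*8');
erecsz   = fread(fid, 1, 'uint32');
fclose(fid);
delete(dname);
A = reshape(A, ldims);

% Place the tile into the global array at its low:high index ranges.
AA = zeros(N(1,:));
AA(N(2,1):N(3,1), N(2,2):N(3,2)) = A;
```

Dividing the header value by the number of elements implied by dimList
is the same check the script above uses to tell REAL*4 data from REAL*8
data. Points of AA that no tile covers stay at zero in this sketch.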