NCCOPY(1)                      UNIDATA UTILITIES                     NCCOPY(1)

NAME
       nccopy  -  Copy a netCDF file, optionally changing format, compression,
       or chunking in the output.

SYNOPSIS
       nccopy [-k  kind_name ] [-kind_code] [-d  n ]  [-s]  [-c   chunkspec  ]
              [-u]  [-w]  [-[v|V] var1,...]  [-[g|G] grp1,...]  [-m  bufsize ]
              [-h  chunk_cache ] [-e  cache_elems ] [-r] [-F  filterspec ] [-L
              n ] [-M  n ]  infile  outfile

DESCRIPTION
       The  nccopy utility copies an input netCDF file in any supported format
       variant to an output netCDF file, optionally converting the  output  to
       any compatible netCDF format variant, compressing the data, or rechunk-
       ing the data.  For example, if  built  with  the  netCDF-3  library,  a
       netCDF  classic file may be copied to a netCDF 64-bit offset file, per-
       mitting larger variables.  If built with the netCDF-4 library, a netCDF
       classic  file may be copied to a netCDF-4 file or to a netCDF-4 classic
       model file as  well,  permitting  data  compression,  efficient  schema
       changes, larger variable sizes, and use of other netCDF-4 features.

       If  no  output  format  is  specified,  with  either  -k  kind_name  or
       -kind_code, then the output will use the same format as the input,  un-
       less  the input is classic or 64-bit offset and either chunking or com-
       pression is specified, in which case the output will be netCDF-4  clas-
       sic  model format.  Attempting some kinds of format conversion will re-
       sult in an error, if the conversion is not possible.  For  example,  an
       attempt to copy a netCDF-4 file that uses features of the enhanced mod-
       el, such as groups or variable-length strings,  to  any  of  the  other
       kinds  of  netCDF  formats that use the classic model will result in an
       error.
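
       The default-format rule described above can be sketched in Python.
       This is only an illustration of the rule as this page states it, not
       nccopy's actual code; the function name, its parameters, and the use
       of the short kind names as return values are assumptions made for the
       example.

```python
from typing import Optional

# Kinds that cannot store compressed or chunked data.
CLASSIC_FILE_KINDS = {"nc3", "nc6"}  # classic and 64-bit offset


def default_output_kind(input_kind: str,
                        requested_kind: Optional[str] = None,
                        wants_compression: bool = False,
                        wants_chunking: bool = False) -> str:
    """Return the output kind implied by the rule in DESCRIPTION."""
    if requested_kind is not None:
        # An explicit -k kind_name or -kind_code takes priority.
        return requested_kind
    if input_kind in CLASSIC_FILE_KINDS and (wants_compression or wants_chunking):
        # Compression or chunking was requested for a classic or 64-bit
        # offset input, so the output becomes netCDF-4 classic model.
        return "nc7"
    # Otherwise the output keeps the format of the input.
    return input_kind


print(default_output_kind("nc3", wants_compression=True))
print(default_output_kind("nc4", wants_chunking=True))
```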

       nccopy also serves as an example of a generic  netCDF-4  program,  with
       its  ability  to  read  any valid netCDF file and handle nested groups,
       strings, and user-defined types, including arbitrarily nested  compound
       types, variable-length types, and data of any valid netCDF-4 type.

       If DAP support was enabled when nccopy was built, the input file name
       may specify a DAP URL.  This may be used to convert data on DAP
       servers to local netCDF files.

OPTIONS
        -k   kind_name
              Use  format  name to specify the kind of file to be created and,
              by  inference,  the  data  model  (i.e.  netcdf-3  (classic)  or
              netcdf-4 (enhanced)).  The possible arguments are:

                     'nc3' or 'classic' => netCDF classic format

                     'nc6' or '64-bit offset' => netCDF 64-bit offset format

                     'nc4'  or  'netCDF-4'  =>  netCDF-4 format (enhanced data
                     model)

                     'nc7' or 'netCDF-4 classic  model'  =>  netCDF-4  classic
                     model format

              Note: The old format numbers '1', '2', '3', and '4', equivalent
              to the format names 'nc3', 'nc6', 'nc4', and 'nc7' respectively,
              are still accepted but deprecated, because format numbers and
              format names are easily confused.

        -kind_code
              Use format numeric code (instead of format name) to specify  the
              kind  of  file  to  be created and, by inference, the data model
              (i.e. netcdf-3 (classic) versus netcdf-4 (enhanced)).   The  nu-
              meric codes are:

                     3 => netCDF classic format

                     6 => netCDF 64-bit offset format

                     4 => netCDF-4 format (enhanced data model)

                     7 => netCDF-4 classic model format

              The numeric code "7" is used because "7=3+4", specifying the
              format that uses the netCDF-3 data model for compatibility
              together with the netCDF-4 storage format for performance.
              Credit is due to NCO for use of these numeric codes instead of
              the old and confusing format numbers.

        -d   n
              For netCDF-4 output, including netCDF-4 classic  model,  specify
              deflation level (level of compression) for variable data output.
              0 corresponds to no compression and 9  to  maximum  compression,
              with higher levels of compression requiring marginally more time
              to compress or uncompress than lower levels.  As a side effect,
              specifying a compression level of 0 (via "-d 0") turns off
              deflation altogether.  Compression achieved may also depend
              on  output chunking parameters.  If this option is specified for
              a classic format or 64-bit offset format input file, it  is  not
              necessary  to  also  specify  that the output should be netCDF-4
              classic model, as that will be the default.  If this  option  is
              not  specified  and the input file has compressed variables, the
              compression will still be preserved in  the  output,  using  the
              same chunking as in the input by default.

              Note  that  nccopy requires all variables to be compressed using
              the same compression level, but the API has no such restriction.
              With  a  program you can customize compression for each variable
              independently.

        -s    For netCDF-4 output, including netCDF-4 classic  model,  specify
              shuffling of variable data bytes before compression or after de-
              compression.  Shuffling refers to  interlacing  of  bytes  in  a
              chunk  so  that  the first bytes of all values are contiguous in
              storage, followed by all the second bytes, and so on, which  of-
              ten  improves compression.  This option is ignored unless a non-
              zero deflation level is specified.  Using -d0 to specify no  de-
              flation  on  input  data  that  has been compressed and shuffled
              turns off both compression and shuffling in the output.
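
              The byte reordering described above can be illustrated with a
              small Python sketch using only the standard library.  It is
              not the library's shuffle filter, only a model of the same
              idea; the function names and the test data (10000 consecutive
              32-bit integers) are assumptions chosen for the example.

```python
import struct
import zlib


def shuffle(data: bytes, itemsize: int) -> bytes:
    """Group byte 0 of every value, then byte 1 of every value, and so on."""
    if len(data) % itemsize != 0:
        raise ValueError("data length must be a multiple of itemsize")
    return b"".join(data[i::itemsize] for i in range(itemsize))


def unshuffle(data: bytes, itemsize: int) -> bytes:
    """Invert shuffle(): restore the original value-by-value byte order."""
    if len(data) % itemsize != 0:
        raise ValueError("data length must be a multiple of itemsize")
    nvalues = len(data) // itemsize
    return b"".join(data[i::nvalues] for i in range(nvalues))


# Slowly varying data: the high-order bytes of neighbouring values repeat.
values = struct.pack("<10000i", *range(10000))
shuffled = shuffle(values, 4)

plain_size = len(zlib.compress(values))
shuffled_size = len(zlib.compress(shuffled))
print("deflate only:     ", plain_size, "bytes")
print("shuffle + deflate:", shuffled_size, "bytes")
```

              For data like this, the shuffled stream compresses to far
              fewer bytes because equal bytes end up next to each other.
              Data with little byte-level regularity may not benefit.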

        -u    Convert any unlimited size dimensions in the input to fixed size
              dimensions  in the output.  This can speed up variable-at-a-time
              access, but slow down record-at-a-time access to multiple  vari-
              ables along an unlimited dimension.

        -w    Keep output in memory (as a diskless netCDF file) until the
              output is closed, at which time the output file is written to
              disk.  This can greatly speed up operations such as converting
              unlimited dimensions to fixed size (-u option), chunking,
              rechunking, or compressing the input.  It requires that
              available memory be large enough to hold the output file.  This
              option may provide a larger speedup than careful tuning of the
              -m, -h, or -e options, and it is simpler to use.

        -c  chunkspec
              For netCDF-4 output, including netCDF-4 classic model, specify
              chunking (multidimensional tiling) for variable data in the
              output.  This is useful to specify the units of disk access,
              compression, or other filters such as checksums.  Changing the
              chunking in a netCDF file can also greatly speed up access, by
              choosing chunk shapes that are appropriate for the most common
              access patterns.

              The chunkspec argument has several forms. The first form is  the
              original, deprecated form and is a string of comma-separated as-
              sociations, each specifying a dimension name, a  '/'  character,
              and  optionally  the  corresponding chunk length for that dimen-
              sion.  No blanks should appear in the chunkspec  string,  except
              possibly  escaped  blanks  that are part of a dimension name.  A
              chunkspec names at least one dimension, and may omit  dimensions
              which  are  not  to  be  chunked  or for which the default chunk
              length is desired.  If a dimension name is  followed  by  a  '/'
              character  but  no subsequent chunk length, the actual dimension
              length is assumed.   If  copying  a  classic  model  file  to  a
              netCDF-4  output  file  and  not  naming  all  dimensions in the
              chunkspec, unnamed dimensions will also use the actual dimension
              length  for  the  chunk  length.   An example of a chunkspec for
              variables that use 'm' and 'n' dimensions might be 'm/100,n/200'
              to specify 100 by 200 chunks. To see the chunking resulting from
              copying with a chunkspec, use the '-s' option of ncdump  on  the
              output file.

              The chunkspec '/', which omits all dimension names and
              corresponding chunk lengths, specifies that no chunking is to
              occur in the output, so it can be used to unchunk all the
              chunked variables.

              As  an  I/O optimization, nccopy has a threshold for the minimum
              size of non-record variables that get  chunked,  currently  8192
              bytes. The -M flag can be used to override this value.

              Note  that  nccopy  requires variables that share a dimension to
              also share the chunk size associated with  that  dimension,  but
              the  programming interface has no such restriction.  If you need
              to customize chunking for variables independently, you will need
              to use the second form of chunkspec.  This second form of
              chunkspec has this syntax: var:n1,n2,...,nn.  This assumes that
              the variable named "var" has rank n.  The chunking to be applied
              to each dimension of the variable is specified by the values  of
              n1 through nn. This second form of chunking specification can be
              repeated multiple times to specify the exact chunking  for  dif-
              ferent  variables.   If  the  variable is specified but no chunk
              sizes are specified (i.e.  -c var: ) then chunking  is  disabled
              for  that variable.  If the same variable is specified more than
              once, the second and later specifications  are  ignored.   Also,
              this  second  form, per-variable chunking, takes precedence over
              any per-dimension chunking except the bare "/" case.

              The third form of the chunkspec has the syntax:  var:compact  or
              var:contiguous.   This  explicitly  attempts to set the variable
              storage type as compact or contiguous, respectively.  These  may
              be overridden if other flags require the variable to be chunked.
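
              The chunkspec forms described above can be told apart with a
              few string tests.  The Python sketch below is a reading of
              this page, not nccopy's parser: the function name, the
              returned tuples, and the choice to look for ':' before '/'
              are assumptions, and backslash-escaped blanks in names are
              not handled.

```python
from typing import Any, Dict, Optional, Tuple


def parse_chunkspec(spec: str) -> Tuple[str, Any]:
    """Classify one -c argument and return (form, payload)."""
    if spec == "/":
        # Bare '/': unchunk all chunked variables.
        return ("unchunk_all", None)
    if ":" in spec:
        # Second and third forms start with a variable name and ':'.
        var, _, rest = spec.rpartition(":")
        if rest in ("compact", "contiguous"):
            return ("storage", (var, rest))
        # An empty list (from 'var:') means chunking is disabled for var.
        sizes = [int(n) for n in rest.split(",")] if rest else []
        return ("per_variable", (var, sizes))
    # First (deprecated) form: comma-separated dim/length associations.
    dims: Dict[str, Optional[int]] = {}
    for assoc in spec.split(","):
        name, sep, length = assoc.partition("/")
        if not sep or not name:
            raise ValueError("expected dim/length, got %r" % assoc)
        # None stands for 'use the actual dimension length'.
        dims[name] = int(length) if length else None
    return ("per_dimension", dims)


print(parse_chunkspec("m/100,n/200"))
print(parse_chunkspec("var:1000,40,40"))
print(parse_chunkspec("var:contiguous"))
```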

        -v   var1,...
              The output will include data values for the specified variables,
              in addition to the declarations of  all  dimensions,  variables,
              and  attributes. One or more variables must be specified by name
              in the comma-delimited list following this option. The list must
              be  a  single  argument to the command, hence cannot contain un-
              escaped blanks or other white space characters. The named  vari-
              ables  must be valid netCDF variables in the input-file. A vari-
              able within a group in a netCDF-4 file may be specified with  an
              absolute  path  name,  such  as "/GroupA/GroupA2/var".  Use of a
              relative path name such as  'var'  or  "grp/var"  specifies  all
              matching  variable names in the file.  The default, without this
              option, is to include data values for   all   variables  in  the
              output.

        -V   var1,...
              The output will include the specified variables only but all di-
              mensions and global or group attributes. One or  more  variables
              must  be specified by name in the comma-delimited list following
              this option. The list must be a single argument to the  command,
              hence cannot contain unescaped blanks or other white space char-
              acters. The named variables must be valid  netCDF  variables  in
              the input-file. A variable within a group in a netCDF-4 file may
              be   specified   with   an   absolute   path   name,   such   as
              '/GroupA/GroupA2/var'.   Use  of  a  relative  path name such as
              'var' or 'grp/var' specifies all matching variable names in  the
              file.   The  default,  without  this  option, is to include  all
              variables in the output.

        -g   grp1,...
              The output will include  data  values  only  for  the  specified
              groups.   One  or  more  groups must be specified by name in the
              comma-delimited list following this option. The list must  be  a
              single  argument  to the command. The named groups must be valid
              netCDF groups in the input-file. The default, without  this  op-
              tion, is to include data values for all groups in the output.

        -G   grp1,...
              The  output will include only the specified groups.  One or more
              groups must be specified by name  in  the  comma-delimited  list
              following this option. The list must be a single argument to the
              command. The named groups must be valid netCDF groups in the in-
              put-file.  The  default,  without this option, is to include all
              groups in the output.

        -m   bufsize
              An integer or floating-point number that specifies the size,  in
              bytes,  of the copy buffer used to copy large variables.  A suf-
              fix of K, M, G, or T multiplies the  copy  buffer  size  by  one
              thousand,  million, billion, or trillion, respectively.  The de-
              fault is 5 Mbytes, but will be increased if necessary to hold at
              least one chunk of netCDF-4 chunked variables in the input file.
              You may want to specify a value  larger  than  the  default  for
              copying  large files over high latency networks.  Using the '-w'
              option may provide better performance, if  the  output  fits  in
              memory.
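
              The suffix arithmetic can be checked with a short Python
              sketch.  It follows the decimal multipliers stated above (one
              thousand, million, billion, trillion); the function name is
              an assumption, and only the upper-case suffixes named in this
              page are recognized.  This page describes the suffixes for
              the -h and -e options in the same way.

```python
from decimal import Decimal

# Decimal multipliers, as described for the K, M, G, and T suffixes.
SUFFIXES = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_size(text: str) -> int:
    """Convert a value such as '5M' or '4.194304M' to a plain count."""
    text = text.strip()
    if not text:
        raise ValueError("empty size")
    multiplier = 1
    if text[-1] in SUFFIXES:
        multiplier = SUFFIXES[text[-1]]
        text = text[:-1]
    # Decimal avoids binary floating-point rounding in the multiplication.
    return int(Decimal(text) * multiplier)


print(parse_size("5M"))
print(parse_size("4.194304M"))
```

              Under this reading, "4.194304M" is exactly 4194304, which is
              4 * 1024 * 1024.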

        -h   chunk_cache
              For  netCDF-4 output, including netCDF-4 classic model, an inte-
              ger or floating-point number that specifies the size in bytes of
              chunk  cache allocated for each chunked variable.  This is not a
              property of the file, but merely a performance tuning  parameter
              for avoiding compressing or decompressing the same data multiple
              times while copying and changing chunk shapes.  A suffix  of  K,
              M, G, or T multiplies the chunk cache size by one thousand, mil-
              lion,  billion,  or  trillion,  respectively.   The  default  is
              4.194304  Mbytes  (or  whatever was specified for the configure-
              time constant  CHUNK_CACHE_SIZE  when  the  netCDF  library  was
              built).  Ideally, the nccopy utility should accept only one mem-
              ory buffer size and divide it optimally between  a  copy  buffer
              and  chunk cache, but no general algorithm for computing the op-
              timum chunk cache size has been implemented yet. Using the  '-w'
              option  may  provide  better  performance, if the output fits in
              memory.

        -e   cache_elems
              For netCDF-4 output, including netCDF-4 classic model, specifies
              number  of  chunks that the chunk cache can hold. A suffix of K,
              M, G, or T multiplies the number of chunks that can be  held  in
              the  cache  by  one thousand, million, billion, or trillion, re-
              spectively.  This is not a property of the file,  but  merely  a
              performance  tuning parameter for avoiding compressing or decom-
              pressing the same data multiple times while copying and changing
              chunk  shapes.   The  default is 1009 (or whatever was specified
              for the  configure-time  constant  CHUNK_CACHE_NELEMS  when  the
              netCDF  library  was built).  Ideally, the nccopy utility should
              determine an optimum value for this parameter,  but  no  general
              algorithm  for  computing the optimum number of chunk cache ele-
              ments has been implemented yet.

        -r    Read netCDF classic or 64-bit offset input file into a  diskless
              netCDF  file in memory before copying.  Requires that input file
              be small enough to fit into memory.  For  nccopy,  this  doesn't
              seem  to provide any significant speedup, so may not be a useful
              option.

        -L  n Set the log level; only usable if nccopy supports netCDF-4  (en-
              hanced).

        -M  n Set  the  minimum  chunk  size;  only  usable if nccopy supports
              netCDF-4 (enhanced).

        -F  filterspec
              For netCDF-4 output, including netCDF-4 classic model, specify a
              filter  to  apply to a specified set of variables in the output.
              As a rule, the filter is a  compression/decompression  algorithm
              with  a unique numeric identifier assigned by the HDF Group (see
              https://support.hdfgroup.org/services/filters.html).

              The filterspec argument has one of these general forms:

                     fqn1|fqn2...,filterid,param1,param2...paramn

                     *,filterid,param1,param2...paramn

              An fqn (fully qualified name) is the name of a variable prefixed
              by its containing groups, with the group names separated by
              forward slash ('/').  An example might be /g1/g2/var.
              Alternatively, just the variable name can be given if it is in
              the root group: e.g. var.  Backslash escapes may be used as
              needed.  A note of warning: the '|' separator is a bash reserved
              character, so you will probably need to put the filter spec in
              some kind of quotes or otherwise escape it.

              The filterid is an unsigned positive integer representing the id
              assigned by the HDF Group to the filter.  Following the id is a
              sequence of parameters defining the operation of the filter.
              Each parameter is a 32-bit unsigned integer.

              This option may be repeated multiple times with different
              variable names.
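
              Assembling and quoting a filterspec can be sketched in
              Python.  The helper name, the variable names, and the file
              names in.nc and out.nc are hypothetical.  The id 307 (the id
              registered for bzip2) and the parameter 9 are illustrative
              values only, and the filter must be available to the library
              for such a command to work.

```python
import shlex
from typing import List

UINT32_MAX = 2**32 - 1


def build_filterspec(targets: List[str], filterid: int, params: List[int]) -> str:
    """Join variable names, filter id, and parameters into one -F argument."""
    if not targets:
        raise ValueError("at least one variable name (or '*') is required")
    if filterid <= 0:
        raise ValueError("filterid must be a positive integer")
    for p in params:
        if not 0 <= p <= UINT32_MAX:
            raise ValueError("parameter %r is not a 32-bit unsigned integer" % p)
    fields = ["|".join(targets), str(filterid)] + [str(p) for p in params]
    return ",".join(fields)


spec = build_filterspec(["/g1/g2/var", "var"], 307, [9])
# shlex.quote adds the quoting needed because '|' is special to the shell.
print("nccopy -F", shlex.quote(spec), "in.nc out.nc")
```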

EXAMPLES
       Make a copy of foo1.nc, a netCDF file of any type, to foo2.nc, a netCDF
       file of the same type:

              nccopy foo1.nc foo2.nc

       Note that the above copy will not be as fast as use of cp or other sim-
       ple copy utility, because the file is copied using only the netCDF API.
       If the input file has extra bytes after the end  of  the  netCDF  data,
       those  will  not be copied, because they are not accessible through the
       netCDF interface.  If the original file was generated in "No fill" mode
       so  that fill values are not stored for padding for data alignment, the
       output file may have different padding bytes.

       Convert a netCDF-4 classic model file, compressed.nc,  that  uses  com-
       pression, to a netCDF-3 file classic.nc:

              nccopy -k classic compressed.nc classic.nc

       Note that 'nc3' could be used instead of 'classic'.

       Download the variable 'time_bnds' and its associated attributes from an
       OPeNDAP server and copy the result to a netCDF file named 'tb.nc':

              nccopy 'http://test.opendap.org/opendap/data/nc/sst.mnmean.nc.gz?time_bnds' tb.nc

       Note  that  URLs that name specific variables as command-line arguments
       should generally be quoted, to avoid  the  shell  interpreting  special
       characters such as '?'.

       Compress  all  the variables in the input file foo.nc, a netCDF file of
       any type, to the output file bar.nc:

              nccopy -d1 foo.nc bar.nc

       If foo.nc was a classic or 64-bit offset netCDF file, bar.nc will be  a
       netCDF-4 classic model netCDF file, because the classic and 64-bit off-
       set format  variants  don't  support  compression.   If  foo.nc  was  a
       netCDF-4  file  with  some variables compressed using various deflation
       levels, the output will also be a netCDF-4 file of the same  type,  but
       all  the  variables, including any uncompressed variables in the input,
       will now use deflation level 1.

       Assume the input data includes gridded variables that  use  time,  lat,
       lon  dimensions,  with 1000 times by 1000 latitudes by 1000 longitudes,
       and that the time dimension varies most slowly.  Also assume that users
       want  quick  access  to  data  at  all times for a small set of lat-lon
       points.  Accessing data for 1000 times would typically require  access-
       ing 1000 disk blocks, which may be slow.

       Reorganizing  the  data  into  chunks on disk that have all the time in
       each chunk for a few lat and lon coordinates  would  greatly  speed  up
       such  access.   To  chunk  the data in the input file slow.nc, a netCDF
       file of any type, to the output file fast.nc, you could use:

              nccopy -c time/1000,lat/40,lon/40 slow.nc fast.nc

       to specify data chunks of 1000 times, 40 latitudes, and 40  longitudes.
       If you had enough memory to contain the output file, you could speed up
       the rechunking operation significantly by creating the output in memory
       before writing it to disk on close (using the -w flag):

              nccopy -w -c time/1000,lat/40,lon/40 slow.nc fast.nc

       Alternatively, one could write this using the alternate,
       variable-specific chunking specification, assuming that time, lat, and
       lon are variables:

              nccopy -c time:1000 -c lat:40 -c lon:40 slow.nc fast.nc
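
       The benefit of the rechunking example can be estimated by counting
       how many chunks a read request intersects.  The Python sketch below
       is a back-of-the-envelope model, not a measurement; the function
       name is an assumption, and the unchunked, time-slowest input is
       modeled as one chunk per time step.

```python
from typing import Sequence


def chunks_touched(start: Sequence[int],
                   count: Sequence[int],
                   chunk_shape: Sequence[int]) -> int:
    """Count the chunks intersected by a hyperslab (start, count)."""
    total = 1
    for s, c, k in zip(start, count, chunk_shape):
        first = s // k            # index of first chunk along this dimension
        last = (s + c - 1) // k   # index of last chunk along this dimension
        total *= last - first + 1
    return total


# All 1000 times at a single lat-lon point of a 1000 x 1000 x 1000 variable.
series_start, series_count = (0, 500, 500), (1000, 1, 1)

slow = chunks_touched(series_start, series_count, (1, 1000, 1000))
fast = chunks_touched(series_start, series_count, (1000, 40, 40))
print("time series, one chunk per time step:", slow)   # 1000
print("time series, 1000 x 40 x 40 chunks:  ", fast)   # 1

# The trade-off: one full lat-lon map at a single time.
grid = chunks_touched((0, 0, 0), (1, 1000, 1000), (1000, 40, 40))
print("single-time map, 1000 x 40 x 40 chunks:", grid)  # 625
```

       The time series read drops from 1000 chunks to 1, while a read of a
       full map at one time now touches 625 chunks, so the chunk shape
       should match the most common access pattern.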

CHUNKING RULES
       The complete set of chunking rules is captured here.  As a rough summa-
       ry, these rules preserve all chunking properties from the  input  file.
       These  rules apply only when the selected output format supports chunk-
       ing, i.e. for the netcdf-4 variants.

       The variable-specific chunking specification translates directly to
       the corresponding "nc_def_var_chunking" API call.

       The original per-dimension chunking specification requires some
       interpretation by nccopy.  The following rules are applied in the given
       order, independently for each variable to be copied from input to
       output.  The rules are written assuming we are trying to determine the
       chunking for a given output variable Vout that comes from an input
       variable Vin.

       1.     If there is no '-c' option that applies to a  variable  and  the
              corresponding  input variable is contiguous or the input is some
              netcdf-3 variant, then let the netcdf-c library make all  chunk-
              ing decisions.

       2.     For  each  dimension of Vout explicitly specified on the command
              line (using the '-c' option), apply the chunking value for  that
              dimension regardless of input format or input properties.

       3.     For  dimensions  of Vout not named on the command line in a '-c'
              option, preserve chunk sizes from the corresponding input  vari-
              able, if it is chunked.

       4.     If  Vin  is  contiguous, and none of its dimensions are named on
              the command line, and chunking is not mandated by other options,
              then make Vout be contiguous.

       5.     If the input variable is contiguous (or is some netcdf-3
              variant) and there are no options requiring chunking, or the '/'
              special case for the '-c' option is specified, then the output
              variable Vout is marked as contiguous.

       6.     Final, default case: some or all chunk sizes are not  determined
              by  the  command  line  or the input variable. This includes the
              non-chunked input cases such as  netcdf-3,  cdf5,  and  DAP.  In
              these cases retain all chunk sizes determined by previous rules,
              and use the full dimension size as the default. The exception is
              unlimited dimensions, where the default is 4 megabytes.
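
       Rules 2, 3, and 6 determine a chunk length for each dimension, and
       that part can be sketched in Python.  This is a simplified reading
       of the rules above, not nccopy's implementation: the function and
       parameter names are assumptions, and the contiguous-storage
       decisions (rules 1, 4, and 5) and the special default for unlimited
       dimensions are deliberately left out.

```python
from typing import Dict, List, Optional, Sequence


def resolve_chunk_lengths(dim_names: Sequence[str],
                          dim_lengths: Sequence[int],
                          cmdline: Dict[str, Optional[int]],
                          input_chunks: Optional[Sequence[int]] = None) -> List[int]:
    """Pick a chunk length for each dimension of one output variable.

    cmdline maps dimension names given with -c to a chunk length, or to
    None when the name was followed by '/' with no length.
    input_chunks holds the input variable's chunk lengths, or None if the
    input variable is not chunked.
    """
    resolved = []
    for i, (name, length) in enumerate(zip(dim_names, dim_lengths)):
        if name in cmdline:
            # Rule 2: a dimension named on the command line wins.
            requested = cmdline[name]
            resolved.append(length if requested is None else requested)
        elif input_chunks is not None:
            # Rule 3: otherwise keep the chunk length from the input.
            resolved.append(input_chunks[i])
        else:
            # Rule 6: fall back to the full dimension length.
            resolved.append(length)
    return resolved


dims = ["time", "lat", "lon"]
lengths = [1000, 1000, 1000]
print(resolve_chunk_lengths(dims, lengths, {"lat": 40, "lon": 40}))
print(resolve_chunk_lengths(dims, lengths, {"time": 1000}, [1, 100, 100]))
```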

SEE ALSO
       ncdump(1), ncgen(1), netcdf(3)

Release 4.2                       2012-03-08                         NCCOPY(1)
