
                               Blosc Chunk Format

   The chunk is composed of a header and a blocks / splits section:

 +---------+--------+---------+
 |  header | blocks / splits  |
 +---------+--------+---------+

   These are described below.

                               The header section

   Blosc (as of version 1.0.0) uses the following 16-byte header to store
   information about the compressed buffer:

 |-0-|-1-|-2-|-3-|-4-|-5-|-6-|-7-|-8-|-9-|-A-|-B-|-C-|-D-|-E-|-F-|
   ^   ^   ^   ^ |     nbytes    |   blocksize   |    cbytes     |
   |   |   |   |
   |   |   |   +--typesize
   |   |   +------flags
   |   +----------versionlz
   +--------------version

                        Datatypes of the header entries

   All entries are little endian.

    version:    (uint8) Blosc format version.                                 
    versionlz:  (uint8) Version of the internal compressor used.              
    flags and compressor enumeration:
                (bitfield) The flags of the buffer                            
                                                                              
                 bit 0 (0x01):  Whether the byte-shuffle filter has been      
                                applied or not.                               
                 bit 1 (0x02):  Whether the internal buffer is a pure         
                                memcpy or not.                                
                 bit 2 (0x04):  Whether the bit-shuffle filter has been       
                                applied or not.                               
                 bit 3 (0x08):  Reserved, must be zero.                       
                 bit 4 (0x10):  If set, the blocks will not be split in       
                                sub-blocks during compression.                
                 bit 5 (0x20):  Part of the enumeration for compressors.      
                 bit 6 (0x40):  Part of the enumeration for compressors.      
                 bit 7 (0x80):  Part of the enumeration for compressors.      
                                                                              
                The top three bits (bits 5 to 7) form an enumeration that
                selects the compressor used:
                                                                              
                 0:  blosclz                                                  
                 1:  lz4 or lz4hc                                             
                 2:  snappy                                                   
                 3:  zlib                                                     
                 4:  zstd                                                     
    typesize:   (uint8) Number of bytes for the atomic type.                  
    nbytes:     (uint32) Uncompressed size of the buffer (this header is not  
                included).                                                    
    blocksize:  (uint32) Size of internal blocks.                             
    cbytes:     (uint32) Compressed size of the buffer (including this        
                header).                                                      
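
   As an illustration of the layout above, the following Python sketch
   unpacks the 16 header bytes and decodes the flags. It is only a sketch:
   the names BloscHeader, parse_header, describe_flags and COMPRESSORS are
   hypothetical and not part of any Blosc API, and the compressor code is
   assumed to be the value of bits 5 to 7 read as a 3-bit integer.

```python
import struct
from typing import NamedTuple

# Hypothetical lookup table; the numeric codes come from the list above.
COMPRESSORS = {0: "blosclz", 1: "lz4 or lz4hc", 2: "snappy", 3: "zlib", 4: "zstd"}

# Four uint8 fields followed by three uint32 fields, little endian: 16 bytes.
HEADER = struct.Struct("<BBBBIII")


class BloscHeader(NamedTuple):
    version: int
    versionlz: int
    flags: int
    typesize: int
    nbytes: int
    blocksize: int
    cbytes: int


def parse_header(chunk: bytes) -> BloscHeader:
    """Unpack the first 16 bytes of a chunk into named fields."""
    if len(chunk) < HEADER.size:
        raise ValueError("chunk is shorter than the 16-byte header")
    return BloscHeader(*HEADER.unpack_from(chunk, 0))


def describe_flags(flags: int) -> dict:
    """Decode the flags byte into booleans plus a compressor name."""
    code = (flags >> 5) & 0x07  # assumption: bits 5-7 taken as an integer
    return {
        "byte_shuffle": bool(flags & 0x01),
        "memcpy": bool(flags & 0x02),
        "bit_shuffle": bool(flags & 0x04),
        "no_split": bool(flags & 0x10),
        "compressor": COMPRESSORS.get(code, f"unknown ({code})"),
    }
```

   For example, describe_flags(parse_header(chunk).flags) reports which
   filters were applied and which compressor the chunk claims to use.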

                          The blocks / splits section

   After the header comes the blocks / splits section. Blocks are
   equal-sized parts of the chunk, except for the last block, which can be
   shorter than or equal to the rest.

   At the beginning of the blocks section comes a list of int32_t bstarts
   that indicate where the different encoded blocks start (counting from
   the end of this bstarts section):

 +=========+=========+========+=========+
 | bstart0 | bstart1 |   ...  | bstartN |
 +=========+=========+========+=========+
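
   A minimal Python sketch of reading this list follows. The number of
   entries is not stored explicitly, so the sketch derives it from nbytes
   and blocksize, relying on the statement that only the last block may be
   shorter. The function names and the header_size parameter are
   hypothetical. This text does not describe the layout of a chunk whose
   memcpy flag is set, so the sketch assumes that flag is clear.

```python
import struct


def count_blocks(nbytes: int, blocksize: int) -> int:
    """Number of blocks needed to cover nbytes (the last one may be short)."""
    if blocksize <= 0:
        raise ValueError("blocksize must be positive")
    return -(-nbytes // blocksize)  # ceiling division


def read_bstarts(chunk: bytes, nblocks: int, header_size: int = 16) -> list:
    """Read nblocks little-endian int32 values placed right after the header."""
    return list(struct.unpack_from(f"<{nblocks}i", chunk, header_size))
```

   The returned values are raw offsets; how they are turned into positions
   in the chunk depends on the reference point described above.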

   Finally comes the actual list of compressed blocks / splits data
   streams. A block may optionally (see bit 4 in flags above) be further
   divided into so-called splits, which are the actual data streams passed
   to codecs for compression. If a block is not split, then the split is
   equivalent to the whole block. Each split in the list is preceded by its
   compressed size, expressed as an int32_t:

 +========+========+========+========+========+========+========+
 | csize0 | split0 | csize1 | split1 |   ...  | csizeN | splitN |
 +========+========+========+========+========+========+========+

   Note: all integers are stored in little-endian order.
