Announcing bcolz 1.2.1
What's new
This is a maintenance release where C-Blosc internal sources has been
updated to 1.14.3, in which Zstd codec should exhibit improved
performance. Also, np.datetime64 and other scalar objects that have
__getitem__() are now supported in _eval_blocks() (thanks to apalepu23).
Finally, there is improved support for ARM (specially for aarch64) and
PowerPC architectures (only little-endian).
For a more detailed change log, see:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst
For some comparison between bcolz and other compressed data containers,
see:
https://github.com/FrancescAlted/DataContainersTutorials
specially chapters 3 (in-memory containers) and 4 (on-disk containers).
What it is
bcolz provides columnar and compressed data containers that can live
either on-disk or in-memory. The compression is carried out transparently
by Blosc, an ultra fast meta-compressor that is optimized for binary data.
Compression is active by default.
Column storage allows for efficiently querying tables with a large number
of columns. It also allows for cheap addition and removal of columns.
Lastly, high-performance iterators (like iter(), where()) for querying the
objects are provided.
bcolz can use diffent backends internally (currently numexpr, Python/NumPy
or dask) so as to accelerate many vector and query operations (although it
can use pure NumPy for doing so too). Moreover, since the carray/ctable
containers can be disk-based, it is possible to use them for seamlessly
performing out-of-memory computations.
While NumPy is used as the standard way to feed and retrieve data from
bcolz internal containers, but it also comes with support for
high-performance import/export facilities to/from HDF5/PyTables tables and
pandas dataframes.
Have a look at how bcolz and the Blosc compressor, are making a better use
of the memory without an important overhead, at least for some real
scenarios:
http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots
bcolz has minimal dependencies (NumPy is the only strict requisite), comes
with an exhaustive test suite, and it is meant to be used in production.
Example users of bcolz are Visualfabriq (http://www.visualfabriq.com/),
Quantopian (https://www.quantopian.com/) and scikit-allel:
* Visualfabriq:
* bquery, A query and aggregation framework for Bcolz:
* https://github.com/visualfabriq/bquery
* Quantopian:
* Using compressed data containers for faster backtesting at scale:
* https://quantopian.github.io/talks/NeedForSpeed/slides.html
* scikit-allel:
* Exploratory analysis of large scale genetic variation data.
* https://github.com/cggh/scikit-allel
Resources
Visit the main bcolz site repository at: http://github.com/Blosc/bcolz
Manual: http://bcolz.blosc.org
Home of Blosc compressor: http://blosc.org
User's mail list: bcolz@googlegroups.com
http://groups.google.com/group/bcolz
License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt
Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst
----------------------------------------------------------------------
Enjoy data!
Generated by dwww version 1.14 on Fri Dec 5 02:58:54 CET 2025.