Writing a configuration file¶
In this section we describe how to write a configuration file which tells Polycomp which data to compress, and how.
Note for Python developers: Polycomp configuration files are parsed using Python’s configparse library. Refer to its documentation for more information about all the facilities provided by this format.
Basic syntax¶
Empty lines and lines starting with #
are ignored.
Polycomp configuration files are divided in sections, each marked by its name enclosed in square brackets, like in the following example:
[section 1]
# Here are the contents of "section 1"
[section 2]
# Et cetera
Within each section, the file is expected to contain a sequence of key/value pairs, as in the following example:
cmb_temperature = 2.7
speed_of_light = 3.0e8
Strings must be specified without single/double quotes:
file_path = /data/my_test_data.fits
Specifying which data to compress¶
Every configuration file must at least contain one section named
polycomp
. This section should contain the key tables
, which
specifies a comma-separated list of tables that will be included in
the compressed .pcomp
file generated by the program. For every
table specified here there must be a section (either before or after
this one) with the same name, where details about the table are to be
provided.
Here is an example:
[polycomp]
tables = time, temperature, pressure
[time]
# Here we specify where to take time data, and how to compress them
[temperature]
# Ditto for the temperature...
[pressure]
# ...and for the pressure
In the next sections we are going to explain what should every “table section” contain.
Where to take the data to compress¶
In each table section the following key/value pairs must be present:
- The file key specifies the path to the FITS file containing the data to be compressed.
- The hdu key specifies the number/name of the HDU within the FITS file (the first HDU is 1).
- The column key specifies the number/name of the column in the HDU (the first column is 1).
Here is an example:
file = /opt/data/experiment_1.fits
hdu = 1
column = TIME
How to compress the data¶
The compression key must be present in each table section. It contains a string which identifies the compression algorithm to use, and it can be one of the following:
Value | Algorithm |
---|---|
none |
No compression |
rle |
Run-Length Encoding (RLE) |
diffrle |
Differential RLE |
quantization |
Quantization of floating-point values |
polynomial |
Polynomial compression |
zlib |
Zlib-based compression |
bzip2 |
Bzip2-based compression |
The none
, rle
and diffrle
compression algorithms do not
require other key/value pairs. For all the other cases, they are
presented in the next subsections.
Quantization parameters¶
The only parameter required for this kind of compression is
bits_per_sample
, which specifies the number of bits to be used
with each sample. Typical values are integers less than 32 or 64 bits,
depending on the width of the floating-point type used in the input
data.
Polynomial compression parameters¶
There are a number of key/value pairs that are understood when using this algorithm. Not every pair is required; in a handful of cases, Polycomp can provide a default value.
Key | Default value |
---|---|
num_of_coefficients |
(Required) |
samples_per_chunk |
(Required) |
max_error |
(Required) |
use_chebyshev |
True |
period |
If not specified, the input data will be assumed not to be periodic. |
no_smart_optimization |
False |
opt_delta_coeffs |
1 |
opt_delta_samples |
1 |
opt_max_num_of_iterations |
0 (no upper limit) |
The meaning of the parameters is the following:
num_of_coefficients
specifies the number of coefficients of the fitting polynomial, i.e.,deg p(x) + 1
. The best value for this parameter depends heavily on the input data to be compressed.samples_per_chunk
specifies the number of samples in each chunk. This number must always be greater thannum_of_coefficients
.- If
use_chebyshev
is set toFalse
, the so-called “simple compression algorithm” will be used. In some situations the code might run faster, but it can produce significantly worse compression ratios. - If the input data are periodic,
period
should be set to their period (e.g., 2π for angles measured in radians). The default is not to assume the input data periodic.
The remaining parameters (no_smart_optimization
,
opt_delta_coeffs
, opt_delta_samples
, and
opt_max_num_of_iterations
) are used when the user wants to search
the best possible configuration for the polynomial compressor.