## AWS C Compression

This is a cross-platform C99 implementation of compression algorithms such as
gzip, and huffman encoding/decoding. Currently only huffman is implemented.

## License

This library is licensed under the Apache 2.0 License.

## Usage

### Building

Note that aws-c-compression has a dependency on aws-c-common:

```
git clone git@github.com:awslabs/aws-c-common.git
cmake -DCMAKE_PREFIX_PATH=<install-path> -DCMAKE_INSTALL_PREFIX=<install-path> -S aws-c-common -B aws-c-common/build
cmake --build aws-c-common/build --target install

git clone git@github.com:awslabs/aws-c-compression.git
cmake -DCMAKE_PREFIX_PATH=<install-path> -DCMAKE_INSTALL_PREFIX=<install-path> -S aws-c-compression -B aws-c-compression/build
cmake --build aws-c-compression/build --target install
```

### Huffman

The Huffman implementation in this library is designed around the concept of a
generic "symbol coder" object, which defines how each symbol (a value between
0 and 255) is encoded and decoded. This object looks like this:

```c
typedef struct aws_huffman_code (*aws_huffman_symbol_encoder)(uint8_t symbol, void *userdata);
typedef uint8_t (*aws_huffman_symbol_decoder)(uint32_t bits, uint8_t *symbol, void *userdata);

struct aws_huffman_symbol_coder {
    aws_huffman_symbol_encoder encode;
    aws_huffman_symbol_decoder decode;
    void *userdata;
};
```

These callbacks may be implemented manually, or you may use the included
Huffman coder generator to generate one from a table definition file. The
generator expects to be called with the following arguments:

```shell
$ aws-c-compression-huffman-generator path/to/table.def path/to/generated.c coder_name
```

The table definition file should be in the following format:

```c
/*           sym          bits   code  len */
HUFFMAN_CODE(  0, "1100101110", 0x32e,  10)
HUFFMAN_CODE(  1, "1100101111", 0x32f,  10)
/* ... */
```

The `HUFFMAN_CODE` macro expects 4 arguments:
* sym: the symbol value [0-255]
* bits: the bits representing the symbol in string form
* code: the bits representing the symbol in numeric form
* len: the number of bits used to represent the symbol

> #### Note
> This file may also be `#include`d in the following way to generate a static
> list of codes:
> ```c
> /* Provides the HUFFMAN_CODE macro */
> #include
>
> static struct huffman_test_code_point code_points[] = {
> #include "test_huffman_static_table.def"
> };
> ```

This will emit a C file which exports a function with the following signature:

```c
struct aws_huffman_symbol_coder *{coder_name}_get_coder();
```

Note that this function does not allocate, but maintains a static instance of
the coder. An example implementation of this file is provided in
`tests/test_huffman_static_table.def`.

To use the coder, forward declare that function, and pass the result as the
second argument to `aws_huffman_encoder_init` and `aws_huffman_decoder_init`.

```c
struct aws_huffman_encoder encoder;
aws_huffman_encoder_init(&encoder, {coder_name}_get_coder());

struct aws_huffman_decoder decoder;
aws_huffman_decoder_init(&decoder, {coder_name}_get_coder());
```

#### Encoding

```c
/**
 * Encode a symbol buffer into the output buffer.
 *
 * \param[in]     encoder     The encoder object to use
 * \param[in]     to_encode   The symbol buffer to encode
 * \param[in,out] length      In: The length of to_encode. Out: The number of bytes read from to_encode
 * \param[in]     output      The buffer to write encoded bytes to
 * \param[in,out] output_size In: The size of output. Out: The number of bytes written to output
 *
 * \return AWS_OP_SUCCESS if encoding is successful, AWS_OP_ERR otherwise (an error code will be raised)
 */
int aws_huffman_encode(
    struct aws_huffman_encoder *encoder,
    const char *to_encode,
    size_t *length,
    uint8_t *output,
    size_t *output_size);
```

The encoder is built to support partial encoding. This means that if there
isn't enough space in `output`, the encoder will encode as much as possible,
update `length` to indicate how much was consumed, leave `output_size`
unchanged, and raise `AWS_ERROR_SHORT_BUFFER`. `aws_huffman_encode` may then be
called again on the remaining input, as in the following pseudo-code:

```c
void encode_and_send(const char *to_encode, size_t size) {
    while (size > 0) {
        uint8_t output[some_chunk_size];
        size_t output_size = sizeof(output);
        size_t bytes_read = size;
        aws_huffman_encode(encoder, to_encode, &bytes_read, output, &output_size);
        /* AWS_ERROR_SHORT_BUFFER was raised... */

        send_output_to_someone_else(output, output_size);
        to_encode += bytes_read;
        size -= bytes_read;
    }

    /* Be sure to reset the encoder after use */
    aws_huffman_encoder_reset(encoder);
}
```

`aws_huffman_encoder` also has a `uint8_t` field called `eos_padding` that
defines how any unwritten bits in the last byte of output are filled. The most
significant bits will be used. For example, if the last byte contains only 3
bits and `eos_padding` is `0b01010101`, `01010` will be appended to the byte.
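For instance, to fill any unused trailing bits with 1's (the padding HPACK
expects, as noted in the decoding section below), the field can be set directly
after initializing the encoder. A minimal sketch, where `my_coder_get_coder` is
a hypothetical name standing in for the generated `{coder_name}_get_coder`:

```c
struct aws_huffman_encoder encoder;
aws_huffman_encoder_init(&encoder, my_coder_get_coder()); /* hypothetical coder name */

/* Pad any unwritten bits of the final byte with 1's; only the most
 * significant bits of eos_padding are ever appended. */
encoder.eos_padding = 0xFF;
```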
#### Decoding

```c
/**
 * Decodes a byte buffer into the provided symbol array.
 *
 * \param[in]     decoder     The decoder object to use
 * \param[in]     to_decode   The encoded byte buffer to read from
 * \param[in,out] length      In: The length of to_decode. Out: The number of bytes read from to_decode
 * \param[in]     output      The buffer to write decoded symbols to
 * \param[in,out] output_size In: The size of output. Out: The number of bytes written to output
 *
 * \return AWS_OP_SUCCESS if decoding is successful, AWS_OP_ERR otherwise (an error code will be raised)
 */
int aws_huffman_decode(
    struct aws_huffman_decoder *decoder,
    const uint8_t *to_decode,
    size_t *length,
    char *output,
    size_t *output_size);
```

The decoder is built to support partial decoding. This means that if there
isn't enough space in `output`, the decoder will decode as much as possible,
update `length` to indicate how much was consumed, leave `output_size`
unchanged, and raise `AWS_ERROR_SHORT_BUFFER`. `aws_huffman_decode` may then be
called again on the remaining input, as in the following pseudo-code:

```c
void decode_and_send(const uint8_t *to_decode, size_t size) {
    while (size > 0) {
        char output[some_chunk_size];
        size_t output_size = sizeof(output);
        size_t bytes_read = size;
        aws_huffman_decode(decoder, to_decode, &bytes_read, output, &output_size);
        /* AWS_ERROR_SHORT_BUFFER was raised... */

        send_output_to_someone_else(output, output_size);
        to_decode += bytes_read;
        size -= bytes_read;
    }

    /* Be sure to reset the decoder after use */
    aws_huffman_decoder_reset(decoder);
}
```

Upon completion of a decode, the most significant bits of
`decoder->working_bits` will contain the final bits of `to_decode` that could
not match a symbol. This is useful for verifying the padding bits of a stream.
For example, to validate that a stream ends in all 1's (like HPACK requires),
you could do the following:

```c
AWS_ASSERT(decoder->working_bits == UINT64_MAX << (64 - decoder->num_bits));
```
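Putting these pieces together, a single-shot round trip looks roughly like the
sketch below. It assumes the output buffers are large enough for the whole
input (otherwise the `AWS_ERROR_SHORT_BUFFER` loops shown above apply), omits
error handling, and uses `my_coder_get_coder` as a hypothetical name for the
generated coder accessor:

```c
/* Hypothetical forward declaration of the generated coder accessor */
struct aws_huffman_symbol_coder *my_coder_get_coder(void);

void round_trip(const char *input, size_t input_len) {
    struct aws_huffman_encoder encoder;
    aws_huffman_encoder_init(&encoder, my_coder_get_coder());

    /* Encode the whole input into a buffer assumed to be large enough */
    uint8_t encoded[256];
    size_t encoded_size = sizeof(encoded);
    size_t bytes_read = input_len;
    aws_huffman_encode(&encoder, input, &bytes_read, encoded, &encoded_size);

    struct aws_huffman_decoder decoder;
    aws_huffman_decoder_init(&decoder, my_coder_get_coder());

    /* Decode the encoded bytes back into symbols */
    char decoded[256];
    size_t decoded_size = sizeof(decoded);
    size_t encoded_read = encoded_size;
    aws_huffman_decode(&decoder, encoded, &encoded_read, decoded, &decoded_size);

    /* decoded now holds decoded_size symbols matching the original input.
     * Reset the coder objects if they will be reused. */
    aws_huffman_encoder_reset(&encoder);
    aws_huffman_decoder_reset(&decoder);
}
```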